WARNING: This model has been trained on instructions but has not undergone safety or value alignment.
Work In Progress: New versions will be released over the coming months.
ALIA-40b-fc Model Card
The ALIA-40b-fc-2605 model is a fine-tuned variant of a context-extended base ALIA-40b model, which was pre-trained from scratch on 9.83 trillion tokens of carefully curated data spanning 35 European languages (including code). This version is primarily optimized for robust, reliable function calling, while still capable of following user prompts and engaging in multi-turn dialogue.
In keeping with our commitment to open-source development, all tools and sources used to process and create the training data are open-licensed. For clarity, our definition of open-licensed excludes any source, tool, model, or dataset whose terms of use impose restrictive conditions that impede standard open reuse.
This model is released under the permissive Apache 2.0 license.
To visit the model cards of other model versions, please refer to the Model Index.
Model Details
Description
The ALIA-40b is a transformer-based, decoder-only language model that was pre-trained from scratch on 9.37 trillion tokens of meticulously curated data. It subsequently underwent continued pretraining on an additional 424 billion high-quality tokens, and was further extended with a supplementary 39 billion tokens drawn from a similarly diverse mixture, totalling 9.83 trillion tokens.
ALIA-40b-fc is a fine-tuned variant of ALIA-40b. In contrast to previous versions, its development process comprises only two consecutive stages, each targeting a specific capability: (1) long-context adaptation to extend the model’s context window, and (2) supervised fine-tuning to improve function calling capabilities. This means that this checkpoint has not yet undergone an alignment process, unlike previous versions.
After long-context adaptation, our post-training process consists of a supervised fine-tuning (SFT) stage to strengthen function calling and include conversational capabilities.
Although the base model is highly multilingual, the post-training process focused primarily on English due to the limited availability of high-quality datasets in other languages. Evaluation coverage outside English also remains limited. Future releases aim to further strengthen multilingual capabilities through the generation of high-quality synthetic data.
Hyperparameters
Here we list the specific hyperparameters used during the different training stages.
Long context CPT
| Hyperparameter | Value |
|---|---|
| Learning rate | 9e-7 |
| LR Scheduler | Constant |
| Tokens per update | 4M |
| Training tokens (4k → 32k) | 2B |
| Training tokens (32k → 160k) | 36.8B |
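The table above implies a rough number of optimizer updates per long-context stage. The following is a minimal sketch of that arithmetic, assuming that "4M" means 4 × 10^6 tokens per update (it could also mean 4 × 2^20, which would change the counts slightly). The function name `optimizer_updates` is illustrative and not part of any training framework.

```python
# Values taken from the long-context CPT table above.
TOKENS_PER_UPDATE = 4_000_000          # "4M", assumed to mean 4 * 10**6
STAGE_TOKENS = {
    "4k -> 32k": 2_000_000_000,        # 2B
    "32k -> 160k": 36_800_000_000,     # 36.8B
}


def optimizer_updates(tokens: int, tokens_per_update: int = TOKENS_PER_UPDATE) -> int:
    """Approximate number of optimizer updates needed to consume `tokens`."""
    return tokens // tokens_per_update


updates = {stage: optimizer_updates(n) for stage, n in STAGE_TOKENS.items()}
total_tokens = sum(STAGE_TOKENS.values())

for stage, n in updates.items():
    print(f"{stage}: ~{n} updates")
print(f"Long-context tokens in total: {total_tokens / 1e9:.1f}B")
```

The two stages add up to 38.8B tokens, which is consistent with the "supplementary 39 billion tokens" mentioned in the description.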
Supervised Fine-Tuning (SFT)
| Hyperparameter | Value |
|---|---|
| Learning rate | 5e-6 |
| Batch size | 256 |
| Epochs | 1 |
| LR Scheduler | Cosine |
| Warmup Ratio | 4 % |
| Total Steps | 5,687 |
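To make the SFT schedule concrete, the sketch below computes a cosine learning-rate curve from the values in the table. It is an illustration only: the linear shape of the warmup, the decay to a minimum of zero, and the rounding of the warmup length are assumptions, since the model card does not specify them, and NeMo-RL may implement the schedule differently.

```python
import math

# Values taken from the SFT hyperparameter table above.
PEAK_LR = 5e-6
TOTAL_STEPS = 5_687
WARMUP_RATIO = 0.04
# Assumption: the warmup length is the ratio times the total steps, rounded down.
WARMUP_STEPS = int(TOTAL_STEPS * WARMUP_RATIO)


def learning_rate(step: int, min_lr: float = 0.0) -> float:
    """Cosine schedule with linear warmup (assumed shape, assumed min_lr)."""
    if step < WARMUP_STEPS:
        # Linear ramp from PEAK_LR / WARMUP_STEPS up to PEAK_LR.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>5}: lr = {learning_rate(step):.3e}")
```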
Architecture
| Attribute | Value |
|---|---|
| Total Parameters | 40,433,885,184 |
| Embedding Parameters | 2,097,152,000 |
| Layers | 48 |
| Hidden size | 8,192 |
| Attention heads | 64 |
| Context length | 163,840 |
| Vocabulary size | 256,000 |
| Precision | bfloat16 |
| Embedding type | RoPE |
| Activation Function | SwiGLU |
| Layer normalization | RMS Norm |
| Flash attention | ✅ |
| Grouped Query Attention | ✅ |
| Num. query groups | 8 |
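Several useful quantities can be derived from the architecture table. The sketch below is a back-of-the-envelope calculation, assuming that the per-head dimension equals the hidden size divided by the number of attention heads and that "Num. query groups" is the number of key/value heads used by grouped query attention. Neither assumption is stated explicitly in the table, so the key/value cache figure should be read as an estimate.

```python
# Values taken from the architecture table above.
HIDDEN_SIZE = 8_192
NUM_LAYERS = 48
NUM_ATTENTION_HEADS = 64
NUM_QUERY_GROUPS = 8          # assumed to be the number of key/value heads
VOCAB_SIZE = 256_000
CONTEXT_LENGTH = 163_840
BYTES_PER_VALUE = 2           # bfloat16

# Assumption: head dimension = hidden size / attention heads.
head_dim = HIDDEN_SIZE // NUM_ATTENTION_HEADS
query_heads_per_group = NUM_ATTENTION_HEADS // NUM_QUERY_GROUPS

# One vocabulary-by-hidden matrix matches the "Embedding Parameters" row.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE

# Key/value cache: keys and values (factor 2) for every layer and KV head.
kv_bytes_per_token = 2 * NUM_LAYERS * NUM_QUERY_GROUPS * head_dim * BYTES_PER_VALUE
kv_bytes_full_context = kv_bytes_per_token * CONTEXT_LENGTH

print(f"head_dim = {head_dim}, query heads per KV head = {query_heads_per_group}")
print(f"embedding parameters = {embedding_params:,}")
print(f"KV cache per token = {kv_bytes_per_token / 1024:.0f} KiB")
print(f"KV cache at full context = {kv_bytes_full_context / 2**30:.0f} GiB per sequence")
```

Under these assumptions, sharing each key/value head among eight query heads makes the cache eight times smaller than it would be with one key/value head per query head.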
Intended Use
Direct Use
ALIA-40b-fc is primarily optimized for robust and reliable function calling in tool-augmented and multi-turn conversational settings, while remaining capable of supporting other general-purpose language tasks. As with all models in the ALIA family, it is released openly to support both research and commercial use in any of the covered languages.
Out-of-scope Use
The model is not intended for malicious activities, such as harming others or violating human rights. Any downstream application must comply with current laws and regulations. Irresponsible usage in production environments without proper risk assessment and mitigation is also discouraged.
Hardware and Software
Training Framework
The post-training process was conducted in NeMo-RL, with minor modifications to adapt it to our infrastructure.
Compute Infrastructure
All models were trained on MareNostrum 5, a pre-exascale EuroHPC supercomputer hosted and operated by Barcelona Supercomputing Center.
The accelerated partition is composed of 1,120 nodes with the following specifications:
- 4x Nvidia Hopper GPUs with 64 GB HBM2 memory
- 2x Intel Sapphire Rapids 8460Y+ at 2.3 GHz and 32 cores each (64 cores in total)
- 4x NDR200 (bandwidth per node: 800 Gb/s)
- 512 GB of main memory (DDR5)
- 460 GB of NVMe storage
The SFT stage was run across 8 nodes with a total of 32 GPUs.
How to use
The model can be used either directly in Python using the transformers library or deployed as a service and used through standard API calls.
The former gives the most control over the inference process, but it requires running the code on a machine with a GPU powerful enough to host the model locally, and it is more error-prone than the alternative. We therefore strongly recommend the latter: a deployed service can run either locally or on a remote server and can serve multiple clients in parallel, among other advantages.
Unless you have very specific needs (e.g. for research) that require adapting the inference process, it is preferable to follow the "deployment as a service" guidelines below.
In any case, we recommend using a temperature setting close to zero (0.0–0.2) to achieve optimal performance.
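As a rough guide to what "sufficiently powerful" means for local inference, the sketch below estimates the memory needed just to hold the weights in bfloat16, using the parameter count from the architecture table. It deliberately ignores the key/value cache, activations, and framework overhead, so real requirements are higher. The helper `min_gpus_for_weights` is a hypothetical name used only for this illustration.

```python
import math

TOTAL_PARAMETERS = 40_433_885_184   # from the architecture table
BYTES_PER_PARAMETER = 2             # bfloat16

weight_bytes = TOTAL_PARAMETERS * BYTES_PER_PARAMETER
weight_gb = weight_bytes / 1e9


def min_gpus_for_weights(gpu_memory_gb: float) -> int:
    """Lower bound on the number of GPUs needed to hold the weights alone."""
    return math.ceil(weight_gb / gpu_memory_gb)


print(f"Weights in bfloat16: {weight_gb:.1f} GB")
for memory_gb in (64, 80):
    print(f"{memory_gb} GB GPUs: at least {min_gpus_for_weights(memory_gb)}")
```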
Local inference with Python / transformers
The model uses the widely adopted ChatML template to structure conversational inputs and outputs. The template can be applied through the tokenizer’s built-in functions, as illustrated in the example snippet below:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "BSC-LT/ALIA-40b-fc-2605"
text = "What is the weather like in Paris today?"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16
)
message = [{"role": "user", "content": text}]
# Tool definitions passed to the chat template
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and country e.g. Bogotá, Colombia"
            }
        },
        "required": [
            "location"
        ],
        "additionalProperties": False
    }
}]
# Render the conversation and the tool definitions with the chat template
prompt = tokenizer.apply_chat_template(
    message,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=1000)
# Decode only the newly generated tokens, skipping the prompt
generated = outputs[0][inputs.shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
Output:
<tool_call>
{"name": "get_weather", "arguments": {"location": "Paris, France"}}
</tool_call>
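When using local inference, the caller has to turn the generated text into structured tool calls. The following is a minimal sketch of such a parser, assuming the output always wraps one JSON object per `<tool_call>` block as in the sample above. The function name `extract_tool_calls` is illustrative, and blocks that are not valid JSON are simply skipped rather than repaired.

```python
import json
import re

# Matches the content of every <tool_call>...</tool_call> block.
TOOL_CALL_PATTERN = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(generated_text: str) -> list[dict]:
    """Return the tool calls found in the generated text as dictionaries."""
    calls = []
    for payload in TOOL_CALL_PATTERN.findall(generated_text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


sample_output = """<tool_call>
{"name": "get_weather", "arguments": {"location": "Paris, France"}}
</tool_call>"""

calls = extract_tool_calls(sample_output)
print(calls[0]["name"], calls[0]["arguments"])
```

When the model is served through vLLM with a tool-call parser enabled, this step is handled by the server and the calls arrive in the `tool_calls` field of the response instead.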
Deployment as service and remote use (Messages API)
- Deploy the model using vLLM docker image:
docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HUGGING_FACE_HUB_TOKEN=<secret>" \
-p 80:80 \
vllm/vllm-openai:latest \
--model BSC-LT/ALIA-40b-fc-2605 \
--enable-auto-tool-choice \
--tool-call-parser hermes \
--max_model_len 8196 \
--port 80
- Once the deployment is running, interact with the model through the OpenAI-compatible API:
from openai import OpenAI
# The port must match the one published by the container above (-p 80:80)
client = OpenAI(
    base_url="http://localhost:80/v1/",
    api_key="hf_xxxx"
)
models = client.models.list()
model = models.data[0].id
# Tool definitions in the Chat Completions format (nested "function" object)
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Bogotá, Colombia"
                }
            },
            "required": ["location"],
            "additionalProperties": False
        }
    }
}]
system_message = ""
messages = [{"role": "system", "content": system_message}] if system_message else []
messages.append({"role": "user", "content": "What is the weather like in Paris today?"})
print(messages)
chat_completion = client.chat.completions.create(
    model=model,
    tools=tools,
    messages=messages,
    stream=False,
    max_tokens=1000,
    temperature=0.1,
    frequency_penalty=0.2,
)
msg = chat_completion.choices[0].message
# --- HANDLE TOOL CALL OR NORMAL CONTENT ---
if not getattr(msg, "tool_calls", None):
    # Normal assistant message
    print(msg.content)
    messages.append({
        "role": "assistant",
        "content": msg.content
    })
else:
    # Assistant tool call message
    print(msg.tool_calls)
    messages.append({"role": "assistant", "tool_calls": msg.tool_calls})
    # --- Fake tool execution example ---
    tool_call = msg.tool_calls[0]
    # Example: handle the get_weather tool
    if tool_call.function.name == "get_weather":
        # Fake tool result (this would come from your actual backend)
        fake_tool_result = '{"temperature": 18, "unit": "C", "description": "Partly cloudy in Paris"}'
        # Append the tool result message so the model can use it in the next turn
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": tool_call.function.name,
            "content": fake_tool_result,
        })
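The fake tool result above can be replaced by a small dispatcher that maps tool names to Python functions. The sketch below shows one possible shape, runnable without a server. The `get_weather` implementation, the `TOOL_REGISTRY` mapping, and the `run_tool_call` helper are all hypothetical names introduced for illustration; a real backend would replace the stub. It assumes the call arguments arrive as a JSON string, as they do in the `function.arguments` field of the OpenAI client.

```python
import json
from typing import Callable


def get_weather(location: str) -> dict:
    """Hypothetical stand-in for a real weather backend."""
    return {"location": location, "temperature": 18, "unit": "C"}


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY: dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_tool_call(tool_call_id: str, name: str, arguments_json: str) -> dict:
    """Execute one tool call and build the `tool` message for the next request."""
    try:
        arguments = json.loads(arguments_json)
        result = TOOL_REGISTRY[name](**arguments)
    except (KeyError, TypeError, json.JSONDecodeError) as error:
        # Report unknown tools or bad arguments back to the model as data.
        result = {"error": f"{type(error).__name__}: {error}"}
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "name": name,
        "content": json.dumps(result),
    }


tool_message = run_tool_call("call_1", "get_weather", '{"location": "Paris, France"}')
print(tool_message)
```

With the client example above, the call would be `run_tool_call(tool_call.id, tool_call.function.name, tool_call.function.arguments)`, and the returned message would be appended to `messages` before sending the next request.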
Training Data
The dataset used in the supervised fine-tuning stage is built from a mixture of high-quality, permissively licensed datasets developed by third parties and synthetic data generated in-house using DeepSeek-V3-0324.
The table below provides a detailed breakdown of the datasets included in this mixture:
| Dataset | Generation Method | License | Instances |
|---|---|---|---|
| nvidia/When2Call | Synthetic | cc-by-4.0 | 14,800 |
| Salesforce/xlam-function-calling-60k | Synthetic | cc-by-4.0 | 59,800 |
| glaiveai/glaive-function-calling-v2 | Synthetic | apache-2.0 | 102,891 |
| Team-ACE/ToolACE | Synthetic | apache-2.0 | 11,068 |
| Agent-Ark/Toucan-1.5M | Synthetic | apache-2.0 | 119,079 |
| allenai/Dolci-Instruct-SFT-Tool-Use-SA | Synthetic | cc-by-sa-4.0 | 1,369 |
| In-house function calling data (synthetically generated) | Synthetic | apache-2.0 | 19,227 |
| Instruction-tuning data (see ALIA-40b-instruct) | Mix | apache-2.0 | 399,800 |
| Total | | | 728,034 |
Note: Counts may differ slightly from the original datasets due to quality filtering (e.g., removal of poorly formatted or invalid samples) and because a small portion of each dataset was held out for validation purposes (total of 2,000 instances).
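The composition of the mixture can be checked with a few lines of arithmetic. The sketch below sums the instance counts from the table and reports how much of the mixture is function-calling data versus general instruction-tuning data. The short dictionary keys are abbreviations chosen for this example, not official dataset identifiers.

```python
# Instance counts copied from the table above.
INSTANCES = {
    "nvidia/When2Call": 14_800,
    "Salesforce/xlam-function-calling-60k": 59_800,
    "glaiveai/glaive-function-calling-v2": 102_891,
    "Team-ACE/ToolACE": 11_068,
    "Agent-Ark/Toucan-1.5M": 119_079,
    "allenai/Dolci-Instruct-SFT-Tool-Use-SA": 1_369,
    "in-house function calling": 19_227,
    "instruction-tuning": 399_800,
}

total = sum(INSTANCES.values())
function_calling = total - INSTANCES["instruction-tuning"]

print(f"Total instances: {total:,}")
print(f"Function-calling share: {function_calling / total:.1%}")
print(f"Instruction-tuning share: {INSTANCES['instruction-tuning'] / total:.1%}")
```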
Evaluation
The model’s function-calling (FC) capabilities were evaluated using the BFCL benchmark, which is widely regarded as a standard and comprehensive suite for assessing tool-use and function invocation performance in large language models.
| Metric | Category | Score |
|---|---|---|
| Simple AST | Non-Live | 71.0% |
| Multiple AST | Non-Live | 94.5% |
| Parallel AST | Non-Live | 80.5% |
| Parallel Multiple AST | Non-Live | 81.5% |
| Simple AST | Live | 74.8% |
| Multiple AST | Live | 74.4% |
| Parallel AST | Live | 56.3% |
| Parallel Multiple AST | Live | 70.8% |
| Base | Multi-Turn | 15.5% |
| Miss Func | Multi-Turn | 2.0% |
| Miss Param | Multi-Turn | 12.0% |
| Long Context | Multi-Turn | 7.0% |
| Relevance Detection | Hallucination | 81.3% |
| Irrelevance Detection | Hallucination | 84.0% |
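For a quick per-category view, the scores in the table can be averaged. The sketch below computes a plain unweighted mean for each category. This is an assumption made for readability and is not the official BFCL aggregate, which may weight subcategories differently.

```python
from statistics import mean

# Scores (in percent) copied from the evaluation table above.
BFCL_SCORES = {
    "Non-Live": {"Simple AST": 71.0, "Multiple AST": 94.5,
                 "Parallel AST": 80.5, "Parallel Multiple AST": 81.5},
    "Live": {"Simple AST": 74.8, "Multiple AST": 74.4,
             "Parallel AST": 56.3, "Parallel Multiple AST": 70.8},
    "Multi-Turn": {"Base": 15.5, "Miss Func": 2.0,
                   "Miss Param": 12.0, "Long Context": 7.0},
    "Hallucination": {"Relevance Detection": 81.3, "Irrelevance Detection": 84.0},
}

# Unweighted mean per category (not the official BFCL weighting).
category_means = {
    category: mean(scores.values()) for category, scores in BFCL_SCORES.items()
}

for category, value in category_means.items():
    print(f"{category}: {value:.1f}%")
```

The averages make the main pattern in the table easier to see: single-turn categories score far higher than the multi-turn ones.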
Ethical Considerations and Limitations
The ALIA-40b-fc model is an instruction-tuned variant. It has several limitations that users should be aware of. Ongoing work is addressing these areas, including comprehensive evaluation of societal and cognitive biases as well as safety.
Functional Limitations:
- Reasoning & Math: The model is not guaranteed to perform robust chain-of-thought reasoning or advanced mathematics. Complex logical puzzles or multi-step inferences may fail or produce inconsistent answers.
- Code Generation: Although exposed to code during pretraining, ALIA-40b-fc is not a specialized code-generation model. It may produce code-like text, but outputs should be verified and tested before use in production codebases.
- Agentive Capabilities: The model does not have agentive or autonomous action capabilities. It cannot act as an autonomous agent or execute multi-step workflows.
Recommendations:
Developers should implement additional safety filters, human oversight, targeted evaluation suites, and secondary evaluation models when deploying this model. Do not deploy ALIA-40b-fc in critical applications without extensive testing and mitigation. Users are responsible for assessing and mitigating harmful behavior or misinformation resulting from model outputs, and ensuring compliance with applicable regulations, including those governing the use of Artificial Intelligence.
Additional information
Author
The Language Modeling team from AI Institute at Barcelona Supercomputing Center.
Contact
For further information, please send an email to ai_institute_languagemodeling@bsc.es.
Copyright
Copyright (c) 2026 by The Language Modeling team from AI Institute at Barcelona Supercomputing Center.
Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project Modelos del Lenguaje.
This work has been promoted and supported by the Government of Catalonia through the Aina Project.
Acknowledgements
This project has benefited from the contributions of numerous teams and institutions, mainly through data contributions, knowledge transfer or technical support.
We are especially grateful to our ILENIA project partners: CENID, HiTZ and CiTIUS for their participation. We also extend our genuine gratitude to the Spanish Senate and Congress, Fundación Dialnet, and the ‘Instituto Universitario de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI)’ of the University of Las Palmas de Gran Canaria. Many other institutions have been involved in the project. Our thanks to Òmnium Cultural, Parlament de Catalunya, Institut d'Estudis Aranesos, Racó Català, Vilaweb, ACN, Nació Digital, El món and Aquí Berguedà. We thank the Welsh government, DFKI, Occiglot project, especially Malte Ostendorff, and The Common Crawl Foundation, especially Pedro Ortiz, for their collaboration.
We would also like to give special thanks to the NVIDIA team, with whom we have met regularly, especially to: Marcelo Sanchez, Ignacio Sarasua, Adam Henryk Grzywaczewski, Oleg Sudakov, Sergio Perez, Miguel Martinez, Felipe Soares and Meriem Bendris. Their constant support has been especially appreciated throughout the entire process.
Their valuable efforts have been instrumental in the development of this work.
Disclaimer
Be aware that the model may show biases or other unintended distortions. When third parties deploy systems or provide services based on this model, or use the model themselves, they bear the responsibility for mitigating any associated risks and ensuring compliance with applicable regulations, including those governing the use of Artificial Intelligence.
The Barcelona Supercomputing Center, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.
Citation
@misc{gonzalezagirre2025salamandratechnicalreport,
title={Salamandra Technical Report},
author={Aitor Gonzalez-Agirre and Marc Pàmies and Joan Llop and Irene Baucells and Severino Da Dalt and Daniel Tamayo and José Javier Saiz and Ferran Espuña and Jaume Prats and Javier Aula-Blasco and Mario Mina and Adrián Rubio and Alexander Shvets and Anna Sallés and Iñaki Lacunza and Iñigo Pikabea and Jorge Palomar and Júlia Falcão and Lucía Tormo and Luis Vasquez-Reina and Montserrat Marimon and Valle Ruíz-Fernández and Marta Villegas},
year={2025},
eprint={2502.08489},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.08489},
}
License
Model Index
