WARNING: This model has been trained on instructions but has not undergone safety or value alignment.

Work In Progress: New versions will be released over the coming months.

ALIA-40b-fc Model Card

The ALIA-40b-fc-2605 model is a fine-tuned variant of a context-extended ALIA-40b base model, which was pre-trained on a total of 9.83 trillion tokens of carefully curated data spanning 35 European languages and code. This version is primarily optimized for robust, reliable function calling, while remaining capable of following user prompts and engaging in multi-turn dialogue.

In keeping with our commitment to open-source development, all tools and sources used to process and create the training data are open-licensed. For clarity, our definition of open-licensed excludes any source, tool, model, or dataset whose terms of use impose restrictive conditions that impede standard open reuse.

This model is released under the permissive Apache 2.0 license.

To visit the model cards of other model versions, please refer to the Model Index.


Model Details

Description

ALIA-40b is a transformer-based, decoder-only language model that was pre-trained from scratch on 9.37 trillion tokens of meticulously curated data. It subsequently underwent continued pretraining on an additional 424 billion high-quality tokens, and was further extended with a supplementary 39 billion tokens drawn from a similarly diverse mixture, for a total of 9.83 trillion tokens.

ALIA-40b-fc is a fine-tuned variant of ALIA-40b. In contrast to previous versions, its development comprises only two consecutive stages, each targeting a specific capability: (1) long-context adaptation to extend the model's context window, and (2) supervised fine-tuning to improve function-calling capabilities. Unlike previous versions, this checkpoint has therefore not yet undergone an alignment process.

After long-context adaptation, the post-training process consists of a single supervised fine-tuning (SFT) stage that strengthens function calling and adds conversational capabilities.

Although the base model is highly multilingual, the post-training process focused primarily on English due to the limited availability of high-quality datasets in other languages. Evaluation coverage outside English also remains limited. Future releases aim to further strengthen multilingual capabilities through the generation of high-quality synthetic data.

Hyperparameters

Here we list the specific hyperparameters used during the different training stages.

Long context CPT

| Hyperparameter | Value |
|---|---|
| Learning rate | 9e-7 |
| LR scheduler | Constant |
| Tokens per update | 4M |
| Training tokens (4k → 32k context) | 2B |
| Training tokens (32k → 160k context) | 36.8B |
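A quick back-of-the-envelope check of the table: assuming "4M" means 4 × 10^6 tokens per update (the table does not say whether M is decimal or binary), the two stages correspond to the update counts computed below. Their sum, 38.8B tokens, is consistent with the roughly 39 billion long-context tokens mentioned in the description. `updates_for` is a hypothetical helper used only for illustration.

```python
# Values from the long-context CPT table
TOKENS_PER_UPDATE = 4_000_000  # assumption: "4M" = 4 x 10^6 tokens, not 4 x 2^20
STAGES = {
    "4k -> 32k": 2_000_000_000,
    "32k -> 160k": 36_800_000_000,
}


def updates_for(tokens: int, tokens_per_update: int = TOKENS_PER_UPDATE) -> int:
    """Optimizer updates needed to consume `tokens` at a fixed number of tokens per update."""
    return tokens // tokens_per_update


total_tokens = sum(STAGES.values())
for name, tokens in STAGES.items():
    print(f"{name}: {tokens / 1e9:.1f}B tokens -> {updates_for(tokens):,} updates")
print(f"total: {total_tokens / 1e9:.1f}B tokens -> {updates_for(total_tokens):,} updates")
```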

Supervised Fine-Tuning (SFT)

| Hyperparameter | Value |
|---|---|
| Learning rate | 5e-6 |
| Batch size | 256 |
| Epochs | 1 |
| LR scheduler | Cosine |
| Warmup ratio | 4% |
| Total steps | 5,687 |
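The SFT table reports a peak learning rate, a cosine scheduler and a 4% warmup ratio over 5,687 steps. The sketch below shows one common way to combine these values. It assumes linear warmup followed by cosine decay to zero. Neither the warmup shape nor the final learning rate is stated in this card, so `MIN_LR` and `lr_at` are illustrative assumptions, not the NeMo-RL implementation.

```python
import math

PEAK_LR = 5e-6        # "Learning rate" row
TOTAL_STEPS = 5_687   # "Total steps" row
WARMUP_RATIO = 0.04   # "Warmup ratio" row (4%)
MIN_LR = 0.0          # assumption: the floor of the cosine decay is not reported

WARMUP_STEPS = int(WARMUP_RATIO * TOTAL_STEPS)


def lr_at(step: int) -> float:
    """Learning rate at a 0-indexed optimizer step: linear warmup, then cosine decay (assumed)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


for s in (0, WARMUP_STEPS - 1, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS - 1):
    print(f"step {s:>5}: lr = {lr_at(s):.3e}")
```

With these values the warmup covers int(0.04 × 5,687) = 227 steps. The rest of the run follows the cosine curve down towards the assumed floor.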

Architecture

| Attribute | Value |
|---|---|
| Total parameters | 40,433,885,184 |
| Embedding parameters | 2,097,152,000 |
| Layers | 48 |
| Hidden size | 8,192 |
| Attention heads | 64 |
| Context length | 163,840 |
| Vocabulary size | 256,000 |
| Precision | bfloat16 |
| Positional embedding | RoPE |
| Activation function | SwiGLU |
| Layer normalization | RMS Norm |
| Flash attention | Yes |
| Grouped query attention | Yes |
| Num. query groups | 8 |
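The architecture figures can be used to estimate memory requirements before deployment. The sketch below does three things: it checks the embedding parameter count, estimates the bfloat16 weight footprint, and estimates the key/value (KV) cache size. It assumes that "Num. query groups" denotes the number of key/value heads, which is the Megatron/NeMo naming. It ignores activations, framework overhead and any cache quantization.

```python
# Values copied from the architecture table
TOTAL_PARAMS = 40_433_885_184
VOCAB_SIZE = 256_000
HIDDEN_SIZE = 8_192
NUM_LAYERS = 48
NUM_HEADS = 64
NUM_QUERY_GROUPS = 8      # assumption: equals the number of key/value heads
CONTEXT_LENGTH = 163_840
BYTES_PER_VALUE = 2       # bfloat16

head_dim = HIDDEN_SIZE // NUM_HEADS          # size of each attention head
embedding_params = VOCAB_SIZE * HIDDEN_SIZE  # token embedding matrix
weight_bytes = TOTAL_PARAMS * BYTES_PER_VALUE

# Each layer caches one key and one value vector (factor 2) per KV head for every token
kv_bytes_per_token = 2 * NUM_LAYERS * NUM_QUERY_GROUPS * head_dim * BYTES_PER_VALUE
kv_bytes_full_context = kv_bytes_per_token * CONTEXT_LENGTH

print(f"head_dim: {head_dim}")
print(f"embedding parameters: {embedding_params:,}")
print(f"bf16 weights: {weight_bytes / 2**30:.1f} GiB")
print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")
print(f"KV cache for one {CONTEXT_LENGTH:,}-token sequence: {kv_bytes_full_context / 2**30:.1f} GiB")
```

Under these assumptions, the weights alone take about 75 GiB (roughly 81 GB), so the model does not fit on a single 64 GB GPU in bfloat16. Each full-length sequence adds 30 GiB of KV cache. With 64 separate key/value heads instead of 8 groups, the cache would be 8 times larger, and that difference is the memory saving that grouped query attention provides.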

Intended Use

Direct Use

ALIA‑40b‑fc is primarily optimized for robust and reliable function calling in tool-augmented and multi-turn conversational settings, while remaining capable of supporting other general-purpose language tasks. As with all models in the ALIA family, it is released openly to support both research and commercial use in any of the covered languages.

Out-of-scope Use

The model is not intended for malicious activities, such as harming others or violating human rights. Any downstream application must comply with current laws and regulations. Irresponsible usage in production environments without proper risk assessment and mitigation is also discouraged.


Hardware and Software

Training Framework

The post-training process was conducted in NeMo-RL, with minor modifications to adapt it to our infrastructure.

Compute Infrastructure

All models were trained on MareNostrum 5, a pre-exascale EuroHPC supercomputer hosted and operated by the Barcelona Supercomputing Center.

The accelerated partition is composed of 1,120 nodes with the following specifications:

  • 4x NVIDIA Hopper GPUs with 64 GB of HBM2 memory
  • 2x Intel Sapphire Rapids 8460Y+ CPUs at 2.3 GHz with 32 cores each (64 cores in total)
  • 4x NDR200 (800 Gb/s bandwidth per node)
  • 512 GB of DDR5 main memory
  • 460 GB of NVMe storage

The SFT stage was run across 8 nodes with a total of 32 GPUs.


How to use

The model can be used either directly in Python using the transformers library or deployed as a service and used through standard API calls.

The former gives the most control over the inference process. However, it requires the code to run on a machine with a GPU powerful enough to host the model locally, and it is more error-prone than the alternative. We therefore strongly recommend the latter. The service can be deployed either locally or on a remote server and, among other advantages, makes the model available to multiple clients in parallel.

Unless you have very specific needs (e.g., for research) that require adapting the inference process, it is preferable to follow the "deployment as a service" guidelines below.

In any case, we recommend using a temperature setting close to zero (0.0–0.2) to achieve optimal performance.

Local inference with Python / transformers

The model uses the widely adopted ChatML template to structure conversational inputs and outputs. Using this standardized chat format keeps prompts consistent with the format seen during fine-tuning. The template, together with any tool definitions, can be applied through the tokenizer's built-in apply_chat_template method, as illustrated in the example snippet below:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "BSC-LT/ALIA-40b-fc-2605"

text = "What is the weather like in Paris today?"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # spread the 40B weights across all visible GPUs
    torch_dtype=torch.bfloat16,
)

messages = [{"role": "user", "content": text}]

# Tool definitions in the OpenAI-style JSON schema format used by chat templates
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Bogotá, Colombia",
                }
            },
            "required": ["location"],
            "additionalProperties": False,
        },
    },
}]

# Render the conversation and the tool definitions with the model's chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools,
)

inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
# Greedy decoding, in line with the near-zero temperature recommended above
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=1000, do_sample=False)

# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Output:

<tool_call>
{"name": "get_weather", "arguments": {"location": "Paris, France"}}
</tool_call>
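When the model runs locally with transformers, the generated text is returned as-is, so the caller has to extract the tool call. In the served setup described next, vLLM's hermes tool-call parser does this on the server. The helper below is a minimal sketch. It assumes that every call is emitted as a JSON object wrapped in its own `<tool_call>` ... `</tool_call>` block, as in the output shown above. `parse_tool_calls` is a hypothetical name, not part of transformers.

```python
import json
import re

# One <tool_call> ... </tool_call> block; DOTALL lets the JSON span several lines
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract every tool call from generated text, skipping blocks that are not valid JSON."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # malformed block: the caller may choose to re-prompt the model
        if isinstance(call, dict) and "name" in call:
            call.setdefault("arguments", {})
            calls.append(call)
    return calls


generated = """<tool_call>
{"name": "get_weather", "arguments": {"location": "Paris, France"}}
</tool_call>"""

for call in parse_tool_calls(generated):
    print(call["name"], call["arguments"])
```

Returning an empty list for plain-text answers lets the same code path handle both tool calls and normal replies.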

Deployment as a service and remote use (OpenAI-compatible API)

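  1. Deploy the model using the vLLM Docker image. In bfloat16 the weights alone take roughly 81 GB, which exceeds the memory of a single 64 GB GPU, so the model is sharded with tensor parallelism. Set --tensor-parallel-size to the number of GPUs available (4 here). Host port 8080 is mapped to the server's port 80 to match the client example:
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 8080:80 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model BSC-LT/ALIA-40b-fc-2605 \
    --tensor-parallel-size 4 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --max-model-len 8192 \
    --port 80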
  2. Once the deployment is running, interact with the model through the OpenAI-compatible API:
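from openai import OpenAI

# vLLM only checks the API key if the server was started with --api-key
client = OpenAI(
    base_url="http://localhost:8080/v1/",
    api_key="hf_xxxx",
)

# Use the model served by the deployment above
models = client.models.list()
model = models.data[0].id

# Tool definitions in the Chat Completions format expected by vLLM
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Bogotá, Colombia",
                }
            },
            "required": ["location"],
            "additionalProperties": False,
        },
    },
}]

system_message = ""
messages = [{"role": "system", "content": system_message}] if system_message else []
messages.append({"role": "user", "content": "What is the weather like in Paris today?"})

chat_completion = client.chat.completions.create(
    model=model,
    tools=tools,
    messages=messages,
    stream=False,
    max_tokens=1000,
    temperature=0.1,
    frequency_penalty=0.2,
)

msg = chat_completion.choices[0].message

# --- HANDLE TOOL CALL OR NORMAL CONTENT ---

if not msg.tool_calls:
    # Normal assistant message
    print(msg.content)
    messages.append({"role": "assistant", "content": msg.content})

else:
    # Assistant tool call message, stored as plain dicts so it can be sent back to the server
    print(msg.tool_calls)
    messages.append({
        "role": "assistant",
        "content": msg.content or "",
        "tool_calls": [tc.model_dump() for tc in msg.tool_calls],
    })

    # --- Fake tool execution example ---
    tool_call = msg.tool_calls[0]
    # Example: handle the get_weather tool
    if tool_call.function.name == "get_weather":
        # Fake tool result (this would come from your actual backend)
        fake_tool_result = '{"temperature": 18, "unit": "C", "description": "Partly cloudy in Paris"}'

        # Append the tool result message so the model can use it in the next turn
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": tool_call.function.name,
            "content": fake_tool_result,
        })

        # Next turn: the model turns the tool result into a natural-language answer
        follow_up = client.chat.completions.create(
            model=model,
            tools=tools,
            messages=messages,
            max_tokens=1000,
            temperature=0.1,
        )
        print(follow_up.choices[0].message.content)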
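In the client example the tool result is hard-coded. In a real application, the name and JSON-encoded arguments from tool_call.function would be dispatched to actual code. The sketch below shows one way to do this with a name-to-function registry. `get_weather`, `TOOL_REGISTRY` and `run_tool_call` are hypothetical. Errors are returned to the model as JSON instead of raised, so the conversation can continue.

```python
import json


def get_weather(location: str) -> dict:
    """Hypothetical stand-in for a real weather backend."""
    return {"location": location, "temperature": 18, "unit": "C", "description": "Partly cloudy"}


# Maps tool names (as declared in `tools`) to the Python callables that implement them
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments: str) -> str:
    """Execute one tool call and return its result as a JSON string for a "tool" message."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments) if arguments else {}
        return json.dumps(func(**kwargs))
    except (json.JSONDecodeError, TypeError) as exc:
        # Invalid JSON or missing/unexpected parameters: report the problem back to the model
        return json.dumps({"error": str(exc)})


print(run_tool_call("get_weather", '{"location": "Paris, France"}'))
```

The OpenAI client exposes tool_call.function.arguments as a JSON string. Inside the client loop, the fake result could therefore be replaced with run_tool_call(tool_call.function.name, tool_call.function.arguments), applied to every entry of msg.tool_calls.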

Training Data

The dataset used in the supervised fine-tuning stage is built from a mixture of high-quality, permissively licensed datasets developed by third parties and synthetic data generated in-house using DeepSeek-V3-0324.

The table below provides a detailed breakdown of the datasets included in this mixture:

| Dataset | Generation method | License | Instances |
|---|---|---|---|
| nvidia/When2Call | Synthetic | cc-by-4.0 | 14,800 |
| Salesforce/xlam-function-calling-60k | Synthetic | cc-by-4.0 | 59,800 |
| glaiveai/glaive-function-calling-v2 | Synthetic | apache-2.0 | 102,891 |
| Team-ACE/ToolACE | Synthetic | apache-2.0 | 11,068 |
| Agent-Ark/Toucan-1.5M | Synthetic | apache-2.0 | 119,079 |
| allenai/Dolci-Instruct-SFT-Tool-Use-SA | Synthetic | cc-by-sa-4.0 | 1,369 |
| In-house function calling data (synthetically generated) | Synthetic | apache-2.0 | 19,227 |
| Instruction-tuning data (see ALIA-40b-instruct) | Mix | apache-2.0 | 399,800 |
| Total | | | 728,034 |

Note: Counts may differ slightly from the original datasets due to quality filtering (e.g., removal of poorly formatted or invalid samples) and because a small portion of each dataset was held out for validation purposes (total of 2,000 instances).
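The per-dataset counts in the table sum to the reported total of 728,034. The short script below reproduces that sum and shows each source's share of the mixture. Grouping every source except the instruction-tuning data as "function calling" is an interpretation of the table, not a split stated by the authors.

```python
# Instance counts from the training data table
MIXTURE = {
    "nvidia/When2Call": 14_800,
    "Salesforce/xlam-function-calling-60k": 59_800,
    "glaiveai/glaive-function-calling-v2": 102_891,
    "Team-ACE/ToolACE": 11_068,
    "Agent-Ark/Toucan-1.5M": 119_079,
    "allenai/Dolci-Instruct-SFT-Tool-Use-SA": 1_369,
    "In-house function calling data": 19_227,
    "Instruction-tuning data": 399_800,
}
GENERAL_SOURCES = {"Instruction-tuning data"}  # everything else is treated as function-calling data

total = sum(MIXTURE.values())
fc_total = sum(n for name, n in MIXTURE.items() if name not in GENERAL_SOURCES)

# Largest sources first, with their share of the whole mixture
for name, n in sorted(MIXTURE.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<40} {n:>8,} {100 * n / total:5.1f}%")
print(f"total: {total:,}; function-calling share: {100 * fc_total / total:.1f}%")
```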

Evaluation

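The model’s function-calling (FC) capabilities were evaluated using the Berkeley Function Calling Leaderboard (BFCL) benchmark, which is widely regarded as a standard and comprehensive suite for assessing tool use and function invocation in large language models. The AST categories check single-turn calls by abstract syntax tree matching against the expected call. The Hallucination categories measure whether the model calls a function when one is relevant and abstains when none is.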

| Metric | Category | Score |
|---|---|---|
| Simple AST | Non-Live | 71.0% |
| Multiple AST | Non-Live | 94.5% |
| Parallel AST | Non-Live | 80.5% |
| Parallel Multiple AST | Non-Live | 81.5% |
| Simple AST | Live | 74.8% |
| Multiple AST | Live | 74.4% |
| Parallel AST | Live | 56.3% |
| Parallel Multiple AST | Live | 70.8% |
| Base | Multi-Turn | 15.5% |
| Miss Func | Multi-Turn | 2.0% |
| Miss Param | Multi-Turn | 12.0% |
| Long Context | Multi-Turn | 7.0% |
| Relevance Detection | Hallucination | 81.3% |
| Irrelevance Detection | Hallucination | 84.0% |
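To summarize the table per category, the sketch below computes unweighted means of the reported scores. These means are derived here for illustration only. They are not official BFCL aggregates, which may weight subsets differently, but they make the gap between the single-turn AST categories and the multi-turn categories easy to see.

```python
from statistics import mean

# BFCL scores (%) from the evaluation table, grouped by category
SCORES = {
    "Non-Live": {"Simple AST": 71.0, "Multiple AST": 94.5, "Parallel AST": 80.5, "Parallel Multiple AST": 81.5},
    "Live": {"Simple AST": 74.8, "Multiple AST": 74.4, "Parallel AST": 56.3, "Parallel Multiple AST": 70.8},
    "Multi-Turn": {"Base": 15.5, "Miss Func": 2.0, "Miss Param": 12.0, "Long Context": 7.0},
    "Hallucination": {"Relevance Detection": 81.3, "Irrelevance Detection": 84.0},
}

# Unweighted mean per category (not the official BFCL aggregation)
category_means = {category: mean(metrics.values()) for category, metrics in SCORES.items()}

for category, value in category_means.items():
    print(f"{category:<14} {value:6.2f}%")
```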

Ethical Considerations and Limitations

The ALIA-40b-fc model is an instruction-tuned variant. It has several limitations that users should be aware of. Ongoing work is addressing these areas, including comprehensive evaluation of societal and cognitive biases as well as safety.

Functional Limitations:

  • Reasoning & Math: The model is not guaranteed to perform robust chain-of-thought reasoning or advanced mathematics. Complex logical puzzles or multi-step inferences may fail or produce inconsistent answers.
  • Code Generation: Although exposed to code during pretraining, ALIA-40b-fc is not a specialized code-generation model. It may produce code-like text, but outputs should be verified and tested before use in production codebases.
  • Agentive Capabilities: The model does not have agentive or autonomous action capabilities. It cannot act as an autonomous agent or execute multi-step workflows.

Recommendations:

Developers should implement additional safety filters, human oversight, targeted evaluation suites, and secondary evaluation models when deploying this model. Do not deploy ALIA-40b-fc in critical applications without extensive testing and mitigation. Users are responsible for assessing and mitigating harmful behavior or misinformation resulting from model outputs, and ensuring compliance with applicable regulations, including those governing the use of Artificial Intelligence.


Additional information

Author

The Language Modeling team from the AI Institute at the Barcelona Supercomputing Center.

Contact

For further information, please send an email to ai_institute_languagemodeling@bsc.es.

Copyright

Copyright (c) 2026 by the Language Modeling team from the AI Institute at the Barcelona Supercomputing Center.

Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project Modelos del Lenguaje.

This work has been promoted and supported by the Government of Catalonia through the Aina Project.

Acknowledgements

This project has benefited from the contributions of numerous teams and institutions, mainly through data contributions, knowledge transfer or technical support.

We are especially grateful to our ILENIA project partners: CENID, HiTZ and CiTIUS for their participation. We also extend our genuine gratitude to the Spanish Senate and Congress, Fundación Dialnet, and the ‘Instituto Universitario de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI)’ of the University of Las Palmas de Gran Canaria. Many other institutions have been involved in the project. Our thanks to Òmnium Cultural, Parlament de Catalunya, Institut d'Estudis Aranesos, Racó Català, Vilaweb, ACN, Nació Digital, El món and Aquí Berguedà. We thank the Welsh government, DFKI, Occiglot project, especially Malte Ostendorff, and The Common Crawl Foundation, especially Pedro Ortiz, for their collaboration.

We would also like to give special thanks to the NVIDIA team, with whom we have met regularly, especially to: Marcelo Sanchez, Ignacio Sarasua, Adam Henryk Grzywaczewski, Oleg Sudakov, Sergio Perez, Miguel Martinez, Felipe Soares and Meriem Bendris. Their constant support has been especially appreciated throughout the entire process.

Their valuable efforts have been instrumental in the development of this work.

Disclaimer

Be aware that the model may show biases or other unintended distortions. When third parties deploy systems or provide services based on this model, or use the model themselves, they bear the responsibility for mitigating any associated risks and ensuring compliance with applicable regulations, including those governing the use of Artificial Intelligence.

The Barcelona Supercomputing Center, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.

Citation

@misc{gonzalezagirre2025salamandratechnicalreport,
      title={Salamandra Technical Report}, 
      author={Aitor Gonzalez-Agirre and Marc Pàmies and Joan Llop and Irene Baucells and Severino Da Dalt and Daniel Tamayo and José Javier Saiz and Ferran Espuña and Jaume Prats and Javier Aula-Blasco and Mario Mina and Adrián Rubio and Alexander Shvets and Anna Sallés and Iñaki Lacunza and Iñigo Pikabea and Jorge Palomar and Júlia Falcão and Lucía Tormo and Luis Vasquez-Reina and Montserrat Marimon and Valle Ruíz-Fernández and Marta Villegas},
      year={2025},
      eprint={2502.08489},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.08489}, 
}

License

Apache License, Version 2.0

Model Index

| Model | Base | Instruct | Function Calling |
|---|---|---|---|
| 2b | Link | Link | N/A |
| 7b | Link | Link | N/A |
| 40b | Link | Link | Link |