Instructions for using Pragmir/LatamGPT-70B-GGUF with libraries and local apps.

Libraries

How to use Pragmir/LatamGPT-70B-GGUF with Transformers:

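# Transformers reads GGUF checkpoints through the gguf_file argument
# (requires the gguf package) and dequantizes the weights on load.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

repo_id = "Pragmir/LatamGPT-70B-GGUF"
gguf_file = "LatamGPT-70B-Q4_K_M.gguf"

# Load model directly
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file, dtype="auto")

# Use a pipeline as a high-level helper
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)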

llama-cpp-python

How to use Pragmir/LatamGPT-70B-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

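# Download the quantized file from the Hub (cached after the first run) and load it
llm = Llama.from_pretrained(
    repo_id="Pragmir/LatamGPT-70B-GGUF",
    filename="LatamGPT-70B-Q4_K_M.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
# The reply follows the OpenAI chat-completion response layout
print(response["choices"][0]["message"]["content"])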

Local Apps

llama.cpp

How to use Pragmir/LatamGPT-70B-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf Pragmir/LatamGPT-70B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf Pragmir/LatamGPT-70B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf Pragmir/LatamGPT-70B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf Pragmir/LatamGPT-70B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Pragmir/LatamGPT-70B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Pragmir/LatamGPT-70B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Pragmir/LatamGPT-70B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Pragmir/LatamGPT-70B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/Pragmir/LatamGPT-70B-GGUF:Q4_K_M

vLLM

How to use Pragmir/LatamGPT-70B-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Pragmir/LatamGPT-70B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Pragmir/LatamGPT-70B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
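
The same request can also be sent from Python using only the standard library. The sketch below is a minimal example, assuming the server started above is listening on localhost:8000. The helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000",
    "Pragmir/LatamGPT-70B-GGUF",
    "What is the capital of France?",
)
# Uncomment once the server is running:
# print(send_chat_request(request))
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.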

Use Docker

docker model run hf.co/Pragmir/LatamGPT-70B-GGUF:Q4_K_M

SGLang

How to use Pragmir/LatamGPT-70B-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Pragmir/LatamGPT-70B-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Pragmir/LatamGPT-70B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Pragmir/LatamGPT-70B-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Pragmir/LatamGPT-70B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use Pragmir/LatamGPT-70B-GGUF with Ollama:
```
ollama run hf.co/Pragmir/LatamGPT-70B-GGUF:Q4_K_M
```

Unsloth Studio

How to use Pragmir/LatamGPT-70B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Pragmir/LatamGPT-70B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Pragmir/LatamGPT-70B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Pragmir/LatamGPT-70B-GGUF to start chatting

Docker Model Runner
How to use Pragmir/LatamGPT-70B-GGUF with Docker Model Runner:
```
docker model run hf.co/Pragmir/LatamGPT-70B-GGUF:Q4_K_M
```

Lemonade

How to use Pragmir/LatamGPT-70B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Pragmir/LatamGPT-70B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.LatamGPT-70B-GGUF-Q4_K_M

List all available models

lemonade list

Llama-3.1-70B-LatamGPT-SFT-1.0

🌐 Language versions: Español | Português

LatamGPT is a language model developed in Latin America and the Caribbean, designed to represent the region’s linguistic and cultural particularities. It is built on Llama 3.1 70B and adapted through continued pretraining (CPT) on Latin American data, followed by supervised fine-tuning (SFT) focused on instruction following, conversation, and natural language processing tasks in Spanish, Portuguese, and English. The project aims to strengthen the region’s technological sovereignty and local capabilities so that it can lead its own innovation.

The goal of LatamGPT is to provide an open, multilingual, and culturally relevant model that helps reduce the regional representation gap compared with global models, with better coverage of expressions, contexts, cultural references, and language uses specific to Latin America and the Caribbean.

Model information

LatamGPT 1.0 is an autoregressive model based on the Transformer architecture. It inherits the general capabilities of Llama 3.1 70B and complements them with a regional adaptation process in two main stages:

Continued pretraining (CPT): specialization of the base model with data from Latin America.
Supervised fine-tuning (SFT): supervised adaptation to improve instruction following, conversational quality, usefulness, and performance on regional tasks.

Supported languages: Spanish, Portuguese, and English, with a special focus on Latin American variants, registers, and regional language use.

License: use of the Llama 3.1 LatamGPT model is subject to the Llama 3.1 Community License Agreement, Copyright © Meta Platforms, Inc. Built with Llama. We recommend carefully reviewing the applicable terms before redistributing, modifying, or deploying the model in production.

Contact: latam-gpt@cenia.cl

Note: a data catalog and a technical model report will be published soon. These documents will provide more details about the data sources, curation criteria, training stages, evaluation methodology, and main limitations of LatamGPT.

Intended uses

LatamGPT is intended for research, experimentation, application development, and commercial use in Latin American contexts where text generation, conversational assistance, summarization, classification, writing, analytical support, and other natural language processing tasks are required.

This model may be especially useful in scenarios where language, cultural references, or Latin American regional contexts are relevant to the quality of the responses.

Out-of-scope uses

LatamGPT must not be used for purposes that violate applicable laws, regulations, third-party rights, or license terms. It should also not be used without additional controls in high-impact contexts, such as health, education, finance, justice, public safety, or other areas where an incorrect output could cause significant harm.

The model is not optimized for languages other than Spanish, Portuguese, and English. Although it may generate text in other languages, performance outside these three languages is not a primary objective of this version.

How to use the model

Memory requirements

LatamGPT is based on a 70B-parameter model. For inference in BF16/FP16 precision, approximately 140 GB of VRAM is required for the model weights alone, so multiple high-memory GPUs are recommended.
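
The 140 GB figure follows from simple arithmetic: 70 billion parameters times 2 bytes per BF16/FP16 weight. The sketch below reproduces that calculation and applies the same formula to lower bit widths. The function name is illustrative, and the estimates cover weights only (no KV cache, activations, or quantization metadata), so actual quantized files are typically somewhat larger than the nominal 8-bit and 4-bit figures.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Estimate memory for model weights alone, in decimal gigabytes."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


NUM_PARAMS = 70e9  # nominal parameter count of a 70B model

for label, bits in [("BF16/FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
```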

Use with Hugging Face Transformers

To use LatamGPT with transformers, you can run it through the text-generation pipeline, which loads the model and tokenizer for you:

from transformers import pipeline
import torch

pipe = pipeline(
    "text-generation",
    model="latamgpt/Llama-3.1-70B-LatamGPT-SFT-1.0",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful, clear, and precise assistant, with a focus on Latin America and the Caribbean."},
    {"role": "user", "content": "Explain in a brief paragraph what LatamGPT is."},
]

out = pipe(messages, max_new_tokens=256, temperature=0.6, top_p=0.9, do_sample=True)
print(out[0]["generated_text"][-1]["content"])
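
The indexing in the last line reflects how the text-generation pipeline returns chat-style input: one result per input, whose `generated_text` field holds the whole conversation with the assistant's reply appended as the final message. The sketch below uses a hand-written mock of that structure (the reply text is a placeholder, not model output) to show the lookup without loading the model. The helper name `last_assistant_reply` is an assumption made for illustration.

```python
# Hand-written stand-in for the pipeline's return value; not real model output.
mock_out = [
    {
        "generated_text": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain in a brief paragraph what LatamGPT is."},
            {"role": "assistant", "content": "(model reply would appear here)"},
        ]
    }
]


def last_assistant_reply(pipeline_output):
    """Return the content of the final message of the first result."""
    conversation = pipeline_output[0]["generated_text"]
    final_message = conversation[-1]
    if final_message["role"] != "assistant":
        raise ValueError("expected the last message to come from the assistant")
    return final_message["content"]


print(last_assistant_reply(mock_out))
```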

Training data

LatamGPT 1.0 was built from Llama 3.1 70B and adapted with regional data from Latin America and the Caribbean. The main focus of the training process was to incorporate representative data from the region, gathered through a strategic alliance of more than 75 institutions.

The continued pretraining stage was carried out with LatamGPT Corpus 1.0, a dataset composed of approximately 297B tokens. The dataset will be available soon on Hugging Face: [link coming soon].

This work covers data from 20 countries: Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Cuba, the Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, Puerto Rico, Uruguay, and Venezuela.

The regional data covers different thematic areas of cultural, social, historical, scientific, and territorial interest, including: Indigenous peoples, food and gastronomy, dialects and languages, historical events, celebrations and festivities, places and geography, arts, humanities and social sciences, communication and media, politics, important figures, mythology, flora and fauna, sports and recreation, economics and finance, hard sciences, education, and medicine and health.

The continued pretraining stage seeks to expand the model’s coverage of languages, expressions, entities, cultural references, and contexts specific to Latin America and the Caribbean.

The supervised fine-tuning stage incorporates examples aimed at improving instruction following, conversational interaction, and the model’s usefulness in natural language processing tasks.

The data construction process considers criteria of quality, traceability, and regional diversity. However, like any model trained on large text collections, LatamGPT may reflect biases, gaps, or errors present in its training data.

Responsibility and safety

LatamGPT should be deployed as part of broader systems that incorporate safety controls, monitoring, and risk mitigation mechanisms. The model should not be considered an infallible source of information or used without human validation in high-impact contexts.

Those who integrate LatamGPT into products or services are responsible for defining appropriate usage policies, filters, evaluations, and limits for their specific application.

Responsible deployment

Before deploying LatamGPT in production, we recommend:

Evaluating the model in the specific domain of use.
Implementing input and output filters when appropriate.
Monitoring errors, biases, and misuse.
Defining feedback and reporting mechanisms.
Preventing the model from making critical decisions without human supervision.
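
As one way to picture the input and output filters recommended above, the sketch below wraps an arbitrary generation function with simple checks. Everything in it is hypothetical: `guarded_generate`, the blocked-term list, and the `fake_generate` stand-in are illustrative placeholders, and a real deployment would likely rely on dedicated moderation tooling rather than substring matching.

```python
def guarded_generate(generate_fn, prompt, blocked_terms, max_chars=2000):
    """Run generate_fn only if the prompt passes the input filter, then filter the output.

    blocked_terms is expected to contain lowercase strings.
    """
    if any(term in prompt.lower() for term in blocked_terms):
        return {"status": "rejected_input", "text": None}

    text = generate_fn(prompt)

    if any(term in text.lower() for term in blocked_terms):
        return {"status": "rejected_output", "text": None}
    return {"status": "ok", "text": text[:max_chars]}


def fake_generate(prompt):
    """Stand-in for a real model call; echoes the prompt."""
    return f"Echo: {prompt}"


blocked = ["diagnosis"]  # hypothetical term that should be routed to a human instead
print(guarded_generate(fake_generate, "Summarize this article.", blocked)["status"])
print(guarded_generate(fake_generate, "Give me a diagnosis.", blocked)["status"])
```

Returning a status alongside the text lets the calling application log rejections and hand them to a human reviewer, in line with the monitoring and supervision points in the list.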

Ethical considerations and limitations

LatamGPT seeks to contribute to the development of language models that are more representative of Latin America. Even so, the model may produce incorrect, incomplete, biased, or outdated responses.

Its outputs should be interpreted as automatically generated content and not as professional advice. In sensitive applications, responses should be reviewed by domain experts before being used.

Its main limitations include:

Possibility of hallucinations or factual errors.
Performance variation across languages, countries, and domains.
Reproduction of biases present in the data.
Difficulty recognizing outdated information.
Sensitivity to the prompt and provided context.
Limited performance outside Spanish, Portuguese, and English.

Citation

If you use LatamGPT in research, products, or technical reports, please cite the project using the following reference:

@misc{latamgpt2026,
  title        = {Llama-3.1-70B-LatamGPT-SFT-1.0},
  author       = {LatamGPT Team},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/latam-gpt/Llama-3.1-70B-LatamGPT-SFT-1.0}}
}

Acknowledgements

LatamGPT is made possible by the collaborative work of technical teams, institutions, communities, and organizations that contribute to the development of open and representative artificial intelligence for Latin America and the Caribbean.

Downloads last month: 220

GGUF

Model size: 71B params
Architecture: llama
Hardware compatibility: 4-bit, 8-bit

Model tree for Pragmir/LatamGPT-70B-GGUF

Base model: meta-llama/Llama-3.1-70B
Quantized (81): this model