Llama-3.1-70B-LatamGPT-SFT-1.0
🌐 Language versions: Español | Português
LatamGPT is a language model developed in Latin America and the Caribbean, with a focus on representing the region’s linguistic and cultural particularities. It is built on top of Llama 3.1 70B and adapted through continued pretraining (CPT) with Latin American data and supervised fine-tuning (SFT) focused on instruction following, conversation, and natural language processing tasks in Spanish, Portuguese, and English. This development strengthens technological sovereignty and the deployment of local capabilities, enabling the region to lead its own innovation.
The goal of LatamGPT is to provide an open, multilingual, and culturally relevant model that helps reduce the regional representation gap compared with global models, with better coverage of expressions, contexts, cultural references, and language uses specific to Latin America and the Caribbean.
Model information
LatamGPT 1.0 is an autoregressive model based on the Transformer architecture. It inherits the general capabilities of Llama 3.1 70B and complements them with a regional adaptation process in two main stages:
- Continued pretraining (CPT): specialization of the base model with data from Latin America.
- Supervised fine-tuning (SFT): supervised adaptation to improve instruction following, conversational quality, usefulness, and performance on regional tasks.
Supported languages: Spanish, Portuguese, and English, with a special focus on Latin American variants, registers, and regional language use.
License: use of the Llama 3.1 LatamGPT model is subject to the Llama 3.1 Community License Agreement, Copyright © Meta Platforms, Inc. Built with Llama. We recommend carefully reviewing the applicable terms before redistributing, modifying, or deploying the model in production.
Contact: latam-gpt@cenia.cl
Note: a data catalog and a technical model report will be published soon. These documents will provide more details about the data sources, curation criteria, training stages, evaluation methodology, and main limitations of LatamGPT.
Intended uses
LatamGPT is intended for research, experimentation, application development, and commercial use in Latin American contexts where text generation, conversational assistance, summarization, classification, writing, analytical support, and other natural language processing tasks are required.
This model may be especially useful in scenarios where language, cultural references, or Latin American regional contexts are relevant to the quality of the responses.
Out-of-scope uses
LatamGPT must not be used for purposes that violate applicable laws, regulations, third-party rights, or license terms. It should also not be used without additional controls in high-impact contexts, such as health, education, finance, justice, public safety, or other areas where an incorrect output could cause significant harm.
The model is not optimized for languages other than Spanish, Portuguese, and English. Although it may generate text in other languages, performance outside these three languages is not a primary objective of this version.
How to use the model
Memory requirements
LatamGPT is based on a 70B-parameter model. For inference in BF16/FP16 precision (2 bytes per parameter), approximately 140 GB of VRAM is required for the model weights alone, so multiple high-memory GPUs are recommended.
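The 140 GB figure follows from simple arithmetic, which the sketch below reproduces. The 8-bit and 4-bit rows use nominal bit widths chosen for illustration; real quantized files also store scales and metadata, so they are usually somewhat larger, and the KV cache and activations need memory on top of the weights. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Return the memory needed for the weights alone, in decimal gigabytes."""
    bytes_total = n_params * bits_per_param / 8
    return bytes_total / 1e9


# Nominal parameter count of the Llama 3.1 70B base model.
N_PARAMS = 70e9

# Bit widths are nominal assumptions; actual quantization formats add overhead.
for label, bits in [("BF16/FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB")
```

Under these assumptions the weights need roughly 140 GB at 16 bits, 70 GB at 8 bits, and 35 GB at 4 bits.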
Use with Hugging Face Transformers
To use LatamGPT with transformers, you can load the model through a text-generation pipeline, which also loads the matching tokenizer:
from transformers import pipeline
import torch

# Load the model in BF16 and let device_map spread it across the available GPUs.
pipe = pipeline(
    "text-generation",
    model="latamgpt/Llama-3.1-70B-LatamGPT-SFT-1.0",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chat-style input: a system prompt followed by the user request.
messages = [
    {"role": "system", "content": "You are a helpful, clear, and precise assistant, with a focus on Latin America and the Caribbean."},
    {"role": "user", "content": "Explain in a brief paragraph what LatamGPT is."},
]

# Sample a reply and print only the assistant message appended to the conversation.
out = pipe(messages, max_new_tokens=256, temperature=0.6, top_p=0.9, do_sample=True)
print(out[0]["generated_text"][-1]["content"])
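When the pipeline receives a list of chat messages, it returns the whole conversation with the new assistant message appended, which is why the example indexes the last entry. The sketch below shows that structure with a hand-written stand-in for the pipeline result, so it runs without loading the model. The helper name `last_assistant_reply` and the placeholder reply text are assumptions for illustration, not part of the transformers API.

```python
def last_assistant_reply(pipeline_output):
    """Return the text of the final message in the first generated conversation."""
    conversation = pipeline_output[0]["generated_text"]
    final_message = conversation[-1]
    if final_message["role"] != "assistant":
        raise ValueError("the last message was not produced by the assistant")
    return final_message["content"]


# Hand-written stand-in for a real pipeline result (placeholder text, not model output).
mock_output = [
    {
        "generated_text": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain in a brief paragraph what LatamGPT is."},
            {"role": "assistant", "content": "<model reply goes here>"},
        ]
    }
]

print(last_assistant_reply(mock_output))
```

Checking the role before returning the content guards against cases where generation produced no new message.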
Training data
LatamGPT 1.0 was built from Llama 3.1 70B and adapted with regional data from Latin America and the Caribbean. The main focus of the training process was to incorporate representative data from the region, gathered through a strategic alliance of more than 75 institutions.
The continued pretraining stage was carried out with LatamGPT Corpus 1.0, a dataset composed of approximately 297B tokens. The dataset will be available soon on Hugging Face: [link coming soon].
This work covers data from 20 countries: Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Cuba, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, Puerto Rico, Uruguay, and Venezuela.
The regional data covers different thematic areas of cultural, social, historical, scientific, and territorial interest, including: Indigenous peoples, food and gastronomy, dialects and languages, historical events, celebrations and festivities, places and geography, arts, humanities and social sciences, communication and media, politics, important figures, mythology, flora and fauna, sports and recreation, economics and finance, hard sciences, education, and medicine and health.
The continued pretraining stage seeks to expand the model’s coverage of languages, expressions, entities, cultural references, and contexts specific to Latin America and the Caribbean.
The supervised fine-tuning stage incorporates examples aimed at improving instruction following, conversational interaction, and the model’s usefulness in natural language processing tasks.
The data construction process considers criteria of quality, traceability, and regional diversity. However, like any model trained on large text collections, LatamGPT may reflect biases, gaps, or errors present in its training data.
Responsibility and safety
LatamGPT should be deployed as part of broader systems that incorporate safety controls, monitoring, and risk mitigation mechanisms. The model should not be considered an infallible source of information or used without human validation in high-impact contexts.
Those who integrate LatamGPT into products or services are responsible for defining appropriate usage policies, filters, evaluations, and limits for their specific application.
Responsible deployment
Before deploying LatamGPT in production, we recommend:
- Evaluating the model in the specific domain of use.
- Implementing input and output filters when appropriate.
- Monitoring errors, biases, and misuse.
- Defining feedback and reporting mechanisms.
- Preventing the model from making critical decisions without human supervision.
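As one possible shape for the input and output filters and the human-supervision step listed above, the following minimal sketch wraps a generation function with simple keyword checks. Everything in it is hypothetical: the term lists, the `GuardedResult` type, and the `guarded_generate` and `fake_generate` functions are placeholders, and a real deployment would likely rely on domain-specific policies and classifiers rather than substring matching.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GuardedResult:
    text: str
    blocked: bool
    needs_human_review: bool


# Hypothetical placeholder lists; replace with policies for the target domain.
BLOCKED_TERMS = ["example-banned-term"]
HIGH_IMPACT_TERMS = ["diagnosis", "loan approval", "court ruling"]


def contains_any(text: str, terms: List[str]) -> bool:
    """Case-insensitive substring check against a list of terms."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def guarded_generate(prompt: str, generate: Callable[[str], str]) -> GuardedResult:
    # Input filter: refuse before calling the model at all.
    if contains_any(prompt, BLOCKED_TERMS):
        return GuardedResult(text="", blocked=True, needs_human_review=False)

    reply = generate(prompt)

    # Output filter: withhold the reply and escalate it to a person.
    if contains_any(reply, BLOCKED_TERMS):
        return GuardedResult(text="", blocked=True, needs_human_review=True)

    # High-impact topics are returned but flagged for human validation.
    needs_review = contains_any(prompt, HIGH_IMPACT_TERMS) or contains_any(
        reply, HIGH_IMPACT_TERMS
    )
    return GuardedResult(text=reply, blocked=False, needs_human_review=needs_review)


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    return f"echo: {prompt}"


result = guarded_generate("Summarize this loan approval policy.", fake_generate)
print(result)
```

Passing the generation function as a parameter keeps the guard independent of how the model is served, so the same checks can be logged and monitored regardless of the backend.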
Ethical considerations and limitations
LatamGPT seeks to contribute to the development of language models that are more representative of Latin America. Even so, the model may produce incorrect, incomplete, biased, or outdated responses.
Its outputs should be interpreted as automatically generated content and not as professional advice. In sensitive applications, responses should be reviewed by domain experts before being used.
Its main limitations include:
- Possibility of hallucinations or factual errors.
- Performance variation across languages, countries, and domains.
- Reproduction of biases present in the data.
- Difficulty recognizing outdated information.
- Sensitivity to the prompt and provided context.
- Limited performance outside Spanish, Portuguese, and English.
Citation
If you use LatamGPT in research, products, or technical reports, please cite the project using the following reference:
@misc{latamgpt2026,
title = {Llama-3.1-70B-LatamGPT-SFT-1.0},
author = {LatamGPT Team},
year = {2026},
howpublished = {\url{https://huggingface.co/latam-gpt/Llama-3.1-70B-LatamGPT-SFT-1.0}}
}
Acknowledgements
LatamGPT is made possible by the collaborative work of technical teams, institutions, communities, and organizations that contribute to the development of open and representative artificial intelligence for Latin America and the Caribbean.
