How to use from SGLang
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "allenai/Emo_1b14b_1T" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "allenai/Emo_1b14b_1T",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
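The same request can also be sent from Python with no extra dependencies. The sketch below uses only the standard library (urllib) and assumes the server started above is listening on localhost:30000. The helper names build_chat_request and chat are hypothetical, and the response parsing assumes the usual OpenAI-style choices[0].message.content layout.

```python
import json
import urllib.request

BASE_URL = "http://localhost:30000"  # host/port from the launch command above
MODEL_ID = "allenai/Emo_1b14b_1T"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build the same POST request as the curl example (OpenAI-compatible chat endpoint)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=60.0):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Usage (requires the server above to be running):
# print(chat("What is the capital of France?"))
```

Building the request separately from sending it keeps the payload easy to inspect or reuse, for example with a different prompt or host.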
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "allenai/Emo_1b14b_1T" \
        --host 0.0.0.0 \
        --port 30000
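Loading a 14B-parameter checkpoint can take a while, so requests sent right after launch may be refused. Below is a minimal polling sketch that waits until the server answers. It assumes the server exposes the OpenAI-compatible /v1/models route (an assumption; check the SGLang documentation for your version's health endpoints). The function name wait_until_ready and its defaults are hypothetical.

```python
import time
import urllib.request


def wait_until_ready(base_url="http://localhost:30000", path="/v1/models",
                     timeout=600.0, interval=5.0):
    """Poll base_url + path until it returns HTTP 200 or `timeout` seconds pass.

    The default path is an assumption (the standard OpenAI-compatible
    model-listing route); adjust it if your server differs.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(base_url + path, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused or reset connections and socket
            # timeouts are all OSError subclasses: the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage: call this after starting the server, before sending requests.
# if not wait_until_ready():
#     raise RuntimeError("SGLang server did not become ready in time")
```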
Emo_1b14b_1T

The main release of EMO from the paper "EMO: Pretraining Mixture of Experts for Emergent Modularity", where it is referred to as EMO (1T tokens, midtrained).

A Mixture-of-Experts model with 1B active and 14B total parameters (128 experts: 127 routed + 1 shared, k=8 active per token), pretrained on 1T tokens of the OLMoE pretraining mix and then annealed under the EMO objective for an additional 50B tokens. During training, tokens within the same document are constrained to route through the same pool of experts. This produces expert subsets that can be deployed in isolation for specific domains with minimal performance degradation.
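To make the routing description concrete, here is a toy pure-Python sketch of top-k routing with one always-on shared expert, in which each token may only choose from a per-document pool. This is NOT the paper's algorithm. The pool rule (highest mean router probability over the document), the pool size of 16, treating k=8 as counting routed experts only, and placing the shared expert at index 127 are all assumptions made for illustration.

```python
import math
import random

NUM_ROUTED = 127             # routed experts (from the model card)
SHARED_EXPERT = NUM_ROUTED   # assumed index of the single shared expert (128 experts in total)
TOP_K = 8                    # assumed here to count routed experts only


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def document_pool(doc_logits, pool_size):
    """Hypothetical pool rule: the pool_size routed experts with the highest
    mean router probability across all tokens of one document."""
    probs = [softmax(tok) for tok in doc_logits]
    mean = [sum(p[e] for p in probs) / len(probs) for e in range(NUM_ROUTED)]
    ranked = sorted(range(NUM_ROUTED), key=lambda e: mean[e], reverse=True)
    return set(ranked[:pool_size])


def route_token(logits, pool, k=TOP_K):
    """Top-k routing restricted to the document's pool, plus the shared expert."""
    allowed = [e for e in range(NUM_ROUTED) if e in pool]
    chosen = sorted(allowed, key=lambda e: logits[e], reverse=True)[:k]
    gates = softmax([logits[e] for e in chosen])  # renormalize over the selected experts
    return [SHARED_EXPERT] + chosen, gates


# Toy document: 4 tokens with random router logits over the routed experts.
rng = random.Random(0)
doc = [[rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED)] for _ in range(4)]
pool = document_pool(doc, pool_size=16)  # pool size is a made-up value
routes = [route_token(tok, pool) for tok in doc]
```

In this toy setup every token of a document draws from the same 16 experts, so the set of experts the whole document touches stays bounded by the pool size. That is the intuition behind deploying a subset of experts on its own.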

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Emo_1b14b_1T"
# trust_remote_code=True lets transformers run the model code shipped with the repository.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Omit token_type_ids so generate() does not forward them to the model.
inputs = tokenizer(["Language modeling is "], return_tensors="pt", return_token_type_ids=False)
# Sample up to 100 new tokens using temperature and nucleus (top-p) sampling.
out = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=1.0, top_p=0.7)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
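The sampling arguments above can be illustrated with a small sketch of temperature scaling followed by nucleus (top-p) filtering on a single next-token distribution. It follows the standard nucleus-sampling rule. transformers' own logits processors may differ in edge cases such as ties or a minimum number of kept tokens. The function names are hypothetical.

```python
import math
import random


def top_p_filter(logits, temperature=1.0, top_p=0.7):
    """Return {token_id: prob} for the smallest set of most-likely tokens whose
    cumulative probability reaches top_p, renormalized to sum to 1."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, temperature=1.0, top_p=0.7, rng=random):
    """Draw one token id from the filtered distribution."""
    dist = top_p_filter(logits, temperature, top_p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]
```

With top_p=0.7 and temperature=1.0, a distribution of [0.5, 0.3, 0.15, 0.05] keeps only the first two tokens (cumulative mass 0.8), renormalized to 0.625 and 0.375.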

Citation

@article{wang2026emo,
  title  = {EMO: Pretraining Mixture of Experts for Emergent Modularity},
  author = {Wang, Ryan and Bhagia, Akshita and Min, Sewon},
  year   = {2026},
  url    = {https://arxiv.org/abs/2605.06663}
}

Model size: 14B params (Safetensors)
Tensor type: F32
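As a rough sizing aid, the weight footprint is the parameter count times the bytes per parameter. The sketch below counts weights only (no activations, KV cache, or framework overhead) and uses the rounded 14B figure. Because this is a Mixture-of-Experts model, all experts generally have to be resident in memory even though only about 1B parameters are active per token. The helper name weight_memory_gb is hypothetical.

```python
# Bytes per parameter for common floating-point dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params, dtype="float32"):
    """Weights only: excludes activations, KV cache, and framework overhead."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


total_params = 14e9  # "14B params" from the card (rounded)
print(f"F32 weights:  ~{weight_memory_gb(total_params):.0f} GB")
print(f"BF16 weights: ~{weight_memory_gb(total_params, 'bfloat16'):.0f} GB")
```

This gives roughly 56 GB for the F32 checkpoint as stored and roughly 28 GB if the weights are loaded in bfloat16.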