Instructions for using yhavinga/gpt-neo-125M-dutch with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use yhavinga/gpt-neo-125M-dutch with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="yhavinga/gpt-neo-125M-dutch")

# Load model directly (GPT-Neo is a causal language model)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("yhavinga/gpt-neo-125M-dutch")
model = AutoModelForCausalLM.from_pretrained("yhavinga/gpt-neo-125M-dutch")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use yhavinga/gpt-neo-125M-dutch with vLLM:
Install from pip and serve model:
```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "yhavinga/gpt-neo-125M-dutch"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "yhavinga/gpt-neo-125M-dutch",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker:
```bash
docker model run hf.co/yhavinga/gpt-neo-125M-dutch
```
- SGLang
How to use yhavinga/gpt-neo-125M-dutch with SGLang:
Install from pip and serve model:
```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "yhavinga/gpt-neo-125M-dutch" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "yhavinga/gpt-neo-125M-dutch",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images:
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "yhavinga/gpt-neo-125M-dutch" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "yhavinga/gpt-neo-125M-dutch",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use yhavinga/gpt-neo-125M-dutch with Docker Model Runner:
```bash
docker model run hf.co/yhavinga/gpt-neo-125M-dutch
```
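The vLLM and SGLang servers both expose an OpenAI-compatible `/v1/completions` endpoint, so the curl calls translate directly into a few lines of Python. The sketch below uses only the standard library. The helper names (`build_completion_payload`, `extract_completion_text`, `complete`) are hypothetical. It assumes the default ports from the commands above (8000 for vLLM, 30000 for SGLang). It also assumes the response follows the OpenAI completions shape, with the generated text in `choices[0]["text"]`.

```python
import json
import urllib.request

# Assumption: vLLM listens on port 8000 and SGLang on port 30000, as in the
# commands above; pass a different base_url to target another server.
BASE_URL = "http://localhost:8000"
MODEL_ID = "yhavinga/gpt-neo-125M-dutch"


def build_completion_payload(prompt: str, max_tokens: int = 512, temperature: float = 0.5) -> dict:
    """Build the same JSON body that the curl examples send to /v1/completions."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def extract_completion_text(response: dict) -> str:
    """Return the generated text of the first choice in an OpenAI-style completions response."""
    return response["choices"][0]["text"]


def complete(prompt: str, base_url: str = BASE_URL, **kwargs) -> str:
    """POST a completion request and return the generated text (needs a running server)."""
    body = json.dumps(build_completion_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return extract_completion_text(json.load(resp))
```

For example, with the vLLM server running, `complete("Er was eens", max_tokens=64)` sends a Dutch prompt ("Once upon a time") to the model. Pass `base_url="http://localhost:30000"` to query the SGLang server instead. Because payload construction and response parsing are separate functions, both can be checked without a running server.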
```python
import torch
import numpy as np
import jax
import jax.numpy as jnp
from transformers import AutoTokenizer
from transformers import FlaxGPTNeoForCausalLM
from transformers import GPTNeoForCausalLM

# Load the tokenizer and the Flax checkpoint from the current directory.
tokenizer = AutoTokenizer.from_pretrained(".")
# GPT-Neo has no dedicated padding token; reuse the end-of-sequence token.
tokenizer.pad_token = tokenizer.eos_token
model_fx = FlaxGPTNeoForCausalLM.from_pretrained(".")

# Optional: cast bfloat16 Flax parameters to float32 and save them to ./fx.
# def to_f32(t):
#     return jax.tree_util.tree_map(lambda x: x.astype(jnp.float32) if x.dtype == jnp.bfloat16 else x, t)
# model_fx.params = to_f32(model_fx.params)
# model_fx.save_pretrained("./fx")

# Convert the Flax weights to PyTorch and save them next to the Flax checkpoint.
model_pt = GPTNeoForCausalLM.from_pretrained(".", from_flax=True)
model_pt.save_pretrained(".")

# Sanity check: feed the same dummy batch (2 sequences of 128 token ids, all 0)
# through both models; the printed logits should match closely.
input_ids = np.asarray(2 * [128 * [0]], dtype=np.int32)
input_ids_pt = torch.tensor(input_ids, dtype=torch.long)
with torch.no_grad():
    logits_pt = model_pt(input_ids_pt).logits
print(logits_pt)

logits_fx = model_fx(input_ids).logits
print(logits_fx)
```
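Printing two `(2, 128, vocab_size)` logit tensors makes it hard to tell whether the converted PyTorch weights agree with the Flax original. A small NumPy helper reduces the check to a single number. This is a sketch: the helper name `compare_logits` and the `atol=1e-3` tolerance are assumptions, not values taken from this repository.

```python
import numpy as np


def compare_logits(logits_a, logits_b, atol: float = 1e-3) -> tuple[float, bool]:
    """Return the largest absolute difference between two logit arrays and
    whether it is within `atol` (an assumed tolerance, tune as needed)."""
    # Cast both sides to float32 so the subtraction itself does not lose
    # precision if either model produced lower-precision logits.
    a = np.asarray(logits_a, dtype=np.float32)
    b = np.asarray(logits_b, dtype=np.float32)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    max_diff = float(np.max(np.abs(a - b)))
    return max_diff, max_diff <= atol
```

Appended to the script, the helper could be called as `compare_logits(logits_pt.detach().cpu().numpy(), np.asarray(logits_fx))`. If either model runs in bfloat16, a looser tolerance would likely be needed.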