---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
model_name: talkie-1930-13b-base-tf
base_model:
- talkie-lm/talkie-1930-13b-base
tags:
- transformers
- safetensors
- bfloat16
- custom_code
- text-generation
- conversion
- talkie
- pre-1931
---
# talkie-1930-13b-base-tf (BF16 Transformers + safetensors conversion)

This repository is a Transformers-compatible conversion of
[`talkie-lm/talkie-1930-13b-base`](https://huggingface.co/talkie-lm/talkie-1930-13b-base), the original Talkie base completion model.

The upstream model is a 13B vintage language model trained on 260B tokens of pre-1931 English-language text, according to the original model card.

The original base checkpoint is FP32. This repository stores a BF16 conversion of those weights and packages them for Transformers with custom `trust_remote_code` modules and BF16 sharded safetensors.

This is not an official Talkie release; refer to the upstream model card for
the author-provided provenance and usage notes.

## Source Model

- Original model: [talkie-lm/talkie-1930-13b-base](https://huggingface.co/talkie-lm/talkie-1930-13b-base)
- Talkie report: [talkie-lm.com](https://talkie-lm.com/)
- Reference code: [github.com/talkie-lm/talkie](https://github.com/talkie-lm/talkie)

## Conversion Details

- Weight dtype: BF16
- Weight format: sharded safetensors
- Context length: 2048 tokens
- Architecture: custom Talkie code loaded with `trust_remote_code=True`
- Tokenizer: Talkie tiktoken-compatible tokenizer exposed through `AutoTokenizer`

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "xlr8harder/talkie-1930-13b-base-tf"

# trust_remote_code lets Transformers load the repository's custom Talkie code.
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    trust_remote_code=True,
    dtype=torch.bfloat16,     # matches the stored BF16 weights
    device_map={"": "cuda"},  # place the whole model on the default CUDA device
    use_safetensors=True,
)
```

For base completions:

```python
inputs = tokenizer("The latest discoveries in physics suggest that", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
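
The conversion details list a 2048-token context length, so the prompt and the generated continuation share that budget. The following is a minimal sketch of a helper that caps `max_new_tokens` accordingly. `CONTEXT_LENGTH` and `clamp_new_tokens` are hypothetical names, and treating 2048 as a hard limit is an assumption rather than documented behavior of this checkpoint.

```python
CONTEXT_LENGTH = 2048  # context length stated in this model card


def clamp_new_tokens(prompt_tokens: int, requested: int,
                     context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many new tokens fit after a prompt of `prompt_tokens` tokens."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    # Whatever the prompt does not use is available for generation.
    remaining = max(context_length - prompt_tokens, 0)
    return min(requested, remaining)


# Example: a 2000-token prompt leaves room for at most 48 new tokens.
budget = clamp_new_tokens(prompt_tokens=2000, requested=64)
print(budget)
```

With the objects from the snippet above, this could be used as `model.generate(**inputs, max_new_tokens=clamp_new_tokens(inputs["input_ids"].shape[1], 64))`.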

## vLLM

The included remote-code model implements the Transformers attention-interface
hooks expected by vLLM's Transformers modeling backend. For compatibility with
that backend, the original single-scalar `lm_head_gain` is folded into
`lm_head.weight` during conversion; the other Talkie gain parameters remain
explicit model parameters. A vLLM `logit_scale`-style approach was not used
because it applies scaling after the output matmul, while Talkie applies the
gain to the head weight before the matmul. In BF16, scaling after the matmul
can introduce small rounding differences and, in smoke tests, changed one
near-tied top-token ordering.
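
To illustrate why the order of scaling and matmul matters in BF16, here is a small standard-library sketch that emulates BF16 rounding and computes one logit both ways. It is not the conversion script. `to_bf16`, `dot`, the gain value `1.37`, and the random vectors are all hypothetical, and folding the gain before the BF16 cast is an assumption about the conversion order.

```python
import random
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest BF16 value (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep the top 16 bits of the float32 pattern, rounding the dropped half.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot(xs, ws):
    """Dot product accumulated in higher precision than BF16."""
    return sum(x * w for x, w in zip(xs, ws))


rng = random.Random(0)
gain = 1.37  # hypothetical scalar standing in for lm_head_gain
hidden = [to_bf16(rng.uniform(-1.0, 1.0)) for _ in range(64)]  # BF16 activations
head_row = [rng.uniform(-1.0, 1.0) for _ in range(64)]  # one higher-precision lm_head row

# Path A: fold the gain into the weight, then cast the weight to BF16.
folded_row = [to_bf16(gain * w) for w in head_row]
logit_folded = to_bf16(dot(hidden, folded_row))

# Path B (logit_scale style): cast the weight, matmul, then scale the logit.
plain_row = [to_bf16(w) for w in head_row]
logit_scaled = to_bf16(gain * to_bf16(dot(hidden, plain_row)))

print(logit_folded, logit_scaled, abs(logit_folded - logit_scaled))
```

The two paths are mathematically equal but round at different points, so their BF16 results may differ in the last bits. That is enough to swap two logits that are nearly tied.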

```bash
vllm serve xlr8harder/talkie-1930-13b-base-tf \
  --task generate \
  --model-impl transformers \
  --trust-remote-code \
  --dtype bfloat16 \
  --max-model-len 2048
```
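
Once the server is running, it can be queried through vLLM's OpenAI-compatible completions endpoint. The sketch below uses only the Python standard library. `BASE_URL` assumes the default `http://localhost:8000` address, and `build_completion_request` and `complete` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed: vLLM's default host and port
MODEL = "xlr8harder/talkie-1930-13b-base-tf"


def build_completion_request(prompt: str, max_tokens: int = 64,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the prompt to a running server and return the first completion's text."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server up, `print(complete("Once upon a time,"))` would print the model's continuation. Since this is a base completion model, the plain completions endpoint is used rather than a chat endpoint.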

## Validation

The BF16 checkpoint matched a runtime BF16 cast from the original FP32 checkpoint exactly on the tested forward pass. The Transformers safetensors model was also compared against the Talkie reference architecture; the top-10 next-token ordering matched exactly, with an observed maximum absolute logit difference of `0.03125`.
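
The two metrics reported above (top-k ordering agreement and maximum absolute logit difference) can be computed along these lines. This is a plain-Python sketch on toy vectors, not the validation script used for this repository, and `top_k_indices` and `compare_logits` are hypothetical names.

```python
def top_k_indices(logits, k=10):
    """Indices of the k largest logits, highest first (ties broken by index)."""
    return sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]


def compare_logits(reference, candidate, k=10):
    """Return (top-k ordering matches, max absolute logit difference)."""
    if len(reference) != len(candidate):
        raise ValueError("logit vectors must have the same length")
    same_order = top_k_indices(reference, k) == top_k_indices(candidate, k)
    max_abs_diff = max(abs(r - c) for r, c in zip(reference, candidate))
    return same_order, max_abs_diff


# Toy vectors, not real model outputs.
reference = [2.0, 5.5, -1.0, 4.25, 0.5]
candidate = [2.0, 5.53125, -1.0, 4.25, 0.5]
print(compare_logits(reference, candidate, k=3))
```

In practice the inputs would be the final-position logits from the two models for the same prompt, converted to Python floats or compared with the equivalent tensor operations.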