Instructions for using xlr8harder/talkie-1930-13b-base-tf with libraries, inference servers, and local apps.

Libraries

How to use xlr8harder/talkie-1930-13b-base-tf with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="xlr8harder/talkie-1930-13b-base-tf", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("xlr8harder/talkie-1930-13b-base-tf", trust_remote_code=True, dtype="auto")
```


vLLM

How to use xlr8harder/talkie-1930-13b-base-tf with vLLM:

Install from pip and serve model

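```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server with the flags from the model card's vLLM section
# (the model ships custom code, so --trust-remote-code is required):
vllm serve "xlr8harder/talkie-1930-13b-base-tf" \
    --model-impl transformers \
    --trust-remote-code \
    --dtype bfloat16 \
    --max-model-len 2048
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "xlr8harder/talkie-1930-13b-base-tf",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```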
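If you would rather call the server from Python, the same request can be sent with the standard library alone. This is a minimal sketch, not part of the model card: the helper names `build_completion_request` and `complete` are hypothetical, and it assumes the server returns the OpenAI-style completions response with the generated text in `choices[0]["text"]`.

```python
import json
import urllib.request

# Hypothetical helpers; the URL and payload mirror the curl call above.
DEFAULT_URL = "http://localhost:8000/v1/completions"
MODEL_ID = "xlr8harder/talkie-1930-13b-base-tf"


def build_completion_request(
    prompt: str,
    max_tokens: int = 512,
    temperature: float = 0.5,
    url: str = DEFAULT_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-compatible completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, timeout_s: float = 120.0, **kwargs) -> str:
    """Send the request and return the first completion's text (assumes an OpenAI-style response)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` returns the generated continuation; pass `url="http://localhost:30000/v1/completions"` to target the SGLang server instead.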

Use Docker

```bash
docker model run hf.co/xlr8harder/talkie-1930-13b-base-tf
```

SGLang

How to use xlr8harder/talkie-1930-13b-base-tf with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (the model ships custom code, so remote code must be trusted):
python3 -m sglang.launch_server \
    --model-path "xlr8harder/talkie-1930-13b-base-tf" \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "xlr8harder/talkie-1930-13b-base-tf",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "xlr8harder/talkie-1930-13b-base-tf" \
        --trust-remote-code \
        --host 0.0.0.0 \
        --port 30000
# Then call the server with the same curl request as in the pip example above (port 30000).
```
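Either server can take a while to download and load a 13B checkpoint before it accepts requests, so scripts benefit from waiting until it is ready. This is a hedged sketch: `wait_for_server` is a hypothetical helper, and it assumes the server answers `GET /v1/models` with HTTP 200 once it is up (vLLM and SGLang both expose that OpenAI-compatible route, but check the version you run).

```python
import http.client
import time
import urllib.request


def wait_for_server(url: str, timeout_s: float = 600.0, poll_s: float = 5.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout_s` seconds pass.

    Returns True once the server responds, False if the deadline is reached first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=poll_s) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError, HTTPError and socket timeouts are OSError subclasses;
            # they are expected while the server is still starting up.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(poll_s, remaining))
```

For example, call `wait_for_server("http://localhost:30000/v1/models")` before sending the first request to SGLang, or use port 8000 for the vLLM server.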



Last commit by xlr8harder: "Upload Transformers safetensors conversion" (15d5bc6, verified, about 2 months ago)


---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
model_name: talkie-1930-13b-base-tf
base_model:
- talkie-lm/talkie-1930-13b-base
tags:
- transformers
- safetensors
- bfloat16
- custom_code
- text-generation
- conversion
- talkie
- pre-1931
---

# talkie-1930-13b-base-tf (BF16 Transformers + safetensors conversion)

This repository is a Transformers-compatible conversion of [`talkie-lm/talkie-1930-13b-base`](https://huggingface.co/talkie-lm/talkie-1930-13b-base), the original Talkie base completion model.

According to the original model card, the upstream model is a 13B-parameter vintage language model trained on 260B tokens of pre-1931 English-language text.

The original base checkpoint is stored in FP32. This repository casts those weights to BF16, saves them as sharded safetensors, and packages them for Transformers with custom modeling code that is loaded via `trust_remote_code`.

This is not an official Talkie release; refer to the upstream model card for the author-provided provenance and usage notes.
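For a rough sense of what the FP32-to-BF16 conversion saves, weight storage can be estimated from the parameter count. This is back-of-the-envelope arithmetic under stated assumptions: it takes exactly 13 billion parameters (the true count is not given here and will differ somewhat) and ignores activations, the KV cache, and framework overhead.

```python
# Back-of-the-envelope weight storage for a nominal 13B-parameter model.
# Assumption: exactly 13e9 parameters; the true count is not stated in this card.
NUM_PARAMS = 13_000_000_000
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def weight_bytes(dtype: str, num_params: int = NUM_PARAMS) -> int:
    """Bytes needed to store the raw weights in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype]


for dtype in ("fp32", "bf16"):
    n = weight_bytes(dtype)
    print(f"{dtype}: {n / 1e9:.0f} GB ({n / 2**30:.1f} GiB)")
# fp32: 52 GB (48.4 GiB)
# bf16: 26 GB (24.2 GiB)
```

On this estimate the BF16 weights alone take about 24 GiB, so loading the whole model onto one GPU with `device_map={"": "cuda"}`, as in the usage example, needs a card with comfortably more memory than that.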

## Source Model

- Original model: [talkie-lm/talkie-1930-13b-base](https://huggingface.co/talkie-lm/talkie-1930-13b-base)
- Talkie report: [talkie-lm.com](https://talkie-lm.com/)
- Reference code: [github.com/talkie-lm/talkie](https://github.com/talkie-lm/talkie)

## Conversion Details

- Weight dtype: BF16
- Weight format: sharded safetensors
- Context length: 2048 tokens
- Architecture: custom Talkie code loaded with `trust_remote_code=True`
- Tokenizer: Talkie tiktoken-compatible tokenizer exposed through `AutoTokenizer`
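Because the context length is 2048 tokens, the prompt and the generated continuation have to fit in the window together. A minimal sketch, assuming you already have the prompt as a plain list of token IDs (for example `tokenizer(text)["input_ids"]`, assuming the custom tokenizer follows the usual Transformers call convention); `fit_to_context` is a hypothetical helper, and keeping the most recent tokens is only one possible truncation policy, chosen because open-ended completion depends most on the text nearest the cut.

```python
from typing import List

CONTEXT_LENGTH = 2048  # from the conversion details above


def fit_to_context(
    token_ids: List[int], max_new_tokens: int, context_length: int = CONTEXT_LENGTH
) -> List[int]:
    """Left-truncate `token_ids` so that prompt + generated tokens fit in the window."""
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    budget = context_length - max_new_tokens
    # Keep the last `budget` tokens and drop the oldest ones.
    return token_ids[-budget:] if len(token_ids) > budget else list(token_ids)
```

For example, `fit_to_context(ids, max_new_tokens=64)` keeps at most 1984 prompt tokens, leaving room for the 64 new tokens used in the completion example.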

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "xlr8harder/talkie-1930-13b-base-tf"
# Both calls pass trust_remote_code=True to load the repository's custom Talkie code.
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    trust_remote_code=True,
    dtype=torch.bfloat16,     # the stored weights are BF16
    device_map={"": "cuda"},  # place every module on the current CUDA device
    use_safetensors=True,
)
```

For base completions:

```python
inputs = tokenizer("The latest discoveries in physics suggest that", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## vLLM

The included remote-code model implements the Transformers attention-interface hooks expected by vLLM's Transformers modeling backend. For compatibility with that backend, the original single-scalar `lm_head_gain` is folded into `lm_head.weight` during conversion; the other Talkie gain parameters remain explicit model parameters. A vLLM `logit_scale`-style approach was not used because it applies the scale after the output matmul, whereas Talkie applies the gain to the head weight before the matmul. In BF16 the two orders round differently, and in smoke tests scaling after the matmul changed the ordering of near-tied top tokens in one case.

```bash
vllm serve xlr8harder/talkie-1930-13b-base-tf \
    --task generate \
    --model-impl transformers \
    --trust-remote-code \
    --dtype bfloat16 \
    --max-model-len 2048
```
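The ordering issue above comes from where BF16 rounding happens. In exact arithmetic `(g * W) @ x` equals `g * (W @ x)`, but once values are rounded to BF16 the two orders can disagree. The sketch below emulates BF16 in pure Python by rounding an FP32 bit pattern to its top 16 bits with round-to-nearest-even (the mode PyTorch uses when casting to `bfloat16`); it is an illustration, not Talkie's or vLLM's code. It simplifies by accumulating the dot product in double precision and rounding only the stored weights and the outputs, and it uses a one-element "matmul" so the arithmetic is easy to follow.

```python
import math
import struct
from typing import List


def to_bf16(value: float) -> float:
    """Round `value` to the nearest BF16 number (ties to even), returned as a float.

    Emulation only: the value is first rounded to FP32, then the low 16 bits of
    the FP32 bit pattern are rounded away. Values beyond the FP32 range are not supported.
    """
    if math.isnan(value) or math.isinf(value):
        return value
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    lsb = (bits >> 16) & 1  # ties go to the even BF16 neighbour
    rounded = (bits + 0x7FFF + lsb) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", rounded))[0]


def dot(weights: List[float], x: List[float]) -> float:
    # Accumulate in double precision, then round the result to BF16.
    return to_bf16(sum(w * xi for w, xi in zip(weights, x)))


def logit_gain_folded(w_row: List[float], x: List[float], gain: float) -> float:
    """Gain folded into the stored head weights before the matmul (one BF16 rounding per weight)."""
    folded = [to_bf16(gain * w) for w in w_row]
    return dot(folded, x)


def logit_gain_after(w_row: List[float], x: List[float], gain: float) -> float:
    """logit_scale-style: BF16 weights, matmul first, then scale the BF16 output."""
    stored = [to_bf16(w) for w in w_row]
    return to_bf16(gain * dot(stored, x))


w_row, x, gain = [1.0048828125], [1.0], 1.5
print(gain * w_row[0])                    # 1.50732421875 (exact product)
print(logit_gain_folded(w_row, x, gain))  # 1.5078125
print(logit_gain_after(w_row, x, gain))   # 1.515625
```

Here folding rounds once and lands on the BF16 neighbour of the exact product, while scaling after the matmul rounds twice and ends up one BF16 step higher. When two candidate tokens' logits are within a step or two of each other, a difference of that size can swap their order.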

## Validation

The BF16 checkpoint exactly matched a runtime BF16 cast of the original FP32 checkpoint on the tested forward pass. The Transformers safetensors model was also compared against the Talkie reference architecture: the top-10 next-token ordering matched exactly, with a maximum observed absolute logit difference of `0.03125`.
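For scale, BF16 keeps 8 significant bits (7 stored plus the implicit leading bit), so the gap between adjacent BF16 values grows with magnitude. The sketch below computes that gap and shows that `0.03125` (2^-5) is exactly one BF16 step for values whose magnitude lies in [4, 8). The card does not report the logit magnitudes, so reading the observed difference as about one rounding step is an interpretation, not a stated result; `bf16_spacing` is a hypothetical helper.

```python
import math


def bf16_spacing(value: float) -> float:
    """Gap between adjacent BF16 values in the binade containing `value` (normal range only).

    BF16 has an 8-bit significand, so within [2**e, 2**(e + 1)) the spacing is 2**(e - 7).
    """
    if value == 0 or not math.isfinite(value):
        raise ValueError("value must be finite and non-zero")
    _, exp = math.frexp(abs(value))  # abs(value) == m * 2**exp with 0.5 <= m < 1
    return 2.0 ** (exp - 1 - 7)


for v in (1.0, 5.0, 20.0):
    print(v, bf16_spacing(v))
# 1.0 0.0078125
# 5.0 0.03125
# 20.0 0.125
```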