fromthesky committed on Sep 18, 2025

Commit 366811c (1 parent: dda3ae4)

Updated readme

Bumped transformers version to 4.56.1

Files changed (3)

README.md +21 -21
config.json +1 -1
requirements.txt +1 -1

README.md CHANGED

@@ -14,6 +14,7 @@ tags:
 license: apache-2.0
 datasets:
 - tiiuae/falcon-refinedweb
 ---
 # PLDR-LLM-v51-110M-4
@@ -38,7 +39,7 @@ This model is intended to be used for research purposes. Given text as input pro
 ### Via Huggingface Transformers Library
-PLDR-LLM has custom model support for Huggingface Transformers library. PLDR-LLM custom models support is developed on Transformers v4.55.4 release available at the time.
 Using `pipeline`:
 ```python
@@ -47,11 +48,13 @@ from transformers import pipeline
 pipeline = pipeline(
     task="text-generation",
     model="fromthesky/PLDR-LLM-v51-110M-4",
-    device="cuda"
 )
-prompt="PLDR-LLM is a large language model architecture developed by Fromthesky Research Labs."
-output=pipeline(prompt, top_p=0.6, top_k=0, temperature=1, do_sample=True, max_new_tokens=100)
 print(output[0]["generated_text"])
 ```
@@ -65,21 +68,24 @@ model=AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path="fromth
                                           )
 tokenizer=AutoTokenizer.from_pretrained(pretrained_model_name_or_path="fromthesky/PLDR-LLM-v51-110M-4",
                                         add_eos_token=False,
-                                        Legacy=False,
                                         trust_remote_code=True
                                        )
-prompt="PLDR-LLM is a large language model architecture developed by Fromthesky Research Labs."
 inputs = tokenizer([prompt], return_tensors="pt").to(device=device)
 generated_ids = model.generate(**inputs,
-                                     max_new_tokens=100,
-                                     top_p=0.6,
-                                     top_k=0,
-                                     temperature=1,
-                                     do_sample=True,
-                                     use_cache=True
-                                    )
 print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
 ```
 #### PLDR-LLM specific configurations:
 - `custom_G_type`: `None` for learned G values during pretraining, `'identity'` for LLM with SDPA equivalent, `'random'` for G values from a random normal distribution, `'external'` for custom G values that can be assigned after model initialization. This setting is more important for training purposes, for inference it is set in the model config.json file.
@@ -94,14 +100,8 @@ the output of the residual metric learner (metric tensor, **A**), output (**A<su
 See config.json for other model configuration details.
 #### Notes:
-- Transformers v4.55.4 causes generation with quantized cache to fail at the time of this writing.
-To overcome this issue, install the most recent updates from transformers library:
-```python
-      git clone https://github.com/huggingface/transformers
-      cd transformers
-      pip install -e ".[dev]"
-```
-We also have a fork of transformers library with PLDR-LLM model support for future development. The PLDR-LLM model files are added to the library so custom model files are not necessary.
 ```python
       git clone https://github.com/burcgokden/transformers
       cd transformers

 license: apache-2.0
 datasets:
 - tiiuae/falcon-refinedweb
+library_name: transformers
 ---
 # PLDR-LLM-v51-110M-4
 ### Via Huggingface Transformers Library
+PLDR-LLM has custom model support for Huggingface Transformers library. PLDR-LLM with custom code is evaluated on Transformers 4.56.1 available at the time.
 Using `pipeline`:
 ```python
 pipeline = pipeline(
     task="text-generation",
     model="fromthesky/PLDR-LLM-v51-110M-4",
+    device="cuda", # or "cpu"
+    trust_remote_code=True
 )
+prompt=('One time they had a drumming contest, and I didn’t do very well: '
+        'They said my drumming was "too intellectual"; theirs was much more pulsing.')
+output=pipeline(prompt, top_p=0.6, top_k=0, temperature=1, do_sample=True, use_cache=True, max_new_tokens=100)
 print(output[0]["generated_text"])
 ```
                                           )
 tokenizer=AutoTokenizer.from_pretrained(pretrained_model_name_or_path="fromthesky/PLDR-LLM-v51-110M-4",
                                         add_eos_token=False,
+                                        legacy=False,
                                         trust_remote_code=True
                                        )
+prompt=('One time they had a drumming contest, and I didn’t do very well: '
+        'They said my drumming was "too intellectual"; theirs was much more pulsing.')
 inputs = tokenizer([prompt], return_tensors="pt").to(device=device)
 generated_ids = model.generate(**inputs,
+                               max_new_tokens=100,
+                               top_p=0.6,
+                               top_k=0,
+                               temperature=1,
+                               do_sample=True,
+                               use_cache=True
+                              )
 print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
 ```
+<sup>\*</sup> `prompt` string is a quote from Richard Feynman in Surely You're Joking, Mr. Feynman! Adventures of a Curious Character.
 #### PLDR-LLM specific configurations:
 - `custom_G_type`: `None` for learned G values during pretraining, `'identity'` for LLM with SDPA equivalent, `'random'` for G values from a random normal distribution, `'external'` for custom G values that can be assigned after model initialization. This setting is more important for training purposes, for inference it is set in the model config.json file.
 See config.json for other model configuration details.
 #### Notes:
+- This implementation of PLDR-LLM custom code was evaluated on Transformers 4.56.1 and pytorch 2.6.0.
+- We also have a fork of transformers library with PLDR-LLM model support for future development. The PLDR-LLM model files are added to the library so custom model files are not necessary.
 ```python
       git clone https://github.com/burcgokden/transformers
       cd transformers

config.json CHANGED

@@ -34,7 +34,7 @@
   "rope_theta": 10000.0,
   "tie_word_embeddings": false,
   "torch_dtype": "float32",
-  "transformers_version": "4.55.4",
   "use_cache": true,
   "vocab_size": 32000
 }

   "rope_theta": 10000.0,
   "tie_word_embeddings": false,
   "torch_dtype": "float32",
+  "transformers_version": "4.56.1",
   "use_cache": true,
   "vocab_size": 32000
 }
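
The `transformers_version` field records the Transformers release the config was saved with, and this commit moves it from 4.55.4 to 4.56.1. A minimal sketch of reading that field and comparing it with the previous value is given below. It assumes the config text is available as a string; the helper names `parse_version` and `recorded_version` are hypothetical, and the tuple comparison only works for plain `X.Y.Z` version strings.

```python
import json

# Fragment of config.json after this commit (fields copied from the diff above).
CONFIG_TEXT = """
{
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.56.1",
  "use_cache": true,
  "vocab_size": 32000
}
"""


def parse_version(version: str) -> tuple:
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def recorded_version(config_text: str) -> str:
    """Return the transformers_version field stored in a config.json string."""
    return json.loads(config_text)["transformers_version"]


new = recorded_version(CONFIG_TEXT)
old = "4.55.4"  # value removed by this commit
is_newer = parse_version(new) > parse_version(old)
print(f"config.json records {new}; newer than {old}: {is_newer}")
```

For pre-release or local version strings (for example ones carrying a `.dev0` suffix), a dedicated version parser would be a safer choice than this integer split.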

requirements.txt CHANGED

@@ -1,4 +1,4 @@
-transformers==4.55.4
 pytorch==2.6.0
 sentencepiece==0.1.99
 python==3.11

+transformers==4.56.1
 pytorch==2.6.0
 sentencepiece==0.1.99
 python==3.11
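
The pins above can be compared with the local environment before running the model. The sketch below is one way to do that using only the standard library. Two points are assumptions rather than anything stated in the commit: `pytorch` is mapped to the PyPI distribution name `torch`, and `python` is treated as the interpreter version rather than an installable package. The helper names `parse_pins` and `installed_version` are hypothetical.

```python
import sys
from importlib import metadata
from typing import Dict, Optional

# Pins copied from requirements.txt after this commit.
REQUIREMENTS = """\
transformers==4.56.1
pytorch==2.6.0
sentencepiece==0.1.99
python==3.11
"""

# Assumption: "pytorch" refers to the PyPI distribution named "torch".
DIST_NAMES = {"pytorch": "torch"}


def parse_pins(text: str) -> Dict[str, str]:
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name: str) -> Optional[str]:
    """Return the installed version for a pin name, or None if it is missing."""
    if name == "python":
        # Assumption: the "python" pin means the interpreter's major.minor.
        return f"{sys.version_info.major}.{sys.version_info.minor}"
    try:
        return metadata.version(DIST_NAMES.get(name, name))
    except metadata.PackageNotFoundError:
        return None


for name, wanted in parse_pins(REQUIREMENTS).items():
    found = installed_version(name)
    status = "ok" if found == wanted else "differs"
    print(f"{name}: pinned {wanted}, found {found} ({status})")
```

The comparison is an exact string match, so a build such as a PyTorch wheel with a local version suffix would be reported as "differs" even when the base version matches.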