How to use from the
Use from the
llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="gparag/gparag_tinyllama-1.1b-finetuned-gguf",
	filename="tinyllama.Q4_K_M.gguf",
)
output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

TinyLlama (Fine-Tuned)

Custom 1.1B model fine-tuned for concise chat!

🎯 Optimized For

  • Mobile devices (Android/iOS)
  • Real-time chat (llama.cpp)
  • Concise responses (1-3 sentences)

πŸ“Š Model Details

Parameter Value
Base Model TinyLlama-1.1B-Chat-v1.0
Fine-tuned Unsloth + LoRA (r=16)
Dataset Guanaco/ShareGPT (5K chat pairs)
Quantization Q4_K_M (637MB)
Context 2048 tokens
Vocab 32,000

Training Details

  • Epochs: 3
  • LoRA Rank: 16
  • Learning Rate: 2e-4
  • Batch Size: 2 (gradient accumulation: 4)
  • Framework: Unsloth + TRL

Downloads last month
8
GGUF
Model size
1B params
Architecture
llama
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for gparag/gparag_tinyllama-1.1b-finetuned-gguf

Quantized
(149)
this model