Instructions to use lex-au/Vocalis-FP16.gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use lex-au/Vocalis-FP16.gguf with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="lex-au/Vocalis-FP16.gguf",
	filename="Vocalis-FP16.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use lex-au/Vocalis-FP16.gguf with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf lex-au/Vocalis-FP16.gguf
# Run inference directly in the terminal:
llama-cli -hf lex-au/Vocalis-FP16.gguf

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf lex-au/Vocalis-FP16.gguf
# Run inference directly in the terminal:
llama-cli -hf lex-au/Vocalis-FP16.gguf

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf lex-au/Vocalis-FP16.gguf
# Run inference directly in the terminal:
./llama-cli -hf lex-au/Vocalis-FP16.gguf

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf lex-au/Vocalis-FP16.gguf
# Run inference directly in the terminal:
./build/bin/llama-cli -hf lex-au/Vocalis-FP16.gguf

Use Docker

docker model run hf.co/lex-au/Vocalis-FP16.gguf

LM Studio
Jan
Ollama
How to use lex-au/Vocalis-FP16.gguf with Ollama:
```
ollama run hf.co/lex-au/Vocalis-FP16.gguf
```

Unsloth Studio

How to use lex-au/Vocalis-FP16.gguf with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lex-au/Vocalis-FP16.gguf to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lex-au/Vocalis-FP16.gguf to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for lex-au/Vocalis-FP16.gguf to start chatting

Docker Model Runner
How to use lex-au/Vocalis-FP16.gguf with Docker Model Runner:
```
docker model run hf.co/lex-au/Vocalis-FP16.gguf
```

Lemonade

How to use lex-au/Vocalis-FP16.gguf with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull lex-au/Vocalis-FP16.gguf

Run and chat with the model

lemonade run user.Vocalis-FP16.gguf-{{QUANT_TAG}}

List all available models

lemonade list

🧠 Model Card: LLaMA 3 8B Instruct – Conversational Roleplay Enhanced

Model Name: lex-au/vocalis-FP16.gguf
Base Model: Meta LLaMA 3 8B Instruct
Fine-Tuned By: Lex (lex-au)
Model Type: Causal Decoder Transformer
License: Non-commercial (see Meta LLaMA 3 license)

✨ Overview

vocalis-FP16.gguf is a fine-tuned and quantised variant of Meta’s LLaMA 3 8B Instruct model, purpose-built for immersive roleplaying and low-latency conversational AI in the Vocalis web app. This model tracks spatial and temporal context across turns, enabling natural, persistent narrative flows and highly dynamic character interactions.

Designed for low-resource environments, it balances responsiveness with rich generative quality, making it ideal for real-time applications.

🛠 Fine-Tuning Objective

Purpose: Enable vivid storytelling and contextually grounded interactions for roleplay-based use cases.
Focus Areas:
- Spatial awareness (tracking objects, locations, movement)
- Temporal sequencing (event continuity, time-of-day, memory within thread)
- Expressive language generation with adaptable tone
Fine-Tuning Strategy: LoRA-based adaptation on multi-turn dialogues, fantasy/sci-fi roleplay transcripts, and narrative world-state tracking prompts.
Deployment Goal: Minimal latency, low memory footprint — optimised for running inside Vocalis

🗣 Use Cases

Text-based RPG narration
Interactive fiction engines
AI-driven NPCs for browser games
Persistent-world simulation bots
Web-deployed immersive chat agents

✅ Strengths

Maintains spatial & narrative continuity across turns
Generates richly descriptive, tone-sensitive prose
Adapts easily to character tone and dynamic environments
Ideal for 1st-person, 2nd-person, or multi-speaker scenarios

⚠️ Known Limitations

Context window constraints may cause memory loss over extended turns
Can over-describe or hallucinate detail in ambiguous settings
Not suited for task-based or factual Q&A applications
No long-term memory or external knowledge integration

💬 Example Interaction

User:
You enter the crumbling temple. Where’s the light coming from?

Assistant:
As you step into the ancient temple, a soft, ethereal glow emanates from within its walls. The light seems to seep through the weathered stone, casting an otherworldly illumination on the dusty artifacts and worn carvings that line the passageway.

Upon closer inspection, you notice that the light is coming from a series of intricate, glowing symbols etched into the ceiling above. These mystical runes pulse with a gentle, blue-green luminescence, their soft radiance casting an enchanting atmosphere throughout the temple's interior.

The air inside the temple feels heavy with age and mystery, yet the soothing light from the celestial symbols imbues the space with a sense of peace and ancient wisdom. As you venture deeper into the labyrinthine structure, the glow seems to grow stronger, guiding you towards secrets that have lain dormant for centuries within the sacred halls.

📎 Metadata

Format: GGUF
Quantisation: FP16
Optimised For: Real-time inference (Vocalis-compatible)
Recommended Frontends: OpenWebUI, KoboldCPP, LM Studio, or custom web clients (e.g., Flask/Vite apps)

🔗 Project Link

🧪 This model was fine-tuned specifically for the Vocalis webapp project — a low-latency, voice-enabled AI assistant platform.

📖 Citation & Attribution

The base model used for fine-tuning is Meta’s LLaMA 3 8B Instruct. For licensing, refer to Meta's license terms.

This repository contains a quantised version optimised for low-latency use in the Vocalis real-time webapp.

If you use this model in your research or application, please cite:

@misc{llama3-8b-2024,
  author = {Meta AI},
  title = {LLaMA 3 8B Instruct},
  year = {2024},
  publisher = {Meta},
  howpublished = {\url{https://ai.meta.com/llama/}}
}

@misc{vocalis-roleplay-2025,
  author = {Lex-au},
  title = {Vocalis-FP16: Fine-Tuned LLaMA 3 for Conversational Roleplay},
  note = {Low-latency roleplay model optimised for use in the Vocalis webapp},
  year = {2025},
  howpublished = {\url{https://github.com/Lex-au/Vocalis}}
}