Instructions to use lex-au/Vocalis-FP16.gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use lex-au/Vocalis-FP16.gguf with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="lex-au/Vocalis-FP16.gguf", filename="Vocalis-FP16.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use lex-au/Vocalis-FP16.gguf with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf lex-au/Vocalis-FP16.gguf # Run inference directly in the terminal: llama-cli -hf lex-au/Vocalis-FP16.gguf
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf lex-au/Vocalis-FP16.gguf # Run inference directly in the terminal: llama-cli -hf lex-au/Vocalis-FP16.gguf
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf lex-au/Vocalis-FP16.gguf # Run inference directly in the terminal: ./llama-cli -hf lex-au/Vocalis-FP16.gguf
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf lex-au/Vocalis-FP16.gguf # Run inference directly in the terminal: ./build/bin/llama-cli -hf lex-au/Vocalis-FP16.gguf
Use Docker
docker model run hf.co/lex-au/Vocalis-FP16.gguf
- LM Studio
- Jan
- Ollama
How to use lex-au/Vocalis-FP16.gguf with Ollama:
ollama run hf.co/lex-au/Vocalis-FP16.gguf
- Unsloth Studio
How to use lex-au/Vocalis-FP16.gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for lex-au/Vocalis-FP16.gguf to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for lex-au/Vocalis-FP16.gguf to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for lex-au/Vocalis-FP16.gguf to start chatting
- Docker Model Runner
How to use lex-au/Vocalis-FP16.gguf with Docker Model Runner:
docker model run hf.co/lex-au/Vocalis-FP16.gguf
- Lemonade
How to use lex-au/Vocalis-FP16.gguf with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull lex-au/Vocalis-FP16.gguf
Run and chat with the model
lemonade run user.Vocalis-FP16.gguf-{{QUANT_TAG}}List all available models
lemonade list
π§ Model Card: LLaMA 3 8B Instruct β Conversational Roleplay Enhanced
Model Name: lex-au/vocalis-FP16.gguf
Base Model: Meta LLaMA 3 8B Instruct
Fine-Tuned By: Lex (lex-au)
Model Type: Causal Decoder Transformer
License: Non-commercial (see Meta LLaMA 3 license)
β¨ Overview
vocalis-FP16.gguf is a fine-tuned and quantised variant of Metaβs LLaMA 3 8B Instruct model, purpose-built for immersive roleplaying and low-latency conversational AI in the Vocalis web app. This model tracks spatial and temporal context across turns, enabling natural, persistent narrative flows and highly dynamic character interactions.
Designed for low-resource environments, it balances responsiveness with rich generative quality, making it ideal for real-time applications.
π Fine-Tuning Objective
- Purpose: Enable vivid storytelling and contextually grounded interactions for roleplay-based use cases.
- Focus Areas:
- Spatial awareness (tracking objects, locations, movement)
- Temporal sequencing (event continuity, time-of-day, memory within thread)
- Expressive language generation with adaptable tone
- Fine-Tuning Strategy: LoRA-based adaptation on multi-turn dialogues, fantasy/sci-fi roleplay transcripts, and narrative world-state tracking prompts.
- Deployment Goal: Minimal latency, low memory footprint β optimised for running inside Vocalis
π£ Use Cases
- Text-based RPG narration
- Interactive fiction engines
- AI-driven NPCs for browser games
- Persistent-world simulation bots
- Web-deployed immersive chat agents
β Strengths
- Maintains spatial & narrative continuity across turns
- Generates richly descriptive, tone-sensitive prose
- Adapts easily to character tone and dynamic environments
- Ideal for 1st-person, 2nd-person, or multi-speaker scenarios
β οΈ Known Limitations
- Context window constraints may cause memory loss over extended turns
- Can over-describe or hallucinate detail in ambiguous settings
- Not suited for task-based or factual Q&A applications
- No long-term memory or external knowledge integration
π¬ Example Interaction
User:
You enter the crumbling temple. Whereβs the light coming from?
Assistant:
As you step into the ancient temple, a soft, ethereal glow emanates from within its walls. The light seems to seep through the weathered stone, casting an otherworldly illumination on the dusty artifacts and worn carvings that line the passageway.
Upon closer inspection, you notice that the light is coming from a series of intricate, glowing symbols etched into the ceiling above. These mystical runes pulse with a gentle, blue-green luminescence, their soft radiance casting an enchanting atmosphere throughout the temple's interior.
The air inside the temple feels heavy with age and mystery, yet the soothing light from the celestial symbols imbues the space with a sense of peace and ancient wisdom. As you venture deeper into the labyrinthine structure, the glow seems to grow stronger, guiding you towards secrets that have lain dormant for centuries within the sacred halls.
π Metadata
- Format: GGUF
- Quantisation: FP16
- Optimised For: Real-time inference (Vocalis-compatible)
- Recommended Frontends: OpenWebUI, KoboldCPP, LM Studio, or custom web clients (e.g., Flask/Vite apps)
π Project Link
π§ͺ This model was fine-tuned specifically for the Vocalis webapp project β a low-latency, voice-enabled AI assistant platform.
π Citation & Attribution
The base model used for fine-tuning is Metaβs LLaMA 3 8B Instruct. For licensing, refer to Meta's license terms.
This repository contains a quantised version optimised for low-latency use in the Vocalis real-time webapp.
If you use this model in your research or application, please cite:
@misc{llama3-8b-2024,
author = {Meta AI},
title = {LLaMA 3 8B Instruct},
year = {2024},
publisher = {Meta},
howpublished = {\url{https://ai.meta.com/llama/}}
}
@misc{vocalis-roleplay-2025,
author = {Lex-au},
title = {Vocalis-FP16: Fine-Tuned LLaMA 3 for Conversational Roleplay},
note = {Low-latency roleplay model optimised for use in the Vocalis webapp},
year = {2025},
howpublished = {\url{https://github.com/Lex-au/Vocalis}}
}
- Downloads last month
- 2
We're not able to determine the quantization variants.
Model tree for lex-au/Vocalis-FP16.gguf
Base model
meta-llama/Meta-Llama-3-8B-Instruct