Instructions to use LiquidAI/LFM2.5-8B-A1B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use LiquidAI/LFM2.5-8B-A1B-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="LiquidAI/LFM2.5-8B-A1B-GGUF",
    filename="LFM2.5-8B-A1B-BF16.gguf",
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
- Notebooks
- Google Colab
- Kaggle
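The llama-cpp-python snippet above discards the return value of `create_chat_completion()`. In non-streaming mode it returns an OpenAI-style dict, so the reply text sits under `choices[0]["message"]["content"]`. A minimal sketch of pulling it out; the helper name `reply_text` is hypothetical, and the sample dict is hand-written in that shape so the code runs without downloading the model.

```python
from typing import Any


def reply_text(completion: dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("completion has no choices")
    message = choices[0].get("message") or {}
    # In OpenAI-style responses, content may be None (e.g. a tool-call-only turn).
    return message.get("content") or ""


if __name__ == "__main__":
    # Hand-written response in the same shape, so this runs without a model.
    sample = {
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": "Paris."},
                "finish_reason": "stop",
            }
        ]
    }
    print(reply_text(sample))
```

With the model loaded, `reply_text(llm.create_chat_completion(messages=[...]))` would return just the answer string.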
- Local Apps
- llama.cpp
How to use LiquidAI/LFM2.5-8B-A1B-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
Use Docker
docker model run hf.co/LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
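Every local-app command here names the same file with a `repo:quant` tag (`LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M`): llama.cpp takes it bare after `-hf`, while Docker Model Runner and Ollama prefix it with `hf.co/`. The sketch below only builds and splits those strings; `model_refs` and `split_ref` are hypothetical helpers. `Q4_K_M` is the only quantization named on this page, so whether any other tag resolves depends on which GGUF files the repository actually contains.

```python
from typing import Optional

HF_HOST = "hf.co/"


def model_refs(repo_id: str, quant: str) -> dict[str, str]:
    """Build the two reference forms used by the commands on this page."""
    if repo_id.count("/") != 1 or ":" in repo_id:
        raise ValueError("repo_id must look like 'owner/name' with no tag")
    tag = f"{repo_id}:{quant}"
    return {
        # Argument for `llama-server -hf` / `llama-cli -hf`
        "llama_cpp_hf": tag,
        # Reference for `docker model run` and `ollama run`
        "hf_co": HF_HOST + tag,
    }


def split_ref(ref: str) -> tuple[str, Optional[str]]:
    """Split 'hf.co/owner/name:QUANT' or 'owner/name:QUANT' into (repo_id, quant)."""
    if ref.startswith(HF_HOST):
        ref = ref[len(HF_HOST):]
    repo_id, sep, quant = ref.partition(":")
    return repo_id, (quant if sep else None)


if __name__ == "__main__":
    refs = model_refs("LiquidAI/LFM2.5-8B-A1B-GGUF", "Q4_K_M")
    print("llama-server -hf", refs["llama_cpp_hf"])
    print("ollama run", refs["hf_co"])
```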
- LM Studio
- Jan
- vLLM
How to use LiquidAI/LFM2.5-8B-A1B-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "LiquidAI/LFM2.5-8B-A1B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2.5-8B-A1B-GGUF",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
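The vLLM curl call above can be reproduced from Python with only the standard library. This is a sketch, assuming the vLLM server is listening on `http://localhost:8000` as in that example; `build_chat_request` and `send` are hypothetical helper names. Since llama-server also advertises an OpenAI-compatible API, pointing `base_url` at `http://localhost:8080` (the port used in the Pi and Hermes configs on this page) should work the same way, though that is not verified here.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST that the curl example sends to /v1/chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_chat_request(
        "http://localhost:8000",  # vLLM address from the curl example
        "LiquidAI/LFM2.5-8B-A1B-GGUF",
        "What is the capital of France?",
    )
    # Only inspect the request here; call send(req) once the server is up.
    print(req.get_method(), req.full_url)
```

Keeping construction separate from `send` makes the payload easy to inspect or log before anything goes over the network.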
- Ollama
How to use LiquidAI/LFM2.5-8B-A1B-GGUF with Ollama:
ollama run hf.co/LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
- Unsloth Studio
How to use LiquidAI/LFM2.5-8B-A1B-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for LiquidAI/LFM2.5-8B-A1B-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for LiquidAI/LFM2.5-8B-A1B-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for LiquidAI/LFM2.5-8B-A1B-GGUF to start chatting
- Pi
How to use LiquidAI/LFM2.5-8B-A1B-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
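Pi's instructions say to add the provider to `~/.pi/agent/models.json`. If that file already lists other providers, pasting the snippet over it would drop them, so merging is safer. A minimal sketch, assuming the file is plain JSON with a top-level `"providers"` object as in the snippet above; `merge_provider` and `update_models_file` are hypothetical helpers, not part of Pi.

```python
import copy
import json
from pathlib import Path
from typing import Optional

# Provider entry copied from the models.json snippet above.
LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M"}],
}


def merge_provider(config: dict, name: str = "llama-cpp",
                   provider: Optional[dict] = None) -> dict:
    """Return a copy of config with the provider set, keeping any other providers."""
    merged = copy.deepcopy(config)
    entry = copy.deepcopy(provider if provider is not None else LLAMA_CPP_PROVIDER)
    merged.setdefault("providers", {})[name] = entry
    return merged


def update_models_file(path: Optional[Path] = None) -> None:
    """Read ~/.pi/agent/models.json (if present), merge, and write it back."""
    path = path or Path.home() / ".pi" / "agent" / "models.json"
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_provider(config), indent=2) + "\n")
```

`merge_provider` is pure and leaves its input untouched; calling `update_models_file()` once is what actually edits the file on disk.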
- Hermes Agent
How to use LiquidAI/LFM2.5-8B-A1B-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
Run Hermes
hermes
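Pi and Hermes both depend on the llama-server started in their first step; if it is not running, the agent fails in ways that can look like model problems. A small reachability check, assuming llama-server answers `GET /v1/models` (part of the OpenAI-compatible API it advertises) at the `http://127.0.0.1:8080/v1` base URL configured above; `server_is_up` is a hypothetical helper.

```python
import http.client
import json
import urllib.request


def server_is_up(base_url: str = "http://127.0.0.1:8080/v1", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/models answers with JSON, else False."""
    url = base_url.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.load(response)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError/HTTPError, refused connections and timeouts;
        # ValueError covers malformed URLs and bodies that are not JSON.
        return False
    return True


if __name__ == "__main__":
    # Same base URL that Hermes is pointed at above.
    print("llama-server reachable:", server_is_up())
```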
- Atomic Chat
- Docker Model Runner
How to use LiquidAI/LFM2.5-8B-A1B-GGUF with Docker Model Runner:
docker model run hf.co/LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
- Lemonade
How to use LiquidAI/LFM2.5-8B-A1B-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.LFM2.5-8B-A1B-GGUF-Q4_K_M
List all available models
lemonade list
Tool calling and thinking not working in llama.cpp
Running in llama.cpp, I am seeing two issues:
- Think tags are emitted as regular output rather than as a reasoning trace, including the <think/> tags themselves.
- Tool calling is not working.
I tried setting chat-template = chatml, but this did not help. It is unclear whether llama.cpp needs additional arguments at startup or whether fixes are needed.
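Until the template problem is fixed, one client-side workaround for think tags landing in the normal output is to split them off after the fact. A minimal sketch; `split_reasoning` is a hypothetical helper, not a llama.cpp feature. It assumes the model wraps its reasoning in `<think>...</think>` (the tags used by the chat template later in this thread) and, like that template's `content.split("</think>")[-1]`, treats whatever follows a stray `</think>` as the answer.

```python
import re

# One <think>...</think> block; DOTALL lets the reasoning span several lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer)."""
    reasoning_parts = [part.strip() for part in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text)
    # A stray closing tag (no matching opener) means everything before it is
    # reasoning; keep only what follows the last one, as the template does.
    if "</think>" in answer:
        head, _, answer = answer.rpartition("</think>")
        reasoning_parts.append(head.strip())
    reasoning = "\n".join(part for part in reasoning_parts if part)
    return reasoning, answer.strip()


if __name__ == "__main__":
    raw = "<think>The user asks for a capital city.</think>\n\nParis."
    reasoning, answer = split_reasoning(raw)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

This only post-processes text; it does not make llama.cpp report a separate reasoning field.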
On "BenchLocal" it did much worse than gemma-4-e2b in every test: no tool use, poor bug fixing, poor math reasoning, and so on.
Sorry, better luck next try, guys ;)
Your smart lfm2.5_vl 0.6/1.6 models for images are fine, though.
Not working properly, unfortunately (llama.cpp + Hermes Agent).
The understanding is very poor, to be honest. I ask in Portuguese and it answers with something unrelated.
Same here, but it seems to have some potential if tool calls worked.
The chat template is busted, but I cannot fix it tonight.
Reasoning works with this unrelated one (for Qwen) https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates/blob/main/chat_template.jinja
This chat template from https://huggingface.co/nathanrchn/LFM2.5-8B-A1B-GGUF-fixed-v2 seems to work
{# List of tools: [ #}
{{- bos_token -}}
{%- set preserve_thinking = preserve_thinking | default(false) -%}
{%- macro format_arg_value(arg_value) -%}
  {%- if arg_value is string -%}
    {{- "'" + arg_value + "'" -}}
  {%- elif arg_value is mapping -%}
    {{- arg_value | tojson -}}
  {%- else -%}
    {{- arg_value | string -}}
  {%- endif -%}
{%- endmacro -%}
{%- macro parse_content(content) -%}
  {%- if content is string -%}
    {{- content -}}
  {%- else -%}
    {%- set _ns = namespace(result="") -%}
    {%- for item in content -%}
      {%- if item["type"] == "image" -%}
        {%- set _ns.result = _ns.result + "<image>" -%}
      {%- elif item["type"] == "text" -%}
        {%- set _ns.result = _ns.result + item["text"] -%}
      {%- else -%}
        {%- set _ns.result = _ns.result + item | tojson -%}
      {%- endif -%}
    {%- endfor -%}
    {{- _ns.result -}}
  {%- endif -%}
{%- endmacro -%}
{%- macro render_tool_calls(tool_calls) -%}
  {%- set tool_calls_ns = namespace(tool_calls=[]) -%}
  {%- for tool_call in tool_calls -%}
    {%- set func_name = tool_call["function"]["name"] -%}
    {%- set func_args = tool_call["function"]["arguments"] -%}
    {%- set args_ns = namespace(arg_strings=[]) -%}
    {%- for arg_name, arg_value in func_args.items() -%}
      {%- set args_ns.arg_strings = args_ns.arg_strings + [arg_name + "=" + format_arg_value(arg_value)] -%}
    {%- endfor -%}
    {%- set tool_calls_ns.tool_calls = tool_calls_ns.tool_calls + [func_name + "(" + (args_ns.arg_strings | join(", ")) + ")"] -%}
  {%- endfor -%}
  {{- "<|tool_call_start|>[" + (tool_calls_ns.tool_calls | join(", ")) + "]<|tool_call_end|>" -}}
{%- endmacro -%}
{%- set ns = namespace(system_prompt="", last_user_index=-1) -%}
{%- if messages[0]["role"] == "system" -%}
  {%- if messages[0].get("content") -%}
    {%- set ns.system_prompt = parse_content(messages[0]["content"]) -%}
  {%- endif -%}
  {%- set messages = messages[1:] -%}
{%- endif -%}
{%- if tools -%}
  {%- set ns.system_prompt = ns.system_prompt + ("\n\n" if ns.system_prompt else "") + "Today's date: " + strftime_now("%Y-%m-%d") + "\n\nList of tools: " + (tools | tojson) -%}
{%- endif -%}
{%- if ns.system_prompt -%}
  {{- "<|im_start|>system\n" + ns.system_prompt + "<|im_end|>\n" -}}
{%- endif -%}
{%- for message in messages -%}
  {%- if message["role"] == "user" -%}
    {%- set ns.last_user_index = loop.index0 -%}
  {%- endif -%}
{%- endfor -%}
{%- for message in messages -%}
  {{- "<|im_start|>" + message.role + "\n" -}}
  {%- if message.role == "assistant" -%}
    {%- generation -%}
    {%- if message.thinking is defined and (preserve_thinking or loop.index0 > ns.last_user_index) -%}
      {{- "<think>" + message.thinking + "</think>" -}}
    {%- endif -%}
    {%- set _cfm_tag = "CONTINUE_FINAL_MESSAGE_TAG " -%}
    {%- set _has_cfm = false -%}
    {%- if message.content is defined -%}
      {%- set content = parse_content(message.content) -%}
      {%- if not (preserve_thinking or loop.index0 > ns.last_user_index) -%}
        {%- if "</think>" in content -%}
          {%- set content = content.split("</think>")[-1] | trim -%}
        {%- endif -%}
      {%- endif -%}
      {%- if message.tool_calls is defined and content.endswith(_cfm_tag) -%}
        {%- set _has_cfm = true -%}
        {%- set _trunc_len = (content | length) - (_cfm_tag | length) -%}
        {{- content[:_trunc_len] -}}
      {%- else -%}
        {{- content -}}
      {%- endif -%}
    {%- endif -%}
    {%- if message.tool_calls is defined -%}
      {{- render_tool_calls(message.tool_calls) -}}
    {%- endif -%}
    {%- if _has_cfm -%}
      {{- _cfm_tag -}}
    {%- endif -%}
    {{- "<|im_end|>\n" -}}
    {%- endgeneration -%}
  {%- else %}
    {%- if message.get("content") -%}
      {{- parse_content(message["content"]) -}}
    {%- endif -%}
    {{- "<|im_end|>\n" -}}
  {%- endif %}
{%- endfor -%}
{%- if add_generation_prompt -%}
  {{- "<|im_start|>assistant\n" -}}
{%- endif -%}
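The `render_tool_calls` macro in the template above turns each tool call into a Python-style call inside `<|tool_call_start|>[...]<|tool_call_end|>`. Below is a rough Python port of that macro plus a hypothetical inverse, `parse_tool_calls`, that a client could use to recover calls from raw output. Assumptions: mapping values are encoded with `json.dumps`, which may differ in spacing or key order from the template engine's `tojson`; accepting a JSON-string `arguments` field is an addition the macro itself does not do; and the parser only handles keyword arguments whose values are Python literals (a string containing a single quote, or JSON `true`/`null` inside a mapping, will not round-trip).

```python
import ast
import json
from typing import Any

START, END = "<|tool_call_start|>", "<|tool_call_end|>"


def format_arg_value(value: Any) -> str:
    """Port of the template's format_arg_value macro."""
    if isinstance(value, str):
        return "'" + value + "'"
    if isinstance(value, dict):
        return json.dumps(value)  # the template uses tojson; spacing may differ
    return str(value)


def render_tool_calls(tool_calls: list) -> str:
    """Port of the template's render_tool_calls macro."""
    rendered = []
    for call in tool_calls:
        name = call["function"]["name"]
        args = call["function"]["arguments"]
        if isinstance(args, str):
            # Addition: OpenAI-style clients often send arguments as a JSON string.
            args = json.loads(args)
        arg_strings = [f"{key}={format_arg_value(val)}" for key, val in args.items()]
        rendered.append(f"{name}({', '.join(arg_strings)})")
    return START + "[" + ", ".join(rendered) + "]" + END


def parse_tool_calls(text: str) -> list:
    """Hypothetical inverse: recover calls from the first tool-call block in text."""
    start = text.find(START)
    if start == -1:
        return []
    end = text.find(END, start)
    if end == -1:
        return []
    tree = ast.parse(text[start + len(START):end], mode="eval").body
    if not isinstance(tree, ast.List):
        raise ValueError("expected a list of calls")
    calls = []
    for node in tree.elts:
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)) or node.args:
            raise ValueError("expected name(key=value, ...) calls")
        # literal_eval only accepts literals (strings, numbers, dicts, ...), never code.
        arguments = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append({"function": {"name": node.func.id, "arguments": arguments}})
    return calls


if __name__ == "__main__":
    calls = [{"function": {"name": "get_weather", "arguments": {"city": "Paris", "days": 3}}}]
    text = render_tool_calls(calls)
    print(text)
    print(parse_tool_calls(text) == calls)
```

Parsing with `ast` rather than `eval` keeps the fallback safe: model output is only ever read as literals, never executed.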
Nice find, man. It's not perfect, but it works very well.
Issues reported in llama.cpp: https://github.com/ggml-org/llama.cpp/issues?q=is%3Aissue%20state%3Aopen%20lfm2.5-8b
Tool calling has been fixed, and the GGUFs were updated. Please pull.