---
tags:
- gguf
- llama.cpp
- unsloth
- mind_call
- function_call
datasets:
- frshafi/mind_call
language:
- en
base_model:
- meta-llama/Llama-3.2-3B-Instruct
metrics:
- accuracy: 88%
---

# health_function_call_llama3.2_3b_gguf: GGUF

A fine-tuned **Llama 3.2 3B GGUF model** designed for **structured function calling in healthcare edge devices**. It is trained to convert natural language health queries into **JSON-based function calls**.

- **Base Model:** Llama 3.2 3B Instruct (`meta-llama/Llama-3.2-3B-Instruct`).
- **Fine Tuning:** Parameter-efficient fine-tuning. The adapter targets all linear layers (Q, K, V, O, gate, up, down), so the model learns the query-to-function mapping while the adapter stays at 10.5 MB.
- **Quantization:** Exported to GGUF (Q4_K_M) format.
- **Dataset:** Trained on the MindCall dataset, a curated synthetic collection of 5,000+ high-fidelity health interaction pairs.

## 🚀 Key Features

- Converts user queries → structured API calls
- Lightweight GGUF format (runs locally via llama.cpp)
- Optimized for deterministic outputs (low temperature)
- Supports reasoning: the model emits a short rationale in dedicated tags before the function call

## 📦 Model Files

- `Llama-3.2-3B-Instruct.Q4_K_M.gguf`

## ⚡ Quick Start (Python)

### Install dependencies

```bash
pip install llama-cpp-python huggingface_hub
```

### Load the model

```python
from llama_cpp import Llama

# Downloads the GGUF file from the Hugging Face Hub (cached after the first run)
llm = Llama.from_pretrained(
    repo_id="ramgovindv/health_function_call_llama3.2_3b_gguf",
    filename="Llama-3.2-3B-Instruct.Q4_K_M.gguf",
)
```

### Inference

```python
query = "I am feeling dizzy for 2 days"

# Doubled braces are literal braces inside the f-string
prompt = f"""
You are an API generator.
Return JSON in this format:
{{
  "name": "function_name",
  "parameters": {{
    "key": "value"
  }}
}}

User query: {query}

JSON:
"""

# A low temperature keeps the output close to deterministic
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt}],
    temperature=0.1,
)

output = response["choices"][0]["message"]["content"]
print(output)
```

## Output

```
User has dizziness → likely need blood pressure check
{
  "name": "get_blood_pressure_data",
  "parameters": {
    "num_days": 2
  }
}
```

- Line before the JSON object → reasoning
- JSON object → actual function call

[Unsloth](https://github.com/unslothai/unsloth)
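Because the sample output above mixes a reasoning line with the JSON call, downstream code usually needs to pull out only the JSON object. The following is a minimal sketch of one way to do that, not part of the model's own tooling: `extract_function_call` is a hypothetical helper name, and it assumes the function call is the first JSON object in the text that has a `"name"` key.

```python
import json


def extract_function_call(output: str) -> dict:
    """Return the first JSON object in `output` that contains a "name" key.

    Hypothetical helper: assumes the model prints free-text reasoning
    followed by a single JSON function call, as in the sample output.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(output):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any trailing text after it.
            obj, _ = decoder.raw_decode(output, start)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "name" in obj:
            return obj
    raise ValueError("no function call found in model output")


# Sample text copied from the Output section
sample_output = """User has dizziness → likely need blood pressure check
{
  "name": "get_blood_pressure_data",
  "parameters": {
    "num_days": 2
  }
}"""

call = extract_function_call(sample_output)
print(call["name"], call["parameters"])
```

In practice, `extract_function_call(output)` can be applied to the `output` string from the Inference step. Raising `ValueError` when no call is found lets the caller decide whether to retry or fall back.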