---
license: mit
base_model: bharatgenai/LegalParam
tags:
- gguf
- llama.cpp
- ollama
- quantized
- 2.9B
- indian-law
- legal
- llama
language:
- en
pipeline_tag: text-generation
---

# LegalParam GGUF Models

GGUF quantized versions of [bharatgenai/LegalParam](https://huggingface.co/bharatgenai/LegalParam) for use with Ollama.

## Model Information

**Original Model:** [bharatgenai/LegalParam](https://huggingface.co/bharatgenai/LegalParam)
- **Architecture:** ParamBharatGen (LLaMA-based)
- **Parameters:** 2.9B
- **Context Length:** 2048 tokens
- **Purpose:** Specialized AI assistant for Indian law

## Available Quantizations

| Quantization | File Size | Description | Use Case |
|-------------|-----------|-------------|----------|
| Q4_K_M | 1.7GB | 4-bit quantized | Recommended for most use cases |
| Q6_K | 2.2GB | 6-bit quantized | Higher quality, moderate resource usage |
| F16 | 5.4GB | 16-bit float (no quantization) | Highest quality, requires more memory |

## Quick Start

### 1. Install Ollama

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

### 2. Create the Model

Choose a quantization level:

```bash
# Q4_K_M (Recommended - 1.7GB)
ollama create legalparam:q4 -f Modelfile

# Q6_K (Higher quality - 2.2GB)
ollama create legalparam:q6 -f Modelfile-q6

# F16 (Highest quality - 5.4GB)
ollama create legalparam:f16 -f Modelfile-f16
```

### 3. Run the Model

```bash
# Interactive chat
ollama run legalparam:q4

# Single query
ollama run legalparam:q4 "What steps should a farmer take to legally transfer agricultural land ownership?"
```

## Python Usage

```python
from ollama import Client

client = Client()

response = client.chat(model='legalparam:q4', messages=[
  {'role': 'user', 'content': 'What are the fundamental rights in the Indian Constitution?'}
])

print(response['message']['content'])
```

## Model File Details

All Modelfiles include:
- **Correct chat template** matching the tokenizer's format
- **Stop tokens** (`</s>`, `<user>`, `<assistant>`) to prevent infinite generation loops
- **Optimized parameters** for legal question answering

### Chat Template Format

```
<user>
{user_message}
<assistant>
{assistant_response}
```

## Context Window

- **Default:** 2048 tokens (combined input + output)
- **Scaling:** Can be extended with RoPE scaling in Ollama (experimental)

## Example Queries

The model excels at Indian legal queries:

- "Explain the First Amendment of the Indian Constitution"
- "What is the procedure for filing a civil suit in India?"
- "What are the key provisions of the Land Acquisition Act?"
- "Explain the concept of judicial review in India"
- "What are the powers of the Supreme Court of India?"

## Technical Specifications

### Model Architecture
- Hidden size: 2048
- Layers: 32
- Attention heads: 16
- KV heads: 8 (Grouped Query Attention)
- Vocabulary: 256,006 tokens

### Special Tokens
- `<s>`: Beginning of sequence (BOS)
- `</s>`: End of sequence (EOS)
- `<user>`: User message marker
- `<assistant>`: Assistant message marker

## Limitations

- Context limited to 2048 tokens
- Training data cutoff: August 2023
- Optimized for Indian law queries
- May not perform well on non-legal topics

## Original Model

This is a quantized version of [bharatgenai/LegalParam](https://huggingface.co/bharatgenai/LegalParam). For the original PyTorch model, training details, and full documentation, please refer to the original repository.

## License

Please refer to the [original model repository](https://huggingface.co/bharatgenai/LegalParam) for licensing information.

## Conversion Process

These models were converted from the original HuggingFace format to GGUF using llama.cpp with the following process:
1. Loaded original model with transformers
2. Converted to GGUF format
3. Quantized to Q4_K_M, Q6_K, and F16 precision
4. Validated with Ollama inference engine

## Troubleshooting

### Model repeats or loops
- Ensure you're using the provided Modelfiles
- Stop tokens are pre-configured to prevent infinite loops

### Out of memory errors
- Try a smaller quantization (Q4_K_M instead of Q6_K)
- Reduce `num_ctx` parameter in Ollama

### Poor quality responses
- Try F16 quantization for highest quality
- Ensure proper prompt formatting with `<user>` and `<assistant>` tags

## Acknowledgments

- Original model: [bharatgenai/LegalParam](https://huggingface.co/bharatgenai/LegalParam)
- GGUF conversion: [llama.cpp](https://github.com/ggerganov/llama.cpp)
- Inference engine: [Ollama](https://ollama.ai)