---
language:
- fr
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.3
tags:
- legal
- french
- question-answering
- mistral
- lora
- peft
pipeline_tag: text-generation
library_name: peft
---

# Mistral-7B French Legal Q&A Fine-tuned Model

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) on a French legal question-answering dataset using LoRA (Low-Rank Adaptation).

## Model Details
- **Repository:** [`Nahla-yasmine/mistral-7b-french-legal-qa`](https://huggingface.co/Nahla-yasmine/mistral-7b-french-legal-qa)  
- **Base model:** [`mistralai/Mistral-7B-Instruct-v0.3`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)  
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation) 
- **Language**: 🇫🇷 French
- **Domain**: Legal Q&A
- **Training Dataset Size**: ~846 samples
- **Training Method**: QLoRA (4-bit quantization)



## Model Description

This is a **LoRA fine-tuned adapter** for Mistral-7B-Instruct-v0.3, trained on a curated **French legal question-answering** dataset focused on **data protection and privacy laws** (e.g., Algeria's Law 18-07).

The goal is to assist users in understanding legal rights, definitions, and procedures related to personal data protection.

---
## LoRA Configuration

- **r**: 16
- **alpha**: 32
- **dropout**: 0.1
- **target_modules**: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
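
As a rough sanity check on the configuration above, the number of trainable LoRA parameters can be estimated by hand. The sketch below assumes the publicly documented Mistral-7B layer shapes (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128). These shapes and the helper name `lora_param_count` are illustrative assumptions, not values read from this repository.

```python
# Assumed Mistral-7B shapes as (in_features, out_features); not read from this repo
R, ALPHA, NUM_LAYERS = 16, 32, 32
HIDDEN, MLP, KV_DIM = 4096, 14336, 8 * 128

TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_param_count(shapes, r, num_layers):
    # Each adapted layer gains A (r x in) and B (out x r): r * (in + out) weights
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers

trainable = lora_param_count(TARGET_SHAPES, R, NUM_LAYERS)
scaling = ALPHA / R  # LoRA scales the low-rank update by alpha / r

print(f"Trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumptions the adapter holds roughly 42 million trainable weights, well under 1% of the base model, and the low-rank update is scaled by a factor of 2.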


## Usage

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Nahla-yasmine/mistral-7b-french-legal-qa")

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load LoRA weights
model = PeftModel.from_pretrained(base_model, "Nahla-yasmine/mistral-7b-french-legal-qa")

# Generate response
def ask_question(question):
    # The prompt already starts with <s>, so skip the tokenizer's automatic BOS token
    prompt = f"<s>[INST] {question} [/INST]"
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=200,
            temperature=0.3,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Example usage
question = "Qu'est-ce qu'une donnée à caractère personnel ?"
answer = ask_question(question)
print(f"Question: {question}")
print(f"Answer: {answer}")
```

### With Memory-Efficient Loading (4-bit quantization)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Configure quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load with quantization
tokenizer = AutoTokenizer.from_pretrained("Nahla-yasmine/mistral-7b-french-legal-qa")
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    quantization_config=bnb_config,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "Nahla-yasmine/mistral-7b-french-legal-qa")
```
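
To give a feel for why 4-bit loading helps, here is a back-of-the-envelope estimate of weight memory. It is only a sketch: the parameter count of about 7.25 billion is an assumed approximation, and the helper `weight_memory_gb` is hypothetical. The estimate ignores quantization constants, layers that stay unquantized, activations, and the KV cache, so real usage will be higher.

```python
# Rough weight-memory estimate; the parameter count is an assumed approximation
APPROX_PARAMS = 7.25e9

def weight_memory_gb(num_params, bits_per_param):
    # Convert bits to gigabytes (1 GB = 1e9 bytes)
    return num_params * bits_per_param / 8 / 1e9

bf16_gb = weight_memory_gb(APPROX_PARAMS, 16)
nf4_gb = weight_memory_gb(APPROX_PARAMS, 4)

print(f"bfloat16 weights: ~{bf16_gb:.1f} GB")
print(f"4-bit weights:    ~{nf4_gb:.1f} GB")
```

Under these assumptions the weights alone shrink from roughly 14.5 GB to under 4 GB, which is what makes the quantized path practical on smaller GPUs.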

## Training Details

- **Epochs**: 3
- **Learning Rate**: 2e-4
- **Batch Size**: 2 (with gradient accumulation steps: 4)
- **Max Sequence Length**: 512 tokens
- **Optimizer**: paged_adamw_8bit
- **Warmup Ratio**: 0.1
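
The hyperparameters above imply an approximate training schedule, which can be worked out as follows. This is a sketch under assumptions: single-device training, the full ~846 samples used for training with no held-out split, and rounding up at each step. The exact step counts depend on how the training framework rounds partial batches.

```python
import math

NUM_SAMPLES = 846        # approximate, from this model card
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
EPOCHS = 3
WARMUP_RATIO = 0.1
NUM_DEVICES = 1          # assumption: single-GPU training

# Samples consumed per optimizer update
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
steps_per_epoch = math.ceil(NUM_SAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

print(f"Effective batch size: {effective_batch}")
print(f"Optimizer steps per epoch: ~{steps_per_epoch}")
print(f"Total optimizer steps: ~{total_steps}")
print(f"Warmup steps: ~{warmup_steps}")
```

With these assumptions the run amounts to an effective batch size of 8 and only a few hundred optimizer updates in total, of which roughly the first tenth are learning-rate warmup.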

## Example Questions

The model can answer various French legal questions such as:

- "Qu'est-ce qu'une donnée à caractère personnel ?"
- "Quels sont les droits de la personne concernée ?"
- "Quelles sanctions s'appliquent en cas de non-respect de la loi 18-07 ?"
- "Comment exercer son droit de rectification ?"

## Intended Use

This model is designed to answer French-language legal questions, particularly on data protection and privacy law. Treat it as an assistive tool, and **always verify important legal information with qualified professionals**.

## Limitations

- The model is fine-tuned on a small, specific French-language legal dataset (data protection) and may not generalize to other legal questions
- Responses should be verified by qualified legal professionals
- The model may occasionally generate inaccurate or incomplete information
- Coverage is limited to French-language legal content within the scope of the training data

## Ethical Considerations

- This model provides general legal information and should not replace professional legal advice
- Users should verify all legal information with qualified professionals
- The model should not be used for making important legal decisions without proper review

## Citation

If you use this model, please cite the original Mistral paper:

```bibtex
@article{jiang2023mistral,
  title={Mistral 7B},
  author={Jiang, Albert Q and Sablayrolles, Alexandre and Mensch, Arthur and Bamford, Chris and Chaplot, Devendra Singh and Casas, Diego de las and Bressand, Florian and Lengyel, Gianna and Lample, Guillaume and Saulnier, Lucile and others},
  journal={arXiv preprint arXiv:2310.06825},
  year={2023}
}
```

## Contact

For questions about this fine-tuned model, please open an issue in this repository.