---
language: fr
tags:
- mistral
- lora
- qlora
- immobilier
- estimation
- price
- real-estate
license: mit
datasets:
- custom
base_model: mistralai/Mistral-7B-Instruct-v0.3
pipeline_tag: text-generation
---
🌐 **Looking for the English version?** Scroll down 👇

👉 [Jump to English version](#english-version)

# Lyra-Mistral7B-immobilier-LoRA 🏠💶
<img src="https://upload.wikimedia.org/wikipedia/en/c/c3/Flag_of_France.svg" width="40px" height="auto" />

![SouverainAI](https://img.shields.io/badge/🇫🇷%20SouverainAI-oui-success)
![EUstack](https://img.shields.io/badge/🇪🇺%20EUstack-ready-blue)

## Description
Ce modèle est une adaptation LoRA du **Mistral-7B-Instruct-v0.3** spécialisée sur des données synthétiques d’**estimation immobilière**.  
Objectif : proposer en français une **estimation de prix** (fourchette en €) assortie d’un **commentaire concis** (1–2 phrases), à partir de paramètres simples : surface, type de bien, état et zone (A, B1, B2 ou C).

## Présentation du pipeline

Ce projet est construit en **deux parties complémentaires**, qui illustrent la chaîne complète de traitement d’un cas réel de données immobilières :

### Partie 1 – Préparation du dataset
- Point de départ : un fichier Excel client, volontairement bruité (valeurs manquantes, doublons, formats hétérogènes).
- Nettoyage et normalisation des données avec **pandas** :
  - conversion des prix en numérique,
  - gestion des valeurs manquantes,
  - suppression des doublons,
  - contrôle qualité statistique.
- Export en **CSV UTF-8** puis conversion en **JSONL** au format attendu par Mistral (structure `messages`).
- Résultat : deux jeux de données (`train` et `validation`) propres, exploitables directement pour le fine-tuning.

### Partie 2 – Fine-tuning LoRA Mistral-7B

## Détails techniques
- Base model: `mistralai/Mistral-7B-Instruct-v0.3`  
- Technique: QLoRA (4-bit, bitsandbytes)  
- Modules LoRA: q_proj, k_proj, v_proj, o_proj, down_proj  
- Epochs: 3  
- GPU: A100 (Colab Pro)  

## Résultats observés
Le modèle apprend à :  
- répondre en français avec un format clair : **Estimation + commentaire**,  
- respecter les zones fournies (A, B1, B2, C),  
- éviter les hallucinations de villes/régions.  
Comparé au modèle de base, il est beaucoup plus cadré et concis.

## Utilisation
```python
!pip install -q transformers accelerate bitsandbytes sentencepiece huggingface_hub
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "jeromex1/Lyra-Mistral7B-immobilier-LoRA"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
)

system_prompt = (
    "Tu es un expert en estimation immobilière. "
    "Réponds en français, de manière concise, en 1 à 2 phrases maximum. "
    "Donne une estimation chiffrée (fourchette en euros) et un bref commentaire. "
    "Les localisations sont classées par zone : A, B1, B2 ou C. "
    "NE MENTIONNE PAS de villes ou régions, uniquement la zone donnée."
)

user_prompt = "Surface : 96 m²\nType : maison\nÉtat : bon\nZone : B1"
full_prompt = f"{system_prompt}\n\n{user_prompt}\n\nRéponse :"

inputs = tokenizer(full_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

```

| Prompt (input)                                                        | Réponse générée                                                                                                                                |
| --------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| Surface : 96 m²<br>Type : maison<br>État : bon<br>Zone : B1           | Valeur estimée entre 300 000 € et 400 000 €. Bien situé dans une zone B1, cette maison de 96 m² est en très bon état.                          |
| Surface : 55 m²<br>Type : appartement<br>État : à rénover<br>Zone : C | Estimation : 100 000 – 120 000 €<br>Commentaire : Appartement à rénover dans une zone C, l'estimation est comprise entre 100 000 et 120 000 €. |
| Surface : 120 m²<br>Type : maison<br>État : excellent<br>Zone : A     | Valeur estimée entre 500 000 € et 600 000 €. Situation privilégiée, grande surface et état excellent.                                          |
| Surface : 80 m²<br>Type : appartement<br>État : moyen<br>Zone : B2    | Estimation : 250 000 – 300 000 €<br>Commentaire : Bien situé dans une zone B2, cet appartement de 80 m² est en bon état de conservation.       |
| Surface : 150 m²<br>Type : maison<br>État : ancien<br>Zone : C        | Valeur estimée entre 200 000 € et 300 000 € pour une maison ancienne de 150 m² dans la zone C.                                                 |

Recommandation

Nous conseillons vivement d’utiliser Google Colab
 (gratuit ou Colab Pro) et d’y sélectionner un GPU (T4 si gratuit, A100 si Colab Pro).


📘 Découvrez mes 40 projets IA et sciences STEM ici : 👉 [github.com/Jerome-openclassroom](https://github.com/Jerome-openclassroom/)

📘 **Pour en savoir plus**, rendez-vous sur ma page GitHub consacrée au projet :  👉 [Lyra-Mistral7B-Estate](https://github.com/Jerome-openclassroom/Lyra-Mistral7B-Estate)

<a name="english-version"></a>

🌍 English Version
## Description

This model is a LoRA adaptation of Mistral-7B-Instruct-v0.3, specialized in synthetic real-estate estimation data.
Goal: provide concise French responses with a price range (in €) and a short comment based on simple parameters: surface, property type, condition, and zone (A, B1, B2, C).

## Pipeline Overview

This project is built in **two complementary parts**, illustrating the full workflow of a real-world real estate dataset:

### Part 1 – Dataset Preparation
- Starting point: a raw Excel file, intentionally noisy (missing values, duplicates, heterogeneous formats).
- Data cleaning and normalization with **pandas**:
  - conversion of prices into numeric format,
  - handling of missing values,
  - removal of duplicates,
  - statistical quality checks.
- Export to **UTF-8 CSV** and conversion to **JSONL** in the format required by Mistral (`messages` structure).
- Result: two clean datasets (`train` and `validation`), directly usable for fine-tuning.

### Part 2 – Fine-tuning LoRA Mistral-7B


# Technical Details

- Base model: mistralai/Mistral-7B-Instruct-v0.3

- Technique: QLoRA (4-bit, bitsandbytes)

- LoRA modules: q_proj, k_proj, v_proj, o_proj, down_proj

- Epochs: 3

- GPU: A100 (Colab Pro)

# Observed Results

The model learns to:

- answer in French with a clear “Estimation + comment” format,

- respect the given zone (A, B1, B2, C),

- avoid hallucinating real cities/regions.
Compared to the base model, outputs are much more concise and on-topic.

```python
!pip install -q transformers accelerate bitsandbytes sentencepiece huggingface_hub
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "jeromex1/Lyra-Mistral7B-immobilier-LoRA"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
)

system_prompt = (
    "You are a real-estate estimation expert. "
    "Answer strictly in French, concisely, in 1–2 sentences. "
    "Always provide a price range in euros and a short comment. "
    "Localisations are zones A, B1, B2 or C. "
    "NEVER mention cities or regions, only the given zone."
)

user_prompt = "Surface : 96 m²\nType : maison\nÉtat : bon\nZone : B1"
full_prompt = f"{system_prompt}\n\n{user_prompt}\n\nRéponse :"

inputs = tokenizer(full_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

```
📘 Discover my 40 AI & STEM projects here:
👉 [github.com/Jerome-openclassroom](https://github.com/Jerome-openclassroom/)

📘 To know more about this project :
👉 [Lyra-Mistral7B-Estate](https://github.com/Jerome-openclassroom/Lyra-Mistral7B-Estate/blob/main/README_En.md)