# Data Files

This folder contains the JSONL files used by the Wolof LoRA demo.

## Training Format

Each training row must be one JSON object per line:

```json
{"instruction": "question or task", "input": "education", "output": "Wolof answer"}
```

Required fields:

- `instruction`: user question or task.
- `input`: category/context.
- `output`: expected Wolof answer.

Supported categories:

- `education`
- `agriculture`
- `sante`
- `transport`
- `culture`

## Main Files

- `wolof_instruction_data.jsonl`: main training dataset.
- `wolof_instruction_sample.jsonl`: small sample for pipeline tests.
- `wolof_eval_examples.jsonl`: evaluation set with references and predictions.

Additional generated files may appear here, for example:

- `wolof_culture_salutations_1000.jsonl`
- `1000_wol_instruct_data.jsonl`
- `273_wol_instruct_data.jsonl`

Validate a training file:

```bash
python -c 'from src.data_utils import load_instruction_examples; print(len(load_instruction_examples("data/wolof_instruction_data.jsonl")))'
```

## Evaluation Format

Each evaluation row contains:

```json
{"instruction": "question", "input": "education", "reference": "expected answer", "prediction": "model answer"}
```

Run evaluation:

```bash
python evaluation.py --data data/wolof_eval_examples.jsonl
```

Generate predictions with the adapter before evaluating:

```bash
python evaluation.py --data data/wolof_eval_examples.jsonl --generate
```

## Appending New Data

After generating new JSONL rows, validate the file first:

```bash
python -c 'from src.data_utils import load_instruction_examples; print(len(load_instruction_examples("data/wolof_culture_salutations_1000.jsonl")))'
```

Then append:

```bash
cat data/wolof_culture_salutations_1000.jsonl >> data/wolof_instruction_data.jsonl
```
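
The `cat ... >>` append only works cleanly if the main file already ends with a newline and every new row is well-formed. The following is a minimal sketch of a stricter check-then-append step. It is not the project's `load_instruction_examples`; the names `validate_rows` and `append_rows` are hypothetical. It also assumes that `input` must be one of the supported categories listed above and that all three required fields are non-empty strings, which the loader in `src/data_utils` may or may not enforce.

```python
import json

# Field names and categories taken from the Training Format section.
REQUIRED_FIELDS = ("instruction", "input", "output")
CATEGORIES = {"education", "agriculture", "sante", "transport", "culture"}


def validate_rows(path):
    """Read a JSONL file and return (valid_rows, errors).

    Assumption: a row is valid when it is a JSON object, every required
    field is a non-empty string, and `input` is a supported category.
    """
    rows, errors = [], []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(row, dict):
                errors.append(f"line {lineno}: expected a JSON object")
                continue
            missing = [
                key for key in REQUIRED_FIELDS
                if not isinstance(row.get(key), str) or not row[key].strip()
            ]
            if missing:
                errors.append(f"line {lineno}: missing or empty {missing}")
                continue
            if row["input"] not in CATEGORIES:
                errors.append(f"line {lineno}: unknown category {row['input']!r}")
                continue
            rows.append(row)
    return rows, errors


def _ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to end of file
        if f.tell() == 0:
            return True
        f.seek(-1, 2)  # step back one byte
        return f.read(1) == b"\n"


def append_rows(rows, target):
    """Append rows to target as JSONL, one object per line.

    Adds a newline first if the existing file does not end with one, so the
    first new row is never glued onto the last existing row.
    """
    try:
        needs_newline = not _ends_with_newline(target)
    except FileNotFoundError:
        needs_newline = False  # the file will be created by the append below
    with open(target, "a", encoding="utf-8") as f:
        if needs_newline:
            f.write("\n")
        for row in rows:
            # ensure_ascii=False keeps non-ASCII Wolof characters readable.
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(rows)
```

A typical use would be to call `validate_rows("data/wolof_culture_salutations_1000.jsonl")`, inspect the returned error list, and call `append_rows(rows, "data/wolof_instruction_data.jsonl")` only when that list is empty. Re-serializing each row also normalizes every appended line to a single compact JSON object.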