Data Files

This folder contains the JSONL files used by the Wolof LoRA demo.

Training Format

Each training row must be a single JSON object on its own line:

{"instruction": "question or task", "input": "education", "output": "Wolof answer"}
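As a rough sketch of how a row in this shape could be produced, the snippet below serializes one example with Python's standard `json` module. The helper name `format_training_row` is hypothetical and not part of this repository, and the sample question and answer are illustrative only. The key names are copied from the example row above.

```python
import json


def format_training_row(instruction: str, input_text: str, output: str) -> str:
    """Serialize one training example as a single JSONL line (no trailing newline).

    Hypothetical helper for illustration; key names follow the example row.
    """
    row = {"instruction": instruction, "input": input_text, "output": output}
    # ensure_ascii=False keeps non-ASCII letters (such as "ë") readable in the file.
    # json.dumps escapes embedded newlines, so the row always stays on one line.
    return json.dumps(row, ensure_ascii=False)


line = format_training_row("Translate 'thank you' into Wolof.", "education", "Jërëjëf")
print(line)
```

Writing rows through `json.dumps` rather than by hand avoids unescaped quotes and stray line breaks, which are the most common ways a JSONL file becomes unparseable.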

Required fields:

Supported categories:

Additional generated files may appear here, for example:

Validate a training file:

python -c 'from src.data_utils import load_instruction_examples; print(len(load_instruction_examples("data/wolof_instruction_data.jsonl")))'
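When the loader rejects a file, it can help to know which lines are at fault. The following is a minimal standalone sketch that reports problems by line number. It does not use or reproduce `load_instruction_examples`; the function name `find_bad_rows` is hypothetical, and the expected keys are an assumption taken from the example row, so the real loader may enforce different rules.

```python
import json
from pathlib import Path

# Assumption: these keys come from the example row in this README; the loader
# in src/data_utils.py may require a different set.
EXPECTED_KEYS = ("instruction", "input", "output")


def find_bad_rows(path, expected_keys=EXPECTED_KEYS):
    """Return a list of (line_number, problem) tuples for a JSONL file."""
    problems = []
    with Path(path).open(encoding="utf-8") as handle:
        for number, raw in enumerate(handle, start=1):
            if not raw.strip():
                problems.append((number, "blank line"))
                continue
            try:
                row = json.loads(raw)
            except json.JSONDecodeError as error:
                problems.append((number, f"invalid JSON: {error.msg}"))
                continue
            if not isinstance(row, dict):
                problems.append((number, "not a JSON object"))
                continue
            missing = [key for key in expected_keys if key not in row]
            if missing:
                problems.append((number, "missing keys: " + ", ".join(missing)))
    return problems
```

Calling `find_bad_rows("data/wolof_instruction_data.jsonl")` returns an empty list when every line parses as an object with the expected keys.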

Evaluation Format

Each evaluation row contains:

{"instruction": "question", "input": "education", "reference": "expected answer", "prediction": "model answer"}

Run evaluation:

python evaluation.py --data data/wolof_eval_examples.jsonl

To generate predictions with the adapter before evaluating, add --generate:

python evaluation.py --data data/wolof_eval_examples.jsonl --generate
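To decide whether a file still needs predictions, one option is to check which rows lack a non-empty "prediction" value. This is only a sketch: the helper `rows_without_prediction` is hypothetical, it assumes every non-blank line is a valid JSON object, and how `evaluation.py` itself treats missing predictions is not described here.

```python
import json


def rows_without_prediction(path):
    """Return 1-based line numbers of rows whose "prediction" is missing or empty."""
    missing = []
    with open(path, encoding="utf-8") as handle:
        for number, raw in enumerate(handle, start=1):
            if not raw.strip():
                continue  # ignore blank lines
            row = json.loads(raw)
            # Treat an absent key, null, and whitespace-only text the same way.
            if not str(row.get("prediction") or "").strip():
                missing.append(number)
    return missing
```

If `rows_without_prediction("data/wolof_eval_examples.jsonl")` returns a non-empty list, those rows have no prediction to score yet.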

After generating new JSONL rows, validate the file first:

python -c 'from src.data_utils import load_instruction_examples; print(len(load_instruction_examples("data/wolof_culture_salutations_1000.jsonl")))'

Then append it to the main training file:

cat data/wolof_culture_salutations_1000.jsonl >> data/wolof_instruction_data.jsonl
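One caveat with `cat ... >>`: if the target file does not end with a newline, the first appended row is joined onto the last existing row and neither parses afterwards. The sketch below is one way to guard against that in Python. The function `append_jsonl` is hypothetical, not part of this repository, and it only checks that each row is valid JSON rather than applying the loader's own rules.

```python
import json
import os
from pathlib import Path


def append_jsonl(source, target):
    """Append every non-blank row of source to target; return the row count."""
    with open(source, encoding="utf-8") as handle:
        rows = [raw.strip() for raw in handle if raw.strip()]

    # Parse everything first so a bad row raises before the target is modified.
    for row in rows:
        json.loads(row)

    target_path = Path(target)
    needs_newline = False
    if target_path.exists() and target_path.stat().st_size > 0:
        with target_path.open("rb") as handle:
            handle.seek(-1, os.SEEK_END)  # inspect the final byte only
            needs_newline = handle.read(1) != b"\n"

    with target_path.open("a", encoding="utf-8", newline="\n") as handle:
        if needs_newline:
            handle.write("\n")
        for row in rows:
            handle.write(row + "\n")
    return len(rows)
```

For example, `append_jsonl("data/wolof_culture_salutations_1000.jsonl", "data/wolof_instruction_data.jsonl")` would perform the same append as the `cat` command, with the newline check added.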