---
language: am
license: mit
tags:
- gpt2
- amharic
- onnx
- quantized
- q8
---

# GPT-2 Amharic - Quantized ONNX (Q8)

This is a quantized version of rasyosef/gpt2-small-amharic, converted to ONNX with 8-bit quantization.

## Model Size

- Original PyTorch: ~550 MB
- FP32 ONNX: 130.83 MB
- Q8 ONNX: 34.74 MB (93.7 percent smaller than the original PyTorch model)

## Usage

```python
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo_id = 'bennysee/gpt2-amharic-onnx-q8'
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# InferenceSession expects a local file path, so fetch the model from the Hub first.
model_path = hf_hub_download(repo_id=repo_id, filename='model_q8.onnx')
session = ort.InferenceSession(model_path)

text = 'ሰላም'
inputs = tokenizer(text, return_tensors='np')
outputs = session.run(None, {'input_ids': inputs['input_ids']})
```

## Files

- model_q8.onnx (33.24 MB) - Quantized model
- tokenizer.json - Tokenizer vocabulary
- tokenizer_config.json - Tokenizer config
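
The size reduction quoted under Model Size can be checked with a few lines of arithmetic. The sketch below is illustrative only: `percent_smaller` is a hypothetical helper, not part of this repository, and the inputs are the sizes listed above (the PyTorch figure is approximate).

```python
def percent_smaller(original_mb: float, reduced_mb: float) -> float:
    """Return how much smaller reduced_mb is than original_mb, in percent."""
    if original_mb <= 0:
        raise ValueError("original_mb must be positive")
    return (1.0 - reduced_mb / original_mb) * 100.0


# Sizes in MB as listed under Model Size.
PYTORCH_MB = 550.0     # listed as approximate (~550 MB)
FP32_ONNX_MB = 130.83
Q8_ONNX_MB = 34.74

vs_pytorch = percent_smaller(PYTORCH_MB, Q8_ONNX_MB)
vs_fp32 = percent_smaller(FP32_ONNX_MB, Q8_ONNX_MB)

print(f"Q8 vs. original PyTorch: {vs_pytorch:.1f}% smaller")  # 93.7% smaller
print(f"Q8 vs. FP32 ONNX: {vs_fp32:.1f}% smaller")            # 73.4% smaller
```

The 93.7 percent figure is therefore relative to the original PyTorch model; measured against the FP32 ONNX export, the Q8 file is about 73.4 percent smaller.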