GPT-2 Amharic Model - Quantized ONNX (Q8)
==========================================

Original model: rasyosef/gpt2-small-amharic
Conversion date: 2026-06-06 21:08:42

Model sizes:
- FP32 ONNX: 130.83 MB
- Q8 ONNX: 34.74 MB

How to use the quantized model:

import onnxruntime as ort
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("/content/drive/MyDrive/gpt2-amharic-onnx/onnx_q8")

# Load Q8 model
session = ort.InferenceSession("/content/drive/MyDrive/gpt2-amharic-onnx/onnx_q8/model_q8.onnx")

# Run inference
text = "ሰላም"
inputs = tokenizer(text, return_tensors="np")
outputs = session.run(None, {'input_ids': inputs['input_ids']})
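
Turning the outputs into text:

The usage snippet above stops at session.run, which returns raw model outputs rather than text. The following is a minimal sketch of one way to continue from there with greedy decoding. It rests on assumptions not stated in this README: that the first output is a logits array of shape (batch, sequence, vocab), that the graph declares no past key/value cache inputs, and that every declared input is something the tokenizer produces. The helper names build_feed, greedy_next_token, and generate are hypothetical and exist only for this illustration.

```python
import numpy as np


def build_feed(input_names, encoded):
    """Keep only the tensors the ONNX graph declares as inputs, cast to int64."""
    missing = [name for name in input_names if name not in encoded]
    if missing:
        # e.g. a graph exported with cache or position inputs; not handled here
        raise KeyError(f"model expects inputs the tokenizer did not produce: {missing}")
    return {name: np.asarray(encoded[name], dtype=np.int64) for name in input_names}


def greedy_next_token(logits):
    """Pick the highest-scoring token id at the last position of batch item 0.

    Assumes logits has shape (batch, sequence, vocab).
    """
    return int(np.argmax(logits[0, -1, :]))


def generate(session, tokenizer, prompt, max_new_tokens=20):
    """Greedy decoding without a KV cache: the full sequence is re-run each step."""
    input_names = [inp.name for inp in session.get_inputs()]
    encoded = dict(tokenizer(prompt, return_tensors="np"))
    for _ in range(max_new_tokens):
        # Assumption: the first output of the model is the logits tensor.
        logits = session.run(None, build_feed(input_names, encoded))[0]
        next_id = greedy_next_token(logits)
        if next_id == tokenizer.eos_token_id:
            break
        step = np.array([[next_id]], dtype=np.int64)
        encoded["input_ids"] = np.concatenate([encoded["input_ids"], step], axis=1)
        if "attention_mask" in encoded:
            # Extend the mask so it stays the same length as input_ids.
            encoded["attention_mask"] = np.concatenate(
                [encoded["attention_mask"], np.ones_like(step)], axis=1
            )
    return tokenizer.decode(encoded["input_ids"][0], skip_special_tokens=True)
```

With the session and tokenizer loaded as shown above, calling print(generate(session, tokenizer, "ሰላም")) would print the prompt followed by up to 20 generated tokens. Reading the input names from session.get_inputs() avoids hard-coding 'input_ids', so the same loop should also work if the exported graph expects an attention_mask. Re-running the whole sequence at every step is simple but slow for long outputs.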