metadata
language: am
license: mit
tags:
  - gpt2
  - amharic
  - onnx
  - quantized
  - q8

GPT-2 Amharic - Quantized ONNX (Q8)

This repository contains rasyosef/gpt2-small-amharic converted to ONNX and quantized to 8 bits (Q8).

Model Size

  • Original PyTorch: ~550 MB
  • FP32 ONNX: 130.83 MB
  • Q8 ONNX: 34.74 MB (93.7 percent smaller than the original PyTorch checkpoint)
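
The reduction figure depends on which baseline is used. As a quick check of the arithmetic, the sketch below computes the reduction from the sizes listed above; the helper name `percent_smaller` is hypothetical and only for illustration.

```python
def percent_smaller(baseline_mb: float, reduced_mb: float) -> float:
    """Return how much smaller reduced_mb is than baseline_mb, in percent."""
    return (1.0 - reduced_mb / baseline_mb) * 100.0


# Sizes taken from the list above (the PyTorch size is approximate).
PYTORCH_MB = 550.0
FP32_ONNX_MB = 130.83
Q8_ONNX_MB = 34.74

# Reduction relative to the original PyTorch checkpoint.
vs_pytorch = round(percent_smaller(PYTORCH_MB, Q8_ONNX_MB), 1)
# Reduction relative to the FP32 ONNX export.
vs_fp32_onnx = round(percent_smaller(FP32_ONNX_MB, Q8_ONNX_MB), 1)

print(f"vs PyTorch:   {vs_pytorch} percent smaller")
print(f"vs FP32 ONNX: {vs_fp32_onnx} percent smaller")
```

With these numbers the Q8 file is 93.7 percent smaller than the PyTorch checkpoint and 73.4 percent smaller than the FP32 ONNX export.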

Usage

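import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo_id = 'bennysee/gpt2-amharic-onnx-q8'
tokenizer = AutoTokenizer.from_pretrained(repo_id)
# InferenceSession needs a local file path, so download the model from the Hub first
model_path = hf_hub_download(repo_id=repo_id, filename='model_q8.onnx')
session = ort.InferenceSession(model_path)

text = 'ሰላም'
inputs = tokenizer(text, return_tensors='np')
# Assumes the exported graph takes a single int64 'input_ids' input
outputs = session.run(None, {'input_ids': inputs['input_ids'].astype('int64')})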
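
The snippet above runs a single forward pass and does not generate text. One way to turn it into generation is a greedy decoding loop, sketched below under several assumptions: the first model output is a logits array of shape (batch, sequence, vocab), the graph needs no past key/value inputs, and the whole sequence is re-fed at every step. The names `greedy_decode` and `next_token_logits` are hypothetical helpers, not part of this repository.

```python
from typing import Callable, List, Optional, Sequence


def greedy_decode(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int = 20,
    eos_token_id: Optional[int] = None,
) -> List[int]:
    """Repeatedly append the highest-scoring next token to the sequence.

    next_token_logits maps the token ids so far to one score per
    vocabulary entry for the next position.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        # Argmax over the vocabulary; ties resolve to the lowest id.
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        ids.append(next_id)
        if eos_token_id is not None and next_id == eos_token_id:
            break
    return ids


# Possible wiring to the session from the usage snippet (assumes the
# output layout described above; untested against this model):
#
#   import numpy as np
#
#   def next_token_logits(ids):
#       feed = {'input_ids': np.array([ids], dtype=np.int64)}
#       return session.run(None, feed)[0][0, -1]
#
#   prompt_ids = tokenizer(text)['input_ids']
#   print(tokenizer.decode(greedy_decode(next_token_logits, prompt_ids)))
```

Re-feeding the full sequence each step keeps the sketch simple but costs more compute than a cached decoder; sampling strategies such as top-k would replace the argmax line.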

Files

  • model_q8.onnx (33.24 MB) - Quantized model
  • tokenizer.json - Tokenizer vocabulary
  • tokenizer_config.json - Tokenizer config