AmharicCLIP

Stable Diffusion v1.5 extended to support Amharic (Ethiopic script) prompts.

This repository contains three components, each with its own documentation:

| Component | Folder | Description |
| --- | --- | --- |
| 🔤 Text Encoder | `text_encoder/` | Fine-tuned CLIPTextModel with Amharic support |
| 📝 Tokenizer | `tokenizer/` | Patched tokenizer with 512 Ethiopic atomic tokens |
| 🖼️ Full Pipeline | `pipeline/` | Complete SD v1.5 with Amharic text encoder |
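
If only the text side is needed (for example, to inspect tokenization or compute prompt embeddings), the two smaller folders can be fetched without the full pipeline. The following is a minimal sketch, not the repository's documented loading path: `component_patterns` and `load_text_components` are hypothetical helper names, and it assumes the `tokenizer/` and `text_encoder/` folders contain files that `CLIPTokenizer.from_pretrained` and `CLIPTextModel.from_pretrained` can read directly.

```python
REPO_ID = "michealnaye/AmharicCLIP"


def component_patterns(*folders):
    """Build allow_patterns that restrict a download to the given folders."""
    return [f"{folder.strip('/')}/*" for folder in folders]


def load_text_components(repo_id=REPO_ID):
    """Download and load only the tokenizer and text encoder.

    Assumption: both folders are loadable with the stock CLIP classes.
    If the patched tokenizer needs a custom class, this will not work as-is.
    """
    # Imported lazily so the helper above can be used without these packages.
    from huggingface_hub import snapshot_download
    from transformers import CLIPTextModel, CLIPTokenizer

    path = snapshot_download(
        repo_id=repo_id,
        allow_patterns=component_patterns("tokenizer", "text_encoder"),
    )
    tokenizer = CLIPTokenizer.from_pretrained(f"{path}/tokenizer")
    text_encoder = CLIPTextModel.from_pretrained(f"{path}/text_encoder")
    return tokenizer, text_encoder
```

Calling `load_text_components()` returns a `(tokenizer, text_encoder)` pair; consult each folder's own documentation for the supported loading procedure.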

Quick Start

from diffusers import StableDiffusionPipeline
from huggingface_hub import snapshot_download
import torch

# Download pipeline from HuggingFace
path = snapshot_download(
    repo_id="michealnaye/AmharicCLIP",
    allow_patterns="pipeline/*",
)

# Load pipeline
pipe = StableDiffusionPipeline.from_pretrained(
    f"{path}/pipeline",
    torch_dtype=torch.float16,
    safety_checker=None,
)
pipe = pipe.to("cuda")

# Generate from Amharic prompt
image = pipe("የድመት ፎቶ").images[0]  # photo of a cat
image.save("cat.png")

Example Results

| Amharic Prompt | English | Generated |
| --- | --- | --- |
| የድመት ፎቶ | photo of a cat | ✓ |
| የውሻ ፎቶ | photo of a dog | ✓ |
| የዝሆን ፎቶ | photo of an elephant | ✓ |
| የፈረስ ፎቶ | photo of a horse | ✓ |
| የቢራቢሮ ፎቶ | photo of a butterfly | ✓ |
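
To try all five prompts in one go, a small loop over the table is enough. This is an illustrative sketch only: `EXAMPLE_PROMPTS` and `generate_examples` are hypothetical names, and `pipe` is assumed to be a pipeline loaded as in Quick Start (any callable that returns an object with an `.images` list will do).

```python
from pathlib import Path

# Prompts copied from the table above, keyed by their English subject.
EXAMPLE_PROMPTS = {
    "cat": "የድመት ፎቶ",
    "dog": "የውሻ ፎቶ",
    "elephant": "የዝሆን ፎቶ",
    "horse": "የፈረስ ፎቶ",
    "butterfly": "የቢራቢሮ ፎቶ",
}


def generate_examples(pipe, out_dir="examples", **pipe_kwargs):
    """Run every example prompt through `pipe` and save one PNG per prompt.

    Extra keyword arguments are forwarded to the pipeline call unchanged.
    Returns the list of file paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, prompt in EXAMPLE_PROMPTS.items():
        image = pipe(prompt, **pipe_kwargs).images[0]
        path = out / f"{name}.png"
        image.save(path)
        saved.append(path)
    return saved
```

For repeatable outputs, a seeded generator can be forwarded, for example `generate_examples(pipe, generator=torch.Generator("cuda").manual_seed(0))`.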

The Problem We Solved

OpenAI's CLIP tokenizer has no Amharic vocabulary. Each Ethiopic character fragments into up to 3 byte-level tokens (one per UTF-8 byte), causing:

Severe context window waste (77-token limit hit quickly)
Meaningless embeddings → SD generates noise instead of images

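Our fix adds atomic tokens for Ethiopic characters, which reduces token count by 66% and achieves 100% round-trip fidelity (encoding and then decoding a prompt returns the original text).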
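
The arithmetic behind the token savings can be checked without loading any model. The sketch below is a back-of-the-envelope estimate, not the repository's tokenizer: it assumes the worst case of one token per UTF-8 byte before the fix and one token per character after it, and it ignores spaces and the start/end tokens. Characters in the basic Ethiopic block (U+1200 to U+137F) each take 3 bytes in UTF-8, so going from one token per byte to one token per character removes two thirds of the tokens. The function names are illustrative.

```python
CLIP_CONTEXT_LENGTH = 77  # token limit mentioned above


def non_space_chars(text):
    """Characters of `text` with whitespace dropped."""
    return [ch for ch in text if not ch.isspace()]


def byte_level_tokens(text):
    """Worst-case estimate before the fix: one token per UTF-8 byte."""
    return sum(len(ch.encode("utf-8")) for ch in non_space_chars(text))


def atomic_tokens(text):
    """Estimate after the fix: one token per character."""
    return len(non_space_chars(text))


prompt = "የድመት ፎቶ"  # photo of a cat
before = byte_level_tokens(prompt)
after = atomic_tokens(prompt)
reduction = 1 - after / before

print(f"{before} -> {after} tokens, {reduction:.1%} fewer")
print(f"share of context used: {before / CLIP_CONTEXT_LENGTH:.0%} -> "
      f"{after / CLIP_CONTEXT_LENGTH:.0%}")
```

Real byte-level BPE may merge some byte pairs, so the true "before" count can be lower than this upper bound; running the actual tokenizers on a prompt is the way to get exact figures.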

Citation

@misc{amharicclip2024,
  title={AmharicCLIP: Extending CLIP to Amharic via Atomic Tokenization and Knowledge Distillation},
  author={Micheal Naye},
  year={2024},
  publisher={HuggingFace},
  url={https://huggingface.co/michealnaye/AmharicCLIP}
}
