AmharicCLIP

Stable Diffusion v1.5 extended to support Amharic (Ethiopic script) prompts.

This repository contains three components, each with its own documentation:

| Component | Folder | Description |
|---|---|---|
| 🔤 Text Encoder | `text_encoder/` | Fine-tuned CLIPTextModel with Amharic support |
| 📝 Tokenizer | `tokenizer/` | Patched tokenizer with 512 Ethiopic atomic tokens |
| 🖼️ Full Pipeline | `pipeline/` | Complete SD v1.5 with Amharic text encoder |
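
To make "atomic tokens" concrete, the sketch below enumerates one candidate set of single-character Ethiopic tokens. The three Unicode block ranges are standard, and together they happen to span exactly 512 code points (some of them unassigned). Whether the tokenizer in this repository uses exactly this set is an assumption, not something the README states; the names `ETHIOPIC_BLOCKS` and `ethiopic_code_points` are illustrative only.

```python
# Standard Unicode block ranges (inclusive).
# ASSUMPTION: that these blocks are the source of the repository's
# 512 atomic tokens is a guess; the README does not say so.
ETHIOPIC_BLOCKS = {
    "Ethiopic": (0x1200, 0x137F),
    "Ethiopic Supplement": (0x1380, 0x139F),
    "Ethiopic Extended": (0x2D80, 0x2DDF),
}


def ethiopic_code_points(blocks=ETHIOPIC_BLOCKS):
    """Return every code point in the given blocks as a one-character string."""
    chars = []
    for start, end in blocks.values():
        chars.extend(chr(cp) for cp in range(start, end + 1))
    return chars


candidates = ethiopic_code_points()
print(len(candidates))  # number of candidate single-character tokens
```

With Hugging Face `transformers`, a list like this would typically be registered with `tokenizer.add_tokens(candidates)`, followed by `text_encoder.resize_token_embeddings(len(tokenizer))` so the new ids get embedding rows. Whether this repository was built that way is not confirmed here.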

Quick Start

from diffusers import StableDiffusionPipeline
from huggingface_hub import snapshot_download
import torch

# Download pipeline from HuggingFace
path = snapshot_download(
    repo_id="michealnaye/AmharicCLIP",
    allow_patterns="pipeline/*",
)

# Load pipeline
pipe = StableDiffusionPipeline.from_pretrained(
    f"{path}/pipeline",
    torch_dtype=torch.float16,
    safety_checker=None,
)
pipe = pipe.to("cuda")

# Generate from Amharic prompt
image = pipe("የድመት ፎቶ").images[0]  # photo of a cat
image.save("cat.png")

Example Results

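| Amharic Prompt | English | Generated |
|---|---|---|
| የድመት ፎቶ | photo of a cat | (image) |
| የውሻ ፎቶ | photo of a dog | (image) |
| የዝሆን ፎቶ | photo of an elephant | (image) |
| የፈረስ ፎቶ | photo of a horse | (image) |
| የቢራቢሮ ፎቶ | photo of a butterfly | (image) |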
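
The example prompts listed above could be generated in one pass with a small loop. This is a minimal sketch: it assumes `pipe` is the object built in Quick Start, or any callable that returns an object with an `.images` list. The helper name `generate_examples`, the output folder, and the file stems are hypothetical.

```python
from pathlib import Path

# Prompts from the examples above, keyed by a hypothetical output file stem.
EXAMPLE_PROMPTS = {
    "cat": "የድመት ፎቶ",
    "dog": "የውሻ ፎቶ",
    "elephant": "የዝሆን ፎቶ",
    "horse": "የፈረስ ፎቶ",
    "butterfly": "የቢራቢሮ ፎቶ",
}


def generate_examples(pipe, prompts=EXAMPLE_PROMPTS, out_dir="examples"):
    """Run each prompt through `pipe` and save one PNG per prompt.

    Returns the list of paths written, in prompt order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for stem, prompt in prompts.items():
        image = pipe(prompt).images[0]  # first image of the batch
        target = out / f"{stem}.png"
        image.save(target)
        saved.append(target)
    return saved
```

After the Quick Start code has run, `generate_examples(pipe)` would write five PNG files under `examples/`. Outputs vary between runs unless a seeded generator is passed to the pipeline.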

The Problem We Solved

OpenAI's CLIP tokenizer has no Amharic vocabulary. Each Ethiopic character fragments into 9 byte-level tokens, causing:

  • Severe context window waste (77-token limit hit quickly)
  • Meaningless embeddings → SD generates noise instead of images
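
To see where the fragmentation comes from, the sketch below counts the UTF-8 bytes behind each character of an Amharic prompt. It counts bytes, not tokens: the actual token count depends on the tokenizer's merge table, so this does not reproduce the per-character figure quoted above. The helper name `utf8_bytes_per_char` is illustrative.

```python
CLIP_CONTEXT_LENGTH = 77  # token limit quoted above


def utf8_bytes_per_char(text):
    """Pair each non-space character with the number of UTF-8 bytes it occupies."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text if not ch.isspace()]


prompt = "የድመት ፎቶ"  # "photo of a cat"
per_char = utf8_bytes_per_char(prompt)
total_bytes = sum(n for _, n in per_char)

for ch, n in per_char:
    print(f"{ch}: {n} bytes")
print(f"{total_bytes} bytes in total, against a {CLIP_CONTEXT_LENGTH}-token context")
```

A tokenizer that works on raw bytes and has learned no merges for Ethiopic script has to spend several positions on every character. A longer prompt can therefore use up the 77-token context well before the sentence ends.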

Our fix reduces the token count by 66% and achieves 100% round-trip fidelity: decoding an encoded prompt returns the original text.
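
The two metrics in that sentence can be computed as sketched below. This is not the evaluation script behind the reported numbers. The toy character-level encoder and decoder are hypothetical stand-ins so the example runs without downloading anything, and the counts passed to `token_reduction` are illustrative, not measurements.

```python
def token_reduction(before, after):
    """Fractional drop in token count (0.5 means half as many tokens)."""
    if before <= 0:
        raise ValueError("before must be a positive token count")
    return (before - after) / before


def round_trip_fidelity(encode, decode, texts):
    """Share of texts for which decode(encode(text)) == text."""
    if not texts:
        raise ValueError("texts must not be empty")
    matches = sum(1 for t in texts if decode(encode(t)) == t)
    return matches / len(texts)


# Hypothetical stand-in tokenizer: one id per character.
def toy_encode(text):
    return [ord(ch) for ch in text]


def toy_decode(ids):
    return "".join(chr(i) for i in ids)


texts = ["የድመት ፎቶ", "የውሻ ፎቶ"]
print(round_trip_fidelity(toy_encode, toy_decode, texts))
print(token_reduction(before=3, after=1))  # illustrative counts only
```

To run the same check against a real tokenizer, `encode` and `decode` could wrap `tokenizer.encode(text, add_special_tokens=False)` and `tokenizer.decode(ids)` from `transformers`. Decoders sometimes normalize whitespace or case, so a strict equality test may need adjusting.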

Citation

@misc{amharicclip2024,
  title={AmharicCLIP: Extending CLIP to Amharic via Atomic Tokenization and Knowledge Distillation},
  author={Micheal Naye},
  year={2024},
  publisher={HuggingFace},
  url={https://huggingface.co/michealnaye/AmharicCLIP}
}