---
title: Kab Ocr Tanti
emoji: 📚
colorFrom: indigo
colorTo: purple
sdk: docker
#sdk_version: 6.0.2
app_file: app.py
app_port: 7860  # Tell HF Spaces to check port 7860 for the app
pinned: false
license: apache-2.0
short_description: This Space provides a web interface for Optical Character Re
---

# Asemmezdey Asekdan n Teqbaylit - Kabyle OCR

By Bouaziz Ait Driss

This Space provides a web interface for Optical Character Recognition (OCR) tailored to the Taqbaylit (Kabyle) language. It is built on a custom Tesseract model (`kab.traineddata`) that includes the special characters used in Kabyle / Tamazight (ɣ, ɛ, ḍ, ṭ, ḥ, ṛ, ṣ, ẓ, ǧ, č).

## Features

* Upload PDF, PNG, JPG, or JPEG files.
* Perform OCR using the custom `kab` model.
* Preview document pages (PDFs only).
* Edit the extracted text and correct its spelling where needed.
* Download the final text as a UTF-8 encoded `.txt` file.
* Adjust the display DPI and font size for readability.

## How to Use

1. Upload a file using the sidebar.
2. If the file is a PDF, click "Sekker PDF (Askan n Yisebtar)" to load the page previews.
3. Click "Sekker OCR" to start the OCR process.
4. Edit the text in the right panel if needed.
5. Download the final text using the "Zdem Aḍris" button.

## Known Limitations

* Numbers: recognition is less reliable because the training data for numbers is limited.
* Some older, rarely used characters are not recognized, such as "Г" (equivalent to "ɣ") and "ţ" (equivalent to "tt").
* Performance degrades with poor scan quality.
* Results are best on printed text, not handwriting.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
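
The workflow described in this README (upload, OCR with the `kab` model, download as UTF-8 text) can be sketched outside the web interface as follows. This is a minimal illustration, not the code in `app.py`: it assumes the `pytesseract`, `pdf2image`, and `Pillow` packages are installed, that Tesseract can find `kab.traineddata` in its tessdata directory, and that Poppler is available for PDF rendering. All function names here are hypothetical.

```python
from pathlib import Path

# File types listed under Features.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one of the accepted upload types."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def load_pages(path: str, dpi: int = 300) -> list:
    """Return one PIL image per PDF page, or a single-item list for an image file."""
    if Path(path).suffix.lower() == ".pdf":
        # Assumption: pdf2image (with Poppler) is used to rasterize PDF pages.
        from pdf2image import convert_from_path
        return convert_from_path(path, dpi=dpi)
    from PIL import Image
    return [Image.open(path)]


def ocr_pages(pages, lang: str = "kab") -> str:
    """Run Tesseract on each page and join the page texts with blank lines."""
    # Assumption: pytesseract wraps a Tesseract install that has kab.traineddata.
    import pytesseract
    return "\n\n".join(pytesseract.image_to_string(page, lang=lang) for page in pages)


def to_txt_bytes(text: str) -> bytes:
    """Encode the (possibly edited) text as UTF-8 for a .txt download."""
    return text.encode("utf-8")
```

With a hypothetical input file `scan.pdf`, calling `to_txt_bytes(ocr_pages(load_pages("scan.pdf")))` would produce the bytes to write to a `.txt` file. The OCR imports are deferred into the functions so that the extension check and the encoding helper can be used without Tesseract installed.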