---
license: apache-2.0
datasets:
- gymprathap/Breast-Cancer-Ultrasound-Images-Dataset
base_model:
- google/vit-base-patch16-224-in21k
- Parveshiiii/breast-cancer-detector
pipeline_tag: image-classification
tags:
- image-classification
- vision-transformer
- vit
- breast-cancer
- breast-ultrasound
- medical
- medical-imaging
- ultrasound
- healthcare
- radiology
- oncology
- pytorch
---

#### Overview

> Note: This checkpoint was donated to Huggingface-science to support open medical AI research.

- Anyone interested in how this model was made can read my [blog](https://parveshiiii.github.io/blogs/breast-cancer-detector/)
- [Live Demo](https://huggingface.co/spaces/Parveshiiii/breast-cancer-detection)

Artificial Intelligence has revolutionized software engineering through automation, intelligent code assistance, and streamlined workflows. In the same way, AI is transforming the **medical field**, particularly radiology and oncology, by helping clinicians detect diseases earlier and by making diagnostic processes faster and more efficient.

> In the evaluations reported below, the model reaches 94.46% validation accuracy and 96.12% accuracy on an external two-class benchmark. However, as a responsible developer, I strongly recommend using this model only as an assistive tool, never as a standalone diagnostic solution. It should always be used in conjunction with the professional judgment of a qualified radiologist or clinician.

**huggingface-science/breast-cancer-detector** is a three-class image classification model designed to assist healthcare professionals in analyzing **breast ultrasound images**. It classifies images into:

- **0: benign**: Non-cancerous findings
- **1: malignant**: Suspicious or cancerous findings
- **2: normal**: No visible abnormalities

The model acts as a supportive tool that can help reduce workload, provide a quick second opinion, and contribute to earlier detection of breast cancer, one of the most common cancers affecting women worldwide.
#### Intended Use

- **Primary Use**: Classification of clean breast ultrasound images to support screening and diagnosis workflows.
- **Target Users**: Radiologists, oncologists, medical researchers, and developers building healthcare AI applications.
- **Deployment Examples**: Integration into web demos (Gradio/Streamlit), hospital decision-support systems, or research pipelines (via Hugging Face Inference API).
- **Out-of-Scope**:
  - Use with mammography, MRI, CT, or any non-ultrasound modality.
  - Images with text overlays, annotations, calipers, or other artifacts.
  - Real-time standalone diagnosis without radiologist review.
  - Pediatric cases or non-breast ultrasound images.

**Important**: This model is **not** a replacement for professional medical judgment. It should always be used alongside expert review.

#### Training Details

- **Dataset**: [gymprathap/Breast-Cancer-Ultrasound-Images-Dataset](https://huggingface.co/datasets/gymprathap/Breast-Cancer-Ultrasound-Images-Dataset)
  This dataset contains approximately 1,578 breast ultrasound images (PNG format) categorized into normal, benign, and malignant classes. The images were originally collected from women aged 25–75 years.
- **Training Size**: ~1,500 samples (subset used for training).
- **Data Augmentation**: 20% noise was intentionally added to the training data to improve generalization and robustness against variations in image quality and acquisition conditions.
- **Training Duration**: 12 epochs
- **Training Metrics** (training loss, validation loss, and validation accuracy per epoch):

| Epoch | Training Loss | Validation Loss | Validation Accuracy |
|-------|---------------|-----------------|---------------------|
| 1     | 0.8091        | 0.6462          | 0.8127              |
| 2     | 0.4966        | 0.4761          | 0.8311              |
| 3     | 0.4988        | 0.5465          | 0.7388              |
| 4     | 0.5662        | 0.3808          | 0.8707              |
| 5     | 0.3819        | 0.2817          | 0.9156              |
| 6     | 0.4244        | 0.2713          | 0.9288              |
| 7     | 0.3856        | 0.2570          | 0.9261              |
| 8     | 0.3428        | 0.2442          | 0.9446              |
| 9     | 0.3087        | 0.2163          | 0.9446              |
| 10    | 0.3349        | 0.2100          | 0.9393              |
| 11    | 0.3559        | 0.2261          | 0.9393              |
| 12    | 0.2707        | 0.2248          | **0.9446**          |

Validation loss falls from 0.6462 at epoch 1 to 0.2248 at epoch 12, and validation accuracy plateaus at about 94% from epoch 8 onward, finishing at 94.46%. The noise augmentation is intended to make this result robust to variations in image quality.

#### Evaluation & Performance

- **Strengths**: The model shows excellent detection performance on the trained distribution. The 20% noise augmentation during training helped improve robustness against real-world variations in ultrasound image quality.
- **Internal Evaluation**: Final validation accuracy of **94.46%** after 12 epochs.

### External Benchmarking

To further evaluate generalization on unseen data, we tested the model on the external dataset **as-cle-bert/breastcanc-ultrasound-class**, which contains **647 breast ultrasound images** with only two classes (`benign_breast_cancer` and `malignant_breast_cancer`).
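Because the external dataset has two classes and the model has three, scoring it means deciding what to do with `normal` predictions. The sketch below shows one way to do that: it sets `normal` predictions aside, counts them separately, and computes accuracy plus malignant precision, recall, and F1 on the rest. The function name `score_two_class` and the toy labels are hypothetical and serve only as an illustration; this is not the evaluation script used for this model card.

```python
from collections import Counter

# Model output label -> dataset label. "normal" has no counterpart in the
# two-class dataset, so it is deliberately absent from this mapping.
LABEL_MAP = {
    "benign": "benign_breast_cancer",
    "malignant": "malignant_breast_cancer",
}


def score_two_class(y_true, y_pred, positive="malignant_breast_cancer"):
    """Score three-class predictions against two-class ground truth.

    Predictions without a mapping (i.e. "normal") are excluded from the
    metrics and reported separately. Hypothetical helper for illustration.
    """
    counts = Counter()
    skipped = 0
    for truth, pred in zip(y_true, y_pred):
        mapped = LABEL_MAP.get(pred)
        if mapped is None:
            skipped += 1
            continue
        # Key is (truth is positive, prediction is positive).
        counts[(truth == positive, mapped == positive)] += 1

    tp = counts[(True, True)]
    fn = counts[(True, False)]
    fp = counts[(False, True)]
    tn = counts[(False, False)]
    evaluated = tp + fn + fp + tn

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / evaluated if evaluated else 0.0

    return {
        "evaluated": evaluated,
        "normal_predictions": skipped,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data (made up for this example, not taken from the benchmark).
y_true = [
    "benign_breast_cancer",
    "malignant_breast_cancer",
    "malignant_breast_cancer",
    "benign_breast_cancer",
    "malignant_breast_cancer",
]
y_pred = ["benign", "malignant", "benign", "normal", "malignant"]

report = score_two_class(y_true, y_pred)
print(report)
```

Excluding `normal` predictions keeps the two-class metrics well defined, but the excluded count should always be reported next to them, since each one is a lesion image for which the model predicted no lesion.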
**Note**: Since this dataset does not include a "normal" class (while our model supports three classes), we mapped predictions as follows for evaluation:

- Model output `"benign"` → `benign_breast_cancer`
- Model output `"malignant"` → `malignant_breast_cancer`
- Model output `"normal"` → excluded from primary metrics (reported separately)

#### Benchmark Results

- **Total samples**: 647
- **Samples evaluated** (excluding "normal" predictions): 644
- **"Normal" predictions** on this lesion-only dataset: **3 (0.46%)**

**Performance on Benign vs Malignant Classification**:

| Metric                               | Score      |
|--------------------------------------|------------|
| **Accuracy**                         | **96.12%** |
| **Precision (Malignant)**            | 94.26%     |
| **Recall / Sensitivity (Malignant)** | 93.81%     |
| **F1-Score (Malignant)**             | 94.03%     |

**Detailed Classification Report**:

| Class        | Precision | Recall | F1-Score   | Support |
|--------------|-----------|--------|------------|---------|
| benign       | 0.9701    | 0.9724 | 0.9712     | 434     |
| malignant    | 0.9426    | 0.9381 | 0.9403     | 210     |
| **accuracy** | -         | -      | **0.9612** | 644     |
| macro avg    | 0.9563    | 0.9552 | 0.9558     | 644     |
| weighted avg | 0.9611    | 0.9612 | 0.9612     | 644     |

The model demonstrates **strong generalization**, achieving **96.12% accuracy** on this external dataset despite the difference in label space and the limited training data. Only 3 of the 647 lesion-only images (0.46%) were predicted as "normal", so the model rarely misses a lesion entirely on this dataset.

#### Limitations

- Trained on a relatively small dataset (~1,500 samples) derived from ~1,578 images.
- Input images **must be exclusively clean breast ultrasound images** with **no text overlays, annotations, markers, or other artifacts**.
- Only supports the three defined labels (benign, malignant, normal). It cannot handle ambiguous, multi-lesion, or out-of-distribution cases.
- Performance may vary across different ultrasound machines, patient demographics (age, ethnicity, breast density), geographic regions, or lower-quality scans.
- The small dataset cannot fully represent the diversity of patients and imaging equipment worldwide.

#### Ethical Considerations & Risks

- **Bias & Fairness**: The limited dataset may not fully capture variations in imaging equipment, patient populations, or rare presentations. Thorough testing on diverse datasets is strongly recommended.
- **Clinical Use**: False negatives could delay diagnosis; false positives may cause unnecessary anxiety or procedures. **Always combine with human expert review**.
- **Privacy**: Ensure compliance with local regulations (e.g., HIPAA, GDPR, or Indian DPDP Act) when using real patient data.
- **Recommendations**: Perform external clinical validation before any deployment. Monitor for performance drift in production.

This model follows the spirit of large-scale medical AI efforts (such as Google Health and DeepMind’s breast cancer screening research), which emphasize AI as a powerful assistive tool that augments, rather than replaces, clinical expertise.

#### How to Use

```python
from transformers import pipeline

# Load the image-classification pipeline with this checkpoint
classifier = pipeline("image-classification", model="huggingface-science/breast-cancer-detector")

# The input can be a local file path or an image URL
result = classifier("path/to/clean_breast_ultrasound_image.png")
print(result)
```

The model outputs labels via the mapping:

```json
{
  "0": "benign",
  "1": "malignant",
  "2": "normal"
}
```

**Input Requirements**: Provide only clean breast ultrasound images (PNG/JPG) without any text or overlays.

#### Citation

If you use this model, please cite:

> Parveshiiii (2026). breast-cancer-detector: A three-class breast ultrasound classifier trained on gymprathap/Breast-Cancer-Ultrasound-Images-Dataset.

**Disclaimer**: This model is provided for research and assistive purposes only. The developer bears no liability for any clinical decisions made using this tool.
Always consult qualified healthcare professionals for medical diagnosis.

---