---
title: RefCheck
emoji: 🔍
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
python_version: 3.11
suggested_hardware: cpu-basic
fullWidth: true
short_description: Upload BibTeX, validate citations, download fixes.
tags:
  - bibtex
  - citations
  - academic
  - bibliography
---

# RefCheck 🔍

> **A Citation Hallucination Detector & Auto-Fixer**
>
> Validate and automatically correct your BibTeX bibliography against multiple academic databases.

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

---

## Why RefCheck?

Academic papers often contain citation errors: wrong titles, incorrect authors, mismatched years, or even completely fabricated references (hallucinations from AI tools).

**RefCheck** automatically:

- ✅ **Validates** each citation against 6 academic databases
- 🔧 **Auto-fixes** metadata mismatches (title, authors, year, DOI)
- 🗑️ **Removes** unverifiable/hallucinated entries
- 📊 **Reports** a clear verification summary

---

## Features

### Multi-Source Verification

RefCheck cross-references your citations against:

| Source | Lookup Methods |
|--------|----------------|
| **arXiv** | arXiv ID, Title search |
| **CrossRef** | DOI, Title search |
| **DBLP** | Title search |
| **Semantic Scholar** | DOI, Title search |
| **OpenAlex** | DOI, Title search |
| **Google Scholar** | Title search (disabled by default) |

### Two-Pass Workflow

1. **Pass 1 (Validate & Fix)**: Checks each entry, auto-corrects metadata, removes invalid citations
2. **Pass 2 (Verify)**: Re-validates the cleaned file to confirm all entries are correct

---

## Installation

```bash
# Clone the repository
git clone https://github.com/voidful/RefCheck.git
cd RefCheck

# Install dependencies
pip install -r requirements.txt
```

### Requirements

- Python 3.9+
- Dependencies: `bibtexparser`, `requests`, `beautifulsoup4`, `rich`, `Unidecode`, `lxml`

---

## Usage

### Hugging Face Space

This repository is ready to run as a Gradio Space. Create a Hugging Face Space with the Gradio SDK, push these files, and the Space will launch `app.py`.

The Space UI accepts a `.bib` upload and returns:

- a corrected BibTeX file
- a Markdown validation report
- a list of entries that still need manual review

### Basic Usage

```bash
# Validate and auto-fix a bib file
python main.py --bib references.bib
```

### Command-Line Options

| Option | Short | Description |
|--------|-------|-------------|
| `--bib` | `-b` | Path to your `.bib` file (required) |
| `--output` | `-o` | Output report path (optional) |

### Example

```bash
# Process your bibliography
python main.py --bib paper/references.bib

# With custom output path
python main.py --bib refs.bib --output validation_report.md
```

---

## How It Works

```
┌─────────────────┐
│  Load .bib file │
└────────┬────────┘
         ▼
┌─────────────────────────────────────────┐
│  For each entry:                        │
│  1. Query academic databases            │
│  2. Compare metadata (title, author, yr)│
│  3. Calculate confidence score          │
└────────┬────────────────────────────────┘
         ▼
┌─────────────────────────────────────────┐
│  Decision:                              │
│  • confidence > 85% → Auto-fix metadata │
│  • Match found → Keep as-is             │
│  • No match → Remove entry              │
└────────┬────────────────────────────────┘
         ▼
┌─────────────────────────────────────────┐
│  Save updated .bib file                 │
│  Run verification pass                  │
└─────────────────────────────────────────┘
```

---

## Output

RefCheck displays real-time progress and a final summary:

```
📚 BibGuard - Auto-Fix & Verify
Target: references.bib

Found 42 entries. Running validation and auto-fix...
Validating & Fixing ━━━━━━━━━━━━━━━━━ 100% 42/42 ✓ 38 ⚠ 2 ✗ 2

✏️ Updates:
  - Fixed 2 entries (metadata updated)
  - Removed 2 invalid/hallucinated entries
✓ File saved.

🔄 Double checking (Re-validation)...
Verifying ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 40/40 ✓ 40

==================================================
📊 Final Status
==================================================
Total: 40
✓ Verified: 40
⚠ Issues: 0
✗ Not found: 0
```

### Status Meanings

| Symbol | Meaning |
|--------|---------|
| ✅ Verified | Entry matches a known publication |
| ⚠️ Fixed | Metadata was auto-corrected |
| ❌ Removed | Entry could not be verified (likely hallucination) |

---

## Project Structure

```
RefCheck/
├── main.py              # Entry point & workflow orchestration
├── requirements.txt     # Python dependencies
├── README.md
└── src/
    ├── fetcher.py       # API clients for academic databases
    ├── comparator.py    # Metadata comparison & scoring
    ├── parser.py        # BibTeX parsing & saving
    └── utils.py         # Progress display & text utilities
```

---

## License

MIT License. See [LICENSE](LICENSE) for details.

---

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

---

## Acknowledgments

Built with:

- [bibtexparser](https://github.com/sciunto-org/python-bibtexparser) for BibTeX handling
- [Rich](https://github.com/Textualize/rich) for beautiful terminal output
- APIs from arXiv, CrossRef, DBLP, Semantic Scholar, and OpenAlex
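## Sketch: Decision Rule

The decision step in the "How It Works" diagram can be illustrated with a small, self-contained Python sketch. This is not the code in `src/comparator.py`. The `Record` type, the `normalize`, `confidence`, and `decide` helpers, and the use of title similarity from the standard library's `difflib` as the confidence score are all assumptions made for illustration. The diagram also does not say how its three branches interact, so the sketch reads them as one possible ordering: no database record means remove, a confident match with differing metadata means fix, and anything else is kept.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

THRESHOLD = 0.85  # the 85% cut-off shown in the diagram


@dataclass(frozen=True)
class Record:
    """Minimal citation metadata (hypothetical type, not RefCheck's own)."""
    title: str
    authors: str
    year: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def confidence(entry: Record, candidate: Record) -> float:
    """Assumed score: similarity of the normalized titles, in [0, 1]."""
    matcher = SequenceMatcher(None, normalize(entry.title), normalize(candidate.title))
    return matcher.ratio()


def decide(entry: Record, candidate: Optional[Record]) -> str:
    """Return 'remove', 'fix', or 'keep' for one BibTeX entry."""
    if candidate is None:  # no database returned a record
        return "remove"
    if confidence(entry, candidate) > THRESHOLD and entry != candidate:
        return "fix"  # confident match whose metadata differs
    return "keep"  # match found, nothing safe to change


entry = Record("Attention Is All You Need", "Vaswani et al.", "2018")
found = Record("Attention is All you Need", "Vaswani et al.", "2017")
print(decide(entry, found))  # fix
print(decide(entry, None))   # remove
print(decide(found, found))  # keep
```

Scoring on the title alone keeps the example short. A real comparison would presumably also weigh authors and year, as the diagram's "Compare metadata" step suggests.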