| --- |
| title: RefCheck |
| emoji: π |
| colorFrom: blue |
| colorTo: indigo |
| sdk: gradio |
| app_file: app.py |
| python_version: 3.11 |
| suggested_hardware: cpu-basic |
| fullWidth: true |
| short_description: Upload BibTeX, validate citations, download fixes. |
| tags: |
| - bibtex |
| - citations |
| - academic |
| - bibliography |
| --- |
| |
| # RefCheck |
|
|
| > **A Citation Hallucination Detector & Auto-Fixer** |
| > Validate and automatically correct your BibTeX bibliography against multiple academic databases. |
|
|
| [Python](https://www.python.org/downloads/) |
| [License](LICENSE) |
|
|
| --- |
|
|
| ## Why RefCheck? |
|
|
| Academic papers often contain citation errors: wrong titles, incorrect authors, mismatched years, or even completely fabricated references (hallucinations from AI tools). **RefCheck** automatically: |
|
|
| - ✅ **Validates** each citation against up to 6 academic databases (Google Scholar is disabled by default) |
| - 🔧 **Auto-fixes** metadata mismatches (title, authors, year, DOI) |
| - 🗑️ **Removes** unverifiable/hallucinated entries |
| - 📊 **Reports** a clear verification summary |
|
|
| --- |
|
|
| ## Features |
|
|
| ### Multi-Source Verification |
|
|
| RefCheck cross-references your citations against: |
|
|
| | Source | Lookup Methods | |
| |--------|----------------| |
| | **arXiv** | arXiv ID, Title search | |
| | **CrossRef** | DOI, Title search | |
| | **DBLP** | Title search | |
| | **Semantic Scholar** | DOI, Title search | |
| | **OpenAlex** | DOI, Title search | |
| | **Google Scholar** | Title search (disabled by default) | |
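
As a rough illustration of the lookup methods in the table, the sketch below builds request URLs for the public arXiv, CrossRef, Semantic Scholar, OpenAlex, and DBLP endpoints. The function name `lookup_urls` is hypothetical, and nothing here is taken from `src/fetcher.py`; RefCheck's actual clients may use different endpoints or parameters.

```python
from urllib.parse import quote, urlencode


def lookup_urls(doi=None, arxiv_id=None, title=None):
    """Return {source: url} for each lookup the given identifiers allow.

    Hypothetical helper: it only builds URLs and performs no network I/O.
    """
    urls = {}
    if arxiv_id:
        # arXiv export API, lookup by arXiv ID
        urls["arXiv"] = "http://export.arxiv.org/api/query?" + urlencode(
            {"id_list": arxiv_id}
        )
    if doi:
        safe_doi = quote(doi, safe="/")
        urls["CrossRef"] = f"https://api.crossref.org/works/{safe_doi}"
        urls["Semantic Scholar"] = (
            f"https://api.semanticscholar.org/graph/v1/paper/DOI:{safe_doi}"
        )
        urls["OpenAlex"] = f"https://api.openalex.org/works/doi:{safe_doi}"
    if title:
        # DBLP is title-only in the table above
        urls["DBLP"] = "https://dblp.org/search/publ/api?" + urlencode(
            {"q": title, "format": "json"}
        )
    return urls


if __name__ == "__main__":
    for source, url in lookup_urls(
        doi="10.1000/xyz123", title="Attention Is All You Need"
    ).items():
        print(f"{source}: {url}")
```

Title searches against the other sources follow the same pattern with each API's own query parameter.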
|
|
| ### Two-Pass Workflow |
|
|
| 1. **Pass 1: Validate & Fix**: Checks each entry, auto-corrects metadata, removes invalid citations |
| 2. **Pass 2: Verify**: Re-validates the cleaned file to confirm all entries are correct |
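
The two passes can be sketched as a small loop. This is a minimal illustration, not the code in `main.py`: the function `two_pass`, the `validate` callback, and the status strings `"verified"`, `"fixed"`, and `"not_found"` are all assumptions made for the example.

```python
def two_pass(entries, validate):
    """Run a validate-and-fix pass, then re-validate what was kept.

    entries:  list of dicts describing BibTeX entries.
    validate: callable returning (status, entry), where status is
              "verified", "fixed", or "not_found" (hypothetical contract).
    Returns (kept, removed, unresolved).
    """
    kept, removed = [], []

    # Pass 1: keep verified entries, apply fixes, drop anything not found.
    for entry in entries:
        status, corrected = validate(entry)
        if status == "not_found":
            removed.append(entry)
        elif status == "fixed":
            kept.append(corrected)
        else:
            kept.append(entry)

    # Pass 2: every kept entry should now come back as verified.
    unresolved = [e for e in kept if validate(e)[0] != "verified"]
    return kept, removed, unresolved
```

An empty `unresolved` list corresponds to a clean verification pass; anything left in it would need manual review.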
|
|
| --- |
|
|
| ## Installation |
|
|
| ```bash |
| # Clone the repository |
| git clone https://github.com/voidful/RefCheck.git |
| cd RefCheck |
| |
| # Install dependencies |
| pip install -r requirements.txt |
| ``` |
|
|
| ### Requirements |
|
|
| - Python 3.9+ |
| - Dependencies: `bibtexparser`, `requests`, `beautifulsoup4`, `rich`, `Unidecode`, `lxml` |
|
|
| --- |
|
|
| ## Usage |
|
|
| ### Hugging Face Space |
|
|
| This repository is ready to run as a Gradio Space. Create a Hugging Face Space with the Gradio SDK, push these files, and the Space will launch `app.py`. |
|
|
| The Space UI accepts a `.bib` upload and returns: |
|
|
| - a corrected BibTeX file |
| - a Markdown validation report |
| - a list of entries that still need manual review |
|
|
| ### Basic Usage |
|
|
| ```bash |
| # Validate and auto-fix a bib file |
| python main.py --bib references.bib |
| ``` |
|
|
| ### Command-Line Options |
|
|
| | Option | Short | Description | |
| |--------|-------|-------------| |
| | `--bib` | `-b` | Path to your `.bib` file (required) | |
| | `--output` | `-o` | Output report path (optional) | |
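
For readers who want to script around the CLI, the options above map naturally onto `argparse`. The sketch below is an assumption about how such a parser could look; the help strings and the `build_parser` name are illustrative, and `main.py` may define its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Validate and auto-fix a BibTeX file.",
    )
    parser.add_argument(
        "--bib", "-b", required=True, help="Path to your .bib file"
    )
    parser.add_argument(
        "--output", "-o", default=None, help="Output report path (optional)"
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"bib={args.bib} output={args.output}")
```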
|
|
| ### Example |
|
|
| ```bash |
| # Process your bibliography |
| python main.py --bib paper/references.bib |
| |
| # With custom output path |
| python main.py --bib refs.bib --output validation_report.md |
| ``` |
|
|
| --- |
|
|
| ## How It Works |
|
|
| ``` |
| ┌─────────────────┐ |
| │ Load .bib file  │ |
| └────────┬────────┘ |
|          ▼ |
| ┌─────────────────────────────────────────┐ |
| │ For each entry:                         │ |
| │  1. Query academic databases            │ |
| │  2. Compare metadata (title, author, yr)│ |
| │  3. Calculate confidence score          │ |
| └────────┬────────────────────────────────┘ |
|          ▼ |
| ┌─────────────────────────────────────────┐ |
| │ Decision:                               │ |
| │  • confidence > 85% → Auto-fix metadata │ |
| │  • Match found → Keep as-is             │ |
| │  • No match → Remove entry              │ |
| └────────┬────────────────────────────────┘ |
|          ▼ |
| ┌─────────────────────────────────────────┐ |
| │ Save updated .bib file                  │ |
| │ Run verification pass                   │ |
| └─────────────────────────────────────────┘ |
| ``` |
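
The scoring and decision steps in the diagram can be illustrated with the standard library. This is one possible reading, not the logic in `src/comparator.py`: the field weights (0.6 title, 0.3 author, 0.1 year), the use of `difflib.SequenceMatcher`, and the function names are all assumptions; only the 85% threshold comes from the diagram.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in str(text))
    return " ".join(cleaned.split())


def similarity(a, b):
    """String similarity in [0, 1] after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def confidence(entry, candidate):
    """Weighted score over title, author, and year (weights are assumed)."""
    title = similarity(entry["title"], candidate["title"])
    author = similarity(entry["author"], candidate["author"])
    year = 1.0 if str(entry["year"]) == str(candidate["year"]) else 0.0
    return 0.6 * title + 0.3 * author + 0.1 * year


def decide(entry, candidate, threshold=0.85):
    """Return "keep", "fix", or "remove" for one entry."""
    if candidate is None:
        return "remove"  # no database returned a match
    fields = ("title", "author", "year")
    if all(normalize(entry[f]) == normalize(candidate[f]) for f in fields):
        return "keep"  # metadata already agrees
    if confidence(entry, candidate) > threshold:
        return "fix"  # same paper, metadata differs
    return "remove"
```

With these weights, a wrong year alone still scores 0.9 and is fixed, while a candidate with an unrelated title falls below the threshold and the entry is removed.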
|
|
| --- |
|
|
| ## Output |
|
|
| RefCheck displays real-time progress and a final summary: |
|
|
| ``` |
| BibGuard - Auto-Fix & Verify |
| Target: references.bib |
| |
| Found 42 entries. Running validation and auto-fix... |
| |
| Validating & Fixing ━━━━━━━━━━━━━━━━━ 100% 42/42 ✓ 38 ⚠ 2 ✗ 2 |
| |
| Updates: |
| - Fixed 2 entries (metadata updated) |
| - Removed 2 invalid/hallucinated entries |
| ✓ File saved. |
| |
| Double checking (Re-validation)... |
| |
| Verifying ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 40/40 ✓ 40 |
| |
| ================================================== |
| Final Status |
| ================================================== |
| Total: 40 |
| ✓ Verified: 40 |
| ⚠ Issues: 0 |
| ✗ Not found: 0 |
| ``` |
|
|
| ### Status Meanings |
|
|
| | Symbol | Meaning | |
| |--------|---------| |
| ✅ Verified | Entry matches a known publication | |
| ⚠️ Fixed | Metadata was auto-corrected | |
| ❌ Removed | Entry could not be verified (likely hallucination) | |
|
|
| --- |
|
|
| ## Project Structure |
|
|
| ``` |
| RefCheck/ |
| ├── main.py              # Entry point & workflow orchestration |
| ├── requirements.txt     # Python dependencies |
| ├── README.md |
| └── src/ |
|     ├── fetcher.py       # API clients for academic databases |
|     ├── comparator.py    # Metadata comparison & scoring |
|     ├── parser.py        # BibTeX parsing & saving |
|     └── utils.py         # Progress display & text utilities |
| ``` |
|
|
| --- |
|
|
| ## License |
|
|
| MIT License. See [LICENSE](LICENSE) for details. |
|
|
| --- |
|
|
| ## Contributing |
|
|
| Contributions are welcome! Please feel free to submit a Pull Request. |
|
|
| --- |
|
|
| ## Acknowledgments |
|
|
| Built with: |
| - [bibtexparser](https://github.com/sciunto-org/python-bibtexparser) for BibTeX handling |
| - [Rich](https://github.com/Textualize/rich) for beautiful terminal output |
| - APIs from arXiv, CrossRef, DBLP, Semantic Scholar, and OpenAlex |
|
|