---
title: RefCheck
emoji: 🔍
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
python_version: 3.11
suggested_hardware: cpu-basic
fullWidth: true
short_description: Upload BibTeX, validate citations, download fixes.
tags:
  - bibtex
  - citations
  - academic
  - bibliography
---

RefCheck 🔍

A Citation Hallucination Detector & Auto-Fixer
Validate and automatically correct your BibTeX bibliography against multiple academic databases.

Python 3.9+ | License: MIT


Why RefCheck?

Academic papers often contain citation errors: wrong titles, incorrect authors, mismatched years, or even completely fabricated references (hallucinations from AI tools). RefCheck automatically:

  • ✅ Validates each citation against up to 6 academic databases (Google Scholar is disabled by default)
  • 🔧 Auto-fixes metadata mismatches (title, authors, year, DOI)
  • 🗑️ Removes unverifiable or hallucinated entries
  • 📊 Reports a clear verification summary

Features

Multi-Source Verification

RefCheck cross-references your citations against:

| Source | Lookup Methods |
| --- | --- |
| arXiv | arXiv ID, Title search |
| CrossRef | DOI, Title search |
| DBLP | Title search |
| Semantic Scholar | DOI, Title search |
| OpenAlex | DOI, Title search |
| Google Scholar | Title search (disabled by default) |
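The lookup methods above suggest a simple dispatch: query by the most specific identifier an entry carries, then fall back to a title search. The sketch below illustrates that idea for two of the sources by building query URLs for the public arXiv export API and the CrossRef REST API. It is not RefCheck's `src/fetcher.py`; the function name `lookup_urls` and the entry layout (a dict of BibTeX fields such as `eprint`, `url`, `doi`, and `title`) are assumptions made for illustration.

```python
import re
from urllib.parse import quote, urlencode

# New-style arXiv identifiers such as 1706.03762 or 1706.03762v5.
# Old-style IDs (e.g. hep-th/9901001) are not handled in this sketch.
ARXIV_ID = re.compile(r"\b(\d{4}\.\d{4,5})(v\d+)?\b")


def lookup_urls(entry: dict) -> list:
    """Return candidate query URLs for one BibTeX entry, most specific first.

    Hypothetical helper: `entry` is a plain dict of BibTeX fields
    ('eprint', 'url', 'doi', 'title'), all values strings.
    """
    urls = []
    # arXiv: look up by ID when one appears in the eprint (or url) field.
    match = ARXIV_ID.search(entry.get("eprint", "") or entry.get("url", ""))
    if match:
        urls.append("http://export.arxiv.org/api/query?"
                    + urlencode({"id_list": match.group(1)}))
    # CrossRef: direct DOI lookup when the entry carries a DOI.
    doi = entry.get("doi", "").strip()
    if doi:
        urls.append("https://api.crossref.org/works/" + quote(doi, safe="/"))
    # Fallback: bibliographic title search, with outer BibTeX braces stripped.
    title = entry.get("title", "").strip("{} ")
    if title:
        urls.append("https://api.crossref.org/works?"
                    + urlencode({"query.bibliographic": title, "rows": 5}))
    return urls
```

Fetching these URLs (for example with `requests`, which RefCheck already depends on) and parsing the responses is left out: the arXiv API answers with Atom XML, while CrossRef answers with JSON.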

Two-Pass Workflow

  1. Pass 1 (Validate & Fix): Checks each entry, auto-corrects metadata, and removes invalid citations.
  2. Pass 2 (Verify): Re-validates the cleaned file to confirm that all entries are correct.
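The two passes can be sketched as a loop that runs the same validator twice. In this hypothetical version, `validate` returns a corrected copy of an entry, or `None` when no database record matches; pass 2 counts an entry as verified only if validating it again leaves it unchanged. The names `two_pass` and `validate` are illustrative and are not taken from `main.py`.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical validator: returns a (possibly corrected) copy of the entry,
# or None when no database record matches it.
Validator = Callable[[dict], Optional[dict]]


def two_pass(entries: List[dict], validate: Validator) -> Tuple[List[dict], List[dict]]:
    """Run pass 1 (validate & fix) and pass 2 (verify) over a list of entries.

    Returns (cleaned, unverified): the entries kept after pass 1, and those
    that did not validate unchanged on pass 2.
    """
    # Pass 1: keep corrected entries, drop the ones nothing matched.
    cleaned = []
    for entry in entries:
        fixed = validate(entry)
        if fixed is not None:
            cleaned.append(fixed)
    # Pass 2: a verified entry validates again without further changes.
    unverified = [entry for entry in cleaned if validate(entry) != entry]
    return cleaned, unverified
```

Entries that fail pass 2 are natural candidates for the manual-review list that the Space returns.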

Installation

```bash
# Clone the repository
git clone https://github.com/voidful/RefCheck.git
cd RefCheck

# Install dependencies
pip install -r requirements.txt
```

Requirements

  • Python 3.9+
  • Dependencies: bibtexparser, requests, beautifulsoup4, rich, Unidecode, lxml

Usage

Hugging Face Space

This repository is ready to run as a Gradio Space. Create a Hugging Face Space with the Gradio SDK, push these files, and the Space will launch app.py.

The Space UI accepts a .bib upload and returns:

  • a corrected BibTeX file
  • a Markdown validation report
  • a list of entries that still need manual review
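Producing the corrected BibTeX file means writing the fixed entries back out. RefCheck credits bibtexparser for BibTeX handling, and `src/parser.py` covers saving; the stdlib-only sketch below only illustrates the idea. It assumes entries are dicts in the bibtexparser 1.x layout, with the entry type under `ENTRYTYPE` and the citation key under `ID`; `to_bibtex` and `to_bib_file` are hypothetical names.

```python
def to_bibtex(entry: dict) -> str:
    """Render one entry as a BibTeX record (hypothetical helper).

    Assumes the bibtexparser 1.x layout: entry type under 'ENTRYTYPE',
    citation key under 'ID', every other key a field name.
    """
    fields = {k: v for k, v in entry.items() if k not in ("ENTRYTYPE", "ID")}
    lines = [f"@{entry['ENTRYTYPE']}{{{entry['ID']},"]
    for name in sorted(fields):  # sorted for stable, diff-friendly output
        lines.append(f"  {name} = {{{fields[name]}}},")
    lines.append("}")
    return "\n".join(lines)


def to_bib_file(entries: list) -> str:
    """Join records with a blank line and end the file with a newline."""
    return "\n\n".join(to_bibtex(e) for e in entries) + "\n"
```

This sketch does no escaping, so a field value with unbalanced braces would produce invalid BibTeX.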

Basic Usage

```bash
# Validate and auto-fix a bib file
python main.py --bib references.bib
```

Command-Line Options

| Option | Short | Description |
| --- | --- | --- |
| `--bib` | `-b` | Path to your .bib file (required) |
| `--output` | `-o` | Output report path (optional) |
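If you wrap RefCheck in your own scripts, the two documented options map directly onto Python's `argparse`. This is a sketch of an equivalent parser, not the code in `main.py`; the `build_parser` name, the description, and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the two documented CLI options."""
    parser = argparse.ArgumentParser(description="Validate and auto-fix a BibTeX file.")
    # --bib / -b: required path to the input .bib file.
    parser.add_argument("-b", "--bib", required=True, help="Path to your .bib file")
    # --output / -o: optional path for the validation report.
    parser.add_argument("-o", "--output", default=None, help="Output report path (optional)")
    return parser


args = build_parser().parse_args(["--bib", "refs.bib", "--output", "validation_report.md"])
print(args.bib, args.output)
```

Because `--bib` is declared with `required=True`, omitting it makes `argparse` print a usage message and exit with status 2.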

Example

```bash
# Process your bibliography
python main.py --bib paper/references.bib

# With custom output path
python main.py --bib refs.bib --output validation_report.md
```

How It Works

```
┌─────────────────┐
│  Load .bib file │
└────────┬────────┘
         ▼
┌─────────────────────────────────────────┐
│  For each entry:                        │
│  1. Query academic databases            │
│  2. Compare metadata (title, author, yr)│
│  3. Calculate confidence score          │
└────────┬────────────────────────────────┘
         ▼
┌─────────────────────────────────────────┐
│  Decision:                              │
│  • confidence > 85% → Auto-fix metadata │
│  • Match found      → Keep as-is        │
│  • No match         → Remove entry      │
└────────┬────────────────────────────────┘
         ▼
┌─────────────────────────────────────────┐
│  Save updated .bib file                 │
│  Run verification pass                  │
└─────────────────────────────────────────┘
```
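The decision step can be sketched as follows. This is a hypothetical scorer, not `src/comparator.py`: it normalizes text with the standard library's `unicodedata` (RefCheck lists Unidecode among its dependencies; this sketch avoids it), scores confidence as a weighted mix of title similarity from `difflib.SequenceMatcher` and a year match, and ignores authors. The 0.9/0.1 weights and the `review` outcome for weak candidates are assumptions; only the 85% threshold comes from the diagram.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from typing import Optional

THRESHOLD = 0.85  # from the diagram: confidence > 85% -> auto-fix


def normalize(text: str) -> str:
    """Lowercase, strip accents, braces and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def confidence(local: dict, remote: dict) -> float:
    """Hypothetical score in [0, 1]: 90% title similarity, 10% year match."""
    title_sim = SequenceMatcher(
        None, normalize(local.get("title", "")), normalize(remote.get("title", ""))
    ).ratio()
    year_match = 1.0 if local.get("year") == remote.get("year") else 0.0
    return 0.9 * title_sim + 0.1 * year_match


def decide(local: dict, remote: Optional[dict]) -> str:
    """Return 'remove', 'keep', 'fix', or 'review' for one entry."""
    if remote is None:
        return "remove"  # no database returned a candidate
    if all(normalize(local.get(k, "")) == normalize(remote.get(k, "")) for k in ("title", "year")):
        return "keep"  # metadata already agrees with the database record
    if confidence(local, remote) > THRESHOLD:
        return "fix"  # same paper, differing metadata: copy the remote fields
    return "review"  # assumption: weak candidates are left for manual review
```

With a title that matches after normalization but a wrong year, the score is 0.9, which clears the threshold, so the entry is fixed rather than removed.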

Output

RefCheck displays real-time progress and a final summary:

```
📚 BibGuard - Auto-Fix & Verify
   Target: references.bib

Found 42 entries. Running validation and auto-fix...

Validating & Fixing ━━━━━━━━━━━━━━━━━ 100% 42/42 ✓ 38 ⚠ 2 ✗ 2

✏️  Updates:
   - Fixed 2 entries (metadata updated)
   - Removed 2 invalid/hallucinated entries
✓ File saved.

🔄 Double checking (Re-validation)...

Verifying ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 40/40 ✓ 40

==================================================
📊 Final Status
==================================================
  Total:      40
  ✓ Verified: 40
  ⚠ Issues:   0
  ✗ Not found: 0
```

Status Meanings

| Symbol | Meaning |
| --- | --- |
| ✅ Verified | Entry matches a known publication |
| ⚠️ Fixed | Metadata was auto-corrected |
| ❌ Removed | Entry could not be verified (likely hallucination) |

Project Structure

```
RefCheck/
├── main.py              # Entry point & workflow orchestration
├── requirements.txt     # Python dependencies
├── README.md
└── src/
    ├── fetcher.py       # API clients for academic databases
    ├── comparator.py    # Metadata comparison & scoring
    ├── parser.py        # BibTeX parsing & saving
    └── utils.py         # Progress display & text utilities
```

License

MIT License. See LICENSE for details.


Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


Acknowledgments

Built with:

  • bibtexparser for BibTeX handling
  • Rich for beautiful terminal output
  • APIs from arXiv, CrossRef, DBLP, Semantic Scholar, and OpenAlex