---
title: RefCheck
emoji: 🔍
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
python_version: 3.11
suggested_hardware: cpu-basic
fullWidth: true
short_description: Upload BibTeX, validate citations, download fixes.
tags:
- bibtex
- citations
- academic
- bibliography
---
# RefCheck 🔍
> **A Citation Hallucination Detector & Auto-Fixer**
> Validate and automatically correct your BibTeX bibliography against multiple academic databases.
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
---
## Why RefCheck?
Academic papers often contain citation errors such as wrong titles, incorrect authors, mismatched years, or even completely fabricated references (hallucinations from AI tools). **RefCheck** automatically:
- ✅ **Validates** each citation against 6 academic databases
- 🔧 **Auto-fixes** metadata mismatches (title, authors, year, DOI)
- 🗑️ **Removes** unverifiable/hallucinated entries
- 📊 **Reports** a clear verification summary
---
## Features
### Multi-Source Verification
RefCheck cross-references your citations against:
| Source | Lookup Methods |
|--------|----------------|
| **arXiv** | arXiv ID, Title search |
| **CrossRef** | DOI, Title search |
| **DBLP** | Title search |
| **Semantic Scholar** | DOI, Title search |
| **OpenAlex** | DOI, Title search |
| **Google Scholar** | Title search (disabled by default) |
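
The table above can be read as a lookup plan: for each enabled source, use the most specific identifier the entry provides. The following is a minimal sketch of that idea, not RefCheck's actual fetcher code. The names `LOOKUP_METHODS`, `FIELD_FOR_METHOD`, and `plan_lookups` are hypothetical, the BibTeX field names (`doi`, `eprint`, `title`) are assumptions, and the DOI is a placeholder. It makes no network calls.

```python
# Illustrative only: mirrors the table above, not RefCheck's internal code.
# Methods are listed from most to least specific for each source.
LOOKUP_METHODS = {
    "arXiv": ["arxiv_id", "title"],
    "CrossRef": ["doi", "title"],
    "DBLP": ["title"],
    "Semantic Scholar": ["doi", "title"],
    "OpenAlex": ["doi", "title"],
    "Google Scholar": ["title"],
}

# Google Scholar is disabled by default, per the table.
DEFAULT_DISABLED = frozenset({"Google Scholar"})

# Assumed mapping from lookup method to the BibTeX field that supplies it.
FIELD_FOR_METHOD = {"arxiv_id": "eprint", "doi": "doi", "title": "title"}


def plan_lookups(entry, disabled=DEFAULT_DISABLED):
    """Return (source, method, value) tuples, at most one per enabled source."""
    plan = []
    for source, methods in LOOKUP_METHODS.items():
        if source in disabled:
            continue
        for method in methods:
            value = entry.get(FIELD_FOR_METHOD[method], "").strip()
            if value:
                plan.append((source, method, value))
                break  # use only the most specific method available
    return plan


# Placeholder entry; the DOI is not a real identifier.
entry = {"title": "An Example Paper Title", "doi": "10.1234/example"}
for source, method, _ in plan_lookups(entry):
    print(f"{source}: lookup by {method}")
```

Passing `disabled=frozenset()` would include Google Scholar in the plan as well.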
### Two-Pass Workflow
1. **Pass 1 (Validate & Fix)**: Checks each entry, auto-corrects metadata, removes invalid citations
2. **Pass 2 (Verify)**: Re-validates the cleaned file to confirm all entries are correct
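
The two passes can be sketched as a loop around a validator function. This is a minimal illustration under stated assumptions, not RefCheck's implementation: `validate_entry` is a hypothetical stand-in that matches on the citation key against an in-memory dictionary instead of querying the real databases, and the entry shape (`key`, `title`, `year`) and sample data are invented for the example.

```python
# Hypothetical stand-in for the real database lookups: a tiny in-memory index.
KNOWN = {"smith2020": {"title": "A Study of Things", "year": "2020"}}


def validate_entry(entry):
    """Return (status, corrected_entry); status is 'ok', 'fixed' or 'not_found'."""
    reference = KNOWN.get(entry["key"])
    if reference is None:
        return "not_found", None
    corrected = {**entry, **reference}
    return ("ok" if corrected == entry else "fixed"), corrected


def two_pass(entries):
    # Pass 1: validate, apply fixes, drop entries that cannot be verified.
    cleaned = []
    for entry in entries:
        status, corrected = validate_entry(entry)
        if status != "not_found":
            cleaned.append(corrected)
    # Pass 2: re-validate the cleaned list; every entry should now be 'ok'.
    statuses = [validate_entry(entry)[0] for entry in cleaned]
    return cleaned, statuses


entries = [
    {"key": "smith2020", "title": "A Study of Things", "year": "2019"},  # wrong year
    {"key": "ghost2023", "title": "A Paper That Does Not Exist", "year": "2023"},
]
cleaned, statuses = two_pass(entries)
print(len(cleaned), statuses)  # prints: 1 ['ok']
```

The second pass is cheap insurance: if a fix from the first pass introduced a new mismatch, it would show up as a status other than `ok`.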
---
## Installation
```bash
# Clone the repository
git clone https://github.com/voidful/RefCheck.git
cd RefCheck
# Install dependencies
pip install -r requirements.txt
```
### Requirements
- Python 3.9+
- Dependencies: `bibtexparser`, `requests`, `beautifulsoup4`, `rich`, `Unidecode`, `lxml`
---
## Usage
### Hugging Face Space
This repository is ready to run as a Gradio Space. Create a Hugging Face Space with the Gradio SDK, push these files, and the Space will launch `app.py`.
The Space UI accepts a `.bib` upload and returns:
- a corrected BibTeX file
- a Markdown validation report
- a list of entries that still need manual review
### Basic Usage
```bash
# Validate and auto-fix a bib file
python main.py --bib references.bib
```
### Command-Line Options
| Option | Short | Description |
|--------|-------|-------------|
| `--bib` | `-b` | Path to your `.bib` file (required) |
| `--output` | `-o` | Output report path (optional) |
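
For reference, a command-line interface with these two options could be declared with `argparse` roughly as follows. This sketch is based only on the table above. The `build_parser` name and the help strings are assumptions, and the real `main.py` may be structured differently.

```python
import argparse


def build_parser():
    """Build a parser with the two options listed in the table above."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Validate and auto-fix a BibTeX file.",
    )
    parser.add_argument("-b", "--bib", required=True, help="Path to your .bib file")
    parser.add_argument("-o", "--output", default=None, help="Output report path (optional)")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--bib", "refs.bib", "-o", "validation_report.md"])
print(args.bib, args.output)
```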
### Example
```bash
# Process your bibliography
python main.py --bib paper/references.bib
# With custom output path
python main.py --bib refs.bib --output validation_report.md
```
---
## How It Works
```
┌─────────────────┐
│ Load .bib file  │
└────────┬────────┘
         ▼
┌────────────────────────────────────────────┐
│ For each entry:                            │
│   1. Query academic databases              │
│   2. Compare metadata (title, author, yr)  │
│   3. Calculate confidence score            │
└────────┬───────────────────────────────────┘
         ▼
┌────────────────────────────────────────────┐
│ Decision:                                  │
│   • confidence > 85% → Auto-fix metadata   │
│   • Match found → Keep as-is               │
│   • No match → Remove entry                │
└────────┬───────────────────────────────────┘
         ▼
┌────────────────────────────────────────────┐
│ Save updated .bib file                     │
│ Run verification pass                      │
└────────────────────────────────────────────┘
```
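
The scoring and decision steps in the diagram can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: this README does not specify how the confidence score is computed, so the sketch uses `difflib.SequenceMatcher` on normalized titles, and it treats a candidate at or below the 85% threshold as no match. The function names and the sample entry are hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.85  # from the diagram above: confidence > 85%


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    kept = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(kept.split())


def confidence(entry, candidate):
    """Assumed score: similarity of the normalized titles, in [0, 1]."""
    a, b = normalize(entry["title"]), normalize(candidate["title"])
    return SequenceMatcher(None, a, b).ratio()


def decide(entry, candidate, fields=("title", "author", "year")):
    """Return 'keep', 'fix' or 'remove' for one entry."""
    if candidate is None:
        return "remove"  # no match in any database
    if all(entry.get(f) == candidate.get(f) for f in fields):
        return "keep"  # match found and metadata already agrees
    if confidence(entry, candidate) > THRESHOLD:
        return "fix"  # same paper, but some metadata differs
    return "remove"  # assumption: a weak candidate counts as no match


# Invented sample data: the title differs only in case and punctuation, the year is off.
entry = {"title": "Deep Learning for Widgets", "author": "A. Author", "year": "2019"}
match = {"title": "Deep learning for widgets.", "author": "A. Author", "year": "2020"}
print(decide(entry, match))
print(decide(entry, None))
```

Normalizing before comparison keeps differences in case and punctuation from lowering the score, so only substantive title changes push an entry below the threshold.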
---
## Output
RefCheck displays real-time progress and a final summary:
```
📚 BibGuard - Auto-Fix & Verify
Target: references.bib
Found 42 entries. Running validation and auto-fix...
Validating & Fixing ━━━━━━━━━━━━━━━━━ 100% 42/42 ✓ 38 ⚠ 2 ✗ 2
✏️ Updates:
- Fixed 2 entries (metadata updated)
- Removed 2 invalid/hallucinated entries
✓ File saved.
🔄 Double checking (Re-validation)...
Verifying ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 40/40 ✓ 40
==================================================
📊 Final Status
==================================================
Total: 40
✓ Verified: 40
⚠ Issues: 0
✗ Not found: 0
```
### Status Meanings
| Symbol | Meaning |
|--------|---------|
| ✅ Verified | Entry matches a known publication |
| ⚠️ Fixed | Metadata was auto-corrected |
| ❌ Removed | Entry could not be verified (likely hallucination) |
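
The numbers in the sample run follow from a simple tally of these statuses: 42 entries, of which 38 are verified, 2 fixed, and 2 removed, leave 40 entries for the second pass. A minimal sketch of that arithmetic, using hypothetical status labels (`verified`, `fixed`, `removed`) rather than RefCheck's internal names:

```python
from collections import Counter

# Per-entry statuses from the first pass of the sample run above:
# 38 verified, 2 fixed, 2 removed (42 entries in total).
first_pass = ["verified"] * 38 + ["fixed"] * 2 + ["removed"] * 2

counts = Counter(first_pass)
kept = counts["verified"] + counts["fixed"]  # entries written back to the .bib file

print(f"Total: {len(first_pass)}")
print(f"Kept after pass 1: {kept}")
print(f"Removed: {counts['removed']}")
```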
---
## Project Structure
```
RefCheck/
├── main.py              # Entry point & workflow orchestration
├── requirements.txt     # Python dependencies
├── README.md
└── src/
    ├── fetcher.py       # API clients for academic databases
    ├── comparator.py    # Metadata comparison & scoring
    ├── parser.py        # BibTeX parsing & saving
    └── utils.py         # Progress display & text utilities
```
---
## License
MIT License. See [LICENSE](LICENSE) for details.
---
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
---
## Acknowledgments
Built with:
- [bibtexparser](https://github.com/sciunto-org/python-bibtexparser) for BibTeX handling
- [Rich](https://github.com/Textualize/rich) for beautiful terminal output
- APIs from arXiv, CrossRef, DBLP, Semantic Scholar, and OpenAlex