---
title: RefCheck
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
python_version: 3.11
suggested_hardware: cpu-basic
fullWidth: true
short_description: Upload BibTeX, validate citations, download fixes.
tags:
- bibtex
- citations
- academic
- bibliography
---
# RefCheck
> **A Citation Hallucination Detector & Auto-Fixer**
> Validate and automatically correct your BibTeX bibliography against multiple academic databases.
---
## Why RefCheck?
Academic papers often contain citation errors: wrong titles, incorrect authors, mismatched years, or even completely fabricated references (hallucinations from AI tools). **RefCheck** automatically:
- ✅ **Validates** each citation against 6 academic databases
- 🔧 **Auto-fixes** metadata mismatches (title, authors, year, DOI)
- 🗑️ **Removes** unverifiable/hallucinated entries
- 📊 **Reports** a clear verification summary
---
## Features
### Multi-Source Verification
RefCheck cross-references your citations against:
| Source | Lookup Methods |
|--------|----------------|
| **arXiv** | arXiv ID, Title search |
| **CrossRef** | DOI, Title search |
| **DBLP** | Title search |
| **Semantic Scholar** | DOI, Title search |
| **OpenAlex** | DOI, Title search |
| **Google Scholar** | Title search (disabled by default) |
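
Because most sources are queried by title, a returned record has to be compared against the title in the `.bib` entry, which may differ in case, accents, or BibTeX braces. The following is a minimal sketch of such a comparison using only the standard library. The function names `normalize_title` and `title_similarity` are hypothetical, and the real logic in `src/comparator.py` may work differently.

```python
import difflib
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase a title and strip accents, BibTeX braces, and punctuation."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Drop BibTeX case-protection braces, then turn other punctuation into spaces.
    no_braces = ascii_only.replace("{", "").replace("}", "")
    cleaned = re.sub(r"[^a-z0-9]+", " ", no_braces.lower())
    return cleaned.strip()


def title_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two normalized titles."""
    matcher = difflib.SequenceMatcher(None, normalize_title(a), normalize_title(b))
    return matcher.ratio()
```

Titles that differ only in casing, braces, or punctuation normalize to the same string and therefore score 1.0, while unrelated titles score much lower.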
### Two-Pass Workflow
1. **Pass 1 (Validate & Fix)**: Checks each entry, auto-corrects metadata, and removes invalid citations
2. **Pass 2 (Verify)**: Re-validates the cleaned file to confirm all entries are correct
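
The two passes can be pictured with a short sketch. Everything here is illustrative: `pass_one`, `pass_two`, and the `Checker` callable are hypothetical names, and entries are modeled as plain dictionaries rather than whatever structure `main.py` actually uses.

```python
from typing import Callable, Optional

Entry = dict  # e.g. {"ID": "smith2020", "title": "...", "year": "2020"}
# A checker returns the corrected entry, or None when no database matches.
Checker = Callable[[Entry], Optional[Entry]]


def pass_one(entries: list[Entry], check: Checker) -> tuple[list[Entry], list[Entry]]:
    """Validate and fix: keep corrected entries, collect removed ones."""
    kept, removed = [], []
    for entry in entries:
        fixed = check(entry)
        if fixed is None:
            removed.append(entry)
        else:
            kept.append(fixed)
    return kept, removed


def pass_two(entries: list[Entry], check: Checker) -> list[Entry]:
    """Verify: return entries that still fail or would still change."""
    return [entry for entry in entries if check(entry) != entry]
```

If the first pass did its job, `pass_two` returns an empty list when run on the cleaned entries.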
---
## Installation
```bash
# Clone the repository
git clone https://github.com/voidful/RefCheck.git
cd RefCheck
# Install dependencies
pip install -r requirements.txt
```
### Requirements
- Python 3.9+
- Dependencies: `bibtexparser`, `requests`, `beautifulsoup4`, `rich`, `Unidecode`, `lxml`
---
## Usage
### Hugging Face Space
This repository is ready to run as a Gradio Space. Create a Hugging Face Space with the Gradio SDK, push these files, and the Space will launch `app.py`.
The Space UI accepts a `.bib` upload and returns:
- a corrected BibTeX file
- a Markdown validation report
- a list of entries that still need manual review
### Basic Usage
```bash
# Validate and auto-fix a bib file
python main.py --bib references.bib
```
### Command-Line Options
| Option | Short | Description |
|--------|-------|-------------|
| `--bib` | `-b` | Path to your `.bib` file (required) |
| `--output` | `-o` | Output report path (optional) |
### Example
```bash
# Process your bibliography
python main.py --bib paper/references.bib
# With custom output path
python main.py --bib refs.bib --output validation_report.md
```
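
For readers who want to script around the CLI, the two documented options could be declared with `argparse` roughly as follows. This is a sketch based only on the options table above; `build_parser` is a hypothetical name, and the actual parser in `main.py` may define defaults or help text differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the two options from the table above."""
    parser = argparse.ArgumentParser(description="Validate and auto-fix a BibTeX file.")
    parser.add_argument("-b", "--bib", required=True, help="Path to your .bib file")
    parser.add_argument("-o", "--output", default=None, help="Output report path")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(["-b", "refs.bib", "-o", "validation_report.md"])
print(args.bib, args.output)
```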
---
## How It Works
```
┌─────────────────┐
│  Load .bib file │
└────────┬────────┘
         ▼
┌────────────────────────────────────────────┐
│  For each entry:                           │
│    1. Query academic databases             │
│    2. Compare metadata (title, author, yr) │
│    3. Calculate confidence score           │
└────────┬───────────────────────────────────┘
         ▼
┌────────────────────────────────────────────┐
│  Decision:                                 │
│    • confidence > 85% → Auto-fix metadata  │
│    • Match found → Keep as-is              │
│    • No match → Remove entry               │
└────────┬───────────────────────────────────┘
         ▼
┌────────────────────────────────────────────┐
│  Save updated .bib file                    │
│  Run verification pass                     │
└────────────────────────────────────────────┘
```
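
The decision step in the diagram can be read as a small function. The sketch below is one possible interpretation, not the project's code: `decide`, `Action`, and `AUTO_FIX_THRESHOLD` are hypothetical names, and it assumes that a match is kept as-is whenever its metadata already agrees or its confidence does not exceed the 85% cut-off.

```python
from enum import Enum
from typing import Optional

AUTO_FIX_THRESHOLD = 0.85  # the 85% cut-off shown in the diagram


class Action(Enum):
    FIX = "auto-fix metadata"
    KEEP = "keep as-is"
    REMOVE = "remove entry"


def decide(confidence: Optional[float], metadata_differs: bool) -> Action:
    """Map a lookup result to an action.

    confidence is None when no database returned a candidate.
    """
    if confidence is None:
        return Action.REMOVE
    # Assumption: only rewrite metadata when the match is strong and something differs.
    if confidence > AUTO_FIX_THRESHOLD and metadata_differs:
        return Action.FIX
    return Action.KEEP
```

The comparison is strict, so a confidence of exactly 0.85 does not trigger an auto-fix in this sketch.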
---
## Output
RefCheck displays real-time progress and a final summary:
```
BibGuard - Auto-Fix & Verify
Target: references.bib
Found 42 entries. Running validation and auto-fix...
Validating & Fixing ━━━━━━━━━━━━━━━━━ 100% 42/42 ✓ 38 ⚠ 2 ✗ 2
Updates:
- Fixed 2 entries (metadata updated)
- Removed 2 invalid/hallucinated entries
✓ File saved.
Double checking (Re-validation)...
Verifying ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 40/40 ✓ 40
==================================================
Final Status
==================================================
Total: 40
✓ Verified: 40
⚠ Issues: 0
✗ Not found: 0
```
### Status Meanings
| Symbol | Meaning |
|--------|---------|
| ✅ Verified | Entry matches a known publication |
| ⚠️ Fixed | Metadata was auto-corrected |
| ❌ Removed | Entry could not be verified (likely hallucination) |
---
## Project Structure
```
RefCheck/
├── main.py              # Entry point & workflow orchestration
├── requirements.txt     # Python dependencies
├── README.md
└── src/
    ├── fetcher.py       # API clients for academic databases
    ├── comparator.py    # Metadata comparison & scoring
    ├── parser.py        # BibTeX parsing & saving
    └── utils.py         # Progress display & text utilities
```
---
## License
MIT License. See [LICENSE](LICENSE) for details.
---
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
---
## Acknowledgments
Built with:
- [bibtexparser](https://github.com/sciunto-org/python-bibtexparser) for BibTeX handling
- [Rich](https://github.com/Textualize/rich) for beautiful terminal output
- APIs from arXiv, CrossRef, DBLP, Semantic Scholar, and OpenAlex