---
title: RefCheck
emoji: 🔍
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
python_version: 3.11
suggested_hardware: cpu-basic
fullWidth: true
short_description: Upload BibTeX, validate citations, download fixes.
tags:
  - bibtex
  - citations
  - academic
  - bibliography
---

# RefCheck 🔍

> **A Citation Hallucination Detector & Auto-Fixer**  
> Validate and automatically correct your BibTeX bibliography against multiple academic databases.

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

---

## Why RefCheck?

Academic papers often contain citation errors: wrong titles, incorrect authors, mismatched years, or even completely fabricated references (hallucinations from AI tools). **RefCheck** automatically:

- ✅ **Validates** each citation against 6 academic databases
- 🔧 **Auto-fixes** metadata mismatches (title, authors, year, DOI)
- 🗑️ **Removes** unverifiable/hallucinated entries
- 📊 **Reports** a clear verification summary

---

## Features

### Multi-Source Verification

RefCheck cross-references your citations against:

| Source | Lookup Methods |
|--------|----------------|
| **arXiv** | arXiv ID, Title search |
| **CrossRef** | DOI, Title search |
| **DBLP** | Title search |
| **Semantic Scholar** | DOI, Title search |
| **OpenAlex** | DOI, Title search |
| **Google Scholar** | Title search (disabled by default) |
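
As a rough illustration of how the table above could translate into a lookup plan, the sketch below picks the most specific identifier available for each source. `LOOKUP_METHODS`, `FIELD_FOR_METHOD`, and `plan_lookups` are hypothetical names, not RefCheck's actual API, and the BibTeX field names (`doi`, `eprint`, `title`) are assumptions.

```python
# Hypothetical sketch: not RefCheck's actual implementation.
# Maps each source to the lookup methods listed in the table,
# ordered from most to least specific.
LOOKUP_METHODS = {
    "arXiv": ["arxiv_id", "title"],
    "CrossRef": ["doi", "title"],
    "DBLP": ["title"],
    "Semantic Scholar": ["doi", "title"],
    "OpenAlex": ["doi", "title"],
    "Google Scholar": ["title"],
}

# Google Scholar is disabled by default, per the table.
DISABLED_BY_DEFAULT = {"Google Scholar"}

# Assumed BibTeX field holding each identifier.
FIELD_FOR_METHOD = {"arxiv_id": "eprint", "doi": "doi", "title": "title"}


def plan_lookups(entry, enabled_extra=()):
    """Return (source, method, value) tuples for the fields the entry has."""
    plan = []
    for source, methods in LOOKUP_METHODS.items():
        if source in DISABLED_BY_DEFAULT and source not in enabled_extra:
            continue
        for method in methods:
            value = entry.get(FIELD_FOR_METHOD[method], "").strip()
            if value:
                plan.append((source, method, value))
                break  # use only the most specific identifier available
    return plan


# Placeholder entry; the DOI is not a real identifier.
entry = {"title": "An Example Paper", "doi": "10.1234/example"}
for source, method, value in plan_lookups(entry):
    print(f"{source}: {method} -> {value}")
```

With this entry, sources that accept a DOI would be queried by DOI, while arXiv and DBLP fall back to a title search.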

### Two-Pass Workflow

1. **Pass 1 (Validate & Fix)**: Checks each entry, auto-corrects metadata, removes invalid citations
2. **Pass 2 (Verify)**: Re-validates the cleaned file to confirm all entries are correct
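
The two passes can be sketched roughly as follows. This is a minimal illustration, not RefCheck's code: `run_two_pass`, `toy_check`, and the in-memory `KNOWN` table are hypothetical stand-ins for the real validation routine and database lookups.

```python
# Hypothetical sketch of the two-pass workflow; `check` stands in for
# whatever validation routine is used and is not RefCheck's real API.
def run_two_pass(entries, check):
    """Pass 1 fixes or drops entries; pass 2 re-validates the survivors.

    `check(entry)` is assumed to return a (status, corrected_entry) pair,
    where status is "verified", "fixed", or "not_found".
    """
    cleaned = []
    for entry in entries:  # Pass 1: validate and fix
        status, corrected = check(entry)
        if status == "not_found":
            continue  # unverifiable entries are removed
        cleaned.append(corrected if status == "fixed" else entry)

    # Pass 2: every remaining entry should now verify without changes.
    unresolved = [e for e in cleaned if check(e)[0] != "verified"]
    return cleaned, unresolved


# Toy checker backed by an in-memory "database" keyed by title (assumption).
KNOWN = {"a real paper": {"title": "A Real Paper", "year": "2020"}}


def toy_check(entry):
    record = KNOWN.get(entry["title"].lower())
    if record is None:
        return "not_found", None
    if record == entry:
        return "verified", entry
    return "fixed", dict(record)


entries = [
    {"title": "A Real Paper", "year": "2019"},     # wrong year, gets fixed
    {"title": "A Made-Up Paper", "year": "2021"},  # no match, gets removed
]
cleaned, unresolved = run_two_pass(entries, toy_check)
print(cleaned)     # entries kept after pass 1
print(unresolved)  # entries still failing in pass 2
```

Running the second pass on the cleaned list is what catches a fix that did not actually converge on the database record.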

---

## Installation

```bash
# Clone the repository
git clone https://github.com/voidful/RefCheck.git
cd RefCheck

# Install dependencies
pip install -r requirements.txt
```

### Requirements

- Python 3.9+
- Dependencies: `bibtexparser`, `requests`, `beautifulsoup4`, `rich`, `Unidecode`, `lxml`

---

## Usage

### Hugging Face Space

This repository is ready to run as a Gradio Space. Create a Hugging Face Space with the Gradio SDK, push these files, and the Space will launch `app.py`.

The Space UI accepts a `.bib` upload and returns:

- a corrected BibTeX file
- a Markdown validation report
- a list of entries that still need manual review

### Basic Usage

```bash
# Validate and auto-fix a bib file
python main.py --bib references.bib
```

### Command-Line Options

| Option | Short | Description |
|--------|-------|-------------|
| `--bib` | `-b` | Path to your `.bib` file (required) |
| `--output` | `-o` | Output report path (optional) |
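
For readers who want to script around the CLI, the two options above could be declared with `argparse` along these lines. This is a sketch under the assumption that `main.py` uses a standard parser; `build_parser` is a hypothetical name and the help strings simply restate the table.

```python
import argparse


def build_parser():
    # Mirrors the two documented options; everything else is an assumption.
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Validate and auto-fix a BibTeX file.",
    )
    parser.add_argument("-b", "--bib", required=True,
                        help="Path to your .bib file")
    parser.add_argument("-o", "--output", default=None,
                        help="Output report path (optional)")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-b", "refs.bib", "-o", "validation_report.md"])
print(args.bib, args.output)
```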

### Example

```bash
# Process your bibliography
python main.py --bib paper/references.bib

# With custom output path
python main.py --bib refs.bib --output validation_report.md
```

---

## How It Works

```
┌─────────────────┐
│  Load .bib file │
└────────┬────────┘
         ▼
┌─────────────────────────────────────────┐
│  For each entry:                        │
│  1. Query academic databases            │
│  2. Compare metadata (title, author, yr)│
│  3. Calculate confidence score          │
└────────┬────────────────────────────────┘
         ▼
┌─────────────────────────────────────────┐
│  Decision:                              │
│  • confidence > 85% → Auto-fix metadata │
│  • Match found      → Keep as-is        │
│  • No match         → Remove entry      │
└────────┬────────────────────────────────┘
         ▼
┌─────────────────────────────────────────┐
│  Save updated .bib file                 │
│  Run verification pass                  │
└─────────────────────────────────────────┘
```
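
One possible reading of the decision step in the diagram is sketched below. It is not RefCheck's comparator: the `confidence` function here scores title similarity only, using `difflib.SequenceMatcher` from the standard library, and `decide` is a hypothetical helper. Only the 85% threshold comes from the diagram.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.85  # from the diagram: confidence > 85%
FIELDS = ("title", "author", "year")


def confidence(entry, candidate):
    """Assumed scoring: case-insensitive title similarity in [0, 1]."""
    a = entry.get("title", "").lower()
    b = candidate.get("title", "").lower()
    return SequenceMatcher(None, a, b).ratio()


def decide(entry, candidate):
    """Return (action, resulting_entry) for one entry.

    `candidate` is the best database match, or None if nothing was found.
    """
    if candidate is None:
        return "remove", None
    differs = any(entry.get(f) != candidate.get(f) for f in FIELDS)
    if differs and confidence(entry, candidate) > THRESHOLD:
        fixed = dict(entry)  # keep fields the database does not supply
        fixed.update({f: candidate[f] for f in FIELDS if f in candidate})
        return "fix", fixed
    return "keep", entry


# Placeholder records for illustration only.
entry = {"ID": "doe2021", "title": "An Example Paper", "year": "2021"}
match = {"title": "An Example Paper", "author": "Doe, Jane", "year": "2022"}

print(decide(entry, match))  # high title similarity, metadata differs
print(decide(entry, None))   # no database match
```

A real comparator would likely weigh authors and year as well as the title; the single-field score keeps the sketch short.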

---

## Output

RefCheck displays real-time progress and a final summary:

```
📚 BibGuard - Auto-Fix & Verify
   Target: references.bib

Found 42 entries. Running validation and auto-fix...

Validating & Fixing ━━━━━━━━━━━━━━━━━ 100% 42/42 ✓ 38 ⚠ 2 ✗ 2

✏️  Updates:
   - Fixed 2 entries (metadata updated)
   - Removed 2 invalid/hallucinated entries
✓ File saved.

🔄 Double checking (Re-validation)...

Verifying ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 40/40 ✓ 40

==================================================
📊 Final Status
==================================================
  Total:      40
  ✓ Verified: 40
  ⚠ Issues:   0
  ✗ Not found: 0
```

### Status Meanings

| Symbol | Meaning |
|--------|---------|
| ✅ Verified | Entry matches a known publication |
| ⚠️ Fixed | Metadata was auto-corrected |
| ❌ Removed | Entry could not be verified (likely hallucination) |
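
To show how the three statuses add up to the counts in the sample output, here is a small tally sketch. The status strings and the `summarize` helper are hypothetical; the numbers are taken from the first pass of the example run above (38 verified, 2 fixed, 2 removed).

```python
from collections import Counter


def summarize(statuses):
    """Count per-entry statuses and how many entries remain in the file."""
    counts = Counter(statuses)
    return {
        "total": len(statuses),
        "verified": counts["verified"],
        "fixed": counts["fixed"],
        "removed": counts["removed"],
        # Removed entries are dropped before the verification pass.
        "remaining": len(statuses) - counts["removed"],
    }


# Statuses matching the sample first pass: 38 verified, 2 fixed, 2 removed.
statuses = ["verified"] * 38 + ["fixed"] * 2 + ["removed"] * 2
summary = summarize(statuses)
print(summary)
```

The `remaining` value corresponds to the 40 entries that the verification pass re-checks in the sample output.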

---

## Project Structure

```
RefCheck/
├── main.py              # Entry point & workflow orchestration
├── requirements.txt     # Python dependencies
├── README.md
└── src/
    ├── fetcher.py       # API clients for academic databases
    ├── comparator.py    # Metadata comparison & scoring
    ├── parser.py        # BibTeX parsing & saving
    └── utils.py         # Progress display & text utilities
```

---

## License

MIT License. See [LICENSE](LICENSE) for details.

---

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

---

## Acknowledgments

Built with:
- [bibtexparser](https://github.com/sciunto-org/python-bibtexparser) for BibTeX handling
- [Rich](https://github.com/Textualize/rich) for beautiful terminal output
- APIs from arXiv, CrossRef, DBLP, Semantic Scholar, and OpenAlex