Spam Classifier with Explainable AI

ENGT 375 — Applied Machine Learning | Spring 2026 | ODU

Prerequisites

These should already be installed from the project setup:

Python 3.11 (via venv or system)
scikit-learn, pandas, numpy
lime, shap, eli5, gradio
nltk, joblib, wordcloud
Missing packages? Open a terminal in the project folder and run:
pip install -r requirements.txt
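To see which packages are actually missing before reinstalling, a small check along these lines may help. This is a sketch, not part of the project: the helper name missing_packages is hypothetical, and the list uses import names (scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec

# Import names for the packages listed above; scikit-learn is imported as "sklearn".
REQUIRED = ["sklearn", "pandas", "numpy", "lime", "shap", "eli5",
            "gradio", "nltk", "joblib", "wordcloud"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the same interpreter you use for the project (python3.11), since each environment has its own set of installed packages.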

1 Launch the Gradio Web App (Easiest)

The interactive Gradio app lets you paste any email and see spam/ham predictions with LIME and SHAP explanations. The ensemble model (models/voting_model.joblib) must already be trained — see Step 3 if it's missing.
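Whether the model has been trained can be checked before launching. The sketch below assumes the app needs both model files named in this guide (the ensemble and the TF-IDF vectorizer); the helper name missing_model_files is hypothetical.

```python
from pathlib import Path

# Model artifacts named in this guide, relative to the project folder.
# Assumption: the app needs both the ensemble and the fitted vectorizer.
MODEL_FILES = ["models/voting_model.joblib", "models/tfidf_vectorizer.joblib"]


def missing_model_files(project_dir="."):
    """Return the model files that do not exist under project_dir."""
    root = Path(project_dir)
    return [rel for rel in MODEL_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Not trained yet, missing:", ", ".join(missing))
    else:
        print("Model files found.")
```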

Option A — Double-click in Finder:

  1. In Finder, navigate to the project folder
  2. Double-click launch-gradio.command
  3. A terminal window opens and the app starts
  4. Your browser opens automatically at http://127.0.0.1:7860
  5. To stop: press Ctrl+C in the terminal, or close the window

Option B — From terminal:

$ cd /path/to/spam-xai-project
$ python3.11 app.py
You'll see: A browser tab opens at http://127.0.0.1:7860 with the Spam Classifier interface showing Result, LIME, and SHAP tabs.
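If no browser tab appears, it can be useful to confirm that the app is actually listening on the address above. This is a minimal sketch using only the standard library; the function name app_is_listening is hypothetical, and it only checks that something accepts TCP connections on the port, not that it is the Gradio app.

```python
import socket


def app_is_listening(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("App reachable:", app_is_listening())
```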

2 Open the Student Notebook (Full Analysis)

The student notebook (spam_classifier_xai_student.ipynb) is the main course deliverable. It contains the full XAI walkthrough: data loading, model training, and LIME / SHAP / ELI5 comparisons using the Kuzlu et al. 2020 methodology.

Option A — Double-click in Finder:

  1. Double-click launch-notebook.command
  2. Jupyter opens in your browser
  3. The student notebook opens automatically

Option B — From terminal:

$ cd /path/to/spam-xai-project
$ python3.11 -m jupyter notebook notebooks/spam_classifier_xai_student.ipynb
Re-running the notebook: Full execution takes ~5–15 minutes (SHAP computation is the slow part). You can run individual cells with Shift+Enter, or run all via Kernel > Restart & Run All.
Note: Retraining the model (Step 3) does NOT automatically re-run the notebook. After retraining, re-run the notebook manually to get fresh outputs.
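One rough way to spot stale notebook outputs is to compare file modification times. The sketch below is a heuristic under stated assumptions: the helper notebook_is_stale is hypothetical, and a notebook's modification time reflects when it was last saved, which is not necessarily when it was last executed.

```python
from pathlib import Path


def notebook_is_stale(notebook_path, model_path):
    """Return True if the model file was modified after the notebook was last saved."""
    notebook, model = Path(notebook_path), Path(model_path)
    return model.stat().st_mtime > notebook.stat().st_mtime


if __name__ == "__main__":
    nb = "notebooks/spam_classifier_xai_student.ipynb"
    model = "models/voting_model.joblib"
    if Path(nb).exists() and Path(model).exists():
        print("Re-run the notebook:", notebook_is_stale(nb, model))
```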

3 Retrain the Model

Two retrain modes are available. Both save new model files to models/ and automatically back up the previous models.
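The backup step could look roughly like the sketch below. How retrain.py actually names and stores its backups is not described here, so the timestamped folder layout, the models/backups location, and the function name backup_models are all assumptions made for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_models(models_dir="models", backup_root="models/backups"):
    """Copy every .joblib file in models_dir into a new timestamped folder.

    Returns the backup folder, or None if there was nothing to back up.
    """
    files = sorted(Path(models_dir).glob("*.joblib"))  # non-recursive
    if not files:
        return None
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / stamp
    dest.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy2(f, dest / f.name)  # copy2 also preserves timestamps
    return dest
```

Copying rather than moving keeps the current model in place, so the app keeps working if retraining fails partway through.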

Fast mode: single Random Forest, 1000 TF-IDF features, no grid search. Use this to quickly verify the pipeline works after a small change.

  1. Double-click retrain-fast.command
  2. Takes ~2–5 minutes

Full mode: VotingClassifier ensemble (RF + LR + SVM), 3000 TF-IDF features. This is the production model.

  1. Double-click retrain-full.command
  2. Takes ~15–30 minutes
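For orientation, a full-mode style model can be sketched in scikit-learn as follows. This is not the project's training code. It assumes LR means Logistic Regression, uses hard voting with LinearSVC to stay simple (the project may use soft voting so that LIME and SHAP can read probabilities), picks hyperparameters arbitrarily, and trains on a made-up toy dataset. It also bundles the vectorizer and ensemble into one pipeline, whereas the project saves them as separate files.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_full_pipeline(max_features=3000):
    """TF-IDF features feeding an RF + LR + SVM voting ensemble (illustrative)."""
    ensemble = VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
            ("lr", LogisticRegression(max_iter=1000)),
            ("svm", LinearSVC()),
        ],
        voting="hard",  # assumption: majority vote over predicted labels
    )
    return make_pipeline(TfidfVectorizer(max_features=max_features), ensemble)


# Toy data for illustration only; the real training data lives in data/.
texts = [
    "win a free prize now", "claim your cash reward today",
    "free money click here", "urgent offer limited time prize",
    "meeting moved to friday", "see you at lunch tomorrow",
    "project report attached", "notes from the team meeting",
]
labels = ["spam"] * 4 + ["ham"] * 4

model = build_full_pipeline().fit(texts, labels)
preds = model.predict(["claim your free prize", "see you at the meeting"])
print(list(preds))
```

Passing max_features=1000 and swapping the ensemble for a single RandomForestClassifier would give something closer to fast mode.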

From terminal:

$ python3.11 retrain.py --mode fast                  # quick smoke-test
$ python3.11 retrain.py --mode full                  # production model
$ python3.11 retrain.py --mode full --no-feedback    # ignore user feedback
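A command-line interface with these two flags can be expressed with argparse roughly as below. This is a sketch of how such flags are typically parsed, not the contents of retrain.py; in particular, treating --mode as required is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse the --mode and --no-feedback flags shown above."""
    parser = argparse.ArgumentParser(description="Retrain the spam classifier.")
    parser.add_argument(
        "--mode",
        choices=["fast", "full"],
        required=True,  # assumption: the real script may define a default
        help="fast = quick smoke-test, full = production ensemble",
    )
    parser.add_argument(
        "--no-feedback",
        action="store_true",
        help="skip merging user corrections into the training data",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--mode", "full", "--no-feedback"])
    print(args.mode, args.no_feedback)
```

argparse exposes --no-feedback as args.no_feedback, which is False unless the flag is passed.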
User feedback: By default, retrain merges corrections from data/feedback/feedback_log.csv into training data. Pass --no-feedback to skip this.
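Merging corrections could work along these lines. The sketch assumes each row has a text and a label field (hypothetical column names; the real feedback_log.csv may differ) and that a correction replaces the label of a matching email while unseen emails are appended. The actual merge rule in retrain.py is not documented here.

```python
def merge_feedback(training_rows, feedback_rows):
    """Merge user corrections into training rows.

    A correction overrides the label of a training row with the same text;
    feedback for unseen text is appended. Duplicate texts collapse to one row.
    """
    merged = {row["text"]: row["label"] for row in training_rows}
    for row in feedback_rows:
        merged[row["text"]] = row["label"]
    # Dicts preserve insertion order, so original rows keep their position.
    return [{"text": text, "label": label} for text, label in merged.items()]


if __name__ == "__main__":
    train = [{"text": "hi", "label": "ham"}, {"text": "win cash", "label": "ham"}]
    feedback = [{"text": "win cash", "label": "spam"}]
    print(merge_feedback(train, feedback))
```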

4 Project Files Reference

File / Folder | What It Is
launch-gradio.command | Double-click to launch the Gradio web app
launch-notebook.command | Double-click to open the student notebook in Jupyter
retrain-fast.command | Double-click to retrain in fast mode (~2–5 min)
retrain-full.command | Double-click to retrain in full mode (~15–30 min)
app.py | Gradio web app (Result, LIME, SHAP tabs)
retrain.py | Retrain script (fast or full mode)
train_ensemble.py | Lower-level ensemble training script
utils.py | Text preprocessing + 24 metadata features (shared by app and notebooks)
notebooks/spam_classifier_xai_student.ipynb | Main course deliverable: full XAI analysis (LIME, SHAP, ELI5)
notebooks/spam_classifier_gradio.ipynb | Ensemble training pipeline and Gradio deployment notebook
models/voting_model.joblib | Trained VotingClassifier ensemble (RF + LR + SVM)
models/tfidf_vectorizer.joblib | Fitted TF-IDF vectorizer
data/spam_Emails_data.csv | Kaggle spam email dataset
data/email-dataset-main/ | GitHub email dataset (spam/ and ham/ folders)
data/feedback/feedback_log.csv | User corrections collected from the Gradio app
requirements.txt | Python package dependencies

5 Troubleshooting

Problem | Fix
ModuleNotFoundError | Run pip install -r requirements.txt to install all dependencies.
App says "Model files not found" | The model hasn't been trained yet. Run retrain-fast.command or retrain-full.command first.
Browser doesn't open automatically | Manually go to http://127.0.0.1:7860 in your browser.
permission denied on .command files | Run once in terminal: chmod +x *.command
Notebook kernel not found | Run: python3.11 -m ipykernel install --user --name spam-xai
NLTK data missing | Run: python3.11 -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
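The permission fix can also be done from Python, which additionally reports which files were affected. This sketch does roughly what chmod +x *.command does on macOS or Linux; the function name make_commands_executable is hypothetical.

```python
import stat
from pathlib import Path


def make_commands_executable(project_dir="."):
    """Add execute permission to .command files that lack it; return their names."""
    changed = []
    for path in sorted(Path(project_dir).glob("*.command")):
        mode = path.stat().st_mode
        if not mode & stat.S_IXUSR:  # owner cannot execute yet
            path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
            changed.append(path.name)
    return changed


if __name__ == "__main__":
    fixed = make_commands_executable()
    print("Made executable:", fixed if fixed else "nothing to change")
```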