How to Run — Spam Classifier XAI Project

Prerequisites

These should already be installed from the project setup:

✓ Python 3.11 (via venv or system)

✓ scikit-learn, pandas, numpy

✓ lime, shap, eli5, gradio

✓ nltk, joblib, wordcloud

Missing packages? Open a terminal in the project folder and run:
pip install -r requirements.txt
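
To see which packages are missing before reinstalling everything, a small check like the one below may help. It is only a sketch: the list holds the import names of the packages named above (scikit-learn is imported as `sklearn`), and the helper name `missing_packages` is hypothetical, not part of the project.

```python
from importlib.util import find_spec

# Import names for the packages listed in Prerequisites.
# Note: the scikit-learn distribution is imported as "sklearn".
REQUIRED = [
    "sklearn", "pandas", "numpy",
    "lime", "shap", "eli5", "gradio",
    "nltk", "joblib", "wordcloud",
]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the same interpreter you use for the project (for example python3.11), since each interpreter and venv has its own set of installed packages.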

1 Launch the Gradio Web App (Easiest)

The interactive Gradio app lets you paste any email and see spam/ham predictions with LIME and SHAP explanations. The ensemble model (models/voting_model.joblib) must already be trained — see Step 3 if it's missing.
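
A quick way to confirm the trained files are in place before launching is sketched below. It assumes the app needs both files listed under models/ in the Project Files Reference. How app.py actually loads them is not shown in this guide, and the helper name `missing_model_files` is hypothetical.

```python
from pathlib import Path

# Model artifacts named in this guide. That the app needs both is an assumption.
REQUIRED_MODEL_FILES = [
    "models/voting_model.joblib",
    "models/tfidf_vectorizer.joblib",
]


def missing_model_files(project_dir="."):
    """Return the model files that do not exist under project_dir."""
    root = Path(project_dir)
    return [rel for rel in REQUIRED_MODEL_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files (retrain first, see Step 3):")
        for rel in missing:
            print("  " + rel)
    else:
        print("Model files found. Ready to launch.")
```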

Option A — Double-click in Finder:

In Finder, navigate to the project folder
Double-click launch-gradio.command
A terminal window opens and the app starts
Your browser opens automatically at http://127.0.0.1:7860
To stop: press Ctrl+C in the terminal, or close the window

Option B — From terminal:

$ cd /path/to/spam-xai-project
$ python3.11 app.py

You'll see: A browser tab opens at http://127.0.0.1:7860 with the Spam Classifier interface showing Result, LIME, and SHAP tabs.

2 Open the Student Notebook (Full Analysis)

The student notebook (spam_classifier_xai_student.ipynb) is the main course deliverable. It contains the full XAI walkthrough: data loading, model training, and LIME / SHAP / ELI5 comparisons using the Kuzlu et al. 2020 methodology.

Option A — Double-click in Finder:

Double-click launch-notebook.command
Jupyter opens in your browser
The student notebook opens automatically

Option B — From terminal:

$ cd /path/to/spam-xai-project
$ python3.11 -m jupyter notebook notebooks/spam_classifier_xai_student.ipynb

Re-running the notebook: Full execution takes ~5–15 minutes (SHAP computation is the slow part). You can run individual cells with Shift+Enter, or run all via Kernel > Restart & Run All.

Note: Retraining the model (Step 3) does NOT automatically re-run the notebook. After retraining, re-run the notebook manually to get fresh outputs.

3 Retrain the Model

Two retrain modes are available. Both save new model files to models/ and automatically back up the previous models.

Fast mode — single Random Forest, 1000 TF-IDF features, no grid search. Use this to quickly verify the pipeline works after a small change.

Double-click retrain-fast.command
Takes ~2–5 minutes

Full mode — VotingClassifier ensemble (RF + LR + SVM), 3000 TF-IDF features. This is the production model.

Double-click retrain-full.command
Takes ~15–30 minutes
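
For orientation, the two configurations could look roughly like the scikit-learn sketch below. It is not the contents of retrain.py. The function name `build_model`, the soft-voting choice, the hyperparameters, and the single-pipeline layout are all assumptions. The real project saves the vectorizer and the model as separate files and also uses the metadata features from utils.py, which this sketch leaves out.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_model(mode="fast"):
    """Build an unfitted text-classification pipeline for the given mode."""
    if mode == "fast":
        # Single Random Forest on 1000 TF-IDF features, no grid search.
        return make_pipeline(
            TfidfVectorizer(max_features=1000),
            RandomForestClassifier(random_state=0),
        )
    if mode == "full":
        # RF + LR + SVM ensemble on 3000 TF-IDF features.
        # Soft voting is an assumption; it requires probability estimates,
        # hence probability=True on the SVC.
        ensemble = VotingClassifier(
            estimators=[
                ("rf", RandomForestClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC(probability=True, random_state=0)),
            ],
            voting="soft",
        )
        return make_pipeline(TfidfVectorizer(max_features=3000), ensemble)
    raise ValueError(f"unknown mode: {mode!r}")
```

Calling `build_model("fast").fit(texts, labels)` on a list of email strings and their labels would train the quick variant. The full variant is slower mainly because three classifiers are fitted on three times as many features.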

From terminal:

$ python3.11 retrain.py --mode fast                 # quick smoke-test
$ python3.11 retrain.py --mode full                 # production model
$ python3.11 retrain.py --mode full --no-feedback   # ignore user feedback

User feedback: By default, retrain merges corrections from data/feedback/feedback_log.csv into training data. Pass --no-feedback to skip this.
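
One plausible way such a merge could work is sketched below with pandas. The column names `text` and `label`, the function name `merge_feedback`, and the rule that a correction overrides the original label for the same email are all assumptions for illustration. The actual schema of feedback_log.csv and the logic in retrain.py may differ.

```python
import pandas as pd


def merge_feedback(train_df, feedback_df, use_feedback=True):
    """Append user corrections to the training data.

    Both frames are assumed to have "text" and "label" columns (hypothetical
    names). When the same text appears in both, the feedback row wins.
    """
    if not use_feedback or feedback_df.empty:
        # Equivalent in spirit to passing --no-feedback.
        return train_df.reset_index(drop=True)
    combined = pd.concat(
        [train_df, feedback_df[["text", "label"]]],
        ignore_index=True,
    )
    # Feedback rows come last, so keep="last" lets corrections override.
    return combined.drop_duplicates(subset="text", keep="last").reset_index(drop=True)
```

For example, if the training data labels an email as spam and a user later corrects it to ham, the merged frame keeps only the ham row for that email.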

4 Project Files Reference

File / Folder	What It Is
`launch-gradio.command`	Double-click to launch the Gradio web app
`launch-notebook.command`	Double-click to open the student notebook in Jupyter
`retrain-fast.command`	Double-click to retrain in fast mode (~2–5 min)
`retrain-full.command`	Double-click to retrain in full mode (~15–30 min)
`app.py`	Gradio web app (Result, LIME, SHAP tabs)
`retrain.py`	Retrain script — fast or full mode
`train_ensemble.py`	Lower-level ensemble training script
`utils.py`	Text preprocessing + 24 metadata features (shared by app and notebooks)
`notebooks/spam_classifier_xai_student.ipynb`	Main course deliverable — full XAI analysis (LIME, SHAP, ELI5)
`notebooks/spam_classifier_gradio.ipynb`	Ensemble training pipeline and Gradio deployment notebook
`models/voting_model.joblib`	Trained VotingClassifier ensemble (RF + LR + SVM)
`models/tfidf_vectorizer.joblib`	Fitted TF-IDF vectorizer
`data/spam_Emails_data.csv`	Kaggle spam email dataset
`data/email-dataset-main/`	GitHub email dataset (spam/ and ham/ folders)
`data/feedback/feedback_log.csv`	User corrections collected from the Gradio app
`requirements.txt`	Python package dependencies

5 Troubleshooting