ENGT 375 — Applied Machine Learning | Spring 2026 | ODU
The required packages should already be installed from the project setup. If they are not, install them with:
pip install -r requirements.txt
The interactive Gradio app lets you paste any email and see spam/ham predictions with LIME and SHAP explanations. The ensemble model (models/voting_model.joblib) must already be trained — see Step 3 if it's missing.
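Before launching, it can help to confirm that the trained model files are present. The following is a minimal sketch, assuming it is run from the project root and that the app needs both model files named in the file listing. The helper name `missing_model_files` is hypothetical and is not part of app.py.

```python
from pathlib import Path

# Paths taken from the project file listing; that app.py loads exactly
# these two files is an assumption.
REQUIRED_MODEL_FILES = (
    "models/voting_model.joblib",
    "models/tfidf_vectorizer.joblib",
)


def missing_model_files(project_root="."):
    """Return the required model files that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_MODEL_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```

If anything is reported missing, retrain before starting the app.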
Option A — Double-click in Finder:
http://127.0.0.1:7860
Option B — From terminal:
The app opens at http://127.0.0.1:7860 with the Spam Classifier interface showing the Result, LIME, and SHAP tabs.
The student notebook (spam_classifier_xai_student.ipynb) is the main course deliverable. It contains the full XAI walkthrough: data loading, model training, and LIME / SHAP / ELI5 comparisons using the Kuzlu et al. 2020 methodology.
Option A — Double-click in Finder:
Option B — From terminal:
Run cells one at a time with Shift+Enter, or run them all via Kernel > Restart & Run All.
Two retrain modes are available. Both save new model files to models/ and automatically back up the previous models.
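The backup step could look roughly like the sketch below. This is an illustration only: the function name `backup_models`, the `models/backups` location, and the timestamped folder naming are all assumptions, and retrain.py may handle backups differently.

```python
import shutil
import time
from pathlib import Path


def backup_models(models_dir="models", backup_root="models/backups"):
    """Copy every *.joblib file in models_dir into a new timestamped folder.

    Returns the backup folder, or None when there was nothing to back up.
    """
    # glob is non-recursive, so files already inside backup_root are skipped.
    models = sorted(Path(models_dir).glob("*.joblib"))
    if not models:
        return None
    target = Path(backup_root) / time.strftime("%Y%m%d-%H%M%S")
    target.mkdir(parents=True, exist_ok=True)
    for model_file in models:
        # copy2 keeps file timestamps; the originals stay in place.
        shutil.copy2(model_file, target / model_file.name)
    return target
```

Copying rather than moving means the current models stay usable if retraining is interrupted.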
Fast mode (~2–5 min): single Random Forest, 1000 TF-IDF features, no grid search. Use this to quickly verify that the pipeline still works after a small change.
Full mode (~15–30 min): VotingClassifier ensemble (RF + LR + SVM), 3000 TF-IDF features. This is the production model.
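To make the difference between the two modes concrete, here is a hedged scikit-learn sketch of how the two configurations might be built. It is not the code in retrain.py: the function name `build_pipeline`, the soft-voting choice, and all hyperparameters other than the TF-IDF feature counts are assumptions, and the 24 metadata features from utils.py are left out for brevity.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_pipeline(mode="fast"):
    """Return an unfitted text-classification pipeline for the given mode."""
    if mode == "fast":
        # Fast mode: one Random Forest on 1000 TF-IDF features, no grid search.
        vectorizer = TfidfVectorizer(max_features=1000)
        classifier = RandomForestClassifier(random_state=0)
    elif mode == "full":
        # Full mode: RF + LR + SVM ensemble on 3000 TF-IDF features.
        vectorizer = TfidfVectorizer(max_features=3000)
        classifier = VotingClassifier(
            estimators=[
                ("rf", RandomForestClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000)),
                # Soft voting is an assumption; it requires probability=True.
                ("svm", SVC(probability=True, random_state=0)),
            ],
            voting="soft",
        )
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return make_pipeline(vectorizer, classifier)
```

Calling `build_pipeline("full").fit(texts, labels)` on a list of email strings and their labels would train the ensemble. Soft voting is used here because probability outputs are what LIME-style explainers typically consume.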
From terminal:
By default, retraining merges the user corrections in data/feedback/feedback_log.csv into the training data. Pass --no-feedback to skip this.
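The feedback handling can be pictured with the standard-library sketch below. Only the `--no-feedback` flag and the feedback file path come from this guide. The function names are hypothetical, and the sketch assumes the feedback log shares its column layout with the training data, which may not match how retrain.py actually merges the two.

```python
import argparse
import csv
from pathlib import Path


def load_rows(csv_path):
    """Read a CSV file into a list of dicts; a missing file yields an empty list."""
    path = Path(csv_path)
    if not path.is_file():
        return []
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def build_training_rows(train_rows, feedback_path, use_feedback=True):
    """Append feedback rows to the training rows unless feedback is disabled."""
    if not use_feedback:
        return list(train_rows)
    return list(train_rows) + load_rows(feedback_path)


def parse_args(argv=None):
    """Parse the one flag this sketch models."""
    parser = argparse.ArgumentParser(description="Retrain sketch (hypothetical)")
    # store_false: passing --no-feedback sets use_feedback to False.
    parser.add_argument("--no-feedback", dest="use_feedback", action="store_false")
    return parser.parse_args(argv)
```

A missing feedback log is treated as "no corrections yet" rather than an error, so a fresh checkout can still retrain.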
| File / Folder | What It Is |
|---|---|
| launch-gradio.command | Double-click to launch the Gradio web app |
| launch-notebook.command | Double-click to open the student notebook in Jupyter |
| retrain-fast.command | Double-click to retrain in fast mode (~2–5 min) |
| retrain-full.command | Double-click to retrain in full mode (~15–30 min) |
| app.py | Gradio web app (Result, LIME, SHAP tabs) |
| retrain.py | Retrain script — fast or full mode |
| train_ensemble.py | Lower-level ensemble training script |
| utils.py | Text preprocessing + 24 metadata features (shared by app and notebooks) |
| notebooks/spam_classifier_xai_student.ipynb | Main course deliverable — full XAI analysis (LIME, SHAP, ELI5) |
| notebooks/spam_classifier_gradio.ipynb | Ensemble training pipeline and Gradio deployment notebook |
| models/voting_model.joblib | Trained VotingClassifier ensemble (RF + LR + SVM) |
| models/tfidf_vectorizer.joblib | Fitted TF-IDF vectorizer |
| data/spam_Emails_data.csv | Kaggle spam email dataset |
| data/email-dataset-main/ | GitHub email dataset (spam/ and ham/ folders) |
| data/feedback/feedback_log.csv | User corrections collected from the Gradio app |
| requirements.txt | Python package dependencies |
| Problem | Fix |
|---|---|
| ModuleNotFoundError | Run pip install -r requirements.txt to install all dependencies. |
| App says "Model files not found" | The model hasn't been trained yet. Run retrain-fast.command or retrain-full.command first. |
| Browser doesn't open automatically | Manually go to http://127.0.0.1:7860 in your browser. |
| permission denied on .command files | Run once in terminal: chmod +x *.command |
| Notebook kernel not found | Run: python3.11 -m ipykernel install --user --name spam-xai |
| NLTK data missing | Run: python3.11 -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')" |
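For the ModuleNotFoundError case, a quick way to see which packages are missing before launching anything is sketched below. The module list is a guess based on the tools this guide mentions, not a copy of requirements.txt, and `find_missing` is a hypothetical helper name.

```python
import importlib.util

# Import names guessed from the tools this guide mentions; the actual
# contents of requirements.txt may differ.
EXPECTED_MODULES = ("gradio", "sklearn", "joblib", "nltk", "lime", "shap", "eli5")


def find_missing(modules=EXPECTED_MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules are importable.")
```

Run it with the same interpreter that launches the app and the notebook, since packages installed for a different Python version will not be found.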