<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>How to Run: Spam Classifier XAI Project</title>
<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
body { font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; background: #f5f7fa; color: #333; line-height: 1.6; padding: 40px 20px; }
.container { max-width: 800px; margin: 0 auto; }
h1 { font-size: 28px; color: #1a1a2e; margin-bottom: 8px; }
.subtitle { color: #666; font-size: 14px; margin-bottom: 32px; }
.card { background: #fff; border-radius: 12px; padding: 28px; margin-bottom: 20px; box-shadow: 0 2px 8px rgba(0,0,0,0.08); }
.card h2 { font-size: 20px; color: #1a1a2e; margin-bottom: 4px; display: flex; align-items: center; gap: 10px; }
.card h2 .num { background: #4361ee; color: #fff; width: 32px; height: 32px; border-radius: 50%; display: inline-flex; align-items: center; justify-content: center; font-size: 16px; flex-shrink: 0; }
.card p { margin: 10px 0; color: #555; }
.card ul, .card ol { margin: 10px 0 10px 20px; color: #555; }
.card ul li, .card ol li { margin-bottom: 6px; }
code { background: #e8edf3; padding: 2px 8px; border-radius: 4px; font-family: 'Consolas', 'Courier New', monospace; font-size: 14px; }
.cmd-block { background: #1a1a2e; color: #e0e0e0; padding: 16px 20px; border-radius: 8px; margin: 12px 0; font-family: 'Consolas', 'Courier New', monospace; font-size: 14px; overflow-x: auto; position: relative; white-space: pre-wrap; word-break: break-all; }
.cmd-block .prompt { color: #4361ee; }
.cmd-block .comment { color: #6c757d; }
.tag { display: inline-block; padding: 3px 10px; border-radius: 20px; font-size: 12px; font-weight: 600; margin-left: 8px; }
.tag-easy { background: #d4edda; color: #155724; }
.tag-manual { background: #fff3cd; color: #856404; }
.tag-slow { background: #fde8e8; color: #7b1c1c; }
.divider { border: none; border-top: 1px solid #e0e0e0; margin: 24px 0; }
.note { background: #fff8e1; border-left: 4px solid #ffc107; padding: 14px 18px; border-radius: 0 8px 8px 0; margin: 16px 0; font-size: 14px; }
.note strong { color: #856404; }
.success { background: #e8f5e9; border-left: 4px solid #4caf50; padding: 14px 18px; border-radius: 0 8px 8px 0; margin: 16px 0; font-size: 14px; }
.success strong { color: #2e7d32; }
a { color: #4361ee; text-decoration: none; }
a:hover { text-decoration: underline; }
.prereq-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 10px; margin: 12px 0; }
.prereq-item { background: #f8f9fa; padding: 10px 14px; border-radius: 8px; font-size: 14px; }
.prereq-item .check { color: #4caf50; font-weight: bold; margin-right: 6px; }
table { width: 100%; border-collapse: collapse; margin-top: 12px; font-size: 14px; }
th { text-align: left; padding: 8px; border-bottom: 2px solid #e0e0e0; }
td { padding: 8px; border-bottom: 1px solid #f0f0f0; }
</style>
</head>
<body>
<div class="container">
<h1>Spam Classifier with Explainable AI</h1>
<p class="subtitle">ENGT 375: Applied Machine Learning | Spring 2026 | ODU</p>
<!-- Prerequisites -->
<div class="card">
<h2>Prerequisites</h2>
<p>These should already be installed from the project setup:</p>
<div class="prereq-grid">
<div class="prereq-item"><span class="check">&#10003;</span> Python 3.11 (via venv or system)</div>
<div class="prereq-item"><span class="check">&#10003;</span> scikit-learn, pandas, numpy</div>
<div class="prereq-item"><span class="check">&#10003;</span> lime, shap, eli5, gradio</div>
<div class="prereq-item"><span class="check">&#10003;</span> nltk, joblib, wordcloud</div>
</div>
<div class="note">
<strong>Missing packages?</strong> Open a terminal in the project folder and run:<br>
<code>python3.11 -m pip install -r requirements.txt</code>
</div>
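<p>To see which of the listed packages are actually visible to the interpreter, a small check like the following can help. It is a minimal sketch rather than part of the project: the helper name <code>find_missing</code> and the list <code>REQUIRED_MODULES</code> are illustrative, and the import names are assumed from the package list above (scikit-learn is imported as <code>sklearn</code>).</p>
<div class="cmd-block">
```python
import importlib.util

# Import names assumed from the prerequisites list; scikit-learn imports as "sklearn".
REQUIRED_MODULES = [
    "sklearn", "pandas", "numpy", "lime", "shap",
    "eli5", "gradio", "nltk", "joblib", "wordcloud",
]


def find_missing(modules):
    """Return the module names that cannot be located in this interpreter."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```
</div>
<p>Run it with the same interpreter you use for the project (for example <code>python3.11</code>), since each interpreter or venv has its own set of installed packages.</p>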
</div>
<!-- Step 1: Launch Gradio App -->
<div class="card">
<h2><span class="num">1</span> Launch the Gradio Web App <span class="tag tag-easy">Easiest</span></h2>
<p>The interactive Gradio app lets you paste any email and see spam/ham predictions with LIME and SHAP explanations. The ensemble model (<code>models/voting_model.joblib</code>) must already be trained; see Step 3 if it's missing.</p>
<p><strong>Option A (double-click in Finder):</strong></p>
<ol>
<li>In Finder, navigate to the project folder</li>
<li>Double-click <strong>launch-gradio.command</strong></li>
<li>A terminal window opens and the app starts</li>
<li>Your browser opens automatically at <code>http://127.0.0.1:7860</code></li>
<li>To stop: press <strong>Ctrl+C</strong> in the terminal, or close the window</li>
</ol>
<p><strong>Option B (from terminal):</strong></p>
<div class="cmd-block"><span class="prompt">$</span> cd /path/to/spam-xai-project
<span class="prompt">$</span> python3.11 app.py</div>
<div class="success">
<strong>You'll see:</strong> A browser tab opens at <code>http://127.0.0.1:7860</code> with the Spam Classifier interface showing Result, LIME, and SHAP tabs.
</div>
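<p>Because the app depends on trained model files, it can be useful to confirm they exist before launching. The sketch below assumes the two file names listed in the Project Files Reference table and that both live under <code>models/</code>; the helper <code>missing_model_files</code> is hypothetical and is not part of <code>app.py</code>.</p>
<div class="cmd-block">
```python
from pathlib import Path

# File names taken from the Project Files Reference table.
MODEL_FILES = ["voting_model.joblib", "tfidf_vectorizer.joblib"]


def missing_model_files(project_root):
    """Return the expected model files that are absent under project_root/models."""
    models_dir = Path(project_root) / "models"
    return [name for name in MODEL_FILES if not (models_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files(".")
    if missing:
        print("Train first (Step 3); missing:", ", ".join(missing))
    else:
        print("Model files found; run app.py to start the app.")
```
</div>
<p>Run it from the project folder. An empty result means both files are present; it does not verify that they load correctly.</p>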
</div>
<!-- Step 2: Open the Notebook -->
<div class="card">
<h2><span class="num">2</span> Open the Student Notebook <span class="tag tag-manual">Full Analysis</span></h2>
<p>The student notebook (<code>spam_classifier_xai_student.ipynb</code>) is the main course deliverable. It contains the full XAI walkthrough: data loading, model training, and LIME / SHAP / ELI5 comparisons using the Kuzlu et al. 2020 methodology.</p>
<p><strong>Option A (double-click in Finder):</strong></p>
<ol>
<li>Double-click <strong>launch-notebook.command</strong></li>
<li>Jupyter opens in your browser</li>
<li>The student notebook opens automatically</li>
</ol>
<p><strong>Option B (from terminal):</strong></p>
<div class="cmd-block"><span class="prompt">$</span> cd /path/to/spam-xai-project
<span class="prompt">$</span> python3.11 -m jupyter notebook notebooks/spam_classifier_xai_student.ipynb</div>
<div class="note">
<strong>Re-running the notebook:</strong> Full execution takes ~5–15 minutes (SHAP computation is the slow part). You can run individual cells with <code>Shift+Enter</code>, or run everything via <em>Kernel &gt; Restart &amp; Run All</em>.
</div>
<div class="note">
<strong>Note:</strong> Retraining the model (Step 3) does NOT automatically re-run the notebook. After retraining, re-run the notebook manually to get fresh outputs.
</div>
</div>
<!-- Step 3: Retrain -->
<div class="card">
<h2><span class="num">3</span> Retrain the Model</h2>
<p>Two retrain modes are available. Both save new model files to <code>models/</code> and automatically back up the previous models.</p>
<p><strong>Fast mode</strong>: single Random Forest, 1000 TF-IDF features, no grid search. Use this to quickly verify that the pipeline works after a small change.</p>
<ol>
<li>Double-click <strong>retrain-fast.command</strong></li>
<li>Takes ~2–5 minutes</li>
</ol>
<p><strong>Full mode</strong> <span class="tag tag-slow">~15–30 min</span>: VotingClassifier ensemble (RF + LR + SVM), 3000 TF-IDF features. This is the production model.</p>
<ol>
<li>Double-click <strong>retrain-full.command</strong></li>
<li>Takes ~15–30 minutes</li>
</ol>
<p><strong>From terminal:</strong></p>
<div class="cmd-block"><span class="prompt">$</span> python3.11 retrain.py --mode fast <span class="comment"># quick smoke-test</span>
<span class="prompt">$</span> python3.11 retrain.py --mode full <span class="comment"># production model</span>
<span class="prompt">$</span> python3.11 retrain.py --mode full --no-feedback <span class="comment"># ignore user feedback</span></div>
<div class="note">
<strong>User feedback:</strong> By default, retrain merges corrections from <code>data/feedback/feedback_log.csv</code> into training data. Pass <code>--no-feedback</code> to skip this.
</div>
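<p>The automatic backup mentioned above can be pictured roughly as follows. This is only a sketch under stated assumptions, not the code in <code>retrain.py</code>: the function <code>backup_models</code>, the <code>backup_</code> folder prefix, and the timestamp format are all hypothetical, and the real script may store backups elsewhere.</p>
<div class="cmd-block">
```python
import shutil
import time
from pathlib import Path


def backup_models(models_dir, stamp=None):
    """Copy every .joblib file in models_dir into a timestamped subfolder.

    Returns the backup folder, or None when there is nothing to back up.
    The "backup_YYYYmmdd_HHMMSS" naming is hypothetical.
    """
    models_dir = Path(models_dir)
    files = sorted(models_dir.glob("*.joblib"))
    if not files:
        return None
    stamp = stamp or time.strftime("%Y%m%d_%H%M%S")
    target = models_dir / ("backup_" + stamp)
    target.mkdir(parents=True, exist_ok=True)
    for path in files:
        shutil.copy2(path, target / path.name)  # copy2 keeps file timestamps
    return target
```
</div>
<p>Calling <code>backup_models("models")</code> before a manual experiment keeps a copy of the current model files, so a failed retrain can be undone by copying them back.</p>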
</div>
<!-- Project structure -->
<div class="card">
<h2><span class="num">4</span> Project Files Reference</h2>
<table>
<tr><th>File / Folder</th><th>What It Is</th></tr>
<tr><td><code>launch-gradio.command</code></td><td>Double-click to launch the Gradio web app</td></tr>
<tr><td><code>launch-notebook.command</code></td><td>Double-click to open the student notebook in Jupyter</td></tr>
<tr><td><code>retrain-fast.command</code></td><td>Double-click to retrain in fast mode (~2–5 min)</td></tr>
<tr><td><code>retrain-full.command</code></td><td>Double-click to retrain in full mode (~15–30 min)</td></tr>
<tr><td><code>app.py</code></td><td>Gradio web app (Result, LIME, SHAP tabs)</td></tr>
<tr><td><code>retrain.py</code></td><td>Retrain script (fast or full mode)</td></tr>
<tr><td><code>train_ensemble.py</code></td><td>Lower-level ensemble training script</td></tr>
<tr><td><code>utils.py</code></td><td>Text preprocessing + 24 metadata features (shared by app and notebooks)</td></tr>
<tr><td><code>notebooks/spam_classifier_xai_student.ipynb</code></td><td>Main course deliverable: full XAI analysis (LIME, SHAP, ELI5)</td></tr>
<tr><td><code>notebooks/spam_classifier_gradio.ipynb</code></td><td>Ensemble training pipeline and Gradio deployment notebook</td></tr>
<tr><td><code>models/voting_model.joblib</code></td><td>Trained VotingClassifier ensemble (RF + LR + SVM)</td></tr>
<tr><td><code>models/tfidf_vectorizer.joblib</code></td><td>Fitted TF-IDF vectorizer</td></tr>
<tr><td><code>data/spam_Emails_data.csv</code></td><td>Kaggle spam email dataset</td></tr>
<tr><td><code>data/email-dataset-main/</code></td><td>GitHub email dataset (spam/ and ham/ folders)</td></tr>
<tr><td><code>data/feedback/feedback_log.csv</code></td><td>User corrections collected from the Gradio app</td></tr>
<tr><td><code>requirements.txt</code></td><td>Python package dependencies</td></tr>
</table>
</div>
<!-- Troubleshooting -->
<div class="card">
<h2><span class="num">5</span> Troubleshooting</h2>
<table>
<tr><th>Problem</th><th>Fix</th></tr>
<tr><td><code>ModuleNotFoundError</code></td><td>Run <code>python3.11 -m pip install -r requirements.txt</code> so the dependencies are installed into the same interpreter that runs the project.</td></tr>
<tr><td>App says "Model files not found"</td><td>The model hasn't been trained yet. Run <strong>retrain-fast.command</strong> or <strong>retrain-full.command</strong> first.</td></tr>
<tr><td>Browser doesn't open automatically</td><td>Manually go to <a href="http://127.0.0.1:7860">http://127.0.0.1:7860</a> in your browser.</td></tr>
<tr><td><code>permission denied</code> on .command files</td><td>Run once in terminal: <code>chmod +x *.command</code></td></tr>
<tr><td>Notebook kernel not found</td><td>Run: <code>python3.11 -m ipykernel install --user --name spam-xai</code></td></tr>
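<tr><td>NLTK data missing</td><td>Run: <code>python3.11 -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"</code>. If the error mentions <code>punkt_tab</code> (newer NLTK releases), also run <code>nltk.download('punkt_tab')</code>.</td></tr>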
</table>
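<p>When the browser does not open, it helps to know whether the app is running at all. The following sketch tests whether anything accepts TCP connections on the address used in this guide (<code>127.0.0.1:7860</code>); the helper name <code>port_is_open</code> is illustrative, and the check only shows that some process is listening, not that it is this app.</p>
<div class="cmd-block">
```python
import socket


def port_is_open(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_is_open():
        print("App is listening; open http://127.0.0.1:7860 manually.")
    else:
        print("Nothing is listening on port 7860; check the terminal for errors.")
```
</div>
<p>If the port is closed, the launch most likely failed before the server started, so the terminal output from <code>app.py</code> is the next place to look.</p>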
</div>
</div>
</body>
</html>