<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>How to Run: Spam Classifier XAI Project</title>
<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
body { font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; background: #f5f7fa; color: #333; line-height: 1.6; padding: 40px 20px; }
.container { max-width: 800px; margin: 0 auto; }
h1 { font-size: 28px; color: #1a1a2e; margin-bottom: 8px; }
.subtitle { color: #666; font-size: 14px; margin-bottom: 32px; }
.card { background: #fff; border-radius: 12px; padding: 28px; margin-bottom: 20px; box-shadow: 0 2px 8px rgba(0,0,0,0.08); }
.card h2 { font-size: 20px; color: #1a1a2e; margin-bottom: 4px; display: flex; align-items: center; gap: 10px; }
.card h2 .num { background: #4361ee; color: #fff; width: 32px; height: 32px; border-radius: 50%; display: inline-flex; align-items: center; justify-content: center; font-size: 16px; flex-shrink: 0; }
.card p { margin: 10px 0; color: #555; }
.card ul, .card ol { margin: 10px 0 10px 20px; color: #555; }
.card ul li, .card ol li { margin-bottom: 6px; }
code { background: #e8edf3; padding: 2px 8px; border-radius: 4px; font-family: 'Consolas', 'Courier New', monospace; font-size: 14px; }
.cmd-block { background: #1a1a2e; color: #e0e0e0; padding: 16px 20px; border-radius: 8px; margin: 12px 0; font-family: 'Consolas', 'Courier New', monospace; font-size: 14px; overflow-x: auto; position: relative; white-space: pre-wrap; word-break: break-all; }
.cmd-block .prompt { color: #4361ee; }
.cmd-block .comment { color: #6c757d; }
.tag { display: inline-block; padding: 3px 10px; border-radius: 20px; font-size: 12px; font-weight: 600; margin-left: 8px; }
.tag-easy { background: #d4edda; color: #155724; }
.tag-manual { background: #fff3cd; color: #856404; }
.tag-slow { background: #fde8e8; color: #7b1c1c; }
.divider { border: none; border-top: 1px solid #e0e0e0; margin: 24px 0; }
.note { background: #fff8e1; border-left: 4px solid #ffc107; padding: 14px 18px; border-radius: 0 8px 8px 0; margin: 16px 0; font-size: 14px; }
.note strong { color: #856404; }
.success { background: #e8f5e9; border-left: 4px solid #4caf50; padding: 14px 18px; border-radius: 0 8px 8px 0; margin: 16px 0; font-size: 14px; }
.success strong { color: #2e7d32; }
a { color: #4361ee; text-decoration: none; }
a:hover { text-decoration: underline; }
.prereq-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 10px; margin: 12px 0; }
.prereq-item { background: #f8f9fa; padding: 10px 14px; border-radius: 8px; font-size: 14px; }
.prereq-item .check { color: #4caf50; font-weight: bold; margin-right: 6px; }
table { width: 100%; border-collapse: collapse; margin-top: 12px; font-size: 14px; }
th { text-align: left; padding: 8px; border-bottom: 2px solid #e0e0e0; }
td { padding: 8px; border-bottom: 1px solid #f0f0f0; }
</style>
</head>
<body>
<div class="container">
<h1>Spam Classifier with Explainable AI</h1>
<p class="subtitle">ENGT 375: Applied Machine Learning | Spring 2026 | ODU</p>
<!-- Prerequisites -->
<div class="card">
<h2>Prerequisites</h2>
<p>These should already be installed from the project setup:</p>
<div class="prereq-grid">
<div class="prereq-item"><span class="check">✓</span> Python 3.11 (via venv or system)</div>
<div class="prereq-item"><span class="check">✓</span> scikit-learn, pandas, numpy</div>
<div class="prereq-item"><span class="check">✓</span> lime, shap, eli5, gradio</div>
<div class="prereq-item"><span class="check">✓</span> nltk, joblib, wordcloud</div>
</div>
<div class="note">
<strong>Missing packages?</strong> Open a terminal in the project folder and run:<br>
<code>pip install -r requirements.txt</code>
</div>
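<p>To see which of the packages listed above are missing before reinstalling everything, a small check like the following may help. It is only a sketch: the import names are assumed from the package list (scikit-learn imports as <code>sklearn</code>), and <code>missing_packages</code> is a hypothetical helper, not part of the project.</p>
<pre class="cmd-block">
```python
import importlib.util

# Import names assumed from the prerequisites list above.
# scikit-learn is imported as "sklearn"; the rest match their package names.
REQUIRED = ["sklearn", "pandas", "numpy", "lime", "shap", "eli5",
            "gradio", "nltk", "joblib", "wordcloud"]


def missing_packages(names):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```
</pre>
<p>Run it with the same interpreter you use for the project (for example <code>python3.11</code>), since packages installed for one Python version are not visible to another.</p>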
</div>
<!-- Step 1: Launch Gradio App -->
<div class="card">
<h2><span class="num">1</span> Launch the Gradio Web App <span class="tag tag-easy">Easiest</span></h2>
<p>The interactive Gradio app lets you paste any email and see spam/ham predictions with LIME and SHAP explanations. The ensemble model (<code>models/voting_model.joblib</code>) must already be trained; see Step 3 if it's missing.</p>
<p><strong>Option A (double-click in Finder):</strong></p>
<ol>
<li>In Finder, navigate to the project folder</li>
<li>Double-click <strong>launch-gradio.command</strong></li>
<li>A terminal window opens and the app starts</li>
<li>Your browser opens automatically at <code>http://127.0.0.1:7860</code></li>
<li>To stop: press <strong>Ctrl+C</strong> in the terminal, or close the window</li>
</ol>
<p><strong>Option B (from terminal):</strong></p>
<div class="cmd-block"><span class="prompt">$</span> cd /path/to/spam-xai-project
<span class="prompt">$</span> python3.11 app.py</div>
<div class="success">
<strong>You'll see:</strong> A browser tab opens at <code>http://127.0.0.1:7860</code> with the Spam Classifier interface showing Result, LIME, and SHAP tabs.
</div>
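<p>If the app does not start, it can be worth confirming that the trained model files are in place first. The sketch below checks for the two model files named in the Project Files Reference. Whether <code>app.py</code> needs both files at startup is an assumption, and <code>missing_model_files</code> is a hypothetical helper.</p>
<pre class="cmd-block">
```python
from pathlib import Path

# Paths taken from the Project Files Reference; that the app needs both
# of them at startup is an assumption.
MODEL_FILES = ["models/voting_model.joblib", "models/tfidf_vectorizer.joblib"]


def missing_model_files(project_root):
    """Return the model files (relative paths) not present under project_root."""
    root = Path(project_root)
    return [rel for rel in MODEL_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_model_files(".")
    if missing:
        print("Not found:", ", ".join(missing), "(retrain first, see Step 3)")
    else:
        print("Model files found.")
```
</pre>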
</div>
<!-- Step 2: Open the Notebook -->
<div class="card">
<h2><span class="num">2</span> Open the Student Notebook <span class="tag tag-manual">Full Analysis</span></h2>
<p>The student notebook (<code>spam_classifier_xai_student.ipynb</code>) is the main course deliverable. It contains the full XAI walkthrough: data loading, model training, and LIME / SHAP / ELI5 comparisons using the Kuzlu et al. 2020 methodology.</p>
<p><strong>Option A (double-click in Finder):</strong></p>
<ol>
<li>Double-click <strong>launch-notebook.command</strong></li>
<li>Jupyter opens in your browser</li>
<li>The student notebook opens automatically</li>
</ol>
<p><strong>Option B (from terminal):</strong></p>
<div class="cmd-block"><span class="prompt">$</span> cd /path/to/spam-xai-project
<span class="prompt">$</span> python3.11 -m jupyter notebook notebooks/spam_classifier_xai_student.ipynb</div>
<div class="note">
<strong>Re-running the notebook:</strong> Full execution takes ~5-15 minutes (SHAP computation is the slow part). You can run individual cells with <code>Shift+Enter</code>, or run all via <em>Kernel &gt; Restart &amp; Run All</em>.
</div>
<div class="note">
<strong>Note:</strong> Retraining the model (Step 3) does NOT automatically re-run the notebook. After retraining, re-run the notebook manually to get fresh outputs.
</div>
</div>
<!-- Step 3: Retrain -->
<div class="card">
<h2><span class="num">3</span> Retrain the Model</h2>
<p>Two retrain modes are available. Both save new model files to <code>models/</code> and automatically back up the previous models.</p>
<p><strong>Fast mode:</strong> single Random Forest, 1000 TF-IDF features, no grid search. Use this to quickly verify the pipeline works after a small change.</p>
<ol>
<li>Double-click <strong>retrain-fast.command</strong></li>
<li>Takes ~2-5 minutes</li>
</ol>
<p><strong>Full mode</strong> <span class="tag tag-slow">~15-30 min</span>: VotingClassifier ensemble (RF + LR + SVM), 3000 TF-IDF features. This is the production model.</p>
<ol>
<li>Double-click <strong>retrain-full.command</strong></li>
<li>Takes ~15-30 minutes</li>
</ol>
</ol>
<p><strong>From terminal:</strong></p>
<div class="cmd-block"><span class="prompt">$</span> python3.11 retrain.py --mode fast <span class="comment"># quick smoke-test</span>
<span class="prompt">$</span> python3.11 retrain.py --mode full <span class="comment"># production model</span>
<span class="prompt">$</span> python3.11 retrain.py --mode full --no-feedback <span class="comment"># ignore user feedback</span></div>
<div class="note">
<strong>User feedback:</strong> By default, retrain merges corrections from <code>data/feedback/feedback_log.csv</code> into training data. Pass <code>--no-feedback</code> to skip this.
</div>
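<p>The feedback merge described above can be pictured roughly as follows. This is an illustrative sketch, not the code in <code>retrain.py</code>: the column names <code>text</code> and <code>label</code>, the helper names, and the rule that a correction overrides the original label for the same text are all assumptions.</p>
<pre class="cmd-block">
```python
import csv
from pathlib import Path


def load_rows(csv_path, text_col="text", label_col="label"):
    """Read (text, label) pairs from a CSV; a missing file yields an empty list.

    The column names are hypothetical; adjust them to the real file layout.
    """
    path = Path(csv_path)
    if not path.is_file():
        return []
    with path.open(newline="", encoding="utf-8") as handle:
        return [(row[text_col], row[label_col]) for row in csv.DictReader(handle)]


def merge_feedback(training_rows, feedback_rows, use_feedback=True):
    """Combine training data with user corrections.

    Assumed rule: a correction replaces the label of an identical text and
    new texts are appended. Duplicate texts in the training data collapse
    to one entry as a side effect of the dict.
    """
    if not use_feedback:  # mirrors the idea behind --no-feedback
        return list(training_rows)
    merged = dict(training_rows)
    merged.update(feedback_rows)
    return list(merged.items())
```
</pre>
<p>For example, <code>merge_feedback(load_rows("data/spam_Emails_data.csv"), load_rows("data/feedback/feedback_log.csv"))</code> would return the combined rows, provided both files use the assumed column names.</p>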
</div>
<!-- Project structure -->
<div class="card">
<h2><span class="num">4</span> Project Files Reference</h2>
<table>
<tr><th>File / Folder</th><th>What It Is</th></tr>
<tr><td><code>launch-gradio.command</code></td><td>Double-click to launch the Gradio web app</td></tr>
<tr><td><code>launch-notebook.command</code></td><td>Double-click to open the student notebook in Jupyter</td></tr>
<tr><td><code>retrain-fast.command</code></td><td>Double-click to retrain in fast mode (~2-5 min)</td></tr>
<tr><td><code>retrain-full.command</code></td><td>Double-click to retrain in full mode (~15-30 min)</td></tr>
<tr><td><code>app.py</code></td><td>Gradio web app (Result, LIME, SHAP tabs)</td></tr>
<tr><td><code>retrain.py</code></td><td>Retrain script (fast or full mode)</td></tr>
<tr><td><code>train_ensemble.py</code></td><td>Lower-level ensemble training script</td></tr>
<tr><td><code>utils.py</code></td><td>Text preprocessing + 24 metadata features (shared by app and notebooks)</td></tr>
<tr><td><code>notebooks/spam_classifier_xai_student.ipynb</code></td><td>Main course deliverable: full XAI analysis (LIME, SHAP, ELI5)</td></tr>
<tr><td><code>notebooks/spam_classifier_gradio.ipynb</code></td><td>Ensemble training pipeline and Gradio deployment notebook</td></tr>
<tr><td><code>models/voting_model.joblib</code></td><td>Trained VotingClassifier ensemble (RF + LR + SVM)</td></tr>
<tr><td><code>models/tfidf_vectorizer.joblib</code></td><td>Fitted TF-IDF vectorizer</td></tr>
<tr><td><code>data/spam_Emails_data.csv</code></td><td>Kaggle spam email dataset</td></tr>
<tr><td><code>data/email-dataset-main/</code></td><td>GitHub email dataset (spam/ and ham/ folders)</td></tr>
<tr><td><code>data/feedback/feedback_log.csv</code></td><td>User corrections collected from the Gradio app</td></tr>
<tr><td><code>requirements.txt</code></td><td>Python package dependencies</td></tr>
</table>
</div>
<!-- Troubleshooting -->
<div class="card">
<h2><span class="num">5</span> Troubleshooting</h2>
<table>
<tr><th>Problem</th><th>Fix</th></tr>
<tr><td><code>ModuleNotFoundError</code></td><td>Run <code>pip install -r requirements.txt</code> to install all dependencies.</td></tr>
<tr><td>App says "Model files not found"</td><td>The model hasn't been trained yet. Run <strong>retrain-fast.command</strong> or <strong>retrain-full.command</strong> first.</td></tr>
<tr><td>Browser doesn't open automatically</td><td>Manually go to <a href="http://127.0.0.1:7860">http://127.0.0.1:7860</a> in your browser.</td></tr>
<tr><td><code>permission denied</code> on .command files</td><td>Run once in terminal: <code>chmod +x *.command</code></td></tr>
<tr><td>Notebook kernel not found</td><td>Run: <code>python3.11 -m ipykernel install --user --name spam-xai</code></td></tr>
<tr><td>NLTK data missing</td><td>Run: <code>python3.11 -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"</code></td></tr>
</table>
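<p>When the browser does not open and the page at <code>http://127.0.0.1:7860</code> will not load, a quick way to tell whether the app is listening at all is to try a TCP connection to that port. The sketch below uses only the standard library; <code>port_is_open</code> is a hypothetical helper, and the host and port are the ones shown in Step 1.</p>
<pre class="cmd-block">
```python
import socket


def port_is_open(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on 127.0.0.1:7860.")
    else:
        print("Nothing is listening on 127.0.0.1:7860; is the app running?")
```
</pre>
<p>A <code>True</code> result only shows that some process is listening on the port, not that it is this app.</p>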
</div>
</div>
</body>
</html>