<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>How to Run: Spam Classifier XAI Project</title>
    <style>
        * { margin: 0; padding: 0; box-sizing: border-box; }
        body { font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; background: #f5f7fa; color: #333; line-height: 1.6; padding: 40px 20px; }
        .container { max-width: 800px; margin: 0 auto; }
        h1 { font-size: 28px; color: #1a1a2e; margin-bottom: 8px; }
        .subtitle { color: #666; font-size: 14px; margin-bottom: 32px; }
        .card { background: #fff; border-radius: 12px; padding: 28px; margin-bottom: 20px; box-shadow: 0 2px 8px rgba(0,0,0,0.08); }
        .card h2 { font-size: 20px; color: #1a1a2e; margin-bottom: 4px; display: flex; align-items: center; gap: 10px; }
        .card h2 .num { background: #4361ee; color: #fff; width: 32px; height: 32px; border-radius: 50%; display: inline-flex; align-items: center; justify-content: center; font-size: 16px; flex-shrink: 0; }
        .card p { margin: 10px 0; color: #555; }
        .card ul, .card ol { margin: 10px 0 10px 20px; color: #555; }
        .card ul li, .card ol li { margin-bottom: 6px; }
        code { background: #e8edf3; padding: 2px 8px; border-radius: 4px; font-family: 'Consolas', 'Courier New', monospace; font-size: 14px; }
        .cmd-block { background: #1a1a2e; color: #e0e0e0; padding: 16px 20px; border-radius: 8px; margin: 12px 0; font-family: 'Consolas', 'Courier New', monospace; font-size: 14px; overflow-x: auto; position: relative; white-space: pre-wrap; word-break: break-all; }
        .cmd-block .prompt { color: #4361ee; }
        .cmd-block .comment { color: #6c757d; }
        .tag { display: inline-block; padding: 3px 10px; border-radius: 20px; font-size: 12px; font-weight: 600; margin-left: 8px; }
        .tag-easy { background: #d4edda; color: #155724; }
        .tag-manual { background: #fff3cd; color: #856404; }
        .tag-slow { background: #fde8e8; color: #7b1c1c; }
        .divider { border: none; border-top: 1px solid #e0e0e0; margin: 24px 0; }
        .note { background: #fff8e1; border-left: 4px solid #ffc107; padding: 14px 18px; border-radius: 0 8px 8px 0; margin: 16px 0; font-size: 14px; }
        .note strong { color: #856404; }
        .success { background: #e8f5e9; border-left: 4px solid #4caf50; padding: 14px 18px; border-radius: 0 8px 8px 0; margin: 16px 0; font-size: 14px; }
        .success strong { color: #2e7d32; }
        a { color: #4361ee; text-decoration: none; }
        a:hover { text-decoration: underline; }
        .prereq-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 10px; margin: 12px 0; }
        .prereq-item { background: #f8f9fa; padding: 10px 14px; border-radius: 8px; font-size: 14px; }
        .prereq-item .check { color: #4caf50; font-weight: bold; margin-right: 6px; }
        table { width: 100%; border-collapse: collapse; margin-top: 12px; font-size: 14px; }
        th { text-align: left; padding: 8px; border-bottom: 2px solid #e0e0e0; }
        td { padding: 8px; border-bottom: 1px solid #f0f0f0; }
    </style>
</head>
<body>
    <div class="container">
        <h1>Spam Classifier with Explainable AI</h1>
        <p class="subtitle">ENGT 375: Applied Machine Learning | Spring 2026 | ODU</p>

        <!-- Prerequisites -->
        <div class="card">
            <h2>Prerequisites</h2>
            <p>These should already be installed from the project setup:</p>
            <div class="prereq-grid">
                <div class="prereq-item"><span class="check">&#10003;</span> Python 3.11 (via venv or system)</div>
                <div class="prereq-item"><span class="check">&#10003;</span> scikit-learn, pandas, numpy</div>
                <div class="prereq-item"><span class="check">&#10003;</span> lime, shap, eli5, gradio</div>
                <div class="prereq-item"><span class="check">&#10003;</span> nltk, joblib, wordcloud</div>
            </div>
            <div class="note">
                <strong>Missing packages?</strong> Open a terminal in the project folder and run:<br>
                <code>pip install -r requirements.txt</code>
            </div>
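            <p>To see which packages are missing before reinstalling, a small check like the one below may help. It is a minimal sketch, not part of the project: the helper name <code>find_missing</code> is hypothetical, and the module list simply mirrors the packages named above (scikit-learn is imported as <code>sklearn</code>).</p>
<pre class="cmd-block">
```python
import importlib.util

# Import names for the packages listed above. Note that scikit-learn
# is imported as "sklearn"; the others import under their own names.
REQUIRED_MODULES = [
    "sklearn", "pandas", "numpy", "lime", "shap", "eli5",
    "gradio", "nltk", "joblib", "wordcloud",
]


def find_missing(modules):
    """Return the module names that cannot be located by this interpreter."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
    print("Try: pip install -r requirements.txt")
else:
    print("All required modules were found.")
```
</pre>
            <p>Run it with the same interpreter you use for the project (for example <code>python3.11</code>), since each interpreter has its own set of installed packages.</p>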
        </div>

        <!-- Step 1: Launch Gradio App -->
        <div class="card">
            <h2><span class="num">1</span> Launch the Gradio Web App <span class="tag tag-easy">Easiest</span></h2>
            <p>The interactive Gradio app lets you paste any email and see spam/ham predictions with LIME and SHAP explanations. The ensemble model (<code>models/voting_model.joblib</code>) must already be trained; see Step 3 if it is missing.</p>

            <p><strong>Option A: double-click in Finder</strong></p>
            <ol>
                <li>In Finder, navigate to the project folder</li>
                <li>Double-click <strong>launch-gradio.command</strong></li>
                <li>A terminal window opens and the app starts</li>
                <li>Your browser opens automatically at <code>http://127.0.0.1:7860</code></li>
                <li>To stop: press <strong>Ctrl+C</strong> in the terminal, or close the window</li>
            </ol>

            <p><strong>Option B: from the terminal</strong></p>
            <div class="cmd-block"><span class="prompt">$</span> cd /path/to/spam-xai-project
<span class="prompt">$</span> python3.11 app.py</div>

            <div class="success">
                <strong>You'll see:</strong> A browser tab opens at <code>http://127.0.0.1:7860</code> with the Spam Classifier interface showing Result, LIME, and SHAP tabs.
            </div>
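            <p>Because the app depends on trained model files, it can be useful to confirm they exist before launching. The sketch below assumes the two file names from the project files table and a <code>models/</code> folder relative to the current directory; <code>missing_model_files</code> is a hypothetical helper, not a function from <code>app.py</code>.</p>
<pre class="cmd-block">
```python
from pathlib import Path

# File names taken from the project files table; the check itself is a
# hypothetical helper, not code from app.py.
MODEL_FILES = ["voting_model.joblib", "tfidf_vectorizer.joblib"]


def missing_model_files(models_dir):
    """Return the expected model files that are absent from models_dir."""
    folder = Path(models_dir)
    return [name for name in MODEL_FILES if not (folder / name).is_file()]


absent = missing_model_files("models")
if absent:
    print("Model files not found:", ", ".join(absent))
    print("Run retrain.py first (see Step 3).")
else:
    print("Model files are present; the app should be able to load them.")
```
</pre>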
        </div>

        <!-- Step 2: Open the Notebook -->
        <div class="card">
            <h2><span class="num">2</span> Open the Student Notebook <span class="tag tag-manual">Full Analysis</span></h2>
            <p>The student notebook (<code>spam_classifier_xai_student.ipynb</code>) is the main course deliverable. It contains the full XAI walkthrough: data loading, model training, and LIME / SHAP / ELI5 comparisons using the Kuzlu et al. 2020 methodology.</p>

            <p><strong>Option A: double-click in Finder</strong></p>
            <ol>
                <li>Double-click <strong>launch-notebook.command</strong></li>
                <li>Jupyter opens in your browser</li>
                <li>The student notebook opens automatically</li>
            </ol>

            <p><strong>Option B: from the terminal</strong></p>
            <div class="cmd-block"><span class="prompt">$</span> cd /path/to/spam-xai-project
<span class="prompt">$</span> python3.11 -m jupyter notebook notebooks/spam_classifier_xai_student.ipynb</div>

            <div class="note">
                <strong>Re-running the notebook:</strong> Full execution takes ~5–15 minutes (SHAP computation is the slow part). You can run individual cells with <code>Shift+Enter</code>, or run all cells via <em>Kernel &gt; Restart &amp; Run All</em>.
            </div>
            <div class="note">
                <strong>Note:</strong> Retraining the model (Step 3) does NOT automatically re-run the notebook. After retraining, re-run the notebook manually to get fresh outputs.
            </div>
        </div>

        <!-- Step 3: Retrain -->
        <div class="card">
            <h2><span class="num">3</span> Retrain the Model</h2>
            <p>Two retrain modes are available. Both save new model files to <code>models/</code> and automatically back up the previous models.</p>

            <p><strong>Fast mode</strong>: single Random Forest, 1000 TF-IDF features, no grid search. Use this to quickly verify that the pipeline works after a small change.</p>
            <ol>
                <li>Double-click <strong>retrain-fast.command</strong></li>
                <li>Takes ~2–5 minutes</li>
            </ol>

            <p><strong>Full mode</strong> <span class="tag tag-slow">~15–30 min</span>: VotingClassifier ensemble (RF + LR + SVM), 3000 TF-IDF features. This is the production model.</p>
            <ol>
                <li>Double-click <strong>retrain-full.command</strong></li>
                <li>Takes ~15–30 minutes</li>
            </ol>

            <p><strong>From terminal:</strong></p>
            <div class="cmd-block"><span class="prompt">$</span> python3.11 retrain.py --mode fast    <span class="comment"># quick smoke-test</span>
<span class="prompt">$</span> python3.11 retrain.py --mode full    <span class="comment"># production model</span>
<span class="prompt">$</span> python3.11 retrain.py --mode full --no-feedback  <span class="comment"># ignore user feedback</span></div>
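            <p>The flags in the commands above could be parsed roughly as follows. This is a sketch under stated assumptions, not the actual contents of <code>retrain.py</code>: making <code>--mode</code> required and the <code>MODE_SETTINGS</code> dictionary are illustrative choices, while the model types and feature counts come from the mode descriptions in this step.</p>
<pre class="cmd-block">
```python
import argparse


def build_parser():
    """Build a parser mirroring the flags shown in the commands above."""
    parser = argparse.ArgumentParser(prog="retrain.py")
    parser.add_argument(
        "--mode",
        choices=["fast", "full"],
        required=True,  # assumption: the real script may define a default
        help="fast: quick smoke test; full: production ensemble",
    )
    parser.add_argument(
        "--no-feedback",
        action="store_true",
        help="skip merging corrections from the feedback log",
    )
    return parser


# Settings per mode, taken from the descriptions in this step. The
# dictionary and its keys are hypothetical names chosen for illustration.
MODE_SETTINGS = {
    "fast": {"model": "RandomForest", "tfidf_features": 1000},
    "full": {"model": "VotingClassifier (RF + LR + SVM)", "tfidf_features": 3000},
}

# Parse an explicit argument list so the example behaves the same everywhere.
args = build_parser().parse_args(["--mode", "full", "--no-feedback"])
settings = MODE_SETTINGS[args.mode]
print(args.mode, settings["tfidf_features"], "use feedback:", not args.no_feedback)
```
</pre>
            <p>Note that argparse exposes <code>--no-feedback</code> as the attribute <code>args.no_feedback</code>, which is <code>False</code> unless the flag is passed.</p>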

            <div class="note">
                <strong>User feedback:</strong> By default, retrain merges corrections from <code>data/feedback/feedback_log.csv</code> into training data. Pass <code>--no-feedback</code> to skip this.
            </div>
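            <p>One way the feedback merge described above might work is sketched below. The column names <code>text</code> and <code>label</code>, the helper <code>merge_feedback</code>, and the rule that a correction overrides the original label for the same email text are all assumptions made for illustration; the real format of <code>feedback_log.csv</code> and the real merge logic may differ.</p>
<pre class="cmd-block">
```python
import csv
import io


def merge_feedback(training_rows, feedback_rows):
    """Merge corrections into training rows; a correction wins on the same text."""
    merged = {row["text"]: row["label"] for row in training_rows}
    for row in feedback_rows:
        merged[row["text"]] = row["label"]  # user correction overrides
    return [{"text": text, "label": label} for text, label in merged.items()]


# Tiny in-memory stand-ins for the training CSV and feedback_log.csv.
# The "text" and "label" column names are assumptions for this sketch.
training_csv = "text,label\nwin a free prize,ham\nmeeting at noon,ham\n"
feedback_csv = "text,label\nwin a free prize,spam\n"

training = list(csv.DictReader(io.StringIO(training_csv)))
feedback = list(csv.DictReader(io.StringIO(feedback_csv)))

use_feedback = True  # would be False when --no-feedback is passed
rows = merge_feedback(training, feedback if use_feedback else [])
for row in rows:
    print(row["label"], "|", row["text"])
```
</pre>
            <p>In this example the mislabeled "win a free prize" row ends up labeled spam after the merge, while passing an empty feedback list leaves the training rows unchanged.</p>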
        </div>

        <!-- Project structure -->
        <div class="card">
            <h2><span class="num">4</span> Project Files Reference</h2>
            <table>
                <tr><th>File / Folder</th><th>What It Is</th></tr>
                <tr><td><code>launch-gradio.command</code></td><td>Double-click to launch the Gradio web app</td></tr>
                <tr><td><code>launch-notebook.command</code></td><td>Double-click to open the student notebook in Jupyter</td></tr>
                <tr><td><code>retrain-fast.command</code></td><td>Double-click to retrain in fast mode (~2–5 min)</td></tr>
                <tr><td><code>retrain-full.command</code></td><td>Double-click to retrain in full mode (~15–30 min)</td></tr>
                <tr><td><code>app.py</code></td><td>Gradio web app (Result, LIME, SHAP tabs)</td></tr>
                <tr><td><code>retrain.py</code></td><td>Retrain script (fast or full mode)</td></tr>
                <tr><td><code>train_ensemble.py</code></td><td>Lower-level ensemble training script</td></tr>
                <tr><td><code>utils.py</code></td><td>Text preprocessing + 24 metadata features (shared by app and notebooks)</td></tr>
                <tr><td><code>notebooks/spam_classifier_xai_student.ipynb</code></td><td>Main course deliverable: full XAI analysis (LIME, SHAP, ELI5)</td></tr>
                <tr><td><code>notebooks/spam_classifier_gradio.ipynb</code></td><td>Ensemble training pipeline and Gradio deployment notebook</td></tr>
                <tr><td><code>models/voting_model.joblib</code></td><td>Trained VotingClassifier ensemble (RF + LR + SVM)</td></tr>
                <tr><td><code>models/tfidf_vectorizer.joblib</code></td><td>Fitted TF-IDF vectorizer</td></tr>
                <tr><td><code>data/spam_Emails_data.csv</code></td><td>Kaggle spam email dataset</td></tr>
                <tr><td><code>data/email-dataset-main/</code></td><td>GitHub email dataset (spam/ and ham/ folders)</td></tr>
                <tr><td><code>data/feedback/feedback_log.csv</code></td><td>User corrections collected from the Gradio app</td></tr>
                <tr><td><code>requirements.txt</code></td><td>Python package dependencies</td></tr>
            </table>
        </div>

        <!-- Troubleshooting -->
        <div class="card">
            <h2><span class="num">5</span> Troubleshooting</h2>
            <table>
                <tr><th>Problem</th><th>Fix</th></tr>
                <tr><td><code>ModuleNotFoundError</code></td><td>Run <code>pip install -r requirements.txt</code> to install all dependencies.</td></tr>
                <tr><td>App says "Model files not found"</td><td>The model hasn't been trained yet. Run <strong>retrain-fast.command</strong> or <strong>retrain-full.command</strong> first.</td></tr>
                <tr><td>Browser doesn't open automatically</td><td>Manually go to <a href="http://127.0.0.1:7860">http://127.0.0.1:7860</a> in your browser.</td></tr>
                <tr><td><code>permission denied</code> on .command files</td><td>Run once in terminal: <code>chmod +x *.command</code></td></tr>
                <tr><td>Notebook kernel not found</td><td>Run: <code>python3.11 -m ipykernel install --user --name spam-xai</code></td></tr>
                <tr><td>NLTK data missing</td><td>Run: <code>python3.11 -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"</code></td></tr>
            </table>
        </div>

    </div>
</body>
</html>