<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>How to Run — Spam Classifier XAI Project</title>
    <style>
        * { margin: 0; padding: 0; box-sizing: border-box; }
        body { font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; background: #f5f7fa; color: #333; line-height: 1.6; padding: 40px 20px; }
        .container { max-width: 800px; margin: 0 auto; }
        h1 { font-size: 28px; color: #1a1a2e; margin-bottom: 8px; }
        .subtitle { color: #666; font-size: 14px; margin-bottom: 32px; }
        .card { background: #fff; border-radius: 12px; padding: 28px; margin-bottom: 20px; box-shadow: 0 2px 8px rgba(0,0,0,0.08); }
        .card h2 { font-size: 20px; color: #1a1a2e; margin-bottom: 4px; display: flex; align-items: center; gap: 10px; }
        .card h2 .num { background: #4361ee; color: #fff; width: 32px; height: 32px; border-radius: 50%; display: inline-flex; align-items: center; justify-content: center; font-size: 16px; flex-shrink: 0; }
        .card p { margin: 10px 0; color: #555; }
        .card ul, .card ol { margin: 10px 0 10px 20px; color: #555; }
        .card ul li, .card ol li { margin-bottom: 6px; }
        code { background: #e8edf3; padding: 2px 8px; border-radius: 4px; font-family: 'Consolas', 'Courier New', monospace; font-size: 14px; }
        .cmd-block { background: #1a1a2e; color: #e0e0e0; padding: 16px 20px; border-radius: 8px; margin: 12px 0; font-family: 'Consolas', 'Courier New', monospace; font-size: 14px; overflow-x: auto; position: relative; white-space: pre-wrap; word-break: break-all; }
        .cmd-block .prompt { color: #4361ee; }
        .cmd-block .comment { color: #6c757d; }
        .tag { display: inline-block; padding: 3px 10px; border-radius: 20px; font-size: 12px; font-weight: 600; margin-left: 8px; }
        .tag-easy { background: #d4edda; color: #155724; }
        .tag-manual { background: #fff3cd; color: #856404; }
        .tag-slow { background: #fde8e8; color: #7b1c1c; }
        .divider { border: none; border-top: 1px solid #e0e0e0; margin: 24px 0; }
        .note { background: #fff8e1; border-left: 4px solid #ffc107; padding: 14px 18px; border-radius: 0 8px 8px 0; margin: 16px 0; font-size: 14px; }
        .note strong { color: #856404; }
        .success { background: #e8f5e9; border-left: 4px solid #4caf50; padding: 14px 18px; border-radius: 0 8px 8px 0; margin: 16px 0; font-size: 14px; }
        .success strong { color: #2e7d32; }
        a { color: #4361ee; text-decoration: none; }
        a:hover { text-decoration: underline; }
        .prereq-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 10px; margin: 12px 0; }
        .prereq-item { background: #f8f9fa; padding: 10px 14px; border-radius: 8px; font-size: 14px; }
        .prereq-item .check { color: #4caf50; font-weight: bold; margin-right: 6px; }
        table { width: 100%; border-collapse: collapse; margin-top: 12px; font-size: 14px; }
        th { text-align: left; padding: 8px; border-bottom: 2px solid #e0e0e0; }
        td { padding: 8px; border-bottom: 1px solid #f0f0f0; }
    </style>
</head>
<body>
    <div class="container">
        <h1>Spam Classifier with Explainable AI</h1>
        <p class="subtitle">ENGT 375 — Applied Machine Learning | Spring 2026 | ODU</p>

        <!-- Prerequisites -->
        <div class="card">
            <h2>Prerequisites</h2>
            <p>These should already be installed from the project setup:</p>
            <div class="prereq-grid">
                <div class="prereq-item"><span class="check">&#10003;</span> Python 3.11 (via venv or system)</div>
                <div class="prereq-item"><span class="check">&#10003;</span> scikit-learn, pandas, numpy</div>
                <div class="prereq-item"><span class="check">&#10003;</span> lime, shap, eli5, gradio</div>
                <div class="prereq-item"><span class="check">&#10003;</span> nltk, joblib, wordcloud</div>
            </div>
            <div class="note">
                <strong>Missing packages?</strong> Open a terminal in the project folder and run:<br>
                <code>python3.11 -m pip install -r requirements.txt</code>
            </div>
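            <p>You can run a short check to confirm that the interpreter sees every package before you launch anything. This is a minimal sketch and is not part of the project. The module list is an assumption based on the prerequisites above (scikit-learn is imported as <code>sklearn</code>). The check uses only the standard library's <code>importlib.util.find_spec</code>, which locates a module without importing it.</p>

```python
import importlib.util
import sys

# Import names for the prerequisite packages listed above (assumed mapping;
# scikit-learn installs as "scikit-learn" but is imported as "sklearn").
REQUIRED_MODULES = (
    "sklearn", "pandas", "numpy",
    "lime", "shap", "eli5", "gradio",
    "nltk", "joblib", "wordcloud",
)


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def running_python_311():
    """True when the current interpreter is Python 3.11.x."""
    return sys.version_info[:2] == (3, 11)
```

            <p>Run it with <code>python3.11</code> so that it inspects the same interpreter that launches the app. To fix any names that <code>missing_modules()</code> returns, run the requirements install command above.</p>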
        </div>

        <!-- Step 1: Launch Gradio App -->
        <div class="card">
            <h2><span class="num">1</span> Launch the Gradio Web App <span class="tag tag-easy">Easiest</span></h2>
            <p>The interactive Gradio app lets you paste any email and see spam/ham predictions with LIME and SHAP explanations. The ensemble model (<code>models/voting_model.joblib</code>) must already be trained — see Step 3 if it's missing.</p>

            <p><strong>Option A — Double-click in Finder:</strong></p>
            <ol>
                <li>In Finder, navigate to the project folder</li>
                <li>Double-click <strong>launch-gradio.command</strong></li>
                <li>A terminal window opens and the app starts</li>
                <li>Your browser opens automatically at <code>http://127.0.0.1:7860</code></li>
                <li>To stop: press <strong>Ctrl+C</strong> in the terminal, or close the window</li>
            </ol>

            <p><strong>Option B — From terminal:</strong></p>
            <div class="cmd-block"><span class="prompt">$</span> cd /path/to/spam-xai-project
<span class="prompt">$</span> python3.11 app.py</div>

            <div class="success">
                <strong>You'll see:</strong> A browser tab opens at <code>http://127.0.0.1:7860</code> with the Spam Classifier interface showing Result, LIME, and SHAP tabs.
            </div>
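            <p>The app needs trained model files, so a pre-flight check can head off a confusing startup error. The sketch below is hypothetical and is not taken from <code>app.py</code>. It assumes the app loads both <code>models/voting_model.joblib</code> and <code>models/tfidf_vectorizer.joblib</code>, the two model files listed in the Project Files Reference.</p>

```python
from pathlib import Path

# Assumed to be the files app.py needs; both appear in the project file list.
REQUIRED_MODEL_FILES = (
    "models/voting_model.joblib",
    "models/tfidf_vectorizer.joblib",
)


def missing_model_files(project_dir="."):
    """Return the required model files that are absent under project_dir."""
    root = Path(project_dir)
    return [rel for rel in REQUIRED_MODEL_FILES if not (root / rel).is_file()]
```

            <p>If the returned list is not empty, run one of the retrain modes in Step 3 before you start the app.</p>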
        </div>

        <!-- Step 2: Open the Notebook -->
        <div class="card">
            <h2><span class="num">2</span> Open the Student Notebook <span class="tag tag-manual">Full Analysis</span></h2>
            <p>The student notebook (<code>spam_classifier_xai_student.ipynb</code>) is the main course deliverable. It contains the full XAI walkthrough: data loading, model training, and LIME / SHAP / ELI5 comparisons using the Kuzlu et al. 2020 methodology.</p>

            <p><strong>Option A — Double-click in Finder:</strong></p>
            <ol>
                <li>Double-click <strong>launch-notebook.command</strong></li>
                <li>Jupyter opens in your browser</li>
                <li>The student notebook opens automatically</li>
            </ol>

            <p><strong>Option B — From terminal:</strong></p>
            <div class="cmd-block"><span class="prompt">$</span> cd /path/to/spam-xai-project
<span class="prompt">$</span> python3.11 -m jupyter notebook notebooks/spam_classifier_xai_student.ipynb</div>

            <div class="note">
                <strong>Re-running the notebook:</strong> Full execution takes ~5–15 minutes (SHAP computation is the slow part). You can run individual cells with <code>Shift+Enter</code>, or run all via <em>Kernel &gt; Restart &amp; Run All</em>.
            </div>
            <div class="note">
                <strong>Note:</strong> Retraining the model (Step 3) does NOT automatically re-run the notebook. After retraining, re-run the notebook manually to get fresh outputs.
            </div>
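            <p>You can also re-run the notebook without the browser, for example after retraining. Jupyter's <code>nbconvert</code> tool can execute a notebook in place from the terminal. The sketch below builds that command. It assumes <code>nbconvert</code> is installed, which a standard Jupyter installation normally includes. The helper names are illustrative.</p>

```python
import subprocess

NOTEBOOK = "notebooks/spam_classifier_xai_student.ipynb"


def build_execute_command(notebook_path=NOTEBOOK, python="python3.11", timeout=-1):
    """Return the argv list that executes a notebook in place with nbconvert."""
    return [
        python, "-m", "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",          # run every cell top to bottom
        "--inplace",          # overwrite the notebook with the fresh outputs
        # -1 disables the per-cell timeout, since the SHAP cells can run for minutes
        f"--ExecutePreprocessor.timeout={timeout}",
        notebook_path,
    ]


def run_notebook(notebook_path=NOTEBOOK):
    """Execute the notebook; raises CalledProcessError if any cell fails."""
    subprocess.run(build_execute_command(notebook_path), check=True)
```

            <p>Call <code>run_notebook()</code> from the project folder. It executes every cell, so it takes about as long as <em>Restart &amp; Run All</em>.</p>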
        </div>

        <!-- Step 3: Retrain -->
        <div class="card">
            <h2><span class="num">3</span> Retrain the Model</h2>
            <p>Two retrain modes are available. Both save new model files to <code>models/</code> and automatically back up the previous models.</p>
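            <p>One way to picture the backup step is as copying the existing model files aside before new ones are written. This is a hypothetical sketch, not the code in <code>retrain.py</code>. The <code>backup_&lt;timestamp&gt;</code> folder name and the <code>*.joblib</code> pattern are assumptions.</p>

```python
import shutil
import time
from pathlib import Path


def backup_models(models_dir="models", stamp=None):
    """Copy every *.joblib file in models_dir into a timestamped subfolder.

    Returns the backup folder, or None when there is nothing to back up
    (for example, on the very first training run).
    """
    models = Path(models_dir)
    files = sorted(models.glob("*.joblib"))  # non-recursive: older backups are skipped
    if not files:
        return None
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dest = models / f"backup_{stamp}"
    dest.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing backup
    for f in files:
        shutil.copy2(f, dest / f.name)  # copy2 also preserves file timestamps
    return dest
```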

            <p><strong>Fast mode</strong> — single Random Forest, 1000 TF-IDF features, no grid search. Use this to quickly verify the pipeline works after a small change.</p>
            <ol>
                <li>Double-click <strong>retrain-fast.command</strong></li>
                <li>Takes ~2–5 minutes</li>
            </ol>

            <p><strong>Full mode</strong> <span class="tag tag-slow">~15–30 min</span> — VotingClassifier ensemble (RF + LR + SVM), 3000 TF-IDF features. This is the production model.</p>
            <ol>
                <li>Double-click <strong>retrain-full.command</strong></li>
                <li>Takes ~15–30 minutes</li>
            </ol>

            <p><strong>From terminal:</strong></p>
            <div class="cmd-block"><span class="prompt">$</span> cd /path/to/spam-xai-project
<span class="prompt">$</span> python3.11 retrain.py --mode fast    <span class="comment"># quick smoke-test</span>
<span class="prompt">$</span> python3.11 retrain.py --mode full    <span class="comment"># production model</span>
<span class="prompt">$</span> python3.11 retrain.py --mode full --no-feedback  <span class="comment"># ignore user feedback</span></div>
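            <p>The sketch below shows one plausible way a script could map the commands above onto settings. It is hypothetical. The flag names come from this page, but <code>MODE_SETTINGS</code>, its keys, and <code>settings_for</code> are illustrative and may not match <code>retrain.py</code>.</p>

```python
import argparse

# Illustrative settings that mirror the two modes described above.
MODE_SETTINGS = {
    "fast": {"estimator": "random_forest", "tfidf_max_features": 1000, "grid_search": False},
    "full": {"estimator": "voting_rf_lr_svm", "tfidf_max_features": 3000},
}


def parse_args(argv=None):
    """Parse --mode and --no-feedback; argv=None reads sys.argv."""
    parser = argparse.ArgumentParser(description="Retrain the spam classifier.")
    parser.add_argument("--mode", choices=sorted(MODE_SETTINGS), required=True,
                        help="fast: quick smoke test; full: production ensemble")
    parser.add_argument("--no-feedback", action="store_true",
                        help="do not merge data/feedback/feedback_log.csv")
    return parser.parse_args(argv)


def settings_for(args):
    """Combine the chosen mode's settings with the feedback switch."""
    return {**MODE_SETTINGS[args.mode], "use_feedback": not args.no_feedback}
```

            <p>For example, <code>settings_for(parse_args(["--mode", "full", "--no-feedback"]))</code> yields the 3000-feature ensemble settings with <code>use_feedback</code> set to <code>False</code>.</p>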

            <div class="note">
                <strong>User feedback:</strong> By default, <code>retrain.py</code> merges corrections from <code>data/feedback/feedback_log.csv</code> into the training data. Pass <code>--no-feedback</code> to skip this.
            </div>
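            <p>The merge works as follows: a correction overrides the original label, and emails not yet in the training data are added. The sketch below illustrates this with the standard <code>csv</code> module. It assumes hypothetical <code>text</code> and <code>label</code> columns, which may not match the real feedback log.</p>

```python
import csv


def load_rows(lines):
    """Read (text, label) pairs from CSV lines with 'text' and 'label' columns."""
    return [(row["text"], row["label"]) for row in csv.DictReader(lines)]


def merge_feedback(train_rows, feedback_rows):
    """Apply user corrections to training rows.

    A feedback row whose text matches a training email replaces that email's
    label; feedback for unseen emails is appended. If the same email was
    corrected more than once, the last correction wins.
    """
    corrections = {}
    for text, label in feedback_rows:
        corrections[text] = label
    merged = [(text, corrections.get(text, label)) for text, label in train_rows]
    seen = {text for text, _ in train_rows}
    merged.extend((text, label) for text, label in corrections.items() if text not in seen)
    return merged
```

            <p>When reading the real log, open it with <code>newline=""</code> (for example <code>open(path, newline="", encoding="utf-8")</code>). This lets the <code>csv</code> module handle email bodies that span several lines.</p>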
        </div>

        <!-- Project structure -->
        <div class="card">
            <h2><span class="num">4</span> Project Files Reference</h2>
            <table>
                <tr><th>File / Folder</th><th>What It Is</th></tr>
                <tr><td><code>launch-gradio.command</code></td><td>Double-click to launch the Gradio web app</td></tr>
                <tr><td><code>launch-notebook.command</code></td><td>Double-click to open the student notebook in Jupyter</td></tr>
                <tr><td><code>retrain-fast.command</code></td><td>Double-click to retrain in fast mode (~2–5 min)</td></tr>
                <tr><td><code>retrain-full.command</code></td><td>Double-click to retrain in full mode (~15–30 min)</td></tr>
                <tr><td><code>app.py</code></td><td>Gradio web app (Result, LIME, SHAP tabs)</td></tr>
                <tr><td><code>retrain.py</code></td><td>Retrain script — fast or full mode</td></tr>
                <tr><td><code>train_ensemble.py</code></td><td>Lower-level ensemble training script</td></tr>
                <tr><td><code>utils.py</code></td><td>Text preprocessing + 24 metadata features (shared by app and notebooks)</td></tr>
                <tr><td><code>notebooks/spam_classifier_xai_student.ipynb</code></td><td>Main course deliverable — full XAI analysis (LIME, SHAP, ELI5)</td></tr>
                <tr><td><code>notebooks/spam_classifier_gradio.ipynb</code></td><td>Ensemble training pipeline and Gradio deployment notebook</td></tr>
                <tr><td><code>models/voting_model.joblib</code></td><td>Trained VotingClassifier ensemble (RF + LR + SVM)</td></tr>
                <tr><td><code>models/tfidf_vectorizer.joblib</code></td><td>Fitted TF-IDF vectorizer</td></tr>
                <tr><td><code>data/spam_Emails_data.csv</code></td><td>Kaggle spam email dataset</td></tr>
                <tr><td><code>data/email-dataset-main/</code></td><td>GitHub email dataset (spam/ and ham/ folders)</td></tr>
                <tr><td><code>data/feedback/feedback_log.csv</code></td><td>User corrections collected from the Gradio app</td></tr>
                <tr><td><code>requirements.txt</code></td><td>Python package dependencies</td></tr>
            </table>
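            <p>The GitHub dataset stores emails in <code>spam/</code> and <code>ham/</code> folders. A minimal loader for that layout might look like this. It is a sketch, not the project's own loading code. It assumes one plain-text email per file directly inside each folder and a hypothetical encoding of 1 for spam and 0 for ham.</p>

```python
from pathlib import Path

LABELS = {"spam": 1, "ham": 0}  # assumed label encoding


def load_folder_dataset(root="data/email-dataset-main"):
    """Return (text, label) pairs for every file in root/spam and root/ham."""
    rows = []
    for folder, label in LABELS.items():
        for path in sorted((Path(root) / folder).glob("*")):
            if path.is_file():
                # errors="replace" keeps oddly encoded emails instead of crashing
                text = path.read_text(encoding="utf-8", errors="replace")
                rows.append((text, label))
    return rows
```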
        </div>

        <!-- Troubleshooting -->
        <div class="card">
            <h2><span class="num">5</span> Troubleshooting</h2>
            <table>
                <tr><th>Problem</th><th>Fix</th></tr>
                <tr><td><code>ModuleNotFoundError</code></td><td>From the project folder, run <code>python3.11 -m pip install -r requirements.txt</code> to install all dependencies.</td></tr>
                <tr><td>App says "Model files not found"</td><td>The model hasn't been trained yet. Run <strong>retrain-fast.command</strong> or <strong>retrain-full.command</strong> first.</td></tr>
                <tr><td>Browser doesn't open automatically</td><td>Manually go to <a href="http://127.0.0.1:7860">http://127.0.0.1:7860</a> in your browser.</td></tr>
                <tr><td><code>permission denied</code> on .command files</td><td>From the project folder, run this once in a terminal: <code>chmod +x *.command</code></td></tr>
                <tr><td>Notebook kernel not found</td><td>Run <code>python3.11 -m ipykernel install --user --name spam-xai</code>, then select the <code>spam-xai</code> kernel via <em>Kernel &gt; Change Kernel</em>.</td></tr>
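                <tr><td>NLTK data missing</td><td>Run: <code>python3.11 -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"</code>. On NLTK 3.9 or newer, the tokenizer also needs <code>punkt_tab</code>: <code>python3.11 -c "import nltk; nltk.download('punkt_tab')"</code></td></tr>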
            </table>
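            <p>If the browser does not open, first confirm that the app is actually running. The standard-library sketch below tries a TCP connection to <code>127.0.0.1:7860</code>, the address used on this page, and opens the page only if something answers. The function names are illustrative.</p>

```python
import socket
import webbrowser

APP_URL = "http://127.0.0.1:7860"


def port_is_open(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if a server accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def open_app_if_running():
    """Open the Gradio page in the default browser if the server is up."""
    if port_is_open():
        webbrowser.open(APP_URL)
        return True
    return False
```

            <p>If <code>port_is_open()</code> returns <code>False</code>, the app is not running. Check its terminal window for an error such as "Model files not found".</p>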
        </div>

    </div>
</body>
</html>