Spaces: build-small-hackathon/lesson-agent

MSG committed 15 days ago

Commit 1e52a1f

1 Parent(s): bd75839

Feat/monday 4 sprint fast (#21)

* quiz makers wip

* skills quiz

* quiz harness

* quiz run skills ui

* index html quiz

* quiz maker skills

* fix common

* fix common and modal app

* experimental wip

* experiment wip

* common check and multiple publish

Files changed (24)

.cursor/plans/quiz_maker_skill_52f29d14.plan.md +5 -5
apps/gradio-space/README.md +6 -2
apps/gradio-space/src/gradio_space/api/studio.py +155 -0
apps/gradio-space/src/gradio_space/app.py +3 -0
apps/gradio-space/src/gradio_space/server.py +1 -1
apps/gradio-space/src/gradio_space/tabs/__init__.py +2 -0
apps/gradio-space/src/gradio_space/tabs/quiz_maker.py +430 -0
apps/gradio-space/static/studio/index.html +118 -0
apps/gradio-space/static/studio/studio.css +44 -3
apps/gradio-space/static/studio/studio.js +301 -0
libs/agent/src/agent/models.py +26 -0
libs/agent/src/agent/progress.py +59 -0
libs/agent/src/agent/prompts.py +149 -1
libs/agent/src/agent/runner.py +426 -4
libs/agent/src/agent/tools/quiz.py +134 -0
libs/agent/src/agent/tools_registry.py +18 -1
libs/agent/tests/test_quiz_maker.py +123 -0
research/evals/configs/lm_eval_reasoning.yaml +2 -2
research/evals/configs/lm_eval_science.yaml +2 -2
research/modal/_common.py +53 -10
research/modal/experiments.yaml +37 -3
research/modal/server_app.py +9 -2
skills/quiz-maker/SKILL.md +29 -0
skills/quiz-maker/references/mcq-format.md +22 -0

.cursor/plans/quiz_maker_skill_52f29d14.plan.md CHANGED

@@ -4,19 +4,19 @@ overview: "Sprint 1 (teaching loop): ship a quiz-maker skill mirroring education
 todos:
   - id: quiz-skill-backend
     content: Create quiz-maker skill, QuizOutline models, prompts, create_quiz tool, iter_quiz_maker runner
-    status: pending
   - id: quiz-tests
     content: "Agent tests: JSON repair, fallback_quiz, docx/html smoke"
-    status: pending
   - id: quiz-classic-tab
     content: Add tabs/quiz_maker.py with source modes + wire Classic Gradio tab
-    status: pending
   - id: quiz-studio-ui
     content: Add api_generate_quiz + Studio Quiz sidebar view with DOCX/HTML downloads
-    status: pending
   - id: quiz-teaching-cta
     content: "Slides view CTA: Create quiz on this topic (pre-fill topic/grade/session)"
-    status: pending
 isProject: false
 ---

 todos:
   - id: quiz-skill-backend
     content: Create quiz-maker skill, QuizOutline models, prompts, create_quiz tool, iter_quiz_maker runner
+    status: completed
   - id: quiz-tests
     content: "Agent tests: JSON repair, fallback_quiz, docx/html smoke"
+    status: completed
   - id: quiz-classic-tab
     content: Add tabs/quiz_maker.py with source modes + wire Classic Gradio tab
+    status: completed
   - id: quiz-studio-ui
     content: Add api_generate_quiz + Studio Quiz sidebar view with DOCX/HTML downloads
+    status: completed
   - id: quiz-teaching-cta
     content: "Slides view CTA: Create quiz on this topic (pre-fill topic/grade/session)"
+    status: completed
 isProject: false
 ---

apps/gradio-space/README.md CHANGED

@@ -31,6 +31,7 @@ This package uses **Gradio 6 Server mode** (`gradio.Server`):
 - `discover_sources`, `auto_search_ingest`, `ingest_sources`, `ingest_url`, `ingest_files`
 - `research_chat`, `generate_slides` (supports `source_mode`: none / web / rag)
 - `generate_slides_from_conversation` — build a deck from Research, Language lessons, or Chat history
 **Voice & coach**
@@ -61,8 +62,11 @@ Set `ALLOW_MODEL_SWITCH=true` in `.env` (see [USAGE.md](../../USAGE.md)). The Se
 1. Open `/` — **Small Model Finetuning** project workspace
 2. **Research** — ingest a PDF or URL on your topic → ask 2 RAG questions with citations
 3. Tap **Generate slides from chat** → switch to **Slides** → preview deck → **Present** (fullscreen, arrow keys)
-4. Download **PPTX** and expand **Agent trace**
-5. Optional: **Language lessons** → French voice turn → **Slides from chat** on the same topic
 ### Language lessons + Cohere stack (voice demo)

 - `discover_sources`, `auto_search_ingest`, `ingest_sources`, `ingest_url`, `ingest_files`
 - `research_chat`, `generate_slides` (supports `source_mode`: none / web / rag)
 - `generate_slides_from_conversation` — build a deck from Research, Language lessons, or Chat history
+- `generate_quiz` — printable MCQ worksheet (DOCX + HTML) with optional RAG / web sources
 **Voice & coach**
 1. Open `/` — **Small Model Finetuning** project workspace
 2. **Research** — ingest a PDF or URL on your topic → ask 2 RAG questions with citations
 3. Tap **Generate slides from chat** → switch to **Slides** → preview deck → **Present** (fullscreen, arrow keys)
+4. Tap **Create quiz on this topic** → **Quiz** view → generate worksheet → download **DOCX** (answer key included)
+5. Download **PPTX** and expand **Agent trace**
+6. Optional: **Language lessons** → French voice turn → **Slides from chat** on the same topic
+Classic UI (`/classic`) adds a **Quiz maker** tab after **Lesson slides** with the same agent pipeline.
 ### Language lessons + Cohere stack (voice demo)
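
The `generate_quiz` endpoint added to the README takes a topic plus optional source settings. Below is a minimal sketch of assembling its arguments on the client side. It assumes the parameter names and defaults of `api_generate_quiz` in this commit, the source-mode values offered by the Studio select, and the 5 to 10 question range of the UI slider. `build_quiz_request` is a hypothetical helper, not part of the repository, and whether the backend itself enforces these limits is not shown in the diff.

```python
from typing import Any

# Values offered by the Studio "Source mode" select; "" means "Auto (RAG toggle)".
VALID_SOURCE_MODES = {"", "none", "web", "rag"}


def build_quiz_request(
    topic: str,
    grade: str = "6",
    question_count: int = 5,
    source_mode: str = "",
    session_id: str = "",
    use_rag: bool = True,
) -> dict[str, Any]:
    """Hypothetical helper: build keyword arguments for the generate_quiz API."""
    topic = topic.strip()
    if not topic:
        raise ValueError("topic must not be empty")
    mode = source_mode.strip().lower()
    if mode not in VALID_SOURCE_MODES:
        raise ValueError(f"unknown source_mode: {source_mode!r}")
    # Clamp to the slider range used by the UI (an assumption about what the
    # backend accepts, not a documented limit).
    count = max(5, min(10, int(question_count)))
    return {
        "topic": topic,
        "grade": grade,
        "question_count": count,
        "session_id": session_id.strip(),
        "use_rag": use_rag,
        "source_mode": mode,
        "search_workflow": "two_step",
    }
```

The returned dictionary mirrors the keyword names of `api_generate_quiz`, so it could be passed along by whatever client the deployment uses to reach the endpoint.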

apps/gradio-space/src/gradio_space/api/studio.py CHANGED

@@ -39,6 +39,7 @@ from gradio_space.research_helpers import (
 )
 from gradio_space.conversation_helpers import format_conversation_context
 from gradio_space.tabs.education_pptx import SOURCE_MODES, SEARCH_WORKFLOWS, generate_lesson_slides
 from gradio_space.tabs.research_mind import (
     ask_question,
     auto_search_ingest,
@@ -642,6 +643,132 @@ def api_generate_slides(
     )
 def api_generate_slides_from_conversation(
     history: list | None,
     history_kind: str,
@@ -1226,6 +1353,34 @@ def register_studio_apis(server: gr.Server) -> None:
             file_paths,
         )
     @server.api(name="language_lesson_turn")
     def _language_lesson_turn(
         message: str = "",

 )
 from gradio_space.conversation_helpers import format_conversation_context
 from gradio_space.tabs.education_pptx import SOURCE_MODES, SEARCH_WORKFLOWS, generate_lesson_slides
+from gradio_space.tabs.quiz_maker import generate_quiz
 from gradio_space.tabs.research_mind import (
     ask_question,
     auto_search_ingest,
     )
+def _build_quiz_api_response(
+    last: tuple,
+    *,
+    topic: str,
+    sid: str,
+    rag_notice: str = "",
+) -> dict[str, Any]:
+    (
+        outline_md,
+        preview_html,
+        docx,
+        html_export,
+        processing_log,
+        trace_sum,
+        trace_json,
+        status,
+    ) = last
+    if preview_html and "form-error" in preview_html:
+        return err(status or "Generation failed.", status=status, progress_log=processing_log)
+    if rag_notice:
+        status = f"{rag_notice}\n\n{status or 'Quiz generated.'}".strip()
+    downloads = {
+        "docx": docx,
+        "html": html_export,
+    }
+    trace_str = trace_json if isinstance(trace_json, str) else ""
+    return ok(
+        topic=topic,
+        session_id=sid,
+        outline_md=outline_md,
+        preview_html=preview_html,
+        downloads=downloads,
+        status=status,
+        rag_fallback=bool(rag_notice),
+        progress_log=processing_log,
+        trace_summary=trace_sum,
+        trace_json=trace_str,
+        trace_html=render_trace_details(
+            trace_summary=trace_sum,
+            trace_json=trace_str,
+            progress_log=processing_log,
+        ),
+        elapsed_seconds=_elapsed_seconds_from_log(processing_log),
+        progress=_progress_from_trace(trace_str),
+    )
+def _run_quiz_generation(**kwargs) -> dict[str, Any]:
+    topic = kwargs.pop("topic")
+    sid = kwargs.pop("sid", "")
+    rag_notice = kwargs.pop("rag_notice", "")
+    gen = generate_quiz(topic, **kwargs)
+    last: tuple | None = None
+    for item in gen:
+        last = item
+    if last is None:
+        return err("Generation failed before producing output.")
+    return _build_quiz_api_response(last, topic=topic, sid=sid, rag_notice=rag_notice)
+def api_generate_quiz(
+    topic: str,
+    grade: str = "6",
+    question_count: int = 5,
+    session_id: str = "",
+    use_rag: bool = True,
+    doc_ids: list[str] | None = None,
+    source_mode: str = "",
+    search_workflow: str = "two_step",
+    urls_text: str = "",
+    selected_urls: list[str] | None = None,
+    file_paths: list[str] | None = None,
+) -> dict[str, Any]:
+    rag_docs = doc_ids or []
+    sid = (session_id or "").strip()
+    if not (source_mode or "").strip() and use_rag and not sid:
+        sid = _pick_session(topic)
+    source_label, workflow_label, effective_sid, effective_docs = _resolve_source_labels(
+        source_mode,
+        search_workflow,
+        use_rag,
+        sid,
+        rag_docs,
+    )
+    rag_notice = ""
+    if (source_mode or "").strip().lower() == "rag" or (
+        not (source_mode or "").strip() and use_rag
+    ):
+        has_sources = _session_has_rag_sources(sid, rag_docs)
+        if use_rag and not has_sources and source_label == _SOURCE_LABELS["rag"]:
+            rag_notice = (
+                "Cross-Reference Sources is on, but this session has no indexed documents — "
+                "generated from model knowledge only. Ingest sources in Step 1 to enable RAG."
+            )
+            source_label = _SOURCE_LABELS["none"]
+            effective_sid = ""
+            effective_docs = []
+    upload_files = file_paths if file_paths else None
+    return _run_quiz_generation(
+        topic=topic,
+        sid=sid,
+        rag_notice=rag_notice,
+        grade=grade,
+        question_count=int(question_count),
+        source_mode_label=source_label,
+        search_workflow_label=workflow_label,
+        urls_text=urls_text or "",
+        selected_urls=selected_urls or [],
+        upload_files=upload_files,
+        session_id=effective_sid,
+        doc_ids=effective_docs,
+        workspace_topic=topic,
+        workspace_session=effective_sid,
+        workspace_doc_ids=effective_docs,
+        progress=_NoopProgress(),
+    )
 def api_generate_slides_from_conversation(
     history: list | None,
     history_kind: str,
             file_paths,
         )
+    @server.api(name="generate_quiz")
+    def _generate_quiz(
+        topic: str,
+        grade: str = "6",
+        question_count: int = 5,
+        session_id: str = "",
+        use_rag: bool = True,
+        doc_ids: list[str] | None = None,
+        source_mode: str = "",
+        search_workflow: str = "two_step",
+        urls_text: str = "",
+        selected_urls: list[str] | None = None,
+        file_paths: list[str] | None = None,
+    ) -> dict[str, Any]:
+        return api_generate_quiz(
+            topic,
+            grade,
+            question_count,
+            session_id,
+            use_rag,
+            doc_ids,
+            source_mode,
+            search_workflow,
+            urls_text,
+            selected_urls,
+            file_paths,
+        )
     @server.api(name="language_lesson_turn")
     def _language_lesson_turn(
         message: str = "",
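
The RAG fallback in `api_generate_quiz` can be read as a small decision function: when RAG is requested but the session has no indexed documents, generation falls back to model-only mode and a notice is attached. The sketch below restates that rule in isolation. It assumes `_resolve_source_labels` (not shown in this diff) maps both an explicit `rag` mode and the auto mode with `use_rag=True` to the RAG label. `resolve_rag_fallback` and `RAG_NOTICE` are hypothetical names, and the notice text is abbreviated.

```python
RAG_NOTICE = (
    "Cross-Reference Sources is on, but this session has no indexed documents; "
    "generated from model knowledge only."
)


def resolve_rag_fallback(
    source_mode: str, use_rag: bool, has_sources: bool
) -> tuple[str, str]:
    """Return (effective_mode, notice) for a quiz request.

    Hypothetical restatement of the fallback rule, not the repository's code.
    """
    mode = (source_mode or "").strip().lower()
    # RAG is in play when chosen explicitly, or in auto mode with the toggle on.
    wants_rag = mode == "rag" or (not mode and use_rag)
    if not wants_rag:
        return (mode or "none", "")
    # The fallback only fires when the toggle is on and nothing is indexed.
    if use_rag and not has_sources:
        return ("none", RAG_NOTICE)
    return ("rag", "")
```

Returning the notice alongside the effective mode keeps the caller free to prepend it to the status text, as `_build_quiz_api_response` does with `rag_notice`.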

apps/gradio-space/src/gradio_space/app.py CHANGED

@@ -9,6 +9,7 @@ from gradio_space.tabs import (
     build_chat_tab,
     build_education_pptx_tab,
     build_echo_coach_tab,
     build_research_mind_tab,
     build_teacher_voice_tab,
 )
@@ -63,6 +64,8 @@ def build_demo() -> gr.Blocks:
         with gr.Tabs():
             with gr.Tab("Lesson slides"):
                 build_education_pptx_tab(workspace)
             with gr.Tab("ResearchMind"):
                 build_research_mind_tab(workspace)
             with gr.Tab("EchoCoach"):

     build_chat_tab,
     build_education_pptx_tab,
     build_echo_coach_tab,
+    build_quiz_maker_tab,
     build_research_mind_tab,
     build_teacher_voice_tab,
 )
         with gr.Tabs():
             with gr.Tab("Lesson slides"):
                 build_education_pptx_tab(workspace)
+            with gr.Tab("Quiz maker"):
+                build_quiz_maker_tab(workspace)
             with gr.Tab("ResearchMind"):
                 build_research_mind_tab(workspace)
             with gr.Tab("EchoCoach"):

apps/gradio-space/src/gradio_space/server.py CHANGED

@@ -23,7 +23,7 @@ from gradio_space.ui.theme import get_theme, load_css
 _PKG_ROOT = Path(__file__).resolve().parent
 _APP_ROOT = _PKG_ROOT.parents[1]
 _STATIC_DIR = _APP_ROOT / "static" / "studio"
-_STUDIO_ASSET_VERSION = "20260615c"
 _STUDIO_INDEX_HTML = _STATIC_DIR / "index.html"

 _PKG_ROOT = Path(__file__).resolve().parent
 _APP_ROOT = _PKG_ROOT.parents[1]
 _STATIC_DIR = _APP_ROOT / "static" / "studio"
+_STUDIO_ASSET_VERSION = "20260615d"
 _STUDIO_INDEX_HTML = _STATIC_DIR / "index.html"

apps/gradio-space/src/gradio_space/tabs/__init__.py CHANGED

@@ -1,6 +1,7 @@
 from gradio_space.tabs.chat import build_chat_tab
 from gradio_space.tabs.education_pptx import build_education_pptx_tab
 from gradio_space.tabs.echo_coach import build_echo_coach_tab
 from gradio_space.tabs.research_mind import build_research_mind_tab
 from gradio_space.tabs.teacher_voice import build_teacher_voice_tab
@@ -8,6 +9,7 @@ __all__ = [
     "build_chat_tab",
     "build_education_pptx_tab",
     "build_echo_coach_tab",
     "build_research_mind_tab",
     "build_teacher_voice_tab",
 ]

 from gradio_space.tabs.chat import build_chat_tab
 from gradio_space.tabs.education_pptx import build_education_pptx_tab
 from gradio_space.tabs.echo_coach import build_echo_coach_tab
+from gradio_space.tabs.quiz_maker import build_quiz_maker_tab
 from gradio_space.tabs.research_mind import build_research_mind_tab
 from gradio_space.tabs.teacher_voice import build_teacher_voice_tab
     "build_chat_tab",
     "build_education_pptx_tab",
     "build_echo_coach_tab",
+    "build_quiz_maker_tab",
     "build_research_mind_tab",
     "build_teacher_voice_tab",
 ]

apps/gradio-space/src/gradio_space/tabs/quiz_maker.py ADDED

	@@ -0,0 +1,430 @@

+from html import escape
+from pathlib import Path
+import gradio as gr
+from agent.progress import QuizGenerationProgress
+from agent.runner import AgentRunner, QuizAgentResult
+from gradio_space.model_loading import ensure_model_loaded, get_active_model_key
+from gradio_space.research_helpers import (
+    list_session_choices,
+    merge_lesson_urls,
+    refresh_doc_choices,
+    refresh_sessions,
+    resolve_doc_ids,
+    resolve_session,
+    resolve_topic,
+)
+from gradio_space.spaces_runtime import gpu_task
+from gradio_space.tabs.education_pptx import (
+    SEARCH_WORKFLOWS,
+    SOURCE_MODES,
+    discover_lesson_sources,
+    strip_md_inline,
+    update_source_visibility,
+)
+from gradio_space.ui.components import build_advanced_panel, DOC_CHOICE_LIST_CLASSES, WorkspaceWidgets
+from inference.factory import get_backend
+_SOURCE_LABEL_TO_VALUE = {label: value for label, value in SOURCE_MODES}
+_WORKFLOW_LABEL_TO_VALUE = {label: value for label, value in SEARCH_WORKFLOWS}
+def _source_mode_value(label: str) -> str:
+    return _SOURCE_LABEL_TO_VALUE.get(label, "none")
+def _search_workflow_value(label: str) -> str:
+    return _WORKFLOW_LABEL_TO_VALUE.get(label, "two_step")
+def _error_html(message: str) -> str:
+    safe = (
+        message.replace("&", "&amp;")
+        .replace("<", "&lt;")
+        .replace(">", "&gt;")
+    )
+    return (
+        f'<div style="padding:12px;border:1px solid #c44;border-radius:8px;'
+        f'background:#fff5f5;color:#8a1f1f;">{safe}</div>'
+    )
+def _empty_outputs(message: str) -> tuple:
+    log_html = (
+        f'<div class="slide-gen-log"><div class="slide-gen-log-banner error">'
+        f"{message}</div></div>"
+    )
+    return (
+        message,
+        _error_html(message),
+        None,
+        None,
+        log_html,
+        message,
+        message,
+        message,
+    )
+def _running_preview_html(step_label: str = "Generating quiz…") -> str:
+    safe = (
+        step_label.replace("&", "&amp;")
+        .replace("<", "&lt;")
+        .replace(">", "&gt;")
+    )
+    return (
+        '<div class="lesson-running-preview">'
+        '<div class="lesson-running-spinner" aria-hidden="true"></div>'
+        f"<p><strong>{safe}</strong></p>"
+        "<p class=\"lesson-running-hint\">Local models can take 30–90s on CPU. "
+        "Steps update live below.</p>"
+        "</div>"
+    )
+def _interim_outputs(
+    quiz_progress: QuizGenerationProgress,
+    *,
+    status: str = "_Generating quiz…_",
+    step_label: str = "Generating quiz…",
+) -> tuple:
+    log_html = quiz_progress.format_log_html(running=True)
+    return (
+        "",
+        _running_preview_html(step_label),
+        None,
+        None,
+        log_html,
+        "",
+        "",
+        status,
+    )
+def _format_processing_log(
+    progress: QuizGenerationProgress,
+    *,
+    trace_summary: str = "",
+    source_status: str = "",
+) -> str:
+    footer_parts: list[str] = []
+    if source_status:
+        footer_parts.append(
+            f"<p><strong>Sources:</strong> {escape(strip_md_inline(source_status))}</p>"
+        )
+    if trace_summary:
+        footer_parts.append(
+            f'<pre class="slide-gen-log-trace">{escape(trace_summary)}</pre>'
+        )
+    footer_html = "".join(footer_parts)
+    return progress.format_log_html(running=False, footer_html=footer_html)
+@gpu_task(duration=300)
+def generate_quiz(
+    topic: str,
+    grade: str,
+    question_count: int,
+    source_mode_label: str,
+    search_workflow_label: str,
+    urls_text: str,
+    selected_urls: list[str],
+    upload_files: list[str] | None,
+    session_id: str,
+    doc_ids: list[str] | None,
+    workspace_topic: str = "",
+    workspace_session: str = "",
+    workspace_doc_ids: list[str] | None = None,
+    progress: gr.Progress = gr.Progress(),
+):
+    topic = resolve_topic(topic, workspace_topic)
+    session_id = resolve_session(session_id, workspace_session)
+    doc_ids = resolve_doc_ids(doc_ids, workspace_doc_ids)
+    quiz_progress = QuizGenerationProgress(
+        on_update=lambda fraction, desc: progress(fraction, desc=desc),
+    )
+    quiz_progress.begin("load_model", "Load language model")
+    model_key = get_active_model_key()
+    load_error = ensure_model_loaded(model_key)
+    if load_error:
+        yield _empty_outputs(load_error)
+        return
+    if not topic.strip():
+        message = "Please enter a quiz topic."
+        yield _empty_outputs(message)
+        return
+    source_mode = _source_mode_value(source_mode_label)
+    search_workflow = _search_workflow_value(search_workflow_label)
+    merged_urls = merge_lesson_urls(urls_text, selected_urls)
+    files = [Path(p) for p in (upload_files or [])]
+    current_step = "Load language model"
+    yield _interim_outputs(quiz_progress, step_label=current_step)
+    result = None
+    try:
+        runner = AgentRunner()
+        for item in runner.iter_quiz_maker(
+            topic=topic,
+            grade=grade,
+            question_count=int(question_count),
+            model_key=model_key,
+            backend=get_backend(model_key),
+            source_mode=source_mode,  # type: ignore[arg-type]
+            search_workflow=search_workflow,  # type: ignore[arg-type]
+            urls=merged_urls,
+            files=files,
+            session_id=session_id or None,
+            doc_ids=doc_ids or [],
+            progress=quiz_progress,
+        ):
+            if isinstance(item, QuizAgentResult):
+                result = item
+                break
+            current_step = item.steps[-1].label if item.steps else current_step
+            yield _interim_outputs(quiz_progress, step_label=current_step)
+    except Exception as exc:  # noqa: BLE001
+        message = f"Agent error: {exc}"
+        quiz_progress.finish()
+        yield (
+            message,
+            _error_html(message),
+            None,
+            None,
+            quiz_progress.format_log_html(running=False),
+            message,
+            message,
+            message,
+        )
+        return
+    if result is None:
+        message = "Agent error: generation finished without a result."
+        yield _empty_outputs(message)
+        return
+    progress(1.0, desc="Done")
+    trace_summary = (
+        f"Run `{result.trace.run_id}` · skill `{result.trace.skill}` · "
+        f"model `{result.trace.model}`\n\n"
+        f"Trace saved: `{result.trace_path}`"
+    )
+    source_status = result.source_summary or "_No external sources used (model only)._"
+    processing_log = _format_processing_log(
+        quiz_progress,
+        trace_summary=trace_summary,
+        source_status=source_status,
+    )
+    yield (
+        result.markdown_preview,
+        result.html_preview,
+        str(Path(result.docx_path).resolve()),
+        str(Path(result.html_export_path).resolve()),
+        processing_log,
+        trace_summary,
+        result.trace.to_json(),
+        source_status,
+    )
+def build_quiz_maker_tab(workspace: WorkspaceWidgets) -> None:
+    gr.Markdown("### Quiz maker", elem_classes=["lesson-tab-heading"])
+    gr.HTML(
+        '<p class="tab-subtitle">Create a printable multiple-choice quiz with answer key '
+        "from your topic and optional research sources.</p>"
+    )
+    with gr.Column(elem_classes=["lesson-form-primary"]):
+        topic = gr.Textbox(
+            label="Quiz topic",
+            placeholder="e.g. Photosynthesis, Fractions, The water cycle…",
+            lines=2,
+            max_lines=3,
+            elem_classes=["lesson-topic-input"],
+        )
+    with gr.Row(elem_classes=["lesson-form-secondary"]):
+        grade = gr.Dropdown(
+            label="Grade",
+            choices=["K", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "Adult"],
+            value="6",
+            scale=1,
+            min_width=100,
+        )
+        question_count = gr.Slider(
+            minimum=5,
+            maximum=10,
+            step=1,
+            value=5,
+            label="Questions",
+            scale=2,
+        )
+    with gr.Accordion("Research sources (optional)", open=False, elem_classes=["lesson-optional-accordion"]):
+        source_mode = gr.Radio(
+            label="Source mode",
+            choices=[m[0] for m in SOURCE_MODES],
+            value=SOURCE_MODES[0][0],
+        )
+        search_workflow = gr.Radio(
+            label="Web search workflow",
+            choices=[m[0] for m in SEARCH_WORKFLOWS],
+            value=SEARCH_WORKFLOWS[0][0],
+            visible=False,
+        )
+        discover_btn = gr.Button("Discover sources", variant="secondary", visible=False)
+        with gr.Row():
+            session_dd = gr.Dropdown(
+                label="ResearchMind session",
+                choices=list_session_choices(),
+                value="",
+                visible=False,
+            )
+            refresh_sess_btn = gr.Button("↻", size="sm", visible=False, min_width=40)
+        url_choices = gr.CheckboxGroup(
+            label="Suggested URLs to use",
+            choices=[],
+            visible=False,
+            elem_classes=DOC_CHOICE_LIST_CLASSES,
+        )
+        urls_text = gr.Textbox(
+            label="URLs (one per line, optional)",
+            lines=3,
+            placeholder="https://en.wikipedia.org/wiki/...",
+            visible=False,
+        )
+        upload_files = gr.File(
+            label="Upload PDF or DOCX",
+            file_count="multiple",
+            file_types=[".pdf", ".docx"],
+            visible=False,
+        )
+        doc_dd = gr.CheckboxGroup(
+            label="Documents in session (RAG scope)",
+            choices=[],
+            value=[],
+            visible=False,
+            elem_classes=DOC_CHOICE_LIST_CLASSES,
+        )
+    with gr.Row(elem_classes=["lesson-generate-row"]):
+        generate_btn = gr.Button(
+            "Generate quiz",
+            variant="primary",
+            elem_classes=["primary-cta"],
+            scale=1,
+        )
+    source_status = gr.Markdown(value="_Ready to generate._", elem_classes=["lesson-status"])
+    processing_log = gr.HTML(
+        value=(
+            '<div class="slide-gen-log slide-gen-log-idle">'
+            "<p>Generation steps and timings appear here when you run.</p>"
+            "</div>"
+        ),
+        elem_classes=["lesson-processing-log"],
+    )
+    with gr.Tabs():
+        with gr.Tab("Worksheet preview"):
+            quiz_preview = gr.HTML(label="Quiz preview")
+        with gr.Tab("Outline"):
+            outline_preview = gr.Markdown(label="Outline (markdown)")
+    with gr.Row():
+        docx_file = gr.File(label="Download worksheet (.docx)", interactive=False)
+        html_file = gr.File(label="Download HTML preview", interactive=False)
+    with gr.Accordion("Agent trace", open=False):
+        trace_summary = gr.Markdown()
+        trace_json = gr.Code(language="json", label="Trace JSON")
+    advanced = build_advanced_panel()
+    source_controls = [
+        search_workflow,
+        discover_btn,
+        url_choices,
+        urls_text,
+        upload_files,
+        session_dd,
+        refresh_sess_btn,
+        doc_dd,
+        generate_btn,
+    ]
+    def _refresh_visibility(mode_label: str, workflow_label: str):
+        return update_source_visibility(mode_label, workflow_label)
+    source_mode.change(
+        fn=_refresh_visibility,
+        inputs=[source_mode, search_workflow],
+        outputs=source_controls,
+    )
+    search_workflow.change(
+        fn=_refresh_visibility,
+        inputs=[source_mode, search_workflow],
+        outputs=source_controls,
+    )
+    refresh_sess_btn.click(fn=refresh_sessions, inputs=[session_dd], outputs=[session_dd])
+    session_dd.change(
+        fn=refresh_doc_choices,
+        inputs=[session_dd, doc_dd],
+        outputs=[doc_dd],
+    )
+    discover_btn.click(
+        fn=discover_lesson_sources,
+        inputs=[topic, session_dd, workspace.topic, workspace.session_dd],
+        outputs=[source_status, url_choices, session_dd],
+    )
+    generate_btn.click(
+        fn=generate_quiz,
+        inputs=[
+            topic,
+            grade,
+            question_count,
+            source_mode,
+            search_workflow,
+            urls_text,
+            url_choices,
+            upload_files,
+            session_dd,
+            doc_dd,
+            workspace.topic,
+            workspace.session_dd,
+            workspace.doc_dd,
+        ],
+        outputs=[
+            outline_preview,
+            quiz_preview,
+            docx_file,
+            html_file,
+            processing_log,
+            trace_summary,
+            trace_json,
+            source_status,
+        ],
+        show_progress="hidden",
+    )
+    def _sync_session_from_workspace(ws_session: str, local_session: str):
+        if ws_session and ws_session != local_session:
+            return gr.update(value=ws_session)
+        return gr.update()
+    workspace.session_dd.change(
+        fn=_sync_session_from_workspace,
+        inputs=[workspace.session_dd, session_dd],
+        outputs=[session_dd],
+    ).then(
+        fn=refresh_doc_choices,
+        inputs=[session_dd, doc_dd],
+        outputs=[doc_dd],
+    )
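
`generate_quiz` is a generator: it yields interim output tuples while the agent runs and a final tuple at the end. The Classic tab streams every yield to the UI, while `_run_quiz_generation` in `studio.py` keeps only the last one. A minimal sketch of that drain pattern follows. `fake_quiz_steps` is a hypothetical stand-in with two-field tuples, whereas the real generator yields eight fields.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def fake_quiz_steps() -> Iterator[tuple[str, str]]:
    """Hypothetical stand-in for generate_quiz: yields (preview, status)."""
    yield ("", "Load language model")
    yield ("", "Draft questions")
    yield ("<ol><li>Q1</li></ol>", "Quiz generated.")


def last_item(items: Iterable[T]) -> T | None:
    """Consume an iterable and return its final element, or None if empty."""
    last: T | None = None
    for item in items:
        last = item
    return last


final = last_item(fake_quiz_steps())
if final is None:
    print("Generation failed before producing output.")
else:
    preview, status = final
    print(status)
```

Draining the generator this way lets one function serve both a streaming UI and a request/response API, at the cost of the API caller seeing no intermediate progress.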

apps/gradio-space/static/studio/index.html CHANGED

@@ -36,6 +36,7 @@
     <nav class="sidebar-nav">
       <button type="button" class="nav-item" data-view="research"><span class="material-symbols-outlined">search</span>Research</button>
       <button type="button" class="nav-item active" data-view="slides"><span class="material-symbols-outlined">present_to_all</span>Slides</button>
       <button type="button" class="nav-item" data-view="language-lessons"><span class="material-symbols-outlined">translate</span>Language lessons</button>
       <button type="button" class="nav-item" data-view="debug"><span class="material-symbols-outlined">chat</span>Chat</button>
       <button type="button" id="btn-open-settings" class="nav-item"><span class="material-symbols-outlined">settings</span>Settings</button>
@@ -313,6 +314,10 @@
           <div id="slide-outline" class="slide-outline"></div>
         </details>
         <div id="downloads" class="downloads hidden"></div>
         <details class="slide-export-help">
           <summary>Export help — open in Google Docs</summary>
           <p class="status-text">Download the <strong>.docx</strong> file, upload it to <a href="https://drive.google.com" target="_blank" rel="noopener">Google Drive</a>, then choose <strong>Open with → Google Docs</strong>. You can also upload the <strong>.html</strong> file via Google Docs → File → Open → Upload.</p>
@@ -320,6 +325,119 @@
       </div>
     </section>
     <section class="col col-studio">
       <div class="lessons-layout view-lessons-only">
         <aside class="lessons-rail">

     <nav class="sidebar-nav">
       <button type="button" class="nav-item" data-view="research"><span class="material-symbols-outlined">search</span>Research</button>
       <button type="button" class="nav-item active" data-view="slides"><span class="material-symbols-outlined">present_to_all</span>Slides</button>
+      <button type="button" class="nav-item" data-view="quiz"><span class="material-symbols-outlined">quiz</span>Quiz</button>
       <button type="button" class="nav-item" data-view="language-lessons"><span class="material-symbols-outlined">translate</span>Language lessons</button>
       <button type="button" class="nav-item" data-view="debug"><span class="material-symbols-outlined">chat</span>Chat</button>
       <button type="button" id="btn-open-settings" class="nav-item"><span class="material-symbols-outlined">settings</span>Settings</button>
           <div id="slide-outline" class="slide-outline"></div>
         </details>
         <div id="downloads" class="downloads hidden"></div>
+        <button type="button" id="btn-slides-to-quiz" class="btn btn-ghost btn-block slides-to-quiz hidden">
+          <span class="material-symbols-outlined">quiz</span>
+          Create quiz on this topic
+        </button>
         <details class="slide-export-help">
           <summary>Export help — open in Google Docs</summary>
           <p class="status-text">Download the <strong>.docx</strong> file, upload it to <a href="https://drive.google.com" target="_blank" rel="noopener">Google Drive</a>, then choose <strong>Open with → Google Docs</strong>. You can also upload the <strong>.html</strong> file via Google Docs → File → Open → Upload.</p>
       </div>
     </section>
+    <section class="col col-quiz view-quiz-only">
+      <div class="card card-tall">
+        <div class="card-header">
+          <div class="step-badge">3</div>
+          <h2>Quiz maker</h2>
+        </div>
+        <div class="controls-panel">
+          <div class="controls-grid">
+            <label class="field">
+              <span>Topic override (optional)</span>
+              <input id="quiz-topic" type="text" class="input" placeholder="Uses workspace topic when empty" />
+            </label>
+            <label class="field">
+              <span>Grade</span>
+              <select id="quiz-grade" class="input">
+                <option value="K">K</option>
+                <option value="1">1</option>
+                <option value="2">2</option>
+                <option value="3">3</option>
+                <option value="4">4</option>
+                <option value="5">5</option>
+                <option value="6" selected>6</option>
+                <option value="7">7</option>
+                <option value="8">8</option>
+                <option value="9">9</option>
+                <option value="10">10</option>
+                <option value="11">11</option>
+                <option value="12">12</option>
+                <option value="Adult">Adult</option>
+              </select>
+            </label>
+            <label class="field field-wide">
+              <span>Questions: <strong id="quiz-count-val">5</strong></span>
+              <input id="quiz-count" type="range" min="5" max="10" value="5" />
+            </label>
+          </div>
+          <details class="slide-source-details" id="quiz-source-details">
+            <summary>Research sources (optional)</summary>
+            <label class="field">
+              <span>Source mode</span>
+              <select id="quiz-source-mode" class="input">
+                <option value="">Auto (RAG toggle)</option>
+                <option value="none">None (model only)</option>
+                <option value="web">Web search</option>
+                <option value="rag">RAG (indexed sources)</option>
+              </select>
+            </label>
+            <label class="field slide-web-workflow hidden" id="quiz-web-workflow-wrap">
+              <span>Web search workflow</span>
+              <select id="quiz-search-workflow" class="input">
+                <option value="two_step">Discover &amp; confirm</option>
+                <option value="auto">Auto search &amp; ingest</option>
+              </select>
+            </label>
+            <div class="slide-web-discover hidden" id="quiz-web-discover-wrap">
+              <button type="button" id="btn-quiz-discover" class="btn btn-secondary btn-block">Discover sources</button>
+              <div id="quiz-url-choices-panel" class="url-choices-panel hidden">
+                <div id="quiz-url-choices-list" class="url-choices-list"></div>
+              </div>
+              <label class="field">
+                <span>URLs (one per line)</span>
+                <textarea id="quiz-urls-text" class="input" rows="2" placeholder="https://…"></textarea>
+              </label>
+            </div>
+            <label class="upload-zone upload-zone-compact">
+              <input id="quiz-source-files" type="file" accept=".pdf,.docx" multiple hidden />
+              <span class="material-symbols-outlined">upload_file</span>
+              <span>Upload a PDF or DOCX file for generation</span>
+            </label>
+          </details>
+          <div class="controls-actions">
+            <button type="button" id="btn-generate-quiz" class="btn btn-primary">
+              <span class="material-symbols-outlined">auto_awesome</span>
+              Generate quiz
+            </button>
+          </div>
+          <p id="quiz-generate-status" class="status-text">Ready to generate.</p>
+          <div id="quiz-progress-panel" class="progress-panel hidden">
+            <div class="progress-panel-head">
+              <span id="quiz-progress-elapsed" class="progress-elapsed">Elapsed: 0s</span>
+              <span id="quiz-progress-eta" class="progress-eta"></span>
+            </div>
+            <div class="progress-bar-track" aria-hidden="true">
+              <div id="quiz-progress-bar-fill" class="progress-bar-fill" style="width: 0%"></div>
+            </div>
+            <p id="quiz-progress-current" class="progress-current">Idle</p>
+            <ol id="quiz-progress-steps" class="progress-steps"></ol>
+            <div id="quiz-progress-log" class="progress-log hidden" aria-live="polite"></div>
+            <details class="studio-debug-trace" id="quiz-trace-details">
+              <summary>Agent trace</summary>
+              <div id="quiz-trace-panel"></div>
+            </details>
+          </div>
+        </div>
+        <div id="quiz-preview" class="slide-canvas">
+          <div id="quiz-preview-overlay" class="region-loading hidden" aria-live="polite">
+            <div class="region-loading-inner">
+              <span class="studio-spinner" aria-hidden="true"></span>
+              <p class="region-loading-text">Generating quiz…</p>
+            </div>
+          </div>
+          <div id="quiz-preview-content" class="slide-canvas-content">
+            <div class="studio-canvas-empty"><p>Generate a quiz to preview the worksheet here.</p></div>
+          </div>
+        </div>
+        <details class="slide-outline-details hidden" id="quiz-outline-details">
+          <summary>Outline (markdown)</summary>
+          <div id="quiz-outline" class="slide-outline"></div>
+        </details>
+        <div id="quiz-downloads" class="downloads hidden"></div>
+      </div>
+    </section>
     <section class="col col-studio">
       <div class="lessons-layout view-lessons-only">
         <aside class="lessons-rail">

apps/gradio-space/static/studio/studio.css CHANGED

@@ -879,7 +879,8 @@ body.sidebar-open {
 .studio-coach-hint { margin: 0; opacity: 0.8; line-height: 1.45; }
 .workspace[data-view="research"] .col-slides,
-.workspace[data-view="research"] .col-studio { display: none; }
 .workspace[data-view="research"] {
   grid-template-columns: 1fr;
   max-width: 1120px;
@@ -1184,7 +1185,45 @@ body.sidebar-open {
 }
 .workspace[data-view="language-lessons"] .col-research,
-.workspace[data-view="language-lessons"] .col-slides { display: none; }
 .workspace[data-view="language-lessons"] .col-debug { display: none; }
@@ -1388,6 +1427,7 @@ body.sidebar-open {
 }
 .workspace[data-view="slides"] .col-studio,
 .workspace[data-view="research"] .col-debug { display: none; }
 .coach-card-head {
@@ -1889,7 +1929,8 @@ body.sidebar-open {
 .workspace[data-view="debug"] .col-research,
 .workspace[data-view="debug"] .col-slides,
-.workspace[data-view="debug"] .col-studio { display: none; }
 .workspace[data-view="debug"] {
   grid-template-columns: 1fr;

 .studio-coach-hint { margin: 0; opacity: 0.8; line-height: 1.45; }
 .workspace[data-view="research"] .col-slides,
+.workspace[data-view="research"] .col-studio,
+.workspace[data-view="research"] .col-quiz { display: none; }
 .workspace[data-view="research"] {
   grid-template-columns: 1fr;
   max-width: 1120px;
 }
 .workspace[data-view="language-lessons"] .col-research,
+.workspace[data-view="language-lessons"] .col-slides,
+.workspace[data-view="language-lessons"] .col-quiz { display: none; }
+.view-quiz-only { display: none; }
+.workspace[data-view="quiz"] .col-research,
+.workspace[data-view="quiz"] .col-slides,
+.workspace[data-view="quiz"] .col-studio,
+.workspace[data-view="quiz"] .col-debug { display: none; }
+.workspace[data-view="quiz"] {
+  grid-template-columns: minmax(0, 1fr);
+  max-width: 960px;
+  gap: 1.25rem;
+}
+.workspace[data-view="quiz"] .col-quiz {
+  display: block;
+  grid-column: 1 / -1;
+  width: 100%;
+  min-width: 0;
+}
+.workspace[data-view="quiz"] .quiz-preview-inner {
+  font-size: 0.92rem;
+}
+.quiz-preview-frame {
+  width: 100%;
+  min-height: 520px;
+  border: 1px solid var(--border-subtle, #ddd);
+  border-radius: 8px;
+  background: #fff;
+}
+.slides-to-quiz {
+  margin-top: 0.75rem;
+  text-align: left;
+}
 .workspace[data-view="language-lessons"] .col-debug { display: none; }
 }
 .workspace[data-view="slides"] .col-studio,
+.workspace[data-view="slides"] .col-quiz,
 .workspace[data-view="research"] .col-debug { display: none; }
 .coach-card-head {
 .workspace[data-view="debug"] .col-research,
 .workspace[data-view="debug"] .col-slides,
+.workspace[data-view="debug"] .col-studio,
+.workspace[data-view="debug"] .col-quiz { display: none; }
 .workspace[data-view="debug"] {
   grid-template-columns: 1fr;

apps/gradio-space/static/studio/studio.js CHANGED

@@ -50,6 +50,13 @@ const SLIDE_PIPELINE_STEPS = [
   "Build PPTX, DOCX, and HTML exports",
 ];
 const state = {
   workspaceTopic: "small model finetuning",
   workspaceSessionId: "",
@@ -58,6 +65,8 @@ const state = {
   selectedUrls: [],
   slideDiscoveredUrls: [],
   slideSelectedUrls: [],
   lessonsDiscoveredUrls: [],
   lessonsSelectedUrls: [],
   researchChatHistory: [],
@@ -65,6 +74,9 @@ const state = {
   lessonsMode: "lesson",
   history: [],
   downloads: null,
   client: null,
   progressTimer: null,
   progressStartedAt: null,
@@ -171,9 +183,25 @@ function syncSlideSourceUi() {
   }
 }
 function syncResearchLayout() {
   syncIngestWorkflowUi();
   syncSlideSourceUi();
   updateResearchDocCount(state.workspaceDocIds?.length || 0);
 }
@@ -376,6 +404,13 @@ function renderSlideGenerationResult(data, { scrollToCanvas = false, pulsePresen
   setTracePanel("#slides-trace-panel", data);
   if (scrollToCanvas) {
     $("#slide-canvas")?.scrollIntoView({ behavior: "smooth", block: "nearest" });
   }
@@ -732,6 +767,19 @@ function renderSlideUrlChoices(urls, selected) {
   syncSlideSourceUi();
 }
 function syncUrlSelectAll() {
   const boxes = [...document.querySelectorAll("#url-choices-list input[type=checkbox]")];
   const selectAll = $("#url-select-all");
@@ -785,6 +833,18 @@ async function discoverSlideSources() {
   });
 }
 async function autoSearchIngest() {
   const topic = effectiveTopic("");
   if (!topic) {
@@ -1558,6 +1618,238 @@ async function generateSlidesFromConversation(kind) {
   );
 }
 function renderLessonsReply(data) {
   state.history = data.history ?? state.history;
   if (state.history.length) {
@@ -1776,6 +2068,9 @@ function bindUi() {
   $("#slide-count").addEventListener("input", (e) => {
     $("#slide-count-val").textContent = e.target.value;
   });
   document.querySelectorAll(".nav-item[data-view]").forEach((btn) => {
     btn.addEventListener("click", () => {
@@ -1835,6 +2130,10 @@ function bindUi() {
   $("#slide-search-workflow")?.addEventListener("change", syncSlideSourceUi);
   $("#btn-slide-discover")?.addEventListener("click", () => discoverSlideSources().catch(() => {}));
   $("#btn-research-ask").addEventListener("click", () => askResearchQuestion().catch(() => {}));
   $("#research-question")?.addEventListener("keydown", (e) => {
     if (e.key === "Enter" && !e.shiftKey) {
@@ -1844,6 +2143,8 @@ function bindUi() {
   });
   $("#btn-generate").addEventListener("click", () => generateSlides().catch(() => {}));
   $("#btn-present")?.addEventListener("click", () => openPresenter());
   $("#btn-research-to-slides")?.addEventListener("click", () =>
     generateSlidesFromConversation("research").catch(() => {})

   "Build PPTX, DOCX, and HTML exports",
 ];
+const QUIZ_PIPELINE_STEPS = [
+  "Load language model",
+  "Gather lesson sources",
+  "Generate quiz outline",
+  "Build DOCX and HTML quiz exports",
+];
 const state = {
   workspaceTopic: "small model finetuning",
   workspaceSessionId: "",
   selectedUrls: [],
   slideDiscoveredUrls: [],
   slideSelectedUrls: [],
+  quizDiscoveredUrls: [],
+  quizSelectedUrls: [],
   lessonsDiscoveredUrls: [],
   lessonsSelectedUrls: [],
   researchChatHistory: [],
   lessonsMode: "lesson",
   history: [],
   downloads: null,
+  quizDownloads: null,
+  lastSlideTopic: "",
+  lastSlideGrade: "6",
   client: null,
   progressTimer: null,
   progressStartedAt: null,
   }
 }
+function syncQuizSourceUi() {
+  const mode = $("#quiz-source-mode")?.value || "";
+  const isWeb = mode === "web";
+  $("#quiz-web-workflow-wrap")?.classList.toggle("hidden", !isWeb);
+  $("#quiz-web-discover-wrap")?.classList.toggle("hidden", !isWeb);
+  if (isWeb && $("#quiz-search-workflow")?.value === "two_step") {
+    $("#quiz-url-choices-panel")?.classList.toggle(
+      "hidden",
+      !state.quizDiscoveredUrls.length
+    );
+  } else {
+    $("#quiz-url-choices-panel")?.classList.add("hidden");
+  }
+}
 function syncResearchLayout() {
   syncIngestWorkflowUi();
   syncSlideSourceUi();
+  syncQuizSourceUi();
   updateResearchDocCount(state.workspaceDocIds?.length || 0);
 }
   setTracePanel("#slides-trace-panel", data);
+  const cta = $("#btn-slides-to-quiz");
+  if (cta) {
+    state.lastSlideTopic = data.topic || effectiveTopic($("#lesson-topic")?.value);
+    state.lastSlideGrade = $("#lesson-grade")?.value || "6";
+    cta.classList.remove("hidden");
+  }
   if (scrollToCanvas) {
     $("#slide-canvas")?.scrollIntoView({ behavior: "smooth", block: "nearest" });
   }
   syncSlideSourceUi();
 }
+function renderQuizUrlChoices(urls, selected) {
+  state.quizDiscoveredUrls = urls || [];
+  state.quizSelectedUrls = selected?.length ? selected : [...state.quizDiscoveredUrls];
+  renderUrlChoices(
+    urls,
+    selected,
+    "#quiz-url-choices-list",
+    "#quiz-url-choices-panel",
+    { discovered: state.quizDiscoveredUrls, selected: state.quizSelectedUrls }
+  );
+  syncQuizSourceUi();
+}
 function syncUrlSelectAll() {
   const boxes = [...document.querySelectorAll("#url-choices-list input[type=checkbox]")];
   const selectAll = $("#url-select-all");
   });
 }
+async function discoverQuizSources() {
+  const topic = effectiveTopic($("#quiz-topic")?.value);
+  if (!topic) {
+    showError("Set a topic before discovering sources.");
+    return;
+  }
+  await withRegionLoading($(".col-quiz .controls-panel"), "Discovering sources…", async () => {
+    const data = await callApi("discover_sources", [topic, state.workspaceSessionId]);
+    renderQuizUrlChoices(data.urls || [], data.selected_urls || data.urls || []);
+  });
+}
 async function autoSearchIngest() {
   const topic = effectiveTopic("");
   if (!topic) {
   );
 }
+async function collectQuizGenerationParams() {
+  const topic = effectiveTopic($("#quiz-topic")?.value);
+  const grade = $("#quiz-grade")?.value;
+  const questionCount = Number($("#quiz-count")?.value || 5);
+  const useRag = Boolean($("#lessons-use-rag")?.checked);
+  const docIds = effectiveDocIds([]);
+  const sourceMode = $("#quiz-source-mode")?.value || "";
+  const searchWorkflow = $("#quiz-search-workflow")?.value || "two_step";
+  const urlsText = $("#quiz-urls-text")?.value.trim() || "";
+  const selectedUrls = getSelectedDiscoveredUrls("#quiz-url-choices-list");
+  const filePaths = [];
+  const quizFiles = $("#quiz-source-files")?.files;
+  if (quizFiles?.length) {
+    for (const file of quizFiles) {
+      filePaths.push(await uploadFile(file));
+    }
+  }
+  return {
+    topic,
+    grade,
+    questionCount,
+    sessionId: state.workspaceSessionId,
+    useRag,
+    docIds,
+    sourceMode,
+    searchWorkflow,
+    urlsText,
+    selectedUrls,
+    filePaths,
+  };
+}
+function startQuizProgressPanel() {
+  const panel = $("#quiz-progress-panel");
+  const stepsEl = $("#quiz-progress-steps");
+  panel?.classList.remove("hidden");
+  state.progressStartedAt = Date.now();
+  if (stepsEl) {
+    stepsEl.innerHTML = QUIZ_PIPELINE_STEPS.map(
+      (label, index) =>
+        `<li data-step="${index}" class="progress-step pending">${label}</li>`
+    ).join("");
+  }
+  $("#quiz-progress-log")?.classList.add("hidden");
+  if ($("#quiz-progress-log")) $("#quiz-progress-log").textContent = "";
+  if ($("#quiz-progress-eta")) $("#quiz-progress-eta").textContent = "Est. remaining: calculating…";
+  updateQuizProgressElapsed();
+  if (state.progressTimer) clearInterval(state.progressTimer);
+  state.progressTimer = setInterval(updateQuizProgressElapsed, 500);
+}
+function updateQuizProgressElapsed() {
+  if (!state.progressStartedAt) return;
+  const elapsed = (Date.now() - state.progressStartedAt) / 1000;
+  if ($("#quiz-progress-elapsed")) {
+    $("#quiz-progress-elapsed").textContent = `Elapsed: ${elapsed.toFixed(1)}s`;
+  }
+  const eta = estimateQuizRemaining(elapsed);
+  if ($("#quiz-progress-eta")) {
+    $("#quiz-progress-eta").textContent =
+      eta !== null ? `Est. remaining: ~${Math.max(0, Math.round(eta))}s` : "";
+  }
+}
+function estimateQuizRemaining(elapsed) {
+  if (elapsed < 3) return null;
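+  const stepNodes = [...document.querySelectorAll("#quiz-progress-steps .progress-step")];
+  if (!stepNodes.length) return null; // no rendered steps: avoid dividing by zero and showing "~NaNs"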
+  const activeIndex = stepNodes.findIndex((node) => node.classList.contains("active"));
+  const doneCount = stepNodes.filter((node) => node.classList.contains("done")).length;
+  const progress = Math.max((doneCount + (activeIndex >= 0 ? 0.35 : 0)) / stepNodes.length, 0.15);
+  return elapsed / progress - elapsed;
+}
+function advanceQuizProgressWhileWaiting() {
+  let current = 0;
+  const mark = (index, status) => {
+    const node = document.querySelector(`#quiz-progress-steps [data-step="${index}"]`);
+    if (!node) return;
+    node.classList.remove("pending", "active", "done");
+    node.classList.add(status);
+  };
+  mark(current, "active");
+  const timer = setInterval(() => {
+    if (!$("#quiz-progress-panel") || $("#quiz-progress-panel").classList.contains("hidden")) {
+      clearInterval(timer);
+      return;
+    }
+    if (current < QUIZ_PIPELINE_STEPS.length - 1) {
+      mark(current, "done");
+      current += 1;
+      mark(current, "active");
+    }
+  }, 9000);
+  return timer;
+}
+function finishQuizProgressPanel(data) {
+  if (state.progressTimer) {
+    clearInterval(state.progressTimer);
+    state.progressTimer = null;
+  }
+  const stepsEl = $("#quiz-progress-steps");
+  const traceSteps = data?.progress?.steps || [];
+  if (stepsEl) {
+    if (traceSteps.length) {
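+      // Build list items via textContent so step labels and details are never parsed as HTML.
+      stepsEl.replaceChildren(
+        ...traceSteps.map((step) => {
+          const duration = step.duration_s != null ? ` (${step.duration_s}s)` : "";
+          const detail = step.detail ? ` — ${step.detail}` : "";
+          const item = document.createElement("li");
+          item.className = "progress-step done";
+          item.textContent = `${step.label}${duration}${detail}`;
+          return item;
+        })
+      );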
+    } else {
+      document.querySelectorAll("#quiz-progress-steps .progress-step").forEach((node) => {
+        node.classList.remove("pending", "active");
+        node.classList.add("done");
+      });
+    }
+  }
+  if (data?.progress_log) {
+    const logEl = $("#quiz-progress-log");
+    const log = data.progress_log;
+    if (logEl) {
+      if (/<[a-z][\s\S]*>/i.test(log)) logEl.innerHTML = log;
+      else logEl.textContent = stripMd(log);
+      logEl.classList.remove("hidden");
+    }
+  }
+  if (data?.elapsed_seconds != null && $("#quiz-progress-elapsed")) {
+    $("#quiz-progress-elapsed").textContent = `Elapsed: ${Number(data.elapsed_seconds).toFixed(1)}s`;
+  }
+  if ($("#quiz-progress-eta")) $("#quiz-progress-eta").textContent = "Complete";
+  setTracePanel("#quiz-trace-panel", data);
+}
+async function runQuizGenerationApi(apiArgs) {
+  startQuizProgressPanel();
+  const waitTimer = advanceQuizProgressWhileWaiting();
+  try {
+    return await callApi("generate_quiz", apiArgs);
+  } finally {
+    clearInterval(waitTimer);
+    if (state.progressTimer) {
+      clearInterval(state.progressTimer);
+      state.progressTimer = null;
+    }
+  }
+}
+function renderQuizGenerationResult(data, { scrollToPreview = false } = {}) {
+  finishQuizProgressPanel(data);
+  $("#quiz-generate-status").textContent = stripMd(data.status || "Quiz generated.");
+  const contentEl = $("#quiz-preview-content");
+  if (data.preview_html && contentEl) {
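+    const blob = new Blob([data.preview_html], { type: "text/html;charset=utf-8" });
+    // Release the previous preview's blob URL so repeated generations do not leak memory.
+    // quizPreviewUrl is a new state field introduced by this change.
+    if (state.quizPreviewUrl) URL.revokeObjectURL(state.quizPreviewUrl);
+    state.quizPreviewUrl = URL.createObjectURL(blob);
+    contentEl.innerHTML = `<iframe class="quiz-preview-frame" src="${state.quizPreviewUrl}" title="Quiz preview"></iframe>`;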
+  } else if (contentEl) {
+    contentEl.innerHTML = '<div class="studio-canvas-empty"><p>Preview unavailable.</p></div>';
+  }
+  state.quizDownloads = data.downloads;
+  const dl = $("#quiz-downloads");
+  if (dl && data.downloads?.docx) {
+    // Only link the HTML export when the backend actually returned one.
+    const htmlLink = data.downloads.html
+      ? `<a href="${fileUrl(data.downloads.html)}" download>HTML preview</a>`
+      : "";
+    dl.classList.remove("hidden");
+    dl.innerHTML = `
+      <a href="${fileUrl(data.downloads.docx)}" download>DOCX worksheet</a>
+      ${htmlLink}`;
+  } else if (dl) {
+    dl.classList.add("hidden");
+    dl.innerHTML = "";
+  }
+  const outlineDetails = $("#quiz-outline-details");
+  const outlineEl = $("#quiz-outline");
+  // Guard both nodes: the outline panel is optional markup, like the other quiz widgets.
+  if (outlineEl) {
+    outlineEl.innerHTML = data.outline_md ? renderMarkdownLite(data.outline_md) : "";
+  }
+  outlineDetails?.classList.toggle("hidden", !data.outline_md);
+  setTracePanel("#quiz-trace-panel", data);
+  if (scrollToPreview) {
+    $("#quiz-preview")?.scrollIntoView({ behavior: "smooth", block: "nearest" });
+  }
+}
+async function generateQuiz() {
+  const params = await collectQuizGenerationParams();
+  await withRegionLoading(
+    $("#quiz-preview"),
+    "Generating quiz…",
+    async () => {
+      let data;
+      try {
+        data = await runQuizGenerationApi([
+          params.topic,
+          params.grade,
+          params.questionCount,
+          params.sessionId,
+          params.useRag,
+          params.docIds,
+          params.sourceMode,
+          params.searchWorkflow,
+          params.urlsText,
+          params.selectedUrls,
+          params.filePaths,
+        ]);
+      } catch (_err) {
+        if ($("#quiz-progress-eta")) $("#quiz-progress-eta").textContent = "Failed";
+        throw _err;
+      }
+      renderQuizGenerationResult(data, { scrollToPreview: true });
+    },
+    { overlayEl: $("#quiz-preview-overlay") }
+  );
+}
+function openQuizFromSlides() {
+  const topic = state.lastSlideTopic || effectiveTopic($("#lesson-topic")?.value);
+  const grade = state.lastSlideGrade || $("#lesson-grade")?.value || "6";
+  if ($("#quiz-topic")) $("#quiz-topic").value = topic;
+  if ($("#quiz-grade")) $("#quiz-grade").value = grade;
+  setWorkspaceView("quiz");
+  window.setTimeout(() => $("#quiz-topic")?.focus(), 80);
+}
 function renderLessonsReply(data) {
   state.history = data.history ?? state.history;
   if (state.history.length) {
   $("#slide-count").addEventListener("input", (e) => {
     $("#slide-count-val").textContent = e.target.value;
   });
+  $("#quiz-count")?.addEventListener("input", (e) => {
+    $("#quiz-count-val").textContent = e.target.value;
+  });
   document.querySelectorAll(".nav-item[data-view]").forEach((btn) => {
     btn.addEventListener("click", () => {
   $("#slide-search-workflow")?.addEventListener("change", syncSlideSourceUi);
   $("#btn-slide-discover")?.addEventListener("click", () => discoverSlideSources().catch(() => {}));
+  $("#quiz-source-mode")?.addEventListener("change", syncQuizSourceUi);
+  $("#quiz-search-workflow")?.addEventListener("change", syncQuizSourceUi);
+  $("#btn-quiz-discover")?.addEventListener("click", () => discoverQuizSources().catch(() => {}));
   $("#btn-research-ask").addEventListener("click", () => askResearchQuestion().catch(() => {}));
   $("#research-question")?.addEventListener("keydown", (e) => {
     if (e.key === "Enter" && !e.shiftKey) {
   });
   $("#btn-generate").addEventListener("click", () => generateSlides().catch(() => {}));
+  $("#btn-generate-quiz")?.addEventListener("click", () => generateQuiz().catch(() => {}));
+  $("#btn-slides-to-quiz")?.addEventListener("click", () => openQuizFromSlides());
   $("#btn-present")?.addEventListener("click", () => openPresenter());
   $("#btn-research-to-slides")?.addEventListener("click", () =>
     generateSlidesFromConversation("research").catch(() => {})
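
Review note on `estimateQuizRemaining`: the remaining-time figure is a linear extrapolation, `elapsed / progress - elapsed`, where an active step counts as 35% done and progress is floored at 15%. The sketch below is a Python port of that arithmetic for illustration only; the function name, its parameters, and the `total_steps <= 0` guard are assumptions of this sketch, not code from the commit.

```python
from typing import Optional


def estimate_remaining(
    elapsed_s: float,
    done_count: int,
    has_active: bool,
    total_steps: int = 4,
) -> Optional[float]:
    """Illustrative port of the ETA arithmetic in estimateQuizRemaining."""
    # Too early for a meaningful estimate (the JS also returns null under 3 s).
    # The total_steps guard is an addition of this sketch.
    if elapsed_s < 3 or total_steps <= 0:
        return None
    # An active step is credited as 35% complete; progress never drops below 15%.
    partial = 0.35 if has_active else 0.0
    progress = max((done_count + partial) / total_steps, 0.15)
    # Linear extrapolation: total time = elapsed / progress.
    return elapsed_s / progress - elapsed_s


if __name__ == "__main__":
    # First step active after 9 s: the 15% floor applies (0.35 / 4 is below it).
    print(round(estimate_remaining(9, 0, True)))      # 51
    # One step done, second active after 18 s: progress is 1.35 / 4.
    print(round(estimate_remaining(18, 1, True), 1))  # 35.3
```

Because the waiting timer only advances one step every 9 seconds and never marks the last step done, the estimate is a rough pacing hint rather than a measurement of backend progress.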

libs/agent/src/agent/models.py CHANGED

@@ -17,6 +17,32 @@ class SlideOutline(BaseModel):
     slides: list[SlideSpec] = Field(min_length=1)
 class EducationPptxInput(BaseModel):
     topic: str
     grade: str

     slides: list[SlideSpec] = Field(min_length=1)
+class QuizQuestion(BaseModel):
+    prompt: str
+    choices: list[str] = Field(min_length=4, max_length=4)
+    correct_index: int = Field(ge=0, le=3)
+    explanation: str = ""
+class QuizOutline(BaseModel):
+    title: str
+    instructions: str = ""
+    questions: list[QuizQuestion] = Field(min_length=3, max_length=12)
+class QuizMakerInput(BaseModel):
+    topic: str
+    grade: str
+    question_count: int = Field(ge=5, le=10, default=5)
+    source_mode: Literal["none", "web", "rag"] = "none"
+    search_workflow: Literal["two_step", "auto"] = "two_step"
+    urls: list[str] = Field(default_factory=list)
+    files: list[Path] = Field(default_factory=list)
+    session_id: str | None = None
+    doc_ids: list[str] = Field(default_factory=list)
+    conversation_context: str = ""
 class EducationPptxInput(BaseModel):
     topic: str
     grade: str
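
Review note on the new quiz models: `QuizQuestion` requires exactly four choices and a `correct_index` between 0 and 3, and `QuizOutline` accepts 3 to 12 questions. The sketch below mirrors those constraints with plain dictionaries and the standard library only, to show which payloads would be rejected. `question_errors` and `outline_errors` are hypothetical helpers, not part of the commit, and they do not reproduce pydantic's type coercion or error format.

```python
def question_errors(q: dict) -> list[str]:
    """Hypothetical stdlib check mirroring QuizQuestion's Field constraints."""
    errors = []
    if not isinstance(q.get("prompt"), str):
        errors.append("prompt must be a string")
    choices = q.get("choices")
    if not (
        isinstance(choices, list)
        and len(choices) == 4
        and all(isinstance(c, str) for c in choices)
    ):
        errors.append("choices must be a list of exactly 4 strings")
    idx = q.get("correct_index")
    if not (isinstance(idx, int) and 0 <= idx <= 3):
        errors.append("correct_index must be an integer in 0..3")
    return errors


def outline_errors(outline: dict) -> list[str]:
    """Hypothetical check mirroring QuizOutline (title plus 3 to 12 questions)."""
    errors = []
    if not isinstance(outline.get("title"), str):
        errors.append("title must be a string")
    questions = outline.get("questions", [])
    if not 3 <= len(questions) <= 12:
        errors.append("questions must contain between 3 and 12 items")
    for i, q in enumerate(questions, start=1):
        errors.extend(f"question {i}: {msg}" for msg in question_errors(q))
    return errors


if __name__ == "__main__":
    good = {
        "prompt": "What do plants use to make food?",
        "choices": ["Sunlight", "Rocks", "Plastic", "Metal"],
        "correct_index": 0,
    }
    bad = {"prompt": "Is water wet?", "choices": ["Yes", "No"], "correct_index": 4}
    print(question_errors(good))  # []
    print(question_errors(bad))   # two messages: choices and correct_index
```

Note that the outline bounds (3 to 12) are wider than the `question_count` bounds on `QuizMakerInput` (5 to 10), so a model reply with slightly too few or too many questions can still validate.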

libs/agent/src/agent/progress.py CHANGED

@@ -204,3 +204,62 @@ class SlideGenerationProgress:
         fraction = min(self._completed_weight / total_weight, 0.98)
         desc = label if not detail else f"{label} — {detail}"
         self.on_update(fraction, desc)

         fraction = min(self._completed_weight / total_weight, 0.98)
         desc = label if not detail else f"{label} — {detail}"
         self.on_update(fraction, desc)
+@dataclass
+class QuizGenerationProgress(SlideGenerationProgress):
+    """Quiz generation progress tracker (same steps, quiz-specific banner text)."""
+    def format_log_html(
+        self,
+        *,
+        running: bool = False,
+        footer_html: str = "",
+    ) -> str:
+        elapsed = self.elapsed_s()
+        eta = self.estimate_remaining_s() if running else None
+        banner = (
+            '<div class="slide-gen-log-banner running">Generating quiz…</div>'
+            if running
+            else '<div class="slide-gen-log-banner done">Quiz generation complete</div>'
+        )
+        eta_html = (
+            f'<div class="slide-gen-log-meta">Est. remaining: ~{int(eta)}s</div>'
+            if eta is not None and running
+            else ""
+        )
+        steps_html: list[str] = []
+        for step in self.steps:
+            done = step.ended_at is not None
+            status = "done" if done else "active"
+            icon = "✓" if done else "●"
+            duration = (
+                f' <span class="slide-gen-log-dur">({step.duration_s:.1f}s)</span>'
+                if step.duration_s is not None
+                else ""
+            )
+            detail = (
+                f' <span class="slide-gen-log-detail">— {escape(step.detail)}</span>'
+                if step.detail
+                else ""
+            )
+            steps_html.append(
+                f'<li class="slide-gen-log-step {status}">'
+                f'<span class="slide-gen-log-icon">{icon}</span>'
+                f'<span class="slide-gen-log-label">{escape(step.label)}</span>'
+                f"{duration}{detail}</li>"
+            )
+        steps_block = (
+            f'<ol class="slide-gen-log-steps">{"".join(steps_html)}</ol>'
+            if steps_html
+            else '<p class="slide-gen-log-empty">Waiting for first step…</p>'
+        )
+        return (
+            f'<div class="slide-gen-log">'
+            f"{banner}"
+            f'<div class="slide-gen-log-meta">Elapsed: {elapsed:.1f}s</div>'
+            f"{eta_html}"
+            f"{steps_block}"
+            f"{footer_html}"
+            f"</div>"
+        )

libs/agent/src/agent/prompts.py CHANGED

@@ -2,7 +2,7 @@ from __future__ import annotations
 import json
-from agent.models import EducationPptxInput, SlideOutline, SlideSpec
 def education_outline_system(skill_body: str) -> str:
@@ -182,3 +182,151 @@ def outline_json_example(slide_count: int) -> str:
         ],
     }
     return json.dumps(example, indent=2)

 import json
+from agent.models import EducationPptxInput, QuizMakerInput, QuizOutline, QuizQuestion, SlideOutline, SlideSpec
 def education_outline_system(skill_body: str) -> str:
         ],
     }
     return json.dumps(example, indent=2)
+def quiz_max_tokens(question_count: int) -> int:
+    count = max(3, min(int(question_count), 12))
+    return min(1536, 120 + count * 180)
+def quiz_outline_system(skill_body: str) -> str:
+    return f"""You are an expert teacher writing multiple-choice quizzes.
+Follow the skill workflow below and output ONLY valid JSON (no markdown fences).
+Skill workflow:
+{skill_body}
+Required JSON shape:
+{{
+  "title": "Photosynthesis Quiz — Grade 6",
+  "instructions": "Read each question. Circle the best answer.",
+  "questions": [
+    {{
+      "prompt": "What do plants use to make food?",
+      "choices": ["Sunlight", "Rocks", "Plastic", "Metal"],
+      "correct_index": 0,
+      "explanation": "Plants use sunlight in photosynthesis."
+    }}
+  ]
+}}
+Rules:
+- Each question has exactly 4 choices; correct_index is 0-3.
+- Grade-appropriate vocabulary and plausible distractors.
+- Output compact JSON only — no preamble, no markdown fences.
+- When source excerpts are provided, ground questions in those sources.
+"""
+def quiz_outline_user(req: QuizMakerInput, *, source_context: str = "") -> str:
+    base = (
+        f"Topic: {req.topic}\n"
+        f"Grade level: {req.grade}\n"
+        f"Number of questions: {req.question_count}\n"
+    )
+    if source_context.strip():
+        base += (
+            "\nUse the following retrieved source excerpts as factual grounding. "
+            "Prefer these over general knowledge when they apply.\n\n"
+            f"{source_context}\n"
+        )
+    if req.conversation_context.strip():
+        base += (
+            "\nBase the quiz on this conversation transcript when relevant.\n\n"
+            f"{req.conversation_context.strip()}\n"
+        )
+    return base + "\nReturn JSON only."
+def quiz_outline_repair(
+    invalid_output: str,
+    error: str,
+    *,
+    expected_questions: int | None = None,
+) -> str:
+    count_line = ""
+    if expected_questions is not None:
+        count_line = f"\nYou must include exactly {expected_questions} items in the questions array.\n"
+    return (
+        "The previous response was invalid JSON or did not match the QuizOutline schema.\n"
+        f"Validation error: {error}\n"
+        f"{count_line}"
+        f"Previous output:\n{invalid_output}\n\n"
+        "Return corrected JSON only, no explanation."
+    )
+def quiz_outline_retry_user(req: QuizMakerInput, *, example_json: str) -> str:
+    return (
+        f"Topic: {req.topic}\n"
+        f"Grade level: {req.grade}\n"
+        f"Number of questions: {req.question_count}\n\n"
+        "Your previous response was empty or invalid. "
+        "Write real quiz content for the topic. "
+        "Return ONLY valid JSON matching this structure:\n"
+        f"{example_json}"
+    )
+def quiz_json_example(question_count: int) -> str:
+    example = {
+        "title": "Example Quiz",
+        "instructions": "Circle the best answer for each question.",
+        "questions": [
+            {
+                "prompt": f"Question {i}?",
+                "choices": ["Correct answer", "Distractor A", "Distractor B", "Distractor C"],
+                "correct_index": 0,
+                "explanation": "Brief teacher note.",
+            }
+            for i in range(1, question_count + 1)
+        ],
+    }
+    return json.dumps(example, indent=2)
+def fallback_quiz(req: QuizMakerInput) -> QuizOutline:
+    """Deterministic quiz when the model returns empty or unparseable JSON."""
+    topic = req.topic.strip() or "Lesson"
+    grade = req.grade
+    n = req.question_count
+    questions: list[QuizQuestion] = []
+    for i in range(1, n + 1):
+        questions.append(
+            QuizQuestion(
+                prompt=f"What is an important idea about {topic} (question {i})?",
+                choices=[
+                    f"A key fact about {topic}",
+                    "An unrelated detail",
+                    "A common misconception",
+                    "None of these",
+                ],
+                correct_index=0,
+                explanation="Template question — edit using your lesson sources.",
+            )
+        )
+    return QuizOutline(
+        title=f"{topic[:1].upper() + topic[1:]} Quiz — Grade {grade}",
+        instructions="Read each question carefully. Circle the best answer.",
+        questions=questions,
+    )
+def quiz_to_markdown(outline: QuizOutline) -> str:
+    lines = [f"# {outline.title}", ""]
+    if outline.instructions.strip():
+        lines.extend([outline.instructions.strip(), ""])
+    for i, q in enumerate(outline.questions, start=1):
+        lines.append(f"## Question {i}")
+        lines.append("")
+        lines.append(q.prompt)
+        lines.append("")
+        for label, choice in zip("ABCD", q.choices, strict=True):
+            lines.append(f"- **{label}.** {choice}")
+        correct = "ABCD"[q.correct_index]
+        lines.append("")
+        lines.append(f"**Answer:** {correct}")
+        if q.explanation.strip():
+            lines.append(f"*{q.explanation.strip()}*")
+        lines.append("")
+    return "\n".join(lines).strip() + "\n"

libs/agent/src/agent/runner.py CHANGED

@@ -17,6 +17,9 @@ from researchmind.retrieve import retrieve
 from agent.models import (
     Citation,
     EducationPptxInput,
     ResearchChatInput,
     ResearchChatResult,
     ResearchDiscoverResult,
@@ -25,17 +28,25 @@ from agent.models import (
     SlideSpec,
 )
 from agent.preview import outline_to_html, render_slide_images
-from agent.progress import SlideGenerationProgress
 from agent.prompts import (
     education_outline_repair,
     education_outline_retry_user,
     education_outline_system,
     education_outline_user,
     fallback_outline,
     outline_json_example,
     outline_looks_like_schema_echo,
     outline_max_tokens,
     outline_to_markdown,
 )
 from agent.skills import SkillRegistry
 from agent.tools.docx import create_docx, create_html_export
@@ -43,8 +54,11 @@ from agent.tools_registry import ToolRegistry
 from agent.trace import TraceRecorder
 EDUCATION_PPTX_SKILL = "education-pptx"
 RESEARCH_MIND_SKILL = "research-mind"
 @dataclass
 class AgentResult:
@@ -60,6 +74,18 @@ class AgentResult:
     source_summary: str = ""
 class AgentRunner:
     def __init__(
         self,
@@ -322,9 +348,217 @@ class AgentRunner:
             source_summary=source_summary,
         )
     def _gather_lesson_source_context(
         self,
-        req: EducationPptxInput,
         backend: InferenceBackend,
         model_key: str,
         trace: TraceRecorder,
@@ -519,6 +753,194 @@ class AgentRunner:
             )
         return fallback_outline(req)
     def _parse_outline_or_error(
         self,
         raw: str,
@@ -675,7 +1097,7 @@ class AgentRunner:
     def _lesson_doc_ids(
         store: Any,
         session_id: str | None,
-        req: EducationPptxInput,
         ingest: ResearchIngestResult | None,
     ) -> list[str]:
         if req.doc_ids:
@@ -711,7 +1133,7 @@ class AgentRunner:
     def _lesson_retrieve_scope(
         store: Any,
         session_id: str | None,
-        req: EducationPptxInput,
         ingest: ResearchIngestResult | None,
     ) -> tuple[str | None, list[str] | None]:
         from researchmind.scope import resolve_retrieve_scope

 from agent.models import (
     Citation,
     EducationPptxInput,
+    QuizMakerInput,
+    QuizOutline,
+    QuizQuestion,
     ResearchChatInput,
     ResearchChatResult,
     ResearchDiscoverResult,
     SlideSpec,
 )
 from agent.preview import outline_to_html, render_slide_images
+from agent.progress import QuizGenerationProgress, SlideGenerationProgress
 from agent.prompts import (
     education_outline_repair,
     education_outline_retry_user,
     education_outline_system,
     education_outline_user,
     fallback_outline,
+    fallback_quiz,
     outline_json_example,
     outline_looks_like_schema_echo,
     outline_max_tokens,
     outline_to_markdown,
+    quiz_json_example,
+    quiz_max_tokens,
+    quiz_outline_repair,
+    quiz_outline_retry_user,
+    quiz_outline_system,
+    quiz_outline_user,
+    quiz_to_markdown,
 )
 from agent.skills import SkillRegistry
 from agent.tools.docx import create_docx, create_html_export
 from agent.trace import TraceRecorder
 EDUCATION_PPTX_SKILL = "education-pptx"
+QUIZ_MAKER_SKILL = "quiz-maker"
 RESEARCH_MIND_SKILL = "research-mind"
+LessonSourceInput = EducationPptxInput | QuizMakerInput
 @dataclass
 class AgentResult:
     source_summary: str = ""
+@dataclass
+class QuizAgentResult:
+    markdown_preview: str
+    html_preview: str
+    docx_path: str
+    html_export_path: str
+    trace: TraceRecorder
+    trace_path: str
+    outline: QuizOutline
+    source_summary: str = ""
 class AgentRunner:
     def __init__(
         self,
             source_summary=source_summary,
         )
+    def run_quiz_maker(
+        self,
+        *,
+        topic: str,
+        grade: str,
+        question_count: int = 5,
+        model_key: str,
+        backend: InferenceBackend,
+        source_mode: Literal["none", "web", "rag"] = "none",
+        search_workflow: Literal["two_step", "auto"] = "two_step",
+        urls: list[str] | None = None,
+        files: list[Path] | None = None,
+        session_id: str | None = None,
+        doc_ids: list[str] | None = None,
+        conversation_context: str = "",
+        progress: QuizGenerationProgress | None = None,
+    ) -> QuizAgentResult:
+        result: QuizAgentResult | None = None
+        for item in self.iter_quiz_maker(
+            topic=topic,
+            grade=grade,
+            question_count=question_count,
+            model_key=model_key,
+            backend=backend,
+            source_mode=source_mode,
+            search_workflow=search_workflow,
+            urls=urls,
+            files=files,
+            session_id=session_id,
+            doc_ids=doc_ids,
+            conversation_context=conversation_context,
+            progress=progress,
+        ):
+            if isinstance(item, QuizAgentResult):
+                result = item
+        if result is None:
+            raise RuntimeError("Quiz generation did not return a result")
+        return result
+    def iter_quiz_maker(
+        self,
+        *,
+        topic: str,
+        grade: str,
+        question_count: int = 5,
+        model_key: str,
+        backend: InferenceBackend,
+        source_mode: Literal["none", "web", "rag"] = "none",
+        search_workflow: Literal["two_step", "auto"] = "two_step",
+        urls: list[str] | None = None,
+        files: list[Path] | None = None,
+        session_id: str | None = None,
+        doc_ids: list[str] | None = None,
+        conversation_context: str = "",
+        progress: QuizGenerationProgress | None = None,
+    ) -> Iterator[QuizGenerationProgress | QuizAgentResult]:
+        skill = self._skills.get(QUIZ_MAKER_SKILL)
+        req = QuizMakerInput(
+            topic=topic.strip(),
+            grade=grade,
+            question_count=question_count,
+            source_mode=source_mode,
+            search_workflow=search_workflow,
+            urls=urls or [],
+            files=files or [],
+            session_id=session_id,
+            doc_ids=doc_ids or [],
+            conversation_context=(conversation_context or "").strip(),
+        )
+        trace = TraceRecorder(
+            skill=skill.name,
+            model=model_key,
+            user_input=req.model_dump(mode="json"),
+        )
+        try:
+            yield from self._iter_quiz_maker_steps(
+                req=req,
+                skill=skill,
+                model_key=model_key,
+                backend=backend,
+                trace=trace,
+                progress=progress,
+            )
+        except Exception as exc:
+            trace.log_note("Run failed", error=str(exc))
+            try:
+                trace.save()
+            except OSError:
+                pass
+            raise
+    def _iter_quiz_maker_steps(
+        self,
+        *,
+        req: QuizMakerInput,
+        skill: Any,
+        model_key: str,
+        backend: InferenceBackend,
+        trace: TraceRecorder,
+        progress: QuizGenerationProgress | None,
+    ) -> Iterator[QuizGenerationProgress | QuizAgentResult]:
+        if req.conversation_context.strip():
+            trace.log_note(
+                "Conversation grounding",
+                chars=len(req.conversation_context.strip()),
+            )
+        if progress is not None:
+            progress.begin("load_model", "Load language model")
+            yield progress
+        load_started = monotonic()
+        backend.load()
+        load_ms = int((monotonic() - load_started) * 1000)
+        trace.log_step("load_model", "Load language model", duration_ms=load_ms)
+        if progress is not None:
+            progress.begin(
+                "gather_sources",
+                "Gather lesson sources",
+                detail=req.source_mode,
+            )
+            yield progress
+        source_started = monotonic()
+        source_context, source_summary, active_session = self._gather_lesson_source_context(
+            req, backend, model_key, trace
+        )
+        source_ms = int((monotonic() - source_started) * 1000)
+        trace.log_step(
+            "gather_sources",
+            "Gather lesson sources",
+            duration_ms=source_ms,
+            source_mode=req.source_mode,
+        )
+        if active_session:
+            req = req.model_copy(update={"session_id": active_session})
+        if progress is not None:
+            yield progress
+        if progress is not None:
+            progress.begin(
+                "generate_outline",
+                "Generate quiz outline",
+                detail=f"{req.question_count} questions · grade {req.grade}",
+            )
+            yield progress
+        outline_started = monotonic()
+        outline = self._generate_quiz_outline(
+            skill, req, backend, trace, source_context=source_context, progress=progress
+        )
+        outline_ms = int((monotonic() - outline_started) * 1000)
+        trace.log_step(
+            "generate_outline",
+            "Generate quiz outline",
+            duration_ms=outline_ms,
+            question_count=len(outline.questions),
+        )
+        for step in trace.steps:
+            if step.get("type") == "note" and step.get("phase") == "outline_fallback":
+                note = str(step.get("message") or "")
+                source_summary = f"{source_summary}\n\n_{note}_".strip() if source_summary else f"_{note}_"
+        if progress is not None:
+            yield progress
+        if progress is not None:
+            progress.begin("create_exports", "Build DOCX and HTML quiz exports")
+            yield progress
+        export_started = monotonic()
+        tool = self._tools.get("create_quiz")
+        export_paths = tool.handler(outline, run_id=trace.run_id)
+        trace.log_tool(
+            "create_quiz",
+            {"title": outline.title, "question_count": len(outline.questions)},
+            json.dumps(export_paths),
+        )
+        docx_path = export_paths["docx"]
+        html_export_path = export_paths["html"]
+        export_ms = int((monotonic() - export_started) * 1000)
+        trace.log_step(
+            "create_exports",
+            "Build DOCX and HTML quiz exports",
+            duration_ms=export_ms,
+        )
+        trace.set_artifact(docx_path)
+        markdown = quiz_to_markdown(outline)
+        html_preview_path = Path(html_export_path)
+        html_preview = html_preview_path.read_text(encoding="utf-8")
+        if progress is not None:
+            progress.finish()
+            yield progress
+        trace_path = trace.save()
+        yield QuizAgentResult(
+            markdown_preview=markdown,
+            html_preview=html_preview,
+            docx_path=docx_path,
+            html_export_path=html_export_path,
+            trace=trace,
+            trace_path=str(trace_path),
+            outline=outline,
+            source_summary=source_summary,
+        )
     def _gather_lesson_source_context(
         self,
+        req: LessonSourceInput,
         backend: InferenceBackend,
         model_key: str,
         trace: TraceRecorder,
             )
         return fallback_outline(req)
+    def _generate_quiz_outline(
+        self,
+        skill: Any,
+        req: QuizMakerInput,
+        backend: InferenceBackend,
+        trace: TraceRecorder,
+        *,
+        source_context: str = "",
+        progress: QuizGenerationProgress | None = None,
+    ) -> QuizOutline:
+        system = quiz_outline_system(skill.body)
+        user = quiz_outline_user(req, source_context=source_context)
+        messages = [
+            {"role": "system", "content": system},
+            {"role": "user", "content": user},
+        ]
+        prompt_text = system + "\n\n" + user
+        token_budget = quiz_max_tokens(req.question_count)
+        raw = self._normalize_outline_llm_text(
+            backend.chat(messages, max_tokens=token_budget, temperature=0.0)
+        )
+        trace.log_llm(prompt_text, raw)
+        if not raw:
+            trace.log_note(
+                "Empty quiz outline response; retrying with JSON example",
+                phase="outline_retry",
+            )
+            example = quiz_json_example(req.question_count)
+            retry_user = quiz_outline_retry_user(req, example_json=example)
+            retry_messages = [
+                {"role": "system", "content": system},
+                {"role": "user", "content": retry_user},
+            ]
+            retry_prompt = system + "\n\n" + retry_user
+            raw = self._normalize_outline_llm_text(
+                backend.chat(retry_messages, max_tokens=token_budget, temperature=0.0)
+            )
+            trace.log_llm(retry_prompt, raw)
+        outline, parse_error = self._parse_quiz_outline_or_error(
+            raw, req.question_count, trace
+        )
+        if outline is not None:
+            return outline
+        if progress is not None:
+            progress.begin(
+                "repair_outline",
+                "Repair quiz JSON",
+                detail=(parse_error or "invalid JSON")[:80],
+            )
+        repair_started = monotonic()
+        repair_user = quiz_outline_repair(
+            raw,
+            parse_error or "invalid JSON",
+            expected_questions=req.question_count,
+        )
+        repair_messages = messages + [
+            {"role": "assistant", "content": raw},
+            {"role": "user", "content": repair_user},
+        ]
+        repaired = self._normalize_outline_llm_text(
+            backend.chat(
+                repair_messages,
+                max_tokens=min(768, token_budget),
+                temperature=0.0,
+            )
+        )
+        trace.log_llm(repair_user, repaired)
+        outline, repair_error = self._parse_quiz_outline_or_error(
+            repaired, req.question_count, trace
+        )
+        repair_ms = int((monotonic() - repair_started) * 1000)
+        if outline is not None:
+            trace.log_step(
+                "repair_outline",
+                "Repair quiz JSON",
+                duration_ms=repair_ms,
+            )
+            return outline
+        trace.log_step(
+            "repair_outline",
+            "Repair quiz JSON",
+            duration_ms=repair_ms,
+            error=repair_error or parse_error,
+        )
+        trace.log_note(
+            "Model quiz outline invalid after repair; using template questions.",
+            phase="outline_fallback",
+        )
+        if progress is not None:
+            progress.begin(
+                "fallback_outline",
+                "Use template quiz",
+                detail=(repair_error or parse_error or "invalid JSON")[:80],
+            )
+        return fallback_quiz(req)
+    def _parse_quiz_outline_or_error(
+        self,
+        raw: str,
+        expected_questions: int,
+        trace: TraceRecorder | None,
+    ) -> tuple[QuizOutline | None, str]:
+        if not raw.strip():
+            return None, "Model returned empty output (no JSON)"
+        try:
+            return self._parse_quiz_outline(raw, expected_questions, trace), ""
+        except (json.JSONDecodeError, ValueError) as exc:
+            return None, str(exc)
+    def _parse_quiz_outline(
+        self,
+        raw: str,
+        expected_questions: int,
+        trace: TraceRecorder | None = None,
+    ) -> QuizOutline:
+        data = self._sanitize_quiz_data(self._extract_json(raw))
+        outline = QuizOutline.model_validate(data)
+        original_count = len(outline.questions)
+        outline = self._normalize_question_count(outline, expected_questions)
+        if trace and original_count != expected_questions:
+            trace.log_note(
+                "Adjusted question count to match request",
+                requested=expected_questions,
+                model_returned=original_count,
+                final=len(outline.questions),
+            )
+        return outline
+    @staticmethod
+    def _sanitize_quiz_data(data: dict[str, Any]) -> dict[str, Any]:
+        title = str(data.get("title") or "Quiz").strip() or "Quiz"
+        instructions = str(data.get("instructions") or "").strip()
+        questions_in = data.get("questions") or []
+        questions_out: list[dict[str, Any]] = []
+        for index, question in enumerate(questions_in):
+            if not isinstance(question, dict):
+                continue
+            prompt = str(question.get("prompt") or f"Question {index + 1}?").strip()
+            choices_raw = question.get("choices") or []
+            if isinstance(choices_raw, str):
+                choices_raw = [choices_raw]
+            choices = [str(c).strip() for c in choices_raw if str(c).strip()]
+            while len(choices) < 4:
+                choices.append(f"Option {len(choices) + 1}")
+            choices = choices[:4]
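+            # A JSON null or non-numeric value must not abort the run:
+            # int(None) raises TypeError, which the caller does not catch.
+            try:
+                correct_index = int(question.get("correct_index") or 0)
+            except (TypeError, ValueError):
+                correct_index = 0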
+            correct_index = max(0, min(3, correct_index))
+            questions_out.append(
+                {
+                    "prompt": prompt or f"Question {index + 1}?",
+                    "choices": choices,
+                    "correct_index": correct_index,
+                    "explanation": str(question.get("explanation") or ""),
+                }
+            )
+        if not questions_out:
+            questions_out.append(
+                {
+                    "prompt": "Sample question?",
+                    "choices": ["Answer A", "Answer B", "Answer C", "Answer D"],
+                    "correct_index": 0,
+                    "explanation": "",
+                }
+            )
+        return {"title": title, "instructions": instructions, "questions": questions_out}
+    @staticmethod
+    def _normalize_question_count(outline: QuizOutline, expected: int) -> QuizOutline:
+        questions = list(outline.questions)
+        if len(questions) > expected:
+            questions = questions[:expected]
+        while len(questions) < expected:
+            number = len(questions) + 1
+            questions.append(
+                QuizQuestion(
+                    prompt=f"Additional question {number} about {outline.title}?",
+                    choices=["Correct", "Distractor A", "Distractor B", "Distractor C"],
+                    correct_index=0,
+                    explanation="",
+                )
+            )
+        return outline.model_copy(update={"questions": questions})
     def _parse_outline_or_error(
         self,
         raw: str,
     def _lesson_doc_ids(
         store: Any,
         session_id: str | None,
+        req: LessonSourceInput,
         ingest: ResearchIngestResult | None,
     ) -> list[str]:
         if req.doc_ids:
     def _lesson_retrieve_scope(
         store: Any,
         session_id: str | None,
+        req: LessonSourceInput,
         ingest: ResearchIngestResult | None,
     ) -> tuple[str | None, list[str] | None]:
         from researchmind.scope import resolve_retrieve_scope
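
The per-question clean-up in `_sanitize_quiz_data` can be sketched in isolation. This is a minimal sketch, assuming plain dicts instead of the `QuizQuestion` model; `sanitize_question` is a hypothetical helper name, not part of this commit. It pads or trims to four choices, clamps `correct_index` into the range 0 to 3, and treats a missing or non-numeric index as 0.

```python
from typing import Any


def sanitize_question(question: dict[str, Any], index: int) -> dict[str, Any]:
    """Hypothetical helper: coerce one model-produced question into a 4-choice MCQ."""
    prompt = str(question.get("prompt") or "").strip() or f"Question {index + 1}?"

    choices_raw = question.get("choices") or []
    if isinstance(choices_raw, str):
        # A bare string is treated as a single choice.
        choices_raw = [choices_raw]
    choices = [str(c).strip() for c in choices_raw if str(c).strip()]
    # Pad with placeholders, then trim, so there are always exactly four choices.
    while len(choices) < 4:
        choices.append(f"Option {len(choices) + 1}")
    choices = choices[:4]

    # int(None) raises TypeError and int("B") raises ValueError; fall back to 0 for both.
    try:
        correct_index = int(question.get("correct_index", 0))
    except (TypeError, ValueError):
        correct_index = 0
    correct_index = max(0, min(3, correct_index))

    return {
        "prompt": prompt,
        "choices": choices,
        "correct_index": correct_index,
        "explanation": str(question.get("explanation") or ""),
    }


cleaned = sanitize_question({"prompt": " Q? ", "choices": ["a", "b"], "correct_index": None}, 0)
print(cleaned)
```

Clamping rather than rejecting keeps a mostly valid model response usable. The trade-off is that a wrong index silently becomes a plausible-looking answer key, so the trace note is the only signal that something was adjusted.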

libs/agent/src/agent/tools/quiz.py ADDED

	@@ -0,0 +1,134 @@

+"""Quiz export: DOCX worksheet + HTML preview."""
+from __future__ import annotations
+import html
+from pathlib import Path
+from docx import Document
+from agent.models import QuizOutline, QuizQuestion
+_CHOICE_LABELS = ("A", "B", "C", "D")
+def _add_question_docx(doc: Document, index: int, question: QuizQuestion) -> None:
+    doc.add_paragraph(f"{index}. {question.prompt}")
+    for label, choice in zip(_CHOICE_LABELS, question.choices, strict=True):
+        doc.add_paragraph(f"   {label}. {choice}")
+    doc.add_paragraph("")
+def create_quiz_docx(outline: QuizOutline, path: Path) -> Path:
+    """Student worksheet with numbered questions; answer key on final page."""
+    path = Path(path)
+    path.parent.mkdir(parents=True, exist_ok=True)
+    doc = Document()
+    doc.add_heading(outline.title, level=0)
+    if outline.instructions.strip():
+        doc.add_paragraph(outline.instructions.strip())
+        doc.add_paragraph("")
+    for i, question in enumerate(outline.questions, start=1):
+        _add_question_docx(doc, i, question)
+    doc.add_page_break()
+    doc.add_heading("Answer Key", level=1)
+    for i, question in enumerate(outline.questions, start=1):
+        label = _CHOICE_LABELS[question.correct_index]
+        answer = question.choices[question.correct_index]
+        p = doc.add_paragraph()
+        run = p.add_run(f"{i}. {label}. {answer}")
+        run.bold = True
+        if question.explanation.strip():
+            doc.add_paragraph(question.explanation.strip(), style="List Bullet")
+    doc.save(str(path))
+    return path
+def create_quiz_html(outline: QuizOutline, path: Path) -> Path:
+    """Printable HTML worksheet with collapsible answer key."""
+    path = Path(path)
+    path.parent.mkdir(parents=True, exist_ok=True)
+    title = html.escape(outline.title)
+    instructions = html.escape(outline.instructions.strip()) if outline.instructions.strip() else ""
+    question_blocks: list[str] = []
+    answer_rows: list[str] = []
+    for i, question in enumerate(outline.questions, start=1):
+        prompt = html.escape(question.prompt)
+        choices_html = "\n".join(
+            f'<li><span class="choice-label">{label}.</span> {html.escape(choice)}</li>'
+            for label, choice in zip(_CHOICE_LABELS, question.choices, strict=True)
+        )
+        question_blocks.append(
+            f'<section class="question"><h3>{i}. {prompt}</h3><ol class="choices">{choices_html}</ol></section>'
+        )
+        correct_label = _CHOICE_LABELS[question.correct_index]
+        correct_text = html.escape(question.choices[question.correct_index])
+        # html.escape("") is "", so an empty explanation renders as an empty cell.
+        expl = html.escape(question.explanation.strip())
+        answer_rows.append(
+            f"<tr><td>{i}</td><td><strong>{correct_label}. {correct_text}</strong></td>"
+            f"<td>{expl}</td></tr>"
+        )
+    body = f"""<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="utf-8">
+<title>{title}</title>
+<style>
+  body {{ font-family: Georgia, serif; max-width: 720px; margin: 2rem auto; padding: 0 1rem; line-height: 1.5; }}
+  h1 {{ font-size: 1.5rem; margin-bottom: 0.5rem; }}
+  .instructions {{ margin-bottom: 1.5rem; color: #333; }}
+  .question {{ margin-bottom: 1.25rem; page-break-inside: avoid; }}
+  .question h3 {{ font-size: 1rem; font-weight: 600; margin: 0 0 0.5rem; }}
+  .choices {{ list-style: none; padding-left: 0; margin: 0; }}
+  .choices li {{ margin: 0.25rem 0; }}
+  .choice-label {{ font-weight: 600; margin-right: 0.35rem; }}
+  details.answer-key {{ margin-top: 2rem; border-top: 2px solid #ccc; padding-top: 1rem; }}
+  table {{ width: 100%; border-collapse: collapse; font-size: 0.9rem; }}
+  th, td {{ border: 1px solid #ccc; padding: 0.4rem 0.6rem; text-align: left; vertical-align: top; }}
+  th {{ background: #f5f5f5; }}
+  @media print {{
+    details.answer-key {{ display: block; }}
+    details.answer-key summary {{ display: none; }}
+  }}
+</style>
+</head>
+<body>
+  <h1>{title}</h1>
+  {"<p class='instructions'>" + instructions + "</p>" if instructions else ""}
+  {"".join(question_blocks)}
+  <details class="answer-key">
+    <summary>Answer key (click to expand)</summary>
+    <table>
+      <thead><tr><th>#</th><th>Answer</th><th>Explanation</th></tr></thead>
+      <tbody>
+        {"".join(answer_rows)}
+      </tbody>
+    </table>
+  </details>
+</body>
+</html>
+"""
+    path.write_text(body, encoding="utf-8")
+    return path
+def create_quiz(outline: QuizOutline, output_dir: Path, stem: str = "quiz") -> dict[str, Path]:
+    """Write DOCX and HTML exports; return paths keyed by format."""
+    output_dir = Path(output_dir)
+    output_dir.mkdir(parents=True, exist_ok=True)
+    docx_path = output_dir / f"{stem}.docx"
+    html_path = output_dir / f"{stem}.html"
+    create_quiz_docx(outline, docx_path)
+    create_quiz_html(outline, html_path)
+    return {"docx": docx_path, "html": html_path}
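
Because quiz text comes from a language model, `create_quiz_html` passes every prompt, choice, and explanation through `html.escape` before interpolating it into markup. The sketch below shows the effect on a single choice line. `render_choice` is a hypothetical helper that mirrors the list-item markup in the diff; it is not a function from this commit.

```python
import html


def render_choice(label: str, choice: str) -> str:
    """Hypothetical helper: one escaped <li> in the style used by the HTML export."""
    return f'<li><span class="choice-label">{label}.</span> {html.escape(choice)}</li>'


# Model output that would otherwise inject markup into the worksheet.
unsafe = "<script>alert(1)</script> & x < y"
item = render_choice("A", unsafe)
print(item)
```

Only the interpolated text is escaped; the surrounding tags are literal, which is why the escaping has to happen per value rather than on the finished page.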

libs/agent/src/agent/tools_registry.py CHANGED

@@ -4,8 +4,9 @@ from collections.abc import Callable
 from dataclasses import dataclass
 from typing import Any
-from agent.models import SlideOutline
 from agent.tools.pptx import create_pptx
 from agent.tools.research_tools import (
     tool_extract_and_index,
     tool_research_answer,
@@ -30,6 +31,11 @@ class ToolRegistry:
             "Create a PowerPoint file from a validated SlideOutline",
             self._handle_create_pptx,
         )
         self.register(
             "suggest_urls",
             "Suggest research URLs for a topic using the local LLM",
@@ -72,3 +78,14 @@ class ToolRegistry:
     def _handle_create_pptx(self, outline: SlideOutline, run_id: str | None = None) -> str:
         path = create_pptx(outline, run_id=run_id)
         return str(path)

 from dataclasses import dataclass
 from typing import Any
+from agent.models import QuizOutline, SlideOutline
 from agent.tools.pptx import create_pptx
+from agent.tools.quiz import create_quiz
 from agent.tools.research_tools import (
     tool_extract_and_index,
     tool_research_answer,
             "Create a PowerPoint file from a validated SlideOutline",
             self._handle_create_pptx,
         )
+        self.register(
+            "create_quiz",
+            "Create DOCX and HTML quiz exports from a validated QuizOutline",
+            self._handle_create_quiz,
+        )
         self.register(
             "suggest_urls",
             "Suggest research URLs for a topic using the local LLM",
     def _handle_create_pptx(self, outline: SlideOutline, run_id: str | None = None) -> str:
         path = create_pptx(outline, run_id=run_id)
         return str(path)
+    def _handle_create_quiz(
+        self,
+        outline: QuizOutline,
+        run_id: str | None = None,
+    ) -> dict[str, str]:
+        from agent.tools.pptx import get_outputs_dir
+        output_dir = get_outputs_dir() / (run_id or "quiz")
+        paths = create_quiz(outline, output_dir, stem="quiz")
+        return {fmt: str(path) for fmt, path in paths.items()}

libs/agent/tests/test_quiz_maker.py ADDED

	@@ -0,0 +1,123 @@

+"""Tests for quiz-maker skill: JSON parse, fallback, and export smoke."""
+from pathlib import Path
+from agent.models import QuizMakerInput, QuizOutline, QuizQuestion
+from agent.prompts import fallback_quiz, quiz_max_tokens, quiz_to_markdown
+from agent.runner import AgentRunner
+from agent.tools.quiz import create_quiz, create_quiz_docx, create_quiz_html
+def test_quiz_max_tokens_scales_with_question_count():
+    assert quiz_max_tokens(5) == 1020
+    assert quiz_max_tokens(3) == 660
+    assert quiz_max_tokens(12) == 1536
+def test_parse_quiz_outline_normalizes_count():
+    runner = AgentRunner()
+    raw = (
+        '{"title": "Science Quiz", "instructions": "Circle one.", "questions": ['
+        '{"prompt": "Q1?", "choices": ["a", "b", "c", "d"], "correct_index": 0, "explanation": "e1"},'
+        '{"prompt": "Q2?", "choices": ["a", "b", "c", "d"], "correct_index": 1, "explanation": "e2"},'
+        '{"prompt": "Q3?", "choices": ["a", "b", "c", "d"], "correct_index": 2, "explanation": "e3"}'
+        "]}"
+    )
+    outline = runner._parse_quiz_outline(raw, expected_questions=5)
+    assert len(outline.questions) == 5
+    assert outline.title == "Science Quiz"
+def test_parse_quiz_outline_trims_extra_questions():
+    runner = AgentRunner()
+    questions = ",".join(
+        f'{{"prompt": "Q{i}?", "choices": ["a","b","c","d"], "correct_index": 0, "explanation": ""}}'
+        for i in range(1, 8)
+    )
+    raw = f'{{"title": "Long", "questions": [{questions}]}}'
+    outline = runner._parse_quiz_outline(raw, expected_questions=5)
+    assert len(outline.questions) == 5
+def test_parse_quiz_outline_or_error_empty():
+    runner = AgentRunner()
+    outline, err = runner._parse_quiz_outline_or_error("", 5, None)
+    assert outline is None
+    assert "empty" in err.lower()
+def test_fallback_quiz_has_requested_count():
+    req = QuizMakerInput(topic="Fractions", grade="5", question_count=7)
+    outline = fallback_quiz(req)
+    assert len(outline.questions) == 7
+    assert "Fractions" in outline.title
+    assert all(len(q.choices) == 4 for q in outline.questions)
+def test_quiz_to_markdown_includes_answers():
+    outline = QuizOutline(
+        title="Test",
+        instructions="Read carefully.",
+        questions=[
+            QuizQuestion(
+                prompt="2+2?",
+                choices=["4", "3", "5", "6"],
+                correct_index=0,
+                explanation="Basic addition.",
+            ),
+            QuizQuestion(
+                prompt="3+3?",
+                choices=["6", "5", "7", "8"],
+                correct_index=0,
+                explanation="Also addition.",
+            ),
+            QuizQuestion(
+                prompt="1+1?",
+                choices=["2", "1", "3", "4"],
+                correct_index=0,
+                explanation="Easy.",
+            ),
+        ],
+    )
+    md = quiz_to_markdown(outline)
+    assert "2+2?" in md
+    assert "**Answer:** A" in md
+def test_create_quiz_docx_and_html(tmp_path: Path):
+    outline = QuizOutline(
+        title="Smoke Quiz",
+        instructions="Circle the best answer.",
+        questions=[
+            QuizQuestion(
+                prompt="Sample?",
+                choices=["Yes", "No", "Maybe", "Sometimes"],
+                correct_index=0,
+                explanation="Because.",
+            ),
+            QuizQuestion(
+                prompt="Another?",
+                choices=["A", "B", "C", "D"],
+                correct_index=2,
+                explanation="C is correct.",
+            ),
+            QuizQuestion(
+                prompt="Third?",
+                choices=["1", "2", "3", "4"],
+                correct_index=1,
+                explanation="Two.",
+            ),
+        ],
+    )
+    docx_path = tmp_path / "quiz.docx"
+    html_path = tmp_path / "quiz.html"
+    create_quiz_docx(outline, docx_path)
+    create_quiz_html(outline, html_path)
+    assert docx_path.stat().st_size > 100
+    html_text = html_path.read_text(encoding="utf-8")
+    assert "Smoke Quiz" in html_text
+    assert "Answer key" in html_text
+    paths = create_quiz(outline, tmp_path / "out", stem="worksheet")
+    assert paths["docx"].exists()
+    assert paths["html"].exists()

research/evals/configs/lm_eval_reasoning.yaml CHANGED

@@ -10,8 +10,8 @@ tasks:
   - arc_challenge
   - hellaswag
-num_fewshot: 5
-limit: 100
 seed: 42
 batch_size: auto
 device: auto

   - arc_challenge
   - hellaswag
+num_fewshot: null   # per-task canonical fewshot (gsm8k 5-shot, MC tasks 0-shot)
+limit: 200           # larger sample -> tighter stderr for gate decisions
 seed: 42
 batch_size: auto
 device: auto

research/evals/configs/lm_eval_science.yaml CHANGED

@@ -9,8 +9,8 @@ tasks:
   - openbookqa
   - arc_challenge
-num_fewshot: 0
-limit: 100
 seed: 42
 batch_size: auto
 device: auto

   - openbookqa
   - arc_challenge
+num_fewshot: null   # per-task canonical fewshot (sciq/openbookqa/arc 0-shot)
+limit: 200           # larger sample -> tighter stderr for gate decisions
 seed: 42
 batch_size: auto
 device: auto
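
The comment "larger sample -> tighter stderr" in the two eval configs above can be checked with a quick calculation. The sketch below assumes a simple binomial approximation for the standard error of an accuracy score; the harness itself may compute stderr differently, and both the 0.60 accuracy and the helper name `binomial_stderr` are illustrative, not taken from the repository.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Approximate standard error of an accuracy measured on n independent items."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


# Hypothetical accuracy of 0.60, used only to compare the two limits.
for limit in (100, 200):
    print(limit, round(binomial_stderr(0.60, limit), 4))

# Doubling the sample size shrinks the standard error by a factor of sqrt(2).
ratio = binomial_stderr(0.60, 100) / binomial_stderr(0.60, 200)
print(round(ratio, 3))
```

Under this approximation, moving from `limit: 100` to `limit: 200` cuts the standard error by roughly 29%, which matters when a gate threshold such as `min_improve: 0.02` is of the same order as the noise.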

research/modal/_common.py CHANGED

@@ -291,6 +291,31 @@ def primary_metric(task_metrics: dict[str, Any]) -> tuple[str, float] | None:
     return None
 def evaluate_gate(
     *,
     candidate: dict[str, Any],
@@ -429,6 +454,16 @@ def render_model_card(
     base_tasks = (baseline or {}).get("results", {})
     base_model = (training_payload or {}).get("model") or BASE_MODEL_ID
     lines = [
         "---",
         "library_name: peft",
@@ -445,7 +480,7 @@ def render_model_card(
         f"# {job['name']}",
         "",
         f"QLoRA adapter for **{job.get('category', 'general')}**, fine-tuned from "
-        f"`{base_model}` on `{job['dataset']}` (format: `{job['format']}`).",
         "",
         "Trained, evaluated, and gated on [Modal](https://modal.com/docs/guide) via "
         "`research/modal/` (app `slm-finetune-benchmark`).",
@@ -570,16 +605,24 @@ def publish_adapter_files(
     from huggingface_hub import HfApi
-    repo_id = publish_cfg["hub_repo"]
     private = publish_cfg.get("private", True)
     api = HfApi()
-    api.create_repo(repo_id=repo_id, repo_type="model", private=private, exist_ok=True)
-    api.upload_folder(
-        folder_path=str(adapter_path),
-        repo_id=repo_id,
-        repo_type="model",
-        commit_message=f"Publish {job['name']} (gate passed: {gate_result.get('task')})",
-    )
-    return {"published": True, "repo_id": repo_id, "url": f"https://huggingface.co/{repo_id}"}

     return None
+def baseline_is_cached(experiment_name: str, config_path: str) -> bool:
+    """True if a baseline results.json exists AND its run_meta still matches the
+    profile config's tasks/limit/num_fewshot. Config changes (e.g. new guard
+    tasks or a higher limit) therefore correctly force a fresh baseline."""
+    results = Path(LM_EVAL_OUTPUT) / experiment_name / "results.json"
+    if not results.is_file():
+        return False
+    candidates = [Path(config_path)]
+    if not Path(config_path).is_absolute():
+        candidates += [REPO_ROOT / config_path, Path("/repo") / config_path]
+    cfg_file = next((p for p in candidates if p.is_file()), None)
+    if cfg_file is None:
+        return False
+    try:
+        meta = json.loads(results.read_text()).get("run_meta", {})
+        cfg = yaml.safe_load(cfg_file.read_text()) or {}
+    except Exception:
+        return False
+    return (
+        sorted(meta.get("tasks") or []) == sorted(cfg.get("tasks") or [])
+        and meta.get("limit") == cfg.get("limit")
+        and meta.get("num_fewshot") == cfg.get("num_fewshot", 0)
+    )
 def evaluate_gate(
     *,
     candidate: dict[str, Any],
     base_tasks = (baseline or {}).get("results", {})
     base_model = (training_payload or {}).get("model") or BASE_MODEL_ID
+    # A job is either a single dataset (`dataset`/`format`) or a `mix:` of sources.
+    if job.get("mix"):
+        dataset_desc = " + ".join(
+            f"`{s.get('dataset', '?')}`" for s in job["mix"]
+        )
+        format_desc = "mix"
+    else:
+        dataset_desc = f"`{job.get('dataset', '?')}`"
+        format_desc = job.get("format", "?")
     lines = [
         "---",
         "library_name: peft",
         f"# {job['name']}",
         "",
         f"QLoRA adapter for **{job.get('category', 'general')}**, fine-tuned from "
+        f"`{base_model}` on {dataset_desc} (format: `{format_desc}`).",
         "",
         "Trained, evaluated, and gated on [Modal](https://modal.com/docs/guide) via "
         "`research/modal/` (app `slm-finetune-benchmark`).",
     from huggingface_hub import HfApi
+    repo_ids = [publish_cfg["hub_repo"], *(publish_cfg.get("mirror_repos") or [])]
     private = publish_cfg.get("private", True)
     api = HfApi()
+    uploads = []
+    for repo_id in dict.fromkeys(repo_ids):
+        api.create_repo(repo_id=repo_id, repo_type="model", private=private, exist_ok=True)
+        api.upload_folder(
+            folder_path=str(adapter_path),
+            repo_id=repo_id,
+            repo_type="model",
+            commit_message=f"Publish {job['name']} (gate passed: {gate_result.get('task')})",
+        )
+        uploads.append({"repo_id": repo_id, "url": f"https://huggingface.co/{repo_id}"})
+    return {
+        "published": True,
+        "repo_id": uploads[0]["repo_id"],
+        "url": uploads[0]["url"],
+        "uploads": uploads,
+    }
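
The new publish loop iterates over `dict.fromkeys(repo_ids)`, which removes duplicate repo IDs while keeping the primary `hub_repo` first. A minimal sketch of that step in isolation follows; the helper name `resolve_publish_targets` and the sample repo IDs are hypothetical and do not appear in the diff.

```python
from __future__ import annotations


def resolve_publish_targets(publish_cfg: dict) -> list[str]:
    """Return the primary repo followed by mirrors, with duplicates dropped.

    dict.fromkeys keeps the first occurrence of each key in insertion order,
    so the primary repo always stays at index 0.
    """
    repo_ids = [publish_cfg["hub_repo"], *(publish_cfg.get("mirror_repos") or [])]
    return list(dict.fromkeys(repo_ids))


# Hypothetical config: the mirror list accidentally repeats the primary repo.
cfg = {
    "hub_repo": "user/adapter",
    "mirror_repos": ["org/adapter", "user/adapter"],
}
print(resolve_publish_targets(cfg))
```

Because the primary repo remains first, `uploads[0]` in the returned payload still describes the same repo that the old single-repo code reported under `repo_id` and `url`.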

research/modal/experiments.yaml CHANGED

@@ -30,11 +30,26 @@ defaults:
 finetune:
   # --- teaching: lesson-planning agent chat data (Well-Tuned primary) ---
   - name: teaching-lora
     category: teaching
-    dataset: research/data/education-lesson-chat.jsonl
-    format: chat
-    description: Lesson-planning agent chat data (local)
     eval_profile: instructions
     goals:
       task: ifeval
@@ -42,6 +57,8 @@ finetune:
       min_improve: 0.02
     publish:
       hub_repo: MSGEncrypted/minicpm5-1b-teaching-lora
       private: false
   # --- science: factual + explanatory science tutoring ---
@@ -49,6 +66,9 @@ finetune:
     category: science
     dataset: research/data/science-tutor-chat.jsonl
     format: chat
     description: Science tutor Q&A chat data (local)
     eval_profile: science
     goals:
@@ -60,6 +80,8 @@ finetune:
           max_regress: 0.03
     publish:
       hub_repo: MSGEncrypted/minicpm5-1b-science-lora
       private: false
   # --- math: GSM8K/MATH natural-language CoT augmentation (MetaMathQA) ---
@@ -102,6 +124,8 @@ finetune:
           max_regress: 0.03
     publish:
       hub_repo: MSGEncrypted/minicpm5-1b-math-lora
       private: false
   # --- coding: Python instruction-following code generation ---
@@ -111,6 +135,9 @@ finetune:
     format: alpaca
     dataset_split: "train[:1000]"
     max_samples: 1000
     description: Python code instruction tuning (Hub, alpaca columns)
     eval_profile: code
     goals:
@@ -124,6 +151,8 @@ finetune:
           max_regress: 0.03
     publish:
       hub_repo: MSGEncrypted/minicpm5-1b-coding-lora
       private: false
   # --- reasoning: multi-turn chat with reasoning-heavy conversations ---
@@ -134,6 +163,9 @@ finetune:
     dataset_config: all
     dataset_split: "train[:500]"
     max_samples: 500
     description: Multi-turn reasoning/chat subset (Hub)
     eval_profile: reasoning
     goals:
@@ -145,6 +177,8 @@ finetune:
           max_regress: 0.03
     publish:
       hub_repo: MSGEncrypted/minicpm5-1b-reasoning-lora
       private: false
   # --- general instructions baseline: no goals/publish -> local-only adapter ---

 finetune:
   # --- teaching: lesson-planning agent chat data (Well-Tuned primary) ---
+  # 8 local lesson chats overfit easily; mix in alpaca replay + NEFTune so
+  # IFEval clears without washing out the lesson skill signal.
   - name: teaching-lora
     category: teaching
+    max_steps: 150
+    mix:
+      - dataset: research/data/education-lesson-chat.jsonl
+        format: chat
+        weight: 20                    # ~8 samples → ~160 examples
+      - dataset: tatsu-lab/alpaca      # instruction-following replay for ifeval
+        format: alpaca
+        dataset_split: "train[:600]"
+        max_samples: 600
+    args:
+      lora_r: 32
+      lora_alpha: 64
+      neftune_noise_alpha: 5
+      early_stopping_patience: 2   # keep best eval_loss checkpoint, not the last
+      val_split: 0.05
+    description: Lesson-planning chat + alpaca replay, r=32 + NEFTune
     eval_profile: instructions
     goals:
       task: ifeval
       min_improve: 0.02
     publish:
       hub_repo: MSGEncrypted/minicpm5-1b-teaching-lora
+      mirror_repos:
+        - build-small-hackathon/minicpm5-1b-teaching-lora
       private: false
   # --- science: factual + explanatory science tutoring ---
     category: science
     dataset: research/data/science-tutor-chat.jsonl
     format: chat
+    args:
+      early_stopping_patience: 2   # keep best eval_loss checkpoint, not the last
+      val_split: 0.05
     description: Science tutor Q&A chat data (local)
     eval_profile: science
     goals:
           max_regress: 0.03
     publish:
       hub_repo: MSGEncrypted/minicpm5-1b-science-lora
+      mirror_repos:
+        - build-small-hackathon/minicpm5-1b-science-lora
       private: false
   # --- math: GSM8K/MATH natural-language CoT augmentation (MetaMathQA) ---
           max_regress: 0.03
     publish:
       hub_repo: MSGEncrypted/minicpm5-1b-math-lora
+      mirror_repos:
+        - build-small-hackathon/minicpm5-1b-math-lora
       private: false
   # --- coding: Python instruction-following code generation ---
     format: alpaca
     dataset_split: "train[:1000]"
     max_samples: 1000
+    args:
+      early_stopping_patience: 2   # keep best eval_loss checkpoint, not the last
+      val_split: 0.05
     description: Python code instruction tuning (Hub, alpaca columns)
     eval_profile: code
     goals:
           max_regress: 0.03
     publish:
       hub_repo: MSGEncrypted/minicpm5-1b-coding-lora
+      mirror_repos:
+        - build-small-hackathon/minicpm5-1b-coding-lora
       private: false
   # --- reasoning: multi-turn chat with reasoning-heavy conversations ---
     dataset_config: all
     dataset_split: "train[:500]"
     max_samples: 500
+    args:
+      early_stopping_patience: 2   # keep best eval_loss checkpoint, not the last
+      val_split: 0.05
     description: Multi-turn reasoning/chat subset (Hub)
     eval_profile: reasoning
     goals:
           max_regress: 0.03
     publish:
       hub_repo: MSGEncrypted/minicpm5-1b-reasoning-lora
+      mirror_repos:
+        - build-small-hackathon/minicpm5-1b-reasoning-lora
       private: false
   # --- general instructions baseline: no goals/publish -> local-only adapter ---
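
The `mix:` block for `teaching-lora` notes that `weight: 20` turns about 8 lesson chats into about 160 examples. The sketch below works through that arithmetic, assuming `weight` acts as a plain repetition multiplier and that a source without a weight counts once; the actual mixing logic lives in the training code and is not shown in this diff, so the helper `effective_examples` is hypothetical.

```python
def effective_examples(sources: list) -> int:
    """Sum the samples of each source, repeated `weight` times (default 1)."""
    return sum(s["samples"] * s.get("weight", 1) for s in sources)


# Sample counts taken from the YAML comments: ~8 lesson chats, 600 alpaca rows.
mix = [
    {"name": "education-lesson-chat", "samples": 8, "weight": 20},
    {"name": "tatsu-lab/alpaca", "samples": 600},
]

total = effective_examples(mix)   # 8 * 20 + 600
lesson_share = (8 * 20) / total   # fraction of the mix that is lesson data
print(total, round(lesson_share, 3))
```

Under these assumptions the lesson data makes up roughly a fifth of the mix, which fits the stated intent of keeping the lesson signal while the alpaca replay supports instruction following.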

research/modal/server_app.py CHANGED

@@ -53,6 +53,7 @@ from _common import (
     HF_CACHE_PATH,
     LM_EVAL_OUTPUT,
     apply_defaults,
     build_finetune_cmd,
     build_lm_eval_cmd,
     check_gate_files,
@@ -285,9 +286,15 @@ class GpuWorker:
         baselines_ok: dict[str, bool] = {}
         if not eval_only:
             for profile in profiles:
                 result = self.lm_eval.local(
-                    experiment_name=f"{preset}__baseline__{profile}",
-                    config=config_for_profile(profile),
                     preset=preset,
                 )
                 baselines_ok[profile] = bool(result.get("ok"))

     HF_CACHE_PATH,
     LM_EVAL_OUTPUT,
     apply_defaults,
+    baseline_is_cached,
     build_finetune_cmd,
     build_lm_eval_cmd,
     check_gate_files,
         baselines_ok: dict[str, bool] = {}
         if not eval_only:
             for profile in profiles:
+                exp = f"{preset}__baseline__{profile}"
+                cfg_path = config_for_profile(profile)
+                if baseline_is_cached(exp, cfg_path):
+                    print(f"baseline {exp}: reusing cached results (config unchanged)")
+                    baselines_ok[profile] = True
+                    continue
                 result = self.lm_eval.local(
+                    experiment_name=exp,
+                    config=cfg_path,
                     preset=preset,
                 )
                 baselines_ok[profile] = bool(result.get("ok"))
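
The baseline loop now skips `lm_eval` when `baseline_is_cached` reports that the stored `run_meta` still matches the profile config. The following is a simplified, self-contained sketch of that comparison. It takes the config as an already-parsed dict instead of a YAML path, and the function name `cached_baseline_matches` is hypothetical.

```python
import json
from pathlib import Path


def cached_baseline_matches(results_path: Path, cfg: dict) -> bool:
    """True only if results exist and their run_meta matches the config."""
    if not results_path.is_file():
        return False
    try:
        meta = json.loads(results_path.read_text()).get("run_meta", {})
    except (OSError, ValueError, AttributeError):
        # Unreadable or malformed results are treated as a cache miss.
        return False
    return (
        sorted(meta.get("tasks") or []) == sorted(cfg.get("tasks") or [])
        and meta.get("limit") == cfg.get("limit")
        and meta.get("num_fewshot") == cfg.get("num_fewshot", 0)
    )
```

Task order is ignored, but any change to the task set, `limit`, or `num_fewshot` makes the comparison fail. A results file written under `limit: 100` is therefore not reused once the config says `limit: 200`, and the baseline is re-run.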

skills/quiz-maker/SKILL.md ADDED

	@@ -0,0 +1,29 @@

+---
+name: quiz-maker
+description: Create a multiple-choice quiz from a topic and grade level
+task: education
+tools:
+  - create_quiz
+model_hints:
+  - minicpm5-1b
+---
+# Quiz maker
+Generate a printable multiple-choice quiz (worksheet + answer key) for a topic and grade level.
+## Workflow
+1. Gather optional source context (web URLs, uploaded files, or session RAG).
+2. Produce a `QuizOutline` JSON object with 3–12 questions (typically 5–10).
+3. Export DOCX (student worksheet + answer key page) and HTML preview via `create_quiz`.
+## Output rules
+- Each question has exactly **4** choices labeled A–D.
+- Exactly one correct answer per question (`correct_index` 0–3).
+- Include a short explanation for each answer (teacher reference).
+- Grade-appropriate vocabulary and distractors.
+- Ground content in provided sources when available.
+See `references/mcq-format.md` for MCQ structure details.
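
The output rules above pair a zero-based `correct_index` with choices labeled A–D. The sketch below shows one way a renderer might map between the two when building the worksheet and the answer key. The helpers `format_question` and `answer_key_entry` are hypothetical and are not the `create_quiz` implementation.

```python
LETTERS = "ABCD"


def format_question(number: int, prompt: str, choices: list) -> str:
    """Render one worksheet question with A-D labels added at display time."""
    lines = [f"{number}. {prompt}"]
    lines += [f"   {LETTERS[i]}. {choice}" for i, choice in enumerate(choices)]
    return "\n".join(lines)


def answer_key_entry(number: int, correct_index: int, explanation: str) -> str:
    """Render one answer-key line, converting the 0-3 index to a letter."""
    return f"{number}. {LETTERS[correct_index]} - {explanation}"


print(format_question(2, "Another?", ["A", "B", "C", "D"]))
print(answer_key_entry(2, 2, "C is correct."))
```

Adding the letters only at render time keeps the stored choices free of prefixes, so reordering them cannot leave stale labels behind.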

skills/quiz-maker/references/mcq-format.md ADDED

	@@ -0,0 +1,22 @@

+# Multiple-choice format
+## Question structure
+Each question must include:
+- `prompt`: clear stem (one sentence or short paragraph)
+- `choices`: array of exactly **4** strings (no A/B/C/D prefixes in JSON)
+- `correct_index`: integer 0–3 (index into `choices`)
+- `explanation`: one or two sentences for teachers (why the answer is correct)
+## Distractors
+- All four options should be plausible at the target grade level.
+- Avoid "all of the above" / "none of the above" unless topic-specific.
+- Keep choice length similar within a question.
+## Quiz outline
+- `title`: quiz title (include topic and grade when helpful)
+- `instructions`: student-facing directions (e.g. "Circle the best answer.")
+- `questions`: 3–12 items; default request is 5–10
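
The structural rules in this reference file can be expressed as a small validator. The sketch below assumes the outline arrives as a plain dict parsed from JSON. The function `validate_quiz_outline` is hypothetical and is separate from the `QuizOutline` model the commit adds.

```python
def validate_quiz_outline(outline: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    questions = outline.get("questions") or []
    if not 3 <= len(questions) <= 12:
        errors.append(f"expected 3-12 questions, got {len(questions)}")
    for n, q in enumerate(questions, start=1):
        if not str(q.get("prompt") or "").strip():
            errors.append(f"question {n}: missing prompt")
        choices = q.get("choices") or []
        if len(choices) != 4:
            errors.append(f"question {n}: expected 4 choices, got {len(choices)}")
        idx = q.get("correct_index")
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(idx, int) or isinstance(idx, bool) or not 0 <= idx <= 3:
            errors.append(f"question {n}: correct_index must be an integer 0-3")
        if not str(q.get("explanation") or "").strip():
            errors.append(f"question {n}: missing explanation")
    return errors


# Hypothetical outline with one deliberately invalid question.
sample = {
    "title": "Smoke Quiz",
    "instructions": "Circle the best answer.",
    "questions": [
        {"prompt": "Sample?", "choices": ["Yes", "No", "Maybe", "Sometimes"],
         "correct_index": 0, "explanation": "Because."},
        {"prompt": "Another?", "choices": ["A", "B", "C", "D"],
         "correct_index": 2, "explanation": "C is correct."},
        {"prompt": "Third?", "choices": ["1", "2", "3"],
         "correct_index": 4, "explanation": "Two."},
    ],
}
for problem in validate_quiz_outline(sample):
    print(problem)
```

Returning every problem at once, instead of raising on the first, gives a repair step the full list of issues to fix in a single pass over model output.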