name: Skill Agent PPTX
overview: >-
  Add a Hermes-style skill agent library on top of your existing
  TransformersBackend, with one education PowerPoint task as the first tab in a
  multi-tab Gradio Space — structured so more task tabs can be added later.
todos:
  - id: agent-lib
    content: >-
      Create libs/agent: SkillRegistry, ToolRegistry, AgentRunner,
      TraceRecorder, pydantic outline models
    status: completed
  - id: pptx-skill
    content: Add skills/education-pptx/SKILL.md and create_pptx tool (python-pptx)
    status: completed
  - id: gradio-tabs
    content: 'Refactor Gradio app into Tabs: Education PPTX (primary) + Chat (debug)'
    status: completed
  - id: docker-workspace
    content: Wire agent package into uv workspace, Dockerfile, models.yaml active_model
    status: completed
  - id: trace-demo
    content: Add trace JSON export + README demo script for hackathon submission
    status: completed
isProject: false

Skill Agent + Education PowerPoint (Phase 1)

Goal

Replace the generic chat demo with a local skill-based agent that uses your transformers presets (default: minicpm5-1b or openbmb/MiniCPM5-1B from models.yaml) to run a real workflow: topic in → slide outline → downloadable .pptx out.

This follows the agentskills.io / Hermes SKILL.md pattern without embedding the full Hermes runtime (too heavy for HF Space Docker).

Hackathon alignment: Backyard AI (teacher/tutor you know), Best Agent, Tiny Titan (≤4B), OpenBMB (MiniCPM), Sharing is Caring (publish trace JSON to Hub), optional Well-Tuned if you ship a fine-tuned preset later.


Architecture

flowchart TB
  subgraph ui [apps/gradio-space]
    Tabs[gr.Tabs]
    EduTab[EducationPptxTab]
    ChatTab[ChatTab optional later]
    Tabs --> EduTab
  end

  subgraph agent [libs/agent]
    Runner[AgentRunner]
    Skills[SkillRegistry]
    Tools[ToolRegistry]
    Trace[TraceRecorder]
    Runner --> Skills
    Runner --> Tools
    Runner --> Trace
  end

  subgraph inference [libs/inference]
    Factory[factory.get_backend]
    TF[TransformersBackend]
    Factory --> TF
  end

  EduTab --> Runner
  Runner --> Factory
  Tools --> PptxTool[python-pptx]

Agent loop (simple, reliable for small models):

  1. Load skill education-pptx from skills/education-pptx/SKILL.md
  2. User provides: topic, grade level, number of slides (3–8)
  3. Step A — LLM: generate structured slide outline (JSON schema in prompt; parse with fallback regex)
  4. Step B — Tool: create_pptx(outline) writes file via python-pptx (deterministic; no LLM needed for file bytes)
  5. Step C — LLM (optional): one-sentence “teacher notes” per slide
  6. Return: trace steps + gr.File download + markdown preview

Small models are weak at multi-hop tool JSON — keep the loop fixed 2-step (outline → tool) rather than open-ended ReAct for v1.
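Step A's "parse with fallback regex" could look roughly like the sketch below. It is only an illustration: the plan calls for pydantic outline models, while this version uses plain dataclasses to stay dependency-free, and the names `Slide`, `extract_json`, and `parse_outline` are hypothetical.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class Slide:
    title: str
    bullets: list[str] = field(default_factory=list)
    speaker_note: str = ""


@dataclass
class SlideOutline:
    title: str
    slides: list[Slide]


def extract_json(text: str) -> dict:
    """Parse model output as JSON; fall back to the outermost {...} span."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Small models often wrap JSON in chatter, so grab first "{" to last "}".
        match = re.search(r"\{.*\}", text, flags=re.DOTALL)
        if match is None:
            raise ValueError("no JSON object found in model output")
        data = json.loads(match.group(0))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


def parse_outline(text: str, min_slides: int = 3, max_slides: int = 8) -> SlideOutline:
    """Turn raw model text into a checked outline (3 to 8 slides by default)."""
    data = extract_json(text)
    try:
        slides = [
            Slide(
                title=str(item["title"]),
                bullets=[str(b) for b in item.get("bullets", [])],
                speaker_note=str(item.get("speaker_note", "")),
            )
            for item in data["slides"]
        ]
        title = str(data["title"])
    except (KeyError, TypeError, AttributeError) as err:
        raise ValueError(f"outline is missing a required field: {err}") from err
    if not min_slides <= len(slides) <= max_slides:
        raise ValueError(f"expected {min_slides}-{max_slides} slides, got {len(slides)}")
    return SlideOutline(title=title, slides=slides)
```

Every failure surfaces as a `ValueError`, so a caller has one exception type to catch before deciding whether to retry.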


New package: libs/agent

Add workspace member libs/agent with:

| Module | Responsibility |
| --- | --- |
| skills.py | Load SKILL.md (YAML frontmatter + body), list skills by task tag |
| tools.py | Register callable tools with name, description, JSON schema |
| runner.py | AgentRunner.run(skill_id, user_input, backend): orchestrates LLM + tools |
| trace.py | Append-only step log (thought, tool_call, tool_result, artifact) |
| prompts.py | Skill-specific system prompts and JSON outline template |
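As an illustration of the tools.py responsibility, a registry that stores a name, description, and JSON schema per callable might be sketched as follows. The `Tool` dataclass, the decorator style, and the method names `register`, `call`, and `describe` are assumptions, not the package's actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    parameters: dict[str, Any]  # JSON Schema describing the tool's arguments
    func: Callable[..., Any]


class ToolRegistry:
    """Hypothetical registry: maps tool names to callables plus metadata."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, name: str, description: str, parameters: dict[str, Any]):
        """Decorator that registers the wrapped function under `name`."""
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            if name in self._tools:
                raise ValueError(f"tool already registered: {name}")
            self._tools[name] = Tool(name, description, parameters, func)
            return func
        return decorator

    def call(self, name: str, **kwargs: Any) -> Any:
        """Invoke a registered tool; raises KeyError for unknown names."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].func(**kwargs)

    def describe(self) -> list[dict[str, Any]]:
        """Metadata only, suitable for embedding in a system prompt."""
        return [
            {"name": t.name, "description": t.description, "parameters": t.parameters}
            for t in self._tools.values()
        ]


registry = ToolRegistry()


@registry.register(
    "echo",
    "Return the text unchanged (stand-in for a real tool such as create_pptx)",
    {"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]},
)
def echo(text: str) -> str:
    return text
```

Keeping `describe()` separate from `call()` lets the runner show tool metadata to the model without exposing the callables themselves.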

Dependencies: inference (workspace), python-pptx, pydantic (outline validation)

Build on the existing interface in libs/inference/src/inference/base.py without modifying it. TransformersBackend needs no changes beyond optionally raising max_tokens for outline generation, either via an environment variable or per-call kwargs (already supported in transformers.py).


First skill: skills/education-pptx/SKILL.md

---
name: education-pptx
description: Create a short lesson PowerPoint from a topic and grade level
task: education
tools:
  - create_pptx
model_hints:
  - minicpm5-1b
  - qwen3b-gguf
---

## Workflow
1. Ask for topic, audience grade, slide count.
2. Produce JSON outline: title, slides[{title, bullets[], speaker_note}].
3. Call create_pptx with validated outline.
4. Return download link and preview.
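A loader for this file could be sketched as below. To stay dependency-free it uses a deliberately tiny frontmatter parser that only understands flat `key: value` pairs and `- item` lists, which is enough for the SKILL.md above; a real implementation would more likely use a YAML library. The `Skill` dataclass and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str
    task: str
    tools: list[str] = field(default_factory=list)
    model_hints: list[str] = field(default_factory=list)
    body: str = ""


def parse_frontmatter(block: str) -> dict:
    """Parse flat `key: value` pairs and `key:` followed by `- item` lines.

    This is NOT a general YAML parser; it is a stand-in for illustration.
    """
    meta: dict = {}
    current_list_key = None
    for line in block.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- ") and current_list_key is not None:
            meta[current_list_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value
            current_list_key = None
        else:
            # A key with no inline value starts a list.
            meta[key] = []
            current_list_key = key
    return meta


def load_skill(text: str) -> Skill:
    """Split a SKILL.md string into frontmatter metadata and markdown body."""
    if not text.startswith("---"):
        raise ValueError("SKILL.md must start with a --- frontmatter block")
    parts = text.split("---", 2)
    if len(parts) < 3:
        raise ValueError("frontmatter block is not closed with ---")
    meta = parse_frontmatter(parts[1])
    return Skill(
        name=meta["name"],
        description=meta["description"],
        task=meta["task"],
        tools=meta.get("tools", []),
        model_hints=meta.get("model_hints", []),
        body=parts[2].strip(),
    )
```

A registry built on top of this can then filter loaded skills by their `task` field to list skills by task tag.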

First tool: create_pptx

Implement in libs/agent/src/agent/tools/pptx.py:

  • Input: Pydantic model SlideOutline (title, slides list)
  • Output: path under /tmp/agent_outputs/{run_id}.pptx (HF Space writable temp)
  • Simple template: title slide + bullet slides + optional speaker notes in notes field
  • No images in v1 (keeps scope shippable by June 15)
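A minimal sketch of such a tool with python-pptx is shown below. It assumes the library's default template (layout 0 is the title slide, layout 1 is "Title and Content") and takes a plain dict shaped like the outline instead of the Pydantic SlideOutline model, purely to keep the example short. The helper `output_path` is hypothetical.

```python
from pathlib import Path


def output_path(run_id: str, out_dir: str = "/tmp/agent_outputs") -> Path:
    """Build the target path for a run's .pptx file."""
    return Path(out_dir) / f"{run_id}.pptx"


def create_pptx(outline: dict, run_id: str, out_dir: str = "/tmp/agent_outputs") -> Path:
    """Write a title slide plus one bullet slide per outline entry."""
    from pptx import Presentation  # third-party: python-pptx

    prs = Presentation()  # python-pptx's built-in default template

    # Title slide.
    title_slide = prs.slides.add_slide(prs.slide_layouts[0])
    title_slide.shapes.title.text = outline["title"]

    # One "Title and Content" slide per outline entry.
    for item in outline["slides"]:
        slide = prs.slides.add_slide(prs.slide_layouts[1])
        slide.shapes.title.text = item["title"]

        bullets = item.get("bullets", [])
        if bullets:
            body = slide.placeholders[1].text_frame
            body.text = bullets[0]  # first paragraph already exists
            for bullet in bullets[1:]:
                body.add_paragraph().text = bullet

        note = item.get("speaker_note")
        if note:
            # Optional speaker notes go in the slide's notes field.
            slide.notes_slide.notes_text_frame.text = note

    path = output_path(run_id, out_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    prs.save(str(path))
    return path
```

Because this step involves no model call, the same outline always produces the same slide structure, which makes the tool easy to test in isolation.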

Gradio UI changes: apps/gradio-space/src/gradio_space/app.py

Refactor into:

gradio_space/
  app.py              # build_demo(), launch
  tabs/
    __init__.py
    education_pptx.py # first task tab
    chat.py           # move existing ChatInterface here (secondary tab)

Tab 1 — Lesson Slides (primary submission UI):

  • Inputs: Topic, Grade (dropdown), Slides (slider 3–8)
  • Button: Generate
  • Outputs: Markdown outline preview, File download, Agent trace (accordion)
  • Status line: model name + device from existing warmup() / model_status()

Tab 2 — Chat (keep for debugging): existing chat wired to same ACTIVE_MODEL

Use gr.Tabs() at top level; only Tab 1 needs polish for demo video.
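The two-tab layout could be wired roughly as follows. This is a sketch only: `generate_lesson` is a placeholder handler standing in for a call into AgentRunner, the grade choices are invented, and the chat tab is stubbed with a single Markdown component.

```python
def generate_lesson(topic: str, grade: str, n_slides: float):
    """Placeholder handler; a real one would call AgentRunner.run(...)."""
    preview = f"# {topic}\n\nGrade {grade}, {int(n_slides)} slides (outline preview goes here)"
    trace = {"skill": "education-pptx", "steps": []}
    # Order matches the outputs list below: preview, file path, trace.
    return preview, None, trace  # None: this stub produces no file


def build_demo():
    import gradio as gr  # third-party

    with gr.Blocks(title="Lesson Agent") as demo:
        with gr.Tabs():
            with gr.Tab("Lesson Slides"):
                topic = gr.Textbox(label="Topic")
                grade = gr.Dropdown(
                    choices=[str(g) for g in range(1, 13)],  # assumed grade range
                    value="6",
                    label="Grade",
                )
                n_slides = gr.Slider(minimum=3, maximum=8, step=1, value=5, label="Slides")
                generate = gr.Button("Generate")
                preview = gr.Markdown()
                download = gr.File(label="Download .pptx")
                with gr.Accordion("Agent trace", open=False):
                    trace = gr.JSON()
                generate.click(
                    generate_lesson,
                    inputs=[topic, grade, n_slides],
                    outputs=[preview, download, trace],
                )
            with gr.Tab("Chat"):
                gr.Markdown("The existing chat interface would be mounted here.")
    return demo


if __name__ == "__main__":
    build_demo().launch()
```

Keeping each tab's components inside its own `with gr.Tab(...)` block maps naturally onto the planned tabs/ modules, where each module would build one tab.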

Model default for Space: set active_model: minicpm5-1b in models.yaml (OpenBMB + Tiny Titan). Space hardware: GPU basic if transformers on CPU is too slow.


Trace export (Sharing is Caring badge)

After each run, write the trace JSON to outputs/traces/{run_id}.json and expose a “Copy trace” action in the UI. Uploading to a Hub dataset is optional and handled by a separate script:

  • scripts/upload_trace.py — pushes latest trace to a HF dataset repo (manual one-time setup; not required for v1 demo)
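One possible shape for that script is sketched below, using `HfApi.upload_file` from huggingface_hub. The dataset repo id is a placeholder, the `traces/` prefix inside the repo is an assumption, and authentication is expected to come from a prior login or an environment token.

```python
from pathlib import Path
from typing import Optional


def latest_trace(trace_dir: str = "outputs/traces") -> Optional[Path]:
    """Return the most recently modified trace file, or None if there is none."""
    files = sorted(Path(trace_dir).glob("*.json"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


def upload_trace(path: Path, repo_id: str) -> None:
    """Push one trace file to a Hugging Face dataset repo."""
    from huggingface_hub import HfApi  # third-party

    HfApi().upload_file(
        path_or_fileobj=str(path),
        path_in_repo=f"traces/{path.name}",  # assumed layout inside the repo
        repo_id=repo_id,
        repo_type="dataset",
    )


if __name__ == "__main__":
    trace = latest_trace()
    if trace is None:
        raise SystemExit("no traces found in outputs/traces")
    # Placeholder repo id; replace with a dataset repo you own.
    upload_trace(trace, repo_id="your-username/lesson-agent-traces")
```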

Trace schema (minimal):

{
  "skill": "education-pptx",
  "model": "minicpm5-1b",
  "input": {"topic": "...", "grade": "6", "slides": 5},
  "steps": [{"type": "llm", "prompt_hash": "...", "output": "..."}, {"type": "tool", "name": "create_pptx", "result": "..."}],
  "artifact": "lesson_photosynthesis.pptx"
}
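A recorder that produces this shape might be sketched as follows. The class and method names are hypothetical, and using SHA-256 for `prompt_hash` is an assumption, since the plan does not name a hash function.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Optional


class TraceRecorder:
    """Append-only step log that serializes to the minimal trace schema."""

    def __init__(self, skill: str, model: str, user_input: dict[str, Any]) -> None:
        self._trace: dict[str, Any] = {
            "skill": skill,
            "model": model,
            "input": user_input,
            "steps": [],
            "artifact": None,
        }

    def llm_step(self, prompt: str, output: str) -> None:
        # Store a hash instead of the full prompt to keep traces compact.
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        self._trace["steps"].append({"type": "llm", "prompt_hash": digest, "output": output})

    def tool_step(self, name: str, result: str) -> None:
        self._trace["steps"].append({"type": "tool", "name": name, "result": result})

    def set_artifact(self, filename: Optional[str]) -> None:
        self._trace["artifact"] = filename

    def save(self, run_id: str, out_dir: str = "outputs/traces") -> Path:
        """Write the trace to {out_dir}/{run_id}.json and return the path."""
        path = Path(out_dir) / f"{run_id}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(self._trace, indent=2, ensure_ascii=False), encoding="utf-8")
        return path
```

Hashing the prompt keeps published traces small while still letting two runs be compared for identical prompts.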

Docker / workspace updates


Phase 2 (after v1 ships — not in first PR)

  • New tabs: tabs/quiz_maker.py, tabs/worksheet.py — each maps to a new skills/*/SKILL.md
  • SkillRegistry already supports multiple skills; tabs just call AgentRunner.run(skill_id=...)
  • Off-Brand: custom layout via gr.Blocks theming or gr.Server if time allows
  • Fine-tuned Gemma preset from notebook/gemma-finetune.ipynb for Well-Tuned badge

Demo video script (for submission)

  1. Introduce real user (teacher/tutor) and problem: “building a 5-slide lesson takes 30+ minutes”
  2. Enter topic + grade in Tab 1, click Generate (~30–90s on GPU)
  3. Show outline preview + download .pptx, open in LibreOffice/Google Slides
  4. Show agent trace proving local model + tool pipeline (no cloud LLM API)

Risks and mitigations

| Risk | Mitigation |
| --- | --- |
| Small model outputs invalid JSON | Pydantic validate + one repair retry with “fix JSON only” prompt |
| CPU Space too slow | Pin minicpm5-1b, use GPU basic, or fallback qwen3b-gguf + llama.cpp for outline-only step |
| pptx dependency size | python-pptx is lightweight (~few MB) |
| Scope creep (many tabs) | Ship Tab 1 only; stub Tab 2 chat for dev |
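The first mitigation (validate, then one repair retry) could be sketched like this. The `generate` argument stands in for whatever callable wraps the backend, the validation is a bare key check in place of Pydantic, and the exact repair prompt wording is an assumption.

```python
import json
from typing import Callable

REPAIR_PREFIX = "Fix JSON only. Return a single valid JSON object and nothing else.\n\n"


def loads_outline(text: str) -> dict:
    """Parse and minimally validate; raises ValueError on any problem."""
    data = json.loads(text)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or "title" not in data or "slides" not in data:
        raise ValueError("outline must be an object with 'title' and 'slides'")
    return data


def generate_json_with_repair(generate: Callable[[str], str], prompt: str) -> dict:
    """Call the model once; on invalid output, retry exactly once with a repair prompt."""
    first = generate(prompt)
    try:
        return loads_outline(first)
    except ValueError as err:
        # Feed the error and the bad output back so the model can correct it.
        repair_prompt = f"{REPAIR_PREFIX}Error: {err}\n\nInvalid output:\n{first}"
    second = generate(repair_prompt)
    return loads_outline(second)  # a second failure propagates to the caller
```

Capping the loop at one retry bounds the worst-case latency at two model calls, which matters on slow CPU hardware.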

Files to create / modify (summary)

Create:

  • libs/agent/ (package + runner, tools, skills loader, trace)
  • skills/education-pptx/SKILL.md
  • apps/gradio-space/src/gradio_space/tabs/education_pptx.py
  • apps/gradio-space/src/gradio_space/tabs/chat.py

Modify: