Docs: Add pipeline flow document
Add agent trace and pipeline architecture documentation to secure the "Sharing is Caring" hackathon badge.
- pipeline_flow.md +27 -0
pipeline_flow.md
ADDED
@@ -0,0 +1,27 @@
# Agent Trace: Memory Keeper Pipeline Flow

This document outlines the end-to-end data flow and pipeline architecture of the Memory Keeper application.

## 📡 Pipeline Architecture

The system operates strictly "Off the Grid": every model is open-weight and hosted via serverless functions. **No proprietary cloud LLM APIs (such as OpenAI, Anthropic, or Gemini) are used.**

### Step 1: User Input (Hugging Face Spaces)
1. The user interacts with a fully custom `gradio.Server` interface running in a Docker container on Hugging Face Spaces.
2. The user uploads a photo, records an audio voice note, or types a text memory.
3. The FastAPI backend running on HF Spaces receives the submission as multipart form data.
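The input rules in the steps above can be sketched as a small validation type. This is only an illustration using the standard library: the class name `MemorySubmission` and its field names are hypothetical and do not come from the application's code, and the rule that at least one input must be present is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MemorySubmission:
    """One user submission (hypothetical shape, not the app's real model)."""

    photo: Optional[bytes] = None  # raw image bytes from the upload
    audio: Optional[bytes] = None  # raw voice-note bytes
    text: Optional[str] = None     # typed memory

    def __post_init__(self) -> None:
        # Assumed rule: a submission must carry at least one usable input.
        has_text = bool(self.text and self.text.strip())
        if not (self.photo or self.audio or has_text):
            raise ValueError("submission needs a photo, a voice note, or text")

    def modalities(self) -> List[str]:
        """Names of the inputs that later pipeline steps have to process."""
        present = []
        if self.photo:
            present.append("photo")
        if self.audio:
            present.append("audio")
        if self.text and self.text.strip():
            present.append("text")
        return present
```

A backend handler could build such an object from the parsed form fields and use `modalities()` to decide which perception endpoints to call.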

### Step 2: Parallel Perception (Modal Serverless)
The backend securely offloads the heavy perception tasks to transient Modal endpoints powered by A10G GPUs. The two pipelines below are independent of each other and run in parallel.
- **Audio Pipeline:** The audio file is sent to the `transcribe_audio` Modal endpoint. A serverless worker spins up, runs `openai/whisper-base` entirely inside the worker (no external API call), and returns the transcribed text.
- **Visual Pipeline:** The photo is sent to the `describe_photo` Modal endpoint. A worker runs `Salesforce/blip-image-captioning-base` to caption the image and returns the text description.
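The fan-out to the two perception endpoints can be sketched with `asyncio.gather`. The two coroutines below are stubs that only borrow the endpoint names from this document; how the real backend calls Modal (HTTP, the Modal client, or something else) is not stated here, so the call mechanics and the `perceive` helper are assumptions.

```python
import asyncio
from typing import Dict, Optional


async def transcribe_audio(audio: bytes) -> str:
    """Stub standing in for the `transcribe_audio` Modal endpoint."""
    await asyncio.sleep(0)  # placeholder for the network round trip
    return f"<transcript of {len(audio)} audio bytes>"


async def describe_photo(photo: bytes) -> str:
    """Stub standing in for the `describe_photo` Modal endpoint."""
    await asyncio.sleep(0)  # placeholder for the network round trip
    return f"<caption of {len(photo)} image bytes>"


async def perceive(audio: Optional[bytes], photo: Optional[bytes]) -> Dict[str, str]:
    """Run whichever perception calls apply, concurrently (hypothetical helper)."""
    jobs = {}
    if audio:
        jobs["transcript"] = transcribe_audio(audio)
    if photo:
        jobs["caption"] = describe_photo(photo)
    # gather() preserves argument order, so results line up with the job names.
    results = await asyncio.gather(*jobs.values())
    return dict(zip(jobs.keys(), results))
```

Because the calls are awaited together, the total wait is roughly that of the slower endpoint rather than the sum of both.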

### Step 3: Synthesis & Generation (Modal Serverless)
The HF Spaces backend gathers the text transcripts, the visual descriptions, and the user's historical profile data, then sends them to the core orchestrator.
- **LLM Pipeline:** The `build_memory_book` Modal endpoint spins up with `Qwen/Qwen2.5-7B-Instruct`.
- The LLM processes the raw contexts and generates structured JSON containing a chronological "Timeline", a narrative "Story", and a personalized "Letter".
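One way to picture the synthesis step is a prompt builder plus a strict parser for the model's reply. The sketch below makes several assumptions: the JSON key names (`timeline`, `story`, `letter`), the prompt wording, and both helper functions are illustrative, not taken from the project.

```python
import json
from typing import Any, Dict

REQUIRED_KEYS = ("timeline", "story", "letter")  # assumed key names


def build_prompt(transcript: str, caption: str, profile: Dict[str, Any]) -> str:
    """Combine the perception outputs and profile into one instruction (hypothetical)."""
    return (
        "Using only the material below, reply with one JSON object with keys "
        '"timeline" (list), "story" (string) and "letter" (string).\n'
        f"Voice note transcript: {transcript}\n"
        f"Photo description: {caption}\n"
        f"Profile: {json.dumps(profile, sort_keys=True)}"
    )


def parse_memory_book(raw: str) -> Dict[str, Any]:
    """Extract the JSON object from a model reply and check the expected keys."""
    # Models often wrap JSON in extra prose, so cut from the first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in model output")
    book = json.loads(raw[start:end + 1])  # raises ValueError on malformed JSON
    missing = [key for key in REQUIRED_KEYS if key not in book]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return book
```

Validating the reply before storing it means a malformed generation fails early instead of reaching the UI.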

### Step 4: Storage & Delivery
1. The HF Space backend receives the structured JSON from Qwen2.5.
2. The data is saved locally to the container's ephemeral storage (managed by an asynchronous 48-hour cleanup task).
3. The generated "Memory Book" is rendered in the custom HTML/CSS UI for the user to view and download.
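The storage and cleanup behaviour can be sketched as follows. The document does not say whether "48-hour" refers to the age limit of a file or to how often the task runs; this sketch assumes the former. The directory layout, function names, and the hourly interval are all hypothetical.

```python
import asyncio
import json
import time
from pathlib import Path
from typing import Any, Dict, List, Optional

MAX_AGE_SECONDS = 48 * 60 * 60  # assumed meaning of "48-hour cleanup"


def save_book(root: Path, book_id: str, book: Dict[str, Any]) -> Path:
    """Write one memory book as JSON under `root` and return its path."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{book_id}.json"
    path.write_text(json.dumps(book, ensure_ascii=False), encoding="utf-8")
    return path


def purge_expired(root: Path, now: Optional[float] = None) -> List[str]:
    """Delete stored books older than MAX_AGE_SECONDS; return the removed names."""
    now = time.time() if now is None else now
    removed = []
    for path in sorted(root.glob("*.json")):
        if now - path.stat().st_mtime > MAX_AGE_SECONDS:
            path.unlink()
            removed.append(path.name)
    return removed


async def cleanup_loop(root: Path, interval_seconds: float = 3600.0) -> None:
    """Background task that purges expired books at a fixed interval."""
    while True:
        purge_expired(root)
        await asyncio.sleep(interval_seconds)
```

Since the container's storage is ephemeral, a restart also clears the files; the cleanup task only bounds how long data lives while the container stays up.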