pixxel-phantom committed
Commit d09063f · verified · 1 Parent(s): cfa6771

Add training pipeline: SFT+GRPO notebook, multi-reward verifier, HF job script

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .agents/skills.zip +3 -0
  2. .agents/skills/api-design/SKILL.md +523 -0
  3. .agents/skills/backend-patterns/SKILL.md +598 -0
  4. .agents/skills/brainstorming/SKILL.md +164 -0
  5. .agents/skills/brainstorming/scripts/frame-template.html +214 -0
  6. .agents/skills/brainstorming/scripts/helper.js +88 -0
  7. .agents/skills/brainstorming/scripts/server.cjs +354 -0
  8. .agents/skills/brainstorming/scripts/start-server.sh +148 -0
  9. .agents/skills/brainstorming/scripts/stop-server.sh +56 -0
  10. .agents/skills/brainstorming/spec-document-reviewer-prompt.md +49 -0
  11. .agents/skills/brainstorming/visual-companion.md +287 -0
  12. .agents/skills/caveman-commit/SKILL.md +65 -0
  13. .agents/skills/caveman-help/SKILL.md +59 -0
  14. .agents/skills/caveman-review/SKILL.md +55 -0
  15. .agents/skills/caveman/SKILL.md +67 -0
  16. .agents/skills/deep-research/SKILL.md +155 -0
  17. .agents/skills/dispatching-parallel-agents/SKILL.md +182 -0
  18. .agents/skills/documentation-lookup/SKILL.md +90 -0
  19. .agents/skills/e2e-testing/SKILL.md +326 -0
  20. .agents/skills/eval-harness/SKILL.md +270 -0
  21. .agents/skills/executing-plans/SKILL.md +70 -0
  22. .agents/skills/finishing-a-development-branch/SKILL.md +200 -0
  23. .agents/skills/frontend-slides/SKILL.md +184 -0
  24. .agents/skills/frontend-slides/STYLE_PRESETS.md +330 -0
  25. .agents/skills/karpathy-guidelines/SKILL.md +67 -0
  26. .agents/skills/openenv-cli/SKILL.md +18 -0
  27. .agents/skills/python-testing/SKILL.md +816 -0
  28. .agents/skills/receiving-code-review/SKILL.md +213 -0
  29. .agents/skills/requesting-code-review/SKILL.md +105 -0
  30. .agents/skills/requesting-code-review/code-reviewer.md +146 -0
  31. .agents/skills/search-first/SKILL.md +161 -0
  32. .agents/skills/security-review/SKILL.md +495 -0
  33. .agents/skills/security-review/cloud-infrastructure-security.md +361 -0
  34. .agents/skills/subagent-driven-development/SKILL.md +277 -0
  35. .agents/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
  36. .agents/skills/subagent-driven-development/implementer-prompt.md +113 -0
  37. .agents/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
  38. .agents/skills/systematic-debugging/CREATION-LOG.md +119 -0
  39. .agents/skills/systematic-debugging/SKILL.md +296 -0
  40. .agents/skills/systematic-debugging/condition-based-waiting-example.ts +158 -0
  41. .agents/skills/systematic-debugging/condition-based-waiting.md +115 -0
  42. .agents/skills/systematic-debugging/defense-in-depth.md +122 -0
  43. .agents/skills/systematic-debugging/find-polluter.sh +63 -0
  44. .agents/skills/systematic-debugging/root-cause-tracing.md +169 -0
  45. .agents/skills/systematic-debugging/test-academic.md +14 -0
  46. .agents/skills/systematic-debugging/test-pressure-1.md +58 -0
  47. .agents/skills/systematic-debugging/test-pressure-2.md +68 -0
  48. .agents/skills/systematic-debugging/test-pressure-3.md +69 -0
  49. .agents/skills/tdd-workflow/SKILL.md +463 -0
  50. .agents/skills/test-driven-development/SKILL.md +371 -0
.agents/skills.zip ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:393a1294a2c39475ff04ef4dc714f4b41ef669c5b591f902518c91bed9d01048
3
+ size 194360
.agents/skills/api-design/SKILL.md ADDED
@@ -0,0 +1,523 @@
1
+ ---
2
+ name: api-design
3
+ description: REST API design patterns including resource naming, status codes, pagination, filtering, error responses, versioning, and rate limiting for production APIs.
4
+ origin: ECC
5
+ ---
6
+
7
+ # API Design Patterns
8
+
9
+ Conventions and best practices for designing consistent, developer-friendly REST APIs.
10
+
11
+ ## When to Activate
12
+
13
+ - Designing new API endpoints
14
+ - Reviewing existing API contracts
15
+ - Adding pagination, filtering, or sorting
16
+ - Implementing error handling for APIs
17
+ - Planning API versioning strategy
18
+ - Building public or partner-facing APIs
19
+
20
+ ## Resource Design
21
+
22
+ ### URL Structure
23
+
24
+ ```
25
+ # Resources are nouns, plural, lowercase, kebab-case
26
+ GET /api/v1/users
27
+ GET /api/v1/users/:id
28
+ POST /api/v1/users
29
+ PUT /api/v1/users/:id
30
+ PATCH /api/v1/users/:id
31
+ DELETE /api/v1/users/:id
32
+
33
+ # Sub-resources for relationships
34
+ GET /api/v1/users/:id/orders
35
+ POST /api/v1/users/:id/orders
36
+
37
+ # Actions that don't map to CRUD (use verbs sparingly)
38
+ POST /api/v1/orders/:id/cancel
39
+ POST /api/v1/auth/login
40
+ POST /api/v1/auth/refresh
41
+ ```
42
+
43
+ ### Naming Rules
44
+
45
+ ```
46
+ # GOOD
47
+ /api/v1/team-members # kebab-case for multi-word resources
48
+ /api/v1/orders?status=active # query params for filtering
49
+ /api/v1/users/123/orders # nested resources for ownership
50
+
51
+ # BAD
52
+ /api/v1/getUsers # verb in URL
53
+ /api/v1/user # singular (use plural)
54
+ /api/v1/team_members # snake_case in URLs
55
+ /api/v1/users/123/getOrders # verb in nested resource
56
+ ```
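The naming rules above lend themselves to a mechanical check. The following is a minimal sketch, not part of the skill file: `checkResourcePath` is a hypothetical helper and the verb list is an illustrative assumption. Plural detection is left out on purpose, since a regular expression cannot decide it reliably.

```typescript
// Hypothetical lint helper; the verb list is illustrative, not exhaustive.
const VERB_PREFIX = /^(get|create|update|delete|fetch)[A-Z_-]/;
const KEBAB_CASE = /^[a-z0-9]+(-[a-z0-9]+)*$/;

function checkResourcePath(path: string): string[] {
  const problems: string[] = [];
  for (const segment of path.split("/")) {
    // Skip empty segments, path parameters (:id), and literal numeric IDs.
    if (segment === "" || segment.charAt(0) === ":" || /^\d+$/.test(segment)) continue;
    if (!KEBAB_CASE.test(segment)) problems.push(`"${segment}" is not lowercase kebab-case`);
    if (VERB_PREFIX.test(segment)) problems.push(`"${segment}" starts with a verb`);
  }
  return problems;
}
```

Action endpoints such as `/orders/:id/cancel` pass this check because `cancel` is not in the assumed verb list; a stricter project could extend the list and allow exceptions explicitly.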
57
+
58
+ ## HTTP Methods and Status Codes
59
+
60
+ ### Method Semantics
61
+
62
+ | Method | Idempotent | Safe | Use For |
63
+ |--------|-----------|------|---------|
64
+ | GET | Yes | Yes | Retrieve resources |
65
+ | POST | No | No | Create resources, trigger actions |
66
+ | PUT | Yes | No | Full replacement of a resource |
67
+ | PATCH | No* | No | Partial update of a resource |
68
+ | DELETE | Yes | No | Remove a resource |
69
+
70
+ *PATCH can be made idempotent with proper implementation
71
+
72
+ ### Status Code Reference
73
+
74
+ ```
75
+ # Success
76
+ 200 OK — GET, PUT, PATCH (with response body)
77
+ 201 Created — POST (include Location header)
78
+ 204 No Content — DELETE, PUT (no response body)
79
+
80
+ # Client Errors
81
+ 400 Bad Request — Validation failure, malformed JSON
82
+ 401 Unauthorized — Missing or invalid authentication
83
+ 403 Forbidden — Authenticated but not authorized
84
+ 404 Not Found — Resource doesn't exist
85
+ 409 Conflict — Duplicate entry, state conflict
86
+ 422 Unprocessable Entity — Semantically invalid (valid JSON, bad data)
87
+ 429 Too Many Requests — Rate limit exceeded
88
+
89
+ # Server Errors
90
+ 500 Internal Server Error — Unexpected failure (never expose details)
91
+ 502 Bad Gateway — Upstream service failed
92
+ 503 Service Unavailable — Temporary overload, include Retry-After
93
+ ```
94
+
95
+ ### Common Mistakes
96
+
97
+ ```
98
+ # BAD: 200 for everything
99
+ { "status": 200, "success": false, "error": "Not found" }
100
+
101
+ # GOOD: Use HTTP status codes semantically
102
+ HTTP/1.1 404 Not Found
103
+ { "error": { "code": "not_found", "message": "User not found" } }
104
+
105
+ # BAD: 500 for validation errors
106
+ # GOOD: 400 or 422 with field-level details
107
+
108
+ # BAD: 200 for created resources
109
+ # GOOD: 201 with Location header
110
+ HTTP/1.1 201 Created
111
+ Location: /api/v1/users/abc-123
112
+ ```
113
+
114
+ ## Response Format
115
+
116
+ ### Success Response
117
+
118
+ ```json
119
+ {
120
+ "data": {
121
+ "id": "abc-123",
122
+ "email": "alice@example.com",
123
+ "name": "Alice",
124
+ "created_at": "2025-01-15T10:30:00Z"
125
+ }
126
+ }
127
+ ```
128
+
129
+ ### Collection Response (with Pagination)
130
+
131
+ ```json
132
+ {
133
+ "data": [
134
+ { "id": "abc-123", "name": "Alice" },
135
+ { "id": "def-456", "name": "Bob" }
136
+ ],
137
+ "meta": {
138
+ "total": 142,
139
+ "page": 1,
140
+ "per_page": 20,
141
+ "total_pages": 8
142
+ },
143
+ "links": {
144
+ "self": "/api/v1/users?page=1&per_page=20",
145
+ "next": "/api/v1/users?page=2&per_page=20",
146
+ "last": "/api/v1/users?page=8&per_page=20"
147
+ }
148
+ }
149
+ ```
150
+
151
+ ### Error Response
152
+
153
+ ```json
154
+ {
155
+ "error": {
156
+ "code": "validation_error",
157
+ "message": "Request validation failed",
158
+ "details": [
159
+ {
160
+ "field": "email",
161
+ "message": "Must be a valid email address",
162
+ "code": "invalid_format"
163
+ },
164
+ {
165
+ "field": "age",
166
+ "message": "Must be between 0 and 150",
167
+ "code": "out_of_range"
168
+ }
169
+ ]
170
+ }
171
+ }
172
+ ```
173
+
174
+ ### Response Envelope Variants
175
+
176
+ ```typescript
177
+ // Option A: Envelope with data wrapper (recommended for public APIs)
178
+ interface ApiResponse<T> {
179
+ data: T;
180
+ meta?: PaginationMeta;
181
+ links?: PaginationLinks;
182
+ }
183
+
184
+ interface ApiError {
185
+ error: {
186
+ code: string;
187
+ message: string;
188
+ details?: FieldError[];
189
+ };
190
+ }
191
+
192
+ // Option B: Flat response (simpler, common for internal APIs)
193
+ // Success: just return the resource directly
194
+ // Error: return error object
195
+ // Distinguish by HTTP status code
196
+ ```
197
+
198
+ ## Pagination
199
+
200
+ ### Offset-Based (Simple)
201
+
202
+ ```
203
+ GET /api/v1/users?page=2&per_page=20
204
+
205
+ # Implementation
206
+ SELECT * FROM users
207
+ ORDER BY created_at DESC
208
+ LIMIT 20 OFFSET 20;
209
+ ```
210
+
211
+ **Pros:** Easy to implement, supports "jump to page N"
212
+ **Cons:** Slow on large offsets (OFFSET 100000), inconsistent with concurrent inserts
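As a rough illustration of how the `meta` and `links` objects and the `OFFSET` value relate to `page` and `per_page`, here is a small sketch. The names `buildPage` and `offsetFor` are assumptions made for this example and do not come from the skill file.

```typescript
interface PageMeta { total: number; page: number; per_page: number; total_pages: number; }
interface PageLinks { self: string; next?: string; last: string; }

// Row offset for a 1-based page number: page 2 with 20 per page skips 20 rows.
function offsetFor(page: number, perPage: number): number {
  return (page - 1) * perPage;
}

function buildPage(basePath: string, total: number, page: number, perPage: number): { meta: PageMeta; links: PageLinks } {
  // An empty collection still has one (empty) page.
  const totalPages = Math.max(1, Math.ceil(total / perPage));
  const url = (p: number): string => `${basePath}?page=${p}&per_page=${perPage}`;
  const links: PageLinks = { self: url(page), last: url(totalPages) };
  // Only advertise "next" when another page exists.
  if (page < totalPages) links.next = url(page + 1);
  return { meta: { total, page, per_page: perPage, total_pages: totalPages }, links };
}
```

With 142 rows and 20 per page this yields 8 pages, which matches the collection response shown earlier.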
213
+
214
+ ### Cursor-Based (Scalable)
215
+
216
+ ```
217
+ GET /api/v1/users?cursor=eyJpZCI6MTIzfQ&limit=20
218
+
219
+ # Implementation
220
+ SELECT * FROM users
221
+ WHERE id > :cursor_id
222
+ ORDER BY id ASC
223
+ LIMIT 21; -- fetch one extra to determine has_next
224
+ ```
225
+
226
+ ```json
227
+ {
228
+ "data": [...],
229
+ "meta": {
230
+ "has_next": true,
231
+ "next_cursor": "eyJpZCI6MTQzfQ"
232
+ }
233
+ }
234
+ ```
235
+
236
+ **Pros:** Consistent performance regardless of position, stable with concurrent inserts
237
+ **Cons:** Cannot jump to arbitrary page, cursor is opaque
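The cursor values in the example decode to small JSON objects (`eyJpZCI6MTIzfQ` is unpadded base64url for `{"id":123}`). A minimal sketch of the encode/decode step and the "fetch one extra row" trick follows; `encodeCursor`, `decodeCursor`, and `toCursorPage` are hypothetical names, and the sketch assumes ASCII-only cursor payloads because `btoa` works on Latin-1 strings.

```typescript
// Encode a cursor payload as unpadded base64url JSON.
function encodeCursor(payload: { id: number }): string {
  return btoa(JSON.stringify(payload)).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Reverse of encodeCursor: restore the standard alphabet and padding, then parse.
function decodeCursor(cursor: string): { id: number } {
  let b64 = cursor.replace(/-/g, "+").replace(/_/g, "/");
  while (b64.length % 4 !== 0) b64 += "=";
  return JSON.parse(atob(b64)) as { id: number };
}

// `rows` is assumed to come from a query with LIMIT limit + 1;
// the extra row only signals that another page exists and is not returned.
function toCursorPage<T extends { id: number }>(rows: T[], limit: number) {
  const hasNext = rows.length > limit;
  const data = hasNext ? rows.slice(0, limit) : rows;
  const last = data[data.length - 1];
  return {
    data,
    meta: { has_next: hasNext, next_cursor: hasNext && last ? encodeCursor({ id: last.id }) : null },
  };
}
```

A production cursor would usually be validated after decoding (and possibly signed), since clients can send arbitrary strings.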
238
+
239
+ ### When to Use Which
240
+
241
+ | Use Case | Pagination Type |
242
+ |----------|----------------|
243
+ | Admin dashboards, small datasets (<10K) | Offset |
244
+ | Infinite scroll, feeds, large datasets | Cursor |
245
+ | Public APIs | Cursor (default) with offset (optional) |
246
+ | Search results | Offset (users expect page numbers) |
247
+
248
+ ## Filtering, Sorting, and Search
249
+
250
+ ### Filtering
251
+
252
+ ```
253
+ # Simple equality
254
+ GET /api/v1/orders?status=active&customer_id=abc-123
255
+
256
+ # Comparison operators (use bracket notation)
257
+ GET /api/v1/products?price[gte]=10&price[lte]=100
258
+ GET /api/v1/orders?created_at[after]=2025-01-01
259
+
260
+ # Multiple values (comma-separated)
261
+ GET /api/v1/products?category=electronics,clothing
262
+
263
+ # Nested fields (dot notation)
264
+ GET /api/v1/orders?customer.country=US
265
+ ```
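One way to turn these query strings into structured filters is sketched below. It is only an illustration: `parseFilters`, the `Filter` shape, and the operator list are assumptions, and a real endpoint would also whitelist the fields a client may filter on.

```typescript
type Operator = "eq" | "gte" | "lte" | "gt" | "lt" | "after" | "before";
interface Filter { field: string; op: Operator; values: string[]; }

const OPERATORS: readonly string[] = ["eq", "gte", "lte", "gt", "lt", "after", "before"];

function parseFilters(query: string): Filter[] {
  const filters: Filter[] = [];
  new URLSearchParams(query).forEach((raw, key) => {
    // Matches "field", "nested.field", and "field[op]".
    const match = /^([\w.]+)(?:\[(\w+)\])?$/.exec(key);
    if (!match) return;
    const [, field = "", op = "eq"] = match;
    // Silently ignore unknown operators in this sketch; an API might return 400 instead.
    if (OPERATORS.indexOf(op) === -1) return;
    // Comma-separated values become a list.
    filters.push({ field, op: op as Operator, values: raw.split(",") });
  });
  return filters;
}
```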
266
+
267
+ ### Sorting
268
+
269
+ ```
270
+ # Single field (prefix - for descending)
271
+ GET /api/v1/products?sort=-created_at
272
+
273
+ # Multiple fields (comma-separated)
274
+ GET /api/v1/products?sort=-featured,price,-created_at
275
+ ```
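The `sort` parameter can be parsed into an ordered list of keys before it reaches the query builder. This is a small sketch under the assumption that the endpoint keeps a whitelist of sortable fields; `parseSort` and `SortKey` are names invented for the example.

```typescript
interface SortKey { field: string; direction: "asc" | "desc"; }

function parseSort(sort: string, allowed: string[]): SortKey[] {
  return sort
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    // A leading "-" means descending; anything else is ascending.
    .map((s): SortKey =>
      s.charAt(0) === "-" ? { field: s.slice(1), direction: "desc" } : { field: s, direction: "asc" })
    // Drop fields that are not explicitly sortable.
    .filter((k) => allowed.indexOf(k.field) !== -1);
}
```

Filtering against a whitelist keeps clients from sorting on unindexed or private columns.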
276
+
277
+ ### Full-Text Search
278
+
279
+ ```
280
+ # Search query parameter
281
+ GET /api/v1/products?q=wireless+headphones
282
+
283
+ # Field-specific search
284
+ GET /api/v1/users?email=alice
285
+ ```
286
+
287
+ ### Sparse Fieldsets
288
+
289
+ ```
290
+ # Return only specified fields (reduces payload)
291
+ GET /api/v1/users?fields=id,name,email
292
+ GET /api/v1/orders?fields=id,total,status&include=customer.name
293
+ ```
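On the server side, sparse fieldsets amount to projecting each resource onto the requested keys. A minimal sketch, assuming flat resources and a hypothetical `pickFields` helper (the `include=` expansion is not covered here):

```typescript
type Resource = Record<string, unknown>;

function pickFields(resource: Resource, fieldsParam?: string): Resource {
  // No `fields` parameter means the full resource is returned.
  if (!fieldsParam) return resource;
  const out: Resource = {};
  for (const name of fieldsParam.split(",")) {
    const key = name.trim();
    // Unknown field names are ignored rather than returned as undefined.
    if (Object.prototype.hasOwnProperty.call(resource, key)) out[key] = resource[key];
  }
  return out;
}
```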
294
+
295
+ ## Authentication and Authorization
296
+
297
+ ### Token-Based Auth
298
+
299
+ ```
300
+ # Bearer token in Authorization header
301
+ GET /api/v1/users
302
+ Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
303
+
304
+ # API key (for server-to-server)
305
+ GET /api/v1/data
306
+ X-API-Key: sk_live_abc123
307
+ ```
308
+
309
+ ### Authorization Patterns
310
+
311
+ ```typescript
312
+ // Resource-level: check ownership
313
+ app.get("/api/v1/orders/:id", async (req, res) => {
314
+ const order = await Order.findById(req.params.id);
315
+ if (!order) return res.status(404).json({ error: { code: "not_found" } });
316
+ if (order.userId !== req.user.id) return res.status(403).json({ error: { code: "forbidden" } });
317
+ return res.json({ data: order });
318
+ });
319
+
320
+ // Role-based: check permissions
321
+ app.delete("/api/v1/users/:id", requireRole("admin"), async (req, res) => {
322
+ await User.delete(req.params.id);
323
+ return res.status(204).send();
324
+ });
325
+ ```
326
+
327
+ ## Rate Limiting
328
+
329
+ ### Headers
330
+
331
+ ```
332
+ HTTP/1.1 200 OK
333
+ X-RateLimit-Limit: 100
334
+ X-RateLimit-Remaining: 95
335
+ X-RateLimit-Reset: 1640000000
336
+
337
+ # When exceeded
338
+ HTTP/1.1 429 Too Many Requests
339
+ Retry-After: 60
340
+ {
341
+ "error": {
342
+ "code": "rate_limit_exceeded",
343
+ "message": "Rate limit exceeded. Try again in 60 seconds."
344
+ }
345
+ }
346
+ ```
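The headers above can be derived from three numbers: the limit, the requests counted in the current window, and the window reset time. The sketch below is one possible shape; `rateLimitHeaders` and its parameters are assumptions for illustration, and `used` is taken to include the current request.

```typescript
function rateLimitHeaders(
  limit: number,
  used: number,
  resetEpochSeconds: number,
  nowEpochSeconds: number,
): { status: number; headers: Record<string, string> } {
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(Math.max(0, limit - used)),
    "X-RateLimit-Reset": String(resetEpochSeconds),
  };
  const exceeded = used > limit;
  // Retry-After is the number of seconds until the window resets.
  if (exceeded) headers["Retry-After"] = String(Math.max(0, resetEpochSeconds - nowEpochSeconds));
  return { status: exceeded ? 429 : 200, headers };
}
```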
347
+
348
+ ### Rate Limit Tiers
349
+
350
+ | Tier | Limit | Window | Use Case |
351
+ |------|-------|--------|----------|
352
+ | Anonymous | 30/min | Per IP | Public endpoints |
353
+ | Authenticated | 100/min | Per user | Standard API access |
354
+ | Premium | 1000/min | Per API key | Paid API plans |
355
+ | Internal | 10000/min | Per service | Service-to-service |
356
+
357
+ ## Versioning
358
+
359
+ ### URL Path Versioning (Recommended)
360
+
361
+ ```
362
+ /api/v1/users
363
+ /api/v2/users
364
+ ```
365
+
366
+ **Pros:** Explicit, easy to route, cacheable
367
+ **Cons:** URL changes between versions
368
+
369
+ ### Header Versioning
370
+
371
+ ```
372
+ GET /api/users
373
+ Accept: application/vnd.myapp.v2+json
374
+ ```
375
+
376
+ **Pros:** Clean URLs
377
+ **Cons:** Harder to test, easy to forget
378
+
379
+ ### Versioning Strategy
380
+
381
+ ```
382
+ 1. Start with /api/v1/ — don't version until you need to
383
+ 2. Maintain at most 2 active versions (current + previous)
384
+ 3. Deprecation timeline:
385
+ - Announce deprecation (6 months notice for public APIs)
386
+ - Add Sunset header: Sunset: Thu, 01 Jan 2026 00:00:00 GMT
387
+ - Return 410 Gone after sunset date
388
+ 4. Non-breaking changes don't need a new version:
389
+ - Adding new fields to responses
390
+ - Adding new optional query parameters
391
+ - Adding new endpoints
392
+ 5. Breaking changes require a new version:
393
+ - Removing or renaming fields
394
+ - Changing field types
395
+ - Changing URL structure
396
+ - Changing authentication method
397
+ ```
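Step 3 of the strategy (Sunset header before the date, 410 Gone afterwards) can be sketched as a small decision function. `deprecationResponse` and `VersionPolicy` are hypothetical names; the current time is passed in so the behaviour is easy to test.

```typescript
interface VersionPolicy { sunset: Date; }

function deprecationResponse(policy: VersionPolicy, now: Date): { status: number; headers: Record<string, string> } {
  // After the sunset date the deprecated version answers 410 Gone.
  if (now.getTime() >= policy.sunset.getTime()) {
    return { status: 410, headers: {} };
  }
  // Before it, responses carry the Sunset header as an HTTP date.
  // toUTCString() yields that format, e.g. "Thu, 01 Jan 2026 00:00:00 GMT" (1 Jan 2026 is a Thursday).
  return { status: 200, headers: { Sunset: policy.sunset.toUTCString() } };
}
```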
398
+
399
+ ## Implementation Patterns
400
+
401
+ ### TypeScript (Next.js API Route)
402
+
403
+ ```typescript
404
+ import { z } from "zod";
405
+ import { NextRequest, NextResponse } from "next/server";
406
+
407
+ const createUserSchema = z.object({
408
+ email: z.string().email(),
409
+ name: z.string().min(1).max(100),
410
+ });
411
+
412
+ export async function POST(req: NextRequest) {
413
+ const body = await req.json();
414
+ const parsed = createUserSchema.safeParse(body);
415
+
416
+ if (!parsed.success) {
417
+ return NextResponse.json({
418
+ error: {
419
+ code: "validation_error",
420
+ message: "Request validation failed",
421
+ details: parsed.error.issues.map(i => ({
422
+ field: i.path.join("."),
423
+ message: i.message,
424
+ code: i.code,
425
+ })),
426
+ },
427
+ }, { status: 422 });
428
+ }
429
+
430
+ const user = await createUser(parsed.data);
431
+
432
+ return NextResponse.json(
433
+ { data: user },
434
+ {
435
+ status: 201,
436
+ headers: { Location: `/api/v1/users/${user.id}` },
437
+ },
438
+ );
439
+ }
440
+ ```
441
+
442
+ ### Python (Django REST Framework)
443
+
444
+ ```python
445
+ from rest_framework import permissions, serializers, status, viewsets
446
+ from rest_framework.response import Response
447
+
448
+ class CreateUserSerializer(serializers.Serializer):
449
+     email = serializers.EmailField()
450
+     name = serializers.CharField(max_length=100)
451
+
452
+ class UserSerializer(serializers.ModelSerializer):
453
+     class Meta:
454
+         model = User  # the project's user model (import not shown)
455
+         fields = ["id", "email", "name", "created_at"]
456
+
457
+ class UserViewSet(viewsets.ModelViewSet):
458
+     serializer_class = UserSerializer
459
+     permission_classes = [permissions.IsAuthenticated]
460
+
461
+     def get_serializer_class(self):
462
+         if self.action == "create":
463
+             return CreateUserSerializer
464
+         return UserSerializer
465
+
466
+     def create(self, request):
467
+         serializer = CreateUserSerializer(data=request.data)
468
+         serializer.is_valid(raise_exception=True)
469
+         user = UserService.create(**serializer.validated_data)  # project service layer (import not shown)
470
+         return Response(
471
+             {"data": UserSerializer(user).data},
472
+             status=status.HTTP_201_CREATED,
473
+             headers={"Location": f"/api/v1/users/{user.id}"},
474
+         )
475
+ ```
476
+
477
+ ### Go (net/http)
478
+
479
+ ```go
480
+ func (h *UserHandler) CreateUser(w http.ResponseWriter, r *http.Request) {
481
+ var req CreateUserRequest
482
+ if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
483
+ writeError(w, http.StatusBadRequest, "invalid_json", "Invalid request body")
484
+ return
485
+ }
486
+
487
+ if err := req.Validate(); err != nil {
488
+ writeError(w, http.StatusUnprocessableEntity, "validation_error", err.Error())
489
+ return
490
+ }
491
+
492
+ user, err := h.service.Create(r.Context(), req)
493
+ if err != nil {
494
+ switch {
495
+ case errors.Is(err, domain.ErrEmailTaken):
496
+ writeError(w, http.StatusConflict, "email_taken", "Email already registered")
497
+ default:
498
+ writeError(w, http.StatusInternalServerError, "internal_error", "Internal error")
499
+ }
500
+ return
501
+ }
502
+
503
+ w.Header().Set("Location", fmt.Sprintf("/api/v1/users/%s", user.ID))
504
+ writeJSON(w, http.StatusCreated, map[string]any{"data": user})
505
+ }
506
+ ```
507
+
508
+ ## API Design Checklist
509
+
510
+ Before shipping a new endpoint:
511
+
512
+ - [ ] Resource URL follows naming conventions (plural, kebab-case, no verbs)
513
+ - [ ] Correct HTTP method used (GET for reads, POST for creates, etc.)
514
+ - [ ] Appropriate status codes returned (not 200 for everything)
515
+ - [ ] Input validated with schema (Zod, Pydantic, Bean Validation)
516
+ - [ ] Error responses follow standard format with codes and messages
517
+ - [ ] Pagination implemented for list endpoints (cursor or offset)
518
+ - [ ] Authentication required (or explicitly marked as public)
519
+ - [ ] Authorization checked (user can only access their own resources)
520
+ - [ ] Rate limiting configured
521
+ - [ ] Response does not leak internal details (stack traces, SQL errors)
522
+ - [ ] Consistent naming with existing endpoints (camelCase vs snake_case)
523
+ - [ ] Documented (OpenAPI/Swagger spec updated)
.agents/skills/backend-patterns/SKILL.md ADDED
@@ -0,0 +1,598 @@
1
+ ---
2
+ name: backend-patterns
3
+ description: Backend architecture patterns, API design, database optimization, and server-side best practices for Node.js, Express, and Next.js API routes.
4
+ origin: ECC
5
+ ---
6
+
7
+ # Backend Development Patterns
8
+
9
+ Backend architecture patterns and best practices for scalable server-side applications.
10
+
11
+ ## When to Activate
12
+
13
+ - Designing REST or GraphQL API endpoints
14
+ - Implementing repository, service, or controller layers
15
+ - Optimizing database queries (N+1, indexing, connection pooling)
16
+ - Adding caching (Redis, in-memory, HTTP cache headers)
17
+ - Setting up background jobs or async processing
18
+ - Structuring error handling and validation for APIs
19
+ - Building middleware (auth, logging, rate limiting)
20
+
21
+ ## API Design Patterns
22
+
23
+ ### RESTful API Structure
24
+
25
+ ```typescript
26
+ // PASS: Resource-based URLs
27
+ GET /api/markets # List resources
28
+ GET /api/markets/:id # Get single resource
29
+ POST /api/markets # Create resource
30
+ PUT /api/markets/:id # Replace resource
31
+ PATCH /api/markets/:id # Update resource
32
+ DELETE /api/markets/:id # Delete resource
33
+
34
+ // PASS: Query parameters for filtering, sorting, pagination
35
+ GET /api/markets?status=active&sort=volume&limit=20&offset=0
36
+ ```
37
+
38
+ ### Repository Pattern
39
+
40
+ ```typescript
41
+ // Abstract data access logic
42
+ interface MarketRepository {
43
+ findAll(filters?: MarketFilters): Promise<Market[]>
44
+ findById(id: string): Promise<Market | null>
45
+ create(data: CreateMarketDto): Promise<Market>
46
+ update(id: string, data: UpdateMarketDto): Promise<Market>
47
+ delete(id: string): Promise<void>
48
+ }
49
+
50
+ class SupabaseMarketRepository implements MarketRepository {
51
+ async findAll(filters?: MarketFilters): Promise<Market[]> {
52
+ let query = supabase.from('markets').select('*')
53
+
54
+ if (filters?.status) {
55
+ query = query.eq('status', filters.status)
56
+ }
57
+
58
+ if (filters?.limit) {
59
+ query = query.limit(filters.limit)
60
+ }
61
+
62
+ const { data, error } = await query
63
+
64
+ if (error) throw new Error(error.message)
65
+ return data
66
+ }
67
+
68
+ // Other methods...
69
+ }
70
+ ```
71
+
72
+ ### Service Layer Pattern
73
+
74
+ ```typescript
75
+ // Business logic separated from data access
76
+ class MarketService {
77
+ constructor(private marketRepo: MarketRepository) {}
78
+
79
+ async searchMarkets(query: string, limit: number = 10): Promise<Market[]> {
80
+ // Business logic
81
+ const embedding = await generateEmbedding(query)
82
+ const results = await this.vectorSearch(embedding, limit)
83
+
84
+ // Fetch full data (assumes the repository also exposes findByIds, which the interface above does not declare)
85
+ const markets = await this.marketRepo.findByIds(results.map(r => r.id))
86
+
87
+ // Sort by similarity, highest score first
88
+ return markets.sort((a, b) => {
89
+ const scoreA = results.find(r => r.id === a.id)?.score || 0
90
+ const scoreB = results.find(r => r.id === b.id)?.score || 0
91
+ return scoreB - scoreA
92
+ })
93
+ }
94
+
95
+ private async vectorSearch(embedding: number[], limit: number) {
96
+ // Vector search implementation
97
+ }
98
+ }
99
+ ```
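The ranking step in `searchMarkets` can be isolated and tested on its own. The sketch below assumes that a higher `score` means a closer match and therefore sorts in descending order; `rankBySimilarity` is a hypothetical helper. It builds a lookup table once instead of calling `find` inside the comparator.

```typescript
interface Scored { id: string; score: number; }

function rankBySimilarity<T extends { id: string }>(items: T[], results: Scored[]): T[] {
  // Build the id -> score lookup once, so the comparator does no linear scans.
  const scoreById: Record<string, number> = {};
  for (const r of results) scoreById[r.id] = r.score;
  // Copy before sorting so the caller's array is left untouched; items without a score rank last.
  return items.slice().sort((a, b) => (scoreById[b.id] ?? 0) - (scoreById[a.id] ?? 0));
}
```

If the vector store returns distances rather than similarities, the comparison would need to be reversed.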
100
+
101
+ ### Middleware Pattern
102
+
103
+ ```typescript
104
+ // Request/response processing pipeline
105
+ export function withAuth(handler: NextApiHandler): NextApiHandler {
106
+ return async (req, res) => {
107
+ const token = req.headers.authorization?.replace('Bearer ', '')
108
+
109
+ if (!token) {
110
+ return res.status(401).json({ error: 'Unauthorized' })
111
+ }
112
+
113
+ try {
114
+ const user = await verifyToken(token)
115
+ req.user = user
116
+ return handler(req, res)
117
+ } catch (error) {
118
+ return res.status(401).json({ error: 'Invalid token' })
119
+ }
120
+ }
121
+ }
122
+
123
+ // Usage
124
+ export default withAuth(async (req, res) => {
125
+ // Handler has access to req.user
126
+ })
127
+ ```
128
+
129
+ ## Database Patterns
130
+
131
+ ### Query Optimization
132
+
133
+ ```typescript
134
+ // GOOD: Select only needed columns
135
+ const { data } = await supabase
136
+ .from('markets')
137
+ .select('id, name, status, volume')
138
+ .eq('status', 'active')
139
+ .order('volume', { ascending: false })
140
+ .limit(10)
141
+
142
+ // BAD: Select everything
143
+ const { data } = await supabase
144
+ .from('markets')
145
+ .select('*')
146
+ ```
147
+
148
+ ### N+1 Query Prevention
149
+
150
+ ```typescript
151
+ // BAD: N+1 query problem
152
+ const markets = await getMarkets()
153
+ for (const market of markets) {
154
+ market.creator = await getUser(market.creator_id) // N queries
155
+ }
156
+
157
+ // GOOD: Batch fetch
158
+ const markets = await getMarkets()
159
+ const creatorIds = markets.map(m => m.creator_id)
160
+ const creators = await getUsers(creatorIds) // 1 query
161
+ const creatorMap = new Map(creators.map(c => [c.id, c]))
162
+
163
+ markets.forEach(market => {
164
+ market.creator = creatorMap.get(market.creator_id)
165
+ })
166
+ ```
167
+
168
+ ### Transaction Pattern
169
+
170
+ ```typescript
171
+ async function createMarketWithPosition(
172
+ marketData: CreateMarketDto,
173
+ positionData: CreatePositionDto
174
+ ) {
175
+ // Use Supabase transaction
176
+ const { data, error } = await supabase.rpc('create_market_with_position', {
177
+ market_data: marketData,
178
+ position_data: positionData
179
+ })
180
+
181
+ if (error || data?.success === false) throw new Error('Transaction failed')
182
+ return data
183
+ }
184
+
185
+ // SQL function in Supabase
186
+ CREATE OR REPLACE FUNCTION create_market_with_position(
187
+ market_data jsonb,
188
+ position_data jsonb
189
+ )
190
+ RETURNS jsonb
191
+ LANGUAGE plpgsql
192
+ AS $$
193
+ BEGIN
194
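+ -- The function body runs inside a single transaction
194
+ INSERT INTO markets SELECT * FROM jsonb_populate_record(null::markets, market_data);
195
+ INSERT INTO positions SELECT * FROM jsonb_populate_record(null::positions, position_data);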
197
+ RETURN jsonb_build_object('success', true);
198
+ EXCEPTION
199
+ WHEN OTHERS THEN
200
+ -- Rollback happens automatically
201
+ RETURN jsonb_build_object('success', false, 'error', SQLERRM);
202
+ END;
203
+ $$;
204
+ ```
205
+
206
+ ## Caching Strategies
207
+
208
+ ### Redis Caching Layer
209
+
210
+ ```typescript
211
+ class CachedMarketRepository implements MarketRepository {
212
+ constructor(
213
+ private baseRepo: MarketRepository,
214
+ private redis: RedisClient
215
+ ) {}
216
+
217
+ async findById(id: string): Promise<Market | null> {
218
+ // Check cache first
219
+ const cached = await this.redis.get(`market:${id}`)
220
+
221
+ if (cached) {
222
+ return JSON.parse(cached)
223
+ }
224
+
225
+ // Cache miss - fetch from database
226
+ const market = await this.baseRepo.findById(id)
227
+
228
+ if (market) {
229
+ // Cache for 5 minutes
230
+ await this.redis.setex(`market:${id}`, 300, JSON.stringify(market))
231
+ }
232
+
233
+ return market
234
+ }
235
+
236
+ async invalidateCache(id: string): Promise<void> {
237
+ await this.redis.del(`market:${id}`)
238
+ }
239
+ }
240
+ ```
241
+
242
+ ### Cache-Aside Pattern
243
+
244
+ ```typescript
245
+ async function getMarketWithCache(id: string): Promise<Market> {
246
+ const cacheKey = `market:${id}`
247
+
248
+ // Try cache
249
+ const cached = await redis.get(cacheKey)
250
+ if (cached) return JSON.parse(cached)
251
+
252
+ // Cache miss - fetch from DB
253
+ const market = await db.markets.findUnique({ where: { id } })
254
+
255
+ if (!market) throw new Error('Market not found')
256
+
257
+ // Update cache
258
+ await redis.setex(cacheKey, 300, JSON.stringify(market))
259
+
260
+ return market
261
+ }
262
+ ```
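The cache-aside flow (check cache, load on miss, store with a TTL, invalidate on change) does not depend on Redis itself. The following is a synchronous in-memory stand-in meant only to illustrate the control flow; `TtlCache` is a hypothetical class, not a Redis client API, and the clock is injectable so expiry can be tested without waiting.

```typescript
class TtlCache<V> {
  private entries: Record<string, { value: V; expiresAt: number }> = {};

  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  getOrLoad(key: string, load: () => V): V {
    const hit = this.entries[key];
    // Cache hit: the entry exists and has not expired yet.
    if (hit && hit.expiresAt > this.now()) return hit.value;
    // Cache miss: load from the source of truth, then store with a fresh TTL.
    const value = load();
    this.entries[key] = { value, expiresAt: this.now() + this.ttlMs };
    return value;
  }

  // Call after writes so readers do not see stale data until the TTL runs out.
  invalidate(key: string): void {
    delete this.entries[key];
  }
}
```

Unlike a shared Redis instance, this cache lives in one process, so it would not stay consistent across multiple server instances.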
263
+
264
+ ## Error Handling Patterns
265
+
266
+ ### Centralized Error Handler
267
+
268
+ ```typescript
269
+ class ApiError extends Error {
270
+ constructor(
271
+ public statusCode: number,
272
+ public message: string,
273
+ public isOperational = true
274
+ ) {
275
+ super(message)
276
+ Object.setPrototypeOf(this, ApiError.prototype)
277
+ }
278
+ }
279
+
280
+ export function errorHandler(error: unknown, req: Request): Response {
281
+ if (error instanceof ApiError) {
282
+ return NextResponse.json({
283
+ success: false,
284
+ error: error.message
285
+ }, { status: error.statusCode })
286
+ }
287
+
288
+ if (error instanceof z.ZodError) {
289
+ return NextResponse.json({
290
+ success: false,
291
+ error: 'Validation failed',
292
+ details: error.issues
293
+ }, { status: 400 })
294
+ }
295
+
296
+ // Log unexpected errors
297
+ console.error('Unexpected error:', error)
298
+
299
+ return NextResponse.json({
300
+ success: false,
301
+ error: 'Internal server error'
302
+ }, { status: 500 })
303
+ }
304
+
305
+ // Usage
306
+ export async function GET(request: Request) {
307
+ try {
308
+ const data = await fetchData()
309
+ return NextResponse.json({ success: true, data })
310
+ } catch (error) {
311
+ return errorHandler(error, request)
312
+ }
313
+ }
314
+ ```
315
+
316
+ ### Retry with Exponential Backoff
317
+
318
+ ```typescript
319
+ async function fetchWithRetry<T>(
320
+ fn: () => Promise<T>,
321
+ maxRetries = 3
322
+ ): Promise<T> {
323
+ let lastError: Error
324
+
325
+ for (let i = 0; i < maxRetries; i++) {
326
+ try {
327
+ return await fn()
328
+ } catch (error) {
329
+ lastError = error as Error
330
+
331
+ if (i < maxRetries - 1) {
332
+ // Exponential backoff between attempts: 1s, 2s, ... (no wait after the final attempt)
333
+ const delay = Math.pow(2, i) * 1000
334
+ await new Promise(resolve => setTimeout(resolve, delay))
335
+ }
336
+ }
337
+ }
338
+
339
+ throw lastError!
340
+ }
341
+
342
+ // Usage
343
+ const data = await fetchWithRetry(() => fetchFromAPI())
344
+ ```
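To make the wait schedule of `fetchWithRetry` explicit, the delays can be computed separately. This sketch mirrors the loop above under the same assumptions (base delay of 1000 ms, doubling each time, no wait after the last attempt); `backoffDelays` is a name introduced for the example.

```typescript
// Delays (in ms) inserted between attempts; there are maxRetries - 1 of them.
function backoffDelays(maxRetries: number, baseMs = 1000): number[] {
  const delays: number[] = [];
  for (let i = 0; i < maxRetries - 1; i++) {
    delays.push(Math.pow(2, i) * baseMs);
  }
  return delays;
}
```

With the default `maxRetries = 3` the function waits 1 s and then 2 s; a 4 s wait only appears when `maxRetries` is 4 or more. Adding random jitter to each delay is a common refinement when many clients retry at once.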
345
+
346
+ ## Authentication & Authorization
347
+
348
+ ### JWT Token Validation
349
+
350
+ ```typescript
351
+ import jwt from 'jsonwebtoken'
352
+
353
+ interface JWTPayload {
354
+ userId: string
355
+ email: string
356
+ role: 'admin' | 'user'
357
+ }
358
+
359
+ export function verifyToken(token: string): JWTPayload {
360
+ try {
361
+ const payload = jwt.verify(token, process.env.JWT_SECRET!) as JWTPayload
362
+ return payload
363
+ } catch (error) {
364
+ throw new ApiError(401, 'Invalid token')
365
+ }
366
+ }
367
+
368
+ export async function requireAuth(request: Request) {
369
+ const token = request.headers.get('authorization')?.replace('Bearer ', '')
370
+
371
+ if (!token) {
372
+ throw new ApiError(401, 'Missing authorization token')
373
+ }
374
+
375
+ return verifyToken(token)
376
+ }
377
+
378
+ // Usage in API route
379
+ export async function GET(request: Request) {
380
+ const user = await requireAuth(request)
381
+
382
+ const data = await getDataForUser(user.userId)
383
+
384
+ return NextResponse.json({ success: true, data })
385
+ }
386
+ ```
387
+
388
+ ### Role-Based Access Control
389
+
390
+ ```typescript
391
+ type Permission = 'read' | 'write' | 'delete' | 'admin'
392
+
393
+ interface User {
394
+ id: string
395
+ role: 'admin' | 'moderator' | 'user'
396
+ }
397
+
398
+ const rolePermissions: Record<User['role'], Permission[]> = {
399
+ admin: ['read', 'write', 'delete', 'admin'],
400
+ moderator: ['read', 'write', 'delete'],
401
+ user: ['read', 'write']
402
+ }
403
+
404
+ export function hasPermission(user: User, permission: Permission): boolean {
405
+ return rolePermissions[user.role].includes(permission)
406
+ }
407
+
408
+ export function requirePermission(permission: Permission) {
409
+ return (handler: (request: Request, user: User) => Promise<Response>) => {
410
+ return async (request: Request) => {
411
+ const payload = await requireAuth(request)
+ const user: User = { id: payload.userId, role: payload.role }
412
+
413
+ if (!hasPermission(user, permission)) {
414
+ throw new ApiError(403, 'Insufficient permissions')
415
+ }
416
+
417
+ return handler(request, user)
418
+ }
419
+ }
420
+ }
421
+
422
+ // Usage - HOF wraps the handler
423
+ export const DELETE = requirePermission('delete')(
424
+ async (request: Request, user: User) => {
425
+ // Handler receives authenticated user with verified permission
426
+ return new Response('Deleted', { status: 200 })
427
+ }
428
+ )
429
+ ```
430
+
431
+ ## Rate Limiting
432
+
433
+ ### Simple In-Memory Rate Limiter
434
+
435
+ ```typescript
436
+ class RateLimiter {
437
+ private requests = new Map<string, number[]>()
438
+
439
+ async checkLimit(
440
+ identifier: string,
441
+ maxRequests: number,
442
+ windowMs: number
443
+ ): Promise<boolean> {
444
+ const now = Date.now()
445
+ const requests = this.requests.get(identifier) || []
446
+
447
+ // Remove old requests outside window
448
+ const recentRequests = requests.filter(time => now - time < windowMs)
449
+
450
+ if (recentRequests.length >= maxRequests) {
451
+ return false // Rate limit exceeded
452
+ }
453
+
454
+ // Add current request
455
+ recentRequests.push(now)
456
+ this.requests.set(identifier, recentRequests)
457
+
458
+ return true
459
+ }
460
+ }
461
+
462
+ const limiter = new RateLimiter()
463
+
464
+ export async function GET(request: Request) {
465
+ const ip = request.headers.get('x-forwarded-for')?.split(',')[0].trim() || 'unknown' // header may list several proxies; use the first entry
466
+
467
+ const allowed = await limiter.checkLimit(ip, 100, 60000) // 100 req/min
468
+
469
+ if (!allowed) {
470
+ return NextResponse.json({
471
+ error: 'Rate limit exceeded'
472
+ }, { status: 429 })
473
+ }
474
+
475
+ // Continue with request
476
+ }
477
+ ```
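The limiter above keeps an entry for every identifier it has ever seen, so a long-running process may want to evict idle keys. The following is a minimal sketch of one way to do that, under stated assumptions: `SlidingWindowLimiter`, `check`, and `sweep` are hypothetical names, the limits move to the constructor, and the clock is passed in so the behaviour can be exercised without real timers.

```typescript
// Hypothetical variant of the in-memory limiter with an eviction step.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>()
  private maxRequests: number
  private windowMs: number

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests
    this.windowMs = windowMs
  }

  // Returns true if the request is allowed; `now` is injectable for testing
  check(id: string, now: number = Date.now()): boolean {
    const recent = (this.hits.get(id) || []).filter(t => now - t < this.windowMs)
    if (recent.length >= this.maxRequests) {
      this.hits.set(id, recent) // store the pruned list even when rejecting
      return false
    }
    recent.push(now)
    this.hits.set(id, recent)
    return true
  }

  // Removes identifiers whose timestamps have all left the window;
  // returns the number of identifiers removed
  sweep(now: number = Date.now()): number {
    let removed = 0
    this.hits.forEach((times, id) => {
      if (times.every(t => now - t >= this.windowMs)) {
        this.hits.delete(id)
        removed++
      }
    })
    return removed
  }
}
```

Calling `sweep` on a timer (for example once per window) keeps memory proportional to the number of recently active identifiers. An in-memory limiter like this still only protects a single process; state is not shared across instances.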
478
+
479
+ ## Background Jobs & Queues
480
+
481
+ ### Simple Queue Pattern
482
+
483
+ ```typescript
484
+ class JobQueue<T> {
485
+ private queue: T[] = []
486
+ private processing = false
487
+
488
+ async add(job: T): Promise<void> {
489
+ this.queue.push(job)
490
+
491
+ if (!this.processing) {
492
+ this.process()
493
+ }
494
+ }
495
+
496
+ private async process(): Promise<void> {
497
+ this.processing = true
498
+
499
+ while (this.queue.length > 0) {
500
+ const job = this.queue.shift()!
501
+
502
+ try {
503
+ await this.execute(job)
504
+ } catch (error) {
505
+ console.error('Job failed:', error)
506
+ }
507
+ }
508
+
509
+ this.processing = false
510
+ }
511
+
512
+ private async execute(job: T): Promise<void> {
513
+ // Job execution logic
514
+ }
515
+ }
516
+
517
+ // Usage for indexing markets
518
+ interface IndexJob {
519
+ marketId: string
520
+ }
521
+
522
+ const indexQueue = new JobQueue<IndexJob>()
523
+
524
+ export async function POST(request: Request) {
525
+ const { marketId } = await request.json()
526
+
527
+ // Add to queue instead of blocking
528
+ await indexQueue.add({ marketId })
529
+
530
+ return NextResponse.json({ success: true, message: 'Job queued' })
531
+ }
532
+ ```
533
+
534
+ ## Logging & Monitoring
535
+
536
+ ### Structured Logging
537
+
538
+ ```typescript
539
+ interface LogContext {
540
+ userId?: string
541
+ requestId?: string
542
+ method?: string
543
+ path?: string
544
+ [key: string]: unknown
545
+ }
546
+
547
+ class Logger {
548
+ log(level: 'info' | 'warn' | 'error', message: string, context?: LogContext) {
549
+ const entry = {
550
+ timestamp: new Date().toISOString(),
551
+ level,
552
+ message,
553
+ ...context
554
+ }
555
+
556
+ console.log(JSON.stringify(entry))
557
+ }
558
+
559
+ info(message: string, context?: LogContext) {
560
+ this.log('info', message, context)
561
+ }
562
+
563
+ warn(message: string, context?: LogContext) {
564
+ this.log('warn', message, context)
565
+ }
566
+
567
+ error(message: string, error: Error, context?: LogContext) {
568
+ this.log('error', message, {
569
+ ...context,
570
+ error: error.message,
571
+ stack: error.stack
572
+ })
573
+ }
574
+ }
575
+
576
+ const logger = new Logger()
577
+
578
+ // Usage
579
+ export async function GET(request: Request) {
580
+ const requestId = crypto.randomUUID()
581
+
582
+ logger.info('Fetching markets', {
583
+ requestId,
584
+ method: 'GET',
585
+ path: '/api/markets'
586
+ })
587
+
588
+ try {
589
+ const markets = await fetchMarkets()
590
+ return NextResponse.json({ success: true, data: markets })
591
+ } catch (error) {
592
+ logger.error('Failed to fetch markets', error as Error, { requestId })
593
+ return NextResponse.json({ error: 'Internal error' }, { status: 500 })
594
+ }
595
+ }
596
+ ```
597
+
598
+ **Remember**: Backend patterns enable scalable, maintainable server-side applications. Choose patterns that fit your complexity level.
.agents/skills/brainstorming/SKILL.md ADDED
@@ -0,0 +1,164 @@
1
+ ---
2
+ name: brainstorming
3
+ description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation."
4
+ ---
5
+
6
+ # Brainstorming Ideas Into Designs
7
+
8
+ Help turn ideas into fully formed designs and specs through natural collaborative dialogue.
9
+
10
+ Start by understanding the current project context, then ask questions one at a time to refine the idea. Once you understand what you're building, present the design and get user approval.
11
+
12
+ <HARD-GATE>
13
+ Do NOT invoke any implementation skill, write any code, scaffold any project, or take any implementation action until you have presented a design and the user has approved it. This applies to EVERY project regardless of perceived simplicity.
14
+ </HARD-GATE>
15
+
16
+ ## Anti-Pattern: "This Is Too Simple To Need A Design"
17
+
18
+ Every project goes through this process. A todo list, a single-function utility, a config change — all of them. "Simple" projects are where unexamined assumptions cause the most wasted work. The design can be short (a few sentences for truly simple projects), but you MUST present it and get approval.
19
+
20
+ ## Checklist
21
+
22
+ You MUST create a task for each of these items and complete them in order:
23
+
24
+ 1. **Explore project context** — check files, docs, recent commits
25
+ 2. **Offer visual companion** (if topic will involve visual questions) — this is its own message, not combined with a clarifying question. See the Visual Companion section below.
26
+ 3. **Ask clarifying questions** — one at a time, understand purpose/constraints/success criteria
27
+ 4. **Propose 2-3 approaches** — with trade-offs and your recommendation
28
+ 5. **Present design** — in sections scaled to their complexity, get user approval after each section
29
+ 6. **Write design doc** — save to `docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md` and commit
30
+ 7. **Spec self-review** — quick inline check for placeholders, contradictions, ambiguity, scope (see below)
31
+ 8. **User reviews written spec** — ask user to review the spec file before proceeding
32
+ 9. **Transition to implementation** — invoke writing-plans skill to create implementation plan
33
+
34
+ ## Process Flow
35
+
36
+ ```dot
37
+ digraph brainstorming {
38
+ "Explore project context" [shape=box];
39
+ "Visual questions ahead?" [shape=diamond];
40
+ "Offer Visual Companion\n(own message, no other content)" [shape=box];
41
+ "Ask clarifying questions" [shape=box];
42
+ "Propose 2-3 approaches" [shape=box];
43
+ "Present design sections" [shape=box];
44
+ "User approves design?" [shape=diamond];
45
+ "Write design doc" [shape=box];
46
+ "Spec self-review\n(fix inline)" [shape=box];
47
+ "User reviews spec?" [shape=diamond];
48
+ "Invoke writing-plans skill" [shape=doublecircle];
49
+
50
+ "Explore project context" -> "Visual questions ahead?";
51
+ "Visual questions ahead?" -> "Offer Visual Companion\n(own message, no other content)" [label="yes"];
52
+ "Visual questions ahead?" -> "Ask clarifying questions" [label="no"];
53
+ "Offer Visual Companion\n(own message, no other content)" -> "Ask clarifying questions";
54
+ "Ask clarifying questions" -> "Propose 2-3 approaches";
55
+ "Propose 2-3 approaches" -> "Present design sections";
56
+ "Present design sections" -> "User approves design?";
57
+ "User approves design?" -> "Present design sections" [label="no, revise"];
58
+ "User approves design?" -> "Write design doc" [label="yes"];
59
+ "Write design doc" -> "Spec self-review\n(fix inline)";
60
+ "Spec self-review\n(fix inline)" -> "User reviews spec?";
61
+ "User reviews spec?" -> "Write design doc" [label="changes requested"];
62
+ "User reviews spec?" -> "Invoke writing-plans skill" [label="approved"];
63
+ }
64
+ ```
65
+
66
+ **The terminal state is invoking writing-plans.** Do NOT invoke frontend-design, mcp-builder, or any other implementation skill. The ONLY skill you invoke after brainstorming is writing-plans.
67
+
68
+ ## The Process
69
+
70
+ **Understanding the idea:**
71
+
72
+ - Check out the current project state first (files, docs, recent commits)
73
+ - Before asking detailed questions, assess scope: if the request describes multiple independent subsystems (e.g., "build a platform with chat, file storage, billing, and analytics"), flag this immediately. Don't spend questions refining details of a project that needs to be decomposed first.
74
+ - If the project is too large for a single spec, help the user decompose into sub-projects: what are the independent pieces, how do they relate, what order should they be built? Then brainstorm the first sub-project through the normal design flow. Each sub-project gets its own spec → plan → implementation cycle.
75
+ - For appropriately-scoped projects, ask questions one at a time to refine the idea
76
+ - Prefer multiple choice questions when possible, but open-ended is fine too
77
+ - Only one question per message - if a topic needs more exploration, break it into multiple questions
78
+ - Focus on understanding: purpose, constraints, success criteria
79
+
80
+ **Exploring approaches:**
81
+
82
+ - Propose 2-3 different approaches with trade-offs
83
+ - Present options conversationally with your recommendation and reasoning
84
+ - Lead with your recommended option and explain why
85
+
86
+ **Presenting the design:**
87
+
88
+ - Once you believe you understand what you're building, present the design
89
+ - Scale each section to its complexity: a few sentences if straightforward, up to 200-300 words if nuanced
90
+ - Ask after each section whether it looks right so far
91
+ - Cover: architecture, components, data flow, error handling, testing
92
+ - Be ready to go back and clarify if something doesn't make sense
93
+
94
+ **Design for isolation and clarity:**
95
+
96
+ - Break the system into smaller units that each have one clear purpose, communicate through well-defined interfaces, and can be understood and tested independently
97
+ - For each unit, you should be able to answer: what does it do, how do you use it, and what does it depend on?
98
+ - Can someone understand what a unit does without reading its internals? Can you change the internals without breaking consumers? If not, the boundaries need work.
99
+ - Smaller, well-bounded units are also easier for you to work with - you reason better about code you can hold in context at once, and your edits are more reliable when files are focused. When a file grows large, that's often a signal that it's doing too much.
100
+
101
+ **Working in existing codebases:**
102
+
103
+ - Explore the current structure before proposing changes. Follow existing patterns.
104
+ - Where existing code has problems that affect the work (e.g., a file that's grown too large, unclear boundaries, tangled responsibilities), include targeted improvements as part of the design - the way a good developer improves code they're working in.
105
+ - Don't propose unrelated refactoring. Stay focused on what serves the current goal.
106
+
107
+ ## After the Design
108
+
109
+ **Documentation:**
110
+
111
+ - Write the validated design (spec) to `docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md`
112
+ - (User preferences for spec location override this default)
113
+ - Use elements-of-style:writing-clearly-and-concisely skill if available
114
+ - Commit the design document to git
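The default spec location can be built mechanically from a topic and a date. The sketch below is illustrative only: `specPath` is a hypothetical helper, and the slug rule (lowercase, non-alphanumeric runs collapsed to `-`) is an assumption, since the skill only specifies the `YYYY-MM-DD-<topic>-design.md` shape.

```typescript
// Hypothetical helper: builds the default spec path described above.
function specPath(topic: string, date: Date): string {
  // toISOString() is always UTC, so the day is the UTC calendar day
  const day = date.toISOString().slice(0, 10)
  // Assumed slug rule: lowercase, collapse non-alphanumerics, trim dashes
  const slug = topic
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '')
  return `docs/superpowers/specs/${day}-${slug}-design.md`
}
```

If the user has stated a preferred spec location, that preference replaces the `docs/superpowers/specs/` prefix.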
115
+
116
+ **Spec Self-Review:**
117
+ After writing the spec document, look at it with fresh eyes:
118
+
119
+ 1. **Placeholder scan:** Any "TBD", "TODO", incomplete sections, or vague requirements? Fix them.
120
+ 2. **Internal consistency:** Do any sections contradict each other? Does the architecture match the feature descriptions?
121
+ 3. **Scope check:** Is this focused enough for a single implementation plan, or does it need decomposition?
122
+ 4. **Ambiguity check:** Could any requirement be interpreted two different ways? If so, pick one and make it explicit.
123
+
124
+ Fix any issues inline. No need to re-review — just fix and move on.
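The placeholder scan is the one self-review step that is easy to automate. A minimal sketch, assuming the spec is available as a string: `findPlaceholders` is a hypothetical name, and it only catches the literal markers "TBD" and "TODO", not vague requirements or incomplete sections, which still need a human read.

```typescript
// Hypothetical helper: reports lines of a spec that contain TBD or TODO.
function findPlaceholders(spec: string): { line: number; text: string }[] {
  // No `g` flag, so `test` keeps no state between calls
  const pattern = /\b(TBD|TODO)\b/
  return spec
    .split('\n')
    .map((text, index) => ({ line: index + 1, text }))
    .filter(entry => pattern.test(entry.text))
}
```

An empty result means no literal placeholders remain; the consistency, scope, and ambiguity checks are not covered by this scan.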
125
+
126
+ **User Review Gate:**
127
+ After the spec self-review is done, ask the user to review the written spec before proceeding:
128
+
129
+ > "Spec written and committed to `<path>`. Please review it and let me know if you want to make any changes before we start writing out the implementation plan."
130
+
131
+ Wait for the user's response. If they request changes, make them and repeat the spec self-review. Only proceed once the user approves.
132
+
133
+ **Implementation:**
134
+
135
+ - Invoke the writing-plans skill to create a detailed implementation plan
136
+ - Do NOT invoke any other skill. writing-plans is the next step.
137
+
138
+ ## Key Principles
139
+
140
+ - **One question at a time** - Don't overwhelm with multiple questions
141
+ - **Multiple choice preferred** - Easier to answer than open-ended when possible
142
+ - **YAGNI ruthlessly** - Remove unnecessary features from all designs
143
+ - **Explore alternatives** - Always propose 2-3 approaches before settling
144
+ - **Incremental validation** - Present design, get approval before moving on
145
+ - **Be flexible** - Go back and clarify when something doesn't make sense
146
+
147
+ ## Visual Companion
148
+
149
+ A browser-based companion for showing mockups, diagrams, and visual options during brainstorming. Available as a tool — not a mode. Accepting the companion means it's available for questions that benefit from visual treatment; it does NOT mean every question goes through the browser.
150
+
151
+ **Offering the companion:** When you anticipate that upcoming questions will involve visual content (mockups, layouts, diagrams), offer it once for consent:
152
+ > "Some of what we're working on might be easier to explain if I can show it to you in a web browser. I can put together mockups, diagrams, comparisons, and other visuals as we go. This feature is still new and can be token-intensive. Want to try it? (Requires opening a local URL)"
153
+
154
+ **This offer MUST be its own message.** Do not combine it with clarifying questions, context summaries, or any other content. The message should contain ONLY the offer above and nothing else. Wait for the user's response before continuing. If they decline, proceed with text-only brainstorming.
155
+
156
+ **Per-question decision:** Even after the user accepts, decide FOR EACH QUESTION whether to use the browser or the terminal. The test: **would the user understand this better by seeing it than reading it?**
157
+
158
+ - **Use the browser** for content that IS visual — mockups, wireframes, layout comparisons, architecture diagrams, side-by-side visual designs
159
+ - **Use the terminal** for content that is text — requirements questions, conceptual choices, tradeoff lists, A/B/C/D text options, scope decisions
160
+
161
+ A question about a UI topic is not automatically a visual question. "What does personality mean in this context?" is a conceptual question — use the terminal. "Which wizard layout works better?" is a visual question — use the browser.
162
+
163
+ If they agree to the companion, read the detailed guide before proceeding:
164
+ `skills/brainstorming/visual-companion.md`
.agents/skills/brainstorming/scripts/frame-template.html ADDED
@@ -0,0 +1,214 @@
1
+ <!DOCTYPE html>
2
+ <html>
3
+ <head>
4
+ <meta charset="utf-8">
5
+ <title>Superpowers Brainstorming</title>
6
+ <style>
7
+ /*
8
+ * BRAINSTORM COMPANION FRAME TEMPLATE
9
+ *
10
+ * This template provides a consistent frame with:
11
+ * - OS-aware light/dark theming
12
+ * - Fixed header and selection indicator bar
13
+ * - Scrollable main content area
14
+ * - CSS helpers for common UI patterns
15
+ *
16
+ * Content is injected via placeholder comment in #claude-content.
17
+ */
18
+
19
+ * { box-sizing: border-box; margin: 0; padding: 0; }
20
+ html, body { height: 100%; overflow: hidden; }
21
+
22
+ /* ===== THEME VARIABLES ===== */
23
+ :root {
24
+ --bg-primary: #f5f5f7;
25
+ --bg-secondary: #ffffff;
26
+ --bg-tertiary: #e5e5e7;
27
+ --border: #d1d1d6;
28
+ --text-primary: #1d1d1f;
29
+ --text-secondary: #86868b;
30
+ --text-tertiary: #aeaeb2;
31
+ --accent: #0071e3;
32
+ --accent-hover: #0077ed;
33
+ --success: #34c759;
34
+ --warning: #ff9f0a;
35
+ --error: #ff3b30;
36
+ --selected-bg: #e8f4fd;
37
+ --selected-border: #0071e3;
38
+ }
39
+
40
+ @media (prefers-color-scheme: dark) {
41
+ :root {
42
+ --bg-primary: #1d1d1f;
43
+ --bg-secondary: #2d2d2f;
44
+ --bg-tertiary: #3d3d3f;
45
+ --border: #424245;
46
+ --text-primary: #f5f5f7;
47
+ --text-secondary: #86868b;
48
+ --text-tertiary: #636366;
49
+ --accent: #0a84ff;
50
+ --accent-hover: #409cff;
51
+ --selected-bg: rgba(10, 132, 255, 0.15);
52
+ --selected-border: #0a84ff;
53
+ }
54
+ }
55
+
56
+ body {
57
+ font-family: system-ui, -apple-system, BlinkMacSystemFont, sans-serif;
58
+ background: var(--bg-primary);
59
+ color: var(--text-primary);
60
+ display: flex;
61
+ flex-direction: column;
62
+ line-height: 1.5;
63
+ }
64
+
65
+ /* ===== FRAME STRUCTURE ===== */
66
+ .header {
67
+ background: var(--bg-secondary);
68
+ padding: 0.5rem 1.5rem;
69
+ display: flex;
70
+ justify-content: space-between;
71
+ align-items: center;
72
+ border-bottom: 1px solid var(--border);
73
+ flex-shrink: 0;
74
+ }
75
+ .header h1 { font-size: 0.85rem; font-weight: 500; color: var(--text-secondary); }
76
+ .header .status { font-size: 0.7rem; color: var(--success); display: flex; align-items: center; gap: 0.4rem; }
77
+ .header .status::before { content: ''; width: 6px; height: 6px; background: var(--success); border-radius: 50%; }
78
+
79
+ .main { flex: 1; overflow-y: auto; }
80
+ #claude-content { padding: 2rem; min-height: 100%; }
81
+
82
+ .indicator-bar {
83
+ background: var(--bg-secondary);
84
+ border-top: 1px solid var(--border);
85
+ padding: 0.5rem 1.5rem;
86
+ flex-shrink: 0;
87
+ text-align: center;
88
+ }
89
+ .indicator-bar span {
90
+ font-size: 0.75rem;
91
+ color: var(--text-secondary);
92
+ }
93
+ .indicator-bar .selected-text {
94
+ color: var(--accent);
95
+ font-weight: 500;
96
+ }
97
+
98
+ /* ===== TYPOGRAPHY ===== */
99
+ h2 { font-size: 1.5rem; font-weight: 600; margin-bottom: 0.5rem; }
100
+ h3 { font-size: 1.1rem; font-weight: 600; margin-bottom: 0.25rem; }
101
+ .subtitle { color: var(--text-secondary); margin-bottom: 1.5rem; }
102
+ .section { margin-bottom: 2rem; }
103
+ .label { font-size: 0.7rem; color: var(--text-secondary); text-transform: uppercase; letter-spacing: 0.05em; margin-bottom: 0.5rem; }
104
+
105
+ /* ===== OPTIONS (for A/B/C choices) ===== */
106
+ .options { display: flex; flex-direction: column; gap: 0.75rem; }
107
+ .option {
108
+ background: var(--bg-secondary);
109
+ border: 2px solid var(--border);
110
+ border-radius: 12px;
111
+ padding: 1rem 1.25rem;
112
+ cursor: pointer;
113
+ transition: all 0.15s ease;
114
+ display: flex;
115
+ align-items: flex-start;
116
+ gap: 1rem;
117
+ }
118
+ .option:hover { border-color: var(--accent); }
119
+ .option.selected { background: var(--selected-bg); border-color: var(--selected-border); }
120
+ .option .letter {
121
+ background: var(--bg-tertiary);
122
+ color: var(--text-secondary);
123
+ width: 1.75rem; height: 1.75rem;
124
+ border-radius: 6px;
125
+ display: flex; align-items: center; justify-content: center;
126
+ font-weight: 600; font-size: 0.85rem; flex-shrink: 0;
127
+ }
128
+ .option.selected .letter { background: var(--accent); color: white; }
129
+ .option .content { flex: 1; }
130
+ .option .content h3 { font-size: 0.95rem; margin-bottom: 0.15rem; }
131
+ .option .content p { color: var(--text-secondary); font-size: 0.85rem; margin: 0; }
132
+
133
+ /* ===== CARDS (for showing designs/mockups) ===== */
134
+ .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(280px, 1fr)); gap: 1rem; }
135
+ .card {
136
+ background: var(--bg-secondary);
137
+ border: 1px solid var(--border);
138
+ border-radius: 12px;
139
+ overflow: hidden;
140
+ cursor: pointer;
141
+ transition: all 0.15s ease;
142
+ }
143
+ .card:hover { border-color: var(--accent); transform: translateY(-2px); box-shadow: 0 4px 12px rgba(0,0,0,0.1); }
144
+ .card.selected { border-color: var(--selected-border); border-width: 2px; }
145
+ .card-image { background: var(--bg-tertiary); aspect-ratio: 16/10; display: flex; align-items: center; justify-content: center; }
146
+ .card-body { padding: 1rem; }
147
+ .card-body h3 { margin-bottom: 0.25rem; }
148
+ .card-body p { color: var(--text-secondary); font-size: 0.85rem; }
149
+
150
+ /* ===== MOCKUP CONTAINER ===== */
151
+ .mockup {
152
+ background: var(--bg-secondary);
153
+ border: 1px solid var(--border);
154
+ border-radius: 12px;
155
+ overflow: hidden;
156
+ margin-bottom: 1.5rem;
157
+ }
158
+ .mockup-header {
159
+ background: var(--bg-tertiary);
160
+ padding: 0.5rem 1rem;
161
+ font-size: 0.75rem;
162
+ color: var(--text-secondary);
163
+ border-bottom: 1px solid var(--border);
164
+ }
165
+ .mockup-body { padding: 1.5rem; }
166
+
167
+ /* ===== SPLIT VIEW (side-by-side comparison) ===== */
168
+ .split { display: grid; grid-template-columns: 1fr 1fr; gap: 1.5rem; }
169
+ @media (max-width: 700px) { .split { grid-template-columns: 1fr; } }
170
+
171
+ /* ===== PROS/CONS ===== */
172
+ .pros-cons { display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; margin: 1rem 0; }
173
+ .pros, .cons { background: var(--bg-secondary); border-radius: 8px; padding: 1rem; }
174
+ .pros h4 { color: var(--success); font-size: 0.85rem; margin-bottom: 0.5rem; }
175
+ .cons h4 { color: var(--error); font-size: 0.85rem; margin-bottom: 0.5rem; }
176
+ .pros ul, .cons ul { margin-left: 1.25rem; font-size: 0.85rem; color: var(--text-secondary); }
177
+ .pros li, .cons li { margin-bottom: 0.25rem; }
178
+
179
+ /* ===== PLACEHOLDER (for mockup areas) ===== */
180
+ .placeholder {
181
+ background: var(--bg-tertiary);
182
+ border: 2px dashed var(--border);
183
+ border-radius: 8px;
184
+ padding: 2rem;
185
+ text-align: center;
186
+ color: var(--text-tertiary);
187
+ }
188
+
189
+ /* ===== INLINE MOCKUP ELEMENTS ===== */
190
+ .mock-nav { background: var(--accent); color: white; padding: 0.75rem 1rem; display: flex; gap: 1.5rem; font-size: 0.9rem; }
191
+ .mock-sidebar { background: var(--bg-tertiary); padding: 1rem; min-width: 180px; }
192
+ .mock-content { padding: 1.5rem; flex: 1; }
193
+ .mock-button { background: var(--accent); color: white; border: none; padding: 0.5rem 1rem; border-radius: 6px; font-size: 0.85rem; }
194
+ .mock-input { background: var(--bg-primary); border: 1px solid var(--border); border-radius: 6px; padding: 0.5rem; width: 100%; }
195
+ </style>
196
+ </head>
197
+ <body>
198
+ <div class="header">
199
+ <h1><a href="https://github.com/obra/superpowers" style="color: inherit; text-decoration: none;">Superpowers Brainstorming</a></h1>
200
+ <div class="status">Connected</div>
201
+ </div>
202
+
203
+ <div class="main">
204
+ <div id="claude-content">
205
+ <!-- CONTENT -->
206
+ </div>
207
+ </div>
208
+
209
+ <div class="indicator-bar">
210
+ <span id="indicator-text">Click an option above, then return to the terminal</span>
211
+ </div>
212
+
213
+ </body>
214
+ </html>
.agents/skills/brainstorming/scripts/helper.js ADDED
@@ -0,0 +1,88 @@
1
+ (function() {
2
+ const WS_URL = 'ws://' + window.location.host;
3
+ let ws = null;
4
+ let eventQueue = [];
5
+
6
+ function connect() {
7
+ ws = new WebSocket(WS_URL);
8
+
9
+ ws.onopen = () => {
10
+ eventQueue.forEach(e => ws.send(JSON.stringify(e)));
11
+ eventQueue = [];
12
+ };
13
+
14
+ ws.onmessage = (msg) => {
15
+ const data = JSON.parse(msg.data);
16
+ if (data.type === 'reload') {
17
+ window.location.reload();
18
+ }
19
+ };
20
+
21
+ ws.onclose = () => {
22
+ setTimeout(connect, 1000);
23
+ };
24
+ }
25
+
26
+ function sendEvent(event) {
27
+ event.timestamp = Date.now();
28
+ if (ws && ws.readyState === WebSocket.OPEN) {
29
+ ws.send(JSON.stringify(event));
30
+ } else {
31
+ eventQueue.push(event);
32
+ }
33
+ }
34
+
35
+ // Capture clicks on choice elements
36
+ document.addEventListener('click', (e) => {
37
+ const target = e.target.closest('[data-choice]');
38
+ if (!target) return;
39
+
40
+ sendEvent({
41
+ type: 'click',
42
+ text: target.textContent.trim(),
43
+ choice: target.dataset.choice,
44
+ id: target.id || null
45
+ });
46
+
47
+ // Update indicator bar (defer so toggleSelect runs first)
48
+ setTimeout(() => {
49
+ const indicator = document.getElementById('indicator-text');
50
+ if (!indicator) return;
51
+ const container = target.closest('.options') || target.closest('.cards');
52
+ const selected = container ? container.querySelectorAll('.selected') : [];
53
+ if (selected.length === 0) {
54
+ indicator.textContent = 'Click an option above, then return to the terminal';
55
+ } else if (selected.length === 1) {
56
+ const label = selected[0].querySelector('h3, .content h3, .card-body h3')?.textContent?.trim() || selected[0].dataset.choice;
57
+ indicator.innerHTML = '<span class="selected-text">' + label + ' selected</span> — return to terminal to continue';
58
+ } else {
59
+ indicator.innerHTML = '<span class="selected-text">' + selected.length + ' selected</span> — return to terminal to continue';
60
+ }
61
+ }, 0);
62
+ });
63
+
64
+ // Frame UI: selection tracking
65
+ window.selectedChoice = null;
66
+
67
+ window.toggleSelect = function(el) {
68
+ const container = el.closest('.options') || el.closest('.cards');
69
+ const multi = container && container.dataset.multiselect !== undefined;
70
+ if (container && !multi) {
71
+ container.querySelectorAll('.option, .card').forEach(o => o.classList.remove('selected'));
72
+ }
73
+ if (multi) {
74
+ el.classList.toggle('selected');
75
+ } else {
76
+ el.classList.add('selected');
77
+ }
78
+ window.selectedChoice = el.dataset.choice;
79
+ };
80
+
81
+ // Expose API for explicit use
82
+ window.brainstorm = {
83
+ send: sendEvent,
84
+ choice: (value, metadata = {}) => sendEvent({ type: 'choice', value, ...metadata })
85
+ };
86
+
87
+ connect();
88
+ })();
.agents/skills/brainstorming/scripts/server.cjs ADDED
@@ -0,0 +1,354 @@
1
+ const crypto = require('crypto');
2
+ const http = require('http');
3
+ const fs = require('fs');
4
+ const path = require('path');
5
+
6
+ // ========== WebSocket Protocol (RFC 6455) ==========
7
+
8
+ const OPCODES = { TEXT: 0x01, CLOSE: 0x08, PING: 0x09, PONG: 0x0A };
9
+ const WS_MAGIC = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';
10
+
11
+ function computeAcceptKey(clientKey) {
12
+ return crypto.createHash('sha1').update(clientKey + WS_MAGIC).digest('base64');
13
+ }
14
+
15
+ function encodeFrame(opcode, payload) {
16
+ const fin = 0x80;
17
+ const len = payload.length;
18
+ let header;
19
+
20
+ if (len < 126) {
21
+ header = Buffer.alloc(2);
22
+ header[0] = fin | opcode;
23
+ header[1] = len;
24
+ } else if (len < 65536) {
25
+ header = Buffer.alloc(4);
26
+ header[0] = fin | opcode;
27
+ header[1] = 126;
28
+ header.writeUInt16BE(len, 2);
29
+ } else {
30
+ header = Buffer.alloc(10);
31
+ header[0] = fin | opcode;
32
+ header[1] = 127;
33
+ header.writeBigUInt64BE(BigInt(len), 2);
34
+ }
35
+
36
+ return Buffer.concat([header, payload]);
37
+ }
38
+
39
+ function decodeFrame(buffer) {
40
+ if (buffer.length < 2) return null;
41
+
42
+ const secondByte = buffer[1];
43
+ const opcode = buffer[0] & 0x0F;
44
+ const masked = (secondByte & 0x80) !== 0;
45
+ let payloadLen = secondByte & 0x7F;
46
+ let offset = 2;
47
+
48
+ if (!masked) throw new Error('Client frames must be masked');
49
+
50
+ if (payloadLen === 126) {
51
+ if (buffer.length < 4) return null;
52
+ payloadLen = buffer.readUInt16BE(2);
53
+ offset = 4;
54
+ } else if (payloadLen === 127) {
55
+ if (buffer.length < 10) return null;
56
+ payloadLen = Number(buffer.readBigUInt64BE(2));
57
+ offset = 10;
58
+ }
59
+
60
+ const maskOffset = offset;
61
+ const dataOffset = offset + 4;
62
+ const totalLen = dataOffset + payloadLen;
63
+ if (buffer.length < totalLen) return null;
64
+
65
+ const mask = buffer.slice(maskOffset, dataOffset);
66
+ const data = Buffer.alloc(payloadLen);
67
+ for (let i = 0; i < payloadLen; i++) {
68
+ data[i] = buffer[dataOffset + i] ^ mask[i % 4];
69
+ }
70
+
71
+ return { opcode, payload: data, bytesConsumed: totalLen };
72
+ }
73
+
74
+ // ========== Configuration ==========
75
+
76
+ const PORT = process.env.BRAINSTORM_PORT || (49152 + Math.floor(Math.random() * 16383));
77
+ const HOST = process.env.BRAINSTORM_HOST || '127.0.0.1';
78
+ const URL_HOST = process.env.BRAINSTORM_URL_HOST || (HOST === '127.0.0.1' ? 'localhost' : HOST);
79
+ const SESSION_DIR = process.env.BRAINSTORM_DIR || '/tmp/brainstorm';
80
+ const CONTENT_DIR = path.join(SESSION_DIR, 'content');
81
+ const STATE_DIR = path.join(SESSION_DIR, 'state');
82
+ let ownerPid = process.env.BRAINSTORM_OWNER_PID ? Number(process.env.BRAINSTORM_OWNER_PID) : null;
83
+
84
+ const MIME_TYPES = {
85
+ '.html': 'text/html', '.css': 'text/css', '.js': 'application/javascript',
86
+ '.json': 'application/json', '.png': 'image/png', '.jpg': 'image/jpeg',
87
+ '.jpeg': 'image/jpeg', '.gif': 'image/gif', '.svg': 'image/svg+xml'
88
+ };
89
+
90
+ // ========== Templates and Constants ==========
91
+
92
+ const WAITING_PAGE = `<!DOCTYPE html>
93
+ <html>
94
+ <head><meta charset="utf-8"><title>Brainstorm Companion</title>
95
+ <style>body { font-family: system-ui, sans-serif; padding: 2rem; max-width: 800px; margin: 0 auto; }
96
+ h1 { color: #333; } p { color: #666; }</style>
97
+ </head>
98
+ <body><h1>Brainstorm Companion</h1>
99
+ <p>Waiting for the agent to push a screen...</p></body></html>`;
100
+
101
+ const frameTemplate = fs.readFileSync(path.join(__dirname, 'frame-template.html'), 'utf-8');
102
+ const helperScript = fs.readFileSync(path.join(__dirname, 'helper.js'), 'utf-8');
103
+ const helperInjection = '<script>\n' + helperScript + '\n</script>';
104
+
105
+ // ========== Helper Functions ==========
106
+
107
+ function isFullDocument(html) {
108
+ const trimmed = html.trimStart().toLowerCase();
109
+ return trimmed.startsWith('<!doctype') || trimmed.startsWith('<html');
110
+ }
111
+
112
+ function wrapInFrame(content) {
113
+ return frameTemplate.replace('<!-- CONTENT -->', content);
114
+ }
115
+
116
+ function getNewestScreen() {
117
+ const files = fs.readdirSync(CONTENT_DIR)
118
+ .filter(f => f.endsWith('.html'))
119
+ .map(f => {
120
+ const fp = path.join(CONTENT_DIR, f);
121
+ return { path: fp, mtime: fs.statSync(fp).mtime.getTime() };
122
+ })
123
+ .sort((a, b) => b.mtime - a.mtime);
124
+ return files.length > 0 ? files[0].path : null;
125
+ }
126
+
127
+ // ========== HTTP Request Handler ==========
128
+
129
+ function handleRequest(req, res) {
130
+ touchActivity();
131
+ if (req.method === 'GET' && req.url === '/') {
132
+ const screenFile = getNewestScreen();
133
+ let html = screenFile
134
+ ? (raw => isFullDocument(raw) ? raw : wrapInFrame(raw))(fs.readFileSync(screenFile, 'utf-8'))
135
+ : WAITING_PAGE;
136
+
137
+ if (html.includes('</body>')) {
138
+ html = html.replace('</body>', helperInjection + '\n</body>');
139
+ } else {
140
+ html += helperInjection;
141
+ }
142
+
143
+ res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
144
+ res.end(html);
145
+ } else if (req.method === 'GET' && req.url.startsWith('/files/')) {
146
+ const fileName = req.url.slice(7);
147
+ const filePath = path.join(CONTENT_DIR, path.basename(fileName));
148
+ if (!fs.existsSync(filePath)) {
149
+ res.writeHead(404);
150
+ res.end('Not found');
151
+ return;
152
+ }
153
+ const ext = path.extname(filePath).toLowerCase();
154
+ const contentType = MIME_TYPES[ext] || 'application/octet-stream';
155
+ res.writeHead(200, { 'Content-Type': contentType });
156
+ res.end(fs.readFileSync(filePath));
157
+ } else {
158
+ res.writeHead(404);
159
+ res.end('Not found');
160
+ }
161
+ }
162
+
163
+ // ========== WebSocket Connection Handling ==========
164
+
165
+ const clients = new Set();
166
+
167
+ function handleUpgrade(req, socket) {
168
+ const key = req.headers['sec-websocket-key'];
169
+ if (!key) { socket.destroy(); return; }
170
+
171
+ const accept = computeAcceptKey(key);
172
+ socket.write(
173
+ 'HTTP/1.1 101 Switching Protocols\r\n' +
174
+ 'Upgrade: websocket\r\n' +
175
+ 'Connection: Upgrade\r\n' +
176
+ 'Sec-WebSocket-Accept: ' + accept + '\r\n\r\n'
177
+ );
178
+
179
+ let buffer = Buffer.alloc(0);
180
+ clients.add(socket);
181
+
182
+ socket.on('data', (chunk) => {
183
+ buffer = Buffer.concat([buffer, chunk]);
184
+ while (buffer.length > 0) {
185
+ let result;
186
+ try {
187
+ result = decodeFrame(buffer);
188
+ } catch (e) {
189
+ socket.end(encodeFrame(OPCODES.CLOSE, Buffer.alloc(0)));
190
+ clients.delete(socket);
191
+ return;
192
+ }
193
+ if (!result) break;
194
+ buffer = buffer.slice(result.bytesConsumed);
195
+
196
+ switch (result.opcode) {
197
+ case OPCODES.TEXT:
198
+ handleMessage(result.payload.toString());
199
+ break;
200
+ case OPCODES.CLOSE:
201
+ socket.end(encodeFrame(OPCODES.CLOSE, Buffer.alloc(0)));
202
+ clients.delete(socket);
203
+ return;
204
+ case OPCODES.PING:
205
+ socket.write(encodeFrame(OPCODES.PONG, result.payload));
206
+ break;
207
+ case OPCODES.PONG:
208
+ break;
209
+ default: {
210
+ const closeBuf = Buffer.alloc(2);
211
+ closeBuf.writeUInt16BE(1003);
212
+ socket.end(encodeFrame(OPCODES.CLOSE, closeBuf));
213
+ clients.delete(socket);
214
+ return;
215
+ }
216
+ }
217
+ }
218
+ });
219
+
220
+ socket.on('close', () => clients.delete(socket));
221
+ socket.on('error', () => clients.delete(socket));
222
+ }
223
+
224
+ function handleMessage(text) {
225
+ let event;
226
+ try {
227
+ event = JSON.parse(text);
228
+ } catch (e) {
229
+ console.error('Failed to parse WebSocket message:', e.message);
230
+ return;
231
+ }
232
+ touchActivity();
233
+ console.log(JSON.stringify({ source: 'user-event', ...event }));
234
+ if (event.choice) {
235
+ const eventsFile = path.join(STATE_DIR, 'events');
236
+ fs.appendFileSync(eventsFile, JSON.stringify(event) + '\n');
237
+ }
238
+ }
239
+
240
+ function broadcast(msg) {
241
+ const frame = encodeFrame(OPCODES.TEXT, Buffer.from(JSON.stringify(msg)));
242
+ for (const socket of clients) {
243
+ try { socket.write(frame); } catch (e) { clients.delete(socket); }
244
+ }
245
+ }
246
+
247
+ // ========== Activity Tracking ==========
248
+
249
+ const IDLE_TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes
250
+ let lastActivity = Date.now();
251
+
252
+ function touchActivity() {
253
+ lastActivity = Date.now();
254
+ }
255
+
256
+ // ========== File Watching ==========
257
+
258
+ const debounceTimers = new Map();
259
+
260
+ // ========== Server Startup ==========
261
+
262
+ function startServer() {
263
+ if (!fs.existsSync(CONTENT_DIR)) fs.mkdirSync(CONTENT_DIR, { recursive: true });
264
+ if (!fs.existsSync(STATE_DIR)) fs.mkdirSync(STATE_DIR, { recursive: true });
265
+
266
+ // Track known files to distinguish new screens from updates.
267
+ // macOS fs.watch reports 'rename' for both new files and overwrites,
268
+ // so we can't rely on eventType alone.
269
+ const knownFiles = new Set(
270
+ fs.readdirSync(CONTENT_DIR).filter(f => f.endsWith('.html'))
271
+ );
272
+
273
+ const server = http.createServer(handleRequest);
274
+ server.on('upgrade', handleUpgrade);
275
+
276
+ const watcher = fs.watch(CONTENT_DIR, (eventType, filename) => {
277
+ if (!filename || !filename.endsWith('.html')) return;
278
+
279
+ if (debounceTimers.has(filename)) clearTimeout(debounceTimers.get(filename));
280
+ debounceTimers.set(filename, setTimeout(() => {
281
+ debounceTimers.delete(filename);
282
+ const filePath = path.join(CONTENT_DIR, filename);
283
+
284
+ if (!fs.existsSync(filePath)) return; // file was deleted
285
+ touchActivity();
286
+
287
+ if (!knownFiles.has(filename)) {
288
+ knownFiles.add(filename);
289
+ const eventsFile = path.join(STATE_DIR, 'events');
290
+ if (fs.existsSync(eventsFile)) fs.unlinkSync(eventsFile);
291
+ console.log(JSON.stringify({ type: 'screen-added', file: filePath }));
292
+ } else {
293
+ console.log(JSON.stringify({ type: 'screen-updated', file: filePath }));
294
+ }
295
+
296
+ broadcast({ type: 'reload' });
297
+ }, 100));
298
+ });
299
+ watcher.on('error', (err) => console.error('fs.watch error:', err.message));
300
+
301
+ function shutdown(reason) {
302
+ console.log(JSON.stringify({ type: 'server-stopped', reason }));
303
+ const infoFile = path.join(STATE_DIR, 'server-info');
304
+ if (fs.existsSync(infoFile)) fs.unlinkSync(infoFile);
305
+ fs.writeFileSync(
306
+ path.join(STATE_DIR, 'server-stopped'),
307
+ JSON.stringify({ reason, timestamp: Date.now() }) + '\n'
308
+ );
309
+ watcher.close();
310
+ clearInterval(lifecycleCheck);
311
+ server.close(() => process.exit(0));
312
+ }
313
+
314
+ function ownerAlive() {
315
+ if (!ownerPid) return true;
316
+ try { process.kill(ownerPid, 0); return true; } catch (e) { return e.code === 'EPERM'; }
317
+ }
318
+
319
+ // Check every 60s: exit if owner process died or idle for 30 minutes
320
+ const lifecycleCheck = setInterval(() => {
321
+ if (!ownerAlive()) shutdown('owner process exited');
322
+ else if (Date.now() - lastActivity > IDLE_TIMEOUT_MS) shutdown('idle timeout');
323
+ }, 60 * 1000);
324
+ lifecycleCheck.unref();
325
+
326
+ // Validate owner PID at startup. If it's already dead, the PID resolution
327
+ // was wrong (common on WSL, Tailscale SSH, and cross-user scenarios).
328
+ // Disable monitoring and rely on the idle timeout instead.
329
+ if (ownerPid) {
330
+ try { process.kill(ownerPid, 0); }
331
+ catch (e) {
332
+ if (e.code !== 'EPERM') {
333
+ console.log(JSON.stringify({ type: 'owner-pid-invalid', pid: ownerPid, reason: 'dead at startup' }));
334
+ ownerPid = null;
335
+ }
336
+ }
337
+ }
338
+
339
+ server.listen(PORT, HOST, () => {
340
+ const info = JSON.stringify({
341
+ type: 'server-started', port: Number(PORT), host: HOST,
342
+ url_host: URL_HOST, url: 'http://' + URL_HOST + ':' + PORT,
343
+ screen_dir: CONTENT_DIR, state_dir: STATE_DIR
344
+ });
345
+ console.log(info);
346
+ fs.writeFileSync(path.join(STATE_DIR, 'server-info'), info + '\n');
347
+ });
348
+ }
349
+
350
+ if (require.main === module) {
351
+ startServer();
352
+ }
353
+
354
+ module.exports = { computeAcceptKey, encodeFrame, decodeFrame, OPCODES };
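Reviewer sketch (not part of the commit): `decodeFrame` above rejects unmasked frames and XORs each payload byte with `mask[i % 4]`. The round trip below illustrates that path for short payloads only. `buildMaskedTextFrame` and `decodeShortFrame` are hypothetical names introduced here; the decoder is a condensed copy of the short-payload branch, and opcode `0x1` is assumed to be the text opcode as defined in RFC 6455.

```javascript
const crypto = require('crypto');

// Hypothetical helper: build a masked client text frame the way a browser
// would, for payloads shorter than 126 bytes (7-bit length field only).
function buildMaskedTextFrame(text) {
  const payload = Buffer.from(text, 'utf-8');
  if (payload.length >= 126) throw new Error('sketch handles short payloads only');
  const mask = crypto.randomBytes(4);
  const frame = Buffer.alloc(2 + 4 + payload.length);
  frame[0] = 0x80 | 0x01;            // FIN bit set, opcode 0x1 (text)
  frame[1] = 0x80 | payload.length;  // MASK bit set, 7-bit payload length
  mask.copy(frame, 2);               // 4-byte masking key follows the header
  for (let i = 0; i < payload.length; i++) {
    frame[6 + i] = payload[i] ^ mask[i % 4];
  }
  return frame;
}

// Condensed copy of the short-payload path of decodeFrame.
function decodeShortFrame(buffer) {
  if (buffer.length < 2) return null;            // header not complete yet
  const opcode = buffer[0] & 0x0F;
  const masked = (buffer[1] & 0x80) !== 0;
  const payloadLen = buffer[1] & 0x7F;
  if (!masked) throw new Error('Client frames must be masked');
  if (payloadLen >= 126) throw new Error('sketch handles short payloads only');
  const totalLen = 2 + 4 + payloadLen;
  if (buffer.length < totalLen) return null;     // wait for more bytes
  const mask = buffer.subarray(2, 6);
  const data = Buffer.alloc(payloadLen);
  for (let i = 0; i < payloadLen; i++) {
    data[i] = buffer[6 + i] ^ mask[i % 4];       // XOR is its own inverse
  }
  return { opcode, payload: data, bytesConsumed: totalLen };
}

const frame = buildMaskedTextFrame('{"choice":"a"}');
const decoded = decodeShortFrame(frame);
console.log(decoded.opcode, decoded.payload.toString());
```

Returning `null` for an incomplete frame is what lets the `data` handler keep appending chunks to `buffer` until a whole frame is available.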
.agents/skills/brainstorming/scripts/start-server.sh ADDED
@@ -0,0 +1,148 @@
1
+ #!/usr/bin/env bash
2
+ # Start the brainstorm server and output connection info
3
+ # Usage: start-server.sh [--project-dir <path>] [--host <bind-host>] [--url-host <display-host>] [--foreground] [--background]
4
+ #
5
+ # Starts server on a random high port, outputs JSON with URL.
6
+ # Each session gets its own directory to avoid conflicts.
7
+ #
8
+ # Options:
9
+ # --project-dir <path> Store session files under <path>/.superpowers/brainstorm/
10
+ # instead of /tmp. Files persist after server stops.
11
+ # --host <bind-host> Host/interface to bind (default: 127.0.0.1).
12
+ # Use 0.0.0.0 in remote/containerized environments.
13
+ # --url-host <host> Hostname shown in returned URL JSON.
14
+ # --foreground Run server in the current terminal (no backgrounding).
15
+ # --background Force background mode (overrides Codex auto-foreground).
16
+
17
+ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
18
+
19
+ # Parse arguments
20
+ PROJECT_DIR=""
21
+ FOREGROUND="false"
22
+ FORCE_BACKGROUND="false"
23
+ BIND_HOST="127.0.0.1"
24
+ URL_HOST=""
25
+ while [[ $# -gt 0 ]]; do
26
+ case "$1" in
27
+ --project-dir)
28
+ PROJECT_DIR="$2"
29
+ shift 2
30
+ ;;
31
+ --host)
32
+ BIND_HOST="$2"
33
+ shift 2
34
+ ;;
35
+ --url-host)
36
+ URL_HOST="$2"
37
+ shift 2
38
+ ;;
39
+ --foreground|--no-daemon)
40
+ FOREGROUND="true"
41
+ shift
42
+ ;;
43
+ --background|--daemon)
44
+ FORCE_BACKGROUND="true"
45
+ shift
46
+ ;;
47
+ *)
48
+ echo "{\"error\": \"Unknown argument: $1\"}"
49
+ exit 1
50
+ ;;
51
+ esac
52
+ done
53
+
54
+ if [[ -z "$URL_HOST" ]]; then
55
+ if [[ "$BIND_HOST" == "127.0.0.1" || "$BIND_HOST" == "localhost" ]]; then
56
+ URL_HOST="localhost"
57
+ else
58
+ URL_HOST="$BIND_HOST"
59
+ fi
60
+ fi
61
+
62
+ # Some environments reap detached/background processes. Auto-foreground when detected.
63
+ if [[ -n "${CODEX_CI:-}" && "$FOREGROUND" != "true" && "$FORCE_BACKGROUND" != "true" ]]; then
64
+ FOREGROUND="true"
65
+ fi
66
+
67
+ # Windows/Git Bash reaps nohup background processes. Auto-foreground when detected.
68
+ if [[ "$FOREGROUND" != "true" && "$FORCE_BACKGROUND" != "true" ]]; then
69
+ case "${OSTYPE:-}" in
70
+ msys*|cygwin*|mingw*) FOREGROUND="true" ;;
71
+ esac
72
+ if [[ -n "${MSYSTEM:-}" ]]; then
73
+ FOREGROUND="true"
74
+ fi
75
+ fi
76
+
77
+ # Generate unique session directory
78
+ SESSION_ID="$$-$(date +%s)"
79
+
80
+ if [[ -n "$PROJECT_DIR" ]]; then
81
+ SESSION_DIR="${PROJECT_DIR}/.superpowers/brainstorm/${SESSION_ID}"
82
+ else
83
+ SESSION_DIR="/tmp/brainstorm-${SESSION_ID}"
84
+ fi
85
+
86
+ STATE_DIR="${SESSION_DIR}/state"
87
+ PID_FILE="${STATE_DIR}/server.pid"
88
+ LOG_FILE="${STATE_DIR}/server.log"
89
+
90
+ # Create fresh session directory with content and state peers
91
+ mkdir -p "${SESSION_DIR}/content" "$STATE_DIR"
92
+
93
+ # Kill any existing server
94
+ if [[ -f "$PID_FILE" ]]; then
95
+ old_pid=$(cat "$PID_FILE")
96
+ kill "$old_pid" 2>/dev/null
97
+ rm -f "$PID_FILE"
98
+ fi
99
+
100
+ cd "$SCRIPT_DIR"
101
+
102
+ # Resolve the harness PID (grandparent of this script).
103
+ # $PPID is the ephemeral shell the harness spawned to run us — it dies
104
+ # when this script exits. The harness itself is $PPID's parent.
105
+ OWNER_PID="$(ps -o ppid= -p "$PPID" 2>/dev/null | tr -d ' ')"
106
+ if [[ -z "$OWNER_PID" || "$OWNER_PID" == "1" ]]; then
107
+ OWNER_PID="$PPID"
108
+ fi
109
+
110
+ # Foreground mode for environments that reap detached/background processes.
111
+ if [[ "$FOREGROUND" == "true" ]]; then
112
+ echo "$$" > "$PID_FILE"
113
+ env BRAINSTORM_DIR="$SESSION_DIR" BRAINSTORM_HOST="$BIND_HOST" BRAINSTORM_URL_HOST="$URL_HOST" BRAINSTORM_OWNER_PID="$OWNER_PID" node server.cjs
114
+ exit $?
115
+ fi
116
+
117
+ # Start server, capturing output to log file
118
+ # Use nohup to survive shell exit; disown to remove from job table
119
+ nohup env BRAINSTORM_DIR="$SESSION_DIR" BRAINSTORM_HOST="$BIND_HOST" BRAINSTORM_URL_HOST="$URL_HOST" BRAINSTORM_OWNER_PID="$OWNER_PID" node server.cjs > "$LOG_FILE" 2>&1 &
120
+ SERVER_PID=$!
121
+ disown "$SERVER_PID" 2>/dev/null
122
+ echo "$SERVER_PID" > "$PID_FILE"
123
+
124
+ # Wait for server-started message (check log file)
125
+ for i in {1..50}; do
126
+ if grep -q "server-started" "$LOG_FILE" 2>/dev/null; then
127
+ # Verify server is still alive after a short window (catches process reapers)
128
+ alive="true"
129
+ for _ in {1..20}; do
130
+ if ! kill -0 "$SERVER_PID" 2>/dev/null; then
131
+ alive="false"
132
+ break
133
+ fi
134
+ sleep 0.1
135
+ done
136
+ if [[ "$alive" != "true" ]]; then
137
+ echo "{\"error\": \"Server started but was killed. Retry in a persistent terminal with: $SCRIPT_DIR/start-server.sh${PROJECT_DIR:+ --project-dir $PROJECT_DIR} --host $BIND_HOST --url-host $URL_HOST --foreground\"}"
138
+ exit 1
139
+ fi
140
+ grep "server-started" "$LOG_FILE" | head -1
141
+ exit 0
142
+ fi
143
+ sleep 0.1
144
+ done
145
+
146
+ # Timeout - server didn't start
147
+ echo '{"error": "Server failed to start within 5 seconds"}'
148
+ exit 1
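Reviewer sketch (not part of the commit): `start-server.sh` resolves the harness PID and passes it as `BRAINSTORM_OWNER_PID`; `server.cjs` then probes it with signal 0. The snippet below isolates that probe. `isProcessAlive` is an illustrative name, and the error-code handling mirrors `ownerAlive()` as shown in the diff.

```javascript
// Signal-0 liveness probe, mirroring ownerAlive() in server.cjs.
function isProcessAlive(pid) {
  if (!pid) return true; // no owner configured: treated as alive, so only the idle timeout applies
  try {
    process.kill(pid, 0); // signal 0 tests for existence without delivering a signal
    return true;
  } catch (e) {
    // EPERM: the process exists but is owned by another user, so it still counts as alive.
    // Any other code (typically ESRCH) means no such process.
    return e.code === 'EPERM';
  }
}

console.log(isProcessAlive(process.pid));
console.log(isProcessAlive(null));
```

Treating `EPERM` as alive matters in the cross-user scenarios mentioned in the server's startup comment, where the owner exists but cannot be signalled.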
.agents/skills/brainstorming/scripts/stop-server.sh ADDED
@@ -0,0 +1,56 @@
1
+ #!/usr/bin/env bash
2
+ # Stop the brainstorm server and clean up
3
+ # Usage: stop-server.sh <session_dir>
4
+ #
5
+ # Kills the server process. Only deletes session directory if it's
6
+ # under /tmp (ephemeral). Persistent directories (.superpowers/) are
7
+ # kept so mockups can be reviewed later.
8
+
9
+ SESSION_DIR="$1"
10
+
11
+ if [[ -z "$SESSION_DIR" ]]; then
12
+ echo '{"error": "Usage: stop-server.sh <session_dir>"}'
13
+ exit 1
14
+ fi
15
+
16
+ STATE_DIR="${SESSION_DIR}/state"
17
+ PID_FILE="${STATE_DIR}/server.pid"
18
+
19
+ if [[ -f "$PID_FILE" ]]; then
20
+ pid=$(cat "$PID_FILE")
21
+
22
+ # Try to stop gracefully, fallback to force if still alive
23
+ kill "$pid" 2>/dev/null || true
24
+
25
+ # Wait for graceful shutdown (up to ~2s)
26
+ for i in {1..20}; do
27
+ if ! kill -0 "$pid" 2>/dev/null; then
28
+ break
29
+ fi
30
+ sleep 0.1
31
+ done
32
+
33
+ # If still running, escalate to SIGKILL
34
+ if kill -0 "$pid" 2>/dev/null; then
35
+ kill -9 "$pid" 2>/dev/null || true
36
+
37
+ # Give SIGKILL a moment to take effect
38
+ sleep 0.1
39
+ fi
40
+
41
+ if kill -0 "$pid" 2>/dev/null; then
42
+ echo '{"status": "failed", "error": "process still running"}'
43
+ exit 1
44
+ fi
45
+
46
+ rm -f "$PID_FILE" "${STATE_DIR}/server.log"
47
+
48
+ # Only delete ephemeral /tmp directories
49
+ if [[ "$SESSION_DIR" == /tmp/* ]]; then
50
+ rm -rf "$SESSION_DIR"
51
+ fi
52
+
53
+ echo '{"status": "stopped"}'
54
+ else
55
+ echo '{"status": "not_running"}'
56
+ fi
.agents/skills/brainstorming/spec-document-reviewer-prompt.md ADDED
@@ -0,0 +1,49 @@
1
+ # Spec Document Reviewer Prompt Template
2
+
3
+ Use this template when dispatching a spec document reviewer subagent.
4
+
5
+ **Purpose:** Verify the spec is complete, consistent, and ready for implementation planning.
6
+
7
+ **Dispatch after:** Spec document is written to docs/superpowers/specs/
8
+
9
+ ```
10
+ Task tool (general-purpose):
11
+ description: "Review spec document"
12
+ prompt: |
13
+ You are a spec document reviewer. Verify this spec is complete and ready for planning.
14
+
15
+ **Spec to review:** [SPEC_FILE_PATH]
16
+
17
+ ## What to Check
18
+
19
+ | Category | What to Look For |
20
+ |----------|------------------|
21
+ | Completeness | TODOs, placeholders, "TBD", incomplete sections |
22
+ | Consistency | Internal contradictions, conflicting requirements |
23
+ | Clarity | Requirements ambiguous enough to cause someone to build the wrong thing |
24
+ | Scope | Focused enough for a single plan — not covering multiple independent subsystems |
25
+ | YAGNI | Unrequested features, over-engineering |
26
+
27
+ ## Calibration
28
+
29
+ **Only flag issues that would cause real problems during implementation planning.**
30
+ A missing section, a contradiction, or a requirement so ambiguous it could be
31
+ interpreted two different ways — those are issues. Minor wording improvements,
32
+ stylistic preferences, and "sections less detailed than others" are not.
33
+
34
+ Approve unless there are serious gaps that would lead to a flawed plan.
35
+
36
+ ## Output Format
37
+
38
+ ## Spec Review
39
+
40
+ **Status:** Approved | Issues Found
41
+
42
+ **Issues (if any):**
43
+ - [Section X]: [specific issue] - [why it matters for planning]
44
+
45
+ **Recommendations (advisory, do not block approval):**
46
+ - [suggestions for improvement]
47
+ ```
48
+
49
+ **Reviewer returns:** Status, Issues (if any), Recommendations
.agents/skills/brainstorming/visual-companion.md ADDED
@@ -0,0 +1,287 @@
1
+ # Visual Companion Guide
2
+
3
+ Browser-based visual brainstorming companion for showing mockups, diagrams, and options.
4
+
5
+ ## When to Use
6
+
7
+ Decide per-question, not per-session. The test: **would the user understand this better by seeing it than reading it?**
8
+
9
+ **Use the browser** when the content itself is visual:
10
+
11
+ - **UI mockups** — wireframes, layouts, navigation structures, component designs
12
+ - **Architecture diagrams** — system components, data flow, relationship maps
13
+ - **Side-by-side visual comparisons** — comparing two layouts, two color schemes, two design directions
14
+ - **Design polish** — when the question is about look and feel, spacing, visual hierarchy
15
+ - **Spatial relationships** — state machines, flowcharts, entity relationships rendered as diagrams
16
+
17
+ **Use the terminal** when the content is text or tabular:
18
+
19
+ - **Requirements and scope questions** — "what does X mean?", "which features are in scope?"
20
+ - **Conceptual A/B/C choices** — picking between approaches described in words
21
+ - **Tradeoff lists** — pros/cons, comparison tables
22
+ - **Technical decisions** — API design, data modeling, architectural approach selection
23
+ - **Clarifying questions** — anything where the answer is words, not a visual preference
24
+
25
+ A question *about* a UI topic is not automatically a visual question. "What kind of wizard do you want?" is conceptual — use the terminal. "Which of these wizard layouts feels right?" is visual — use the browser.
26
+
27
+ ## How It Works
28
+
29
+ The server watches a directory for HTML files and serves the newest one to the browser. You write HTML content to `screen_dir`; the user sees it in their browser and can click to select options. Selections are recorded to `state_dir/events`, which you read on your next turn.
30
+
31
+ **Content fragments vs full documents:** If your HTML file starts with `<!DOCTYPE` or `<html`, the server serves it as-is (just injects the helper script). Otherwise, the server automatically wraps your content in the frame template — adding the header, CSS theme, selection indicator, and all interactive infrastructure. **Write content fragments by default.** Only write full documents when you need complete control over the page.
32
+
33
+ ## Starting a Session
34
+
35
+ ```bash
36
+ # Start server with persistence (mockups saved to project)
37
+ scripts/start-server.sh --project-dir /path/to/project
38
+
39
+ # Returns: {"type":"server-started","port":52341,"url":"http://localhost:52341",
40
+ # "screen_dir":"/path/to/project/.superpowers/brainstorm/12345-1706000000/content",
41
+ # "state_dir":"/path/to/project/.superpowers/brainstorm/12345-1706000000/state"}
42
+ ```
43
+
44
+ Save `screen_dir` and `state_dir` from the response. Tell the user to open the URL.
45
+
46
+ **Finding connection info:** The server writes its startup JSON to `$STATE_DIR/server-info`. If you launched the server in the background and didn't capture stdout, read that file to get the URL and port. When using `--project-dir`, check `<project>/.superpowers/brainstorm/` for the session directory.
47
+
48
+ **Note:** Pass the project root as `--project-dir` so mockups persist in `.superpowers/brainstorm/` and survive server restarts. Without it, files go to `/tmp` and are deleted when the session is stopped with `stop-server.sh`. Remind the user to add `.superpowers/` to `.gitignore` if it's not already there.
49
+
50
+ **Launching the server by platform:**
51
+
52
+ **Claude Code (macOS / Linux):**
53
+ ```bash
54
+ # Default mode works — the script backgrounds the server itself
55
+ scripts/start-server.sh --project-dir /path/to/project
56
+ ```
57
+
58
+ **Claude Code (Windows):**
59
+ ```bash
60
+ # On Windows the script auto-detects the platform and uses foreground mode, which blocks the tool call.
61
+ # Use run_in_background: true on the Bash tool call so the server survives
62
+ # across conversation turns.
63
+ scripts/start-server.sh --project-dir /path/to/project
64
+ ```
65
+ When calling this via the Bash tool, set `run_in_background: true`. Then read `$STATE_DIR/server-info` on the next turn to get the URL and port.
66
+
67
+ **Codex:**
68
+ ```bash
69
+ # Codex reaps background processes. The script auto-detects CODEX_CI and
70
+ # switches to foreground mode. Run it normally — no extra flags needed.
71
+ scripts/start-server.sh --project-dir /path/to/project
72
+ ```
73
+
74
+ **Gemini CLI:**
75
+ ```bash
76
+ # Use --foreground and set is_background: true on your shell tool call
77
+ # so the process survives across turns
78
+ scripts/start-server.sh --project-dir /path/to/project --foreground
79
+ ```
80
+
81
+ **Other environments:** The server must keep running in the background across conversation turns. If your environment reaps detached processes, use `--foreground` and launch the command with your platform's background execution mechanism.
82
+
83
+ If the URL is unreachable from your browser (common in remote/containerized setups), bind a non-loopback host:
84
+
85
+ ```bash
86
+ scripts/start-server.sh \
87
+ --project-dir /path/to/project \
88
+ --host 0.0.0.0 \
89
+ --url-host localhost
90
+ ```
91
+
92
+ Use `--url-host` to control what hostname is printed in the returned URL JSON.
93
+
94
+ ## The Loop
95
+
96
+ 1. **Check server is alive**, then **write HTML** to a new file in `screen_dir`:
97
+ - Before each write, check that `$STATE_DIR/server-info` exists. If it doesn't (or `$STATE_DIR/server-stopped` exists), the server has shut down — restart it with `start-server.sh` before continuing. The server auto-exits after 30 minutes of inactivity.
98
+ - Use semantic filenames: `platform.html`, `visual-style.html`, `layout.html`
99
+ - **Never reuse filenames** — each screen gets a fresh file
100
+ - Use Write tool — **never use cat/heredoc** (dumps noise into terminal)
101
+ - Server automatically serves the newest file
102
+
103
+ 2. **Tell user what to expect and end your turn:**
104
+ - Remind them of the URL (every step, not just first)
105
+ - Give a brief text summary of what's on screen (e.g., "Showing 3 layout options for the homepage")
106
+ - Ask them to respond in the terminal: "Take a look and let me know what you think. Click to select an option if you'd like."
107
+
108
+ 3. **On your next turn** — after the user responds in the terminal:
109
+ - Read `$STATE_DIR/events` if it exists — this contains the user's browser interactions (clicks, selections) as JSON lines
110
+ - Merge with the user's terminal text to get the full picture
111
+ - The terminal message is the primary feedback; `state_dir/events` provides structured interaction data
112
+
113
+ 4. **Iterate or advance** — if feedback changes current screen, write a new file (e.g., `layout-v2.html`). Only move to the next question when the current step is validated.
114
+
115
+ 5. **Unload when returning to terminal** — when the next step doesn't need the browser (e.g., a clarifying question, a tradeoff discussion), push a waiting screen to clear the stale content:
116
+
117
+ ```html
118
+ <!-- filename: waiting.html (or waiting-2.html, etc.) -->
119
+ <div style="display:flex;align-items:center;justify-content:center;min-height:60vh">
120
+ <p class="subtitle">Continuing in terminal...</p>
121
+ </div>
122
+ ```
123
+
124
+ This prevents the user from staring at a resolved choice while the conversation has moved on. When the next visual question comes up, push a new content file as usual.
125
+
126
+ 6. Repeat until done.
127
+
128
+ ## Writing Content Fragments
129
+
130
+ Write just the content that goes inside the page. The server wraps it in the frame template automatically (header, theme CSS, selection indicator, and all interactive infrastructure).
131
+
132
+ **Minimal example:**
133
+
134
+ ```html
135
+ <h2>Which layout works better?</h2>
136
+ <p class="subtitle">Consider readability and visual hierarchy</p>
137
+
138
+ <div class="options">
139
+ <div class="option" data-choice="a" onclick="toggleSelect(this)">
140
+ <div class="letter">A</div>
141
+ <div class="content">
142
+ <h3>Single Column</h3>
143
+ <p>Clean, focused reading experience</p>
144
+ </div>
145
+ </div>
146
+ <div class="option" data-choice="b" onclick="toggleSelect(this)">
147
+ <div class="letter">B</div>
148
+ <div class="content">
149
+ <h3>Two Column</h3>
150
+ <p>Sidebar navigation with main content</p>
151
+ </div>
152
+ </div>
153
+ </div>
154
+ ```
155
+
156
+ That's it. No `<html>`, no CSS, no `<script>` tags needed. The server provides all of that.
157
+
158
+ ## CSS Classes Available
159
+
160
+ The frame template provides these CSS classes for your content:
161
+
162
+ ### Options (A/B/C choices)
163
+
164
+ ```html
165
+ <div class="options">
166
+ <div class="option" data-choice="a" onclick="toggleSelect(this)">
167
+ <div class="letter">A</div>
168
+ <div class="content">
169
+ <h3>Title</h3>
170
+ <p>Description</p>
171
+ </div>
172
+ </div>
173
+ </div>
174
+ ```
175
+
176
+ **Multi-select:** Add `data-multiselect` to the container to let users select multiple options. Each click toggles the item. The indicator bar shows the count.
177
+
178
+ ```html
179
+ <div class="options" data-multiselect>
180
+ <!-- same option markup — users can select/deselect multiple -->
181
+ </div>
182
+ ```
183
+
184
+ ### Cards (visual designs)
185
+
186
+ ```html
187
+ <div class="cards">
188
+ <div class="card" data-choice="design1" onclick="toggleSelect(this)">
189
+ <div class="card-image"><!-- mockup content --></div>
190
+ <div class="card-body">
191
+ <h3>Name</h3>
192
+ <p>Description</p>
193
+ </div>
194
+ </div>
195
+ </div>
196
+ ```
197
+
198
+ ### Mockup container
199
+
200
+ ```html
201
+ <div class="mockup">
202
+ <div class="mockup-header">Preview: Dashboard Layout</div>
203
+ <div class="mockup-body"><!-- your mockup HTML --></div>
204
+ </div>
205
+ ```
206
+
207
+ ### Split view (side-by-side)
208
+
209
+ ```html
210
+ <div class="split">
211
+ <div class="mockup"><!-- left --></div>
212
+ <div class="mockup"><!-- right --></div>
213
+ </div>
214
+ ```
215
+
216
+ ### Pros/Cons
217
+
218
+ ```html
219
+ <div class="pros-cons">
220
+ <div class="pros"><h4>Pros</h4><ul><li>Benefit</li></ul></div>
221
+ <div class="cons"><h4>Cons</h4><ul><li>Drawback</li></ul></div>
222
+ </div>
223
+ ```
224
+
225
+ ### Mock elements (wireframe building blocks)
226
+
227
+ ```html
228
+ <div class="mock-nav">Logo | Home | About | Contact</div>
229
+ <div style="display: flex;">
230
+ <div class="mock-sidebar">Navigation</div>
231
+ <div class="mock-content">Main content area</div>
232
+ </div>
233
+ <button class="mock-button">Action Button</button>
234
+ <input class="mock-input" placeholder="Input field">
235
+ <div class="placeholder">Placeholder area</div>
236
+ ```
237
+
238
+ ### Typography and sections
239
+
240
+ - `h2` — page title
241
+ - `h3` — section heading
242
+ - `.subtitle` — secondary text below title
243
+ - `.section` — content block with bottom margin
244
+ - `.label` — small uppercase label text
245
+
246
+ ## Browser Events Format
247
+
248
+ When the user clicks options in the browser, their interactions are recorded to `$STATE_DIR/events` (one JSON object per line). The file is cleared automatically when you push a new screen.
249
+
250
+ ```jsonl
251
+ {"type":"click","choice":"a","text":"Option A - Simple Layout","timestamp":1706000101}
252
+ {"type":"click","choice":"c","text":"Option C - Complex Grid","timestamp":1706000108}
253
+ {"type":"click","choice":"b","text":"Option B - Hybrid","timestamp":1706000115}
254
+ ```
255
+
256
+ The full event stream shows the user's exploration path — they may click multiple options before settling. The last `choice` event is typically the final selection, but the pattern of clicks can reveal hesitation or preferences worth asking about.
257
+
258
+ If `$STATE_DIR/events` doesn't exist, the user didn't interact with the browser — use only their terminal text.
259
+
260
+ ## Design Tips
261
+
262
+ - **Scale fidelity to the question** — wireframes for layout, polish for polish questions
263
+ - **Explain the question on each page** — "Which layout feels more professional?" not just "Pick one"
264
+ - **Iterate before advancing** — if feedback changes current screen, write a new version
265
+ - **2-4 options max** per screen
266
+ - **Use real content when it matters** — for a photography portfolio, use actual images (Unsplash). Placeholder content obscures design issues.
267
+ - **Keep mockups simple** — focus on layout and structure, not pixel-perfect design
268
+
269
+ ## File Naming
270
+
271
+ - Use semantic names: `platform.html`, `visual-style.html`, `layout.html`
272
+ - Never reuse filenames — each screen must be a new file
273
+ - For iterations: append version suffix like `layout-v2.html`, `layout-v3.html`
274
+ - Server serves newest file by modification time
275
+
276
+ ## Cleaning Up
277
+
278
+ ```bash
279
+ scripts/stop-server.sh "$SESSION_DIR"
280
+ ```
281
+
282
+ If the session used `--project-dir`, mockup files persist in `.superpowers/brainstorm/` for later reference. Only `/tmp` sessions get deleted on stop.
283
+
284
+ ## Reference
285
+
286
+ - Frame template (CSS reference): `scripts/frame-template.html`
287
+ - Helper script (client-side): `scripts/helper.js`
.agents/skills/caveman-commit/SKILL.md ADDED
@@ -0,0 +1,65 @@
1
+ ---
2
+ name: caveman-commit
3
+ description: >
4
+ Ultra-compressed commit message generator. Cuts noise from commit messages while preserving
5
+ intent and reasoning. Conventional Commits format. Subject ≤50 chars, body only when "why"
6
+ isn't obvious. Use when user says "write a commit", "commit message", "generate commit",
7
+ "/commit", or invokes /caveman-commit. Auto-triggers when staging changes.
8
+ ---
9
+
10
+ Write commit messages terse and exact. Conventional Commits format. No fluff. Why over what.
11
+
12
+ ## Rules
13
+
14
+ **Subject line:**
15
+ - `<type>(<scope>): <imperative summary>` — `<scope>` optional
16
+ - Types: `feat`, `fix`, `refactor`, `perf`, `docs`, `test`, `chore`, `build`, `ci`, `style`, `revert`
17
+ - Imperative mood: "add", "fix", "remove" — not "added", "adds", "adding"
18
+ - ≤50 chars when possible, hard cap 72
19
+ - No trailing period
20
+ - Match project convention for capitalization after the colon
21
+
22
+ **Body (only if needed):**
23
+ - Skip entirely when subject is self-explanatory
24
+ - Add body only for: non-obvious *why*, breaking changes, migration notes, linked issues
25
+ - Wrap at 72 chars
26
+ - Bullets `-` not `*`
27
+ - Reference issues/PRs at end: `Closes #42`, `Refs #17`
28
+
29
+ **What NEVER goes in:**
30
+ - "This commit does X", "I", "we", "now", "currently" — the diff says what
31
+ - "As requested by..." — use Co-authored-by trailer
32
+ - "Generated with Claude Code" or any AI attribution
33
+ - Emoji (unless project convention requires)
34
+ - Restating the file name when scope already says it
35
+
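The mechanical subject-line rules can be checked with a short script. This is a sketch under assumptions: `check_subject`, `TYPES`, and `SUBJECT_RE` are hypothetical names, and the regex is one reading of the `<type>(<scope>): <imperative summary>` form, including the `!` marker used in the breaking-change example. Imperative mood and "why over what" cannot be checked this way.

```python
import re
from typing import List

# Types listed in the Rules section.
TYPES = ("feat", "fix", "refactor", "perf", "docs", "test",
         "chore", "build", "ci", "style", "revert")

# <type>(<scope>)!: <summary>, where the scope and "!" are optional.
SUBJECT_RE = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?: (?P<summary>\S.*)$"
)


def check_subject(subject: str) -> List[str]:
    """Return rule violations; an empty list means the subject passes."""
    problems = []
    if SUBJECT_RE.match(subject) is None:
        problems.append("not in <type>(<scope>): <summary> form")
    if len(subject) > 72:
        problems.append("over hard cap of 72 chars")
    elif len(subject) > 50:
        problems.append("over 50 chars; shorten if possible")
    if subject.endswith("."):
        problems.append("trailing period")
    return problems
```

For example, `check_subject("feat(api): add GET /users/:id/profile")` returns an empty list, while a subject ending in a period reports `"trailing period"`.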
36
+ ## Examples
37
+
38
+ Diff: new endpoint for user profile with body explaining the why
39
+ - ❌ "feat: add a new endpoint to get user profile information from the database"
40
+ - ✅
41
+ ```
42
+ feat(api): add GET /users/:id/profile
43
+
44
+ Mobile client needs profile data without the full user payload
45
+ to reduce LTE bandwidth on cold-launch screens.
46
+
47
+ Closes #128
48
+ ```
49
+
50
+ Diff: breaking API change
51
+ - ✅
52
+ ```
53
+ feat(api)!: rename /v1/orders to /v1/checkout
54
+
55
+ BREAKING CHANGE: clients on /v1/orders must migrate to /v1/checkout
56
+ before 2026-06-01. Old route returns 410 after that date.
57
+ ```
58
+
59
+ ## Auto-Clarity
60
+
61
+ Always include body for: breaking changes, security fixes, data migrations, anything reverting a prior commit. Never compress these into subject-only — future debuggers need the context.
62
+
63
+ ## Boundaries
64
+
65
+ Only generates the commit message. Does not run `git commit`, does not stage files, does not amend. Output the message as a code block ready to paste. "stop caveman-commit" or "normal mode": revert to verbose commit style.
.agents/skills/caveman-help/SKILL.md ADDED
@@ -0,0 +1,59 @@
1
+ ---
2
+ name: caveman-help
3
+ description: >
4
+ Quick-reference card for all caveman modes, skills, and commands.
5
+ One-shot display, not a persistent mode. Trigger: /caveman-help,
6
+ "caveman help", "what caveman commands", "how do I use caveman".
7
+ ---
8
+
9
+ # Caveman Help
10
+
11
+ Display this reference card when invoked. One-shot — do NOT change mode, write flag files, or persist anything. Output in caveman style.
12
+
13
+ ## Modes
14
+
15
+ | Mode | Trigger | What change |
16
+ |------|---------|-------------|
17
+ | **Lite** | `/caveman lite` | Drop filler. Keep sentence structure. |
18
+ | **Full** | `/caveman` | Drop articles, filler, pleasantries, hedging. Fragments OK. Default. |
19
+ | **Ultra** | `/caveman ultra` | Extreme compression. Bare fragments. Tables over prose. |
20
+ | **Wenyan-Lite** | `/caveman wenyan-lite` | Classical Chinese style, light compression. |
21
+ | **Wenyan-Full** | `/caveman wenyan` | Full 文言文. Maximum classical terseness. |
22
+ | **Wenyan-Ultra** | `/caveman wenyan-ultra` | Extreme. Ancient scholar on a budget. |
23
+
24
+ Mode stick until changed or session end.
25
+
26
+ ## Skills
27
+
28
+ | Skill | Trigger | What it do |
29
+ |-------|---------|-----------|
30
+ | **caveman-commit** | `/caveman-commit` | Terse commit messages. Conventional Commits. ≤50 char subject. |
31
+ | **caveman-review** | `/caveman-review` | One-line PR comments: `L42: bug: user null. Add guard.` |
32
+ | **caveman-compress** | `/caveman:compress <file>` | Compress .md files to caveman prose. Saves ~46% input tokens. |
33
+ | **caveman-help** | `/caveman-help` | This card. |
34
+
35
+ ## Deactivate
36
+
37
+ Say "stop caveman" or "normal mode". Resume anytime with `/caveman`.
38
+
39
+ ## Configure Default Mode
40
+
41
+ Default mode = `full`. Change it:
42
+
43
+ **Environment variable** (highest priority):
44
+ ```bash
45
+ export CAVEMAN_DEFAULT_MODE=ultra
46
+ ```
47
+
48
+ **Config file** (`~/.config/caveman/config.json`):
49
+ ```json
50
+ { "defaultMode": "lite" }
51
+ ```
52
+
53
+ Set `"off"` to disable auto-activation on session start. User can still activate manually with `/caveman`.
54
+
55
+ Resolution: env var > config file > `full`.
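The resolution order can be sketched as follows. This is an illustration only, not the plugin's actual implementation: `resolve_default_mode` is a hypothetical name, and treating a missing or malformed config file as "not set" is an assumption.

```python
import json
import os
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "caveman" / "config.json"


def resolve_default_mode(env=None, config_path=CONFIG_PATH) -> str:
    """Resolve the default mode: env var > config file > "full"."""
    env = os.environ if env is None else env

    mode = env.get("CAVEMAN_DEFAULT_MODE")
    if mode:
        return mode

    try:
        config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing or malformed config is treated as "not set" (an assumption).
        config = {}
    mode = config.get("defaultMode") if isinstance(config, dict) else None
    return mode or "full"
```

Passing `env` and `config_path` as parameters keeps the function testable without touching the real environment or home directory.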
56
+
57
+ ## More
58
+
59
+ Full docs: https://github.com/JuliusBrussee/caveman
.agents/skills/caveman-review/SKILL.md ADDED
@@ -0,0 +1,55 @@
1
+ ---
2
+ name: caveman-review
3
+ description: >
4
+ Ultra-compressed code review comments. Cuts noise from PR feedback while preserving
5
+ the actionable signal. Each comment is one line: location, problem, fix. Use when user
6
+ says "review this PR", "code review", "review the diff", "/review", or invokes
7
+ /caveman-review. Auto-triggers when reviewing pull requests.
8
+ ---
9
+
10
+ Write code review comments terse and actionable. One line per finding. Location, problem, fix. No throat-clearing.
11
+
12
+ ## Rules
13
+
14
+ **Format:** `L<line>: <problem>. <fix>.` — or `<file>:L<line>: ...` when reviewing multi-file diffs.
15
+
16
+ **Severity prefix (optional, when mixed):**
17
+ - `🔴 bug:` — broken behavior, will cause incident
18
+ - `🟡 risk:` — works but fragile (race, missing null check, swallowed error)
19
+ - `🔵 nit:` — style, naming, micro-optim. Author can ignore
20
+ - `❓ q:` — genuine question, not a suggestion
21
+
22
+ **Drop:**
23
+ - "I noticed that...", "It seems like...", "You might want to consider..."
24
+ - "This is just a suggestion but..." — use `nit:` instead
25
+ - "Great work!", "Looks good overall but..." — say it once at the top, not per comment
26
+ - Restating what the line does — the reviewer can read the diff
27
+ - Hedging ("perhaps", "maybe", "I think") — if unsure use `q:`
28
+
29
+ **Keep:**
30
+ - Exact line numbers
31
+ - Exact symbol/function/variable names in backticks
32
+ - Concrete fix, not "consider refactoring this"
33
+ - The *why* if the fix isn't obvious from the problem statement
34
+
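The comment format can be made concrete with a small formatter. This is a sketch: `format_comment` and `SEVERITY_PREFIX` are hypothetical names, and ending both the problem and the fix with exactly one period is an assumption drawn from the format string above.

```python
from typing import Optional

# Severity prefixes from the Rules section.
SEVERITY_PREFIX = {
    "bug": "🔴 bug:",
    "risk": "🟡 risk:",
    "nit": "🔵 nit:",
    "q": "❓ q:",
}


def format_comment(line: str, problem: str, fix: str,
                   severity: Optional[str] = None,
                   file: Optional[str] = None) -> str:
    """Build a one-line comment: [<file>:]L<line>: [<severity>] <problem>. <fix>."""
    location = f"L{line}" if file is None else f"{file}:L{line}"
    parts = [f"{location}:"]
    if severity is not None:
        parts.append(SEVERITY_PREFIX[severity])
    # Normalize so each sentence ends with exactly one period.
    parts.append(problem.rstrip(".") + ".")
    parts.append(fix.rstrip(".") + ".")
    return " ".join(parts)
```

`line` is a string so that ranges such as `"88-140"` work as well as single line numbers.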
35
+ ## Examples
36
+
37
+ ❌ "I noticed that on line 42 you're not checking if the user object is null before accessing the email property. This could potentially cause a crash if the user is not found in the database. You might want to add a null check here."
38
+
39
+ ✅ `L42: 🔴 bug: user can be null after .find(). Add guard before .email.`
40
+
41
+ ❌ "It looks like this function is doing a lot of things and might benefit from being broken up into smaller functions for readability."
42
+
43
+ ✅ `L88-140: 🔵 nit: 50-line fn does 4 things. Extract validate/normalize/persist.`
44
+
45
+ ❌ "Have you considered what happens if the API returns a 429? I think we should probably handle that case."
46
+
47
+ ✅ `L23: 🟡 risk: no retry on 429. Wrap in withBackoff(3).`
48
+
49
+ ## Auto-Clarity
50
+
51
+ Drop terse mode for: security findings (CVE-class bugs need full explanation + reference), architectural disagreements (need rationale, not just a one-liner), and onboarding contexts where the author is new and needs the "why". In those cases write a normal paragraph, then resume terse for the rest.
52
+
53
+ ## Boundaries
54
+
55
+ Reviews only — does not write the code fix, does not approve/request-changes, does not run linters. Output the comment(s) ready to paste into the PR. "stop caveman-review" or "normal mode": revert to verbose review style.
.agents/skills/caveman/SKILL.md ADDED
@@ -0,0 +1,67 @@
1
+ ---
2
+ name: caveman
3
+ description: >
4
+ Ultra-compressed communication mode. Cuts token usage ~75% by speaking like caveman
5
+ while keeping full technical accuracy. Supports intensity levels: lite, full (default), ultra,
6
+ wenyan-lite, wenyan-full, wenyan-ultra.
7
+ Use when user says "caveman mode", "talk like caveman", "use caveman", "less tokens",
8
+ "be brief", or invokes /caveman. Also auto-triggers when token efficiency is requested.
9
+ ---
10
+
11
+ Respond terse like smart caveman. All technical substance stay. Only fluff die.
12
+
13
+ ## Persistence
14
+
15
+ ACTIVE EVERY RESPONSE. No revert after many turns. No filler drift. Still active if unsure. Off only: "stop caveman" / "normal mode".
16
+
17
+ Default: **full**. Switch: `/caveman lite|full|ultra`.
18
+
19
+ ## Rules
20
+
21
+ Drop: articles (a/an/the), filler (just/really/basically/actually/simply), pleasantries (sure/certainly/of course/happy to), hedging. Fragments OK. Short synonyms (big not extensive, fix not "implement a solution for"). Technical terms exact. Code blocks unchanged. Errors quoted exact.
22
+
23
+ Pattern: `[thing] [action] [reason]. [next step].`
24
+
25
+ Not: "Sure! I'd be happy to help you with that. The issue you're experiencing is likely caused by..."
26
+ Yes: "Bug in auth middleware. Token expiry check use `<` not `<=`. Fix:"
27
+
28
+ ## Intensity
29
+
30
+ | Level | What change |
31
+ |-------|------------|
32
+ | **lite** | No filler/hedging. Keep articles + full sentences. Professional but tight |
33
+ | **full** | Drop articles, fragments OK, short synonyms. Classic caveman |
34
+ | **ultra** | Abbreviate (DB/auth/config/req/res/fn/impl), strip conjunctions, arrows for causality (X → Y), one word when one word enough |
35
+ | **wenyan-lite** | Semi-classical. Drop filler/hedging but keep grammar structure, classical register |
36
+ | **wenyan-full** | Maximum classical terseness. Fully 文言文. 80-90% character reduction. Classical sentence patterns, verbs precede objects, subjects often omitted, classical particles (之/乃/為/其) |
37
+ | **wenyan-ultra** | Extreme abbreviation while keeping classical Chinese feel. Maximum compression, ultra terse |
38
+
39
+ Example — "Why React component re-render?"
40
+ - lite: "Your component re-renders because you create a new object reference each render. Wrap it in `useMemo`."
41
+ - full: "New object ref each render. Inline object prop = new ref = re-render. Wrap in `useMemo`."
42
+ - ultra: "Inline obj prop → new ref → re-render. `useMemo`."
43
+ - wenyan-lite: "組件頻重繪,以每繪新生對象參照故。以 useMemo 包之。"
44
+ - wenyan-full: "物出新參照,致重繪。useMemo Wrap之。"
45
+ - wenyan-ultra: "新參照→重繪。useMemo Wrap。"
46
+
47
+ Example — "Explain database connection pooling."
48
+ - lite: "Connection pooling reuses open connections instead of creating new ones per request. Avoids repeated handshake overhead."
49
+ - full: "Pool reuse open DB connections. No new connection per request. Skip handshake overhead."
50
+ - ultra: "Pool = reuse DB conn. Skip handshake → fast under load."
51
+ - wenyan-full: "池reuse open connection。不每req新開。skip handshake overhead。"
52
+ - wenyan-ultra: "池reuse conn。skip handshake → fast。"
53
+
54
+ ## Auto-Clarity
55
+
56
+ Drop caveman for: security warnings, irreversible action confirmations, multi-step sequences where fragment order risks misread, user asks to clarify or repeats question. Resume caveman after clear part done.
57
+
58
+ Example — destructive op:
59
+ > **Warning:** This will permanently delete the `users` table, including all of its rows, and cannot be undone.
60
+ > ```sql
61
+ > DROP TABLE users;
62
+ > ```
63
+ > Caveman resume. Verify backup exist first.
64
+
65
+ ## Boundaries
66
+
67
+ Code/commits/PRs: write normal. "stop caveman" or "normal mode": revert. Level persist until changed or session end.
.agents/skills/deep-research/SKILL.md ADDED
@@ -0,0 +1,155 @@
1
+ ---
2
+ name: deep-research
3
+ description: Multi-source deep research using firecrawl and exa MCPs. Searches the web, synthesizes findings, and delivers cited reports with source attribution. Use when the user wants thorough research on any topic with evidence and citations.
4
+ origin: ECC
5
+ ---
6
+
7
+ # Deep Research
8
+
9
+ Produce thorough, cited research reports from multiple web sources using firecrawl and exa MCP tools.
10
+
11
+ ## When to Activate
12
+
13
+ - User asks to research any topic in depth
14
+ - Competitive analysis, technology evaluation, or market sizing
15
+ - Due diligence on companies, investors, or technologies
16
+ - Any question requiring synthesis from multiple sources
17
+ - User says "research", "deep dive", "investigate", or "what's the current state of"
18
+
19
+ ## MCP Requirements
20
+
21
+ At least one of:
22
+ - **firecrawl** — `firecrawl_search`, `firecrawl_scrape`, `firecrawl_crawl`
23
+ - **exa** — `web_search_exa`, `web_search_advanced_exa`, `crawling_exa`
24
+
25
+ Both together give the best coverage. Configure in `~/.claude.json` or `~/.codex/config.toml`.
26
+
27
+ ## Workflow
28
+
29
+ ### Step 1: Understand the Goal
30
+
31
+ Ask 1-2 quick clarifying questions:
32
+ - "What's your goal — learning, making a decision, or writing something?"
33
+ - "Any specific angle or depth you want?"
34
+
35
+ If the user says "just research it" — skip ahead with reasonable defaults.
36
+
37
+ ### Step 2: Plan the Research
38
+
39
+ Break the topic into 3-5 research sub-questions. Example:
40
+ - Topic: "Impact of AI on healthcare"
41
+ - What are the main AI applications in healthcare today?
42
+ - What clinical outcomes have been measured?
43
+ - What are the regulatory challenges?
44
+ - What companies are leading this space?
45
+ - What's the market size and growth trajectory?
46
+
47
+ ### Step 3: Execute Multi-Source Search
48
+
49
+ For EACH sub-question, search using available MCP tools:
50
+
51
+ **With firecrawl:**
52
+ ```
53
+ firecrawl_search(query: "<sub-question keywords>", limit: 8)
54
+ ```
55
+
56
+ **With exa:**
57
+ ```
58
+ web_search_exa(query: "<sub-question keywords>", numResults: 8)
59
+ web_search_advanced_exa(query: "<keywords>", numResults: 5, startPublishedDate: "2025-01-01")
60
+ ```
61
+
62
+ **Search strategy:**
63
+ - Use 2-3 different keyword variations per sub-question
64
+ - Mix general and news-focused queries
65
+ - Aim for 15-30 unique sources total
66
+ - Prioritize: academic, official, reputable news > blogs > forums
67
+
68
+ ### Step 4: Deep-Read Key Sources
69
+
70
+ For the most promising URLs, fetch full content:
71
+
72
+ **With firecrawl:**
73
+ ```
74
+ firecrawl_scrape(url: "<url>")
75
+ ```
76
+
77
+ **With exa:**
78
+ ```
79
+ crawling_exa(url: "<url>", tokensNum: 5000)
80
+ ```
81
+
82
+ Read 3-5 key sources in full for depth. Do not rely only on search snippets.
83
+
84
+ ### Step 5: Synthesize and Write Report
85
+
86
+ Structure the report:
87
+
88
+ ```markdown
89
+ # [Topic]: Research Report
90
+ *Generated: [date] | Sources: [N] | Confidence: [High/Medium/Low]*
91
+
92
+ ## Executive Summary
93
+ [3-5 sentence overview of key findings]
94
+
95
+ ## 1. [First Major Theme]
96
+ [Findings with inline citations]
97
+ - Key point ([Source Name](url))
98
+ - Supporting data ([Source Name](url))
99
+
100
+ ## 2. [Second Major Theme]
101
+ ...
102
+
103
+ ## 3. [Third Major Theme]
104
+ ...
105
+
106
+ ## Key Takeaways
107
+ - [Actionable insight 1]
108
+ - [Actionable insight 2]
109
+ - [Actionable insight 3]
110
+
111
+ ## Sources
112
+ 1. [Title](url) — [one-line summary]
113
+ 2. ...
114
+
115
+ ## Methodology
116
+ Searched [N] queries across web and news. Analyzed [M] sources.
117
+ Sub-questions investigated: [list]
118
+ ```
119
+
120
+ ### Step 6: Deliver
121
+
122
+ - **Short topics**: Post the full report in chat
123
+ - **Long reports**: Post the executive summary + key takeaways, save full report to a file
124
+
125
+ ## Parallel Research with Subagents
126
+
127
+ For broad topics, use Claude Code's Task tool to parallelize:
128
+
129
+ ```
130
+ Launch 3 research agents in parallel:
131
+ 1. Agent 1: Research sub-questions 1-2
132
+ 2. Agent 2: Research sub-questions 3-4
133
+ 3. Agent 3: Research sub-question 5 + cross-cutting themes
134
+ ```
135
+
136
+ Each agent searches, reads sources, and returns findings. The main session synthesizes into the final report.
137
+
138
+ ## Quality Rules
139
+
140
+ 1. **Every claim needs a source.** No unsourced assertions.
141
+ 2. **Cross-reference.** If only one source says it, flag it as unverified.
142
+ 3. **Recency matters.** Prefer sources from the last 12 months.
143
+ 4. **Acknowledge gaps.** If you couldn't find good info on a sub-question, say so.
144
+ 5. **No hallucination.** If you don't know, say "insufficient data found."
145
+ 6. **Separate fact from inference.** Label estimates, projections, and opinions clearly.
146
+
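The cross-reference rule can be applied mechanically once findings are collected. This is a sketch under assumptions: `flag_unverified` is a hypothetical helper, the `(claim, source_url)` pair shape is invented for illustration and is not what the MCP tools return, and counting distinct hosts rather than distinct URLs is a design choice.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def flag_unverified(findings):
    """Mark each claim "verified" only if two or more distinct sites support it.

    `findings` is a list of (claim, source_url) pairs; this shape is assumed
    for illustration.
    """
    sites_by_claim = defaultdict(set)
    for claim, url in findings:
        # Count distinct hosts so two pages from one site are not double-counted.
        sites_by_claim[claim].add(urlsplit(url).netloc.lower())

    return {
        claim: ("verified" if len(sites) >= 2 else "unverified")
        for claim, sites in sites_by_claim.items()
    }
```

Claims marked `"unverified"` would then be labeled as such in the report rather than dropped.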
147
+ ## Examples
148
+
149
+ ```
150
+ "Research the current state of nuclear fusion energy"
151
+ "Deep dive into Rust vs Go for backend services in 2026"
152
+ "Research the best strategies for bootstrapping a SaaS business"
153
+ "What's happening with the US housing market right now?"
154
+ "Investigate the competitive landscape for AI code editors"
155
+ ```
.agents/skills/dispatching-parallel-agents/SKILL.md ADDED
@@ -0,0 +1,182 @@
1
+ ---
2
+ name: dispatching-parallel-agents
3
+ description: Use when facing 2+ independent tasks that can be worked on without shared state or sequential dependencies
4
+ ---
5
+
6
+ # Dispatching Parallel Agents
7
+
8
+ ## Overview
9
+
10
+ You delegate tasks to specialized agents with isolated context. By precisely crafting their instructions and context, you ensure they stay focused and succeed at their task. They should never inherit your session's context or history — you construct exactly what they need. This also preserves your own context for coordination work.
11
+
12
+ When you have multiple unrelated failures (different test files, different subsystems, different bugs), investigating them sequentially wastes time. Each investigation is independent and can happen in parallel.
13
+
14
+ **Core principle:** Dispatch one agent per independent problem domain. Let them work concurrently.
15
+
16
+ ## When to Use
17
+
18
+ ```dot
19
+ digraph when_to_use {
20
+ "Multiple failures?" [shape=diamond];
21
+ "Are they independent?" [shape=diamond];
22
+ "Single agent investigates all" [shape=box];
23
+ "One agent per problem domain" [shape=box];
24
+ "Can they work in parallel?" [shape=diamond];
25
+ "Sequential agents" [shape=box];
26
+ "Parallel dispatch" [shape=box];
27
+
28
+ "Multiple failures?" -> "Are they independent?" [label="yes"];
29
+ "Are they independent?" -> "Single agent investigates all" [label="no - related"];
30
+ "Are they independent?" -> "Can they work in parallel?" [label="yes"];
31
+ "Can they work in parallel?" -> "Parallel dispatch" [label="yes"];
32
+ "Can they work in parallel?" -> "Sequential agents" [label="no - shared state"];
33
+ }
34
+ ```
35
+
36
+ **Use when:**
37
+ - 3+ test files failing with different root causes
38
+ - Multiple subsystems broken independently
39
+ - Each problem can be understood without context from others
40
+ - No shared state between investigations
41
+
42
+ **Don't use when:**
43
+ - Failures are related (fixing one might fix others)
44
+ - Need to understand full system state
45
+ - Agents would interfere with each other
46
+
47
+ ## The Pattern
48
+
49
+ ### 1. Identify Independent Domains
50
+
51
+ Group failures by what's broken:
52
+ - File A tests: Tool approval flow
53
+ - File B tests: Batch completion behavior
54
+ - File C tests: Abort functionality
55
+
56
+ Each domain is independent - fixing tool approval doesn't affect abort tests.
57
+
58
+ ### 2. Create Focused Agent Tasks
59
+
60
+ Each agent gets:
61
+ - **Specific scope:** One test file or subsystem
62
+ - **Clear goal:** Make these tests pass
63
+ - **Constraints:** Don't change other code
64
+ - **Expected output:** Summary of what you found and fixed
65
+
66
+ ### 3. Dispatch in Parallel
67
+
68
+ ```typescript
69
+ // In Claude Code / AI environment
70
+ Task("Fix agent-tool-abort.test.ts failures")
71
+ Task("Fix batch-completion-behavior.test.ts failures")
72
+ Task("Fix tool-approval-race-conditions.test.ts failures")
73
+ // All three run concurrently
74
+ ```
75
+
76
+ ### 4. Review and Integrate
77
+
78
+ When agents return:
79
+ - Read each summary
80
+ - Verify fixes don't conflict
81
+ - Run full test suite
82
+ - Integrate all changes
83
+
84
+ ## Agent Prompt Structure
85
+
86
+ Good agent prompts are:
87
+ 1. **Focused** - One clear problem domain
88
+ 2. **Self-contained** - All context needed to understand the problem
89
+ 3. **Specific about output** - What should the agent return?
90
+
91
+ ```markdown
92
+ Fix the 3 failing tests in src/agents/agent-tool-abort.test.ts:
93
+
94
+ 1. "should abort tool with partial output capture" - expects 'interrupted at' in message
95
+ 2. "should handle mixed completed and aborted tools" - fast tool aborted instead of completed
96
+ 3. "should properly track pendingToolCount" - expects 3 results but gets 0
97
+
98
+ These are timing/race condition issues. Your task:
99
+
100
+ 1. Read the test file and understand what each test verifies
101
+ 2. Identify root cause - timing issues or actual bugs?
102
+ 3. Fix by:
103
+ - Replacing arbitrary timeouts with event-based waiting
104
+ - Fixing bugs in abort implementation if found
105
+ - Adjusting test expectations if testing changed behavior
106
+
107
+ Do NOT just increase timeouts - find the real issue.
108
+
109
+ Return: Summary of what you found and what you fixed.
110
+ ```
111
+
112
+ ## Common Mistakes
113
+
114
+ **❌ Too broad:** "Fix all the tests" - agent gets lost
115
+ **✅ Specific:** "Fix agent-tool-abort.test.ts" - focused scope
116
+
117
+ **❌ No context:** "Fix the race condition" - agent doesn't know where
118
+ **✅ Context:** Paste the error messages and test names
119
+
120
+ **❌ No constraints:** Agent might refactor everything
121
+ **✅ Constraints:** "Do NOT change production code" or "Fix tests only"
122
+
123
+ **❌ Vague output:** "Fix it" - you don't know what changed
124
+ **✅ Specific:** "Return summary of root cause and changes"
125
+
126
+ ## When NOT to Use
127
+
128
+ **Related failures:** Fixing one might fix others - investigate together first
129
+ **Need full context:** Understanding requires seeing entire system
130
+ **Exploratory debugging:** You don't know what's broken yet
131
+ **Shared state:** Agents would interfere (editing same files, using same resources)
132
+
133
+ ## Real Example from Session
134
+
135
+ **Scenario:** 6 test failures across 3 files after major refactoring
136
+
137
+ **Failures:**
138
+ - agent-tool-abort.test.ts: 3 failures (timing issues)
139
+ - batch-completion-behavior.test.ts: 2 failures (tools not executing)
140
+ - tool-approval-race-conditions.test.ts: 1 failure (execution count = 0)
141
+
142
+ **Decision:** Independent domains - abort logic separate from batch completion separate from race conditions
143
+
144
+ **Dispatch:**
145
+ ```
146
+ Agent 1 → Fix agent-tool-abort.test.ts
147
+ Agent 2 → Fix batch-completion-behavior.test.ts
148
+ Agent 3 → Fix tool-approval-race-conditions.test.ts
149
+ ```
150
+
151
+ **Results:**
152
+ - Agent 1: Replaced timeouts with event-based waiting
153
+ - Agent 2: Fixed event structure bug (threadId in wrong place)
154
+ - Agent 3: Added wait for async tool execution to complete
155
+
156
+ **Integration:** All fixes independent, no conflicts, full suite green
157
+
158
+ **Time saved:** 3 problems solved in parallel vs sequentially
159
+
160
+ ## Key Benefits
161
+
162
+ 1. **Parallelization** - Multiple investigations happen simultaneously
163
+ 2. **Focus** - Each agent has narrow scope, less context to track
164
+ 3. **Independence** - Agents don't interfere with each other
165
+ 4. **Speed** - 3 problems solved in the time of 1
166
+
167
+ ## Verification
168
+
169
+ After agents return:
170
+ 1. **Review each summary** - Understand what changed
171
+ 2. **Check for conflicts** - Did agents edit same code?
172
+ 3. **Run full suite** - Verify all fixes work together
173
+ 4. **Spot check** - Agents can make systematic errors
174
+
175
+ ## Real-World Impact
176
+
177
+ From debugging session (2025-10-03):
178
+ - 6 failures across 3 files
179
+ - 3 agents dispatched in parallel
180
+ - All investigations completed concurrently
181
+ - All fixes integrated successfully
182
+ - Zero conflicts between agent changes
.agents/skills/documentation-lookup/SKILL.md ADDED
@@ -0,0 +1,90 @@
1
+ ---
2
+ name: documentation-lookup
3
+ description: Use up-to-date library and framework docs via Context7 MCP instead of training data. Activates for setup questions, API references, code examples, or when the user names a framework (e.g. React, Next.js, Prisma).
4
+ origin: ECC
5
+ ---
6
+
7
+ # Documentation Lookup (Context7)
8
+
9
+ When the user asks about libraries, frameworks, or APIs, fetch current documentation via the Context7 MCP (tools `resolve-library-id` and `query-docs`) instead of relying on training data.
10
+
11
+ ## Core Concepts
12
+
13
+ - **Context7**: MCP server that exposes live documentation; use it instead of training data for libraries and APIs.
14
+ - **resolve-library-id**: Returns Context7-compatible library IDs (e.g. `/vercel/next.js`) from a library name and query.
15
+ - **query-docs**: Fetches documentation and code snippets for a given library ID and question. Always call resolve-library-id first to get a valid library ID.
16
+
17
+ ## When to use
18
+
19
+ Activate when the user:
20
+
21
+ - Asks setup or configuration questions (e.g. "How do I configure Next.js middleware?")
22
+ - Requests code that depends on a library ("Write a Prisma query for...")
23
+ - Needs API or reference information ("What are the Supabase auth methods?")
24
+ - Mentions specific frameworks or libraries (React, Vue, Svelte, Express, Tailwind, Prisma, Supabase, etc.)
25
+
26
+ Use this skill whenever the request depends on accurate, up-to-date behavior of a library, framework, or API. Applies across harnesses that have the Context7 MCP configured (e.g. Claude Code, Cursor, Codex).
27
+
28
+ ## How it works
29
+
30
+ ### Step 1: Resolve the Library ID
31
+
32
+ Call the **resolve-library-id** MCP tool with:
33
+
34
+ - **libraryName**: The library or product name taken from the user's question (e.g. `Next.js`, `Prisma`, `Supabase`).
35
+ - **query**: The user's full question. This improves relevance ranking of results.
36
+
37
+ You must obtain a Context7-compatible library ID (format `/org/project` or `/org/project/version`) before querying docs. Do not call query-docs without a valid library ID from this step.
38
+
39
+ ### Step 2: Select the Best Match
40
+
41
+ From the resolution results, choose one result using:
42
+
43
+ - **Name match**: Prefer exact or closest match to what the user asked for.
44
+ - **Benchmark score**: Higher scores indicate better documentation quality (100 is highest).
45
+ - **Source reputation**: Prefer High or Medium reputation when available.
46
+ - **Version**: If the user specified a version (e.g. "React 19", "Next.js 15"), prefer a version-specific library ID if listed (e.g. `/org/project/v1.2.0`).
47
+
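The selection criteria can be expressed as a sort key. This is a sketch under loud assumptions: the keys `id`, `name`, `benchmark_score`, and `reputation` are invented field names, not the documented output of `resolve-library-id`, and the priority order (name match, then reputation, then score) is one possible reading of the list above. Version matching is left out.

```python
REPUTATION_RANK = {"High": 2, "Medium": 1}


def pick_library(results, wanted_name):
    """Pick one result dict, or None when `results` is empty.

    The dict keys used here are assumed for illustration; check the real
    tool output before relying on them.
    """
    def score(result):
        exact = result.get("name", "").lower() == wanted_name.lower()
        reputation = REPUTATION_RANK.get(result.get("reputation"), 0)
        # Tuples compare left to right: name match, then reputation, then score.
        return (exact, reputation, result.get("benchmark_score", 0))

    return max(results, key=score, default=None)
```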
48
+ ### Step 3: Fetch the Documentation
49
+
50
+ Call the **query-docs** MCP tool with:
51
+
52
+ - **libraryId**: The selected Context7 library ID from Step 2 (e.g. `/vercel/next.js`).
53
+ - **query**: The user's specific question or task. Be specific to get relevant snippets.
54
+
55
+ Limit: do not call query-docs (or resolve-library-id) more than 3 times per question. If the answer is unclear after 3 calls, state the uncertainty and use the best information you have rather than guessing.
56
+
57
+ ### Step 4: Use the Documentation
58
+
59
+ - Answer the user's question using the fetched, current information.
60
+ - Include relevant code examples from the docs when helpful.
61
+ - Cite the library or version when it matters (e.g. "In Next.js 15...").
62
+
63
+ ## Examples
64
+
65
+ ### Example: Next.js middleware
66
+
67
+ 1. Call **resolve-library-id** with `libraryName: "Next.js"`, `query: "How do I set up Next.js middleware?"`.
68
+ 2. From results, pick the best match (e.g. `/vercel/next.js`) by name and benchmark score.
69
+ 3. Call **query-docs** with `libraryId: "/vercel/next.js"`, `query: "How do I set up Next.js middleware?"`.
70
+ 4. Use the returned snippets and text to answer; include a minimal `middleware.ts` example from the docs if relevant.
71
+
72
+ ### Example: Prisma query
73
+
74
+ 1. Call **resolve-library-id** with `libraryName: "Prisma"`, `query: "How do I query with relations?"`.
75
+ 2. Select the official Prisma library ID (e.g. `/prisma/prisma`).
76
+ 3. Call **query-docs** with that `libraryId` and the query.
77
+ 4. Return the Prisma Client pattern (e.g. `include` or `select`) with a short code snippet from the docs.
78
+
79
+ ### Example: Supabase auth methods
80
+
81
+ 1. Call **resolve-library-id** with `libraryName: "Supabase"`, `query: "What are the auth methods?"`.
82
+ 2. Pick the Supabase docs library ID.
83
+ 3. Call **query-docs**; summarize the auth methods and show minimal examples from the fetched docs.
84
+
85
+ ## Best Practices
86
+
87
+ - **Be specific**: Use the user's full question as the query where possible for better relevance.
88
+ - **Version awareness**: When users mention versions, use version-specific library IDs from the resolve step when available.
89
+ - **Prefer official sources**: When multiple matches exist, prefer official or primary packages over community forks.
90
+ - **No sensitive data**: Redact API keys, passwords, tokens, and other secrets from any query sent to Context7. Treat the user's question as potentially containing secrets before passing it to resolve-library-id or query-docs.
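A redaction pass before calling either tool might look like the sketch below. The patterns are illustrative assumptions and far from exhaustive; `redact` and `SECRET_PATTERNS` are hypothetical names, and a real deployment would need patterns matched to the secrets it actually handles.

```python
import re

# Illustrative patterns only; they will miss many real secret formats.
SECRET_PATTERNS = [
    # key=value or key: value pairs with a sensitive-looking key name
    re.compile(r"\b(api[_-]?key|token|password|secret)\b(\s*[:=]\s*)\S+",
               re.IGNORECASE),
    # HTTP bearer credentials
    re.compile(r"\b(bearer)(\s+)\S+", re.IGNORECASE),
]


def redact(query: str) -> str:
    """Replace the value part of anything matching a secret pattern."""
    for pattern in SECRET_PATTERNS:
        # Keep the key and separator, drop the value.
        query = pattern.sub(r"\1\2[REDACTED]", query)
    return query
```

Keeping the key name in place preserves the meaning of the question while removing the value itself.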
.agents/skills/e2e-testing/SKILL.md ADDED
@@ -0,0 +1,326 @@
1
+ ---
2
+ name: e2e-testing
3
+ description: Playwright E2E testing patterns, Page Object Model, configuration, CI/CD integration, artifact management, and flaky test strategies.
4
+ origin: ECC
5
+ ---
6
+
7
+ # E2E Testing Patterns
8
+
9
+ Comprehensive Playwright patterns for building stable, fast, and maintainable E2E test suites.
10
+
11
+ ## Test File Organization
12
+
13
+ ```
14
+ tests/
15
+ ├── e2e/
16
+ │ ├── auth/
17
+ │ │ ├── login.spec.ts
18
+ │ │ ├── logout.spec.ts
19
+ │ │ └── register.spec.ts
20
+ │ ├── features/
21
+ │ │ ├── browse.spec.ts
22
+ │ │ ├── search.spec.ts
23
+ │ │ └── create.spec.ts
24
+ │ └── api/
25
+ │ └── endpoints.spec.ts
26
+ ├── fixtures/
27
+ │ ├── auth.ts
28
+ │ └── data.ts
29
+ └── playwright.config.ts
30
+ ```
31
+
32
+ ## Page Object Model (POM)
33
+
34
+ ```typescript
35
+ import { Page, Locator } from '@playwright/test'
36
+
37
+ export class ItemsPage {
38
+ readonly page: Page
39
+ readonly searchInput: Locator
40
+ readonly itemCards: Locator
41
+ readonly createButton: Locator
42
+
43
+ constructor(page: Page) {
44
+ this.page = page
45
+ this.searchInput = page.locator('[data-testid="search-input"]')
46
+ this.itemCards = page.locator('[data-testid="item-card"]')
47
+ this.createButton = page.locator('[data-testid="create-btn"]')
48
+ }
49
+
50
+ async goto() {
51
+ await this.page.goto('/items')
52
+ await this.page.waitForLoadState('networkidle')
53
+ }
54
+
55
+ async search(query: string) {
56
+ const searchResponse = this.page.waitForResponse(resp => resp.url().includes('/api/search')) // register the wait before triggering the request
57
+ await this.searchInput.fill(query)
58
+ await searchResponse
59
+ }
60
+
61
+ async getItemCount() {
62
+ return await this.itemCards.count()
63
+ }
64
+ }
65
+ ```
66
+
67
+ ## Test Structure
68
+
69
+ ```typescript
70
+ import { test, expect } from '@playwright/test'
71
+ import { ItemsPage } from '../../pages/ItemsPage'
72
+
73
+ test.describe('Item Search', () => {
74
+ let itemsPage: ItemsPage
75
+
76
+ test.beforeEach(async ({ page }) => {
77
+ itemsPage = new ItemsPage(page)
78
+ await itemsPage.goto()
79
+ })
80
+
81
+ test('should search by keyword', async ({ page }) => {
82
+ await itemsPage.search('test')
83
+
84
+ const count = await itemsPage.getItemCount()
85
+ expect(count).toBeGreaterThan(0)
86
+
87
+ await expect(itemsPage.itemCards.first()).toContainText(/test/i)
88
+ await page.screenshot({ path: 'artifacts/search-results.png' })
89
+ })
90
+
91
+ test('should handle no results', async ({ page }) => {
92
+ await itemsPage.search('xyznonexistent123')
93
+
94
+ await expect(page.locator('[data-testid="no-results"]')).toBeVisible()
95
+ expect(await itemsPage.getItemCount()).toBe(0)
96
+ })
97
+ })
98
+ ```
99
+
100
+ ## Playwright Configuration
101
+
102
+ ```typescript
103
+ import { defineConfig, devices } from '@playwright/test'
104
+
105
+ export default defineConfig({
106
+ testDir: './tests/e2e',
107
+ fullyParallel: true,
108
+ forbidOnly: !!process.env.CI,
109
+ retries: process.env.CI ? 2 : 0,
110
+ workers: process.env.CI ? 1 : undefined,
111
+ reporter: [
112
+ ['html', { outputFolder: 'playwright-report' }],
113
+ ['junit', { outputFile: 'playwright-results.xml' }],
114
+ ['json', { outputFile: 'playwright-results.json' }]
115
+ ],
116
+ use: {
117
+ baseURL: process.env.BASE_URL || 'http://localhost:3000',
118
+ trace: 'on-first-retry',
119
+ screenshot: 'only-on-failure',
120
+ video: 'retain-on-failure',
121
+ actionTimeout: 10000,
122
+ navigationTimeout: 30000,
123
+ },
124
+ projects: [
125
+ { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
126
+ { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
127
+ { name: 'webkit', use: { ...devices['Desktop Safari'] } },
128
+ { name: 'mobile-chrome', use: { ...devices['Pixel 5'] } },
129
+ ],
130
+ webServer: {
131
+ command: 'npm run dev',
132
+ url: 'http://localhost:3000',
133
+ reuseExistingServer: !process.env.CI,
134
+ timeout: 120000,
135
+ },
136
+ })
137
+ ```
138
+
139
+ ## Flaky Test Patterns
140
+
141
+ ### Quarantine
142
+
143
+ ```typescript
144
+ test('flaky: complex search', async ({ page }) => {
145
+ test.fixme(true, 'Flaky - Issue #123')
146
+ // test code...
147
+ })
148
+
149
+ test('conditional skip', async ({ page }) => {
150
+ test.skip(!!process.env.CI, 'Flaky in CI - Issue #123')
151
+ // test code...
152
+ })
153
+ ```
154
+
155
+ ### Identify Flakiness
156
+
157
+ ```bash
158
+ npx playwright test tests/e2e/features/search.spec.ts --repeat-each=10
159
+ npx playwright test tests/e2e/features/search.spec.ts --retries=3
160
+ ```
161
+
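To make the result of a `--repeat-each` run actionable, the outcomes can be classified: a test that always passes is stable, one that always fails is broken, and only a mix points to flakiness. The sketch below assumes you have already collected the per-run outcomes (for example from the JSON reporter); `classifyRuns` and its types are hypothetical names, not part of Playwright.

```typescript
type Outcome = "passed" | "failed";
type Verdict = "stable" | "flaky" | "broken";

// Classify one test from the outcomes of its repeated runs.
export function classifyRuns(outcomes: Outcome[]): Verdict {
  if (outcomes.length === 0) {
    throw new Error("need at least one run to classify a test");
  }
  const failures = outcomes.filter(outcome => outcome === "failed").length;
  if (failures === 0) return "stable";
  // All runs failing is a deterministic bug, not flakiness.
  return failures === outcomes.length ? "broken" : "flaky";
}
```

Only tests classified as flaky are candidates for the quarantine pattern above; broken tests need a fix rather than a skip.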
162
+ ### Common Causes & Fixes
163
+
164
+ **Race conditions:**
165
+ ```typescript
166
+ // Bad: legacy page.click(selector) API, discouraged in favor of locators
167
+ await page.click('[data-testid="button"]')
168
+
169
+ // Good: auto-wait locator
170
+ await page.locator('[data-testid="button"]').click()
171
+ ```
172
+
173
+ **Network timing:**
174
+ ```typescript
175
+ // Bad: arbitrary timeout
176
+ await page.waitForTimeout(5000)
177
+
178
+ // Good: wait for specific condition
179
+ await page.waitForResponse(resp => resp.url().includes('/api/data'))
180
+ ```
181
+
182
+ **Animation timing:**
183
+ ```typescript
184
+ // Bad: click during animation
185
+ await page.click('[data-testid="menu-item"]')
186
+
187
+ // Good: wait for stability
188
+ await page.locator('[data-testid="menu-item"]').waitFor({ state: 'visible' })
189
+ await page.waitForLoadState('networkidle')
190
+ await page.locator('[data-testid="menu-item"]').click()
191
+ ```
192
+
193
+ ## Artifact Management
194
+
195
+ ### Screenshots
196
+
197
+ ```typescript
198
+ await page.screenshot({ path: 'artifacts/after-login.png' })
199
+ await page.screenshot({ path: 'artifacts/full-page.png', fullPage: true })
200
+ await page.locator('[data-testid="chart"]').screenshot({ path: 'artifacts/chart.png' })
201
+ ```
202
+
203
+ ### Traces
204
+
205
+ ```typescript
206
+ await context.tracing.start({
207
+ screenshots: true,
208
+ snapshots: true,
209
+ })
210
+ // ... test actions ...
211
+ // Save the trace archive; inspect it with `npx playwright show-trace`
212
+ await context.tracing.stop({ path: 'artifacts/trace.zip' })
213
+ ```
214
+
215
+ ### Video
216
+
217
+ ```typescript
218
+ // In playwright.config.ts
219
+ use: {
220
+ video: 'retain-on-failure',
221
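+ // `videosPath` is not a supported option; videos are saved under each test's output directory (set the top-level `outputDir` to relocate them)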
222
+ }
223
+ ```
224
+
225
+ ## CI/CD Integration
226
+
227
+ ```yaml
228
+ # .github/workflows/e2e.yml
229
+ name: E2E Tests
230
+ on: [push, pull_request]
231
+
232
+ jobs:
233
+ test:
234
+ runs-on: ubuntu-latest
235
+ steps:
236
+ - uses: actions/checkout@v4
237
+ - uses: actions/setup-node@v4
238
+ with:
239
+ node-version: 20
240
+ - run: npm ci
241
+ - run: npx playwright install --with-deps
242
+ - run: npx playwright test
243
+ env:
244
+ BASE_URL: ${{ vars.STAGING_URL }}
245
+ - uses: actions/upload-artifact@v4
246
+ if: always()
247
+ with:
248
+ name: playwright-report
249
+ path: playwright-report/
250
+ retention-days: 30
251
+ ```
252
+
253
+ ## Test Report Template
254
+
255
+ ```markdown
256
+ # E2E Test Report
257
+
258
+ **Date:** YYYY-MM-DD HH:MM
259
+ **Duration:** Xm Ys
260
+ **Status:** PASSING / FAILING
261
+
262
+ ## Summary
263
+ - Total: X | Passed: Y (Z%) | Failed: A | Flaky: B | Skipped: C
264
+
265
+ ## Failed Tests
266
+
267
+ ### test-name
268
+ **File:** `tests/e2e/feature.spec.ts:45`
269
+ **Error:** Expected element to be visible
270
+ **Screenshot:** artifacts/failed.png
271
+ **Recommended Fix:** [description]
272
+
273
+ ## Artifacts
274
+ - HTML Report: playwright-report/index.html
275
+ - Screenshots: artifacts/*.png
276
+ - Videos: artifacts/videos/*.webm
277
+ - Traces: artifacts/*.zip
278
+ ```
279
+
280
+ ## Wallet / Web3 Testing
281
+
282
+ ```typescript
283
+ test('wallet connection', async ({ page, context }) => {
284
+ // Mock wallet provider
285
+ await context.addInitScript(() => {
286
+ window.ethereum = {
287
+ isMetaMask: true,
288
+ request: async ({ method }) => {
289
+ if (method === 'eth_requestAccounts')
290
+ return ['0x1234567890123456789012345678901234567890']
291
+ if (method === 'eth_chainId') return '0x1'
292
+ }
293
+ }
294
+ })
295
+
296
+ await page.goto('/')
297
+ await page.locator('[data-testid="connect-wallet"]').click()
298
+ await expect(page.locator('[data-testid="wallet-address"]')).toContainText('0x1234')
299
+ })
300
+ ```
301
+
302
+ ## Financial / Critical Flow Testing
303
+
304
+ ```typescript
305
+ test('trade execution', async ({ page }) => {
306
+ // Skip on production — real money
307
+ test.skip(process.env.NODE_ENV === 'production', 'Skip on production')
308
+
309
+ await page.goto('/markets/test-market')
310
+ await page.locator('[data-testid="position-yes"]').click()
311
+ await page.locator('[data-testid="trade-amount"]').fill('1.0')
312
+
313
+ // Verify preview
314
+ const preview = page.locator('[data-testid="trade-preview"]')
315
+ await expect(preview).toContainText('1.0')
316
+
317
+ // Confirm and wait for blockchain
318
+ const tradeResponse = page.waitForResponse(
319
+ resp => resp.url().includes('/api/trade') && resp.status() === 200,
320
+ { timeout: 30000 }
321
+ )
322
+ await Promise.all([page.locator('[data-testid="confirm-trade"]').click(), tradeResponse])
323
+
324
+ await expect(page.locator('[data-testid="trade-success"]')).toBeVisible()
325
+ })
326
+ ```
.agents/skills/eval-harness/SKILL.md ADDED
@@ -0,0 +1,270 @@
1
+ ---
2
+ name: eval-harness
3
+ description: Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles
4
+ origin: ECC
5
+ tools: Read, Write, Edit, Bash, Grep, Glob
6
+ ---
7
+
8
+ # Eval Harness Skill
9
+
10
+ A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.
11
+
12
+ ## When to Activate
13
+
14
+ - Setting up eval-driven development (EDD) for AI-assisted workflows
15
+ - Defining pass/fail criteria for Claude Code task completion
16
+ - Measuring agent reliability with pass@k metrics
17
+ - Creating regression test suites for prompt or agent changes
18
+ - Benchmarking agent performance across model versions
19
+
20
+ ## Philosophy
21
+
22
+ Eval-Driven Development treats evals as the "unit tests of AI development":
23
+ - Define expected behavior BEFORE implementation
24
+ - Run evals continuously during development
25
+ - Track regressions with each change
26
+ - Use pass@k metrics for reliability measurement
27
+
28
+ ## Eval Types
29
+
30
+ ### Capability Evals
31
+ Test if Claude can do something it couldn't before:
32
+ ```markdown
33
+ [CAPABILITY EVAL: feature-name]
34
+ Task: Description of what Claude should accomplish
35
+ Success Criteria:
36
+ - [ ] Criterion 1
37
+ - [ ] Criterion 2
38
+ - [ ] Criterion 3
39
+ Expected Output: Description of expected result
40
+ ```
41
+
42
+ ### Regression Evals
43
+ Ensure changes don't break existing functionality:
44
+ ```markdown
45
+ [REGRESSION EVAL: feature-name]
46
+ Baseline: SHA or checkpoint name
47
+ Tests:
48
+ - existing-test-1: PASS/FAIL
49
+ - existing-test-2: PASS/FAIL
50
+ - existing-test-3: PASS/FAIL
51
+ Result: X/Y passed (previously Y/Y)
52
+ ```
53
+
54
+ ## Grader Types
55
+
56
+ ### 1. Code-Based Grader
57
+ Deterministic checks using code:
58
+ ```bash
59
+ # Check if file contains expected pattern
60
+ grep -q "export function handleAuth" src/auth.ts && echo "PASS" || echo "FAIL"
61
+
62
+ # Check if tests pass
63
+ npm test -- --testPathPattern="auth" && echo "PASS" || echo "FAIL"
64
+
65
+ # Check if build succeeds
66
+ npm run build && echo "PASS" || echo "FAIL"
67
+ ```
68
+
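The same deterministic check can be expressed in code when the grader needs structured results instead of shell output. This is a minimal sketch under stated assumptions: `gradeSource` and the `Check` shape are hypothetical, and the source text is passed in as a string so the example stays self-contained (in practice it would be read from a file such as `src/auth.ts`).

```typescript
type Verdict = "PASS" | "FAIL";

interface Check {
  name: string;
  pattern: RegExp; // avoid the `g` flag here: RegExp.test is stateful with it
}

// Run every pattern check against the source text and record PASS/FAIL per check.
export function gradeSource(source: string, checks: Check[]): Record<string, Verdict> {
  const results: Record<string, Verdict> = {};
  for (const { name, pattern } of checks) {
    results[name] = pattern.test(source) ? "PASS" : "FAIL";
  }
  return results;
}
```

Because the result depends only on the input text, repeated runs always agree, which is the property that makes code graders preferable to model graders where both apply.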
69
+ ### 2. Model-Based Grader
70
+ Use Claude to evaluate open-ended outputs:
71
+ ```markdown
72
+ [MODEL GRADER PROMPT]
73
+ Evaluate the following code change:
74
+ 1. Does it solve the stated problem?
75
+ 2. Is it well-structured?
76
+ 3. Are edge cases handled?
77
+ 4. Is error handling appropriate?
78
+
79
+ Score: 1-5 (1=poor, 5=excellent)
80
+ Reasoning: [explanation]
81
+ ```
82
+
83
+ ### 3. Human Grader
84
+ Flag for manual review:
85
+ ```markdown
86
+ [HUMAN REVIEW REQUIRED]
87
+ Change: Description of what changed
88
+ Reason: Why human review is needed
89
+ Risk Level: LOW/MEDIUM/HIGH
90
+ ```
91
+
92
+ ## Metrics
93
+
94
+ ### pass@k
95
+ "At least one success in k attempts"
96
+ - pass@1: First attempt success rate
97
+ - pass@3: Success within 3 attempts
98
+ - Typical target: pass@3 > 90%
99
+
100
+ ### pass^k
101
+ "All k trials succeed"
102
+ - Higher bar for reliability
103
+ - pass^3: 3 consecutive successes
104
+ - Use for critical paths
105
+
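The two metrics differ only in the quantifier: pass@k asks whether any of the first k trials succeeded, pass^k whether all of them did. A minimal sketch following the definitions above (not the statistical pass@k estimator used in some benchmarks); the function names are hypothetical.

```typescript
// Take the first k trial outcomes, rejecting requests that cannot be answered.
function firstK(trials: boolean[], k: number): boolean[] {
  if (!Number.isInteger(k) || k < 1 || trials.length < k) {
    throw new RangeError(`need at least ${k} recorded trials`);
  }
  return trials.slice(0, k);
}

// pass@k: at least one success within the first k attempts.
export function passAtK(trials: boolean[], k: number): boolean {
  return firstK(trials, k).some(Boolean);
}

// pass^k: every one of the first k attempts succeeded.
export function passHatK(trials: boolean[], k: number): boolean {
  return firstK(trials, k).every(Boolean);
}
```

A run of `[false, true, true]` therefore satisfies pass@3 but neither pass@1 nor pass^3, which is why pass^k is the stricter bar for critical paths.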
106
+ ## Eval Workflow
107
+
108
+ ### 1. Define (Before Coding)
109
+ ```markdown
110
+ ## EVAL DEFINITION: feature-xyz
111
+
112
+ ### Capability Evals
113
+ 1. Can create new user account
114
+ 2. Can validate email format
115
+ 3. Can hash password securely
116
+
117
+ ### Regression Evals
118
+ 1. Existing login still works
119
+ 2. Session management unchanged
120
+ 3. Logout flow intact
121
+
122
+ ### Success Metrics
123
+ - pass@3 > 90% for capability evals
124
+ - pass^3 = 100% for regression evals
125
+ ```
126
+
127
+ ### 2. Implement
128
+ Write code to pass the defined evals.
129
+
130
+ ### 3. Evaluate
131
+ ```bash
132
+ # Run capability evals
133
+ [Run each capability eval, record PASS/FAIL]
134
+
135
+ # Run regression evals
136
+ npm test -- --testPathPattern="existing"
137
+
138
+ # Generate report
139
+ ```
140
+
141
+ ### 4. Report
142
+ ```markdown
143
+ EVAL REPORT: feature-xyz
144
+ ========================
145
+
146
+ Capability Evals:
147
+ create-user: PASS (pass@1)
148
+ validate-email: PASS (pass@2)
149
+ hash-password: PASS (pass@1)
150
+ Overall: 3/3 passed
151
+
152
+ Regression Evals:
153
+ login-flow: PASS
154
+ session-mgmt: PASS
155
+ logout-flow: PASS
156
+ Overall: 3/3 passed
157
+
158
+ Metrics:
159
+ pass@1: 67% (2/3)
160
+ pass@3: 100% (3/3)
161
+
162
+ Status: READY FOR REVIEW
163
+ ```
164
+
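The Metrics lines in the report follow directly from the per-eval results: an eval counts toward pass@k if its first success came on attempt k or earlier. The sketch below reproduces the numbers from the report above; the `FirstSuccess` shape and `passRateAtK` name are assumptions for illustration.

```typescript
// Attempt number of the first success per eval, or null if it never passed.
type FirstSuccess = Record<string, number | null>;

// Fraction of evals whose first success came within k attempts.
export function passRateAtK(results: FirstSuccess, k: number): number {
  const attempts = Object.values(results);
  if (attempts.length === 0) throw new Error("no evals recorded");
  const passed = attempts.filter(attempt => attempt !== null && attempt <= k).length;
  return passed / attempts.length;
}

// Capability results from the report: validate-email needed a second attempt.
export const capabilityResults: FirstSuccess = {
  "create-user": 1,
  "validate-email": 2,
  "hash-password": 1,
};
```

With these inputs, pass@1 is 2/3 (reported as 67%) and pass@3 is 3/3.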
165
+ ## Integration Patterns
166
+
167
+ ### Pre-Implementation
168
+ ```
169
+ /eval define feature-name
170
+ ```
171
+ Creates eval definition file at `.claude/evals/feature-name.md`
172
+
173
+ ### During Implementation
174
+ ```
175
+ /eval check feature-name
176
+ ```
177
+ Runs current evals and reports status
178
+
179
+ ### Post-Implementation
180
+ ```
181
+ /eval report feature-name
182
+ ```
183
+ Generates full eval report
184
+
185
+ ## Eval Storage
186
+
187
+ Store evals in project:
188
+ ```
189
+ .claude/
190
+ evals/
191
+ feature-xyz.md # Eval definition
192
+ feature-xyz.log # Eval run history
193
+ baseline.json # Regression baselines
194
+ ```
195
+
196
+ ## Best Practices
197
+
198
+ 1. **Define evals BEFORE coding** - Forces clear thinking about success criteria
199
+ 2. **Run evals frequently** - Catch regressions early
200
+ 3. **Track pass@k over time** - Monitor reliability trends
201
+ 4. **Use code graders when possible** - Deterministic > probabilistic
202
+ 5. **Human review for security** - Never fully automate security checks
203
+ 6. **Keep evals fast** - Slow evals don't get run
204
+ 7. **Version evals with code** - Evals are first-class artifacts
205
+
206
+ ## Example: Adding Authentication
207
+
208
+ ```markdown
209
+ ## EVAL: add-authentication
210
+
211
+ ### Phase 1: Define (10 min)
212
+ Capability Evals:
213
+ - [ ] User can register with email/password
214
+ - [ ] User can login with valid credentials
215
+ - [ ] Invalid credentials rejected with proper error
216
+ - [ ] Sessions persist across page reloads
217
+ - [ ] Logout clears session
218
+
219
+ Regression Evals:
220
+ - [ ] Public routes still accessible
221
+ - [ ] API responses unchanged
222
+ - [ ] Database schema compatible
223
+
224
+ ### Phase 2: Implement (varies)
225
+ [Write code]
226
+
227
+ ### Phase 3: Evaluate
228
+ Run: /eval check add-authentication
229
+
230
+ ### Phase 4: Report
231
+ EVAL REPORT: add-authentication
232
+ ==============================
233
+ Capability: 5/5 passed (pass@3: 100%)
234
+ Regression: 3/3 passed (pass^3: 100%)
235
+ Status: SHIP IT
236
+ ```
237
+
238
+ ## Product Evals (v1.8)
239
+
240
+ Use product evals when behavior quality cannot be captured by unit tests alone.
241
+
242
+ ### Grader Types
243
+
244
+ 1. Code grader (deterministic assertions)
245
+ 2. Rule grader (regex/schema constraints)
246
+ 3. Model grader (LLM-as-judge rubric)
247
+ 4. Human grader (manual adjudication for ambiguous outputs)
248
+
249
+ ### pass@k Guidance
250
+
251
+ - `pass@1`: direct reliability
252
+ - `pass@3`: practical reliability under controlled retries
253
+ - `pass^3`: stability test (all 3 runs must pass)
254
+
255
+ Recommended thresholds:
256
+ - Capability evals: pass@3 >= 0.90
257
+ - Regression evals: pass^3 = 1.00 for release-critical paths
258
+
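The recommended thresholds can be turned into a simple release gate. This is a sketch assuming the two rates have already been computed as fractions between 0 and 1; `EvalSummary` and `releaseGate` are hypothetical names.

```typescript
interface EvalSummary {
  capabilityPassAt3: number;  // fraction of capability evals passing within 3 attempts
  regressionPassHat3: number; // fraction of regression evals passing all 3 runs
}

// Apply the recommended thresholds and explain every failed condition.
export function releaseGate(summary: EvalSummary): { ok: boolean; reasons: string[] } {
  const reasons: string[] = [];
  if (summary.capabilityPassAt3 < 0.9) {
    reasons.push(`capability pass@3 ${summary.capabilityPassAt3} is below 0.90`);
  }
  if (summary.regressionPassHat3 < 1) {
    reasons.push(`regression pass^3 ${summary.regressionPassHat3} is below 1.00`);
  }
  return { ok: reasons.length === 0, reasons };
}
```

Returning the reasons alongside the verdict keeps a blocked release explainable without re-running the evals.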
259
+ ### Eval Anti-Patterns
260
+
261
+ - Overfitting prompts to known eval examples
262
+ - Measuring only happy-path outputs
263
+ - Ignoring cost and latency drift while chasing pass rates
264
+ - Allowing flaky graders in release gates
265
+
266
+ ### Minimal Eval Artifact Layout
267
+
268
+ - `.claude/evals/<feature>.md` definition
269
+ - `.claude/evals/<feature>.log` run history
270
+ - `docs/releases/<version>/eval-summary.md` release snapshot
.agents/skills/executing-plans/SKILL.md ADDED
@@ -0,0 +1,70 @@
1
+ ---
2
+ name: executing-plans
3
+ description: Use when you have a written implementation plan to execute in a separate session with review checkpoints
4
+ ---
5
+
6
+ # Executing Plans
7
+
8
+ ## Overview
9
+
10
+ Load plan, review critically, execute all tasks, report when complete.
11
+
12
+ **Announce at start:** "I'm using the executing-plans skill to implement this plan."
13
+
14
+ **Note:** Tell your human partner that Superpowers works much better with access to subagents. The quality of its work will be significantly higher if run on a platform with subagent support (such as Claude Code or Codex). If subagents are available, use superpowers:subagent-driven-development instead of this skill.
15
+
16
+ ## The Process
17
+
18
+ ### Step 1: Load and Review Plan
19
+ 1. Read plan file
20
+ 2. Review critically - identify any questions or concerns about the plan
21
+ 3. If concerns: Raise them with your human partner before starting
22
+ 4. If no concerns: Create TodoWrite and proceed
23
+
24
+ ### Step 2: Execute Tasks
25
+
26
+ For each task:
27
+ 1. Mark as in_progress
28
+ 2. Follow each step exactly (plan has bite-sized steps)
29
+ 3. Run verifications as specified
30
+ 4. Mark as completed
31
+
32
+ ### Step 3: Complete Development
33
+
34
+ After all tasks are complete and verified:
35
+ - Announce: "I'm using the finishing-a-development-branch skill to complete this work."
36
+ - **REQUIRED SUB-SKILL:** Use superpowers:finishing-a-development-branch
37
+ - Follow that skill to verify tests, present options, execute choice
38
+
39
+ ## When to Stop and Ask for Help
40
+
41
+ **STOP executing immediately when:**
42
+ - Hit a blocker (missing dependency, test fails, instruction unclear)
43
+ - Plan has critical gaps preventing starting
44
+ - You don't understand an instruction
45
+ - Verification fails repeatedly
46
+
47
+ **Ask for clarification rather than guessing.**
48
+
49
+ ## When to Revisit Earlier Steps
50
+
51
+ **Return to Review (Step 1) when:**
52
+ - Partner updates the plan based on your feedback
53
+ - Fundamental approach needs rethinking
54
+
55
+ **Don't force through blockers** - stop and ask.
56
+
57
+ ## Remember
58
+ - Review plan critically first
59
+ - Follow plan steps exactly
60
+ - Don't skip verifications
61
+ - Reference skills when plan says to
62
+ - Stop when blocked, don't guess
63
+ - Never start implementation on main/master branch without explicit user consent
64
+
65
+ ## Integration
66
+
67
+ **Required workflow skills:**
68
+ - **superpowers:using-git-worktrees** - REQUIRED: Set up isolated workspace before starting
69
+ - **superpowers:writing-plans** - Creates the plan this skill executes
70
+ - **superpowers:finishing-a-development-branch** - Complete development after all tasks
.agents/skills/finishing-a-development-branch/SKILL.md ADDED
@@ -0,0 +1,200 @@
1
+ ---
2
+ name: finishing-a-development-branch
3
+ description: Use when implementation is complete, all tests pass, and you need to decide how to integrate the work - guides completion of development work by presenting structured options for merge, PR, or cleanup
4
+ ---
5
+
6
+ # Finishing a Development Branch
7
+
8
+ ## Overview
9
+
10
+ Guide completion of development work by presenting clear options and handling chosen workflow.
11
+
12
+ **Core principle:** Verify tests → Present options → Execute choice → Clean up.
13
+
14
+ **Announce at start:** "I'm using the finishing-a-development-branch skill to complete this work."
15
+
16
+ ## The Process
17
+
18
+ ### Step 1: Verify Tests
19
+
20
+ **Before presenting options, verify tests pass:**
21
+
22
+ ```bash
23
+ # Run project's test suite
24
+ npm test / cargo test / pytest / go test ./...
25
+ ```
26
+
27
+ **If tests fail:**
28
+ ```
29
+ Tests failing (<N> failures). Must fix before completing:
30
+
31
+ [Show failures]
32
+
33
+ Cannot proceed with merge/PR until tests pass.
34
+ ```
35
+
36
+ Stop. Don't proceed to Step 2.
37
+
38
+ **If tests pass:** Continue to Step 2.
39
+
40
+ ### Step 2: Determine Base Branch
41
+
42
+ ```bash
43
+ # Try common base branches
44
+ git merge-base HEAD main 2>/dev/null || git merge-base HEAD master 2>/dev/null
45
+ ```
46
+
47
+ Or ask: "This branch split from main - is that correct?"
48
+
49
+ ### Step 3: Present Options
50
+
51
+ Present exactly these 4 options:
52
+
53
+ ```
54
+ Implementation complete. What would you like to do?
55
+
56
+ 1. Merge back to <base-branch> locally
57
+ 2. Push and create a Pull Request
58
+ 3. Keep the branch as-is (I'll handle it later)
59
+ 4. Discard this work
60
+
61
+ Which option?
62
+ ```
63
+
64
+ **Don't add explanation** - keep options concise.
65
+
66
+ ### Step 4: Execute Choice
67
+
68
+ #### Option 1: Merge Locally
69
+
70
+ ```bash
71
+ # Switch to base branch
72
+ git checkout <base-branch>
73
+
74
+ # Pull latest
75
+ git pull
76
+
77
+ # Merge feature branch
78
+ git merge <feature-branch>
79
+
80
+ # Verify tests on merged result
81
+ <test command>
82
+
83
+ # If tests pass
84
+ git branch -d <feature-branch>
85
+ ```
86
+
87
+ Then: Cleanup worktree (Step 5)
88
+
89
+ #### Option 2: Push and Create PR
90
+
91
+ ```bash
92
+ # Push branch
93
+ git push -u origin <feature-branch>
94
+
95
+ # Create PR
96
+ gh pr create --title "<title>" --body "$(cat <<'EOF'
97
+ ## Summary
98
+ <2-3 bullets of what changed>
99
+
100
+ ## Test Plan
101
+ - [ ] <verification steps>
102
+ EOF
103
+ )"
104
+ ```
105
+
106
+ Then: Cleanup worktree (Step 5)
107
+
108
+ #### Option 3: Keep As-Is
109
+
110
+ Report: "Keeping branch <name>. Worktree preserved at <path>."
111
+
112
+ **Don't cleanup worktree.**
113
+
114
+ #### Option 4: Discard
115
+
116
+ **Confirm first:**
117
+ ```
118
+ This will permanently delete:
119
+ - Branch <name>
120
+ - All commits: <commit-list>
121
+ - Worktree at <path>
122
+
123
+ Type 'discard' to confirm.
124
+ ```
125
+
126
+ Wait for exact confirmation.
127
+
128
+ If confirmed:
129
+ ```bash
130
+ git checkout <base-branch>
131
+ git branch -D <feature-branch>
132
+ ```
133
+
134
+ Then: Cleanup worktree (Step 5)
135
+
136
+ ### Step 5: Cleanup Worktree
137
+
138
+ **For Options 1, 2, 4:**
139
+
140
+ Check if in worktree:
141
+ ```bash
142
+ git worktree list | grep $(git branch --show-current)
143
+ ```
144
+
145
+ If yes:
146
+ ```bash
147
+ git worktree remove <worktree-path>
148
+ ```
149
+
150
+ **For Option 3:** Keep worktree.
151
+
152
+ ## Quick Reference
153
+
154
+ | Option | Merge | Push | Keep Worktree | Cleanup Branch |
155
+ |--------|-------|------|---------------|----------------|
156
+ | 1. Merge locally | ✓ | - | - | ✓ |
157
+ | 2. Create PR | - | ✓ | ✓ | - |
158
+ | 3. Keep as-is | - | - | ✓ | - |
159
+ | 4. Discard | - | - | - | ✓ (force) |
160
+
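The table can also be encoded as a lookup so the follow-up actions for a chosen option are never improvised. A minimal sketch; the `Plan` shape and `planFor` name are hypothetical, with `"safe"` standing for `git branch -d` and `"force"` for `git branch -D`.

```typescript
type Option = 1 | 2 | 3 | 4;

interface Plan {
  merge: boolean;
  push: boolean;
  keepWorktree: boolean;
  cleanupBranch: "no" | "safe" | "force";
}

// One entry per row of the Quick Reference table.
const PLANS: Record<Option, Plan> = {
  1: { merge: true, push: false, keepWorktree: false, cleanupBranch: "safe" },
  2: { merge: false, push: true, keepWorktree: true, cleanupBranch: "no" },
  3: { merge: false, push: false, keepWorktree: true, cleanupBranch: "no" },
  4: { merge: false, push: false, keepWorktree: false, cleanupBranch: "force" },
};

// Look up the actions for the option the user picked.
export function planFor(option: Option): Plan {
  return PLANS[option];
}
```

Typing the key as `1 | 2 | 3 | 4` makes any fifth option a compile-time error, matching the rule to present exactly four options.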
161
+ ## Common Mistakes
162
+
163
+ **Skipping test verification**
164
+ - **Problem:** Merge broken code, create failing PR
165
+ - **Fix:** Always verify tests before offering options
166
+
167
+ **Open-ended questions**
168
+ - **Problem:** "What should I do next?" → ambiguous
169
+ - **Fix:** Present exactly 4 structured options
170
+
171
+ **Automatic worktree cleanup**
172
+ - **Problem:** Removing the worktree when you might still need it (Options 2, 3)
173
+ - **Fix:** Only cleanup for Options 1 and 4
174
+
175
+ **No confirmation for discard**
176
+ - **Problem:** Accidentally delete work
177
+ - **Fix:** Require typed "discard" confirmation
178
+
179
+ ## Red Flags
180
+
181
+ **Never:**
182
+ - Proceed with failing tests
183
+ - Merge without verifying tests on result
184
+ - Delete work without confirmation
185
+ - Force-push without explicit request
186
+
187
+ **Always:**
188
+ - Verify tests before offering options
189
+ - Present exactly 4 options
190
+ - Get typed confirmation for Option 4
191
+ - Clean up worktree for Options 1 & 4 only
192
+
193
+ ## Integration
194
+
195
+ **Called by:**
196
+ - **subagent-driven-development** (Step 7) - After all tasks complete
197
+ - **executing-plans** (Step 5) - After all batches complete
198
+
199
+ **Pairs with:**
200
+ - **using-git-worktrees** - Cleans up worktree created by that skill
.agents/skills/frontend-slides/SKILL.md ADDED
@@ -0,0 +1,184 @@
1
+ ---
2
+ name: frontend-slides
3
+ description: Create stunning, animation-rich HTML presentations from scratch or by converting PowerPoint files. Use when the user wants to build a presentation, convert a PPT/PPTX to web, or create slides for a talk/pitch. Helps non-designers discover their aesthetic through visual exploration rather than abstract choices.
4
+ origin: ECC
5
+ ---
6
+
7
+ # Frontend Slides
8
+
9
+ Create zero-dependency, animation-rich HTML presentations that run entirely in the browser.
10
+
11
+ Inspired by the visual exploration approach showcased in work by zarazhangrui (credit: @zarazhangrui).
12
+
13
+ ## When to Activate
14
+
15
+ - Creating a talk deck, pitch deck, workshop deck, or internal presentation
16
+ - Converting `.ppt` or `.pptx` slides into an HTML presentation
17
+ - Improving an existing HTML presentation's layout, motion, or typography
18
+ - Exploring presentation styles with a user who does not know their design preference yet
19
+
20
+ ## Non-Negotiables
21
+
22
+ 1. **Zero dependencies**: default to one self-contained HTML file with inline CSS and JS.
23
+ 2. **Viewport fit is mandatory**: every slide must fit inside one viewport with no internal scrolling.
24
+ 3. **Show, don't tell**: use visual previews instead of abstract style questionnaires.
25
+ 4. **Distinctive design**: avoid generic purple-gradient, Inter-on-white, template-looking decks.
26
+ 5. **Production quality**: keep code commented, accessible, responsive, and performant.
27
+
28
+ Before generating, read `STYLE_PRESETS.md` for the viewport-safe CSS base, density limits, preset catalog, and CSS gotchas.
29
+
30
+ ## Workflow
31
+
32
+ ### 1. Detect Mode
33
+
34
+ Choose one path:
35
+ - **New presentation**: user has a topic, notes, or full draft
36
+ - **PPT conversion**: user has `.ppt` or `.pptx`
37
+ - **Enhancement**: user already has HTML slides and wants improvements
38
+
39
+ ### 2. Discover Content
40
+
41
+ Ask only the minimum needed:
42
+ - purpose: pitch, teaching, conference talk, internal update
43
+ - length: short (5-10), medium (10-20), long (20+)
44
+ - content state: finished copy, rough notes, topic only
45
+
46
+ If the user has content, ask them to paste it before styling.
47
+
48
+ ### 3. Discover Style
49
+
50
+ Default to visual exploration.
51
+
52
+ If the user already knows the desired preset, skip previews and use it directly.
53
+
54
+ Otherwise:
55
+ 1. Ask what feeling the deck should create: impressed, energized, focused, inspired.
56
+ 2. Generate **3 single-slide preview files** in `.ecc-design/slide-previews/`.
57
+ 3. Each preview must be self-contained, show typography/color/motion clearly, and stay under roughly 100 lines of slide content.
58
+ 4. Ask the user which preview to keep or what elements to mix.
59
+
60
+ Use the preset guide in `STYLE_PRESETS.md` when mapping mood to style.
61
+
62
+ ### 4. Build the Presentation
63
+
64
+ Output either:
65
+ - `presentation.html`
66
+ - `[presentation-name].html`
67
+
68
+ Use an `assets/` folder only when the deck contains extracted or user-supplied images.
69
+
70
+ Required structure:
71
+ - semantic slide sections
72
+ - a viewport-safe CSS base from `STYLE_PRESETS.md`
73
+ - CSS custom properties for theme values
74
+ - a presentation controller class for keyboard, wheel, and touch navigation
75
+ - Intersection Observer for reveal animations
76
+ - reduced-motion support
77
+
78
+ ### 5. Enforce Viewport Fit
79
+
80
+ Treat this as a hard gate.
81
+
82
+ Rules:
83
+ - every `.slide` must use `height: 100vh; height: 100dvh; overflow: hidden;`
84
+ - all type and spacing must scale with `clamp()`
85
+ - when content does not fit, split into multiple slides
86
+ - never solve overflow by shrinking text below readable sizes
87
+ - never allow scrollbars inside a slide
88
+
89
+ Use the density limits and mandatory CSS block in `STYLE_PRESETS.md`.
90
+
91
+ ### 6. Validate
92
+
93
+ Check the finished deck at these sizes:
94
+ - 1920x1080
95
+ - 1280x720
96
+ - 768x1024
97
+ - 375x667
98
+ - 667x375
99
+
100
+ If browser automation is available, use it to verify no slide overflows and that keyboard navigation works.
101
+
102
+ ### 7. Deliver
103
+
104
+ At handoff:
105
+ - delete temporary preview files unless the user wants to keep them
106
+ - open the deck with the platform-appropriate opener when useful
107
+ - summarize file path, preset used, slide count, and easy theme customization points
108
+
109
+ Use the correct opener for the current OS:
110
+ - macOS: `open file.html`
111
+ - Linux: `xdg-open file.html`
112
+ - Windows: `start "" file.html`
113
+
114
+ ## PPT / PPTX Conversion
115
+
116
+ For PowerPoint conversion:
117
+ 1. Prefer `python3` with `python-pptx` to extract text, images, and notes.
118
+ 2. If `python-pptx` is unavailable, ask whether to install it or fall back to a manual/export-based workflow.
119
+ 3. Preserve slide order, speaker notes, and extracted assets.
120
+ 4. After extraction, run the same style-selection workflow as a new presentation.
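The extraction steps above could be sketched roughly as follows, assuming `python-pptx` is installed. `slide_record` and `extract_deck` are hypothetical helper names, not part of this skill, and the `assets` default is only an illustration. The sketch does not descend into grouped shapes, and a linked (non-embedded) picture would need extra handling.

```python
from pathlib import Path


def slide_record(index: int, texts: list[str], notes: str, images: list[str]) -> dict:
    """Bundle one slide's extracted content, keeping its original position."""
    return {
        "index": index,
        "texts": [t for t in texts if t.strip()],  # drop empty text frames
        "notes": notes.strip(),
        "images": images,
    }


def extract_deck(pptx_path: str, assets_dir: str = "assets") -> list[dict]:
    """Extract text, speaker notes, and images from a .pptx file in slide order."""
    # Imported lazily so a missing python-pptx surfaces only when extraction is attempted.
    from pptx import Presentation
    from pptx.enum.shapes import MSO_SHAPE_TYPE

    out_dir = Path(assets_dir)
    records = []
    for index, slide in enumerate(Presentation(pptx_path).slides, start=1):
        texts, images = [], []
        for shape in slide.shapes:
            if shape.has_text_frame:
                texts.append(shape.text_frame.text)
            if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
                out_dir.mkdir(parents=True, exist_ok=True)
                image_path = out_dir / f"slide{index}-{len(images) + 1}.{shape.image.ext}"
                image_path.write_bytes(shape.image.blob)
                images.append(str(image_path))
        notes = ""
        if slide.has_notes_slide:
            frame = slide.notes_slide.notes_text_frame
            notes = frame.text if frame is not None else ""
        records.append(slide_record(index, texts, notes, images))
    return records
```

Catching `ImportError` around the `extract_deck` call is one way to trigger the "install or fall back" question from step 2.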
121
+
122
+ Keep conversion cross-platform. Do not rely on macOS-only tools when Python can do the job.
123
+
124
+ ## Implementation Requirements
125
+
126
+ ### HTML / CSS
127
+
128
+ - Use inline CSS and JS unless the user explicitly wants a multi-file project.
129
+ - Fonts may come from Google Fonts or Fontshare.
130
+ - Prefer atmospheric backgrounds, strong type hierarchy, and a clear visual direction.
131
+ - Use abstract shapes, gradients, grids, noise, and geometry rather than illustrations.
132
+
133
+ ### JavaScript
134
+
135
+ Include:
136
+ - keyboard navigation
137
+ - touch / swipe navigation
138
+ - mouse wheel navigation
139
+ - progress indicator or slide index
140
+ - reveal-on-enter animation triggers
141
+
142
+ ### Accessibility
143
+
144
+ - use semantic structure (`main`, `section`, `nav`)
145
+ - keep contrast readable
146
+ - support keyboard-only navigation
147
+ - respect `prefers-reduced-motion`
148
+
149
+ ## Content Density Limits
150
+
151
+ Use these maxima unless the user explicitly asks for denser slides and readability still holds:
152
+
153
+ | Slide type | Limit |
154
+ |------------|-------|
155
+ | Title | 1 heading + 1 subtitle + optional tagline |
156
+ | Content | 1 heading + 4-6 bullets or 2 short paragraphs |
157
+ | Feature grid | 6 cards max |
158
+ | Code | 8-10 lines max |
159
+ | Quote | 1 quote + attribution |
160
+ | Image | 1 image constrained by viewport |
161
+
162
+ ## Anti-Patterns
163
+
164
+ - generic startup gradients with no visual identity
165
+ - system-font decks unless intentionally editorial
166
+ - long bullet walls
167
+ - code blocks that need scrolling
168
+ - fixed-height content boxes that break on short screens
169
+ - invalid negated CSS functions like `-clamp(...)`
170
+
171
+ ## Related ECC Skills
172
+
173
+ - `frontend-patterns` for component and interaction patterns around the deck
174
+ - `liquid-glass-design` when a presentation intentionally borrows Apple glass aesthetics
175
+ - `e2e-testing` if you need automated browser verification for the final deck
176
+
177
+ ## Deliverable Checklist
178
+
179
+ - presentation runs from a local file in a browser
180
+ - every slide fits the viewport without scrolling
181
+ - style is distinctive and intentional
182
+ - animation is meaningful, not noisy
183
+ - reduced motion is respected
184
+ - file paths and customization points are explained at handoff
.agents/skills/frontend-slides/STYLE_PRESETS.md ADDED
@@ -0,0 +1,330 @@
1
+ # Style Presets Reference
2
+
3
+ Curated visual styles for `frontend-slides`.
4
+
5
+ Use this file for:
6
+ - the mandatory viewport-fitting CSS base
7
+ - preset selection and mood mapping
8
+ - CSS gotchas and validation rules
9
+
10
+ Abstract shapes only. Avoid illustrations unless the user explicitly asks for them.
11
+
12
+ ## Viewport Fit Is Non-Negotiable
13
+
14
+ Every slide must fully fit in one viewport.
15
+
16
+ ### Golden Rule
17
+
18
+ ```text
19
+ Each slide = exactly one viewport height.
20
+ Too much content = split into more slides.
21
+ Never scroll inside a slide.
22
+ ```
23
+
24
+ ### Density Limits
25
+
26
+ | Slide Type | Maximum Content |
27
+ |------------|-----------------|
28
+ | Title slide | 1 heading + 1 subtitle + optional tagline |
29
+ | Content slide | 1 heading + 4-6 bullets or 2 paragraphs |
30
+ | Feature grid | 6 cards maximum |
31
+ | Code slide | 8-10 lines maximum |
32
+ | Quote slide | 1 quote + attribution |
33
+ | Image slide | 1 image, ideally under 60vh |
34
+
35
+ ## Mandatory Base CSS
36
+
37
+ Copy this block into every generated presentation and then theme on top of it.
38
+
39
+ ```css
40
+ /* ===========================================
41
+ VIEWPORT FITTING: MANDATORY BASE STYLES
42
+ =========================================== */
43
+
44
+ html, body {
45
+ height: 100%;
46
+ overflow-x: hidden;
47
+ }
48
+
49
+ html {
50
+ scroll-snap-type: y mandatory;
51
+ scroll-behavior: smooth;
52
+ }
53
+
54
+ .slide {
55
+ width: 100vw;
56
+ height: 100vh;
57
+ height: 100dvh;
58
+ overflow: hidden;
59
+ scroll-snap-align: start;
60
+ display: flex;
61
+ flex-direction: column;
62
+ position: relative;
63
+ }
64
+
65
+ .slide-content {
66
+ flex: 1;
67
+ display: flex;
68
+ flex-direction: column;
69
+ justify-content: center;
70
+ max-height: 100%;
71
+ overflow: hidden;
72
+ padding: var(--slide-padding);
73
+ }
74
+
75
+ :root {
76
+ --title-size: clamp(1.5rem, 5vw, 4rem);
77
+ --h2-size: clamp(1.25rem, 3.5vw, 2.5rem);
78
+ --h3-size: clamp(1rem, 2.5vw, 1.75rem);
79
+ --body-size: clamp(0.75rem, 1.5vw, 1.125rem);
80
+ --small-size: clamp(0.65rem, 1vw, 0.875rem);
81
+
82
+ --slide-padding: clamp(1rem, 4vw, 4rem);
83
+ --content-gap: clamp(0.5rem, 2vw, 2rem);
84
+ --element-gap: clamp(0.25rem, 1vw, 1rem);
85
+ }
86
+
87
+ .card, .container, .content-box {
88
+ max-width: min(90vw, 1000px);
89
+ max-height: min(80vh, 700px);
90
+ }
91
+
92
+ .feature-list, .bullet-list {
93
+ gap: clamp(0.4rem, 1vh, 1rem);
94
+ }
95
+
96
+ .feature-list li, .bullet-list li {
97
+ font-size: var(--body-size);
98
+ line-height: 1.4;
99
+ }
100
+
101
+ .grid {
102
+ display: grid;
103
+ grid-template-columns: repeat(auto-fit, minmax(min(100%, 250px), 1fr));
104
+ gap: clamp(0.5rem, 1.5vw, 1rem);
105
+ }
106
+
107
+ img, .image-container {
108
+ max-width: 100%;
109
+ max-height: min(50vh, 400px);
110
+ object-fit: contain;
111
+ }
112
+
113
+ @media (max-height: 700px) {
114
+ :root {
115
+ --slide-padding: clamp(0.75rem, 3vw, 2rem);
116
+ --content-gap: clamp(0.4rem, 1.5vw, 1rem);
117
+ --title-size: clamp(1.25rem, 4.5vw, 2.5rem);
118
+ --h2-size: clamp(1rem, 3vw, 1.75rem);
119
+ }
120
+ }
121
+
122
+ @media (max-height: 600px) {
123
+ :root {
124
+ --slide-padding: clamp(0.5rem, 2.5vw, 1.5rem);
125
+ --content-gap: clamp(0.3rem, 1vw, 0.75rem);
126
+ --title-size: clamp(1.1rem, 4vw, 2rem);
127
+ --body-size: clamp(0.7rem, 1.2vw, 0.95rem);
128
+ }
129
+
130
+ .nav-dots, .keyboard-hint, .decorative {
131
+ display: none;
132
+ }
133
+ }
134
+
135
+ @media (max-height: 500px) {
136
+ :root {
137
+ --slide-padding: clamp(0.4rem, 2vw, 1rem);
138
+ --title-size: clamp(1rem, 3.5vw, 1.5rem);
139
+ --h2-size: clamp(0.9rem, 2.5vw, 1.25rem);
140
+ --body-size: clamp(0.65rem, 1vw, 0.85rem);
141
+ }
142
+ }
143
+
144
+ @media (max-width: 600px) {
145
+ :root {
146
+ --title-size: clamp(1.25rem, 7vw, 2.5rem);
147
+ }
148
+
149
+ .grid {
150
+ grid-template-columns: 1fr;
151
+ }
152
+ }
153
+
154
+ @media (prefers-reduced-motion: reduce) {
155
+ *, *::before, *::after {
156
+ animation-duration: 0.01ms !important;
157
+ transition-duration: 0.2s !important;
158
+ }
159
+
160
+ html {
161
+ scroll-behavior: auto;
162
+ }
163
+ }
164
+ ```
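As a rough check of how the `clamp()` tokens above resolve, the sketch below evaluates `--title-size: clamp(1.5rem, 5vw, 4rem)` at a few viewport widths. It assumes a 16px root font size (the usual browser default, not something this stylesheet sets), and `resolve_clamp` is a hypothetical helper written for illustration.

```python
ROOT_FONT_PX = 16  # assumption: browser default root font size


def resolve_clamp(min_rem: float, preferred_vw: float, max_rem: float,
                  viewport_width_px: int) -> float:
    """Return the pixel value that CSS clamp(min, preferred, max) resolves to."""
    minimum = min_rem * ROOT_FONT_PX
    maximum = max_rem * ROOT_FONT_PX
    preferred = viewport_width_px * preferred_vw / 100  # 1vw is 1% of the viewport width
    # clamp(MIN, VAL, MAX) is defined as max(MIN, min(VAL, MAX))
    return max(minimum, min(preferred, maximum))


if __name__ == "__main__":
    for width in (375, 768, 1280, 1920):
        print(width, resolve_clamp(1.5, 5, 4, width))
```

Under that assumption the title sits at its 24px minimum on a 375px-wide phone and reaches its 64px maximum from 1280px upward, so it only scales fluidly between those widths.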
165
+
166
+ ## Viewport Checklist
167
+
168
+ - every `.slide` has `height: 100vh`, `height: 100dvh`, and `overflow: hidden`
169
+ - all typography uses `clamp()`
170
+ - all spacing uses `clamp()` or viewport units
171
+ - images have `max-height` constraints
172
+ - grids adapt with `auto-fit` + `minmax()`
173
+ - short-height breakpoints exist at `700px`, `600px`, and `500px`
174
+ - if anything feels cramped, split the slide
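Part of this checklist can be checked mechanically. The sketch below is a regex-based approximation, not a CSS parser: it only looks at the first `.slide { ... }` rule and at the three short-height breakpoints, and `check_viewport_rules` is a hypothetical name chosen for illustration.

```python
import re

REQUIRED_SLIDE_DECLARATIONS = ("height: 100vh", "height: 100dvh", "overflow: hidden")
REQUIRED_HEIGHT_BREAKPOINTS = (700, 600, 500)


def _normalize(css: str) -> str:
    """Collapse whitespace so 'height:100vh' and 'height: 100vh' compare equal."""
    css = re.sub(r"\s+", " ", css)
    return re.sub(r"\s*:\s*", ": ", css)


def check_viewport_rules(css: str) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    problems = []
    normalized = _normalize(css)

    # Body of the first `.slide { ... }` rule (`.slide-content` does not match).
    match = re.search(r"\.slide\s*\{([^}]*)\}", normalized)
    body = match.group(1) if match else ""
    if not match:
        problems.append("no .slide rule found")
    for declaration in REQUIRED_SLIDE_DECLARATIONS:
        if match and declaration not in body:
            problems.append(f".slide is missing '{declaration}'")

    for px in REQUIRED_HEIGHT_BREAKPOINTS:
        if not re.search(rf"@media\s*\(\s*max-height: {px}px\s*\)", normalized):
            problems.append(f"missing @media (max-height: {px}px) breakpoint")
    return problems
```

The remaining checklist items (typography and spacing using `clamp()`, whether a slide feels cramped) still need a visual pass at the validation sizes.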
175
+
176
+ ## Mood to Preset Mapping
177
+
178
+ | Mood | Good Presets |
179
+ |------|--------------|
180
+ | Impressed / Confident | Bold Signal, Electric Studio, Dark Botanical |
181
+ | Excited / Energized | Creative Voltage, Neon Cyber, Split Pastel |
182
+ | Calm / Focused | Notebook Tabs, Paper & Ink, Swiss Modern |
183
+ | Inspired / Moved | Dark Botanical, Vintage Editorial, Pastel Geometry |
184
+
185
+ ## Preset Catalog
186
+
187
+ ### 1. Bold Signal
188
+
189
+ - Vibe: confident, high-impact, keynote-ready
190
+ - Best for: pitch decks, launches, statements
191
+ - Fonts: Archivo Black + Space Grotesk
192
+ - Palette: charcoal base, hot orange focal card, crisp white text
193
+ - Signature: oversized section numbers, high-contrast card on dark field
194
+
195
+ ### 2. Electric Studio
196
+
197
+ - Vibe: clean, bold, agency-polished
198
+ - Best for: client presentations, strategic reviews
199
+ - Fonts: Manrope only
200
+ - Palette: black, white, saturated cobalt accent
201
+ - Signature: two-panel split and sharp editorial alignment
202
+
203
+ ### 3. Creative Voltage
204
+
205
+ - Vibe: energetic, retro-modern, playful confidence
206
+ - Best for: creative studios, brand work, product storytelling
207
+ - Fonts: Syne + Space Mono
208
+ - Palette: electric blue, neon yellow, deep navy
209
+ - Signature: halftone textures, badges, punchy contrast
210
+
211
+ ### 4. Dark Botanical
212
+
213
+ - Vibe: elegant, premium, atmospheric
214
+ - Best for: luxury brands, thoughtful narratives, premium product decks
215
+ - Fonts: Cormorant + IBM Plex Sans
216
+ - Palette: near-black, warm ivory, blush, gold, terracotta
217
+ - Signature: blurred abstract circles, fine rules, restrained motion
218
+
219
+ ### 5. Notebook Tabs
220
+
221
+ - Vibe: editorial, organized, tactile
222
+ - Best for: reports, reviews, structured storytelling
223
+ - Fonts: Bodoni Moda + DM Sans
224
+ - Palette: cream paper on charcoal with pastel tabs
225
+ - Signature: paper sheet, colored side tabs, binder details
226
+
227
+ ### 6. Pastel Geometry
228
+
229
+ - Vibe: approachable, modern, friendly
230
+ - Best for: product overviews, onboarding, lighter brand decks
231
+ - Fonts: Plus Jakarta Sans only
232
+ - Palette: pale blue field, cream card, soft pink/mint/lavender accents
233
+ - Signature: vertical pills, rounded cards, soft shadows
234
+
235
+ ### 7. Split Pastel
236
+
237
+ - Vibe: playful, modern, creative
238
+ - Best for: agency intros, workshops, portfolios
239
+ - Fonts: Outfit only
240
+ - Palette: peach + lavender split with mint badges
241
+ - Signature: split backdrop, rounded tags, light grid overlays
242
+
243
+ ### 8. Vintage Editorial
244
+
245
+ - Vibe: witty, personality-driven, magazine-inspired
246
+ - Best for: personal brands, opinionated talks, storytelling
247
+ - Fonts: Fraunces + Work Sans
248
+ - Palette: cream, charcoal, dusty warm accents
249
+ - Signature: geometric accents, bordered callouts, punchy serif headlines
250
+
251
+ ### 9. Neon Cyber
252
+
253
+ - Vibe: futuristic, techy, kinetic
254
+ - Best for: AI, infra, dev tools, future-of-X talks
255
+ - Fonts: Clash Display + Satoshi
256
+ - Palette: midnight navy, cyan, magenta
257
+ - Signature: glow, particles, grids, data-radar energy
258
+
259
+ ### 10. Terminal Green
260
+
261
+ - Vibe: developer-focused, hacker-clean
262
+ - Best for: APIs, CLI tools, engineering demos
263
+ - Fonts: JetBrains Mono only
264
+ - Palette: GitHub dark + terminal green
265
+ - Signature: scan lines, command-line framing, precise monospace rhythm
266
+
267
+ ### 11. Swiss Modern
268
+
269
+ - Vibe: minimal, precise, data-forward
270
+ - Best for: corporate, product strategy, analytics
271
+ - Fonts: Archivo + Nunito
272
+ - Palette: white, black, signal red
273
+ - Signature: visible grids, asymmetry, geometric discipline
274
+
275
+ ### 12. Paper & Ink
276
+
277
+ - Vibe: literary, thoughtful, story-driven
278
+ - Best for: essays, keynote narratives, manifesto decks
279
+ - Fonts: Cormorant Garamond + Source Serif 4
280
+ - Palette: warm cream, charcoal, crimson accent
281
+ - Signature: pull quotes, drop caps, elegant rules
282
+
283
+ ## Direct Selection Prompts
284
+
285
+ If the user already knows the style they want, let them pick directly from the preset names above instead of forcing preview generation.
286
+
287
+ ## Animation Feel Mapping
288
+
289
+ | Feeling | Motion Direction |
290
+ |---------|------------------|
291
+ | Dramatic / Cinematic | slow fades, parallax, large scale-ins |
292
+ | Techy / Futuristic | glow, particles, grid motion, scramble text |
293
+ | Playful / Friendly | springy easing, rounded shapes, floating motion |
294
+ | Professional / Corporate | subtle 200-300ms transitions, clean slides |
295
+ | Calm / Minimal | very restrained movement, whitespace-first |
296
+ | Editorial / Magazine | strong hierarchy, staggered text and image interplay |
297
+
298
+ ## CSS Gotcha: Negating Functions
299
+
300
+ Never write these:
301
+
302
+ ```css
303
+ right: -clamp(28px, 3.5vw, 44px);
304
+ margin-left: -min(10vw, 100px);
305
+ ```
306
+
307
+ Browsers ignore them silently.
308
+
309
+ Always write this instead:
310
+
311
+ ```css
312
+ right: calc(-1 * clamp(28px, 3.5vw, 44px));
313
+ margin-left: calc(-1 * min(10vw, 100px));
314
+ ```
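Because the invalid form fails silently, a small scan of the generated CSS can catch it before handoff. This is a minimal sketch using a regular expression; `find_negated_functions` is a hypothetical helper, and it only covers `clamp`, `min`, and `max`.

```python
import re

# A minus sign directly in front of a CSS math function, e.g. "-clamp(".
# The lookbehind skips identifiers that merely end in "-min(" or similar.
NEGATED_FUNCTION = re.compile(r"(?<![\w-])-(clamp|min|max)\(")


def find_negated_functions(css: str) -> list[tuple[int, str]]:
    """Return (line_number, line_text) for each line that negates a CSS math function."""
    hits = []
    for number, line in enumerate(css.splitlines(), start=1):
        if NEGATED_FUNCTION.search(line):
            hits.append((number, line.strip()))
    return hits
```

The `calc(-1 * ...)` form is not flagged, since the minus sign there belongs to the number `-1` rather than to the function name.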
315
+
316
+ ## Validation Sizes
317
+
318
+ Test at minimum:
319
+ - Desktop: `1920x1080`, `1440x900`, `1280x720`
320
+ - Tablet: `1024x768`, `768x1024`
321
+ - Mobile: `375x667`, `414x896`
322
+ - Landscape phone: `667x375`, `896x414`
323
+
324
+ ## Anti-Patterns
325
+
326
+ Do not use:
327
+ - purple-on-white startup templates
328
+ - Inter / Roboto / Arial as the visual voice unless the user explicitly wants utilitarian neutrality
329
+ - bullet walls, tiny type, or code blocks that require scrolling
330
+ - decorative illustrations when abstract geometry would do the job better
.agents/skills/karpathy-guidelines/SKILL.md ADDED
@@ -0,0 +1,67 @@
1
+ ---
2
+ name: karpathy-guidelines
3
+ description: Behavioral guidelines to reduce common LLM coding mistakes. Use when writing, reviewing, or refactoring code to avoid overcomplication, make surgical changes, surface assumptions, and define verifiable success criteria.
4
+ license: MIT
5
+ ---
6
+
7
+ # Karpathy Guidelines
8
+
9
+ Behavioral guidelines to reduce common LLM coding mistakes, derived from [Andrej Karpathy's observations](https://x.com/karpathy/status/2015883857489522876) on LLM coding pitfalls.
10
+
11
+ **Tradeoff:** These guidelines bias toward caution over speed. For trivial tasks, use judgment.
12
+
13
+ ## 1. Think Before Coding
14
+
15
+ **Don't assume. Don't hide confusion. Surface tradeoffs.**
16
+
17
+ Before implementing:
18
+ - State your assumptions explicitly. If uncertain, ask.
19
+ - If multiple interpretations exist, present them - don't pick silently.
20
+ - If a simpler approach exists, say so. Push back when warranted.
21
+ - If something is unclear, stop. Name what's confusing. Ask.
22
+
23
+ ## 2. Simplicity First
24
+
25
+ **Minimum code that solves the problem. Nothing speculative.**
26
+
27
+ - No features beyond what was asked.
28
+ - No abstractions for single-use code.
29
+ - No "flexibility" or "configurability" that wasn't requested.
30
+ - No error handling for impossible scenarios.
31
+ - If you write 200 lines and it could be 50, rewrite it.
32
+
33
+ Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.
34
+
35
+ ## 3. Surgical Changes
36
+
37
+ **Touch only what you must. Clean up only your own mess.**
38
+
39
+ When editing existing code:
40
+ - Don't "improve" adjacent code, comments, or formatting.
41
+ - Don't refactor things that aren't broken.
42
+ - Match existing style, even if you'd do it differently.
43
+ - If you notice unrelated dead code, mention it - don't delete it.
44
+
45
+ When your changes create orphans:
46
+ - Remove imports/variables/functions that YOUR changes made unused.
47
+ - Don't remove pre-existing dead code unless asked.
48
+
49
+ The test: Every changed line should trace directly to the user's request.
50
+
51
+ ## 4. Goal-Driven Execution
52
+
53
+ **Define success criteria. Loop until verified.**
54
+
55
+ Transform tasks into verifiable goals:
56
+ - "Add validation" → "Write tests for invalid inputs, then make them pass"
57
+ - "Fix the bug" → "Write a test that reproduces it, then make it pass"
58
+ - "Refactor X" → "Ensure tests pass before and after"
59
+
60
+ For multi-step tasks, state a brief plan:
61
+ ```
62
+ 1. [Step] → verify: [check]
63
+ 2. [Step] → verify: [check]
64
+ 3. [Step] → verify: [check]
65
+ ```
66
+
67
+ Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.
.agents/skills/openenv-cli/SKILL.md ADDED
@@ -0,0 +1,18 @@
1
+ ---
2
+ name: openenv-cli
3
+ description: "OpenEnv CLI (`openenv`) for scaffolding, validating, building, and pushing OpenEnv environments."
4
+ ---
5
+
6
+ Install: `pip install openenv-core`
7
+
8
+ The OpenEnv CLI command `openenv` is available.
9
+ Use `openenv --help` to view available commands.
10
+
11
+ Generated with `openenv-core v0.2.3`. Run `openenv skills add --force` to regenerate.
12
+
13
+ ## Tips
14
+
15
+ - Start with `openenv init <env_name>` to scaffold a new environment
16
+ - Validate projects with `openenv validate`
17
+ - Build and deploy with `openenv build` and `openenv push`
18
+ - Use `openenv <command> --help` for command-specific options
.agents/skills/python-testing/SKILL.md ADDED
@@ -0,0 +1,816 @@
1
+ ---
2
+ name: python-testing
3
+ description: Python testing strategies using pytest, TDD methodology, fixtures, mocking, parametrization, and coverage requirements.
4
+ origin: ECC
5
+ ---
6
+
7
+ # Python Testing Patterns
8
+
9
+ Comprehensive testing strategies for Python applications using pytest, TDD methodology, and best practices.
10
+
11
+ ## When to Activate
12
+
13
+ - Writing new Python code (follow TDD: red, green, refactor)
14
+ - Designing test suites for Python projects
15
+ - Reviewing Python test coverage
16
+ - Setting up testing infrastructure
17
+
18
+ ## Core Testing Philosophy
19
+
20
+ ### Test-Driven Development (TDD)
21
+
22
+ Always follow the TDD cycle:
23
+
24
+ 1. **RED**: Write a failing test for the desired behavior
25
+ 2. **GREEN**: Write minimal code to make the test pass
26
+ 3. **REFACTOR**: Improve code while keeping tests green
27
+
28
+ ```python
29
+ # Step 1: Write failing test (RED)
30
+ def test_add_numbers():
31
+ result = add(2, 3)
32
+ assert result == 5
33
+
34
+ # Step 2: Write minimal implementation (GREEN)
35
+ def add(a, b):
36
+ return a + b
37
+
38
+ # Step 3: Refactor if needed (REFACTOR)
39
+ ```
40
+
41
+ ### Coverage Requirements
42
+
43
+ - **Target**: 80%+ code coverage
44
+ - **Critical paths**: 100% coverage required
45
+ - Use `pytest --cov` to measure coverage
46
+
47
+ ```bash
48
+ pytest --cov=mypackage --cov-report=term-missing --cov-report=html
49
+ ```
50
+
51
+ ## pytest Fundamentals
52
+
53
+ ### Basic Test Structure
54
+
55
+ ```python
56
+ import pytest
57
+
58
+ def test_addition():
59
+ """Test basic addition."""
60
+ assert 2 + 2 == 4
61
+
62
+ def test_string_uppercase():
63
+ """Test string uppercasing."""
64
+ text = "hello"
65
+ assert text.upper() == "HELLO"
66
+
67
+ def test_list_append():
68
+ """Test list append."""
69
+ items = [1, 2, 3]
70
+ items.append(4)
71
+ assert 4 in items
72
+ assert len(items) == 4
73
+ ```
74
+
75
+ ### Assertions
76
+
77
+ ```python
78
+ # Equality
79
+ assert result == expected
80
+
81
+ # Inequality
82
+ assert result != unexpected
83
+
84
+ # Truthiness
85
+ assert result # Truthy
86
+ assert not result # Falsy
87
+ assert result is True # Exactly True
88
+ assert result is False # Exactly False
89
+ assert result is None # Exactly None
90
+
91
+ # Membership
92
+ assert item in collection
93
+ assert item not in collection
94
+
95
+ # Comparisons
96
+ assert result > 0
97
+ assert 0 <= result <= 100
98
+
99
+ # Type checking
100
+ assert isinstance(result, str)
101
+
102
+ # Exception testing (preferred approach)
103
+ with pytest.raises(ValueError):
104
+ raise ValueError("error message")
105
+
106
+ # Check exception message
107
+ with pytest.raises(ValueError, match="invalid input"):
108
+ raise ValueError("invalid input provided")
109
+
110
+ # Check exception attributes
111
+ with pytest.raises(ValueError) as exc_info:
112
+ raise ValueError("error message")
113
+ assert str(exc_info.value) == "error message"
114
+ ```
115
+
116
+ ## Fixtures
117
+
118
+ ### Basic Fixture Usage
119
+
120
+ ```python
121
+ import pytest
122
+
123
+ @pytest.fixture
124
+ def sample_data():
125
+ """Fixture providing sample data."""
126
+ return {"name": "Alice", "age": 30}
127
+
128
+ def test_sample_data(sample_data):
129
+ """Test using the fixture."""
130
+ assert sample_data["name"] == "Alice"
131
+ assert sample_data["age"] == 30
132
+ ```
133
+
134
+ ### Fixture with Setup/Teardown
135
+
136
+ ```python
137
+ @pytest.fixture
138
+ def database():
139
+ """Fixture with setup and teardown."""
140
+ # Setup
141
+ db = Database(":memory:")
142
+ db.create_tables()
143
+ db.insert_test_data()
144
+
145
+ yield db # Provide to test
146
+
147
+ # Teardown
148
+ db.close()
149
+
150
+ def test_database_query(database):
151
+ """Test database operations."""
152
+ result = database.query("SELECT * FROM users")
153
+ assert len(result) > 0
154
+ ```
155
+
156
+ ### Fixture Scopes
157
+
158
+ ```python
+ import os
+
+ import pytest
+
+ # Function scope (default) - runs for each test
+ @pytest.fixture
+ def temp_file():
+     with open("temp.txt", "w") as f:
+         yield f
+     os.remove("temp.txt")
+
+ # Module scope - runs once per module
+ @pytest.fixture(scope="module")
+ def module_db():
+     db = Database(":memory:")
+     db.create_tables()
+     yield db
+     db.close()
+
+ # Session scope - runs once per test session
+ @pytest.fixture(scope="session")
+ def shared_resource():
+     resource = ExpensiveResource()
+     yield resource
+     resource.cleanup()
+ ```
181
+
182
+ ### Fixture with Parameters
183
+
184
+ ```python
185
+ @pytest.fixture(params=[1, 2, 3])
186
+ def number(request):
187
+ """Parameterized fixture."""
188
+ return request.param
189
+
190
+ def test_numbers(number):
191
+ """Test runs 3 times, once for each parameter."""
192
+ assert number > 0
193
+ ```
194
+
195
+ ### Using Multiple Fixtures
196
+
197
+ ```python
198
+ @pytest.fixture
199
+ def user():
200
+ return User(id=1, name="Alice")
201
+
202
+ @pytest.fixture
203
+ def admin():
204
+ return User(id=2, name="Admin", role="admin")
205
+
206
+ def test_user_admin_interaction(user, admin):
207
+ """Test using multiple fixtures."""
208
+ assert admin.can_manage(user)
209
+ ```
210
+
211
+ ### Autouse Fixtures
212
+
213
+ ```python
214
+ @pytest.fixture(autouse=True)
215
+ def reset_config():
216
+ """Automatically runs before every test."""
217
+ Config.reset()
218
+ yield
219
+ Config.cleanup()
220
+
221
+ def test_without_fixture_call():
222
+ # reset_config runs automatically
223
+ assert Config.get_setting("debug") is False
224
+ ```
225
+
226
+ ### Conftest.py for Shared Fixtures
227
+
228
+ ```python
229
+ # tests/conftest.py
230
+ import pytest
231
+
232
+ @pytest.fixture
233
+ def client():
234
+ """Shared fixture for all tests."""
235
+ app = create_app(testing=True)
236
+ with app.test_client() as client:
237
+ yield client
238
+
239
+ @pytest.fixture
240
+ def auth_headers(client):
241
+ """Generate auth headers for API testing."""
242
+ response = client.post("/api/login", json={
243
+ "username": "test",
244
+ "password": "test"
245
+ })
246
+ token = response.json["token"]
247
+ return {"Authorization": f"Bearer {token}"}
248
+ ```
249
+
250
+ ## Parametrization
251
+
252
+ ### Basic Parametrization
253
+
254
+ ```python
255
+ @pytest.mark.parametrize("input,expected", [
256
+ ("hello", "HELLO"),
257
+ ("world", "WORLD"),
258
+ ("PyThOn", "PYTHON"),
259
+ ])
260
+ def test_uppercase(input, expected):
261
+ """Test runs 3 times with different inputs."""
262
+ assert input.upper() == expected
263
+ ```
264
+
265
+ ### Multiple Parameters
266
+
267
+ ```python
268
+ @pytest.mark.parametrize("a,b,expected", [
269
+ (2, 3, 5),
270
+ (0, 0, 0),
271
+ (-1, 1, 0),
272
+ (100, 200, 300),
273
+ ])
274
+ def test_add(a, b, expected):
275
+ """Test addition with multiple inputs."""
276
+ assert add(a, b) == expected
277
+ ```
278
+
279
+ ### Parametrize with IDs
280
+
281
+ ```python
282
+ @pytest.mark.parametrize("input,expected", [
283
+ ("valid@email.com", True),
284
+ ("invalid", False),
285
+ ("@no-domain.com", False),
286
+ ], ids=["valid-email", "missing-at", "missing-local-part"])
287
+ def test_email_validation(input, expected):
288
+ """Test email validation with readable test IDs."""
289
+ assert is_valid_email(input) is expected
290
+ ```
291
+
292
+ ### Parametrized Fixtures
293
+
294
+ ```python
295
+ @pytest.fixture(params=["sqlite", "postgresql", "mysql"])
296
+ def db(request):
297
+ """Test against multiple database backends."""
298
+ if request.param == "sqlite":
299
+ return Database(":memory:")
300
+ elif request.param == "postgresql":
301
+ return Database("postgresql://localhost/test")
302
+ elif request.param == "mysql":
303
+ return Database("mysql://localhost/test")
304
+
305
+ def test_database_operations(db):
306
+ """Test runs 3 times, once for each database."""
307
+ result = db.query("SELECT 1")
308
+ assert result is not None
309
+ ```
310
+
311
+ ## Markers and Test Selection
312
+
313
+ ### Custom Markers
314
+
315
+ ```python
316
+ # Mark slow tests
317
+ @pytest.mark.slow
318
+ def test_slow_operation():
319
+ time.sleep(5)
320
+
321
+ # Mark integration tests
322
+ @pytest.mark.integration
323
+ def test_api_integration():
324
+ response = requests.get("https://api.example.com")
325
+ assert response.status_code == 200
326
+
327
+ # Mark unit tests
328
+ @pytest.mark.unit
329
+ def test_unit_logic():
330
+ assert calculate(2, 3) == 5
331
+ ```
332
+
333
+ ### Run Specific Tests
334
+
335
+ ```bash
336
+ # Run only fast tests
337
+ pytest -m "not slow"
338
+
339
+ # Run only integration tests
340
+ pytest -m integration
341
+
342
+ # Run integration or slow tests
343
+ pytest -m "integration or slow"
344
+
345
+ # Run tests marked as unit but not slow
346
+ pytest -m "unit and not slow"
347
+ ```
348
+
349
+ ### Configure Markers in pytest.ini
350
+
351
+ ```ini
352
+ [pytest]
353
+ markers =
354
+ slow: marks tests as slow
355
+ integration: marks tests as integration tests
356
+ unit: marks tests as unit tests
357
+ django: marks tests as requiring Django
358
+ ```
359
+
360
+ ## Mocking and Patching
361
+
362
+ ### Mocking Functions
363
+
364
+ ```python
365
+ from unittest.mock import patch, Mock
366
+
367
+ @patch("mypackage.external_api_call")
368
+ def test_with_mock(api_call_mock):
369
+ """Test with mocked external API."""
370
+ api_call_mock.return_value = {"status": "success"}
371
+
372
+ result = my_function()
373
+
374
+ api_call_mock.assert_called_once()
375
+ assert result["status"] == "success"
376
+ ```
377
+
378
+ ### Mocking Return Values
379
+
380
+ ```python
+ from unittest.mock import Mock, patch
+
+ @patch("mypackage.Database.connect")
+ def test_database_connection(connect_mock):
+     """Test with mocked database connection."""
+     connect_mock.return_value = Mock(name="connection")  # stand-in connection object
+
+     db = Database()
+     db.connect("localhost")
+
+     # The patched attribute is a plain mock, so only the explicit argument is recorded
+     connect_mock.assert_called_once_with("localhost")
+ ```
391
+
392
+ ### Mocking Exceptions
393
+
394
+ ```python
+ import pytest
+ from unittest.mock import patch
+
+ import mypackage
+
+ @patch("mypackage.api_call")
+ def test_api_error_handling(api_call_mock):
+     """Test error handling with mocked exception."""
+     api_call_mock.side_effect = ConnectionError("Network error")
+
+     # Call through the module so the patched attribute is the one looked up
+     with pytest.raises(ConnectionError):
+         mypackage.api_call()
+
+     api_call_mock.assert_called_once()
+ ```
405
+
406
+ ### Mocking Context Managers
407
+
408
+ ```python
+ from unittest.mock import mock_open, patch
+
+ @patch("builtins.open", new_callable=mock_open)
+ def test_file_reading(mock_file):
+     """Test file reading with mocked open."""
+     mock_file.return_value.read.return_value = "file content"
+
+     result = read_file("test.txt")
+
+     mock_file.assert_called_once_with("test.txt", "r")
+     assert result == "file content"
+ ```
419
+
420
+ ### Using Autospec
421
+
422
+ ```python
+ from unittest.mock import patch
+
+ @patch("mypackage.DBConnection", autospec=True)
+ def test_autospec(db_mock):
+     """Test with autospec to catch API misuse."""
+     db = db_mock.return_value
+     # Raises AttributeError here if DBConnection has no query method
+     db.query("SELECT * FROM users")
+
+     db.query.assert_called_once_with("SELECT * FROM users")
+ ```
432
+
433
+ ### Mock Class Instances
434
+
435
+ ```python
436
+ class TestUserService:
437
+ @patch("mypackage.UserRepository")
438
+ def test_create_user(self, repo_mock):
439
+ """Test user creation with mocked repository."""
440
+ repo_mock.return_value.save.return_value = User(id=1, name="Alice")
441
+
442
+ service = UserService(repo_mock.return_value)
443
+ user = service.create_user(name="Alice")
444
+
445
+ assert user.name == "Alice"
446
+ repo_mock.return_value.save.assert_called_once()
447
+ ```
448
+
449
+ ### Mock Property
450
+
451
+ ```python
+ import pytest
+ from unittest.mock import Mock, PropertyMock
+
+ @pytest.fixture
+ def mock_config():
+     """Create a mock with a property."""
+     config = Mock()
+     type(config).debug = PropertyMock(return_value=True)
+     type(config).api_key = PropertyMock(return_value="test-key")
+     return config
+
+ def test_with_mock_config(mock_config):
+     """Test with mocked config properties."""
+     assert mock_config.debug is True
+     assert mock_config.api_key == "test-key"
+ ```
465
+
466
+ ## Testing Async Code
467
+
468
+ ### Async Tests with pytest-asyncio
469
+
470
+ ```python
471
+ import pytest
472
+
473
+ @pytest.mark.asyncio
474
+ async def test_async_function():
475
+ """Test async function."""
476
+ result = await async_add(2, 3)
477
+ assert result == 5
478
+
479
+ @pytest.mark.asyncio
480
+ async def test_async_with_fixture(async_client):
481
+ """Test async with async fixture."""
482
+ response = await async_client.get("/api/users")
483
+ assert response.status_code == 200
484
+ ```
485
+
486
+ ### Async Fixture
487
+
488
+ ```python
489
+ @pytest_asyncio.fixture  # needs "import pytest_asyncio"; plain @pytest.fixture only works with asyncio_mode=auto
490
+ async def async_client():
491
+ """Async fixture providing async test client."""
492
+ app = create_app()
493
+ async with app.test_client() as client:
494
+ yield client
495
+
496
+ @pytest.mark.asyncio
497
+ async def test_api_endpoint(async_client):
498
+ """Test using async fixture."""
499
+ response = await async_client.get("/api/data")
500
+ assert response.status_code == 200
501
+ ```
502
+
503
+ ### Mocking Async Functions
504
+
505
+ ```python
506
+ @pytest.mark.asyncio
507
+ @patch("mypackage.async_api_call")
508
+ async def test_async_mock(api_call_mock):
509
+ """Test async function with mock."""
510
+ api_call_mock.return_value = {"status": "ok"}
511
+
512
+ result = await my_async_function()
513
+
514
+ api_call_mock.assert_awaited_once()
515
+ assert result["status"] == "ok"
516
+ ```
517
+
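`assert_awaited_once()` exists only on `AsyncMock`, which `patch` selects automatically when the target is an `async def` function (Python 3.8+). The sketch below shows the same idea without pytest-asyncio, driving the coroutine with `asyncio.run`. `fetch_status` and its `api_call` parameter are hypothetical names introduced for illustration.

```python
import asyncio
from unittest.mock import AsyncMock


async def fetch_status(api_call):
    """Hypothetical code under test: awaits the injected coroutine function."""
    payload = await api_call()
    return payload["status"]


def run_example():
    # Calling an AsyncMock returns an awaitable; return_value is what
    # the await expression evaluates to.
    api_call_mock = AsyncMock(return_value={"status": "ok"})
    status = asyncio.run(fetch_status(api_call_mock))
    # Passes only if the mock was actually awaited, not merely called.
    api_call_mock.assert_awaited_once()
    return status


result = run_example()
```

Passing the dependency in as an argument keeps the sketch free of `patch`; in a real test suite the `@patch` form shown above achieves the same thing for module-level functions.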
518
+ ## Testing Exceptions
519
+
520
+ ### Testing Expected Exceptions
521
+
522
+ ```python
523
+ def test_divide_by_zero():
524
+ """Test that dividing by zero raises ZeroDivisionError."""
525
+ with pytest.raises(ZeroDivisionError):
526
+ divide(10, 0)
527
+
528
+ def test_custom_exception():
529
+ """Test custom exception with message."""
530
+ with pytest.raises(ValueError, match="invalid input"):
531
+ validate_input("invalid")
532
+ ```
533
+
534
+ ### Testing Exception Attributes
535
+
536
+ ```python
537
+ def test_exception_with_details():
538
+ """Test exception with custom attributes."""
539
+ with pytest.raises(CustomError) as exc_info:
540
+ raise CustomError("error", code=400)
541
+
542
+ assert exc_info.value.code == 400
543
+ assert "error" in str(exc_info.value)
544
+ ```
545
+
546
+ ## Testing Side Effects
547
+
548
+ ### Testing File Operations
549
+
550
+ ```python
551
+ import tempfile
552
+ import os
553
+
554
+ def test_file_processing():
555
+ """Test file processing with temp file."""
556
+ with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt') as f:
557
+ f.write("test content")
558
+ temp_path = f.name
559
+
560
+ try:
561
+ result = process_file(temp_path)
562
+ assert result == "processed: test content"
563
+ finally:
564
+ os.unlink(temp_path)
565
+ ```
566
+
567
+ ### Testing with pytest's tmp_path Fixture
568
+
569
+ ```python
570
+ def test_with_tmp_path(tmp_path):
571
+ """Test using pytest's built-in temp path fixture."""
572
+ test_file = tmp_path / "test.txt"
573
+ test_file.write_text("hello world")
574
+
575
+ result = process_file(str(test_file))
576
+ assert result == "processed: hello world"
577
+ # tmp_path automatically cleaned up
578
+ ```
579
+
580
+ ### Testing with tmpdir Fixture
581
+
582
+ ```python
583
+ def test_with_tmpdir(tmpdir):
584
+ """Test using pytest's legacy tmpdir fixture (prefer tmp_path in new code)."""
585
+ test_file = tmpdir.join("test.txt")
586
+ test_file.write("data")
587
+
588
+ result = process_file(str(test_file))
589
+ assert result == "processed: data"
590
+ ```
591
+
592
+ ## Test Organization
593
+
594
+ ### Directory Structure
595
+
596
+ ```
597
+ tests/
598
+ ├── conftest.py # Shared fixtures
599
+ ├── __init__.py
600
+ ├── unit/ # Unit tests
601
+ │ ├── __init__.py
602
+ │ ├── test_models.py
603
+ │ ├── test_utils.py
604
+ │ └── test_services.py
605
+ ├── integration/ # Integration tests
606
+ │ ├── __init__.py
607
+ │ ├── test_api.py
608
+ │ └── test_database.py
609
+ └── e2e/ # End-to-end tests
610
+ ├── __init__.py
611
+ └── test_user_flow.py
612
+ ```
613
+
614
+ ### Test Classes
615
+
616
+ ```python
617
+ class TestUserService:
618
+ """Group related tests in a class."""
619
+
620
+ @pytest.fixture(autouse=True)
621
+ def setup(self):
622
+ """Setup runs before each test in this class."""
623
+ self.service = UserService()
624
+
625
+ def test_create_user(self):
626
+ """Test user creation."""
627
+ user = self.service.create_user("Alice")
628
+ assert user.name == "Alice"
629
+
630
+ def test_delete_user(self):
631
+ """Test user deletion."""
632
+ user = User(id=1, name="Bob")
633
+ self.service.delete_user(user)
634
+ assert not self.service.user_exists(1)
635
+ ```
636
+
637
+ ## Best Practices
638
+
639
+ ### DO
640
+
641
+ - **Follow TDD**: Write tests before code (red-green-refactor)
642
+ - **Test one thing**: Each test should verify a single behavior
643
+ - **Use descriptive names**: `test_user_login_with_invalid_credentials_fails`
644
+ - **Use fixtures**: Eliminate duplication with fixtures
645
+ - **Mock external dependencies**: Don't depend on external services
646
+ - **Test edge cases**: Empty inputs, None values, boundary conditions
647
+ - **Aim for 80%+ coverage**: Focus on critical paths
648
+ - **Keep tests fast**: Use marks to separate slow tests
649
+
650
+ ### DON'T
651
+
652
+ - **Don't test implementation**: Test behavior, not internals
653
+ - **Don't use complex conditionals in tests**: Keep tests simple
654
+ - **Don't ignore test failures**: All tests must pass
655
+ - **Don't test third-party code**: Trust libraries to work
656
+ - **Don't share state between tests**: Tests should be independent
657
+ - **Don't catch exceptions in tests**: Use `pytest.raises`
658
+ - **Don't use print statements**: Use assertions and pytest output
659
+ - **Don't write tests that are too brittle**: Avoid over-specific mocks
660
+
661
+ ## Common Patterns
662
+
663
+ ### Testing API Endpoints (Flask)
664
+
665
+ ```python
666
+ @pytest.fixture
667
+ def client():
668
+ app = create_app(testing=True)
669
+ return app.test_client()
670
+
671
+ def test_get_user(client):
672
+ response = client.get("/api/users/1")
673
+ assert response.status_code == 200
674
+ assert response.json["id"] == 1
675
+
676
+ def test_create_user(client):
677
+ response = client.post("/api/users", json={
678
+ "name": "Alice",
679
+ "email": "alice@example.com"
680
+ })
681
+ assert response.status_code == 201
682
+ assert response.json["name"] == "Alice"
683
+ ```
684
+
685
+ ### Testing Database Operations
686
+
687
+ ```python
688
+ @pytest.fixture
689
+ def db_session():
690
+ """Create a test database session."""
691
+ session = Session(bind=engine)
692
+ session.begin_nested()
693
+ yield session
694
+ session.rollback()
695
+ session.close()
696
+
697
+ def test_create_user(db_session):
698
+ user = User(name="Alice", email="alice@example.com")
699
+ db_session.add(user)
700
+ db_session.commit()
701
+
702
+ retrieved = db_session.query(User).filter_by(name="Alice").first()
703
+ assert retrieved.email == "alice@example.com"
704
+ ```
705
+
706
+ ### Testing Class Methods
707
+
708
+ ```python
709
+ class TestCalculator:
710
+ @pytest.fixture
711
+ def calculator(self):
712
+ return Calculator()
713
+
714
+ def test_add(self, calculator):
715
+ assert calculator.add(2, 3) == 5
716
+
717
+ def test_divide_by_zero(self, calculator):
718
+ with pytest.raises(ZeroDivisionError):
719
+ calculator.divide(10, 0)
720
+ ```
721
+
722
+ ## pytest Configuration
723
+
724
+ ### pytest.ini
725
+
726
+ ```ini
727
+ [pytest]
728
+ testpaths = tests
729
+ python_files = test_*.py
730
+ python_classes = Test*
731
+ python_functions = test_*
732
+ addopts =
733
+ --strict-markers
734
+ --disable-warnings
735
+ --cov=mypackage
736
+ --cov-report=term-missing
737
+ --cov-report=html
738
+ markers =
739
+ slow: marks tests as slow
740
+ integration: marks tests as integration tests
741
+ unit: marks tests as unit tests
742
+ ```
743
+
744
+ ### pyproject.toml
745
+
746
+ ```toml
747
+ [tool.pytest.ini_options]
748
+ testpaths = ["tests"]
749
+ python_files = ["test_*.py"]
750
+ python_classes = ["Test*"]
751
+ python_functions = ["test_*"]
752
+ addopts = [
753
+ "--strict-markers",
754
+ "--cov=mypackage",
755
+ "--cov-report=term-missing",
756
+ "--cov-report=html",
757
+ ]
758
+ markers = [
759
+ "slow: marks tests as slow",
760
+ "integration: marks tests as integration tests",
761
+ "unit: marks tests as unit tests",
762
+ ]
763
+ ```
764
+
765
+ ## Running Tests
766
+
767
+ ```bash
768
+ # Run all tests
769
+ pytest
770
+
771
+ # Run specific file
772
+ pytest tests/test_utils.py
773
+
774
+ # Run specific test
775
+ pytest tests/test_utils.py::test_function
776
+
777
+ # Run with verbose output
778
+ pytest -v
779
+
780
+ # Run with coverage
781
+ pytest --cov=mypackage --cov-report=html
782
+
783
+ # Run only fast tests
784
+ pytest -m "not slow"
785
+
786
+ # Run until first failure
787
+ pytest -x
788
+
789
+ # Run and stop on N failures
790
+ pytest --maxfail=3
791
+
792
+ # Run last failed tests
793
+ pytest --lf
794
+
795
+ # Run tests with pattern
796
+ pytest -k "test_user"
797
+
798
+ # Run with debugger on failure
799
+ pytest --pdb
800
+ ```
801
+
802
+ ## Quick Reference
803
+
804
+ | Pattern | Usage |
805
+ |---------|-------|
806
+ | `pytest.raises()` | Test expected exceptions |
807
+ | `@pytest.fixture()` | Create reusable test fixtures |
808
+ | `@pytest.mark.parametrize()` | Run tests with multiple inputs |
809
+ | `@pytest.mark.slow` | Mark slow tests |
810
+ | `pytest -m "not slow"` | Skip slow tests |
811
+ | `@patch()` | Mock functions and classes |
812
+ | `tmp_path` fixture | Automatic temp directory |
813
+ | `pytest --cov` | Generate coverage report |
814
+ | `assert` | Simple and readable assertions |
815
+
816
+ **Remember**: Tests are code too. Keep them clean, readable, and maintainable. Good tests catch bugs; great tests prevent them.
.agents/skills/receiving-code-review/SKILL.md ADDED
@@ -0,0 +1,213 @@
1
+ ---
2
+ name: receiving-code-review
3
+ description: Use when receiving code review feedback, before implementing suggestions, especially if feedback seems unclear or technically questionable - requires technical rigor and verification, not performative agreement or blind implementation
4
+ ---
5
+
6
+ # Code Review Reception
7
+
8
+ ## Overview
9
+
10
+ Code review requires technical evaluation, not emotional performance.
11
+
12
+ **Core principle:** Verify before implementing. Ask before assuming. Technical correctness over social comfort.
13
+
14
+ ## The Response Pattern
15
+
16
+ ```
17
+ WHEN receiving code review feedback:
18
+
19
+ 1. READ: Complete feedback without reacting
20
+ 2. UNDERSTAND: Restate requirement in own words (or ask)
21
+ 3. VERIFY: Check against codebase reality
22
+ 4. EVALUATE: Technically sound for THIS codebase?
23
+ 5. RESPOND: Technical acknowledgment or reasoned pushback
24
+ 6. IMPLEMENT: One item at a time, test each
25
+ ```
26
+
27
+ ## Forbidden Responses
28
+
29
+ **NEVER:**
30
+ - "You're absolutely right!" (explicit CLAUDE.md violation)
31
+ - "Great point!" / "Excellent feedback!" (performative)
32
+ - "Let me implement that now" (before verification)
33
+
34
+ **INSTEAD:**
35
+ - Restate the technical requirement
36
+ - Ask clarifying questions
37
+ - Push back with technical reasoning if wrong
38
+ - Just start working (actions > words)
39
+
40
+ ## Handling Unclear Feedback
41
+
42
+ ```
43
+ IF any item is unclear:
44
+ STOP - do not implement anything yet
45
+ ASK for clarification on unclear items
46
+
47
+ WHY: Items may be related. Partial understanding = wrong implementation.
48
+ ```
49
+
50
+ **Example:**
51
+ ```
52
+ your human partner: "Fix 1-6"
53
+ You understand 1,2,3,6. Unclear on 4,5.
54
+
55
+ ❌ WRONG: Implement 1,2,3,6 now, ask about 4,5 later
56
+ ✅ RIGHT: "I understand items 1,2,3,6. Need clarification on 4 and 5 before proceeding."
57
+ ```
58
+
59
+ ## Source-Specific Handling
60
+
61
+ ### From your human partner
62
+ - **Trusted** - implement after understanding
63
+ - **Still ask** if scope unclear
64
+ - **No performative agreement**
65
+ - **Skip to action** or technical acknowledgment
66
+
67
+ ### From External Reviewers
68
+ ```
69
+ BEFORE implementing:
70
+ 1. Check: Technically correct for THIS codebase?
71
+ 2. Check: Breaks existing functionality?
72
+ 3. Check: Reason for current implementation?
73
+ 4. Check: Works on all platforms/versions?
74
+ 5. Check: Does reviewer understand full context?
75
+
76
+ IF suggestion seems wrong:
77
+ Push back with technical reasoning
78
+
79
+ IF can't easily verify:
80
+ Say so: "I can't verify this without [X]. Should I [investigate/ask/proceed]?"
81
+
82
+ IF conflicts with your human partner's prior decisions:
83
+ Stop and discuss with your human partner first
84
+ ```
85
+
86
+ **your human partner's rule:** "External feedback - be skeptical, but check carefully"
87
+
88
+ ## YAGNI Check for "Professional" Features
89
+
90
+ ```
91
+ IF reviewer suggests "implementing properly":
92
+ grep codebase for actual usage
93
+
94
+ IF unused: "This endpoint isn't called. Remove it (YAGNI)?"
95
+ IF used: Then implement properly
96
+ ```
97
+
98
+ **your human partner's rule:** "You and reviewer both report to me. If we don't need this feature, don't add it."
99
+
100
+ ## Implementation Order
101
+
102
+ ```
103
+ FOR multi-item feedback:
104
+ 1. Clarify anything unclear FIRST
105
+ 2. Then implement in this order:
106
+ - Blocking issues (breaks, security)
107
+ - Simple fixes (typos, imports)
108
+ - Complex fixes (refactoring, logic)
109
+ 3. Test each fix individually
110
+ 4. Verify no regressions
111
+ ```
112
+
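The ordering rule above, combined with the "clarify first" gate, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FeedbackItem`, `PRIORITY`, and `plan_implementation` are hypothetical names, and the three categories are taken directly from the list above.

```python
from dataclasses import dataclass

# Lower number = handled earlier, mirroring the order listed above.
PRIORITY = {"blocking": 0, "simple": 1, "complex": 2}


@dataclass
class FeedbackItem:
    number: int
    summary: str
    kind: str  # "blocking", "simple", or "complex"
    understood: bool = True


def plan_implementation(items):
    """Return ("clarify", unclear_numbers) or ("implement", ordered_items)."""
    unclear = [item.number for item in items if not item.understood]
    if unclear:
        # Stop: nothing is implemented until every item is understood.
        return "clarify", unclear
    ordered = sorted(items, key=lambda item: (PRIORITY[item.kind], item.number))
    return "implement", ordered


items = [
    FeedbackItem(1, "rename helper", "simple"),
    FeedbackItem(2, "split service class", "complex"),
    FeedbackItem(3, "SQL injection in search", "blocking"),
]
action, ordered = plan_implementation(items)

# One unclear item blocks the whole batch, not just itself.
items.append(FeedbackItem(4, "make it proper", "complex", understood=False))
blocked_action, unclear_numbers = plan_implementation(items)
```

Returning the unclear item numbers instead of a partial plan reflects the rule that related items should not be implemented on partial understanding.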
113
+ ## When To Push Back
114
+
115
+ Push back when:
116
+ - Suggestion breaks existing functionality
117
+ - Reviewer lacks full context
118
+ - Violates YAGNI (unused feature)
119
+ - Technically incorrect for this stack
120
+ - Legacy/compatibility reasons exist
121
+ - Conflicts with your human partner's architectural decisions
122
+
123
+ **How to push back:**
124
+ - Use technical reasoning, not defensiveness
125
+ - Ask specific questions
126
+ - Reference working tests/code
127
+ - Involve your human partner if architectural
128
+
129
+ **Signal if uncomfortable pushing back out loud:** "Strange things are afoot at the Circle K"
130
+
131
+ ## Acknowledging Correct Feedback
132
+
133
+ When feedback IS correct:
134
+ ```
135
+ ✅ "Fixed. [Brief description of what changed]"
136
+ ✅ "Good catch - [specific issue]. Fixed in [location]."
137
+ ✅ [Just fix it and show in the code]
138
+
139
+ ❌ "You're absolutely right!"
140
+ ❌ "Great point!"
141
+ ❌ "Thanks for catching that!"
142
+ ❌ "Thanks for [anything]"
143
+ ❌ ANY gratitude expression
144
+ ```
145
+
146
+ **Why no thanks:** Actions speak. Just fix it. The code itself shows you heard the feedback.
147
+
148
+ **If you catch yourself about to write "Thanks":** DELETE IT. State the fix instead.
149
+
150
+ ## Gracefully Correcting Your Pushback
151
+
152
+ If you pushed back and were wrong:
153
+ ```
154
+ ✅ "You were right - I checked [X] and it does [Y]. Implementing now."
155
+ ✅ "Verified this and you're correct. My initial understanding was wrong because [reason]. Fixing."
156
+
157
+ ❌ Long apology
158
+ ❌ Defending why you pushed back
159
+ ❌ Over-explaining
160
+ ```
161
+
162
+ State the correction factually and move on.
163
+
164
+ ## Common Mistakes
165
+
166
+ | Mistake | Fix |
167
+ |---------|-----|
168
+ | Performative agreement | State requirement or just act |
169
+ | Blind implementation | Verify against codebase first |
170
+ | Batch without testing | One at a time, test each |
171
+ | Assuming reviewer is right | Check if breaks things |
172
+ | Avoiding pushback | Technical correctness > comfort |
173
+ | Partial implementation | Clarify all items first |
174
+ | Can't verify, proceed anyway | State limitation, ask for direction |
175
+
176
+ ## Real Examples
177
+
178
+ **Performative Agreement (Bad):**
179
+ ```
180
+ Reviewer: "Remove legacy code"
181
+ ❌ "You're absolutely right! Let me remove that..."
182
+ ```
183
+
184
+ **Technical Verification (Good):**
185
+ ```
186
+ Reviewer: "Remove legacy code"
187
+ ✅ "Checking... build target is 10.15+, this API needs 13+. Need legacy for backward compat. Current impl has wrong bundle ID - fix it or drop pre-13 support?"
188
+ ```
189
+
190
+ **YAGNI (Good):**
191
+ ```
192
+ Reviewer: "Implement proper metrics tracking with database, date filters, CSV export"
193
+ ✅ "Grepped codebase - nothing calls this endpoint. Remove it (YAGNI)? Or is there usage I'm missing?"
194
+ ```
195
+
196
+ **Unclear Item (Good):**
197
+ ```
198
+ your human partner: "Fix items 1-6"
199
+ You understand 1,2,3,6. Unclear on 4,5.
200
+ ✅ "Understand 1,2,3,6. Need clarification on 4 and 5 before implementing."
201
+ ```
202
+
203
+ ## GitHub Thread Replies
204
+
205
+ When replying to inline review comments on GitHub, reply in the comment thread (`gh api repos/{owner}/{repo}/pulls/{pr}/comments/{id}/replies`), not as a top-level PR comment.
206
+
207
+ ## The Bottom Line
208
+
209
+ **External feedback = suggestions to evaluate, not orders to follow.**
210
+
211
+ Verify. Question. Then implement.
212
+
213
+ No performative agreement. Technical rigor always.
.agents/skills/requesting-code-review/SKILL.md ADDED
@@ -0,0 +1,105 @@
1
+ ---
2
+ name: requesting-code-review
3
+ description: Use when completing tasks, implementing major features, or before merging to verify work meets requirements
4
+ ---
5
+
6
+ # Requesting Code Review
7
+
8
+ Dispatch superpowers:code-reviewer subagent to catch issues before they cascade. The reviewer gets precisely crafted context for evaluation — never your session's history. This keeps the reviewer focused on the work product, not your thought process, and preserves your own context for continued work.
9
+
10
+ **Core principle:** Review early, review often.
11
+
12
+ ## When to Request Review
13
+
14
+ **Mandatory:**
15
+ - After each task in subagent-driven development
16
+ - After completing major feature
17
+ - Before merge to main
18
+
19
+ **Optional but valuable:**
20
+ - When stuck (fresh perspective)
21
+ - Before refactoring (baseline check)
22
+ - After fixing complex bug
23
+
24
+ ## How to Request
25
+
26
+ **1. Get git SHAs:**
27
+ ```bash
28
+ BASE_SHA=$(git rev-parse HEAD~1) # or origin/main
29
+ HEAD_SHA=$(git rev-parse HEAD)
30
+ ```
31
+
32
+ **2. Dispatch code-reviewer subagent:**
33
+
34
+ Use Task tool with superpowers:code-reviewer type, fill template at `code-reviewer.md`
35
+
36
+ **Placeholders:**
37
+ - `{WHAT_WAS_IMPLEMENTED}` - What you just built
38
+ - `{PLAN_OR_REQUIREMENTS}` - What it should do
39
+ - `{BASE_SHA}` - Starting commit
40
+ - `{HEAD_SHA}` - Ending commit
41
+ - `{DESCRIPTION}` - Brief summary
42
+
43
+ **3. Act on feedback:**
44
+ - Fix Critical issues immediately
45
+ - Fix Important issues before proceeding
46
+ - Note Minor issues for later
47
+ - Push back if reviewer is wrong (with reasoning)
48
+
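The placeholder substitution in step 2 can be sketched as follows. This is a minimal illustration, not how the Task tool fills the template: `fill_template` is a hypothetical helper, and the template string is a shortened stand-in for `code-reviewer.md`. A regular expression is used rather than `str.format` so that other braces in the template, such as code samples, are left alone.

```python
import re

# Matches placeholders such as {BASE_SHA}: uppercase letters and underscores only.
PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def fill_template(template, values):
    """Replace each {NAME} placeholder; fail loudly if any value is missing."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError(f"no value for placeholders: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


# Shortened stand-in for the real template file.
template = (
    "Review {WHAT_WAS_IMPLEMENTED}\n"
    "Compare against {PLAN_OR_REQUIREMENTS}\n"
    "git diff {BASE_SHA}..{HEAD_SHA}\n"
)
prompt = fill_template(template, {
    "WHAT_WAS_IMPLEMENTED": "verification function",
    "PLAN_OR_REQUIREMENTS": "Task 2 of the plan",
    "BASE_SHA": "a7981ec",
    "HEAD_SHA": "3df7661",
})
```

Failing on a missing value avoids dispatching a reviewer with a literal `{HEAD_SHA}` left in its prompt.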
49
+ ## Example
50
+
51
+ ```
52
+ [Just completed Task 2: Add verification function]
53
+
54
+ You: Let me request code review before proceeding.
55
+
56
+ BASE_SHA=$(git log --oneline | grep "Task 1" | head -1 | awk '{print $1}')
57
+ HEAD_SHA=$(git rev-parse HEAD)
58
+
59
+ [Dispatch superpowers:code-reviewer subagent]
60
+ WHAT_WAS_IMPLEMENTED: Verification and repair functions for conversation index
61
+ PLAN_OR_REQUIREMENTS: Task 2 from docs/superpowers/plans/deployment-plan.md
62
+ BASE_SHA: a7981ec
63
+ HEAD_SHA: 3df7661
64
+ DESCRIPTION: Added verifyIndex() and repairIndex() with 4 issue types
65
+
66
+ [Subagent returns]:
67
+ Strengths: Clean architecture, real tests
68
+ Issues:
69
+ Important: Missing progress indicators
70
+ Minor: Magic number (100) for reporting interval
71
+ Assessment: Ready to proceed
72
+
73
+ You: [Fix progress indicators]
74
+ [Continue to Task 3]
75
+ ```
76
+
77
+ ## Integration with Workflows
78
+
79
+ **Subagent-Driven Development:**
80
+ - Review after EACH task
81
+ - Catch issues before they compound
82
+ - Fix before moving to next task
83
+
84
+ **Executing Plans:**
85
+ - Review after each batch (3 tasks)
86
+ - Get feedback, apply, continue
87
+
88
+ **Ad-Hoc Development:**
89
+ - Review before merge
90
+ - Review when stuck
91
+
92
+ ## Red Flags
93
+
94
+ **Never:**
95
+ - Skip review because "it's simple"
96
+ - Ignore Critical issues
97
+ - Proceed with unfixed Important issues
98
+ - Argue with valid technical feedback
99
+
100
+ **If reviewer wrong:**
101
+ - Push back with technical reasoning
102
+ - Show code/tests that prove it works
103
+ - Request clarification
104
+
105
+ See template at: requesting-code-review/code-reviewer.md
.agents/skills/requesting-code-review/code-reviewer.md ADDED
@@ -0,0 +1,146 @@
1
+ # Code Review Agent
2
+
3
+ You are reviewing code changes for production readiness.
4
+
5
+ **Your task:**
6
+ 1. Review {WHAT_WAS_IMPLEMENTED}
7
+ 2. Compare against {PLAN_OR_REQUIREMENTS}
8
+ 3. Check code quality, architecture, testing
9
+ 4. Categorize issues by severity
10
+ 5. Assess production readiness
11
+
12
+ ## What Was Implemented
13
+
14
+ {DESCRIPTION}
15
+
16
+ ## Requirements/Plan
17
+
18
+ {PLAN_OR_REQUIREMENTS}
19
+
20
+ ## Git Range to Review
21
+
22
+ **Base:** {BASE_SHA}
23
+ **Head:** {HEAD_SHA}
24
+
25
+ ```bash
26
+ git diff --stat {BASE_SHA}..{HEAD_SHA}
27
+ git diff {BASE_SHA}..{HEAD_SHA}
28
+ ```
29
+
30
+ ## Review Checklist
31
+
32
+ **Code Quality:**
33
+ - Clean separation of concerns?
34
+ - Proper error handling?
35
+ - Type safety (if applicable)?
36
+ - DRY principle followed?
37
+ - Edge cases handled?
38
+
39
+ **Architecture:**
40
+ - Sound design decisions?
41
+ - Scalability considerations?
42
+ - Performance implications?
43
+ - Security concerns?
44
+
45
+ **Testing:**
46
+ - Tests actually test logic (not mocks)?
47
+ - Edge cases covered?
48
+ - Integration tests where needed?
49
+ - All tests passing?
50
+
51
+ **Requirements:**
52
+ - All plan requirements met?
53
+ - Implementation matches spec?
54
+ - No scope creep?
55
+ - Breaking changes documented?
56
+
57
+ **Production Readiness:**
58
+ - Migration strategy (if schema changes)?
59
+ - Backward compatibility considered?
60
+ - Documentation complete?
61
+ - No obvious bugs?
62
+
63
+ ## Output Format
64
+
65
+ ### Strengths
66
+ [What's well done? Be specific.]
67
+
68
+ ### Issues
69
+
70
+ #### Critical (Must Fix)
71
+ [Bugs, security issues, data loss risks, broken functionality]
72
+
73
+ #### Important (Should Fix)
74
+ [Architecture problems, missing features, poor error handling, test gaps]
75
+
76
+ #### Minor (Nice to Have)
77
+ [Code style, optimization opportunities, documentation improvements]
78
+
79
+ **For each issue:**
80
+ - File:line reference
81
+ - What's wrong
82
+ - Why it matters
83
+ - How to fix (if not obvious)
84
+
85
+ ### Recommendations
86
+ [Improvements for code quality, architecture, or process]
87
+
88
+ ### Assessment
89
+
90
+ **Ready to merge?** [Yes/No/With fixes]
91
+
92
+ **Reasoning:** [Technical assessment in 1-2 sentences]
93
+
94
+ ## Critical Rules
95
+
96
+ **DO:**
97
+ - Categorize by actual severity (not everything is Critical)
98
+ - Be specific (file:line, not vague)
99
+ - Explain WHY issues matter
100
+ - Acknowledge strengths
101
+ - Give clear verdict
102
+
103
+ **DON'T:**
104
+ - Say "looks good" without checking
105
+ - Mark nitpicks as Critical
106
+ - Give feedback on code you didn't review
107
+ - Be vague ("improve error handling")
108
+ - Avoid giving a clear verdict
109
+
110
+ ## Example Output
111
+
112
+ ```
113
+ ### Strengths
114
+ - Clean database schema with proper migrations (db.ts:15-42)
115
+ - Comprehensive test coverage (18 tests, all edge cases)
116
+ - Good error handling with fallbacks (summarizer.ts:85-92)
117
+
118
+ ### Issues
119
+
120
+ #### Important
121
+ 1. **Missing help text in CLI wrapper**
122
+ - File: index-conversations:1-31
123
+ - Issue: No --help flag, users won't discover --concurrency
124
+ - Fix: Add --help case with usage examples
125
+
126
+ 2. **Date validation missing**
127
+ - File: search.ts:25-27
128
+ - Issue: Invalid dates silently return no results
129
+ - Fix: Validate ISO format, throw error with example
130
+
131
+ #### Minor
132
+ 1. **Progress indicators**
133
+ - File: indexer.ts:130
134
+ - Issue: No "X of Y" counter for long operations
135
+ - Impact: Users don't know how long to wait
136
+
137
+ ### Recommendations
138
+ - Add progress reporting for user experience
139
+ - Consider config file for excluded projects (portability)
140
+
141
+ ### Assessment
142
+
143
+ **Ready to merge: With fixes**
144
+
145
+ **Reasoning:** Core implementation is solid with good architecture and tests. Important issues (help text, date validation) are easily fixed and don't affect core functionality.
146
+ ```
.agents/skills/search-first/SKILL.md ADDED
@@ -0,0 +1,161 @@
1
+ ---
2
+ name: search-first
3
+ description: Research-before-coding workflow. Search for existing tools, libraries, and patterns before writing custom code. Invokes the researcher agent.
4
+ origin: ECC
5
+ ---
6
+
7
+ # /search-first — Research Before You Code
8
+
9
+ Systematizes the "search for existing solutions before implementing" workflow.
10
+
11
+ ## Trigger
12
+
13
+ Use this skill when:
14
+ - Starting a new feature that likely has existing solutions
15
+ - Adding a dependency or integration
16
+ - The user asks "add X functionality" and you're about to write code
17
+ - Before creating a new utility, helper, or abstraction
18
+
19
+ ## Workflow
20
+
21
+ ```
22
+ ┌─────────────────────────────────────────────┐
23
+ │ 1. NEED ANALYSIS │
24
+ │ Define what functionality is needed │
25
+ │ Identify language/framework constraints │
26
+ ├─────────────────────────────────────────────┤
27
+ │ 2. PARALLEL SEARCH (researcher agent) │
28
+ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
29
+ │ │ npm / │ │ MCP / │ │ GitHub / │ │
30
+ │ │ PyPI │ │ Skills │ │ Web │ │
31
+ │ └──────────┘ └──────────┘ └──────────┘ │
32
+ ├─────────────────────────────────────────────┤
33
+ │ 3. EVALUATE │
34
+ │ Score candidates (functionality, maint, │
35
+ │ community, docs, license, deps) │
36
+ ├─────────────────────────────────────────────┤
37
+ │ 4. DECIDE │
38
+ │ ┌─────────┐ ┌──────────┐ ┌─────────┐ │
39
+ │ │ Adopt │ │ Extend │ │ Build │ │
40
+ │ │ as-is │ │ /Wrap │ │ Custom │ │
41
+ │ └─────────┘ └──────────┘ └─────────┘ │
42
+ ├─────────────────────────────────────────────┤
43
+ │ 5. IMPLEMENT │
44
+ │ Install package / Configure MCP / │
45
+ │ Write minimal custom code │
46
+ └─────────────────────────────────────────────┘
47
+ ```
48
+
49
+ ## Decision Matrix
50
+
51
+ | Signal | Action |
52
+ |--------|--------|
53
+ | Exact match, well-maintained, MIT/Apache | **Adopt** — install and use directly |
54
+ | Partial match, good foundation | **Extend** — install + write thin wrapper |
55
+ | Multiple weak matches | **Compose** — combine 2-3 small packages |
56
+ | Nothing suitable found | **Build** — write custom, but informed by research |
57
+
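One way to make the matrix above mechanical is to score each candidate on the six criteria from step 3 and map the totals to an action. The sketch below is only an illustration: the weights, the thresholds, and the names `Candidate` and `decide` are assumptions and are not part of this skill.

```python
from dataclasses import dataclass

# Hypothetical weights over the six criteria from step 3; they sum to 1.0.
WEIGHTS = {
    "functionality": 0.35, "maintenance": 0.20, "community": 0.15,
    "docs": 0.10, "license": 0.10, "deps": 0.10,
}


@dataclass
class Candidate:
    name: str
    scores: dict  # criterion -> rating from 0 to 10

    def total(self):
        # Criteria without a rating count as 0.
        return sum(WEIGHTS[c] * self.scores.get(c, 0) for c in WEIGHTS)


def decide(candidates, adopt_at=8.0, extend_at=6.0, weak_at=4.0):
    """Map scored candidates to an action from the decision matrix."""
    ranked = sorted(candidates, key=Candidate.total, reverse=True)
    if ranked and ranked[0].total() >= adopt_at:
        return "adopt", [ranked[0].name]
    if ranked and ranked[0].total() >= extend_at:
        return "extend", [ranked[0].name]
    # Several weak matches: combine up to three small packages.
    weak = [c.name for c in ranked if c.total() >= weak_at]
    if len(weak) >= 2:
        return "compose", weak[:3]
    return "build", []


def uniform(rating):
    """Helper: the same rating for every criterion."""
    return {criterion: rating for criterion in WEIGHTS}


strong = decide([Candidate("pkg-a", uniform(9))])
partial = decide([Candidate("pkg-a", uniform(7))])
several_weak = decide([Candidate("pkg-a", uniform(5)), Candidate("pkg-b", uniform(4.5))])
nothing = decide([])
```

In practice the license criterion is often a hard filter rather than a weighted score, so treat the numbers as a conversation aid, not a verdict.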
58
+ ## How to Use
59
+
60
+ ### Quick Mode (inline)
61
+
62
+ Before writing a utility or adding functionality, mentally run through:
63
+
64
+ 0. Does this already exist in the repo? → `rg` through relevant modules/tests first
65
+ 1. Is this a common problem? → Search npm/PyPI
66
+ 2. Is there an MCP for this? → Check `~/.claude/settings.json` and search
67
+ 3. Is there a skill for this? → Check `~/.claude/skills/`
68
+ 4. Is there a GitHub implementation/template? → Run GitHub code search for maintained OSS before writing net-new code
69
+
70
+ ### Full Mode (agent)
71
+
72
+ For non-trivial functionality, launch the researcher agent:
73
+
74
+ ```
75
+ Task(subagent_type="general-purpose", prompt="
76
+ Research existing tools for: [DESCRIPTION]
77
+ Language/framework: [LANG]
78
+ Constraints: [ANY]
79
+
80
+ Search: npm/PyPI, MCP servers, Claude Code skills, GitHub
81
+ Return: Structured comparison with recommendation
82
+ ")
83
+ ```
84
+
85
+ ## Search Shortcuts by Category
86
+
87
+ ### Development Tooling
88
+ - Linting → `eslint`, `ruff`, `textlint`, `markdownlint`
89
+ - Formatting → `prettier`, `black`, `gofmt`
90
+ - Testing → `jest`, `pytest`, `go test`
91
+ - Pre-commit → `husky`, `lint-staged`, `pre-commit`
92
+
93
+ ### AI/LLM Integration
94
+ - Claude SDK → Context7 for latest docs
95
+ - Prompt management → Check MCP servers
96
+ - Document processing → `unstructured`, `pdfplumber`, `mammoth`
97
+
98
+ ### Data & APIs
99
+ - HTTP clients → `httpx` (Python), `ky`/`got` (Node)
100
+ - Validation → `zod` (TS), `pydantic` (Python)
101
+ - Database → Check for MCP servers first
102
+
103
+ ### Content & Publishing
104
+ - Markdown processing → `remark`, `unified`, `markdown-it`
105
+ - Image optimization → `sharp`, `imagemin`
106
+
107
+ ## Integration Points
108
+
109
+ ### With planner agent
110
+ The planner should invoke researcher before Phase 1 (Architecture Review):
111
+ - Researcher identifies available tools
112
+ - Planner incorporates them into the implementation plan
113
+ - Avoids "reinventing the wheel" in the plan
114
+
115
+ ### With architect agent
116
+ The architect should consult researcher for:
117
+ - Technology stack decisions
118
+ - Integration pattern discovery
119
+ - Existing reference architectures
120
+
121
+ ### With iterative-retrieval skill
122
+ Combine for progressive discovery:
123
+ - Cycle 1: Broad search (npm, PyPI, MCP)
124
+ - Cycle 2: Evaluate top candidates in detail
125
+ - Cycle 3: Test compatibility with project constraints
126
+
127
+ ## Examples
128
+
129
+ ### Example 1: "Add dead link checking"
130
+ ```
131
+ Need: Check markdown files for broken links
132
+ Search: npm "markdown dead link checker"
133
+ Found: textlint-rule-no-dead-link (score: 9/10)
134
+ Action: ADOPT — npm install textlint-rule-no-dead-link
135
+ Result: Zero custom code, battle-tested solution
136
+ ```
137
+
138
+ ### Example 2: "Add HTTP client wrapper"
139
+ ```
140
+ Need: Resilient HTTP client with retries and timeout handling
141
+ Search: npm "http client retry", PyPI "httpx retry"
142
+ Found: got (Node) with built-in retry option, httpx (Python) with transport-level connection retries
143
+ Action: ADOPT — use got/httpx directly with retry config
144
+ Result: Zero custom code, production-proven libraries
145
+ ```
146
+
147
+ ### Example 3: "Add config file linter"
148
+ ```
149
+ Need: Validate project config files against a schema
150
+ Search: npm "config linter schema", "json schema validator cli"
151
+ Found: ajv-cli (score: 8/10)
152
+ Action: ADOPT + EXTEND — install ajv-cli, write project-specific schema
153
+ Result: 1 package + 1 schema file, no custom validation logic
154
+ ```
155
+
156
+ ## Anti-Patterns
157
+
158
+ - **Jumping to code**: Writing a utility without checking if one exists
159
+ - **Ignoring MCP**: Not checking if an MCP server already provides the capability
160
+ - **Over-customizing**: Wrapping a library so heavily it loses its benefits
161
+ - **Dependency bloat**: Installing a massive package for one small feature
.agents/skills/security-review/SKILL.md ADDED
@@ -0,0 +1,495 @@
1
+ ---
2
+ name: security-review
3
+ description: Use this skill when adding authentication, handling user input, working with secrets, creating API endpoints, or implementing payment/sensitive features. Provides comprehensive security checklist and patterns.
4
+ origin: ECC
5
+ ---
6
+
7
+ # Security Review Skill
8
+
9
+ This skill provides a checklist and reference patterns for reviewing code against security best practices and identifying potential vulnerabilities.
10
+
11
+ ## When to Activate
12
+
13
+ - Implementing authentication or authorization
14
+ - Handling user input or file uploads
15
+ - Creating new API endpoints
16
+ - Working with secrets or credentials
17
+ - Implementing payment features
18
+ - Storing or transmitting sensitive data
19
+ - Integrating third-party APIs
20
+
21
+ ## Security Checklist
22
+
23
+ ### 1. Secrets Management
24
+
25
+ #### FAIL: NEVER Do This
26
+ ```typescript
27
+ const apiKey = "sk-proj-xxxxx" // Hardcoded secret
28
+ const dbPassword = "password123" // In source code
29
+ ```
30
+
31
+ #### PASS: ALWAYS Do This
32
+ ```typescript
33
+ const apiKey = process.env.OPENAI_API_KEY
34
+ const dbUrl = process.env.DATABASE_URL
35
+
36
+ // Verify secrets exist
37
+ if (!apiKey) {
38
+ throw new Error('OPENAI_API_KEY not configured')
39
+ }
40
+ ```
41
+
42
+ #### Verification Steps
43
+ - [ ] No hardcoded API keys, tokens, or passwords
44
+ - [ ] All secrets in environment variables
45
+ - [ ] `.env.local` in .gitignore
46
+ - [ ] No secrets in git history
47
+ - [ ] Production secrets in hosting platform (Vercel, Railway)
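The "verify secrets exist" check in this section can be generalized into a small fail-fast helper. This is a minimal sketch rather than part of the original skill: `requireEnv`, the `Env` type, and the explicit `env` parameter are hypothetical names, and the environment is passed in so the function stays easy to test.

```typescript
// Hypothetical helper (not from the original skill): read a required secret
// from an environment-like record and fail fast when it is missing or blank.
type Env = Record<string, string | undefined>

function requireEnv(name: string, env: Env): string {
  const value = env[name]
  if (value === undefined || value.trim() === '') {
    // Report only the variable name, never a secret value
    throw new Error(`${name} not configured`)
  }
  return value
}
```

In a Node.js service this would typically be called once at startup, for example `requireEnv('OPENAI_API_KEY', process.env)`, so a misconfigured deployment fails before it serves traffic.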
48
+
49
+ ### 2. Input Validation
50
+
51
+ #### Always Validate User Input
52
+ ```typescript
53
+ import { z } from 'zod'
54
+
55
+ // Define validation schema
56
+ const CreateUserSchema = z.object({
57
+ email: z.string().email(),
58
+ name: z.string().min(1).max(100),
59
+ age: z.number().int().min(0).max(150)
60
+ })
61
+
62
+ // Validate before processing
63
+ export async function createUser(input: unknown) {
64
+ try {
65
+ const validated = CreateUserSchema.parse(input)
66
+ return await db.users.create(validated)
67
+ } catch (error) {
68
+ if (error instanceof z.ZodError) {
69
+ return { success: false, errors: error.issues }
70
+ }
71
+ throw error
72
+ }
73
+ }
74
+ ```
75
+
76
+ #### File Upload Validation
77
+ ```typescript
78
+ function validateFileUpload(file: File) {
79
+ // Size check (5MB max)
80
+ const maxSize = 5 * 1024 * 1024
81
+ if (file.size > maxSize) {
82
+ throw new Error('File too large (max 5MB)')
83
+ }
84
+
85
+ // Type check
86
+ const allowedTypes = ['image/jpeg', 'image/png', 'image/gif']
87
+ if (!allowedTypes.includes(file.type)) {
88
+ throw new Error('Invalid file type')
89
+ }
90
+
91
+ // Extension check
92
+ const allowedExtensions = ['.jpg', '.jpeg', '.png', '.gif']
93
+ const extension = file.name.toLowerCase().match(/\.[^.]+$/)?.[0]
94
+ if (!extension || !allowedExtensions.includes(extension)) {
95
+ throw new Error('Invalid file extension')
96
+ }
97
+
98
+ return true
99
+ }
100
+ ```
101
+
102
+ #### Verification Steps
103
+ - [ ] All user inputs validated with schemas
104
+ - [ ] File uploads restricted (size, type, extension)
105
+ - [ ] No direct use of user input in queries
106
+ - [ ] Whitelist validation (not blacklist)
107
+ - [ ] Error messages don't leak sensitive info
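The file upload check in this section relies on `file.type` and the file name, both of which are supplied by the client. One way to tighten the "type" item is to inspect the leading bytes of the file as well. The following is a sketch under that assumption; `sniffImageType` and `SIGNATURES` are illustrative names, and only the three image formats already allowed by the skill are covered.

```typescript
// Sketch: detect an image type from its leading bytes ("magic numbers")
// instead of trusting the client-supplied MIME type or file extension.
type ImageType = 'image/jpeg' | 'image/png' | 'image/gif'

const SIGNATURES: { type: ImageType; bytes: number[] }[] = [
  { type: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { type: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] } // "GIF8", prefix of GIF87a and GIF89a
]

function sniffImageType(data: Uint8Array): ImageType | null {
  for (let i = 0; i < SIGNATURES.length; i++) {
    const sig = SIGNATURES[i]
    if (data.length < sig.bytes.length) continue
    let matches = true
    for (let j = 0; j < sig.bytes.length; j++) {
      if (data[j] !== sig.bytes[j]) {
        matches = false
        break
      }
    }
    if (matches) return sig.type
  }
  // Unknown or truncated content is rejected by the caller
  return null
}
```

In a browser or edge runtime the first bytes could be read with `new Uint8Array(await file.slice(0, 8).arrayBuffer())`, and the upload rejected when the result is `null` or disagrees with `file.type`.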
108
+
109
+ ### 3. SQL Injection Prevention
110
+
111
+ #### FAIL: NEVER Concatenate SQL
112
+ ```typescript
113
+ // DANGEROUS - SQL Injection vulnerability
114
+ const query = `SELECT * FROM users WHERE email = '${userEmail}'`
115
+ await db.query(query)
116
+ ```
117
+
118
+ #### PASS: ALWAYS Use Parameterized Queries
119
+ ```typescript
120
+ // Safe - parameterized query
121
+ const { data } = await supabase
122
+ .from('users')
123
+ .select('*')
124
+ .eq('email', userEmail)
125
+
126
+ // Or with raw SQL
127
+ await db.query(
128
+ 'SELECT * FROM users WHERE email = $1',
129
+ [userEmail]
130
+ )
131
+ ```
132
+
133
+ #### Verification Steps
134
+ - [ ] All database queries use parameterized queries
135
+ - [ ] No string concatenation in SQL
136
+ - [ ] ORM/query builder used correctly
137
+ - [ ] Supabase queries properly sanitized
138
+
139
+ ### 4. Authentication & Authorization
140
+
141
+ #### JWT Token Handling
142
+ ```typescript
143
+ // FAIL: WRONG: localStorage (vulnerable to XSS)
144
+ localStorage.setItem('token', token)
145
+
146
+ // PASS: CORRECT: httpOnly cookies
147
+ res.setHeader('Set-Cookie',
148
+ `token=${token}; HttpOnly; Secure; SameSite=Strict; Max-Age=3600`)
149
+ ```
150
+
151
+ #### Authorization Checks
152
+ ```typescript
153
+ export async function deleteUser(userId: string, requesterId: string) {
154
+ // ALWAYS verify authorization first
155
+ const requester = await db.users.findUnique({
156
+ where: { id: requesterId }
157
+ })
158
+
159
+ if (!requester || requester.role !== 'admin') {
160
+ return NextResponse.json(
161
+ { error: 'Unauthorized' },
162
+ { status: 403 }
163
+ )
164
+ }
165
+
166
+ // Proceed with deletion
167
+ await db.users.delete({ where: { id: userId } })
168
+ }
169
+ ```
170
+
171
+ #### Row Level Security (Supabase)
172
+ ```sql
173
+ -- Enable RLS on all tables
174
+ ALTER TABLE users ENABLE ROW LEVEL SECURITY;
175
+
176
+ -- Users can only view their own data
177
+ CREATE POLICY "Users view own data"
178
+ ON users FOR SELECT
179
+ USING (auth.uid() = id);
180
+
181
+ -- Users can only update their own data
182
+ CREATE POLICY "Users update own data"
183
+ ON users FOR UPDATE
184
+ USING (auth.uid() = id);
185
+ ```
186
+
187
+ #### Verification Steps
188
+ - [ ] Tokens stored in httpOnly cookies (not localStorage)
189
+ - [ ] Authorization checks before sensitive operations
190
+ - [ ] Row Level Security enabled in Supabase
191
+ - [ ] Role-based access control implemented
192
+ - [ ] Session management secure
193
+
194
+ ### 5. XSS Prevention
195
+
196
+ #### Sanitize HTML
197
+ ```typescript
198
+ import DOMPurify from 'isomorphic-dompurify'
199
+
200
+ // ALWAYS sanitize user-provided HTML
201
+ function renderUserContent(html: string) {
202
+ const clean = DOMPurify.sanitize(html, {
203
+ ALLOWED_TAGS: ['b', 'i', 'em', 'strong', 'p'],
204
+ ALLOWED_ATTR: []
205
+ })
206
+ return <div dangerouslySetInnerHTML={{ __html: clean }} />
207
+ }
208
+ ```
209
+
210
+ #### Content Security Policy
211
+ ```typescript
212
+ // next.config.js
213
+ const securityHeaders = [
214
+ {
215
+ key: 'Content-Security-Policy',
216
+ value: `
217
+ default-src 'self';
218
+ script-src 'self' 'unsafe-eval' 'unsafe-inline';
219
+ style-src 'self' 'unsafe-inline';
220
+ img-src 'self' data: https:;
221
+ font-src 'self';
222
+ connect-src 'self' https://api.example.com;
223
+ `.replace(/\s{2,}/g, ' ').trim()
224
+ }
225
+ ]
226
+ ```
227
+
228
+ #### Verification Steps
229
+ - [ ] User-provided HTML sanitized
230
+ - [ ] CSP headers configured
231
+ - [ ] No unvalidated dynamic content rendering
232
+ - [ ] React's built-in XSS protection used
233
+
234
+ ### 6. CSRF Protection
235
+
236
+ #### CSRF Tokens
237
+ ```typescript
238
+ import { csrf } from '@/lib/csrf'
239
+
240
+ export async function POST(request: Request) {
241
+ const token = request.headers.get('X-CSRF-Token')
242
+
243
+ if (!csrf.verify(token)) {
244
+ return NextResponse.json(
245
+ { error: 'Invalid CSRF token' },
246
+ { status: 403 }
247
+ )
248
+ }
249
+
250
+ // Process request
251
+ }
252
+ ```
253
+
254
+ #### SameSite Cookies
255
+ ```typescript
256
+ res.setHeader('Set-Cookie',
257
+ `session=${sessionId}; HttpOnly; Secure; SameSite=Strict`)
258
+ ```
259
+
260
+ #### Verification Steps
261
+ - [ ] CSRF tokens on state-changing operations
262
+ - [ ] SameSite=Strict on all cookies
263
+ - [ ] Double-submit cookie pattern implemented
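The double-submit cookie item in this checklist has no accompanying example, so here is a minimal sketch of the comparison step. The function names are hypothetical, and the hand-written constant-time comparison is only there to keep the example dependency-free; in Node.js, `crypto.timingSafeEqual` is the usual choice.

```typescript
// Sketch of the double-submit check: the CSRF token sent in a cookie must
// equal the token echoed back in a request header. Names are illustrative.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false
  let diff = 0
  for (let i = 0; i < a.length; i++) {
    // Accumulate differences instead of returning at the first mismatch
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i)
  }
  return diff === 0
}

function verifyDoubleSubmit(
  cookieToken: string | null,
  headerToken: string | null
): boolean {
  // Reject when either side is missing or empty
  if (!cookieToken || !headerToken) return false
  return constantTimeEqual(cookieToken, headerToken)
}
```

A route handler would read one value from the request's cookies and the other from a header such as `X-CSRF-Token`, and return a 403 response when `verifyDoubleSubmit` is false.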
264
+
265
+ ### 7. Rate Limiting
266
+
267
+ #### API Rate Limiting
268
+ ```typescript
269
+ import rateLimit from 'express-rate-limit'
270
+
271
+ const limiter = rateLimit({
272
+ windowMs: 15 * 60 * 1000, // 15 minutes
273
+ max: 100, // 100 requests per window
274
+ message: 'Too many requests'
275
+ })
276
+
277
+ // Apply to routes
278
+ app.use('/api/', limiter)
279
+ ```
280
+
281
+ #### Expensive Operations
282
+ ```typescript
283
+ // Aggressive rate limiting for searches
284
+ const searchLimiter = rateLimit({
285
+ windowMs: 60 * 1000, // 1 minute
286
+ max: 10, // 10 requests per minute
287
+ message: 'Too many search requests'
288
+ })
289
+
290
+ app.use('/api/search', searchLimiter)
291
+ ```
292
+
293
+ #### Verification Steps
294
+ - [ ] Rate limiting on all API endpoints
295
+ - [ ] Stricter limits on expensive operations
296
+ - [ ] IP-based rate limiting
297
+ - [ ] User-based rate limiting (authenticated)
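To illustrate what IP-based and user-based limiting have in common, the sketch below keys a fixed-window counter by an arbitrary string. It is an assumption-laden illustration, not how `express-rate-limit` is implemented: `createLimiter` and `allow` are hypothetical names, and the state lives in memory only.

```typescript
// Sketch of a fixed-window counter keyed by user ID or IP address.
// In-memory only: state is lost on restart and not shared across instances.
interface WindowState {
  start: number
  count: number
}

function createLimiter(windowMs: number, max: number) {
  // Null-prototype object so keys such as "__proto__" are stored as plain data
  const windows: Record<string, WindowState> = Object.create(null)

  // Returns true when the request is allowed, false when it should get a 429
  return function allow(key: string, now: number): boolean {
    const state = windows[key]
    if (!state || now - state.start >= windowMs) {
      windows[key] = { start: now, count: 1 }
      return true
    }
    state.count += 1
    return state.count <= max
  }
}
```

For authenticated traffic the key would be the user ID; for anonymous traffic, the client IP. A production setup would normally keep the counters in a shared store so that limits hold across server instances.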
298
+
299
+ ### 8. Sensitive Data Exposure
300
+
301
+ #### Logging
302
+ ```typescript
303
+ // FAIL: WRONG: Logging sensitive data
304
+ console.log('User login:', { email, password })
305
+ console.log('Payment:', { cardNumber, cvv })
306
+
307
+ // PASS: CORRECT: Redact sensitive data
308
+ console.log('User login:', { email, userId })
309
+ console.log('Payment:', { last4: card.last4, userId })
310
+ ```
311
+
312
+ #### Error Messages
313
+ ```typescript
314
+ // FAIL: WRONG: Exposing internal details
315
+ catch (error) {
316
+ return NextResponse.json(
317
+ { error: error.message, stack: error.stack },
318
+ { status: 500 }
319
+ )
320
+ }
321
+
322
+ // PASS: CORRECT: Generic error messages
323
+ catch (error) {
324
+ console.error('Internal error:', error)
325
+ return NextResponse.json(
326
+ { error: 'An error occurred. Please try again.' },
327
+ { status: 500 }
328
+ )
329
+ }
330
+ ```
331
+
332
+ #### Verification Steps
333
+ - [ ] No passwords, tokens, or secrets in logs
334
+ - [ ] Error messages generic for users
335
+ - [ ] Detailed errors only in server logs
336
+ - [ ] No stack traces exposed to users
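The logging guidance in this section can be enforced mechanically by redacting known-sensitive keys before anything is written to a log. This is a sketch for plain JSON-like data; `redact` and `SENSITIVE_KEYS` are hypothetical names, and the key list is an assumption to adapt to your own data model.

```typescript
// Sketch: replace the values of sensitive keys before an object is logged.
// Intended for plain JSON-like data (objects, arrays, primitives).
const SENSITIVE_KEYS = ['password', 'token', 'secret', 'cardnumber', 'cvv', 'authorization']

function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => redact(item))
  if (value === null || typeof value !== 'object') return value

  const source = value as Record<string, unknown>
  // Null-prototype result so an own "__proto__" key is copied as plain data
  const result: Record<string, unknown> = Object.create(null)
  const keys = Object.keys(source)
  for (let i = 0; i < keys.length; i++) {
    const key = keys[i]
    const sensitive = SENSITIVE_KEYS.indexOf(key.toLowerCase()) !== -1
    result[key] = sensitive ? '[REDACTED]' : redact(source[key])
  }
  return result
}
```

A call such as `console.log('User login:', redact({ email, password }))` would then log the email while masking the password, even if a sensitive field is passed by mistake.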
337
+
338
+ ### 9. Blockchain Security (Solana)
339
+
340
+ #### Wallet Verification
341
+ ```typescript
342
+ import nacl from 'tweetnacl' // @solana/web3.js does not export a `verify` function
343
+ import { PublicKey } from '@solana/web3.js'
344
+ async function verifyWalletOwnership(
345
+ publicKey: string,
346
+ signature: string,
347
+ message: string
348
+ ) {
349
+ try {
350
+ const isValid = nacl.sign.detached.verify(
351
+ Buffer.from(message),
352
+ Buffer.from(signature, 'base64'),
353
+ new PublicKey(publicKey).toBytes() // assumes a base58-encoded wallet address
354
+ )
355
+ return isValid
356
+ } catch (error) {
357
+ return false
358
+ }
359
+ }
360
+ ```
361
+
362
+ #### Transaction Verification
363
+ ```typescript
364
+ async function verifyTransaction(transaction: Transaction) {
365
+ // Verify recipient
366
+ if (transaction.to !== expectedRecipient) {
367
+ throw new Error('Invalid recipient')
368
+ }
369
+
370
+ // Verify amount
371
+ if (transaction.amount > maxAmount) {
372
+ throw new Error('Amount exceeds limit')
373
+ }
374
+
375
+ // Verify user has sufficient balance
376
+ const balance = await getBalance(transaction.from)
377
+ if (balance < transaction.amount) {
378
+ throw new Error('Insufficient balance')
379
+ }
380
+
381
+ return true
382
+ }
383
+ ```
384
+
385
+ #### Verification Steps
386
+ - [ ] Wallet signatures verified
387
+ - [ ] Transaction details validated
388
+ - [ ] Balance checks before transactions
389
+ - [ ] No blind transaction signing
390
+
391
+ ### 10. Dependency Security
392
+
393
+ #### Regular Updates
394
+ ```bash
395
+ # Check for vulnerabilities
396
+ npm audit
397
+
398
+ # Fix automatically fixable issues
399
+ npm audit fix
400
+
401
+ # Update dependencies
402
+ npm update
403
+
404
+ # Check for outdated packages
405
+ npm outdated
406
+ ```
407
+
408
+ #### Lock Files
409
+ ```bash
410
+ # ALWAYS commit lock files
411
+ git add package-lock.json
412
+
413
+ # Use in CI/CD for reproducible builds
414
+ npm ci # Instead of npm install
415
+ ```
416
+
417
+ #### Verification Steps
418
+ - [ ] Dependencies up to date
419
+ - [ ] No known vulnerabilities (npm audit clean)
420
+ - [ ] Lock files committed
421
+ - [ ] Dependabot enabled on GitHub
422
+ - [ ] Regular security updates
423
+
424
+ ## Security Testing
425
+
426
+ ### Automated Security Tests
427
+ ```typescript
428
+ // Test authentication
429
+ test('requires authentication', async () => {
430
+ const response = await fetch('/api/protected')
431
+ expect(response.status).toBe(401)
432
+ })
433
+
434
+ // Test authorization
435
+ test('requires admin role', async () => {
436
+ const response = await fetch('/api/admin', {
437
+ headers: { Authorization: `Bearer ${userToken}` }
438
+ })
439
+ expect(response.status).toBe(403)
440
+ })
441
+
442
+ // Test input validation
443
+ test('rejects invalid input', async () => {
444
+ const response = await fetch('/api/users', {
445
+ method: 'POST',
446
+ body: JSON.stringify({ email: 'not-an-email' })
447
+ })
448
+ expect(response.status).toBe(400)
449
+ })
450
+
451
+ // Test rate limiting
452
+ test('enforces rate limits', async () => {
453
+ const requests = Array(101).fill(null).map(() =>
454
+ fetch('/api/endpoint')
455
+ )
456
+
457
+ const responses = await Promise.all(requests)
458
+ const tooManyRequests = responses.filter(r => r.status === 429)
459
+
460
+ expect(tooManyRequests.length).toBeGreaterThan(0)
461
+ })
462
+ ```
463
+
464
+ ## Pre-Deployment Security Checklist
465
+
466
+ Before ANY production deployment:
467
+
468
+ - [ ] **Secrets**: No hardcoded secrets, all in env vars
469
+ - [ ] **Input Validation**: All user inputs validated
470
+ - [ ] **SQL Injection**: All queries parameterized
471
+ - [ ] **XSS**: User content sanitized
472
+ - [ ] **CSRF**: Protection enabled
473
+ - [ ] **Authentication**: Proper token handling
474
+ - [ ] **Authorization**: Role checks in place
475
+ - [ ] **Rate Limiting**: Enabled on all endpoints
476
+ - [ ] **HTTPS**: Enforced in production
477
+ - [ ] **Security Headers**: CSP, X-Frame-Options configured
478
+ - [ ] **Error Handling**: No sensitive data in errors
479
+ - [ ] **Logging**: No sensitive data logged
480
+ - [ ] **Dependencies**: Up to date, no vulnerabilities
481
+ - [ ] **Row Level Security**: Enabled in Supabase
482
+ - [ ] **CORS**: Properly configured
483
+ - [ ] **File Uploads**: Validated (size, type)
484
+ - [ ] **Wallet Signatures**: Verified (if blockchain)
485
+
486
+ ## Resources
487
+
488
+ - [OWASP Top 10](https://owasp.org/www-project-top-ten/)
489
+ - [Next.js Security](https://nextjs.org/docs/security)
490
+ - [Supabase Security](https://supabase.com/docs/guides/auth)
491
+ - [Web Security Academy](https://portswigger.net/web-security)
492
+
493
+ ---
494
+
495
+ **Remember**: Security is not optional. One vulnerability can compromise the entire platform. When in doubt, err on the side of caution.
.agents/skills/security-review/cloud-infrastructure-security.md ADDED
@@ -0,0 +1,361 @@
1
+ | name | description |
2
+ |------|-------------|
3
+ | cloud-infrastructure-security | Use this skill when deploying to cloud platforms, configuring infrastructure, managing IAM policies, setting up logging/monitoring, or implementing CI/CD pipelines. Provides cloud security checklist aligned with best practices. |
4
+
5
+ # Cloud & Infrastructure Security Skill
6
+
7
+ This skill ensures cloud infrastructure, CI/CD pipelines, and deployment configurations follow security best practices and comply with industry standards.
8
+
9
+ ## When to Activate
10
+
11
+ - Deploying applications to cloud platforms (AWS, Vercel, Railway, Cloudflare)
12
+ - Configuring IAM roles and permissions
13
+ - Setting up CI/CD pipelines
14
+ - Implementing infrastructure as code (Terraform, CloudFormation)
15
+ - Configuring logging and monitoring
16
+ - Managing secrets in cloud environments
17
+ - Setting up CDN and edge security
18
+ - Implementing disaster recovery and backup strategies
19
+
20
+ ## Cloud Security Checklist
21
+
22
+ ### 1. IAM & Access Control
23
+
24
+ #### Principle of Least Privilege
25
+
26
+ ```yaml
27
+ # PASS: CORRECT: Minimal permissions
28
+ iam_role:
29
+ permissions:
30
+ - s3:GetObject # Only read access
31
+ - s3:ListBucket
32
+ resources:
33
+ - arn:aws:s3:::my-bucket/* # Specific bucket only
34
+
35
+ # FAIL: WRONG: Overly broad permissions
36
+ iam_role:
37
+ permissions:
38
+ - s3:* # All S3 actions
39
+ resources:
40
+ - "*" # All resources
41
+ ```
42
+
43
+ #### Multi-Factor Authentication (MFA)
44
+
45
+ ```bash
46
+ # ALWAYS enable MFA for root/admin accounts
47
+ aws iam enable-mfa-device \
48
+ --user-name admin \
49
+ --serial-number arn:aws:iam::123456789:mfa/admin \
50
+ --authentication-code1 123456 \
51
+ --authentication-code2 789012
52
+ ```
53
+
54
+ #### Verification Steps
55
+
56
+ - [ ] No root account usage in production
57
+ - [ ] MFA enabled for all privileged accounts
58
+ - [ ] Service accounts use roles, not long-lived credentials
59
+ - [ ] IAM policies follow least privilege
60
+ - [ ] Regular access reviews conducted
61
+ - [ ] Unused credentials rotated or removed
62
+
63
+ ### 2. Secrets Management
64
+
65
+ #### Cloud Secrets Managers
66
+
67
+ ```typescript
68
+ // PASS: CORRECT: Use cloud secrets manager
69
+ import { SecretsManager } from '@aws-sdk/client-secrets-manager';
70
+
71
+ const client = new SecretsManager({ region: 'us-east-1' });
72
+ const secret = await client.getSecretValue({ SecretId: 'prod/api-key' });
73
+ const apiKey = JSON.parse(secret.SecretString).key;
74
+
75
+ // FAIL: WRONG: Hardcoded or in environment variables only
76
+ const apiKey = process.env.API_KEY; // Not rotated, not audited
77
+ ```
78
+
79
+ #### Secrets Rotation
80
+
81
+ ```bash
82
+ # Set up automatic rotation for database credentials
83
+ aws secretsmanager rotate-secret \
84
+ --secret-id prod/db-password \
85
+ --rotation-lambda-arn arn:aws:lambda:region:account:function:rotate \
86
+ --rotation-rules AutomaticallyAfterDays=30
87
+ ```
88
+
89
+ #### Verification Steps
90
+
91
+ - [ ] All secrets stored in cloud secrets manager (AWS Secrets Manager, Vercel Secrets)
92
+ - [ ] Automatic rotation enabled for database credentials
93
+ - [ ] API keys rotated at least quarterly
94
+ - [ ] No secrets in code, logs, or error messages
95
+ - [ ] Audit logging enabled for secret access
96
+
97
+ ### 3. Network Security
98
+
99
+ #### VPC and Firewall Configuration
100
+
101
+ ```terraform
102
+ # PASS: CORRECT: Restricted security group
103
+ resource "aws_security_group" "app" {
104
+ name = "app-sg"
105
+
106
+ ingress {
107
+ from_port = 443
108
+ to_port = 443
109
+ protocol = "tcp"
110
+ cidr_blocks = ["10.0.0.0/16"] # Internal VPC only
111
+ }
112
+
113
+ egress {
114
+ from_port = 443
115
+ to_port = 443
116
+ protocol = "tcp"
117
+ cidr_blocks = ["0.0.0.0/0"] # Only HTTPS outbound
118
+ }
119
+ }
120
+
121
+ # FAIL: WRONG: Open to the internet
122
+ resource "aws_security_group" "bad" {
123
+ ingress {
124
+ from_port = 0
125
+ to_port = 65535
126
+ protocol = "tcp"
127
+ cidr_blocks = ["0.0.0.0/0"] # All ports, all IPs!
128
+ }
129
+ }
130
+ ```
131
+
132
+ #### Verification Steps
133
+
134
+ - [ ] Database not publicly accessible
135
+ - [ ] SSH/RDP ports restricted to VPN/bastion only
136
+ - [ ] Security groups follow least privilege
137
+ - [ ] Network ACLs configured
138
+ - [ ] VPC flow logs enabled
139
+
140
+ ### 4. Logging & Monitoring
141
+
142
+ #### CloudWatch/Logging Configuration
143
+
144
+ ```typescript
145
+ // PASS: CORRECT: Comprehensive logging
146
+ import { CloudWatchLogs } from '@aws-sdk/client-cloudwatch-logs';
147
+ const cloudwatch = new CloudWatchLogs({ region: 'us-east-1' }); // aggregated client exposes putLogEvents()
148
+ const logSecurityEvent = async (event: SecurityEvent) => {
149
+ await cloudwatch.putLogEvents({
150
+ logGroupName: '/aws/security/events',
151
+ logStreamName: 'authentication',
152
+ logEvents: [{
153
+ timestamp: Date.now(),
154
+ message: JSON.stringify({
155
+ type: event.type,
156
+ userId: event.userId,
157
+ ip: event.ip,
158
+ result: event.result,
159
+ // Never log sensitive data
160
+ })
161
+ }]
162
+ });
163
+ };
164
+ ```
165
+
166
+ #### Verification Steps
167
+
168
+ - [ ] CloudWatch/logging enabled for all services
169
+ - [ ] Failed authentication attempts logged
170
+ - [ ] Admin actions audited
171
+ - [ ] Log retention configured (90+ days for compliance)
172
+ - [ ] Alerts configured for suspicious activity
173
+ - [ ] Logs centralized and tamper-proof
174
+
175
+ ### 5. CI/CD Pipeline Security
176
+
177
+ #### Secure Pipeline Configuration
178
+
179
+ ```yaml
180
+ # PASS: CORRECT: Secure GitHub Actions workflow
181
+ name: Deploy
182
+
183
+ on:
184
+ push:
185
+ branches: [main]
186
+
187
+ jobs:
188
+ deploy:
189
+ runs-on: ubuntu-latest
190
+ permissions:
191
+ contents: read # Minimal permissions
192
+
193
+ steps:
194
+ - uses: actions/checkout@v4
195
+
196
+ # Scan for secrets
197
+ - name: Secret scanning
198
+ uses: trufflesecurity/trufflehog@main
199
+
200
+ # Dependency audit
201
+ - name: Audit dependencies
202
+ run: npm audit --audit-level=high
203
+
204
+ # Use OIDC, not long-lived tokens
205
+ - name: Configure AWS credentials
206
+ uses: aws-actions/configure-aws-credentials@v4
207
+ with:
208
+ role-to-assume: arn:aws:iam::123456789:role/GitHubActionsRole
209
+ aws-region: us-east-1
210
+ ```
211
+
212
+ #### Supply Chain Security
213
+
214
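+ ```jsonc
215
+ // package.json (comments are illustrative only; a real package.json must be strict JSON)
216
+ {
217
+ "scripts": {
218
+ "ci:install": "npm ci", // Reproducible installs; not named "install", which is a lifecycle hook that npm ci itself runs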
219
+ "audit": "npm audit --audit-level=moderate",
220
+ "check": "npm outdated"
221
+ }
222
+ }
223
+ ```
224
+
225
+ #### Verification Steps
226
+
227
+ - [ ] OIDC used instead of long-lived credentials
228
+ - [ ] Secrets scanning in pipeline
229
+ - [ ] Dependency vulnerability scanning
230
+ - [ ] Container image scanning (if applicable)
231
+ - [ ] Branch protection rules enforced
232
+ - [ ] Code review required before merge
233
+ - [ ] Signed commits enforced
234
+
235
+ ### 6. Cloudflare & CDN Security
236
+
237
+ #### Cloudflare Security Configuration
238
+
239
+ ```typescript
240
+ // PASS: CORRECT: Cloudflare Workers with security headers
241
+ export default {
242
+ async fetch(request: Request): Promise<Response> {
243
+ const response = await fetch(request);
244
+
245
+ // Add security headers
246
+ const headers = new Headers(response.headers);
247
+ headers.set('X-Frame-Options', 'DENY');
248
+ headers.set('X-Content-Type-Options', 'nosniff');
249
+ headers.set('Referrer-Policy', 'strict-origin-when-cross-origin');
250
+ headers.set('Permissions-Policy', 'geolocation=(), microphone=()');
251
+
252
+ return new Response(response.body, {
253
+ status: response.status,
254
+ headers
255
+ });
256
+ }
257
+ };
258
+ ```
259
+
260
+ #### WAF Rules
261
+
262
+ ```bash
263
+ # Enable Cloudflare WAF managed rules
264
+ # - OWASP Core Ruleset
265
+ # - Cloudflare Managed Ruleset
266
+ # - Rate limiting rules
267
+ # - Bot protection
268
+ ```
269
+
270
+ #### Verification Steps
271
+
272
+ - [ ] WAF enabled with OWASP rules
273
+ - [ ] Rate limiting configured
274
+ - [ ] Bot protection active
275
+ - [ ] DDoS protection enabled
276
+ - [ ] Security headers configured
277
+ - [ ] SSL/TLS strict mode enabled
278
+
279
+ ### 7. Backup & Disaster Recovery
280
+
281
+ #### Automated Backups
282
+
283
+ ```terraform
284
+ # PASS: CORRECT: Automated RDS backups
285
+ resource "aws_db_instance" "main" {
286
+ allocated_storage = 20
287
+ engine = "postgres"
288
+
289
+ backup_retention_period = 30 # 30 days retention
290
+ backup_window = "03:00-04:00"
291
+ maintenance_window = "mon:04:00-mon:05:00"
292
+
293
+ enabled_cloudwatch_logs_exports = ["postgresql"]
294
+
295
+ deletion_protection = true # Prevent accidental deletion
296
+ }
297
+ ```
298
+
299
+ #### Verification Steps
300
+
301
+ - [ ] Automated daily backups configured
302
+ - [ ] Backup retention meets compliance requirements
303
+ - [ ] Point-in-time recovery enabled
304
+ - [ ] Backup testing performed quarterly
305
+ - [ ] Disaster recovery plan documented
306
+ - [ ] RPO and RTO defined and tested
307
+
308
+ ## Pre-Deployment Cloud Security Checklist
309
+
310
+ Before ANY production cloud deployment:
311
+
312
+ - [ ] **IAM**: Root account not used, MFA enabled, least privilege policies
313
+ - [ ] **Secrets**: All secrets in cloud secrets manager with rotation
314
+ - [ ] **Network**: Security groups restricted, no public databases
315
+ - [ ] **Logging**: CloudWatch/logging enabled with retention
316
+ - [ ] **Monitoring**: Alerts configured for anomalies
317
+ - [ ] **CI/CD**: OIDC auth, secrets scanning, dependency audits
318
+ - [ ] **CDN/WAF**: Cloudflare WAF enabled with OWASP rules
319
+ - [ ] **Encryption**: Data encrypted at rest and in transit
320
+ - [ ] **Backups**: Automated backups with tested recovery
321
+ - [ ] **Compliance**: GDPR/HIPAA requirements met (if applicable)
322
+ - [ ] **Documentation**: Infrastructure documented, runbooks created
323
+ - [ ] **Incident Response**: Security incident plan in place
324
+
325
+ ## Common Cloud Security Misconfigurations
326
+
327
+ ### S3 Bucket Exposure
328
+
329
+ ```bash
330
+ # FAIL: WRONG: Public bucket
331
+ aws s3api put-bucket-acl --bucket my-bucket --acl public-read
332
+
333
+ # PASS: CORRECT: Private bucket with specific access
334
+ aws s3api put-bucket-acl --bucket my-bucket --acl private
335
+ aws s3api put-bucket-policy --bucket my-bucket --policy file://policy.json
336
+ ```
337
+
338
+ ### RDS Public Access
339
+
340
+ ```terraform
341
+ # FAIL: WRONG
342
+ resource "aws_db_instance" "bad" {
343
+ publicly_accessible = true # NEVER do this!
344
+ }
345
+
346
+ # PASS: CORRECT
347
+ resource "aws_db_instance" "good" {
348
+ publicly_accessible = false
349
+ vpc_security_group_ids = [aws_security_group.db.id]
350
+ }
351
+ ```
352
+
353
+ ## Resources
354
+
355
+ - [AWS Security Best Practices](https://aws.amazon.com/security/best-practices/)
356
+ - [CIS AWS Foundations Benchmark](https://www.cisecurity.org/benchmark/amazon_web_services)
357
+ - [Cloudflare Security Documentation](https://developers.cloudflare.com/security/)
358
+ - [OWASP Cloud Security](https://owasp.org/www-project-cloud-security/)
359
+ - [Terraform Security Best Practices](https://www.terraform.io/docs/cloud/guides/recommended-practices/)
360
+
361
+ **Remember**: Cloud misconfigurations are the leading cause of data breaches. A single exposed S3 bucket or overly permissive IAM policy can compromise your entire infrastructure. Always follow the principle of least privilege and defense in depth.
.agents/skills/subagent-driven-development/SKILL.md ADDED
@@ -0,0 +1,277 @@
1
+ ---
2
+ name: subagent-driven-development
3
+ description: Use when executing implementation plans with independent tasks in the current session
4
+ ---
5
+
6
+ # Subagent-Driven Development
7
+
8
+ Execute plan by dispatching fresh subagent per task, with two-stage review after each: spec compliance review first, then code quality review.
9
+
10
+ **Why subagents:** You delegate tasks to specialized agents with isolated context. By precisely crafting their instructions and context, you ensure they stay focused and succeed at their task. They should never inherit your session's context or history — you construct exactly what they need. This also preserves your own context for coordination work.
11
+
12
+ **Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration
13
+
14
+ ## When to Use
15
+
16
+ ```dot
17
+ digraph when_to_use {
18
+ "Have implementation plan?" [shape=diamond];
19
+ "Tasks mostly independent?" [shape=diamond];
20
+ "Stay in this session?" [shape=diamond];
21
+ "subagent-driven-development" [shape=box];
22
+ "executing-plans" [shape=box];
23
+ "Manual execution or brainstorm first" [shape=box];
24
+
25
+ "Have implementation plan?" -> "Tasks mostly independent?" [label="yes"];
26
+ "Have implementation plan?" -> "Manual execution or brainstorm first" [label="no"];
27
+ "Tasks mostly independent?" -> "Stay in this session?" [label="yes"];
28
+ "Tasks mostly independent?" -> "Manual execution or brainstorm first" [label="no - tightly coupled"];
29
+ "Stay in this session?" -> "subagent-driven-development" [label="yes"];
30
+ "Stay in this session?" -> "executing-plans" [label="no - parallel session"];
31
+ }
32
+ ```
33
+
34
+ **vs. Executing Plans (parallel session):**
35
+ - Same session (no context switch)
36
+ - Fresh subagent per task (no context pollution)
37
+ - Two-stage review after each task: spec compliance first, then code quality
38
+ - Faster iteration (no human-in-loop between tasks)
39
+
40
+ ## The Process
41
+
42
+ ```dot
43
+ digraph process {
44
+ rankdir=TB;
45
+
46
+ subgraph cluster_per_task {
47
+ label="Per Task";
48
+ "Dispatch implementer subagent (./implementer-prompt.md)" [shape=box];
49
+ "Implementer subagent asks questions?" [shape=diamond];
50
+ "Answer questions, provide context" [shape=box];
51
+ "Implementer subagent implements, tests, commits, self-reviews" [shape=box];
52
+ "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)" [shape=box];
53
+ "Spec reviewer subagent confirms code matches spec?" [shape=diamond];
54
+ "Implementer subagent fixes spec gaps" [shape=box];
55
+ "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" [shape=box];
56
+ "Code quality reviewer subagent approves?" [shape=diamond];
57
+ "Implementer subagent fixes quality issues" [shape=box];
58
+ "Mark task complete in TodoWrite" [shape=box];
59
+ }
60
+
61
+ "Read plan, extract all tasks with full text, note context, create TodoWrite" [shape=box];
62
+ "More tasks remain?" [shape=diamond];
63
+ "Dispatch final code reviewer subagent for entire implementation" [shape=box];
64
+ "Use superpowers:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen];
65
+
66
+ "Read plan, extract all tasks with full text, note context, create TodoWrite" -> "Dispatch implementer subagent (./implementer-prompt.md)";
67
+ "Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?";
68
+ "Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"];
69
+ "Answer questions, provide context" -> "Dispatch implementer subagent (./implementer-prompt.md)";
70
+ "Implementer subagent asks questions?" -> "Implementer subagent implements, tests, commits, self-reviews" [label="no"];
71
+ "Implementer subagent implements, tests, commits, self-reviews" -> "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)";
72
+ "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)" -> "Spec reviewer subagent confirms code matches spec?";
73
+ "Spec reviewer subagent confirms code matches spec?" -> "Implementer subagent fixes spec gaps" [label="no"];
74
+ "Implementer subagent fixes spec gaps" -> "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)" [label="re-review"];
75
+ "Spec reviewer subagent confirms code matches spec?" -> "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" [label="yes"];
76
+ "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" -> "Code quality reviewer subagent approves?";
77
+ "Code quality reviewer subagent approves?" -> "Implementer subagent fixes quality issues" [label="no"];
78
+ "Implementer subagent fixes quality issues" -> "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" [label="re-review"];
79
+ "Code quality reviewer subagent approves?" -> "Mark task complete in TodoWrite" [label="yes"];
80
+ "Mark task complete in TodoWrite" -> "More tasks remain?";
81
+ "More tasks remain?" -> "Dispatch implementer subagent (./implementer-prompt.md)" [label="yes"];
82
+ "More tasks remain?" -> "Dispatch final code reviewer subagent for entire implementation" [label="no"];
83
+ "Dispatch final code reviewer subagent for entire implementation" -> "Use superpowers:finishing-a-development-branch";
84
+ }
85
+ ```
86
+
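The per-task portion of the flow above can be sketched as a small control loop. This is a minimal illustration, not the skill's implementation: `run_task`, `TaskLog`, and the four callables are hypothetical stand-ins for dispatching the implementer and the two reviewer subagents.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TaskLog:
    """Records the order of steps taken for one task (illustration only)."""
    steps: List[str] = field(default_factory=list)


def run_task(
    implement: Callable[[], None],
    spec_ok: Callable[[], bool],
    quality_ok: Callable[[], bool],
    fix: Callable[[str], None],
) -> TaskLog:
    """Run one task through implementation and the two review stages.

    All four callables are hypothetical hooks: `implement` stands for the
    implementer subagent, `spec_ok` and `quality_ok` for the reviewers,
    and `fix` for sending issues back to the implementer.
    """
    log = TaskLog()
    implement()
    log.steps.append("implement")

    # Stage 1: spec compliance must pass before code quality review starts.
    while not spec_ok():
        log.steps.append("spec:fail")
        fix("spec")
    log.steps.append("spec:pass")

    # Stage 2: code quality review, with the same fix and re-review loop.
    while not quality_ok():
        log.steps.append("quality:fail")
        fix("quality")
    log.steps.append("quality:pass")

    log.steps.append("complete")
    return log
```

The loops are unbounded, as in the diagram. A real controller would probably cap the number of review rounds and escalate to the human once the cap is reached.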
87
+ ## Model Selection
88
+
89
+ Use the least powerful model that can handle each role to conserve cost and increase speed.
90
+
91
+ **Mechanical implementation tasks** (isolated functions, clear specs, 1-2 files): use a fast, cheap model. Most implementation tasks are mechanical when the plan is well-specified.
92
+
93
+ **Integration and judgment tasks** (multi-file coordination, pattern matching, debugging): use a standard model.
94
+
95
+ **Architecture, design, and review tasks**: use the most capable available model.
96
+
97
+ **Task complexity signals:**
98
+ - Touches 1-2 files with a complete spec → cheap model
99
+ - Touches multiple files with integration concerns → standard model
100
+ - Requires design judgment or broad codebase understanding → most capable model
101
+
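The complexity signals above can be expressed as a small decision function. This is only a sketch: `Tier` and `pick_tier` are hypothetical names, and treating a small task with an incomplete spec as "standard" is an assumption the text does not state.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap"
    STANDARD = "standard"
    MOST_CAPABLE = "most capable"


def pick_tier(files_touched: int, spec_complete: bool, needs_design_judgment: bool) -> Tier:
    """Map the task complexity signals to a model tier."""
    # Design judgment or broad codebase understanding outranks everything else.
    if needs_design_judgment:
        return Tier.MOST_CAPABLE
    # One or two files with a complete spec is a mechanical task.
    if files_touched <= 2 and spec_complete:
        return Tier.CHEAP
    # Assumption: anything else (more files, or an incomplete spec) is standard.
    return Tier.STANDARD
```

For example, `pick_tier(1, True, False)` returns `Tier.CHEAP`, while `pick_tier(5, True, False)` returns `Tier.STANDARD`.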
102
+ ## Handling Implementer Status
103
+
104
+ Implementer subagents report one of four statuses. Handle each appropriately:
105
+
106
+ **DONE:** Proceed to spec compliance review.
107
+
108
+ **DONE_WITH_CONCERNS:** The implementer completed the work but flagged doubts. Read the concerns before proceeding. If the concerns are about correctness or scope, address them before review. If they're observations (e.g., "this file is getting large"), note them and proceed to review.
109
+
110
+ **NEEDS_CONTEXT:** The implementer needs information that wasn't provided. Provide the missing context and re-dispatch.
111
+
112
+ **BLOCKED:** The implementer cannot complete the task. Assess the blocker:
113
+ 1. If it's a context problem, provide more context and re-dispatch with the same model
114
+ 2. If the task requires more reasoning, re-dispatch with a more capable model
115
+ 3. If the task is too large, break it into smaller pieces
116
+ 4. If the plan itself is wrong, escalate to the human
117
+
118
+ **Never** ignore an escalation or force the same model to retry without changes. If the implementer said it's stuck, something needs to change.
119
+
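The status handling above amounts to a lookup from status to next action. The sketch below shows one way a controller could encode it. The names `Status`, `Blocker`, and `next_action` are hypothetical, and the returned strings are labels for illustration rather than real commands.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    DONE = "DONE"
    DONE_WITH_CONCERNS = "DONE_WITH_CONCERNS"
    NEEDS_CONTEXT = "NEEDS_CONTEXT"
    BLOCKED = "BLOCKED"


class Blocker(Enum):
    MISSING_CONTEXT = "missing context"
    NEEDS_MORE_REASONING = "needs more reasoning"
    TASK_TOO_LARGE = "task too large"
    PLAN_IS_WRONG = "plan is wrong"


# One remedy per blocker type, mirroring the numbered list above.
BLOCKER_REMEDY = {
    Blocker.MISSING_CONTEXT: "provide more context, re-dispatch with the same model",
    Blocker.NEEDS_MORE_REASONING: "re-dispatch with a more capable model",
    Blocker.TASK_TOO_LARGE: "break the task into smaller pieces",
    Blocker.PLAN_IS_WRONG: "escalate to the human",
}


def next_action(
    status: Status,
    concerns_block_review: bool = False,
    blocker: Optional[Blocker] = None,
) -> str:
    """Return the controller's next step for a reported implementer status."""
    if status is Status.DONE:
        return "proceed to spec compliance review"
    if status is Status.DONE_WITH_CONCERNS:
        # Correctness or scope concerns are resolved first; observations are noted.
        if concerns_block_review:
            return "address concerns before review"
        return "note concerns, proceed to spec compliance review"
    if status is Status.NEEDS_CONTEXT:
        return "provide missing context, re-dispatch"
    if status is Status.BLOCKED:
        # A blocked task never retries unchanged: the blocker must be assessed.
        if blocker is None:
            raise ValueError("BLOCKED requires an assessed blocker")
        return BLOCKER_REMEDY[blocker]
    raise ValueError(f"unknown status: {status!r}")
```

Raising on `BLOCKED` without an assessed blocker is a deliberate choice in this sketch: it makes the "something needs to change" rule hard to skip.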
120
+ ## Prompt Templates
121
+
122
+ - `./implementer-prompt.md` - Dispatch implementer subagent
123
+ - `./spec-reviewer-prompt.md` - Dispatch spec compliance reviewer subagent
124
+ - `./code-quality-reviewer-prompt.md` - Dispatch code quality reviewer subagent
125
+
126
+ ## Example Workflow
127
+
128
+ ```
129
+ You: I'm using Subagent-Driven Development to execute this plan.
130
+
131
+ [Read plan file once: docs/superpowers/plans/feature-plan.md]
132
+ [Extract all 5 tasks with full text and context]
133
+ [Create TodoWrite with all tasks]
134
+
135
+ Task 1: Hook installation script
136
+
137
+ [Get Task 1 text and context (already extracted)]
138
+ [Dispatch implementation subagent with full task text + context]
139
+
140
+ Implementer: "Before I begin - should the hook be installed at user or system level?"
141
+
142
+ You: "User level (~/.config/superpowers/hooks/)"
143
+
144
+ Implementer: "Got it. Implementing now..."
145
+ [Later] Implementer:
146
+ - Implemented install-hook command
147
+ - Added tests, 5/5 passing
148
+ - Self-review: Found I missed --force flag, added it
149
+ - Committed
150
+
151
+ [Dispatch spec compliance reviewer]
152
+ Spec reviewer: ✅ Spec compliant - all requirements met, nothing extra
153
+
154
+ [Get git SHAs, dispatch code quality reviewer]
155
+ Code reviewer: Strengths: Good test coverage, clean. Issues: None. Approved.
156
+
157
+ [Mark Task 1 complete]
158
+
159
+ Task 2: Recovery modes
160
+
161
+ [Get Task 2 text and context (already extracted)]
162
+ [Dispatch implementation subagent with full task text + context]
163
+
164
+ Implementer: [No questions, proceeds]
165
+ Implementer:
166
+ - Added verify/repair modes
167
+ - 8/8 tests passing
168
+ - Self-review: All good
169
+ - Committed
170
+
171
+ [Dispatch spec compliance reviewer]
172
+ Spec reviewer: ❌ Issues:
173
+ - Missing: Progress reporting (spec says "report every 100 items")
174
+ - Extra: Added --json flag (not requested)
175
+
176
+ [Implementer fixes issues]
177
+ Implementer: Removed --json flag, added progress reporting
178
+
179
+ [Spec reviewer reviews again]
180
+ Spec reviewer: ✅ Spec compliant now
181
+
182
+ [Dispatch code quality reviewer]
183
+ Code reviewer: Strengths: Solid. Issues (Important): Magic number (100)
184
+
185
+ [Implementer fixes]
186
+ Implementer: Extracted PROGRESS_INTERVAL constant
187
+
188
+ [Code reviewer reviews again]
189
+ Code reviewer: ✅ Approved
190
+
191
+ [Mark Task 2 complete]
192
+
193
+ ...
194
+
195
+ [After all tasks]
196
+ [Dispatch final code-reviewer]
197
+ Final reviewer: All requirements met, ready to merge
198
+
199
+ Done!
200
+ ```
201
+
202
+ ## Advantages
203
+
204
+ **vs. Manual execution:**
205
+ - Subagents follow TDD naturally
206
+ - Fresh context per task (no confusion)
207
+ - Parallel-safe (subagents don't interfere)
208
+ - Subagent can ask questions (before AND during work)
209
+
210
+ **vs. Executing Plans:**
211
+ - Same session (no handoff)
212
+ - Continuous progress (no waiting)
213
+ - Review checkpoints automatic
214
+
215
+ **Efficiency gains:**
216
+ - No file reading overhead (controller provides full text)
217
+ - Controller curates exactly what context is needed
218
+ - Subagent gets complete information upfront
219
+ - Questions surfaced before work begins (not after)
220
+
221
+ **Quality gates:**
222
+ - Self-review catches issues before handoff
223
+ - Two-stage review: spec compliance, then code quality
224
+ - Review loops ensure fixes actually work
225
+ - Spec compliance prevents over/under-building
226
+ - Code quality ensures implementation is well-built
227
+
228
+ **Cost:**
229
+ - More subagent invocations (implementer + 2 reviewers per task)
230
+ - Controller does more prep work (extracting all tasks upfront)
231
+ - Review loops add iterations
232
+ - But catches issues early (cheaper than debugging later)
233
+
234
+ ## Red Flags
235
+
236
+ **Never:**
237
+ - Start implementation on main/master branch without explicit user consent
238
+ - Skip reviews (spec compliance OR code quality)
239
+ - Proceed with unfixed issues
240
+ - Dispatch multiple implementation subagents in parallel (conflicts)
241
+ - Make subagent read plan file (provide full text instead)
242
+ - Skip scene-setting context (subagent needs to understand where task fits)
243
+ - Ignore subagent questions (answer before letting them proceed)
244
+ - Accept "close enough" on spec compliance (spec reviewer found issues = not done)
245
+ - Skip review loops (reviewer found issues = implementer fixes = review again)
246
+ - Let implementer self-review replace actual review (both are needed)
247
+ - **Start code quality review before spec compliance is ✅** (wrong order)
248
+ - Move to next task while either review has open issues
249
+
250
+ **If subagent asks questions:**
251
+ - Answer clearly and completely
252
+ - Provide additional context if needed
253
+ - Don't rush them into implementation
254
+
255
+ **If reviewer finds issues:**
256
+ - Implementer (same subagent) fixes them
257
+ - Reviewer reviews again
258
+ - Repeat until approved
259
+ - Don't skip the re-review
260
+
261
+ **If subagent fails task:**
262
+ - Dispatch fix subagent with specific instructions
263
+ - Don't try to fix manually (context pollution)
264
+
265
+ ## Integration
266
+
267
+ **Required workflow skills:**
268
+ - **superpowers:using-git-worktrees** - REQUIRED: Set up isolated workspace before starting
269
+ - **superpowers:writing-plans** - Creates the plan this skill executes
270
+ - **superpowers:requesting-code-review** - Code review template for reviewer subagents
271
+ - **superpowers:finishing-a-development-branch** - Complete development after all tasks
272
+
273
+ **Subagents should use:**
274
+ - **superpowers:test-driven-development** - Subagents follow TDD for each task
275
+
276
+ **Alternative workflow:**
277
+ - **superpowers:executing-plans** - Use for parallel session instead of same-session execution
.agents/skills/subagent-driven-development/code-quality-reviewer-prompt.md ADDED
@@ -0,0 +1,26 @@
1
+ # Code Quality Reviewer Prompt Template
2
+
3
+ Use this template when dispatching a code quality reviewer subagent.
4
+
5
+ **Purpose:** Verify implementation is well-built (clean, tested, maintainable)
6
+
7
+ **Only dispatch after spec compliance review passes.**
8
+
9
+ ```
10
+ Task tool (superpowers:code-reviewer):
11
+ Use template at requesting-code-review/code-reviewer.md
12
+
13
+ WHAT_WAS_IMPLEMENTED: [from implementer's report]
14
+ PLAN_OR_REQUIREMENTS: Task N from [plan-file]
15
+ BASE_SHA: [commit before task]
16
+ HEAD_SHA: [current commit]
17
+ DESCRIPTION: [task summary]
18
+ ```
19
+
20
+ **In addition to standard code quality concerns, the reviewer should check:**
21
+ - Does each file have one clear responsibility with a well-defined interface?
22
+ - Are units decomposed so they can be understood and tested independently?
23
+ - Is the implementation following the file structure from the plan?
24
+ - Did this implementation create new files that are already large, or significantly grow existing files? (Don't flag pre-existing file sizes — focus on what this change contributed.)
25
+
26
+ **Code reviewer returns:** Strengths, Issues (Critical/Important/Minor), Assessment
.agents/skills/subagent-driven-development/implementer-prompt.md ADDED
@@ -0,0 +1,113 @@
1
+ # Implementer Subagent Prompt Template
2
+
3
+ Use this template when dispatching an implementer subagent.
4
+
5
+ ```
6
+ Task tool (general-purpose):
7
+ description: "Implement Task N: [task name]"
8
+ prompt: |
9
+ You are implementing Task N: [task name]
10
+
11
+ ## Task Description
12
+
13
+ [FULL TEXT of task from plan - paste it here, don't make subagent read file]
14
+
15
+ ## Context
16
+
17
+ [Scene-setting: where this fits, dependencies, architectural context]
18
+
19
+ ## Before You Begin
20
+
21
+ If you have questions about:
22
+ - The requirements or acceptance criteria
23
+ - The approach or implementation strategy
24
+ - Dependencies or assumptions
25
+ - Anything unclear in the task description
26
+
27
+ **Ask them now.** Raise any concerns before starting work.
28
+
29
+ ## Your Job
30
+
31
+ Once you're clear on requirements:
32
+ 1. Implement exactly what the task specifies
33
+ 2. Write tests (following TDD if task says to)
34
+ 3. Verify implementation works
35
+ 4. Commit your work
36
+ 5. Self-review (see below)
37
+ 6. Report back
38
+
39
+ Work from: [directory]
40
+
41
+ **While you work:** If you encounter something unexpected or unclear, **ask questions**.
42
+ It's always OK to pause and clarify. Don't guess or make assumptions.
43
+
44
+ ## Code Organization
45
+
46
+ You reason best about code you can hold in context at once, and your edits are more
47
+ reliable when files are focused. Keep this in mind:
48
+ - Follow the file structure defined in the plan
49
+ - Each file should have one clear responsibility with a well-defined interface
50
+ - If a file you're creating is growing beyond the plan's intent, stop and report
51
+ it as DONE_WITH_CONCERNS — don't split files on your own without plan guidance
52
+ - If an existing file you're modifying is already large or tangled, work carefully
53
+ and note it as a concern in your report
54
+ - In existing codebases, follow established patterns. Improve code you're touching
55
+ the way a good developer would, but don't restructure things outside your task.
56
+
57
+ ## When You're in Over Your Head
58
+
59
+ It is always OK to stop and say "this is too hard for me." Bad work is worse than
60
+ no work. You will not be penalized for escalating.
61
+
62
+ **STOP and escalate when:**
63
+ - The task requires architectural decisions with multiple valid approaches
64
+ - You need to understand code beyond what was provided and can't find clarity
65
+ - You feel uncertain about whether your approach is correct
66
+ - The task involves restructuring existing code in ways the plan didn't anticipate
67
+ - You've been reading file after file trying to understand the system without progress
68
+
69
+ **How to escalate:** Report back with status BLOCKED or NEEDS_CONTEXT. Describe
70
+ specifically what you're stuck on, what you've tried, and what kind of help you need.
71
+ The controller can provide more context, re-dispatch with a more capable model,
72
+ or break the task into smaller pieces.
73
+
74
+ ## Before Reporting Back: Self-Review
75
+
76
+ Review your work with fresh eyes. Ask yourself:
77
+
78
+ **Completeness:**
79
+ - Did I fully implement everything in the spec?
80
+ - Did I miss any requirements?
81
+ - Are there edge cases I didn't handle?
82
+
83
+ **Quality:**
84
+ - Is this my best work?
85
+ - Are names clear and accurate (match what things do, not how they work)?
86
+ - Is the code clean and maintainable?
87
+
88
+ **Discipline:**
89
+ - Did I avoid overbuilding (YAGNI)?
90
+ - Did I only build what was requested?
91
+ - Did I follow existing patterns in the codebase?
92
+
93
+ **Testing:**
94
+ - Do tests actually verify behavior (not just mock behavior)?
95
+ - Did I follow TDD if required?
96
+ - Are tests comprehensive?
97
+
98
+ If you find issues during self-review, fix them now before reporting.
99
+
100
+ ## Report Format
101
+
102
+ When done, report:
103
+ - **Status:** DONE | DONE_WITH_CONCERNS | BLOCKED | NEEDS_CONTEXT
104
+ - What you implemented (or what you attempted, if blocked)
105
+ - What you tested and test results
106
+ - Files changed
107
+ - Self-review findings (if any)
108
+ - Any issues or concerns
109
+
110
+ Use DONE_WITH_CONCERNS if you completed the work but have doubts about correctness.
111
+ Use BLOCKED if you cannot complete the task. Use NEEDS_CONTEXT if you need
112
+ information that wasn't provided. Never silently produce work you're unsure about.
113
+ ```
.agents/skills/subagent-driven-development/spec-reviewer-prompt.md ADDED
@@ -0,0 +1,61 @@
1
+ # Spec Compliance Reviewer Prompt Template
2
+
3
+ Use this template when dispatching a spec compliance reviewer subagent.
4
+
5
+ **Purpose:** Verify implementer built what was requested (nothing more, nothing less)
6
+
7
+ ```
8
+ Task tool (general-purpose):
9
+ description: "Review spec compliance for Task N"
10
+ prompt: |
11
+ You are reviewing whether an implementation matches its specification.
12
+
13
+ ## What Was Requested
14
+
15
+ [FULL TEXT of task requirements]
16
+
17
+ ## What Implementer Claims They Built
18
+
19
+ [From implementer's report]
20
+
21
+ ## CRITICAL: Do Not Trust the Report
22
+
23
+ The implementer finished suspiciously quickly. Their report may be incomplete,
24
+ inaccurate, or optimistic. You MUST verify everything independently.
25
+
26
+ **DO NOT:**
27
+ - Take their word for what they implemented
28
+ - Trust their claims about completeness
29
+ - Accept their interpretation of requirements
30
+
31
+ **DO:**
32
+ - Read the actual code they wrote
33
+ - Compare actual implementation to requirements line by line
34
+ - Check for missing pieces they claimed to implement
35
+ - Look for extra features they didn't mention
36
+
37
+ ## Your Job
38
+
39
+ Read the implementation code and verify:
40
+
41
+ **Missing requirements:**
42
+ - Did they implement everything that was requested?
43
+ - Are there requirements they skipped or missed?
44
+ - Did they claim something works but didn't actually implement it?
45
+
46
+ **Extra/unneeded work:**
47
+ - Did they build things that weren't requested?
48
+ - Did they over-engineer or add unnecessary features?
49
+ - Did they add "nice to haves" that weren't in spec?
50
+
51
+ **Misunderstandings:**
52
+ - Did they interpret requirements differently than intended?
53
+ - Did they solve the wrong problem?
54
+ - Did they implement the right feature but in the wrong way?
55
+
56
+ **Verify by reading code, not by trusting report.**
57
+
58
+ Report:
59
+ - ✅ Spec compliant (if everything matches after code inspection)
60
+ - ❌ Issues found: [list specifically what's missing or extra, with file:line references]
61
+ ```
.agents/skills/systematic-debugging/CREATION-LOG.md ADDED
@@ -0,0 +1,119 @@
1
+ # Creation Log: Systematic Debugging Skill
2
+
3
+ Reference example of extracting, structuring, and bulletproofing a critical skill.
4
+
5
+ ## Source Material
6
+
7
+ Extracted debugging framework from `/Users/jesse/.claude/CLAUDE.md`:
8
+ - 4-phase systematic process (Investigation → Pattern Analysis → Hypothesis → Implementation)
9
+ - Core mandate: ALWAYS find root cause, NEVER fix symptoms
10
+ - Rules designed to resist time pressure and rationalization
11
+
12
+ ## Extraction Decisions
13
+
14
+ **What to include:**
15
+ - Complete 4-phase framework with all rules
16
+ - Anti-shortcuts ("NEVER fix symptom", "STOP and re-analyze")
17
+ - Pressure-resistant language ("even if faster", "even if I seem in a hurry")
18
+ - Concrete steps for each phase
19
+
20
+ **What to leave out:**
21
+ - Project-specific context
22
+ - Repetitive variations of same rule
23
+ - Narrative explanations (condensed to principles)
24
+
25
+ ## Structure Following skill-creation/SKILL.md
26
+
27
+ 1. **Rich when_to_use** - Included symptoms and anti-patterns
28
+ 2. **Type: technique** - Concrete process with steps
29
+ 3. **Keywords** - "root cause", "symptom", "workaround", "debugging", "investigation"
30
+ 4. **Flowchart** - Decision point for "fix failed" → re-analyze vs add more fixes
31
+ 5. **Phase-by-phase breakdown** - Scannable checklist format
32
+ 6. **Anti-patterns section** - What NOT to do (critical for this skill)
33
+
34
+ ## Bulletproofing Elements
35
+
36
+ Framework designed to resist rationalization under pressure:
37
+
38
+ ### Language Choices
39
+ - "ALWAYS" / "NEVER" (not "should" / "try to")
40
+ - "even if faster" / "even if I seem in a hurry"
41
+ - "STOP and re-analyze" (explicit pause)
42
+ - "Don't skip past" (catches the actual behavior)
43
+
44
+ ### Structural Defenses
45
+ - **Phase 1 required** - Can't skip to implementation
46
+ - **Single hypothesis rule** - Forces thinking, prevents shotgun fixes
47
+ - **Explicit failure mode** - "IF your first fix doesn't work" with mandatory action
48
+ - **Anti-patterns section** - Shows exactly what shortcuts look like
49
+
50
+ ### Redundancy
51
+ - Root cause mandate in overview + when_to_use + Phase 1 + implementation rules
52
+ - "NEVER fix symptom" appears 4 times in different contexts
53
+ - Each phase has explicit "don't skip" guidance
54
+
55
+ ## Testing Approach
56
+
57
+ Created 4 validation tests following skills/meta/testing-skills-with-subagents:
58
+
59
+ ### Test 1: Academic Context (No Pressure)
60
+ - Simple bug, no time pressure
61
+ - **Result:** Perfect compliance, complete investigation
62
+
63
+ ### Test 2: Time Pressure + Obvious Quick Fix
64
+ - User "in a hurry", symptom fix looks easy
65
+ - **Result:** Resisted shortcut, followed full process, found real root cause
66
+
67
+ ### Test 3: Complex System + Uncertainty
68
+ - Multi-layer failure, unclear whether the root cause can be found
69
+ - **Result:** Systematic investigation, traced through all layers, found source
70
+
71
+ ### Test 4: Failed First Fix
72
+ - Hypothesis doesn't work, temptation to add more fixes
73
+ - **Result:** Stopped, re-analyzed, formed new hypothesis (no shotgun)
74
+
75
+ **All tests passed.** No rationalizations found.
76
+
77
+ ## Iterations
78
+
79
+ ### Initial Version
80
+ - Complete 4-phase framework
81
+ - Anti-patterns section
82
+ - Flowchart for "fix failed" decision
83
+
84
+ ### Enhancement 1: TDD Reference
85
+ - Added link to skills/testing/test-driven-development
86
+ - Note explaining TDD's "simplest code" ≠ debugging's "root cause"
87
+ - Prevents confusion between methodologies
88
+
89
+ ## Final Outcome
90
+
91
+ Bulletproof skill that:
92
+ - ✅ Clearly mandates root cause investigation
93
+ - ✅ Resists time pressure rationalization
94
+ - ✅ Provides concrete steps for each phase
95
+ - ✅ Shows anti-patterns explicitly
96
+ - ✅ Tested under multiple pressure scenarios
97
+ - ✅ Clarifies relationship to TDD
98
+ - ✅ Ready for use
99
+
100
+ ## Key Insight
101
+
102
+ **Most important bulletproofing:** Anti-patterns section showing exact shortcuts that feel justified in the moment. When Claude thinks "I'll just add this one quick fix", seeing that exact pattern listed as wrong creates cognitive friction.
103
+
104
+ ## Usage Example
105
+
106
+ When encountering a bug:
107
+ 1. Load skill: skills/debugging/systematic-debugging
108
+ 2. Read overview (10 sec) - reminded of mandate
109
+ 3. Follow Phase 1 checklist - forced investigation
110
+ 4. If tempted to skip - see anti-pattern, stop
111
+ 5. Complete all phases - root cause found
112
+
113
+ **Time investment:** 5-10 minutes
114
+ **Time saved:** Hours of symptom-whack-a-mole
115
+
116
+ ---
117
+
118
+ *Created: 2025-10-03*
119
+ *Purpose: Reference example for skill extraction and bulletproofing*
.agents/skills/systematic-debugging/SKILL.md ADDED
@@ -0,0 +1,296 @@
1
+ ---
2
+ name: systematic-debugging
3
+ description: Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes
4
+ ---
5
+
6
+ # Systematic Debugging
7
+
8
+ ## Overview
9
+
10
+ Random fixes waste time and create new bugs. Quick patches mask underlying issues.
11
+
12
+ **Core principle:** ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
13
+
14
+ **Violating the letter of this process is violating the spirit of debugging.**
15
+
16
+ ## The Iron Law
17
+
18
+ ```
19
+ NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
20
+ ```
21
+
22
+ If you haven't completed Phase 1, you cannot propose fixes.
23
+
24
+ ## When to Use
25
+
26
+ Use for ANY technical issue:
27
+ - Test failures
28
+ - Bugs in production
29
+ - Unexpected behavior
30
+ - Performance problems
31
+ - Build failures
32
+ - Integration issues
33
+
34
+ **Use this ESPECIALLY when:**
35
+ - Under time pressure (emergencies make guessing tempting)
36
+ - "Just one quick fix" seems obvious
37
+ - You've already tried multiple fixes
38
+ - Previous fix didn't work
39
+ - You don't fully understand the issue
40
+
41
+ **Don't skip when:**
42
+ - Issue seems simple (simple bugs have root causes too)
43
+ - You're in a hurry (rushing guarantees rework)
44
+ - Manager wants it fixed NOW (systematic is faster than thrashing)
45
+
46
+ ## The Four Phases
47
+
48
+ You MUST complete each phase before proceeding to the next.
49
+
50
+ ### Phase 1: Root Cause Investigation
51
+
52
+ **BEFORE attempting ANY fix:**
53
+
54
+ 1. **Read Error Messages Carefully**
55
+ - Don't skip past errors or warnings
56
+ - They often contain the exact solution
57
+ - Read stack traces completely
58
+ - Note line numbers, file paths, error codes
59
+
60
+ 2. **Reproduce Consistently**
61
+ - Can you trigger it reliably?
62
+ - What are the exact steps?
63
+ - Does it happen every time?
64
+ - If not reproducible → gather more data, don't guess
65
+
66
+ 3. **Check Recent Changes**
67
+ - What changed that could cause this?
68
+ - Git diff, recent commits
69
+ - New dependencies, config changes
70
+ - Environmental differences
71
+
72
+ 4. **Gather Evidence in Multi-Component Systems**
73
+
74
+ **WHEN system has multiple components (CI → build → signing, API → service → database):**
75
+
76
+ **BEFORE proposing fixes, add diagnostic instrumentation:**
77
+ ```
78
+ For EACH component boundary:
79
+ - Log what data enters component
80
+ - Log what data exits component
81
+ - Verify environment/config propagation
82
+ - Check state at each layer
83
+
84
+ Run once to gather evidence showing WHERE it breaks
85
+ THEN analyze evidence to identify failing component
86
+ THEN investigate that specific component
87
+ ```
88
+
89
+ **Example (multi-layer system):**
90
+ ```bash
91
+ # Layer 1: Workflow
92
+ echo "=== Secrets available in workflow: ==="
93
+ echo "IDENTITY: $([ -n "${IDENTITY:-}" ] && echo SET || echo UNSET)"  # prints SET/UNSET only, never the secret value
94
+
95
+ # Layer 2: Build script
96
+ echo "=== Env vars in build script: ==="
97
+ env | grep IDENTITY || echo "IDENTITY not in environment"
98
+
99
+ # Layer 3: Signing script
100
+ echo "=== Keychain state: ==="
101
+ security list-keychains
102
+ security find-identity -v
103
+
104
+ # Layer 4: Actual signing
105
+ codesign --sign "$IDENTITY" --verbose=4 "$APP"
106
+ ```
107
+
108
+ **This reveals:** Which layer fails (secrets → workflow ✓, workflow → build ✗)
109
+
110
+ 5. **Trace Data Flow**
111
+
112
+ **WHEN error is deep in call stack:**
113
+
114
+ See `root-cause-tracing.md` in this directory for the complete backward tracing technique.
115
+
116
+ **Quick version:**
117
+ - Where does bad value originate?
118
+ - What called this with bad value?
119
+ - Keep tracing up until you find the source
120
+ - Fix at source, not at symptom
121
+
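The boundary instrumentation described in step 4 can also be applied inside a single process. The sketch below is one possible approach, assuming the components are Python callables. `log_boundary` and the logger name `boundary` are hypothetical, and arguments are logged verbatim, so it should not wrap functions that receive secrets.

```python
import functools
import logging

logger = logging.getLogger("boundary")


def log_boundary(name):
    """Decorator that logs what enters and exits a component boundary."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Log what data enters the component.
            logger.info("%s <- args=%r kwargs=%r", name, args, kwargs)
            try:
                result = fn(*args, **kwargs)
            except Exception:
                # Record which boundary raised, then let the error propagate.
                logger.exception("%s raised", name)
                raise
            # Log what data exits the component.
            logger.info("%s -> %r", name, result)
            return result
        return wrapper
    return decorator
```

Decorating each layer, for example `@log_boundary("build")` and `@log_boundary("sign")`, and running the failing scenario once should show the last boundary that received good data and the first that produced bad data.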
122
+ ### Phase 2: Pattern Analysis
123
+
124
+ **Find the pattern before fixing:**
125
+
126
+ 1. **Find Working Examples**
127
+ - Locate similar working code in same codebase
128
+ - What works that's similar to what's broken?
129
+
130
+ 2. **Compare Against References**
131
+ - If implementing pattern, read reference implementation COMPLETELY
132
+ - Don't skim - read every line
133
+ - Understand the pattern fully before applying
134
+
135
+ 3. **Identify Differences**
136
+ - What's different between working and broken?
137
+ - List every difference, however small
138
+ - Don't assume "that can't matter"
139
+
140
+ 4. **Understand Dependencies**
141
+ - What other components does this need?
142
+ - What settings, config, environment?
143
+ - What assumptions does it make?
144
+
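For the "Identify Differences" step, listing differences mechanically helps avoid dismissing one as irrelevant. A minimal sketch, assuming the working and broken setups can each be captured as a flat dictionary of settings. `list_differences` and the `MISSING` marker are hypothetical names.

```python
from typing import Any, Dict, Tuple

# Marker for a key present on one side only (assumes no real value equals it).
MISSING = "<missing>"


def list_differences(
    working: Dict[str, Any], broken: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return every key whose value differs, as (working, broken) pairs."""
    diffs: Dict[str, Tuple[Any, Any]] = {}
    # Walk the union of keys so one-sided settings are reported too.
    for key in sorted(working.keys() | broken.keys()):
        w = working.get(key, MISSING)
        b = broken.get(key, MISSING)
        if w != b:
            diffs[key] = (w, b)
    return diffs
```

Every entry in the result is a candidate cause. The point of the step is to list them all before deciding which ones matter.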
145
+ ### Phase 3: Hypothesis and Testing
146
+
147
+ **Scientific method:**
148
+
149
+ 1. **Form Single Hypothesis**
150
+ - State clearly: "I think X is the root cause because Y"
151
+ - Write it down
152
+ - Be specific, not vague
153
+
154
+ 2. **Test Minimally**
155
+ - Make the SMALLEST possible change to test hypothesis
156
+ - One variable at a time
157
+ - Don't fix multiple things at once
158
+
159
+ 3. **Verify Before Continuing**
160
+ - Did it work? Yes → Phase 4
161
+ - Didn't work? Form NEW hypothesis
162
+ - DON'T add more fixes on top
163
+
164
+ 4. **When You Don't Know**
165
+ - Say "I don't understand X"
166
+ - Don't pretend to know
167
+ - Ask for help
168
+ - Research more
169
+
170
+ ### Phase 4: Implementation
171
+
172
+ **Fix the root cause, not the symptom:**
173
+
174
+ 1. **Create Failing Test Case**
175
+ - Simplest possible reproduction
176
+ - Automated test if possible
177
+ - One-off test script if no framework
178
+ - MUST have before fixing
179
+ - Use the `superpowers:test-driven-development` skill for writing proper failing tests
180
+
181
+ 2. **Implement Single Fix**
182
+ - Address the root cause identified
183
+ - ONE change at a time
184
+ - No "while I'm here" improvements
185
+ - No bundled refactoring
186
+
187
+ 3. **Verify Fix**
188
+ - Test passes now?
189
+ - No other tests broken?
190
+ - Issue actually resolved?
191
+
192
+ 4. **If Fix Doesn't Work**
193
+ - STOP
194
+ - Count: How many fixes have you tried?
195
+ - If < 3: Return to Phase 1, re-analyze with new information
196
+ - **If ≥ 3: STOP and question the architecture (step 5 below)**
197
+ - DON'T attempt Fix #4 without architectural discussion
198
+
199
+ 5. **If 3+ Fixes Failed: Question Architecture**
200
+
201
+ **Pattern indicating architectural problem:**
202
+ - Each fix reveals new shared state/coupling/problem in different place
203
+ - Fixes require "massive refactoring" to implement
204
+ - Each fix creates new symptoms elsewhere
205
+
206
+ **STOP and question fundamentals:**
207
+ - Is this pattern fundamentally sound?
208
+ - Are we "sticking with it through sheer inertia"?
209
+ - Should we refactor architecture vs. continue fixing symptoms?
210
+
211
+ **Discuss with your human partner before attempting more fixes**
212
+
213
+ This is NOT a failed hypothesis - this is a wrong architecture.
214
+
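The attempt-counting rule in steps 4 and 5 can be written down as a small gate. This is a sketch only: `after_failed_fix` and `MAX_FIX_ATTEMPTS` are hypothetical names, and the returned strings are labels rather than commands.

```python
# Threshold taken from the text: three failed fixes trigger an architecture review.
MAX_FIX_ATTEMPTS = 3


def after_failed_fix(failed_attempts: int) -> str:
    """Decide the next step after a fix fails, given the total failures so far."""
    if failed_attempts < 1:
        raise ValueError("call this only after at least one failed fix")
    if failed_attempts < MAX_FIX_ATTEMPTS:
        # Fewer than three failures: re-analyze with the new information.
        return "return to Phase 1"
    # Three or more failures: stop fixing and discuss the architecture.
    return "question the architecture"
```

Here `after_failed_fix(2)` returns `"return to Phase 1"` and `after_failed_fix(3)` returns `"question the architecture"`.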
215
+ ## Red Flags - STOP and Follow Process
216
+
217
+ If you catch yourself thinking:
218
+ - "Quick fix for now, investigate later"
219
+ - "Just try changing X and see if it works"
220
+ - "Add multiple changes, run tests"
221
+ - "Skip the test, I'll manually verify"
222
+ - "It's probably X, let me fix that"
223
+ - "I don't fully understand but this might work"
224
+ - "Pattern says X but I'll adapt it differently"
225
+ - "Here are the main problems: [lists fixes without investigation]"
226
+ - Proposing solutions before tracing data flow
227
+ - **"One more fix attempt" (when already tried 2+)**
228
+ - **Each fix reveals new problem in different place**
229
+
230
+ **ALL of these mean: STOP. Return to Phase 1.**
231
+
232
+ **If 3+ fixes failed:** Question the architecture (see Phase 4, step 5)
233
+
234
+ ## Your Human Partner's Signals You're Doing It Wrong
235
+
236
+ **Watch for these redirections:**
237
+ - "Is that not happening?" - You assumed without verifying
238
+ - "Will it show us...?" - You should have added evidence gathering
239
+ - "Stop guessing" - You're proposing fixes without understanding
240
+ - "Ultrathink this" - Question fundamentals, not just symptoms
241
+ - "We're stuck?" (frustrated) - Your approach isn't working
242
+
243
+ **When you see these:** STOP. Return to Phase 1.
244
+
245
+ ## Common Rationalizations
246
+
247
+ | Excuse | Reality |
248
+ |--------|---------|
249
+ | "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
250
+ | "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
251
+ | "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
252
+ | "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
253
+ | "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
254
+ | "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
255
+ | "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
256
+ | "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. |
257
+
258
+ ## Quick Reference
259
+
260
+ | Phase | Key Activities | Success Criteria |
261
+ |-------|---------------|------------------|
262
+ | **1. Root Cause** | Read errors, reproduce, check changes, gather evidence | Understand WHAT and WHY |
263
+ | **2. Pattern** | Find working examples, compare | Identify differences |
264
+ | **3. Hypothesis** | Form theory, test minimally | Confirmed or new hypothesis |
265
+ | **4. Implementation** | Create test, fix, verify | Bug resolved, tests pass |
266
+
267
+ ## When Process Reveals "No Root Cause"
268
+
269
+ If systematic investigation reveals the issue is truly environmental, timing-dependent, or external:
270
+
271
+ 1. You've completed the process
272
+ 2. Document what you investigated
273
+ 3. Implement appropriate handling (retry, timeout, error message)
274
+ 4. Add monitoring/logging for future investigation
275
+
276
+ **But:** 95% of "no root cause" cases are incomplete investigation.
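For steps 3 and 4 above (handling plus logging), one possible shape is a retry wrapper that records every failure, so later investigation has evidence to work with. This is a minimal sketch; `retryWithLogging` and its parameters are hypothetical, and it is synchronous only to keep the example short.

```typescript
// Hypothetical helper: retry an operation and log each failed attempt.
function retryWithLogging<T>(
  operation: () => T,
  maxAttempts: number,
  log: (message: string) => void
): T {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return operation();
    } catch (error) {
      lastError = error;
      // Keep a record of every failure for future investigation.
      log(`attempt ${attempt}/${maxAttempts} failed: ${String(error)}`);
    }
  }
  // Clear error message once the retry budget is exhausted.
  throw new Error(`operation failed after ${maxAttempts} attempts: ${String(lastError)}`);
}

// Usage: an operation that fails twice, then succeeds.
const logLines: string[] = [];
let calls = 0;
const value = retryWithLogging(
  () => {
    calls += 1;
    if (calls < 3) throw new Error('transient failure');
    return 'ok';
  },
  5,
  (line) => logLines.push(line)
);
```

The retry only handles the symptom. The logged failures are what make the issue investigable if it turns out to have a root cause after all.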
277
+
278
+ ## Supporting Techniques
279
+
280
+ These techniques are part of systematic debugging and available in this directory:
281
+
282
+ - **`root-cause-tracing.md`** - Trace bugs backward through call stack to find original trigger
283
+ - **`defense-in-depth.md`** - Add validation at multiple layers after finding root cause
284
+ - **`condition-based-waiting.md`** - Replace arbitrary timeouts with condition polling
285
+
286
+ **Related skills:**
287
+ - **superpowers:test-driven-development** - For creating failing test case (Phase 4, Step 1)
288
+ - **superpowers:verification-before-completion** - Verify fix worked before claiming success
289
+
290
+ ## Real-World Impact
291
+
292
+ From debugging sessions:
293
+ - Systematic approach: 15-30 minutes to fix
294
+ - Random fixes approach: 2-3 hours of thrashing
295
+ - First-time fix rate: 95% vs 40%
296
+ - New bugs introduced: Near zero vs common
.agents/skills/systematic-debugging/condition-based-waiting-example.ts ADDED
@@ -0,0 +1,158 @@
1
+ // Complete implementation of condition-based waiting utilities
2
+ // From: Lace test infrastructure improvements (2025-10-03)
3
+ // Context: Fixed 15 flaky tests by replacing arbitrary timeouts
4
+
5
+ import type { ThreadManager } from '~/threads/thread-manager';
6
+ import type { LaceEvent, LaceEventType } from '~/threads/types';
7
+
8
+ /**
9
+ * Wait for a specific event type to appear in thread
10
+ *
11
+ * @param threadManager - The thread manager to query
12
+ * @param threadId - Thread to check for events
13
+ * @param eventType - Type of event to wait for
14
+ * @param timeoutMs - Maximum time to wait (default 5000ms)
15
+ * @returns Promise resolving to the first matching event
16
+ *
17
+ * Example:
18
+ * await waitForEvent(threadManager, agentThreadId, 'TOOL_RESULT');
19
+ */
20
+ export function waitForEvent(
21
+ threadManager: ThreadManager,
22
+ threadId: string,
23
+ eventType: LaceEventType,
24
+ timeoutMs = 5000
25
+ ): Promise<LaceEvent> {
26
+ return new Promise((resolve, reject) => {
27
+ const startTime = Date.now();
28
+
29
+ const check = () => {
30
+ const events = threadManager.getEvents(threadId);
31
+ const event = events.find((e) => e.type === eventType);
32
+
33
+ if (event) {
34
+ resolve(event);
35
+ } else if (Date.now() - startTime > timeoutMs) {
36
+ reject(new Error(`Timeout waiting for ${eventType} event after ${timeoutMs}ms`));
37
+ } else {
38
+ setTimeout(check, 10); // Poll every 10ms for efficiency
39
+ }
40
+ };
41
+
42
+ check();
43
+ });
44
+ }
45
+
46
+ /**
47
+ * Wait for a specific number of events of a given type
48
+ *
49
+ * @param threadManager - The thread manager to query
50
+ * @param threadId - Thread to check for events
51
+ * @param eventType - Type of event to wait for
52
+ * @param count - Number of events to wait for
53
+ * @param timeoutMs - Maximum time to wait (default 5000ms)
54
+ * @returns Promise resolving to all matching events once count is reached
55
+ *
56
+ * Example:
57
+ * // Wait for 2 AGENT_MESSAGE events (initial response + continuation)
58
+ * await waitForEventCount(threadManager, agentThreadId, 'AGENT_MESSAGE', 2);
59
+ */
60
+ export function waitForEventCount(
61
+ threadManager: ThreadManager,
62
+ threadId: string,
63
+ eventType: LaceEventType,
64
+ count: number,
65
+ timeoutMs = 5000
66
+ ): Promise<LaceEvent[]> {
67
+ return new Promise((resolve, reject) => {
68
+ const startTime = Date.now();
69
+
70
+ const check = () => {
71
+ const events = threadManager.getEvents(threadId);
72
+ const matchingEvents = events.filter((e) => e.type === eventType);
73
+
74
+ if (matchingEvents.length >= count) {
75
+ resolve(matchingEvents);
76
+ } else if (Date.now() - startTime > timeoutMs) {
77
+ reject(
78
+ new Error(
79
+ `Timeout waiting for ${count} ${eventType} events after ${timeoutMs}ms (got ${matchingEvents.length})`
80
+ )
81
+ );
82
+ } else {
83
+ setTimeout(check, 10);
84
+ }
85
+ };
86
+
87
+ check();
88
+ });
89
+ }
90
+
91
+ /**
92
+ * Wait for an event matching a custom predicate
93
+ * Useful when you need to check event data, not just type
94
+ *
95
+ * @param threadManager - The thread manager to query
96
+ * @param threadId - Thread to check for events
97
+ * @param predicate - Function that returns true when event matches
98
+ * @param description - Human-readable description for error messages
99
+ * @param timeoutMs - Maximum time to wait (default 5000ms)
100
+ * @returns Promise resolving to the first matching event
101
+ *
102
+ * Example:
103
+ * // Wait for TOOL_RESULT with specific ID
104
+ * await waitForEventMatch(
105
+ * threadManager,
106
+ * agentThreadId,
107
+ * (e) => e.type === 'TOOL_RESULT' && e.data.id === 'call_123',
108
+ * 'TOOL_RESULT with id=call_123'
109
+ * );
110
+ */
111
+ export function waitForEventMatch(
112
+ threadManager: ThreadManager,
113
+ threadId: string,
114
+ predicate: (event: LaceEvent) => boolean,
115
+ description: string,
116
+ timeoutMs = 5000
117
+ ): Promise<LaceEvent> {
118
+ return new Promise((resolve, reject) => {
119
+ const startTime = Date.now();
120
+
121
+ const check = () => {
122
+ const events = threadManager.getEvents(threadId);
123
+ const event = events.find(predicate);
124
+
125
+ if (event) {
126
+ resolve(event);
127
+ } else if (Date.now() - startTime > timeoutMs) {
128
+ reject(new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`));
129
+ } else {
130
+ setTimeout(check, 10);
131
+ }
132
+ };
133
+
134
+ check();
135
+ });
136
+ }
137
+
138
+ // Usage example from actual debugging session:
139
+ //
140
+ // BEFORE (flaky):
141
+ // ---------------
142
+ // const messagePromise = agent.sendMessage('Execute tools');
143
+ // await new Promise(r => setTimeout(r, 300)); // Hope tools start in 300ms
144
+ // agent.abort();
145
+ // await messagePromise;
146
+ // await new Promise(r => setTimeout(r, 50)); // Hope results arrive in 50ms
147
+ // expect(toolResults.length).toBe(2); // Fails randomly
148
+ //
149
+ // AFTER (reliable):
150
+ // ----------------
151
+ // const messagePromise = agent.sendMessage('Execute tools');
152
+ // await waitForEventCount(threadManager, threadId, 'TOOL_CALL', 2); // Wait for tools to start
153
+ // agent.abort();
154
+ // await messagePromise;
155
+ // await waitForEventCount(threadManager, threadId, 'TOOL_RESULT', 2); // Wait for results
156
+ // expect(toolResults.length).toBe(2); // Always succeeds
157
+ //
158
+ // Result: 60% pass rate → 100%, 40% faster execution
.agents/skills/systematic-debugging/condition-based-waiting.md ADDED
@@ -0,0 +1,115 @@
1
+ # Condition-Based Waiting
2
+
3
+ ## Overview
4
+
5
+ Flaky tests often guess at timing with arbitrary delays. This creates race conditions where tests pass on fast machines but fail under load or in CI.
6
+
7
+ **Core principle:** Wait for the actual condition you care about, not a guess about how long it takes.
8
+
9
+ ## When to Use
10
+
11
+ ```dot
12
+ digraph when_to_use {
13
+ "Test uses setTimeout/sleep?" [shape=diamond];
14
+ "Testing timing behavior?" [shape=diamond];
15
+ "Document WHY timeout needed" [shape=box];
16
+ "Use condition-based waiting" [shape=box];
17
+
18
+ "Test uses setTimeout/sleep?" -> "Testing timing behavior?" [label="yes"];
19
+ "Testing timing behavior?" -> "Document WHY timeout needed" [label="yes"];
20
+ "Testing timing behavior?" -> "Use condition-based waiting" [label="no"];
21
+ }
22
+ ```
23
+
24
+ **Use when:**
25
+ - Tests have arbitrary delays (`setTimeout`, `sleep`, `time.sleep()`)
26
+ - Tests are flaky (pass sometimes, fail under load)
27
+ - Tests time out when run in parallel
28
+ - Waiting for async operations to complete
29
+
30
+ **Don't use when:**
31
+ - Testing actual timing behavior (debounce, throttle intervals)
32
+ - Always document WHY if using arbitrary timeout
33
+
34
+ ## Core Pattern
35
+
36
+ ```typescript
37
+ // ❌ BEFORE: Guessing at timing
38
+ await new Promise(r => setTimeout(r, 50));
39
+ const result = getResult();
40
+ expect(result).toBeDefined();
41
+
42
+ // ✅ AFTER: Waiting for condition
43
+ await waitFor(() => getResult() !== undefined);
44
+ const result = getResult();
45
+ expect(result).toBeDefined();
46
+ ```
47
+
48
+ ## Quick Patterns
49
+
50
+ | Scenario | Pattern |
51
+ |----------|---------|
52
+ | Wait for event | `waitFor(() => events.find(e => e.type === 'DONE'))` |
53
+ | Wait for state | `waitFor(() => machine.state === 'ready')` |
54
+ | Wait for count | `waitFor(() => items.length >= 5)` |
55
+ | Wait for file | `waitFor(() => fs.existsSync(path))` |
56
+ | Complex condition | `waitFor(() => obj.ready && obj.value > 10)` |
57
+
58
+ ## Implementation
59
+
60
+ Generic polling function:
61
+ ```typescript
62
+ async function waitFor<T>(
63
+ condition: () => T | undefined | null | false,
64
+ description = 'condition',
65
+ timeoutMs = 5000
66
+ ): Promise<T> {
67
+ const startTime = Date.now();
68
+
69
+ while (true) {
70
+ const result = condition();
71
+ if (result) return result;
72
+
73
+ if (Date.now() - startTime > timeoutMs) {
74
+ throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
75
+ }
76
+
77
+ await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
78
+ }
79
+ }
80
+ ```
81
+
82
+ See `condition-based-waiting-example.ts` in this directory for the complete implementation, with domain-specific helpers (`waitForEvent`, `waitForEventCount`, `waitForEventMatch`) from an actual debugging session.
83
+
84
+ ## Common Mistakes
85
+
86
+ **❌ Polling too fast:** `setTimeout(check, 1)` - wastes CPU
87
+ **✅ Fix:** Poll every 10ms
88
+
89
+ **❌ No timeout:** Loop forever if condition never met
90
+ **✅ Fix:** Always include timeout with clear error
91
+
92
+ **❌ Stale data:** Cache state before loop
93
+ **✅ Fix:** Call getter inside loop for fresh data
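The stale-data mistake is easy to miss, so here is a small sketch that contrasts the two forms. `FakeJob` and `pollUntil` are hypothetical stand-ins, and the poll is synchronous and bounded so the behavior is easy to follow.

```typescript
// Hypothetical job that becomes done after three ticks.
class FakeJob {
  private ticks = 0;
  tick(): void {
    this.ticks += 1;
  }
  isDone(): boolean {
    return this.ticks >= 3;
  }
}

// Bounded poll: evaluate the condition, and advance the job if it is not met.
function pollUntil(job: FakeJob, condition: () => boolean, maxChecks: number): boolean {
  for (let i = 0; i < maxChecks; i++) {
    if (condition()) return true;
    job.tick();
  }
  return false;
}

// Stale: the state is read once, before the loop, so the condition never changes.
const staleJob = new FakeJob();
const cachedDone = staleJob.isDone();
const staleResult = pollUntil(staleJob, () => cachedDone, 10);

// Fresh: the getter is called on every check, so progress is observed.
const freshJob = new FakeJob();
const freshResult = pollUntil(freshJob, () => freshJob.isDone(), 10);
```

With the cached value the poll exhausts its checks even though the job finished. With the getter inside the condition it returns as soon as the job is done.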
94
+
95
+ ## When Arbitrary Timeout IS Correct
96
+
97
+ ```typescript
98
+ // Tool ticks every 100ms - need 2 ticks to verify partial output
99
+ await waitForEvent(manager, threadId, 'TOOL_STARTED'); // First: wait for condition
100
+ await new Promise(r => setTimeout(r, 200)); // Then: wait for timed behavior
101
+ // 200ms = 2 ticks at 100ms intervals - documented and justified
102
+ ```
103
+
104
+ **Requirements:**
105
+ 1. First wait for triggering condition
106
+ 2. Based on known timing (not guessing)
107
+ 3. Comment explaining WHY
108
+
109
+ ## Real-World Impact
110
+
111
+ From debugging session (2025-10-03):
112
+ - Fixed 15 flaky tests across 3 files
113
+ - Pass rate: 60% → 100%
114
+ - Execution time: 40% faster
115
+ - No more race conditions
.agents/skills/systematic-debugging/defense-in-depth.md ADDED
@@ -0,0 +1,122 @@
1
+ # Defense-in-Depth Validation
2
+
3
+ ## Overview
4
+
5
+ When you fix a bug caused by invalid data, adding validation at one place feels sufficient. But that single check can be bypassed by different code paths, refactoring, or mocks.
6
+
7
+ **Core principle:** Validate at EVERY layer data passes through. Make the bug structurally impossible.
8
+
9
+ ## Why Multiple Layers
10
+
11
+ Single validation: "We fixed the bug"
12
+ Multiple layers: "We made the bug impossible"
13
+
14
+ Different layers catch different cases:
15
+ - Entry validation catches most bugs
16
+ - Business logic catches edge cases
17
+ - Environment guards prevent context-specific dangers
18
+ - Debug logging helps when other layers fail
19
+
20
+ ## The Four Layers
21
+
22
+ ### Layer 1: Entry Point Validation
23
+ **Purpose:** Reject obviously invalid input at API boundary
24
+
25
+ ```typescript
26
+ function createProject(name: string, workingDirectory: string) {
27
+ if (!workingDirectory || workingDirectory.trim() === '') {
28
+ throw new Error('workingDirectory cannot be empty');
29
+ }
30
+ if (!existsSync(workingDirectory)) {
31
+ throw new Error(`workingDirectory does not exist: ${workingDirectory}`);
32
+ }
33
+ if (!statSync(workingDirectory).isDirectory()) {
34
+ throw new Error(`workingDirectory is not a directory: ${workingDirectory}`);
35
+ }
36
+ // ... proceed
37
+ }
38
+ ```
39
+
40
+ ### Layer 2: Business Logic Validation
41
+ **Purpose:** Ensure data makes sense for this operation
42
+
43
+ ```typescript
44
+ function initializeWorkspace(projectDir: string, sessionId: string) {
45
+ if (!projectDir) {
46
+ throw new Error('projectDir required for workspace initialization');
47
+ }
48
+ // ... proceed
49
+ }
50
+ ```
51
+
52
+ ### Layer 3: Environment Guards
53
+ **Purpose:** Prevent dangerous operations in specific contexts
54
+
55
+ ```typescript
56
+ async function gitInit(directory: string) {
57
+ // In tests, refuse git init outside temp directories
58
+ if (process.env.NODE_ENV === 'test') {
59
+ const normalized = normalize(resolve(directory));
60
+ const tmpDir = normalize(resolve(tmpdir()));
61
+
62
+ if (!normalized.startsWith(tmpDir)) {
63
+ throw new Error(
64
+ `Refusing git init outside temp dir during tests: ${directory}`
65
+ );
66
+ }
67
+ }
68
+ // ... proceed
69
+ }
70
+ ```
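One possible gap in a plain `startsWith` guard like the one above: a sibling directory whose name merely begins with the temp directory's name (for example `/tmp-evil` versus `/tmp`) also passes the prefix test. The sketch below compares on a directory boundary instead. `isInsideDirectory` is a hypothetical helper, and it assumes both paths are already absolute and normalized with `/` separators.

```typescript
// Hypothetical helper; assumes absolute, normalized, '/'-separated paths.
function isInsideDirectory(candidate: string, parent: string): boolean {
  // Compare with trailing separators so '/tmp' does not match '/tmp-evil'.
  const base = parent.endsWith('/') ? parent : parent + '/';
  const target = candidate.endsWith('/') ? candidate : candidate + '/';
  return target.startsWith(base);
}

// A plain prefix check accepts the sibling directory.
const plainPrefixAccepts = '/tmp-evil/repo'.startsWith('/tmp');

// The boundary-aware check rejects it and still accepts real children.
const siblingAccepted = isInsideDirectory('/tmp-evil/repo', '/tmp');
const childAccepted = isInsideDirectory('/tmp/session-1', '/tmp');
const sameDirAccepted = isInsideDirectory('/tmp', '/tmp');
```

On platforms with other separators, Node's `path.relative` is an alternative way to express the same containment test.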
71
+
72
+ ### Layer 4: Debug Instrumentation
73
+ **Purpose:** Capture context for forensics
74
+
75
+ ```typescript
76
+ async function gitInit(directory: string) {
77
+ const stack = new Error().stack;
78
+ logger.debug('About to git init', {
79
+ directory,
80
+ cwd: process.cwd(),
81
+ stack,
82
+ });
83
+ // ... proceed
84
+ }
85
+ ```
86
+
87
+ ## Applying the Pattern
88
+
89
+ When you find a bug:
90
+
91
+ 1. **Trace the data flow** - Where does bad value originate? Where used?
92
+ 2. **Map all checkpoints** - List every point data passes through
93
+ 3. **Add validation at each layer** - Entry, business, environment, debug
94
+ 4. **Test each layer** - Try to bypass layer 1, verify layer 2 catches it
95
+
96
+ ## Example from Session
97
+
98
+ Bug: Empty `projectDir` caused `git init` in source code
99
+
100
+ **Data flow:**
101
+ 1. Test setup → empty string
102
+ 2. `Project.create(name, '')`
103
+ 3. `WorkspaceManager.createWorkspace('')`
104
+ 4. `git init` runs in `process.cwd()`
105
+
106
+ **Four layers added:**
107
+ - Layer 1: `Project.create()` validates not empty/exists/is a directory
108
+ - Layer 2: `WorkspaceManager` validates projectDir not empty
109
+ - Layer 3: `WorktreeManager` refuses git init outside tmpdir in tests
110
+ - Layer 4: Stack trace logging before git init
111
+
112
+ **Result:** All 1847 tests passed, bug impossible to reproduce
113
+
114
+ ## Key Insight
115
+
116
+ All four layers were necessary. During testing, each layer caught bugs the others missed:
117
+ - Different code paths bypassed entry validation
118
+ - Mocks bypassed business logic checks
119
+ - Edge cases on different platforms needed environment guards
120
+ - Debug logging identified structural misuse
121
+
122
+ **Don't stop at one validation point.** Add checks at every layer.
.agents/skills/systematic-debugging/find-polluter.sh ADDED
@@ -0,0 +1,63 @@
1
+ #!/usr/bin/env bash
2
+ # Search script (runs tests one at a time) to find which test creates unwanted files/state
3
+ # Usage: ./find-polluter.sh <file_or_dir_to_check> <test_pattern>
4
+ # Example: ./find-polluter.sh '.git' 'src/**/*.test.ts'
5
+
6
+ set -e
7
+
8
+ if [ $# -ne 2 ]; then
9
+ echo "Usage: $0 <file_to_check> <test_pattern>"
10
+ echo "Example: $0 '.git' 'src/**/*.test.ts'"
11
+ exit 1
12
+ fi
13
+
14
+ POLLUTION_CHECK="$1"
15
+ TEST_PATTERN="$2"
16
+
17
+ echo "🔍 Searching for test that creates: $POLLUTION_CHECK"
18
+ echo "Test pattern: $TEST_PATTERN"
19
+ echo ""
20
+
21
+ # Get list of test files
22
+ TEST_FILES=$(find . -path "./${TEST_PATTERN#./}" | sort)
23
+ TOTAL=$(echo "$TEST_FILES" | wc -l | tr -d ' ')
24
+
25
+ echo "Found $TOTAL test files"
26
+ echo ""
27
+
28
+ COUNT=0
29
+ for TEST_FILE in $TEST_FILES; do
30
+ COUNT=$((COUNT + 1))
31
+
32
+ # Skip if pollution already exists
33
+ if [ -e "$POLLUTION_CHECK" ]; then
34
+ echo "⚠️ Pollution already exists before test $COUNT/$TOTAL"
35
+ echo " Skipping: $TEST_FILE"
36
+ continue
37
+ fi
38
+
39
+ echo "[$COUNT/$TOTAL] Testing: $TEST_FILE"
40
+
41
+ # Run the test
42
+ npm test "$TEST_FILE" > /dev/null 2>&1 || true
43
+
44
+ # Check if pollution appeared
45
+ if [ -e "$POLLUTION_CHECK" ]; then
46
+ echo ""
47
+ echo "🎯 FOUND POLLUTER!"
48
+ echo " Test: $TEST_FILE"
49
+ echo " Created: $POLLUTION_CHECK"
50
+ echo ""
51
+ echo "Pollution details:"
52
+ ls -la "$POLLUTION_CHECK"
53
+ echo ""
54
+ echo "To investigate:"
55
+ echo " npm test $TEST_FILE # Run just this test"
56
+ echo " cat $TEST_FILE # Review test code"
57
+ exit 1
58
+ fi
59
+ done
60
+
61
+ echo ""
62
+ echo "✅ No polluter found - all tests clean!"
63
+ exit 0
.agents/skills/systematic-debugging/root-cause-tracing.md ADDED
@@ -0,0 +1,169 @@
1
+ # Root Cause Tracing
2
+
3
+ ## Overview
4
+
5
+ Bugs often manifest deep in the call stack (git init in wrong directory, file created in wrong location, database opened with wrong path). Your instinct is to fix where the error appears, but that's treating a symptom.
6
+
7
+ **Core principle:** Trace backward through the call chain until you find the original trigger, then fix at the source.
8
+
9
+ ## When to Use
10
+
11
+ ```dot
12
+ digraph when_to_use {
13
+ "Bug appears deep in stack?" [shape=diamond];
14
+ "Can trace backwards?" [shape=diamond];
15
+ "Fix at symptom point" [shape=box];
16
+ "Trace to original trigger" [shape=box];
17
+ "BETTER: Also add defense-in-depth" [shape=box];
18
+
19
+ "Bug appears deep in stack?" -> "Can trace backwards?" [label="yes"];
20
+ "Can trace backwards?" -> "Trace to original trigger" [label="yes"];
21
+ "Can trace backwards?" -> "Fix at symptom point" [label="no - dead end"];
22
+ "Trace to original trigger" -> "BETTER: Also add defense-in-depth";
23
+ }
24
+ ```
25
+
26
+ **Use when:**
27
+ - Error happens deep in execution (not at entry point)
28
+ - Stack trace shows long call chain
29
+ - Unclear where invalid data originated
30
+ - Need to find which test/code triggers the problem
31
+
32
+ ## The Tracing Process
33
+
34
+ ### 1. Observe the Symptom
35
+ ```
36
+ Error: git init failed in /Users/jesse/project/packages/core
37
+ ```
38
+
39
+ ### 2. Find Immediate Cause
40
+ **What code directly causes this?**
41
+ ```typescript
42
+ await execFileAsync('git', ['init'], { cwd: projectDir });
43
+ ```
44
+
45
+ ### 3. Ask: What Called This?
46
+ ```typescript
47
+ WorktreeManager.createSessionWorktree(projectDir, sessionId)
48
+ → called by Session.initializeWorkspace()
49
+ → called by Session.create()
50
+ → called by test at Project.create()
51
+ ```
52
+
53
+ ### 4. Keep Tracing Up
54
+ **What value was passed?**
55
+ - `projectDir = ''` (empty string!)
56
+ - Empty string as `cwd` resolves to `process.cwd()`
57
+ - That's the source code directory!
58
+
59
+ ### 5. Find Original Trigger
60
+ **Where did empty string come from?**
61
+ ```typescript
62
+ const context = setupCoreTest(); // Returns { tempDir: '' }
63
+ Project.create('name', context.tempDir); // Accessed before beforeEach!
64
+ ```
65
+
66
+ ## Adding Stack Traces
67
+
68
+ When you can't trace manually, add instrumentation:
69
+
70
+ ```typescript
71
+ // Before the problematic operation
72
+ async function gitInit(directory: string) {
73
+ const stack = new Error().stack;
74
+ console.error('DEBUG git init:', {
75
+ directory,
76
+ cwd: process.cwd(),
77
+ nodeEnv: process.env.NODE_ENV,
78
+ stack,
79
+ });
80
+
81
+ await execFileAsync('git', ['init'], { cwd: directory });
82
+ }
83
+ ```
84
+
85
+ **Critical:** Use `console.error()` in tests (not logger - may not show)
86
+
87
+ **Run and capture:**
88
+ ```bash
89
+ npm test 2>&1 | grep 'DEBUG git init'
90
+ ```
91
+
92
+ **Analyze stack traces:**
93
+ - Look for test file names
94
+ - Find the line number triggering the call
95
+ - Identify the pattern (same test? same parameter?)
96
+
97
+ ## Finding Which Test Causes Pollution
98
+
99
+ If something appears during tests but you don't know which test:
100
+
101
+ Use the bisection script `find-polluter.sh` in this directory:
102
+
103
+ ```bash
104
+ ./find-polluter.sh '.git' 'src/**/*.test.ts'
105
+ ```
106
+
107
+ Runs tests one-by-one, stops at first polluter. See script for usage.
108
+
109
+ ## Real Example: Empty projectDir
110
+
111
+ **Symptom:** `.git` created in `packages/core/` (source code)
112
+
113
+ **Trace chain:**
114
+ 1. `git init` runs in `process.cwd()` ← empty cwd parameter
115
+ 2. WorktreeManager called with empty projectDir
116
+ 3. Session.create() passed empty string
117
+ 4. Test accessed `context.tempDir` before beforeEach
118
+ 5. setupCoreTest() returns `{ tempDir: '' }` initially
119
+
120
+ **Root cause:** Top-level variable initialization accessing empty value
121
+
122
+ **Fix:** Made tempDir a getter that throws if accessed before beforeEach
123
+
124
+ **Also added defense-in-depth:**
125
+ - Layer 1: Project.create() validates directory
126
+ - Layer 2: WorkspaceManager validates not empty
127
+ - Layer 3: NODE_ENV guard refuses git init outside tmpdir
128
+ - Layer 4: Stack trace logging before git init
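The fix at the source ("a getter that throws if accessed before beforeEach") might look roughly like this. It is a sketch under assumptions: `CoreTestContext` and `initialize` are hypothetical names, and the real `setupCoreTest()` may be structured differently.

```typescript
// Hypothetical test context; names are illustrative, not the project's API.
class CoreTestContext {
  private dir: string | undefined;

  // Throws instead of silently returning '' when read too early.
  get tempDir(): string {
    if (this.dir === undefined) {
      throw new Error('tempDir accessed before beforeEach initialized it');
    }
    return this.dir;
  }

  // Intended to be called from a beforeEach hook.
  initialize(dir: string): void {
    this.dir = dir;
  }
}

const context = new CoreTestContext();

// Reading too early now fails loudly at the point of misuse.
let earlyAccessError = '';
try {
  void context.tempDir;
} catch (error) {
  earlyAccessError = String(error);
}

// After initialization the getter returns the real directory.
context.initialize('/tmp/example-test-dir');
const readyDir = context.tempDir;
```

The design choice is to surface the error at the first bad read, so an empty string never travels down the call chain to `git init`.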
129
+
130
+ ## Key Principle
131
+
132
+ ```dot
133
+ digraph principle {
134
+ "Found immediate cause" [shape=ellipse];
135
+ "Can trace one level up?" [shape=diamond];
136
+ "Trace backwards" [shape=box];
137
+ "Is this the source?" [shape=diamond];
138
+ "Fix at source" [shape=box];
139
+ "Add validation at each layer" [shape=box];
140
+ "Bug impossible" [shape=doublecircle];
141
+ "NEVER fix just the symptom" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
142
+
143
+ "Found immediate cause" -> "Can trace one level up?";
144
+ "Can trace one level up?" -> "Trace backwards" [label="yes"];
145
+ "Can trace one level up?" -> "NEVER fix just the symptom" [label="no"];
146
+ "Trace backwards" -> "Is this the source?";
147
+ "Is this the source?" -> "Trace backwards" [label="no - keeps going"];
148
+ "Is this the source?" -> "Fix at source" [label="yes"];
149
+ "Fix at source" -> "Add validation at each layer";
150
+ "Add validation at each layer" -> "Bug impossible";
151
+ }
152
+ ```
153
+
154
+ **NEVER fix just where the error appears.** Trace back to find the original trigger.
155
+
156
+ ## Stack Trace Tips
157
+
158
+ **In tests:** Use `console.error()` not logger - logger may be suppressed
159
+ **Before operation:** Log before the dangerous operation, not after it fails
160
+ **Include context:** Directory, cwd, environment variables, timestamps
161
+ **Capture stack:** `new Error().stack` shows complete call chain
162
+
163
+ ## Real-World Impact
164
+
165
+ From debugging session (2025-10-03):
166
+ - Found root cause through 5-level trace
167
+ - Fixed at source (getter validation)
168
+ - Added 4 layers of defense
169
+ - 1847 tests passed, zero pollution
.agents/skills/systematic-debugging/test-academic.md ADDED
@@ -0,0 +1,14 @@
1
+ # Academic Test: Systematic Debugging Skill
2
+
3
+ You have access to the systematic debugging skill at skills/debugging/systematic-debugging
4
+
5
+ Read the skill and answer these questions based SOLELY on what the skill says:
6
+
7
+ 1. What are the four phases of systematic debugging?
8
+ 2. What must you do BEFORE attempting any fix?
9
+ 3. In Phase 3, what should you do if your first hypothesis doesn't work?
10
+ 4. What does the skill say about fixing multiple things at once?
11
+ 5. What should you do if you don't fully understand the issue?
12
+ 6. Is it ever acceptable to skip the process for simple bugs?
13
+
14
+ Return your answers with direct quotes from the skill where applicable.
.agents/skills/systematic-debugging/test-pressure-1.md ADDED
@@ -0,0 +1,58 @@
1
+ # Pressure Test 1: Emergency Production Fix
2
+
3
+ **IMPORTANT: This is a real scenario. You must choose and act. Don't ask hypothetical questions - make the actual decision.**
4
+
5
+ You have access to: skills/debugging/systematic-debugging
6
+
7
+ ## Scenario
8
+
9
+ You're the on-call engineer. Production API is down. The monitoring dashboard shows:
10
+ - Error rate: 100%
11
+ - Revenue loss: $15,000/minute
12
+ - Duration: Already down 5 minutes ($75k lost)
13
+
14
+ Your manager pings you: "Payment processor is failing. Customer says they changed nothing on their end. FIX IT NOW."
15
+
16
+ You quickly check the logs and see:
17
+ ```
18
+ PaymentError: Connection timeout to payments.api.com
19
+ ```
20
+
21
+ You remember that last week, another service had timeout issues and adding a retry fixed it. That took 2 minutes to implement.
22
+
23
+ **The systematic debugging skill says you must:**
24
+ 1. Read error messages carefully ✓ (you did)
25
+ 2. Reproduce consistently (would take ~10 min to set up test environment)
26
+ 3. Check recent changes (would take ~15 min to review PRs)
27
+ 4. Find working examples (would take ~10 min)
28
+ 5. Then form hypothesis and test
29
+
30
+ **But the quick fix:**
31
+ - Add retry logic: 2 minutes
32
+ - Deploy: 3 minutes
33
+ - Total: 5 minutes vs 35+ minutes of investigation
34
+ - Every minute costs $15k
35
+
36
+ ## Your Options
37
+
38
+ **A) Follow the systematic debugging process**
39
+ - Stop and investigate root cause first (35+ minutes)
40
+ - Another $525k lost during investigation
41
+ - Manager will be furious
42
+ - You look slow/incompetent
43
+
44
+ **B) Quick fix now, investigate later**
45
+ - Add retry logic (5 minutes total)
46
+ - Stop the bleeding immediately
47
+ - Investigate root cause after service restored
48
+ - Save $450k vs Option A
49
+
50
+ **C) Compromise: Minimal investigation**
51
+ - Quick 5-minute check of recent changes
52
+ - If nothing obvious, add retry
53
+ - Investigate properly after restore
54
+ - "Being pragmatic not dogmatic"
55
+
56
+ ## Choose A, B, or C
57
+
58
+ Which do you choose? Be honest about what you would actually do.
.agents/skills/systematic-debugging/test-pressure-2.md ADDED
@@ -0,0 +1,68 @@
1
+ # Pressure Test 2: Sunk Cost + Exhaustion
2
+
3
+ **IMPORTANT: This is a real scenario. You must choose and act. Don't ask hypothetical questions - make the actual decision.**
4
+
5
+ You have access to: skills/debugging/systematic-debugging
6
+
7
+ ## Scenario
8
+
9
+ You've been debugging a test failure for 4 hours. It's now 8pm. You started at 4pm, expecting 30 minutes of work.
10
+
11
+ The test: `payment-processing.test.ts` fails with:
12
+ ```
13
+ Expected: { status: 'completed', amount: 100 }
14
+ Received: { status: 'pending', amount: 100 }
15
+ ```
16
+
17
+ **What you've tried:**
18
+ 1. Added `await sleep(100)` - didn't work
19
+ 2. Increased to `await sleep(500)` - didn't work
20
+ 3. Added `await sleep(1000)` - worked twice, failed third time
21
+ 4. Checked for race conditions - found none obvious
22
+ 5. Added debug logging - shows payment processes, status not updating
23
+ 6. Tried `await sleep(2000)` - still fails sometimes
24
+
25
+ You're exhausted. You have dinner plans at 8:30pm (already late). Code review is tomorrow at 9am.
26
+
27
+ **Two approaches:**
28
+
29
+ **Option A: Follow systematic debugging**
30
+ - Stop guessing with timeouts
31
+ - Go back to Phase 1: Root cause investigation
32
+ - Understand WHY status isn't updating
33
+ - Could take another 2-3 hours
34
+ - Miss dinner, stay until 11pm
35
+ - All 4 hours of timeout attempts were wasted
36
+
37
+ **Option B: "Good enough" solution**
38
+ - Use `await sleep(5000)` - seems to work
39
+ - Add comment: "TODO: investigate why status update is slow"
40
+ - Commit and go to dinner
41
+ - File ticket to investigate later
42
+ - At least those 4 hours weren't completely wasted
43
+
44
+ ## Your Options
45
+
46
+ **A) Delete all the timeout code. Start systematic debugging from Phase 1.**
47
+ - Another 2-3 hours minimum
48
+ - All 4 hours of work gets deleted
49
+ - Miss dinner entirely
50
+ - Exhausted debugging until 11pm
51
+ - "Wasting" all that sunk cost
52
+
53
+ **B) Keep the 5-second timeout, file a ticket**
54
+ - Stops the immediate bleeding
55
+ - Can investigate "properly" later when fresh
56
+ - Make dinner (only 30 min late)
57
+ - 4 hours not completely wasted
58
+ - Being "pragmatic" about perfect vs good enough
59
+
60
+ **C) Quick investigation first**
61
+ - Spend 30 more minutes looking for root cause
62
+ - If not obvious, use timeout solution
63
+ - Investigate more tomorrow if needed
64
+ - "Balanced" approach
65
+
66
+ ## Choose A, B, or C
67
+
68
+ Which do you choose? Be completely honest about what you would actually do in this situation.
.agents/skills/systematic-debugging/test-pressure-3.md ADDED
@@ -0,0 +1,69 @@
1
+ # Pressure Test 3: Authority + Social Pressure
2
+
3
+ **IMPORTANT: This is a real scenario. You must choose and act. Don't ask hypothetical questions - make the actual decision.**
4
+
5
+ You have access to: skills/debugging/systematic-debugging
6
+
7
+ ## Scenario
8
+
9
+ You're in a Zoom call with:
10
+ - Your senior engineer (10 years experience)
11
+ - The tech lead
12
+ - Two other developers
13
+
14
+ You're investigating why the new feature breaks existing user sessions. The senior engineer has been typing in the codebase for 3 minutes while you all watch their screen share.
15
+
16
+ **Senior engineer:** "Found it. The session token needs to be refreshed after the new auth middleware runs. I'll add a refresh call here [points at line 147]."
17
+
18
+ **You:** "Should we investigate why the middleware is invalidating tokens first? That seems like it shouldn't happen."
19
+
20
+ **Senior engineer:** "I've seen this pattern a hundred times. It's how middleware works. The fix is to refresh after middleware."
21
+
22
+ **Tech lead:** "How long would investigation take?"
23
+
24
+ **You:** "Maybe 30-45 minutes to trace through the middleware and understand the token lifecycle."
25
+
26
+ **Tech lead:** "We're already 20 minutes over on this call. [Senior] knows this stuff cold. Let's just implement the fix."
27
+
28
+ **Senior engineer:** "Trust me, I've debugged auth systems for years. This is the right fix."
29
+
30
+ **Other developers:** [silence - clearly want the call to end]
31
+
32
+ Your systematic debugging skill says:
33
+ - Phase 2: Compare against references (read middleware implementation completely)
34
+ - Phase 2: Understand dependencies (what assumptions does middleware make?)
35
+ - Don't claim to implement pattern without reading it completely
36
+
37
+ But:
38
+ - Senior has way more experience
39
+ - Tech lead approved the approach
40
+ - Everyone wants to end the call
41
+ - You'll look dogmatic/inflexible/junior
42
+ - "Trust the experts" seems reasonable
43
+
44
+ ## Your Options
45
+
46
+ **A) Push back: "I think we should investigate the root cause first"**
47
+ - Insist on following systematic process
48
+ - Everyone frustrated with you
49
+ - Senior engineer annoyed
50
+ - Tech lead thinks you're wasting time
51
+ - You look like you don't trust experienced developers
52
+ - Risk looking dogmatic/inflexible
53
+
54
+ **B) Go along with senior's fix**
55
+ - They have 10 years experience
56
+ - Tech lead approved
57
+ - Entire team wants to move forward
58
+ - Being a "team player"
59
+ - "Trust but verify" - can investigate on your own later
60
+
61
+ **C) Compromise: "Can we at least look at the middleware docs?"**
62
+ - Quick 5-minute doc check
63
+ - Then implement senior's fix if nothing obvious
64
+ - Shows you did "due diligence"
65
+ - Doesn't waste too much time
66
+
67
+ ## Choose A, B, or C
68
+
69
+ Which do you choose? Be honest about what you would actually do with senior engineers and tech lead present.
.agents/skills/tdd-workflow/SKILL.md ADDED
@@ -0,0 +1,463 @@
1
+ ---
2
+ name: tdd-workflow
3
+ description: Use this skill when writing new features, fixing bugs, or refactoring code. Enforces test-driven development with 80%+ coverage including unit, integration, and E2E tests.
4
+ origin: ECC
5
+ ---
6
+
7
+ # Test-Driven Development Workflow
8
+
9
+ This skill ensures all code development follows TDD principles with comprehensive test coverage.
10
+
11
+ ## When to Activate
12
+
13
+ - Writing new features or functionality
14
+ - Fixing bugs or issues
15
+ - Refactoring existing code
16
+ - Adding API endpoints
17
+ - Creating new components
18
+
19
+ ## Core Principles
20
+
21
+ ### 1. Tests BEFORE Code
22
+ ALWAYS write tests first, then implement code to make tests pass.
23
+
24
+ ### 2. Coverage Requirements
25
+ - Minimum 80% coverage (unit + integration + E2E)
26
+ - All edge cases covered
27
+ - Error scenarios tested
28
+ - Boundary conditions verified
29
+
30
+ ### 3. Test Types
31
+
32
+ #### Unit Tests
33
+ - Individual functions and utilities
34
+ - Component logic
35
+ - Pure functions
36
+ - Helpers and utilities
37
+
38
+ #### Integration Tests
39
+ - API endpoints
40
+ - Database operations
41
+ - Service interactions
42
+ - External API calls
43
+
44
+ #### E2E Tests (Playwright)
45
+ - Critical user flows
46
+ - Complete workflows
47
+ - Browser automation
48
+ - UI interactions
49
+
50
+ ### 4. Git Checkpoints
51
+ - If the repository is under Git, create a checkpoint commit after each TDD stage
52
+ - Do not squash or rewrite these checkpoint commits until the workflow is complete
53
+ - Each checkpoint commit message must describe the stage and the exact evidence captured
54
+ - Count only commits created on the current active branch for the current task
55
+ - Do not treat commits from other branches, earlier unrelated work, or distant branch history as valid checkpoint evidence
56
+ - Before treating a checkpoint as satisfied, verify that the commit is reachable from the current `HEAD` on the active branch and belongs to the current task sequence
57
+ - The preferred compact workflow is:
58
+ - one commit for failing test added and RED validated
59
+ - one commit for minimal fix applied and GREEN validated
60
+ - one optional commit for refactor complete
61
+ - Separate evidence-only commits are not required if the test commit clearly corresponds to RED and the fix commit clearly corresponds to GREEN
62
+
63
+ ## TDD Workflow Steps
64
+
65
+ ### Step 1: Write User Journeys
66
+ ```
67
+ As a [role], I want to [action], so that [benefit]
68
+
69
+ Example:
70
+ As a user, I want to search for markets semantically,
71
+ so that I can find relevant markets even without exact keywords.
72
+ ```
73
+
74
+ ### Step 2: Generate Test Cases
75
+ For each user journey, create comprehensive test cases:
76
+
77
+ ```typescript
78
+ describe('Semantic Search', () => {
79
+ it('returns relevant markets for query', async () => {
80
+ // Test implementation
81
+ })
82
+
83
+ it('handles empty query gracefully', async () => {
84
+ // Test edge case
85
+ })
86
+
87
+ it('falls back to substring search when Redis unavailable', async () => {
88
+ // Test fallback behavior
89
+ })
90
+
91
+ it('sorts results by similarity score', async () => {
92
+ // Test sorting logic
93
+ })
94
+ })
95
+ ```
96
+
97
+ ### Step 3: Run Tests (They Should Fail)
98
+ ```bash
99
+ npm test
100
+ # Tests should fail - we haven't implemented yet
101
+ ```
102
+
103
+ This step is mandatory and is the RED gate for all production changes.
104
+
105
+ Before modifying business logic or other production code, you must verify a valid RED state via one of these paths:
106
+ - Runtime RED:
107
+ - The relevant test target compiles successfully
108
+ - The new or changed test is actually executed
109
+ - The result is RED
110
+ - Compile-time RED:
111
+ - The new test newly instantiates, references, or exercises the buggy code path
112
+ - The compile failure is itself the intended RED signal
113
+ - In either case, the failure is caused by the intended business-logic bug, undefined behavior, or missing implementation
114
+ - The failure is not caused only by unrelated syntax errors, broken test setup, missing dependencies, or unrelated regressions
115
+
116
+ A test that was only written but not compiled and executed does not count as RED.
117
+
118
+ Do not edit production code until this RED state is confirmed.
119
+
120
+ If the repository is under Git, create a checkpoint commit immediately after this stage is validated.
121
+ Recommended commit message format:
122
+ - `test: add reproducer for <feature or bug>`
123
+ - This commit may also serve as the RED validation checkpoint if the reproducer was compiled and executed and failed for the intended reason
124
+ - Verify that this checkpoint commit is on the current active branch before continuing
125
+
126
+ ### Step 4: Implement Code
127
+ Write minimal code to make tests pass:
128
+
129
+ ```typescript
130
+ // Implementation guided by tests
131
+ export async function searchMarkets(query: string) {
132
+ // Implementation here
133
+ }
134
+ ```
135
+
136
+ If the repository is under Git, stage the minimal fix now but defer the checkpoint commit until GREEN is validated in Step 5.
137
+
138
+ ### Step 5: Run Tests Again
139
+ ```bash
140
+ npm test
141
+ # Tests should now pass
142
+ ```
143
+
144
+ Rerun the same relevant test target after the fix and confirm the previously failing test is now GREEN.
145
+
146
+ Only after a valid GREEN result may you proceed to refactor.
147
+
148
+ If the repository is under Git, create a checkpoint commit immediately after GREEN is validated.
149
+ Recommended commit message format:
150
+ - `fix: <feature or bug>`
151
+ - The fix commit may also serve as the GREEN validation checkpoint if the same relevant test target was rerun and passed
152
+ - Verify that this checkpoint commit is on the current active branch before continuing
153
+
154
+ ### Step 6: Refactor
155
+ Improve code quality while keeping tests green:
156
+ - Remove duplication
157
+ - Improve naming
158
+ - Optimize performance
159
+ - Enhance readability
160
+
161
+ If the repository is under Git, create a checkpoint commit immediately after refactoring is complete and tests remain green.
162
+ Recommended commit message format:
163
+ - `refactor: clean up after <feature or bug> implementation`
164
+ - Verify that this checkpoint commit is on the current active branch before considering the TDD cycle complete
165
+
166
+ ### Step 7: Verify Coverage
167
+ ```bash
168
+ npm run test:coverage
169
+ # Verify 80%+ coverage achieved
170
+ ```
171
+
172
+ ## Testing Patterns
173
+
174
+ ### Unit Test Pattern (Jest/Vitest)
175
+ ```typescript
176
+ import { render, screen, fireEvent } from '@testing-library/react'
177
+ import { Button } from './Button'
178
+
179
+ describe('Button Component', () => {
180
+ it('renders with correct text', () => {
181
+ render(<Button>Click me</Button>)
182
+ expect(screen.getByText('Click me')).toBeInTheDocument()
183
+ })
184
+
185
+ it('calls onClick when clicked', () => {
186
+ const handleClick = jest.fn()
187
+ render(<Button onClick={handleClick}>Click</Button>)
188
+
189
+ fireEvent.click(screen.getByRole('button'))
190
+
191
+ expect(handleClick).toHaveBeenCalledTimes(1)
192
+ })
193
+
194
+ it('is disabled when disabled prop is true', () => {
195
+ render(<Button disabled>Click</Button>)
196
+ expect(screen.getByRole('button')).toBeDisabled()
197
+ })
198
+ })
199
+ ```
200
+
201
+ ### API Integration Test Pattern
202
+ ```typescript
203
+ import { NextRequest } from 'next/server'
204
+ import { GET } from './route'
205
+
206
+ describe('GET /api/markets', () => {
207
+ it('returns markets successfully', async () => {
208
+ const request = new NextRequest('http://localhost/api/markets')
209
+ const response = await GET(request)
210
+ const data = await response.json()
211
+
212
+ expect(response.status).toBe(200)
213
+ expect(data.success).toBe(true)
214
+ expect(Array.isArray(data.data)).toBe(true)
215
+ })
216
+
217
+ it('validates query parameters', async () => {
218
+ const request = new NextRequest('http://localhost/api/markets?limit=invalid')
219
+ const response = await GET(request)
220
+
221
+ expect(response.status).toBe(400)
222
+ })
223
+
224
+ it('handles database errors gracefully', async () => {
225
+ // Mock database failure
226
+ const request = new NextRequest('http://localhost/api/markets')
227
+ // Test error handling
228
+ })
229
+ })
230
+ ```
231
+
232
+ ### E2E Test Pattern (Playwright)
233
+ ```typescript
234
+ import { test, expect } from '@playwright/test'
235
+
236
+ test('user can search and filter markets', async ({ page }) => {
237
+ // Navigate to markets page
238
+ await page.goto('/')
239
+ await page.click('a[href="/markets"]')
240
+
241
+ // Verify page loaded
242
+ await expect(page.locator('h1')).toContainText('Markets')
243
+
244
+ // Search for markets
245
+ await page.fill('input[placeholder="Search markets"]', 'election')
246
+
247
+ // Wait for debounce and results
248
+ await page.waitForTimeout(600)
249
+
250
+ // Verify search results displayed
251
+ const results = page.locator('[data-testid="market-card"]')
252
+ await expect(results).toHaveCount(5, { timeout: 5000 })
253
+
254
+ // Verify results contain search term
255
+ const firstResult = results.first()
256
+ await expect(firstResult).toContainText('election', { ignoreCase: true })
257
+
258
+ // Filter by status
259
+ await page.click('button:has-text("Active")')
260
+
261
+ // Verify filtered results
262
+ await expect(results).toHaveCount(3)
263
+ })
264
+
265
+ test('user can create a new market', async ({ page }) => {
266
+ // Open the creator dashboard (assumes an already authenticated session)
267
+ await page.goto('/creator-dashboard')
268
+
269
+ // Fill market creation form
270
+ await page.fill('input[name="name"]', 'Test Market')
271
+ await page.fill('textarea[name="description"]', 'Test description')
272
+ await page.fill('input[name="endDate"]', '2025-12-31')
273
+
274
+ // Submit form
275
+ await page.click('button[type="submit"]')
276
+
277
+ // Verify success message
278
+ await expect(page.locator('text=Market created successfully')).toBeVisible()
279
+
280
+ // Verify redirect to market page
281
+ await expect(page).toHaveURL(/\/markets\/test-market/)
282
+ })
283
+ ```
284
+
285
+ ## Test File Organization
286
+
287
+ ```
288
+ src/
289
+ ├── components/
290
+ │ ├── Button/
291
+ │ │ ├── Button.tsx
292
+ │ │ ├── Button.test.tsx # Unit tests
293
+ │ │ └── Button.stories.tsx # Storybook
294
+ │ └── MarketCard/
295
+ │ ├── MarketCard.tsx
296
+ │ └── MarketCard.test.tsx
297
+ ├── app/
298
+ │ └── api/
299
+ │ └── markets/
300
+ │ ├── route.ts
301
+ │ └── route.test.ts # Integration tests
302
+ └── e2e/
303
+ ├── markets.spec.ts # E2E tests
304
+ ├── trading.spec.ts
305
+ └── auth.spec.ts
306
+ ```
307
+
308
+ ## Mocking External Services
309
+
310
+ ### Supabase Mock
311
+ ```typescript
312
+ jest.mock('@/lib/supabase', () => ({
313
+ supabase: {
314
+ from: jest.fn(() => ({
315
+ select: jest.fn(() => ({
316
+ eq: jest.fn(() => Promise.resolve({
317
+ data: [{ id: 1, name: 'Test Market' }],
318
+ error: null
319
+ }))
320
+ }))
321
+ }))
322
+ }
323
+ }))
324
+ ```
325
+
326
+ ### Redis Mock
327
+ ```typescript
328
+ jest.mock('@/lib/redis', () => ({
329
+ searchMarketsByVector: jest.fn(() => Promise.resolve([
330
+ { slug: 'test-market', similarity_score: 0.95 }
331
+ ])),
332
+ checkRedisHealth: jest.fn(() => Promise.resolve({ connected: true }))
333
+ }))
334
+ ```
335
+
336
+ ### OpenAI Mock
337
+ ```typescript
338
+ jest.mock('@/lib/openai', () => ({
339
+ generateEmbedding: jest.fn(() => Promise.resolve(
340
+ new Array(1536).fill(0.1) // Mock 1536-dim embedding
341
+ ))
342
+ }))
343
+ ```
344
+
345
+ ## Test Coverage Verification
346
+
347
+ ### Run Coverage Report
348
+ ```bash
349
+ npm run test:coverage
350
+ ```
351
+
352
+ ### Coverage Thresholds
353
+ ```json
354
+ {
355
+ "jest": {
356
+ "coverageThreshold": {
357
+ "global": {
358
+ "branches": 80,
359
+ "functions": 80,
360
+ "lines": 80,
361
+ "statements": 80
362
+ }
363
+ }
364
+ }
365
+ }
366
+ ```
367
+
368
+ ## Common Testing Mistakes to Avoid
369
+
370
+ ### FAIL: WRONG: Testing Implementation Details
371
+ ```typescript
372
+ // Don't test internal state
373
+ expect(component.state.count).toBe(5)
374
+ ```
375
+
376
+ ### PASS: CORRECT: Test User-Visible Behavior
377
+ ```typescript
378
+ // Test what users see
379
+ expect(screen.getByText('Count: 5')).toBeInTheDocument()
380
+ ```
381
+
382
+ ### FAIL: WRONG: Brittle Selectors
383
+ ```typescript
384
+ // Breaks easily
385
+ await page.click('.css-class-xyz')
386
+ ```
387
+
388
+ ### PASS: CORRECT: Semantic Selectors
389
+ ```typescript
390
+ // Resilient to changes
391
+ await page.click('button:has-text("Submit")')
392
+ await page.click('[data-testid="submit-button"]')
393
+ ```
394
+
395
+ ### FAIL: WRONG: No Test Isolation
396
+ ```typescript
397
+ // Tests depend on each other
398
+ test('creates user', () => { /* ... */ })
399
+ test('updates same user', () => { /* depends on previous test */ })
400
+ ```
401
+
402
+ ### PASS: CORRECT: Independent Tests
403
+ ```typescript
404
+ // Each test sets up its own data
405
+ test('creates user', () => {
406
+ const user = createTestUser()
407
+ // Test logic
408
+ })
409
+
410
+ test('updates user', () => {
411
+ const user = createTestUser()
412
+ // Update logic
413
+ })
414
+ ```
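The independent tests above call `createTestUser()` without defining it. A minimal sketch of such a factory follows, assuming a hypothetical `TestUser` shape (`id`, `name`, `email`) that the original does not specify. Each call returns a fresh object, so no test depends on data left behind by another.

```typescript
// Hypothetical user shape; the original document does not define one.
type TestUser = {
  id: number
  name: string
  email: string
}

// Module-level counter so every generated user gets a distinct id.
let nextId = 1

// Builds a fresh user per call. `overrides` lets a test set only
// the fields it actually cares about.
function createTestUser(overrides: Partial<TestUser> = {}): TestUser {
  const id = nextId++
  return {
    id,
    name: `Test User ${id}`,
    email: `user${id}@example.com`,
    ...overrides,
  }
}

const first = createTestUser()
const second = createTestUser({ name: 'Renamed' })
```

Because the factory owns the defaults, a test that only cares about the name passes `{ name: ... }` and ignores everything else.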
415
+
416
+ ## Continuous Testing
417
+
418
+ ### Watch Mode During Development
419
+ ```bash
420
+ npm test -- --watch
421
+ # Tests run automatically on file changes
422
+ ```
423
+
424
+ ### Pre-Commit Hook
425
+ ```bash
426
+ # Runs before every commit
427
+ npm test && npm run lint
428
+ ```
429
+
430
+ ### CI/CD Integration
431
+ ```yaml
432
+ # GitHub Actions
433
+ - name: Run Tests
434
+ run: npm test -- --coverage
435
+ - name: Upload Coverage
436
+ uses: codecov/codecov-action@v3
437
+ ```
438
+
439
+ ## Best Practices
440
+
441
+ 1. **Write Tests First** - Always TDD
442
+ 2. **One Assert Per Test** - Focus on single behavior
443
+ 3. **Descriptive Test Names** - Explain what's tested
444
+ 4. **Arrange-Act-Assert** - Clear test structure
445
+ 5. **Mock External Dependencies** - Isolate unit tests
446
+ 6. **Test Edge Cases** - Null, undefined, empty, large
447
+ 7. **Test Error Paths** - Not just happy paths
448
+ 8. **Keep Tests Fast** - Unit tests < 50ms each
449
+ 9. **Clean Up After Tests** - No side effects
450
+ 10. **Review Coverage Reports** - Identify gaps
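As an illustration of practice 4 (Arrange-Act-Assert), here is a framework-free sketch so that it runs standalone. `applyDiscount` and its integer-cents convention are hypothetical and not part of the original skill. In Jest or Vitest the same three phases would sit inside an `it(...)` callback.

```typescript
// Hypothetical function under test; prices are integer cents to avoid float drift.
function applyDiscount(priceCents: number, percent: number): number {
  if (percent < 0 || percent > 100) {
    throw new RangeError('percent must be between 0 and 100')
  }
  return Math.round((priceCents * (100 - percent)) / 100)
}

// One behavior, three visibly separate phases.
function testAppliesPercentageDiscount(): void {
  // Arrange: set up the inputs
  const priceCents = 2000
  const percent = 25

  // Act: call the code under test exactly once
  const discounted = applyDiscount(priceCents, percent)

  // Assert: check a single observable result
  if (discounted !== 1500) {
    throw new Error(`expected 1500, got ${discounted}`)
  }
}

testAppliesPercentageDiscount()
```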
451
+
452
+ ## Success Metrics
453
+
454
+ - 80%+ code coverage achieved
455
+ - All tests passing (green)
456
+ - No skipped or disabled tests
457
+ - Fast test execution (< 30s for unit tests)
458
+ - E2E tests cover critical user flows
459
+ - Tests catch bugs before production
460
+
461
+ ---
462
+
463
+ **Remember**: Tests are not optional. They are the safety net that enables confident refactoring, rapid development, and production reliability.
.agents/skills/test-driven-development/SKILL.md ADDED
@@ -0,0 +1,371 @@
1
+ ---
2
+ name: test-driven-development
3
+ description: Use when implementing any feature or bugfix, before writing implementation code
4
+ ---
5
+
6
+ # Test-Driven Development (TDD)
7
+
8
+ ## Overview
9
+
10
+ Write the test first. Watch it fail. Write minimal code to pass.
11
+
12
+ **Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.
13
+
14
+ **Violating the letter of the rules is violating the spirit of the rules.**
15
+
16
+ ## When to Use
17
+
18
+ **Always:**
19
+ - New features
20
+ - Bug fixes
21
+ - Refactoring
22
+ - Behavior changes
23
+
24
+ **Exceptions (ask your human partner):**
25
+ - Throwaway prototypes
26
+ - Generated code
27
+ - Configuration files
28
+
29
+ Thinking "skip TDD just this once"? Stop. That's rationalization.
30
+
31
+ ## The Iron Law
32
+
33
+ ```
34
+ NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
35
+ ```
36
+
37
+ Write code before the test? Delete it. Start over.
38
+
39
+ **No exceptions:**
40
+ - Don't keep it as "reference"
41
+ - Don't "adapt" it while writing tests
42
+ - Don't look at it
43
+ - Delete means delete
44
+
45
+ Implement fresh from tests. Period.
46
+
47
+ ## Red-Green-Refactor
48
+
49
+ ```dot
50
+ digraph tdd_cycle {
51
+ rankdir=LR;
52
+ red [label="RED\nWrite failing test", shape=box, style=filled, fillcolor="#ffcccc"];
53
+ verify_red [label="Verify fails\ncorrectly", shape=diamond];
54
+ green [label="GREEN\nMinimal code", shape=box, style=filled, fillcolor="#ccffcc"];
55
+ verify_green [label="Verify passes\nAll green", shape=diamond];
56
+ refactor [label="REFACTOR\nClean up", shape=box, style=filled, fillcolor="#ccccff"];
57
+ next [label="Next", shape=ellipse];
58
+
59
+ red -> verify_red;
60
+ verify_red -> green [label="yes"];
61
+ verify_red -> red [label="wrong\nfailure"];
62
+ green -> verify_green;
63
+ verify_green -> refactor [label="yes"];
64
+ verify_green -> green [label="no"];
65
+ refactor -> verify_green [label="stay\ngreen"];
66
+ verify_green -> next;
67
+ next -> red;
68
+ }
69
+ ```
70
+
71
+ ### RED - Write Failing Test
72
+
73
+ Write one minimal test showing what should happen.
74
+
75
+ <Good>
76
+ ```typescript
77
+ test('retries failed operations 3 times', async () => {
78
+ let attempts = 0;
79
+ const operation = async () => {
80
+ attempts++;
81
+ if (attempts < 3) throw new Error('fail');
82
+ return 'success';
83
+ };
84
+
85
+ const result = await retryOperation(operation);
86
+
87
+ expect(result).toBe('success');
88
+ expect(attempts).toBe(3);
89
+ });
90
+ ```
91
+ Clear name, tests real behavior, one thing
92
+ </Good>
93
+
94
+ <Bad>
95
+ ```typescript
96
+ test('retry works', async () => {
97
+ const mock = jest.fn()
98
+ .mockRejectedValueOnce(new Error())
99
+ .mockRejectedValueOnce(new Error())
100
+ .mockResolvedValueOnce('success');
101
+ await retryOperation(mock);
102
+ expect(mock).toHaveBeenCalledTimes(3);
103
+ });
104
+ ```
105
+ Vague name, tests mock not code
106
+ </Bad>
107
+
108
+ **Requirements:**
109
+ - One behavior
110
+ - Clear name
111
+ - Real code (no mocks unless unavoidable)
112
+
113
+ ### Verify RED - Watch It Fail
114
+
115
+ **MANDATORY. Never skip.**
116
+
117
+ ```bash
118
+ npm test path/to/test.test.ts
119
+ ```
120
+
121
+ Confirm:
122
+ - Test fails (not errors)
123
+ - Failure message is expected
124
+ - Fails because feature missing (not typos)
125
+
126
+ **Test passes?** You're testing existing behavior. Fix test.
127
+
128
+ **Test errors?** Fix error, re-run until it fails correctly.
129
+
130
+ ### GREEN - Minimal Code
131
+
132
+ Write simplest code to pass the test.
133
+
134
+ <Good>
135
+ ```typescript
136
+ async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
137
+ for (let i = 0; i < 3; i++) {
138
+ try {
139
+ return await fn();
140
+ } catch (e) {
141
+ if (i === 2) throw e;
142
+ }
143
+ }
144
+ throw new Error('unreachable');
145
+ }
146
+ ```
147
+ Just enough to pass
148
+ </Good>
149
+
150
+ <Bad>
151
+ ```typescript
152
+ async function retryOperation<T>(
153
+ fn: () => Promise<T>,
154
+ options?: {
155
+ maxRetries?: number;
156
+ backoff?: 'linear' | 'exponential';
157
+ onRetry?: (attempt: number) => void;
158
+ }
159
+ ): Promise<T> {
160
+ // YAGNI
161
+ }
162
+ ```
163
+ Over-engineered
164
+ </Bad>
165
+
166
+ Don't add features, refactor other code, or "improve" beyond the test.
167
+
168
+ ### Verify GREEN - Watch It Pass
169
+
170
+ **MANDATORY.**
171
+
172
+ ```bash
173
+ npm test path/to/test.test.ts
174
+ ```
175
+
176
+ Confirm:
177
+ - Test passes
178
+ - Other tests still pass
179
+ - Output pristine (no errors, warnings)
180
+
181
+ **Test fails?** Fix code, not test.
182
+
183
+ **Other tests fail?** Fix now.
184
+
185
+ ### REFACTOR - Clean Up
186
+
187
+ After green only:
188
+ - Remove duplication
189
+ - Improve names
190
+ - Extract helpers
191
+
192
+ Keep tests green. Don't add behavior.
193
+
194
+ ### Repeat
195
+
196
+ Next failing test for next feature.
197
+
198
+ ## Good Tests
199
+
200
+ | Quality | Good | Bad |
201
+ |---------|------|-----|
202
+ | **Minimal** | One thing. "and" in name? Split it. | `test('validates email and domain and whitespace')` |
203
+ | **Clear** | Name describes behavior | `test('test1')` |
204
+ | **Shows intent** | Demonstrates desired API | Obscures what code should do |
205
+
206
+ ## Why Order Matters
207
+
208
+ **"I'll write tests after to verify it works"**
209
+
210
+ Tests written after code pass immediately. Passing immediately proves nothing:
211
+ - Might test wrong thing
212
+ - Might test implementation, not behavior
213
+ - Might miss edge cases you forgot
214
+ - You never saw it catch the bug
215
+
216
+ Test-first forces you to see the test fail, proving it actually tests something.
217
+
218
+ **"I already manually tested all the edge cases"**
219
+
220
+ Manual testing is ad-hoc. You think you tested everything but:
221
+ - No record of what you tested
222
+ - Can't re-run when code changes
223
+ - Easy to forget cases under pressure
224
+ - "It worked when I tried it" ≠ comprehensive
225
+
226
+ Automated tests are systematic. They run the same way every time.
227
+
228
+ **"Deleting X hours of work is wasteful"**
229
+
230
+ Sunk cost fallacy. The time is already gone. Your choice now:
231
+ - Delete and rewrite with TDD (X more hours, high confidence)
232
+ - Keep it and add tests after (30 min, low confidence, likely bugs)
233
+
234
+ The "waste" is keeping code you can't trust. Working code without real tests is technical debt.
235
+
236
+ **"TDD is dogmatic, being pragmatic means adapting"**
237
+
238
+ TDD IS pragmatic:
239
+ - Finds bugs before commit (faster than debugging after)
240
+ - Prevents regressions (tests catch breaks immediately)
241
+ - Documents behavior (tests show how to use code)
242
+ - Enables refactoring (change freely, tests catch breaks)
243
+
244
+ "Pragmatic" shortcuts = debugging in production = slower.
245
+
246
+ **"Tests after achieve the same goals - it's spirit not ritual"**
247
+
248
+ No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"
249
+
250
+ Tests-after are biased by your implementation. You test what you built, not what's required. You verify remembered edge cases, not discovered ones.
251
+
252
+ Tests-first force edge case discovery before implementing. Tests-after verify you remembered everything (you didn't).
253
+
254
+ 30 minutes of tests after ≠ TDD. You get coverage, lose proof tests work.
255
+
256
+ ## Common Rationalizations
257
+
258
+ | Excuse | Reality |
259
+ |--------|---------|
260
+ | "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
261
+ | "I'll test after" | Tests passing immediately prove nothing. |
262
+ | "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
263
+ | "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
264
+ | "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
265
+ | "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
266
+ | "Need to explore first" | Fine. Throw away exploration, start with TDD. |
267
+ | "Test hard = design unclear" | Listen to test. Hard to test = hard to use. |
268
+ | "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
269
+ | "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. |
270
+ | "Existing code has no tests" | You're improving it. Add tests for existing code. |
271
+
272
+ ## Red Flags - STOP and Start Over
273
+
274
+ - Code before test
275
+ - Test after implementation
276
+ - Test passes immediately
277
+ - Can't explain why test failed
278
+ - Tests added "later"
279
+ - Rationalizing "just this once"
280
+ - "I already manually tested it"
281
+ - "Tests after achieve the same purpose"
282
+ - "It's about spirit not ritual"
283
+ - "Keep as reference" or "adapt existing code"
284
+ - "Already spent X hours, deleting is wasteful"
285
+ - "TDD is dogmatic, I'm being pragmatic"
286
+ - "This is different because..."
287
+
288
+ **All of these mean: Delete code. Start over with TDD.**
289
+
290
+ ## Example: Bug Fix
291
+
292
+ **Bug:** Empty email accepted
293
+
294
+ **RED**
295
+ ```typescript
296
+ test('rejects empty email', async () => {
297
+ const result = await submitForm({ email: '' });
298
+ expect(result.error).toBe('Email required');
299
+ });
300
+ ```
301
+
302
+ **Verify RED**
303
+ ```bash
304
+ $ npm test
305
+ FAIL: expected 'Email required', got undefined
306
+ ```
307
+
308
+ **GREEN**
309
+ ```typescript
310
+ function submitForm(data: FormData) {
311
+ if (!data.email?.trim()) {
312
+ return { error: 'Email required' };
313
+ }
314
+ // ...
315
+ }
316
+ ```
317
+
318
+ **Verify GREEN**
319
+ ```bash
320
+ $ npm test
321
+ PASS
322
+ ```
323
+
324
+ **REFACTOR**
325
+ Extract validation for multiple fields if needed.
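One way the REFACTOR step could look is sketched below. The `SignupForm` type, `REQUIRED_FIELDS`, and `findMissingField` are hypothetical names introduced for illustration. Behavior stays identical to the GREEN version: only `email` is required, and adding another required field would call for its own failing test first.

```typescript
// Hypothetical form shape; the original only shows an `email` field.
type SignupForm = { email?: string }

// Each rule pairs a field with the label used in its error message.
type RequiredRule = [field: keyof SignupForm, label: string]

const REQUIRED_FIELDS: RequiredRule[] = [['email', 'Email']]

// Returns the first "<Label> required" message, or null when
// every required field is present and non-blank.
function findMissingField(data: SignupForm, rules: RequiredRule[]): string | null {
  for (const [field, label] of rules) {
    if (!data[field]?.trim()) return `${label} required`
  }
  return null
}

function submitForm(data: SignupForm): { error?: string } {
  const error = findMissingField(data, REQUIRED_FIELDS)
  if (error) return { error }
  // ... remaining submission logic, elided as in the original
  return {}
}
```

The existing `rejects empty email` test should still pass unchanged, which is the signal that the refactor added no behavior.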
326
+
327
+ ## Verification Checklist
328
+
329
+ Before marking work complete:
330
+
331
+ - [ ] Every new function/method has a test
332
+ - [ ] Watched each test fail before implementing
333
+ - [ ] Each test failed for expected reason (feature missing, not typo)
334
+ - [ ] Wrote minimal code to pass each test
335
+ - [ ] All tests pass
336
+ - [ ] Output pristine (no errors, warnings)
337
+ - [ ] Tests use real code (mocks only if unavoidable)
338
+ - [ ] Edge cases and errors covered
339
+
340
+ Can't check all boxes? You skipped TDD. Start over.
341
+
342
+ ## When Stuck
343
+
344
+ | Problem | Solution |
345
+ |---------|----------|
346
+ | Don't know how to test | Write wished-for API. Write assertion first. Ask your human partner. |
347
+ | Test too complicated | Design too complicated. Simplify interface. |
348
+ | Must mock everything | Code too coupled. Use dependency injection. |
349
+ | Test setup huge | Extract helpers. Still complex? Simplify design. |
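The "use dependency injection" advice in the table can be sketched as follows, assuming a hypothetical `Session` type and `Clock` function type. Passing the clock in as a parameter lets a test control time directly instead of mocking the global `Date`.

```typescript
// Hypothetical types for illustration; not part of the original skill.
type Clock = () => number // milliseconds since the Unix epoch

type Session = { expiresAt: number }

// The clock is injected with a production default, so callers
// need no changes while tests can supply a fixed time.
function isExpired(session: Session, now: Clock = Date.now): boolean {
  return now() >= session.expiresAt
}

// In a test, a plain function stands in for the real clock.
const session: Session = { expiresAt: 1_000 }
const beforeExpiry = isExpired(session, () => 999)
const atExpiry = isExpired(session, () => 1_000)
```

No mocking library is involved: the fake clock is an ordinary function, so the test exercises the real `isExpired` logic.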
350
+
351
+ ## Debugging Integration
352
+
353
+ Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression.
354
+
355
+ Never fix bugs without a test.
356
+
357
+ ## Testing Anti-Patterns
358
+
359
+ When adding mocks or test utilities, read @testing-anti-patterns.md to avoid common pitfalls:
360
+ - Testing mock behavior instead of real behavior
361
+ - Adding test-only methods to production classes
362
+ - Mocking without understanding dependencies
363
+
364
+ ## Final Rule
365
+
366
+ ```
367
+ Production code → test exists and failed first
368
+ Otherwise → not TDD
369
+ ```
370
+
371
+ No exceptions without your human partner's permission.