Role & Scope
Solo developer responsible for the full stack: plugin architecture for Cursor IDE and Claude Cowork, the video generation pipeline (Playwright capture, ElevenLabs narration, Remotion rendering), multi-platform output delivery, and the non-interactive script approval workflow.
Architecture & Design
Designed the local-first plugin model, the voice-locking pattern, and the per-platform rendering pipeline from scratch.
Implementation
Built the full codebase analysis engine, script generator, Playwright capture layer, ElevenLabs integration, and Remotion video renderer.
Video Pipeline
Engineered the end-to-end flow from codebase to rendered MP4 — including scene transitions, audio sync, and per-platform timing.
Multi-Platform Delivery
Built separate rendering pipelines for 7 platforms, with dimensions, scripts, and narration optimized per target (Instagram, TikTok, Twitter, GitHub, Product Hunt, and more).
Constraints
Built within tight technical boundaries — every constraint shaped the architecture.
Local-First
No cloud uploads, no SaaS dependencies. All video generation runs entirely on the user’s machine. Data never leaves the local filesystem.
User-Owned API Keys
Users provide their own ElevenLabs and Google Veo 3 keys. No shared credentials, no vendor lock-in, no usage tracking.
Bash 3.2 Compatibility
Must work on macOS default shell — no associative arrays, no modern Bash features. Scripts use POSIX-compatible patterns throughout.
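A minimal sketch of the Bash 3.2-safe style described above, assuming a hypothetical lookup helper (the function name and the non-vertical dimensions are illustrative, not the shipped values): instead of a Bash 4 associative array, a case statement serves as the lookup table.

```shell
# Bash 3.2 / POSIX-safe lookup: `case` instead of `declare -A`
# (associative arrays require Bash 4+, which macOS does not ship).
platform_dimensions() {
  case "$1" in
    instagram|tiktok) echo "1080x1920" ;;  # vertical format from the spec
    twitter)          echo "1280x720"  ;;  # placeholder landscape cut
    github)           echo "1920x1080" ;;  # placeholder full walkthrough
    *)                echo "1920x1080" ;;  # fallback default
  esac
}

platform_dimensions instagram   # prints 1080x1920
```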
Non-Interactive Execution
No terminal prompts allowed. The plugin runs inside IDE and Cowork contexts where interactive shell read prompts and $EDITOR invocations don't work. Script approval happens through the plugin UI instead.
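One way to satisfy this constraint can be sketched as follows. The file path and function name here are assumptions for illustration, not the plugin's actual mechanism: approval arrives as a file written by the plugin UI, so no script ever blocks on a read prompt.

```shell
# Hypothetical file-based approval instead of `read -p`.
# The plugin UI writes "approved" to this file; scripts only check it.
APPROVAL_FILE="${APPROVAL_FILE:-.demo-maker/approval}"

script_approved() {
  # Succeeds only if the UI has written an explicit approval line.
  [ -f "$APPROVAL_FILE" ] && grep -q '^approved$' "$APPROVAL_FILE"
}

if script_approved; then
  echo "narration script approved, continuing"
else
  echo "waiting for approval via plugin UI" >&2
fi
```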
Dual Plugin Targets
Must function as both a Cursor IDE plugin and a Claude Cowork plugin — same codebase, two different runtime environments with different capabilities.
Tradeoffs
The biggest design decision: single video vs. per-platform pipeline.
Single Video vs. Per-Platform Pipeline
The initial approach generated one full-length video and used FFmpeg truncation for shorter platforms. Quick to build, but every platform got the same content cropped to fit.
The final architecture runs a complete per-platform pipeline: each platform gets its own narration script, audio generation, and Remotion render with correct dimensions. Instagram and TikTok get vertical 1080×1920. Twitter gets a punchy 30-second cut. GitHub gets the full technical walkthrough.
This means 7× the generation time and API calls, but the quality difference is significant — each platform gets content optimized for its audience, timing, and screen size.
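The per-platform fan-out described above can be sketched as a simple loop, kept Bash 3.2-safe with a delimited string rather than an associative array. Platform names and all dimensions other than the 1080x1920 vertical format are placeholders, and the pipeline stages are stubbed as comments.

```shell
# Illustrative per-platform fan-out: one full pipeline run per target.
PLATFORMS="instagram:1080x1920 tiktok:1080x1920 twitter:1280x720 github:1920x1080"

render_all() {
  for entry in $PLATFORMS; do
    platform="${entry%%:*}"   # text before the first ':'
    dims="${entry#*:}"        # text after the first ':'
    # Each platform gets its own complete run:
    # generate_script "$platform"   # platform-specific narration (stub)
    # synthesize_audio "$platform"  # TTS with the locked voice ID (stub)
    # render_video "$platform" "${dims%x*}" "${dims#*x}"  # Remotion (stub)
    echo "render $platform at $dims"
  done
}

render_all
```

Running the full pipeline per entry is exactly where the 7× cost comes from; the loop body, not the loop itself, is the expensive part.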
Risks & Mitigations
Each risk was identified during development and patched before shipping.
Voice Inconsistency Across Scenes
Risk: ElevenLabs Voice Design API generates a new random voice per call. Every scene sounded like a different person — jarring and unprofessional across a multi-scene demo.
Mitigation: Implemented a voice-locking pattern. Voice Design runs once to generate a voice, the generated_voice_id is captured and saved to voice-lock.json, then all subsequent scenes across all platforms use standard TTS with that locked ID. Delete voice-lock.json to regenerate.
Before: each scene calls Voice Design → random voice per scene → inconsistent narrator
After: Voice Design runs once → ID locked to voice-lock.json → consistent voice across all scenes and platforms
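The locking pattern can be sketched in Bash 3.2-compatible shell. The voice-lock.json file name and generated_voice_id field come from the text above; the API call is stubbed out because a real call needs the user's ElevenLabs key, and the helper name is hypothetical.

```shell
LOCK_FILE="voice-lock.json"

get_voice_id() {
  if [ -f "$LOCK_FILE" ]; then
    # Reuse the locked voice: extract generated_voice_id with sed
    # (no jq dependency, POSIX / Bash 3.2 friendly).
    sed -n 's/.*"generated_voice_id"[: ]*"\([^"]*\)".*/\1/p' "$LOCK_FILE"
  else
    # First run only: Voice Design would be called once here, e.g.
    # voice_id=$(call_voice_design_api)   # hypothetical helper
    voice_id="NEW_VOICE_ID"               # stubbed result
    printf '{"generated_voice_id": "%s"}\n' "$voice_id" > "$LOCK_FILE"
    echo "$voice_id"
  fi
}
# All later scenes use standard TTS with "$(get_voice_id)";
# deleting voice-lock.json forces a fresh Voice Design run.
```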
Security Design
Security was a primary design constraint, not an afterthought.
No Cloud Uploads
Video generation, codebase analysis, and rendering all happen locally. No source code, screenshots, or video files are uploaded to any external service.
User-Owned API Keys
API keys for ElevenLabs and Google Veo 3 are stored locally and provided by the user. No shared API accounts, no credential proxying.
No Data Leaves the Machine
The only external calls are to ElevenLabs for voice synthesis and Google for AI clips — using the user’s own keys. All intermediate artifacts (scripts, screenshots, audio, video) stay on the local filesystem.
Content-Security-Policy
Generated HTML artifacts ship with a strict CSP: no externally loaded scripts, no remote image loading, no third-party tracking.
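A policy of that shape might look like the following sketch. The exact directives are an assumption, not the shipped policy; the helper emits a CSP meta tag suitable for injecting into a generated HTML head.

```shell
# Illustrative CSP for locally generated HTML artifacts:
# deny everything by default, then allow only local resources.
csp_meta() {
  cat <<'EOF'
<meta http-equiv="Content-Security-Policy"
      content="default-src 'none'; img-src 'self' data:; media-src 'self'; style-src 'self'; script-src 'self'">
EOF
}

csp_meta
```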
Architecture
The system maps directly to its component model. Each concern has a dedicated layer.
Iteration Timeline
Three key iterations shaped the final product — each solving problems that only surfaced at runtime.
Commit History
Visual Evidence
Product demo generated by Demo Maker.