Exploring the frontier of technology.
A digital garden for thoughts on AI, software architecture, and the future of human-computer interaction.
Building the 'Eyes' of an AI Agent: Deep Dive into Chrome DevTools Protocol & Accessibility Tree
How does an AI agent 'see' a web page? This post dissects the 700+ line implementation that enables reliable page understanding through Accessibility Tree scanning, iframe handling, and multi-strategy fallbacks: the foundation of any browser automation agent.
Context Management in Practice: Implementing AgentFold for Long-Horizon Browser Agents
Long-horizon web agents face a fundamental challenge: context saturation. This post details our implementation of AgentFold-inspired context management, demonstrating how proactive folding operations enable agents to maintain focus and efficiency across hundreds of interaction steps.
The Anatomy of an Agentic Browser Extension: System Architecture and Design Principles
Modern LLMs can talk, but an agentic product must act. This post dissects the architectural gap between a chatbot and a functional agent, exploring the distributed-system principles (Execution, Orchestration, and Control) required to build reliable browser automation.