Blog

My blog posts and technical notes.

Posts

Agent Tool Calling Thinks — Examining CLI, MCP, and code-centric interfaces as tools for LLM agents. Which interface fits which task?
Agentic Agent: Exploit Versatile Agent Capabilities via Predefined Agent — How to recover general-purpose LLM capabilities through a restricted, single predefined agent API.
Single-Tool Agent via BusyBox — What if we reduced the entire agent toolbox to one tool? An open-source implementation is available at BusyAgent.
Memory Systems Are Not Learning — Retrieval is not learning. Two arguments for why the distinction matters in the post-scaling era.
LLMs, Knowledge Graphs, and the Fine-Tuning Landscape — On the incompatibility of LLMs and KGs, reasoning model fine-tuning hazards, CoT collapse, and why synthetic data is the path forward for small labs.
Tracking Word Meaning Evolution in Vector Space — Can we remove mean pooling from embedding models to track how individual words shift in vector space over historical corpora?
Entropy-Gated Model Switching — Use a small model by default and switch to a large model only when next-token entropy is high. A principled way to balance speed and quality.
Special System Prompt Token — What if an entire character persona could be encoded into a single token? On the concept behind arxiv.org/html/2511.23271v1.
40 Tokens per Second, Zero Words: Debugging a Gemma4 Vision Overflow in vLLM — A multimodal model was decoding normally, consuming KV cache, and returning nothing. The bug turned out to be one FP16 infinity per image patch.
When a Tool Schema Is Visible but the Model Still Calls bash — Investigating DeepSeek-V4-Flash tool-schema grounding failures, poisoned tool-call history, and a prompt-level mitigation for DSML agents.