Archive — Page 7 — Pratik Dhanave

March 31, 2026 · 2 min read

Ardan Ultimate AI #24 — A hardened shell tool with RBAC

Giving an LLM a `run_command` tool is convenient and terrifying. The hardened version: allow-listed binaries, argument scrubbing, RBAC per user, audit per invocation.

Ardan Labs · Go · Security · Agents

March 30, 2026 · 2 min read

Ardan Ultimate AI #23 — Direct and indirect prompt injection, plus defenses

Prompt injection is arguably the single biggest LLM security risk. The example walks through both forms (direct, from user input; indirect, via retrieved content) and the layered defenses: system prompt isolation, content classification, output validation, and structured tool schemas.

Ardan Labs · Go · Security · Prompt Injection

March 29, 2026 · 2 min read

Ardan Ultimate AI #22 — Cascading model router (cheap first, expensive on miss)

Most queries are simple. A cascading router tries a small, fast, cheap model first; if confidence is low or the task is hard, it escalates to a larger one. Costs drop sharply, and quality holds as long as the escalation check is reliable.

Ardan Labs · Go · LLM Ops · Cost Optimisation
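
The escalation logic in this post can be sketched in a few lines of Go. Everything here is an assumption for illustration: the `Model` signature, the `Route` function, and the idea that each model returns a usable confidence score (in practice that score might come from log-probabilities or a separate grader).

```go
package main

import "fmt"

// Model is a hypothetical stand-in for an LLM call that also reports
// how confident it is in its own answer (0.0 to 1.0).
type Model func(prompt string) (answer string, confidence float64)

// Route tries each tier in order, cheapest first, and stops at the first
// answer whose confidence reaches the threshold. If no tier is confident,
// the answer from the last (largest) tier is returned.
func Route(prompt string, threshold float64, tiers ...Model) (answer string, tier int) {
	tier = -1
	for i, m := range tiers {
		a, conf := m(prompt)
		answer, tier = a, i
		if conf >= threshold {
			break
		}
	}
	return answer, tier
}

func main() {
	// Stub models: the small one is only confident on short prompts.
	small := func(p string) (string, float64) {
		if len(p) < 20 {
			return "small: " + p, 0.9
		}
		return "small: unsure", 0.3
	}
	large := func(p string) (string, float64) { return "large: " + p, 0.95 }

	for _, p := range []string{"2+2?", "Summarise this 40-page contract"} {
		a, tier := Route(p, 0.8, small, large)
		fmt.Printf("tier=%d answer=%q\n", tier, a)
	}
}
```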

March 28, 2026 · 2 min read

Ardan Ultimate AI #21 — Adaptive retrieval (decide whether to RAG at all)

Not every question needs retrieval. A classifier gates RAG: chat or general knowledge questions skip it; factual or document-grounded questions trigger it. Saves latency and tokens on the simple half of queries.

Ardan Labs · Go · RAG · Cost Optimisation

March 27, 2026 · 2 min read

Ardan Ultimate AI #20 — Embedding-based semantic cache

Exact-match caching misses paraphrases. "What is the refund policy?" and "How do refunds work?" should both hit the same cached answer. Semantic cache embeds queries and matches by similarity.

Ardan Labs · Go · Caching · Cost Optimisation
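
A minimal sketch of the lookup this post describes, assuming query embeddings are already available as `[]float64` vectors. The `SemanticCache` type, the linear scan, and the 0.9 threshold are illustrative choices, and the hand-written vectors stand in for the output of a real embedding model.

```go
package main

import (
	"fmt"
	"math"
)

type entry struct {
	vec    []float64
	answer string
}

// SemanticCache is a hypothetical in-memory cache keyed by query embedding.
type SemanticCache struct {
	Threshold float64 // minimum cosine similarity that counts as a hit
	entries   []entry
}

// cosine returns the cosine similarity of a and b, or 0 if it is undefined.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Put stores an answer under the embedding of the query that produced it.
func (c *SemanticCache) Put(vec []float64, answer string) {
	c.entries = append(c.entries, entry{vec: vec, answer: answer})
}

// Get returns the most similar cached answer at or above the threshold.
func (c *SemanticCache) Get(vec []float64) (string, bool) {
	best, bestSim := -1, c.Threshold
	for i, e := range c.entries {
		if s := cosine(vec, e.vec); s >= bestSim {
			best, bestSim = i, s
		}
	}
	if best < 0 {
		return "", false
	}
	return c.entries[best].answer, true
}

func main() {
	cache := &SemanticCache{Threshold: 0.9}
	// Pretend this is the embedding of "What is the refund policy?".
	cache.Put([]float64{1, 0}, "Refunds are issued within 14 days.")

	// Pretend this is the embedding of "How do refunds work?".
	answer, hit := cache.Get([]float64{0.9, 0.1})
	fmt.Println(hit, answer)
}
```

A linear scan is fine for a sketch; a larger cache would more likely sit on a vector index.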

March 26, 2026 · 2 min read

Ardan Ultimate AI #19 — Speculative decoding with a draft model

Run a small draft model to predict several tokens at once; verify them in a single pass with the large model. Latency drops without a drop in quality. It is a technique that production LLM serving stacks use but most application engineers never see.

Ardan Labs · Go · LLM Ops · Performance

March 25, 2026 · 2 min read

Ardan Ultimate AI #18 — Incremental message caching (IMC) for chat

Without caching, a long chat reprocesses the entire history on every turn. Prefix caching lets the LLM reuse the KV cache for the prefix it already processed on the previous turn and compute only the new suffix. Massive latency win on long conversations.

Ardan Labs · Go · LLM Ops · Performance
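
The bookkeeping behind the prefix reuse can be illustrated with a small sketch. It only counts which messages would be reused versus recomputed; the actual KV cache lives inside the inference server. `PrefixCache` and `Process` are hypothetical names, and matching is done per message rather than per token to keep the example short.

```go
package main

import "fmt"

// PrefixCache remembers the history processed on the previous turn.
type PrefixCache struct {
	cached []string
}

// Process compares the new history with the cached one and reports how many
// leading messages can be reused and how many must be computed. Any edit to
// an earlier message invalidates everything after it.
func (c *PrefixCache) Process(history []string) (reused, computed int) {
	for reused < len(c.cached) && reused < len(history) &&
		c.cached[reused] == history[reused] {
		reused++
	}
	computed = len(history) - reused
	c.cached = append([]string(nil), history...) // cache the full new history
	return reused, computed
}

func main() {
	var cache PrefixCache
	turns := [][]string{
		{"system", "user: hi"},
		{"system", "user: hi", "assistant: hello", "user: refund?"},
	}
	for i, h := range turns {
		reused, computed := cache.Process(h)
		fmt.Printf("turn %d: reused=%d computed=%d\n", i+1, reused, computed)
	}
}
```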

March 24, 2026 · 2 min read

Ardan Ultimate AI #17 — Building an agent over an MCP server

Model Context Protocol standardises tool calling across LLMs. The example builds both sides: an MCP server exposing tools, and an agent that calls them. Works the same against any MCP-compatible LLM.

Ardan Labs · Go · MCP · Agents

March 23, 2026 · 2 min read

Ardan Ultimate AI #16 — Tool hardening: panic recovery and per-tool timeouts

A panicking tool kills the agent loop. A slow tool blocks the loop forever. The example shows the boring-but-essential wrappers: recover, deadlines, structured errors.

Ardan Labs · Go · Agents · Reliability

March 22, 2026 · 2 min read

Ardan Ultimate AI #15 — A read-only NL→SQL tool

Give an LLM a SQL tool, watch it write `DELETE` statements. The read-only version: parse the generated SQL, refuse anything that isn't a `SELECT`, validate against an allow-listed schema, and run with a strict timeout.

Ardan Labs · Go · SQL · Agents · Security
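
A rough sketch of the gatekeeping step, with a loud caveat: the post describes parsing the SQL, while this example substitutes a conservative lexical check built on regular expressions, since the Go standard library has no SQL parser. `CheckReadOnly` and the table names are hypothetical. The check deliberately over-rejects (for example, a string literal containing "delete") and it only inspects the first table after each `FROM` or `JOIN`, so it is not a substitute for a real parser.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

var (
	startsWithSelect = regexp.MustCompile(`(?i)^select\b`)
	writeKeyword     = regexp.MustCompile(
		`(?i)\b(insert|update|delete|drop|alter|create|truncate|grant|revoke|merge|into)\b`)
	// Captures the identifier that follows FROM or JOIN.
	tableRef = regexp.MustCompile(`(?i)\b(?:from|join)\s+([a-z_][a-z0-9_.]*)`)
)

// CheckReadOnly is a lexical stand-in for a real SQL parser. It accepts a
// single SELECT statement that references only allow-listed tables.
func CheckReadOnly(query string, allowed map[string]bool) error {
	q := strings.TrimSpace(query)
	q = strings.TrimSpace(strings.TrimSuffix(q, ";"))

	switch {
	case strings.Contains(q, ";"):
		return errors.New("multiple statements are not allowed")
	case !startsWithSelect.MatchString(q):
		return errors.New("only SELECT statements are allowed")
	case writeKeyword.MatchString(q):
		return errors.New("query contains a write keyword")
	}

	for _, m := range tableRef.FindAllStringSubmatch(q, -1) {
		if table := strings.ToLower(m[1]); !allowed[table] {
			return fmt.Errorf("table %q is not allow-listed", table)
		}
	}
	return nil
}

func main() {
	allowed := map[string]bool{"orders": true, "customers": true}
	queries := []string{
		"SELECT id, total FROM orders WHERE total > 100",
		"DELETE FROM orders",
		"SELECT * FROM payroll",
		"SELECT 1; DROP TABLE orders",
	}
	for _, q := range queries {
		fmt.Printf("%-50q -> %v\n", q, CheckReadOnly(q, allowed))
	}
}
```

For the strict timeout, one option is to pass a context from `context.WithTimeout` to `db.QueryContext` in `database/sql`. Connecting with a database role that has no write privileges adds a second layer in case the check misses something.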