#Ardan Labs

Posts about ardan labs. ← All posts

A2AADKAGTAIAI GovernanceAIGPAMLAPI DesignAWSAadhaarAccountingAgentsAnomaly DetectionArchitectureArdan LabsAuditAudit LogAzureBCPBankingBedrockBenchmarksBhashiniBigQueryCRAGCachingCareerCase StudyClinical Decision SupportCloud ArchitectureCloud KMSCloud RunCoding AgentsCommunicationComplianceConcurrencyConfigCost OptimisationCryptographyCultureCures ActDSLData ResidencyDatabase DesignDatabase MigrationDatabase SecurityDataflowDatastreamDebuggingDeploymentDesign PatternDevOpsDeveloper ExperienceDevice FlowDistributed SystemsDoclingElevenLabsEmbeddingsEngineeringEntity ResolutionEnvoyEvaluationFHIRFREE-AIFinOpsFinTechFoundationsFraudGCPGDPRGKEGOMEMLIMITGSoCGeminiGenieGitHubGoGo 1.23GoMLXGoogle CloudGoogle Cloud NextGovernanceGrafanaGraphQLGraphRAGHIPAAHITLHL7 v2Healthcare ITHyDEIAPPISO 27001IdempotencyIdentity FederationIncident ResponseIndic LanguagesIngestionIntegrationJWTJupyterKMSKYCKafkaKnowledge GraphKubernetesLLMLLM OpsLLM-as-JudgeLatencyLendingLessons LearnedLocal AILoggingMAFMARAMCPML EngineeringMagenticMemoryMentorshipMicroservicesMiddlewareMigrationMulti-AgentMulti-Agent AIMulti-CloudMulti-LanguageMultilingualNPCINetworkingOAuthOPAOTelOWASPObservabilityOllamaOpen BankingOpen SourceOpenTelemetryOperationsOperatorsOpinionOrchestrationPAMPCSEPDFPKCEPasskeysPatternsPaymentsPerformancePipelinePolicyPolicy as CodePostgreSQLPrivacy EngineeringProductionPrometheusPrompt InjectionPromptingProtocolsProvider AbstractionPub/SubPythonRAGRBACRBIREPLRFC 8693ReactRedisRefactorRegistryRegulationReliabilityReservationsResilienceRetrievalRetrospectiveSAMLSLOSOC 2SPIFFESPIRESQLSRESSESagaSaudi ArabiaSchemaSecuritySecurity Command CenterSelf-RAGService MeshSoftware ArchitectureSpannerSpeakingState ManagementStdlibStorageStreamingTata GroupTerraformTestingTier PromotionToken BudgetingTool CallingToolsUAEUPIUXVectorsVertex AIVideoVisionVoice AIVotingWebAuthnWhisperWorkflowWorkflowsWorkload IdentityWorkload Identity FederationWritingZero-Trustembed.FSerrgroupgRPCiter.SeqmTLSpgvectorslog
· Engineering

Ardan Ultimate AI #30 — PDF extraction with Docling + LLM

PDFs are the format that breaks every RAG pipeline. Docling is the IBM-research extractor that handles layout, tables, and figures. The example wires Docling + LLM to make PDFs usable.

· Engineering

Ardan Ultimate AI #25 — Poisoned-document attacks on RAG and defenses

A RAG pipeline that ingests user-supplied documents is a prompt-injection vector. An attacker uploads a document with hidden instructions; the LLM retrieves it and follows them. Defense: input filtering, content classification, output verification.

· Engineering

Ardan Ultimate AI #23 — Direct and indirect prompt injection, plus defenses

The single biggest LLM security risk. The example walks through both forms (direct from user input, indirect via retrieved content) and the layered defenses: system prompt isolation, content classification, output validation, structured tool schemas.

· Engineering

Ardan Ultimate AI #20 — Embedding-based semantic cache

Exact-match caching misses paraphrases. "What is the refund policy?" and "How do refunds work?" should both hit the same cached answer. Semantic cache embeds queries and matches by similarity.

· Engineering

Ardan Ultimate AI #19 — Speculative decoding with a draft model

Run a small draft model to predict several tokens at once; verify them in a single pass with the large model. Latency drops without quality dropping. The technique production LLM serving uses but most application engineers don't see.

· Engineering

Ardan Ultimate AI #18 — Incremental message caching (IMC) for chat

A long chat reprocesses the entire history on every turn. Prefix caching lets the LLM serve the cached KV-cache prefix from the previous turn and only compute the new suffix. Massive latency win on long conversations.

· Engineering

Ardan Ultimate AI #17 — Building an agent over an MCP server

Model Context Protocol standardises tool calling across LLMs. The example builds both sides: an MCP server exposing tools, and an agent that calls them. Works the same against any MCP-compatible LLM.

· Engineering

Ardan Ultimate AI #15 — A read-only NL→SQL tool

Give an LLM a SQL tool, watch it write delete statements. The read-only version: parse the generated SQL, refuse anything that isn't SELECT, validate against an allow-listed schema, run with a strict timeout.

· Engineering

Ardan Ultimate AI #14 — A streaming agent with a reasoning panel

Stream the agent's reasoning and tool calls to the UI as they happen. The user sees "thinking about X, calling tool Y, got result Z, now answering..." — dramatically better UX than waiting for the final answer.

· Engineering

Ardan Ultimate AI #13 — A minimal multi-tool agent loop

The smallest possible multi-tool agent. The loop is 30 lines of Go and shows exactly what an "agent" is — there's no magic, just a structured back-and-forth between the LLM and a set of tools until the model says stop.

· Engineering

Ardan Ultimate AI #12 — Two-phase tool calling explained

The tool-calling dance: the LLM emits a structured tool call → application runs the tool → application appends the result → the LLM uses it in the next turn. Two phases. Everything else is detail.

· Engineering

Ardan Ultimate AI #09 — Debugging retrieval in isolation (K and threshold)

When RAG gives wrong answers, the problem is usually retrieval, not the LLM. The example isolates the retrieval step so you can see exactly what chunks come back for a given query, with what scores, and tune K and the similarity threshold accordingly.

· Engineering

Ardan Ultimate AI #05 — The same question with and without RAG

Side-by-side comparison: ask the LLM a domain question with no context, then ask with retrieved context. The without-RAG answer is plausible nonsense. The with-RAG answer is correct. The example that motivates everything else in the course.

· Engineering

Ardan Ultimate AI #03 — Context injection into a prompt

Before RAG and tools, the original way to give an LLM extra information was to inject it into the prompt. The example shows the right way to format injected context and what the LLM does (and doesn't) pay attention to.

· Engineering

Ardan Ultimate AI #02 — LLM-generated embeddings

Hand-crafting vectors stops scaling at about 10 dimensions. LLM-generated embeddings give you a 1024-dim vector that captures semantic meaning. The example shows how to generate them and what they're good for.