About
Bio
I'm Juan Manuel Infante Quiroga, an AI developer based in Bogotá, Colombia, focused on Generative AI, NLP, and AIOps. I build production AI systems, working at the layer where most tutorials stop. At Finanzauto, I architected a RAG platform that processes hundreds of thousands of WhatsApp conversations daily, with 1,000+ concurrent users, automated collections, and real-time customer support.
Next Token is the documentation layer for that work — the orchestration decisions, the infrastructure tradeoffs, and the failure modes that don't appear in prompt engineering posts. Long-form articles when an idea earns the depth, field notes when it doesn't, rendered notebooks when the code is the point.
What I Work On
The work centers on four problems that production consistently makes hard.
RAG pipelines are the first. Retrieval sounds mechanical until you're assembling context at scale — deciding what enters the window, in what order, while keeping latency acceptable when hundreds of requests arrive simultaneously.
Orchestration is the second. LangGraph workflows, Twilio-integrated callbots, Airflow pipelines that need to survive restarts: orchestration is where correctness gets expensive. The challenge isn't wiring the pieces together — it's keeping them coherent when one fails.
NLP and multimodal agents are the third. Combining speech recognition with LLM reasoning introduces latency constraints that pure-text systems never face. The decisions at that layer — how to buffer, how to recover from a misrecognition mid-conversation — don't appear in any chatbot tutorial.
AIOps is the fourth. Knowing whether a system that handled 300,000 messages last week will handle next week's load is the problem that makes every production deployment honest. Solving it takes evaluation frameworks, observability pipelines, and the CI/CD discipline that turns a working model into a reliable service.