Design an AI Agent Platform — System Design Practice | AmanAI Lab


OpenAI · Anth · Google · MSFT


Hard

Problem

Design a platform that runs autonomous AI agents which plan multi-step tasks, call tools, and act on a user's behalf (e.g. "research X and draft a report", or "monitor my inbox and reply to routine emails").

The platform should support:

• A planning/reasoning loop (the agent decides the next action each step)
• A tool registry (web search, code execution, API calls, file I/O)
• Short-term (working) and long-term (persistent) memory
• Long-running tasks lasting minutes to hours, with human-approval checkpoints
• Thousands of agents running concurrently
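
The reasoning loop and the human-approval checkpoint from the list above can be sketched as follows. This is a minimal illustration, not a reference design: `Action`, `run_agent`, and `scripted_policy` are hypothetical names, and the scripted policy stands in for a real LLM call.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str                 # "tool", "ask_human", or "finish"
    name: str = ""            # tool name when kind == "tool"
    args: dict = field(default_factory=dict)
    output: str = ""          # final answer when kind == "finish"


def run_agent(goal, decide, tools, max_steps=10):
    """Drive one run: decide, act, observe, until finish, pause, or step cap."""
    history = [("goal", goal)]
    for _ in range(max_steps):
        action = decide(history)          # in a real system this is the LLM call
        if action.kind == "finish":
            return "done", action.output, history
        if action.kind == "ask_human":
            # Pause here; a durable store would let the run resume after approval.
            return "waiting_for_approval", None, history
        observation = tools[action.name](**action.args)
        history.append((action.name, observation))
    return "step_limit", None, history


def scripted_policy(history):
    """Stand-in for the model: search once, then finish."""
    if len(history) == 1:
        return Action("tool", "web_search", {"query": history[0][1]})
    return Action("finish", output=f"report based on: {history[-1][1]}")


status, output, history = run_agent(
    "research X",
    scripted_policy,
    {"web_search": lambda query: f"results for {query}"},
)
```

Returning a status instead of blocking on approval keeps the worker free while a human decides, which matters when runs last hours.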

What you'll be assessed on

The agent execution loop, tool orchestration & safety, state/memory management, handling long-running and failing tasks, and cost/loop control.

Scale & Constraints

▸ 100K concurrent agent runs at peak
▸ A single task may span 5–50 reasoning steps over minutes to hours
▸ Each step is an LLM call (P95 ≤ 3s) plus 0–N tool calls
▸ Agents must be pausable/resumable and survive a worker crash
▸ Hard caps per run: max steps, max tokens, max wall-clock, max $ spend
▸ Tool calls that mutate external state require an audit trail
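
A rough capacity estimate can be derived from these constraints using Little's law (throughput = in-flight requests / mean latency). The concurrency and step counts come from the list above; the fraction of runs inside an LLM call and the mean latency are assumptions chosen only for illustration, since the problem gives just a P95.

```python
CONCURRENT_RUNS = 100_000        # from the constraints
MAX_STEPS_PER_RUN = 50           # from the constraints (upper end of 5-50)

# Assumptions, not given in the problem statement:
FRACTION_IN_LLM_CALL = 0.2       # most runs are waiting on tools or humans
MEAN_LLM_LATENCY_S = 1.5         # mean is below the stated P95 of 3 s

# Little's law: arrival rate = items in flight / mean time in system.
in_flight_llm_calls = CONCURRENT_RUNS * FRACTION_IN_LLM_CALL
llm_calls_per_second = in_flight_llm_calls / MEAN_LLM_LATENCY_S

# Upper bound on step records held for live runs.
max_live_step_rows = CONCURRENT_RUNS * MAX_STEPS_PER_RUN

print(f"LLM calls/s at peak: {llm_calls_per_second:,.0f}")
print(f"Step rows for live runs (max): {max_live_step_rows:,}")
```

Changing the two assumed values shifts the result proportionally, so it is worth stating them out loud in an interview.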


Hints (if stuck)

💡 Model a run as a durable state machine (Temporal, or a steps table in Postgres) so any step can resume after failure.
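
One way to picture the steps-table variant of this hint is the sketch below. It uses SQLite from the standard library as a stand-in for Postgres, and the table layout and function names (`open_store`, `record_step`, `next_step_no`, `run`) are illustrative assumptions rather than a prescribed schema.

```python
import sqlite3


def open_store(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS steps (
               run_id  TEXT    NOT NULL,
               step_no INTEGER NOT NULL,
               action  TEXT    NOT NULL,
               result  TEXT,
               PRIMARY KEY (run_id, step_no))"""
    )
    return db


def record_step(db, run_id, step_no, action, result):
    # The primary key plus OR IGNORE makes a retried write idempotent.
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR IGNORE INTO steps VALUES (?, ?, ?, ?)",
            (run_id, step_no, action, result),
        )


def next_step_no(db, run_id):
    row = db.execute(
        "SELECT COALESCE(MAX(step_no), -1) + 1 FROM steps WHERE run_id = ?",
        (run_id,),
    ).fetchone()
    return row[0]


def run(db, run_id, plan, crash_at=None):
    """Execute the plan from the first step that has no durable record."""
    for step_no in range(next_step_no(db, run_id), len(plan)):
        if step_no == crash_at:
            raise RuntimeError("simulated worker crash")
        record_step(db, run_id, step_no, plan[step_no], f"ok:{plan[step_no]}")


db = open_store()
plan = ["search", "read", "draft"]
try:
    run(db, "run-1", plan, crash_at=2)   # first worker dies before step 2
except RuntimeError:
    pass
resumed_from = next_step_no(db, "run-1")
run(db, "run-1", plan)                   # a second worker picks up the run
actions = [r[0] for r in db.execute(
    "SELECT action FROM steps WHERE run_id = ? ORDER BY step_no", ("run-1",))]
```

Because progress lives in the table and not in worker memory, any worker can resume the run, and the same rows double as the record of what the agent did.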

💡 Decouple the reasoning loop from tool execution with a queue — tools run on isolated workers.
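
The decoupling can be illustrated with an in-process queue and a worker thread. In a real deployment the queue would presumably be an external broker and the worker a sandboxed process; the `TOOLS` table and the message shape here are hypothetical.

```python
import queue
import threading

tool_requests = queue.Queue()   # reasoning loop -> tool workers
tool_results = queue.Queue()    # tool workers -> reasoning loop

TOOLS = {"add": lambda a, b: a + b}   # hypothetical tool table


def tool_worker():
    """Run tool calls off the queue; failures become results, not crashes."""
    while True:
        job = tool_requests.get()
        if job is None:               # shutdown sentinel
            break
        call_id, name, args = job
        try:
            tool_results.put((call_id, "ok", TOOLS[name](**args)))
        except Exception as exc:
            tool_results.put((call_id, "error", repr(exc)))


worker = threading.Thread(target=tool_worker, daemon=True)
worker.start()

# The reasoning loop only enqueues requests and later reads results.
tool_requests.put(("call-1", "add", {"a": 2, "b": 3}))
tool_requests.put(("call-2", "missing_tool", {}))

results = {}
for _ in range(2):
    call_id, status, value = tool_results.get(timeout=5)
    results[call_id] = (status, value)

tool_requests.put(None)
worker.join(timeout=5)
```

Tagging each request with a call ID lets the reasoning side match results to steps even when tools finish out of order.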

💡 Keep the full step history; long context is summarised, but the durable log is the source of truth for replay.
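
A small sketch of that split between the durable log and the prompt context follows. The `build_context` helper and its placeholder summariser are assumptions for illustration; a real summariser would likely be another model call.

```python
def build_context(log, keep_recent=3, summarise=None):
    """Return a prompt context: one summary of older steps plus recent ones.

    The log itself is never modified, so it stays usable for replay.
    """
    if summarise is None:
        summarise = lambda steps: f"[summary of {len(steps)} earlier steps]"
    split = max(len(log) - keep_recent, 0)
    older, recent = log[:split], log[split:]
    if not older:
        return list(recent)
    return [summarise(older)] + list(recent)


log = [f"step {i}" for i in range(1, 8)]   # durable record of seven steps
context = build_context(log)
```

Here `context` holds one summary line followed by steps 5 to 7, while `log` still contains all seven entries.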

💡 Always enforce hard budgets (steps, tokens, $, time) — runaway agent loops are the #1 cost risk.
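
The four caps can be enforced with a small guard that is checked before every LLM call. This is a sketch under assumed names (`Budget`, `BudgetExceeded`) and made-up per-step costs, not a prescribed interface.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    pass


@dataclass
class Budget:
    max_steps: int
    max_tokens: int
    max_seconds: float
    max_usd: float
    steps: int = 0
    tokens: int = 0
    usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, tokens, usd):
        """Record the cost of one completed step."""
        self.steps += 1
        self.tokens += tokens
        self.usd += usd

    def check(self):
        """Raise if any cap is exhausted; call before starting a step."""
        elapsed = time.monotonic() - self.started_at
        limits = [
            ("steps", self.steps, self.max_steps),
            ("tokens", self.tokens, self.max_tokens),
            ("seconds", elapsed, self.max_seconds),
            ("usd", self.usd, self.max_usd),
        ]
        for name, used, limit in limits:
            if used >= limit:
                raise BudgetExceeded(f"{name} budget exhausted: {used}/{limit}")


budget = Budget(max_steps=3, max_tokens=10_000, max_seconds=60, max_usd=1.0)
completed = 0
stop_reason = None
try:
    while True:                      # a runaway loop, stopped by the guard
        budget.check()
        budget.charge(tokens=1_000, usd=0.10)   # illustrative costs
        completed += 1
except BudgetExceeded as exc:
    stop_reason = str(exc)
```

Checking before the call, not after, means a run never starts a step it cannot pay for.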

💡 Treat every tool the agent can call as an attack surface: scope permissions and validate arguments.
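
Scoped permissions and argument validation could look roughly like the gate below, which also records mutating calls for the audit trail. `ToolSpec`, `call_tool`, the scope strings, and the `send_email` stub are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


class ToolCallRejected(Exception):
    pass


@dataclass(frozen=True)
class ToolSpec:
    func: Callable
    required_scope: str
    arg_types: dict
    mutates: bool = False     # True if the tool changes external state


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"    # stub; a real tool would call a mail API


REGISTRY = {
    "send_email": ToolSpec(
        send_email, "email:send", {"to": str, "body": str}, mutates=True
    ),
}
audit_log = []


def call_tool(name, args, granted_scopes):
    """Validate a model-proposed call before it reaches the tool."""
    spec = REGISTRY.get(name)
    if spec is None:
        raise ToolCallRejected(f"unknown tool: {name}")
    if spec.required_scope not in granted_scopes:
        raise ToolCallRejected(f"missing scope: {spec.required_scope}")
    if set(args) != set(spec.arg_types):
        raise ToolCallRejected(f"unexpected or missing arguments: {sorted(args)}")
    for key, expected in spec.arg_types.items():
        if not isinstance(args[key], expected):
            raise ToolCallRejected(f"argument {key!r} must be {expected.__name__}")
    result = spec.func(**args)
    if spec.mutates:
        audit_log.append({"tool": name, "args": dict(args), "result": result})
    return result


ok = call_tool("send_email", {"to": "a@example.com", "body": "hi"}, {"email:send"})
```

A production gate would probably also write the audit record before executing the call, so that a crash mid-call still leaves a trace.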
