A static AI agent performs at the same level on task 1,000 as it did on task 1. A self-evolving agent reaches 91 percent accuracy by task 1,000 because it rewrites its own instructions after every failure. Same model. Different architecture.

You deployed an AI agent. It works well enough. But every few days it makes the same type of mistake. It misformats a report. It misclassifies an email. It uses the wrong tone in a customer response. You fix it manually each time, but the agent never learns from the correction.

The problem is that your agent has no feedback loop. Its system prompt is frozen at the moment you wrote it. It cannot observe its own failures, analyze what went wrong, or update its instructions. Every mistake it makes today, it will make again tomorrow.

System cost: $60/mo
Manual cost replaced: $10,000/mo
Cost reduction: 99.4%

The stack

The self-evolution architecture adds three components to any existing agent. Component 1 (Observer): After every task, a separate evaluation agent scores the output on predefined quality dimensions: accuracy, tone, format compliance, and completeness.
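As a rough illustration of what the Observer hands to the rest of the loop, here is a minimal sketch in TypeScript. The four dimensions come from the description above; the 0 to 1 scale, the unweighted mean, and names such as `summarizeObservation` are assumptions made for the example, not part of the playbook.

```typescript
// Hypothetical names throughout; only the four dimensions come from the text above.
type Dimension = "accuracy" | "tone" | "formatCompliance" | "completeness";

// Assumption: the evaluation agent returns one score per dimension on a 0-1 scale.
type RubricScores = { [K in Dimension]: number };

interface Observation {
  overall: number;         // unweighted mean of the four dimension scores
  weakest: Dimension;      // lowest-scoring dimension (first one wins on ties)
  belowThreshold: boolean; // true when the task should go to the analyzer
}

function summarizeObservation(scores: RubricScores, threshold: number): Observation {
  const dims: Dimension[] = ["accuracy", "tone", "formatCompliance", "completeness"];
  let total = 0;
  let weakest: Dimension = "accuracy";
  for (const dim of dims) {
    const value = scores[dim];
    if (value < 0 || value > 1) {
      throw new RangeError(`score for ${dim} must be between 0 and 1`);
    }
    total += value;
    if (value < scores[weakest]) weakest = dim;
  }
  const overall = total / dims.length;
  return { overall, weakest, belowThreshold: overall < threshold };
}
```

A weighted mean or a per-dimension threshold would work just as well; the point is that the Observer produces a structured score the controller can branch on.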

Component 2 (Analyzer): When a task scores below threshold, the analyzer agent examines the failure, identifies the root cause (missing context, ambiguous instruction, edge case not covered), and generates a specific prompt amendment.
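One way the Analyzer step could be shaped is sketched below. The three root-cause categories are the ones named above; the prompt wording, the JSON reply format, and the function names are hypothetical, and the model call itself is deliberately left out.

```typescript
// The three root causes come from the text; everything else here is an assumption.
type RootCause = "missing_context" | "ambiguous_instruction" | "uncovered_edge_case";

interface FailureReport {
  taskInput: string;
  agentOutput: string;
  weakestDimension: string;
  overallScore: number;
}

interface Amendment {
  rootCause: RootCause;
  rule: string; // one specific instruction to add to the system prompt
}

const ROOT_CAUSES: RootCause[] = ["missing_context", "ambiguous_instruction", "uncovered_edge_case"];

// Builds the instruction sent to the analyzer agent (the model call is not shown).
function buildAnalyzerPrompt(failure: FailureReport): string {
  return [
    "A task scored below the quality threshold.",
    `Input: ${failure.taskInput}`,
    `Output: ${failure.agentOutput}`,
    `Weakest dimension: ${failure.weakestDimension} (overall ${failure.overallScore})`,
    `Pick one root cause from: ${ROOT_CAUSES.join(", ")}.`,
    'Reply as JSON: {"rootCause": "...", "rule": "..."}',
  ].join("\n");
}

// Parses the analyzer's reply; returns null rather than trusting malformed output.
function parseAmendment(reply: string): Amendment | null {
  let data: unknown;
  try {
    data = JSON.parse(reply);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const candidate = data as { rootCause?: unknown; rule?: unknown };
  const cause = ROOT_CAUSES.find((c) => c === candidate.rootCause);
  if (cause === undefined) return null;
  if (typeof candidate.rule !== "string" || candidate.rule.trim() === "") return null;
  return { rootCause: cause, rule: candidate.rule.trim() };
}
```

Rejecting anything that does not parse cleanly keeps a confused analyzer reply from ever reaching the system prompt.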

Component 3 (Updater): The amendment is appended to the agent's system prompt as a new rule. The next time the agent encounters a similar task, the updated instruction prevents the same failure. Over hundreds of tasks, the system prompt evolves from a generic instruction set into a battle-tested rulebook that covers every edge case your business encounters.
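The append step is simple enough to sketch directly. The version below assumes the learned rules live under a "Learned rules:" heading at the end of the prompt and that exact duplicates are skipped; both choices, along with the `PromptVersion` shape, are illustrative assumptions rather than the playbook's actual format.

```typescript
interface PromptVersion {
  version: number;
  text: string;
}

// Hypothetical heading that separates the original instructions from learned rules.
const RULES_HEADING = "Learned rules:";

// Returns a new version with the rule appended, or the same object if nothing changed.
function applyAmendment(current: PromptVersion, rule: string): PromptVersion {
  const cleaned = rule.trim();
  if (cleaned === "") return current;
  const ruleLine = `- ${cleaned}`;
  const lines = current.text.split("\n");
  // Skip exact duplicates so repeated failures do not bloat the prompt.
  if (lines.indexOf(ruleLine) !== -1) return current;
  const hasHeading = lines.indexOf(RULES_HEADING) !== -1;
  const text = hasHeading
    ? `${current.text}\n${ruleLine}`
    : `${current.text}\n\n${RULES_HEADING}\n${ruleLine}`;
  return { version: current.version + 1, text };
}
```

Returning a new object instead of mutating the old one means every earlier prompt version stays available for rollback or comparison.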

Claude: The evolving agent

Executes tasks using a system prompt that grows and refines over time. Each evolution cycle adds specificity to the instructions without increasing ambiguity.

Ultron: The evolution controller

Manages the feedback loop: triggers the observer after each task, routes failures to the analyzer, validates proposed amendments, and applies approved changes to the system prompt.
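The control flow described here can be sketched as a single function with the four steps passed in as dependencies. This is not Ultron's implementation: `LoopDeps`, `runEvolutionCycle`, and the synchronous signatures are assumptions chosen to keep the example short, and real model calls would be asynchronous.

```typescript
// Hypothetical interfaces; each function stands in for an agent or a check.
interface LoopDeps {
  runTask: (systemPrompt: string, input: string) => string;
  observe: (input: string, output: string) => number;         // quality score, 0-1
  analyze: (input: string, output: string) => string | null;  // proposed rule, if any
  validate: (rule: string, systemPrompt: string) => boolean;  // gate before applying
}

interface LoopState {
  systemPrompt: string;
  amendmentsApplied: number;
}

// Runs one task and returns the (possibly updated) state.
function runEvolutionCycle(
  state: LoopState,
  input: string,
  threshold: number,
  deps: LoopDeps
): LoopState {
  const output = deps.runTask(state.systemPrompt, input);
  const score = deps.observe(input, output);
  if (score >= threshold) return state; // passed: nothing to change

  const rule = deps.analyze(input, output);
  if (rule === null || !deps.validate(rule, state.systemPrompt)) return state;

  return {
    systemPrompt: `${state.systemPrompt}\n- ${rule}`,
    amendmentsApplied: state.amendmentsApplied + 1,
  };
}
```

Keeping validation as a separate dependency makes it easy to start with a human approval step and automate it later.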

ultron.sh/agents

Supabase: The evolution log

Stores every task output, quality score, failure analysis, and prompt amendment. Creates a complete audit trail of how the agent evolved and why each rule was added.
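To make the audit trail concrete, here is one possible row shape covering the four things the log stores. The column names are illustrative, not a published schema, and the in-memory class is a stand-in so the sketch runs without a database; a Supabase-backed class implementing the same hypothetical `EvolutionLog` interface would replace it in practice.

```typescript
// Hypothetical row shape; column names are illustrative assumptions.
interface EvolutionLogRow {
  task_id: string;
  prompt_version: number;
  output: string;
  quality_score: number;
  failure_analysis: string | null; // null when the task passed
  prompt_amendment: string | null; // null when no rule was added
  created_at: string;              // ISO 8601 timestamp
}

// Minimal storage interface so the backing store can be swapped.
interface EvolutionLog {
  append(row: EvolutionLogRow): void;
  rows(): EvolutionLogRow[];
}

// In-memory stand-in used here instead of a real database.
class InMemoryEvolutionLog implements EvolutionLog {
  private readonly entries: EvolutionLogRow[] = [];

  append(row: EvolutionLogRow): void {
    this.entries.push(row);
  }

  rows(): EvolutionLogRow[] {
    return this.entries.slice(); // copy, so callers cannot edit the log
  }
}

// Audit question: which logged task caused a given rule to be added?
function explainRule(log: EvolutionLog, rule: string): EvolutionLogRow | undefined {
  return log.rows().find((row) => row.prompt_amendment === rule);
}
```

Storing the failure analysis next to the amendment is what lets you answer "why does this rule exist?" months later.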

What it replaces

Two line items, starting with the QA reviewer, each priced against the tool that now does the work, followed by the whole system at $60/mo.

QA Reviewer, now Observer Agent: $4,000/mo
Prompt Engineer (ongoing), now Self-Evolution Loop: $6,000/mo
The whole system: $60/mo

Monthly cost of each role the system replaces, against the system itself.
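For readers who want to check the headline figures, the arithmetic behind them is short. The snippet below uses only the monthly costs listed above; the variable names are made up for the example.

```typescript
// Figures are taken from the line items above (USD per month).
const replacedRolesPerMonth = [4000, 6000]; // QA reviewer, prompt engineer
const systemCostPerMonth = 60;

// Total manual cost is the sum of the replaced roles.
const manualCostPerMonth = replacedRolesPerMonth.reduce((sum, cost) => sum + cost, 0);

// Reduction is the saving expressed as a share of the manual cost.
const reductionPercent =
  ((manualCostPerMonth - systemCostPerMonth) / manualCostPerMonth) * 100;

console.log(`Manual cost replaced: $${manualCostPerMonth}/mo`); // $10000/mo
console.log(`Cost reduction: ${reductionPercent.toFixed(1)}%`);  // 99.4%
```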

Why it holds

Everyone can buy Claude. What separates the setups that last from the ones that collapse is one idea.

The counterintuitive insight is that self-evolving agents should start with intentionally minimal system prompts. A 50-word starting prompt forces the agent to fail early and often, which generates a rich stream of feedback for the evolution loop. After 500 tasks, that 50-word prompt has grown into a 2,000-word battle-tested instruction set that covers edge cases you would never have anticipated manually.

What is inside

This is not theory. Three pieces, ready to run.

In this playbook

Observer quality rubric

Analyzer root-cause templates

Enable evolution on your first agent

How it's built

The file tree, so you know exactly what you would be standing up.

System files

evolution/
  observer_agent.md
  analyzer_agent.md
  updater_agent.md
  quality_rubric.json
logs/
  evolution_history.ts
  prompt_versions.json
  performance_metrics.ts

One rule to leave with, the one that stops the QA reviewer from creeping back into the budget.

Stop rewriting your prompts manually. Build agents that rewrite themselves.

The numbers above trace back to the Self-Improving Agent Systems Research, not projections.

Self-Improving Agent Systems Research

You can wire Claude and the rest of this stack by hand from the playbook above. Or you skip the assembly, because standing up systems like this is exactly what Ultron does.

$10,000 is what this system replaces every month. Ultron runs it for $60/mo.

No card required. Set it up in about ten minutes.

How to build AI agents that improve themselves
