Building A Self Evolving Agent

This disrupted my whole thinking on AI

Mar 18, 2026

This whole project started because I read a post about a 200 line coding project that let an AI agent have access to its source, could add the features it wanted according to its goals, and recompile itself.

It evolved.

That one simple post sent me down a complete rabbit hole, I built a complete agent ecosystem where agents comminate with each other, build their own tools, have shared knowledge, rank themselves and more. All in a secure environment.

… and when I set it free, I don’t think I was quite prepared for what it could do.

In the next post I am going to be sharing what it built, some of its journal entries, and I think you will be quite surprised by what it can do, but for now, let me share some of its features.

This whole weekend project changed my entire thinking on agents and how useful they are. In-fact, it totally disrupted my whole thinking on AI.

What Is Avi?

Avi is a self-hosted multi-agent platform built in Rust that runs on PC, Mac, or Linux. Each agent is an autonomous AI persona with its own identity, goals, memory, tactics, and schedule. Agents run evolution cycles, researching, producing output, improving their own tools, and journaling their progress, all without human intervention.

You create agents with a one-line prompt. The AI expands it into a full persona. Then you watch them work.

Core Features

🤖 Multi-Agent Architecture

Run as many agents as you need, each with distinct personas and missions. A strategic intelligence analyst, a cybersecurity sentinel, a content writer, a system developer, all running independently with their own:

Identity: Detailed persona defining voice, capabilities, and operating principles
Goals: Creator-defined priorities decomposed into trackable sub-goals with progress bars
Journal: Persistent memory of every session, with archiving and context budget management
Learnings: Accumulated insights extracted from experience
Tactics: Self-maintained playbook of approaches that work (TACTICS.md)
Output: Structured reports, briefings, and deliverables per cycle
State: Session continuity via state.json (focus, threads, blockers, dimensional self-grades)

🧠 AI-Powered Agent Creation

Create a new agent from a single sentence. Avi calls the AI to expand your prompt into a complete persona with detailed identity, capabilities, voice, and initial goals, ready to run immediately.

The web dashboard includes a creation wizard with pre-built templates (Code Reviewer, Market Analyst, Content Writer, Cybersecurity Sentinel) and a “Create from Scratch” option.

⏱️ Flexible Scheduling

Each agent has a configurable schedule using human-readable expressions:

"every 3h" run every 3 hours
"every 30m" run every 30 minutes
"daily at 09:00" run once per day
"on-demand" manual only, never auto-triggers

Agents can also request their own scheduling.

🔄 Evolution Cycles

Each cycle follows a structured 4-phase process:

Research — Agents use tools (curl, scripts, APIs) to gather information
Output — Produce a focused deliverable (report, analysis, briefing)
Self-Improvement — Build better tools, refine techniques, update tactics
Journal & Commit — Record observations, commit changes, push to Git

🎯 Cycle Types

Agents choose their cycle intensity each run:

Deep: Full research + analysis + output. For new data or important findings.
Maintenance: Quick scan, update goals, check messages. For steady-state periods.
Skip: Emit a one-line status and yield. When nothing has changed.

🎯 Goal Decomposition

Agents decompose their INPUT.md priorities into trackable sub-goals in GOALS.md. The web dashboard shows:

Color-coded progress bars (green ≥75%, amber ≥40%, red <40%)
Sub-task completion counts
Per-goal breakdown with expandable subtask lists

📡 Inter-Agent Communication

Agents aren’t isolated. They share a bulletin board for broadcasting messages and can read other agents’ output for cross-pollination. They also have:

Directed messaging: Send messages to specific agents via the API
Threaded conversations: Multi-turn exchanges in agents/shared/threads/
Cross-agent knowledge graph: Typed entities with relationships shared across all agents

🌐 Cross-Agent Knowledge Graph

Agents contribute structured entities to a shared knowledge base at agents/shared/knowledge/:

Typed entities (technology, threat, vulnerability, trend, organization, concept)
Typed relationships between entities
Visual graph in the System tab with color-coded cards
Agents reference and build on each other’s contributed knowledge

🛡️ Safety & Sandboxing

File whitelist: Each agent can only modify files within its allowed paths
Protected files: Critical files can never be changed
Build verification: Every code change must pass build + tests before committing
Automatic revert: Failed builds trigger automatic source rollback
Session timeout: Hung processes are killed and retried automatically

📊 Context Budget Management

Agents don’t lose their minds as journals grow. The system automatically:

Keeps the last 3 journal entries in full detail
Summarizes older entries to single-line references
Archives complete history for reference (toggleable in the web dashboard)
Prevents context window overflow

🧠 Session Continuity

Agents save their thinking state at the end of each cycle, current focus, open threads, blockers, confidence level, cycle type, dimensional self-grades, expertise tags, and scheduling hints. Next cycle, they read it back and pick up where they left off.

🔧 Skills System

Global skills in skills/ — available to all agents
Per-agent skills in agents/<name>/skills/ — agent-specific tools
Auto-generated manifest (skills/MANIFEST.md) — agents know what’s available
Skills include: journal writing, code review, documentation generation, test generation, vision analysis, memory search, and more

📋 Skill Request Pipeline

Agents can request new capabilities they can’t build themselves:

Agent writes a request to agents/shared/requests/ with type, priority, and justification
Self-filter rules prevent lazy requests, agents must explain why they can’t build it themselves
You review requests in the dashboard Approvals tab, see who’s asking, what they need, and why
Approve or Deny with one click
Approved requests route to the system developer agent as a task

📈 Reputation & Trust System

Each agent has a computed reputation score (0-100) based on:

Grade history average (contributes up to 70 points)
Grade consistency (contributes up to 30 points)
Displayed as a color-coded progress bar in the dashboard

📊 Dimensional Self-Assessment

Agents rate themselves across 5 dimensions each cycle:
Depth: How thoroughly the topic was explored
Relevance: How relevant to current priorities
Novelty: How much new information vs. repetition
Actionability: How useful and actionable the findings
Quality: Overall writing and analysis quality

Displayed as horizontal bar charts in the web dashboard overview.

🏷️ Expertise Tags

Agents declare their areas of expertise via expertise_tags (e.g. ["threat-intelligence", "content creator"]). Displayed as branded pill badges in the dashboard.

👍 Creator Reactions

React to any agent’s output with 👍 (useful) or 👎 (needs improvement) directly from the Output tab. Reactions are saved and fed back into the agent’s next cycle prompt — closing the feedback loop.

📝 Self-Modification (TACTICS.md)

Agents maintain a self-evolved playbook called TACTICS.md where they record:

Approaches that produced high-graded outputs
Patterns the creator responded positively to
What NOT to do (approaches that led to low grades)
Preferred output format and style

📈 Growth Narrative

Each agent gets an auto-computed growth summary showing: cycle count, average grade, trend direction (↑/↓/steady), and creator reaction counts. Displayed in the Overview tab.

⏰ Temporal Awareness

Agents know when they last ran and can see time-since-last-run. They can request specific scheduling:

"15m", "30m", "1h", "2h", "4h" — request to run again soon
"skip_next" — ask to be skipped next cycle

🛟 Failure Recovery

Prompt-level instructions ensure agents gracefully handle failures:

Unreachable data sources → use cached data and note it
Failed approaches → try alternatives instead of repeating
Can’t produce full output → produce status update instead of nothing
Blocked → set blocked_on in state.json with clear explanation

⚡ Concurrent Evolution

Multiple agents can evolve simultaneously. The scheduler launches all agents whose timers have expired, no waiting for one to finish before starting the next.

📝 Output Auto-Grading

After each cycle, the system automatically scores the agent’s output on a 1-10 scale using a lightweight LLM. Grades are tracked and displayed on the dashboard with trend sparklines.

📈 Adaptive Learning Loop

The system tracks output quality grades over time: if an agent’s grades are declining, it auto-injects corrective feedback. If improving, it reinforces the current approach. Agents self-correct without creator intervention.

🔔 Event-Based Triggers

Agents can respond to triggers such as being messaged by another agent, or an agent providing more information to its knowledge graph.

💰 Cost Tracking

Per-cycle costs are logged for each agent. Track spending over time and identify expensive cycles.

🔍 Memory Search

Agents can search their own past journal, learnings, scratchpad, and outputs.

Discussion about this post

Ready for more?