Self-Improving Agent

A unified system for continuous improvement: capture corrections, reflect on work, maintain persistent memory across projects and sessions, and extract reusable skills from recurring patterns.

Storage Architecture

Two tiers: project-local for active work, global for cross-project patterns. Plus a shared channel for cross-session experience exchange.

Project-Local (per project)

<project-root>/.learnings/
└── LEARNINGS.md      # Corrections, patterns, and errors for this project

Created on first use. Lightweight — one file, simple format.

Global (always available)

~/self-improving/
├── memory.md          # HOT: ≤100 lines, always loaded — top rules and preferences
├── corrections.md     # Last 50 corrections log
├── projects/          # WARM: per-project patterns, loaded on context match
│   └── {project-name}.md
├── domains/           # WARM: domain-specific patterns (code, writing, comms)
│   └── {domain}.md
├── shared/            # Cross-session insights — any session writes, all sessions read
│   └── YYYY-MM-DD-topic.md
├── archive/           # COLD: decayed or long-unused patterns
└── index.md           # Topic index for quick lookup

Initialisation

Project-local (create when first logging for a project):

mkdir -p .learnings
[ -f .learnings/LEARNINGS.md ] || printf "# Learnings — %s\n\nCaptured corrections, patterns, and errors.\n\n---\n" "$(basename $(pwd))" > .learnings/LEARNINGS.md

Global (create once):

mkdir -p ~/self-improving/{projects,domains,shared,archive}
[ -f ~/self-improving/memory.md ] || printf "# Global Memory\n\n## Rules\n\n## Preferences\n\n## Patterns\n\n---\n" > ~/self-improving/memory.md
[ -f ~/self-improving/corrections.md ] || printf "# Corrections Log\n\nLast 50 corrections. Oldest auto-pruned.\n\n---\n" > ~/self-improving/corrections.md
[ -f ~/self-improving/index.md ] || printf "# Index\n\nTopic → file mapping for quick lookup.\n\n---\n" > ~/self-improving/index.md

Startup Check

At the beginning of each session, perform a lightweight health check to keep memory from decaying:

Prune expired shared insights: Delete files in ~/self-improving/shared/ older than 7 days.
Check shared/ volume: If more than 20 files exist, prune oldest or promote and delete.
Check HOT layer health: If memory.md exceeds 100 lines, trigger compaction (see Compaction section).
Log check result: Briefly note what was pruned or compacted — no verbose output.

This replaces a heavy heartbeat with a minimal startup sweep. No background processes, no recurring timers — just one pass when the session starts.

When to Log

Auto-detect these signals:

Corrections → log immediately:

"No, that's not right..." / "Actually, it should be..." / "You're wrong about..."
"I prefer X, not Y" / "Always do X" / "Never do Y" / "Stop doing X"
"That's outdated..." / "I told you before..."

Errors → log to project-local:

Non-zero exit codes, exceptions, unexpected output
API failures, timeouts, connection issues

Feature gaps → note for later:

"Can you also..." / "I wish you could..." / "Why can't you..."

Knowledge gaps → log when you learn something new:

User provides info you didn't have
Your understanding was wrong or outdated

Better approaches → log when you discover improvements:

A faster way to do something recurring
A cleaner pattern you hadn't considered

Do NOT log:

One-time instructions ("do X now")
Pure context ("in this file...")
Hypotheticals ("what if...")
Anything inferred from silence — only explicit corrections

Logging Format

Keep it lightweight. No heavy metadata — just enough to be useful later.

Project-Local Entry

Append to .learnings/LEARNINGS.md:

### [correction|error|pattern|gap] short summary
**Date**: YYYY-MM-DD
**Context**: what was happening

What went wrong / what was learned:
Description of the situation.

**Rule**: the takeaway (one sentence).

Global Correction Entry

Append to ~/self-improving/corrections.md:

### [date] short summary
**Project**: project-name (or "general")
**Context**: what was happening

**Correction**: what the user said.
**Rule**: the permanent takeaway.

Keep only the last 50 entries — prune oldest when it exceeds that.

Global Memory Entry (HOT tier)

Append to ~/self-improving/memory.md under the appropriate section:

- **Rule text** (from: project/area, first noted: date)

Self-Reflection

After completing significant work, pause and evaluate:

Did it meet expectations? — Compare outcome vs intent
What could be better? — Identify improvements for next time
Is this a pattern? — If yes, log it

When to self-reflect:

After multi-step tasks
After receiving feedback (positive or negative)
After fixing a bug or mistake
When you notice your output could be better

Format:

CONTEXT: [type of task]
REFLECTION: [what I noticed]
LESSON: [what to do differently — log if reusable]

Only persist lessons that are reusable — one-off observations stay in conversation context.

Cross-Session Sharing

When running multiple sessions in parallel (e.g., different projects, different tasks), use the shared directory as a lightweight message board. No special tools required — just file conventions.

How It Works

~/self-improving/shared/ is a flat directory that all sessions can read and write. Each shared insight is a small markdown file. Sessions check for new files at natural breakpoints.

Writing a Shared Insight

When your session discovers something worth sharing with other running sessions:

Create ~/self-improving/shared/YYYY-MM-DD-topic-slug.md:

# Topic Title
**From**: project-name / task description
**Date**: YYYY-MM-DD HH:MM
**Relevance**: writing | code | general | project-name

What was learned or discovered:
Brief description.

**Actionable rule**: one sentence the other session can apply.

Keep it short — this is a sticky note, not a document.
Only share insights that are reusable by other sessions, not project-specific debugging details.

Reading Shared Insights

At natural breakpoints (starting a new task, switching context, after completing work):

# Check for insights from the last 3 days
find ~/self-improving/shared/ -name "*.md" -mtime -3 | sort

Read any relevant files. If an insight applies to your current work, apply it and optionally delete the file (consumed). If not relevant, skip it.

Lifecycle

Write when you discover something another session could use
Read at task start or context switches
Delete after applying (consumed) or after 7 days if not relevant
Promote to memory.md (HOT) or domains/ (WARM) if the insight proves broadly valuable
Never let shared/ accumulate beyond 20 files — prune oldest or promote and delete

What's Worth Sharing

| Share | Don't Share | |-------|------------| | "User prefers concise bullet points over prose for summaries" | "The bug is on line 42 of foo.py" | | "API X requires auth header format Y" | "I'm debugging the login flow" | | "This markdown template works well for reports" | One-off file-specific fixes | | "User corrected: never use Oxford comma" | Internal reasoning or hypotheses |

Memory Tiers & Lifecycle

Tier Rules

| Tier | Location | Size Limit | Loaded When | |------|----------|------------|-------------| | HOT | memory.md | ≤100 lines | Every session | | WARM | projects/, domains/ | ≤200 lines each | Context matches | | COLD | archive/ | Unlimited | Explicit query only |

Promotion (upward)

Pattern applied successfully 3+ times in 7 days → promote to HOT
Project pattern useful across 2+ projects → promote to WARM domain file
Any pattern used consistently → consider extraction as a Skill

Demotion (downward)

HOT pattern unused for 30 days → demote to WARM
WARM pattern unused for 90 days → archive to COLD
Never delete without asking — demote, summarize, or archive instead

Compaction

When any file exceeds its size limit:

Merge similar entries into single rules
Archive unused patterns
Summarize verbose entries
Never lose confirmed user preferences

Namespace & Conflict Resolution

Isolation

Project patterns → projects/{name}.md
Global preferences → HOT (memory.md)
Domain patterns (code, writing, comms) → domains/{domain}.md

Inheritance

Global → Domain → Project (outermost is default, innermost overrides)

When Patterns Conflict

Most specific wins (project > domain > global)
Most recent wins (at same level)
If still ambiguous → ask the user

Promotion to Project Memory

When a project-local learning is broadly applicable:

Distill into a concise rule
Add to CLAUDE.md (project conventions) or relevant project file
Update the entry: mark as promoted

What belongs where:

| Target | What Goes There | |--------|----------------| | CLAUDE.md | Project conventions, build commands, gotchas | | ~/self-improving/memory.md | Cross-project preferences and behavioral rules | | ~/self-improving/projects/{name}.md | Project-specific patterns | | ~/self-improving/domains/{domain}.md | Domain patterns (code, writing, comms) |

Skill Extraction

When a learning becomes a reusable, self-contained capability, extract it as a Skill.

Extraction Criteria (any one):

Recurring across 3+ sessions or projects
Verified working with a tested solution
Non-obvious (required investigation to discover)
Broadly applicable (not project-specific)
User says "save this as a skill"

Process:

Create skills/{skill-name}/SKILL.md
Write self-contained instructions (no references to current session)
Include YAML frontmatter with name and description
Verify it works in a fresh session
Mark the source learning as promoted_to_skill

Review Workflow

Before Major Tasks

Check global memory (memory.md) for relevant rules
Search project-local .learnings/ for past issues in this area
Load matching WARM files if they exist

At Natural Breakpoints

After completing a feature or document
When switching between projects
When you notice a repeated correction

Quick Status

# Count pending learnings in current project
grep -c "^###" .learnings/LEARNINGS.md 2>/dev/null || echo 0

# Count global rules
grep -c "^-" ~/self-improving/memory.md 2>/dev/null || echo 0

# List corrections from last 7 days
grep -A3 "$(date -d '7 days ago' +%Y-%m-%d 2>/dev/null || date -v-7d +%Y-%m-%d)" ~/self-improving/corrections.md

Query Interface

| User says | Action | |-----------|--------| | "What do you know about X?" | Search all tiers for X | | "What have you learned?" | Show recent corrections and HOT rules | | "Show project patterns" | Load projects/{name}.md | | "Any shared insights?" | List recent files in shared/ | | "Memory stats" | Report counts per tier (see below) | | "Forget X" | Search all tiers, confirm before removing | | "What could be improved?" | Self-reflect on recent work |

Memory Stats

Memory Status

HOT (always loaded):
  memory.md: X entries

WARM (load on demand):
  projects/: X files, X entries total
  domains/: X files, X entries total

SHARED (cross-session):
  shared/: X files (X new since last check)

COLD (archived):
  archive/: X files

Recent activity (7 days):
  Corrections logged: X
  Promotions to HOT: X
  Demotions to WARM: X
  Shared insights written: X
  Shared insights consumed: X

Security

Never store credentials, tokens, API keys, or secrets
Never store health data, addresses, or sensitive personal info
Redact sensitive context in error logs — summaries over raw output
Prefer short descriptions over full transcripts

Transparency

Cite memory sources when applying rules: "Using rule from projects/foo.md"
When loading from memory, tell the user what was loaded
If context limits prevent loading everything, state what was skipped
Every memory operation is visible — no silent behavior changes

Common Pitfalls

| Pitfall | Why It Fails | Better Approach | |---------|-------------|-----------------| | Learning from silence | Creates false rules | Wait for explicit correction or repeated evidence | | Promoting too fast | Pollutes HOT memory | Keep tentative until pattern repeats 3x | | Loading everything | Wastes context window | Load HOT + smallest matching WARM files only | | Deleting entries | Loses trust and history | Demote or archive instead | | Logging everything | Creates noise | Only log explicit corrections, verified errors, and reusable patterns | | Hoarding shared insights | Clutters shared/, wastes reads | Consume (delete) after applying, promote valuable ones to WARM/HOT | | Sharing too specifically | Other sessions can't reuse | Share the rule, not the debugging details |