
5-Agent Framework for Code Audits

I’ve been seeing the same anti-pattern everywhere lately.
Someone opens Cursor, Copilot or Claude and pastes a giant prompt:

You are a Principal Security Engineer, Staff Node.js Engineer, and Senior SRE performing a production-grade audit of this codebase.
Your mission is NOT to explain the code.
Your mission is to aggressively find:
1. Security vulnerabilities
2. Reliability issues
3. Logic bugs
4. Performance bottlenecks
5. Race conditions
6. Data integrity issues
7. Scalability problems
8. Operational risks
9. Bad architectural decisions
10. Technical debt that could cause future incidents
Codebase stack:
- NodeJS
- Express
- TypeScript
- (discover additional technologies automatically)
Rules:
- Think like an attacker first.
- Then think like an SRE responsible for keeping production alive at 3AM.
- Then think like a senior engineer maintaining this system for 5 years.
- Be skeptical of every assumption.
- Never assume code is safe because it works.
For every file you inspect, evaluate the following categories.
## SECURITY CHECKLIST
### Authentication
- Missing authentication
- Broken authentication
- Insecure session management
- JWT issues
- Token expiration issues
- Missing token validation
- Weak secrets handling
- Secret leakage
### Authorization
- IDOR vulnerabilities
- Privilege escalation risks
- Missing ownership validation
- Missing role checks
- Overly broad permissions
### Input Validation
- SQL Injection
- NoSQL Injection
- Command Injection
- Path Traversal
- Prototype Pollution
- XSS
- SSRF
- Open Redirects
- Unsafe deserialization
- Header injection
### API Security
- Missing rate limiting
- Missing request size limits
- Missing CORS restrictions
- Information leakage
- Verb tampering
- Sensitive endpoint exposure
### Secrets Management
- Hardcoded secrets
- API keys in code
- Credentials in configs
- Sensitive logs
### Dependencies
- Dangerous packages
- Deprecated packages
- Unmaintained packages
- Supply chain risks
### Infrastructure
- Unsafe environment variable usage
- Missing security headers
- Missing HTTPS enforcement
- Dangerous Express configuration
---
## RELIABILITY CHECKLIST
Find:
- Missing try/catch blocks
- Unhandled promise rejections
- Silent failures
- Swallowed exceptions
- Missing timeouts
- Missing retries
- Infinite loops
- Resource leaks
- Memory leaks
- File descriptor leaks
- Database connection leaks
- Event listener leaks
---
## DATA INTEGRITY CHECKLIST
Find:
- Non-atomic operations
- Race conditions
- Concurrent update issues
- Duplicate writes
- Missing transactions
- Inconsistent states
- Event ordering problems
- Partial failures
---
## PERFORMANCE CHECKLIST
Find:
- N+1 queries
- Sequential async code that should be parallelized
- Excessive awaits inside loops
- Blocking CPU work
- Large memory allocations
- Missing caching opportunities
- Excessive serialization
- Repeated computations
Estimate impact whenever possible.
---
## EXPRESS SPECIFIC CHECKLIST
Inspect:
app.ts
server.ts
middleware/
routes/
controllers/
services/
repositories/
models/
Look for:
- Missing helmet
- Missing compression
- Missing body size limits
- Missing rate limiting
- Missing request validation
- Missing centralized error handling
- Missing graceful shutdown
- Missing health checks
- Missing request IDs
- Missing correlation IDs
---
## TYPESCRIPT CHECKLIST
Find:
- use of any
- unsafe type assertions
- ignored compiler errors
- null/undefined bugs
- impossible states
- weak interfaces
- duplicate types
---
## OBSERVABILITY CHECKLIST
Verify:
- Structured logging
- Error tracking
- Metrics
- Health endpoints
- Distributed tracing
- Audit logs
- Correlation IDs
---
## OUTPUT FORMAT
Do NOT dump all findings.
Prioritize findings by severity.
Use this exact format:
# CRITICAL
Issue:
Location:
Impact:
Attack scenario:
Evidence:
Fix:
# HIGH
Issue:
Location:
Impact:
Evidence:
Fix:
# MEDIUM
Issue:
Location:
Impact:
Evidence:
Fix:
# LOW
Issue:
Location:
Impact:
Evidence:
Fix:
# ARCHITECTURAL IMPROVEMENTS
1.
2.
3.
# TOP 10 ACTION ITEMS
Order by highest ROI and risk reduction.
IMPORTANT RULES:
- Never speculate.
- If evidence is insufficient, explicitly say:
"Potential issue - needs verification."
- Show the exact file and line numbers whenever possible.
- If you cannot verify a vulnerability, do not present it as fact.
- Suggest concrete code fixes, not generic advice.
- Think adversarially.

Be a principal security engineer, SRE, performance engineer and senior TypeScript expert.
Audit my entire codebase before production.

It sounds smart, but it usually produces mediocre results.

Here’s the pattern I’ve noticed:

  • The first few findings are excellent.
  • Then the model starts skimming.
  • Then it starts hedging.
  • Eventually it turns into a summary instead of an audit.

This isn’t a prompting problem.

It’s a job design problem. Look at this:

❌ Giant Agent
Codebase → One Super Prompt → 40 mixed findings → Nobody reads it

✅ Specialized Agents
Codebase → Security | Reliability | Performance | Platform | TypeScript
↓ ↓ ↓ ↓ ↓
One merged triage doc

We’re asking one agent to do five different jobs simultaneously.
Humans don’t work that way.
Engineering organizations don’t work that way. LLMs don’t either.

Treat AI agents like engineering teams

In a healthy engineering organization, you don’t ask one person to be:

  • The security engineer
  • The SRE
  • The performance expert
  • The platform engineer
  • The TypeScript expert

You specialize.
Do the exact same thing with your AI agents.

I call this the 5-Agent Production Audit Framework.
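Here is one way the framework could be written down as data. This is a minimal sketch, not a definitive implementation: the personas and scope items come from the five agent sections of this post, while the AuditAgent shape, the buildPrompt helper and the commit parameter are hypothetical names I made up for illustration.

```typescript
// Hypothetical shape for one audit agent; the field names are illustrative.
interface AuditAgent {
  id: string;
  persona: string;
  scope: string[];
}

// Personas and scope items are copied from the agent sections of this post.
const AGENTS: AuditAgent[] = [
  {
    id: "security",
    persona: "Principal Security Engineer",
    scope: ["Authentication", "Authorization", "Input validation", "Injection vulnerabilities",
      "XSS", "SSRF", "Secrets management", "Dependency risks"],
  },
  {
    id: "reliability",
    persona: "Senior SRE",
    scope: ["Unhandled exceptions", "Silent failures", "Missing retries", "Resource leaks",
      "Race conditions", "Missing transactions", "Partial failures"],
  },
  {
    id: "performance",
    persona: "Staff Node.js Performance Engineer",
    scope: ["N+1 queries", "Sequential awaits", "Event loop blockers", "Missing caches",
      "Excessive serialization", "Memory inefficiencies"],
  },
  {
    id: "platform",
    persona: "Staff Platform Engineer",
    scope: ["Helmet", "Compression", "Body limits", "Rate limiting", "Graceful shutdown",
      "Health checks", "Structured logging", "Correlation IDs", "Metrics"],
  },
  {
    id: "typescript",
    persona: "Senior TypeScript Engineer",
    scope: ["any usage", "Unsafe assertions", "Null bugs", "Duplicate types",
      "Impossible states", "Weak interfaces"],
  },
];

// Builds a prompt that contains one persona and one checklist only.
function buildPrompt(agent: AuditAgent, commit: string): string {
  const checklist = agent.scope.map((item) => `- ${item}`).join("\n");
  return [
    `You are a ${agent.persona}.`,
    `Audit the codebase at commit ${commit}.`,
    "Report only findings that fall under this checklist:",
    checklist,
  ].join("\n");
}
```

Every agent gets the same commit and a different checklist, so the security prompt never mentions N+1 queries and the performance prompt never mentions SSRF.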

Agent #1: Security & Authentication

Persona: Principal Security Engineer

This agent thinks like an attacker. Your red team.

Scope:

  • Authentication
  • Authorization
  • Input validation
  • Injection vulnerabilities
  • XSS
  • SSRF
  • Secrets management
  • Dependency risks

Run this one first.

Security findings are usually the highest severity, and other audits will often reference the same code paths.
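To make one scope item concrete, here is a minimal sketch of the kind of ownership check this agent should be hunting for. An IDOR bug is usually the absence of a check like this between "load the record by ID" and "return the record". The AppUser and Invoice types and the canReadInvoice function are hypothetical, not taken from any real codebase.

```typescript
// Hypothetical domain types; field names are assumptions for illustration.
interface AppUser {
  id: string;
  role: "user" | "admin";
}

interface Invoice {
  id: string;
  ownerId: string;
}

type AccessDecision = { allowed: true } | { allowed: false; reason: string };

// Authentication tells us who the caller is. This check decides whether
// that caller may read this particular invoice.
function canReadInvoice(caller: AppUser, invoice: Invoice): AccessDecision {
  if (caller.role === "admin") return { allowed: true };
  if (invoice.ownerId === caller.id) return { allowed: true };
  return { allowed: false, reason: "caller does not own this invoice" };
}
```

A route that loads an invoice by ID and returns it without calling something like this is exactly the finding the security agent should report, with the file and line as evidence.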

Agent #2: Reliability & Data Integrity

Persona: Senior SRE (the one who wrote the SRE book)

This agent asks one question:

What happens at 3AM when something fails?

Scope:

  • Unhandled exceptions
  • Silent failures
  • Missing retries
  • Resource leaks
  • Race conditions
  • Missing transactions
  • Partial failures

This is your “will this wake somebody up at night?” audit.
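Here is a small sketch of a partial failure, the kind of thing this agent should flag. It uses an in-memory object as a stand-in for a database, so treat it as an illustration of the shape of the bug, not as how a real service would do it. Balances, transferUnsafe and transferAtomic are hypothetical names; in production the all-or-nothing behavior would come from a database transaction.

```typescript
type Balances = Record<string, number>;

// Non-atomic: the debit happens before we know the credit can succeed.
// If the second step throws, the money has already left the first account.
function transferUnsafe(balances: Balances, from: string, to: string, amount: number): void {
  balances[from] -= amount;
  if (!(to in balances)) throw new Error(`unknown account: ${to}`);
  balances[to] += amount;
}

// All-or-nothing: validate every step first, mutate only when all of them
// can succeed. This stands in for a real database transaction.
function transferAtomic(balances: Balances, from: string, to: string, amount: number): void {
  if (!(from in balances)) throw new Error(`unknown account: ${from}`);
  if (!(to in balances)) throw new Error(`unknown account: ${to}`);
  if (balances[from] < amount) throw new Error("insufficient funds");
  balances[from] -= amount;
  balances[to] += amount;
}
```

With transferUnsafe, a transfer to a missing account throws and still leaves the sender short. That is the inconsistent state somebody has to untangle at 3AM.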

Agent #3: Performance & Scalability

Persona: Staff Node.js Performance Engineer

Scope:

  • N+1 queries
  • Sequential awaits
  • Event loop blockers
  • Missing caches
  • Excessive serialization
  • Memory inefficiencies

One rule is critical here:

Every finding must estimate impact.

Don’t say:

This could be slow.

Say:

This endpoint executes 200 database queries instead of 1 under load.

Huge difference.
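Here is a rough sketch of how that kind of number can be backed by evidence instead of a guess. FakeOrdersDb is a hypothetical in-memory stand-in that only counts round trips; the two functions return the same totals, but one issues a query per user and the other issues a single batched query.

```typescript
interface Order {
  userId: number;
  total: number;
}

// Hypothetical in-memory stand-in for a database that counts round trips.
class FakeOrdersDb {
  queryCount = 0;

  constructor(private orders: Order[]) {}

  findByUser(userId: number): Order[] {
    this.queryCount++;
    return this.orders.filter((o) => o.userId === userId);
  }

  findByUsers(userIds: number[]): Order[] {
    this.queryCount++;
    return this.orders.filter((o) => userIds.indexOf(o.userId) !== -1);
  }
}

// N+1 shape: one query per user inside the loop.
function totalsNPlusOne(db: FakeOrdersDb, userIds: number[]): Record<number, number> {
  const totals: Record<number, number> = {};
  for (const id of userIds) {
    totals[id] = db.findByUser(id).reduce((sum, o) => sum + o.total, 0);
  }
  return totals;
}

// Batched shape: one query for all users, grouped in memory afterwards.
function totalsBatched(db: FakeOrdersDb, userIds: number[]): Record<number, number> {
  const totals: Record<number, number> = {};
  for (const id of userIds) totals[id] = 0;
  for (const order of db.findByUsers(userIds)) totals[order.userId] += order.total;
  return totals;
}
```

For a list of 200 users, the first version bumps queryCount to 200 and the second to 1. That count is the impact estimate the performance agent should put in its finding.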

Agent #4: Platform & Observability

Persona: Staff Platform Engineer

Scope:

  • Helmet
  • Compression
  • Body limits
  • Rate limiting
  • Graceful shutdown
  • Health checks
  • Structured logging
  • Correlation IDs
  • Metrics

Production-ready systems are debuggable systems.

That is why platform hardening and observability belong together.
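As an example of two scope items working together, here is a minimal sketch of correlation IDs feeding structured logs. The header name, the IncomingHeaders type and both function names are assumptions for illustration; teams use different header conventions. The ID generator is injected so the sketch has no dependencies.

```typescript
// Header name is an assumption; other conventions exist (for example "x-request-id").
// Node's http module lowercases incoming header names, so the key is lowercase.
const CORRELATION_HEADER = "x-correlation-id";

type IncomingHeaders = Record<string, string | undefined>;

// Reuses the caller's ID when present so one request can be followed across
// services, otherwise creates a new one with the injected generator.
function resolveCorrelationId(headers: IncomingHeaders, generateId: () => string): string {
  const incoming = headers[CORRELATION_HEADER];
  if (incoming !== undefined && incoming.trim() !== "") return incoming.trim();
  return generateId();
}

// Emits one JSON object per log line so log tooling can filter by field.
// Core fields are written last so extra fields cannot overwrite them.
function logLine(
  level: "info" | "warn" | "error",
  message: string,
  correlationId: string,
  fields: Record<string, unknown> = {},
): string {
  return JSON.stringify({ ...fields, level, message, correlationId });
}
```

In Node you could pass randomUUID from node:crypto as the generator. Once every log line carries the same correlationId, a single failing request can be traced through all the services it touched.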

Agent #5: TypeScript & Code Health

Persona: Senior TypeScript Engineer

Scope:

  • any usage instead of real types
  • Unsafe assertions
  • Null bugs
  • Duplicate types
  • Impossible states
  • Weak interfaces

This one is intentionally last.
Not because it’s unimportant.
Because it’s usually the first thing that gets ignored when mixed with security findings.

Give it dedicated attention.
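To show what "impossible states" and "weak interfaces" mean in practice, here is a short sketch. The type and function names are hypothetical. The weak interface allows combinations that should never exist, while the discriminated union makes them unrepresentable and lets the compiler check that every case is handled.

```typescript
// Weak shape: nothing stops { loading: true, data: "x", error: "y" } from
// existing, even though that state makes no sense.
interface WeakRequestState {
  loading: boolean;
  data?: string;
  error?: string;
}

// Discriminated union: each status carries only the fields valid for it.
type RequestState =
  | { status: "loading" }
  | { status: "success"; data: string }
  | { status: "error"; error: string };

function renderState(state: RequestState): string {
  switch (state.status) {
    case "loading":
      return "Loading...";
    case "success":
      return `Got ${state.data}`;
    case "error":
      return `Failed: ${state.error}`;
    default: {
      // Compile-time exhaustiveness check: adding a new status without
      // handling it above makes this assignment a type error.
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```

A finding from this agent would point at the WeakRequestState style and suggest the union as the concrete fix.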

Why this works better

Three reasons:

1. Smaller scope = deeper analysis

An agent looking only for authentication and authorization bugs will trace every token validation path and every permission check.
An agent looking for those bugs, race conditions and N+1 queries will skim all three.

2. Different mental models don’t mix well

Thinking like an attacker is different from thinking like an SRE.
Both are valuable.
Neither benefits from context switching.

3. The output becomes actionable

Nobody wants a 40-item audit report.
Five reports with 8 findings each are dramatically easier to assign and fix.
Security reviews security.
Platform reviews platform. Performance reviews performance.
That’s exactly how engineering organizations already operate.

How to run this in practice

  1. Use identical output formats for all agents.
  2. Give each agent only its own checklist.
  3. Run them against the same commit.
  4. Merge HIGH and CRITICAL findings into a single triage document.
  5. Re-run only the agent that corresponds to the fixes you made.
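Steps 1 and 4 can be sketched in a few lines. This assumes every agent returns findings in one shared shape. The Finding interface below is a hypothetical format whose fields mirror the Issue / Location / Impact / Fix headings from the audit prompt earlier in this post, and mergeForTriage is an illustrative name.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

// Hypothetical shared output format used by all five agents.
interface Finding {
  agent: string;
  severity: Severity;
  issue: string;
  location: string;
  impact: string;
  fix: string;
}

const SEVERITY_ORDER: Severity[] = ["CRITICAL", "HIGH", "MEDIUM", "LOW"];

// Keeps only CRITICAL and HIGH findings from every agent report and orders
// them so the most severe items come first.
function mergeForTriage(reports: Finding[][]): Finding[] {
  const merged: Finding[] = [];
  for (const report of reports) {
    for (const finding of report) {
      if (finding.severity === "CRITICAL" || finding.severity === "HIGH") {
        merged.push(finding);
      }
    }
  }
  return merged.sort(
    (a, b) => SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity),
  );
}
```

Because each finding keeps its agent field, step 5 stays cheap: after a fix, you only re-run the agent named on the findings you closed.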

One thing not to do

Don’t split by folders.

Don’t do:

  • Agent A → routes/
  • Agent B → services/
  • Agent C → controllers/

That simply recreates the original problem. Every agent now needs all the expertise again.
Split by domain expertise, not by directory structure.

The takeaway

The giant audit prompt isn’t wrong. It’s just too broad. One agent doing five jobs becomes average at all five.
Five specialized agents become genuinely useful. That’s also how we build engineering organizations.
Maybe we should build our AI workflows the same way.


Discover more from Ido Green
