5-Agent Framework for Code Audits

I’ve been seeing the same anti-pattern everywhere lately.
Someone opens Cursor, Copilot or Claude and pastes a giant prompt:

			
You are a Principal Security Engineer, Staff Node.js Engineer, and Senior SRE performing a production-grade audit of this codebase.
Your mission is NOT to explain the code.
Your mission is to aggressively find:
1. Security vulnerabilities
2. Reliability issues
3. Logic bugs
4. Performance bottlenecks
5. Race conditions
6. Data integrity issues
7. Scalability problems
8. Operational risks
9. Bad architectural decisions
10. Technical debt that could cause future incidents
Codebase stack:
- NodeJS
- Express
- TypeScript
- (discover additional technologies automatically)
Rules:
- Think like an attacker first.
- Then think like an SRE responsible for keeping production alive at 3AM.
- Then think like a senior engineer maintaining this system for 5 years.
- Be skeptical of every assumption.
- Never assume code is safe because it works.
For every file you inspect, evaluate the following categories.
## SECURITY CHECKLIST
### Authentication
- Missing authentication
- Broken authentication
- Insecure session management
- JWT issues
- Token expiration issues
- Missing token validation
- Weak secrets handling
- Secret leakage
### Authorization
- IDOR vulnerabilities
- Privilege escalation risks
- Missing ownership validation
- Missing role checks
- Overly broad permissions
### Input Validation
- SQL Injection
- NoSQL Injection
- Command Injection
- Path Traversal
- Prototype Pollution
- XSS
- SSRF
- Open Redirects
- Unsafe deserialization
- Header injection
### API Security
- Missing rate limiting
- Missing request size limits
- Missing CORS restrictions
- Information leakage
- Verb tampering
- Sensitive endpoint exposure
### Secrets Management
- Hardcoded secrets
- API keys in code
- Credentials in configs
- Sensitive logs
### Dependencies
- Dangerous packages
- Deprecated packages
- Unmaintained packages
- Supply chain risks
### Infrastructure
- Unsafe environment variable usage
- Missing security headers
- Missing HTTPS enforcement
- Dangerous Express configuration
---
## RELIABILITY CHECKLIST
Find:
- Missing try/catch blocks
- Unhandled promise rejections
- Silent failures
- Swallowed exceptions
- Missing timeouts
- Missing retries
- Infinite loops
- Resource leaks
- Memory leaks
- File descriptor leaks
- Database connection leaks
- Event listener leaks
---
## DATA INTEGRITY CHECKLIST
Find:
- Non-atomic operations
- Race conditions
- Concurrent update issues
- Duplicate writes
- Missing transactions
- Inconsistent states
- Event ordering problems
- Partial failures
---
## PERFORMANCE CHECKLIST
Find:
- N+1 queries
- Sequential async code that should be parallelized
- Excessive awaits inside loops
- Blocking CPU work
- Large memory allocations
- Missing caching opportunities
- Excessive serialization
- Repeated computations
Estimate impact whenever possible.
---
## EXPRESS SPECIFIC CHECKLIST
Inspect:
app.ts
server.ts
middleware/
routes/
controllers/
services/
repositories/
models/
Look for:
- Missing helmet
- Missing compression
- Missing body size limits
- Missing rate limiting
- Missing request validation
- Missing centralized error handling
- Missing graceful shutdown
- Missing health checks
- Missing request IDs
- Missing correlation IDs
---
## TYPESCRIPT CHECKLIST
Find:
- use of any
- unsafe type assertions
- ignored compiler errors
- null/undefined bugs
- impossible states
- weak interfaces
- duplicate types
---
## OBSERVABILITY CHECKLIST
Verify:
- Structured logging
- Error tracking
- Metrics
- Health endpoints
- Distributed tracing
- Audit logs
- Correlation IDs
---
## OUTPUT FORMAT
Do NOT dump all findings.
Prioritize findings by severity.
Use this exact format:
# CRITICAL
Issue:
Location:
Impact:
Attack scenario:
Evidence:
Fix:
# HIGH
Issue:
Location:
Impact:
Evidence:
Fix:
# MEDIUM
Issue:
Location:
Impact:
Evidence:
Fix:
# LOW
Issue:
Location:
Impact:
Evidence:
Fix:
# ARCHITECTURAL IMPROVEMENTS
1.
2.
3.
# TOP 10 ACTION ITEMS
Order by highest ROI and risk reduction.
IMPORTANT RULES:
- Never speculate.
- If evidence is insufficient, explicitly say:
  "Potential issue - needs verification."
- Show the exact file and line numbers whenever possible.
- If you cannot verify a vulnerability, do not present it as fact.
- Suggest concrete code fixes, not generic advice.
- Think adversarially.

		

Be a principal security engineer, SRE, performance engineer and senior TypeScript expert.
Audit my entire codebase before production.

It sounds smart, but it usually produces mediocre results.

Here’s the pattern I’ve noticed:

The first few findings are excellent.
Then the model starts skimming.
Then it starts hedging.
Eventually it turns into a summary instead of an audit.

This isn’t a prompting problem.

It’s a job design problem. Look at this:

			
❌ Giant Agent
Codebase
   ↓
One Super Prompt
   ↓
40 mixed findings
   ↓
Nobody reads it

✅ Specialized Agents
           Security
               ↓
Codebase → Reliability
               ↓
          Performance
               ↓
            Platform
               ↓
          TypeScript
       ↓ ↓ ↓ ↓ ↓
   One merged triage doc

		

We’re asking one agent to do five different jobs simultaneously.
Humans don’t work that way.
Engineering organizations don’t work that way. LLMs don’t either.

Treat AI agents like engineering teams

In a healthy engineering organization, you don’t ask one person to be:

The security engineer
The SRE
The performance expert
The platform engineer
The TypeScript expert

You specialize.
Do the exact same thing with your AI agents.

I call this the 5-Agent Production Audit Framework.

Agent #1: Security & Authentication

Persona: Principal Security Engineer

This agent thinks like an attacker. It is your red team.

Scope:

Authentication
Authorization
Input validation
Injection vulnerabilities
XSS
SSRF
Secrets management
Dependency risks
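To make the authorization item concrete, here is a minimal TypeScript sketch of a missing ownership check (an IDOR), the kind of bug this agent should trace. The `Invoice` type, the in-memory `invoices` map, and both function names are hypothetical and only stand in for a real data layer.

```typescript
// Hypothetical in-memory data standing in for a real database.
interface Invoice {
  id: string;
  ownerId: string;
  amountCents: number;
}

const invoices = new Map<string, Invoice>([
  ["inv-1", { id: "inv-1", ownerId: "alice", amountCents: 1200 }],
  ["inv-2", { id: "inv-2", ownerId: "bob", amountCents: 9900 }],
]);

// Vulnerable: any authenticated user can read any invoice by guessing its ID.
function getInvoiceUnsafe(_requesterId: string, invoiceId: string): Invoice | undefined {
  return invoices.get(invoiceId);
}

// Safer: the lookup also verifies that the requester owns the record.
function getInvoiceChecked(requesterId: string, invoiceId: string): Invoice | undefined {
  const invoice = invoices.get(invoiceId);
  if (!invoice || invoice.ownerId !== requesterId) {
    // Same result for "missing" and "not yours", so existence is not leaked.
    return undefined;
  }
  return invoice;
}
```

A focused security agent is more likely to notice that the first function accepts a requester ID and never uses it.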

Run this one first.

Security findings are usually the highest severity, and the other audits will often reference the same code paths.

Agent #2: Reliability & Data Integrity

Persona: Senior SRE (the kind who wrote the SRE book)

This agent asks one question:

What happens at 3AM when something fails?

Scope:

Unhandled exceptions
Silent failures
Missing retries
Resource leaks
Race conditions
Missing transactions
Partial failures
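As one illustration of a silent failure, the sketch below contrasts a swallowed exception with a version that surfaces the error to the caller. The `Result` type, the `retryLimit` setting, and both function names are hypothetical and chosen only for the example.

```typescript
// A small result type so failures are explicit instead of hidden.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// Silent failure: bad input is swallowed and replaced with a default,
// so a misconfiguration only shows up later as strange behavior.
function parseRetryLimitSilently(raw: string): number {
  try {
    const parsed = JSON.parse(raw) as { retryLimit?: unknown };
    return typeof parsed.retryLimit === "number" ? parsed.retryLimit : 3;
  } catch {
    return 3; // the caller never learns that the config was invalid
  }
}

// Explicit failure: the caller must decide what to do with the error.
function parseRetryLimit(raw: string): Result<number> {
  try {
    const parsed = JSON.parse(raw) as { retryLimit?: unknown };
    if (typeof parsed.retryLimit !== "number") {
      return { ok: false, error: "retryLimit is missing or not a number" };
    }
    return { ok: true, value: parsed.retryLimit };
  } catch (err) {
    return { ok: false, error: `invalid config: ${String(err)}` };
  }
}
```

Both functions work on the happy path. Only the second one tells anyone when the input is broken.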

This is your “will this wake somebody up at night?” audit.

Agent #3: Performance & Scalability

Persona: Staff Node.js Performance Engineer

Scope:

N+1 queries
Sequential awaits
Event loop blockers
Missing caches
Excessive serialization
Memory inefficiencies

One rule is critical here:

Every finding must estimate impact.

Don’t say:

This could be slow.

Say:

This endpoint executes 200 database queries instead of 1 under load.

Huge difference.
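Here is a small sketch of how a finding like that can be backed by a number. `FakeDb` is a hypothetical in-memory stand-in that only counts queries. It is synchronous for brevity, while a real database client would be asynchronous.

```typescript
// Hypothetical in-memory "database" that counts how many queries it serves.
class FakeDb {
  queryCount = 0;
  private authors = new Map<number, string>();

  constructor(authorCount: number) {
    for (let id = 1; id <= authorCount; id++) {
      this.authors.set(id, `author-${id}`);
    }
  }

  // One query per call: the building block of an N+1 pattern.
  findAuthorById(id: number): string | undefined {
    this.queryCount++;
    return this.authors.get(id);
  }

  // One query for the whole batch (think: WHERE id IN (...)).
  findAuthorsByIds(ids: number[]): Map<number, string> {
    this.queryCount++;
    const found = new Map<number, string>();
    for (const id of ids) {
      const name = this.authors.get(id);
      if (name !== undefined) found.set(id, name);
    }
    return found;
  }
}

// 200 posts, each pointing at one author.
const postAuthorIds = Array.from({ length: 200 }, (_, i) => i + 1);

// N+1 style: one lookup per post, so slowDb.queryCount ends at 200.
const slowDb = new FakeDb(200);
const slowNames = postAuthorIds.map((id) => slowDb.findAuthorById(id));

// Batched style: one lookup for all posts, so fastDb.queryCount ends at 1.
const fastDb = new FakeDb(200);
const fastNames = fastDb.findAuthorsByIds(postAuthorIds);
```

The two versions return the same names. The query counter is what turns "this could be slow" into a finding someone can prioritize.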

Agent #4: Platform & Observability

Persona: Staff Platform Engineer

Scope:

Helmet
Compression
Body limits
Rate limiting
Graceful shutdown
Health checks
Structured logging
Correlation IDs
Metrics

Production-ready systems are debuggable systems.

That is why platform hardening and observability belong together in one agent.

Agent #5: TypeScript & Code Health

Persona: Senior TypeScript Engineer

Scope:

Use of any instead of real types
Unsafe assertions
Null bugs
Duplicate types
Impossible states
Weak interfaces
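A short sketch of what "impossible states" and "weak interfaces" look like in practice. All type and function names here are hypothetical. The point is the contrast between a flag-based shape and a discriminated union.

```typescript
// Weak model: boolean flags and optional fields allow impossible combinations.
interface LooseRequestState {
  loading: boolean;
  error?: string;
  data?: string[];
}

// This type-checks even though "loading", "failed" and "loaded" at once makes no sense.
const impossible: LooseRequestState = { loading: true, error: "boom", data: ["x"] };

// Stronger model: a discriminated union where each state carries
// only the fields that are valid for it.
type RequestState =
  | { status: "loading" }
  | { status: "failed"; error: string }
  | { status: "loaded"; data: string[] };

function describeState(state: RequestState): string {
  switch (state.status) {
    case "loading":
      return "Loading...";
    case "failed":
      return `Failed: ${state.error}`;
    case "loaded":
      return `Loaded ${state.data.length} items`;
    default: {
      // Exhaustiveness check: adding a new status without handling it
      // becomes a compile-time error on this line.
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```

With the union, the compiler rejects the `impossible` value above, so the bug never reaches review.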

This one is intentionally last.
Not because it’s unimportant.
Because it’s usually the first thing that gets ignored when mixed with security findings.

Give it dedicated attention.
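The five agents can be written down as plain data. The sketch below is one possible shape, not a prescribed format: the `AuditAgent` interface, the `AGENTS` array, and `buildPrompt` are hypothetical names, while the personas and scopes are copied from the sections above.

```typescript
// Hypothetical registry of the five agents, in the order they are described.
interface AuditAgent {
  id: "security" | "reliability" | "performance" | "platform" | "typescript";
  persona: string;
  scope: string[];
}

const AGENTS: AuditAgent[] = [
  {
    id: "security",
    persona: "Principal Security Engineer",
    scope: ["Authentication", "Authorization", "Input validation", "Injection vulnerabilities",
      "XSS", "SSRF", "Secrets management", "Dependency risks"],
  },
  {
    id: "reliability",
    persona: "Senior SRE",
    scope: ["Unhandled exceptions", "Silent failures", "Missing retries", "Resource leaks",
      "Race conditions", "Missing transactions", "Partial failures"],
  },
  {
    id: "performance",
    persona: "Staff Node.js Performance Engineer",
    scope: ["N+1 queries", "Sequential awaits", "Event loop blockers", "Missing caches",
      "Excessive serialization", "Memory inefficiencies"],
  },
  {
    id: "platform",
    persona: "Staff Platform Engineer",
    scope: ["Helmet", "Compression", "Body limits", "Rate limiting", "Graceful shutdown",
      "Health checks", "Structured logging", "Correlation IDs", "Metrics"],
  },
  {
    id: "typescript",
    persona: "Senior TypeScript Engineer",
    scope: ["Use of any", "Unsafe assertions", "Null bugs", "Duplicate types",
      "Impossible states", "Weak interfaces"],
  },
];

// Builds a prompt that contains only this agent's checklist and nothing else.
function buildPrompt(agent: AuditAgent, commitSha: string): string {
  const checklist = agent.scope.map((item) => `- ${item}`).join("\n");
  return [
    `You are a ${agent.persona}.`,
    `Audit the codebase at commit ${commitSha}.`,
    "Report only findings in these categories:",
    checklist,
  ].join("\n");
}
```

Calling `buildPrompt(AGENTS[0], sha)` yields a security-only prompt. How the prompt is sent to a model is left out on purpose, since that depends on the tool you use.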

Why this works better

Three reasons:

1. Smaller scope = deeper analysis

An agent looking only for authorization bugs will trace every token validation path.
An agent looking for authorization bugs, race conditions and N+1 queries will skim all three.

2. Different mental models don’t mix well

Thinking like an attacker is different from thinking like an SRE.
Both are valuable.
Neither benefits from context switching.

3. The output becomes actionable

Nobody wants a 50-item audit report.
Five reports with 8 findings each are dramatically easier to assign and fix.
Security reviews security.
Platform reviews platform. Performance reviews performance.
That’s exactly how engineering organizations already operate.

How to run this in practice

Use identical output formats for all agents.
Give each agent only its own checklist.
Run them against the same commit.
Merge HIGH and CRITICAL findings into a single triage document.
Re-run only the agent that corresponds to the fixes you made.
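The merge and re-run steps above can be sketched as follows. The `Finding` shape is an assumption modeled on the Issue / Location / Fix fields of the output format, and `mergeForTriage` and `agentsToRerun` are hypothetical helper names.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
type AgentId = "security" | "reliability" | "performance" | "platform" | "typescript";

// Shared output format: every agent reports findings in this shape.
interface Finding {
  agent: AgentId;
  severity: Severity;
  issue: string;
  location: string; // file and line, for example "src/routes/users.ts:42" (made-up path)
  fix: string;
}

const SEVERITY_ORDER: Record<Severity, number> = { CRITICAL: 0, HIGH: 1, MEDIUM: 2, LOW: 3 };

// Keeps only CRITICAL and HIGH findings from all reports, most severe first.
function mergeForTriage(reports: Finding[][]): Finding[] {
  const all = ([] as Finding[]).concat(...reports);
  return all
    .filter((f) => f.severity === "CRITICAL" || f.severity === "HIGH")
    .sort((a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity]);
}

// After fixes land, only the agents whose findings were addressed need a re-run.
function agentsToRerun(fixedFindings: Finding[]): AgentId[] {
  return Array.from(new Set(fixedFindings.map((f) => f.agent)));
}
```

Because every agent emits the same shape, merging is a filter and a sort. MEDIUM and LOW findings stay in each agent's own report for the owning team.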

One thing not to do

Don’t split by folders.

Don’t do:

Agent A → routes/
Agent B → services/
Agent C → controllers/

That simply recreates the original problem. Every agent now needs all the expertise again.
Split by domain expertise, not by directory structure.

The takeaway

The giant audit prompt isn’t wrong. It’s just too broad. One agent doing five jobs becomes average at all five.
Five specialized agents become genuinely useful. That’s also how we build engineering organizations.
Maybe we should build our AI workflows the same way.


Ido Green
