I’ve been seeing the same anti-pattern everywhere lately.
Someone opens Cursor, Copilot or Claude and pastes a giant prompt:
You are a Principal Security Engineer, Staff Node.js Engineer, and Senior SRE performing a production-grade audit of this codebase.

Your mission is NOT to explain the code.

Your mission is to aggressively find:
1. Security vulnerabilities
2. Reliability issues
3. Logic bugs
4. Performance bottlenecks
5. Race conditions
6. Data integrity issues
7. Scalability problems
8. Operational risks
9. Bad architectural decisions
10. Technical debt that could cause future incidents

Codebase stack:
- NodeJS
- Express
- TypeScript
- (discover additional technologies automatically)

Rules:
- Think like an attacker first.
- Then think like an SRE responsible for keeping production alive at 3AM.
- Then think like a senior engineer maintaining this system for 5 years.
- Be skeptical of every assumption.
- Never assume code is safe because it works.

For every file you inspect, evaluate the following categories.

## SECURITY CHECKLIST

### Authentication
- Missing authentication
- Broken authentication
- Insecure session management
- JWT issues
- Token expiration issues
- Missing token validation
- Weak secrets handling
- Secret leakage

### Authorization
- IDOR vulnerabilities
- Privilege escalation risks
- Missing ownership validation
- Missing role checks
- Overly broad permissions

### Input Validation
- SQL Injection
- NoSQL Injection
- Command Injection
- Path Traversal
- Prototype Pollution
- XSS
- SSRF
- Open Redirects
- Unsafe deserialization
- Header injection

### API Security
- Missing rate limiting
- Missing request size limits
- Missing CORS restrictions
- Information leakage
- Verb tampering
- Sensitive endpoint exposure

### Secrets Management
- Hardcoded secrets
- API keys in code
- Credentials in configs
- Sensitive logs

### Dependencies
- Dangerous packages
- Deprecated packages
- Unmaintained packages
- Supply chain risks

### Infrastructure
- Unsafe environment variable usage
- Missing security headers
- Missing HTTPS enforcement
- Dangerous Express configuration

---

## RELIABILITY CHECKLIST

Find:
- Missing try/catch blocks
- Unhandled promise rejections
- Silent failures
- Swallowed exceptions
- Missing timeouts
- Missing retries
- Infinite loops
- Resource leaks
- Memory leaks
- File descriptor leaks
- Database connection leaks
- Event listener leaks

---

## DATA INTEGRITY CHECKLIST

Find:
- Non-atomic operations
- Race conditions
- Concurrent update issues
- Duplicate writes
- Missing transactions
- Inconsistent states
- Event ordering problems
- Partial failures

---

## PERFORMANCE CHECKLIST

Find:
- N+1 queries
- Sequential async code that should be parallelized
- Excessive awaits inside loops
- Blocking CPU work
- Large memory allocations
- Missing caching opportunities
- Excessive serialization
- Repeated computations

Estimate impact whenever possible.

---

## EXPRESS SPECIFIC CHECKLIST

Inspect:
app.ts
server.ts
middleware/
routes/
controllers/
services/
repositories/
models/

Look for:
- Missing helmet
- Missing compression
- Missing body size limits
- Missing rate limiting
- Missing request validation
- Missing centralized error handling
- Missing graceful shutdown
- Missing health checks
- Missing request IDs
- Missing correlation IDs

---

## TYPESCRIPT CHECKLIST

Find:
- use of any
- unsafe type assertions
- ignored compiler errors
- null/undefined bugs
- impossible states
- weak interfaces
- duplicate types

---

## OBSERVABILITY CHECKLIST

Verify:
- Structured logging
- Error tracking
- Metrics
- Health endpoints
- Distributed tracing
- Audit logs
- Correlation IDs

---

## OUTPUT FORMAT

Do NOT dump all findings.
Prioritize findings by severity.
Use this exact format:

# CRITICAL
Issue:
Location:
Impact:
Attack scenario:
Evidence:
Fix:

# HIGH
Issue:
Location:
Impact:
Evidence:
Fix:

# MEDIUM
Issue:
Location:
Impact:
Evidence:
Fix:

# LOW
Issue:
Location:
Impact:
Evidence:
Fix:

# ARCHITECTURAL IMPROVEMENTS
1.
2.
3.

# TOP 10 ACTION ITEMS
Order by highest ROI and risk reduction.

IMPORTANT RULES:
- Never speculate.
- If evidence is insufficient, explicitly say: "Potential issue - needs verification."
- Show the exact file and line numbers whenever possible.
- If you cannot verify a vulnerability, do not present it as fact.
- Suggest concrete code fixes, not generic advice.
- Think adversarially.
Be a principal security engineer, SRE, performance engineer and senior TypeScript expert.
Audit my entire codebase before production.
Sounds smart but usually produces mediocre results.
Here’s the pattern I’ve noticed:
- The first few findings are excellent.
- Then the model starts skimming.
- Then it starts hedging.
- Eventually it turns into a summary instead of an audit.
This isn’t a prompting problem.
It’s a job design problem. Look at this:
❌ Giant Agent

Codebase
↓
One Super Prompt
↓
40 mixed findings
↓
Nobody reads it

✅ Specialized Agents

Codebase
↓
Security | Reliability | Performance | Platform | TypeScript
↓
One merged triage doc
We’re asking one agent to do five different jobs simultaneously.
Humans don’t work that way.
Engineering organizations don’t work that way. LLMs don’t either.
Treat AI agents like engineering teams
In a healthy engineering organization, you don’t ask one person to be:
- The security engineer
- The SRE
- The performance expert
- The platform engineer
- The TypeScript expert
You specialize.
Do the exact same thing with your AI agents.
I call this the 5-Agent Production Audit Framework.
Agent #1: Security & Authentication
Persona: Principal Security Engineer
This agent thinks like an attacker. Your red team.
Scope:
- Authentication
- Authorization
- Input validation
- Injection vulnerabilities
- XSS
- SSRF
- Secrets management
- Dependency risks
Run this one first.
Security findings are usually the highest severity and other audits will often reference the same code paths.
Agent #2: Reliability & Data Integrity
Persona: Senior SRE (the kind who wrote the SRE book)
This agent asks one question:
What happens at 3AM when something fails?
Scope:
- Unhandled exceptions
- Silent failures
- Missing retries
- Resource leaks
- Race conditions
- Missing transactions
- Partial failures
This is your “will this wake somebody up at night?” audit.
Agent #3: Performance & Scalability
Persona: Staff Node.js Performance Engineer
Scope:
- N+1 queries
- Sequential awaits
- Event loop blockers
- Missing caches
- Excessive serialization
- Memory inefficiencies
One rule is critical here:
Every finding must estimate impact.
Don’t say:
This could be slow.
Say:
This endpoint executes 200 database queries instead of 1 under load.
Huge difference.
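To make that kind of estimate concrete, here is a minimal sketch of how a query count could be demonstrated. The in-memory `FakeDb`, its `queryCount` counter, and the function names are hypothetical stand-ins rather than any real database driver; the only point is that the loop version issues one query per id while the batched version issues one in total.

```typescript
type User = { id: number; name: string };

// Hypothetical in-memory "database" that counts how many queries it receives.
class FakeDb {
  queryCount = 0;
  private users = new Map<number, User>();

  constructor(userCount: number) {
    for (let id = 1; id <= userCount; id++) {
      this.users.set(id, { id, name: "user-" + id });
    }
  }

  // One round trip per call.
  async findUserById(id: number): Promise<User | undefined> {
    this.queryCount++;
    return this.users.get(id);
  }

  // One round trip for the whole batch.
  async findUsersByIds(ids: number[]): Promise<User[]> {
    this.queryCount++;
    const found: User[] = [];
    for (const id of ids) {
      const user = this.users.get(id);
      if (user) found.push(user);
    }
    return found;
  }
}

// N+1 shape: an await inside a loop, so one query per id.
async function loadUsersOneByOne(db: FakeDb, ids: number[]): Promise<User[]> {
  const result: User[] = [];
  for (const id of ids) {
    const user = await db.findUserById(id);
    if (user) result.push(user);
  }
  return result;
}

// Batched shape: a single query regardless of how many ids are requested.
async function loadUsersBatched(db: FakeDb, ids: number[]): Promise<User[]> {
  return db.findUsersByIds(ids);
}

// Runs both shapes against fresh databases and reports the query counts.
async function compareQueryCounts(
  n: number
): Promise<{ oneByOne: number; batched: number }> {
  const ids: number[] = [];
  for (let id = 1; id <= n; id++) ids.push(id);

  const slowDb = new FakeDb(n);
  await loadUsersOneByOne(slowDb, ids);

  const fastDb = new FakeDb(n);
  await loadUsersBatched(fastDb, ids);

  return { oneByOne: slowDb.queryCount, batched: fastDb.queryCount };
}
```

Calling `compareQueryCounts(200)` gives the performance agent (or a reviewer) a number to put in the Impact field instead of an adjective.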
Agent #4: Platform & Observability
Persona: Staff Platform Engineer
Scope:
- Helmet
- Compression
- Body limits
- Rate limiting
- Graceful shutdown
- Health checks
- Structured logging
- Correlation IDs
- Metrics
Production-ready systems are debuggable systems.
Platform hardening and observability belong together.
Agent #5: TypeScript & Code Health
Persona: Senior TypeScript Engineer
Scope:
- Use of any instead of real types
- Unsafe assertions
- Null bugs
- Duplicate types
- Impossible states
- Weak interfaces
This one is intentionally last.
Not because it’s unimportant.
Because it’s usually the first thing that gets ignored when mixed with security findings.
Give it dedicated attention.
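As an illustration of what this agent might flag, here is a small sketch of an "impossible state" hiding behind a weak interface, and one common way to remove it. The `LooseRequestState` and `RequestState` types are invented for the example and do not come from any particular codebase.

```typescript
// Weak interface: nothing stops a value that is loading, loaded, and failed at once.
interface LooseRequestState {
  loading: boolean;
  data?: string;
  error?: string;
}

// This compiles, even though it describes a state that should never exist.
const contradictory: LooseRequestState = {
  loading: true,
  data: "payload",
  error: "timeout",
};

// Discriminated union: each status carries only the fields that are valid for it.
type RequestState =
  | { status: "loading" }
  | { status: "success"; data: string }
  | { status: "failure"; error: string };

function describeState(state: RequestState): string {
  switch (state.status) {
    case "loading":
      return "Loading...";
    case "success":
      return "Loaded " + state.data.length + " characters";
    case "failure":
      return "Failed: " + state.error;
    default: {
      // Exhaustiveness check: this assignment stops compiling
      // if a new status is added to RequestState but not handled above.
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```

With the union, the contradictory value above has no equivalent: a `RequestState` cannot carry both `data` and `error`, so the compiler rejects the bug before a reviewer has to find it.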
Why this works better
Three reasons:
1. Smaller scope = deeper analysis
An agent looking only for authorization bugs will trace every token validation path.
An agent looking for authorization bugs, race conditions and N+1 queries will skim all three.
2. Different mental models don’t mix well
Thinking like an attacker is different from thinking like an SRE.
Both are valuable.
Neither benefits from context switching.
3. The output becomes actionable
Nobody wants a 50-item audit report.
Five reports with 8 findings each are dramatically easier to assign and fix.
Security reviews security.
Platform reviews platform. Performance reviews performance.
That’s exactly how engineering organizations already operate.
How to run this in practice
- Use identical output formats for all agents.
- Give each agent only its own checklist.
- Run them against the same commit.
- Merge HIGH and CRITICAL findings into a single triage document.
- Re-run only the agent that corresponds to the fixes you made.
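The steps above can be sketched as a small merge step. The `Finding` and `AgentReport` shapes, the agent names, and the `mergeTriage` and `agentsToRerun` helpers are assumptions made for illustration; the workflow itself only requires that every agent shares one output format, runs against the same commit, and feeds its HIGH and CRITICAL findings into one triage document.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
type AgentName =
  | "security"
  | "reliability"
  | "performance"
  | "platform"
  | "typescript";

// One shared output format for every agent (hypothetical field names).
interface Finding {
  agent: AgentName;
  severity: Severity;
  issue: string;
  location: string;
  impact: string;
  evidence: string;
  fix: string;
}

// Each report records the commit it was produced from.
interface AgentReport {
  agent: AgentName;
  commit: string;
  findings: Finding[];
}

const severityRank: Record<Severity, number> = {
  CRITICAL: 0,
  HIGH: 1,
  MEDIUM: 2,
  LOW: 3,
};

// Keeps only CRITICAL and HIGH findings, most severe first.
// Refuses to merge reports that were run against different commits.
function mergeTriage(reports: AgentReport[]): Finding[] {
  const commits = new Set(reports.map((r) => r.commit));
  if (commits.size > 1) {
    throw new Error("Reports were produced from different commits");
  }
  const merged: Finding[] = [];
  for (const report of reports) {
    for (const finding of report.findings) {
      if (finding.severity === "CRITICAL" || finding.severity === "HIGH") {
        merged.push(finding);
      }
    }
  }
  return merged.sort(
    (a, b) => severityRank[a.severity] - severityRank[b.severity]
  );
}

// After fixes land, only the agents that reported the fixed findings run again.
function agentsToRerun(fixedFindings: Finding[]): AgentName[] {
  const names: AgentName[] = [];
  for (const finding of fixedFindings) {
    if (names.indexOf(finding.agent) === -1) names.push(finding.agent);
  }
  return names;
}
```

Failing loudly on mismatched commits is a design choice in this sketch: findings from different revisions can point at code that no longer exists, which makes the merged document harder to trust.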
One thing not to do
Don’t split by folders.
Don’t do:
- Agent A → routes/
- Agent B → services/
- Agent C → controllers/
That simply recreates the original problem. Every agent now needs all the expertise again.
Split by domain expertise, not by directory structure.
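A minimal sketch of that split, assuming a hypothetical `AuditAgent` shape and `buildPrompt` helper: each agent gets one persona and one checklist, while every agent is pointed at the same directories. The personas and checklist items are the ones listed for the five agents above; the prompt wording is only an example.

```typescript
interface AuditAgent {
  name: string;
  persona: string;
  checklist: string[];
}

// One entry per domain of expertise, not per folder.
const agents: AuditAgent[] = [
  {
    name: "security",
    persona: "Principal Security Engineer",
    checklist: ["Authentication", "Authorization", "Input validation",
      "Injection vulnerabilities", "XSS", "SSRF", "Secrets management",
      "Dependency risks"],
  },
  {
    name: "reliability",
    persona: "Senior SRE",
    checklist: ["Unhandled exceptions", "Silent failures", "Missing retries",
      "Resource leaks", "Race conditions", "Missing transactions",
      "Partial failures"],
  },
  {
    name: "performance",
    persona: "Staff Node.js Performance Engineer",
    checklist: ["N+1 queries", "Sequential awaits", "Event loop blockers",
      "Missing caches", "Excessive serialization", "Memory inefficiencies"],
  },
  {
    name: "platform",
    persona: "Staff Platform Engineer",
    checklist: ["Helmet", "Compression", "Body limits", "Rate limiting",
      "Graceful shutdown", "Health checks", "Structured logging",
      "Correlation IDs", "Metrics"],
  },
  {
    name: "typescript",
    persona: "Senior TypeScript Engineer",
    checklist: ["any usage", "Unsafe assertions", "Null bugs",
      "Duplicate types", "Impossible states", "Weak interfaces"],
  },
];

// Every agent audits the whole codebase; only the expertise differs.
const directories = ["routes/", "services/", "controllers/"];

function buildPrompt(agent: AuditAgent, dirs: string[]): string {
  const lines = [
    "You are a " + agent.persona + ".",
    "Audit these directories: " + dirs.join(", "),
    "Report only on:",
  ];
  for (const item of agent.checklist) lines.push("- " + item);
  return lines.join("\n");
}

const prompts = agents.map((agent) => buildPrompt(agent, directories));
```

Each resulting prompt names all three directories but carries only one checklist, which is the opposite of the folder-based split.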
The takeaway
The giant audit prompt isn’t wrong. It’s just too broad. One agent doing five jobs becomes average at all five.
Five specialized agents become genuinely useful. That’s also how we build engineering organizations.
Maybe we should build our AI workflows the same way.