I fix and secure AI apps before they break in production.
Teams ship AI features and whole apps that work in the demo and fall over on the hard parts — broken auth, leaking secrets, prompt-injection holes, half-wired integrations. I do the other half: make them work, and make them safe.
# AI-generated login — works in the demo, unsafe in prodif user.password == request.form["pw"]:if bcrypt.checkpw(request.form["pw"].encode(), user.pw_hash): return issue_session(user)
What I do
The other half of building with AI.
AI tools get you to a demo fast. I take it the rest of the way — reliable, secure, and tested.
rescue
AI-generated code rescue
Debug and harden the auth, database, API, validation, deploy, and test gaps that Lovable / Replit / Cursor leave behind.
security
LLM & app security
Prompt-injection probes, OWASP-LLM-2025-aligned reviews, secrets/misconfig checks, and secure-code review.
integrate
AI integrations & automation
LLM extraction/classification wired into APIs, ERP/CRM systems, and n8n/Make workflows — with retries, idempotency, and validation.
build
Custom AI tools
RAG assistants, document extraction, support triage, lead-gen pipelines, and the internal automation around them.
Proof you can run
Tested reference projects — not screenshots.
Each is a small, honest capability demo with a clear threat model, real tests, and a measurable before/after you can reproduce in one command.
Fix & secure AI-generated apps
24 tests · AST fixer
Detects real issues (debug-in-prod, missing auth, plaintext passwords, hardcoded secrets, injection-prone DB calls) via AST analysis, patches supported Python, and rebuilds auth the safe way — with a before/after diff.
A black-box red-team aligned to the OWASP LLM Top-10 (2025): prompt injection, sensitive-info disclosure, output handling, excessive agency, system-prompt leakage. Surfaces 11 findings on a vulnerable target, 0 on the hardened one.
Turns messy invoices/receipts into clean, validated JSON with arithmetic reconciliation and confidence scoring — low-confidence docs route to human review. Quality is measured on a labelled gold set, not claimed.
A production-shaped pipeline: incoming order/email → AI-structured → validated → synced through a resilient client with content-addressed idempotency (corrected re-sends never double-post), retries with backoff, and typed errors.