Building an AI-Powered Vulnerability Analysis Tool — By CyberDudeBivash

 


Executive summary

Goal: a production-ready system that ingests code, containers, SBOMs, and cloud configs; enriches them with threat intelligence; and then uses AI + rules to (1) detect issues, (2) predict exploitability and business risk, and (3) generate actionable, prioritized fixes for developers and SecOps.


1) Architecture (high level)

Sources → Ingestion → Scanners → AI Engine → Risk & Fixes → Delivery

  • Sources: GitHub/GitLab repos, CI artifacts, containers/VM images, SBOMs (CycloneDX/SPDX), package registries, cloud & IaC (Terraform/K8s), ticketing history.

  • Ingestion: webhooks + schedulers, message queue (e.g., RabbitMQ), object store (S3/GCS), metadata DB (Postgres); a minimal ingestion sketch follows this list.

  • Scanners (baseline):

    • SAST: Semgrep, Bandit (Python), ESLint security plugins (JS/TS).

    • SCA: Syft→SBOM, Grype/Trivy→CVE match.

    • Secrets: Gitleaks.

    • IaC/Cloud: Checkov/tfsec, kube-bench.

  • AI Engine (see §3): LLM + ML microservices for dedup, exploit-likelihood, business impact, fix drafts.

  • Risk & Fixes: scoring service combining CVSS, exploit signals, blast-radius, and compensating controls.

  • Delivery: PR comments, Jira tickets, Slack/Teams, dashboard, export (PDF/JSON), CERT-In-ready report template (for Indian orgs).
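To make the ingestion step concrete, here is a minimal sketch assuming the FastAPI + Celery/Redis stack suggested in §7; the webhook route, the scan_repo task, and the payload fields are illustrative placeholders, not a fixed contract.

    # Webhook ingestion sketch: accept a repo event, enqueue an async scan job.
    from celery import Celery
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    queue = Celery("vulnscan", broker="redis://localhost:6379/0")

    class RepoEvent(BaseModel):
        repo_url: str
        commit_sha: str

    @queue.task(name="scan_repo")
    def scan_repo(repo_url: str, commit_sha: str) -> None:
        # Worker side: clone the repo, run Syft/Grype/Semgrep/Gitleaks,
        # normalize results, and persist findings to Postgres.
        ...

    @app.post("/ingest/webhook")
    def ingest(event: RepoEvent):
        # API side: validate the event, then enqueue so scanners run asynchronously.
        scan_repo.delay(event.repo_url, event.commit_sha)
        return {"status": "queued"}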


2) Data model (core objects)

  • Asset (repo, image, function, microservice, cloud resource)

  • Finding {cve/cwe, location, evidence, scanner, severity}

  • Context {SBOM deps, ownership, runtime presence, network egress, data sensitivity tag}

  • Signals {EPSS-like score, exploit POC seen?, commit cadence, package popularity/staleness}

  • Recommendation {fix steps, patch version, code patch draft}

  • Ticket {status, SLA, owner}
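A minimal sketch of these core objects as Python dataclasses; the field names are illustrative, and a production schema would live in Postgres per §1.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Asset:
        id: str
        kind: str              # repo | image | function | microservice | cloud_resource
        owner: str
        criticality: str       # business criticality tag used for impact estimation

    @dataclass
    class Finding:
        asset_id: str
        cve: Optional[str]
        cwe: Optional[str]
        location: str          # file:line, image layer, or resource path
        evidence: str
        scanner: str
        severity: str

    @dataclass
    class Signals:
        epss_like: float       # 0..1 exploit-likelihood signal
        poc_seen: bool
        commit_cadence: float
        pkg_staleness_days: int

    @dataclass
    class Recommendation:
        fix_steps: str
        patch_version: Optional[str]
        patch_draft: Optional[str]   # LLM output; human review required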


3) AI components (what the “AI” actually does)

  1. Finding consolidation

    • LLM (code-aware) clusters duplicates across scanners; NER to extract CWE, component, version, file, function.

  2. Exploitability prediction

    • Gradient-boosted model (XGBoost/LightGBM) with features such as: CVSS vector, text embeddings (BERT/MPNet) of the advisory, package popularity, release freshness, presence in runtime, internet exposure, known PoC indicators, and historical MTTR (a minimal sketch follows this list).

  3. Business impact estimation

    • Rules + ML using data classification tags (PII/financial), asset criticality, user count, blast radius.

  4. Fix generation & review

    • LLM drafts patch diffs / config changes; guardrail policy: never auto-merge, require human review; include unit tests where possible.

  5. Explanations

    • Short, developer-friendly rationale (“Why this matters”) + ATT&CK mapping + references.

  6. RAG knowledge

    • Vector store of advisories, internal runbooks, and past incident notes. Retrieval-augmented answers keep guidance current and org-specific (a retrieval sketch closes this section).
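A minimal sketch of the exploitability model from item 2, assuming XGBoost with hand-built tabular features; the feature columns and the tiny training set are illustrative only.

    import numpy as np
    import xgboost as xgb

    # Each row: [cvss_base, days_since_release, log_downloads, in_runtime, internet_exposed, poc_seen]
    X = np.array([
        [9.8,  12, 14.2, 1, 1, 1],
        [5.3, 400,  6.1, 0, 0, 0],
        [7.5,  30, 11.0, 1, 0, 0],
    ], dtype=np.float32)
    y = np.array([1, 0, 0])  # label: known exploitation within N days of disclosure

    model = xgb.XGBClassifier(n_estimators=200, max_depth=4)
    model.fit(X, y)
    p_exploit = model.predict_proba(X[:1])[0, 1]  # feeds the Exploitability term in §4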

Guardrails: strict prompt-escaping, allow-lists for tools, reproducible prompts, red-team tests for prompt injection, and output diff linting.
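The retrieval sketch referenced in item 6, assuming sentence-transformers + FAISS from §7; the two documents and the query are illustrative.

    import faiss
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-mpnet-base-v2")   # MPNet embeddings, per §3
    docs = [
        "Advisory: prototype pollution in lodash < 4.17.21; upgrade to 4.17.21.",
        "Runbook: base images are patched via the weekly rebuild pipeline.",
    ]
    emb = model.encode(docs, normalize_embeddings=True)
    index = faiss.IndexFlatIP(int(emb.shape[1]))       # cosine similarity on normalized vectors
    index.add(emb)

    query = model.encode(["How do we remediate lodash prototype pollution?"],
                         normalize_embeddings=True)
    scores, ids = index.search(query, 2)
    context = "\n".join(docs[i] for i in ids[0])       # retrieved context for the LLM prompt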


4) Risk score (prioritization)

Risk = f(Exploitability, BusinessImpact, Exposure, CompensatingControls, TimeSinceDisclosure)

Example weights to start:

  • Exploitability 0.4, Business Impact 0.3, External Exposure 0.2, Controls −0.1, Age +0.1.
    Tune with historical incidents to minimize mean time to risk reduction (MTRR).
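As an illustrative calculation with these starting weights: a finding with Exploitability 0.9, Business Impact 0.8, full external Exposure (1.0), partial compensating controls (0.5), and 180 days since disclosure scores 0.4×0.9 + 0.3×0.8 + 0.2×1.0 − 0.1×0.5 + 0.1×min(180/90, 1) = 0.85, which places it near the top of the queue.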


5) Developer workflow (CI/CD + IDE)

  • Pre-commit/PR: run light SAST/secret scans; the bot comments with 1-line summary + quick-fix patch.

  • CI: full SCA + IaC scans; fail the build only when risk > threshold (not on raw severity) to avoid alert fatigue; see the gate sketch after this list.

  • IDE: extension shows “why, how to fix, and sample code,” linked to docset.

  • Auto-ticketing: one Jira per service/epic; SLAs tied to risk; reminders in Slack.
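A minimal sketch of the risk-based CI gate mentioned above, assuming findings are exported as JSON with a pre-computed risk field; the file name and threshold are illustrative.

    import json
    import sys

    THRESHOLD = 0.7   # fail the pipeline only above this risk, not on raw severity

    def gate(findings_path: str = "findings.json") -> int:
        with open(findings_path) as f:
            findings = json.load(f)
        blockers = [x for x in findings if x.get("risk", 0.0) > THRESHOLD]
        for x in blockers:
            print(f"BLOCKING: {x.get('cve') or x.get('cwe')} risk={x['risk']:.2f} at {x.get('location')}")
        return 1 if blockers else 0

    if __name__ == "__main__":
        sys.exit(gate())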


6) MVP to v1 roadmap

Week 0–2 (MVP)

  • Repos & containers ingestion; Syft→SBOM; Grype/Trivy; Semgrep; Gitleaks.

  • Simple risk formula (CVSS + exposure flags).

  • Dashboard + PR comments; exports (PDF/JSON).

Week 3–6

  • LLM dedup/summarize findings; CWE mapping; fix text suggestions.

  • IaC scanning; cloud misconfig baselines.

  • RAG store of advisories & internal runbooks.

Week 7–10

  • Exploitability ML (train on historical CVEs, EPSS-style features, PoC sightings).

  • Business impact model using asset tags & data classes.

  • Policy engine for CERT-In/DPDP reporting fields (timestamps, logs, contact POCs).

Week 11–14 (v1)

  • Code-patch drafts for top ecosystems (JS/TS, Python, Java).

  • RBAC (role-based access control), audit logs, org SSO (OIDC/SAML).

  • Multi-tenant SaaS hardening; rate limits; per-tenant encryption.


7) Tech stack (suggested)

  • Backend: Python (FastAPI), workers (Celery/Redis), Postgres, S3/GCS.

  • ML/AI: PyTorch/Transformers, XGBoost/LightGBM, sentence-transformers, FAISS/pgvector.

  • Scanners: Semgrep, Bandit, Trivy/Grype, Syft, Gitleaks, Checkov/tfsec, kube-bench.

  • Queue/Events: RabbitMQ/Kafka.

  • Frontend: React + Tailwind; charts (Recharts).

  • Deploy: Docker + Kubernetes; observability (OpenTelemetry, Prometheus, Grafana).

  • Security: Vault for secrets; OPA/Gatekeeper for policy.


8) Example API (sketch)

    POST /ingest/sbom            # upload CycloneDX/SPDX
    POST /scan/repo?url=...      # enqueue repo scan
    GET  /findings?asset_id=...  # list normalized findings
    POST /score                  # compute risk score for payload
    POST /fix/draft              # get LLM patch suggestion (human review required)

Risk scoring pseudo-code

    def risk_score(exp, impact, exposure, controls, age_days):
        # Weights from §4; the age term is capped at 90 days post-disclosure.
        return (0.4 * exp + 0.3 * impact + 0.2 * exposure
                - 0.1 * controls + 0.1 * min(age_days / 90, 1.0))
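A minimal FastAPI sketch of the /score endpoint wrapping the function above; the payload field names are illustrative.

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class ScorePayload(BaseModel):
        exploitability: float   # 0..1
        impact: float           # 0..1
        exposure: float         # 0..1
        controls: float         # 0..1, strength of compensating controls
        age_days: int

    @app.post("/score")
    def score(p: ScorePayload):
        # risk_score() is the function defined above
        return {"risk": risk_score(p.exploitability, p.impact, p.exposure,
                                   p.controls, p.age_days)}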

9) Data & evaluation

  • Labels: historical exploitation (yes/no), MTTR, incident attribution, whether a finding led to a hotfix.

  • Metrics: precision@k for “critical first,” reduction in open risk, developer acceptance rate of fixes, PR cycle time.

  • Offline→Online: start with offline CVE corpora; then A/B test on teams to demonstrate fewer "false-urgent" tickets.
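To make the precision@k metric concrete, a minimal sketch; the ranked labels are illustrative.

    def precision_at_k(ranked_labels: list[int], k: int) -> float:
        # ranked_labels: 1 if the finding at that rank later proved truly critical, else 0,
        # ordered by our risk score (highest first).
        top = ranked_labels[:k]
        return sum(top) / max(len(top), 1)

    # e.g. ranks 1, 2 and 4 of the top 5 were truly critical -> 0.6
    assert precision_at_k([1, 1, 0, 1, 0, 0, 1], k=5) == 0.6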


10) Compliance & India specifics

  • CERT-In: ensure the report export includes incident type, affected systems, indicators, timestamps, and a 24×7 contact field.

  • DPDP: tag datasets and findings that touch personal data; support breach-assessment workflows.

  • Auditability: store prompts, model versions, and decision traces for every recommendation.


11) Pricing & GTM (quick sketch)

  • Free tier: 1 repo, weekly scans, dashboard.

  • Pro: per-seat or per-asset; CI checks, PR comments, Jira/Slack.

  • Enterprise: on-prem/air-gapped, SSO/SIEM, custom models, CERT-In report pack.

  • Add-on: “Fix Assist” (LLM patch drafts) billed per successful merge.


12) Risks & mitigations

  • Hallucinated fixes → require tests + gated review; restrict the LLM to read-only access to code unless the diff is approved.

  • Noise/duplication → AI dedup + “merge similar findings” across scanners.

  • Supply-chain of the AI itself → pin model weights/images; MBOM/SBOM for the tool; signed releases.

  • Data leakage → tenant-scoped stores; PII redaction; no training on customer code without explicit consent.


13) What we can build first (fast win)

  • GitHub App that:

    1. Generates SBOM with Syft,

    2. Runs Trivy/Grype + Semgrep,

    3. Calls our LLM summarizer to post one prioritized PR comment with: risk score → "why it matters" → fix snippet → link to the CERT-In export.
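A minimal sketch of that pipeline, assuming syft, grype, and semgrep are on PATH and their CLI flags match current releases; summarize() and post_pr_comment() are named only as placeholders for the LLM call and the GitHub App API call.

    import json
    import subprocess

    def run(cmd: list[str]) -> str:
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    def analyze(repo_path: str) -> dict:
        run(["syft", repo_path, "-o", "cyclonedx-json=sbom.json"])                    # 1. SBOM
        vulns = json.loads(run(["grype", "sbom:sbom.json", "-o", "json"]))            # 2a. CVE match
        sast = json.loads(run(["semgrep", "--config", "auto", "--json", repo_path]))  # 2b. SAST
        return {"vulns": vulns, "sast": sast}

    # 3. summarize(analyze(path)) would produce one prioritized PR comment
    #    (risk score, "why it matters", fix snippet, link to the CERT-In export),
    #    posted via the GitHub App installation token.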
