Incident Response Plan

Effective Date: 2026-03-02
Last Review: 2026-03-02
Next Review: 2026-09-02
Owner: Greg Felice, Project Lead

1. Purpose

This plan defines how security incidents affecting the tomo ecosystem are detected, triaged, contained, resolved, and reviewed. It ensures consistent, timely response that minimizes damage and preserves evidence for analysis.

2. Scope

This plan covers incidents affecting:

tomo SDK — compromised PyPI package, supply chain attacks, malicious contributions
tomo Docker image — compromised container, unauthorized image pushes
tomo hosted service — unauthorized data access, service disruption, data breach
Infrastructure — dweezil server compromise, CI/CD pipeline abuse, credential theft
Third-party services — breaches at Docker Hub, PyPI, GitHub, Backblaze B2

3. Severity Levels

Severity	Description	Response Time	Update Frequency	Examples
P1 — Critical	Active exploitation, data breach, complete service outage	30 minutes	Every 2 hours	Compromised PyPI package, database exfiltration, server root compromise
P2 — High	Confirmed vulnerability under active threat, partial outage	4 hours	Every 8 hours	Exploitable CVE in production dependency, unauthorized CI credential use
P3 — Medium	Vulnerability identified but not actively exploited	24 hours	Daily	Dependency CVE with no known exploit, misconfigured firewall rule
P4 — Low	Minor security issue, informational	72 hours	As needed	Failed brute-force attempts, stale user account, policy deviation
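
The response targets in the table can be encoded as data so that tooling can compute deadlines. The following is a minimal sketch, not part of the plan itself: the names `SeverityPolicy`, `SEVERITY_POLICIES`, and `response_deadline` are hypothetical, and "As needed" is represented as `None` by assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class SeverityPolicy:
    """One row of the severity table (hypothetical structure)."""
    label: str
    response_time: timedelta
    update_interval: Optional[timedelta]  # None stands for "As needed"


# Values copied from the severity table above.
SEVERITY_POLICIES = {
    "P1": SeverityPolicy("Critical", timedelta(minutes=30), timedelta(hours=2)),
    "P2": SeverityPolicy("High", timedelta(hours=4), timedelta(hours=8)),
    "P3": SeverityPolicy("Medium", timedelta(hours=24), timedelta(days=1)),
    "P4": SeverityPolicy("Low", timedelta(hours=72), None),
}


def response_deadline(severity: str, detected_at: datetime) -> datetime:
    """Return the latest time by which a response should have started."""
    return detected_at + SEVERITY_POLICIES[severity].response_time


if __name__ == "__main__":
    detected = datetime(2026, 3, 2, 12, 0, tzinfo=timezone.utc)
    print(response_deadline("P1", detected).isoformat())
```

Keeping the table in one mapping means an alerting script and this document can be checked against each other when either changes.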

4. Roles and Responsibilities

Role	Responsibility	Current Holder
Incident Commander (IC)	Owns the incident lifecycle, makes containment decisions, coordinates communication	Greg Felice
Technical Lead	Performs investigation, implements containment and remediation	Greg Felice
Communications Lead	Drafts external notifications, updates stakeholders	Greg Felice

As the team grows, these roles will be distributed. Until then, the Project Lead fills all roles.

5. Detection Sources

Source	What It Detects	Alert Method
Grafana / Prometheus	Service health, anomalous database connections, disk/CPU spikes	Grafana alerting (email, webhook)
PostgreSQL audit logs	Unauthorized queries, failed auth, DDL changes	Log review, Grafana dashboard
Woodpecker CI	SAST findings (bandit), dependency CVEs (pip-audit), container vulnerabilities (trivy), leaked secrets (trufflehog)	Pipeline failure notification
Authentik	Failed login attempts, MFA bypass attempts, unusual session activity	Authentik event logs
Systemd journal	SSH access, sudo usage, service crashes	Log review
External reports	Vulnerability reports via SECURITY.md process	Email (security@rizlabs.com)
Uptime monitoring	Service availability	Alerting (webhook)

6. Incident Response Process

Phase 1: Detection and Reporting

Incident detected via monitoring alert, CI failure, log review, or external report
Document initial observations: what, when, how detected, affected systems
Open incident record (Forgejo issue with security-incident label, or offline log if Forgejo is compromised)
Assign severity level (P1-P4)

Phase 2: Triage

Confirm the incident is real (not a false positive)
Identify affected systems, data, and users
Determine if the incident is ongoing or historical
Reassess severity if initial assessment was incorrect
Decide on containment strategy (see Phase 3)

Triage decision tree:

Is data being actively exfiltrated? --> P1, immediate containment
Is a published artifact (PyPI, Docker Hub) compromised? --> P1, immediate yank/retag
Is infrastructure access compromised? --> P1/P2, rotate credentials immediately
Is a vulnerability confirmed but not exploited? --> P2/P3, plan remediation
Is this informational only? --> P4, document and schedule fix
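
The decision tree above can be sketched as a function. This is an illustration under stated assumptions: the questions are evaluated top to bottom with the first match winning, anything that matches no earlier question is treated as informational, and the names `TriageFacts` and `triage` are hypothetical. Where the tree gives a range such as P1/P2, the sketch returns the range unchanged and leaves the final choice to the Incident Commander.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TriageFacts:
    """Answers to the triage questions (hypothetical structure)."""
    active_exfiltration: bool = False
    published_artifact_compromised: bool = False
    infrastructure_access_compromised: bool = False
    vulnerability_confirmed: bool = False


def triage(facts: TriageFacts) -> Tuple[str, str]:
    """Return (severity, first action), checking questions in listed order."""
    if facts.active_exfiltration:
        return ("P1", "immediate containment")
    if facts.published_artifact_compromised:
        return ("P1", "immediate yank/retag")
    if facts.infrastructure_access_compromised:
        return ("P1/P2", "rotate credentials immediately")
    if facts.vulnerability_confirmed:
        return ("P2/P3", "plan remediation")
    # Assumption: nothing above matched, so the report is informational.
    return ("P4", "document and schedule fix")


if __name__ == "__main__":
    print(triage(TriageFacts(published_artifact_compromised=True)))
```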

Phase 3: Containment

Short-term containment (stop the bleeding):

Scenario	Action
Compromised server	Isolate network (firewall deny-all), preserve disk state
Compromised PyPI package	Yank the compromised version from PyPI; advise users to pin a known-good release with `pip install tomo-sdk==<safe-version>`
Compromised Docker image	Remove tag from Docker Hub, push known-good image
Credential theft	Rotate all affected credentials immediately
Database breach	Revoke compromised roles, add reject rules to pg_hba.conf and reload PostgreSQL
CI pipeline abuse	Disable Woodpecker pipelines, rotate CI secrets

Long-term containment (prevent recurrence while preserving evidence):

Rebuild affected systems from known-good state if necessary
Apply patches or configuration changes
Enhance monitoring for the attack vector
Do not destroy forensic evidence (preserve logs, disk snapshots)

Phase 4: Eradication

Identify root cause
Remove attacker access (accounts, backdoors, malware)
Patch the vulnerability that was exploited
Verify no persistence mechanisms remain
Scan for indicators of compromise (IoCs) across all systems

Phase 5: Recovery

Restore services from known-good backups if needed (see Backup and Recovery Policy)
Verify system integrity before returning to production
Monitor closely for 72 hours after recovery
Confirm all rotated credentials are propagated to dependent systems

Phase 6: Post-Incident Review

Conduct review within 5 business days of incident closure
Use the post-incident review template (Section 9)
Document lessons learned and action items
Update policies, runbooks, and monitoring as needed
Store review in docs/security/incident-reviews/YYYY-MM-DD-title.md
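
The review deadline and storage path can be computed as in the sketch below. It rests on several assumptions the plan does not state: business days are Monday to Friday with no holiday calendar, the date in the file name is the review date, and the title is reduced to a lowercase hyphenated slug. The function names are hypothetical.

```python
import re
from datetime import date, timedelta


def review_due_date(closed_on: date, business_days: int = 5) -> date:
    """Return the date that falls `business_days` weekdays after closure."""
    current = closed_on
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def review_path(review_date: date, title: str) -> str:
    """Build the review file path in the YYYY-MM-DD-title.md layout."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"docs/security/incident-reviews/{review_date.isoformat()}-{slug}.md"


if __name__ == "__main__":
    due = review_due_date(date(2026, 3, 2))
    print(due, review_path(due, "Compromised PyPI package"))
```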

7. Escalation Contacts

Priority	Contact	Method	Timeframe
P1	Greg Felice	Phone, Signal	Immediate, 24/7
P2	Greg Felice	Email, Signal	Within 4 hours
P3-P4	Greg Felice	Email, Forgejo issue	Next business day

External escalation (if required):

Entity	When	Contact
PyPI Security	Compromised package	security@pypi.org
Docker Hub Security	Compromised image	security@docker.com
GitHub Security	Repository compromise	Via GitHub support
Upstream Apache AGE	Vulnerability in AGE	security@apache.org
Law enforcement	Criminal activity, data breach with legal reporting obligation	Local authorities

8. Communication Templates

8.1 Internal Incident Declaration

Subject: [P{severity}] Security Incident — {brief description}

Incident ID: INC-YYYY-NNN
Severity: P{1-4}
Detected: {timestamp}
Affected Systems: {list}
Current Status: {Investigating | Containing | Remediating | Resolved}

Summary:
{What happened, what we know so far}

Immediate Actions Taken:
{What has been done}

Next Steps:
{What will be done next}

Incident Commander: Greg Felice
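
The header fields of the declaration above could be filled in programmatically. The sketch below is illustrative only: it assumes NNN in the incident ID is a zero-padded sequence number that restarts each year, which the plan does not specify, and the function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import List

# Status values taken from the template above.
VALID_STATUSES = ("Investigating", "Containing", "Remediating", "Resolved")


def incident_id(year: int, sequence: int) -> str:
    """Format an ID as INC-YYYY-NNN (assumed: per-year, zero-padded sequence)."""
    return f"INC-{year:04d}-{sequence:03d}"


def render_declaration_header(year: int, sequence: int, severity: int,
                              detected: datetime, affected_systems: List[str],
                              status: str) -> str:
    """Render the header fields of the internal incident declaration."""
    if severity not in (1, 2, 3, 4):
        raise ValueError("severity must be between 1 and 4")
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {VALID_STATUSES}")
    return "\n".join([
        f"Incident ID: {incident_id(year, sequence)}",
        f"Severity: P{severity}",
        f"Detected: {detected.isoformat()}",
        f"Affected Systems: {', '.join(affected_systems)}",
        f"Current Status: {status}",
    ])


if __name__ == "__main__":
    print(render_declaration_header(
        2026, 7, 1, datetime(2026, 3, 2, 12, 0, tzinfo=timezone.utc),
        ["tomo SDK"], "Investigating"))
```

Validating severity and status before rendering keeps a malformed declaration from being sent during a stressful first hour.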

8.2 External User Notification (P1/P2 — Data Breach or Compromised Artifact)

Subject: Security Notice — tomo {SDK | Docker Image | Hosted Service}

We are writing to inform you of a security incident affecting {component}.

What happened:
{Clear, factual description}

What data/systems were affected:
{Specific scope}

What we have done:
{Containment and remediation actions}

What you should do:
{User actions: update version, rotate credentials, etc.}

Timeline:
- {timestamp}: Incident detected
- {timestamp}: Containment completed
- {timestamp}: Remediation deployed

We will provide updates as our investigation continues. If you have questions, contact security@rizlabs.com.

Greg Felice
tomo Project Lead

8.3 Public Advisory (for SDK/Docker supply chain incidents)

Subject: [SECURITY] tomo {version} — {CVE ID if applicable}

Affected versions: {version range}
Fixed version: {version}
Severity: {Critical | High | Medium | Low}

Description:
{Technical description of the vulnerability}

Impact:
{What an attacker could do}

Mitigation:
{Steps to fix: upgrade command, workaround}

Credit:
{Reporter, if they wish to be credited}

References:
- {CVE link}
- {Related advisory links}

9. Post-Incident Review Template

# Post-Incident Review: INC-YYYY-NNN

**Date of Review:** YYYY-MM-DD
**Incident Commander:** {name}
**Participants:** {names}

## Incident Summary

- **Severity:** P{1-4}
- **Duration:** {detection to resolution}
- **Affected Systems:** {list}
- **User Impact:** {description}

## Timeline

| Time (UTC) | Event |
|------------|-------|
| {timestamp} | {event} |

## Root Cause

{What caused the incident}

## What Went Well

- {item}

## What Could Be Improved

- {item}

## Action Items

| Action | Owner | Due Date | Status |
|--------|-------|----------|--------|
| {action} | {name} | {date} | {Open/Done} |

## Metrics

- **Time to detect:** {duration}
- **Time to contain:** {duration}
- **Time to resolve:** {duration}
- **Data exposed:** {scope or "None"}
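
The three duration metrics in the template can be derived from the incident timeline. The sketch below assumes definitions the template does not spell out: time to detect runs from the start of the incident to detection, while time to contain and time to resolve both run from detection, matching the template's "detection to resolution" duration. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def review_metrics(started: datetime, detected: datetime,
                   contained: datetime, resolved: datetime) -> Dict[str, timedelta]:
    """Compute the review metrics from four timeline timestamps."""
    if not (started <= detected <= contained <= resolved):
        raise ValueError("timestamps must be in chronological order")
    return {
        "time_to_detect": detected - started,
        "time_to_contain": contained - detected,   # assumed: measured from detection
        "time_to_resolve": resolved - detected,    # assumed: measured from detection
    }


if __name__ == "__main__":
    utc = timezone.utc
    metrics = review_metrics(
        datetime(2026, 3, 2, 10, 0, tzinfo=utc),
        datetime(2026, 3, 2, 10, 45, tzinfo=utc),
        datetime(2026, 3, 2, 12, 0, tzinfo=utc),
        datetime(2026, 3, 2, 18, 45, tzinfo=utc),
    )
    for name, value in metrics.items():
        print(name, value)
```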

10. Evidence Preservation

During any P1 or P2 incident:

Do not reboot affected systems before preserving volatile evidence
Capture full disk snapshot (LVM snapshot or filesystem-level copy)
Export all relevant logs to a separate, secure location
Record network connections and firewall rules (ss -tunapl, iptables -L -n)
Capture running processes (ps auxf, /proc state)
Timestamp and hash all evidence files (SHA-256)
Maintain chain of custody documentation
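
The hashing and timestamping step can be sketched with the Python standard library. This is a minimal example, not a prescribed tool: the record layout and the function names `sha256_file` and `evidence_record` are hypothetical, and the resulting records would still need to be stored with the chain of custody documentation.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Union


def sha256_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large disk images do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def evidence_record(path: Union[str, Path]) -> Dict[str, str]:
    """Build one manifest entry: file path, hash, and UTC timestamp."""
    return {
        "file": str(path),
        "sha256": sha256_file(path),
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(evidence_record(name))
```

Recomputing the digest later and comparing it with the recorded value shows whether an evidence file has changed since collection.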

11. Testing

Test	Frequency	Method
Tabletop exercise	Annually	Walk through a P1 scenario with all role holders
Detection validation	Quarterly	Trigger test alerts and verify notification delivery
Runbook review	Semi-annually	Review and update all response procedures
Communication test	Annually	Send test notification through all escalation channels

12. Compliance Mapping

SOC 2 Criteria	Control
CC7.2	Monitoring of system components for anomalies
CC7.3	Evaluation of identified security events
CC7.4	Incident response and containment
CC7.5	Recovery from identified security incidents
CC2.2	Internal communication of security matters
CC2.3	Communication of incidents to affected external parties