Research Prototype · BSL 1.1 → Apache 2.0 in 2030

Multi-layer defense for autonomous agents

Four-classifier ensemble. 16 AoC defense modules. Distributed O₂ consensus. First open-source defense against Microsoft ThoughtVirus (arXiv:2603.00131).

16 AoC failure modes blocked · 4-classifier ensemble · 7 protection layers · 240 tests passing
$ git clone https://gitlab.com/cognitive-functors/c4-meta-system
$ cd c4-meta-system
$ docker compose --profile distributed-o2 up -d

Seven-Level Defense Architecture

L1 · Mathematical · Empathy Checkpoint: Self→System requires Other mediation. Structural impossibility, not a rule.
L2 · Architectural · Sandboxing, resource limits, tier enforcement. Capability tokens, not RBAC.
L3 · Behavioral · 16 AoC defense modules. C4 state trajectory monitoring. Per-category analysis.
L4 · SVETILO Values · 7 ethical seals verified on every O₂ window. Religiously neutral. 8 wisdom traditions.
L5 · Governance · CapMAC cryptographic tokens. Two-person rule with Ed25519 dual signatures. Timeout + audit trail.
L6 · Telemetry · 15 Prometheus metrics. 11 AlertManager rules. 20 Grafana panels. OpenTelemetry tracing.
L7 · Emergency Stop · Circuit breaker with cryptographic jitter. Automatic halt on empathy violation.

Research-Backed Defenses

Agents of Chaos

arXiv:2602.20021 · Shapira et al. · Feb 2026

All 11 documented failure modes are mapped to C4 states, plus 5 extended categories (16 in total). Our defense predates the paper: we were building for the problem before it was named.

ThoughtVirus

arXiv:2603.00131 · Weckbecker et al. · Feb 2026

Two-layer defense: explicit regex patterns plus KL-divergence drift detection on C4 state trajectories. Catches implicit bias propagation where content filters fail. First open-source ThoughtVirus defense.
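
As a rough illustration of the two layers, the sketch below pairs a regex check with a KL-divergence comparison between a baseline and a recent window of state labels. It is not the project's implementation: the pattern list, the state labels, the smoothing constant and the 0.5 threshold are all hypothetical, and it assumes every label appears in the supplied vocabulary.

```python
import math
import re
from collections import Counter

# Hypothetical explicit pattern; the project's real pattern set is not shown here.
EXPLICIT_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.I)]

def explicit_hit(text):
    """Layer 1: True if any explicit regex pattern matches the text."""
    return any(p.search(text) for p in EXPLICIT_PATTERNS)

def distribution(states, vocab, eps=1e-6):
    """Smoothed relative frequency of each label in vocab (avoids log(0))."""
    counts = Counter(states)
    total = len(states) + eps * len(vocab)
    return [(counts[s] + eps) / total for s in vocab]

def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def drifted(baseline, recent, vocab, threshold=0.5):
    """Layer 2: True when the recent window diverges from the baseline."""
    p = distribution(recent, vocab)
    q = distribution(baseline, vocab)
    return kl_divergence(p, q) > threshold
```

A message can pass the regex layer and still be flagged by the drift layer if it shifts the agent's state distribution far enough from its baseline, which is the case content filters alone would miss.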

Core Capabilities

4-Classifier Ensemble

ONNX BERT (416MB) + RuleBased (152 patterns) + Heuristic + LLM Semantic. Dual OR-gate safety net.
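
One plausible reading of the OR-gate is that an input is blocked as soon as any classifier flags it. The sketch below shows that combination rule only. The two stand-in classifiers and their thresholds are hypothetical, and the ONNX BERT and LLM members are left out to keep the example self-contained.

```python
from typing import Callable, Dict

Classifier = Callable[[str], bool]  # returns True when the input looks malicious

def rule_based(text: str) -> bool:
    # Stand-in for the rule-based member (the real one is described as 152 patterns).
    return "ignore previous instructions" in text.lower()

def heuristic(text: str) -> bool:
    # Stand-in heuristic (assumption): very long inputs are treated as suspicious.
    return len(text) > 10_000

CLASSIFIERS: Dict[str, Classifier] = {"rule_based": rule_based, "heuristic": heuristic}

def or_gate(text: str, classifiers: Dict[str, Classifier] = CLASSIFIERS) -> dict:
    """Block if any classifier flags the input, and report which ones fired."""
    fired = [name for name, clf in classifiers.items() if clf(text)]
    return {"blocked": bool(fired), "fired": fired}
```

An OR-gate favors recall over precision: a single false positive from any member blocks the request, which suits a safety net but raises the false-positive rate as members are added.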

CapMAC Access

Cryptographic capability tokens. Unforgeable, attenuable, revocable. Not RBAC.
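
To make "unforgeable" and "attenuable" concrete, here is a minimal macaroon-style sketch built on chained HMACs. CapMAC's actual token format is not described in this page, so the function names, the tuple layout and the caveat strings are assumptions for illustration. Revocation is not shown.

```python
import hashlib
import hmac

def _mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def mint(root_key: bytes, token_id: str):
    """Issue a token as (id, caveats, signature). Only the issuer knows root_key."""
    return (token_id, [], _mac(root_key, token_id.encode()))

def attenuate(token, caveat: str):
    """Any holder can add a caveat, which can only narrow what the token permits."""
    token_id, caveats, sig = token
    return (token_id, caveats + [caveat], _mac(sig, caveat.encode()))

def verify(root_key: bytes, token, context: set) -> bool:
    """Recompute the HMAC chain from the root key and check every caveat holds."""
    token_id, caveats, sig = token
    expected = _mac(root_key, token_id.encode())
    for caveat in caveats:
        expected = _mac(expected, caveat.encode())
    return hmac.compare_digest(expected, sig) and all(c in context for c in caveats)
```

Because each signature is keyed by the previous one, a holder cannot strip a caveat without knowing the root key. That is the practical difference from a role lookup in RBAC: authority travels with the token and can be narrowed before delegation.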

Distributed O₂

Redis backend + consistent hash ring + Raft consensus (Redis pub/sub). 3-node cluster.
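
The consistent hash ring part can be sketched on its own, without Redis or Raft. The class below is an illustrative assumption rather than the project's code: the node names, the 64 virtual nodes per node and the choice of SHA-256 are all hypothetical.

```python
import bisect
import hashlib

class HashRing:
    """Maps keys to nodes so that removing a node only remaps that node's keys."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed at several points on the ring to even out the load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[idx][1]
```

For a 3-node cluster, `HashRing(["n1", "n2", "n3"]).node_for("window-42")` always returns the same node, and dropping `n3` leaves every key owned by `n1` or `n2` where it was.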

Deobfuscation

95 homoglyphs + Unicode NFKD + Base64/ROT13 + zero-width chars + leetspeak. 6-stage pipeline.
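
A reduced version of such a pipeline might look like the sketch below. The homoglyph and leetspeak tables are tiny hypothetical samples (the project reports 95 homoglyphs), the stage order is an assumption, and ROT13 is omitted for brevity.

```python
import base64
import binascii
import re
import unicodedata

# Zero-width characters mapped to None are deleted by str.translate.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
# Sample Cyrillic → Latin lookalikes (a, e, o, p, c).
HOMOGLYPHS = str.maketrans(
    {"\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p", "\u0441": "c"}
)
LEET = str.maketrans("013457", "oieast")

def try_base64(text):
    """Replace Base64-looking tokens with their decoded text when it is printable ASCII."""
    def decode(match):
        token = match.group(0)
        try:
            decoded = base64.b64decode(token, validate=True).decode("ascii")
        except (binascii.Error, ValueError):
            return token
        return decoded if decoded.isprintable() else token
    return re.sub(r"[A-Za-z0-9+/]{16,}={0,2}", decode, text)

def deobfuscate(text):
    text = unicodedata.normalize("NFKD", text)  # fold fullwidth and compatibility forms
    text = "".join(ch for ch in text if not unicodedata.combining(ch))  # drop accents
    text = text.translate(ZERO_WIDTH)
    text = text.translate(HOMOGLYPHS)
    text = try_base64(text)  # must run before leetspeak, which rewrites digits
    return text.translate(LEET).lower()
```

The output is meant to feed the classifiers rather than be shown to users, since the leetspeak stage also rewrites legitimate digits.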

Circuit Breaker

ARMED → TRIPPED → RECOVERY state machine. Cryptographic jitter. Auth-protected reset.
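
The state machine can be sketched as follows. Only the three state names come from the description above. The return from RECOVERY to ARMED, the shared-secret reset check, and the cooldown and jitter values are assumptions made for illustration.

```python
import hmac
import secrets
from enum import Enum

class State(Enum):
    ARMED = "armed"
    TRIPPED = "tripped"
    RECOVERY = "recovery"

class CircuitBreaker:
    def __init__(self, reset_token: str, base_cooldown_s: float = 30.0,
                 max_jitter_ms: int = 5000):
        self._reset_token = reset_token.encode()
        self._base = base_cooldown_s
        self._max_jitter_ms = max_jitter_ms
        self.state = State.ARMED
        self.reason = None
        self.cooldown_s = 0.0

    def trip(self, reason: str) -> None:
        """Halt from any state and pick an unpredictable cooldown."""
        self.state = State.TRIPPED
        self.reason = reason
        # secrets.randbelow draws from the OS CSPRNG, so the jitter is hard to predict.
        self.cooldown_s = self._base + secrets.randbelow(self._max_jitter_ms + 1) / 1000

    def begin_recovery(self, token: str) -> bool:
        """Auth-protected reset: only a valid token moves TRIPPED to RECOVERY."""
        ok = hmac.compare_digest(token.encode(), self._reset_token)
        if self.state is not State.TRIPPED or not ok:
            return False
        self.state = State.RECOVERY
        return True

    def confirm_healthy(self) -> bool:
        """Assumed final step: RECOVERY returns to ARMED once checks pass."""
        if self.state is not State.RECOVERY:
            return False
        self.state = State.ARMED
        return True
```

Drawing the jitter from a CSPRNG rather than `random` means an attacker who observes past cooldowns cannot time a retry to the moment the breaker re-arms.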

Honeypots

7 trap types. Dynamic realistic data generation. Session-aware deployment. Trap detection.

License Tiers

Academic · Free · Research & education · 30 req/min · 100MB models
Startup · $499/mo · <$1M revenue · 120 req/min · 250MB models
Business · $2,499/mo · $1M–$100M revenue · 600 req/min · 500MB models
Enterprise · Custom · >$100M revenue · SLA · Unlimited · On-prem

Resources

Red Team Challenge

Break our defenses. We'll thank you.

No AI safety system is perfect. If you find a prompt that bypasses any defense layer, report it. We'll fix it and publicly acknowledge your contribution.

🦊 Report Bypasses