Research Prototype · BSL 1.1 → Apache 2.0 in 2030

Multi-layer defense for autonomous agents

Four-classifier ensemble. 16 AoC defense modules. Distributed O₂ consensus. First open-source defense against Microsoft ThoughtVirus (arXiv:2603.00131).

AoC failure modes blocked

Classifier ensemble

Protection layers

240 tests passing

$ git clone https://gitlab.com/cognitive-functors/c4-meta-system
$ cd c4-meta-system
$ docker compose --profile distributed-o2 up -d

Seven-Level Defense Architecture

L1 · Mathematical · Empathy Checkpoint: Self→System requires Other mediation. Structural impossibility, not a rule.

L2 · Architectural · Sandboxing, resource limits, tier enforcement. Capability tokens, not RBAC.

L3 · Behavioral · 16 AoC defense modules. C4 state trajectory monitoring. Per-category analysis.

L4 · SVETILO Values · 7 ethical seals verified on every O₂ window. Religiously neutral. 8 wisdom traditions.

L5 · Governance · CapMAC cryptographic tokens. Two-person rule with Ed25519 dual signatures. Timeout + audit trail.

L6 · Telemetry · 15 Prometheus metrics. 11 AlertManager rules. 20 Grafana panels. OpenTelemetry tracing.

L7 · Emergency Stop · Circuit breaker with cryptographic jitter. Automatic halt on empathy violation.

Research-Backed Defenses

◈

Agents of Chaos

arXiv:2602.20021 · Shapira et al. · Feb 2026

All 11 documented failure modes are mapped to C4 states, plus 5 extended categories. Our defense predates the paper: we were building for the problem before it was named.

◉

ThoughtVirus

arXiv:2603.00131 · Weckbecker et al. · Feb 2026

Two-layer defense: explicit regex patterns plus KL-drift detection on C4 state trajectories. Catches implicit bias propagation where content filters fail. First open-source ThoughtVirus defense.
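The KL-drift layer can be pictured with a minimal sketch. It assumes a C4 trajectory is a sequence of discrete state labels and that drift is the KL divergence between a recent window's state distribution and a baseline. The state names, the smoothing constant `alpha`, and the `threshold` are hypothetical and not taken from the project.

```python
import math
from collections import Counter

def state_distribution(states, vocabulary, alpha=1.0):
    """Laplace-smoothed probability of each state in `vocabulary`."""
    counts = Counter(states)
    total = len(states) + alpha * len(vocabulary)
    return {s: (counts[s] + alpha) / total for s in vocabulary}

def kl_divergence(p, q):
    """KL(p || q) in nats; both dicts share the same keys and contain no zeros."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p)

def drifted(baseline, window, vocabulary, threshold=0.5):
    """True when the recent window has moved too far from the baseline."""
    p = state_distribution(window, vocabulary)
    q = state_distribution(baseline, vocabulary)
    return kl_divergence(p, q) > threshold
```

Smoothing keeps the divergence finite when the window contains a state the baseline never visited, which is exactly the case a drift detector needs to score rather than crash on.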

Core Capabilities

⊞

4-Classifier Ensemble

ONNX BERT (416MB) + RuleBased (152 patterns) + Heuristic + LLM Semantic. Dual OR-gate safety net.
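A minimal sketch of an OR gate over classifier verdicts follows. The source does not say what "dual" refers to, so this shows a single gate only. The two stand-in classifiers are hypothetical placeholders, not the project's ONNX, rule-based, heuristic, or LLM components.

```python
from typing import Callable, Iterable

# Assumption: each classifier returns True when the input looks unsafe.
Classifier = Callable[[str], bool]

def or_gate(classifiers: Iterable[Classifier], text: str) -> bool:
    """Flag the input if any single classifier flags it."""
    return any(clf(text) for clf in classifiers)

# Hypothetical stand-ins used only to exercise the gate.
def rule_based(text: str) -> bool:
    return "ignore previous instructions" in text.lower()

def heuristic(text: str) -> bool:
    return len(text) > 10_000
```

An OR gate trades false positives for recall: one dissenting classifier is enough to block, so a miss requires every member to miss.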

⊡

CapMAC Access

Cryptographic capability tokens. Unforgeable, attenuable, revocable. Not RBAC.
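One way to get all three properties is a macaroon-style HMAC chain, sketched below. This is an assumption about the general technique, not the project's actual CapMAC format: the token layout, the `key=value` caveat syntax, and the function names are all hypothetical.

```python
import hashlib
import hmac

def _mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def mint(root_key: bytes, token_id: str) -> dict:
    """Only the holder of root_key can create a valid signature."""
    return {"id": token_id, "caveats": [], "sig": _mac(root_key, token_id.encode())}

def attenuate(token: dict, caveat: str) -> dict:
    """Any holder can add a restriction; the chained MAC prevents removing one."""
    return {"id": token["id"], "caveats": token["caveats"] + [caveat],
            "sig": _mac(token["sig"], caveat.encode())}

def verify(root_key: bytes, token: dict, revoked: set, context: dict) -> bool:
    if token["id"] in revoked:                      # revocable
        return False
    sig = _mac(root_key, token["id"].encode())
    for caveat in token["caveats"]:                 # replay the MAC chain
        sig = _mac(sig, caveat.encode())
    if not hmac.compare_digest(sig, token["sig"]):  # unforgeable
        return False
    # Assumption: every caveat is "key=value" and must match the request context.
    pairs = (c.split("=", 1) for c in token["caveats"])
    return all(context.get(k) == v for k, v in pairs)
```

Unlike a role lookup, the token itself carries the authority and its restrictions, so a component can hand a narrower token to a sub-agent without contacting a central policy store.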

⊛

Distributed O₂

Redis backend + consistent hash ring + Raft consensus (Redis pub/sub). 3-node cluster.
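The consistent hash ring can be sketched in a few lines. This illustrates the general technique only: the node names, the virtual-node count, and the choice of SHA-256 are assumptions, and the Raft layer is not shown.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        """Return the owner of the first ring point clockwise from the key's hash."""
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The useful property is that removing one node only remaps the keys that node owned; keys on the surviving nodes stay where they were.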

⎔

Deobfuscation

95 homoglyphs + Unicode NFKD + Base64/ROT13 + zero-width chars + leetspeak. 6-stage pipeline.
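A reduced sketch of such a pipeline is shown below. The stage order, the five-entry homoglyph table (the project lists 95), the leetspeak table, and the Base64 length cutoff are illustrative assumptions, not the project's actual tables.

```python
import base64
import binascii
import codecs
import re
import unicodedata

ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
# Tiny illustrative table: Cyrillic a, e, o, r, s mapped to Latin look-alikes.
HOMOGLYPHS = str.maketrans({"\u0430": "a", "\u0435": "e", "\u043e": "o",
                            "\u0440": "p", "\u0441": "c"})
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                      "5": "s", "7": "t", "@": "a", "$": "s"})

def normalize(text: str) -> str:
    text = text.translate(ZERO_WIDTH)           # 1. drop zero-width characters
    text = unicodedata.normalize("NFKD", text)  # 2. compatibility decomposition
    text = text.translate(HOMOGLYPHS)           # 3. fold look-alike letters
    return text.translate(LEET).lower()         # 4. undo leetspeak (also rewrites real digits)

def candidates(text: str) -> set:
    """Return every decoded view of the input that a classifier should scan."""
    out = {normalize(text)}
    out.add(normalize(codecs.decode(text, "rot13")))             # 5. ROT13 view
    for chunk in re.findall(r"[A-Za-z0-9+/]{16,}={0,2}", text):  # 6. Base64 runs
        try:
            decoded = base64.b64decode(chunk, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue
        out.add(normalize(decoded))
    return out
```

Returning a set of candidate views, rather than one rewritten string, lets the downstream classifiers see both the original and each decoded form, so a benign message is not damaged by an over-eager decode.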

⬡

Circuit Breaker

ARMED → TRIPPED → RECOVERY state machine. Cryptographic jitter. Auth-protected reset.
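The state machine can be sketched as follows. The source does not define "cryptographic jitter"; this sketch assumes it means randomizing the trip point with a CSPRNG so it cannot be predicted. The class name, the token-based reset, and the default thresholds are hypothetical.

```python
import hmac
import secrets
from enum import Enum

class State(Enum):
    ARMED = "armed"
    TRIPPED = "tripped"
    RECOVERY = "recovery"

class CircuitBreaker:
    def __init__(self, reset_token: str, threshold: int = 3, jitter: int = 2):
        self._reset_token = reset_token
        self._threshold = threshold
        self._jitter = jitter
        self.state = State.ARMED
        self._violations = 0
        self._limit = self._new_limit()

    def _new_limit(self) -> int:
        # secrets uses the OS CSPRNG, so the exact trip point is unpredictable.
        return self._threshold + secrets.randbelow(self._jitter + 1)

    def record_violation(self) -> State:
        if self.state is State.TRIPPED:
            return self.state
        self._violations += 1
        # Any violation during RECOVERY re-trips immediately.
        if self.state is State.RECOVERY or self._violations >= self._limit:
            self.state = State.TRIPPED
        return self.state

    def reset(self, token: str) -> bool:
        """TRIPPED -> RECOVERY, only with the correct token (constant-time compare)."""
        ok = hmac.compare_digest(token.encode(), self._reset_token.encode())
        if self.state is not State.TRIPPED or not ok:
            return False
        self.state = State.RECOVERY
        self._violations = 0
        self._limit = self._new_limit()
        return True

    def confirm_healthy(self) -> None:
        """RECOVERY -> ARMED once the operator or a health check signs off."""
        if self.state is State.RECOVERY:
            self.state = State.ARMED
```

Keeping RECOVERY as a separate state means a reset does not silently restore full trust: the breaker stays on a hair trigger until health is confirmed.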

⚡

Honeypots

7 trap types. Dynamic realistic data generation. Session-aware deployment. Trap detection.

License Tiers

Academic · Free · Research & education · 30 req/min · 100MB models

Startup · $499/mo · <$1M revenue · 120 req/min · 250MB models

Business · $2,499/mo · $1M–$100M revenue · 600 req/min · 500MB models

Enterprise · Custom · >$100M revenue · SLA · Unlimited · On-prem

Resources

▣

Red Team Challenge

⚔

Break our defenses. We'll thank you.

No AI safety system is perfect. If you find a prompt that bypasses any defense layer, report it. We will fix it and publicly acknowledge your contribution.

🦊 Report Bypasses