Darkfield: a two-model architecture for autonomous operational vision.
This brief describes the system as it stands in private beta. It covers the architecture, the data flow, the training mechanism, and the deployment topology. It is intended for technical leads at partner organizations evaluating an integration.
TL;DR
Darkfield is a vision platform consisting of two proprietary models. THINK plans pipelines and supervises evaluation; SEE perceives every frame at sub-fifty-millisecond latency. THINK orchestrates SEE end-to-end (composing detection graphs, scoring outputs against its own annotations, and dispatching per-camera retrains) with no human in the loop during steady-state operation.
- Onboarding is three inputs: an operation description, an RTSP stream, and a tracking prompt — all in plain language.
- Output is a structured table whose schema THINK authors and revises. Notifications can be email, SMS, or voice call.
- THINK audits SEE on rolling samples and triggers per-camera finetunes when precision falls below tolerance.
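The three onboarding inputs listed above can be pictured as a small record. The sketch below is illustrative only: the class name, field names, validation rules, and example values are all hypothetical, and this brief does not describe Darkfield's actual onboarding interface.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class OnboardingRequest:
    """Hypothetical container for the three onboarding inputs."""
    operation_description: str  # plain-language description of the operation
    rtsp_url: str               # address of the camera stream
    tracking_prompt: str        # plain-language statement of what to track

    def validate(self) -> None:
        # Reject empty free-text fields.
        if not self.operation_description.strip():
            raise ValueError("operation description must not be empty")
        if not self.tracking_prompt.strip():
            raise ValueError("tracking prompt must not be empty")
        # Check only the URL scheme and host; reachability is out of scope here.
        parsed = urlparse(self.rtsp_url)
        if parsed.scheme not in ("rtsp", "rtsps") or not parsed.hostname:
            raise ValueError("stream must be an rtsp:// or rtsps:// URL with a host")


# Example values are invented for illustration.
request = OnboardingRequest(
    operation_description="Outbound parcel dock with two conveyor lanes.",
    rtsp_url="rtsp://camera.example.invalid:554/stream1",
    tracking_prompt="Count parcels per lane and flag any that fall off the belt.",
)
request.validate()  # raises ValueError if any input is malformed
```

Everything beyond these three fields, including the output schema, is authored by THINK rather than supplied by the partner.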
Architecture
Five stages, two models, one closed loop. The arrows below describe the runtime data flow during steady-state operation; onboarding is the same path executed in slow motion under human review.
figure 02.1 · runtime data flow · arrows are causal, not synchronous
Why two models.
A single foundation model that watches every frame is too expensive to run continuously and too slow to react inside a control loop. A single small model that runs every frame is, on its own, brittle: it cannot decide what to track, cannot revise its own pipeline, cannot tell you when the camera is in the wrong place.
Darkfield separates the two cognitive roles deliberately. THINK is the planner — multimodal, slow, expensive, and run only when a decision is needed. SEE is the perceiver — small, fast, specialized to a single camera, and run on every frame. THINK controls SEE the way a senior engineer controls a junior one: by configuration, by inspection, and by retraining — all of it autonomous from the partner's point of view.
The split is structural, not a packaging decision. The two models are trained separately, evaluated separately, and improved on different cadences. SEE specializes per-camera over hours; THINK improves across the entire fleet over weeks.
Model specifications.
| | THINK | SEE |
|---|---|---|
| role | planner · supervisor | perceiver · per-frame |
| modality | multimodal (text + vision + tools) | vision · prompted by language |
| parameter count | undisclosed · private beta | undisclosed · private beta |
| typical latency | seconds to minutes per decision | < 50 ms per frame on a single GPU |
| invocation | on-demand · throttled | continuous · per-camera |
| training cadence | monthly · across the fleet | continuous · per-camera adapter |
| output | plans · schemas · evaluations · alerts | boxes · masks · IDs · OCR strings |
Continuous training.
A vision model deployed once and never retrained loses precision as the scene drifts: lighting changes with the seasons, conveyors are re-tooled, signage is replaced, cameras are bumped. Conventional CV pipelines respond to this with a calendar — a quarterly retrain whose effectiveness is unmeasurable until the next one.
Darkfield retrains on a signal, not a calendar. THINK pulls a rolling sample of recent events, scores SEE's outputs against its own annotations, and launches a per-camera finetune when precision drops below the tolerance declared at onboarding. The new weights are validated on a held-out slice before they are hot-swapped into production.
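The control flow of that loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Darkfield's implementation: `ScoredEvent`, `supervise_camera`, and the injected `finetune`, `validate`, and `hot_swap` callables are hypothetical names, and the acceptance rule (the candidate must meet the same tolerance on the held-out slice) is an assumption, since the brief does not state the validation criterion.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class ScoredEvent:
    """One sampled event (hypothetical shape)."""
    see_detected: bool     # SEE reported a detection for this event
    think_confirmed: bool  # THINK's own annotation confirms the detection


def precision(events: Sequence[ScoredEvent]) -> float:
    """True positives divided by all detections SEE reported."""
    reported = [e for e in events if e.see_detected]
    if not reported:
        return 1.0  # assumption: with no detections there is nothing to penalize
    return sum(e.think_confirmed for e in reported) / len(reported)


def supervise_camera(
    sample: Sequence[ScoredEvent],
    tolerance: float,
    finetune: Callable[[Sequence[ScoredEvent]], Any],
    validate: Callable[[Any], float],
    hot_swap: Callable[[Any], None],
) -> str:
    """Run one supervision cycle and report what happened."""
    if precision(sample) >= tolerance:
        return "ok"                      # within tolerance: no retrain
    candidate = finetune(sample)         # per-camera finetune on recent events
    if validate(candidate) < tolerance:  # assumed acceptance rule on the held-out slice
        return "rejected"                # keep serving the current weights
    hot_swap(candidate)
    return "swapped"


# Stub dependencies so the sketch runs end to end.
deployed = []
sample = [ScoredEvent(True, True)] * 7 + [ScoredEvent(True, False)] * 3
outcome = supervise_camera(
    sample,
    tolerance=0.9,
    finetune=lambda events: "adapter-v2",
    validate=lambda weights: 0.95,
    hot_swap=deployed.append,
)
print(outcome, deployed)
```

With 7 of 10 reported detections confirmed, measured precision is 0.7, which is below the 0.9 tolerance, so the stubbed finetune runs, passes the stubbed validation, and is swapped in.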
The result is that SEE is, in effect, a different model on every camera, and a slightly different one every week. On partner sites we have observed a recall gap of 6 to 11 percentage points closed within the first 48 hours of operation, with no human annotation involved. The methodology is written up internally and will be published once the private beta closes.
Performance characteristics.
Indicative numbers from steady-state operation across partner sites. Latency targets are met on a single L4 GPU per camera; THINK runs on a small shared cluster.
| metric | SEE · per camera | THINK · cluster |
|---|---|---|
| p50 latency | 22 ms / frame | 2.4 s / decision |
| p99 latency | 48 ms / frame | 14 s / decision |
| throughput | ~30 fps · 1080p | up to 1,200 decisions/min |
| typical hardware | 1× L4 (24 GB) | shared H100 pool |
| uptime SLO | 99.5 % per stream | 99.9 % control plane |
| finetune cost | ~ $4 · 90 min / cycle | — |
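A short calculation helps put the table in context. The sketch below uses only the figures above and makes two assumptions: frames arrive at exactly 30 fps, and THINK's median decision latency is used as a stand-in for its mean when applying Little's law. The variable names are illustrative.

```python
# Figures taken from the table above.
FPS = 30.0                    # SEE throughput, frames per second
SEE_P50_MS = 22.0             # SEE median latency per frame
SEE_P99_MS = 48.0             # SEE 99th-percentile latency per frame
THINK_DECISIONS_PER_MIN = 1_200.0
THINK_P50_S = 2.4             # THINK median latency per decision

# Time between consecutive frames at the stated frame rate.
frame_interval_ms = 1000.0 / FPS

# Share of one frame interval consumed by a median and a p99 frame.
p50_share = SEE_P50_MS / frame_interval_ms
p99_share = SEE_P99_MS / frame_interval_ms

# Little's law: items in flight = arrival rate * time in system.
# Assumption: the median latency approximates the mean.
think_in_flight = (THINK_DECISIONS_PER_MIN / 60.0) * THINK_P50_S

print(f"frame interval: {frame_interval_ms:.1f} ms")
print(f"median frame uses {p50_share:.0%} of the interval")
print(f"p99 frame uses {p99_share:.0%} of the interval")
print(f"THINK decisions in flight at peak: about {think_in_flight:.0f}")
```

At 30 fps the frame interval is about 33.3 ms, so a median frame fits comfortably while a p99 frame takes longer than one interval. Sustaining 30 fps through those slow frames therefore implies some buffering or pipelining, which this brief does not detail. Under the same approximation, peak THINK throughput corresponds to roughly 48 decisions in flight at once across the cluster.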
Deployment topology.
Three modes are supported. The choice is driven by network conditions, residency requirements, and how much hardware the partner is willing to host.
Hosted
Cameras stream to Darkfield's regional ingest. SEE and THINK run in our infrastructure. Lowest operational burden, highest dependency on uplink bandwidth.
- UK primary
- SOC 2 Type II in progress
- ≥ 4 Mbps per 1080p stream
Edge SEE · cloud THINK
SEE runs on a partner-hosted edge box at the camera. Only events and clips reach the cloud, where THINK supervises. A good fit for sites with constrained uplinks or sensitive footage.
- 1× L4 / 4 cameras at the edge
- Continuous footage stays on site; only events and clips are uploaded
- ~ 50 KB/event in egress
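The uplink difference between hosted and edge modes can be estimated from the two figures quoted in these lists. The sketch below assumes decimal units (1 Mbps = 10^6 bits/s, 1 KB = 1,000 bytes), treats the 4 Mbps hosted requirement as a sustained rate, and takes 50 KB per event at face value; the event rate in the usage line is hypothetical.

```python
STREAM_MBPS = 4.0         # hosted mode: stated per-stream uplink for 1080p
EVENT_KB = 50.0           # edge mode: stated approximate egress per event
SECONDS_PER_DAY = 86_400

# Hosted mode: bytes per second and per day for one continuously streamed camera.
stream_bytes_per_s = STREAM_MBPS * 1_000_000 / 8
hosted_gb_per_day = stream_bytes_per_s * SECONDS_PER_DAY / 1e9

# Event rate at which edge-mode egress would equal one hosted stream.
break_even_events_per_s = stream_bytes_per_s / (EVENT_KB * 1_000)


def edge_mb_per_day(events_per_day: float) -> float:
    """Edge-mode egress in megabytes per day for a given event count."""
    return events_per_day * EVENT_KB * 1_000 / 1e6


print(f"hosted: {hosted_gb_per_day:.1f} GB/day per stream")
print(f"break-even: {break_even_events_per_s:.0f} events/s per camera")
print(f"edge at 1,000 events/day: {edge_mb_per_day(1_000):.0f} MB/day")
```

Under these assumptions a hosted stream uploads about 43.2 GB per day, and an edge camera would need to emit around 10 events per second, sustained, to match it. A camera producing a hypothetical 1,000 events per day would upload about 50 MB.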
Air-gapped
Both models on the partner's hardware. Updates are signed and shipped on disk. Required for some defense and critical-infrastructure deployments.
- 1× H100 minimum for THINK
- Signed weight updates · no telemetry
- Quarterly on-site engineering review
Security and compliance.
- Data residency
  - UK primary today. On-prem and air-gapped deployments available for partners with stricter requirements.
- Footage retention
  - Default: clips and screenshots retained for 30 days; structured event rows retained indefinitely. Both configurable per-deployment.
- Access control
  - SSO via OIDC. Per-camera and per-schema RBAC. Full audit log of every model decision, every tool invocation, and every operator action.
- Compliance posture
  - UK GDPR-compliant by default. SOC 2 Type II in progress, expected Q3 2026. ISO 27001 on the roadmap for 2027.
- Model security
  - Weight files signed and verified at load. Per-camera adapters are quarantined from the foundation model and from other adapters.
- Privacy posture
  - No biometric identification by default. Optional face-blurring at the edge before any frame leaves the site. PII never used as a training signal.
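To make the access-control item concrete, here is one way per-camera and per-schema grants with an audit trail could be modeled. It is a sketch under assumptions: the `AccessPolicy` class, the role names, and the resource identifiers are hypothetical, and the brief does not describe how Darkfield actually stores grants or audit records.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# A grant ties a role to one resource: ("camera" or "schema", resource id).
Grant = Tuple[str, str, str]
AuditRecord = Tuple[str, str, str, bool]


@dataclass
class AccessPolicy:
    """Hypothetical role-based policy scoped per camera and per schema."""
    grants: Set[Grant] = field(default_factory=set)
    audit_log: List[AuditRecord] = field(default_factory=list)

    def allow(self, role: str, kind: str, resource_id: str) -> None:
        if kind not in ("camera", "schema"):
            raise ValueError("kind must be 'camera' or 'schema'")
        self.grants.add((role, kind, resource_id))

    def check(self, role: str, kind: str, resource_id: str) -> bool:
        # Deny by default, and record every check whether or not it succeeds.
        allowed = (role, kind, resource_id) in self.grants
        self.audit_log.append((role, kind, resource_id, allowed))
        return allowed


# Role and resource names below are invented for illustration.
policy = AccessPolicy()
policy.allow("dock-supervisor", "camera", "dock-cam-01")
policy.allow("dock-supervisor", "schema", "parcel-events")

can_view = policy.check("dock-supervisor", "camera", "dock-cam-01")
can_view_other = policy.check("dock-supervisor", "camera", "yard-cam-07")
print(can_view, can_view_other, len(policy.audit_log))
```

The design choice illustrated is deny-by-default with logging on every check, so that refused requests appear in the audit trail alongside granted ones.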
Bring us a stream and a question.
The brief is not the product. The product is a six-week onboarding in which the model writes its own pipeline against your operation. We're taking on a small number of partners this quarter.