models

Two models.
One thinks, one looks.

Darkfield is built on a deliberate split: a multimodal planner that reasons across minutes, and a vision perceiver that responds in under fifty milliseconds. Neither model is general-purpose. Both were designed for the same problem.

/ architecture

Planner and perceiver,
designed together.

system 2 planner multimodal

Darkfield THINK

Reads documentation, inspects streams, authors pipelines, supervises SEE. Runs when reasoning is needed — not on every frame.

Conversational planning — ingests PDFs, SOPs, KPIs to negotiate a schema before detection begins.
Live video understanding — inspects feeds and recommends repositioning when a camera is insufficient.
Pipeline authorship — composes detect / track / OCR / count / zone steps into a runnable graph.
Spatial grounding — resolves natural language to pixel coordinates on the image.
Finetune controller — evaluates SEE's output, curates training pairs, decides retrain vs. model-swap.

Why two models,
not one?

The trade-off between reasoning capacity and per-frame latency makes a single-model design strictly worse than the split. A multimodal LLM can't run at fifty milliseconds per frame; a frame-rate vision model can't read an SOP or author a schema. Running a large model on every frame would cost three orders of magnitude more and still produce worse plans.

The split also gives you a natural separation of trust: SEE handles the data plane (raw video, bounding boxes, labels) while THINK handles the control plane (planning, evaluation, retraining decisions, outbound alerts). That boundary makes both models auditable, independently replaceable, and easier to reason about in a compliance context.

THINK

roleplanner · orchestrator

latency targetseconds–minutes

input modalitytext · image · video · docs

fires whenon planning events

SEE

roleperceiver · data plane

latency target<50ms per frame

input modalityvision only

fires whenon every frame

Ready to see the models
against your streams?

We're onboarding a small number of partners in private beta.

read the technical brief

Two models.One thinks, one looks.

Planner and perceiver,designed together.

Why two models,not one?

Ready to see the modelsagainst your streams?

Two models.
One thinks, one looks.

Planner and perceiver,
designed together.

Why two models,
not one?

Ready to see the models
against your streams?