/ research at darkfield

An operating model for
autonomous perception.

Darkfield is, first, a lab. We build vision systems that choose their own pipeline, evaluate their own output, and improve without an engineer in the loop. Below: the threads we're pulling on.

/ focus areas

Five threads.
Two models, end-to-end.

Each thread is led by a small team and reports against a single benchmark. Internal seminars on Wednesdays; external talks twice a year.

001 / spatial grounding

Aiming language at pixels.

Resolving a noun phrase like "the third valve from the left" into a pixel cluster, with industrial-grade precision. THINK's grounding head is distilled from an internal annotation corpus we'd never use for training SEE directly.

lead · senior researcher

002 / type-aware perception

Subtypes without a class list.

A spiral mixer and a planetary mixer share most pixels. We train SEE on a hierarchy of industrial subtypes so a single prompt resolves to the right machine — not just the right category.

lead · staff researcher

003 / autonomous evaluation

Letting the model grade its own homework.

THINK samples SEE's output and scores it against a schema THINK itself authored. The hard part is making that audit trustworthy — and recovering when it isn't.

lead · senior researcher

004 / per-camera specialization

Finetune as a runtime, not a sprint.

SEE specializes per camera over its first 48 hours of operation, then keeps adapting as the scene ages. We treat the model as a population, not a single artifact.

lead · staff researcher

005 / closed-loop deployment

The AI orchestration layer as a research target.

An environment a model can operate in is a different design problem from one a human can operate in. Tools must be typed, retries must be cheap, and side effects must be reversible at the granularity the model reasons at. This is the longest-running and least public thread in the lab.
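A minimal sketch of what those three properties can look like in code. Everything here is hypothetical and for illustration only: `SetThreshold`, `Environment`, and `rollback` are invented names, not part of Darkfield's orchestration layer. Arguments are typed and validated before any side effect, so a rejected call costs nothing to retry, and every applied call records its own undo step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SetThreshold:
    """Typed arguments for one hypothetical tool call."""
    camera_id: str
    value: float


@dataclass
class Environment:
    """Toy environment: state plus a log of undo steps, one per applied call."""
    thresholds: Dict[str, float] = field(default_factory=dict)
    undo_log: List[Callable[[], None]] = field(default_factory=list)

    def set_threshold(self, call: SetThreshold) -> None:
        # Validate before touching state, so a bad call leaves nothing to clean up.
        if not 0.0 <= call.value <= 1.0:
            raise ValueError("value must be in [0, 1]")
        previous = self.thresholds.get(call.camera_id)

        def undo() -> None:
            # Restore exactly what this one call changed.
            if previous is None:
                self.thresholds.pop(call.camera_id, None)
            else:
                self.thresholds[call.camera_id] = previous

        self.thresholds[call.camera_id] = call.value
        self.undo_log.append(undo)

    def rollback(self, steps: int = 1) -> None:
        # Undo the most recent calls, newest first.
        for _ in range(min(steps, len(self.undo_log))):
            self.undo_log.pop()()


env = Environment()
env.set_threshold(SetThreshold("cam-1", 0.4))
env.set_threshold(SetThreshold("cam-1", 0.7))
env.rollback()  # reverses only the second call
```

The unit of rollback here is a single tool call, on the assumption that one call is the step the model reasons about; a real system might group calls differently.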

lead · research engineer

/ benchmark · public release

DarkfieldOps-300.
The eval set that makes subtypes measurable.

Open-vocabulary detection benchmarks test whether a model can find objects. They don't test whether it can tell a spiral mixer from a planetary mixer. DarkfieldOps-300 does.

WHAT IT IS

300+ production-line objects across five industrial settings, each labelled at two levels: the parent class (mixer, filler, vehicle) and the subtype (spiral / planetary, rotary / linear / piston, tanker / flatbed / curtain-side). Footage is real CCTV — the angles, resolutions, and lighting conditions of actual deployments, not studio captures.

300+ annotated objects · 14,200 subtype labels · 5 industrial verticals
Real CCTV footage · partner-sourced under research licence
Published with research licence · code and baselines included

VERTICALS COVERED

Food manufacturing: mixers · fillers · ovens · dividers
Cold-chain logistics: pallets · racking · forklifts
Forecourt: vehicles · number plates · pumps
Warehouse yard: lorry subtypes · dock doors · dwell zones
Quarry / aggregates: machinery · conveyor states · site entries

WHY IT EXISTS · THE TYPE-AWARE SCORE

Standard mAP measures whether the right bounding box appeared. When labels stop at the parent class, it counts a spiral mixer and a planetary mixer as equally correct as long as the box is in the right place. DarkfieldOps-300 introduces a type-aware score: macro-F1 over subtype labels, conditioned on correct parent-class detection. A model that finds every mixer but cannot tell them apart scores near zero. A model that distinguishes subtypes reliably scores near one.
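A rough sketch of how a score of this shape can be computed. It is not the released DarkfieldOps-300 scorer. It assumes that detections have already been matched to ground-truth boxes upstream, that "conditioned on correct parent-class detection" means dropping matched pairs whose predicted parent class is wrong, and that the macro average runs over the ground-truth subtypes that remain. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def type_aware_score(pairs):
    """Macro-F1 over subtypes, restricted to pairs with a correct parent class.

    pairs: iterable of (true_parent, true_subtype, pred_parent, pred_subtype),
    one tuple per detection already matched to a ground-truth box.
    """
    # Condition on correct parent-class detection (assumed interpretation).
    kept = [(ts, ps) for tp_, ts, pp, ps in pairs if tp_ == pp]

    tp = defaultdict(int)
    fp = defaultdict(int)
    fn = defaultdict(int)
    for true_sub, pred_sub in kept:
        if true_sub == pred_sub:
            tp[true_sub] += 1
        else:
            fp[pred_sub] += 1  # predicted subtype gains a false positive
            fn[true_sub] += 1  # true subtype gains a false negative

    subtypes = {true_sub for true_sub, _ in kept}
    if not subtypes:
        return 0.0

    f1_values = []
    for s in subtypes:
        denom = 2 * tp[s] + fp[s] + fn[s]
        f1_values.append(2 * tp[s] / denom if denom else 0.0)
    # Macro average: every subtype counts equally, however rare it is.
    return sum(f1_values) / len(f1_values)


# Toy data: four mixers, correctly detected as mixers in both cases.
truth = [("mixer", "spiral"), ("mixer", "spiral"),
         ("mixer", "planetary"), ("mixer", "planetary")]

# A model that names every subtype correctly.
subtype_aware = [(p, s, p, s) for p, s in truth]
# A model that finds every mixer but emits no subtype at all.
parent_only = [(p, s, p, None) for p, s in truth]

aware_score = type_aware_score(subtype_aware)
parent_only_score = type_aware_score(parent_only)
```

On this toy data the subtype-aware model scores 1.0 and the parent-only model scores 0.0, which mirrors the two extremes described above.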

SEE · type-aware score: 0.89
open-vocab baseline · same eval: 0.32

/ papers

On hold until
private beta ends.

We have several papers written and ready, drawn from the threads above. We're holding them internally until the private beta closes. Once partners are public, the work will be too. If you're a researcher and want to talk about the underlying methods, write to us.

hello@linox.co.uk

Want to test the models
against your streams?

We're onboarding a small number of partners in private beta.

read the technical brief