An AI that trains itself,
until it hits the accuracy you asked for.
You agree an accuracy target on day one. The AI watches its own output, retrains on the actual scene when accuracy drifts, and validates the new version before it goes live. You see every change.
Five stages.
Running without interruption.
The eval-and-retrain loop runs continuously from hour 48 of deployment. No calendar. No ticket. No engineer.
SEE produces detections at line rate.
Every frame processed by SEE emits a detection event. Events are stored with full provenance — camera, timestamp, bounding box, label, confidence, track ID.
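As an illustration, one detection event with the provenance fields listed above might be modelled like this. The class name, field names, bounding-box convention, and validation rules are assumptions made for this sketch, not the actual Darkfield schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class DetectionEvent:
    """One detection emitted by SEE (hypothetical schema)."""
    camera_id: str
    timestamp: datetime
    # Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
    bbox: tuple[float, float, float, float]
    label: str
    confidence: float  # expected in [0.0, 1.0]
    track_id: int

    def __post_init__(self):
        # Reject events that could not be scored later.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        x0, y0, x1, y1 = self.bbox
        if x1 <= x0 or y1 <= y0:
            raise ValueError("bbox must have positive width and height")


event = DetectionEvent(
    camera_id="cam-07",
    timestamp=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),
    bbox=(120.0, 80.0, 260.0, 310.0),
    label="pallet",
    confidence=0.91,
    track_id=4412,
)
record = asdict(event)  # plain dict, ready to be written to an event store
```

The dataclass is frozen so that an event cannot be altered after it is emitted, which suits a log that is meant to carry provenance.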
→THINK pulls a rolling N-sample.
Rather than reviewing every event, THINK pulls a statistically significant sample from the rolling window. Sample size is tuned to detection density — busier scenes sample more sparsely.
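One simple way to get "busier scenes sample more sparsely" is to hold the target sample size N fixed, so the sampling rate falls as the event count rises. The sketch below assumes that rule. The function names and the use of uniform random sampling are illustrative choices, not a description of how THINK actually sizes its sample.

```python
import random


def sampling_rate(events_in_window: int, target_n: int) -> float:
    """Fraction of events reviewed when the sample size is held at target_n."""
    if events_in_window <= 0:
        return 0.0
    return min(1.0, target_n / events_in_window)


def pull_sample(event_ids, target_n: int, seed=None):
    """Draw up to target_n event ids uniformly, without replacement."""
    ids = list(event_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(target_n, len(ids)))


# A quiet camera is reviewed in full; a busy one is sampled sparsely.
quiet_rate = sampling_rate(events_in_window=200, target_n=400)
busy_rate = sampling_rate(events_in_window=18_000, target_n=400)
sample = pull_sample(range(18_000), target_n=400, seed=1)
```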
→THINK annotates the sample; scores SEE against it.
Using its own multimodal understanding, THINK annotates sampled frames as ground truth, then scores SEE's predictions. Precision and recall are computed against the data table agreed at onboarding.
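The scoring step can be sketched as follows, assuming a prediction counts as correct when it has the same label as an unmatched ground-truth box and overlaps it with an IoU of at least 0.5. The matching rule, the threshold, and the function names are assumptions for illustration. The text above does not specify how predictions are matched to annotations.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def score_frame(predictions, truths, iou_threshold=0.5):
    """Greedy one-to-one matching. Returns (tp, fp, fn) for one frame.

    predictions and truths are lists of (label, bbox) pairs.
    """
    unmatched = list(truths)
    tp = 0
    for label, box in predictions:
        best_index, best_iou = None, iou_threshold
        for i, (t_label, t_box) in enumerate(unmatched):
            if t_label != label:
                continue
            overlap = iou(box, t_box)
            if overlap >= best_iou:
                best_index, best_iou = i, overlap
        if best_index is not None:
            unmatched.pop(best_index)  # each truth box matches at most once
            tp += 1
    return tp, len(predictions) - tp, len(unmatched)


def precision_recall(per_frame_counts):
    """Aggregate (tp, fp, fn) tuples into precision and recall."""
    tp = sum(c[0] for c in per_frame_counts)
    fp = sum(c[1] for c in per_frame_counts)
    fn = sum(c[2] for c in per_frame_counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# THINK's annotations (treated as ground truth) versus SEE's predictions.
think = [("pallet", (0, 0, 10, 10)), ("forklift", (20, 20, 40, 40))]
see = [("pallet", (1, 1, 10, 10)), ("pallet", (50, 50, 60, 60))]
counts = score_frame(see, think)
precision, recall = precision_recall([counts])
```

In this toy frame SEE finds one of the two annotated objects and raises one false alarm, so both precision and recall come out at 0.5.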
→Within tolerance? Continue. Below? Diagnose.
If metrics are inside the agreed tolerance, the loop continues. If below, THINK diagnoses the failure mode: labelling problem (wrong class), coverage problem (camera angle), or model-fit problem (wrong base model).
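The gate itself reduces to a comparison against the agreed tolerance. A minimal sketch, assuming the tolerance is expressed as a minimum precision and a minimum recall; the `Tolerance` type and `next_step` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """Minimum acceptable metrics, as agreed at onboarding (assumed shape)."""
    min_precision: float
    min_recall: float


def next_step(precision: float, recall: float, tolerance: Tolerance) -> str:
    """Return "continue" when both metrics meet the bar, else "diagnose"."""
    if precision >= tolerance.min_precision and recall >= tolerance.min_recall:
        return "continue"
    return "diagnose"


bar = Tolerance(min_precision=0.95, min_recall=0.90)
healthy = next_step(precision=0.97, recall=0.93, tolerance=bar)
drifted = next_step(precision=0.91, recall=0.93, tolerance=bar)
```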
→Finetune SEE, or swap base model. Hot-swap on validation.
For labelling and coverage problems: a per-camera adapter is trained. For model-fit problems: the system switches to RT-DETR, YOLO-26, or a future SAM3-class model when one fits better. New weights go live on validation — without stopping inference.
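The routing from failure mode to remedy, and the idea of going live only after validation, can be sketched as below. This assumes a model is any callable that takes a frame, and that validation is a caller-supplied check. `ModelSlot`, `remedy_for`, and the remedy strings are hypothetical names, and a real deployment would need more than an in-process reference swap.

```python
import threading
from enum import Enum


class FailureMode(Enum):
    LABELLING = "labelling"
    COVERAGE = "coverage"
    MODEL_FIT = "model_fit"


def remedy_for(mode: FailureMode) -> str:
    """Labelling and coverage problems get an adapter; model-fit gets a swap."""
    if mode in (FailureMode.LABELLING, FailureMode.COVERAGE):
        return "train_per_camera_adapter"
    return "swap_base_model"


class ModelSlot:
    """Holds the live model and replaces it only if validation passes."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()  # serialises writers only

    def predict(self, frame):
        model = self._model  # take one reference, then use it for this call
        return model(frame)

    def swap_if_valid(self, candidate, validate) -> bool:
        if not validate(candidate):
            return False  # the live model is left untouched
        with self._lock:
            self._model = candidate
        return True


slot = ModelSlot(lambda frame: "v1")
rejected = slot.swap_if_valid(lambda frame: "v2", validate=lambda m: False)
still_live = slot.predict("frame-0")
accepted = slot.swap_if_valid(lambda frame: "v2", validate=lambda m: True)
now_live = slot.predict("frame-1")
```

Because `predict` never waits on the lock, inference keeps running while a candidate is validated and swapped in.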
// the loop doesn't fire on a schedule; it fires when precision drops below the agreed tolerance. the bar is set by you at onboarding.
The dataset, curated by the model.
THINK selects training examples actively. Not randomly from the event log — strategically, identifying the frames that carry the most signal:
- High-uncertainty frames: where SEE's confidence was low.
- Edge-of-distribution lighting: low-light, direct sun, reflection.
- Novel object configurations: orientations or arrangements not seen in training.
- Disagreements: frames where SEE's label and THINK's label differ.
The result is a small, high-leverage dataset, typically 2,000–4,000 samples for each per-camera adapter, rather than a large noisy one. The model teaches itself efficiently.
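To make the selection idea concrete, here is a sketch that ranks frames by the four signals in the list and keeps the top k. The additive scoring, the equal weights, and every name in the block are assumptions for illustration; the text does not say how THINK weighs these signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameSignals:
    """Per-frame inputs to selection (hypothetical shape)."""
    frame_id: str
    see_confidence: float      # SEE's confidence for its prediction
    see_label: str
    think_label: str
    lighting_outlier: bool     # low-light, direct sun, reflection
    novel_configuration: bool  # arrangement not seen in training


# Assumed weights: every signal counts equally in this sketch.
DEFAULT_WEIGHTS = {
    "uncertainty": 1.0,
    "lighting": 1.0,
    "novelty": 1.0,
    "disagreement": 1.0,
}


def signal_score(frame: FrameSignals, weights=None) -> float:
    w = weights or DEFAULT_WEIGHTS
    score = w["uncertainty"] * (1.0 - frame.see_confidence)
    if frame.lighting_outlier:
        score += w["lighting"]
    if frame.novel_configuration:
        score += w["novelty"]
    if frame.see_label != frame.think_label:
        score += w["disagreement"]
    return score


def select_frames(frames, k: int):
    """Keep the k frames with the highest signal score."""
    return sorted(frames, key=signal_score, reverse=True)[:k]


frames = [
    FrameSignals("a", 0.95, "pallet", "pallet", False, False),
    FrameSignals("b", 0.40, "pallet", "crate", False, False),
    FrameSignals("c", 0.70, "pallet", "pallet", True, False),
]
chosen = [f.frame_id for f in select_frames(frames, k=2)]
```

Frame "a" is confident, agreed on, and ordinary, so it is the one left out.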
What selection looks like.
A mock of the curation output from a recent cold-chain deployment. THINK selected 47 frames from 18,000 events in the rolling window — the 47 that moved the metric.
// 47 frames selected from 18,000 events · cold-chain deployment · cycle 14
You set the bar.
The AI holds it.
At onboarding, you agree precision and recall tolerances per capability. Auto-training fires when a metric falls below the agreed bar, not on a schedule. Nothing retrains without a metric crossing the threshold you set.
Freeze weights for an audit window — no retrains, no swaps, no changes. Available from Dashboards with a single toggle.
Every weight change is recorded — the eval that triggered it, the sample that was used, the validation that cleared it. Immutable; available in Dashboards under the Auto-Training tab.
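A minimal sketch of how a freeze toggle and a change record could fit together, assuming each record carries the three references named above. `WeightChange`, `AuditTrail`, and their fields are hypothetical, and in-memory immutability here only stands in for whatever storage guarantees the product actually provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightChange:
    """One recorded weight change (hypothetical fields)."""
    eval_id: str        # the eval that triggered the change
    sample_id: str      # the sample that was used
    validation_id: str  # the validation that cleared it
    weights_version: str


class AuditTrail:
    """Append-only record of weight changes, with a freeze switch."""

    def __init__(self):
        self.frozen = False
        self._entries = []

    def record(self, change: WeightChange) -> None:
        if self.frozen:
            raise RuntimeError("weights are frozen for an audit window")
        self._entries.append(change)

    @property
    def entries(self):
        # A tuple copy, so callers cannot edit the history in place.
        return tuple(self._entries)


trail = AuditTrail()
trail.record(WeightChange("eval-14", "sample-14", "val-14", "cam-07@v15"))
trail.frozen = True  # the single toggle
try:
    trail.record(WeightChange("eval-15", "sample-15", "val-15", "cam-07@v16"))
    blocked = False
except RuntimeError:
    blocked = True
```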
Tolerances are editable at any time. Tighter tolerances trigger more frequent retrains; looser tolerances reduce overhead. Changes take effect at the next eval cycle.
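The "takes effect at the next eval cycle" behaviour can be sketched as a store that holds edits as pending until a cycle begins. The class, its method names, and the `(min_precision, min_recall)` tuple shape are assumptions for illustration.

```python
class ToleranceStore:
    """Per-capability tolerances; edits apply when the next cycle starts."""

    def __init__(self, active):
        self._active = dict(active)   # capability -> (min_precision, min_recall)
        self._pending = {}

    def edit(self, capability, min_precision, min_recall):
        # Stored, but not yet used by the running cycle.
        self._pending[capability] = (min_precision, min_recall)

    def begin_eval_cycle(self):
        # Promote pending edits, then return the tolerances for this cycle.
        self._active.update(self._pending)
        self._pending.clear()
        return dict(self._active)

    def active(self, capability):
        return self._active[capability]


store = ToleranceStore({"pallet_count": (0.95, 0.90)})
store.edit("pallet_count", min_precision=0.97, min_recall=0.92)
before = store.active("pallet_count")
store.begin_eval_cycle()
after = store.active("pallet_count")
```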
Why we sometimes pick
a different model.
Not every problem is a Darkfield-SEE problem. If RT-DETR on its own already hits the tolerance bar for a particular camera, that's what runs — because running SEE on that camera would be unnecessarily expensive without a precision benefit. The AI decides this, not a configuration screen.
We're honest about this because we think it builds more trust than pretending SEE is always the right tool. The goal is the metric, not the model. If a future SAM3-class model outperforms SEE on your specific scene, auto-training will swap it in, with your approval, and record the change in the audit log.
- SEE's per-camera adapter is at saturation: further finetuning produces <0.5pp gain.
- A candidate model scores higher than SEE on the current eval set with no finetuning.
- Scene properties have shifted substantially (new lighting rig, new product line) and SEE needs a full retrain to adapt.
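The three conditions in the list can be written as simple checks. The sketch below only reports which ones hold; how they are combined into a swap decision is not stated in the text, so that step is deliberately left out. The function and parameter names are hypothetical, and only the 0.5pp saturation figure comes from the list.

```python
SATURATION_GAIN_PP = 0.5  # from the list: further finetuning gains <0.5pp


def swap_signals(
    last_finetune_gain_pp: float,
    see_score: float,
    candidate_zero_shot_score: float,
    scene_shifted: bool,
):
    """Return the names of the conditions that currently hold."""
    signals = []
    if last_finetune_gain_pp < SATURATION_GAIN_PP:
        signals.append("adapter_saturated")
    if candidate_zero_shot_score > see_score:
        signals.append("candidate_outperforms")
    if scene_shifted:
        signals.append("scene_shifted")
    return signals


stable = swap_signals(2.1, see_score=0.96, candidate_zero_shot_score=0.90,
                      scene_shifted=False)
stalled = swap_signals(0.2, see_score=0.91, candidate_zero_shot_score=0.94,
                       scene_shifted=False)
```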
The curve from
onboarding to steady state.
Indicative numbers from recent partner deployments. The first retrain fires at hour 06–12 in most deployments; steady state is typically reached by hour 48.
// indicative numbers across recent partner sites · individual results vary · methodology held internal until beta closes
Set the bar.
Let the AI hold it.
We're onboarding a small number of partners in private beta.