feature · detection

Eight things every CCTV deployment needs,
in one AI.

Object detection, people detection, tracking, segmentation, counting, OCR, classification, and zones, all requested in plain English. No class lists to pick from. No labels to draw.

/ eight capabilities

Eight building blocks.
One model runs them all.

01 / detect objects

Open-vocabulary, type-aware.

Prompted in plain language — no class list required. SEE distinguishes a spiral mixer from a planetary mixer in the same frame on the first pass. Adding a new object class is a sentence, not a sprint.

02 / detect people

Persistent identity, role inference, no biometrics by default.

Operators and visitors distinguished by role cue (high-vis vs. plain clothes), assigned persistent IDs by appearance and trajectory. No biometrics by default; face recognition opt-in per deployment, under UK-GDPR.

03 / track

Re-ID through occlusion and frame re-entry.

Built-in tracker trained jointly with the detection head. A forklift's path is drawn as a stable polyline across a dock, ID intact through an occlusion event. No third-party tracking library required.

[image: pixel-accurate segmentation masks on waste items in a bin]
04 / segment

Pixel-accurate masks, not just boxes.

Masks available for every detected object. Useful for exact pallet area calculations, contamination detection, and fill-level estimation. SAM-class decoder adapted for CCTV aspect ratios and scene density.

[image: bottles being counted on a production line]
05 / count

Stable totals under crowding and motion blur.

Count chip overlaid on each frame. Robust against the conditions industrial scenes actually produce: tight stacking, fast movement, and partial occlusion at the edges of the frame. Shift totals are accumulated automatically.

[image: printed text and labels, the OCR target]
06 / OCR

Calibrated for CCTV angles and lighting.

Number plate recognition, label reading, and asset ID capture. Handles oblique angles, partial shadow, and the motion blur typical of entry/exit cameras. OCR output is attached to the detection event row.

[image: vehicles and equipment classified by subtype]
07 / classify

Fine-grained subtype within a detection.

Lorry → tanker / flatbed / curtain-side. Filler → rotary / linear / piston. The subtype vocabulary is open — the AI extends it at onboarding with a plain-language description. No additional labelled data required for the first pass.

[image: a packing-line zone, defined by polygon, watched continuously]
08 / zones

Plain-language zones, grounded to polygons; entry, dwell, violation.

Zones are defined in plain language — "the staging area near dock B" — and grounded to pixel coordinates by the AI. Entry events, dwell timers, and violation flags are emitted as structured rows to Dashboards.
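
As a rough illustration of how entry events, dwell timers, and violation flags can be derived once a zone has been grounded to a pixel polygon, the sketch below uses a standard ray-casting point-in-polygon test. It is not SEE's implementation: `point_in_polygon`, `ZoneMonitor`, the event-row fields, and the 30-second dwell limit are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]


def point_in_polygon(p: Point, polygon: list[Point]) -> bool:
    """Ray casting: count polygon edges crossed by a ray going right from p."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@dataclass
class ZoneMonitor:
    """Hypothetical per-zone state: who is inside, since when, and who was flagged."""
    name: str
    polygon: list[Point]
    dwell_limit_s: float
    entered_at: dict[int, float] = field(default_factory=dict)
    flagged: set[int] = field(default_factory=set)

    def update(self, track_id: int, anchor: Point, t: float) -> list[dict]:
        """Feed one tracked position; return any event rows it triggers."""
        rows = []
        inside = point_in_polygon(anchor, self.polygon)
        base = {"zone": self.name, "track_id": track_id, "t": t}
        if inside and track_id not in self.entered_at:
            self.entered_at[track_id] = t
            rows.append({**base, "event": "entry"})
        elif inside:
            dwell = t - self.entered_at[track_id]
            # Flag each object at most once per visit
            if dwell > self.dwell_limit_s and track_id not in self.flagged:
                self.flagged.add(track_id)
                rows.append({**base, "event": "violation", "dwell_s": dwell})
        elif track_id in self.entered_at:
            dwell = t - self.entered_at.pop(track_id)
            self.flagged.discard(track_id)
            rows.append({**base, "event": "exit", "dwell_s": dwell})
        return rows


zone = ZoneMonitor("staging area near dock B", [(0, 0), (10, 0), (10, 10), (0, 10)], 30.0)
for t, pos in [(0.0, (5, 5)), (31.0, (6, 5)), (40.0, (20, 5))]:
    for row in zone.update(track_id=1, anchor=pos, t=t):
        print(row)
```

The anchor point passed to `update` would typically be something like the bottom-centre of a bounding box; which point a real deployment uses is not stated in the source.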

/ compositions

Capabilities combined
into higher-order outputs.

Most operational questions are answered not by a single capability, but by a chain. The AI builds these automatically from your prompt.

common compositions

Heatmaps
detect + zones + time aggregation → spatial frequency overlay showing where objects spend the most time. Useful for layout optimisation and throughput analysis.
Cycle time
track + zones + state machine → time between zone entry events per tracked object. Produces a per-machine or per-station cycle-time distribution over a shift.
Number plate read
detect car → crop → detect plate → OCR → lookup → full vehicle record attached to the event row.
Dwell alert
track + zones + dwell timer + VLM confirmation → alert dispatched when an object exceeds the agreed dwell threshold inside a named zone, with a vision-language model (VLM) pass to filter false positives.

// compositions are authored by the AI at onboarding time — not configured by a user in a UI tree.
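
To make the cycle-time chain concrete, here is a minimal sketch of the last step: turning a stream of zone entry events into per-leg durations for each tracked object. It assumes the events arrive sorted by time as `(timestamp_s, track_id, zone)` tuples; that shape, the function name `cycle_times`, and the zone names are illustrative and not taken from SEE.

```python
from collections import defaultdict
from statistics import median


def cycle_times(entry_events):
    """Group the time between consecutive zone entries of the same tracked object.

    entry_events: iterable of (timestamp_s, track_id, zone), sorted by timestamp.
    Returns {(from_zone, to_zone): [seconds, ...]}.
    """
    last_entry = {}  # track_id -> (zone, timestamp); the per-object state
    durations = defaultdict(list)
    for t, track_id, zone in entry_events:
        if track_id in last_entry:
            prev_zone, prev_t = last_entry[track_id]
            durations[(prev_zone, zone)].append(t - prev_t)
        last_entry[track_id] = (zone, t)
    return dict(durations)


# Made-up events for two tracked objects moving along a line
events = [
    (0.0, 1, "infeed"),
    (5.0, 2, "infeed"),
    (12.0, 1, "filler"),
    (18.5, 2, "filler"),
    (30.0, 1, "outfeed"),
]
result = cycle_times(events)
for leg, secs in result.items():
    print(leg, "n =", len(secs), "median =", median(secs), "s")
```

Collected over a shift, each list is the kind of per-station distribution described above; the median is only one possible summary.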

/ latency

Per-frame budget
on the Linox AI vision-box.

| capability | latency @1080p | notes |
|---|---|---|
| detect objects | 38 ms | base detection head · open-vocabulary |
| detect people | 38 ms | shared backbone with object detection · parallel |
| track | +4 ms | incremental; re-ID pass only on new objects |
| segment | +6 ms | mask head; runs on detected boxes only |
| count | <1 ms | aggregation over detection outputs · negligible |
| OCR | +8 ms | crop-and-pass on detected text regions only |
| classify (subtype) | +3 ms | classification head on existing crop |
| zones | <1 ms | polygon intersection test · CPU-side |
| full composition (detect + track + OCR + zones) | 44 ms | typical production pipeline · within 50 ms budget |

// measured on the Linox AI vision-box that ships with every deployment · FP16 · 1080p · per-camera adapter loaded · batch-1 single-stream baseline
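
For a sense of what these figures imply, per-frame latency converts to a single-stream frame-rate ceiling as 1000 / latency in milliseconds. The sketch below assumes frames are processed strictly one after another, with no pipelining or batching, which the measurement note suggests but does not spell out. The function name `max_fps` is illustrative.

```python
def max_fps(latency_ms: float) -> float:
    """Upper bound on frames per second when frames are processed serially."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


# Latency figures taken from the table above
for label, ms in [("detect only", 38), ("full composition", 44), ("budget", 50)]:
    print(f"{label}: {ms} ms per frame, at most {max_fps(ms):.1f} fps")
```

On that assumption, the 50 ms budget corresponds to 20 frames per second per stream.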

/ honesty

What detection
won't do.

We'd rather be clear about the edges of the capability now than have you discover them in production.

face recognition

Off by default. Detection assigns persistent IDs from appearance and trajectory, without biometrics. Face recognition is available as an opt-in per deployment under a UK-GDPR-compliant agreement (for example, authorised-access verification or persistent identity across cameras), but it is never the default.

gesture recognition

Limited. Coarse gestures (raised hand, pointing) are possible with a per-camera fine-tune; fine-grained sign language or nuanced interaction is outside the current capability.

audio

Out of scope. SEE is a vision model. Microphone feeds are not processed at any stage.

very small objects

Detection of objects smaller than ~16×16 pixels at inference resolution is unreliable. The AI flags this at stream inspection time and recommends a closer camera if the task requires it.
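
Whether a target clears the ~16×16 pixel floor can be estimated before installation with a pinhole-camera approximation: projected size in pixels ≈ focal length in pixels × object size / distance. This is a back-of-envelope sketch, not the stream inspection the AI performs. The function name, the 90° horizontal field of view, the 1920-pixel inference width, and the example dimensions are all assumptions, and lens distortion is ignored.

```python
import math


def projected_pixels(object_size_m, distance_m, image_width_px=1920, hfov_deg=90.0):
    """Approximate on-image extent (pixels) of an object facing the camera."""
    # Focal length in pixels from the horizontal field of view
    focal_px = (image_width_px / 2) / math.tan(math.radians(hfov_deg) / 2)
    return focal_px * object_size_m / distance_m


MIN_PIXELS = 16  # reliability floor quoted above

for distance in (5, 15, 30):
    px = projected_pixels(0.2, distance)
    verdict = "ok" if px >= MIN_PIXELS else "too small, consider a closer camera"
    print(f"0.2 m object at {distance} m: about {px:.0f} px ({verdict})")
```

With these assumed values, a 0.2 m object drops below the floor somewhere between 5 m and 15 m from the camera; a narrower lens or higher inference resolution would move that point further out.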

Eight capabilities,
running against your streams.

We're onboarding a small number of partners in private beta.

read about SEE →