Why this matters for EHS, not just data science
EHS managers in KSA are increasingly asked to sign off on AI vision systems where the technical evidence is presented in language designed for ML engineers. Two things follow. First, vendors quote favourable metrics; nobody pushes back. Second, when the system underperforms in the field, EHS owns the outcome.
The fix is to insist on three numbers, all anchored in the PPE detection solution and the how-accurate-is-PPE-detection answer:
- Precision at the operating threshold — what fraction of alarms are real violations.
- Recall at the operating threshold — what fraction of real violations are caught.
- False-positive rate per camera-hour — how loud the system is in practice.
Everything else is supporting context.
IoU in plain language
IoU (Intersection over Union) measures how well a predicted bounding box overlaps the ground-truth bounding box. An IoU of 1.0 is a perfect match; an IoU of 0.0 is no overlap.
For PPE work, the convention is:
- IoU 0.5: the area shared by the predicted and ground-truth boxes is at least half of their combined (union) area. Standard for hard-hat detection.
- IoU 0.75: stricter; common in academic benchmarks, rarely the operating threshold in industrial deployments.
When a vendor quotes “mAP@0.5”, they mean mean Average Precision evaluated at IoU 0.5. That is one number on one validation set, not a guarantee on your site.
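The overlap calculation described above can be sketched in a few lines. This is a minimal illustration, assuming axis-aligned boxes given as corner coordinates; the `Box` type and the sample boxes are hypothetical, not taken from any vendor's tooling.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixels (hypothetical layout: x1, y1, x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area; 0.0 when the boxes do not overlap."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = Box(0, 0, 100, 100)
predicted = Box(0, 0, 100, 50)   # covers exactly half of the ground-truth box
shifted = Box(25, 0, 125, 100)   # same size, shifted right by a quarter width

print(f"{iou(ground_truth, predicted):.2f}")  # 0.50
print(f"{iou(ground_truth, shifted):.2f}")    # 0.60
```

Note that a same-size box shifted by only a quarter of its width already drops to IoU 0.6, which is why 0.5 is a workable bar for small objects such as helmets.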
What mAP does and does not tell you
mAP (mean Average Precision) is a single scalar: for each class, Average Precision is the area under the precision-recall curve traced out as the confidence threshold is swept, and mAP is the mean of those values across classes. Its strengths and weaknesses:
| What mAP tells you | What mAP hides |
|---|---|
| Overall model quality on a labelled set | Behaviour on your site |
| Useful for comparing models at the same task | Sensitivity to demographic and lighting shift |
| A reasonable proxy for “does this model work” | Per-class breakdown |
| Progress between versions of one vendor's model | Whether cross-vendor numbers are comparable (only on the same test set) |
mAP@0.5 numbers above 0.85 are common in 2026; numbers above 0.95 should be treated with suspicion until the test set is disclosed.
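To make the "single scalar" concrete, here is a rough sketch of how Average Precision for one class and mAP across classes can be computed. It assumes each detection has already been matched (or not) to a ground-truth box at IoU 0.5, and it uses all-point interpolation; benchmark toolkits differ in interpolation details, so their numbers may not match. Function names and sample data are illustrative only.

```python
def average_precision(detections: list[tuple[float, bool]], n_ground_truth: int) -> float:
    """All-point interpolated AP for one class.

    detections: (confidence, matched) pairs; matched=True means the box was
    matched to a ground-truth object upstream (assumed, not shown here).
    """
    if n_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = fp = 0
    for _, matched in ranked:
        if matched:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_ground_truth)
    # Precision envelope: each point takes the best precision at equal or higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangles under the envelope wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: dict[str, float]) -> float:
    """Unweighted mean of per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Hypothetical detections for two PPE classes.
per_class = {
    "hard_hat": average_precision(
        [(0.9, True), (0.8, True), (0.7, False), (0.6, True)], n_ground_truth=4
    ),
    "frc_coverall": average_precision([(0.9, True), (0.5, False)], n_ground_truth=2),
}
print(per_class)                          # hard_hat 0.6875, frc_coverall 0.5
print(mean_average_precision(per_class))  # 0.59375
```

The aggregate sits between the two classes and says nothing about which one is weaker, which is the reason to ask for the per-class figures.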
Precision and recall — the EHS-relevant pair
These two are the metrics EHS managers should anchor on:
- Precision = of all detections the system raised, how many were real. High precision means few false alarms.
- Recall = of all real violations, how many the system caught. High recall means few misses.
The trade-off is direct: lowering the confidence threshold raises recall but usually lowers precision, and raising it does the reverse. The goal is to choose a threshold that balances supervisor trust (precision) against safety coverage (recall) for your site’s risk profile.
For a hard-hat deployment on an Aramco contractor gate, typical 2026 operating points are:
- Precision ≥ 0.92 at IoU 0.5, confidence 0.65
- Recall ≥ 0.88 at the same operating point
Below those numbers, supervisor trust erodes and the system stops being used.
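Choosing an operating point can be sketched as a sweep over candidate confidence thresholds. The example below is a minimal illustration on made-up data: it assumes each detection is already labelled as a real violation or not, and it picks the threshold with the highest recall among those that meet a precision floor. The function names, the sample scores and the 0.75 floor are assumptions for the sake of the example.

```python
def precision_recall_at(threshold, scored, n_real_violations):
    """Precision and recall when only detections at or above `threshold` raise an alarm."""
    kept = [is_real for confidence, is_real in scored if confidence >= threshold]
    tp = sum(kept)
    # Convention used here: with no alarms at all, precision is reported as 1.0.
    precision = tp / len(kept) if kept else 1.0
    recall = tp / n_real_violations
    return precision, recall


def pick_threshold(candidates, scored, n_real_violations, min_precision):
    """Among thresholds meeting the precision floor, return (threshold, precision, recall)
    for the one with the highest recall, or None if no candidate qualifies."""
    best = None
    for t in candidates:
        p, r = precision_recall_at(t, scored, n_real_violations)
        if p >= min_precision and (best is None or r > best[2]):
            best = (t, p, r)
    return best


# Hypothetical (confidence, is_real_violation) pairs; 6 real violations in total.
scored = [
    (0.95, True), (0.90, True), (0.85, True), (0.75, True), (0.70, False),
    (0.60, True), (0.55, False), (0.45, True), (0.40, False), (0.30, False),
]

for t in (0.80, 0.65, 0.50, 0.35):
    p, r = precision_recall_at(t, scored, n_real_violations=6)
    print(f"threshold {t:.2f}: precision {p:.2f}, recall {r:.2f}")

best = pick_threshold((0.80, 0.65, 0.50, 0.35), scored, 6, min_precision=0.75)
print(best[0])  # 0.65
```

On a real site the precision floor would come from what supervisors will tolerate, and the sweep would run on footage from that site.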
Confusion matrix and what to read in it
A confusion matrix lays out true positives, false positives, true negatives and false negatives. For hard-hat detection on a 10,000-frame test set, a defensible report looks like:
| | Predicted: violation | Predicted: compliant |
|---|---|---|
| Actual: violation | 880 (TP) | 120 (FN) |
| Actual: compliant | 75 (FP) | 8,925 (TN) |
That gives precision 0.92 (880/955), recall 0.88 (880/1,000) and a per-frame false-positive rate of 0.83% (75/9,000). See the confusion matrix, precision and recall glossary entries for definitions.
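The figures quoted for this table can be reproduced from the four cells. The short sketch below uses the counts from the table; the function name is illustrative.

```python
def summarise(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Derive the headline rates from the four confusion-matrix cells."""
    return {
        "precision": tp / (tp + fp),             # share of alarms that were real
        "recall": tp / (tp + fn),                # share of real violations caught
        "false_positive_rate": fp / (fp + tn),   # share of compliant frames flagged
    }


report = summarise(tp=880, fp=75, fn=120, tn=8925)
print(f"precision {report['precision']:.2f}")                  # precision 0.92
print(f"recall {report['recall']:.2f}")                        # recall 0.88
print(f"per-frame FPR {report['false_positive_rate']:.2%}")    # per-frame FPR 0.83%
```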
The false-positive cost equation
The number that decides whether the system survives in production is false positives per camera-hour. A naive 5 fps stream, with every flagged frame raising an alert, produces 18,000 frames per hour per camera. Even a 0.1% per-frame false-positive rate yields 18 nuisance alerts per camera per hour. Across 200 cameras that is 3,600 alerts per hour, and supervisors abandon the system within a week.
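The arithmetic is simple enough to check for any frame rate and error rate. The sketch below assumes the naive case in which every false-positive frame becomes its own alert; the function name is illustrative.

```python
SECONDS_PER_HOUR = 3600


def raw_false_alerts_per_camera_hour(fps: float, per_frame_fp_rate: float) -> float:
    """Expected false alerts per camera per hour if every flagged frame raises an alert."""
    return fps * SECONDS_PER_HOUR * per_frame_fp_rate


per_camera = raw_false_alerts_per_camera_hour(fps=5, per_frame_fp_rate=0.001)
site_wide = per_camera * 200  # 200 cameras
print(f"{per_camera:.0f} per camera-hour, {site_wide:.0f} per hour site-wide")
# 18 per camera-hour, 3600 per hour site-wide

# The 0.83% per-frame rate from the confusion matrix above, left unfiltered:
unfiltered = raw_false_alerts_per_camera_hour(fps=5, per_frame_fp_rate=75 / 9000)
print(f"{unfiltered:.0f} per camera-hour")  # 150 per camera-hour
```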
Three engineering controls bring this down to operationally tolerable levels:
- Persistence rule: a violation only fires if the same worker track holds the violation for 3 consecutive frames at 5 fps.
- Zone gating: hard-hat rules only evaluated in zones where helmets are mandatory; tied to the perimeter monitoring solution.
- Permit-aware suppression: confined-space and indoor-office zones are excluded.
After those controls, a well-tuned 2026 system on a KSA contractor site lands at around 1–3 nuisance alerts per camera per shift.
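The persistence rule above can be sketched as a per-track streak counter. This is a minimal illustration, not a description of any particular product: it assumes an upstream tracker supplies a stable `track_id` for each worker and a per-frame violation flag, and it does not handle frames in which a track is missing. The class name is hypothetical.

```python
from collections import defaultdict


class PersistenceFilter:
    """Fire an alert only when one track shows a violation on N consecutive frames."""

    def __init__(self, required_frames: int = 3):
        self.required_frames = required_frames
        self._streak = defaultdict(int)  # track_id -> current run of violation frames

    def update(self, track_id: str, violation: bool) -> bool:
        """Return True only on the frame where the streak first reaches the requirement."""
        if not violation:
            self._streak[track_id] = 0
            return False
        self._streak[track_id] += 1
        return self._streak[track_id] == self.required_frames


filt = PersistenceFilter(required_frames=3)
frames = [True, False, True, True, True, True]  # one worker, six frames at 5 fps
alerts = [filt.update("worker-1", v) for v in frames]
print(alerts)  # [False, False, False, False, True, False]
```

The single-frame flicker at the start is suppressed, and a sustained violation produces one alert rather than one per frame.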
What to demand in a vendor proposal
A defensible vendor proposal shows three tables explicitly:
- Per-class precision/recall at the operating IoU and confidence — not just aggregate mAP.
- Per-condition breakdown — daylight, dusk, night, dust event, indoor.
- False-positive rate per camera-hour after persistence and zone gating.
If the proposal does not include those tables, send it back. The top 10 platforms shortlist names vendors expected to provide this level of disclosure.
Common vendor metric tricks
Recurring 2024–2026 patterns to watch for:
- Test set leakage — model evaluated on frames similar to training data. Demand a held-out KSA validation set.
- mAP@0.5 only — without a precision-recall curve, you cannot pick an operating threshold.
- Aggregate-only numbers — a 0.9 mAP can hide a 0.6 recall on FRC coveralls.
- “99% accuracy” — accuracy on a heavily imbalanced test set (most frames have no violation) is a meaningless number.
For honest comparisons see the comparisons hub.
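The "99% accuracy" trick is easy to demonstrate. The sketch below uses a hypothetical 10,000-frame test set in which 1% of frames contain a violation, and scores a "model" that never raises an alarm; the numbers are invented for illustration.

```python
def accuracy_and_recall(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Accuracy over all frames, and recall over frames with a real violation."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# 100 violation frames, 9,900 compliant frames, and a system that flags nothing.
accuracy, recall = accuracy_and_recall(tp=0, fp=0, fn=100, tn=9900)
print(f"accuracy {accuracy:.0%}, recall {recall:.0%}")  # accuracy 99%, recall 0%
```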
How to validate on your own site
A two-week validation protocol:
- Capture 4 hours of footage across daylight, dusk and night from each candidate camera.
- Hand-label a subset of 1,000 frames against your PPE class list.
- Run the vendor’s model on the same set and compute precision/recall per class.
- Ask for the operating threshold to be chosen against your test set, not the vendor’s.
Anchor this in the PPE detection contractor guide and the Aramco EHS compliance guide.
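The per-class computation in the protocol above can be sketched as follows. It assumes each hand-labelled instance has been reduced to a (class, predicted violation, actual violation) record; the record layout, function name and sample counts are illustrative assumptions. The same grouping works if the key is a lighting condition instead of a PPE class.

```python
from collections import Counter


def per_class_precision_recall(records):
    """records: iterable of (class_name, predicted_violation, actual_violation)."""
    counts = {}
    for cls, predicted, actual in records:
        c = counts.setdefault(cls, Counter())
        if predicted and actual:
            c["tp"] += 1
        elif predicted:
            c["fp"] += 1
        elif actual:
            c["fn"] += 1
        else:
            c["tn"] += 1
    report = {}
    for cls, c in counts.items():
        flagged = c["tp"] + c["fp"]
        real = c["tp"] + c["fn"]
        report[cls] = {
            # None marks a rate that cannot be computed from this sample.
            "precision": c["tp"] / flagged if flagged else None,
            "recall": c["tp"] / real if real else None,
        }
    return report


# Hypothetical labelled sample, far smaller than a real 1,000-frame set.
records = (
    [("hard_hat", True, True)] * 9
    + [("hard_hat", True, False)]
    + [("hard_hat", False, True)]
    + [("hard_hat", False, False)] * 20
    + [("frc_coverall", True, True)] * 3
    + [("frc_coverall", True, False)]
    + [("frc_coverall", False, True)] * 2
    + [("frc_coverall", False, False)] * 10
)

report = per_class_precision_recall(records)
for cls, rates in report.items():
    print(cls, rates)
# hard_hat: precision 0.9, recall 0.9; frc_coverall: precision 0.75, recall 0.6
```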
Next steps
If you are scoping or auditing a hard-hat detection system in 2026, start with the PPE detection solution, the how-accurate-is-PPE-detection answer, and the PDPL compliance checklist. For the underlying primitives, see the edge inference and object detection glossary entries.
Book an EHS-side accuracy review and we will produce a precision-recall report on your own footage within two weeks.

