How accurate is PPE detection AI on real construction sites?
Modern PPE detection models reach 0.91 to 0.96 mAP@0.5 on benchmark datasets, but real-site accuracy on Saudi construction projects typically sits at 0.86 to 0.93 mAP after site tuning. Hard hats and high-visibility vests detect at 94 to 98% recall; harnesses and respirators sit at 78 to 88%. On FI Tech deployments, the false-positive rate runs under 4% per shift after two weeks of tuning.
Lab benchmarks oversell field performance. Real construction sites have dust, glare, partial occlusion, multiple PPE classes, and workers in unusual postures. Honest accuracy reporting separates lab numbers from field numbers and breaks each down per PPE class.
Field accuracy by class — typical 90-day baseline
| Class | Recall | Precision | Notes |
|---|---|---|---|
| Hard hat | 96 to 98% | 95 to 97% | Strongest class; clear shape, distinct against most backgrounds |
| Hi-vis vest | 94 to 97% | 93 to 96% | Reflective material can confuse at night without IR |
| Safety boots | 83 to 90% | 82 to 88% | Often occluded by dust, mud, low camera angle |
| Harness (work-at-height) | 80 to 88% | 78 to 86% | Confusable with regular straps; needs class-specific training data |
| Respirator / dust mask | 78 to 86% | 76 to 84% | Small object; benefits from 4MP+ cameras |
| Gloves | 74 to 84% | 72 to 82% | High intra-class variance, often hand-occluded |
| Goggles / face shield | 72 to 82% | 70 to 80% | Glare-sensitive; needs anti-reflection lens or angled camera |
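The recall and precision columns above are per-class ratios of true positives, false positives, and false negatives. As a minimal sketch of how such a breakdown could be computed, the snippet below uses hypothetical counts (the class names, the `ClassCounts` type, and all numbers are illustrative assumptions, not measured data from any deployment):

```python
from dataclasses import dataclass


@dataclass
class ClassCounts:
    tp: int  # detections matched to a ground-truth box of this class
    fp: int  # detections with no matching ground-truth box
    fn: int  # ground-truth boxes the model missed


def recall(c: ClassCounts) -> float:
    """Share of real PPE items that were detected."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def precision(c: ClassCounts) -> float:
    """Share of detections that were correct."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def per_class_report(counts: dict) -> dict:
    """Return rounded recall and precision for every class."""
    return {
        name: {"recall": round(recall(c), 3), "precision": round(precision(c), 3)}
        for name, c in counts.items()
    }


# Hypothetical counts for illustration only.
counts = {
    "hard_hat": ClassCounts(tp=970, fp=40, fn=30),
    "gloves": ClassCounts(tp=780, fp=220, fn=220),
}

report = per_class_report(counts)
for name, metrics in report.items():
    print(name, metrics)
```

Reporting both numbers per class, rather than one blended score, keeps a strong class such as hard hats from masking a weak one such as gloves.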
What degrades accuracy on Saudi sites
- Dust and haze — visibility under 3 km cuts IoU by 8 to 14% for small classes.
- Camera height and angle — elevation over 8 m with a steep tilt reduces recall on boots and gloves by up to 18%.
- Resolution — at under 1080p, PPE that appears smaller than 40 px loses up to 22% recall.
- Crew uniformity — Aramco red coveralls or NEOM-spec uniforms missing from the baseline training data drop precision until the model is retrained.
- Backlighting — strong west sun after 14:00 in summer increases false negatives by 6 to 10%.
- Ramadan night shifts — low-light conditions need IR-capable cameras or supplementary lighting.
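The resolution point above can be checked before a camera goes up. The following is a rough sketch, assuming a simple pinhole camera model with the object near the image centre and facing the camera; the function names, the object widths, and the 90-degree field of view are illustrative assumptions, and only the 40 px threshold comes from the list above:

```python
import math

MIN_PPE_PX = 40  # size threshold quoted in the list above


def object_width_px(object_width_m: float, distance_m: float,
                    image_width_px: int, hfov_deg: float) -> float:
    """Approximate on-image width of an object under a pinhole model."""
    # Width of the scene covered by the image at this distance.
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)
    return object_width_m * image_width_px / scene_width_m


def max_distance_m(object_width_m: float, image_width_px: int,
                   hfov_deg: float, min_px: float = MIN_PPE_PX) -> float:
    """Farthest distance at which the object still spans min_px pixels."""
    return (object_width_m * image_width_px
            / (2.0 * min_px * math.tan(math.radians(hfov_deg) / 2.0)))


# Assumed example: a 0.15 m wide dust mask seen from 20 m by a
# 1920 px wide camera with a 90 degree horizontal field of view.
mask_px = object_width_px(0.15, 20.0, 1920, 90.0)
print(f"mask width: {mask_px:.1f} px, large enough: {mask_px >= MIN_PPE_PX}")

# Assumed example: how close the same camera must be to a 0.30 m hard hat.
print(f"hard hat max distance: {max_distance_m(0.30, 1920, 90.0):.1f} m")
```

Real lenses distort and workers rarely stand on the optical axis, so treat the result as a planning estimate rather than a guarantee.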
How accuracy improves over time
- Day 0 to 14 — out-of-the-box deployment on a tuned baseline model. Expect 85 to 89% mAP.
- Day 15 to 45 — site-specific images annotated and added to the training set. Expect 88 to 92% mAP.
- Day 46 to 90 — hard-negative mining from false positives captured in production. Expect 91 to 94% mAP.
- Day 90+ — quarterly retraining cycle. Stable at 92 to 95% for primary classes.
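The hard-negative mining step in the timeline above can be sketched as a selection pass over reviewed production alerts. This is one possible approach, not a description of FI Tech's pipeline; the `ReviewedAlert` record, its fields, and the limits used here are hypothetical:

```python
from collections import defaultdict
from typing import NamedTuple


class ReviewedAlert(NamedTuple):
    image_id: str
    ppe_class: str
    confidence: float
    is_false_positive: bool  # assumed to be set by a human reviewer


def mine_hard_negatives(alerts, per_class_limit=2, min_confidence=0.5):
    """Pick the most confident false positives per class for retraining.

    High-confidence mistakes are the ones the model is most wrong about,
    so they are usually the most useful images to annotate next.
    """
    by_class = defaultdict(list)
    for alert in alerts:
        if alert.is_false_positive and alert.confidence >= min_confidence:
            by_class[alert.ppe_class].append(alert)

    selected = {}
    for ppe_class, items in by_class.items():
        items.sort(key=lambda a: a.confidence, reverse=True)
        selected[ppe_class] = [a.image_id for a in items[:per_class_limit]]
    return selected


# Hypothetical reviewed alerts for illustration only.
alerts = [
    ReviewedAlert("img_001", "harness", 0.91, True),
    ReviewedAlert("img_002", "harness", 0.62, True),
    ReviewedAlert("img_003", "harness", 0.88, True),
    ReviewedAlert("img_004", "harness", 0.95, False),  # correct alert, skipped
    ReviewedAlert("img_005", "gloves", 0.40, True),    # below min_confidence
    ReviewedAlert("img_006", "gloves", 0.77, True),
]

print(mine_hard_negatives(alerts))
```

Capping the selection per class keeps the annotation workload bounded and stops one noisy class from dominating the retraining set.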
What false positives cost
Untuned alerts erode supervisor trust within two weeks. Target a false-positive rate under 4% per shift. FI Tech deployments tune alerts by combining three filters: temporal smoothing (a violation must persist for 1.5 seconds before alerting), zone gating (alerts fire only in mandatory-PPE zones), and class-confidence thresholds set per camera.
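A minimal sketch of how temporal smoothing, zone gating, and per-camera class thresholds might combine into one filter is shown below. The `AlertFilter` class, its method names, the zone and camera identifiers, and the threshold values are all hypothetical; only the 1.5-second persistence window comes from the text above. The sketch also assumes an upstream tracker supplies a stable `track_id` per worker.

```python
from dataclasses import dataclass, field

PERSIST_SECONDS = 1.5  # persistence window quoted in the text


@dataclass
class AlertFilter:
    thresholds: dict       # (camera_id, violation_class) -> minimum confidence
    mandatory_zones: set   # zone ids where PPE is mandatory
    default_threshold: float = 0.5
    persist_seconds: float = PERSIST_SECONDS
    # (camera_id, track_id, violation_class) -> time the streak started
    _first_seen: dict = field(default_factory=dict)

    def update(self, camera_id, track_id, violation_class,
               zone_id, confidence, timestamp) -> bool:
        """Feed one detection; return True once the violation has persisted."""
        key = (camera_id, track_id, violation_class)
        threshold = self.thresholds.get(
            (camera_id, violation_class), self.default_threshold)

        # Zone gating and confidence threshold: a failing frame resets the streak.
        if zone_id not in self.mandatory_zones or confidence < threshold:
            self._first_seen.pop(key, None)
            return False

        # Temporal smoothing: alert only after an unbroken streak.
        start = self._first_seen.setdefault(key, timestamp)
        return timestamp - start >= self.persist_seconds


# Hypothetical configuration for illustration only.
alert_filter = AlertFilter(
    thresholds={("cam1", "no_hard_hat"): 0.6},
    mandatory_zones={"scaffold"},
)
for t in (0.0, 1.0, 1.5):
    fired = alert_filter.update("cam1", 7, "no_hard_hat", "scaffold", 0.9, t)
    print(t, fired)
```

As written, `update` keeps returning True on every frame after the window is met, so a caller would still need to deduplicate notifications per track.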
For pricing tied to class count, see PPE detection cost. For sensor placement, see CCTV integration.