I&T Wish - Virtual Construction Safety Assistant - based on Multimodal Large Language Model embedded with construction safety knowledge 2025-05-21
I&T Wish
Virtual Construction Safety Assistant - based on Multimodal Large Language Model embedded with construction safety knowledge (REF: W-0571)
Summary and Challenges
This project will develop a Virtual Construction Safety Assistant (VCSA) based on a Multimodal Large Language Model (MLLM) fine-tuned for "construction works" and "safety." The VCSA will provide real-time safety monitoring on construction sites, using semantic reasoning to detect and record a broader range of safety rule violations that require high-level reasoning. It aims to surpass conventional computer vision methods, which suffer from the following limitations:
Inflexible toward diverse regulatory safety standards: Current AI models are trained on static scenarios, so they are limited to predefined tasks and cannot adapt to evolving site conditions;
Ineffective computational resource utilization: Cloud-only computing suffers from high computational costs and latency in critical incident alerting, while edge-only computing lacks analytical capacity for complex reasoning-intensive analyses;
Reactive risk identification without causal reasoning: Existing systems flag safety violations only superficially and lack the capability for deeper root-cause analysis and actionable mitigation planning tailored to site-specific metadata;
Lacking contextual awareness and refinement capability: Existing anomaly detection models excel at spotting obvious violations but struggle with ambiguous edge cases and rely heavily on manual corrections that involve tedious data labeling.
Expected Outcome
By automatically identifying safety violations and issuing timely alerts, the VCSA aims to enhance worker well-being and optimize resource allocation for safety. It will be integrated as a software plugin into CCTV monitoring systems and alerting devices, ensuring comprehensive safety management on-site.
Technologies shall include, but are not limited to, the following:
Large-small language model co-adapter frameworks, with a coefficient-of-variation-based difficulty grading mechanism to stratify training datasets by sample complexity, and with multi-stage curriculum learning strategies to progressively optimize the AI models toward advanced loss functions tailor-made for construction safety applications;
Hybrid edge-cloud computing paradigms with staged analyses based on criticality-latency trade-off, for cost-effective resource utilization and timely incident reporting to safety managers;
Integration of chain-of-thought prompting with agentic retrieval-augmented generation frameworks, for transforming risk identification into root cause reasoning and actionable mitigation planning, dynamically augmented by site-specific metadata (e.g. historical incidents, evolving project requirements or site conditions);
Multi-stage time-series analytical frameworks for automated video-based anomaly detection on high-confidence tasks, integrated with continual feedback-and-correction mechanisms for low-confidence edge cases, to refine the AI models’ site-specific knowledge for enhanced scene-adaptive generalization and contextual awareness.
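Two of the mechanisms listed above, difficulty grading by coefficient of variation and staged edge-cloud handling of detections, can be illustrated with a minimal Python sketch. This is not the project's implementation, and every name, threshold, and routing label in it is a hypothetical assumption made for illustration. In particular, the listing does not say what the coefficient of variation is computed over; the sketch assumes it is taken over repeated confidence scores for the same training sample, so that samples with unstable scores are graded as harder.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """Return standard deviation divided by mean (0.0 when the mean is 0)."""
    m = mean(scores)
    return pstdev(scores) / m if m else 0.0


def grade_difficulty(samples, easy_max=0.1, hard_min=0.3):
    """Stratify samples into curriculum stages by coefficient of variation.

    `samples` maps a sample id to a list of confidence scores (assumed to
    come from repeated inference passes). The thresholds are illustrative.
    """
    stages = {"easy": [], "medium": [], "hard": []}
    for sample_id, scores in samples.items():
        cv = coefficient_of_variation(scores)
        if cv <= easy_max:
            stage = "easy"
        elif cv < hard_min:
            stage = "medium"
        else:
            stage = "hard"
        stages[stage].append(sample_id)
    return stages


def route_detection(confidence, critical, alert_threshold=0.8):
    """Choose a handling path for one detection in a staged pipeline.

    High-confidence detections are handled on the edge device (an immediate
    alert if critical, otherwise a log entry). Low-confidence detections are
    deferred to the cloud for deeper analysis and human correction.
    """
    if confidence >= alert_threshold:
        return "edge_alert" if critical else "edge_log"
    return "cloud_review"
```

In a curriculum learning setup, the "easy" stage would be used for training first and the "hard" stage last. Detections routed to "cloud_review" correspond to the low-confidence edge cases whose corrections would feed back into model refinement.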