Page 527 - AI for Good Innovate for Impact
(continued)
Item Details
Model Training and Fine-Tuning
Data Collection and Prep: In the early stage, 100 GB of on-site surveillance footage was collected, including video of simulated smoke releases on site. After the large model was deployed, the collected data was screened and scenes the model had misidentified were added to the training data, so that the model and the data were iterated together.
Model Architecture and Training: NebulaCV-grounding 2.0, a transformer-based computer vision (CV) large model from ZTE, is an open-set object detection model. In addition to conventional text-prompt detection, it supports interactive visual-prompt detection, which helps with detecting long-tail object categories. The model consists of four core modules: a text prompt encoder, a visual prompt encoder, an image encoder, and an encoder-decoder layer. The text prompt encoder lets users specify detection categories through text descriptions, while the visual prompt encoder lets users mark detection targets interactively on images, making the model more flexible and practical. Grounding 2.0 also incorporates a Mixture of Experts (MoE) architecture in the decoder, improving accuracy while maintaining inference speed. The model is trained on over 2 billion general samples and 60 million industry-specific samples to strengthen its understanding of specialized scenarios.
Validation (Performance Metrics) and Fine-Tuning: Accuracy is verified through on-site practical testing. Real-world data is then used for reinforcement fine-tuning, which optimizes the large model through reward-driven training cycles to improve its reasoning and generalization on specific tasks or domains.
Testbeds or Pilot Deployments
The solution has been deployed and tested at Taicang Port in Jiangsu, China, in a total of 65 tests: 30 smoke videos and 35 non-smoke videos, covering daytime, nighttime, rain, and fog. In these deployment tests, the algorithm's accuracy exceeded 90%, with a false alarm rate below 5% and a missed detection rate below 1%, meeting the requirements of the business scenario.
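The data iteration described in the Model Training and Fine-Tuning row can be sketched as a simple hard-example mining step. In that step, on-site results are screened and misidentified samples are fed back into training. This is an assumed illustration only. The report does not say how screening was performed, and `FieldSample`, `TrainingSet`, `screen_and_merge`, and the label strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FieldSample:
    clip_id: str
    predicted: str  # label the deployed model produced on site
    verified: str   # label confirmed during data screening


@dataclass
class TrainingSet:
    # (clip_id, label) pairs used for the next training round
    samples: List[Tuple[str, str]] = field(default_factory=list)


def screen_and_merge(field_samples: List[FieldSample], training_set: TrainingSet) -> int:
    """Append misidentified on-site clips, with corrected labels, to the training set."""
    hard_cases = [s for s in field_samples if s.predicted != s.verified]
    training_set.samples.extend((s.clip_id, s.verified) for s in hard_cases)
    return len(hard_cases)  # number of new hard examples added
```

Each retraining round would start from the enlarged set. The redeployed model would then produce the next batch of field samples, giving the model-and-data loop the row describes.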
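The report does not describe the internals of the MoE decoder. The sketch below is a generic illustration, not ZTE's implementation. A top-k gated Mixture of Experts layer routes each token to a few expert networks and mixes their outputs with softmax weights. Because of this routing, expert compute per token grows with k rather than with the total number of experts, which is the usual reason MoE can add capacity while keeping inference speed roughly constant. The class `TopKMoE` and all sizes are hypothetical, and NumPy is used for brevity.

```python
import numpy as np


class TopKMoE:
    """Minimal Mixture-of-Experts layer with top-k gating (illustrative only)."""

    def __init__(self, d_model, d_hidden, num_experts, k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        # Router: produces one logit per expert for each token.
        self.w_gate = rng.normal(0, 0.02, (d_model, num_experts))
        # Each expert is a two-layer MLP with a ReLU.
        self.w1 = rng.normal(0, 0.02, (num_experts, d_model, d_hidden))
        self.w2 = rng.normal(0, 0.02, (num_experts, d_hidden, d_model))

    def route(self, x):
        """Return indices and normalized weights of the top-k experts per token."""
        logits = x @ self.w_gate                          # (tokens, experts)
        top = np.argsort(logits, axis=-1)[:, -self.k:]    # k highest-scoring experts
        top_logits = np.take_along_axis(logits, top, axis=-1)
        weights = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the chosen k
        return top, weights

    def __call__(self, x):
        top, weights = self.route(x)
        out = np.zeros_like(x)
        for t in range(x.shape[0]):        # each token
            for j in range(self.k):        # only the k selected experts run
                e = top[t, j]
                h = np.maximum(x[t] @ self.w1[e], 0.0)
                out[t] += weights[t, j] * (h @ self.w2[e])
        return out
```

For example, `TopKMoE(d_model=16, d_hidden=32, num_experts=8, k=2)` applied to a `(4, 16)` array returns a `(4, 16)` array. Only 2 of the 8 experts run for each token.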
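The report gives the test split and the headline rates but not how the rates are computed. The arithmetic can be checked with a few lines of Python under two assumptions the source does not state. The first is per-video counting. The second is the usual definitions: false alarm rate as false alarms over non-smoke cases, and missed detection rate as misses over smoke cases. The names `Confusion`, `detection_rates`, and `max_errors_below` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # smoke videos correctly flagged
    fn: int  # smoke videos missed
    fp: int  # non-smoke videos wrongly flagged (false alarms)
    tn: int  # non-smoke videos correctly left unflagged


def detection_rates(c: Confusion) -> dict:
    total = c.tp + c.fn + c.fp + c.tn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "false_alarm_rate": c.fp / (c.fp + c.tn),       # share of negatives flagged
        "missed_detection_rate": c.fn / (c.tp + c.fn),  # share of positives missed
    }


def max_errors_below(rate_limit: float, n: int) -> int:
    """Largest error count k such that k / n stays strictly below rate_limit."""
    k = 0
    while (k + 1) / n < rate_limit:
        k += 1
    return k


print(max_errors_below(0.05, 35))  # 1
print(max_errors_below(0.01, 30))  # 0
```

Under those assumptions, 35 non-smoke videos and a false alarm rate below 5% allow at most one false alarm (1/35 ≈ 2.9%). Likewise, 30 smoke videos and a missed detection rate below 1% allow no misses at all (1/30 ≈ 3.3%). If the rates were computed per frame or per event instead, these bounds would not apply.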
2 Use Case Description
2.1 Description
This case uses the ZTE Nebula visual large model to address the high safety risks of hazardous chemical operations and the inefficiency of manual monitoring at ports. By combining a small deep learning model at the edge with the cloud-based Nebula large model in a hybrid architecture [1], the project supports 18 intelligent scenarios across four functional domains: personnel management, operational compliance, vehicle tracking, and safety inspections. A unified port safety management platform has been developed to provide real-time video analytics, predictive risk alerts, and automated incident forensics.
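The description does not say how work is split between the edge model and the cloud model. One common pattern for hybrid setups like this is a confidence cascade, and this sketch assumes that pattern. The edge model decides clear-cut frames locally and forwards only ambiguous ones to the cloud model for confirmation. The `Detection` type, the `hybrid_detect` function, and the `low`/`high` thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Detection:
    label: str    # e.g. "smoke" (hypothetical label set)
    score: float  # model confidence in [0, 1]


def hybrid_detect(
    frame: object,
    edge_model: Callable[[object], Detection],
    cloud_model: Callable[[object, Detection], Optional[Detection]],
    low: float = 0.3,   # hypothetical: below this, the edge result is discarded
    high: float = 0.8,  # hypothetical: at or above this, the edge result is trusted
) -> Tuple[Optional[Detection], str]:
    """Return (detection or None, tier that made the decision)."""
    det = edge_model(frame)
    if det.score >= high:
        return det, "edge"   # confident positive: alert without a cloud round trip
    if det.score < low:
        return None, "edge"  # confident negative: drop locally
    # Ambiguous case: escalate to the larger cloud model for confirmation.
    return cloud_model(frame, det), "cloud"
```

This design keeps most frames on the edge device and spends cloud inference only where the small model is uncertain. In a real deployment, the thresholds would have to be tuned against field data.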

