Summary

Monitoring ML/AI refers to the continuous, real-time tracking and observation of an ML/AI system in production environments. Monitoring evaluates the performance of an ML/AI model to determine whether it operates effectively. When the model's performance degrades, appropriate maintenance measures should be taken to restore it.

ML/AI models are trained on historical data and on assumptions about the operational environment. However, the environment is dynamic. These dynamics can lead to model degradation, that is, a decline in predictive accuracy or decision quality over time, caused by phenomena such as data drift and concept drift. Therefore, the environment and the running state of an ML/AI model should be monitored in order to determine whether the model should be updated.
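
As an illustration of how data drift might be monitored, the following is a minimal sketch and not a method specified by this Recommendation. It assumes a single numeric input feature, compares the values seen at training time with values observed in production using the two-sample Kolmogorov-Smirnov statistic, and flags drift when the statistic exceeds a threshold. The function names and the threshold value of 0.2 are hypothetical and chosen for illustration only.

```python
import bisect
import random


def ks_statistic(reference, live):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples, a value between 0.0 (identical) and 1.0 (disjoint).
    """
    ref = sorted(reference)
    cur = sorted(live)
    max_gap = 0.0
    # The largest gap between two step functions occurs at a sample point.
    for x in ref + cur:
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def drift_detected(reference, live, threshold=0.2):
    """Flag drift when the KS statistic exceeds an assumed threshold."""
    return ks_statistic(reference, live) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: training-time feature values and two production windows.
    training = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    stable_window = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    shifted_window = [rng.gauss(1.5, 1.0) for _ in range(1000)]

    print("stable window drift:", drift_detected(training, stable_window))
    print("shifted window drift:", drift_detected(training, shifted_window))
```

A check of this kind covers only data drift in one feature. Concept drift, where the relationship between inputs and outputs changes, generally requires comparing predictions against ground-truth labels once they become available.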

To address the issues that cause performance degradation, a set of parameters and events should be defined and monitored. Choosing appropriate monitoring strategies based on the specific use case, data characteristics, and business requirements is therefore a critical step in ensuring the long-term reliability and effectiveness of ML/AI systems.

Recommendation ITU-T Q.4081 provides guidance and a reference on methods and metrics for monitoring ML/AI.