Page 217 - AI for Good Innovate for Impact
4.2-Climate Change
Use Case 8: AI for green multi-cluster: Intelligent management towards green and low-carbon, large-scale multi-clusters
Organization: Institute of Artificial Intelligence (TeleAI), China Telecom
Country: China
Contact Person:
Primary: Qizhen Weng, wengqzh@chinatelecom.cn
Secondary: Yuankai Fan, fanyk1@chinatelecom.cn
1 Use Case Summary Table
Item: Details
Category: Climate Change/Natural Disaster
Problem Addressed: Using AI to efficiently manage computing resources and schedule jobs across multiple large-scale AI clusters. This AI-driven approach aims to reduce energy consumption and carbon emissions, surpasses traditional rule-based multi-cluster management methods in adaptability and effectiveness, and can explain its decisions to humans via multi-modal generative models.
Key Aspects of Solution:
• AI-powered job scheduling that improves efficiency through real-time multi-dimensional resource monitoring.
• Employing GPU multiplexing and shared resource scheduling algorithms to enhance cluster utilization.
• Cross-cluster coordination via multiple AI agents towards overall green and low-carbon goals.
• Employing multi-modal generative large models to provide interpretable explanations of scheduling decisions, enhancing transparency for end-users.
Technology Keywords: Job Scheduling, Multi-Cluster Management, Multi-Modal Generative Models, GPU Multiplexing, Kubernetes-based Cluster Coordination
Data Availability: Private; Public
1) https://github.com/alibaba/clusterdata [1]
2) https://github.com/InternLM/AcmeTrace [2]
3) https://github.com/ml-energy/zeus [3]
Metadata (Type of Data): Text data (job resource requirements and performance logs, energy consumption metrics, energy source information, etc.) and images (cluster metrics represented through graphs)
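To make the idea of carbon-aware scheduling across clusters concrete, the following is a minimal Python sketch. It is not the method used in this use case, which relies on AI agents. It substitutes a simple greedy heuristic, and every cluster name, carbon intensity, and power figure is a hypothetical value chosen for illustration. Fractional GPU demands stand in, very loosely, for GPU multiplexing.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str                 # hypothetical cluster identifier
    free_gpus: float          # free GPU capacity; fractions model a shared GPU
    carbon_intensity: float   # assumed grid carbon intensity, gCO2 per kWh
    kw_per_gpu: float         # assumed average power draw per GPU, in kW


@dataclass
class Job:
    name: str
    gpus: float               # GPU demand; 0.5 means half of one shared GPU
    hours: float              # expected runtime in hours


def estimated_emissions(job: Job, cluster: Cluster) -> float:
    """Estimated gCO2: GPUs * kW per GPU * hours * gCO2 per kWh."""
    return job.gpus * cluster.kw_per_gpu * job.hours * cluster.carbon_intensity


def place_jobs(jobs, clusters):
    """Greedily send each job to the feasible cluster with the lowest estimate.

    Returns a dict mapping job name to cluster name, or None if nothing fits.
    """
    placement = {}
    # Place the largest jobs (in GPU-hours) first so small jobs do not crowd them out.
    for job in sorted(jobs, key=lambda j: j.gpus * j.hours, reverse=True):
        feasible = [c for c in clusters if c.free_gpus >= job.gpus]
        if not feasible:
            placement[job.name] = None
            continue
        best = min(feasible, key=lambda c: estimated_emissions(job, c))
        best.free_gpus -= job.gpus  # reserve the capacity on the chosen cluster
        placement[job.name] = best.name
    return placement


# Illustrative inputs only; none of these values come from the use case.
clusters = [
    Cluster("cluster-hydro", free_gpus=8, carbon_intensity=50, kw_per_gpu=0.4),
    Cluster("cluster-coal", free_gpus=64, carbon_intensity=700, kw_per_gpu=0.4),
]
jobs = [
    Job("train-large", gpus=8, hours=10),
    Job("finetune", gpus=2, hours=3),
    Job("inference", gpus=0.5, hours=24),
]
placement = place_jobs(jobs, clusters)
print(placement)
```

In this toy setup the largest job fills the low-carbon cluster and the remaining jobs fall back to the higher-carbon one. A real scheduler would presumably also weigh queueing delay, data locality, and time-varying energy sources.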