               (continued)

                Item                        Details
                Data Availability           Private. However, the dataset is planned to be gradually
                                            opened up in the future, with the specific release
                                            schedule yet to be determined.
                Metadata (Type of Data)     Text Data, Image Data, Multimodal Data (Image-Text
                                            Pairs), Time Series Data
                Model Training and          AstroOne is trained on large-scale, high-quality
                Fine-Tuning                 astronomical datasets. On top of a strong language
                                            foundation model, we apply a multi-stage fine-tuning
                                            process including Continual Pretraining (CPT),
                                            Supervised Fine-Tuning (SFT), and Direct Preference
                                            Optimization (DPO) to enhance domain-specific
                                            capabilities in astronomy. In parallel, we build
                                            dedicated foundation models for solar imagery and
                                            time-domain observations using self-supervised
                                            learning. These models are then aligned with the
                                            language model using CLIP-style contrastive learning to
                                            form a unified multimodal system (minimal sketches of
                                            the DPO and contrastive-alignment objectives follow
                                            this table). To ensure practical deployment and
                                            inclusive application, the training pipeline also
                                            includes techniques such as knowledge distillation,
                                            bias mitigation for fairness, and multilingual
                                            enhancement to support broader accessibility.
                Testbeds or Pilot           [1]
                Deployments
                Code repositories           AstroOne has been made available for free use globally.
                                            The model is also planned to be open-sourced on GitHub
                                            in the future, as part of the Zhejiang Lab's Open
                                            Science Initiative.
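
               The report does not publish AstroOne's training code. As an illustration of the
               Direct Preference Optimization stage named above, the following is a minimal
               PyTorch sketch of the standard DPO objective, computed from per-sequence
               log-probabilities under the policy and a frozen reference model; all function
               and variable names here are hypothetical, not AstroOne's actual code.

                   import torch
                   import torch.nn.functional as F

                   def dpo_loss(policy_chosen_logps: torch.Tensor,
                                policy_rejected_logps: torch.Tensor,
                                ref_chosen_logps: torch.Tensor,
                                ref_rejected_logps: torch.Tensor,
                                beta: float = 0.1) -> torch.Tensor:
                       """Standard DPO objective (illustrative sketch): push the policy
                       to prefer the chosen answer over the rejected one, relative to
                       a frozen reference model. Each argument is a 1-D tensor of
                       summed log-probabilities, one entry per (prompt, answer) pair.
                       """
                       # Log-ratio of policy to reference for each answer.
                       chosen_logratio = policy_chosen_logps - ref_chosen_logps
                       rejected_logratio = policy_rejected_logps - ref_rejected_logps
                       # -log sigmoid(beta * margin between the two log-ratios).
                       logits = beta * (chosen_logratio - rejected_logratio)
                       return -F.logsigmoid(logits).mean()

                   # Toy usage: random log-probabilities for 4 preference pairs.
                   loss = dpo_loss(torch.randn(4), torch.randn(4),
                                   torch.randn(4), torch.randn(4))
                   print(loss.item())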
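
               Similarly, the CLIP-style contrastive alignment between the solar/time-domain
               encoders and the language model can be sketched as a symmetric InfoNCE loss
               over matched embedding pairs. This is a generic sketch of the technique, not
               AstroOne's implementation; batch size, embedding dimension, and temperature
               below are placeholder values.

                   import torch
                   import torch.nn.functional as F

                   def clip_contrastive_loss(image_emb: torch.Tensor,
                                             text_emb: torch.Tensor,
                                             temperature: float = 0.07) -> torch.Tensor:
                       """Symmetric InfoNCE loss used in CLIP-style alignment.
                       image_emb, text_emb: (batch, dim) embeddings where row i of
                       each tensor describes the same observation.
                       """
                       # L2-normalize so the dot product is a cosine similarity.
                       image_emb = F.normalize(image_emb, dim=-1)
                       text_emb = F.normalize(text_emb, dim=-1)
                       # Pairwise similarities; diagonal entries are the true pairs.
                       logits = image_emb @ text_emb.t() / temperature
                       targets = torch.arange(logits.size(0), device=logits.device)
                       # Cross-entropy in both directions (image->text, text->image).
                       loss_i2t = F.cross_entropy(logits, targets)
                       loss_t2i = F.cross_entropy(logits.t(), targets)
                       return (loss_i2t + loss_t2i) / 2

                   # Toy usage: 8 matched pairs of 512-dimensional embeddings.
                   loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
                   print(loss.item())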


               2      Use Case Description


               2.1    Description

               Humanity's pursuit of high-quality, sustainable development not only requires further
               exploration of space to discover exploitable extraterrestrial resources but also
               necessitates precise predictions of various celestial activities, such as those of the
               Sun, to avoid potential disaster impacts. However, astronomical research faces
               challenges such as the diverse and complex modalities of data and high barriers to
               entry. Current astronomical research is undergoing a paradigm shift from
               hypothesis-driven to data-driven approaches, and artificial intelligence has become a
               key enabler for exploring the universe.[4][2] Therefore, we developed the AstroOne
               series of models based on Qwen2.5.[3]
               AstroOne maintains its question-answering dataset in the following ways (a minimal
               retrieval sketch follows this list):
               1) Pre-training question-answering data for general large models: used for
                  foundational language capability learning.
               2) Retrieval-Augmented Generation (RAG) knowledge base data: provides specific
                  knowledge support in the field of astronomy.
               3) Real-time internet search data: captures
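
               The report gives no implementation details for the retrieval layer; the following
               is a minimal, hypothetical sketch of how a RAG knowledge base (item 2 above)
               might be queried with dense embeddings before prompting the language model.
               Everything here, including the embed() stub and the sample documents, is
               illustrative rather than AstroOne's actual pipeline.

                   import numpy as np

                   def embed(text: str) -> np.ndarray:
                       """Stub embedding function; a real system would call a sentence
                       encoder. Here we simply hash bytes into a fixed-size vector."""
                       vec = np.zeros(64)
                       for i, ch in enumerate(text.encode("utf-8")):
                           vec[i % 64] += ch
                       return vec / (np.linalg.norm(vec) + 1e-8)

                   # Tiny in-memory "knowledge base" of astronomy snippets.
                   documents = [
                       "Solar flares are sudden releases of magnetic energy on the Sun.",
                       "Time-domain astronomy studies how objects vary over time.",
                       "Qwen2.5 is a family of large language models.",
                   ]
                   doc_vectors = np.stack([embed(d) for d in documents])

                   def retrieve(query: str, k: int = 2) -> list:
                       """Return the k documents most similar to the query embedding."""
                       scores = doc_vectors @ embed(query)
                       top = np.argsort(scores)[::-1][:k]
                       return [documents[i] for i in top]

                   # Retrieved passages would be prepended to the model prompt.
                   context = "\n".join(retrieve("What causes solar flares?"))
                   prompt = f"Context:\n{context}\n\nQuestion: What causes solar flares?"
                   print(prompt)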


               [2] Zhang, C., Shen, Y., Zhou, Y., & Tang, J. AstroCLIP: Contrastive Language-Image
                   Pretraining for Astronomy. arXiv preprint arXiv:2311.15756, 2023.
               [3] An Yang, Baosong Yang, Beichen Zhang, et al. Qwen2.5 Technical Report. arXiv
                   preprint arXiv:2412.15115, 2024.


