Page 705 - AI for Good Innovate for Impact
(continued)
Item | Details
Data Availability | Private. However, the dataset is planned to be opened up gradually in the future; the specific release schedule is yet to be determined.
Metadata (Type of Data) | Text Data, Image Data, Multimodal Data (Image-Text Pairs), Time Series Data
Model Training and Fine-Tuning | AstroOne is trained on large-scale, high-quality astronomical datasets. On top of a strong language foundation model, we apply a multi-stage fine-tuning process, comprising Continual Pretraining (CPT), Supervised Fine-Tuning (SFT), and Direct Preference Optimization (DPO), to enhance domain-specific capabilities in astronomy.
In parallel, we build dedicated foundation models for solar imagery and time-domain observations using self-supervised learning. These models are then aligned with the language model through CLIP-style contrastive learning to form a unified multimodal system. To support practical deployment and inclusive application, the training pipeline also incorporates knowledge distillation, bias mitigation for fairness, and multilingual enhancement for broader accessibility.
Testbeds or Pilot Deployments | [1]
Code Repositories | AstroOne has been made freely available for use worldwide. The model is also planned to be open-sourced on GitHub in the future as part of Zhejiang Lab's Open Science Initiative.
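The table lists Direct Preference Optimization (DPO) as the final fine-tuning stage but gives no loss formulation or hyperparameters. As a minimal sketch, assuming the standard DPO objective computed from summed response log-probabilities under the policy being trained and a frozen reference model, the per-pair loss can be written as follows. The function names and the `beta=0.1` default are illustrative assumptions, not AstroOne's published settings.

```python
import math


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)), i.e. log(1 + exp(-x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    # For negative x, factor out exp(-x) so exp() cannot overflow.
    return -x + math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss (hypothetical helper; beta is an assumed default)."""
    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the chosen response's ratio rises above the rejected one's.
    return neg_log_sigmoid(beta * (chosen_ratio - rejected_ratio))


def mean_dpo_loss(batch, beta: float = 0.1) -> float:
    """Average DPO loss over (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples."""
    return sum(dpo_loss(*row, beta=beta) for row in batch) / len(batch)


if __name__ == "__main__":
    # Illustrative numbers only, not AstroOne data.
    batch = [(-10.0, -12.0, -11.0, -11.0), (-8.0, -9.0, -8.5, -8.5)]
    print(f"mean DPO loss: {mean_dpo_loss(batch):.4f}")
```

When the policy and reference assign identical log-probabilities, the loss equals log 2; it drops below that value only as the policy shifts probability toward the preferred response relative to the reference.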
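The table states that the solar-imagery and time-domain models are aligned with the language model through CLIP-style contrastive learning, without specifying encoders, projection heads, or temperature. A minimal sketch of the symmetric contrastive (InfoNCE) loss used in CLIP-style training is shown below, assuming each encoder has already produced embeddings in a shared space and that item i of the image batch is paired with item i of the text batch. The helper names and the temperature of 0.07 are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def log_softmax(row: Vector) -> Vector:
    """Log-softmax with max subtraction for numerical stability."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_loss(image_emb: List[Vector], text_emb: List[Vector],
              temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss; image_emb[i] and text_emb[i] form a matching pair."""
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, divided by the temperature.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text direction: each row should concentrate on its diagonal entry.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: the same criterion applied to the columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2
```

Averaging both directions trains the shared space for image-to-text and text-to-image matching at once. A production version would run the same computation on batched tensors, and CLIP-style setups often learn the temperature rather than fixing it.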
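Knowledge distillation is listed among the deployment-oriented techniques, but the source does not say which teacher and student models or which objective are used. Assuming the common temperature-softened formulation, in which a smaller student is trained to match the teacher's output distribution (for a language model, over the vocabulary at each token position), a per-position loss can be sketched as follows. The function names and `temperature=2.0` are hypothetical choices for illustration.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float) -> List[float]:
    """Temperature-scaled log-softmax, stabilized by subtracting the max."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    # KL divergence computed in log space, so no log(0) can occur.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2
```

A higher temperature flattens both distributions, so the student also learns from the relative scores the teacher gives to non-top tokens; in practice this term is usually mixed with the ordinary cross-entropy loss on ground-truth labels.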
2 Use Case Description
2.1 Description
Humanity's pursuit of high-quality, sustainable development requires not only further exploration of space to discover exploitable extraterrestrial resources, but also precise prediction of celestial activity, such as that of the Sun, to mitigate potential disasters. However, astronomical research faces challenges such as diverse and complex data modalities and high barriers to entry. Current astronomical research is undergoing a paradigm shift from hypothesis-driven to data-driven approaches, and artificial intelligence has become a key enabler for exploring the universe.[4][2] We therefore developed the AstroOne series of models based on Qwen2.5.[3]
AstroOne maintains its question-answering data from the following sources:
1) Pre-training question-answering data for general large models: Used for learning foundational language capabilities.
2) Retrieval-augmented generation (RAG) knowledge-base data: Provides specialized knowledge support in astronomy.
3) Real-time internet search data: Captures
[2] Zhang, C., Shen, Y., Zhou, Y., & Tang, J. AstroCLIP: Contrastive Language-Image Pretraining for Astronomy. arXiv preprint arXiv:2311.15756, 2023.
[3] Yang, A., Yang, B., Zhang, B., et al. Qwen2.5 Technical Report. arXiv preprint arXiv:2412.15115, 2024.

