AI for Good Innovate for Impact



and its performance is on par with a 671B-parameter DeepSeek [10] model, despite having only 72B parameters. AstroOne significantly outperforms the peer astronomy-domain model AstroLLaMA 70B [11][12] on the GPQA-astro [13] (Graduate-Level Google-Proof Q&A Benchmark – Astronomy Subset) and NAOC-TianYi test sets. [4][5][14]

















               The score is calculated as the proportion of correctly answered questions (accuracy). The same
               metric is used in subsequent figures.
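The scoring rule described above can be sketched in a few lines of Python (a minimal illustration; the function name and the sample answers are hypothetical, not taken from the report):

```python
def accuracy(predictions, answers):
    """Score a test set as the proportion of correctly answered questions."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Hypothetical example: 3 of 4 questions answered correctly.
print(accuracy(["A", "C", "B", "D"], ["A", "C", "B", "A"]))  # 0.75
```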




















AstroOne possesses leading astronomical data analysis capabilities. For instance, its stellar-spectrum model can efficiently predict dozens of stellar parameters. It has identified over 8,000 candidates for extremely metal-poor stars, known as messengers of the early universe, whereas only around 50 had previously been identified by human researchers.[8]
(2)  It also offers comprehensive capabilities for multimodal data integration,
     modeling, and prediction. For instance, the solar model currently under development
     leverages large-scale multimodal observational datasets, including documentation,
     spectral measurements, imagery, and time-series recordings, to build a multimodal,
     dynamically evolving foundation model. This model enables diverse downstream
     applications such as full-disk magnetic-field forecasting at hourly and daily scales,
     Carrington-rotation-level predictions, and imputation of missing observations. By



[10] DeepSeek's R1 is a large language model with strong reasoning skills, https://huggingface.co/deepseek-ai/DeepSeek-R1
[11] Tijmen de Haan, Yuan-Sen Ting, Tirthankar Ghosal, et al., AstroMLab 4: Benchmark-Topping Performance in Astronomy Q&A with a 70B-Parameter Domain-Specialized Reasoning Model, arXiv preprint arXiv:2505.17592, 2025.
[12] AstroMLab's latest Llama-based model for the astronomy domain, https://huggingface.co/AstroMLab/AstroSage-70B
[13] A graduate-level question answering dataset covering a wide range of scientific and general knowledge domains, https://huggingface.co/datasets/Idavidrein/gpqa
[14] Wu, J., Liu, P., Deng, J., et al., InternVL-Chat: Vision-Language Foundation Model for Science with Scientific Multimodal Instruction Tuning, arXiv preprint arXiv:2312.01908, 2023.


