AI for Good Innovate for Impact



and its performance is on par with a 671B-parameter DeepSeek [10] model, despite having only 72B parameters. AstroOne significantly outperforms the peer astronomy-domain model AstroLLaMA 70B [11][12] on the GPQA-astro [13] (Graduate-Level Google-Proof Q&A Benchmark – Astronomy Subset) and NAOC-TianYi test sets. [4][5][14]

















               The score is calculated as the proportion of correctly answered questions (accuracy). The same
               metric is used in subsequent figures.
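The scoring rule described above can be sketched in a few lines of Python (a minimal illustration; the function name and the sample answers are hypothetical, not taken from the report):

```python
def accuracy(predictions, answers):
    """Score a test set as the proportion of correctly answered questions."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Hypothetical example: 3 of 4 questions answered correctly.
print(accuracy(["A", "C", "B", "D"], ["A", "C", "B", "A"]))  # 0.75
```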




















AstroOne possesses leading astronomical data analysis capabilities. For instance, its stellar-spectrum model can efficiently predict dozens of stellar parameters. It has identified over 8,000 candidates for extremely metal-poor stars, known as messengers of the early universe, whereas only around 50 had previously been identified by human researchers.[8]
(2)  It also offers comprehensive capabilities for multimodal data integration,
     modeling, and prediction. For instance, the solar model currently under development
     leverages large-scale multimodal observational datasets, including documentation,
     spectral measurements, imagery, and time-series recordings, to build a multimodal,
     dynamically evolving foundation model. This model enables diverse downstream
     applications such as full-disk magnetic-field forecasting at hourly and daily scales,
     Carrington-rotation-level predictions, and imputation of missing observations. By



[10] DeepSeek's R1 is a large language model with strong reasoning skills, https://huggingface.co/deepseek-ai/DeepSeek-R1
[11] Tijmen de Haan, Yuan-Sen Ting, Tirthankar Ghosal, et al., AstroMLab 4: Benchmark-Topping Performance in Astronomy Q&A with a 70B-Parameter Domain-Specialized Reasoning Model, arXiv preprint arXiv:2505.17592, 2025.
[12] AstroMLab's latest Llama-based model for the astronomy domain, https://huggingface.co/AstroMLab/AstroSage-70B
[13] A graduate-level question answering dataset covering a wide range of scientific and general knowledge domains, https://huggingface.co/datasets/Idavidrein/gpqa
[14] Wu, J., Liu, P., Deng, J., et al., InternVL-Chat: Vision-Language Foundation Model for Science with Scientific Multimodal Instruction Tuning, arXiv preprint arXiv:2312.01908, 2023.


