
Deep learning methods are considered promising for accelerating molecular screening in drug discovery and material design. Because labelled data are scarce, various self-supervised molecular pre-training methods have been proposed. Although many existing methods adopt pre-training tasks common in computer vision and natural language processing, they often overlook the fundamental physical principles governing molecules. In contrast, denoising in pre-training can be interpreted as equivalent to force learning, but the limited noise distribution introduces bias into the molecular distribution. To address this issue, we introduce a molecular pre-training framework called fractional denoising, which decouples noise design from the constraints imposed by force learning equivalence. The noise thus becomes customizable, allowing chemical priors to be incorporated to substantially improve molecular distribution modelling. Experiments demonstrate that our framework consistently outperforms existing methods, establishing state-of-the-art results across force prediction, quantum chemical properties and binding affinity tasks. The refined noise design enhances force accuracy and sampling coverage, which contribute to physically consistent molecular representations and ultimately lead to superior predictive performance.
Publication:
Nature Machine Intelligence
http://dx.doi.org/10.1038/s42256-024-00900-z
Authors:
Yuyan Ni
Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
University of Chinese Academy of Sciences, Beijing, China
These authors contributed equally: Yuyan Ni
Corresponding author (Yanyan Lan) e-mail: lanyanyan@air.tsinghua.edu.cn
Shikun Feng
Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China
These authors contributed equally: Shikun Feng
Xin Hong
Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China
Yuancheng Sun
University of Chinese Academy of Sciences, Beijing, China
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Beijing Academy of Artificial Intelligence, Beijing, China
Wei-Ying Ma
Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China
Zhi-Ming Ma
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
University of Chinese Academy of Sciences, Beijing, China
Qiwei Ye
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yanyan Lan
Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China
Beijing Academy of Artificial Intelligence, Beijing, China