MaterialsAtlas Model
A Large Encoder-Decoder Family of Foundation Models for Chemical Language
chemical language models, foundation models, encoder-decoder, cheminformatics, SMILES, property prediction, machine learning, self-supervised learning
This paper introduces a large encoder-decoder chemical foundation model pre-trained on 91 million SMILES samples (4 billion molecular tokens) from PubChem. The model supports complex downstream tasks such as quantum property prediction and is released in two variants, with 289M and 8×289M parameters. Experiments report state-of-the-art performance on the evaluated benchmarks and show that the model's latent space is separable and supports few-shot learning.
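The pre-training figures imply roughly 44 tokens per molecule on average (4 billion tokens / 91 million SMILES ≈ 43.96). To make "molecular tokens" concrete, here is a minimal sketch that splits a SMILES string into atom- and bond-level tokens using a regular expression widely used for SMILES in the reaction-prediction literature. This is an illustrative assumption: the abstract does not describe the model's actual tokenizer, and the names `SMILES_REGEX` and `tokenize_smiles` are hypothetical.

```python
import re

# Hypothetical atom-level SMILES tokenizer (assumption: not the paper's tokenizer).
# Bracket atoms like [NH4+] stay whole, two-letter halogens Br/Cl are one token,
# and %NN ring-closure labels are kept as a single token.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; raise if any character is not covered."""
    tokens = SMILES_REGEX.findall(smiles)
    # findall silently skips unmatched characters, so check that nothing was lost.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    print(tokenize_smiles(aspirin))
    print(tokenize_smiles("[NH4+].[Cl-]"))
```

For aspirin every character is its own token, giving 21 tokens, while `[NH4+].[Cl-]` yields three tokens (`[NH4+]`, `.`, `[Cl-]`). Checking that the tokens rejoin to the input guards against characters the pattern does not cover.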