MaterialsAtlas Model

A Large Encoder-Decoder Family of Foundation Models For Chemical Language

chemical language models, foundation models, encoder-decoder, cheminformatics, SMILES, property prediction, machine learning, self-supervised learning

This paper introduces a large encoder-decoder chemical foundation model pre-trained on 91 million SMILES samples (4 billion molecular tokens) from PubChem. The model comes in two variants, with 289M and 8x289M parameters, and supports downstream tasks such as quantum property prediction. The reported experiments show state-of-the-art performance, a separable latent space, and few-shot learning capabilities.
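To make the "SMILES samples" and "molecular tokens" figures more concrete, the sketch below splits a SMILES string into atom-level tokens and computes the average tokens per molecule implied by the two corpus numbers. The regular expression, the `tokenize` helper, and the example molecules are illustrative assumptions, not the tokenizer used by the paper.

```python
import re

# Assumed token pattern (not taken from the paper): bracket atoms, two-letter
# halogens, two-digit ring labels, single letters, digits, and bond/branch symbols.
# The single-letter class is deliberately permissive; this is not a SMILES validator.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%[0-9]{2}|[A-Za-z]|[0-9]|[()=#\-+\\/:~@?>*$.])"
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, failing if any character is skipped."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


# Corpus figures quoted in the abstract (both are rounded values).
NUM_SMILES = 91_000_000
NUM_TOKENS = 4_000_000_000
avg_tokens_per_molecule = NUM_TOKENS / NUM_SMILES

if __name__ == "__main__":
    print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
    print(tokenize("[Na+].[Cl-]"))            # sodium chloride
    print(f"average tokens per molecule: {avg_tokens_per_molecule:.1f}")
```

Under this assumed tokenization, the quoted corpus sizes work out to roughly 44 tokens per molecule. The paper's actual vocabulary and tokenization scheme may differ.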

Type: Model
Domain: Not specified
License: Not specified
Contributors: Not specified