MaterialsAtlas Dataset

Awesome materials datasets

Materials-Related Databases

These databases are sorted by their claimed number of data entries and by whether they offer a method of programmatic access.

| Database | Description | Contact | API / Docs | Size |
|---|---|---|---|---|
| [ChemSpider](https://www.chemspider.com) | Free chemical structure database providing access to over 32 million structures, properties, and associated information. It integrates and links compounds from approximately 500 data sources and is owned by the Royal Society of Chemistry. | | [REST](https://www.chemspider.com), [Python: chemspipy](https://blog.matt-swain.com) | >30M chemicals and compounds |
| [CCDC / Cambridge Crystallographic Data Centre](https://www.ccdc.cam.ac.uk) | The Cambridge Structural Database is a repository for small-molecule organic and metal-organic crystal structures, containing results from X-ray and neutron diffraction analyses. | | | >700k diffraction analyses and 3D structures |
| [AFLOW](https://www.aflowlib.org) | Distributed materials-properties repository from high-throughput ab initio calculations. | Stefano Curtarolo | [REST](https://aflowlib.duke.edu) | 630,000 compounds |
| [Open Quantum Materials Database / OQMD](https://oqmd.org) | Database of DFT-calculated thermodynamic and structural properties. Provides online access and recommends downloading the full database and API for large-scale use. | Chris Wolverton | [Python: qmpy](https://oqmd.org) | 285,780 compounds |
| [Materials Project](https://www.materialsproject.org) | Open web-based access to computed information on known and predicted materials, using supercomputing and electronic-structure methods. Includes analysis tools for materials design. | Kristin Persson, Gerbrand Ceder | [Python API](https://www.materialsproject.org) | 58,000 compounds; 33,000 band structures |
| [Computational Materials Repository / CMR](https://wiki.fysik.dtu.dk) | Tools to store, retrieve, and search electronic-structure calculations. | | [Docs](https://wiki.fysik.dtu.dk), [Python](https://wiki.fysik.dtu.dk) | |
| [3D Materials Atlas](https://cosmicweb.mse.iastate.edu) | Repository for 3D experiments and simulations on a variety of material systems. | | | |
| [Interatomic Potentials Repository Project](https://www.ctcms.nist.gov) | Repository for interatomic potentials, related files, and evaluation tools to help researchers obtain and assess interatomic models. | Chandler Becker, Zachary Trautt / NIST | | |
| [Web Force-Field / WebFF](https://www.nist.gov) | Repository consisting of a database, software engine, and web-client interface. The database supports multiple force-field tables. | Frederick Phelan / NIST | | |
| [ThermoML](https://trc.nist.gov) | Collection of ThermoML files representing experimental thermophysical and thermochemical property data from journal articles. | NIST Thermodynamics Research Center | | |
| [GBRV Pseudopotential Library](https://www.physics.rutgers.edu) | Open-source pseudopotential library optimized for high-throughput DFT calculations. Provides files for Quantum ESPRESSO, ABINIT, JDFTx, and Vanderbilt ultrasoft pseudopotential generation. | Kevin Garrity / NIST | | |
| [PAW Atomic Datasets](https://www.abinit.org) | PAW atomic datasets for ABINIT and tools for creating new datasets. | | | |
| [PSlibrary](https://qe-forge.org) | Library of inputs for the `ld1.x` atomic code, allowing generation of PAW datasets and ultrasoft pseudopotentials. | | | |
| Computational Materials Data Network / CMD Network | Computational materials data network resource. | | [Website](http://www.asminternational.org/web/cmdnetwork) | |
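
The ordering described above the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the list was actually built: the `Entry` record and the `sort_key` helper are hypothetical, the counts are rounded from the Size column (">700k" is taken as 700,000), and treating size as the primary key with programmatic access as the tie-breaker is one possible reading of the sort rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical record for one row of the table above."""
    name: str
    claimed_entries: Optional[int]  # None when the table gives no size
    has_api: bool                   # True when an API / Docs link is listed


def sort_key(entry: Entry) -> tuple:
    # Assumption: larger claimed sizes come first and unknown sizes count as 0.
    # Among equal sizes, entries with programmatic access come first.
    size = entry.claimed_entries or 0
    return (-size, not entry.has_api, entry.name.lower())


# Counts copied (and rounded) from the Size column of the table.
entries = [
    Entry("Materials Project", 58_000, True),
    Entry("ThermoML", None, False),
    Entry("AFLOW", 630_000, True),
    Entry("Computational Materials Repository / CMR", None, True),
    Entry("OQMD", 285_780, True),
    Entry("ChemSpider", 30_000_000, True),
    Entry("CCDC", 700_000, False),
]

ordered = sorted(entries, key=sort_key)
for e in ordered:
    size = f"{e.claimed_entries:,}" if e.claimed_entries is not None else "unknown"
    print(f"{e.name:45s} {size:>12s}  API: {'yes' if e.has_api else 'no'}")
```

Keeping the size and the access flag in a single tuple key makes the tie-breaking rule explicit and easy to change if a different ordering is preferred.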

---

Cloud Services

---

Relevant Codes

| Code | Description | Maintained By | API / Docs |
|---|---|---|---|
| [MAST / Materials Simulation Toolkit](https://pythonhosted.org) | Automated workflow manager and post-processing tool focused on diffusion and defect workflows using DFT, primarily interfacing with VASP. | Dane Morgan | [Docs](https://pythonhosted.org) |
| [NWChem](https://www.nwchem-sw.org) | Computational chemistry package designed for scalable calculations on parallel supercomputers and workstation clusters. | PNNL | [Docs](https://www.nwchem-sw.org) |
| [pymatgen](https://pypi.python.org) | Python Materials Genomics library for materials analysis. Powers parts of the Materials Project. | Materials Genome Initiative | [Docs](https://pypi.python.org) |
| pymatgen-db | Database add-on for pymatgen that enables creation of Materials Project-style MongoDB databases and querying into pymatgen objects. | Materials Genome Initiative | Docs |
| [Swift](https://swift-lang.org) | Parallel scripting system for multicore machines, clusters, clouds, and supercomputers. | University of Chicago / Argonne | [Docs](https://swift-lang.org) |

Additional Codes

---

Python Machine-Learning Stack and Resources

| Code | Description | Docs |
|---|---|---|
| [scikit-learn](https://scikit-learn.org) | Machine learning in Python. Provides tools for data mining and data analysis. | [Docs](https://scikit-learn.org) |
| [SciPy](https://www.scipy.org) | Python-based ecosystem of open-source software for mathematics, science, and engineering. | [Docs](https://www.scipy.org) |
| [NumPy](https://www.numpy.org) | Fundamental package for scientific computing with Python, including multidimensional arrays, broadcasting, linear algebra, Fourier transforms, random numbers, and C/C++/Fortran integration. | [Docs](https://docs.scipy.org) |
| [pandas](https://pandas.pydata.org) | Data structures for working with relational and labeled data; widely used for practical data analysis in Python. | [Docs](https://pandas.pydata.org) |
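
As a rough illustration of the NumPy features named in the table (multidimensional arrays, broadcasting, linear algebra), the sketch below standardizes a small feature matrix and fits a linear model by least squares. The data are synthetic and the two "descriptors" are made up for the example; nothing here comes from any of the databases listed above.

```python
import numpy as np

# Synthetic data: 5 hypothetical samples with 2 made-up descriptors each.
X = np.array([
    [1.0, 0.5],
    [2.0, 1.5],
    [3.0, 1.0],
    [4.0, 3.0],
    [5.0, 2.5],
])
true_weights = np.array([2.0, -1.0])
true_bias = 0.5
y = X @ true_weights + true_bias  # exactly linear target, no noise

# Broadcasting: the per-column mean and std (shape (2,)) are applied to all rows.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Linear algebra: append a column of ones for the intercept, then solve
# the least-squares problem min ||A w - y||.
A = np.column_stack([X_scaled, np.ones(len(X_scaled))])
coef, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)

y_pred = A @ coef
print("max abs error:", np.abs(y_pred - y).max())
```

Because the target is an exact linear function of the inputs, the fitted predictions should match `y` up to floating-point error; with real property data a held-out test set would be needed to judge the fit.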

Tutorials and Examples

---

API Wrappers

No entries listed.

---

Other Databases, Codes, and Efforts to Sort and Catalog

Acknowledgement: Ben Blaiszik (https://github.com/blaiszik)

Type: Dataset
Domain: Not specified
License: Not specified
Contributors: Not specified