Machine learning and data science approaches are becoming critical to scientific and technological advancement, with thousands of new scientific publications each year and countless private companies relying on ML as a core part of their business models. For this growth to continue to translate into practice, data and models need to be easily available for training, retraining, reproducing, and verifying their usefulness on chosen tasks. Unfortunately, discovering datasets, models, and the underlying code is often a challenge. Here, we introduce Foundry, a simple way to publish and discover datasets for machine learning and to link these datasets to predictive models. Foundry is a synthesis of service capabilities from the Materials Data Facility and DLHub, layered with Python tooling, standardized metadata, and a file structure specification to meet the needs of the machine learning community.
- Xiang-Guo Li, Ben Blaiszik, Marcus Emory Schwarting, Ryan Jacobs, Aristana Scourtas, K. J. Schmidt, Paul M. Voyles, and Dane Morgan. “Graph network based deep learning of bandgaps.” The Journal of Chemical Physics (2021)
- Jingrui Wei, Ben Blaiszik, Dane Morgan, and Paul Voyles. “Benchmark tests of atom-locating CNN models with a consistent dataset.” Microscopy and Microanalysis, Volume 27, Supplement S1 (2021)
- Foundry Usage Examples – The Foundry GitHub repository has a folder of example notebooks to help users get started.
This work was supported by the National Science Foundation under Award Number 1931306, “Collaborative Research: Framework: Machine Learning Materials Innovation Infrastructure”.