Machine learning and data science approaches are becoming critical to scientific and technological advancement, with thousands of new scientific publications each year and countless private companies relying on ML as a core part of their business models. For this growth to continue to translate into practice, data and models need to be easily available for training, retraining, reproducing, and verifying usefulness on chosen tasks. Unfortunately, discovering datasets, models, and the underlying code is often a challenge. Here, we introduce Foundry, a simple way to publish and discover datasets for machine learning and to link these datasets to predictive models. Foundry is a synthesis of service capabilities from the Materials Data Facility and DLHub, layered with Python tooling, standardized metadata, and a file structure specification to meet the needs of the machine learning community.


Other Links

  • Foundry Usage Examples – The Foundry GitHub repository has a folder of example notebooks to help users get started.


This work was supported by the National Science Foundation under NSF Award Number 1931306, “Collaborative Research: Framework: Machine Learning Materials Innovation Infrastructure”.