Cloud platforms are increasingly relied upon to conduct large-scale science. However, the methods by which infrastructure is provisioned and managed are ad hoc. We are developing new methods to profile application performance, predict cloud market conditions, and automate provisioning decisions.
The Center for Codesign of Online Data Access and Reduction (CODAR) will develop new methods and science for delivering the right bits to the right place at the right time on exascale computers.
Draining the Data Swamp
Techniques for extracting rich metadata from heterogeneous scientific data repositories.
Data and Learning Hub for Science (DLHub)
A simple way to find, share, publish, and run machine learning models and discover training data for science.
Vast quantities of scientific data are distributed across storage systems and data repositories. We are developing methods to crawl, extract metadata from, and index those data.
A wealth of valuable data is locked within the millions of research articles published each year. We are researching methods to liberate this data via hybrid human-machine models.
Architecting a solution to crawl, index, integrate, and distribute geo-spatial data at scale.
The Materials Data Facility
The Materials Data Facility (MDF) project is working to develop and deploy advanced services to help materials scientists publish datasets, encourage data reuse and sharing, and facilitate simple discovery of data.
Parsl is a parallel scripting library for Python. It provides a model by which complex workflows can be represented in an intuitive Python-based control application. It facilitates transparent parallel execution of workflow components (apps) on any distributed or parallel computing system.
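The app-based model described above can be illustrated with a minimal sketch that uses only the Python standard library. This is not Parsl's API: the `app` decorator, the `square` and `total` functions, and `run_workflow` are hypothetical names chosen for illustration, and a local thread pool stands in for a distributed or parallel system. The idea shown is that calling a decorated function returns a future immediately, so independent workflow components can run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

# Hypothetical stand-in for a workflow executor (assumption: local threads only).
_executor = ThreadPoolExecutor(max_workers=4)


def app(func):
    """Wrap a plain function so that calling it returns a future."""
    @wraps(func)
    def submit(*args, **kwargs):
        return _executor.submit(func, *args, **kwargs)
    return submit


@app
def square(x):
    return x * x


@app
def total(values):
    return sum(values)


def run_workflow(n):
    # Launch n independent apps; the pool may run them concurrently.
    squares = [square(i) for i in range(n)]
    # Collect the results, then pass them to a dependent step.
    return total([f.result() for f in squares]).result()


if __name__ == "__main__":
    print(run_workflow(5))
```

In this sketch the caller waits on each future explicitly before starting the dependent step; a workflow library would typically track such dependencies on the user's behalf.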
Ripple is an automation framework designed to manage data throughout its lifecycle. Users specify, via high-level rules, the actions to be performed on data at different times and locations.
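As a rough illustration of rule-based data automation, the sketch below pairs a trigger (an event kind, a location, and a path pattern) with an action. It is not Ripple's rule syntax or implementation: `Event`, `Rule`, `dispatch`, and the example rules and location names are all hypothetical, and the actions only return a description of what would be done.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, List


@dataclass
class Event:
    kind: str       # hypothetical event kinds, e.g. "created" or "modified"
    path: str       # path of the affected file
    location: str   # hypothetical name of the storage system


@dataclass
class Rule:
    kind: str
    pattern: str    # shell-style pattern matched against the full path
    location: str
    action: Callable[[Event], str]

    def matches(self, event: Event) -> bool:
        return (
            event.kind == self.kind
            and event.location == self.location
            and fnmatchcase(event.path, self.pattern)
        )


def dispatch(event: Event, rules: List[Rule]) -> List[str]:
    """Run the action of every rule whose trigger matches the event."""
    return [rule.action(event) for rule in rules if rule.matches(event)]


# Example rules (assumptions for illustration only).
RULES = [
    Rule("created", "*.h5", "laptop", lambda e: f"archive {e.path}"),
    Rule("modified", "*.csv", "cluster", lambda e: f"reindex {e.path}"),
]

if __name__ == "__main__":
    print(dispatch(Event("created", "/data/run1.h5", "laptop"), RULES))
```

Keeping the trigger and the action separate means the same action can be reused under different conditions, and rules can be added without changing the dispatch logic.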