Cyberinfrastructure for Autonomous Science
Exponential increases in data volumes and velocities are overwhelming finite human capabilities. Continued progress in science and engineering demands that we automate a broad spectrum of currently manual research data manipulation tasks, from data transfer and sharing to data acquisition, publication, indexing, analysis, and inference.

Cloud Provisioning
Cloud platforms are increasingly relied upon to conduct large-scale science. However, the methods by which infrastructure is provisioned and managed are ad hoc. We are developing new methods to profile application performance, predict cloud market conditions, and automate provisioning decisions.
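The kind of automated provisioning decision described above can be sketched as a simple rule: given profiled application throughput and a price forecast, pick the instance type with the best predicted performance per dollar under a budget. This is an illustrative sketch only; the function, its inputs, and the numbers are assumptions, not the project's actual models.

```python
# Illustrative sketch (assumed, not the project's actual code): choose the
# instance type with the best predicted throughput per dollar that fits
# under an hourly budget.

def choose_instance(profiles, price_forecast, budget_per_hour):
    """profiles: {instance_type: throughput in tasks/hour}
    price_forecast: {instance_type: predicted $/hour}
    Returns the best-value instance type, or None if nothing fits."""
    candidates = {
        itype: profiles[itype] / price_forecast[itype]
        for itype in profiles
        if price_forecast.get(itype, float("inf")) <= budget_per_hour
    }
    if not candidates:
        return None  # no instance type fits the budget
    return max(candidates, key=candidates.get)

profiles = {"small": 100, "large": 450}    # tasks/hour (assumed profiling data)
forecast = {"small": 0.10, "large": 0.40}  # $/hour (assumed forecast)
print(choose_instance(profiles, forecast, budget_per_hour=0.50))  # -> large
```

A real system would replace the static forecast with a learned price predictor and re-evaluate the decision as market conditions change.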

CODAR
The Center for Codesign of Online Data Access and Reduction (CODAR) will develop new methods and science for delivering the right bits to the right place at the right time on exascale computers.

Draining the Data Swamp
Techniques for extracting rich metadata from heterogeneous scientific data repositories.
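The first step in such extraction can be sketched as type-aware metadata sniffing: given a file's name and bytes, infer its format and record lightweight structural metadata. The function below is a hypothetical illustration, not the project's pipeline.

```python
# Illustrative sketch (assumed): infer a file's type and pull out lightweight
# metadata, the first step in turning a "data swamp" into an indexed repository.
import csv
import io
import json

def extract_metadata(name, raw_bytes):
    """Return a small metadata record for one file in a repository."""
    record = {"name": name, "size": len(raw_bytes)}
    text = raw_bytes.decode("utf-8", errors="replace")
    if name.endswith(".json"):
        record["type"] = "json"
        record["keys"] = sorted(json.loads(text))  # top-level keys of a JSON object
    elif name.endswith(".csv"):
        record["type"] = "csv"
        rows = list(csv.reader(io.StringIO(text)))
        record["columns"] = rows[0]        # header row
        record["n_rows"] = len(rows) - 1   # data rows
    else:
        record["type"] = "unknown"
    return record

print(extract_metadata("t.csv", b"temp,pressure\n300,1.0\n310,1.1\n"))
```

Real repositories require many more format handlers (HDF5, NetCDF, images, instrument logs), but each produces records of this shape for downstream indexing.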

Data and Learning Hub for Science (DLHub)
A simple way to find, share, publish, and run machine learning models, and to discover training data for science.
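The publish/find/run pattern can be sketched as a model registry that maps names and metadata to callables. This is a minimal local illustration only; the DLHub service and SDK have their own APIs, and every name below is hypothetical.

```python
# Minimal sketch (illustrative, not the DLHub API): a registry that lets
# users publish models with metadata, find them by metadata query, and run
# them on demand.

class ModelRegistry:
    def __init__(self):
        self._models = {}

    def publish(self, name, fn, **metadata):
        """Register a callable model under a name, with searchable metadata."""
        self._models[name] = {"fn": fn, "metadata": metadata}

    def find(self, **query):
        """Return names of models whose metadata matches every query field."""
        return [name for name, m in self._models.items()
                if all(m["metadata"].get(k) == v for k, v in query.items())]

    def run(self, name, *args):
        """Invoke a published model by name."""
        return self._models[name]["fn"](*args)

registry = ModelRegistry()
registry.publish("celsius_to_kelvin", lambda c: c + 273.15, domain="chemistry")
print(registry.find(domain="chemistry"))      # -> ['celsius_to_kelvin']
print(registry.run("celsius_to_kelvin", 25))  # -> 298.15
```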

Serverless Supercomputing
funcX is a Function-as-a-Service (FaaS) platform for science. It is designed to layer on top of existing cyberinfrastructure to provide scalable, secure, and on-demand execution of short-duration scientific functions.
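The FaaS execution pattern can be sketched locally: register a function to get an id, submit invocations to get task ids, and fetch results asynchronously. This is an illustrative stand-in using a thread pool; the real funcX SDK dispatches functions to remote endpoints, and the class and method names here are assumptions.

```python
# Local sketch of the FaaS pattern (illustrative; not the funcX SDK):
# register a function, submit calls by function id, retrieve results by
# task id while execution proceeds asynchronously.
import uuid
from concurrent.futures import ThreadPoolExecutor

class LocalFaaS:
    def __init__(self):
        self._functions = {}
        self._tasks = {}
        self._pool = ThreadPoolExecutor(max_workers=4)

    def register_function(self, fn):
        """Store a callable and return an id for later invocation."""
        function_id = str(uuid.uuid4())
        self._functions[function_id] = fn
        return function_id

    def run(self, *args, function_id):
        """Submit one invocation; returns a task id immediately."""
        task_id = str(uuid.uuid4())
        self._tasks[task_id] = self._pool.submit(
            self._functions[function_id], *args)
        return task_id

    def get_result(self, task_id):
        """Block until the task finishes and return its result."""
        return self._tasks[task_id].result()

faas = LocalFaaS()
fid = faas.register_function(lambda x: x * x)
tid = faas.run(7, function_id=fid)
print(faas.get_result(tid))  # -> 49
```

In the real system, the endpoint executing the function may be a laptop, cluster, or cloud; the caller interacts only with function and task ids.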

Globus Search
Vast quantities of scientific data are distributed across storage systems and data repositories. We are developing methods to crawl, extract metadata from, and index those data.
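The crawl-extract-index pipeline can be sketched with an inverted index: walk a collection of files, extract keywords, and map each keyword to the files that contain it. This is an illustrative toy, not the Globus Search implementation.

```python
# Illustrative sketch (assumed): build an inverted index over crawled files,
# mapping each keyword to the sorted list of files that mention it.
from collections import defaultdict

def build_index(files):
    """files: {filename: text}. Returns {keyword: sorted filenames}."""
    index = defaultdict(set)
    for name, text in files.items():
        for word in text.lower().split():
            index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}

files = {
    "run1.txt": "neutron scattering copper",
    "run2.txt": "xray scattering silicon",
}
index = build_index(files)
print(index["scattering"])  # -> ['run1.txt', 'run2.txt']
```

A production index would store extracted metadata records rather than raw tokens and support ranked, access-controlled queries, but the core lookup structure is the same.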

Information Extraction
A wealth of valuable data is locked within the millions of research articles published each year. We are researching methods to liberate this data via hybrid human-machine models.

The Materials Data Facility
The Materials Data Facility (MDF) project is working to develop and deploy advanced services to help materials scientists publish datasets, encourage data reuse and sharing, and facilitate simple discovery of data.

Parsl
Parsl is a parallel scripting library for Python. It provides a model in which complex workflows are represented as intuitive Python-based control applications, and it transparently executes workflow components (apps) in parallel on any distributed or parallel computing system.
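The dataflow model described above can be sketched with the standard library: apps return futures, and passing one app's future into another expresses a dependency that is resolved automatically, with independent apps running in parallel. This sketch only illustrates the model; Parsl itself provides `@python_app` decorators and executors that scale to clusters and supercomputers.

```python
# Sketch of futures-based dataflow (illustrative; not Parsl's implementation):
# wrapped "apps" run asynchronously, and futures passed as arguments are
# resolved before the app body executes.
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

def app(fn):
    """Make calls to fn asynchronous; accept futures as inputs."""
    def submit(*args):
        def resolve_and_call():
            # Wait for any future arguments produced by upstream apps.
            resolved = [a.result() if hasattr(a, "result") else a for a in args]
            return fn(*resolved)
        return pool.submit(resolve_and_call)
    return submit

@app
def double(x):
    return 2 * x

@app
def add(x, y):
    return x + y

# add() depends on two double() calls; the two doublings can run in parallel,
# and the dependency ordering is handled implicitly via futures.
total = add(double(3), double(4))
print(total.result())  # -> 14
```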