Cost-aware computing
Cloud platforms are increasingly relied upon to conduct large-scale science. However, the methods by which infrastructure is provisioned and managed are ad hoc. We are developing new methods to profile application performance, predict cloud market conditions, and automate provisioning decisions.

Automated classification of the fluffy "clouds" in the air can identify different cloud types and their properties, improving our understanding of cloud dynamics and feedback. We are developing unsupervised machine learning methods capable of clustering several hundred TB of satellite cloud imagery without any assumptions concerning artificial cloud categories.

The Center for Codesign of Online Data Access and Reduction (CODAR) develops new methods and science for delivering the right bits to the right place at the right time on exascale computers.

Colmena is a Python library for building applications that combine AI and simulation workflows on HPC. Its core feature is a communication library that simplifies building tools for intelligently steering large ensemble simulations.

Leveraging FaaS to extract metadata from large, distributed, and complex data sets

Data and Learning Hub for Science (DLHub)
A simple way to find, share, publish, and run machine learning models and discover training data for science

Exponential increases in data volumes and velocities are overwhelming finite human capabilities. Continued progress in science and engineering demands that we automate a broad spectrum of currently manual research data manipulation tasks, from data transfer and sharing to data acquisition, publication, indexing, analysis, and inference.

Foundry is a platform that allows scientists to share and access datasets and ML models while setting industry benchmarks

funcX is a Function as a Service (FaaS) platform for science. It is designed to be applied to existing cyberinfrastructure to provide scalable, secure, and on-demand execution of short-duration scientific functions.

Globus Automation for Data-Intensive Experimental Research

Globus Search
Vast quantities of scientific data are distributed across storage systems and data repositories. We are developing methods to crawl, extract metadata from, and index those data.

Information Extraction
A wealth of valuable data is locked within the millions of research articles published each year. We are researching methods to liberate this data via hybrid human-machine models.

Distributed K-FAC
Kronecker-factored Approximate Curvature (K-FAC) can enable efficient second-order optimization and faster deep neural network training than traditional optimizers (e.g., SGD). Distributed K-FAC performs the expensive second-order computations in a model-parallel fashion to efficiently utilize HPC resources.
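
As a brief illustration of why the Kronecker factorization makes second-order updates tractable, the standard K-FAC approximation for a single fully connected layer can be written as follows. The symbols are introduced here for illustration and are not taken from the project description: W is the layer's weight matrix, a is the layer input, g is the gradient of the loss with respect to the layer's pre-activations, and η is the learning rate. Damping and other practical details are omitted.

```latex
% Gradient of the loss with respect to the layer weights
\nabla_W \mathcal{L} = g\, a^{\top}

% The layer's Fisher block is approximated by a Kronecker product of two small factors
F \;=\; \mathbb{E}\!\left[(a a^{\top}) \otimes (g g^{\top})\right]
  \;\approx\; A \otimes G,
\qquad A = \mathbb{E}\!\left[a a^{\top}\right],
\quad G = \mathbb{E}\!\left[g g^{\top}\right]

% Because (A \otimes G)^{-1} = A^{-1} \otimes G^{-1}, the preconditioned update
% needs only the inverses of the two small factors
W \;\leftarrow\; W - \eta\, G^{-1} \left(\nabla_W \mathcal{L}\right) A^{-1}
```

Each layer therefore requires inverting two matrices whose sizes match the layer's input and output dimensions, rather than one matrix whose size is the number of weights squared. Since these per-layer factor computations are independent of one another, they are a natural candidate for being spread across workers.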

3D Neural Networks for Materials
Methods of materials imaging and microscopy are now capable of generating increasingly large and complex data, which far surpass the capacity for manual post-processing and analysis. We work to apply deep learning architectures to assist materials researchers with tasks that would otherwise require manual annotation.

The Materials Data Facility
The Materials Data Facility (MDF) project is working to develop and deploy advanced services to help materials scientists publish datasets, encourage data reuse and sharing, and facilitate simple discovery of data.

Parsl is a parallel programming library for Python. It provides a model by which complex workflows can be represented in an intuitive Python-based control application. It facilitates transparent parallel execution of workflow components (apps) on any distributed or parallel computing system.
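
The idea of representing workflow components as apps that run in parallel can be sketched with Python's standard library alone. The example below is an analogy and does not use Parsl or its API: the `app` decorator, the thread pool, and the `square`, `total`, and `run_workflow` functions are all hypothetical names chosen for illustration. It shows only the general pattern in which calling a decorated function returns a future immediately, so independent tasks can run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

# Hypothetical stand-in for a parallel execution backend (NOT Parsl's API).
_pool = ThreadPoolExecutor(max_workers=4)


def app(func):
    """Wrap func so that calling it returns a Future instead of a value."""
    @wraps(func)
    def submit(*args, **kwargs):
        return _pool.submit(func, *args, **kwargs)
    return submit


@app
def square(x):
    return x * x


@app
def total(values):
    return sum(values)


def run_workflow(n):
    # Launch n independent tasks; they may run concurrently.
    futures = [square(i) for i in range(n)]
    # Wait for the results, then pass them to a dependent task.
    return total([f.result() for f in futures]).result()


if __name__ == "__main__":
    print(run_workflow(5))
```

In this sketch the control logic stays ordinary Python, and only the decorated functions are dispatched to the pool. Swapping the thread pool for a different executor would change where tasks run without changing the workflow code.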

Whole Tale
Whole Tale is a cloud-hosted platform for conducting and sharing reproducible science. Researchers can use the platform to conduct interactive research in a custom computing environment. They may then export the resulting tale, capturing data, code, computational environment, and narrative.