Active Projects

These are projects that are very actively under development.

AI4PFAS
Deep learning workflow and dataset for toxicity prediction.

Cost-aware computing
Automating cost-aware profiling, prediction, and provisioning of cloud and HPC resources.

Clouds
Self-supervised data-driven methods for the application on big satellite clouds imagery

CODAR
We develop methods for online data analysis and reduction on exascale computers

Colmena
Machine Learning-Based Steering of Ensemble Simulations

D-Rex
Algorithms and models for fast, reliable data storage using erasure coding with heterogeneous storage nodes

Flight
Hierarchical Federated Learning across the Computing Continuum.

Flows
We are developing methods to automate the scientific data lifecycle.

Foundry
An open source machine learning platform for scientists

funcX
funcX is a Function as a Service platform for scientific computing.

Garden
Garden turns researchers' AI models into citable APIs that run on scientific computing infrastructure.

Gladier
Globus Automation for Data-Intensive Experimental Research

Scientific Language Modeling and Information Extraction
Data mining from literature with a foundational science-focused language model

The Materials Data Facility
We are creating data services to help materials scientists publish and discover data

MFANN
Mutivariate Functional Approximation through Neural Networks

Octopus Event Fabric
Cloud-to-edge event fabric that handles millions of events securely at scale.

Parsl
Parallel programming library for Python

ProxyStore
Facilitate efficient data flow management in distributed Python applications.

SZ3 Compression
We develop a series of prediction-based lossy compression algorithms for scientific simulations.

WaterMark
A benchmark of temporal AI models for water level prediction

Whole Tale
Whole Tale is a cloud-hosted platform for conducting and sharing reproducible science

Other Projects

These are projects that are not currently being actively developed but may or may not still be used to support above active projects.

Xtract
Metadata Extraction for Everyone

Data and Learning Hub for Science (DLHub)
A simple way to find, share, publish, and run machine learning models and discover training data for science

Globus Search
We are developing methods to index large amounts of scientific data distributed over heterogeneous storage systems

GreenFaaS
A monitoring and scheduling framework to find the most energy-efficient endpoints for your application.

KAISA: Scalable Second-Order Deep Neural Network Training
KAISA is a novel distributed framework for training large models with K-FAC at scale.