Xtract

The rapid generation of data from distributed IoT devices, scientific instruments, and compute clusters presents unique data management challenges. The influx of large, heterogeneous, and complex data causes repositories to become siloed or generally unsearchable, and current distributed file systems address neither problem well. To this end, we at Globus Labs are actively building Xtract, a serverless middleware that leverages function-as-a-service (FaaS) to extract metadata from files spread across heterogeneous edge computing resources. We are currently studying how Xtract can automatically construct metadata extraction workflows subject to users’ cost, time, and compute allocation constraints. Ultimately, Xtract enables the creation of a searchable centralized index across distributed data collections.

Publications