Cyberinfrastructure for Autonomous Science

Researchers are faced with an increasingly complex data landscape in which data are obtained from a number of different sources (e.g., instruments, computers, published data), stored in disjoint storage systems, and analyzed on an array of high-performance and cloud computers. Data are produced ever faster, scientific processes are growing more complex, and making sense of data requires substantial data management, munging, and organization. Together, these trends create new bottlenecks in the discovery process. Improving data lifecycle management practices is essential to enhancing productivity, facilitating reproducible research, and encouraging collaboration. We posit that researchers require automated methods for managing their data such that tedious and repetitive tasks (e.g., transferring, archiving, and analyzing) are accomplished without continuous user input.

Automate is an automation Platform as a Service that provides reliable, secure, and efficient orchestration of research data management and manipulation activities that span science disciplines, resources, locations, and time periods. Automate makes it possible, for example, for a researcher to indicate that new data should trigger, in turn, a transfer to a remote computer, analysis of those data, updates to registries, and email to scientists.
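The trigger-driven chain described above (new data, then transfer, analysis, registry update, and email) can be illustrated with a minimal sketch. This is not Automate's API: `Flow`, `watch`, and the `transfer`, `analyze`, `register`, and `notify` steps are hypothetical names, and each step is an in-memory stub standing in for a real transfer service, compute resource, registry, or mail server. The sketch only shows the idea of an ordered sequence of actions sharing state, run once per newly detected file.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical action type: a step receives the run's shared state and
# returns a dict of outputs that is merged back into that state.
Action = Callable[[dict], dict]


@dataclass
class Flow:
    """A minimal sequential flow: actions run in order each time it is triggered."""
    name: str
    actions: list = field(default_factory=list)

    def add(self, label: str, action: Action) -> "Flow":
        self.actions.append((label, action))
        return self  # allow chaining

    def run(self, event: dict) -> dict:
        state = dict(event)
        state["log"] = []
        for label, action in self.actions:
            state.update(action(state))
            state["log"].append(label)  # record which steps completed
        return state


def watch(flow: Flow, listing: list, seen: set) -> list:
    """Trigger the flow once for every file in `listing` not already in `seen`."""
    runs = []
    for path in listing:
        if path not in seen:
            seen.add(path)
            runs.append(flow.run({"path": path}))
    return runs


# Stub steps (hypothetical): stand-ins for real services.
REGISTRY = []
OUTBOX = []

def transfer(state):
    # Stand-in for copying the new file to a remote computer.
    return {"remote_path": "remote:/data/" + state["path"]}

def analyze(state):
    # Stand-in for running an analysis job on the transferred data.
    return {"result": f"analyzed {state['remote_path']}"}

def register(state):
    # Stand-in for updating a data registry.
    REGISTRY.append({"path": state["path"], "result": state["result"]})
    return {}

def notify(state):
    # Stand-in for emailing scientists.
    OUTBOX.append(f"{state['path']} processed")
    return {}


flow = (Flow("new-data")
        .add("transfer", transfer)
        .add("analyze", analyze)
        .add("register", register)
        .add("notify", notify))

seen = set()
runs = watch(flow, ["scan_001.h5", "scan_002.h5"], seen)
# A later poll sees one new file; only that file triggers a new run.
runs += watch(flow, ["scan_001.h5", "scan_002.h5", "scan_003.h5"], seen)
print(len(runs))         # 3
print(runs[-1]["log"])   # ['transfer', 'analyze', 'register', 'notify']
```

Polling a file listing is only one possible trigger; the point of the design is that the researcher declares the sequence once, and every new arrival runs it without further input.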

Our research focuses on three core areas: (1) developing cyberinfrastructure that enables autonomous scientific pipelines to be described and performed; (2) investigating serverless computing's ability to abstract computing environments and facilitate automations across all forms of research computing resources; and (3) exploring techniques and programming models that enable non-technical users to design and configure custom automations.

Further reading