Links

  • GitHub
  • Website

Tags

#Distributed Systems #Cloud and Edge Computing

Octopus Event Fabric

Modern science relies on a distributed research infrastructure that generates large volumes of events. As applications span sites, scientists need to consume and act on events from many sources. We introduce Octopus, a hybrid cloud-edge event fabric designed for scalable, flexible scientific event-driven architecture (EDA). Benchmarks and real-world studies show Octopus meets the demands of scientific applications for throughput, latency, and resilience. Our experience suggests EDA maps naturally to scientific workflows.

  • Global reach: A cloud-hosted Amazon MSK (Kafka) cluster accessible from edge and HPC sites.
  • Secure access: Per-topic authentication and authorization via Globus Auth and AWS IAM.
  • Managed triggers: Users can deploy filterable, auto-scaling Lambda functions to invoke arbitrary web services on matching events.
  • Scalable and resilient: Brokers, web services, and triggers scale independently; Octopus supports at-least-once delivery, producer acknowledgments (acks), and consumer offset commits.
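The at-least-once guarantee above follows standard Kafka client practice: producers wait for broker acknowledgments, and consumers commit offsets only after processing. A minimal sketch of such configurations, with placeholder broker and topic names (the actual Octopus MSK endpoints and the Globus Auth / AWS IAM credential flow are not shown):

```python
import json

# Hypothetical broker endpoint and topic, for illustration only.
BOOTSTRAP = "broker.example.org:9098"
TOPIC = "experiment-events"

# Producer settings for at-least-once delivery: require acks from
# all in-sync replicas and retry transient send failures.
producer_config = {
    "bootstrap_servers": BOOTSTRAP,
    "acks": "all",
    "retries": 5,
}

# Consumer settings: disable auto-commit so offsets are committed
# only after an event has been fully processed.
consumer_config = {
    "bootstrap_servers": BOOTSTRAP,
    "group_id": "analysis-pipeline",
    "enable_auto_commit": False,
    "auto_offset_reset": "earliest",
}

def encode_event(source: str, kind: str, payload: dict) -> bytes:
    """Serialize an event record as UTF-8 JSON for publication."""
    return json.dumps(
        {"source": source, "kind": kind, "payload": payload}
    ).encode()

event = encode_event("beamline-7", "frame-complete", {"frame_id": 42})
```

With these settings, a crash between processing and commit causes redelivery rather than loss, which is the at-least-once trade-off.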

Together with Mofka, which provides high-throughput data and metadata exchange within HPC environments, cloud-hosted Octopus forms a high-performance, hierarchical event bus that bridges HPC systems, edge resources, and cloud services. In WRATH, Octopus serves as a cross-facility streaming fabric for task metrics, heartbeats, and failure logs from task-based parallel programming frameworks (e.g., Parsl), enabling real-time failure categorization and informed termination or hierarchical retry actions. In Icicle (to be released), Octopus, combined with Apache Flink, provides an event-driven substrate for ingesting, transporting, and processing high-volume file system metadata events, supporting scalable real-time monitoring and continuous metadata index maintenance across HPC file systems.
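The WRATH use case above pairs streamed failure logs with real-time categorization that drives retry or termination decisions. A minimal consumer-side sketch of that idea; the event fields (`task_id`, `status`, `stderr_tail`), category names, and matching patterns are illustrative assumptions, not the actual WRATH schema or rules:

```python
# Hypothetical failure-categorization step for WRATH-style events.
# Pattern lists and category names are illustrative only.
RETRYABLE_PATTERNS = ("node failure", "connection reset", "timeout")
FATAL_PATTERNS = ("segmentation fault", "out of memory")

def categorize_failure(event: dict) -> str:
    """Map a task event to a coarse category that selects the
    response: retry elsewhere, terminate, or escalate to a human."""
    if event.get("status") != "failed":
        return "ok"
    tail = event.get("stderr_tail", "").lower()
    if any(p in tail for p in FATAL_PATTERNS):
        return "terminate"   # deterministic failure: retrying won't help
    if any(p in tail for p in RETRYABLE_PATTERNS):
        return "retry"       # transient failure: hierarchical retry
    return "escalate"        # unknown cause: surface to the user

# Example: a Parsl-style task failure record from the event stream.
record = {"task_id": "t-101", "status": "failed",
          "stderr_tail": "Connection reset by peer"}
```

A real deployment would run this logic in a consumer loop (or a Lambda trigger) over the failure-log topic and publish the resulting decision back onto the fabric.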

Publications

Funding and Acknowledgements

We thank the entire team of the Diaspora Project for their helpful comments and feedback. This material is based upon work supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357.

People

Alok Kamatar
Haochen Pan
Ian Foster
Kyle Chard
Ryan Chard