Links

GitHub

Tags

#Fault Tolerance #Reliability #Distributed and Heterogeneous Data Storage #Erasure Coding #Load Balancing

D-Rex

D-Rex proposes a reliability model built around the concept of a reliability target: the probability of successfully accessing a data item within a given time interval. Expressing reliability this way makes it easy for users to state and reason about quality-of-service guarantees. Two dynamic algorithms, D-Rex LB and D-Rex SC, build on this model. They address the multi-objective optimization problem of storing as much data as possible while satisfying reliability and storage capacity targets and minimizing the I/O overhead of read/write operations. In extensive simulations on diverse datasets with heterogeneous nodes, D-Rex SC and D-Rex LB stored on average 45% and 31% more data, respectively, than classic state-of-the-art algorithms, while running only 0.4 and 0.3 MB/s slower, respectively.
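To make the notion of a reliability target concrete, the sketch below checks whether an erasure-coded placement meets a target. It is an illustration only, not the D-Rex LB or D-Rex SC algorithm. It assumes that each chunk of a data item sits on a different node, that nodes fail independently, and that the item remains accessible as long as no more than `parity` of those nodes fail during the interval. The names `availability`, `meets_target`, `fail_probs`, and `parity` are hypothetical and do not come from the D-Rex code base.

```python
from typing import Sequence


def availability(fail_probs: Sequence[float], parity: int) -> float:
    """Probability that at most `parity` of the chosen nodes fail.

    fail_probs[i] is the assumed probability that node i fails during
    the time interval of interest. Failures are assumed independent,
    and nodes may have different failure probabilities.
    """
    # dist[j] = probability that exactly j of the nodes seen so far failed
    dist = [1.0]
    for p in fail_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1.0 - p)  # this node survives
            nxt[j + 1] += mass * p      # this node fails
        dist = nxt
    # The item is readable if the number of failures is 0..parity.
    return sum(dist[: parity + 1])


def meets_target(fail_probs: Sequence[float], parity: int, target: float) -> bool:
    """True if the placement satisfies the reliability target."""
    return availability(fail_probs, parity) >= target


if __name__ == "__main__":
    # Hypothetical placement: 2 data chunks + 1 parity chunk on three
    # heterogeneous nodes with different failure probabilities.
    nodes = [0.01, 0.05, 0.10]
    print(availability(nodes, parity=1))
    print(meets_target(nodes, parity=1, target=0.99))
```

Under these assumptions, adding parity chunks raises the probability of access at the cost of extra storage and I/O, which is the trade-off the two algorithms are described as balancing.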

Publications

Funding and Acknowledgements

This material is based upon work supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. It was partially funded by NSF under Grant CSSI-2411386 and by MICIU/AEI/10.13039/501100011033 and “FEDER Una manera de hacer Europa” under project PID2022-138050NB-I00. We gratefully acknowledge Chameleon Cloud for providing the computational resources (project CHI-231082).

People

Hai Duc Nguyen
Haochen Pan
Ian Foster
Kyle Chard
Maxime Gonthier
Valerie Hayot-Sasson