
There is a correction needed in the durability calculations in the book.

Erasure Coding

11 nines of durability for EC(8,4), with an AFR of 0.81%, is incorrect!

Using the calculator referenced in the book it is 14 nines.

EC(8,4)

MTTR = 6.5 days

AFR = 0.81% = 0.0081

P(F) = AFR * (MTTR (days)/365) = 0.0081 * (6.5/365) ≈ 1.44e-4 (the probability that one shard fails within a single repair window)

A 12-shard stripe survives up to 4 failures, so a stripe is lost only when more than 4 shards fail in the same window. Annualized over 365/6.5 windows:

P(S) = (1 - P(more than 4 of 12 shards fail in one window))^(365/6.5)

0.99999999999999722488 ~ 14 nines

EC(17,3)

MTTR = 6.5 days

AFR = 0.41% = 0.0041

P(F) = AFR * (MTTR (days)/365) = 0.0041 * (6.5/365) ≈ 7.30e-5 (per-shard failure probability within one repair window)

A 20-shard stripe survives up to 3 failures, so:

P(S) = (1 - P(more than 3 of 20 shards fail in one window))^(365/6.5)

0.99999999999227524332 ~ 11 nines

For 11 nines we need an AFR of 0.41% and the EC(17,3) scheme.

Replication

Similarly, 6 nines of durability for replication 3 is incorrect as well; it is 9 nines! (In this case, the book does mention it is a rough estimate.)

(1 - (0.0081 * (6.5/365))^3)^(365/6.5), since all three replicas must fail within the same repair window

0.99999999983146269658 ≈ 9 nines

Besides, MTTR for replication will be better than for erasure coding because of the parity-calculation overhead involved in EC. That would give us even more than 9 nines of durability in this case!
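The replication figure is the same model with N full copies, where data is lost only if every copy fails within one repair window; a minimal sketch:

```python
def replication_durability(copies, afr, mttr_days=6.5):
    """Annual durability with N-way replication: data is lost only when
    all copies fail within the same repair window."""
    # Per-copy probability of failing within one repair window.
    p = afr * (mttr_days / 365)
    # Annualize over 365/MTTR repair windows.
    return (1 - p**copies) ** (365 / mttr_days)

print(f"{replication_durability(3, 0.0081):.20f}")  # ~9 nines
```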

https://github.com/Backblaze/erasure-coding-durability/blob/master/durability.py


The book mentions the following statement under back-of-the-envelope estimations. What is its purpose? How does it serve any of the estimations? IOPS is not mentioned anywhere else in the chapter.

"IOPS. Let’s assume one hard disk (SATA interface, 7200 rpm) is capable of doing 100~150 random seeks per second (100-150 IOPS)."
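One plausible way the IOPS figure feeds an estimate (my own illustration with made-up workload numbers, not the book's): random-seek capacity bounds how many small-object reads one spinning disk can serve, and therefore the minimum number of disks needed for a read workload.

```python
# Illustrative numbers only (mine, not the book's).
iops = 100                    # conservative end of the 100-150 IOPS range
requests_per_object = 1       # assume one random seek per small-object read
reads_per_sec_per_disk = iops / requests_per_object

target_read_qps = 10_000      # hypothetical read workload
disks_needed = target_read_qps / reads_per_sec_per_disk
print(disks_needed)           # disks required just to absorb random-read IOPS
```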


The book mentions a "40% storage usage ratio" in the back-of-the-envelope estimations. Can we confidently say the storage usage ratio can be calculated as:

Storage Usage Ratio = Usable Capacity / Raw Capacity

Usable Capacity = Raw Capacity × (1 − Replication Overhead) × (1 − Metadata Overhead) × (1 − Reserved Space Overhead)
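A quick sanity check of this multiplicative model, with illustrative overhead values of my own choosing (not from the book), lands near the 40% figure:

```python
def usage_ratio(replication_overhead, metadata_overhead, reserved_overhead):
    """Multiplicative model from the comment above: usable / raw capacity."""
    return (1 - replication_overhead) * (1 - metadata_overhead) * (1 - reserved_overhead)

# Assumed overheads (mine, for illustration): 50% replication, 10% metadata,
# 10% reserved space.
print(usage_ratio(0.5, 0.1, 0.1))  # ~0.405, roughly the book's 40%
```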


Thanks, Alex, for this article! A question: how does the API service know which instance of the data service to connect to? S3 is a widely used B2B system operating at a large scale, so I am sure there must be more complexity involved in handing the request off to store/retrieve the data.
