Thursday, December 26, 2013

Lim x -> infinity (2^10x - 10^3x) = ? or Gigabytes != Gibibytes

What is a gibibyte?  Why, a gibibyte is simply the only proper way for a technically minded person to count bytes.  Well, that or kibibytes, mebibytes, or any other power of two based prefix.

While most people (including hard drive manufacturers) think we count bytes of data in kilos, megas, or gigas, they are sadly mistaken.  Granted, there has been some confusion over the matter, and because of it there is a growing disparity between the capacity advertised for a storage device and what an operating system will tell you is actually available.

Truth - one segment of the computer industry counts bytes one way and another segment counts them in an entirely different way.  Case in point: the hard drive in my laptop is a Western Digital WD5000BEKT.  It's sold as a 500 GB drive.  That's 500 billion bytes.  More specifically, the data sheet for this product says it has a "formatted capacity" of 500,107 MB, or 500,107 million bytes.  However, most operating systems will report this drive as having 465 GB.  That's a difference of about 7%, but has it always been this way?

Believe it or not, it has.
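The 465 figure for the WD5000BEKT can be checked with a few lines of Python. This is only a sketch: it assumes the operating system divides the raw byte count by 2^30 and labels the result "GB", and the names ADVERTISED_BYTES and bytes_to_gib are made up for illustration.

```python
# Data sheet value: 500,107 MB "formatted capacity", counted in millions of bytes.
ADVERTISED_BYTES = 500_107 * 10**6


def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to gibibytes (units of 2**30 bytes)."""
    return n_bytes / 2**30


def bytes_to_gb(n_bytes: int) -> float:
    """Convert a byte count to gigabytes (units of 10**9 bytes)."""
    return n_bytes / 10**9


gib = bytes_to_gib(ADVERTISED_BYTES)
gb = bytes_to_gb(ADVERTISED_BYTES)
shortfall = 1 - gib / gb  # fraction "missing" when GiB is read as GB

print(f"advertised: {gb:.3f} GB")    # advertised: 500.107 GB
print(f"reported:   {gib:.2f} GiB")  # reported:   465.76 GiB
print(f"shortfall:  {shortfall:.1%}")  # shortfall:  6.9%
```

An OS that truncates rather than rounds would show the 465 quoted above.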

Up until the last few years in the history of digital storage media, the difference between counting bytes in powers of ten and in powers of two hasn't been of much concern.  However, we are now beginning to see the lasting effects of this oversight, which was made early on in the measurement of digital data storage.

The basics are this:  Digital computers typically work by counting in binary.  There are two values, on and off (or 0 and 1).  Each 0 or 1 is a bit (i.e. binary digit) and if you want to count past 1 you just lengthen your number by more bits.  Two bits: 00, 01, 10, 11.  Three bits: 000, 001, 010, 011, 100, 101, 110, 111, and so on.  Eight bits equal a byte (interestingly, four is a nibble) and 16, 32, and 64 bits are called "words" depending on the computer architecture.  Since binary was the number system of the land, counting of large numbers of bytes was done in kilobytes or "K".  This was roughly based on the Metric System prefix "kilo" or 1000.  The tough part was that the number 1000 is not a power of two, so 1024 (2 raised to the 10th power) was used instead.  While 1024 didn't exactly equal one "kilo" of bytes, it was close enough for government work.
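For anyone who wants to see the counting spelled out, here is a small Python sketch that lists every bit pattern of a given width and shows why 1024 was the natural stand-in for 1000. The function name bit_patterns is just an illustrative choice.

```python
from itertools import product


def bit_patterns(width: int) -> list[str]:
    """Return every bit string of the given width, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=width)]


print(bit_patterns(2))  # ['00', '01', '10', '11']
print(bit_patterns(3))  # ['000', '001', '010', '011', '100', '101', '110', '111']

# Each extra bit doubles the count, so the powers of two nearest 1000 are:
print(2**9, 2**10)  # 512 1024
```

Ten bits give 1024 distinct values, which is the closest a whole number of bits can get to a "kilo".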


In the early days, this discrepancy was small, and not very significant.  Today, the difference has grown, and will continue to diverge as time goes on.  The graph above demonstrates the divergence as the number of bytes reported grows.  The y-axis should read "Percent Available of Reported Bytes" but you get the idea.

Fortunately, at this point today we are only in the ~1.0E+12 to ~4.0E+12 range for single hard drives.  But even that is nearing a 10% difference in advertised capacity.  Theoretically, if storage capacities continue to increase at the rate that they have for the last 30 years, in another 30 short years (when I have my mortgage paid off) we'll be in ~4.0E+18 territory and the discrepancy will be nearing a whopping 15 percent.  By the turn of the next century the difference will be over 20%.
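The percentages above come from comparing each decimal prefix with its binary counterpart: at the n-th prefix step the ratio is (1000/1024)^n. A rough Python sketch of that calculation follows; the function name prefix_shortfall and the PREFIXES list are illustrative, and the only assumption is that the OS counts in powers of 1024 while the label counts in powers of 1000.

```python
# SI prefixes in order; index 1 = kilo (10**3 vs 2**10), 2 = mega, and so on.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def prefix_shortfall(step: int) -> float:
    """Fraction by which 1000**step falls short of 1024**step."""
    return 1 - (1000 / 1024) ** step


for step, name in enumerate(PREFIXES, start=1):
    print(f"{name:>5} (10^{3 * step:<2}): {prefix_shortfall(step):.1%}")
```

Within any one prefix the shortfall is a fixed percentage; it only grows when capacities move up to the next prefix. Run as written, the tera row comes out at about 9%, the exa row at about 13%, and the figure first passes 20% at the tenth step (10^30 bytes).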

This seems like false advertising, doesn't it?  If we were to make an analogy to the automobile world, it would be like counting horsepower one way in the showroom and another way in real world use - on a sliding scale.  I don't know about you, but I don't want our descendants looking back at us shaking their fists saying "Why do I only get 80% of the advertised capacity of this storage device!"

For the sake of our children, let's fix this storage capacity mess!  I call on the storage device manufacturers of the world to correctly report their storage in multiples of two raised to the 10th power rather than this silly base-10 stuff.

5 comments:

Kruiser said...

Then, really 500 GB should be 111110100 GB, right?

Alex Dodge said...

Your math is wrong. The difference between gigs and gibs is a constant ratio, not a difference like you claim. That means that hard drives have always been, and will always be, around 93% as big as advertised, if you wrongly assume that they're using the same units as OSes. What's increasing is the size of that 7%.

RyanL said...

Alex, I believe that it is you who are mistaken.

1 KB is actually 0.98 KiB.
1 MB is actually 0.95 MiB.
1 GB is actually 0.93 GiB.
1 TB is actually 0.91 TiB.
1 PB is actually 0.89 PiB.
...

And so on. The latter number is what an OS will report. It gets worse, I promise.

Alex Dodge said...

I'm talking about your title. The limit of x2^10 - x10^3 as x tends toward infinity is infinity, but that's not super profound. It just means that 1024 > 1000. You'd get the same result comparing yards and meters. The problem, which you mention in your comment, is the divergence that comes from having two different sets of exponential prefixes.

Personally, I'd rather just use normal SI prefixes. I don't care whether my hard drive's size is evenly divisible by a power of 2, and it doesn't make sense to have a special set of prefixes for bits. That's a bit of historical baggage that I'm ready to drop. OS X already does this, in some contexts. (e.g.: Finder will report a file's size in megabytes, but ls will report in mebibytes.)

RyanL said...

Oh, so now you are changing your argument since your previous one was wrong. Of course it's not "super profound." It's a math problem. Blog post titles and math problems are rarely profound. The point is that the two forms of measurement diverge and will eventually be very far apart. Furthermore, all operating systems report kibi, mebi, and gibi units. Therefore, continuing to sell storage devices in the other format is silly and should be changed, both for clarity and honesty.