Thursday, December 26, 2013

20131223-D800-5191

Lim x -> infinity (2^10x - 10^3x) = ? or Gigabytes != Gibibytes

What is a gibibyte?  Why, a gibibyte is simply the only proper way for a technically minded person to count bytes.  Well, that or kibibytes, mebibytes, or any other power-of-two-based prefix.

While most people (including hard drive manufacturers) think we count bytes of data in kilos, megas, or gigas, they are sadly mistaken.  Granted, there has been some confusion over the matter, and because of it there is a growing disparity between the advertised capacity of a digital storage device and what an operating system will tell you is actually available.

Truth: one segment of the computer industry counts bytes one way and another segment counts them in an entirely different way.  Case in point: the hard drive in my laptop is a Western Digital WD5000BEKT.  It's sold as a 500 GB drive.  That's 500 billion bytes.  More specifically, the data sheet for this product says it has a "formatted capacity" of 500,107 MB, or 500,107 million bytes.  However, most operating systems will report this drive as having 465 GB.  That's a difference of about 7%.  But has it always been this way?
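The arithmetic behind that 7% gap is easy to check.  Here's a quick sketch in Python (the 500,107 MB figure is the one from the WD data sheet quoted above):

```python
# Advertised capacity, per the WD5000BEKT data sheet: 500,107 million bytes.
advertised_bytes = 500_107 * 10**6

# Operating systems typically divide by 2^30 (a gibibyte) but label the result "GB".
GIB = 2**30
reported_gib = advertised_bytes / GIB
print(f"Reported capacity: {reported_gib:.1f} GiB")  # roughly 465.8

# The apparent shortfall versus the advertised 500 GB:
shortfall = 1 - reported_gib / 500
print(f"Shortfall: {shortfall:.1%}")  # about 6.8%
```

Same bytes, two different rulers: divide by 10^9 and you get the number on the box; divide by 2^30 and you get the number in the operating system.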

Believe it or not, it has.

For most of the history of digital storage media, the difference between counting bytes in powers of ten and counting them in powers of two hasn't been of much concern.  Now, however, we are beginning to see the lasting effects of this oversight, which was made early on in how data storage is measured.

The basics are this:  Digital computers typically work by counting in binary.  There are two values, on and off (or 0 and 1).  Each 0 or 1 is a bit (i.e. binary digit), and if you want to count past 1 you just lengthen your number with more bits.  Two bits: 00, 01, 10, 11.  Three bits: 000, 001, 010, 011, 100, 101, 110, 111, and so on.  Eight bits make a byte (interestingly, four make a nibble), and 16, 32, or 64 bits are called a "word," depending on the computer architecture.  Since binary was the number system of the land, large numbers of bytes were counted in kilobytes or "K".  This was roughly based on the Metric System prefix "kilo", or 1000.  The tough part was that 1000 is not a power of two, so 1024 (2 raised to the 10th power) was used instead.  While 1024 didn't exactly equal one "kilo" of bytes, it was close enough for government work.
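The two counting schemes line up step for step, and the gap widens at each step.  A small sketch (the paired names are the SI prefixes next to the standardized IEC binary prefixes):

```python
# Decimal (SI) prefixes vs. binary (IEC) prefixes, one row per step.
prefixes = [("kilo/kibi", 1), ("mega/mebi", 2), ("giga/gibi", 3), ("tera/tebi", 4)]

for name, n in prefixes:
    decimal = 10 ** (3 * n)   # kilo = 10^3, mega = 10^6, giga = 10^9, ...
    binary = 2 ** (10 * n)    # kibi = 2^10, mebi = 2^20, gibi = 2^30, ...
    print(f"{name}: {decimal:,} vs {binary:,} ({binary / decimal - 1:.1%} larger)")
```

At the kilo step the binary unit is only 2.4% larger; by the tera step it's already about 10% larger.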


In the early days, this discrepancy was small and easy to ignore.  Today the difference has grown, and it will continue to diverge as time goes on.  The graph above demonstrates the divergence as the number of bytes reported grows.  The y-axis should read "Percent Available of Reported Bytes," but you get the idea.

Fortunately, single hard drives today are only in the ~1.0E+12 to ~4.0E+12 byte range.  But even that is nearing a 10% difference from the advertised capacity.  If storage capacities continue to increase at the rate they have for the last 30 years, then in another 30 short years (when I have my mortgage paid off) we'll be in ~4.0E+18 territory and the discrepancy will be nearing a whopping 15 percent.  By the turn of the next century the difference will be over 20%.

This seems like false advertising, doesn't it?  To make an analogy to the automobile world, it would be like rating horsepower one way in the showroom and another way in real-world use, on a sliding scale.  I don't know about you, but I don't want our descendants looking back at us, shaking their fists and saying, "Why do I only get 80% of the advertised capacity of this storage device?"

For the sake of our children, let's fix this storage capacity mess!  I call on the storage device manufacturers of the world to correctly report their capacities in units based on two raised to the 10th power rather than this silly base-10 stuff.