Can anyone tell me why is /dev/random is preferred for security while wiping data from a hard drive?
Simple answer, /dev/random
is not preferred. Both are equally secure. Use /dev/zero
for easier verification. Also less CPU usage and possibly faster.
More complete answer. For modern hard drives platter density is such that it's impossible to obtain signals from incompletely overwritten sectors of the drive, that people such as Gutmann wrote about many, many years ago. As far as modern hard drives are concerned (I'd place this as any hard drive whose capacity can be measured in Gigabytes or better), if it's overwritten it's gone. End of story. So it doesn't matter what you change the data to. Just that you change the data.
To add onto this, even if you wipe a hard drive completely, there may still be data left on the drive in sectors that were remapped by the hard drive's firmware but these are relatively rare, and only a very small amount of data would be contained within, not to mention that you'd need very specialized equipment to retrieve that data (you'd have to edit the G-List
within the System Area
of the drive to get at it), not to mention that the reason why those sectors were remapped in the first place is because they were failing.
So to sum up, DoD wipes are stupid, Gutmann wipes are stupider, use /dev/zero
, it's good in nearly 100% of all cases. And if it's an edge case then you need to have very specialized know how to get at the data and also remove the data.
"thanks! so, what about usb stick?"
USB stick is a different animal altogether, you'd need to bypass the flash controller in order to clean it out, even a Gutmann wipe won't completely remove the data because of wear leveling algorithms. But just like a hard drive, if you overwrite the data once, it's gone, the trick is forcing the device to actually overwrite the data.
That being said, if you have a cheap USB stick without a controller which does wear leveling then a single pass 0-fill should be sufficient to remove the data within. Otherwise, you're looking at custom hardware and soldering work.
SSDs should be considered USB sticks with a controller that performs wear leveling. SSDs will always do wear leveling, I do not know of any exceptions to this rule. Many USB sticks do not.
How do you tell if a USB stick does wear leveling? You need to take it apart and inspect the controller chip and look up a datasheet on it.
"Would you give a source for the statement that it is "impossible to obtain signals from incompletely overwritten sectors of the drive" ? I am not talking about tests from computer magazines concerning data recovery stores, I am talking of the worst case scenario: a well-equipped government laboratory. So I really would like to know how can you guarantee that statement, preferably a scientific paper."
I'll give some justification and information regarding the analog storage of digital data on magnetic media. The following is mostly things that I was taught while on the job at a data recovery company, and may partially inaccurate in places. If so, let me know, I will correct it. But this is my best understanding of the material.
After a hard drive is manufactured the first thing that happens is it receives servo labels from a servo label writing machine. This is a separate machine whose sole job is to take a completely blank hard drive and bootstrap it. (This is why hard drives have holes in them covered with aluminum tape, that's where the servo labeling machine places its write heads.) If you've ever had a drive that when you powered it on it just generated "click click click" it's is because it could not read the servo labels. When a hard drive is powered on the first thing it tries to do is fling its read heads somewhere onto the platter and acquire a track. Servo labels define tracks. If it can't see a servo label it reaches the middle, makes a clack, pulls the arm back and tries again.
The reason why I mention this, is that is pretty much the only instance that an external device reads and writes to the hard drive and it describes approximately the limit that hardware outside of that drives own read heads can work with the data on a platter. If it were possible to make servo labels smaller and more space efficient hard drive manufacturers would. Servo labels are comparatively space inefficient for two reasons.
A ring of servo labels defines a track. There are some things you must know about tracks.
After servo labels are written, then comes the low level format. An actual low level 1980s format of a drive, except more complicated. Because platters are circular but hard drive speeds are constant the amount of area passing under the read head is a variable function of the distance to the middle of the platter. So, in an effort to squeeze every last drop of storage out of a platter the density of the platter is variable and defined in zones. On a typical 3.5" hard drive there will be several dozen zones with different platter densities.
One of which is special and extra low density called the System Area
. The System Area is where all of the firmware and configuration settings are stored on the drive. It has an extra low density because that information is more important. The lower the density the less chance there is that something will randomly screw up. It happens all of the time of course, but less often than something in the user area.
After the drive is low level formatted the firmware is written to the System Area. The firmware is different for every drive. In order to optimize the drive for the ridiculously fine requirements of the platters, each drive must be tuned. (This actually takes place before the low level format, of course, because you have to know how good the equipment is in order to decide how dense to make the platters.) This data is known as adaptives
and is saved in the System Area. Information in the adaptives area is stuff like "how much voltage should I use to correct myself when the servo labels tell me I'm drifting off track", and other information required to make the hard drive actually work. If the adaptives are off slightly it might be impossible to access the user area. The system area is easier to access, so only very few adaptives are required to be stored on the PCB CMOS.
Take aways from this paragraph:
So. Because the user are has such a high density, it actually is very (very (very very)) likely to get screwed up bits in the normal course of things. This can be caused by many, many factors including very slight timing issues and platter degradation. A good percentage of sectors of your hard drive actually contain screwed up bits. (You can verify this yourself by issuing an ATA28 READLONG
command to your drive (only valid for the first 127 GB or so. There is no ATA48
equivalent it was dropped!) several times on many sectors and comparing the output. You'll find that it isn't a rare occurrence that certain bits will misbehave and act suck on or off or even flip randomly.) It's a fact of life. Which is why we have ECC.
ECC is a checksum contained after the 512 (or 4096 in newer drives) bytes of data that will correct that data if it has few enough incorrect bits. The exact number depends on firmware and manufacturer, but all drives have it and all drives need it (and it's surprisingly higher than you'd expect, something like 48-60 bytes that can detect and correct up to 6-8 error bytes. Crazy math going on.) This is because the density of the platters is too high for even the highly specialized and tuned internal hard drive equipment.
Finally, I want to talk about the preamp chip. It's located on the arm of the hard drive and acts as a megaphone. Because the signals are being generated from very small magnetic fields, acting on very small heads they have a very small potential. So you cannot use the hard drive head for the Gutmann method, because you cannot get an accurate enough reading from it to make Gutmann's technique worthwhile.
But let's posit that the NSA has a piece of magic equipment, and they can get a very accurate read (accurate enough to calculate the potential and derive the previously written data) of any particular bit in 1 ms. What do they need first?
First, they need the System Area. Because that's where the Translator is stored (the translator is the things that turns an LBA address into a PCHS address (Physical Cylinder Head Sector as opposed to the logical CHS address which is fake and only around for legacy reasons). The size of the System Area varies, and you can get it without resorting to magic tools. Normally, it's only around 50-100MB. The layout of the translator is firmware specific, so you have to reverse it (but it's been done before, no big deal.)
So first problem, signal to noise. As mentioned, platter density is tuned way higher that is strictly safe. Gutmann's method requires a very low variance in normal read/write activity to calculate previous states of the bits with any accuracy. If the variance in signal is significant then it can screw over these attempts. And the variance is significant enough to completely screw you over (that's why ECC is so crazy in modern drives.) An analogy would be like trying to perfectly hear someone whispering to you while someone is talking to you in the middle of a noisy room.
Second problem, time. Even if the electron microscope is very fast and accurate (1ms per bit! That's lightning for an electron microscope. It's also slower than a 1200 baud modem), there is a LOT of data on a hard drive and a full image will take a very long time. (WA says 126 years for an entire 500GB hard drive, and that's NOT including ECC data (which you need). There's also lots of other metadata associated with hard drive sectors that I didn't mention, like ID fields, and Address markers, but these don't get overwritten, perhaps you can come up with a faster way to image them normally? Doubtless there are ways to speed up this process (such as selectively imaging portions of the drive) but even that will take you months of 24/7 around the clock work just to get the $MFT
file on a standard hard drive (typically around 50-300MB on a drive with Windows installed)).
Third problem, admissibility. If the government is after you they're after you for only a few reasons, they want to know something that you know, or they want to arrest you and put you in prison. There are easier ways to get the former (rubber hose cryptography), and the latter will require regular evidence procedures. Going back to the analogy, if someone testified that someone told them something in a whisper, while someone else was talking to them in the middle of a crowded and noisy room, there is a lot of room for doubt there. It would never be the sort of strong evidence that would want to spend lots of time and money.