Recover overwritten files using grep

Posted on Wed 02 August 2017 in Linux

Anyone who has used a computer for a good amount of time has overwritten a file: a late-night mv command typo'd, a drag-and-drop misclick. Even if you stop using the drive straight away, most disk recovery tools won't look for files that have been overwritten rather than straight up deleted. But with a bit of luck, you can use one of the simplest Linux command line tools to recover your precious files!

Just a heads up before we start: this method only really works on text files. Binary files, such as music and video, are a little more difficult to search for!

The most important thing is to stop writing to the file system as soon as possible! Unplug it, power it off, STOP USING IT!

To begin with, attach your device to a working Linux install, but leave it unmounted. You need to know the rough length of the file and a small amount of text from within it. The more you can remember, the less junk you'll have to search through.
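If you're not sure whether the device is mounted, a quick check along these lines can help. This is only a rough sketch: it assumes a Linux-style mounts table at /proc/mounts, and the is_mounted function name and the fake mounts file are made up for the example.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if DEVICE appears in the first column of the mounts table.
# Assumes a Linux-style /proc/mounts; the function name is hypothetical.
is_mounted() {
  awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Demo against a fake mounts table so nothing real is touched.
fake_mounts=$(mktemp)
printf '/dev/sda1 / ext4 rw 0 0\n/dev/sdd2 /mnt/usb vfat rw 0 0\n' > "$fake_mounts"

if is_mounted /dev/sdd2 "$fake_mounts"; then
  echo "/dev/sdd2 is mounted: unmount it before searching"
else
  echo "/dev/sdd2 is not mounted"
fi
```

On a real system you would leave off the second argument so the function reads /proc/mounts, and unmount the device with umount if it turns out to be mounted.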

The command we'll be using is grep. Grep can search through binary files (such as block devices!) for text strings. It has a few switches we can use to help search through all the junk that a filesystem contains.
For example, we'll be looking for a file that contains the phrase 'import GPIO', and is around 100 lines long. We'll be looking on the block device /dev/sdd2.

Grep can be used to find that file as follows:

$ grep -i -a -B100 -A100 'import GPIO' /dev/sdd2 > results.txt
  • -i makes our search term case-insensitive. This allows us to find a match even if we've misremembered the capitalisation.
  • -a is the important bit - it tells grep to treat a binary file as text, so we can search through it.
  • -B <num lines> and -A <num lines> give you the 100 lines before and after each match, respectively. This is used to retrieve the whole file, rather than just the line we're searching for.
  • /dev/sdd2 is the device we are searching. A block device is how the Linux kernel exposes the raw contents of a disk or partition, so grep can read it directly without going through the filesystem.
  • > results.txt redirects the output of the grep command to a file. This allows us to dig through the results to find our missing file! Make sure results.txt lives on a different drive from the one you're searching, or you risk overwriting the very data you're trying to recover.
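If you'd like to try the switches above before pointing grep at a real drive, here's a small practice run on a throwaway image file. It's only a sketch: the temporary directory, fake_disk.img and the little Python-ish "lost file" are all invented for the demo, and the context window is shrunk to 2 lines to suit the tiny file.

```shell
#!/bin/sh
# Practice run on a throwaway image file instead of a real block device.
# All names here (workdir, fake_disk.img, the sample text) are made up.
workdir=$(mktemp -d)
img="$workdir/fake_disk.img"

# Binary junk, then a small "lost" text file, then more binary junk.
head -c 4096 /dev/urandom > "$img"
printf '\n#!/usr/bin/env python\nimport GPIO\nGPIO.setup(18)\nprint("blink")\n' >> "$img"
head -c 4096 /dev/urandom >> "$img"

# Same switches as the real command, with a smaller context window.
grep -i -a -B2 -A2 'import GPIO' "$img" > "$workdir/results.txt"

# Count how many lines of the lost file made it into the results.
grep -a -c 'GPIO.setup' "$workdir/results.txt"
```

Open results.txt in a pager or editor afterwards and you should see the sample text surrounded by a bit of binary junk, which is roughly what a real recovery looks like.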

This command will give us a file - results.txt. This file contains all the matches grep has found, as well as the lines before and after each match, as specified. It will almost certainly contain multiple copies of the file, because older versions often remain on disk alongside the newest one. Look through, verify you have an up-to-date copy, and then copy your data out of the file!
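When there are a lot of matches, it can be easier to split the results into one file per match. The sketch below assumes GNU grep's default behaviour of printing a line containing only -- between separate groups of context; the candidate_NNN.txt names and the stand-in results.txt are made up for the example. A recovered file that itself contains a line of just -- would be split in two, so treat the output as a starting point.

```shell
#!/bin/sh
# Sketch: split grep's context output into one file per match group.
# Assumes GNU grep's default "--" separator line between groups;
# the candidate_NNN.txt names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for a real results.txt: two groups separated by "--".
printf 'old line\nimport GPIO\nv1\n--\nnew line\nimport GPIO\nv2\n' > results.txt

# LC_ALL=C makes awk treat the input as plain bytes, which is safer with junk.
LC_ALL=C awk '
  BEGIN  { n = 1; out = sprintf("candidate_%03d.txt", n) }
  /^--$/ { close(out); n++; out = sprintf("candidate_%03d.txt", n); next }
         { print > out }
' results.txt

ls candidate_*.txt
```

Each candidate file can then be opened on its own and compared against what you remember of the original.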

Oh, and next time - back up your dang data! ;)

Why does this work?

This works because filesystems often don't overwrite the physical data on disk when you write a file. It's typically quicker for a filesystem to write to areas of the drive that have previously been marked as empty, and then move the reference to the file to the new blocks. This means the old data remains, lost but still on disk, until something else needs the space. This is why it is critical to stop using the drive as soon as possible: once those blocks have been overwritten, the data is gone for good!