Periklis Ntanasis:
Master's Touch


Trackable File Distribution: A simple approach

Scenario

Let’s assume that we have a community or some other group that distributes material such as pictures, PDFs, etc. through a website to its registered users only. If the management wants only registered members to download the material, is it possible to track a member who leaks something through another public channel, such as uploading it to a public site or redistributing it via torrents?

The solution I came up with is pretty simple, and it’s called steganography.

Before I continue, let me state that this method isn’t bulletproof.

Steganography

Steganography is the art and science of writing hidden messages in such a way that no one, apart from the sender and intended recipient, suspects the existence of the message, a form of security through obscurity.

(Wikipedia)

So, in general, steganography is a process where you hide something inside something else, usually in plain sight.

The trick is that no one knows that something exists in there. As Wikipedia states, that’s a form of security through obscurity.

So how does steganography fit our case?

What we are going to do is hide a unique identifier inside each downloaded file that links it to a specific user.

Let’s see how!

In the following example I am going to demonstrate the concept with some PHP code. Note that I chose PHP just because I feel more familiar with it; this can be done in any language of your choice.

Let’s assume that we are going to distribute a GIF file.

First of all, let’s construct the identifier.

You can construct the identifier any way you like, but here is the first thing that popped into my head.

If we have a user with username master and password pass, we can create an identifier from the first 7 hex digits of the MD5 hash of the username concatenated with the password. That is, md5("masterpass") = ab1e5cb87bca828b54a4a24c2b37ea8f, and its first 7 digits are ab1e5cb. Since each digit is hexadecimal (0-9, a-f), this pattern has 16^7 = 268,435,456 possible values. Because the hash is truncated, two users could still end up with the same identifier, so uniqueness is likely for a small user base but not guaranteed.
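A minimal sketch of the identifier scheme described above, in PHP to match the rest of the post. make_identifier() is a hypothetical helper name; it simply takes the first $length hex digits of md5(username . password).

```php
<?php

// Hypothetical helper implementing the scheme above: the first $length
// hex digits of md5($username . $password).
function make_identifier(string $username, string $password, int $length = 7): string
{
    // md5() returns a 32-character lowercase hexadecimal string.
    return substr(md5($username . $password), 0, $length);
}

// For user "master" with password "pass" this hashes "masterpass".
echo make_identifier('master', 'pass'), PHP_EOL;
```

Since a site normally stores only a password hash, in practice you would probably compute the identifier once (for example at registration) and store it with the user record, checking that no other user already has it. That bookkeeping is an assumption on my part, not part of the approach above.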

So now that we have our unique identifier, we can serve the file to the user like this:

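<?php

// Streams $file to the browser as a download with $identifier appended
// after the original bytes. $id_length must equal strlen($identifier) so
// that Content-Length matches what is actually sent.
function sendfile($file,$identifier,$id_length) {
  if (file_exists($file)) {
    // The response body is the original file plus the identifier bytes.
    $filesize = filesize($file)+$id_length;
    header('Content-Description: File Transfer');
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename="'.basename($file).'"');
    header('Content-Transfer-Encoding: binary');
    header('Expires: 0');
    header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
    header('Pragma: public');
    header('Content-Length: ' . $filesize);
    // Discard anything already buffered so stray output cannot corrupt the
    // binary download (ob_clean() raises a notice if no buffer is active).
    if (ob_get_level() > 0) {
      ob_clean();
    }
    flush();
    readfile($file);   // the original file, ending with the GIF trailer 0x3B
    echo $identifier;  // the identifier, appended after the trailer
    exit;
  }
  else {
    http_response_code(404);
    echo "FILE NOT FOUND! :(";
  }
}

sendfile("image.gif","ab1e5cb",7);

?>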

Here is the gist with the code.

The sendfile function takes the file’s name/location, the identifier, and the identifier’s length (in bytes), and sends the file to the user with the identifier appended to it.
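Because sendfile() writes straight to the HTTP response and then calls exit, it is awkward to test. As a sketch, the tagging step can be pulled out into a pure function; tag_file() is a hypothetical name, and it assumes the whole file fits comfortably in memory.

```php
<?php

// Hypothetical helper: returns exactly the bytes sendfile() sends for $file,
// i.e. the original contents followed by $identifier.
function tag_file(string $file, string $identifier): string
{
    $contents = is_readable($file) ? file_get_contents($file) : false;
    if ($contents === false) {
        throw new RuntimeException("Cannot read file: $file");
    }
    return $contents . $identifier;
}

// Usage: produce a tagged copy on disk instead of streaming it, e.g.
// file_put_contents('image1.gif', tag_file('image.gif', 'ab1e5cb'));
```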

If we run diff on the original GIF and the one our user downloaded, we’ll see that they differ.

If we hexdump the original file, the last line of the output is:

0002710 0000 003b
0002713

The 3B hex value is the GIF trailer (the footer), which marks the end of the image data. However, most programs use only the header to identify the file, so most image viewers will display our GIF just like the original.
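To illustrate the header-based detection mentioned above, here is a minimal sketch that recognizes a GIF purely from its first six bytes, the signature GIF87a or GIF89a. looks_like_gif() is a hypothetical helper; real image libraries validate much more than this.

```php
<?php

// Hypothetical helper: identify a GIF by its 6-byte header signature only,
// ignoring whatever comes after the trailer byte (0x3B).
function looks_like_gif(string $bytes): bool
{
    $signature = substr($bytes, 0, 6);
    return $signature === 'GIF87a' || $signature === 'GIF89a';
}

// A tiny stand-in for a tagged download: header ... trailer + identifier.
var_dump(looks_like_gif("GIF89a\x00\x00;ab1e5cb")); // bool(true)
```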

The GIF that our user downloaded (saved here as image1.gif) is 7 bytes larger:

$ du -b image1.gif
10010  image1.gif
$ du -b image.gif
10003  image.gif

If we hexdump it, the last line will be:

0002710 0000 613b 3162 3565 6263
000271a

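As we can see, there are 7 more bytes, which are our unique identifier. You can verify this by looking up the hex values in an ASCII table. Keep in mind that hexdump’s default format groups the bytes into 16-bit little-endian words, so each pair appears swapped: 613b is the byte 3b (the trailer) followed by 61 (the letter a).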
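To see where those swapped-looking pairs come from, this sketch reproduces hexdump’s default word grouping from the raw bytes using unpack() with the 'v' format (unsigned 16-bit, little-endian). hexdump_words() is a hypothetical helper, and padding an odd final byte with 0x00 is an assumption that matches how the original file’s last byte shows up as 003b.

```php
<?php

// Hypothetical helper: format bytes the way `hexdump` does by default,
// as 16-bit little-endian words printed as 4 hex digits.
function hexdump_words(string $bytes): string
{
    if (strlen($bytes) % 2 === 1) {
        $bytes .= "\x00"; // pad an odd trailing byte, which then prints as 00XX
    }
    $words = unpack('v*', $bytes); // 'v' = unsigned 16-bit little-endian
    return implode(' ', array_map(fn(int $w): string => sprintf('%04x', $w), $words));
}

// Last 10 bytes of the downloaded file: 00 00 3b 61 62 31 65 35 63 62.
echo hexdump_words("\x00\x00;ab1e5cb"), PHP_EOL; // 0000 613b 3162 3565 6263
```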

Furthermore, hexdump can also show the ASCII representation of a file with the -C switch. With it, the downloaded file’s last line is:

00002710  00 00 3b 61 62 31 65 35  63 62                    |..;ab1e5cb|
0000271a

Here we can see our identifier pretty clearly.
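Recovering the identifier from a leaked copy is the reverse operation. A minimal sketch, assuming the identifier is still exactly the last $id_length bytes of the file (7 in our example); read_identifier() is a hypothetical helper.

```php
<?php

// Hypothetical helper: return the last $id_length bytes of $file, which is
// where sendfile() appended the identifier.
function read_identifier(string $file, int $id_length = 7): string
{
    $contents = is_readable($file) ? file_get_contents($file) : false;
    if ($contents === false || strlen($contents) < $id_length) {
        throw new RuntimeException("Cannot read an identifier from: $file");
    }
    return substr($contents, -$id_length);
}

// Usage: echo read_identifier('image1.gif'), PHP_EOL;
```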

Sum up

To sum up, if a user leaks the file image.gif, we can easily associate that file with him!
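The association step itself can be as simple as a lookup. A sketch assuming the site kept a record of the identifier issued to each user; find_users_by_identifier() and the $issued array are hypothetical. It returns every match rather than a single user, because truncated hashes can collide.

```php
<?php

// Hypothetical helper: $issued maps username => identifier issued to that user.
// Returns all usernames whose identifier matches the one found in the leak.
function find_users_by_identifier(string $leaked_id, array $issued): array
{
    return array_keys($issued, $leaked_id, true); // strict comparison
}

$issued = [
    'master' => substr(md5('masterpass'), 0, 7),
    'alice'  => '0c4f2d1', // made-up value for illustration
];
print_r(find_users_by_identifier(substr(md5('masterpass'), 0, 7), $issued));
```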

Why not bulletproof

That’s pretty obvious! As easily as we inserted the identifier someone can remove it.
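To show just how easy removal is, here is a sketch that drops everything after the last 0x3B byte. strip_after_trailer() is a hypothetical helper; it only works here because our hex identifier never contains a 0x3B (;) byte.

```php
<?php

// Hypothetical helper: cut off anything appended after the final GIF
// trailer byte (0x3B, the ASCII character ';').
function strip_after_trailer(string $bytes): string
{
    $pos = strrpos($bytes, "\x3B");
    return $pos === false ? $bytes : substr($bytes, 0, $pos + 1);
}

$clean = strip_after_trailer("GIF89a\x00\x00;ab1e5cb");
var_dump($clean === "GIF89a\x00\x00;"); // bool(true)
```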

Also, some programs may check the file footer before opening the file, which may result in a failure.
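Conversely, a program that validates the footer would reject the tagged download. A minimal sketch of such a check, assuming the file is held in memory; ends_with_gif_trailer() is a hypothetical helper.

```php
<?php

// Hypothetical helper: true if the last byte is the GIF trailer (0x3B).
function ends_with_gif_trailer(string $bytes): bool
{
    return $bytes !== '' && substr($bytes, -1) === "\x3B";
}

var_dump(ends_with_gif_trailer("GIF89a\x00\x00;"));        // bool(true)
var_dump(ends_with_gif_trailer("GIF89a\x00\x00;ab1e5cb")); // bool(false)
```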

Conclusion

There are other means of achieving the same result, such as a visible watermark added to a picture. However, watermarks that don’t alter or destroy the file’s content can be removed with some basic skills.

Furthermore, there are steganography methods that can mark a file in a less noticeable way. Those methods are more advanced and complex, and, as I implied earlier, they may alter or destroy the original file’s data, resulting in a lower-quality copy.
