Learning by example

The pitfalls of learning by example

Usually, when I write code in an unfamiliar environment (language, IDE, framework, etc.), I search for examples and study them first. Only if those leave me unsatisfied do I check the documentation or manual.

I do this because it seems an easier and quicker way to understand something, and because I usually care more about getting the expected result than about understanding in depth how I got it.

However, sometimes this can produce some nasty bugs.

My Case

Here I would like to share my short experience with the pycrypto library's hashing functions, and how my habit of learning by example failed me.

A couple of months ago I enrolled in Coursera's crypto class, taught by Stanford's Dan Boneh.

Most programming assignments included code snippets in Python and suggested a Python or C++ library to use. Despite my limited experience with Python, I chose it to implement the majority of the assignments.

In one programming assignment I had to hash a series of different data chunks, one after the other, to check whether any transmission errors had taken place.

As I was unfamiliar with pycrypto, I searched for how to compute a SHA hash with it. This got me to the library's pydoc, where I saw this example:

>>> from Crypto.Hash import SHA
>>>
>>> h = SHA.new()
>>> h.update(b'Hello')
>>> print h.hexdigest()

This seemed pretty straightforward, so I did something like this in my program:

from Crypto.Hash import SHA

# data is an iterable of byte-string chunks
h = SHA.new()  # a single hash object, created once
for chunk in data:
    h.update(chunk)
    print(h.hexdigest())

This was a total disaster. The program ran smoothly, but I was getting the wrong result.

I was lucky enough to test the update() method shortly after I spotted the issue, and realised that consecutive update() calls with the same data produced different hashes.

Puzzled by that, I just moved h = SHA.new() inside the loop. However, if I had read the documentation a little sooner, it could have saved me some time.

The documentation of SHA's new() function says: "Return a fresh instance of the hash object." The word "fresh" was the key to my problem. Later, it says: "It is equivalent to an early call to SHA1Hash.update()." So, clearly, update() calls are not all equal: each one builds on the calls before it.

The problem was that update() does not start over: it appends each argument to the data that has already been fed to the hash object. This becomes clear if you read the comments in pycrypto's source code.

In other words, m.update(a); m.update(b) is equivalent to m.update(a+b).
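To see this behaviour for yourself, here is a small sketch. Note that it uses hashlib.sha1 from Python's standard library instead of pycrypto's SHA module. That swap is my assumption for the sake of a self-contained example, and the sample byte strings are made up. hashlib documents the same rule for update(), so the effect should be the same.

```python
import hashlib

a = b"Hello, "
b = b"world"

# Feed the two chunks into one hash object, one after the other.
incremental = hashlib.sha1()
incremental.update(a)
digest_after_a = incremental.hexdigest()
incremental.update(b)
digest_after_b = incremental.hexdigest()

# Hash the concatenation in a single call.
one_shot = hashlib.sha1(a + b).hexdigest()

# The object accumulates input, so the digest changes after each update() ...
print(digest_after_a != digest_after_b)               # True
# ... and the final digest covers a + b, not just b.
print(digest_after_b == one_shot)                     # True
print(digest_after_b == hashlib.sha1(b).hexdigest())  # False
```

The last line is exactly the trap: after the second update() the digest is no longer the hash of the second chunk alone.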

So, in my case I should have done the following:

from Crypto.Hash import SHA

for chunk in data:
    # a fresh hash object per chunk, so no state carries over
    print(SHA.new(chunk).hexdigest())
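To put the two approaches side by side, here is a rough sketch. It again uses hashlib.sha1 from the standard library as a stand-in for pycrypto's SHA (my assumption, so it runs without extra installs). The helper names chunk_digests and running_digests and the sample chunks are hypothetical and only there for illustration.

```python
import hashlib

def chunk_digests(chunks):
    """Return one independent SHA-1 hex digest per chunk."""
    # A fresh hash object per chunk, so no state carries over.
    return [hashlib.sha1(chunk).hexdigest() for chunk in chunks]

def running_digests(chunks):
    """Return the digest of everything seen so far, after each chunk."""
    h = hashlib.sha1()
    digests = []
    for chunk in chunks:
        h.update(chunk)  # appends to the message hashed so far
        digests.append(h.hexdigest())
    return digests

data = [b"block-1", b"block-2", b"block-1"]  # made-up sample chunks
independent = chunk_digests(data)
running = running_digests(data)

print(independent[0] == independent[2])  # True: same chunk, same digest
print(running[0] == running[2])          # False: the history differs
print(independent[0] == running[0])      # True: only the first one agrees
```

The fact that the very first digest agrees in both versions is probably what makes this bug easy to miss when you only test with a single chunk.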

Sadly, there wasn't a BIG warning sign on SHA's doc page. However, I assume that even if there were, I would have ignored it as soon as my eye had spotted the example.

So, at the end of the day, if everything else fails, remember to read the f*cking manual.
