Hey guys,
I’m afraid today I have to confess to an almighty amount of stupidity, but first - please allow me to set the record straight so that nobody ever makes that mistake again:
The PHP MD5 function produces the same hash for the same input bytes as any other MD5 function written in any other language ever devised.
Please don’t pay any heed whatsoever to the whisperings that abound on the wonderful interweb. Hmmm… Maybe ‘abound’ isn’t the right word, which is exactly what should have sent the rumour-alarm clanging away in my head. Truth is, there is a small smattering of references to this problem, but often people (like me) work out that it was a problem with their own code, one they had attributed to the ‘encoding problem’ because it’s easier to ‘blame their tools’.
As was pointed out to me:
md5 always takes the argument as a bit vector rather than a string of letters, i.e. no encoding matters. If your script is written in ISO-8859-15 and you passed an embedded string literal to md5(), the result is the hash of an ISO-8859-15 string.
Y’know what? It’s true! When I did a bit more debugging I found that I was inserting invisible whitespace into the string I was trying to hash. Whitespace is as visible as any other character to the MD5 function - the hash of ‘ hashtext’ (notice the leading space) will therefore be different to the hash of ‘hashtext’. Nothing to do with UTF-7 or UTF-8!
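If you'd like to see this for yourself, here is a minimal sketch in Python rather than PHP, on the assumption that any standard MD5 implementation behaves the same way for the same bytes (which is the whole point of this post). The helper name `md5_of_bytes` and the sample strings are made up for illustration; only `hashlib.md5` comes from the standard library.

```python
import hashlib


def md5_of_bytes(data: bytes) -> str:
    """Return the hex MD5 digest of the raw bytes it is given."""
    return hashlib.md5(data).hexdigest()


clean = "hashtext"
padded = " hashtext"  # same text with a stray leading space

# repr() makes otherwise invisible whitespace show up while debugging.
print(repr(padded))

# A space is just another byte to MD5, so the two digests differ.
whitespace_changes_hash = (
    md5_of_bytes(clean.encode("utf-8")) != md5_of_bytes(padded.encode("utf-8"))
)

# Stripping the stray whitespace before hashing makes them agree again.
trimmed_matches = (
    md5_of_bytes(padded.strip().encode("utf-8")) == md5_of_bytes(clean.encode("utf-8"))
)

# Plain ASCII text is the same bytes in UTF-8 and ISO-8859-15,
# so the choice of encoding makes no difference here.
ascii_same = (
    md5_of_bytes(clean.encode("utf-8")) == md5_of_bytes(clean.encode("iso-8859-15"))
)

# A non-ASCII character really is stored as different bytes in each encoding,
# so each byte string gets its own digest.
euro = "€5"
non_ascii_differs = (
    md5_of_bytes(euro.encode("utf-8")) != md5_of_bytes(euro.encode("iso-8859-15"))
)

print(whitespace_changes_hash, trimmed_matches, ascii_same, non_ascii_differs)
```

In other words, MD5 never "gets the encoding wrong"; it simply hashes whatever bytes it is handed. If two systems disagree on a hash, it is worth comparing the exact bytes going in before blaming the function.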
And guess what? It’s not just me… On Experts Exchange I found a user with a similar problem in Java. He later explains that in his case a string wasn’t being lowercased prior to hashing:
Hello. Have got access to the php code now and can see that the php programmer did not actually follow the specification (did not make all chars to lowercase before md5…) Sorry to have bothered u with this, was extremely painful to sort out the bug when I could not see the php code.
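That one is just as easy to reproduce. The following is only a sketch in Python, with a hypothetical helper `md5_lowercased` standing in for whatever that specification actually required; the one assumption is that both sides are meant to lowercase the text before hashing it.

```python
import hashlib


def md5_of_text(text: str) -> str:
    """Hash the UTF-8 bytes of the text and return the hex digest."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def md5_lowercased(text: str) -> str:
    """Hypothetical 'follow the spec' version: lowercase first, then hash."""
    return md5_of_text(text.lower())


# 'H' and 'h' are different bytes, so skipping the lowercase step
# gives a different digest from the side that did follow the spec.
case_changes_hash = md5_of_text("Hello") != md5_of_text("hello")

# Once both sides normalise the case, the digests line up again.
normalised_matches = md5_lowercased("Hello") == md5_lowercased("hello")

print(case_changes_hash, normalised_matches)
```

Same story as the whitespace: the two MD5 functions agreed all along, they were just being fed different input.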
But this sort of response is never publicised in the same way. These answers, these non-problems, are always buried as apologetic admissions of bad development practice. I want to put an end to this, and by publishing this post I hope to nip this slowly spreading rumour in the bud.
Go forth and spread the good news - The PHP MD5 is not dead. Long live (urm) the PHP MD5 function!
Tom x