Another important observation was published by Max Locktyukhin.
(Both implementations are public domain.)
Translation to X86 and D by Kai Nacke <kai@redstar.de>
References:
http://arctic.org/~dean/crypto/sha1.html Fast implementation of SHA1

Computes SHA1 digests of arbitrary data, using an optimized algorithm with SSSE3 instructions.