How can I write a sha1 hash function from scratch?

Step-by-step guide to implementing a basic SHA1 hash function from scratch covering input processing, initialization, computation rounds and output.
On this page

How can I write a sha1 hash function from scratch?

Excerpt

Learn how to write a SHA1 hash function from scratch and understand the process behind it. This blog post explains each step in detail and provides examples for better understanding.


Introduction

SHA1 or SHA-1 is a popular one-way cryptographic hash function used widely in security applications. Though the algorithm is complex, it is educational and insightful to understand the core steps involved in implementing a basic SHA1 hash function from scratch. In this post, we will walk through the key stages of developing a SHA1 hashing program in a simplified manner.

Understanding SHA1

SHA1 takes an arbitrary length input message and produces a 160-bit hash value. Key properties:

  • Small changes in input lead to significant changes in output
  • Infeasible to generate original input from hash
  • Low chance of collisions for different inputs

The SHA1 algorithm operates on binary data and consists of 6 main steps:

  1. Convert input to binary
  2. Pad the input
  3. Break into 512-bit blocks
  4. Initialize variables
  5. Perform hash computation
  6. Output final 160-bit hash

Let’s look at each step:

Step 1: Converting Input to Binary

The input string is first converted to a binary bit string. This can be done by:

  • ASCII encoding - converting characters to 8-bit ASCII values
  • Unicode encoding - converting characters to 16-bit Unicode code points

For example, the string “Hello” becomes

10100100001100101011011000110110001101111

in ASCII binary form.

Step 2: Padding the Message

The binary message is padded to ensure its length is a multiple of 512 bits. Padding involves:

  • Appending a single 1 bit to message
  • Adding 0 bits until message is 64 bits shy of next 512 multiple
  • Appending a 64-bit message length value at the end

So “Hello” with its ASCII binary is padded to become

101001000011001010110110001101100011011110000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001010000001010100100000011001010

Step 3: Breaking the Message into Blocks

The padded message is then broken into 512-bit blocks. If the padded length is 1048 bits, we get two 512-bit blocks:

1Block 1: 0100100001100101011011000110110001101111000000100000000000000000000000000000000000000000000000000000000
2
3Block 2: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001010000001010100100000011001010

Step 4: Initializing Variables and Constants

SHA1 uses five 32-bit variables initialized as:

1var h0 := 0x67452301
2var h1 := 0xEFCDAB89
3var h2 := 0x98BADCFE
4var h3 := 0x10325476
5var h4 := 0xC3D2E1F0

It also uses constants generated from the fractional parts of the cube root of primes from 2 to 79.

Step 5: Hash Computation

This involves four rounds of processing the message blocks:

  1. Message Scheduling - Creating 80 expanded message words
  2. Compression Function - Conducting logical operations usingExpanded message, constants and hash values.
  3. Updating Hash Value - Adding output of compressionfunction to initial hash values.
  4. Output - Converting final hash values to hexadecimal to generate 160-bit hash.

The compression and update steps are repeated for each 512-bit block.

Step 6: Generating the Final Hash

After all blocks are processed, the final 160-bit hash value is produced by concatenating the 5 updated 32-bit hash variables h0 to h4.

For “Hello”, the final SHA1 hash is

1f7ff9e8b7bb2e09b70935a5d785e0cc5d9d0abf0

An free online tool to quickly verify your answers

Conclusion

Implementing the SHA1 algorithm from scratch provides valuable insights into its internal workings. By following the essential steps of preprocessing, initialization, hash computation and output generation, we can produce the final SHA1 digest for any input. Learning these cryptographic concepts and methods prepares us for tackling more advanced hash functions down the road.