EuCrypt Chapter 7: Keccak Sponge

January 25th, 2018

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

Using the Keccak transformations from Chapter 6, I can now finally implement the actual Keccak sponge1 that is useful for EuCrypt mainly as a hashing function. To start with, I should say that "sponge" really doesn't strike me as a very useful name or sort of metaphor to use - the whole thing is still essentially a Rubik's cuboid of bits that can be loaded with input bits (through a xor operation), scrambled (by means of the set of transformations discussed in the previous chapter) and read as needed. I get it that Rubik's is trademarked and one wants a "new" name/metaphor and what not. I also totally agree that an actual sponge-organism absorbing/sucking and spitting bits would make at least a livelier sort of hash function but Keccak is what it is and that's a very cold mechanism of scrambling bits (in a 3D sort of way at least, granted) repeatedly. Anyway, to preserve history and make it easy to follow the match between implementation and reference document, I'll use the authors' terminology to denote the thing although I will keep using the Rubik's cube analogy as needed, to explain how it actually works.

The Keccak sponge takes as input a stream of bits of arbitrary length and spits as output another stream of bits of any specific length that is requested. For this reason, the ideal Keccak sponge implementation does not care at all about endianness: bits are absorbed as they come, whether most significant or least significant first. This approach has the significant advantage of clarity: one can easily follow the stream of bits as they make it into the sponge's structure and then as they are squeezed out. Moreover, there is no need to cater explicitly for big endian or little endian machines as such. The choice of Ada as programming language for this implementation makes it also particularly straightforward to model precisely what is described i.e. a transformation at bit level rather than octet level, hence unconcerned with endianness. However, it turns out that my current implementation is not yet there because at the moment it relies on the round constants (for the iota transformation in the previous chapter) as numbers (hence with bit-level representation dependent on endianness) rather than bit streams. Consequently, I'll have to go back over this in next chapter and change everything to bit-level operation. For now though, let's see what the current sponge looks like anyway.

The Keccak sponge itself comes with an important parameter, namely bitrate: this represents the number of bits that the sponge can absorb or squeeze in one single iteration, meaning without needing to scramble the state. The way the sponge works is that it first pads the input stream with 10*1 (so a minimum of 2 bits set to 1 with as many or as few 0 in between as needed) so that the length of the input stream is a multiple of the requested bitrate. Then it splits the padded input stream into chunks of bitrate length and it proceeds to "absorb" those chunks one by one (i.e. xor in the first part of its state), scrambling the state after each chunk. When all the input stream has been absorbed, the sponge moves onto the "squeezing" stage where it simply reads bits from the first part of its state, scrambling the state again after each bitrate bits read. Essentially, the bitrate parameter stands for the number of bits of internal state that are used directly for absorbing and squeezing between two consecutive scramblings of the Keccak state.

The bitrate parameter decides indirectly a sponge's "capacity". This is the remaining number of bits of the state and it can be calculated as width - bitrate. The capacity of a sponge is the number of bits that are never directly read at a squeeze nor directly xored with input bits when absorbing input. The capacity bits contribute otherwise to the scrambling of state at each iteration of the sponge and they represent therefore the "secret" part of a sponge's internal state. Consequently, there is a tradeoff between a higher bitrate that would make the Keccak hashing faster in principle and a higher capacity that increases the "secret" part of the sponge. Note that a sponge's bitrate is a matter of choice at each and every use of the sponge - there is no reason really for this to be fixed necessarily by the Keccak implementation itself. Consequently, my current approach is to have it as a parameter of the Sponge procedure itself, to be decided on by the calling code, at each call. Because this bitrate is effectively used to split the input stream into blocks of equal length, I'm calling the parameter Block_Len, specifying however that it has to be a value of the Keccak_Rate type that models the constraints of a valid bitrate (0 < bitrate < width).

To model the Keccak sponge with its specific input/output requirements, I first define a few new types and subtypes, in eucrypt/smg_keccak/smg_keccak.ads:

  -- rate can be chosen by caller at each call, between 1 and width of state
  -- higher rate means sponge "eats" more bits at a time but has fewer bits in
  --   the "secret" part of the state (i.e. lower capacity)
  subtype Keccak_Rate is Positive range 1..Width;  -- capacity = width - rate

  type Bit is mod 2;
  type Bitstream is array( Natural range <> ) of Bit; -- any length; message
  subtype Bitword is Bitstream( 0..Z_Length - 1 ); -- bits of one state "word"

Note in the above the Bitword subtype of Bitstream: this is a stream of bits of precisely same length as the Z dimension of the Keccak cuboid (EuCrypt uses 64 bits as the value of the Z_Length constant in smg_keccak code). In other words, a Bitword is the bistream equivalent of the numerical "ZWord" value stored at any (X,Y) position of the Keccak sponge. Consequently, there is a need for some ways to convert between the two (Bitword to ZWord and back):

  -- type conversions
  function BitsToWord( Bits: in Bitword ) return ZWord;
  function WordToBits( Word: in ZWord ) return Bitword;

The type conversions are provided as above in the public part of the SMG_Keccak package for now, since they might conceivably be useful to users of Keccak too, at some point. However, as I'll change the Keccak implementation to get rid of multi-octet numbers and work simply at bit level *everywhere*, it's quite likely that those methods will simply be discarded as they are not needed anymore. For the time being though, once those conversions are in place, there is only the sponge function itself to define immediately afterwards:

  -- public function, the sponge itself
  -- Keccak sponge structure using Keccak_Function, Pad and a given bitrate;
  -- Input - the stream of bits to hash (the message)
  -- Block_Len - the bitrate to use; this is effectively the block length
  --             for splitting Input AND squeezing output between scrambles
  -- Output - a bitstream of desired size for holding output
  procedure Sponge(Input      : in Bitstream;
                   Block_Len  : in Keccak_Rate;
                   Output     : out Bitstream);

Finally, in the same eucrypt/smg_keccak/smg_keccak.ads, I added to the private part of the SMG_Keccak package two new procedures, SqueezeBlock and AbsorbBlock that are used by the Sponge function. Note that these two procedures are not really meant for external use, mainly because they represent intermediate steps in the sponge operation and therefore are not of much use by themselves.

  -- this will squeeze Block'Length bits out of state S
  -- NO scramble of state in here!
  -- NB: make SURE that Block'Length is the correct bitrate for this sponge
  -- in particular, Block'Length should be a correct bitrate aka LESS than Width
  procedure SqueezeBlock( Block: out Bitstream; S: in State);

  -- This absorbs into sponge the given block, modifying the state accordingly
  -- NO scramble of state in here so make sure the whole Block fits in state!
  -- NB: make SURE that Block'Length is *the correct bitrate* for this sponge
  -- in particular, Block'Length should be a correct bitrate aka LESS than Width
  procedure AbsorbBlock( Block: in Bitstream; S: in out State );

The detailed implementations of all the above new procedures and functions are in eucrypt/smg_keccak/smg_keccak.adb:

-- public function, sponge
  procedure Sponge( Input      : in Bitstream;
                    Block_Len  : in Keccak_Rate;
                    Output     : out Bitstream) is
    Internal  : State := (others => (others => 0));
  begin
    --absorb input into sponge in a loop on available blocks, including padding
    declare
      -- number of input blocks after padding (between 2 and block_len bits pad)
      Padded_Blocks : constant Positive := 1 + (Input'Length + 1) / Block_Len;
      Padded        : Bitstream ( 1 .. Padded_Blocks * Block_Len );
      Block         : Bitstream ( 1 .. Block_Len );
    begin
      -- initialise Padded with 0 everywhere
      Padded := ( others => 0 );
      -- copy and pad input with rule 10*1
      Padded( Padded'First .. Padded'First + Input'Length - 1 ) := Input;
      Padded( Padded'First + Input'Length )                     := 1;
      Padded( Padded'Last )                                     := 1;

      -- loop through padded input and absorb block by block into sponge
      -- padded input IS a multiple of blocks, so no stray bits left
      for B in 0 .. Padded_Blocks - 1 loop
        -- first get the current block to absorb
        Block   := Padded( Padded'First + B * Block_Len ..
                           Padded'First + (B+1) * Block_Len - 1 );
        AbsorbBlock( Block, Internal );
        -- scramble state with Keccak function
        Internal := Keccak_Function( Internal );

      end loop; -- end absorb loop for blocks
    end; -- end absorb stage

    --squeeze required bits from sponge in a loop as needed
    declare
      -- full blocks per output
      BPO     : constant Natural := Output'Length / Block_Len;
      -- stray bits per output
      SPO     : constant Natural := Output'Length mod Block_Len;
      Block   : Bitstream( 1 .. Block_Len );
    begin
      -- squeeze block by block (if at least one full block is needed)
      for I in 0 .. BPO - 1 loop
        SqueezeBlock( Block, Internal );
        Output( Output'First + I * Block_Len ..
                Output'First + (I + 1) * Block_Len -1) := Block;

        -- scramble state
        Internal := Keccak_Function( Internal );
      end loop;  -- end squeezing full blocks

      -- squeeze any partial block needed (stray bits)
      if SPO > 0 then
        SqueezeBlock( Block, Internal );
        Output( Output'Last - SPO + 1 .. Output'Last ) :=
                Block( Block'First .. Block'First + SPO - 1 );
      end if; -- end squeezing partial last block (stray bits)

    end; -- end squeeze stage
  end Sponge;

  -- convert from a bitstream of ZWord size to an actual ZWord number
  -- first bit of bitstream will be most significant bit of ZWord
  function BitsToWord( Bits: in Bitword ) return ZWord is
    W: ZWord;
    P: Natural;
  begin
    W := 0;
    P := 0;
    for I in reverse Bitword'Range loop
      W := W + ZWord( Bits(I) ) * ( 2**P );
      P := P + 1;
    end loop;
    return W;
  end BitsToWord;

  -- convert from a ZWord (lane of state) to a bitstream of ZWord size
  -- most significant bit of ZWord will be left most bit of bitstream
  function WordToBits( Word: in ZWord ) return Bitword is
    Bits: Bitword := (others => 0);
    W: ZWord;
  begin
    W := Word;
    for I in reverse Bitword'Range loop
      Bits( I ) := Bit( W mod 2 );
      W := W / 2;
    end loop;
    return Bits;
  end WordToBits;

-- helper procedures for sponge absorb/squeeze
  -- NO scramble here, this will absorb ALL given block, make sure it fits!
  procedure AbsorbBlock( Block: in Bitstream; S: in out State ) is
    WPB: constant Natural := Block'Length / Z_Length;   -- words per block
    SBB: constant Natural := Block'Length mod Z_Length; -- stray bits
    FromPos, ToPos        : Natural;
    X, Y                  : XYCoord;
    Word                  : ZWord;
    BWord                 : Bitword;
  begin
    -- xor current block into first Block'Length bits of state
    -- a block can consist in more than one word
    X := 0;
    Y := 0;
    for I in 0..WPB-1 loop
      FromPos := Block'First + I * Z_Length;
      ToPos   := FromPos + Z_Length - 1;
      Word := BitsToWord( Block( FromPos .. ToPos ) );
      S( X, Y ) := S( X, Y ) xor Word;
      -- move on to next word in state
      X := X + 1;
      if X = 0 then
        Y := Y + 1;
      end if;
    end loop;
    -- absorb also any remaining bits from block
    if SBB > 0 then
      ToPos := Block'Last;
      FromPos := ToPos - SBB + 1;
      BWord := (others => 0);
      BWord(Bitword'First .. Bitword'First + SBB - 1) := Block(ToPos..FromPos);
      Word := BitsToWord( BWord );
      S( X, Y ) := S( X, Y ) xor Word;
    end if;
  end AbsorbBlock;

  -- NO scramble here, this will squeeze Block'Length bits out of *same* state S
  procedure SqueezeBlock( Block: out Bitstream; S: in State) is
    X, Y    : XYCoord;
    BWord   : Bitword;
    FromPos : Natural;
    Len     : Natural;
  begin
    X := 0;
    Y := 0;
    FromPos := Block'First;

    while FromPos <= Block'Last loop
      BWord := WordToBits( S(X, Y) );

      X := X + 1;
      if X = 0 then
        Y := Y + 1;
      end if;

      -- copy full word if it fits or
      --   only as many bits as are still needed to fill the block
      Len := Block'Last - FromPos + 1;
      if Len > Z_Length then
        Len := Z_Length;
      end if;

      Block(FromPos..FromPos+Len-1) := BWord(BWord'First..BWord'First+Len-1);
      FromPos := FromPos + Len;
    end loop;
  end SqueezeBlock;

As usual, there are a few tests added to eucrypt/smg_keccak/tests/smg_keccak-test.adb. First, there is a simple test of the Keccak function itself (the one that puts together the transformations for a full scramble of the state). Second, there are tests of the two conversion functions from Bitword to ZWord and the other way around. Finally, there are tests for the sponge itself. The test data for the sponge was obtained by running the transformations separately according to the reference paper. The additional methods in the test file are the following:

 procedure print_bitstream(B: in Bitstream; Title: in String) is
    Hex       : array(0..15) of Character := ("0123456789ABCDEF");
    HexString : String(1..B'Length/4);
    C         : Natural;
    Pos       : Natural;
  begin
    for I in 1..B'Length/4 loop
      Pos := (I-1)*4 + B'First;
      C := Natural( B(Pos) ) * 8 +
           Natural( B(Pos + 1) ) * 4 +
           Natural( B(Pos + 2) ) * 2 +
           Natural( B(Pos + 3) );
			HexString(I) := Hex(C);
    end loop;
    Put_Line("---" & Title & "---");
    Put_Line(HexString);
  end print_bitstream;

  procedure test_bits_to_word_conversion is
    bits: Bitword := (others => 0);
    obtained_bits: Bitword := (others => 0);
    expected: ZWord;
    obtained: ZWord;
  begin
    expected := 16#E7DDE140798F25F1#;
    bits := (1,1,1,0, 0,1,1,1, 1,1,0,1, 1,1,0,1, 1,1,1,0, 0,0,0,1, 0,1,0,0,
             0,0,0,0, 0,1,1,1, 1,0,0,1, 1,0,0,0, 1,1,1,1, 0,0,1,0, 0,1,0,1,
             1,1,1,1, 0,0,0,1);
    obtained := BitsToWord(bits);
    obtained_bits := WordToBits(expected);

    if obtained /= expected then
      Put_Line("FAIL: bits to word");
      Put_Line("Expected: " & ZWord'Image(expected));
      Put_Line("Obtained: " & ZWord'Image(obtained));
    else
      Put_Line("PASSED: bits to word");
    end if;

    if obtained_bits /= bits then
      Put_Line("FAIL: word to bits");
      Put("Expected: ");
      for I in Bitword'Range loop
        Put(Bit'Image(bits(I)));
      end loop;
      Put_Line("");
      Put_Line("Obtained: ");
      for I in Bitword'Range loop
        Put(Bit'Image(obtained_bits(I)));
      end loop;
      Put_Line("");
    else
      Put_Line("PASSED: word to bits");
    end if;
  end test_bits_to_word_conversion;

  procedure test_sponge is
    Bitrate   : constant Keccak_Rate := 1344;
    Input     : Bitstream(1..5) := (1, 1, 0, 0, 1);
    Output    : Bitstream(1..Bitrate*2);
    Hex       : array(0..15) of Character := ("0123456789ABCDEF");
    HexString : String(1..Bitrate/2);
    C         : Natural;
    ExpHex    : String(1..Bitrate/2);
    Error     : Natural;
    Pos       : Natural;
  begin
    ExpHex := "B57B7DAED6330F79BA5783C5D45EABFFA1461FAC6CEA09BD"&
              "AAC114F17E23E5B349EECBC907E07FA36ECF8374079811E8"&
              "5E49243D04182C389E68C733BE698468423DB9891D3A7B10"&
              "320E0356AB4AB916F68C0EA20365A1D4DBA48218CA89CBB8"&
              "6D08A34E04544D4100FFE9CB138EADC2D3FC0E8CC2BC15A7"&
              "5B950776970BFC310F33BF609630D73CAD918CF54657589E"&
              "42CF7CBF20DE677D2AB7E49389F6F6C3B3FE2992905325CE"&
              "60931C1515043595ADC1619CB7E034EF52BDC485D03B7FDD"&
              "7345E849FFB4C4426195C8D88C1E7BF9ADA41B92E006C3DA"&
              "F1ED0FD63ADD9408A3FC815F727457692727637687C1F79D"&
              "837DE20798E64C878181C02DF56A533F684459E8A03C8EF6"&
              "234854531110E6CD9BDEFEA85E35C802B1ACDDF29C9332E2"&
              "53C0FA72F3ED1ABA274838CFE6EF8BD572E89E1C2135F6A7"&
              "5BC5D6EA4F85C9A757E68E561A56AC0FC19F1F086C43272F";

    Put_Line("---sponge test---");
    Sponge(Input, Bitrate, Output);
    Put_Line("Input is:");
    for I of Input loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Put_Line("Output is:");
    for I of Output loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Error := 0;
    for I in 1..Output'Length/4 loop
      Pos := Output'First + (I-1)*4;
      C := Natural( Output( Pos ) ) * 8 +
           Natural( Output( Pos + 1 ) ) * 4 +
           Natural( Output( Pos + 2 ) ) * 2 +
           Natural( Output( Pos + 3 ) );
			Hexstring(I) := Hex(C);
      if Hexstring(I) /= ExpHex(I) then
        Error := Error + 1;
      end if;
    end loop;
    Put_Line("Expected: ");
    Put_Line(ExpHex);
    Put_Line("Obtained: ");
    Put_Line(Hexstring);
    Put_Line("Errors found: " & Natural'Image(Error));

  end test_sponge;

  procedure test_keccak_function(T: in Test_Round) is
    S: State;
  begin
    Put_Line("---Full Keccak Function test---");
    S := Keccak_Function(T(Round_Index'First)(None));
    if S /= T(Round_Index'Last)(Iota) then
      Put_Line("FAILED: full keccak function test");
    else
      Put_Line("PASSED: full keccak function test");
    end if;
  end test_keccak_function;

The .vpatch for the above, together with its signature are as usual on my Reference Code Shelf and linked for your convenience below:

In the next chapter I'll get rid of the endianness trouble by changing everything in the Keccak implementation (constants' representation included) so that all of it works at bit level, as it should! Feel free to play around with this current version anyway but be aware that it's not the one that will be used in EuCrypt. For the proper, bit-level implementation of Keccak, see Chapter 8 of this series (to be published next week).


  1. As described in The Keccak Reference v. 3.0, Bertoni, G., Daemen, J., Peeters, M. and Van Assche, G., 2011.