We separate key expansion from encryption, which means updateKey() must be called before a new cipher key is used to encrypt a message.
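The following minimal Python sketch shows the same split, assuming AES-128. It is a software model for checking behavior, not the hardware design, and the names `Aes128`, `update_key` and `encrypt` are hypothetical stand-ins for updateKey() and the encryption entry point. The key schedule runs once in `update_key` and caches all 11 round keys. `encrypt` only consumes those cached round keys.

```python
# Software model (sketch) of AES-128 with the key schedule split from encryption.

def gmul(a, b):
    # GF(2^8) multiplication modulo x^8 + x^4 + x^3 + x + 1.
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r

def gpow(a, n):
    # Square-and-multiply; gpow(a, 254) is the inverse of a (0 maps to 0).
    r = 1
    while n:
        if n & 1:
            r = gmul(r, a)
        a = gmul(a, a)
        n >>= 1
    return r

def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

# S-box: field inverse followed by the FIPS-197 affine transform.
SBOX = []
for x in range(256):
    v = gpow(x, 254)
    SBOX.append(v ^ rotl8(v, 1) ^ rotl8(v, 2) ^ rotl8(v, 3) ^ rotl8(v, 4) ^ 0x63)

def mix_columns(s):
    # s is the 16-byte state in column-major order (index r + 4*c).
    out = []
    for c in range(4):
        a0, a1, a2, a3 = s[4 * c:4 * c + 4]
        out += [gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3,
                a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3,
                a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3),
                gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2)]
    return out

class Aes128:
    def __init__(self):
        self.round_keys = None  # set by update_key()

    def update_key(self, key):
        # Run the key schedule once and cache all 11 round keys.
        if len(key) != 16:
            raise ValueError("AES-128 needs a 16-byte key")
        w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
        rcon = 1
        for i in range(4, 44):
            t = w[i - 1]
            if i % 4 == 0:
                t = [SBOX[b] for b in t[1:] + t[:1]]  # RotWord, then SubWord
                t[0] ^= rcon                          # round constant
                rcon = gmul(rcon, 2)
            w.append([a ^ b for a, b in zip(w[i - 4], t)])
        self.round_keys = [sum(w[4 * r:4 * r + 4], []) for r in range(11)]

    def encrypt(self, block):
        # Uses only the cached round keys; no key expansion happens here.
        if self.round_keys is None:
            raise RuntimeError("call update_key() before encrypt()")
        s = [a ^ b for a, b in zip(block, self.round_keys[0])]
        for rnd in range(1, 11):
            s = [SBOX[b] for b in s]  # SubBytes
            s = [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]  # ShiftRows
            if rnd != 10:  # the final round has no MixColumns
                s = mix_columns(s)
            s = [a ^ b for a, b in zip(s, self.round_keys[rnd])]  # AddRoundKey
        return bytes(s)
```

For example, after `update_key` with the FIPS-197 Appendix C.1 key 000102030405060708090a0b0c0d0e0f, encrypting 00112233445566778899aabbccddeeff should give 69c4e0d86a7b0430d8cdb78070b4c55a. Switching keys costs one more `update_key` call.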
Because SubBytes treats each byte the same way regardless of its position in the state, and ShiftRows only moves whole bytes between positions, the two steps can be swapped in the processing order without changing the result. The swap brings no improvement by itself. However, it places SubBytes directly before MixColumns, and the next optimization relies on that ordering.
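The swap is valid because ShiftRows is a fixed permutation of byte positions, while SubBytes applies one function to every byte. Any permutation of positions commutes with such a bytewise map. The Python sketch below is a software check, not part of the design. It builds the AES S-box and confirms the equivalence on random states. It assumes the FIPS-197 convention of a 16-byte state stored column-major, with index r + 4*c.

```python
import random

def gmul(a, b):
    # GF(2^8) multiplication modulo x^8 + x^4 + x^3 + x + 1.
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r

def gpow(a, n):
    # Square-and-multiply; gpow(a, 254) is the inverse of a (0 maps to 0).
    r = 1
    while n:
        if n & 1:
            r = gmul(r, a)
        a = gmul(a, a)
        n >>= 1
    return r

def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

SBOX = []
for x in range(256):
    v = gpow(x, 254)
    SBOX.append(v ^ rotl8(v, 1) ^ rotl8(v, 2) ^ rotl8(v, 3) ^ rotl8(v, 4) ^ 0x63)

def sub_bytes(s):
    # The same lookup for every byte, whatever its position.
    return [SBOX[b] for b in s]

def shift_rows(s):
    # Row r rotates left by r positions; bytes only change position.
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]

def orders_agree(trials=1000, seed=0):
    # Compare SubBytes->ShiftRows against ShiftRows->SubBytes on random states.
    rng = random.Random(seed)
    for _ in range(trials):
        s = [rng.randrange(256) for _ in range(16)]
        if shift_rows(sub_bytes(s)) != sub_bytes(shift_rows(s)):
            return False
    return True

print(orders_agree())  # True
```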
The matrix multiplication in MixColumns consists of two parts. First, each byte in a column of the state is multiplied by the corresponding byte in a row of the matrix. Then the products are added, which is an XOR in GF(2^8). The matrix elements can only be 01, 02 or 03. A single lookup in a new table called “sbox_mix_col_1” can therefore return both the SubBytes result and its products with 01, 02 and 03. This removes the logic that would otherwise perform the multiplications.
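The text does not give the bit layout of sbox_mix_col_1, so the Python sketch below assumes one. Each entry packs S(x), 02·S(x) and 03·S(x) into a 24-bit word, with S(x) in the low byte. With 256 entries, this hypothetical table holds 256 × 24 = 6,144 bits. ShiftRows has already been moved ahead of SubBytes, so `sub_mix_column` receives pre-SubBytes bytes. Each output byte is the XOR of four fields, selected by the MixColumns coefficients. The idea resembles the T-table technique used in software AES implementations.

```python
def gmul(a, b):
    # GF(2^8) multiplication modulo x^8 + x^4 + x^3 + x + 1.
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r

def gpow(a, n):
    # Square-and-multiply; gpow(a, 254) is the inverse of a (0 maps to 0).
    r = 1
    while n:
        if n & 1:
            r = gmul(r, a)
        a = gmul(a, a)
        n >>= 1
    return r

def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

SBOX = []
for x in range(256):
    v = gpow(x, 254)
    SBOX.append(v ^ rotl8(v, 1) ^ rotl8(v, 2) ^ rotl8(v, 3) ^ rotl8(v, 4) ^ 0x63)

# Hypothetical packing: bits 0-7 = S(x), bits 8-15 = 02*S(x), bits 16-23 = 03*S(x).
SBOX_MIX_COL_1 = [SBOX[x] | (gmul(SBOX[x], 2) << 8) | (gmul(SBOX[x], 3) << 16)
                  for x in range(256)]

MIX = [[2, 3, 1, 1],
       [1, 2, 3, 1],
       [1, 1, 2, 3],
       [3, 1, 1, 2]]

def field(entry, coeff):
    # Select the byte holding coeff * S(x), for coeff in {1, 2, 3}.
    return (entry >> (8 * (coeff - 1))) & 0xFF

def sub_mix_column(col):
    # col: four pre-SubBytes bytes of one state column.
    entries = [SBOX_MIX_COL_1[b] for b in col]  # one lookup per input byte
    out = []
    for row in MIX:
        acc = 0
        for e, k in zip(entries, row):
            acc ^= field(e, k)  # addition in GF(2^8) is XOR
        out.append(acc)
    return out
```

With this layout, the final round (which has no MixColumns) could reuse the same table by reading only the low byte of each entry.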
For the same reason, we also merge the corresponding multiplication in KeyExpansion into a single lookup of another new table, “sbox_Rcon”. Although sbox_Rcon and sbox_mix_col_1 are larger than the original S-box, they can still be stored in one on-chip BRAM. These merges therefore save logic without additional resource cost.
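The text does not spell out the layout of sbox_Rcon. The Python sketch below assumes one plausible reading: a table indexed by round and input byte that returns S(x) XOR Rcon[j]. The round constants Rcon[j] are successive powers of 02 in GF(2^8). Under this reading, one lookup replaces both the Rcon multiplication and the extra XOR on the first byte of SubWord(RotWord(w[i-1])). For the 10 round constants of AES-128, the table holds 10 × 256 × 8 = 20,480 bits. The names `RCON`, `SBOX_RCON` and `expand_key` are hypothetical.

```python
def gmul(a, b):
    # GF(2^8) multiplication modulo x^8 + x^4 + x^3 + x + 1.
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r

def gpow(a, n):
    # Square-and-multiply; gpow(a, 254) is the inverse of a (0 maps to 0).
    r = 1
    while n:
        if n & 1:
            r = gmul(r, a)
        a = gmul(a, a)
        n >>= 1
    return r

def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

SBOX = []
for x in range(256):
    v = gpow(x, 254)
    SBOX.append(v ^ rotl8(v, 1) ^ rotl8(v, 2) ^ rotl8(v, 3) ^ rotl8(v, 4) ^ 0x63)

# Round constants Rcon[1..10]: successive powers of 02 in GF(2^8).
RCON = [1]
for _ in range(9):
    RCON.append(gmul(RCON[-1], 2))

# Hypothetical sbox_Rcon layout: SBOX_RCON[j][x] = S(x) ^ Rcon[j + 1].
SBOX_RCON = [[SBOX[x] ^ rc for x in range(256)] for rc in RCON]

def expand_key(key):
    # AES-128: 16-byte key -> 44 four-byte words w[0..43].
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        t = w[i - 1]
        if i % 4 == 0:
            # RotWord + SubWord + Rcon; byte 0 comes from a single SBOX_RCON lookup.
            t = [SBOX_RCON[i // 4 - 1][t[1]], SBOX[t[2]], SBOX[t[3]], SBOX[t[0]]]
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return w
```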