SHA-3 FPGA Implementation
The hardware performance of the SHA-3 candidates was evaluated on SASEBO-GII (Virtex-5 xc5vlx30). The hash hardware
macros were developed by Sakiyama Lab., University of Electro-Communications. The source code can be used freely for research purposes.
Control Software
Control FPGA (Spartan-3A xc3s50a)
Reference
Algorithm | Code | Block Size [bit] | Max Freq. [MHz] | Clock Cycles (Core) | Clock Cycles (Core+I/F) | Throughput [Mbps] (Long Msg, Core) | Throughput [Mbps] (Long Msg, Core+I/F) | Latency [us] (Short Msg 1024bit, Core) | Latency [us] (Short Msg 1024bit, Core+I/F) | Size (Slices) | Size (Reg) | Size (LUTs)
SHA-256 | src/bin | 512 | 260 | 68 | 148 | 1,958 | 899 | 0.785 | 1.89 | 609 | 1,224 | 2,045
BLAKE | src/bin | 512 | 115 | 22 | 121 | 2,676 | 478 | 0.574 | 3.57 | 1,660 | 1,393 | 5,154
BMW | src/bin | 512 | 34 | 2 | 98 | 8,704 | 178 | 0.235 | 10.12 | 4,530 | 1,317 | 15,012
CubeHash | src/bin | 256 | 185 | 16 | 64 | 2,960 | 740 | 1.297 | 2.85 | 590 | 1,316 | 2,182
ECHO | src/bin | 1,536 | 149 | 99 | 407 | 2,312 | 562 | 0.664 | 3.05 | 2,827 | 4,198 | 9,885
Fugue | src/bin | 32 | 78 | 2 | 8 | 1,248 | 312 | 1.321 | 4.47 | 4,013 | 1,043 | 13,255
Grøstl | src/bin | 512 | 154 | 10 | 106 | 7,885 | 744 | 0.260 | 2.44 | 2,616 | 1,570 | 10,088
Hamsi | src/bin | 32 | 210 | 4 | 10 | 1,680 | 672 | 0.690 | 1.92 | 718 | 841 | 2,499
JH | src/bin | 512 | 201 | 39 | 135 | 2,639 | 762 | 0.582 | 2.25 | 2,661 | 1,612 | 8,392
Keccak | src/bin | 1,024 | 205 | 25 | 217 | 8,397 | 967 | 0.244 | 2.35 | 1,433 | 2,666 | 4,806
Luffa | src/bin | 256 | 261 | 9 | 57 | 7,424 | 1,172 | 0.207 | 1.31 | 1,048 | 1,446 | 3,754
Shabal | src/bin | 512 | 228 | 50 | 143 | 2,335 | 816 | 1.316 | 2.75 | 1,251 | 2,061 | 4,219
SHAvite-3 | src/bin | 512 | 251 | 38 | 143 | 3,382 | 959 | 0.454 | 1.79 | 1,063 | 1,363 | 3,564
SIMD | src/bin | 512 | 75 | 46 | 142 | 835 | 270 | 1.840 | 6.32 | 3,987 | 6,693 | 13,908
Skein | src/bin | 256 | 115 | 21 | 75 | 1,402 | 393 | 0.904 | 3.20 | 854 | 929 | 2,864
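The throughput and latency figures in the FPGA table appear to follow from the block size, maximum frequency, and clock cycle counts. The following is a minimal Python sketch of that relationship, assuming throughput = block size × frequency / cycles and latency = blocks × cycles / frequency. The function names are hypothetical, and the count of three blocks for a padded 1024-bit SHA-256 message is an assumption based on standard SHA-256 padding, not something stated on this page.

```python
def throughput_mbps(block_bits: int, freq_mhz: float, cycles: int) -> float:
    """Long-message throughput: bits per block times blocks per microsecond."""
    return block_bits * freq_mhz / cycles


def latency_us(blocks: int, freq_mhz: float, cycles: int) -> float:
    """Short-message latency: total clock cycles divided by clock frequency."""
    return blocks * cycles / freq_mhz


# SHA-256 row: 512-bit block, 260 MHz, 68 cycles (core) or 148 cycles (core + I/F).
print(round(throughput_mbps(512, 260, 68)))   # 1958, matches the Core column
print(round(throughput_mbps(512, 260, 148)))  # 899, matches the Core+I/F column

# Assumption: a 1024-bit message pads out to three 512-bit SHA-256 blocks.
print(round(latency_us(3, 260, 68), 3))       # 0.785, matches the Core latency
```

Not every row reproduces exactly under this formula, so treat it as a reading aid for the table rather than the method used to produce it.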
ASIC Implementation
The ASIC hardware performance of the SHA-3 candidates was evaluated by
synthesizing the same designs shown above with the STMicro 90-nm standard cell
library.
Algorithm | Code | Synthesis Module | Block Size [bit] | Max Freq. [MHz] | Clock Cycles | Throughput [Mbps] | Size [gate] | Efficiency [Kbps/gate]
SHA-256 | src | SHA256_CORE | 512 | 735.3 / 355.9 / 116.6 | 68 | 5,536 / 2,680 / 878 | 18,677 / 13,199 / 11,332 | 290.6 / 203.0 / 77.4
BLAKE | src | BLAKE_CORE | 512 | 286.5 / 260.4 / 146.6 | 22 | 6,668 / 6,061 / 3,412 | 36,994 / 30,292 / 23,214 | 180.5 / 200.1 / 147.0
BMW | src | bmw256 | 512 | 101.3 / 84.4 / 67.4 | 2 | 25,937 / 21,603 / 17,262 | 128,655 / 115,001 / 105,566 | 201.6 / 187.9 / 163.5
CubeHash | src | CubeHash_CORE | 256 | 515.5 / 352.1 / 171.8 | 16 | 8,247 / 5,834 / 2,749 | 35,548 / 21,336 / 16,320 | 232.0 / 264.1 / 168.5
ECHO | src | ECHO_CORE | 1,536 | 362.3 / 260.4 / 146.8 | 99 | 5,621 / 4,040 / 2,278 | 101,068 / 67,803 / 57,834 | 55.6 / 59.6 / 39.4
Fugue | src | fugue256 | 32 | 170.1 / 113.0 / 77.8 | 2 | 2,721 / 1,808 / 1,245 | 56,734 / 45,553 / 46,683 | 48.0 / 37.9 / 26.7
Grøstl | src | GROESTL_CORE | 512 | 337.8 / 257.7 / 127.9 | 10 | 17,297 / 13,196 / 6,547 | 139,113 / 86,191 / 56,66 | 124.3 / 153.1 / 115.5
Hamsi | src | HAMSHI_CORE | 32 | 970.9 / 543.5 / 352.1 | 4 | 7,767 / 4,348 / 2,817 | 67,582 / 36,981 / 32,116 | 114.9 / 117.6 / 87.7
JH | src | crypto_fpga | 512 | 763.4 / 694.4 / 353.4 | 39 | 10,022 / 9,117 / 4,639 | 54,594 / 42,775 / 31,864 | 183.6 / 213.1 / 145.6
Keccak | src | keccak | 1,024 | 781.3 / 540.5 / 354.6 | 25 | 33,333 / 23,063 / 15,130 | 50,675 / 33,664 / 29,548 | 657.8 / 685.1 / 512.0
Luffa | src | LUFFA_CORE | 256 | 1,010.1 / 537.6 / 262.5 | 9 | 28,732 / 15,293 / 7,466 | 39,642 / 19,797 / 19,359 | 724.8 / 772.5 / 385.6
Shabal | src | SHABAL_CORE | 512 | 591.7 / 543.5 / 350.9 | 50 | 6,059 / 5,565 / 3,593 | 34,642 / 30,328 / 27,752 | 174.9 / 183.5 / 129.5
SHAvite-3 | src | Shavite_top | 512 | 625.0 / 492.6 / 206.6 | 38 | 8,421 / 6,637 / 2,784 | 59,390 / 42,036 / 33,875 | 141.8 / 157.9 / 82.2
SIMD | src | simd256 | 512 | 284.9 / 261.1 / 113.1 | 46 | 3,171 / 2,906 / 1,259 | 138,980 / 122,118 / 88,947 | 22.8 / 23.8 / 14.2
Skein | src | SKEIN_CORE | 256 | 270.3 / 207.0 / 146.2 | 21 | 3,295 / 2,524 / 1,782 | 43,132 / 28,782 / 22,562 | 76.4 / 87.7 / 79.0
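The efficiency column of the ASIC table appears to be the throughput divided by the gate count, expressed in Kbps per gate. A minimal Python sketch of that ratio follows, assuming efficiency = throughput [Mbps] × 1000 / size [gate]. The function name is hypothetical, and the check uses the second of the three values listed for SHA-256 and BLAKE.

```python
def efficiency_kbps_per_gate(throughput_mbps: float, gates: int) -> float:
    """Convert Mbps to Kbps, then normalize by the gate count."""
    return throughput_mbps * 1000 / gates


# SHA-256, second listed value: 2,680 Mbps at 13,199 gates.
print(round(efficiency_kbps_per_gate(2680, 13199), 1))  # 203.0

# BLAKE, second listed value: 6,061 Mbps at 30,292 gates.
print(round(efficiency_kbps_per_gate(6061, 30292), 1))  # 200.1
```

Some entries in the table do not reproduce exactly under this ratio, so the sketch is a plausibility check on the column definition rather than a restatement of how the figures were generated.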
Hardware-optimized Implementation
AIST and Aoki Lab., Tohoku University are cooperatively developing SHA-3 hardware with various architectures.
The developed SHA-3 hardware was evaluated using the STMicro 90-nm standard
cell library. The source code released below is for academic use only. Reading
the conditions and restrictions before use is strongly recommended.
Algorithm | Code | Synthesis Module | Block Size [bit] | S-box | Max Freq. [MHz] | Clock Cycles | Throughput [Mbps] | Size [gate] | Efficiency [Kbps/gate]
SHA-256 | src | SHA256 | 512 | | 505.1 / 349.7 / 209.2 | 72 | 3,592 / 2,486 / 1,488 | 15,574 / 9,563 / 8,230 | 230.6 / 260.0 / 180.8
Grøstl (Normal) | src | GROESTL_256 | 512 | | 349.7 / 260.4 / 113.1 | 11 | 16,275 / 12,121 / 5,265 | 120,812 / 70,953 / 57,908 | 134.7 / 170.8 / 90.9
Grøstl (Compact) | src | GROESTL_256 | 512 | | 348.4 / 261.8 / 101.6 | 21 | 8,945 / 6,382 / 2,478 | 84,053 / 46,256 / 34,783 | 101.1 / 138.0 / 71.2
Keccak | src | keccak | 1,024 | | 1,030.9 / 552.5 / 355.9 | 24 | 43,986 / 23,573 / 15,184 | 55,900 / 26,501 / 25,167 | 786.9 / 889.5 / 603.3
Luffa (High-Speed) | src | Luffa | 256 | Bit Slice | 625.0 / 552.5 / 357.1 | 5 | 32,000 / 28,287 / 18,286 | 60,856 / 44,290 / 31,201 | 525.8 / 638.7 / 586.1
Luffa (High-Speed) | src | Luffa | 256 | Table | 684.9 / 549.5 / 350.9 | 5 | 35,069 / 28,132 / 17,965 | 62,838 / 38,274 / 29,336 | 558.1 / 735.0 / 612.4
Luffa (Normal) | src | Luffa | 256 | Bit Slice | 1,000.0 / 552.5 / 357.1 | 9 | 28,444 / 15,715 / 10,159 | 39,394 / 19,736 / 18,907 | 722.0 / 796.3 / 537.3
Luffa (Normal) | src | Luffa | 256 | Table | 1,087.0 / 549.5 / 355.9 | 9 | 31,258 / 15,629 / 10,123 | 39,513 / 19,604 / 18,933 | 791.1 / 797.2 / 534.7
Luffa (Compact 1) | src | Luffa | 256 | Bit Slice | 757.6 / 546.6 / 355.9 | 25 | 7,758 / 5,596 / 3,641 | 25,558 / 17,477 / 14,710 | 303.5 / 320.2 / 247.7
Luffa (Compact 1) | src | Luffa | 256 | Table | 862.5 / 555.6 / 355.9 | 25 | 8,463 / 5,689 / 3,644 | 26,373 / 16,467 / 14,817 | 320.9 / 345.5 / 245.9
Luffa (Compact 2) | src | Luffa | 256 | Bit Slice | 813.0 / 552.5 / 358.4 | 129 | 1,613 / 1,096 / 711 | 24,285 / 16,801 / 15,381 | 66.4 / 65.3 / 46.2
Luffa (Compact 2) | src | Luffa | 256 | Table | 813.0 / 555.6 / 358.4 | 129 | 1,613 / 1,103 / 811 | 22,500 / 16,633 / 15,383 | 71.7 / 66.3 / 46.2