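STREAM Benchmark
The STREAM benchmark is a tool for measuring memory bandwidth.
Using it takes nothing more than downloading the source and compiling it.
When experts run it seriously, they apparently use Intel's compiler instead of gcc and tune parameters such as the array size. Even so, my lazy build with nothing more than an optimization flag reaches roughly 92 GB/s to 105 GB/s, which is almost exactly what Intel's documentation states.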
# gcc -O3 stream.c -o stream
#
# gcc -O3 -fopenmp stream.c -o stream_openmp
#
# export OMP_NUM_THREADS=24
# ./stream_openmp
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 24
Number of Threads counted = 24
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 24683 microseconds.
   (= 24683 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          100312.2     0.002189     0.001595     0.004063
Scale:          92691.8     0.001851     0.001726     0.002420
Add:           103541.8     0.002493     0.002318     0.002596
Triad:         105219.3     0.002473     0.002281     0.002672
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------