Intel Memory Latency Checker (mlc)
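This tool measures memory latency and bandwidth for each NUMA node. It takes a single command, which makes it simple and convenient. Download it here.
On a two-socket machine, local access latency is around 85 ns and remote access latency is roughly 130 to 140 ns.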
# ./mlc
Intel(R) Memory Latency Checker - v3.9
Measuring idle latencies (in ns)...
                Numa node
Numa node            0       1
       0          84.6   132.5
       1         137.2    85.0

Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads        :      214676.0
3:1 Reads-Writes :      198112.2
2:1 Reads-Writes :      196484.1
1:1 Reads-Writes :      180014.4
Stream-triad like:      176706.4

Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
                Numa node
Numa node            0         1
       0      108625.7   34419.7
       1       34478.1  106305.6

Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject  Latency Bandwidth
Delay   (ns)    MB/sec
==========================
 00000  193.25   215455.4
 00002  193.47   215368.1
 00008  193.03   215284.4
 00015  188.18   215231.4
 00050  158.86   206498.9
 00100  115.37   156563.4
 00200  105.77   107812.6
 00300  101.93    77927.1
 00400   99.50    61162.8
 00500   97.91    50090.8
 00700   96.64    36768.6
 01000   94.84    26342.3
 01300   94.32    20592.4
 01700   93.30    16004.3
 02500   92.33    11171.7
 03500   91.60     8209.9
 05000   91.14     5974.3
 09000   90.24     3648.2
 20000   89.55     2039.8

Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT  latency        49.0
Local Socket L2->L2 HITM latency        49.0
Remote Socket L2->L2 HITM latency (data address homed in writer socket)
                        Reader Numa Node
Writer Numa Node     0       1
            0        -   111.3
            1    111.1       -
Remote Socket L2->L2 HITM latency (data address homed in reader socket)
                        Reader Numa Node
Writer Numa Node     0       1
            0        -   178.5
            1    180.7       -
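
To put a number on the local versus remote gap, the idle latency matrix can be pulled out of the output and averaged. The following is only a rough sketch: the function names `parse_idle_latencies` and `summarize` are made up for this note, the parser assumes the text layout shown in the v3.9 output above, and other mlc versions may print the section differently.

```python
import re

# Idle-latency section copied from the mlc run above.
SAMPLE = """\
Measuring idle latencies (in ns)...
                Numa node
Numa node            0       1
       0          84.6   132.5
       1         137.2    85.0

Measuring Peak Injection Memory Bandwidths for the system
"""

# A data row is a node id followed by one or more decimal values.
ROW = re.compile(r"^\s*(\d+)((?:\s+\d+\.\d+)+)\s*$")


def parse_idle_latencies(text):
    """Return the idle-latency matrix (ns) as a list of rows of floats."""
    rows = []
    in_section = False
    for line in text.splitlines():
        if line.startswith("Measuring idle latencies"):
            in_section = True
            continue
        if not in_section:
            continue
        if line.startswith("Measuring"):
            break  # the next section has started
        match = ROW.match(line)
        if match:
            rows.append([float(v) for v in match.group(2).split()])
    return rows


def summarize(matrix):
    """Return (local average, remote average, remote/local ratio)."""
    n = len(matrix)
    local = [matrix[i][i] for i in range(n)]
    remote = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    local_avg = sum(local) / len(local)
    remote_avg = sum(remote) / len(remote)
    return local_avg, remote_avg, remote_avg / local_avg


if __name__ == "__main__":
    local_avg, remote_avg, ratio = summarize(parse_idle_latencies(SAMPLE))
    print(f"local  avg: {local_avg:.1f} ns")
    print(f"remote avg: {remote_avg:.1f} ns")
    print(f"remote/local ratio: {ratio:.2f}")
```

For the run above, the diagonal entries are the local latencies and the off-diagonal entries are the remote ones, so remote access works out to roughly 1.6 times the local latency on this machine.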