iSCSI vs NVMe over TCP
To keep conditions identical within the same environment (OS, NIC, and machine), a null block device is used on the target instead of an SSD.
In the lsblk output below, sde is the iSCSI device and nvme1n1 is the NVMe over TCP device; both are backed by the same /dev/nullb0 on the remote machine.
# lsblk
NAME            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda               8:0    0 447.1G  0 disk
├─sda1            8:1    0   600M  0 part /boot/efi
├─sda2            8:2    0     1G  0 part /boot
└─sda3            8:3    0 445.6G  0 part
  ├─fc31-root   253:0    0    15G  0 lvm  /
  └─fc31-swap   253:1    0  31.4G  0 lvm  [SWAP]
sdb               8:16   0   1.5T  0 disk
sdc               8:32   0   1.5T  0 disk
sdd               8:48   1  14.6G  0 disk
└─sdd1            8:49   1  14.6G  0 part
sde               8:64   0   250G  0 disk
sr0              11:0    1     2G  0 rom
nvme0n1         259:0    0   477G  0 disk
├─nvme0n1p1     259:1    0   512M  0 part
└─nvme0n1p2     259:2    0 476.4G  0 part
nvme1n1         259:4    0   250G  0 disk
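For reference, the plumbing would look roughly like the sketch below. This is an assumption about the setup, not the actual commands used here; the IP address, IQN, and NQN are placeholders, and the iSCSI/NVMe target-side configuration is omitted.

```shell
# --- Target side (sketch): create the shared null block device ---
modprobe null_blk                    # creates /dev/nullb0

# --- Initiator side (sketch): log in to the iSCSI target ---
iscsiadm -m discovery -t sendtargets -p 192.168.0.10
iscsiadm -m node -T iqn.2019-12.example:nullb0 -p 192.168.0.10 --login

# --- Initiator side (sketch): connect the same backend over NVMe/TCP ---
modprobe nvme-tcp
nvme connect -t tcp -a 192.168.0.10 -s 4420 -n nqn.2019-12.example:nullb0
```

Both initiator paths then terminate in the same /dev/nullb0, so neither measurement is bounded by media speed.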
Measurements are taken with fio: 70% reads / 30% writes, 4 KB block size, 256 parallel jobs, and a queue depth of 1 per job.
First, iSCSI. Reads come in at 24.8k IOPS, 97.0 MiB/s (102 MB/s) of throughput, and an average latency of 8.2 ms.
# fio --name=iscsi --filename=/dev/sde --rw=randrw --rwmixread=70 --direct=1 --invalidate=1 --ioengine=libaio --bs=4k --numjobs=256 --time_based --runtime=10 --group_reporting --iodepth=1
iscsi: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
...
fio-3.7
Starting 256 processes
Jobs: 256 (f=256): [m(256)][100.0%][r=98.8MiB/s,w=43.3MiB/s][r=25.3k,w=11.1k IOPS][eta 00m:00s]
iscsi: (groupid=0, jobs=256): err= 0: pid=2766: Fri Dec  6 22:59:38 2019
  read: IOPS=24.8k, BW=97.0MiB/s (102MB/s)(973MiB/10026msec)
    slat (nsec): min=1395, max=12744k, avg=71535.37, stdev=174994.31
    clat (nsec): min=1088, max=42954k, avg=8100632.98, stdev=6503994.47
     lat (usec): min=126, max=42972, avg=8172.73, stdev=6508.26
    clat percentiles (usec):
     |  1.00th=[  799],  5.00th=[ 1156], 10.00th=[ 1500], 20.00th=[ 2245],
     | 30.00th=[ 3195], 40.00th=[ 4490], 50.00th=[ 6128], 60.00th=[ 8225],
     | 70.00th=[10814], 80.00th=[13829], 90.00th=[17957], 95.00th=[20841],
     | 99.00th=[26608], 99.50th=[28443], 99.90th=[32637], 99.95th=[34341],
     | 99.99th=[38011]
   bw (  KiB/s): min=  216, max=  624, per=0.39%, avg=389.93, stdev=55.55, samples=4892
   iops        : min=   54, max=  156, avg=97.43, stdev=13.89, samples=4892
  write: IOPS=10.7k, BW=41.8MiB/s (43.8MB/s)(419MiB/10026msec)
    slat (usec): min=2, max=1860, avg=72.70, stdev=174.42
    clat (nsec): min=1193, max=32943k, avg=4786197.03, stdev=3747743.28
     lat (usec): min=197, max=32991, avg=4859.45, stdev=3753.62
    clat percentiles (usec):
     |  1.00th=[  742],  5.00th=[ 1020], 10.00th=[ 1270], 20.00th=[ 1713],
     | 30.00th=[ 2212], 40.00th=[ 2802], 50.00th=[ 3556], 60.00th=[ 4555],
     | 70.00th=[ 5800], 80.00th=[ 7570], 90.00th=[10159], 95.00th=[12518],
     | 99.00th=[16909], 99.50th=[18482], 99.90th=[22676], 99.95th=[25035],
     | 99.99th=[29230]
   bw (  KiB/s): min=   48, max=  328, per=0.39%, avg=168.04, stdev=41.55, samples=4892
   iops        : min=   12, max=   82, avg=41.96, stdev=10.39, samples=4892
  lat (usec)   : 2=0.01%, 4=0.01%, 250=0.01%, 500=0.06%, 750=0.77%
  lat (usec)   : 1000=2.57%
  lat (msec)   : 2=16.19%, 4=22.56%, 10=31.77%, 20=21.64%, 50=4.44%
  cpu          : usr=0.21%, sys=0.66%, ctx=392549, majf=0, minf=3695
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=249030,107245,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=97.0MiB/s (102MB/s), 97.0MiB/s-97.0MiB/s (102MB/s-102MB/s), io=973MiB (1020MB), run=10026-10026msec
  WRITE: bw=41.8MiB/s (43.8MB/s), 41.8MiB/s-41.8MiB/s (43.8MB/s-43.8MB/s), io=419MiB (439MB), run=10026-10026msec

Disk stats (read/write):
  sdd: ios=246238/106056, merge=0/0, ticks=1986382/502897, in_queue=2313614, util=97.08%
#
Next, NVMe over TCP under the same conditions. Looking again at reads: 104k IOPS, 427 MB/s of throughput, and an average latency of 1.7 ms.
[root@rdma21 ~]# fio --name=nvme-tcp --filename=/dev/nvme0n1 --rw=randrw --rwmixread=70 --direct=1 --invalidate=1 --ioengine=libaio --bs=4k --numjobs=256 --time_based --runtime=10 --group_reporting --iodepth=1
rdma: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
...
fio-3.7
Starting 256 processes
Jobs: 256 (f=256): [m(256)][100.0%][r=407MiB/s,w=175MiB/s][r=104k,w=44.8k IOPS][eta 00m:00s]
rdma: (groupid=0, jobs=256): err= 0: pid=2591: Fri Dec  6 22:14:10 2019
  read: IOPS=104k, BW=407MiB/s (427MB/s)(4074MiB/10008msec)
    slat (nsec): min=1461, max=16435k, avg=13431.75, stdev=55340.30
    clat (nsec): min=627, max=33919k, avg=1692241.44, stdev=1242439.66
     lat (usec): min=56, max=33964, avg=1705.99, stdev=1242.58
    clat percentiles (usec):
     |  1.00th=[  151],  5.00th=[  245], 10.00th=[  355], 20.00th=[  594],
     | 30.00th=[  857], 40.00th=[ 1156], 50.00th=[ 1467], 60.00th=[ 1778],
     | 70.00th=[ 2147], 80.00th=[ 2606], 90.00th=[ 3326], 95.00th=[ 4015],
     | 99.00th=[ 5604], 99.50th=[ 6325], 99.90th=[ 8225], 99.95th=[ 9241],
     | 99.99th=[14353]
   bw (  KiB/s): min= 1056, max= 3272, per=0.39%, avg=1625.09, stdev=240.45, samples=4870
   iops        : min=  264, max=  818, avg=406.27, stdev=60.12, samples=4870
  write: IOPS=44.7k, BW=175MiB/s (183MB/s)(1749MiB/10008msec)
    slat (nsec): min=1677, max=18010k, avg=14509.37, stdev=52819.88
    clat (nsec): min=486, max=47722k, avg=1714147.63, stdev=1264153.19
     lat (usec): min=53, max=47816, avg=1728.97, stdev=1264.08
    clat percentiles (usec):
     |  1.00th=[  135],  5.00th=[  233], 10.00th=[  347], 20.00th=[  594],
     | 30.00th=[  873], 40.00th=[ 1172], 50.00th=[ 1483], 60.00th=[ 1811],
     | 70.00th=[ 2180], 80.00th=[ 2638], 90.00th=[ 3359], 95.00th=[ 4047],
     | 99.00th=[ 5669], 99.50th=[ 6390], 99.90th=[ 8291], 99.95th=[ 9372],
     | 99.99th=[14484]
   bw (  KiB/s): min=  416, max= 1328, per=0.39%, avg=697.71, stdev=116.18, samples=4870
   iops        : min=  104, max=  332, avg=174.40, stdev=29.05, samples=4870
  lat (nsec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.16%, 250=5.24%, 500=10.92%, 750=9.63%, 1000=8.69%
  lat (msec)   : 2=31.38%, 4=28.87%, 10=5.05%, 20=0.04%, 50=0.01%
  cpu          : usr=0.45%, sys=0.91%, ctx=1980857, majf=0, minf=3471
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1042843,447857,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=407MiB/s (427MB/s), 407MiB/s-407MiB/s (427MB/s-427MB/s), io=4074MiB (4271MB), run=10008-10008msec
  WRITE: bw=175MiB/s (183MB/s), 175MiB/s-175MiB/s (183MB/s-183MB/s), io=1749MiB (1834MB), run=10008-10008msec

Disk stats (read/write):
  nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
#
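As a sanity check on both runs: with 256 jobs each at queue depth 1 there are always about 256 I/Os in flight, so by Little's law the average latency should be roughly 256 / total IOPS. Plugging in the measured IOPS from the fio output above:

```shell
# Little's law: avg latency [ms] = in-flight I/Os / total IOPS * 1000
awk 'BEGIN { printf "iscsi:    %.1f ms\n", 256 / (24800 + 10700)  * 1000 }'
awk 'BEGIN { printf "nvme-tcp: %.1f ms\n", 256 / (104000 + 44700) * 1000 }'
```

This yields about 7.2 ms for iSCSI and 1.7 ms for NVMe over TCP, consistent with the measured averages (the iSCSI figure falls between the 8.2 ms read and 4.9 ms write latencies), so the latency and IOPS numbers agree with each other.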
NVMe over TCP is often said to underperform the established NVMe over Fabrics (RoCE), but compared with iSCSI it is overwhelmingly faster.
The next step is to configure RDMA in the same environment and compare against NVMe over Fabrics (RoCE) as well.
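With nvme-cli the fabric transport is a connect-time option, so the RDMA comparison should need little more than swapping the transport. A sketch, with the same placeholder address and NQN caveat as above:

```shell
# Same target as before, RDMA transport instead of TCP
# (placeholder address/NQN, RDMA-capable NIC required)
modprobe nvme-rdma
nvme connect -t rdma -a 192.168.0.10 -s 4420 -n nqn.2019-12.example:nullb0
```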