Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.14.0
-
RHEL8.3 servers/clients
-
3
-
9223372036854775807
Description
sanity test_398c fails with 'fio write error'. This failure is different from LU-13897.
We’ve only seen this failure once for RHEL8.3 server/client testing at https://testing.whamcloud.com/test_sets/fe2f2ee3-067d-431b-b69c-01858086821c . Looking at the suite_log output, we see a couple of fio calls with ‘err=71’
== sanity test 398c: run fio to test AIO ============================================================= 21:25:37 (1607462737) /usr/bin/fio debug=0 40+0 records in 40+0 records out 41943040 bytes (42 MB, 40 MiB) copied, 0.0712628 s, 589 MB/s osc.lustre-OST0000-osc-ffff8a2144fe3000.rpc_stats=clear writing 40M to OST0 by fio with 4 jobs... rand-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16 ... fio-3.19 Starting 4 processes fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=2654208, buflen=4096 fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=2682880, buflen=4096 fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=2686976, buflen=4096 fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=2695168, buflen=4096 fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3534848, buflen=4096 fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3543040, buflen=4096 fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3547136, buflen=4096 fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3567616, buflen=4096 fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3575808, buflen=4096 fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3588096, buflen=4096 fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3596288, buflen=4096 fio: pid=892877, err=71/file:io_u.c:1803, func=io_u error, error=Protocol error rand-write: (groupid=0, jobs=1): err= 0: pid=892876: Tue Dec 8 21:28:37 2020 write: IOPS=14, BW=58.4KiB/s (59.8kB/s)(10.0MiB/175370msec); 0 zone resets slat (usec): min=14, max=321, avg=37.10, stdev=35.86 clat (msec): min=116, max=1944, avg=1095.64, stdev=174.44 lat (msec): min=116, max=1944, avg=1095.68, stdev=174.44 clat percentiles (msec): | 1.00th=[ 617], 5.00th=[ 877], 10.00th=[ 927], 20.00th=[ 978], | 30.00th=[ 1011], 40.00th=[ 1053], 50.00th=[ 1083], 60.00th=[ 1116], | 70.00th=[ 1150], 80.00th=[ 1217], 90.00th=[ 1284], 95.00th=[ 1385], | 99.00th=[ 1737], 99.50th=[ 1854], 99.90th=[ 1905], 99.95th=[ 1921], | 99.99th=[ 1938] bw ( KiB/s): min= 8, max= 128, per=25.59%, avg=59.36, stdev=25.99, samples=343 iops : min= 2, max= 32, avg=14.84, stdev= 6.50, samples=343 lat (msec) : 250=0.04%, 500=0.23%, 750=2.30%, 1000=24.30%, 2000=73.12% cpu : usr=0.01%, sys=0.05%, ctx=756, majf=0, minf=10 IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.3%, 16=99.4%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: total=0,2560,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=16 rand-write: (groupid=0, jobs=1): err=71 (file:io_u.c:1803, func=io_u error, error=Protocol error): pid=892877: Tue Dec 8 21:28:37 2020 … Run status group 0 (all jobs): WRITE: bw=233KiB/s (239kB/s), 58.2KiB/s-58.4KiB/s (59.6kB/s-59.8kB/s), io=39.0MiB (41.9MB), run=175199-175612msec sanity test_398c: @@@@@@ FAIL: fio write error Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:6257:error() = /usr/lib64/lustre/tests/sanity.sh:22338:test_398c()
Looking at the console logs, client1 (vm1) has as error
[19342.934999] Lustre: DEBUG MARKER: == sanity test 398c: run fio to test AIO ============================================================= 21:25:37 (1607462737) [19520.867822] LustreError: 558753:0:(osc_request.c:1947:osc_brw_fini_request()) lustre-OST0005-osc-ffff8a2144fe3000: unexpected positive size 1 [19523.015054] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_398c: @@@@@@ FAIL: fio write error [19523.582640] Lustre: DEBUG MARKER: sanity test_398c: @@@@@@ FAIL: fio write error