Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-14205

RHEL8.3: sanity test 398c fails with 'fio write error'

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.14.0
    • Lustre 2.14.0
    • RHEL8.3 servers/clients
    • 3
    • 9223372036854775807

    Description

      sanity test_398c fails with 'fio write error'. This failure is different from LU-13897.
      We’ve only seen this failure once for RHEL8.3 server/client testing at https://testing.whamcloud.com/test_sets/fe2f2ee3-067d-431b-b69c-01858086821c . Looking at the suite_log output, we see a couple of fio calls with ‘err=71’

      == sanity test 398c: run fio to test AIO ============================================================= 21:25:37 (1607462737)
      /usr/bin/fio
      debug=0
      40+0 records in
      40+0 records out
      41943040 bytes (42 MB, 40 MiB) copied, 0.0712628 s, 589 MB/s
      osc.lustre-OST0000-osc-ffff8a2144fe3000.rpc_stats=clear
      writing 40M to OST0 by fio with 4 jobs...
      rand-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
      ...
      fio-3.19
      Starting 4 processes
      fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=2654208, buflen=4096
      fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=2682880, buflen=4096
      fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=2686976, buflen=4096
      fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=2695168, buflen=4096
      fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3534848, buflen=4096
      fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3543040, buflen=4096
      fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3547136, buflen=4096
      fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3567616, buflen=4096
      fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3575808, buflen=4096
      fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3588096, buflen=4096
      fio: io_u error on file /mnt/lustre/f398c.sanity: Protocol error: write offset=3596288, buflen=4096
      fio: pid=892877, err=71/file:io_u.c:1803, func=io_u error, error=Protocol error
      
      rand-write: (groupid=0, jobs=1): err= 0: pid=892876: Tue Dec  8 21:28:37 2020
        write: IOPS=14, BW=58.4KiB/s (59.8kB/s)(10.0MiB/175370msec); 0 zone resets
          slat (usec): min=14, max=321, avg=37.10, stdev=35.86
          clat (msec): min=116, max=1944, avg=1095.64, stdev=174.44
           lat (msec): min=116, max=1944, avg=1095.68, stdev=174.44
          clat percentiles (msec):
           |  1.00th=[  617],  5.00th=[  877], 10.00th=[  927], 20.00th=[  978],
           | 30.00th=[ 1011], 40.00th=[ 1053], 50.00th=[ 1083], 60.00th=[ 1116],
           | 70.00th=[ 1150], 80.00th=[ 1217], 90.00th=[ 1284], 95.00th=[ 1385],
           | 99.00th=[ 1737], 99.50th=[ 1854], 99.90th=[ 1905], 99.95th=[ 1921],
           | 99.99th=[ 1938]
         bw (  KiB/s): min=    8, max=  128, per=25.59%, avg=59.36, stdev=25.99, samples=343
         iops        : min=    2, max=   32, avg=14.84, stdev= 6.50, samples=343
        lat (msec)   : 250=0.04%, 500=0.23%, 750=2.30%, 1000=24.30%, 2000=73.12%
        cpu          : usr=0.01%, sys=0.05%, ctx=756, majf=0, minf=10
        IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.3%, 16=99.4%, 32=0.0%, >=64=0.0%
           submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
           complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
           issued rwts: total=0,2560,0,0 short=0,0,0,0 dropped=0,0,0,0
           latency   : target=0, window=0, percentile=100.00%, depth=16
      rand-write: (groupid=0, jobs=1): err=71 (file:io_u.c:1803, func=io_u error, error=Protocol error): pid=892877: Tue Dec  8 21:28:37 2020
      …
      Run status group 0 (all jobs):
        WRITE: bw=233KiB/s (239kB/s), 58.2KiB/s-58.4KiB/s (59.6kB/s-59.8kB/s), io=39.0MiB (41.9MB), run=175199-175612msec
       sanity test_398c: @@@@@@ FAIL: fio write error 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6257:error()
        = /usr/lib64/lustre/tests/sanity.sh:22338:test_398c()
      

      Looking at the console logs, client1 (vm1) has as error

      [19342.934999] Lustre: DEBUG MARKER: == sanity test 398c: run fio to test AIO ============================================================= 21:25:37 (1607462737)
      [19520.867822] LustreError: 558753:0:(osc_request.c:1947:osc_brw_fini_request()) lustre-OST0005-osc-ffff8a2144fe3000: unexpected positive size 1
      [19523.015054] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_398c: @@@@@@ FAIL: fio write error 
      [19523.582640] Lustre: DEBUG MARKER: sanity test_398c: @@@@@@ FAIL: fio write error
      

      Attachments

        Issue Links

          Activity

            People

              wshilong Wang Shilong (Inactive)
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: