[LU-17425] sanity: Timeout occurred after 360 minutes sanity/398b (ldiskfs) Created: 15/Jan/24  Updated: 17/Jan/24  Resolved: 15/Jan/24

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Not a Bug Votes: 0
Labels: None

Attachments: Zip Archive lustre-rsync-test_2023-12-13_0408.zip    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Arshad <arshad.hussain@aeoncomputing.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/c4ae0a74-b6f4-43c8-bdf0-7a11bcfac0d9

Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/101345 - 5.14.0-362.8.1.el9_3.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/101345 - 5.14.0-284.30.1_lustre.el9.x86_64

 

== sanity test 398b: DIO and buffer IO race ============== 01:03:41 (1705280621)
/usr/bin/fio
48+0 records in
48+0 records out
50331648 bytes (50 MB, 48 MiB) copied, 1.13544 s, 44.3 MB/s
mix direct rw 4096 by fio with 4 jobs...
mix buffer rw 4096 by fio with 4 jobs...
rand-rw: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=16
...
fio-3.35
Starting 4 processes
rand-rw: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=16
...
fio-3.35
Starting 4 processes
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1rand-rw: ...
...
fio-3.35
Starting 4 processes
rand-rw: (g=0): rw=randrw, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=16
...
fio-3.35
Starting 4 processes
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1rand-rw: (groupid=0, jobs=1): err= 0: pid=1003439: Mon Jan 15 01:05:39 2024
  read: IOPS=149, BW=2384KiB/s (2441kB/s)(5920KiB/2483msec)
    clat (usec): min=39, max=20761, avg=1986.72, stdev=2735.14
     lat (usec): min=39, max=20762, avg=1989.45, stdev=2733.85
    clat percentiles (usec):
     |  1.00th=[   42],  5.00th=[   48], 10.00th=[   55], 20.00th=[   77],
     | 30.00th=[   82], 40.00th=[   97], 50.00th=[ 1893], 60.00th=[ 2147],
     | 70.00th=[ 2442], 80.00th=[ 3261], 90.00th=[ 4686], 95.00th=[ 6849],
     | 99.00th=[13304], 99.50th=[20841], 99.90th=[20841], 99.95th=[20841],
     | 99.99th=[20841]
   bw (  KiB/s): min=  224, max= 3040, per=22.24%, avg=1832.00, stdev=1224.50, samples=4
   iops        : min=   14, max=  190, avg=114.50, stdev=76.53, samples=4
  write: IOPS=160, BW=2565KiB/s (2626kB/s)(6368KiB/2483msec); 0 zone resets
    clat (usec): min=572, max=110816, avg=4373.09, stdev=13753.95
     lat (usec): min=572, max=110816, avg=4373.68, stdev=13753.95
    clat percentiles (usec):
     |  1.00th=[   603],  5.00th=[   660], 10.00th=[   685], 20.00th=[   725],
     | 30.00th=[   775], 40.00th=[   816], 50.00th=[   898], 60.00th=[  1106],
     | 70.00th=[  1909], 80.00th=[  3228], 90.00th=[  5407], 95.00th=[ 12911],
     | 99.00th=[ 88605], 99.50th=[104334], 99.90th=[110625], 99.95th=[110625],
     | 99.99th=[110625]
   bw (  KiB/s): min=  160, max= 3296, per=23.36%, avg=2024.00, stdev=1385.61, samples=4
   iops        : min=   10, max=  206, avg=126.50, stdev=86.60, samples=4
  lat (usec)   : 50=3.65%, 100=16.02%, 250=2.86%, 500=0.52%, 750=13.02%
  lat (usec)   : 1000=16.15%
  lat (msec)   : 2=11.33%, 4=21.74%, 10=11.07%, 20=1.30%, 50=0.91%
  lat (msec)   : 100=1.17%, 250=0.26%
  cpu          : usr=0.12%, sys=19.86%, ctx=1272, majf=0, minf=9
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=370,398,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
rand-rw: (groupid=0, jobs=1): err= 0: pid=1003440: Mon Jan 15 01:05:39 2024
  read: IOPS=123, BW=1975KiB/s (2023kB/s)(5744KiB/2908msec)
    clat (usec): min=41, max=53237, avg=1529.51, stdev=3741.11
     lat (usec): min=41, max=53237, avg=1529.79, stdev=3741.12
    clat percentiles (usec):
     |  1.00th=[   42],  5.00th=[   44], 10.00th=[   46], 20.00th=[   66],
     | 30.00th=[   77], 40.00th=[   80], 50.00th=[   87], 60.00th=[   96],
     | 70.00th=[  174], 80.00th=[ 2638], 90.00th=[ 4686], 95.00th=[ 6652],
     | 99.00th=[12256], 99.50th=[15795], 99.90th=[53216], 99.95th=[53216],
     | 99.99th=[53216]
   bw (  KiB/s): min=   64, max= 3212, per=20.04%, avg=1651.00, stdev=1592.67, samples=4
   iops        : min=    4, max=  200, avg=103.00, stdev=99.30, samples=4
  write: IOPS=140, BW=2250KiB/s (2304kB/s)(6544KiB/2908msec); 0 zone resets
    clat (usec): min=559, max=232210, avg=5755.85, stdev=21572.62
     lat (usec): min=560, max=232211, avg=5757.25, stdev=21572.59
    clat percentiles (usec):
     |  1.00th=[   611],  5.00th=[   644], 10.00th=[   676], 20.00th=[   725],
     | 30.00th=[   766], 40.00th=[   832], 50.00th=[   922], 60.00th=[  1123],
     | 70.00th=[  1713], 80.00th=[  2999], 90.00th=[  5800], 95.00th=[ 17171],
     | 99.00th=[122160], 99.50th=[145753], 99.90th=[231736], 99.95th=[231736],
     | 99.99th=[231736]
   bw (  KiB/s): min=  160, max= 3785, per=17.60%, avg=1525.00, stdev=1724.95, samples=5
   iops        : min=   10, max=  236, avg=95.20, stdev=107.63, samples=5
  lat (usec)   : 50=7.16%, 100=21.61%, 250=4.30%, 500=0.13%, 750=14.06%
  lat (usec)   : 1000=15.23%
  lat (msec)   : 2=10.55%, 4=11.98%, 10=10.16%, 20=2.60%, 50=0.52%
  lat (msec)   : 100=0.91%, 250=0.78%
  cpu          : usr=0.10%, sys=16.34%, ctx=732, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=359,409,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
rand-rw: (groupid=0, jobs=1): err= 0: pid=1003442: Mon Jan 15 01:05:39 2024
  read: IOPS=163, BW=2623KiB/s (2686kB/s)(6256KiB/2385msec)
    clat (usec): min=41, max=69442, avg=2504.57, stdev=4796.18
     lat (usec): min=41, max=69442, avg=2504.90, stdev=4796.16
    clat percentiles (usec):
     |  1.00th=[   43],  5.00th=[   49], 10.00th=[   69], 20.00th=[   81],
     | 30.00th=[  124], 40.00th=[ 1893], 50.00th=[ 2057], 60.00th=[ 2212],
     | 70.00th=[ 2671], 80.00th=[ 2999], 90.00th=[ 4228], 95.00th=[ 6652],
     | 99.00th=[22676], 99.50th=[42206], 99.90th=[69731], 99.95th=[69731],
     | 99.99th=[69731]
   bw (  KiB/s): min= 1571, max= 3904, per=30.89%, avg=2544.75, stdev=1031.19, samples=4
   iops        : min=   98, max=  244, avg=159.00, stdev=64.51, samples=4
  write: IOPS=158, BW=2529KiB/s (2590kB/s)(6032KiB/2385msec); 0 zone resets
    clat (usec): min=602, max=115688, avg=3717.80, stdev=12062.93
     lat (usec): min=602, max=115689, avg=3718.33, stdev=12062.94
    clat percentiles (usec):
     |  1.00th=[   611],  5.00th=[   652], 10.00th=[   676], 20.00th=[   709],
     | 30.00th=[   758], 40.00th=[   824], 50.00th=[   938], 60.00th=[  1450],
     | 70.00th=[  1811], 80.00th=[  2868], 90.00th=[  4752], 95.00th=[  8291],
     | 99.00th=[ 81265], 99.50th=[ 98042], 99.90th=[115868], 99.95th=[115868],
     | 99.99th=[115868]
   bw (  KiB/s): min= 1696, max= 3552, per=29.36%, avg=2544.75, stdev=869.62, samples=4
   iops        : min=  106, max=  222, avg=159.00, stdev=54.39, samples=4
  lat (usec)   : 50=3.12%, 100=11.20%, 250=1.69%, 500=0.13%, 750=14.32%
  lat (usec)   : 1000=12.63%
  lat (msec)   : 2=15.76%, 4=29.95%, 10=8.07%, 20=1.04%, 50=0.91%
  lat (msec)   : 100=1.04%, 250=0.13%
  cpu          : usr=0.21%, sys=22.15%, ctx=1306, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=391,377,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
rand-rw: (groupid=0, jobs=1): err= 0: pid=1003443: Mon Jan 15 01:05:39 2024
  read: IOPS=138, BW=2219KiB/s (2273kB/s)(6032KiB/2718msec)
    clat (usec): min=41, max=23129, avg=1687.53, stdev=3014.18
     lat (usec): min=41, max=23129, avg=1687.80, stdev=3014.18
    clat percentiles (usec):
     |  1.00th=[   42],  5.00th=[   44], 10.00th=[   47], 20.00th=[   61],
     | 30.00th=[   77], 40.00th=[   82], 50.00th=[   93], 60.00th=[  145],
     | 70.00th=[ 2245], 80.00th=[ 2802], 90.00th=[ 4752], 95.00th=[ 6980],
     | 99.00th=[15401], 99.50th=[21103], 99.90th=[23200], 99.95th=[23200],
     | 99.99th=[23200]
   bw (  KiB/s): min=   96, max= 3808, per=22.53%, avg=1856.00, stdev=1682.36, samples=5
   iops        : min=    6, max=  238, avg=116.00, stdev=105.15, samples=5
  write: IOPS=143, BW=2302KiB/s (2357kB/s)(6256KiB/2718msec); 0 zone resets
    clat (usec): min=572, max=195334, avg=5312.74, stdev=17995.81
     lat (usec): min=573, max=195334, avg=5313.27, stdev=17995.83
    clat percentiles (usec):
     |  1.00th=[   594],  5.00th=[   660], 10.00th=[   685], 20.00th=[   734],
     | 30.00th=[   783], 40.00th=[   824], 50.00th=[   881], 60.00th=[  1074],
     | 70.00th=[  1811], 80.00th=[  2868], 90.00th=[  6783], 95.00th=[ 21627],
     | 99.00th=[103285], 99.50th=[124257], 99.90th=[196084], 99.95th=[196084],
     | 99.99th=[196084]
   bw (  KiB/s): min=  160, max= 3744, per=23.03%, avg=1996.80, stdev=1775.01, samples=5
   iops        : min=   10, max=  234, avg=124.80, stdev=110.94, samples=5
  lat (usec)   : 50=7.03%, 100=19.14%, 250=4.30%, 500=0.13%, 750=12.24%
  lat (usec)   : 1000=17.45%
  lat (msec)   : 2=8.59%, 4=16.54%, 10=9.64%, 20=2.08%, 50=1.30%
  lat (msec)   : 100=1.04%, 250=0.52%
  cpu          : usr=0.04%, sys=17.04%, ctx=1010, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=377,391,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16Run status group 0 (all jobs):
   READ: bw=8237KiB/s (8434kB/s), 1975KiB/s-2623KiB/s (2023kB/s-2686kB/s), io=23.4MiB (24.5MB), run=2385-2908msec
  WRITE: bw=8666KiB/s (8874kB/s), 2250KiB/s-2565KiB/s (2304kB/s-2626kB/s), io=24.6MiB (25.8MB), run=2385-2908msecrand-rw: (groupid=0, jobs=1): err= 0: pid=1003436: Mon Jan 15 01:06:05 2024
  read: IOPS=12, BW=203KiB/s (208kB/s)(5920KiB/29191msec)
    clat (usec): min=1462, max=99109, avg=3316.33, stdev=7719.56
     lat (usec): min=1462, max=99110, avg=3316.64, stdev=7719.57
    clat percentiles (usec):
     |  1.00th=[ 1483],  5.00th=[ 1549], 10.00th=[ 1614], 20.00th=[ 1663],
     | 30.00th=[ 1713], 40.00th=[ 1762], 50.00th=[ 1844], 60.00th=[ 1926],
     | 70.00th=[ 2089], 80.00th=[ 2343], 90.00th=[ 4178], 95.00th=[ 5997],
     | 99.00th=[34866], 99.50th=[81265], 99.90th=[99091], 99.95th=[99091],
     | 99.99th=[99091]
   bw (  KiB/s): min=   32, max=  576, per=24.58%, avg=200.83, stdev=98.27, samples=58
   iops        : min=    2, max=   36, avg=12.55, stdev= 6.14, samples=58
  write: IOPS=13, BW=218KiB/s (223kB/s)(6368KiB/29191msec); 0 zone resets
    clat (msec): min=31, max=209, avg=70.25, stdev=22.29
     lat (msec): min=31, max=209, avg=70.25, stdev=22.29
    clat percentiles (msec):
     |  1.00th=[   35],  5.00th=[   44], 10.00th=[   48], 20.00th=[   54],
     | 30.00th=[   58], 40.00th=[   62], 50.00th=[   67], 60.00th=[   72],
     | 70.00th=[   75], 80.00th=[   84], 90.00th=[   99], 95.00th=[  111],
     | 99.00th=[  150], 99.50th=[  199], 99.90th=[  209], 99.95th=[  209],
     | 99.99th=[  209]
   bw (  KiB/s): min=   96, max=  288, per=25.24%, avg=216.83, stdev=40.23, samples=58
   iops        : min=    6, max=   18, avg=13.55, stdev= 2.51, samples=58
  lat (msec)   : 2=32.29%, 4=10.81%, 10=3.39%, 20=0.52%, 50=7.55%
  lat (msec)   : 100=40.76%, 250=4.69%
  cpu          : usr=0.01%, sys=0.99%, ctx=1766, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=370,398,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
rand-rw: (groupid=0, jobs=1): err= 0: pid=1003437: Mon Jan 15 01:06:05 2024
  read: IOPS=12, BW=195KiB/s (200kB/s)(5744KiB/29441msec)
    clat (usec): min=1447, max=66583, avg=2888.69, stdev=5913.88
     lat (usec): min=1447, max=66583, avg=2888.99, stdev=5913.88
    clat percentiles (usec):
     |  1.00th=[ 1483],  5.00th=[ 1582], 10.00th=[ 1614], 20.00th=[ 1663],
     | 30.00th=[ 1713], 40.00th=[ 1762], 50.00th=[ 1795], 60.00th=[ 1876],
     | 70.00th=[ 2008], 80.00th=[ 2212], 90.00th=[ 3392], 95.00th=[ 5014],
     | 99.00th=[39060], 99.50th=[53216], 99.90th=[66323], 99.95th=[66323],
     | 99.99th=[66323]
   bw (  KiB/s): min=   32, max=  512, per=24.09%, avg=196.49, stdev=106.38, samples=57
   iops        : min=    2, max=   32, avg=12.28, stdev= 6.65, samples=57
  write: IOPS=13, BW=222KiB/s (228kB/s)(6544KiB/29441msec); 0 zone resets
    clat (msec): min=16, max=232, avg=69.44, stdev=23.80
     lat (msec): min=16, max=232, avg=69.44, stdev=23.80
    clat percentiles (msec):
     |  1.00th=[   26],  5.00th=[   43], 10.00th=[   48], 20.00th=[   55],
     | 30.00th=[   58], 40.00th=[   62], 50.00th=[   65], 60.00th=[   69],
     | 70.00th=[   74], 80.00th=[   82], 90.00th=[   97], 95.00th=[  114],
     | 99.00th=[  140], 99.50th=[  192], 99.90th=[  232], 99.95th=[  232],
     | 99.99th=[  232]
   bw (  KiB/s): min=   64, max=  288, per=25.35%, avg=217.93, stdev=44.23, samples=58
   iops        : min=    4, max=   18, avg=13.62, stdev= 2.76, samples=58
  lat (msec)   : 2=32.42%, 4=10.68%, 10=2.73%, 20=0.26%, 50=7.16%
  lat (msec)   : 100=42.45%, 250=4.30%
  cpu          : usr=0.01%, sys=0.95%, ctx=1738, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=359,409,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
rand-rw: (groupid=0, jobs=1): err= 0: pid=1003438: Mon Jan 15 01:06:05 2024
  read: IOPS=13, BW=220KiB/s (226kB/s)(6256KiB/28379msec)
    clat (usec): min=1451, max=50590, avg=2644.53, stdev=4071.91
     lat (usec): min=1452, max=50591, avg=2644.84, stdev=4071.92
    clat percentiles (usec):
     |  1.00th=[ 1483],  5.00th=[ 1549], 10.00th=[ 1598], 20.00th=[ 1647],
     | 30.00th=[ 1696], 40.00th=[ 1745], 50.00th=[ 1795], 60.00th=[ 1893],
     | 70.00th=[ 2008], 80.00th=[ 2245], 90.00th=[ 3163], 95.00th=[ 4424],
     | 99.00th=[25297], 99.50th=[31327], 99.90th=[50594], 99.95th=[50594],
     | 99.99th=[50594]
   bw (  KiB/s): min=   32, max=  608, per=27.29%, avg=222.86, stdev=134.52, samples=56
   iops        : min=    2, max=   38, avg=13.93, stdev= 8.41, samples=56
  write: IOPS=13, BW=213KiB/s (218kB/s)(6032KiB/28379msec); 0 zone resets
    clat (msec): min=31, max=211, avg=72.52, stdev=24.19
     lat (msec): min=31, max=211, avg=72.52, stdev=24.19
    clat percentiles (msec):
     |  1.00th=[   40],  5.00th=[   48], 10.00th=[   52], 20.00th=[   56],
     | 30.00th=[   59], 40.00th=[   64], 50.00th=[   68], 60.00th=[   72],
     | 70.00th=[   77], 80.00th=[   85], 90.00th=[  100], 95.00th=[  118],
     | 99.00th=[  190], 99.50th=[  207], 99.90th=[  211], 99.95th=[  211],
     | 99.99th=[  211]
   bw (  KiB/s): min=   64, max=  256, per=24.77%, avg=212.00, stdev=40.74, samples=56
   iops        : min=    4, max=   16, avg=13.25, stdev= 2.55, samples=56
  lat (msec)   : 2=34.77%, 4=13.02%, 10=1.69%, 20=0.65%, 50=4.69%
  lat (msec)   : 100=40.49%, 250=4.69%
  cpu          : usr=0.01%, sys=1.01%, ctx=1743, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=391,377,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
rand-rw: (groupid=0, jobs=1): err= 0: pid=1003441: Mon Jan 15 01:06:05 2024
  read: IOPS=13, BW=208KiB/s (213kB/s)(6032KiB/28977msec)
    clat (usec): min=1476, max=86347, avg=3206.18, stdev=7498.33
     lat (usec): min=1476, max=86348, avg=3206.49, stdev=7498.34
    clat percentiles (usec):
     |  1.00th=[ 1500],  5.00th=[ 1565], 10.00th=[ 1598], 20.00th=[ 1680],
     | 30.00th=[ 1729], 40.00th=[ 1762], 50.00th=[ 1811], 60.00th=[ 1893],
     | 70.00th=[ 1991], 80.00th=[ 2311], 90.00th=[ 3720], 95.00th=[ 6128],
     | 99.00th=[41681], 99.50th=[71828], 99.90th=[86508], 99.95th=[86508],
     | 99.99th=[86508]
   bw (  KiB/s): min=   32, max=  448, per=25.81%, avg=210.86, stdev=106.01, samples=56
   iops        : min=    2, max=   28, avg=13.18, stdev= 6.63, samples=56
  write: IOPS=13, BW=216KiB/s (221kB/s)(6256KiB/28977msec); 0 zone resets
    clat (msec): min=24, max=202, avg=71.01, stdev=23.17
     lat (msec): min=24, max=202, avg=71.01, stdev=23.17
    clat percentiles (msec):
     |  1.00th=[   42],  5.00th=[   46], 10.00th=[   50], 20.00th=[   55],
     | 30.00th=[   58], 40.00th=[   62], 50.00th=[   65], 60.00th=[   71],
     | 70.00th=[   77], 80.00th=[   82], 90.00th=[  105], 95.00th=[  115],
     | 99.00th=[  150], 99.50th=[  199], 99.90th=[  203], 99.95th=[  203],
     | 99.99th=[  203]
   bw (  KiB/s): min=   64, max=  256, per=25.00%, avg=214.46, stdev=41.88, samples=57
   iops        : min=    4, max=   16, avg=13.40, stdev= 2.62, samples=57
  lat (msec)   : 2=34.77%, 4=10.03%, 10=2.99%, 20=0.26%, 50=6.51%
  lat (msec)   : 100=39.58%, 250=5.86%
  cpu          : usr=0.02%, sys=0.96%, ctx=1741, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=377,391,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16Run status group 0 (all jobs):
   READ: bw=814KiB/s (833kB/s), 195KiB/s-220KiB/s (200kB/s-226kB/s), io=23.4MiB (24.5MB), run=28379-29441msec
  WRITE: bw=856KiB/s (876kB/s), 213KiB/s-222KiB/s (218kB/s-228kB/s), io=24.6MiB (25.8MB), run=28379-29441msec
mix direct rw 1048576 by fio with 4 jobs...
mix buffer rw 1048576 by fio with 4 jobs...
rand-rw: (g=0): rw=randrw, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=16
...
fio-3.35
Starting 4 processes
rand-rw: (g=0): rw=randrw, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=16
...
fio-3.35
Starting 4 processes
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
 

<<Please provide additional information about the failure here>>



 Comments   
Comment by Arshad Hussain [ 15/Jan/24 ]

Server nodes have standard sysrq traces... (all logs attached)

[16087.220337] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 398a: direct IO should cancel lock otherwise lockless ========================================================== 01:03:34 \(1705280614\)^M
[16087.519271] Lustre: DEBUG MARKER: == sanity test 398a: direct IO should cancel lock otherwise lockless ========================================================== 01:03:34 (1705280614)^M
[16090.175721] Lustre: DEBUG MARKER: sysctl -wq kernel/kptr_restrict=1^M
[16091.768405] Lustre: DEBUG MARKER: sysctl --values kernel/kptr_restrict^M
[16092.098888] Lustre: DEBUG MARKER: sysctl -wq kernel/kptr_restrict=1^M
[16093.337161] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 398b: DIO and buffer IO race ============== 01:03:41 \(1705280621\)^M
[16093.641854] Lustre: DEBUG MARKER: == sanity test 398b: DIO and buffer IO race ============== 01:03:41 (1705280621)^M
[21535.641139] sysrq: Changing Loglevel^M
[21535.643393] sysrq: Loglevel set to 8^M
[21535.646212] sysrq: Show State^M
[21535.646753] task:systemd         state:S stack:    0 pid:    1 ppid:     0 flags:0x00000002^M
[21535.647878] Call Trace:^M
[21535.648311]  <TASK>^M
[21535.648681]  __schedule+0x248/0x620^M
[21535.649284]  schedule+0x2d/0x60^M
[21535.649771]  schedule_hrtimeout_range_clock+0x198/0x1b0^M
[21535.650495]  ? ep_send_events+0x272/0x2c0^M
[21535.651134]  ep_poll+0x348/0x3b0^M
[21535.651626]  ? cpuacct_percpu_seq_show+0x10/0x10^M
[21535.652324]  do_epoll_wait+0xa3/0xc0^M
[21535.652854]  __x64_sys_epoll_wait+0x60/0x100^M
[21535.653475]  do_syscall_64+0x5c/0x90^M
[21535.654032]  ? syscall_exit_to_user_mode+0x12/0x30^M
[21535.654711]  ? do_syscall_64+0x69/0x90^M
[21535.655268]  ? __task_pid_nr_ns+0x97/0xb0^M
[21535.655862]  ? syscall_exit_to_user_mode+0x12/0x30^M
[21535.656559]  ? do_syscall_64+0x69/0x90^M
[21535.657120]  ? __rseq_handle_notify_resume+0x32/0x50^M
[21535.657825]  ? exit_to_user_mode_loop+0xd0/0x130^M
[21535.658496]  ? exit_to_user_mode_prepare+0xec/0x100^M
[21535.659193]  ? sysvec_apic_timer_interrupt+0x3c/0x90^M
[21535.659882]  entry_SYSCALL_64_after_hwframe+0x63/0xcd^M
...
Comment by Arshad Hussain [ 15/Jan/24 ]

This was triggered by non-enforced maloo run. This is not a bug yet.

Comment by Alex Zhuravlev [ 16/Jan/24 ]

from the traces it doesn't look like something was really wrong, rather it was very slow.

Comment by Arshad Hussain [ 17/Jan/24 ]

Thanks for looking into this. It took longer than the timeout and sysrq forced kill the slow process (testcase). Still this was triggered from non-enforced(optional) maloo run and I have marked it closed.

Generated at Sat Feb 10 03:35:20 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.