Lustre / LU-12258

sanity test_101d timeout when doing rolling upgrade OSS from 2.10.7 to 2.12.1 with ZFS


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • None
    • Lustre 2.12.1
    • None
    • 3

      1. Set up a system running 2.10.7 with 1 MDS (ZFS), 2 OSTs (ZFS), and 1 client.
      2. Upgrade the OSS from 2.10.7 to 2.12.1 while the other nodes remain at 2.10.7, then run sanity. test_101d times out, and the OSS console shows the following trace. The problem does not occur with ldiskfs.

      [ 3268.291244] Lustre: DEBUG MARKER: == sanity test 101d: file read with and without read-ahead enabled =================================== 00:21:09 (1556670069)
      [ 3280.980154] WARNING: MMP writes to pool 'lustre-ost1' have not succeeded in over 5s; suspending pool
      [ 3280.981448] WARNING: Pool 'lustre-ost1' has encountered an uncorrectable I/O failure and has been suspended.
      
      [ 3281.091886] WARNING: MMP writes to pool 'lustre-ost2' have not succeeded in over 5s; suspending pool
      [ 3281.092868] WARNING: Pool 'lustre-ost2' has encountered an uncorrectable I/O failure and has been suspended.
      
      [ 3474.405076] LNet: Service thread pid 30189 was inactive for 200.40s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [ 3474.409289] Pid: 30189, comm: ll_ost_io00_000 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Mon Apr 22 22:25:47 UTC 2019
      [ 3474.410361] Call Trace:
      [ 3474.410694]  [<ffffffffc07922d5>] cv_wait_common+0x125/0x150 [spl]
      [ 3474.411403]  [<ffffffffc0792315>] __cv_wait+0x15/0x20 [spl]
      [ 3474.412004]  [<ffffffffc08d32bf>] txg_wait_synced+0xef/0x140 [zfs]
      [ 3474.412817]  [<ffffffffc0888c95>] dmu_tx_wait+0x275/0x3c0 [zfs]
      [ 3474.413488]  [<ffffffffc0888e72>] dmu_tx_assign+0x92/0x490 [zfs]
      [ 3474.414163]  [<ffffffffc11f6009>] osd_trans_start+0x199/0x440 [osd_zfs]
      [ 3474.414896]  [<ffffffffc131cc85>] ofd_trans_start+0x75/0xf0 [ofd]
      [ 3474.415596]  [<ffffffffc1323881>] ofd_commitrw_write+0xa31/0x1d40 [ofd]
      [ 3474.416312]  [<ffffffffc1327c6c>] ofd_commitrw+0x48c/0x9e0 [ofd]
      [ 3474.416962]  [<ffffffffc102947c>] tgt_brw_write+0x10cc/0x1cf0 [ptlrpc]
      [ 3474.417923]  [<ffffffffc10251da>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [ 3474.418699]  [<ffffffffc0fca80b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [ 3474.419550]  [<ffffffffc0fce13c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
      [ 3474.420261]  [<ffffffffa0cc1c71>] kthread+0xd1/0xe0
      [ 3474.420817]  [<ffffffffa1375c37>] ret_from_fork_nospec_end+0x0/0x39
      [ 3474.421507]  [<ffffffffffffffff>] 0xffffffffffffffff
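
For triage, the sequence in the console output above (pool suspension first, hung service thread reported later) can be pulled out of a saved log with a small script. The following is only a sketch, assuming the console log has been saved with one message per line (not hard-wrapped); the function name `summarize` and the field names in its result are hypothetical and not part of Lustre or ZFS tooling.

```python
import re

# Matches the ZFS warning printed when multihost (MMP) writes stall.
MMP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] WARNING: MMP writes to pool '(?P<pool>[^']+)' "
    r"have not succeeded in over (?P<secs>\d+)s; suspending pool"
)

# Matches the Lustre watchdog message for an inactive service thread.
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] LNet: Service thread pid (?P<pid>\d+) "
    r"was inactive for (?P<idle>\d+\.\d+)s"
)


def summarize(console_text):
    """Return (suspended_pools, hung_threads) found in console_text.

    suspended_pools maps pool name -> timestamp of its first MMP suspension.
    hung_threads lists each reported thread with the time it was reported
    and the approximate time it went inactive (report time minus idle time).
    """
    suspended = {}
    for m in MMP_RE.finditer(console_text):
        # Keep only the first suspension seen for each pool.
        suspended.setdefault(m["pool"], float(m["ts"]))

    hung = []
    for m in HUNG_RE.finditer(console_text):
        reported = float(m["ts"])
        idle = float(m["idle"])
        hung.append({
            "pid": int(m["pid"]),
            "reported_at": reported,
            "stalled_since": round(reported - idle, 2),
        })
    return suspended, hung
```

Calling `summarize(open(path).read())` on a saved OSS console log (where `path` is wherever the log was captured) returns the suspended pools and the hung threads, so their timestamps can be compared side by side.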
      

            Assignee: WC Triage
            Reporter: Sarah Liu