Lustre / LU-12258

sanity test_101d times out during rolling upgrade of OSS from 2.10.7 to 2.12.1 with ZFS

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.1
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      1. Set up a system running 2.10.7 with 1 MDS (ZFS), 2 OSTs (ZFS), and 1 client.
      2. Upgrade the OSS from 2.10.7 to 2.12.1, leaving the other nodes at 2.10.7, then run sanity. test_101d times out, and the OSS shows the following trace. The problem does not occur with ldiskfs.

      [ 3268.291244] Lustre: DEBUG MARKER: == sanity test 101d: file read with and without read-ahead enabled =================================== 00:21:09 (1556670069)
      [ 3280.980154] WARNING: MMP writes to pool 'lustre-ost1' have not succeeded in over 5s; suspending pool
      [ 3280.981448] WARNING: Pool 'lustre-ost1' has encountered an uncorrectable I/O failure and has been suspended.
      
      [ 3281.091886] WARNING: MMP writes to pool 'lustre-ost2' have not succeeded in over 5s; suspending pool
      [ 3281.092868] WARNING: Pool 'lustre-ost2' has encountered an uncorrectable I/O failure and has been suspended.
      
      [ 3474.405076] LNet: Service thread pid 30189 was inactive for 200.40s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [ 3474.409289] Pid: 30189, comm: ll_ost_io00_000 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Mon Apr 22 22:25:47 UTC 2019
      [ 3474.410361] Call Trace:
      [ 3474.410694]  [<ffffffffc07922d5>] cv_wait_common+0x125/0x150 [spl]
      [ 3474.411403]  [<ffffffffc0792315>] __cv_wait+0x15/0x20 [spl]
      [ 3474.412004]  [<ffffffffc08d32bf>] txg_wait_synced+0xef/0x140 [zfs]
      [ 3474.412817]  [<ffffffffc0888c95>] dmu_tx_wait+0x275/0x3c0 [zfs]
      [ 3474.413488]  [<ffffffffc0888e72>] dmu_tx_assign+0x92/0x490 [zfs]
      [ 3474.414163]  [<ffffffffc11f6009>] osd_trans_start+0x199/0x440 [osd_zfs]
      [ 3474.414896]  [<ffffffffc131cc85>] ofd_trans_start+0x75/0xf0 [ofd]
      [ 3474.415596]  [<ffffffffc1323881>] ofd_commitrw_write+0xa31/0x1d40 [ofd]
      [ 3474.416312]  [<ffffffffc1327c6c>] ofd_commitrw+0x48c/0x9e0 [ofd]
      [ 3474.416962]  [<ffffffffc102947c>] tgt_brw_write+0x10cc/0x1cf0 [ptlrpc]
      [ 3474.417923]  [<ffffffffc10251da>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [ 3474.419550]  [<ffffffffc0fca80b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [ 3474.419550]  [<ffffffffc0fce13c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
      [ 3474.420261]  [<ffffffffa0cc1c71>] kthread+0xd1/0xe0
      [ 3474.420817]  [<ffffffffa1375c37>] ret_from_fork_nospec_end+0x0/0x39
      [ 3474.421507]  [<ffffffffffffffff>] 0xffffffffffffffff
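
      The timestamps in the log above can be lined up with a little arithmetic. The following is only a rough sketch, not part of the original report: it parses three of the console lines (with wrapped lines rejoined) and subtracts the reported inactivity period from the time of the LNet warning to estimate when the service thread stopped making progress. The regular expressions and the names `MMP_RE`, `HUNG_RE`, and `summarize` are made up for this illustration.

```python
import re

# Lines copied from the OSS console log above, wrapped lines rejoined.
# The LNet line is truncated after the part this sketch needs.
LOG = """\
[ 3280.980154] WARNING: MMP writes to pool 'lustre-ost1' have not succeeded in over 5s; suspending pool
[ 3281.091886] WARNING: MMP writes to pool 'lustre-ost2' have not succeeded in over 5s; suspending pool
[ 3474.405076] LNet: Service thread pid 30189 was inactive for 200.40s. The thread might be hung
"""

# Hypothetical patterns written for this example only.
MMP_RE = re.compile(r"\[\s*(\d+\.\d+)\] WARNING: MMP writes to pool '([^']+)' "
                    r"have not succeeded in over (\d+)s")
HUNG_RE = re.compile(r"\[\s*(\d+\.\d+)\] LNet: Service thread pid (\d+) "
                     r"was inactive for (\d+\.\d+)s")


def summarize(log):
    """Return ({pool: suspend time}, {pid: estimated start of inactivity})."""
    suspended = {}
    stalled_since = {}
    for line in log.splitlines():
        m = MMP_RE.search(line)
        if m:
            suspended[m.group(2)] = float(m.group(1))
            continue
        m = HUNG_RE.search(line)
        if m:
            # Time of the warning minus the reported inactive period.
            stalled_since[int(m.group(2))] = float(m.group(1)) - float(m.group(3))
    return suspended, stalled_since


suspended, stalled_since = summarize(LOG)
for pool, t in suspended.items():
    print(f"{pool}: suspended at {t:.2f}s")
for pid, t in stalled_since.items():
    print(f"pid {pid}: inactive since about {t:.2f}s")
```

      By this estimate the thread went idle a few seconds before the two pools were suspended, which is at least consistent with the write in the trace waiting on a suspended pool in txg_wait_synced. It does not by itself show why the MMP writes failed.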
      

            People

              wc-triage WC Triage
              sarah Sarah Liu