Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0
    • LLNL/Hyperion
    • 3
    • 7822

    Description

      Started SWL. Had immediate watchdog timeouts on OSTs, clients are stuck in 'comp' state, are evicted by servers and unable to reconnect, as servers are returning EBUSY. Appears to be some sort of watchdog storm on OSTs, stack dump attached.

      Attachments

        Activity

          [LU-3202] SWL Causes OST hang

          Patch landed. Let me know if more work is needed and I will reopen the ticket.

          jlevi Jodi Levi (Inactive) added a comment - Patch landed. Let me know if more work is needed and I will reopen the ticket.
          yong.fan nasf (Inactive) added a comment - - edited

          Originally, to control the race between the OI scrub inserting
          OI mapping and the osd_object_destroy() removing OI mapping on
          the same target, the inode::i_mutex was used.

          But the unlink thread which called osd_object_destroy() already
          started transaction handle. Such order is different from others
          as to may cause some deadlock between transaction start and the
          obtain inode::i_mutex.

          So now, the osd_object_destroy() will not obtain inode::i_mutex,
          instead, the OI scrub will check whether someone unlinked the
          inode or not during the OI scrub rebuilding the OI mapping, and
          remove the new-inserted OI mapping if the race happened.

          This is the patch:
          http://review.whamcloud.com/#change,6124

          yong.fan nasf (Inactive) added a comment - - edited Originally, to control the race between the OI scrub inserting OI mapping and the osd_object_destroy() removing OI mapping on the same target, the inode::i_mutex was used. But the unlink thread which called osd_object_destroy() already started transaction handle. Such order is different from others as to may cause some deadlock between transaction start and the obtain inode::i_mutex. So now, the osd_object_destroy() will not obtain inode::i_mutex, instead, the OI scrub will check whether someone unlinked the inode or not during the OI scrub rebuilding the OI mapping, and remove the new-inserted OI mapping if the race happened. This is the patch: http://review.whamcloud.com/#change,6124
          green Oleg Drokin added a comment -

          Reassigned to FanYong as he's going to take it further with a fix.

          green Oleg Drokin added a comment - Reassigned to FanYong as he's going to take it further with a fix.
          green Oleg Drokin added a comment -

          This seems to be a pretty clear deadlock:

          2013-04-22 11:21:58 ll_ost01_001  D 000000000000000a     0  5490      2 0x000000
          00
          2013-04-22 11:21:58  ffff8803b0c4f9b0 0000000000000046 0000000000000000 ffff8803
          2f4e8950
          2013-04-22 11:21:58  ffff8803b0c4fa10 ffff8802ad507020 ffff8803b0c4f9a0 ffff8802
          0a13c518
          2013-04-22 11:21:58  ffff8803b0c49058 ffff8803b0c4ffd8 000000000000fb88 ffff8803
          b0c49058
          2013-04-22 11:21:58 Call Trace:
          2013-04-22 11:21:58  [<ffffffff8150f3be>] __mutex_lock_slowpath+0x13e/0x180
          2013-04-22 11:21:58  [<ffffffff8150f25b>] mutex_lock+0x2b/0x50
          2013-04-22 11:21:58  [<ffffffffa100265e>] osd_object_destroy+0x1be/0x580 [osd_ldiskfs]
          2013-04-22 11:21:58  [<ffffffffa089ebcf>] ? ldiskfs_dirty_inode+0x4f/0x60 [ldiskfs]
          2013-04-22 11:21:58  [<ffffffffa10dcd20>] ofd_object_destroy+0x380/0x680 [ofd]
          2013-04-22 11:21:58  [<ffffffffa10d0056>] ofd_destroy_by_fid+0x266/0x610 [ofd]
          2013-04-22 11:21:58  [<ffffffffa0aca7a0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
          2013-04-22 11:21:58  [<ffffffffa0acbe00>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
          2013-04-22 11:21:58  [<ffffffffa0af4345>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
          2013-04-22 11:21:58  [<ffffffffa10d3bb7>] ofd_destroy+0x1a7/0x8b0 [ofd]
          

          vs

          2013-04-22 11:21:56  [<ffffffffa0873ce5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
          2013-04-22 11:21:56  [<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40
          2013-04-22 11:21:56  [<ffffffffa089714e>] ldiskfs_sync_file+0x15e/0x260 [ldiskfs
          ]
          2013-04-22 11:21:56  [<ffffffffa0fff854>] osd_object_sync+0x144/0x190 [osd_ldisk
          fs]
          2013-04-22 11:21:56  [<ffffffffa10cfadb>] ofd_sync+0x36b/0x680 [ofd]
          2013-04-22 11:21:56  [<ffffffffa109eb30>] ost_blocking_ast+0x640/0x10f0 [ost]
          2013-04-22 11:21:56  [<ffffffffa0964baf>] ? lu_context_fini+0x2f/0xc0 [obdclass]
          2013-04-22 11:21:56  [<ffffffffa0aab16c>] ldlm_cancel_callback+0x6c/0x1a0 [ptlrp
          c]
          2013-04-22 11:21:56  [<ffffffffa0aab2fa>] ldlm_lock_cancel+0x5a/0x1e0 [ptlrpc]
          2013-04-22 11:21:56  [<ffffffffa0aceee4>] ldlm_request_cancel+0x254/0x410 [ptlrp
          c]
          2013-04-22 11:21:56  [<ffffffffa0acf1dd>] ldlm_handle_cancel+0x13d/0x240 [ptlrpc
          ]
          2013-04-22 11:21:56  [<ffffffffa0ad1218>] ldlm_cancel_handler+0x3f8/0x600 [ptlrp
          c]
          2013-04-22 11:21:56  [<ffffffffa0b0539c>] ptlrpc_server_handle_request+0x40c/0xd
          90 [ptlrpc]
          

          In fact osd_object_destroy trying to take i_mutex while having a transaction open is a big no-no as it could lead to this exact lock inversions - usually i_mutex is taken bfore having a transaction open.
          The ldiskfs_sync_file is called with i_mutex_held.

          What's unclear is how come we did not hit this any earlier, this code has been present in our tree since mid last year, it seems.

          green Oleg Drokin added a comment - This seems to be a pretty clear deadlock: 2013-04-22 11:21:58 ll_ost01_001 D 000000000000000a 0 5490 2 0x000000 00 2013-04-22 11:21:58 ffff8803b0c4f9b0 0000000000000046 0000000000000000 ffff8803 2f4e8950 2013-04-22 11:21:58 ffff8803b0c4fa10 ffff8802ad507020 ffff8803b0c4f9a0 ffff8802 0a13c518 2013-04-22 11:21:58 ffff8803b0c49058 ffff8803b0c4ffd8 000000000000fb88 ffff8803 b0c49058 2013-04-22 11:21:58 Call Trace: 2013-04-22 11:21:58 [<ffffffff8150f3be>] __mutex_lock_slowpath+0x13e/0x180 2013-04-22 11:21:58 [<ffffffff8150f25b>] mutex_lock+0x2b/0x50 2013-04-22 11:21:58 [<ffffffffa100265e>] osd_object_destroy+0x1be/0x580 [osd_ldiskfs] 2013-04-22 11:21:58 [<ffffffffa089ebcf>] ? ldiskfs_dirty_inode+0x4f/0x60 [ldiskfs] 2013-04-22 11:21:58 [<ffffffffa10dcd20>] ofd_object_destroy+0x380/0x680 [ofd] 2013-04-22 11:21:58 [<ffffffffa10d0056>] ofd_destroy_by_fid+0x266/0x610 [ofd] 2013-04-22 11:21:58 [<ffffffffa0aca7a0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc] 2013-04-22 11:21:58 [<ffffffffa0acbe00>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc] 2013-04-22 11:21:58 [<ffffffffa0af4345>] ? lustre_msg_buf+0x55/0x60 [ptlrpc] 2013-04-22 11:21:58 [<ffffffffa10d3bb7>] ofd_destroy+0x1a7/0x8b0 [ofd] vs 2013-04-22 11:21:56 [<ffffffffa0873ce5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2] 2013-04-22 11:21:56 [<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40 2013-04-22 11:21:56 [<ffffffffa089714e>] ldiskfs_sync_file+0x15e/0x260 [ldiskfs ] 2013-04-22 11:21:56 [<ffffffffa0fff854>] osd_object_sync+0x144/0x190 [osd_ldisk fs] 2013-04-22 11:21:56 [<ffffffffa10cfadb>] ofd_sync+0x36b/0x680 [ofd] 2013-04-22 11:21:56 [<ffffffffa109eb30>] ost_blocking_ast+0x640/0x10f0 [ost] 2013-04-22 11:21:56 [<ffffffffa0964baf>] ? lu_context_fini+0x2f/0xc0 [obdclass] 2013-04-22 11:21:56 [<ffffffffa0aab16c>] ldlm_cancel_callback+0x6c/0x1a0 [ptlrp c] 2013-04-22 11:21:56 [<ffffffffa0aab2fa>] ldlm_lock_cancel+0x5a/0x1e0 [ptlrpc] 2013-04-22 11:21:56 [<ffffffffa0aceee4>] ldlm_request_cancel+0x254/0x410 [ptlrp c] 2013-04-22 11:21:56 [<ffffffffa0acf1dd>] ldlm_handle_cancel+0x13d/0x240 [ptlrpc ] 2013-04-22 11:21:56 [<ffffffffa0ad1218>] ldlm_cancel_handler+0x3f8/0x600 [ptlrp c] 2013-04-22 11:21:56 [<ffffffffa0b0539c>] ptlrpc_server_handle_request+0x40c/0xd 90 [ptlrpc] In fact osd_object_destroy trying to take i_mutex while having a transaction open is a big no-no as it could lead to this exact lock inversions - usually i_mutex is taken bfore having a transaction open. The ldiskfs_sync_file is called with i_mutex_held. What's unclear is how come we did not hit this any earlier, this code has been present in our tree since mid last year, it seems.
          pjones Peter Jones added a comment -

          Oleg is looking into this

          pjones Peter Jones added a comment - Oleg is looking into this

          Reformatted the filesystem, restarted the test, had failure immediately. attached are consoles from all OSTs, and console from the first client to be evicted.

          cliffw Cliff White (Inactive) added a comment - Reformatted the filesystem, restarted the test, had failure immediately. attached are consoles from all OSTs, and console from the first client to be evicted.

          sysrq -p from one OST

          cliffw Cliff White (Inactive) added a comment - sysrq -p from one OST

          People

            yong.fan nasf (Inactive)
            cliffw Cliff White (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: