Lustre / LU-5128

ASSERTION( atomic_read(&obd->obd_req_replay_clients) == 0 ) failed


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.7.0, Lustre 2.5.3
    • Affects Version/s: Lustre 2.4.3
    • Environment: Lustre-2.4.3
    • Severity: 3
    • Rank: 14153

    Description

      The MDS failed over, and once the MDS's recovery finished, many OSSes crashed with the following assertion failure:

      2014-05-30 17:39:07 Lustre: Skipped 3 previous similar messages
      2014-05-30 17:39:07 LustreError: 18967:0:(ldlm_lib.c:1851:target_next_replay_req()) ASSERTION( atomic_read(&obd->obd_req_replay_clients) == 0 ) failed: 
      2014-05-30 17:39:07 LustreError: 18967:0:(ldlm_lib.c:1851:target_next_replay_req()) LBUG
      2014-05-30 17:39:07 Pid: 18967, comm: tgt_recov
      2014-05-30 17:39:07 
      2014-05-30 17:39:07 Call Trace:
      2014-05-30 17:39:07  [<ffffffffa0353895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      2014-05-30 17:39:07  [<ffffffffa0353e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      2014-05-30 17:39:07  [<ffffffffa066f48c>] target_recovery_thread+0x14ac/0x1970 [ptlrpc]
      2014-05-30 17:39:07  [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
      2014-05-30 17:39:07  [<ffffffff8100c0ca>] child_rip+0xa/0x20
      2014-05-30 17:39:07  [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
      2014-05-30 17:39:07  [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
      2014-05-30 17:39:07  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      2014-05-30 17:39:07 
      2014-05-30 17:39:07 Kernel panic - not syncing: LBUG
      2014-05-30 17:39:07 Pid: 18967, comm: tgt_recov Not tainted 2.6.32-358.18.1.el6_lustre.x86_64 #1
      2014-05-30 17:39:07 Call Trace:
      2014-05-30 17:39:07  [<ffffffff8150de58>] ? panic+0xa7/0x16f
      2014-05-30 17:39:07  [<ffffffffa0353eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      2014-05-30 17:39:07  [<ffffffffa066f48c>] ? target_recovery_thread+0x14ac/0x1970 [ptlrpc]
      2014-05-30 17:39:07  [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
      2014-05-30 17:39:07  [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
      2014-05-30 17:39:07  [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
      2014-05-30 17:39:07  [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
      2014-05-30 17:39:07  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      

      LU-1522 and LU-2397 reported a similar problem, but the patches for those issues have already been merged into b2_4.
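      To make the failing check concrete, here is a minimal, hypothetical C model of the invariant named in the assertion. It assumes, as the function name target_next_replay_req() suggests, that the check runs once the recovery thread (tgt_recov) finds no further queued replay request; at that point obd_req_replay_clients, the count of clients still in the request-replay phase, is expected to be zero. The struct model_target type and the model_next_replay_req() function are illustrative inventions, not Lustre code, and the model reports the condition instead of calling LBUG.

```c
#include <assert.h>

/* Hypothetical, simplified model of the invariant checked in
 * target_next_replay_req(). This is NOT Lustre code. */
struct model_target {
    int queued_replay_reqs;   /* requests waiting in the replay queue */
    int req_replay_clients;   /* clients still in the request-replay phase */
};

/* Hands out the next queued replay request, if any.
 * Returns 1 if a request was dequeued, 0 if the queue was empty.
 * When the queue is empty, *invariant_ok records whether the condition
 * from the ASSERTION holds (no client still counted as replaying).
 * In the kernel a false condition triggers LBUG; here it is only reported. */
int model_next_replay_req(struct model_target *t, int *invariant_ok)
{
    if (t->queued_replay_reqs > 0) {
        t->queued_replay_reqs--;
        *invariant_ok = 1;            /* check only applies to an empty queue */
        return 1;
    }
    *invariant_ok = (t->req_replay_clients == 0);
    return 0;
}

/* Hypothetical helper: a client finishes (or abandons) request replay. */
void model_client_finished(struct model_target *t)
{
    if (t->req_replay_clients > 0)
        t->req_replay_clients--;
}
```

      In this model, the crash in the report corresponds to reaching an empty replay queue while req_replay_clients is still non-zero, that is, a client that was never removed from the request-replay count.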


          People

            Assignee: Hongchao Zhang
            Reporter: Shuichi Ihara (Inactive)
