Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-13089

ASSERTION( (((( lock))->l_flags & (1ULL << 50)) != 0) ) failed

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • Lustre 2.12.4
    • 3
    • 9223372036854775807

    Description

      I hit this in b2_12-next testing, but there's nothing I see in there that appears related so I think this is just super rare race that's pretty similar in nature to LU-4269

      except this time it's glimpse cb vs cancel cb race

      console gives us:

      [19625.122592] LustreError: 8347:0:(ldlm_lock.c:213:ldlm_lock_put()) ASSERTION( (((( lock))->l_flags & (1ULL << 50)) != 0) ) failed: 
      [19625.124370] LustreError: 8347:0:(ldlm_lock.c:213:ldlm_lock_put()) LBUG
      [19625.124704] LustreError: 8367:0:(ldlm_lock.c:205:ldlm_lock_put()) ASSERTION( atomic_read(&lock->l_refc) > 0 ) failed: 
      [19625.124705] LustreError: 8367:0:(ldlm_lock.c:205:ldlm_lock_put()) LBUG
      [19625.124706] Pid: 8367, comm: ldlm_bl_02 3.10.0-7.7-debug #1 SMP Wed Oct 30 09:43:00 EDT 2019
      [19625.124707] Call Trace:
      [19625.124739]  [<ffffffffa01f97dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [19625.124744]  [<ffffffffa01f988c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [19625.124796]  [<ffffffffa0638197>] ldlm_lock_put+0x557/0x730 [ptlrpc]
      [19625.124821]  [<ffffffffa0639980>] ldlm_lock_destroy_nolock+0x50/0x110 [ptlrpc]
      [19625.124839]  [<ffffffffa063febf>] ldlm_lock_cancel+0x6f/0x1f0 [ptlrpc]
      [19625.124866]  [<ffffffffa0656a3a>] ldlm_cli_cancel_local+0xca/0x3f0 [ptlrpc]
      [19625.124887]  [<ffffffffa065c767>] ldlm_cli_cancel+0x157/0x620 [ptlrpc]
      [19625.124917]  [<ffffffffa08a8b1a>] osc_ldlm_blocking_ast+0x17a/0x3a0 [osc]
      [19625.124949]  [<ffffffffa0668835>] ldlm_handle_bl_callback+0xf5/0x4f0 [ptlrpc]
      [19625.124977]  [<ffffffffa06693e8>] ldlm_bl_thread_main+0x7b8/0x980 [ptlrpc]
      [19625.125040]  [<ffffffff810b8254>] kthread+0xe4/0xf0
      [19625.125053]  [<ffffffff817e0df7>] ret_from_fork_nospec_end+0x0/0x39
      [19625.125063]  [<ffffffffffffffff>] 0xffffffffffffffff
      [19625.125067] Kernel panic - not syncing: LBUG
      

      and the oud 8347 backtrace is

       #4 [ffff88003e313820] vt_console_print at ffffffff814dbe54
       #5 [ffff88003e313880] call_console_drivers.constprop.17 at ffffffff8108cf00
       #6 [ffff88003e3138a8] console_unlock at ffffffff8108e559
       #7 [ffff88003e3138e8] vprintk_emit at ffffffff8108e984
       #8 [ffff88003e313958] vprintk_default at ffffffff8108ed79
       #9 [ffff88003e313968] printk at ffffffff817c67d1
      #10 [ffff88003e3139c8] cfs_print_to_console at ffffffffa01f958a [libcfs]
      #11 [ffff88003e3139f8] libcfs_debug_vmsg2 at ffffffffa01ff9e3 [libcfs]
      #12 [ffff88003e313b48] libcfs_debug_msg at ffffffffa0200077 [libcfs]
      #13 [ffff88003e313ba8] lbug_with_loc at ffffffffa01f9866 [libcfs]
      #14 [ffff88003e313bc8] ldlm_lock_put at ffffffffa0638266 [ptlrpc]
      #15 [ffff88003e313be8] osc_ldlm_glimpse_ast at ffffffffa08a8f28 [osc]
      #16 [ffff88003e313ca0] ldlm_callback_handler at ffffffffa066a0a8 [ptlrpc]
      #17 [ffff88003e313d18] ldlm_callback_handler at ffffffffa066b3c7 [ptlrpc]
      #18 [ffff88003e313d30] ptlrpc_server_handle_request at ffffffffa0699266 [ptlrpc]
      #19 [ffff88003e313dd0] ptlrpc_main at ffffffffa069d261 [ptlrpc]
      #20 [ffff88003e313ea8] kthread at ffffffff810b8254
      

      Attachments

        Issue Links

          Activity

            [LU-13089] ASSERTION( (((( lock))->l_flags & (1ULL << 50)) != 0) ) failed
            pjones Peter Jones added a comment -

            Believed to be a duplicate of LU-11719

            pjones Peter Jones added a comment - Believed to be a duplicate of LU-11719
            gerrit Gerrit Updater added a comment - - edited

            Bobi Jam (bobijam@hotmail.com) abandoned the patch: https://review.whamcloud.com/40286

            gerrit Gerrit Updater added a comment - - edited Bobi Jam (bobijam@hotmail.com) abandoned the patch: https://review.whamcloud.com/40286
            gerrit Gerrit Updater added a comment - - edited

            Bobi Jam (bobijam@hotmail.com) uploaded a patch port: https://review.whamcloud.com/40399

             

            The search_itree was glitched and messed up data->lmd_lock.

            gerrit Gerrit Updater added a comment - - edited Bobi Jam (bobijam@hotmail.com) uploaded a patch port: https://review.whamcloud.com/40399   The search_itree was glitched and messed up data->lmd_lock.
            kim.sebo Kim Sebo added a comment -

            We at ANU/NCI are also seeing this assert, ~10 times per month on 3000 clients. Clients are Centos 8 / 2.12.5 + some backports.

            kim.sebo Kim Sebo added a comment - We at ANU/NCI are also seeing this assert, ~10 times per month on 3000 clients. Clients are Centos 8 / 2.12.5 + some backports.
            lflis Lukasz Flis added a comment - - edited

            Applying the https://review.whamcloud.com/39819 fixed the problem in CYFRONET
            Before going to production with this fix we need to test if LU-11670 is a problem for us.

             

            lflis Lukasz Flis added a comment - - edited Applying the https://review.whamcloud.com/39819  fixed the problem in CYFRONET Before going to production with this fix we need to test if LU-11670  is a problem for us.  

            Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/39819
            Subject: LU-13089 osc: revert "glimpse - search for active lock"
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 37205c60bb2d99363a0c9dbf29d8f4fd684b6fab

            gerrit Gerrit Updater added a comment - Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/39819 Subject: LU-13089 osc: revert "glimpse - search for active lock" Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: 37205c60bb2d99363a0c9dbf29d8f4fd684b6fab

            People

              wc-triage WC Triage
              green Oleg Drokin
              Votes:
              2 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: