[LU-9563] LBUG ldlm_lock_downgrade()) ASSERTION( lock->l_granted_mode & (LCK_PW | LCK_EX) ) failed:

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Blocker
    • None
    • Lustre 2.10.0
    • soak performance cluster
    • 3

    Description

      System was running normally, no fault was induced.

      [25789.588202] LustreError: 4166:0:(ldlm_lock.c:2548:ldlm_lock_downgrade()) ASSERTION( lock->l_granted_mode & (LCK_PW | LCK_EX) ) failed:
      [25789.609363] LustreError: 4166:0:(ldlm_lock.c:2548:ldlm_lock_downgrade()) LBUG
      [25789.621115] Pid: 4166, comm: ptlrpc_hr01_005
      [25789.629112]
      [25789.629112] Call Trace:
      [25789.639919]  [<ffffffffa084d7ee>] libcfs_call_trace+0x4e/0x60 [libcfs]
      [25789.650059]  [<ffffffffa084d87c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      [25789.659583]  [<ffffffffa0b5883b>] ldlm_lock_downgrade+0x19b/0x1d0 [ptlrpc]
      [25789.669915]  [<ffffffffa0bae96f>] ptlrpc_hr_main+0x5bf/0x910 [ptlrpc]
      [25789.679492]  [<ffffffff810c8345>] ? sched_clock_cpu+0x85/0xc0
      [25789.688107]  [<ffffffff810c54c0>] ? default_wake_function+0x0/0x20
      [25789.697330]  [<ffffffffa0bae3b0>] ? ptlrpc_hr_main+0x0/0x910 [ptlrpc]
      [25789.706522]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
      [25789.714056]  [<ffffffff810b0980>] ? kthread+0x0/0xe0
      [25789.721469]  [<ffffffff81697318>] ret_from_fork+0x58/0x90
      [25789.729355]  [<ffffffff810b0980>] ? kthread+0x0/0xe0
      [25789.736613]
      [25789.740048] Kernel panic - not syncing: LBUG
      

      System wedged hard at this time. Will reboot and run with full debug.
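
      The failing check is a bitmask test on the lock's granted mode. A minimal standalone sketch of that kind of check follows. The sketch_* names and the mode bit values are hypothetical, chosen for illustration only, and are not taken from the Lustre source. This ticket does not record which mode the lock actually held when the assertion fired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative mode bits. These values are assumptions made for this
 * sketch; they are not copied from the Lustre headers. */
enum sketch_mode {
    SK_EX  = 1 << 0,  /* exclusive */
    SK_PW  = 1 << 1,  /* protected write */
    SK_PR  = 1 << 2,  /* protected read */
    SK_COS = 1 << 3,  /* commit-on-share */
};

struct sketch_lock {
    enum sketch_mode granted_mode;
};

/* Same shape as the failing check: true only if the granted mode
 * is PW or EX. */
static bool sketch_can_downgrade(const struct sketch_lock *lock)
{
    assert(lock != NULL);
    return (lock->granted_mode & (SK_PW | SK_EX)) != 0;
}

/* Returns false at the point where an LASSERT-style check would
 * instead stop the kernel. */
static bool sketch_downgrade(struct sketch_lock *lock,
                             enum sketch_mode new_mode)
{
    if (!sketch_can_downgrade(lock))
        return false;
    lock->granted_mode = new_mode;
    return true;
}
```

      In this sketch, a lock granted SK_PW can be downgraded once to SK_COS. A second downgrade attempt on the same lock fails the check, as does an attempt on a lock granted SK_PR.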

          Activity

            pjones Peter Jones added a comment -

            Cliff

            Please add the results of the debug logs to LU-9504. This patch has not landed on master, so it does not need a new ticket.

            Peter


            cliffw Cliff White (Inactive) added a comment -

            Still hitting this issue, with every new LU-9504 patch. It's not fixed.

            cliffw Cliff White (Inactive) added a comment -

            Attempted to test latest patch for LU-9504, hit this issue again immediately.

            Jun  5 16:51:12 soak-11 kernel: LustreError: 4204:0:(ldlm_lock.c:2548:ldlm_lock_downgrade()) ASSERTION( lock->l_granted_mode & (LCK_PW | LCK_EX) ) failed:
            Jun  5 16:51:12 soak-11 kernel: LustreError: 4204:0:(ldlm_lock.c:2548:ldlm_lock_downgrade()) LBUG
            Jun  5 16:51:12 soak-11 kernel: Pid: 4204, comm: ptlrpc_hr01_002
            Jun  5 16:51:12 soak-11 kernel: #012Call Trace:
            Jun  5 16:51:12 soak-11 kernel: [<ffffffffa08637ee>] libcfs_call_trace+0x4e/0x60 [libcfs]
            Jun  5 16:51:12 soak-11 kernel: [<ffffffffa086387c>] lbug_with_loc+0x4c/0xb0 [libcfs]
            Jun  5 16:51:12 soak-11 kernel: [<ffffffffa0b6e83b>] ldlm_lock_downgrade+0x19b/0x1d0 [ptlrpc]
            Jun  5 16:51:12 soak-11 kernel: [<ffffffffa0bc57f8>] ptlrpc_handle_rs+0x5c8/0x700 [ptlrpc]
            Jun  5 16:51:12 soak-11 kernel: [<ffffffffa0bc5a15>] ptlrpc_hr_main+0xe5/0x2c0 [ptlrpc]
            Jun  5 16:51:12 soak-11 kernel: [<ffffffff810c54c0>] ? default_wake_function+0x0/0x20
            Jun  5 16:51:12 soak-11 kernel: [<ffffffffa0bc5930>] ? ptlrpc_hr_main+0x0/0x2c0 [ptlrpc]
            Jun  5 16:51:12 soak-11 kernel: [<ffffffff810b0a4f>] kthread+0xcf/0xe0
            Jun  5 16:51:12 soak-11 kernel: [<ffffffff810b0980>] ? kthread+0x0/0xe0
            Jun  5 16:51:12 soak-11 kernel: [<ffffffff81697318>] ret_from_fork+0x58/0x90
            Jun  5 16:51:12 soak-11 kernel: [<ffffffff810b0980>] ? kthread+0x0/0xe0

            cliffw Cliff White (Inactive) added a comment -

            Tested with latest patch of LU-9504
            https://review.whamcloud.com/#/c/27207/

            Soak hit this issue immediately.

            May 31 17:11:18 soak-9 kernel: LustreError: 4177:0:(ldlm_lock.c:2548:ldlm_lock_downgrade()) ASSERTION( lock->l_granted_mode & (LCK_PW | LCK_EX) ) failed:
            May 31 17:11:18 soak-9 kernel: LustreError: 4177:0:(ldlm_lock.c:2548:ldlm_lock_downgrade()) LBUG
            May 31 17:11:18 soak-9 kernel: Pid: 4177, comm: ptlrpc_hr01_003
            May 31 17:11:18 soak-9 kernel: #012Call Trace:
            May 31 17:11:18 soak-9 kernel: [<ffffffffa08247ee>] libcfs_call_trace+0x4e/0x60 [libcfs]
            May 31 17:11:19 soak-9 kernel: [<ffffffffa082487c>] lbug_with_loc+0x4c/0xb0 [libcfs]
            May 31 17:11:19 soak-9 kernel: [<ffffffffa0b6283b>] ldlm_lock_downgrade+0x19b/0x1d0 [ptlrpc]
            May 31 17:11:19 soak-9 kernel: [<ffffffffa0bb9620>] ptlrpc_handle_rs+0x3f0/0x640 [ptlrpc]
            May 31 17:11:19 soak-9 kernel: [<ffffffffa0bb9955>] ptlrpc_hr_main+0xe5/0x2c0 [ptlrpc]
            May 31 17:11:19 soak-9 kernel: [<ffffffff810c54c0>] ? default_wake_function+0x0/0x20
            May 31 17:11:19 soak-9 kernel: [<ffffffffa0bb9870>] ? ptlrpc_hr_main+0x0/0x2c0 [ptlrpc]
            May 31 17:11:19 soak-9 kernel: [<ffffffff810b0a4f>] kthread+0xcf/0xe0
            May 31 17:11:19 soak-9 kernel: [<ffffffff810b0980>] ? kthread+0x0/0xe0
            May 31 17:11:19 soak-9 kernel: [<ffffffff81697318>] ret_from_fork+0x58/0x90
            May 31 17:11:19 soak-9 kernel: [<ffffffff810b0980>] ? kthread+0x0/0xe0
            May 31 17:11:19 soak-9 kernel:
            May 31 17:11:19 soak-9 kernel: Kernel panic - not syncing: LBUG

            Soak is dead until we see a fix for this.

            cliffw Cliff White (Inactive) added a comment -

            Soak is dead until this issue is fixed.

            pjones Peter Jones added a comment -

            OK then, as LU-9504 has not landed yet, let's close this as a duplicate of LU-9504.

            laisiyao Lai Siyao added a comment -

            https://review.whamcloud.com/#/c/27207/ for LU-9504 doesn't consider the race with mdt_steal_ack_lock(). I'll update that patch.

            pjones Peter Jones added a comment -

            Lai

            Could you please advise on this one?

            Thanks

            Peter

            jamesanunez James Nunez (Inactive) added a comment - edited

            Soak was running the build described at https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Soak#SoakTestingonSoak-20170525

            People

              laisiyao Lai Siyao
              cliffw Cliff White (Inactive)