
sanity-scrub test_2: ldlm_lock2desc() ASSERTION( lock->l_policy_data.l_inodebits.bits == (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE | MDS_INODELOCK_LAYOUT) ) failed

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.6.0
    • Lustre 2.6.0, Lustre 2.5.4
    • None
    • 3
    • 13763

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run:
      http://maloo.whamcloud.com/test_sets/598caeaa-cd2c-11e3-b548-52540035b04c
      https://maloo.whamcloud.com/test_sets/fc94ece0-a552-11e3-9fee-52540035b04c

      The sub-test test_2 failed with the following error:

      test failed to respond and timed out

      Info required for matching: sanity-scrub 2

          Activity

            gerrit Gerrit Updater added a comment -

            Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/12960
            Subject: LU-4971 ldlm: drop redundant ibits lock interoperability check
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set: 1
            Commit: 2b44f94c2bfbefd9b0f85a52218979b851d7af58

            yujian Jian Yu added a comment - One more instance on Lustre b2_5 branch: https://testing.hpdd.intel.com/test_sets/c800bb44-5e7f-11e4-9843-5254006e85c2
            yujian Jian Yu added a comment -

            Thank you, Nasf. Will do.


            yong.fan nasf (Inactive) added a comment -

            Yujian,

            I do not think that http://review.whamcloud.com/#/c/12606/ caused the trouble. You need to back-port the patch(es) to fix it.

            yujian Jian Yu added a comment -

            While verifying patch http://review.whamcloud.com/12606 on Lustre b2_5 branch, the same failure occurred:

            LustreError: 19407:0:(ldlm_lock.c:669:ldlm_lock2desc()) ASSERTION( lock->l_policy_data.l_inodebits.bits == (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE | MDS_INODELOCK_LAYOUT) ) failed: Inappropriate inode lock bits during conversion 3
            LustreError: 19407:0:(ldlm_lock.c:669:ldlm_lock2desc()) LBUG
            Pid: 19407, comm: stat
            
            Call Trace:
             [<ffffffffa03d0895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] 
             [<ffffffffa03d0e97>] lbug_with_loc+0x47/0xb0 [libcfs] 
             [<ffffffffa067a239>] ldlm_lock2desc+0x179/0x180 [ptlrpc] 
             [<ffffffffa068cb90>] ldlm_cli_enqueue+0x1f0/0x790 [ptlrpc] 
             [<ffffffffa06b29ea>] ? ptlrpc_request_set_replen+0x3a/0x60 [ptlrpc] 
             [<ffffffffa06919d0>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc] 
             [<ffffffffa0a825c0>] ? ll_md_blocking_ast+0x0/0x7d0 [lustre] 
             [<ffffffffa08ded8e>] mdc_enqueue+0x2be/0x1a10 [mdc]
             [<ffffffffa01bc294>] ? fld_client_rpc+0x864/0xed0 [fld]
             [<ffffffffa08e06dd>] mdc_intent_lock+0x1fd/0x64a [mdc]
             [<ffffffffa0a825c0>] ? ll_md_blocking_ast+0x0/0x7d0 [lustre] 
             [<ffffffffa06919d0>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc] 
             [<ffffffffa00e58b8>] ? lprocfs_counter_add+0x1a8/0x1d6 [lvfs]
             [<ffffffffa08a846e>] lmv_intent_remote+0x47e/0xa80 [lmv]
             [<ffffffffa0a825c0>] ? ll_md_blocking_ast+0x0/0x7d0 [lustre] 
             [<ffffffffa08a9137>] lmv_intent_lookup+0x6c7/0x700 [lmv]
             [<ffffffffa0a825c0>] ? ll_md_blocking_ast+0x0/0x7d0 [lustre] 
             [<ffffffffa08a9d6a>] lmv_intent_lock+0x32a/0x380 [lmv]
             [<ffffffffa0a825c0>] ? ll_md_blocking_ast+0x0/0x7d0 [lustre] 
             [<ffffffffa0a81d0e>] ? ll_i2gids+0x2e/0xd0 [lustre] 
             [<ffffffffa0a68c0d>] ? ll_prep_md_op_data+0x10d/0x3b0 [lustre] 
             [<ffffffffa0a8491f>] ll_lookup_it+0x33f/0xb00 [lustre] 
             [<ffffffffa0a825c0>] ? ll_md_blocking_ast+0x0/0x7d0 [lustre] 
             [<ffffffffa0a50c51>] ? __ll_inode_revalidate_it+0x1e1/0xc30 [lustre] 
             [<ffffffffa0a8535f>] ll_lookup_nd+0x27f/0x3f0 [lustre] 
             [<ffffffff811a42fe>] ? d_alloc+0x13e/0x1b0
             [<ffffffff81198a35>] do_lookup+0x1a5/0x230
             [<ffffffff81199100>] __link_path_walk+0x200/0x1000
             [<ffffffff8114a3d7>] ? handle_pte_fault+0xf7/0xb00
             [<ffffffff8119a1ba>] path_walk+0x6a/0xe0
             [<ffffffff8119a3cb>] filename_lookup+0x6b/0xc0
             [<ffffffff8119b4f7>] user_path_at+0x57/0xa0
             [<ffffffff8104a98c>] ? __do_page_fault+0x1ec/0x480
             [<ffffffff8118e990>] vfs_fstatat+0x50/0xa0
             [<ffffffff811515a5>] ? do_mmap_pgoff+0x335/0x380
             [<ffffffff8118ea4e>] vfs_lstat+0x1e/0x20
             [<ffffffff8118ea74>] sys_newlstat+0x24/0x50
             [<ffffffff810e1e07>] ? audit_syscall_entry+0x1d7/0x200
             [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            
            Kernel panic - not syncing: LBUG
            

            Maloo report: https://testing.hpdd.intel.com/test_sets/f4332d04-7af0-11e4-956d-5254006e85c2

            Hi Nasf, could you please check whether this is a regression introduced by patch http://review.whamcloud.com/12606? If it is not, then this is an issue on the Lustre b2_5 branch and I'll back-port your patch to fix it. Thank you!
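
For reference, the failing check can be reproduced in isolation. The sketch below assumes that the trailing 3 in the assertion message is the value of l_inodebits.bits at the time of the LBUG, and that the flag values are LOOKUP = 0x1, UPDATE = 0x2 and LAYOUT = 0x8, as I recall them from the Lustre headers of this era (please verify against lustre_idl.h). The helpers lock_bits_ok() and missing_bits() are hypothetical names used only for illustration; they are not Lustre functions.

```c
#include <stdint.h>

/* Assumed flag values; verify against lustre_idl.h for your release. */
#define MDS_INODELOCK_LOOKUP 0x000001ULL
#define MDS_INODELOCK_UPDATE 0x000002ULL
#define MDS_INODELOCK_LAYOUT 0x000008ULL

/* The exact bit set the assertion in ldlm_lock2desc() compares against. */
#define EXPECTED_BITS (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE | \
                       MDS_INODELOCK_LAYOUT)

/* Hypothetical helper: mirrors the equality test made by the assertion. */
static int lock_bits_ok(uint64_t bits)
{
    return bits == EXPECTED_BITS;
}

/* Hypothetical helper: expected bits that are absent from 'bits'. */
static uint64_t missing_bits(uint64_t bits)
{
    return EXPECTED_BITS & ~bits;
}
```

Under these assumptions, lock_bits_ok(3) is false and missing_bits(3) is MDS_INODELOCK_LAYOUT: the lock carried LOOKUP | UPDATE without the LAYOUT bit, which is enough to trip an exact-equality check.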


            yong.fan nasf (Inactive) added a comment -

            The patch has landed on master.


            yong.fan nasf (Inactive) added a comment -

            Here is the patch to drop the redundant ibits lock interoperability check:

            http://review.whamcloud.com/11004

            di.wang Di Wang added a comment -

            "Even though we drop the incompatible ibits, we still needs to prevent the user to use the non-initialised export, otherwise, there may be other potential bugs."

            That is something that should never happen, IMHO.


            yong.fan nasf (Inactive) added a comment -

            Thanks, James!

            jamesanunez James Nunez (Inactive) added a comment - - edited

            I've tested the patch at http://review.whamcloud.com/#/c/10958/ (version 2) and, after running sanity-scrub three times, I have not seen this assertion.

            The patch fixes the crash/error for the configuration I've been testing with.


            People

              yong.fan nasf (Inactive)
              maloo Maloo