Lustre / LU-4971

sanity-scrub test_2: ldlm_lock2desc()) ASSERTION( lock->l_policy_data.l_inodebits.bits == (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE | MDS_INODELOCK_LAYOUT) ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: Lustre 2.6.0
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.4
    • Labels: None
    • Severity: 3
    • 13763

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run:
      http://maloo.whamcloud.com/test_sets/598caeaa-cd2c-11e3-b548-52540035b04c
      https://maloo.whamcloud.com/test_sets/fc94ece0-a552-11e3-9fee-52540035b04c

      The sub-test test_2 failed with the following error:

      test failed to respond and timed out

      Info required for matching: sanity-scrub 2

Attachments

Issue Links

Activity

            yujian Jian Yu added a comment -

            Thank you, Nasf. Will do.


            yong.fan nasf (Inactive) added a comment -

            Yujian,

            I do not think it was http://review.whamcloud.com/#/c/12606/ that caused the trouble. You need to back-port the patch(es) to fix it.
            yujian Jian Yu added a comment -

            While verifying patch http://review.whamcloud.com/12606 on Lustre b2_5 branch, the same failure occurred:

            LustreError: 19407:0:(ldlm_lock.c:669:ldlm_lock2desc()) ASSERTION( lock->l_policy_data.l_inodebits.bits == (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE | MDS_INODELOCK_LAYOUT) ) failed: Inappropriate inode lock bits during conversion 3
            LustreError: 19407:0:(ldlm_lock.c:669:ldlm_lock2desc()) LBUG
            Pid: 19407, comm: stat
            
            Call Trace:
             [<ffffffffa03d0895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] 
             [<ffffffffa03d0e97>] lbug_with_loc+0x47/0xb0 [libcfs] 
             [<ffffffffa067a239>] ldlm_lock2desc+0x179/0x180 [ptlrpc] 
             [<ffffffffa068cb90>] ldlm_cli_enqueue+0x1f0/0x790 [ptlrpc] 
             [<ffffffffa06b29ea>] ? ptlrpc_request_set_replen+0x3a/0x60 [ptlrpc] 
             [<ffffffffa06919d0>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc] 
             [<ffffffffa0a825c0>] ? ll_md_blocking_ast+0x0/0x7d0 [lustre] 
             [<ffffffffa08ded8e>] mdc_enqueue+0x2be/0x1a10 [mdc]
             [<ffffffffa01bc294>] ? fld_client_rpc+0x864/0xed0 [fld]
             [<ffffffffa08e06dd>] mdc_intent_lock+0x1fd/0x64a [mdc]
             [<ffffffffa0a825c0>] ? ll_md_blocking_ast+0x0/0x7d0 [lustre] 
             [<ffffffffa06919d0>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc] 
             [<ffffffffa00e58b8>] ? lprocfs_counter_add+0x1a8/0x1d6 [lvfs]
             [<ffffffffa08a846e>] lmv_intent_remote+0x47e/0xa80 [lmv]
             [<ffffffffa0a825c0>] ? ll_md_blocking_ast+0x0/0x7d0 [lustre] 
             [<ffffffffa08a9137>] lmv_intent_lookup+0x6c7/0x700 [lmv]
             [<ffffffffa0a825c0>] ? ll_md_blocking_ast+0x0/0x7d0 [lustre] 
             [<ffffffffa08a9d6a>] lmv_intent_lock+0x32a/0x380 [lmv]
             [<ffffffffa0a825c0>] ? ll_md_blocking_ast+0x0/0x7d0 [lustre] 
             [<ffffffffa0a81d0e>] ? ll_i2gids+0x2e/0xd0 [lustre] 
             [<ffffffffa0a68c0d>] ? ll_prep_md_op_data+0x10d/0x3b0 [lustre] 
             [<ffffffffa0a8491f>] ll_lookup_it+0x33f/0xb00 [lustre] 
             [<ffffffffa0a825c0>] ? ll_md_blocking_ast+0x0/0x7d0 [lustre] 
             [<ffffffffa0a50c51>] ? __ll_inode_revalidate_it+0x1e1/0xc30 [lustre] 
             [<ffffffffa0a8535f>] ll_lookup_nd+0x27f/0x3f0 [lustre] 
             [<ffffffff811a42fe>] ? d_alloc+0x13e/0x1b0
             [<ffffffff81198a35>] do_lookup+0x1a5/0x230
             [<ffffffff81199100>] __link_path_walk+0x200/0x1000
             [<ffffffff8114a3d7>] ? handle_pte_fault+0xf7/0xb00
             [<ffffffff8119a1ba>] path_walk+0x6a/0xe0
             [<ffffffff8119a3cb>] filename_lookup+0x6b/0xc0
             [<ffffffff8119b4f7>] user_path_at+0x57/0xa0
             [<ffffffff8104a98c>] ? __do_page_fault+0x1ec/0x480
             [<ffffffff8118e990>] vfs_fstatat+0x50/0xa0
             [<ffffffff811515a5>] ? do_mmap_pgoff+0x335/0x380
             [<ffffffff8118ea4e>] vfs_lstat+0x1e/0x20
             [<ffffffff8118ea74>] sys_newlstat+0x24/0x50
             [<ffffffff810e1e07>] ? audit_syscall_entry+0x1d7/0x200
             [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            
            Kernel panic - not syncing: LBUG
            

            Maloo report: https://testing.hpdd.intel.com/test_sets/f4332d04-7af0-11e4-956d-5254006e85c2

            Hi Nasf, could you please take a look at whether this is a regression introduced by patch http://review.whamcloud.com/12606? If not, then this is an issue on the Lustre b2_5 branch and I'll back-port your patch to fix it. Thank you!


            yong.fan nasf (Inactive) added a comment -

            The patch has been landed to master.

            yong.fan nasf (Inactive) added a comment -

            Here is the patch to drop the redundant ibits lock interoperability check:

            http://review.whamcloud.com/11004
            di.wang Di Wang added a comment -

            "Even though we drop the incompatible ibits, we still need to prevent users from using the non-initialised export; otherwise, there may be other potential bugs."

            That is something that should never happen, IMHO.


            yong.fan nasf (Inactive) added a comment -

            Thanks James!
            jamesanunez James Nunez (Inactive) added a comment - - edited

            I've tested the patch at http://review.whamcloud.com/#/c/10958/ (version 2) and, after running sanity-scrub three times, I have not seen this assertion.

            The patch fixes the crash/error for the configuration I've been testing with.


            yong.fan nasf (Inactive) added a comment -

            Even though we drop the incompatible ibits, we still need to prevent users from using the non-initialised export; otherwise, there may be other potential bugs.
            di.wang Di Wang added a comment -

            "Because the key issue is that we are checking uninitialized export flags."

            Yes, but that is for checking whether the server supports inodebits locks. Since a 2.6 client will only connect to inodebits-enabled servers, why do we need this check? TBH, I really do not like adding "connection wait logic" above ptlrpc, which might bring us trouble or even worse.


            yong.fan nasf (Inactive) added a comment -

            Wangdi, I have verified the race conditions locally. But I am not sure whether removing the incompatible bits is a suitable solution, because the key issue is that we are checking uninitialized export flags.

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 9

              Dates

                Created:
                Updated:
                Resolved: