Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-4971

sanity-scrub test_2: ldlm_lock2desc()) ASSERTION( lock->l_policy_data.l_inodebits.bits == (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE | MDS_INODELOCK_LAYOUT)failed

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.6.0
    • Lustre 2.6.0, Lustre 2.5.4
    • None
    • 3
    • 13763

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run:
      http://maloo.whamcloud.com/test_sets/598caeaa-cd2c-11e3-b548-52540035b04c
      https://maloo.whamcloud.com/test_sets/fc94ece0-a552-11e3-9fee-52540035b04c

      The sub-test test_2 failed with the following error:

      test failed to respond and timed out

      Info required for matching: sanity-scrub 2

      Attachments

        Issue Links

          Activity

            [LU-4971] sanity-scrub test_2: ldlm_lock2desc()) ASSERTION( lock->l_policy_data.l_inodebits.bits == (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE | MDS_INODELOCK_LAYOUT)failed

            Here is the patch to drop the redundant ibits lock interoperability check:

            http://review.whamcloud.com/11004

            yong.fan nasf (Inactive) added a comment - Here is the patch to drop the redundant ibits lock interoperability check: http://review.whamcloud.com/11004

            "Even though we drop the incompatible ibits, we still needs to prevent the user to use the non-initialised export, otherwise, there may be other potential bugs."

            That is something should never happen, IMHO.

            di.wang Di Wang (Inactive) added a comment - "Even though we drop the incompatible ibits, we still needs to prevent the user to use the non-initialised export, otherwise, there may be other potential bugs." That is something should never happen, IMHO.

            Thanks James!

            yong.fan nasf (Inactive) added a comment - Thanks James!
            jamesanunez James Nunez (Inactive) added a comment - - edited

            I've tested the patch at http://review.whamcloud.com/#/c/10958/ (version 2) and, after running sanity-scrub three times, I have not seen this assertion.

            The patch fixes the crash/error for the configuration I've been testing with.

            jamesanunez James Nunez (Inactive) added a comment - - edited I've tested the patch at http://review.whamcloud.com/#/c/10958/ (version 2) and, after running sanity-scrub three times, I have not seen this assertion. The patch fixes the crash/error for the configuration I've been testing with.

            Even though we drop the incompatible ibits, we still needs to prevent the user to use the non-initialised export, otherwise, there may be other potential bugs.

            yong.fan nasf (Inactive) added a comment - Even though we drop the incompatible ibits, we still needs to prevent the user to use the non-initialised export, otherwise, there may be other potential bugs.

            Because the key issue is that we are checking un-iniitialized export flags.

            Yes, but that is for checking if the server support inodebit lock. But since 2.6 client will be only connected to inodebit enable server, so why do we need this check? TBH, I really do not like add "connection wait logic" above ptlrpc, which might bring us troubles or even worse.

            di.wang Di Wang (Inactive) added a comment - Because the key issue is that we are checking un-iniitialized export flags. Yes, but that is for checking if the server support inodebit lock. But since 2.6 client will be only connected to inodebit enable server, so why do we need this check? TBH, I really do not like add "connection wait logic" above ptlrpc, which might bring us troubles or even worse.

            Wangdi, I have verified the race conditions locally. But I am not sure whether removing the incompatible bits is a suitable solution or not. Because the key issue is that we are checking un-iniitialized export flags.

            yong.fan nasf (Inactive) added a comment - Wangdi, I have verified the race conditions locally. But I am not sure whether removing the incompatible bits is a suitable solution or not. Because the key issue is that we are checking un-iniitialized export flags.

            Ah, I see. Hmm, I still think remove those incompatible inodebit handling is a better choice, or can we do sth like this

            root@testnode lustre-release]# git diff lustre/ldlm/
            diff --git a/lustre/ldlm/ldlm_request.c b/lustre/ldlm/ldlm_request.c
            index 76f329d..2c0d05d 100644
            --- a/lustre/ldlm/ldlm_request.c
            +++ b/lustre/ldlm/ldlm_request.c
            @@ -909,7 +909,7 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp,
                                          OBD_CONNECT_IBITS))
                                             lock->l_policy_data.l_inodebits.bits =
                                                     MDS_INODELOCK_LOOKUP |
            -                                        MDS_INODELOCK_UPDATE;
            +                                        MDS_INODELOCK_UPDATE | MDS_INODELOCK_LAYOUT;
                                     else
                                             lock->l_policy_data = *policy;
                             }
            
            di.wang Di Wang (Inactive) added a comment - Ah, I see. Hmm, I still think remove those incompatible inodebit handling is a better choice, or can we do sth like this root@testnode lustre-release]# git diff lustre/ldlm/ diff --git a/lustre/ldlm/ldlm_request.c b/lustre/ldlm/ldlm_request.c index 76f329d..2c0d05d 100644 --- a/lustre/ldlm/ldlm_request.c +++ b/lustre/ldlm/ldlm_request.c @@ -909,7 +909,7 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, OBD_CONNECT_IBITS)) lock->l_policy_data.l_inodebits.bits = MDS_INODELOCK_LOOKUP | - MDS_INODELOCK_UPDATE; + MDS_INODELOCK_UPDATE | MDS_INODELOCK_LAYOUT; else lock->l_policy_data = *policy; }

            No, for single MDT case, there is no such race condition. Because the mount process will call md_getstatus() on the ROOT object firstly, it is NOT ldlm_enqueue RPC. Then when the mount return to user-space, it can guarantee that the connection between the client and the MDT0 has been established successfully.

            yong.fan nasf (Inactive) added a comment - No, for single MDT case, there is no such race condition. Because the mount process will call md_getstatus() on the ROOT object firstly, it is NOT ldlm_enqueue RPC. Then when the mount return to user-space, it can guarantee that the connection between the client and the MDT0 has been established successfully.

            Hmm, I think this is the problem for single MDT FS too, i.e. ldlm_cli_enqueue should not use exp connection flag before it make sure the connection is established?
            This inodebit connect flag is being added even before 1.6 I guess? At least this inodebit interop code is added in 2007

            commit 113303973ec9f8484eb2355a1a6ef3c4c7fd6a56
            Author: nathan <nathan>
            Date:   Sat Feb 10 06:33:41 2007 +0000
            
                land b1_5 onto HEAD
            

            Which I thought it use exp connection flag in a wrong way. So we probably just remove these code to save some trouble here. Since I do not think we plan to let 2.6 lustre client to connect some non-inodebit support server (1.4 ?)

            di.wang Di Wang (Inactive) added a comment - Hmm, I think this is the problem for single MDT FS too, i.e. ldlm_cli_enqueue should not use exp connection flag before it make sure the connection is established? This inodebit connect flag is being added even before 1.6 I guess? At least this inodebit interop code is added in 2007 commit 113303973ec9f8484eb2355a1a6ef3c4c7fd6a56 Author: nathan <nathan> Date: Sat Feb 10 06:33:41 2007 +0000 land b1_5 onto HEAD Which I thought it use exp connection flag in a wrong way. So we probably just remove these code to save some trouble here. Since I do not think we plan to let 2.6 lustre client to connect some non-inodebit support server (1.4 ?)

            James, would you please to verify the new patch? Thanks!
            http://review.whamcloud.com/#/c/10958/

            yong.fan nasf (Inactive) added a comment - James, would you please to verify the new patch? Thanks! http://review.whamcloud.com/#/c/10958/

            People

              yong.fan nasf (Inactive)
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: