
[LU-4971] sanity-scrub test_2: ldlm_lock2desc()) ASSERTION( lock->l_policy_data.l_inodebits.bits == (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE | MDS_INODELOCK_LAYOUT) ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: Lustre 2.6.0
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.4
    • Labels: None
    • 3
    • 13763

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite runs:
      http://maloo.whamcloud.com/test_sets/598caeaa-cd2c-11e3-b548-52540035b04c
      https://maloo.whamcloud.com/test_sets/fc94ece0-a552-11e3-9fee-52540035b04c

      The sub-test test_2 failed with the following error:

      test failed to respond and timed out

      Info required for matching: sanity-scrub 2


    Activity


      yong.fan nasf (Inactive) added a comment - Even though we drop the incompatible ibits, we still need to prevent users from using a non-initialised export; otherwise there may be other potential bugs.

      di.wang Di Wang (Inactive) added a comment - "Because the key issue is that we are checking un-initialized export flags." Yes, but that check only determines whether the server supports inodebits locks. Since a 2.6 client will only ever connect to inodebits-enabled servers, why do we need the check at all? TBH, I really do not like adding "connection wait logic" above ptlrpc, which might bring us trouble or even worse.

      yong.fan nasf (Inactive) added a comment - Wangdi, I have verified the race condition locally. But I am not sure whether removing the incompatible bits is a suitable solution, because the key issue is that we are checking un-initialized export flags.

      di.wang Di Wang (Inactive) added a comment - Ah, I see. Hmm, I still think removing the incompatible inodebits handling is the better choice. Or could we do something like this?

      [root@testnode lustre-release]# git diff lustre/ldlm/
      diff --git a/lustre/ldlm/ldlm_request.c b/lustre/ldlm/ldlm_request.c
      index 76f329d..2c0d05d 100644
      --- a/lustre/ldlm/ldlm_request.c
      +++ b/lustre/ldlm/ldlm_request.c
      @@ -909,7 +909,7 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp,
                                    OBD_CONNECT_IBITS))
                                       lock->l_policy_data.l_inodebits.bits =
                                               MDS_INODELOCK_LOOKUP |
      -                                        MDS_INODELOCK_UPDATE;
      +                                        MDS_INODELOCK_UPDATE | MDS_INODELOCK_LAYOUT;
                               else
                                       lock->l_policy_data = *policy;
                       }

      yong.fan nasf (Inactive) added a comment - No, for the single-MDT case there is no such race condition, because the mount process first calls md_getstatus() on the ROOT object, which is NOT an ldlm_enqueue RPC. So by the time the mount returns to user space, the connection between the client and MDT0 is guaranteed to have been established.

      di.wang Di Wang (Inactive) added a comment - Hmm, I think this is a problem for a single-MDT filesystem too, i.e. ldlm_cli_enqueue should not use the export connection flags before it is sure the connection has been established. This inodebits connect flag was added even before 1.6, I guess? At least this inodebits interop code was added in 2007:

      commit 113303973ec9f8484eb2355a1a6ef3c4c7fd6a56
      Author: nathan <nathan>
      Date:   Sat Feb 10 06:33:41 2007 +0000

          land b1_5 onto HEAD

      which I think uses the export connection flags in the wrong way. So we could probably just remove this code to save some trouble here, since I do not think we plan to let a 2.6 Lustre client connect to a server without inodebits support (1.4?).

      yong.fan nasf (Inactive) added a comment - James, would you please verify the new patch? Thanks!
      http://review.whamcloud.com/#/c/10958/

      yong.fan nasf (Inactive) added a comment -

      I have run the related tests locally hundreds of times but cannot reproduce the failure myself, so I went through all the related code. In the end I found that this is probably not data corruption, but rather a race between the asynchronous connection and the synchronous lock_enqueue operation. Consider the following scenario:

      1) There are two MDTs in the system. When the client mounts, it triggers an MDS_CONNECT request to each MDT, but the connect requests are handled asynchronously. The mount processing goes ahead once the connection between the client and MDT0 is established; it does not wait for the connection between the client and MDT1.

      2) After the client mount returns, the application runs "stat $MOUNT/a/b". The directory "a" resides on MDT0 and "b" resides on MDT1, so the client sends the lock_enqueue RPC to MDT0 first.

      3) The RPC handler on MDT0 finds that the target object "b" resides on the remote MDT1, so it tells the client to talk to MDT1 instead.

      4) The client gets MDT0's reply and calls lmv_intent_remote() to forward the request to MDT1 via MDC1, but at that point the connection between the client and MDT1 has not been established yet.

      5) For most other RPCs this is not a problem, because the client-side RPC sender (in ptlrpc) handles all kinds of connection issues. But ldlm_enqueue is different: before handing the ldlm_enqueue RPC to the sender (i.e. to ptlrpc), the caller checks the export connect flags, and those are not set until the connection between the client and the MDT has been established. At that point the caller of the ldlm_enqueue request therefore reads "0" from the export connection flags and triggers the ASSERTION described in this ticket (see the sketch after this comment).

      So it is not a bug specific to OI scrub, but a more general issue in the Lustre framework. Because sanity-scrub accesses cross-MDT objects immediately after the client mount (without any wait), it reproduces the failure more easily.

      As for the fix, I do not want to change the general connection/ldlm framework, because I am afraid of introducing new bugs. Instead, the patch makes the ldlm_enqueue request wait for the connection to be established in this race case.
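      [Editor's note] Below is a minimal userspace sketch of the race described above, not Lustre code: all names here (fake_export, FAKE_CONNECT_IBITS, connect_thread, enqueue) are made up for the illustration, and the "wait for connect" branch only shows the general idea of waiting for the connection, not the actual change in http://review.whamcloud.com/#/c/10958/.

      /* Sketch of the race: an asynchronous "connect" fills in the export
       * connect flags while an "enqueue" path reads them.
       * Build with: cc -pthread race.c -o race */
      #include <pthread.h>
      #include <stdio.h>
      #include <unistd.h>

      #define FAKE_CONNECT_IBITS 0x1000ULL    /* stand-in value, not Lustre's */

      struct fake_export {
              unsigned long long exp_connect_flags;   /* 0 until connect completes */
              int                connected;
              pthread_mutex_t    lock;
              pthread_cond_t     cond;
      };

      /* Simulates the asynchronous MDS_CONNECT handling for MDT1. */
      static void *connect_thread(void *arg)
      {
              struct fake_export *exp = arg;

              usleep(100 * 1000);             /* connection setup takes a while */
              pthread_mutex_lock(&exp->lock);
              exp->exp_connect_flags = FAKE_CONNECT_IBITS;
              exp->connected = 1;
              pthread_cond_broadcast(&exp->cond);
              pthread_mutex_unlock(&exp->lock);
              return NULL;
      }

      /* Simulates the enqueue path.  Without waiting it can read flags == 0,
       * which is the condition that trips the LASSERT in this ticket. */
      static void enqueue(struct fake_export *exp, int wait_for_connect)
      {
              pthread_mutex_lock(&exp->lock);
              if (wait_for_connect)
                      while (!exp->connected)
                              pthread_cond_wait(&exp->cond, &exp->lock);
              printf("enqueue sees connect flags %#llx%s\n",
                     exp->exp_connect_flags,
                     exp->exp_connect_flags == 0 ? "  <-- assertion case" : "");
              pthread_mutex_unlock(&exp->lock);
      }

      int main(void)
      {
              struct fake_export exp = { .lock = PTHREAD_MUTEX_INITIALIZER,
                                         .cond = PTHREAD_COND_INITIALIZER };
              pthread_t tid;

              pthread_create(&tid, NULL, connect_thread, &exp);
              enqueue(&exp, 0);       /* racy: usually prints 0x0 */
              enqueue(&exp, 1);       /* waits for the connection first */
              pthread_join(tid, NULL);
              return 0;
      }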

      adilger Andreas Dilger added a comment - Since this only happens during sanity-scrub, I expect there is some kind of memory corruption or use-after-free problem in the code? It might be useful to run sanity-scrub on a kernel with CONFIG_DEBUG_SLAB and other memory debugging features (CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_OBJECTS, CONFIG_DEBUG_KMEMLEAK, CONFIG_DEBUG_MUTEXES, CONFIG_DEBUG_KOBJECT) enabled to see if this triggers a problem.

      yong.fan nasf (Inactive) added a comment - The bits themselves may be right: since the old code does not find OBD_CONNECT_IBITS in the connection flags, the LAYOUT ibit is dropped automatically. But here we get the connection flags as "0x0", which is impossible for any real connection, so there should be data corruption somewhere.

      adilger Andreas Dilger added a comment - Hmm, it has bits 0x3 set, so it is missing the LAYOUT flag. That would be possible if the file is being migrated, or maybe if it is a remote MDT object? I suspect that the LASSERT is wrong here, but I'm not positive yet as I haven't looked closely at the surrounding code.
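      [Editor's note] For reference, a standalone sketch of the bit arithmetic being discussed. The MDS_INODELOCK_* values are copied locally and assumed to match lustre_idl.h (LOOKUP = 0x1, UPDATE = 0x2, LAYOUT = 0x8); nothing here is taken from a Lustre build.

      #include <stdio.h>

      /* Local copies of the inodebits values, assumed to match lustre_idl.h. */
      #define MDS_INODELOCK_LOOKUP 0x01ULL    /* dentry, negative dentry */
      #define MDS_INODELOCK_UPDATE 0x02ULL    /* attributes, directory contents */
      #define MDS_INODELOCK_LAYOUT 0x08ULL    /* file layout */

      int main(void)
      {
              /* The LASSERT in the ticket title compares against the full mask... */
              unsigned long long expected = MDS_INODELOCK_LOOKUP |
                                            MDS_INODELOCK_UPDATE |
                                            MDS_INODELOCK_LAYOUT;
              /* ...but the interop fallback in ldlm_cli_enqueue (see the diff in
               * the comments above) only sets LOOKUP | UPDATE, i.e. the 0x3
               * observed in the crash. */
              unsigned long long observed = MDS_INODELOCK_LOOKUP |
                                            MDS_INODELOCK_UPDATE;

              printf("expected mask: %#llx\n", expected);     /* 0xb */
              printf("observed bits: %#llx\n", observed);     /* 0x3 */
              printf("missing bits:  %#llx (LAYOUT)\n", expected & ~observed);
              return 0;
      }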

    People

      Assignee: yong.fan nasf (Inactive)
      Reporter: maloo Maloo
      Votes: 0
      Watchers: 9
