2.5.0<->2.1.5 interop: sanity test 24u: (mdt_handler.c:224:mdt_lock_pdo_init()) ASSERTION( namelen > 0 ) failed

    Description

      sanity test 24u hit the following failure on MDS:

      11:06:31:Lustre: DEBUG MARKER: == sanity test 24u: create stripe file == 11:06:31 (1376417191)
      11:06:31:LustreError: 13255:0:(mdt_handler.c:224:mdt_lock_pdo_init()) ASSERTION( namelen > 0 ) failed: 
      11:06:31:LustreError: 13255:0:(mdt_handler.c:224:mdt_lock_pdo_init()) LBUG
      11:06:31:Pid: 13255, comm: mdt_01
      11:06:31:
      11:06:31:Call Trace:
      11:06:31: [<ffffffffa04d0785>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      11:06:31: [<ffffffffa04d0d97>] lbug_with_loc+0x47/0xb0 [libcfs]
      11:06:31: [<ffffffffa0bdea65>] mdt_lock_pdo_init+0xe5/0xf0 [mdt]
      11:06:31: [<ffffffffa0c127c6>] mdt_reint_open+0x1f6/0x2940 [mdt]
      11:06:31: [<ffffffffa077b764>] ? lustre_msg_add_version+0x74/0xd0 [ptlrpc]
      11:06:32: [<ffffffffa0ba256e>] ? md_ucred+0x1e/0x60 [mdd]
      11:06:32: [<ffffffffa0be15d5>] ? mdt_ucred+0x15/0x20 [mdt]
      11:06:32: [<ffffffffa0bf84ec>] ? mdt_root_squash+0x2c/0x3e0 [mdt]
      11:06:32: [<ffffffffa0bfcc51>] mdt_reint_rec+0x41/0xe0 [mdt]
      11:06:32: [<ffffffffa0bf3ed4>] mdt_reint_internal+0x544/0x8e0 [mdt]
      11:06:32: [<ffffffffa0bf453d>] mdt_intent_reint+0x1ed/0x500 [mdt]
      11:06:32: [<ffffffffa0bf2c09>] mdt_intent_policy+0x379/0x690 [mdt]
      11:06:32: [<ffffffffa0737391>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
      11:06:32: [<ffffffffa075d1ed>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
      11:06:32: [<ffffffffa0bf3586>] mdt_enqueue+0x46/0x130 [mdt]
      11:06:32: [<ffffffffa0be8772>] mdt_handle_common+0x932/0x1750 [mdt]
      11:06:32: [<ffffffffa0be9665>] mdt_regular_handle+0x15/0x20 [mdt]
      11:06:32: [<ffffffffa078bbae>] ptlrpc_main+0xc4e/0x1a40 [ptlrpc]
      11:06:32: [<ffffffffa078af60>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      11:06:32: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      11:06:32: [<ffffffffa078af60>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      11:06:32: [<ffffffffa078af60>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      11:06:32: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      11:06:32:
      11:06:32:Kernel panic - not syncing: LBUG
      

      Maloo report: https://maloo.whamcloud.com/test_sets/e3f3b3d8-0525-11e3-8d88-52540035b04c

      More instances:
      https://maloo.whamcloud.com/test_sets/369e054c-0059-11e3-bb00-52540035b04c
      https://maloo.whamcloud.com/test_sets/0bf3fdbc-f8f5-11e2-8917-52540035b04c
      https://maloo.whamcloud.com/test_sets/59b2a818-f504-11e2-a8f6-52540035b04c

          Activity

            jhammond John Hammond added a comment -

            The MDT is still generally at the mercy of the client to send valid names. Please see http://review.whamcloud.com/#/c/7961/ from LU-2875.
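To illustrate what "valid names" could mean on the server side, here is a minimal sketch of a name check. It is a toy model in Python, not Lustre code and not the content of the LU-2875 patch. The rules (non-empty, at most `NAME_MAX` bytes, no `/` or NUL) and the helper name `is_valid_name` are assumptions made for this example.

```python
NAME_MAX = 255  # assumed limit for this sketch (the usual POSIX NAME_MAX)


def is_valid_name(name: bytes) -> bool:
    """Return True if `name` looks like a usable single path component."""
    if len(name) == 0:
        # An empty name is what trips ASSERTION( namelen > 0 ).
        return False
    if len(name) > NAME_MAX:
        return False
    if b"/" in name or b"\x00" in name:
        # A component must not contain a separator or a NUL byte.
        return False
    return True
```

Under these assumed rules, both the empty name from this ticket and the `/` name that LU-3544 reports for anonymous dentries would be rejected before any name-based lookup.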


            adilger Andreas Dilger added a comment -

            This problem was introduced by the patches for LU-3544, and is no longer an issue now that the patch has been reverted.

            green Oleg Drokin added a comment -

            LU-3544 was reverted from master as well

            laisiyao Lai Siyao added a comment -

            Patch for b2_1 is on: http://review.whamcloud.com/#/c/7627/

            Now it's ready to continue interop test between 2.5 and 2.1 with these three patches.

            pjones Peter Jones added a comment -

            We have reverted LU-3544 from b2_4 for now but are continuing to work on a more complete fix on master

            laisiyao Lai Siyao added a comment -

            Patch for master is on:
            http://review.whamcloud.com/#/c/7475/
            http://review.whamcloud.com/#/c/7476/

            These patches enable getattr/open-by-fid by default, so either the fid or the name is packed in these requests, and the server can handle op-by-fid correctly.

            Once these patches are accepted, they need to be backported to 2.4, and the 2.1 server code also needs a fix to maintain 2.5 <-> 2.1 interop. I'll continue working on this.


            paf Patrick Farrell (Inactive) added a comment -

            Lai,

            Take a look at my latest in LU-3544. I think it belongs there and not here, but it could go in either. It's about the problems with the proposed patch for LU-3765.

            laisiyao Lai Siyao added a comment -

            IMO once the client has specified MDS_OPEN_BY_FID, the MDS should never open by name, because the name may be invalid or it will cause inconsistency. If this is true, the MDS open-by-fid code can be simplified a lot.

            Patch is on http://review.whamcloud.com/#/c/7358/
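
The rule proposed in this comment can be sketched as a small dispatch function. This is a toy model in Python, not the code from the patch. `OpenRequest`, `choose_lookup`, the flag value, and the FID strings are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

MDS_OPEN_BY_FID = 0x1  # placeholder bit, not the real Lustre constant value


@dataclass
class OpenRequest:
    flags: int
    fid: Optional[str]  # hypothetical string form of a FID
    name: bytes


def choose_lookup(req: OpenRequest) -> Tuple[str, Union[str, bytes]]:
    """Decide how to locate the object: by FID or by name, never both."""
    if req.flags & MDS_OPEN_BY_FID:
        # The flag is set: the name is ignored entirely, even if present,
        # because it may be stale or invalid (e.g. '/' or empty).
        if req.fid is None:
            raise ValueError("MDS_OPEN_BY_FID set but no FID packed")
        return ("fid", req.fid)
    if not req.name:
        raise ValueError("open by name requires a non-empty name")
    return ("name", req.name)
```

With this rule, a request carrying the flag plus a bogus name such as `b"/"` still resolves by FID, which is the simplification the comment argues for.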

            yujian Jian Yu added a comment -

            This is blocking the whole test session on Lustre b2_4 client with 2.1.6 server:
            https://maloo.whamcloud.com/test_sessions/ac905704-0569-11e3-b127-52540035b04c

            pjones Peter Jones added a comment -

            Lai

            Could you please help with this one?

            Thanks

            peter


            di.wang Di Wang (Inactive) added a comment -

            This is clearly caused by

            LU-3544 nfs: writing to new files will return ENOENT

            This happened with SLES11SP2 Lustre client, which in turn acts as an
            NFS server, exporting a subtree of a Lustre fs through NFS.

            We detected that whenever we are writing to a new file using, e.g.,
            'echo blah > newfile', it will return ENOENT error. We found
            out that this was caused by the anonymous dentry. In SLES11SP2,
            anonymous dentries are assigned '/' as the name, instead of an
            empty string. When MDT handles the intent_open call, it will look
            up the obj by the name if it is not an empty string, and thus
            couldn't find it.

            As MDS_OPEN_BY_FID is always set on this request, we never need
            to send the name in this request. The fid is already available
            and should be used in case the file has been renamed.

            Signed-off-by: Cheng Shao <cheng_shao@xyratex.com>
            Signed-off-by: Patrick Farrell <paf@cray.com>
            Change-Id: Ia8bd6f2814d05350d0a197df8a3ffd9729e2081b
            Reviewed-on: http://review.whamcloud.com/6920
            Reviewed-by: Bob Glossman <bob.glossman@intel.com>
            Tested-by: Hudson
            Reviewed-by: Alexey Shvetsov <alexxy@gentoo.org>
            Reviewed-by: Lai Siyao <lai.siyao@intel.com>
            Tested-by: Maloo <whamcloud.maloo@gmail.com>
            Reviewed-by: James Simmons <uja.ornl@gmail.com>
            Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

            This patch stops sending the name for open-by-FID and set lovea (test 24u) requests, but the 2.1.5 server cannot handle this correctly. So we should either

            1. fix the 2.1.5 server to handle this zero name length issue; please check the open_by_fid part in mdt_reint_open.
            2. or fix the b2_4 client to add the open lock flag for the lovea setting request, which can avoid the problem as well, IMHO.
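
Option 1 above can be pictured with a toy model of the failure and of a guarded server path. This is a Python sketch, not the 2.1.5 source. `lock_pdo_init` here only mirrors the `namelen > 0` assertion from the log, and `reint_open_unguarded`, `reint_open_guarded`, the `LBUG` exception, and the flag value are hypothetical names and values for illustration.

```python
MDS_OPEN_BY_FID = 0x1  # placeholder bit, not the real Lustre constant value


class LBUG(Exception):
    """Stands in for the kernel LBUG raised by a failed assertion."""


def lock_pdo_init(name: bytes) -> int:
    # Mirrors the check in the log: ASSERTION( namelen > 0 ).
    if len(name) == 0:
        raise LBUG("ASSERTION( namelen > 0 ) failed")
    return len(name)


def reint_open_unguarded(flags: int, name: bytes) -> str:
    # Model of an old server: always takes the name-based lock path.
    lock_pdo_init(name)
    return "opened by name"


def reint_open_guarded(flags: int, name: bytes) -> str:
    # Model of option 1: an open-by-FID request with an empty name
    # skips the name-based lock path entirely.
    if (flags & MDS_OPEN_BY_FID) and len(name) == 0:
        return "opened by fid"
    lock_pdo_init(name)
    return "opened by name"
```

In this model, a newer client that omits the name makes the unguarded path raise `LBUG`, while the guarded path serves the same request by FID and still handles ordinary named opens.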

            People

              laisiyao Lai Siyao
              yujian Jian Yu