Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-16589

sanityn test_55d: FAIL: (2) mv succeeded

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0, Lustre 2.15.3
    • Lustre 2.16.0, Lustre 2.15.3
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/5fae0359-520f-440f-9515-09d401c4ae9c

      test_55d failed with the following error:

      == sanityn test 55d: rename file vs link ================= 17:25:02 (1677173102)
      CMD: onyx-65vm4 /usr/sbin/lctl set_param fail_loc=0x155
      fail_loc=0x155
      ln: failed to create hard link '/mnt/lustre2/d55d.sanityn/d55d.sanityn/' => '/mnt/lustre2/d55d.sanityn/f1': No such file or directory
       sanityn test_55d: @@@@@@ FAIL: (2) mv succeeded
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master/4392 - 5.14.21-150400.24.28-default
      servers: https://build.whamcloud.com/job/lustre-master/4392 - 4.18.0-425.10.1.el8_lustre.x86_64

      <<Please provide additional information about the failure here>>

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanityn test_55d - (2) mv succeeded

      Attachments

        Issue Links

          Activity

            [LU-16589] sanityn test_55d: FAIL: (2) mv succeeded
            yujian Jian Yu added a comment -

            Sure, Andreas. I just updated https://review.whamcloud.com/50127 to fix sanityn/55d. I'm working on a separate patch to add a new subtest for "ln". BTW, I just found the actual version of coreutils that started containing the "ln" change is 8.31.

            yujian Jian Yu added a comment - Sure, Andreas. I just updated https://review.whamcloud.com/50127 to fix sanityn/55d. I'm working on a separate patch to add a new subtest for " ln ". BTW, I just found the actual version of coreutils that started containing the " ln " change is 8.31.

            Jian, then it seems this bug has existed for so long and could even be considered an issue in the new coreutils? At least it is fixed in newer kernels, and it seems like a very uncommon use case, so it isn't clear that putting a ton of effort into fixing it in Lustre is worthwhile. Neil can work on backporting the fix to SLES15 (if needed), and James or someone can file a ticket to fix it in RHEL8/9.

            It seems the right thing for now is to fix the test (either add "/f1" at the end or remove the trailing "/") so that it is executing the original intent of the patch that added it.

            A separate patch should be made with a new subtest for only this "ln" case. This new subtest should be skipped for new coreutils and old kernels:

                    local ln_ver=$(ln --version | awk '/coreutils/ { print $4 }')
            
                    (( $(version_code $ln_ver) < $(version code 8.32) )) ||
                    (( $(version_code $(uname -r)) >= $(version_code 5.18) )) ||
                            skip "need coreutils < 8.32 or kernel >= 5.18 for ln"
            
                    touch $DIR/$tfile || error "create failed"
                    mkdir $DIR/$tdir || error "mkdir failed"
                    ln $DIR/$tfile $DIR/tdir/ || error "ln to '$tdir/' failed"
                    

            This will at least add test coverage for newer kernels and keep it for old coreutils, so that it doesn't regress in the future.  Additional checks can be added to allow running the test when we know particular distro kernels are fixed. 

            adilger Andreas Dilger added a comment - Jian, then it seems this bug has existed for so long and could even be considered an issue in the new coreutils? At least it is fixed in newer kernels, and it seems like a very uncommon use case, so it isn't clear that putting a ton of effort into fixing it in Lustre is worthwhile. Neil can work on backporting the fix to SLES15 (if needed), and James or someone can file a ticket to fix it in RHEL8/9. It seems the right thing for now is to fix the test (either add "/f1" at the end or remove the trailing "/") so that it is executing the original intent of the patch that added it. A separate patch should be made with a new subtest for only this "ln" case. This new subtest should be skipped for new coreutils and old kernels: local ln_ver=$(ln --version | awk '/coreutils/ { print $4 }' ) (( $(version_code $ln_ver) < $(version code 8.32) )) || (( $(version_code $(uname -r)) >= $(version_code 5.18) )) || skip "need coreutils < 8.32 or kernel >= 5.18 for ln" touch $DIR/$tfile || error "create failed" mkdir $DIR/$tdir || error "mkdir failed" ln $DIR/$tfile $DIR/tdir/ || error "ln to '$tdir/' failed" This will at least add test coverage for newer kernels and keep it for old coreutils, so that it doesn't regress in the future.  Additional checks can be added to allow running the test when we know particular distro kernels are fixed. 
            yujian Jian Yu added a comment -

            I'm not totally sure what is the oldest version affected

            Even on RHEL 7.9 (with kernel 3.10.0-1160), after changing ln from version 8.22 to 8.32, the "ln /mnt/lustre/tfile /mnt/lustre/tdir/" command also failed with -ENOENT.
            This means the kernel issue exists in all of the vendor kernels we support now.

            yujian Jian Yu added a comment - I'm not totally sure what is the oldest version affected Even on RHEL 7.9 (with kernel 3.10.0-1160), after changing ln from version 8.22 to 8.32, the " ln /mnt/lustre/tfile /mnt/lustre/tdir/ " command also failed with -ENOENT . This means the kernel issue exists in all of the vendor kernels we support now.
            neilb Neil Brown added a comment -

            > Would it work to change ll_lookup_nd() to skip the optimization if flags contains LOOKUP_CREATE | LOOKUP_EXCL and LOOKUP_CREATE + !LOOKUP_OPEN + !LOOKUP_EXCL still use the optimization?

            I don't think ->lookup is ever called with LOOKUP_CREATE but not LOOKUP_EXCL.  And if it is, it is every likely to include LOOKUP_OPEN.  Most syscalls that create names give an error (EEXIST) if the name exists.  The only exception is open(O_CREATE).

            Do we really need to fix this is lustre?  It is clearly not a lustre bug.  If we want test_55d to succeed more reliably we can just change the ln command to "ln $DIR2/$tdir/f1 $DIR2/$tdir/$tdir/f1" as Andreas suggested.  That performs exactly the same effective test, but avoids the kernel bug.

             

            neilb Neil Brown added a comment - > Would it work to change ll_lookup_nd() to skip the optimization if flags contains LOOKUP_CREATE | LOOKUP_EXCL and LOOKUP_CREATE + !LOOKUP_OPEN + !LOOKUP_EXCL still use the optimization? I don't think ->lookup is ever called with LOOKUP_CREATE but not LOOKUP_EXCL.  And if it is, it is every likely to include LOOKUP_OPEN.  Most syscalls that create names give an error (EEXIST) if the name exists.  The only exception is open(O_CREATE). Do we really need to fix this is lustre?  It is clearly not a lustre bug.  If we want test_55d to succeed more reliably we can just change the ln command to " ln $DIR2/$tdir/f1 $DIR2/$tdir/$tdir/f1 " as Andreas suggested.  That performs exactly the same effective test, but avoids the kernel bug.  

            This workaround should go under an #ifdef so that the optimization is only disabled for kernels that are affected by it. It definitely is not needed for kernels newer than 5.18, but I'm not totally sure what is the oldest version affected, or if there is a #define we can check that was introduced in the original patch. This is also complicated by vendor backports, so there may need to be both a vanilla kernel version check and a vendor version check. 

            adilger Andreas Dilger added a comment - This workaround should go under an #ifdef so that the optimization is only disabled for kernels that are affected by it. It definitely is not needed for kernels newer than 5.18, but I'm not totally sure what is the oldest version affected, or if there is a #define we can check that was introduced in the original patch. This is also complicated by vendor backports, so there may need to be both a vanilla kernel version check and a vendor version check. 
            bobijam Zhenyu Xu added a comment -

            Would it work to change ll_lookup_nd() to skip the optimization if flags contains LOOKUP_CREATE | LOOKUP_EXCL and LOOKUP_CREATE + !LOOKUP_OPEN + !LOOKUP_EXCL still use the optimization?

            bobijam Zhenyu Xu added a comment - Would it work to change ll_lookup_nd() to skip the optimization if flags contains LOOKUP_CREATE | LOOKUP_EXCL and LOOKUP_CREATE + !LOOKUP_OPEN + !LOOKUP_EXCL still use the optimization?
            yujian Jian Yu added a comment -

            The only way I can think of to work around the problem in lustre is to skip the optimization in ll_lookup_nd() for a CREATE that isn't an OPEN.

            The optimization was added by https://review.whamcloud.com/8257 ("LU-4185 llite: Revise create with no open optimization"):

            commit a2d5b2e83c0a512a3ea59698e8481621ab5856c2
            Author:     Bobi Jam <bobijam@whamcloud.com>
            AuthorDate: Wed Nov 13 15:56:25 2013 +0800
            Commit:     Oleg Drokin <green@whamcloud.com>
            CommitDate: Wed Oct 5 03:51:18 2016 +0000
            
                LU-4185 llite: Revise create with no open optimization
                
                Currently ll_lookup_nd just returns a negative when we are trying
                to create something with no open (read mkdir). This is all fine most
                of the cases, except if the directory where we are trying to do is
                not writeable by us. In that case vfs_create would return EPERM
                seeing as how a negative dentry means the create cannot proceed.
                But in reality if there is an existing name there that we just did
                not have cached, the proper return is EEXIST that could only happen
                if we did do the lookup.
                
                So amend the optimization to only take place if the directory is
                writeable by us, otherwise do the full lookup.
            
            yujian Jian Yu added a comment - The only way I can think of to work around the problem in lustre is to skip the optimization in ll_lookup_nd() for a CREATE that isn't an OPEN. The optimization was added by https://review.whamcloud.com/8257 (" LU-4185 llite: Revise create with no open optimization"): commit a2d5b2e83c0a512a3ea59698e8481621ab5856c2 Author: Bobi Jam <bobijam@whamcloud.com> AuthorDate: Wed Nov 13 15:56:25 2013 +0800 Commit: Oleg Drokin <green@whamcloud.com> CommitDate: Wed Oct 5 03:51:18 2016 +0000 LU-4185 llite: Revise create with no open optimization Currently ll_lookup_nd just returns a negative when we are trying to create something with no open (read mkdir). This is all fine most of the cases, except if the directory where we are trying to do is not writeable by us. In that case vfs_create would return EPERM seeing as how a negative dentry means the create cannot proceed. But in reality if there is an existing name there that we just did not have cached, the proper return is EEXIST that could only happen if we did do the lookup. So amend the optimization to only take place if the directory is writeable by us, otherwise do the full lookup.
            yujian Jian Yu added a comment -

            Thank you Andreas and Neil for the detailed analysis.

            It would be interesting to see what the test does with "ln $DIR2/$tdir/f1 $DIR2/$tdir/$tdir/f1" (i.e. with the target filename specified)?

            Test passed on SLES 15 SP4 client (with kernel 5.14.21-150400.24.28-default):

            == sanityn test 55d: rename file vs link ============================================================= 05:52:19 (1678254739)
            CMD: trevis-86vm8 /usr/sbin/lctl set_param fail_loc=0x155
            fail_loc=0x155
            mv: '/mnt/lustre/d55d.sanityn/f1' and '/mnt/lustre/d55d.sanityn/d55d.sanityn/f1' are the same file
            Resetting fail_loc on all nodes...CMD: trevis-86vm7.trevis.whamcloud.com,trevis-86vm8,trevis-87vm7 lctl set_param -n fail_loc=0             fail_val=0 2>/dev/null
            done.
            CMD: trevis-86vm7.trevis.whamcloud.com /usr/sbin/lctl get_param catastrophe 2>&1
            CMD: trevis-86vm8 /usr/sbin/lctl get_param catastrophe 2>&1
            CMD: trevis-87vm7 /usr/sbin/lctl get_param catastrophe 2>&1
            CMD: trevis-86vm7.trevis.whamcloud.com,trevis-86vm8,trevis-87vm7 dmesg
            PASS 55d (9s)
            
            yujian Jian Yu added a comment - Thank you Andreas and Neil for the detailed analysis. It would be interesting to see what the test does with "ln $DIR2/$tdir/f1 $DIR2/$tdir/$tdir/f1" (i.e. with the target filename specified)? Test passed on SLES 15 SP4 client (with kernel 5.14.21-150400.24.28-default): == sanityn test 55d: rename file vs link ============================================================= 05:52:19 (1678254739) CMD: trevis-86vm8 /usr/sbin/lctl set_param fail_loc=0x155 fail_loc=0x155 mv: '/mnt/lustre/d55d.sanityn/f1' and '/mnt/lustre/d55d.sanityn/d55d.sanityn/f1' are the same file Resetting fail_loc on all nodes...CMD: trevis-86vm7.trevis.whamcloud.com,trevis-86vm8,trevis-87vm7 lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null done. CMD: trevis-86vm7.trevis.whamcloud.com /usr/sbin/lctl get_param catastrophe 2>&1 CMD: trevis-86vm8 /usr/sbin/lctl get_param catastrophe 2>&1 CMD: trevis-87vm7 /usr/sbin/lctl get_param catastrophe 2>&1 CMD: trevis-86vm7.trevis.whamcloud.com,trevis-86vm8,trevis-87vm7 dmesg PASS 55d (9s)
            neilb Neil Brown added a comment -

            Hi Andreas,

             I think it likely that you have identified the problem.  Especially if it is possible that the "rename" happening in the background might have called d_lustre_invalidate() on the original dentry.

            I think it unlikely that the filename_parentat() change is relevant.  That is an internal api in namei.c

            The only way I can think of to work around the problem in lustre is to skip the optimisation in ll_lookup_nd() for a CREATE that isn't an OPEN.  You might be able to detect this particular case by seeing there is an invalid dentry still present.  Maybe.

            I really should apply that patch to SP4.  We have it in SP3.

             

            neilb Neil Brown added a comment - Hi Andreas,  I think it likely that you have identified the problem.  Especially if it is possible that the "rename" happening in the background might have called d_lustre_invalidate() on the original dentry. I think it unlikely that the filename_parentat() change is relevant.  That is an internal api in namei.c The only way I can think of to work around the problem in lustre is to skip the optimisation in ll_lookup_nd() for a CREATE that isn't an OPEN.  You might be able to detect this particular case by seeing there is an invalid dentry still present.  Maybe. I really should apply that patch to SP4.  We have it in SP3.  

            Jian, looking at what test_55d is doing, it is unclear why the linkat()->filename_create() operation is returning -ENOENT? The directory does exist, since it was created with mkdir on the previous line, so either the mkdir $DIR/$tdir/$tdir operation should have instantiated this directory into dcache, or the linkat() syscall should have looked up the second $tdir component during traversal.

            It would be interesting to see what the test does with "ln $DIR2/$tdir/f1 $DIR2/$tdir/$tdir/f1" (i.e. with the target filename specified)? That is really what the "ln" command is trying to do - link f1 into the second $tdir directory so that the delayed mv $DIR2/$tdir/f1 $DIR2/$tdir/$tdir that was trying to create a file named $tdir cannot succeed when there is later a directory named $tdir.

            Even if that allows the test to pass (which would be slightly better than the "pre-stat"), there is still a dcache bug in there somewhere, because the kernel lookup of the second $tdir/ should not have failed.

            What else is interesting is that newer kernels (since v5.18-rc2-188-gb3d4650d82c7) have slightly different code here:

            static struct dentry *filename_create(int dfd, struct filename *name,
                                                  struct path *path, unsigned int lookup_flags)
            {
                    unsigned int create_flags = LOOKUP_CREATE | LOOKUP_EXCL;
                    :
                    :
                    /*
                     * Special case - lookup gave negative, but... we had foo/bar/
                     * From the vfs_mknod() POV we just have a negative dentry -
                     * all is fine. Let's be bastards - you had / on the end, you've
                     * been asking for (non-existent) directory. -ENOENT for you.
                     */
                    if (unlikely(!create_flags)) {
                            error = -ENOENT;
                            goto fail;
                    }
            

            so it would be interesting to know what create_flags are being passed from Lustre?

            It just happens that the commit v5.18-rc2-188-gb3d4650d82c7 that changed this code was written by Neil Brown, whom I've CC'd here in case he has some insight into what is going wrong here. It definitely seems from the commit message that this is a similar situation being hit in Lustre as was previously hit by NFS:

                VFS: filename_create(): fix incorrect intent.
                
                When asked to create a path ending '/', but which is not to be a
                directory (LOOKUP_DIRECTORY not set), filename_create() will never try
                to create the file.  If it doesn't exist, -ENOENT is reported.
                
                However, it still passes LOOKUP_CREATE|LOOKUP_EXCL to the filesystems
                ->lookup() function, even though there is no intent to create.  This is
                misleading and can cause incorrect behaviour.
                
                If you try
                
                   ln -s foo /path/dir/
                
                where 'dir' is a directory on an NFS filesystem which is not currently
                known in the dcache, this will fail with ENOENT.
                
                But as the name is not in the dcache, nfs_lookup gets called with
                LOOKUP_CREATE|LOOKUP_EXCL and so it returns NULL without performing any
                lookup, with the expectation that a subsequent call to create the target
                will be made, and the lookup can be combined with the creation.  In the
                case with a trailing '/' and no LOOKUP_DIRECTORY, that call is never
                made.  Instead filename_create() sees that the dentry is not (yet)
                positive and returns -ENOENT - even though the directory actually
                exists.
                
                So only set LOOKUP_CREATE|LOOKUP_EXCL if there really is an intent to
                create, and use the absence of these flags to decide if -ENOENT should
                be returned.
                
                Note that filename_parentat() is only interested in LOOKUP_REVAL, so we
                split that out and store it in 'reval_flag'.  __lookup_hash() then gets
                reval_flag combined with whatever create flags were determined to be
                needed.
            

            It isn't clear whether we can add a workaround in the Lustre ->lookup() method to handle this case or only fix the test (to add the /f1 component at the end) and get the client distros to apply Neil's patch. It looks like both SLES15sp4 and RHEL9.x are using a 5.14-based kernel, and are missing the v5.18-rc2-188-gb3d4650d82c7 patch, or we are somehow not handling the LOOKUP_* flags properly in Lustre. The changed code may have been affected by v5.14-rc7-68-g0ee50b47532a "namei: change filename_parentat() calling conventions", so it is possible we didn't update to those new conventions?

            adilger Andreas Dilger added a comment - Jian, looking at what test_55d is doing, it is unclear why the linkat()->filename_create() operation is returning -ENOENT ? The directory does exist, since it was created with mkdir on the previous line, so either the mkdir $DIR/$tdir/$tdir operation should have instantiated this directory into dcache, or the linkat() syscall should have looked up the second $tdir component during traversal. It would be interesting to see what the test does with " ln $DIR2/$tdir/f1 $DIR2/$tdir/$tdir/f1 " (i.e. with the target filename specified)? That is really what the " ln " command is trying to do - link f1 into the second $tdir directory so that the delayed mv $DIR2/$tdir/f1 $DIR2/$tdir/$tdir that was trying to create a file named $tdir cannot succeed when there is later a directory named $tdir . Even if that allows the test to pass (which would be slightly better than the "pre-stat"), there is still a dcache bug in there somewhere, because the kernel lookup of the second $tdir/ should not have failed. What else is interesting is that newer kernels (since v5.18-rc2-188-gb3d4650d82c7) have slightly different code here: static struct dentry *filename_create( int dfd, struct filename *name, struct path *path, unsigned int lookup_flags) { unsigned int create_flags = LOOKUP_CREATE | LOOKUP_EXCL; : : /* * Special case - lookup gave negative, but... we had foo/bar/ * From the vfs_mknod() POV we just have a negative dentry - * all is fine. Let 's be bastards - you had / on the end, you' ve * been asking for (non-existent) directory. -ENOENT for you. */ if (unlikely(!create_flags)) { error = -ENOENT; goto fail; } so it would be interesting to know what create_flags are being passed from Lustre? It just happens that the commit v5.18-rc2-188-gb3d4650d82c7 that changed this code was written by Neil Brown, whom I've CC'd here in case he has some insight into what is going wrong here. It definitely seems from the commit message that this is a similar situation being hit in Lustre as was previously hit by NFS: VFS: filename_create(): fix incorrect intent. When asked to create a path ending '/', but which is not to be a directory (LOOKUP_DIRECTORY not set), filename_create() will never try to create the file. If it doesn't exist, -ENOENT is reported. However, it still passes LOOKUP_CREATE|LOOKUP_EXCL to the filesystems ->lookup() function, even though there is no intent to create. This is misleading and can cause incorrect behaviour. If you try ln -s foo /path/dir/ where 'dir' is a directory on an NFS filesystem which is not currently known in the dcache, this will fail with ENOENT. But as the name is not in the dcache, nfs_lookup gets called with LOOKUP_CREATE|LOOKUP_EXCL and so it returns NULL without performing any lookup, with the expectation that a subsequent call to create the target will be made, and the lookup can be combined with the creation. In the case with a trailing '/' and no LOOKUP_DIRECTORY, that call is never made. Instead filename_create() sees that the dentry is not (yet) positive and returns -ENOENT - even though the directory actually exists. So only set LOOKUP_CREATE|LOOKUP_EXCL if there really is an intent to create, and use the absence of these flags to decide if -ENOENT should be returned. Note that filename_parentat() is only interested in LOOKUP_REVAL, so we split that out and store it in 'reval_flag'. __lookup_hash() then gets reval_flag combined with whatever create flags were determined to be needed. It isn't clear whether we can add a workaround in the Lustre ->lookup() method to handle this case or only fix the test (to add the /f1 component at the end) and get the client distros to apply Neil's patch. It looks like both SLES15sp4 and RHEL9.x are using a 5.14-based kernel, and are missing the v5.18-rc2-188-gb3d4650d82c7 patch, or we are somehow not handling the LOOKUP_* flags properly in Lustre. The changed code may have been affected by v5.14-rc7-68-g0ee50b47532a " namei: change filename_parentat() calling conventions ", so it is possible we didn't update to those new conventions?
            yujian Jian Yu added a comment -

            By comparing the attached Ftrace outputs (with and without the trailing "/", --max-graph-depth is 6), and the source code of filename_create() in kernel fs/namei.c, we can see the ENOENT error was returned as follows:

            fs/namei.c
            static struct dentry *filename_create(int dfd, struct filename *name,
                                            struct path *path, unsigned int lookup_flags)
            {
                    // ......
                    /*
                     * Special case - lookup gave negative, but... we had foo/bar/
                     * From the vfs_mknod() POV we just have a negative dentry -
                     * all is fine. Let's be bastards - you had / on the end, you've
                     * been asking for (non-existent) directory. -ENOENT for you.
                     */
                    if (unlikely(!is_dir && last.name[last.len])) {
                            error = -ENOENT;
                            goto fail;
                    }
                    // ......
            }
            

            ftrace.with_slash.depth_6.txt ftrace.without_slash.depth_6.txt

            yujian Jian Yu added a comment - By comparing the attached Ftrace outputs (with and without the trailing " / ", --max-graph-depth is 6), and the source code of filename_create() in kernel fs/namei.c , we can see the ENOENT error was returned as follows: fs/namei.c static struct dentry *filename_create( int dfd, struct filename *name, struct path *path, unsigned int lookup_flags) { // ...... /* * Special case - lookup gave negative, but... we had foo/bar/ * From the vfs_mknod() POV we just have a negative dentry - * all is fine. Let 's be bastards - you had / on the end, you' ve * been asking for (non-existent) directory. -ENOENT for you. */ if (unlikely(!is_dir && last.name[last.len])) { error = -ENOENT; goto fail; } // ...... } ftrace.with_slash.depth_6.txt ftrace.without_slash.depth_6.txt

            People

              yujian Jian Yu
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: