Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-16589

sanityn test_55d: FAIL: (2) mv succeeded

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0, Lustre 2.15.3
    • Lustre 2.16.0, Lustre 2.15.3
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/5fae0359-520f-440f-9515-09d401c4ae9c

      test_55d failed with the following error:

      == sanityn test 55d: rename file vs link ================= 17:25:02 (1677173102)
      CMD: onyx-65vm4 /usr/sbin/lctl set_param fail_loc=0x155
      fail_loc=0x155
      ln: failed to create hard link '/mnt/lustre2/d55d.sanityn/d55d.sanityn/' => '/mnt/lustre2/d55d.sanityn/f1': No such file or directory
       sanityn test_55d: @@@@@@ FAIL: (2) mv succeeded
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master/4392 - 5.14.21-150400.24.28-default
      servers: https://build.whamcloud.com/job/lustre-master/4392 - 4.18.0-425.10.1.el8_lustre.x86_64

      <<Please provide additional information about the failure here>>

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanityn test_55d - (2) mv succeeded

      Attachments

        Issue Links

          Activity

            [LU-16589] sanityn test_55d: FAIL: (2) mv succeeded
            pjones Peter Jones added a comment -

            Landed for 2.16

            pjones Peter Jones added a comment - Landed for 2.16

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50348/
            Subject: LU-16589 tests: fix hard-link failure in sanityn/55d
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: b01209416cb73e06d50a2bf00855e56fcc37ed02

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50348/ Subject: LU-16589 tests: fix hard-link failure in sanityn/55d Project: fs/lustre-release Branch: b2_15 Current Patch Set: Commit: b01209416cb73e06d50a2bf00855e56fcc37ed02

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50265/
            Subject: LU-16589 tests: add sanity/31l to test ln command
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e998d21caf99e32495950219e88dd9e7f981363e

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50265/ Subject: LU-16589 tests: add sanity/31l to test ln command Project: fs/lustre-release Branch: master Current Patch Set: Commit: e998d21caf99e32495950219e88dd9e7f981363e

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50127/
            Subject: LU-16589 tests: fix hard-link failure in sanityn/55d
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 25c6b7ad2859729197c3cc6e6dcf0621e4bda6fa

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50127/ Subject: LU-16589 tests: fix hard-link failure in sanityn/55d Project: fs/lustre-release Branch: master Current Patch Set: Commit: 25c6b7ad2859729197c3cc6e6dcf0621e4bda6fa

            "Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50348
            Subject: LU-16589 tests: fix hard-link failure in sanityn/55d
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: 5b3526d01808b0dd270ae0655918fa3b7fc4f941

            gerrit Gerrit Updater added a comment - "Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50348 Subject: LU-16589 tests: fix hard-link failure in sanityn/55d Project: fs/lustre-release Branch: b2_15 Current Patch Set: 1 Commit: 5b3526d01808b0dd270ae0655918fa3b7fc4f941

            "Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50265
            Subject: LU-16589 tests: add sanity/31l to test ln command
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8b6e20e63be8edc4fda22f2318fa8df350c7297b

            gerrit Gerrit Updater added a comment - "Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50265 Subject: LU-16589 tests: add sanity/31l to test ln command Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 8b6e20e63be8edc4fda22f2318fa8df350c7297b
            yujian Jian Yu added a comment -

            Sure, Andreas. I just updated https://review.whamcloud.com/50127 to fix sanityn/55d. I'm working on a separate patch to add a new subtest for "ln". BTW, I just found the actual version of coreutils that started containing the "ln" change is 8.31.

            yujian Jian Yu added a comment - Sure, Andreas. I just updated https://review.whamcloud.com/50127 to fix sanityn/55d. I'm working on a separate patch to add a new subtest for " ln ". BTW, I just found the actual version of coreutils that started containing the " ln " change is 8.31.

            Jian, then it seems this bug has existed for so long and could even be considered an issue in the new coreutils? At least it is fixed in newer kernels, and it seems like a very uncommon use case, so it isn't clear that putting a ton of effort into fixing it in Lustre is worthwhile. Neil can work on backporting the fix to SLES15 (if needed), and James or someone can file a ticket to fix it in RHEL8/9.

            It seems the right thing for now is to fix the test (either add "/f1" at the end or remove the trailing "/") so that it is executing the original intent of the patch that added it.

            A separate patch should be made with a new subtest for only this "ln" case. This new subtest should be skipped for new coreutils and old kernels:

                    local ln_ver=$(ln --version | awk '/coreutils/ { print $4 }')
            
                    (( $(version_code $ln_ver) < $(version code 8.32) )) ||
                    (( $(version_code $(uname -r)) >= $(version_code 5.18) )) ||
                            skip "need coreutils < 8.32 or kernel >= 5.18 for ln"
            
                    touch $DIR/$tfile || error "create failed"
                    mkdir $DIR/$tdir || error "mkdir failed"
                    ln $DIR/$tfile $DIR/tdir/ || error "ln to '$tdir/' failed"
                    

            This will at least add test coverage for newer kernels and keep it for old coreutils, so that it doesn't regress in the future.  Additional checks can be added to allow running the test when we know particular distro kernels are fixed. 

            adilger Andreas Dilger added a comment - Jian, then it seems this bug has existed for so long and could even be considered an issue in the new coreutils? At least it is fixed in newer kernels, and it seems like a very uncommon use case, so it isn't clear that putting a ton of effort into fixing it in Lustre is worthwhile. Neil can work on backporting the fix to SLES15 (if needed), and James or someone can file a ticket to fix it in RHEL8/9. It seems the right thing for now is to fix the test (either add "/f1" at the end or remove the trailing "/") so that it is executing the original intent of the patch that added it. A separate patch should be made with a new subtest for only this "ln" case. This new subtest should be skipped for new coreutils and old kernels: local ln_ver=$(ln --version | awk '/coreutils/ { print $4 }' ) (( $(version_code $ln_ver) < $(version code 8.32) )) || (( $(version_code $(uname -r)) >= $(version_code 5.18) )) || skip "need coreutils < 8.32 or kernel >= 5.18 for ln" touch $DIR/$tfile || error "create failed" mkdir $DIR/$tdir || error "mkdir failed" ln $DIR/$tfile $DIR/tdir/ || error "ln to '$tdir/' failed" This will at least add test coverage for newer kernels and keep it for old coreutils, so that it doesn't regress in the future.  Additional checks can be added to allow running the test when we know particular distro kernels are fixed. 
            yujian Jian Yu added a comment -

            I'm not totally sure what is the oldest version affected

            Even on RHEL 7.9 (with kernel 3.10.0-1160), after changing ln from version 8.22 to 8.32, the "ln /mnt/lustre/tfile /mnt/lustre/tdir/" command also failed with -ENOENT.
            This means the kernel issue exists in all of the vendor kernels we support now.

            yujian Jian Yu added a comment - I'm not totally sure what is the oldest version affected Even on RHEL 7.9 (with kernel 3.10.0-1160), after changing ln from version 8.22 to 8.32, the " ln /mnt/lustre/tfile /mnt/lustre/tdir/ " command also failed with -ENOENT . This means the kernel issue exists in all of the vendor kernels we support now.
            neilb Neil Brown added a comment -

            > Would it work to change ll_lookup_nd() to skip the optimization if flags contains LOOKUP_CREATE | LOOKUP_EXCL and LOOKUP_CREATE + !LOOKUP_OPEN + !LOOKUP_EXCL still use the optimization?

            I don't think ->lookup is ever called with LOOKUP_CREATE but not LOOKUP_EXCL.  And if it is, it is every likely to include LOOKUP_OPEN.  Most syscalls that create names give an error (EEXIST) if the name exists.  The only exception is open(O_CREATE).

            Do we really need to fix this is lustre?  It is clearly not a lustre bug.  If we want test_55d to succeed more reliably we can just change the ln command to "ln $DIR2/$tdir/f1 $DIR2/$tdir/$tdir/f1" as Andreas suggested.  That performs exactly the same effective test, but avoids the kernel bug.

             

            neilb Neil Brown added a comment - > Would it work to change ll_lookup_nd() to skip the optimization if flags contains LOOKUP_CREATE | LOOKUP_EXCL and LOOKUP_CREATE + !LOOKUP_OPEN + !LOOKUP_EXCL still use the optimization? I don't think ->lookup is ever called with LOOKUP_CREATE but not LOOKUP_EXCL.  And if it is, it is every likely to include LOOKUP_OPEN.  Most syscalls that create names give an error (EEXIST) if the name exists.  The only exception is open(O_CREATE). Do we really need to fix this is lustre?  It is clearly not a lustre bug.  If we want test_55d to succeed more reliably we can just change the ln command to " ln $DIR2/$tdir/f1 $DIR2/$tdir/$tdir/f1 " as Andreas suggested.  That performs exactly the same effective test, but avoids the kernel bug.  

            People

              yujian Jian Yu
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: