Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.17.0
    • Lustre 2.12.5, Lustre 2.15.1
    • None
    • Affected Client OSes: CentOS 7.8.2003, Rocky Linux release 9.1
      Kernels: 5.14.0-162.12.1.el9_1.0.2.x86_64, 3.10.0-1127.8.2.el7.x86_64

    Description

      The following sequence fails intermittently, and only on some clients:

      sesser@hercules-login-1 sesser$touch a; mkdir test; touch test; ln -svf $(pwd)/a test/
      ln: test/: cannot overwrite directory
      sesser@hercules-login-1 sesser$ln -svf $(pwd)/a test/
      'test/a' -> '/work2/hpc/users/sesser/a'
      sesser@hercules-login-1 sesser$ln -svf $(pwd)/a test/
      'test/a' -> '/work2/hpc/users/sesser/a'
      sesser@hercules-login-1 sesser$touch test; ln -svf $(pwd)/a test/
      ln: test/: cannot overwrite directory
      sesser@hercules-login-1 sesser$touch test; ln -svf $(pwd)/a test/
      ln: test/: cannot overwrite directory
      sesser@hercules-login-1 sesser$touch test; ls -l; ln -svf $(pwd)/a test/
      total 16
      -rw-r----- 1 sesser admin 0 Jan 5 16:48 a
      drwxr-x--- 2 sesser admin 16384 Jan 5 16:48 test
      'test/a' -> '/work2/hpc/users/sesser/a'
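
Because the failure in the transcript above is intermittent, it can help to repeat the sequence and count how often it fails. The following is a minimal sketch, not part of the original report: the function name `repro_symlink` and its arguments are hypothetical, and it assumes the directory passed in lives on the Lustre mount under test.

```shell
#!/bin/sh
# repro_symlink DIR [ATTEMPTS]
# Repeats the "touch test; ln -sf $(pwd)/a test/" sequence from the report
# inside DIR and prints how many attempts failed. The function body is a
# subshell so the cd and the variables do not leak into the caller.
repro_symlink() (
    cd "$1" || exit 2
    attempts=${2:-10}
    failures=0
    : > a || exit 2            # same effect as "touch a" for a new file
    mkdir -p test || exit 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        touch test             # the step that precedes each failure in the report
        ln -sf "$(pwd)/a" test/ 2>/dev/null || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures of $attempts attempts failed"
    [ "$failures" -eq 0 ]      # exit status 0 only if every attempt succeeded
)
```

For example, `repro_symlink /path/on/lustre 20` should report zero failures on a client that behaves correctly; any non-zero count matches the symptom described here.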

      Running the failing command under strace produces the following output:

      touch test; strace ln -svf $(pwd)/a test/

      symlinkat("/work2/hpc/users/sesser/a", AT_FDCWD, "test/") = -1 ENOENT (No such file or directory)
      newfstatat(AT_FDCWD, "test/", {st_mode=S_IFDIR|0750, st_size=16384, ...}, AT_SYMLINK_NOFOLLOW) = 0
      openat(AT_FDCWD, "/usr/share/locale/C.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
      openat(AT_FDCWD, "/usr/share/locale/C.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
      openat(AT_FDCWD, "/usr/share/locale/C/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
      write(2, "ln: ", 4ln: ) = 4
      write(2, "test/: cannot overwrite director"..., 33test/: cannot overwrite directory) = 33
      write(2, "\n", 1
      ) = 1
      lseek(0, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
      close(0) = 0
      close(1) = 0
      close(2) = 0
      exit_group(1) = ?
      +++ exited with 1 +++

      This problem is vendor-agnostic: we tested on another system from a different vendor and the results were the same. Some clients do behave as expected, though.

      Client Details that Work Correctly:
      Client Type 1:

      • OS: CentOS 7.6.1810
      • Kernel: 3.10.0-957.27.2.el7.x86_64
      • Lustre Version: 2.12.8_ddn9
      • Mount Options: defaults, _netdev, user_xattr, flock

      Client Type 2:

      • OS: CentOS 7.8.2003
      • Kernel: 3.10.0-1127.8.2.el7.x86_64
      • Lustre Version: 2.12.5
      • Mount Options: defaults, _netdev, user_xattr, flock

      Client Details that do not Work Correctly:
      Client Type 3:

      • OS: CentOS 7.8.2003
      • Kernel: 3.10.0-1127.8.2.el7.x86_64
      • Lustre Version: 2.15.6
      • Mount Options: defaults, _netdev, user_xattr, flock

      Client Type 4:

      • OS: Rocky Linux 9.1
      • Kernel: 5.14.0-162.12.1.el9_1.0.2.x86_64
      • Lustre Version: 2.15.1
      • Mount Options: defaults, _netdev, user_xattr, flock

      All clients were built using the following commands:

      ./configure --disable-server --enable-quota --enable-mpitests=no
      make
      make check
      make rpms
      yum -y install *.rpm

        Activity

          [LU-17660] Symlink Bug with Lustre Client
          yujian Jian Yu added a comment -

          It turns out the version check in sanity/17p needs to be updated for RHEL 9.5.
          The issue will be fixed in https://review.whamcloud.com/57237.

          yujian Jian Yu added a comment -

          sanity test 17p failed on master branch:
          https://testing.whamcloud.com/test_sets/ece15700-bd4a-407b-83ac-829bb786ae07
          https://testing.whamcloud.com/test_sets/b69c87ba-a44d-4b4e-b289-7ce876bd565c
          https://testing.whamcloud.com/test_sets/9e2d1f29-8d54-45d2-ab0e-808ba0714be4
          https://testing.whamcloud.com/test_sets/7967ea88-ea36-4d5c-9841-029567817bc0

          pjones Peter Jones added a comment -

          Merged for 2.17


          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56639/
          Subject: LU-17660 tests: test symlink file to existing dir
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: f8672e6a0ea9c4fc66a4434601a3783f731aa742

          jbradley John Bradley added a comment -

          We do not have a contract with RHEL - I'll see if we have one with CIQ, as we use Rocky in our environment.


          adilger Andreas Dilger added a comment -

          jbradley do you have a support contract with RHEL? If yes, it would be useful to raise a ticket in their bugzilla about this, referencing the patch commit v6.9-rc4-39-gbb32cded3be2 to see if they will backport the fix into their kernel.

          adilger Andreas Dilger added a comment -

          It looks like that commit landed in kernel v6.9-rc4-39-gbb32cded3be2.

          flei Feng Lei added a comment -

          The bug appears only on el9.x series clients.

          flei Feng Lei added a comment - - edited

          To work around this, stat the target directory instead of touching it, then create the link under that directory:

          # touch a
          # mkdir test
          # stat test
          # ln -sf a test/
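
The workaround above can be wrapped in a small helper so that scripts always stat the directory immediately before linking into it. This is only a sketch: the name `ln_into_dir` is hypothetical, and it assumes, as the comment suggests, that a preceding stat is enough for the trailing-slash lookup to succeed.

```shell
#!/bin/sh
# ln_into_dir TARGET DIR
# Apply the workaround from this ticket: stat DIR first, then create (or
# replace) a symlink to TARGET inside DIR using the trailing-slash form.
ln_into_dir() {
    # A failing stat means DIR is genuinely missing, so stop here.
    stat "$2" >/dev/null || return 1
    ln -sf "$1" "$2/"
}
```

Usage mirrors the commands above, for example `ln_into_dir "$(pwd)/a" test`.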
          
          flei Feng Lei added a comment -

          This appears to be a kernel bug that was fixed later in the kernel source:

          commit b3d4650d82c71b9c9a8184de9e8bb656012b289e
          Author: NeilBrown <neilb@suse.de>
          Date:   Thu Apr 14 13:57:35 2022 +1000
              VFS: filename_create(): fix incorrect intent.
              
              When asked to create a path ending '/', but which is not to be a
              directory (LOOKUP_DIRECTORY not set), filename_create() will never try
              to create the file.  If it doesn't exist, -ENOENT is reported.
              
              However, it still passes LOOKUP_CREATE|LOOKUP_EXCL to the filesystems
              ->lookup() function, even though there is no intent to create.  This is
              misleading and can cause incorrect behaviour.
              
              If you try
              
                 ln -s foo /path/dir/
              
              where 'dir' is a directory on an NFS filesystem which is not currently
              known in the dcache, this will fail with ENOENT.
              
              But as the name is not in the dcache, nfs_lookup gets called with
              LOOKUP_CREATE|LOOKUP_EXCL and so it returns NULL without performing any
              lookup, with the expectation that a subsequent call to create the target
              will be made, and the lookup can be combined with the creation.  In the
              case with a trailing '/' and no LOOKUP_DIRECTORY, that call is never
              made.  Instead filename_create() sees that the dentry is not (yet)
              positive and returns -ENOENT - even though the directory actually
              exists.
              
              So only set LOOKUP_CREATE|LOOKUP_EXCL if there really is an intent to
              create, and use the absence of these flags to decide if -ENOENT should
              be returned.
              
              Note that filename_parentat() is only interested in LOOKUP_REVAL, so we
              split that out and store it in 'reval_flag'.  __lookup_hash() then gets
              reval_flag combined with whatever create flags were determined to be
              needed.
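
The commit message above implies a way to tell the two behaviours apart from user space: creating a symlink whose name is an existing directory with a trailing '/' should be refused with EEXIST, whereas the strace output in this ticket shows ENOENT. The sketch below is an assumption-laden probe, not something taken from the ticket or from the Lustre test suite: the function name `probe_trailing_slash` is hypothetical, it relies on GNU `ln -T` (--no-target-directory) so that ln does not fall back to linking inside the directory, and it classifies the result by matching the C-locale error text.

```shell
#!/bin/sh
# probe_trailing_slash DIR
# Prints EEXIST when the kernel correctly refuses to create "DIR/",
# or ENOENT when it shows the symptom described in this ticket.
probe_trailing_slash() {
    # Deliberately do not stat or test DIR before the ln call: according to
    # the workaround in this ticket, a prior stat can hide the problem.
    msg=$(LC_ALL=C ln -sT probe-target "${1%/}/" 2>&1)
    # Only now confirm that DIR exists, so a real ENOENT is not misread.
    if [ ! -d "$1" ]; then
        echo "not-a-directory"
        return 2
    fi
    case $msg in
        *"File exists"*) echo "EEXIST" ;;
        *"No such file or directory"*) echo "ENOENT"; return 1 ;;
        *) echo "unknown: $msg"; return 2 ;;
    esac
}
```

On an affected client the probe would be run right after `touch test`, as in the original reproducer; a result of ENOENT for a directory that exists matches the behaviour the commit describes.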
           

          People

            flei Feng Lei
            jbradley John Bradley