[LU-16589] sanityn test_55d: FAIL: (2) mv succeeded
Created: 24/Feb/23  Updated: 19/Apr/23  Resolved: 19/Apr/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.16.0, Lustre 2.15.3
Fix Version/s: Lustre 2.16.0, Lustre 2.15.3

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Jian Yu
Resolution: Fixed
Votes: 0
Labels: None

Attachments: ftrace.with_slash.depth_6.txt, ftrace.without_slash.depth_6.txt
Issue Links:
Duplicate
Related
is related to LU-4725 wrong lock ordering in rename leads t... Resolved
Severity: 3

 Description   

This issue was created by maloo for jianyu <yujian@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/5fae0359-520f-440f-9515-09d401c4ae9c

test_55d failed with the following error:

== sanityn test 55d: rename file vs link ================= 17:25:02 (1677173102)
CMD: onyx-65vm4 /usr/sbin/lctl set_param fail_loc=0x155
fail_loc=0x155
ln: failed to create hard link '/mnt/lustre2/d55d.sanityn/d55d.sanityn/' => '/mnt/lustre2/d55d.sanityn/f1': No such file or directory
 sanityn test_55d: @@@@@@ FAIL: (2) mv succeeded

Test session details:
clients: https://build.whamcloud.com/job/lustre-master/4392 - 5.14.21-150400.24.28-default
servers: https://build.whamcloud.com/job/lustre-master/4392 - 4.18.0-425.10.1.el8_lustre.x86_64


VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanityn test_55d - (2) mv succeeded



 Comments   
Comment by Jian Yu [ 24/Feb/23 ]

Besides the SLES 15 SP4 client, the same failure also occurred on SLES 15 SP3, RHEL 9.0 and RHEL 9.1 clients on the master branch.

Comment by Jian Yu [ 24/Feb/23 ]

The ln command in coreutils-8.22 passed on a RHEL 7.9 client, and the strace output was:

1677185837.907658 stat("/mnt/lustre2/d55d.sanityn/d55d.sanityn/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
1677185837.908877 lstat("/mnt/lustre2/d55d.sanityn/f1", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
1677185837.909992 linkat(AT_FDCWD, "/mnt/lustre2/d55d.sanityn/f1", AT_FDCWD, "/mnt/lustre2/d55d.sanityn/d55d.sanityn/f1", 0) = 0

The ln command in coreutils-8.32, however, failed on a SLES 15 SP4 client with the following strace output:

1677175127.556014 linkat(AT_FDCWD, "/mnt/lustre2/d55d.sanityn/f1", AT_FDCWD, "/mnt/lustre2/d55d.sanityn/d55d.sanityn/", 0) = -1 ENOENT (No such file or directory)
1677175127.557279 newfstatat(AT_FDCWD, "/mnt/lustre2/d55d.sanityn/f1", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
1677175127.558307 write(2, "ln: ", 4)   = 4
1677175127.558492 write(2, "failed to create hard link '/mnt"..., 102) = 102
1677175127.558945 write(2, ": No such file or directory", 27) = 27
1677175127.559092 write(2, "\n", 1)     = 1

The stat() and lstat() operations were removed from ln by the following commit in coreutils-8.30:

commit 571f63f5010b047a8a3250304053f05949faded4
Author:     Paul Eggert <eggert@cs.ucla.edu>
AuthorDate: Fri Oct 19 12:19:43 2018 -0700
Commit:     Paul Eggert <eggert@cs.ucla.edu>
CommitDate: Fri Oct 19 12:38:34 2018 -0700

    ln: avoid directory hard-link races
    
    Previously, 'ln A B' did 'stat("B"), lstat("A"), link("A","B")'
    where the stat and lstat were necessary to avoid hard-linking
    directories on systems that can hard-link directories.
    Now, in situations that prohibit hard links to directories,
    'ln A B' merely does 'link("A","B")'.  The new behavior
    avoids some races and should be more efficient.
    This patch was inspired by Bug#10020, which was about 'ln'.
    * bootstrap.conf (gnulib_modules): Add unlinkdir.
    * src/force-link.c (force_linkat, force_symlinkat): New arg for
    error number of previous try.  Return error number, 0, or -1 if
    error, success, or success after removal.  All callers changed.
    * src/ln.c: Include priv-set.h, unlinkdir.h.
    (beware_hard_dir_link): New static var.
    (errnoize, atomic_link): New functions.
    (target_directory_operand): Use errnoize for simplicity.
    (do_link): New arg for error number of previous try.  All callers
    changed.  Do each link atomically if possible.
    (main): Do -r check earlier.  Remove linkdir privileges so we can
    use a single linkat/symlinkat instead of a racy substitute for the
    common case of 'ln A B' and 'ln -s A B'.  Set beware_hard_dir_link
    to disable this optimization.

I'm creating a patch to add an ls command before running ln in sanityn/55d to trigger stat()/statx().

Comment by Gerrit Updater [ 24/Feb/23 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50127
Subject: LU-16589 tests: fix dir hard-link failure in sanityn/55d
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7f5977ab6e1a6f0c81ad74121350d2872da813e0

Comment by Jian Yu [ 01/Mar/23 ]

I found that, even without adding an ls command before ln, just removing the trailing slash from "$tdir/" also makes the ln command succeed:

sanityn/55d
        # link in reverse locking order
-       ln $DIR2/$tdir/f1 $DIR2/$tdir/$tdir/
+       ln $DIR2/$tdir/f1 $DIR2/$tdir/$tdir

$DIR2/$tdir/$tdir/f1 was created as a hard link to file $DIR2/$tdir/f1.

 

Comment by Jian Yu [ 02/Mar/23 ]

The failure can be simply reproduced as follows:

# touch /mnt/lustre/tfile
# mkdir /mnt/lustre/tdir

# ln /mnt/lustre/tfile /mnt/lustre/tdir/
ln: failed to create hard link '/mnt/lustre/tdir/' => '/mnt/lustre/tfile': No such file or directory

# ln /mnt/lustre/tfile /mnt/lustre/tdir
# ls /mnt/lustre/tdir
tfile

The strace outputs of "ln /mnt/lustre/tfile /mnt/lustre/tdir/" (with a trailing slash) are:

1677747428.112785 linkat(AT_FDCWD, "/mnt/lustre/tfile", AT_FDCWD, "/mnt/lustre/tdir/", 0) = -1 ENOENT (No such file or directory)
1677747428.113910 newfstatat(AT_FDCWD, "/mnt/lustre/tfile", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
<~snip~>
1677747428.116781 write(2, "ln: ", 4)   = 4
1677747428.116922 write(2, "failed to create hard link '/mnt"..., 69) = 69
<~snip~>
1677747428.117833 write(2, ": No such file or directory", 27) = 27
1677747428.117941 write(2, "\n", 1)     = 1

The strace outputs of "ln /mnt/lustre/tfile /mnt/lustre/tdir" (without a trailing slash) are:

1677747364.665989 linkat(AT_FDCWD, "/mnt/lustre/tfile", AT_FDCWD, "/mnt/lustre/tdir", 0) = -1 EEXIST (File exists)
1677747364.668271 openat(AT_FDCWD, "/mnt/lustre/tdir", O_RDONLY|O_PATH|O_DIRECTORY) = 3
1677747364.670377 linkat(AT_FDCWD, "/mnt/lustre/tfile", 3, "tfile", 0) = 0
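
The target spellings discussed in this ticket (with a trailing slash, without one, and with the file name given explicitly) can be compared side by side with a small helper. This is only a sketch: check_ln_targets and its scratch-directory layout are hypothetical and not part of the Lustre test framework. On a local filesystem all three forms are expected to succeed; on an affected Lustre client with a newer ln, the trailing-slash form is the one reported above to fail.

```shell
#!/bin/bash
# Hypothetical helper (not part of test-framework.sh): hard-link a file
# into a directory using each spelling of the target and print one
# "<spelling>: ok" or "<spelling>: FAILED" line per attempt.
# $1 is a scratch directory on the filesystem under test.
check_ln_targets() {
    local base=$1
    local rc=0
    local target

    rm -rf "$base/tdir" "$base/tfile"
    touch "$base/tfile" || return 1

    for target in "tdir/" "tdir" "tdir/tfile"; do
        # Start every attempt from a freshly created, empty directory.
        rm -rf "$base/tdir"
        mkdir "$base/tdir" || return 1
        if ln "$base/tfile" "$base/$target" 2>/dev/null &&
           [ -e "$base/tdir/tfile" ]; then
            echo "$target: ok"
        else
            echo "$target: FAILED"
            rc=1
        fi
    done
    return $rc
}
```

For example, check_ln_targets /mnt/lustre would exercise the same three commands as the manual reproduction steps.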

Comment by Andreas Dilger [ 02/Mar/23 ]

The difference in behavior of "ln" and the success/failure of the test appears to boil down to the return code of linkat() with/without the trailing "/".

So this looks like a dcache lookup issue in Lustre, or possibly something slightly off with how linkat() is being handled by the kernel or Lustre (e.g. returning "-ENOENT" instead of "-EEXIST")? The presence of the trailing "/" shouldn't make any difference to the kernel (AFAIK it should strip out "/" during lookup?) but possibly something strange is happening during path processing.

Jian, could you please capture the Lustre kernel debug=all logs for the two test cases, and check how the two linkat() calls are different in Lustre. Probably only the time from the start to the end of the linkat() syscall is interesting, and where ENOENT vs. EEXIST are being generated.

Comment by Jian Yu [ 02/Mar/23 ]

Sure, Andreas. I have gathered the full debug logs and am looking at them.

Comment by Jian Yu [ 02/Mar/23 ]

In the Lustre debug logs, with the trailing "/", there was no "-ENOENT" error. ll_link() on the client and mdt_reint_link() on the MDS were not called.
Without the trailing "/", ll_link() was called on the client, and mdt_reint_link() on the MDS returned "-EEXIST":

00000004:00000040:0.0:1677747185.111782:0:22177:0:(mdt_reint.c:1456:mdt_reint_link()) link target tdir existed!
00000004:00000001:0.0:1677747185.111783:0:22177:0:(mdt_reint.c:1457:mdt_reint_link()) Process leaving via unlock_source (rc=18446744073709551599 : -17 : 0xffffffffffffffef)

I'm looking into linkat() to see where "ENOENT" is generated.
 

Comment by Jian Yu [ 03/Mar/23 ]

In linkat()->__file_name_split_at()->__hurd_file_name_split():

__hurd_file_name_split()
  const char *lastslash = strrchr (file_name, '/');

  if (lastslash != NULL)
    {
      if (lastslash == file_name)
        {
          /* "/foobar" => crdir + "foobar".  */
          *name = (char *) file_name + 1;
          return (*use_init_port) (INIT_PORT_CRDIR, &addref);
        }
      else
        {
          /* "/dir1/dir2/.../file".  */
          char dirname[lastslash - file_name + 1];
          memcpy (dirname, file_name, lastslash - file_name);
          dirname[lastslash - file_name] = '\0';
          *name = (char *) lastslash + 1;
          return
            __hurd_file_name_lookup (use_init_port, get_dtable_port, lookup,
                                     dirname, 0, 0, dir);
        }
    }
  else if (file_name[0] == '\0')
    return ENOENT;
  else
    {
      /* "foobar" => cwdir + "foobar".  */
      *name = (char *) file_name;
      return (*use_init_port) (INIT_PORT_CWDIR, &addref);
    }

And in __hurd_file_name_lookup:

__hurd_file_name_lookup()
      /* The caller wants to require that the file we look up is a directory.
         We can do this without an extra RPC by appending a trailing slash
         to the file name we look up.  */
      size_t len = strlen (file_name);
      if (len == 0) 
        file_name = "/";
      else if (file_name[len - 1] != '/')
        {
          char *n = alloca (len + 2);
          memcpy (n, file_name, len);
          n[len] = '/';
          n[len + 1] = '\0';
          file_name = n;
        }

I haven't found anything wrong here.

Comment by Lai Siyao [ 03/Mar/23 ]

What's the result of this command on local filesystem? e.g. xfs?

Comment by Jian Yu [ 03/Mar/23 ]

> What's the result of this command on local filesystem? e.g. xfs?

It works on local ext4 filesystem:

# df -T /root/
Filesystem     Type 1K-blocks    Used Available Use% Mounted on
/dev/vda2      ext4  20466256 5675620  13725676  30% /

# touch /root/tfile
# mkdir /root/tdir
# ln /root/tfile /root/tdir/
# ls /root/tdir/
tfile

Comment by Jian Yu [ 07/Mar/23 ]

It turned out the ENOENT error was returned from kernel do_linkat()->filename_create():

# rm -rf /mnt/lustre/*
# touch /mnt/lustre/tfile
# mkdir /mnt/lustre/tdir

# trace-cmd record -g do_linkat -p function_graph ln /mnt/lustre/tfile /mnt/lustre/tdir/
  plugin 'function_graph'
ln: failed to create hard link '/mnt/lustre/tdir/' => '/mnt/lustre/tfile': No such file or directory
CPU0 data recorded at offset=0x177000
    0 bytes in size (0 uncompressed)
CPU1 data recorded at offset=0x177000
    349 bytes in size (4096 uncompressed)

# trace-cmd report
cpus=2
              ln-31499 [001] 2019833.457879: funcgraph_entry:                   |  do_linkat() {
              ln-31499 [001] 2019833.457888: funcgraph_entry:        0.449 us   |    irq_enter_rcu();
              ln-31499 [001] 2019833.457889: funcgraph_entry:        8.943 us   |    __sysvec_irq_work();
              ln-31499 [001] 2019833.457898: funcgraph_entry:        0.368 us   |    irq_exit_rcu();
              ln-31499 [001] 2019833.457899: funcgraph_entry:      # 1459.949 us |    user_path_at_empty();
              ln-31499 [001] 2019833.459366: funcgraph_entry:        0.340 us   |    irq_enter_rcu();
              ln-31499 [001] 2019833.459366: funcgraph_entry:        0.834 us   |    __sysvec_irq_work();
              ln-31499 [001] 2019833.459367: funcgraph_entry:        0.304 us   |    irq_exit_rcu();
              ln-31499 [001] 2019833.459368: funcgraph_entry:        0.769 us   |    getname_flags();
              ln-31499 [001] 2019833.459369: funcgraph_entry:      ! 297.289 us |    filename_create();
              ln-31499 [001] 2019833.459667: funcgraph_entry:        4.795 us   |    dput();
              ln-31499 [001] 2019833.459672: funcgraph_entry:        0.343 us   |    mntput();
              ln-31499 [001] 2019833.459673: funcgraph_exit:       # 1794.900 us |  }

Comment by Jian Yu [ 07/Mar/23 ]

By comparing the attached Ftrace outputs (with and without the trailing "/", --max-graph-depth is 6), and the source code of filename_create() in kernel fs/namei.c, we can see the ENOENT error was returned as follows:

fs/namei.c
static struct dentry *filename_create(int dfd, struct filename *name,
                                struct path *path, unsigned int lookup_flags)
{
        // ......
        /*
         * Special case - lookup gave negative, but... we had foo/bar/
         * From the vfs_mknod() POV we just have a negative dentry -
         * all is fine. Let's be bastards - you had / on the end, you've
         * been asking for (non-existent) directory. -ENOENT for you.
         */
        if (unlikely(!is_dir && last.name[last.len])) {
                error = -ENOENT;
                goto fail;
        }
        // ......
}

ftrace.with_slash.depth_6.txt ftrace.without_slash.depth_6.txt
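
The decision made by the quoted check can be modelled in a few lines of shell. This is a sketch under stated assumptions, not kernel code: filename_create_result is a hypothetical name, and the step that turns a positive dentry into EEXIST sits in the part of the function elided by "// ......" above, so it is assumed here rather than taken from the excerpt.

```shell
#!/bin/bash
# Model of the tail of filename_create() as quoted above (pre-5.18 form).
# $1: dentry state returned by the lookup: "positive" or "negative"
# $2: 1 if the caller intends to create a directory (mkdir), else 0
# $3: the path exactly as the caller passed it
# Prints EEXIST, ENOENT, or OK (meaning the create may proceed).
filename_create_result() {
    local dentry=$1 is_dir=$2 path=$3

    # Assumption: a positive dentry means the name already exists.
    if [ "$dentry" = positive ]; then
        echo EEXIST
        return
    fi

    # last.name[last.len] is non-NUL when something (a '/') follows the
    # final component, i.e. the caller wrote "foo/bar/".
    case $path in
    */)
        if [ "$is_dir" -eq 0 ]; then
            echo ENOENT
            return
        fi
        ;;
    esac
    echo OK
}
```

With these inputs, a negative dentry for "/mnt/lustre/tdir/" and a non-directory create gives ENOENT, which matches the failing linkat() call even though the directory exists.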

Comment by Andreas Dilger [ 08/Mar/23 ]

Jian, looking at what test_55d is doing, it is unclear why the linkat()->filename_create() operation is returning -ENOENT? The directory does exist, since it was created with mkdir on the previous line, so either the mkdir $DIR/$tdir/$tdir operation should have instantiated this directory into dcache, or the linkat() syscall should have looked up the second $tdir component during traversal.

It would be interesting to see what the test does with "ln $DIR2/$tdir/f1 $DIR2/$tdir/$tdir/f1" (i.e. with the target filename specified)? That is really what the "ln" command is trying to do - link f1 into the second $tdir directory so that the delayed mv $DIR2/$tdir/f1 $DIR2/$tdir/$tdir that was trying to create a file named $tdir cannot succeed when there is later a directory named $tdir.

Even if that allows the test to pass (which would be slightly better than the "pre-stat"), there is still a dcache bug in there somewhere, because the kernel lookup of the second $tdir/ should not have failed.

What else is interesting is that newer kernels (since v5.18-rc2-188-gb3d4650d82c7) have slightly different code here:

static struct dentry *filename_create(int dfd, struct filename *name,
                                      struct path *path, unsigned int lookup_flags)
{
        unsigned int create_flags = LOOKUP_CREATE | LOOKUP_EXCL;
        :
        :
        /*
         * Special case - lookup gave negative, but... we had foo/bar/
         * From the vfs_mknod() POV we just have a negative dentry -
         * all is fine. Let's be bastards - you had / on the end, you've
         * been asking for (non-existent) directory. -ENOENT for you.
         */
        if (unlikely(!create_flags)) {
                error = -ENOENT;
                goto fail;
        }

so it would be interesting to know what create_flags are being passed from Lustre?

It just happens that the commit v5.18-rc2-188-gb3d4650d82c7 that changed this code was written by Neil Brown, whom I've CC'd here in case he has some insight into what is going wrong here. It definitely seems from the commit message that this is a similar situation being hit in Lustre as was previously hit by NFS:

    VFS: filename_create(): fix incorrect intent.
    
    When asked to create a path ending '/', but which is not to be a
    directory (LOOKUP_DIRECTORY not set), filename_create() will never try
    to create the file.  If it doesn't exist, -ENOENT is reported.
    
    However, it still passes LOOKUP_CREATE|LOOKUP_EXCL to the filesystems
    ->lookup() function, even though there is no intent to create.  This is
    misleading and can cause incorrect behaviour.
    
    If you try
    
       ln -s foo /path/dir/
    
    where 'dir' is a directory on an NFS filesystem which is not currently
    known in the dcache, this will fail with ENOENT.
    
    But as the name is not in the dcache, nfs_lookup gets called with
    LOOKUP_CREATE|LOOKUP_EXCL and so it returns NULL without performing any
    lookup, with the expectation that a subsequent call to create the target
    will be made, and the lookup can be combined with the creation.  In the
    case with a trailing '/' and no LOOKUP_DIRECTORY, that call is never
    made.  Instead filename_create() sees that the dentry is not (yet)
    positive and returns -ENOENT - even though the directory actually
    exists.
    
    So only set LOOKUP_CREATE|LOOKUP_EXCL if there really is an intent to
    create, and use the absence of these flags to decide if -ENOENT should
    be returned.
    
    Note that filename_parentat() is only interested in LOOKUP_REVAL, so we
    split that out and store it in 'reval_flag'.  __lookup_hash() then gets
    reval_flag combined with whatever create flags were determined to be
    needed.

It isn't clear whether we can add a workaround in the Lustre ->lookup() method to handle this case or only fix the test (to add the /f1 component at the end) and get the client distros to apply Neil's patch. It looks like both SLES15sp4 and RHEL9.x are using a 5.14-based kernel, and are missing the v5.18-rc2-188-gb3d4650d82c7 patch, or we are somehow not handling the LOOKUP_* flags properly in Lustre. The changed code may have been affected by v5.14-rc7-68-g0ee50b47532a "namei: change filename_parentat() calling conventions", so it is possible we didn't update to those new conventions?

Comment by Neil Brown [ 08/Mar/23 ]

Hi Andreas,

 I think it likely that you have identified the problem.  Especially if it is possible that the "rename" happening in the background might have called d_lustre_invalidate() on the original dentry.

I think it unlikely that the filename_parentat() change is relevant.  That is an internal api in namei.c

The only way I can think of to work around the problem in lustre is to skip the optimisation in ll_lookup_nd() for a CREATE that isn't an OPEN.  You might be able to detect this particular case by seeing there is an invalid dentry still present.  Maybe.

I really should apply that patch to SP4.  We have it in SP3.

 

Comment by Jian Yu [ 08/Mar/23 ]

Thank you Andreas and Neil for the detailed analysis.

> It would be interesting to see what the test does with "ln $DIR2/$tdir/f1 $DIR2/$tdir/$tdir/f1" (i.e. with the target filename specified)?

Test passed on SLES 15 SP4 client (with kernel 5.14.21-150400.24.28-default):

== sanityn test 55d: rename file vs link ============================================================= 05:52:19 (1678254739)
CMD: trevis-86vm8 /usr/sbin/lctl set_param fail_loc=0x155
fail_loc=0x155
mv: '/mnt/lustre/d55d.sanityn/f1' and '/mnt/lustre/d55d.sanityn/d55d.sanityn/f1' are the same file
Resetting fail_loc on all nodes...CMD: trevis-86vm7.trevis.whamcloud.com,trevis-86vm8,trevis-87vm7 lctl set_param -n fail_loc=0             fail_val=0 2>/dev/null
done.
CMD: trevis-86vm7.trevis.whamcloud.com /usr/sbin/lctl get_param catastrophe 2>&1
CMD: trevis-86vm8 /usr/sbin/lctl get_param catastrophe 2>&1
CMD: trevis-87vm7 /usr/sbin/lctl get_param catastrophe 2>&1
CMD: trevis-86vm7.trevis.whamcloud.com,trevis-86vm8,trevis-87vm7 dmesg
PASS 55d (9s)

Comment by Jian Yu [ 08/Mar/23 ]

> The only way I can think of to work around the problem in lustre is to skip the optimization in ll_lookup_nd() for a CREATE that isn't an OPEN.

The optimization was added by https://review.whamcloud.com/8257 ("LU-4185 llite: Revise create with no open optimization"):

commit a2d5b2e83c0a512a3ea59698e8481621ab5856c2
Author:     Bobi Jam <bobijam@whamcloud.com>
AuthorDate: Wed Nov 13 15:56:25 2013 +0800
Commit:     Oleg Drokin <green@whamcloud.com>
CommitDate: Wed Oct 5 03:51:18 2016 +0000

    LU-4185 llite: Revise create with no open optimization
    
    Currently ll_lookup_nd just returns a negative when we are trying
    to create something with no open (read mkdir). This is all fine most
    of the cases, except if the directory where we are trying to do is
    not writeable by us. In that case vfs_create would return EPERM
    seeing as how a negative dentry means the create cannot proceed.
    But in reality if there is an existing name there that we just did
    not have cached, the proper return is EEXIST that could only happen
    if we did do the lookup.
    
    So amend the optimization to only take place if the directory is
    writeable by us, otherwise do the full lookup.
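
The interaction between the flags passed by filename_create() and the "create with no open" optimization can be sketched as follows. Everything here is illustrative: create_flags and lookup_skipped are hypothetical names, the numeric flag values are what I understand include/linux/namei.h to define and should be treated as assumptions, and the skip condition is a simplification of what the commit messages above describe rather than the actual ll_lookup_nd() code.

```shell
#!/bin/bash
# Assumed flag values (believed to match include/linux/namei.h).
LOOKUP_OPEN=$((0x0100))
LOOKUP_CREATE=$((0x0200))
LOOKUP_EXCL=$((0x0400))

# Flags filename_create() hands to ->lookup() for the last component.
# $1: 1 if the kernel carries the "fix incorrect intent" change, else 0
# $2: 1 if the caller intends to create a directory (mkdir), else 0
# $3: 1 if the path ends in '/', else 0
create_flags() {
    local fixed=$1 is_dir=$2 slash=$3

    if [ "$fixed" -eq 1 ] && [ "$is_dir" -eq 0 ] && [ "$slash" -eq 1 ]; then
        echo 0    # no intent to create, so no create flags
    else
        echo $((LOOKUP_CREATE | LOOKUP_EXCL))
    fi
}

# Exit status 0 if the filesystem would return a negative dentry without
# doing a real lookup: a CREATE that is not an OPEN, in a directory the
# caller can write to (the LU-4185 condition).
# $1: lookup flags, $2: 1 if the parent directory is writable, else 0
lookup_skipped() {
    local flags=$1 writable=$2

    [ $((flags & LOOKUP_CREATE)) -ne 0 ] &&
    [ $((flags & LOOKUP_OPEN)) -eq 0 ] &&
    [ "$writable" -eq 1 ]
}
```

Under this model, lookup_skipped "$(create_flags 0 0 1)" 1 succeeds on an unfixed kernel, so the dentry stays negative and the trailing-slash check returns -ENOENT; with the fix, create_flags 1 0 1 prints 0 and the real lookup is performed.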

Comment by Zhenyu Xu [ 12/Mar/23 ]

Would it work to change ll_lookup_nd() to skip the optimization if flags contains LOOKUP_CREATE | LOOKUP_EXCL and LOOKUP_CREATE + !LOOKUP_OPEN + !LOOKUP_EXCL still use the optimization?

Comment by Andreas Dilger [ 12/Mar/23 ]

This workaround should go under an #ifdef so that the optimization is only disabled for kernels that are affected by it. It definitely is not needed for kernels newer than 5.18, but I'm not totally sure what is the oldest version affected, or if there is a #define we can check that was introduced in the original patch. This is also complicated by vendor backports, so there may need to be both a vanilla kernel version check and a vendor version check. 

Comment by Neil Brown [ 12/Mar/23 ]

> Would it work to change ll_lookup_nd() to skip the optimization if flags contains LOOKUP_CREATE | LOOKUP_EXCL and LOOKUP_CREATE + !LOOKUP_OPEN + !LOOKUP_EXCL still use the optimization?

I don't think ->lookup is ever called with LOOKUP_CREATE but not LOOKUP_EXCL.  And if it is, it is very likely to include LOOKUP_OPEN.  Most syscalls that create names give an error (EEXIST) if the name exists.  The only exception is open(O_CREAT).

Do we really need to fix this in lustre?  It is clearly not a lustre bug.  If we want test_55d to succeed more reliably we can just change the ln command to "ln $DIR2/$tdir/f1 $DIR2/$tdir/$tdir/f1" as Andreas suggested.  That performs exactly the same effective test, but avoids the kernel bug.

 

Comment by Jian Yu [ 13/Mar/23 ]

> I'm not totally sure what is the oldest version affected

Even on RHEL 7.9 (with kernel 3.10.0-1160), after changing ln from version 8.22 to 8.32, the "ln /mnt/lustre/tfile /mnt/lustre/tdir/" command also failed with -ENOENT.
This means the kernel issue exists in all of the vendor kernels we support now.

Comment by Andreas Dilger [ 13/Mar/23 ]

Jian, then it seems this bug has existed for so long and could even be considered an issue in the new coreutils? At least it is fixed in newer kernels, and it seems like a very uncommon use case, so it isn't clear that putting a ton of effort into fixing it in Lustre is worthwhile. Neil can work on backporting the fix to SLES15 (if needed), and James or someone can file a ticket to fix it in RHEL8/9.

It seems the right thing for now is to fix the test (either add "/f1" at the end or remove the trailing "/") so that it is executing the original intent of the patch that added it.

A separate patch should be made with a new subtest for only this "ln" case. This new subtest should be skipped for new coreutils and old kernels:

        local ln_ver=$(ln --version | awk '/coreutils/ { print $4 }')

        (( $(version_code $ln_ver) < $(version_code 8.32) )) ||
        (( $(version_code $(uname -r)) >= $(version_code 5.18) )) ||
                skip "need coreutils < 8.32 or kernel >= 5.18 for ln"

        touch $DIR/$tfile || error "create failed"
        mkdir $DIR/$tdir || error "mkdir failed"
        ln $DIR/$tfile $DIR/$tdir/ || error "ln to '$tdir/' failed"
        

This will at least add test coverage for newer kernels and keep it for old coreutils, so that it doesn't regress in the future.  Additional checks can be added to allow running the test when we know particular distro kernels are fixed. 

Comment by Jian Yu [ 13/Mar/23 ]

Sure, Andreas. I just updated https://review.whamcloud.com/50127 to fix sanityn/55d. I'm working on a separate patch to add a new subtest for "ln". BTW, I just found that the first coreutils version containing the "ln" change is actually 8.31.
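
The skip condition discussed above can be tried outside the test framework with a self-contained sketch. The names ver_code and ln_subtest_should_skip are hypothetical stand-ins (ver_code is not the framework's version_code), the 8.31 threshold follows the correction in this comment rather than the 8.32 used in the proposed snippet, and vendor backports of the kernel fix are deliberately not handled.

```shell
#!/bin/bash
# Hypothetical stand-in for version_code: pack up to three dot-separated
# numeric fields into one integer. Anything after the first '-' (for
# example "-150400.24.28-default") is ignored. Assumes fields below 1000.
ver_code() {
    local v="${1%%-*}"
    local IFS=.

    # Split the version on '.' into positional parameters.
    set -- $v
    echo "$(( ${1:-0} * 1000000 + ${2:-0} * 1000 + ${3:-0} ))"
}

# Exit status 0 means "skip the subtest": ln is new enough to call
# linkat() directly and the kernel predates the 5.18 fix.
# $1: coreutils version, $2: kernel release as printed by uname -r
ln_subtest_should_skip() {
    [ "$(ver_code "$1")" -ge "$(ver_code 8.31)" ] &&
    [ "$(ver_code "$2")" -lt "$(ver_code 5.18)" ]
}
```

A caller could pass the coreutils version extracted with the awk expression from the proposed subtest together with the output of uname -r.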

Comment by Gerrit Updater [ 13/Mar/23 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50265
Subject: LU-16589 tests: add sanity/31l to test ln command
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8b6e20e63be8edc4fda22f2318fa8df350c7297b

Comment by Gerrit Updater [ 21/Mar/23 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50348
Subject: LU-16589 tests: fix hard-link failure in sanityn/55d
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 5b3526d01808b0dd270ae0655918fa3b7fc4f941

Comment by Gerrit Updater [ 21/Mar/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50127/
Subject: LU-16589 tests: fix hard-link failure in sanityn/55d
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 25c6b7ad2859729197c3cc6e6dcf0621e4bda6fa

Comment by Gerrit Updater [ 28/Mar/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50265/
Subject: LU-16589 tests: add sanity/31l to test ln command
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e998d21caf99e32495950219e88dd9e7f981363e

Comment by Gerrit Updater [ 19/Apr/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50348/
Subject: LU-16589 tests: fix hard-link failure in sanityn/55d
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: b01209416cb73e06d50a2bf00855e56fcc37ed02

Comment by Peter Jones [ 19/Apr/23 ]

Landed for 2.16

Generated at Sat Feb 10 03:28:19 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.