[LU-18070] sanity test_103a: FAIL: run_acl_subtest 'misc' failed

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.16.0
    • Ubuntu 24.04 client
      SLES 15 SP6 client

    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/53afc141-49e3-4248-b791-db7fd0dc3212

      test_103a failed with the following error:

                             
      [413] $ chmod 750 d -- ok
      [414] $ ls -dl d | awk '{print $1}' -- ok
      [417] $ getfacl --omit-header d -- failed
      user::rwx                             | user::rwx                              
      user:bin:r-x                          | user:bin:r-x                           
      user:daemon:rwx	#effective:r-x        | user:daemon:rwx	#effective:r-x         
      group::rwx	#effective:r-x             | group::rwx	#effective:r-x              
      mask::r-x                             | mask::r-x                              
      other::---                            | other::---                             
      default:user::rwx                     | default:user::rwx                      
      default:user:bin:r-x                  | default:user:bin:r-x                   
      default:user:daemon:rwx               ? default:user:daemon:rwx	#effective:r-x 
      default:group::rwx                    ? default:group::rwx	#effective:r-x      
      default:mask::rwx                     ? default:mask::r-x                      
      default:other::r-x                    ? default:other::---                     
                                            |                                        
      [432] $ rmdir d -- ok
      103 commands (95 passed, 8 failed)
       sanity test_103a: @@@@@@ FAIL: run_acl_subtest 'misc' failed
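
The `#effective:r-x` annotations in the output above follow the usual POSIX ACL rule: named-user and group entries are limited by the mask entry, and getfacl by default prints the comment only when the effective rights differ from the entry itself. A minimal sketch of that rule, using a hypothetical helper name `effective` (not part of the Lustre test framework):

```python
def effective(perms: str, mask: str) -> str:
    """Intersect an ACL entry's rwx string with the mask, position by position."""
    return "".join(
        p if p == m and p != "-" else "-"
        for p, m in zip(perms, mask)
    )


# user:daemon:rwx under mask::r-x, as in the failing getfacl output
print(effective("rwx", "r-x"))
```

Under this rule, a change in `default:mask` from `rwx` to `r-x` is enough to make the `#effective` comments appear on the `default:user:daemon` and `default:group` lines.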
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/106363 - 6.4.0-150600.23.14-default
      servers: https://build.whamcloud.com/job/lustre-reviews/106363 - 5.14.0-362.24.1_lustre.el9.x86_64

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_103a - run_acl_subtest 'misc' failed

          Activity

            sebastien Sebastien Buisson added a comment -

            With patch https://review.whamcloud.com/41683, sanity test_103a is failing in a new, different way:

            [176] $ : < f -- failed
            ~                                     ? f: Permission denied

            This is odd, as all other recent failures of this test are happening with SLES15 only (either "full" test sessions or my debug patches pushed under LU-18070).
            I am going to investigate.

            adilger Andreas Dilger added a comment -

            I had this fail twice on master with el9.3 clients in a trivial test session. Patch https://review.whamcloud.com/41683 was based on 2.16.0-RC1, but I've rebased it to the latest master to include "LU-18101 sec: fix ACL handling on recent kernels again" in the hopes of fixing it.

            Otherwise, if this is causing even "trivial" test sessions to fail, we need to look into disabling this subtest until it is fixed.

            gerrit Gerrit Updater added a comment -

            "Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56573
            Subject: LU-18070 dbg: debug sanity test_103a - 2
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: fd8ff92937685b823fbb18231e9df86557847889

            gerrit Gerrit Updater added a comment -

            "Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56572
            Subject: LU-18070 dbg: debug sanity test_103a - 1
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 81746489f00e9625dd876ed14157fbba820c969f

            sebastien Sebastien Buisson added a comment - All recent failures happen with SLES 15.6 clients. I am going to investigate.
            yujian Jian Yu added a comment - Still failed on 2.16.0 RC1: https://testing.whamcloud.com/test_sets/29fd5bcd-852f-477a-917a-832e1112218b

            sebastien Sebastien Buisson added a comment - Patch to fix the problem in sanity test_103a is at https://review.whamcloud.com/56269.

            sebastien Sebastien Buisson added a comment -

            The problem shows up on an Ubuntu 24.04 client when just running standalone sanity test_103a.

            The test fails because, on this distribution, the getfacl output does not match the expected output. More precisely, there are additional default ACLs on the directories. For instance:

            [46] $ getfacl --omit-header d -- failed
            user::rwx                             | user::rwx
            user:sanityusr:rwx                    | user:sanityusr:rwx
            group::r-x                            | group::r-x
            mask::rwx                             | mask::rwx
            other::---                            | other::---
                                                  ? default:user::rwx
            ~                                     ? default:user:sanityusr:rwx
            ~                                     ? default:group::r-x
            ~                                     ? default:mask::rwx
            ~                                     ? default:other::---
            ~                                     ?

            I am not sure where these come from, or how to get rid of them. I pushed a few patches to investigate.
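
To make the unexpected entries easier to spot, the getfacl output can be split into access entries and default entries. The following is only an illustrative sketch: `split_acl` is a hypothetical helper, and the sample text is copied from the right-hand column of the failing command above.

```python
def split_acl(lines):
    """Partition getfacl entries into (access, default) lists, ignoring comments."""
    access, default = [], []
    for line in lines:
        entry = line.split("#", 1)[0].strip()  # drop "#effective:..." comments
        if not entry:
            continue
        (default if entry.startswith("default:") else access).append(entry)
    return access, default


actual = """user::rwx
user:sanityusr:rwx
group::r-x
mask::rwx
other::---
default:user::rwx
default:user:sanityusr:rwx
default:group::r-x
default:mask::rwx
default:other::---
""".splitlines()

access, default = split_acl(actual)
print("access entries: ", access)
print("default entries:", default)
```

In this sample the default list mirrors the access list entry for entry, whereas the left-hand column of the log contains no default entries at all.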

            gerrit Gerrit Updater added a comment -

            "Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56269
            Subject: LU-18070 tests: fix sanity test_103a
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4bb769dc0ecc5f1d15d9a0d93861c57ab04b38ba

            gerrit Gerrit Updater added a comment -

            "Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56268
            Subject: LU-18070 dbg: debug sanity test_103a
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4ddafd0c7d0a67aabfd3d1a07ffda0abb6aad07f

            adilger Andreas Dilger added a comment -

            sebastien,
            it looks like some of the getfacl output differs only in "cosmetic" ways, such as the addition of the "#effective:r-x" comments on some lines (which may be informative to the reader, but do not affect actual access permissions).

            In other cases, I see the addition (or removal?) of default: lines, so this might represent an actual regression in default ACL inheritance, or it might just be a difference in how the commands print results.

            It looks like from Jian's patch https://review.whamcloud.com/55898 "LU-00000 tests: perform full group test sessions" that this failure can be reproduced 100% of the time with Ubuntu 24.04 clients. All of those sessions were part of "full" runs, but it isn't clear whether a full run is actually needed, or whether that is simply the only time Ubuntu 24.04 client sessions are run and a regular sanity.sh run of just that subtest would be enough to reproduce it.
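
One way to separate the comment-only differences from differences in the ACL entries themselves is to strip the `#effective:` comments before comparing. This is a rough sketch, not what the test framework does: `classify` is a hypothetical helper, and the sample pairs are the four mismatched lines from the failure in the description.

```python
def strip_comment(line: str) -> str:
    """Remove a trailing '#effective:...' comment and surrounding whitespace."""
    return line.split("#", 1)[0].rstrip()


def classify(left: str, right: str) -> str:
    """Return 'same', 'cosmetic' (comment-only difference), or 'real'."""
    if left == right:
        return "same"
    if strip_comment(left) == strip_comment(right):
        return "cosmetic"
    return "real"


pairs = [
    ("default:user:daemon:rwx", "default:user:daemon:rwx\t#effective:r-x"),
    ("default:group::rwx", "default:group::rwx\t#effective:r-x"),
    ("default:mask::rwx", "default:mask::r-x"),
    ("default:other::r-x", "default:other::---"),
]

results = [classify(left, right) for left, right in pairs]
for (left, _), verdict in zip(pairs, results):
    print(f"{left:28} {verdict}")
```

Note that a comment-only difference is not necessarily independent of the others: here the `#effective:r-x` comments appear alongside a changed `default:mask` entry, so the mask and other lines are the ones worth investigating first.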

            People

              sebastien Sebastien Buisson
              maloo Maloo