Lustre / LU-17943

conf-sanity test_32d: FAIL: set project failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0, Lustre 2.15.5
    • Affects Version/s: None
    • Severity: 3

    Description

      This issue was created by maloo for Minh Diep <mdiep@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/0101ce36-e2a5-4868-a945-bceb058a322f

      test_32d failed with the following error:

      Timeout occurred after 483 minutes, last suite running was conf-sanity
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-b2_15/88 - 4.18.0-553.el8_10.x86_64
      servers: https://build.whamcloud.com/job/lustre-b2_15/88 - 4.18.0-553.el8_lustre.x86_64


      onyx-24vm12: Pool t32fs.interop created
      lfs: failed to set xattr for '/tmp/t32/mnt/lustre/init.d': Value too large for defined data type
      lfs: failed to set xattr for '/tmp/t32/mnt/lustre/rc': Value too large for defined data type
      lfs: failed to set xattr for '/tmp/t32/mnt/lustre/rc0.d': Value too large for defined data type
      lfs: failed to set xattr for '/tmp/t32/mnt/lustre/rc1.d': Value too large for defined data type
      lfs: failed to set xattr for '/tmp/t32/mnt/lustre/rc2.d': Value too large for defined data type
      lfs: failed to set xattr for '/tmp/t32/mnt/lustre/rc3.d': Value too large for defined data type
      lfs: failed to set xattr for '/tmp/t32/mnt/lustre/rc4.d': Value too large for defined data type
      lfs: failed to set xattr for '/tmp/t32/mnt/lustre/rc5.d': Value too large for defined data type
      lfs: failed to set xattr for '/tmp/t32/mnt/lustre/rc6.d': Value too large for defined data type
      lfs: failed to set xattr for '/tmp/t32/mnt/lustre/rc.local': Value too large for defined data type
      lfs: failed to set xattr for '/tmp/t32/mnt/lustre/rc.sysinit': Value too large for defined data type
      lfs: failed to set xattr for '/tmp/t32/mnt/lustre/t32_qf_old': Value too large for defined data type
       conf-sanity test_32d: @@@@@@ FAIL: set project failed 
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      conf-sanity test_32d - Timeout occurred after 483 minutes, last suite running was conf-sanity

    Attachments

    Issue Links

    Activity

            [LU-17943] conf-sanity test_32d: FAIL: set project failed
            pjones Peter Jones added a comment -

            Merged for 2.16


            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55673/
            Subject: LU-17943 osd-ldiskfs: initialize dquot before expanding inode size
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 3fd57f81fddc604aa94bc7797cc211c7e393b3d0

            gerrit Gerrit Updater added a comment -

            "Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55673
            Subject: LU-17943 osd-ldiskfs: initialize dquot before expanding inode size
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 04f7854d9321bd72bc484e5ba78ed9099536bde2
            pjones Peter Jones added a comment -

            Sounds good - and I think it's ok to just tidy this up for 2.15.6 vs delaying 2.15.5

            dongyang Dongyang Li added a comment - - edited

            The inode size from the 2.4 and the 2.5 image is ok, the issue is the extra_isize:

            Inode size:               512
            Required extra isize:     28
            Desired extra isize:      28
            

            The extra_isize set in the superblock is 28; I think the image was created with mke2fs without project quota support.
            It should be sizeof(struct ext2_inode_large) - EXT2_GOOD_OLD_INODE_SIZE, which is now 32. Using 28 as extra_isize means we just miss out on saving the project_id in the inode, as it sits at the very end of ext2_inode_large.
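            As a minimal sketch of the layout described above (the field list is taken from struct ext4_inode in the Linux kernel sources; this is an illustration, not Lustre code), the extra inode fields and the extra_isize needed to cover each of them can be tallied like this:

```python
# Extra fields of the on-disk ext4/ldiskfs inode, in declaration order,
# with their sizes in bytes (per struct ext4_inode in the Linux kernel).
# Offsets are relative to the end of the classic 128-byte inode.
EXTRA_FIELDS = [
    ("i_extra_isize", 2),
    ("i_checksum_hi", 2),
    ("i_ctime_extra", 4),
    ("i_mtime_extra", 4),
    ("i_atime_extra", 4),
    ("i_crtime", 4),
    ("i_crtime_extra", 4),
    ("i_version_hi", 4),
    ("i_projid", 4),  # the very last field: needs extra_isize >= 32
]

def fields_covered(extra_isize):
    """Return the extra fields that fit entirely within extra_isize bytes."""
    covered, offset = [], 0
    for name, size in EXTRA_FIELDS:
        if offset + size > extra_isize:
            break
        covered.append(name)
        offset += size
    return covered

# extra_isize = 28 covers everything up to and including i_version_hi
# (which ends at offset 28), so i_projid is cut off and the project ID
# cannot be stored; extra_isize = 32 includes it.
assert "i_projid" not in fields_covered(28)
assert "i_projid" in fields_covered(32)
```

            This matches the dumpe2fs output quoted above: with "Required extra isize: 28" the inode has room for everything except the trailing project ID field.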

            The images for 2.7+ are all ok.

            So to fix this I think we need to set the new extra_isize when turning on project quota in tune2fs, and then run e2fsck to expand the i_size for every inode in use. Rather than doing that, I would prefer to just port the LU-10215 patch ("tests: remove disk2_4 disk2_5 images") to b2_15.


            adilger Andreas Dilger added a comment -

            If you can confirm that this test is only having problems with an upgrade from a 2.4 filesystem that doesn't have larger MDT or OST inodes, then I don't think it is a real concern for us. I was only worried that it might also have some impact on newer systems.
            dongyang Dongyang Li added a comment -

            log from mdt:

            [23670.629105] LustreError: 516086:0:(osd_handler.c:3151:osd_quota_transfer()) t32fs-MDT0000: quota transfer failed. Is project enforcement enabled on the ldiskfs filesystem? rc = -75
            [23675.572899] LustreError: 514936:0:(osd_handler.c:3151:osd_quota_transfer()) t32fs-MDT0000: quota transfer failed. Is project enforcement enabled on the ldiskfs filesystem? rc = -75
            [23675.864983] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_32d: @@@@@@ FAIL: set project failed 
            [23676.097944] Lustre: DEBUG MARKER: conf-sanity test_32d: @@@@@@ FAIL: set project failed
            

            75 is EOVERFLOW, looks like we failed to expand isize? checking the details.
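            As a quick cross-check (assuming a Linux/glibc host), errno 75 is indeed EOVERFLOW, and its strerror text is exactly the "Value too large for defined data type" message that lfs printed for each failed xattr:

```python
import errno
import os

# rc = -75 in the osd_handler.c log corresponds to -EOVERFLOW on Linux;
# os.strerror() returns the same text lfs printed for each xattr failure.
assert errno.EOVERFLOW == 75
print(os.strerror(errno.EOVERFLOW))  # "Value too large for defined data type" on glibc
```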

            pjones Peter Jones added a comment -

            I'm flagging the fix version as 2.15.5 since you've indicated that it warrants investigation, but based on the comments I am not sure whether that is justified - there should be no expectation of upgrading from something as old as 2.4; even 2.10 would be a push...


            adilger Andreas Dilger added a comment -

            Hi Dongyang, could you please take a closer look into this? I wonder if something in the new el8 kernel ext4 is causing this to fail. I'm not so much worried about the Lustre 2.4 MDT upgrade, but possibly it could affect newer versions, since this subtest is only run with project_upgrade=yes for this kernel version and then exits, so it may be skipping other tests.

            adilger Andreas Dilger added a comment -

            This looks like potentially a real bug, but it is only affecting upgrades from 2.4 MDT images, so I'm not sure how critical it is?

            There have been only 4 timeouts in the past 6 months, and 3 of them were in the past week on b2_15 testing, so it seems possible that something which landed to b2_15 is causing a regression in this test? The servers are either (once) el8.9 or (twice) el8.10, so there may be some issue with the xattr format or projid values being stored by ldiskfs.

            People

              Assignee: Dongyang Li
              Reporter: Maloo
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: