Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.16.0
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/a42b9a1c-b8c9-41bc-b901-affadd750546

      test_49 failed with the following error:

      get all grp quota: 20000 / 5 seconds
      CMD: onyx-134vm11 lctl set_param -n os[cd]*.*MDT*.force_sync=1
      CMD: onyx-134vm1 lctl set_param -n osd*.*OS*.force_sync=1
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master/4587 - 4.18.0-477.27.1.el8_8.ppc64le
      servers: https://build.whamcloud.com/job/lustre-master/4587 - 4.18.0-477.27.1.el8_lustre.x86_64

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-quota test_49 - test_49 returned 139

      Attachments

        Issue Links

          Activity

            [LU-18394] sanity-quota test_49: returned 139
            pjones Peter Jones added a comment -

            Merged for 2.16

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56820/
            Subject: LU-18394 test: adjust step for different OS Arch
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e7d63451eed447187dde0c672a65c1504ff24a07

            "Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56820
            Subject: LU-18394 test: adjust step as per OS Arch
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 49846809d726415548103faeabdabe3e6cc8492a

            hongchao.zhang Hongchao Zhang added a comment - - edited

            https://testing.whamcloud.com/test_sets/04a76d46-7b8f-42d4-8521-7e393bfde9e7
            The issue can be reproduced on Maloo. It is most likely caused by Bash's "eval" having to process a large amount of data on PPC64le.

            test_get_allquota() {
                    ...
                    eval $($LFS quota -a -s $start_qid -e $end_qid -u $MOUNT |
                        awk 'NR > 2 {printf("u_blimits[%d]=%d;u_ilimits[%d]=%d; \
                             u_busage2[%d]=%d;u_iusage2[%d]=%d;", \
                             NR, $5, NR, $9, NR, $3, NR, $7)}')
                    ...
            }
            

            The test fails if the number of quota entries retrieved by LFS is 5000 (SLOW=yes); there is no failure if the count is 1000 (SLOW=no).

            https://testing.whamcloud.com/test_sets/60e90159-3d13-49bc-9eb8-2653809b0d55
            The debug patch passed with count set to 1000.
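
            For illustration only, one way to avoid handing a single very large string to "eval" is to fill the arrays from a "while read" loop. This is a minimal sketch, not the merged fix (which adjusts the step per architecture): fake_lfs_quota and load_quota_range are hypothetical names, and the column layout is an assumption chosen so that fields 3, 5, 7 and 9 line up with the awk snippet above.

```shell
#!/bin/bash
# Hypothetical stand-in for "$LFS quota -a -s $start_qid -e $end_qid -u $MOUNT".
# Prints two header lines, then one row per ID. The column layout is an
# ASSUMPTION: $3=block usage, $5=block limit, $7=inode usage, $9=inode limit.
fake_lfs_quota() {
    local start=$1 end=$2 qid
    echo "header line 1"
    echo "header line 2"
    for ((qid = start; qid <= end; qid++)); do
        echo "quota_usr$qid /mnt/lustre $((qid * 2)) 0 $((qid * 10)) - $((qid * 3)) 0 $((qid * 20)) -"
    done
}

declare -a u_blimits u_ilimits u_busage2 u_iusage2

# Populate the arrays row by row instead of building one huge eval string.
load_quota_range() {
    local start=$1 end=$2 nr=0
    local name fs busage bquota blimit bgrace iusage iquota ilimit rest
    while read -r name fs busage bquota blimit bgrace iusage iquota ilimit rest; do
        nr=$((nr + 1))
        ((nr > 2)) || continue          # skip the headers, like awk 'NR > 2'
        u_blimits[nr]=$blimit
        u_ilimits[nr]=$ilimit
        u_busage2[nr]=$busage
        u_iusage2[nr]=$iusage
    done < <(fake_lfs_quota "$start" "$end")
}

load_quota_range 1 5000
echo "entries loaded: ${#u_blimits[@]}"
```

            The process substitution keeps the loop in the current shell, so the arrays remain visible to the caller, and each assignment is parsed on its own rather than as part of one long command line.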

            "Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56799
            Subject: LU-18394 test: debug patch
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 5683cc340ca12b95a49a013ec0b5f8a48524bc54

            hongchao.zhang Hongchao Zhang added a comment - - edited

            So far, all the failures have been on the PPC64 client (RHEL8.8, 4.18.0-477.27.1.el8_8.ppc64le) since
            the related patch (https://review.whamcloud.com/#/c/42098/) was merged on 2024-03-23.
            It seems Bash on PPC64 has some issue?

            [46585.935966] bash[2530402]: bad frame in setup_rt_frame: 00007fffd09cf0e0 nip 000000010578c948 lr 0000000105750d18
            [46585.945110] systemd-coredump[2571027]: Not enough arguments passed by the kernel (0, expected 7).
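
            The "returned 139" status in the issue title is consistent with bash being killed by a signal: shells report 128 plus the signal number, and 139 - 128 = 11, which is SIGSEGV on Linux. A small sketch for decoding such a status (decode_status is a hypothetical helper, not part of the test framework):

```shell
#!/bin/bash
# Hypothetical helper: translate a shell exit status into a signal name
# when the status is above 128, otherwise report it as a plain exit code.
decode_status() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        kill -l "$((status - 128))"     # signal number -> name
    else
        echo "exit:$status"
    fi
}

decode_status 139    # prints SEGV on Linux
```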
            
            lixi_wc Li Xi added a comment -

            hongchao.zhang Would you please check this issue?

            yujian Jian Yu added a comment -

            The failure occurred 31 times in the past 6 months.

            People

              hongchao.zhang Hongchao Zhang
              maloo Maloo