hsm.actions file is broken on RHEL 8.2

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.14.0
    • Lustre 2.14.0
    • None
    • RHEL 8.2
    • 3

    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/5690066f-e39c-4964-86b6-ceb2f961126c

      test_90 failed with the following error:

      CMD: trevis-66vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | egrep 'WAITING|STARTED'
      pdsh@trevis-66vm1: trevis-66vm4: ssh exited with exit code 1
      CMD: trevis-66vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | egrep 'WAITING|STARTED'
      pdsh@trevis-66vm1: trevis-66vm4: ssh exited with exit code 1
      Cannot send HSM request (use of /mnt/lustre/d90.sanity-hsm/f90.sanity-hsm.1): Operation not permitted
       sanity-hsm test_90: @@@@@@ FAIL: cannot release a file list 
      

      https://testing.whamcloud.com/test_sets/6badc2b6-c8a6-40db-a740-205913d2f371

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-hsm test_90 - cannot release a file list

          Activity

            [LU-13543] hsm.actions file is broken on RHEL 8.2
            pjones Peter Jones added a comment -

            Landed for 2.14

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40757/
            Subject: LU-13543 lustre: update *pos in seq_file .next functions
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d5d0ff24a84f64e5196341f5ce946952d7fff8b7

            gerrit Gerrit Updater added a comment -

            Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40757
            Subject: LU-13543 lustre: update *pos in seq_file .next functions
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 31b0a733dd6bc6eae6bd8b1a606da7e6d274c910
            jhammond John Hammond added a comment -

            This is likely due to the inclusion of https://github.com/torvalds/linux/commit/1f4aace60b0edc2d885aaa263abf4df42c8c65a8 in el8.2. Reading from the debugfs file directly while using Lustre trace and strace shows that hsm_actions_show_cb() is called many times, yet read() returns 0. When the log contains many records, read() returns none of them.
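
            The subject of the patch linked in this ticket ("update *pos in seq_file .next functions") suggests that the reader relies on each .next callback to advance the position itself. The sketch below is a simplified userspace model of that contract, not the kernel's seq_read() and not Lustre's code. All names in it (make_next, read_chunk, read_all) are hypothetical, and the symptom it produces (the reader restarting from a stale position) is only an illustration; it is not claimed to reproduce the read() == 0 behaviour described above.

```python
from typing import Callable, List, Optional, Tuple

# A .next-like callback: takes (cursor, pos), returns (next cursor or None, new pos).
NextCb = Callable[[int, int], Tuple[Optional[int], int]]


def make_next(n_items: int, advance_pos: bool) -> NextCb:
    """Build a .next-like callback over n_items records.

    advance_pos=False models a callback that moves its own cursor but
    leaves the caller-visible position untouched (the assumed bug).
    """
    def next_cb(cursor: int, pos: int) -> Tuple[Optional[int], int]:
        new_pos = pos + 1 if advance_pos else pos
        nxt = cursor + 1
        return (nxt if nxt < n_items else None), new_pos
    return next_cb


def read_chunk(items: List[str], next_cb: NextCb, pos: int,
               max_records: int) -> Tuple[List[str], int]:
    """One read()-like call: emit up to max_records records starting at pos.

    The reader never increments pos itself; it trusts next_cb to do so.
    """
    out: List[str] = []
    cursor = pos if pos < len(items) else None  # plays the role of .start
    while cursor is not None and len(out) < max_records:
        out.append(items[cursor])               # plays the role of .show
        cursor, pos = next_cb(cursor, pos)
    return out, pos


def read_all(items: List[str], next_cb: NextCb,
             max_records: int = 2, max_calls: int = 10) -> List[str]:
    """Repeat read_chunk until it returns nothing or max_calls is reached."""
    seen: List[str] = []
    pos = 0
    for _ in range(max_calls):
        chunk, pos = read_chunk(items, next_cb, pos, max_records)
        if not chunk:
            break
        seen.extend(chunk)
    return seen


records = ["rec0", "rec1", "rec2"]
# Position advanced by next_cb: every record is returned exactly once.
print(read_all(records, make_next(len(records), advance_pos=True)))
# Position left stale: each call restarts at 0 until max_calls stops the loop.
print(len(read_all(records, make_next(len(records), advance_pos=False))))
```

            In this model the only difference between the two runs is whether the callback writes back an updated position, which is the kind of change the patch subject describes.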

            jhammond John Hammond added a comment -

            These tests (90, 104, 260c) consistently:

            1. Pass with el8.1 client and el8.1 server: https://review.whamcloud.com/#/c/38675/
            2. Fail with el8.1 client and el8.2 server: https://review.whamcloud.com/#/c/38676/

            Test 52 consistently passes with 8.1/8.1, but failed in 1 of 4 runs with 8.1/8.2.

            beevans Ben Evans (Inactive) added a comment -

            Yes, I'm working on that now.
            yujian Jian Yu added a comment -

            Hi Ben,
            It's likely a timing issue because the whole test session was run with ssh by autotest and only those three sub-tests failed regularly. Could you please improve the test scripts to resolve the issues?

            beevans Ben Evans (Inactive) added a comment -

            I think this is either the ssh issue mentioned above, or a timing issue. I was able to run the test by hand with no issues.

            People

              tappro Mikhail Pershin
              yujian Jian Yu