[LU-13543] hsm.actions file is broken on RHEL 8.2 Created: 10/May/20 Updated: 03/Dec/20 Resolved: 03/Dec/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0 |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Jian Yu | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: |
RHEL 8.2 |
||
| Severity: | 3 |
| Description |
|
This issue was created by maloo for jianyu <yujian@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/5690066f-e39c-4964-86b6-ceb2f961126c

test_90 failed with the following error:

CMD: trevis-66vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | egrep 'WAITING|STARTED'
pdsh@trevis-66vm1: trevis-66vm4: ssh exited with exit code 1
CMD: trevis-66vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | egrep 'WAITING|STARTED'
pdsh@trevis-66vm1: trevis-66vm4: ssh exited with exit code 1
Cannot send HSM request (use of /mnt/lustre/d90.sanity-hsm/f90.sanity-hsm.1): Operation not permitted
sanity-hsm test_90: @@@@@@ FAIL: cannot release a file list

https://testing.whamcloud.com/test_sets/6badc2b6-c8a6-40db-a740-205913d2f371

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
| Comments |
| Comment by Jian Yu [ 10/May/20 ] |
|
The following error message is RHEL 8.2 specific:

Cannot send HSM request (use of /mnt/lustre/d90.sanity-hsm/f90.sanity-hsm.1): Operation not permitted |
| Comment by Jian Yu [ 14/May/20 ] |
|
Hi Minh,

pdsh@trevis-66vm1: trevis-66vm4: ssh exited with exit code 1

In lustre-initialization, MY_PDSH was defined as:

MY_PDSH='pdsh -t 120 -S -Rssh -w'

Do you know if the ssh connection is reliable in the RHEL 8.2 test environment? |
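One possible reading of the "ssh exited with exit code 1" lines, offered as an assumption rather than a diagnosis: grep-family tools exit with status 1 when no line matches, and pdsh's -S option propagates the remote command's exit status. In that case the message may only mean that hsm.actions showed no WAITING or STARTED entries, not that ssh itself failed. A minimal Python sketch of the same filter follows; the helper names and the sample records are hypothetical and abbreviated, not real hsm.actions output.

```python
import re

# Same alternation the test passes to egrep.
ACTIVE = re.compile(r"WAITING|STARTED")

def active_actions(actions_text):
    """Return the lines that egrep 'WAITING|STARTED' would print."""
    return [line for line in actions_text.splitlines() if ACTIVE.search(line)]

def egrep_exit_status(actions_text):
    """Mirror grep's convention: 0 if any line matched, 1 if none did."""
    return 0 if active_actions(actions_text) else 1

# Hypothetical, abbreviated records; real hsm.actions lines carry more fields.
sample = "\n".join([
    "fid=[0x1] action=ARCHIVE status=WAITING",
    "fid=[0x2] action=ARCHIVE status=STARTED",
    "fid=[0x3] action=ARCHIVE status=SUCCEED",
])

print(len(active_actions(sample)))  # 2
print(egrep_exit_status(""))        # 1: empty input gives the "exit code 1" case
```

Note that an empty hsm.actions file and an idle coordinator look identical through this filter, so the exit status alone cannot distinguish them.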
| Comment by Ben Evans (Inactive) [ 19/May/20 ] |
|
I think this is either the ssh issue mentioned above, or a timing issue. I was able to run the test by hand with no issues. |
| Comment by Jian Yu [ 20/May/20 ] |
|
Hi Ben, |
| Comment by Ben Evans (Inactive) [ 20/May/20 ] |
|
Yes, I'm working on that now. |
| Comment by John Hammond [ 20/May/20 ] |
|
These tests (90, 104, 260c) consistently:
Test 52 consistently passes with 8.1/8.1, but failed 1 out of 4 times with 8.1/8.2. |
| Comment by John Hammond [ 22/May/20 ] |
|
This is likely due to the inclusion of https://github.com/torvalds/linux/commit/1f4aace60b0edc2d885aaa263abf4df42c8c65a8 in el8.2. Reading from the debugfs file directly while using Lustre trace and strace shows that hsm_actions_show_cb() is called many times, yet read() returns 0. When the log holds many records, read() returns none of them. |
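The check described above (watching what read() actually returns) can be approximated without strace. The following is a minimal Python sketch, not the procedure used in this ticket: the helper name drain() is hypothetical, and the path of the hsm.actions debugfs file is not given here, so the demo runs against a temporary file instead.

```python
import os
import tempfile

def drain(path, bufsize=4096):
    """Read path with raw read() calls; return (total_bytes, read_calls)."""
    total = 0
    calls = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, bufsize)
            calls += 1
            if not chunk:  # read() returned 0: end of file as the caller sees it
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total, calls

# Demo on an ordinary temporary file standing in for the real debugfs file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10000)
    demo_path = tmp.name
demo_result = drain(demo_path)
os.unlink(demo_path)
print(demo_result)
```

On a node showing the symptom, a result of zero total bytes after a single read() call, while the log is known to contain records, would match the behavior reported above.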
| Comment by Gerrit Updater [ 25/Nov/20 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40757 |
| Comment by Gerrit Updater [ 03/Dec/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40757/ |
| Comment by Peter Jones [ 03/Dec/20 ] |
|
Landed for 2.14 |