[LU-12251] sanity-hsm tests fail with ‘Device or resource busy’ Created: 30/Apr/19 Updated: 08/Dec/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | always_except, ppc, ubuntu18 | ||
| Environment: |
PPC clients |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Many sanity-hsm tests fail with a variety of failure messages about file release failures, but all share the common error message ‘Device or resource busy’. We see this for PPC client testing only.

sanity-hsm test_1a, 1b, 1d, 12q, 21, 22, 23 and 58 all fail with error messages similar to

Cannot send HSM request (use of /mnt/lustre/d1a.sanity-hsm/f1a.sanity-hsm): Device or resource busy
sanity-hsm test_1a: @@@@@@ FAIL: could not release file

sanity-hsm test_12c, 12f, 12g, 12h, 12m, and 12o all fail with error messages similar to

Cannot send HSM request (use of /mnt/lustre/d12f.sanity-hsm/f12f.sanity-hsm): Device or resource busy
sanity-hsm test_12f: @@@@@@ FAIL: release of /mnt/lustre/d12f.sanity-hsm/f12f.sanity-hsm failed

sanity-hsm test_12p, 24a, 24e, 24f, and 37 all fail with error messages similar to

Cannot send HSM request (use of /mnt/lustre/d12p.sanity-hsm/f12p.sanity-hsm): Device or resource busy
sanity-hsm test_12p: @@@@@@ FAIL: cannot release /mnt/lustre/d12p.sanity-hsm/f12p.sanity-hsm

sanity-hsm test_24b, 30c, and 228 all fail with error messages similar to

Cannot send HSM request (use of /mnt/lustre/d24b.sanity-hsm/f24b.sanity-hsm): Device or resource busy
sanity-hsm test_24b: @@@@@@ FAIL: hsm flags on /mnt/lustre/d24b.sanity-hsm/f24b.sanity-hsm are 0x00000009 != 0x0000000d

sanity-hsm test_25b fails with

Cannot send HSM request (use of /mnt/lustre/d25b.sanity-hsm/f25b.sanity-hsm): Device or resource busy
0006f7423be4c48158f1a88ef512b3fe /mnt/lustre/d25b.sanity-hsm/f25b.sanity-hsm
sanity-hsm test_25b: @@@@@@ FAIL: lost file access should failed (returns 0)

sanity-hsm test_57 fails with

trevis-77vm2: Cannot send HSM request (use of /mnt/lustre/d57.sanity-hsm/test_archive_remote): Device or resource busy
sanity-hsm test_57: @@@@@@ FAIL: hsm_release failed

sanity-hsm test_90 fails with

Cannot send HSM request (use of /mnt/lustre/d90.sanity-hsm/f90.sanity-hsm.1): Device or resource busy
sanity-hsm test_90: @@@@@@ FAIL: cannot release a file list

Looking at the logs for a recent failure, https://testing.whamcloud.com/test_sets/82a14510-668f-11e9-8bb1-52540065bddc, there is nothing obviously wrong and there are no error messages in the node console logs. The only thing that seems out of place is that, in the client 1 (vm1) console log, we see OSC reconnect messages for several of these failures:

[ 342.773914] Lustre: DEBUG MARKER: == sanity-hsm test 1a: mmap
[ 342.854968] Lustre: lustre-OST0003-osc-c00000007461d000: reconnect after 25s idle
[ 344.521132] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-hsm test_1a: @@@@@@ FAIL: could not release file

Logs for other failures are at |
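The test_24b failure in the description compares two HSM flag words, 0x00000009 and 0x0000000d. As a reading aid, the sketch below decodes such a word into state names. The bit values follow the `enum hsm_states` definitions in Lustre's `lustre_user.h` as far as I know them; the label strings and the helper names `decode_hsm_flags` and `missing_states` are hypothetical and not part of the test suite.

```python
import errno
import os

# HSM state bits, assumed to match enum hsm_states in lustre_user.h.
# The label strings are illustrative names for each bit.
HSM_STATES = {
    0x01: "exists",
    0x02: "dirty",
    0x04: "released",
    0x08: "archived",
    0x10: "never_release",
    0x20: "never_archive",
    0x40: "lost_from_hsm",
}


def decode_hsm_flags(flags):
    """Return the names of all state bits set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(HSM_STATES.items()) if flags & bit]


def missing_states(actual, expected):
    """Return the states that `expected` has set but `actual` does not."""
    return decode_hsm_flags(expected & ~actual)


if __name__ == "__main__":
    actual, expected = 0x00000009, 0x0000000D
    print("actual:  ", decode_hsm_flags(actual))
    print("expected:", decode_hsm_flags(expected))
    print("missing: ", missing_states(actual, expected))
    # The text of the common error is the C library's message for EBUSY;
    # the exact wording depends on the platform.
    print(errno.errorcode[errno.EBUSY], "->", os.strerror(errno.EBUSY))
```

Under these assumptions, the only bit missing from the actual flags is the released bit. That is consistent with the release request having been rejected with EBUSY while the file stayed archived on the filesystem.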
| Comments |
| Comment by James Nunez (Inactive) [ 31/Jan/20 ] |
|
We also see this failure in sanity-flr test_0b on PPC: https://testing.whamcloud.com/test_sets/b93e0cf0-428c-11ea-b083-52540065bddc |
| Comment by Gerrit Updater [ 13/Feb/20 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37563 |
| Comment by Gerrit Updater [ 25/Feb/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37563/ |
| Comment by Gerrit Updater [ 08/Dec/22 ] |
|
"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49348 |