Details
Bug
Resolution: Unresolved
Minor
None
Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.5
PPC clients
3
9223372036854775807
Description
Many sanity-hsm tests fail with a variety of messages about being unable to release a file, but all of them share the common error message ‘Device or resource busy’. We see this only in PPC client testing.
sanity-hsm test_1a, 1b, 1d, 12q, 21, 22, 23 and 58 all fail with error messages similar to:
Cannot send HSM request (use of /mnt/lustre/d1a.sanity-hsm/f1a.sanity-hsm): Device or resource busy
sanity-hsm test_1a: @@@@@@ FAIL: could not release file
sanity-hsm test_12c, 12f, 12g, 12h, 12m, and 12o all fail with error messages similar to:
Cannot send HSM request (use of /mnt/lustre/d12f.sanity-hsm/f12f.sanity-hsm): Device or resource busy
sanity-hsm test_12f: @@@@@@ FAIL: release of /mnt/lustre/d12f.sanity-hsm/f12f.sanity-hsm failed
sanity-hsm test_12p, 24a, 24e, 24f, and 37 all fail with error messages similar to:
Cannot send HSM request (use of /mnt/lustre/d12p.sanity-hsm/f12p.sanity-hsm): Device or resource busy
sanity-hsm test_12p: @@@@@@ FAIL: cannot release /mnt/lustre/d12p.sanity-hsm/f12p.sanity-hsm
sanity-hsm test_24b, 30c, and 228 all fail with error messages similar to:
Cannot send HSM request (use of /mnt/lustre/d24b.sanity-hsm/f24b.sanity-hsm): Device or resource busy
sanity-hsm test_24b: @@@@@@ FAIL: hsm flags on /mnt/lustre/d24b.sanity-hsm/f24b.sanity-hsm are 0x00000009 != 0x0000000d
sanity-hsm test_25b fails with:
Cannot send HSM request (use of /mnt/lustre/d25b.sanity-hsm/f25b.sanity-hsm): Device or resource busy
0006f7423be4c48158f1a88ef512b3fe /mnt/lustre/d25b.sanity-hsm/f25b.sanity-hsm
sanity-hsm test_25b: @@@@@@ FAIL: lost file access should failed (returns 0)
sanity-hsm test_57 fails with:
trevis-77vm2: Cannot send HSM request (use of /mnt/lustre/d57.sanity-hsm/test_archive_remote): Device or resource busy
sanity-hsm test_57: @@@@@@ FAIL: hsm_release failed
sanity-hsm test_90 fails with:
Cannot send HSM request (use of /mnt/lustre/d90.sanity-hsm/f90.sanity-hsm.1): Device or resource busy
sanity-hsm test_90: @@@@@@ FAIL: cannot release a file list
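For triage, the failures listed above can be grouped by their FAIL message. The following is only a rough sketch, not part of the test suite: the function name `group_failures`, the `<file>` placeholder, and the assumption that each FAIL line follows (or shares a line with) its ‘Device or resource busy’ error are all illustrative choices based on the excerpts quoted in this ticket.

```python
import re
from collections import defaultdict

# Failure line emitted by the test framework, as quoted in this ticket, e.g.
# "sanity-hsm test_1a: @@@@@@ FAIL: could not release file"
FAIL_RE = re.compile(r"sanity-hsm (test_\w+): @+ FAIL: (.*)")
EBUSY_TEXT = "Device or resource busy"


def group_failures(log_text):
    """Map each normalized FAIL message to the tests that hit it after an EBUSY error."""
    groups = defaultdict(list)
    saw_ebusy = False
    for line in log_text.splitlines():
        if EBUSY_TEXT in line:
            saw_ebusy = True
        match = FAIL_RE.search(line)
        if match:
            if saw_ebusy:
                # Replace per-test file paths so similar messages group together.
                message = re.sub(r"/mnt/lustre/\S+", "<file>", match.group(2).strip())
                groups[message].append(match.group(1))
            saw_ebusy = False
    return dict(groups)


sample = """\
Cannot send HSM request (use of /mnt/lustre/d1a.sanity-hsm/f1a.sanity-hsm): Device or resource busy
sanity-hsm test_1a: @@@@@@ FAIL: could not release file
Cannot send HSM request (use of /mnt/lustre/d12f.sanity-hsm/f12f.sanity-hsm): Device or resource busy
sanity-hsm test_12f: @@@@@@ FAIL: release of /mnt/lustre/d12f.sanity-hsm/f12f.sanity-hsm failed
"""

for message, tests in group_failures(sample).items():
    print(message, tests)
```

Failures that are not preceded by the busy error are skipped, so the output only covers the tests affected by this issue.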
Looking at the logs for a recent failure, https://testing.whamcloud.com/test_sets/82a14510-668f-11e9-8bb1-52540065bddc, there are no error messages or anything else obviously wrong in the node console logs. The only thing that seems out of place is that the client 1 (vm1) console log shows OSC reconnect messages for several of these failures:
[ 342.773914] Lustre: DEBUG MARKER: == sanity-hsm test 1a: mmap
[ 342.854968] Lustre: lustre-OST0003-osc-c00000007461d000: reconnect after 25s idle
[ 344.521132] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-hsm test_1a: @@@@@@ FAIL: could not release file
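To check how often the reconnect message lines up with a failure, the console log can be scanned for it. This is a minimal sketch that assumes the message format matches the excerpt above; `find_idle_reconnects` is a hypothetical helper name, not an existing tool.

```python
import re

# Matches console lines such as
# "[ 342.854968] Lustre: lustre-OST0003-osc-c00000007461d000: reconnect after 25s idle"
RECONNECT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+Lustre:\s+(\S+):\s+reconnect after (\d+)s idle"
)


def find_idle_reconnects(console_log):
    """Return (timestamp, device, idle_seconds) for each OSC idle reconnect."""
    return [
        (float(ts), device, int(idle))
        for ts, device, idle in RECONNECT_RE.findall(console_log)
    ]


console = """\
[ 342.773914] Lustre: DEBUG MARKER: == sanity-hsm test 1a: mmap
[ 342.854968] Lustre: lustre-OST0003-osc-c00000007461d000: reconnect after 25s idle
[ 344.521132] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-hsm test_1a: @@@@@@ FAIL: could not release file
"""

for ts, device, idle in find_idle_reconnects(console):
    print(f"{ts:.6f} {device} idle={idle}s")
```

Comparing these timestamps with the DEBUG MARKER lines shows whether a reconnect happened between the start of a test and its failure.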
Logs for other failures are at
https://testing.whamcloud.com/test_sets/0352f31e-6322-11e9-8bb1-52540065bddc
https://testing.whamcloud.com/test_sets/00dc12be-65bb-11e9-a6f9-52540065bddc