[LU-12251] sanity-hsm tests fail with ‘Device or resource busy’ Created: 30/Apr/19  Updated: 08/Dec/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.5
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: always_except, ppc, ubuntu18
Environment:

PPC clients


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Many sanity-hsm tests fail with a variety of messages about file release failures, but all share the common error ‘Device or resource busy’ (EBUSY). We see this only in PPC client testing.

sanity-hsm test_1a, 1b, 1d, 12q, 21, 22, 23 and 58 all fail with error messages similar to

Cannot send HSM request (use of /mnt/lustre/d1a.sanity-hsm/f1a.sanity-hsm): Device or resource busy
 sanity-hsm test_1a: @@@@@@ FAIL: could not release file 

sanity-hsm test_12c, 12f, 12g, 12h, 12m, and 12o all fail with error messages similar to

Cannot send HSM request (use of /mnt/lustre/d12f.sanity-hsm/f12f.sanity-hsm): Device or resource busy
 sanity-hsm test_12f: @@@@@@ FAIL: release of /mnt/lustre/d12f.sanity-hsm/f12f.sanity-hsm failed 

sanity-hsm test_12p, 24a, 24e, 24f, and 37 all fail with error messages similar to

Cannot send HSM request (use of /mnt/lustre/d12p.sanity-hsm/f12p.sanity-hsm): Device or resource busy
 sanity-hsm test_12p: @@@@@@ FAIL: cannot release /mnt/lustre/d12p.sanity-hsm/f12p.sanity-hsm 

sanity-hsm test_24b, 30c, and 228 all fail with error messages similar to

Cannot send HSM request (use of /mnt/lustre/d24b.sanity-hsm/f24b.sanity-hsm): Device or resource busy
 sanity-hsm test_24b: @@@@@@ FAIL: hsm flags on /mnt/lustre/d24b.sanity-hsm/f24b.sanity-hsm are 0x00000009 != 0x0000000d 
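
The flag mismatch above can be decoded bit by bit. The following is a minimal sketch, assuming the bit values of enum hsm_states as I recall them from Lustre's lustre_user.h (verify against your tree); the helper name decode_hsm_flags is hypothetical and not part of any Lustre tool.

```python
# Assumed bit values from enum hsm_states in lustre_user.h;
# check them against the Lustre source you are testing.
HSM_STATES = {
    0x01: "HS_EXISTS",
    0x02: "HS_DIRTY",
    0x04: "HS_RELEASED",
    0x08: "HS_ARCHIVED",
    0x10: "HS_NORELEASE",
    0x20: "HS_NOARCHIVE",
    0x40: "HS_LOST",
}


def decode_hsm_flags(flags):
    """Return the names of all HSM state bits set in flags (hypothetical helper)."""
    return [name for bit, name in HSM_STATES.items() if flags & bit]


actual = 0x00000009    # value reported by test_24b
expected = 0x0000000d  # value the test wanted

# Bits the test expected but did not find.
missing = decode_hsm_flags(expected & ~actual)

print("actual:  ", decode_hsm_flags(actual))
print("expected:", decode_hsm_flags(expected))
print("missing: ", missing)
```

Under that assumption the only missing bit is the released flag, which is consistent with the release request itself having been rejected with EBUSY.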

sanity-hsm test_25b fails with

Cannot send HSM request (use of /mnt/lustre/d25b.sanity-hsm/f25b.sanity-hsm): Device or resource busy
0006f7423be4c48158f1a88ef512b3fe  /mnt/lustre/d25b.sanity-hsm/f25b.sanity-hsm
 sanity-hsm test_25b: @@@@@@ FAIL: lost file access should failed (returns 0) 

sanity-hsm test_57 fails with

trevis-77vm2: Cannot send HSM request (use of /mnt/lustre/d57.sanity-hsm/test_archive_remote): Device or resource busy
 sanity-hsm test_57: @@@@@@ FAIL: hsm_release failed 

sanity-hsm test_90 fails with

Cannot send HSM request (use of /mnt/lustre/d90.sanity-hsm/f90.sanity-hsm.1): Device or resource busy
 sanity-hsm test_90: @@@@@@ FAIL: cannot release a file list 
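
All of the excerpts above have the same shape: a "Cannot send HSM request ... Device or resource busy" line followed, possibly after other output, by the test's FAIL line. As a rough sketch for triage, the pairs can be pulled out of a test log like this. The regular expressions are derived only from the lines quoted in this ticket, and the function name find_ebusy_failures is hypothetical.

```python
import re

# Patterns modelled on the log lines quoted in this ticket (an assumption:
# other Lustre versions may word these messages differently).
EBUSY_RE = re.compile(
    r"Cannot send HSM request \(use of (?P<path>[^)]+)\): "
    r"Device or resource busy"
)
FAIL_RE = re.compile(r"sanity-hsm (?P<test>test_\w+): @+ FAIL: (?P<msg>.*\S)")


def find_ebusy_failures(lines):
    """Return (test, path, message) for each FAIL line preceded by an EBUSY line."""
    results = []
    busy_path = None
    for line in lines:
        m = EBUSY_RE.search(line)
        if m:
            busy_path = m.group("path")
            continue
        m = FAIL_RE.search(line)
        if m:
            if busy_path is not None:
                results.append((m.group("test"), busy_path, m.group("msg")))
            # Reset so a later, unrelated FAIL is not attributed to this EBUSY.
            busy_path = None
    return results


sample = [
    "Cannot send HSM request (use of /mnt/lustre/d25b.sanity-hsm/f25b.sanity-hsm): Device or resource busy",
    "0006f7423be4c48158f1a88ef512b3fe  /mnt/lustre/d25b.sanity-hsm/f25b.sanity-hsm",
    " sanity-hsm test_25b: @@@@@@ FAIL: lost file access should failed (returns 0) ",
]
for test, path, msg in find_ebusy_failures(sample):
    print(test, path, msg, sep=" | ")
```

Intermediate output, such as the checksum line in the test_25b excerpt, is skipped because the EBUSY state is kept until the next FAIL line.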

Looking at the logs for a recent failure, https://testing.whamcloud.com/test_sets/82a14510-668f-11e9-8bb1-52540065bddc, there is nothing obviously wrong and there are no error messages in the node console logs. The only thing that seems out of place is that the client 1 (vm1) console log shows OSC reconnect messages for several of these failures:

[  342.773914] Lustre: DEBUG MARKER: == sanity-hsm test 1a: mmap
[  342.854968] Lustre: lustre-OST0003-osc-c00000007461d000: reconnect after 25s idle
[  344.521132] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-hsm test_1a: @@@@@@ FAIL: could not release file 
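
To check whether the reconnect messages really line up with the failing tests, the console log can be scanned for them. This is only a sketch under the assumption that the message format matches the excerpt above; the function name find_idle_reconnects is hypothetical, and whether the reconnect is related to the EBUSY error is not established in this ticket.

```python
import re

# Format taken from the console excerpt above; other kernels or Lustre
# versions may print the message differently.
RECONNECT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Lustre: (?P<dev>\S+): "
    r"reconnect after (?P<idle>\d+)s idle"
)


def find_idle_reconnects(lines):
    """Return (timestamp, device, idle_seconds) for each idle-reconnect message."""
    hits = []
    for line in lines:
        m = RECONNECT_RE.search(line)
        if m:
            hits.append((float(m.group("ts")), m.group("dev"), int(m.group("idle"))))
    return hits


console = [
    "[  342.773914] Lustre: DEBUG MARKER: == sanity-hsm test 1a: mmap",
    "[  342.854968] Lustre: lustre-OST0003-osc-c00000007461d000: reconnect after 25s idle",
    "[  344.521132] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-hsm test_1a: @@@@@@ FAIL: could not release file ",
]
for ts, dev, idle in find_idle_reconnects(console):
    print(f"{ts:.6f} {dev} idle={idle}s")
```

Comparing the returned timestamps with the DEBUG MARKER lines for each test would show which failures have a reconnect between test start and the FAIL marker.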

Logs for other failures are at
https://testing.whamcloud.com/test_sets/0352f31e-6322-11e9-8bb1-52540065bddc
https://testing.whamcloud.com/test_sets/00dc12be-65bb-11e9-a6f9-52540065bddc



 Comments   
Comment by James Nunez (Inactive) [ 31/Jan/20 ]

We also see this for sanity-flr test_0b on PPC: https://testing.whamcloud.com/test_sets/b93e0cf0-428c-11ea-b083-52540065bddc

Comment by Gerrit Updater [ 13/Feb/20 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37563
Subject: LU-12251 tests: skip sanity-pfl tests for PPC
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: cfa54b728a732041ea77f4c126d3cd60913c8896

Comment by Gerrit Updater [ 25/Feb/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37563/
Subject: LU-12251 tests: stop running sanity-flr for PPC
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d59f1b03eee0f908a99a6ea80642685fcb621974

Comment by Gerrit Updater [ 08/Dec/22 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49348
Subject: LU-12251 tests: re-enable running sanity-flr for PPC
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e573755a28d732c68714c09ba43b3f3f24d8a8ab
