[LU-11788] sanity test 104a fails with ‘lfs df failed’ Created: 15/Dec/18 Updated: 24/Aug/23 Resolved: 24/Aug/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Xinliang Liu |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | arm |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
sanity test_104a fails with ‘lfs df failed’ for ARM clients. We’ve seen this only once in the past four months: https://testing.whamcloud.com/test_sets/ec527b2c-fdef-11e8-b837-52540065bddc . After test 104a fails, a series of other tests fail (107, 118k, 118i, 119c, 119d, 120a, 123a, 124a, 124b, 129, 130a/b/d/e, 131a/d/e, 133a/b/c/d), and test 133g hangs.

It’s clear from the suite_log that something is wrong with some of the MDTs/MDSs at the beginning of the test:

== sanity test 104a: lfs df [-ih] [path] test ======================================================== 19:18:31 (1544469511)
UUID 1K-blocks Used Available Use% Mounted on
lustre-MDT0000_UUID 1165900 22572 1040132 2% /mnt/lustre[MDT:0]
lustre-MDT0001_UUID : Input/output error
lustre-MDT0002_UUID : Input/output error
lustre-MDT0003_UUID : Input/output error
lustre-OST0000_UUID 1933276 34688 1777348 2% /mnt/lustre[OST:0]
lustre-OST0001_UUID 1933276 45700 1766336 3% /mnt/lustre[OST:1]
lustre-OST0002_UUID 1933276 37288 1774748 2% /mnt/lustre[OST:2]
lustre-OST0003_UUID 1933276 31024 1781012 2% /mnt/lustre[OST:3]
lustre-OST0004_UUID 1933276 30168 1781868 2% /mnt/lustre[OST:4]
lustre-OST0005_UUID 1933276 40068 1771968 2% /mnt/lustre[OST:5]
lustre-OST0006_UUID 1933276 41116 1770920 2% /mnt/lustre[OST:6]
lustre-OST0007_UUID 1933276 32700 1779336 2% /mnt/lustre[OST:7]
filesystem_summary: 15466208 292752 14203536 2% /mnt/lustre
sanity test_104a: @@@@@@ FAIL: lfs df failed

So it’s no surprise that ‘lfs df’ failed. sanity test 104a does deactivate an OST and should expect to see that the OST is in a ‘FULL’ state, but in this test session log we see a "Connection restored" message from the MDTs.

From MDS2, 4 (vm5), we see:

[ 8408.485447] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 104a: lfs df [-ih] [path] test ======================================================== 19:18:31 \(1544469511\)
[ 8409.091430] Lustre: DEBUG MARKER: == sanity test 104a: lfs df [-ih] [path] test ======================================================== 19:18:31 (1544469511)
[ 8409.525890] Lustre: lustre-MDT0001: Connection restored to 310dfc65-ad7d-537f-6815-c0fd7f0fb43b (at 10.9.8.38@tcp)
[ 8409.940081] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_104a: @@@@@@ FAIL: lfs df failed

with a similar message from MDS1, 3 (vm4):

[ 8408.748317] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 104a: lfs df [-ih] [path] test ======================================================== 19:18:31 \(1544469511\)
[ 8409.370355] Lustre: DEBUG MARKER: == sanity test 104a: lfs df [-ih] [path] test ======================================================== 19:18:31 (1544469511)
[ 8409.831727] Lustre: lustre-MDT0002: Connection restored to 56904d5f-959b-023e-bc98-099190cbfba6 (at 10.9.8.38@tcp)
[ 8409.832746] Lustre: Skipped 2 previous similar messages
[ 8410.186046] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_104a: @@@@@@ FAIL: lfs df failed |
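For anyone triaging a similar failure, the per-target errors in the ‘lfs df’ output above can be picked out mechanically. The following is only a rough sketch, not part of the Lustre test suite: the function name `failed_targets` and the regular expression are illustrative assumptions, and the line format is inferred solely from the suite_log excerpt quoted in this ticket (a target name ending in `_UUID`, followed by a colon and an error string).

```python
import re

# Abbreviated sample copied from the suite_log quoted in this ticket.
SAMPLE = """\
UUID 1K-blocks Used Available Use% Mounted on
lustre-MDT0000_UUID 1165900 22572 1040132 2% /mnt/lustre[MDT:0]
lustre-MDT0001_UUID : Input/output error
lustre-MDT0002_UUID : Input/output error
lustre-MDT0003_UUID : Input/output error
lustre-OST0000_UUID 1933276 34688 1777348 2% /mnt/lustre[OST:0]
filesystem_summary: 15466208 292752 14203536 2% /mnt/lustre
"""

# Assumption: a target that could not be queried is printed as
# "<name>_UUID : <error text>" instead of a row of usage numbers.
ERROR_LINE = re.compile(r"^(?P<uuid>\S+_UUID)\s*:\s*(?P<error>.+)$")


def failed_targets(df_output):
    """Return {target UUID: error text} for every target reporting an error."""
    failures = {}
    for line in df_output.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            failures[match.group("uuid")] = match.group("error").strip()
    return failures


if __name__ == "__main__":
    for uuid, error in failed_targets(SAMPLE).items():
        print(f"{uuid}: {error}")
```

Run against the excerpt, this flags MDT0001 through MDT0003 and none of the OSTs, which matches the observation that the MDTs, not the deactivated OST, were the targets in trouble.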
| Comments |
| Comment by Xinliang Liu [ 08/Dec/21 ] |
|
This issue is hard to reproduce: 100 runs in a local test environment all passed. We haven't seen it in CI again either. |
| Comment by James A Simmons [ 14/Jun/22 ] |
|
Does |
| Comment by James A Simmons [ 23/Aug/23 ] |
|
Is this still a problem? |
| Comment by Xinliang Liu [ 24/Aug/23 ] |
|
I don't think it is a problem now. We no longer see this problem in our Arm CI, on either the master branch or b2_15. |
| Comment by James A Simmons [ 24/Aug/23 ] |
|
We should close it then |