Details
Bug
Resolution: Cannot Reproduce
Minor
None
Lustre 2.12.0
3
Description
sanity test_104a fails with ‘lfs df failed’ for ARM clients. We have seen this failure only once in the past four months:
https://testing.whamcloud.com/test_sets/ec527b2c-fdef-11e8-b837-52540065bddc
After test 104a fails, a series of other tests also fail (107, 118k, 118i, 119c, 119d, 120a, 123a, 124a, 124b, 129, 130a/b/d/e, 131a/d/e and 133a/b/c/d), and test 133g hangs.
It’s clear from the suite_log that something is wrong with some of the MDTs/MDSs at the beginning of the test: ‘lfs df’ reports an Input/output error for MDT0001, MDT0002 and MDT0003.
== sanity test 104a: lfs df [-ih] [path] test ======================================================== 19:18:31 (1544469511)
UUID                   1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID      1165900       22572     1040132   2% /mnt/lustre[MDT:0]
lustre-MDT0001_UUID : Input/output error
lustre-MDT0002_UUID : Input/output error
lustre-MDT0003_UUID : Input/output error
lustre-OST0000_UUID      1933276       34688     1777348   2% /mnt/lustre[OST:0]
lustre-OST0001_UUID      1933276       45700     1766336   3% /mnt/lustre[OST:1]
lustre-OST0002_UUID      1933276       37288     1774748   2% /mnt/lustre[OST:2]
lustre-OST0003_UUID      1933276       31024     1781012   2% /mnt/lustre[OST:3]
lustre-OST0004_UUID      1933276       30168     1781868   2% /mnt/lustre[OST:4]
lustre-OST0005_UUID      1933276       40068     1771968   2% /mnt/lustre[OST:5]
lustre-OST0006_UUID      1933276       41116     1770920   2% /mnt/lustre[OST:6]
lustre-OST0007_UUID      1933276       32700     1779336   2% /mnt/lustre[OST:7]

filesystem_summary:     15466208      292752    14203536   2% /mnt/lustre

sanity test_104a: @@@@@@ FAIL: lfs df failed
So, it’s no surprise that ‘lfs df’ failed.
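When triaging a run like this, it can help to pull the failed targets out of the ‘lfs df’ output mechanically. The following is a minimal sketch, not part of Lustre or its test framework: `parse_lfs_df` and `Target` are hypothetical names, and the parser assumes the layout captured above (one target per line, failed targets printed as "UUID : error"). It also shows that, in this output, the filesystem_summary block counts equal the sums over the eight OST rows alone.

```python
import re
from typing import NamedTuple

# Hypothetical helper, not part of Lustre: assumes the `lfs df` layout
# captured in this ticket (one target per line, failed targets printed
# as "<UUID> : <error>").


class Target(NamedTuple):
    uuid: str
    blocks: int
    used: int
    avail: int


ERROR_RE = re.compile(r"^(\S+_UUID)\s*:\s*(.+)$")
ROW_RE = re.compile(r"^(\S+_UUID)\s+(\d+)\s+(\d+)\s+(\d+)\s+\d+%\s+\S+$")
SUMMARY_RE = re.compile(r"^filesystem_summary:\s+(\d+)\s+(\d+)\s+(\d+)\s+\d+%")


def parse_lfs_df(text):
    """Split `lfs df` output into healthy rows, failed targets and the summary."""
    targets, errors, summary = [], {}, None
    for line in text.splitlines():
        line = line.strip()
        if m := ERROR_RE.match(line):
            errors[m.group(1)] = m.group(2)  # e.g. "Input/output error"
        elif m := ROW_RE.match(line):
            targets.append(Target(m.group(1), *map(int, m.group(2, 3, 4))))
        elif m := SUMMARY_RE.match(line):
            summary = tuple(map(int, m.groups()))
    return targets, errors, summary


SAMPLE = """\
UUID                   1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID      1165900       22572     1040132   2% /mnt/lustre[MDT:0]
lustre-MDT0001_UUID : Input/output error
lustre-MDT0002_UUID : Input/output error
lustre-MDT0003_UUID : Input/output error
lustre-OST0000_UUID      1933276       34688     1777348   2% /mnt/lustre[OST:0]
lustre-OST0001_UUID      1933276       45700     1766336   3% /mnt/lustre[OST:1]
lustre-OST0002_UUID      1933276       37288     1774748   2% /mnt/lustre[OST:2]
lustre-OST0003_UUID      1933276       31024     1781012   2% /mnt/lustre[OST:3]
lustre-OST0004_UUID      1933276       30168     1781868   2% /mnt/lustre[OST:4]
lustre-OST0005_UUID      1933276       40068     1771968   2% /mnt/lustre[OST:5]
lustre-OST0006_UUID      1933276       41116     1770920   2% /mnt/lustre[OST:6]
lustre-OST0007_UUID      1933276       32700     1779336   2% /mnt/lustre[OST:7]
filesystem_summary:     15466208      292752    14203536   2% /mnt/lustre
"""

targets, errors, summary = parse_lfs_df(SAMPLE)
osts = [t for t in targets if "-OST" in t.uuid]
print("failed targets:", sorted(errors))

# In this sample, the summary's block counts match the per-OST sums.
ost_totals = (
    sum(t.blocks for t in osts),
    sum(t.used for t in osts),
    sum(t.avail for t in osts),
)
print(summary == ost_totals)  # True
```

Matching the error form first keeps a "UUID : message" line from being mistaken for a data row; a real triage script would need to handle other ‘lfs df’ variants (e.g. `-h`, `-i`) that this sketch ignores.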
sanity test 104a deactivates an OST and then expects to see the OST in the ‘FULL’ state, but in this test session log we also see ‘Connection restored’ messages from the MDTs. From MDS2 and MDS4 (vm5), we see:
[ 8408.485447] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 104a: lfs df [-ih] [path] test ======================================================== 19:18:31 \(1544469511\)
[ 8409.091430] Lustre: DEBUG MARKER: == sanity test 104a: lfs df [-ih] [path] test ======================================================== 19:18:31 (1544469511)
[ 8409.525890] Lustre: lustre-MDT0001: Connection restored to 310dfc65-ad7d-537f-6815-c0fd7f0fb43b (at 10.9.8.38@tcp)
[ 8409.940081] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_104a: @@@@@@ FAIL: lfs df failed
with a similar message from MDS1 and MDS3 (vm4):
[ 8408.748317] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 104a: lfs df [-ih] [path] test ======================================================== 19:18:31 \(1544469511\)
[ 8409.370355] Lustre: DEBUG MARKER: == sanity test 104a: lfs df [-ih] [path] test ======================================================== 19:18:31 (1544469511)
[ 8409.831727] Lustre: lustre-MDT0002: Connection restored to 56904d5f-959b-023e-bc98-099190cbfba6 (at 10.9.8.38@tcp)
[ 8409.832746] Lustre: Skipped 2 previous similar messages
[ 8410.186046] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_104a: @@@@@@ FAIL: lfs df failed
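To compare reconnects across MDS nodes, the ‘Connection restored’ lines can be extracted from each node's console log. The following is a rough sketch only: `connection_restored` is a hypothetical helper, and the regular expression assumes the exact message format quoted above. On these two excerpts it shows that MDT0001 (vm5) and MDT0002 (vm4) both logged a reconnect from the same peer NID, 10.9.8.38@tcp. The timestamps are per-node uptime values, so they should not be compared across nodes.

```python
import re

# Hypothetical helper, not part of Lustre: pulls "Connection restored"
# events out of console/dmesg text in the format quoted in this ticket.
RESTORED_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+Lustre:\s+(\S+):\s+"
    r"Connection restored to\s+(\S+)\s+\(at\s+(\S+)\)"
)


def connection_restored(log_text):
    """Return (timestamp, target, peer_uuid, peer_nid) for each reconnect."""
    return [
        (float(m.group(1)), m.group(2), m.group(3), m.group(4))
        for m in RESTORED_RE.finditer(log_text)
    ]


# Excerpts from the vm5 and vm4 console logs above; DEBUG MARKER lines
# are included to show that they are not matched.
VM5 = """\
[ 8409.091430] Lustre: DEBUG MARKER: == sanity test 104a: lfs df [-ih] [path] test
[ 8409.525890] Lustre: lustre-MDT0001: Connection restored to 310dfc65-ad7d-537f-6815-c0fd7f0fb43b (at 10.9.8.38@tcp)
[ 8409.940081] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_104a: @@@@@@ FAIL: lfs df failed
"""

VM4 = """\
[ 8409.370355] Lustre: DEBUG MARKER: == sanity test 104a: lfs df [-ih] [path] test
[ 8409.831727] Lustre: lustre-MDT0002: Connection restored to 56904d5f-959b-023e-bc98-099190cbfba6 (at 10.9.8.38@tcp)
[ 8409.832746] Lustre: Skipped 2 previous similar messages
"""

events = connection_restored(VM5) + connection_restored(VM4)
for ts, target, peer, nid in events:
    print(f"{ts:.6f} {target} <- {peer} ({nid})")
```

Note that the "Skipped 2 previous similar messages" line means vm4 suppressed further reconnect messages, so a count from console logs alone may undercount the events.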