Details
- Type: Bug
- Priority: Minor
- Resolution: Unresolved
- Fix Version/s: None
- Affects Version/s: Lustre 2.10.6
- Component/s: None
- Environment: failover test group configuration
- Severity: 3
Description
replay-single test_41 fails in failover test sessions with the error message 'dd on client failed'.
Looking at the client test_log for https://testing.whamcloud.com/test_sets/57dc5c3c-ee96-11e8-86c0-52540065bddc , we see
== replay-single test 41: read from a valid osc while other oscs are invalid == 17:33:32 (1542908012)
error on ioctl 0x4008669a for '/mnt/lustre/f41.replay-single' (3): No space left on device
error: setstripe: create striped file '/mnt/lustre/f41.replay-single' failed: No space left on device
CMD: trevis-34vm1.trevis.whamcloud.com dd if=/dev/zero of=/mnt/lustre/f41.replay-single bs=4k count=1
dd: opening `/mnt/lustre/f41.replay-single': No space left on device
 replay-single test_41: @@@@@@ FAIL: dd on client failed
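The setstripe and dd commands from that excerpt can be run by hand to check whether the ENOSPC persists outside the test. This is only a reproduction sketch; the stripe count and OST index below are assumptions, since the log does not show which stripe options test_41 requests.

# Manual reproduction sketch from a client (stripe options are assumed, not taken from the test script)
lfs setstripe -c 1 -i 0 /mnt/lustre/f41.replay-single
dd if=/dev/zero of=/mnt/lustre/f41.replay-single bs=4k count=1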
Most likely, no OST is actually full; looking at the suite_log, we see that there is space on every OST in test 39:
== replay-single test 39: test recovery from unlink llog (test llog_gen_rec) == 17:32:13 (1542907933)
total: 800 open/close in 1.19 seconds: 674.75 ops/second
CMD: trevis-34vm8 sync; sync; sync
UUID                  1K-blocks      Used  Available Use% Mounted on
lustre-MDT0000_UUID     5825660     47228    5255600   1% /mnt/lustre[MDT:0]
lustre-OST0000_UUID     1933276     25792    1786244   1% /mnt/lustre[OST:0]
lustre-OST0001_UUID     1933276     25792    1786244   1% /mnt/lustre[OST:1]
lustre-OST0002_UUID     1933276     25784    1786028   1% /mnt/lustre[OST:2]
lustre-OST0003_UUID     1933276     25784    1786028   1% /mnt/lustre[OST:3]
lustre-OST0004_UUID     1933276     25784    1786028   1% /mnt/lustre[OST:4]
lustre-OST0005_UUID     1933276     25784    1786028   1% /mnt/lustre[OST:5]
lustre-OST0006_UUID     1933276     25832    1786204   1% /mnt/lustre[OST:6]
filesystem_summary:    13532932    180552   12502804   1% /mnt/lustre
Test 40 is skipped, and both tests 41 and 42 fail with errors indicating that an OST is full. Test 43 then again shows that no OST is full:
== replay-single test 43: mds osc import failure during recovery; don't LBUG == 17:33:37 (1542908017)
CMD: trevis-34vm7 sync; sync; sync
UUID                  1K-blocks      Used  Available Use% Mounted on
lustre-MDT0000_UUID     5825660     47292    5255536   1% /mnt/lustre[MDT:0]
lustre-OST0000_UUID     1933276     25792    1786244   1% /mnt/lustre[OST:0]
lustre-OST0001_UUID     1933276     25792    1786244   1% /mnt/lustre[OST:1]
lustre-OST0002_UUID     1933276     25784    1786028   1% /mnt/lustre[OST:2]
lustre-OST0003_UUID     1933276     25784    1786028   1% /mnt/lustre[OST:3]
lustre-OST0004_UUID     1933276     25784    1786028   1% /mnt/lustre[OST:4]
lustre-OST0005_UUID     1933276     25784    1786028   1% /mnt/lustre[OST:5]
lustre-OST0006_UUID     1933276     25832    1786204   1% /mnt/lustre[OST:6]
filesystem_summary:    13532932    180552   12502804   1% /mnt/lustre
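For reference, the per-target usage shown above is 'lfs df' output captured in the suite_log. The same check, including inode usage (which can also surface as ENOSPC), can be repeated from any client during the failure window:

# Confirm block and inode availability on the MDT and all OSTs
lfs df /mnt/lustre
lfs df -i /mnt/lustre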
Looking at the MDS (vm7) console log, we see the following for both test 41 and test 42:
[ 128.628474] Lustre: DEBUG MARKER: == replay-single test 41: read from a valid osc while other oscs are invalid == 17:33:32 (1542908012)
[ 128.639586] LustreError: 2271:0:(lod_qos.c:1354:lod_alloc_specific()) can't lstripe objid [0x200050929:0x643:0x0]: have 0 want 1
[ 128.874693] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-single test_41: @@@@@@ FAIL: dd on client failed
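lod_alloc_specific() is the MDS-side allocator used when specific stripes are requested, and "have 0 want 1" suggests it obtained none of the objects it needed, which would be reported back to the client as ENOSPC even though the OSTs have free space. Given the failover context, one thing worth checking is whether the MDT's OSP connections to the OSTs were still inactive, or had no precreated objects, at that point. A hedged check on the MDS (exact parameter names may differ between Lustre versions) would be:

# On the MDS node: inspect OSP state for each OST (parameter names are assumptions for this version)
lctl get_param osp.*.active
lctl get_param osp.*.prealloc_status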
There are several JIRA tickets with similar failures and the same message in the MDS console log. For example, this looks like LU-10613, but here we also see an error on ioctl, which is not seen in LU-10613.
We have seen this failure and these error messages in other test sessions:
https://testing.whamcloud.com/test_sets/0ea8f4f0-d350-11e8-b589-52540065bddc
https://testing.whamcloud.com/test_sets/5892b56a-ba69-11e8-9df3-52540065bddc