[LU-11725] replay-single test 41 fails with 'dd on client failed' Created: 02/Dec/18 Updated: 06/Dec/18 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
failover test group configuration |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
replay-single test_41 fails for failover test sessions with the error message 'dd on client failed'.

Looking at the client test_log for https://testing.whamcloud.com/test_sets/57dc5c3c-ee96-11e8-86c0-52540065bddc , we see

== replay-single test 41: read from a valid osc while other oscs are invalid == 17:33:32 (1542908012)
error on ioctl 0x4008669a for '/mnt/lustre/f41.replay-single' (3): No space left on device
error: setstripe: create striped file '/mnt/lustre/f41.replay-single' failed: No space left on device
CMD: trevis-34vm1.trevis.whamcloud.com dd if=/dev/zero of=/mnt/lustre/f41.replay-single bs=4k count=1
dd: opening `/mnt/lustre/f41.replay-single': No space left on device
replay-single test_41: @@@@@@ FAIL: dd on client failed

Most likely, no OST is actually full because, looking at the suite_log, we see that there is space on every OST in test 39

== replay-single test 39: test recovery from unlink llog (test llog_gen_rec) == 17:32:13 (1542907933)
total: 800 open/close in 1.19 seconds: 674.75 ops/second
CMD: trevis-34vm8 sync; sync; sync
UUID                   1K-blocks     Used  Available Use% Mounted on
lustre-MDT0000_UUID      5825660    47228    5255600   1% /mnt/lustre[MDT:0]
lustre-OST0000_UUID      1933276    25792    1786244   1% /mnt/lustre[OST:0]
lustre-OST0001_UUID      1933276    25792    1786244   1% /mnt/lustre[OST:1]
lustre-OST0002_UUID      1933276    25784    1786028   1% /mnt/lustre[OST:2]
lustre-OST0003_UUID      1933276    25784    1786028   1% /mnt/lustre[OST:3]
lustre-OST0004_UUID      1933276    25784    1786028   1% /mnt/lustre[OST:4]
lustre-OST0005_UUID      1933276    25784    1786028   1% /mnt/lustre[OST:5]
lustre-OST0006_UUID      1933276    25832    1786204   1% /mnt/lustre[OST:6]
filesystem_summary:     13532932   180552   12502804   1% /mnt/lustre

Test 40 is skipped, and both tests 41 and 42 fail with errors indicating that an OST is full. Then test 43 again shows us that no OST is full

== replay-single test 43: mds osc import failure during recovery; don't LBUG == 17:33:37 (1542908017)
CMD: trevis-34vm7 sync; sync; sync
UUID                   1K-blocks     Used  Available Use% Mounted on
lustre-MDT0000_UUID      5825660    47292    5255536   1% /mnt/lustre[MDT:0]
lustre-OST0000_UUID      1933276    25792    1786244   1% /mnt/lustre[OST:0]
lustre-OST0001_UUID      1933276    25792    1786244   1% /mnt/lustre[OST:1]
lustre-OST0002_UUID      1933276    25784    1786028   1% /mnt/lustre[OST:2]
lustre-OST0003_UUID      1933276    25784    1786028   1% /mnt/lustre[OST:3]
lustre-OST0004_UUID      1933276    25784    1786028   1% /mnt/lustre[OST:4]
lustre-OST0005_UUID      1933276    25784    1786028   1% /mnt/lustre[OST:5]
lustre-OST0006_UUID      1933276    25832    1786204   1% /mnt/lustre[OST:6]
filesystem_summary:     13532932   180552   12502804   1% /mnt/lustre

Looking at the MDS (vm7) console log, we see the following for both test 41 and test 42

[ 128.628474] Lustre: DEBUG MARKER: == replay-single test 41: read from a valid osc while other oscs are invalid == 17:33:32 (1542908012)
[ 128.639586] LustreError: 2271:0:(lod_qos.c:1354:lod_alloc_specific()) can't lstripe objid [0x200050929:0x643:0x0]: have 0 want 1
[ 128.874693] Lustre: DEBUG MARKER: /usr/sbin/lctl mark replay-single test_41: @@@@@@ FAIL: dd on client failed

There are several JIRA tickets that have similar failures and the same message in the MDS console log. For example, this looks like LU-10613, but we are seeing an error on ioctl for this failure which is not seen in LU-10613. We have seen this failure and error messages in other test sessions |
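The "no OST is full" reasoning in the description can be checked mechanically against the quoted `lfs df` output. The following is a minimal sketch in Python, assuming the output layout shown in the suite_log; the helper names (`parse_lfs_df`, `full_osts`) and the 95% threshold are hypothetical and not part of Lustre or its test framework.

```python
import re

# One target line of `lfs df` output as quoted in this ticket, e.g.
# "lustre-OST0000_UUID 1933276 25792 1786244 1% /mnt/lustre[OST:0]".
# The header and the filesystem_summary line intentionally do not match.
LINE_RE = re.compile(
    r"^(?P<uuid>\S+_UUID)\s+(?P<total>\d+)\s+(?P<used>\d+)\s+"
    r"(?P<avail>\d+)\s+(?P<pct>\d+)%\s+\S+\[(?P<kind>MDT|OST):(?P<idx>\d+)\]$"
)

# Excerpt of the test 39 output from the suite_log.
SAMPLE = """\
UUID 1K-blocks Used Available Use% Mounted on
lustre-MDT0000_UUID 5825660 47228 5255600 1% /mnt/lustre[MDT:0]
lustre-OST0000_UUID 1933276 25792 1786244 1% /mnt/lustre[OST:0]
lustre-OST0001_UUID 1933276 25792 1786244 1% /mnt/lustre[OST:1]
lustre-OST0002_UUID 1933276 25784 1786028 1% /mnt/lustre[OST:2]
filesystem_summary: 13532932 180552 12502804 1% /mnt/lustre
"""


def parse_lfs_df(text):
    """Return one dict per MDT/OST line found in `lfs df` style text."""
    targets = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            targets.append({
                "uuid": m.group("uuid"),
                "kind": m.group("kind"),
                "index": int(m.group("idx")),
                "avail_kb": int(m.group("avail")),
                "pct": int(m.group("pct")),
            })
    return targets


def full_osts(targets, threshold_pct=95):
    """List OST UUIDs at or above threshold_pct (an assumed cutoff)."""
    return [t["uuid"] for t in targets
            if t["kind"] == "OST" and t["pct"] >= threshold_pct]


if __name__ == "__main__":
    print(full_osts(parse_lfs_df(SAMPLE)))
```

An empty result here only says that the OSTs report free space to the client. It does not rule out the MDS being unable to allocate objects, which is what the `lod_alloc_specific()` message in the MDS console log points to.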
| Comments |
| Comment by Andreas Dilger [ 06/Dec/18 ] |
|
It would be useful if "lfs df" returned an OS_STATFS_OFFLINE state if the OSP on the MDS is inactive or offline, so that it is easier to see this on the client. |
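Since `lfs df` on the client does not currently show this condition, one rough way to spot it is to scan the MDS console log for the `lod_alloc_specific()` message quoted in the description. This is only a sketch, assuming the message format shown in this ticket; the function name `find_alloc_failures` and the regular expression are illustrative and not part of Lustre.

```python
import re

# Matches the allocation failure as it appears in the MDS console log above:
# "...lod_alloc_specific()) can't lstripe objid [FID]: have N want M"
ALLOC_RE = re.compile(
    r"lod_alloc_specific\(\)\) can't lstripe objid (?P<fid>\[[^\]]+\]): "
    r"have (?P<have>\d+) want (?P<want>\d+)"
)

# Console log lines copied from the description of this ticket.
SAMPLE_LOG = """\
[ 128.628474] Lustre: DEBUG MARKER: == replay-single test 41: read from a valid osc while other oscs are invalid == 17:33:32 (1542908012)
[ 128.639586] LustreError: 2271:0:(lod_qos.c:1354:lod_alloc_specific()) can't lstripe objid [0x200050929:0x643:0x0]: have 0 want 1
[ 128.874693] Lustre: DEBUG MARKER: /usr/sbin/lctl mark replay-single test_41: @@@@@@ FAIL: dd on client failed
"""


def find_alloc_failures(log_text):
    """Return (fid, have, want) for each allocation failure line found."""
    hits = []
    for line in log_text.splitlines():
        m = ALLOC_RE.search(line)
        if m:
            hits.append((m.group("fid"),
                         int(m.group("have")),
                         int(m.group("want"))))
    return hits


if __name__ == "__main__":
    for fid, have, want in find_alloc_failures(SAMPLE_LOG):
        print(f"{fid}: allocated {have} of {want} requested stripes")
```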