Details
-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
Lustre 2.9.0
-
None
-
autotest review-dne-part-1
-
3
-
9223372036854775807
Description
sanity test 27o times out in review-dne-part-1. The last thing seen in the test_log is
osc.lustre-OST0007-osc-MDT0003.prealloc_reserved=0
osc.lustre-OST0007-osc-MDT0003.prealloc_status=-28
CMD: onyx-42vm8 lctl set_param fail_loc=0x215
fail_loc=0x215
CMD: onyx-42vm7 lctl get_param -n lov.*.qos_maxage
touch: cannot touch `/mnt/lustre/d27o.sanity/f27o.sanity': No space left on device
CMD: onyx-42vm8 lctl set_param fail_loc=0
fail_loc=0
The ‘No space left on device’ error is expected. It looks like the test hangs in the reset_enospc() routine, possibly on the call to sync.
# OSCs keep a NOSPC flag that will be reset after ~5s (qos_maxage)
# if the OST isn't full anymore.
reset_enospc() {
    local OSTIDX=${1:-""}

    local list=$(comma_list $(osts_nodes))
    [ "$OSTIDX" ] && list=$(facet_host ost$((OSTIDX + 1)))

    do_nodes $list lctl set_param fail_loc=0
    sync    # initiate all OST_DESTROYs from MDS to OST
    sleep_maxage
}
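One way to check whether sync is really where the test stalls would be to run it under a deadline and log when the deadline passes. The sketch below is only an illustration: run_with_deadline is a hypothetical helper, not part of test-framework.sh, and the 124 return code is chosen here only to mirror the convention of timeout(1). It polls a background job rather than trying to kill it, on the assumption that a sync blocked in the kernel may not respond to signals.

```shell
#!/bin/bash
# Hypothetical helper (not in test-framework.sh): run a command in the
# background and report if it is still running after <limit> seconds.
# The command is left running on timeout so it can be inspected afterwards.
run_with_deadline() {
	local limit=$1
	shift

	"$@" &
	local pid=$!
	local waited=0

	# Poll once per second until the command exits or the deadline passes.
	while kill -0 "$pid" 2>/dev/null; do
		if [ "$waited" -ge "$limit" ]; then
			echo "TIMEOUT after ${limit}s: $*"
			return 124
		fi
		sleep 1
		waited=$((waited + 1))
	done

	# The command finished; collect and report its exit status.
	wait "$pid"
	local rc=$?
	echo "done rc=$rc: $*"
	return $rc
}
```

In reset_enospc() the plain sync call could then be swapped for something like `run_with_deadline 60 sync` while debugging. The 60 second limit is an arbitrary choice, not a value taken from the test suite.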
The logs are incomplete and aren’t much help. The only things that look out of place are some console disconnect notices for client2
03:26:34:Lustre: DEBUG MARKER: == sanity test 27o: create file with all full OSTs (should error) ====== 01:24:20 (1456824260)
03:26:34:
03:26:34:<ConMan> Console [onyx-42vm2] disconnected from <onyx-42:6001> at 03-01 03:24.
03:26:34:
03:26:34:<ConMan> Console [onyx-42vm2] connected to <onyx-42:6001> at 03-01 03:24.
03:26:34:Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null || true
and for MDS1
03:26:47:Lustre: DEBUG MARKER: == sanity test 27o: create file with all full OSTs (should error) ====== 01:24:20 (1456824260)
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:
03:26:47:<ConMan> Console [onyx-42vm7] disconnected from <onyx-42:6006> at 03-01 03:26.
03:26:47:
03:26:47:<ConMan> Console [onyx-42vm7] connected to <onyx-42:6006> at 03-01 03:26.
03:26:47:Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null || true
Logs are at https://testing.hpdd.intel.com/test_sets/4a1a8e4a-dfce-11e5-9020-5254006e85c2