[LU-7837] sanity test_27o times out hung in reset_enospc Created: 02/Mar/16 Updated: 13/Oct/21 Resolved: 13/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
autotest review-dne-part-1 |
||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
sanity test 27o times out in review-dne-part-1. The last thing seen in the test_log is

osc.lustre-OST0007-osc-MDT0003.prealloc_reserved=0
osc.lustre-OST0007-osc-MDT0003.prealloc_status=-28
CMD: onyx-42vm8 lctl set_param fail_loc=0x215
fail_loc=0x215
CMD: onyx-42vm7 lctl get_param -n lov.*.qos_maxage
touch: cannot touch `/mnt/lustre/d27o.sanity/f27o.sanity': No space left on device
CMD: onyx-42vm8 lctl set_param fail_loc=0
fail_loc=0

The 'No space left on device' error is expected. It looks like the test is hung in the reset_enospc() routine, possibly on the call to sync.

# OSCs keep a NOSPC flag that will be reset after ~5s (qos_maxage)
# if the OST isn't full anymore.
reset_enospc() {
	local OSTIDX=${1:-""}

	local list=$(comma_list $(osts_nodes))
	[ "$OSTIDX" ] && list=$(facet_host ost$((OSTIDX + 1)))

	do_nodes $list lctl set_param fail_loc=0
	sync	# initiate all OST_DESTROYs from MDS to OST
	sleep_maxage
}
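
If the hang really is in sync, one way to confirm it would be to bound the wait and report when the deadline passes. The following is only a sketch of that idea: run_with_deadline is a hypothetical helper, not part of test-framework.sh, and the 124 return code simply mirrors the convention of coreutils timeout. It polls a status file rather than signalling the command, since a sync stuck in the kernel may not respond to signals.

```shell
# Hypothetical helper (not part of test-framework.sh): run a command in the
# background and stop waiting for it after a deadline given in seconds.
# Returns the command's exit status, or 124 if the deadline passed first.
run_with_deadline() {
	local deadline=$1; shift
	local status_file waited=0 rc
	status_file=$(mktemp) || return 1

	# The subshell records the command's exit status once it finishes.
	# It only writes if the file still exists, so nothing is left behind
	# when the caller has already given up and removed it.
	(
		cmd_rc=0
		"$@" || cmd_rc=$?
		if [ -e "$status_file" ]; then
			echo "$cmd_rc" > "$status_file"
		fi
	) &

	# Poll once per second until a status is recorded or time runs out.
	while [ ! -s "$status_file" ]; do
		if [ "$waited" -ge "$deadline" ]; then
			echo "'$*' still running after ${deadline}s" >&2
			rm -f "$status_file"
			return 124
		fi
		sleep 1
		waited=$((waited + 1))
	done

	rc=$(cat "$status_file")
	rm -f "$status_file"
	return "$rc"
}
```

A debugging run could then replace the bare sync with something like `run_with_deadline 60 sync`, where 60 is an arbitrary value. The stuck command is left running in the background, so this only tells the caller where the time went; it does not clear the hang.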
The logs are incomplete and aren't much help. The only things that look out of place are some disconnect notices for client2

03:26:34:Lustre: DEBUG MARKER: == sanity test 27o: create file with all full OSTs (should error) ====== 01:24:20 (1456824260)
03:26:34:
03:26:34:<ConMan> Console [onyx-42vm2] disconnected from <onyx-42:6001> at 03-01 03:24.
03:26:34:
03:26:34:<ConMan> Console [onyx-42vm2] connected to <onyx-42:6001> at 03-01 03:24.
03:26:34:Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null || true

and MDS1

03:26:47:Lustre: DEBUG MARKER: == sanity test 27o: create file with all full OSTs (should error) ====== 01:24:20 (1456824260)
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
03:26:47:
03:26:47:<ConMan> Console [onyx-42vm7] disconnected from <onyx-42:6006> at 03-01 03:26.
03:26:47:
03:26:47:<ConMan> Console [onyx-42vm7] connected to <onyx-42:6006> at 03-01 03:26.
03:26:47:Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null || true

Logs are at https://testing.hpdd.intel.com/test_sets/4a1a8e4a-dfce-11e5-9020-5254006e85c2 |
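
For checking other console logs for the same ConMan disconnect notices, a small filter along these lines might help. It is a sketch only: conman_disconnects is a hypothetical name, and the pattern assumes the notice format seen in the excerpts quoted in this ticket.

```shell
# Hypothetical helper: read a console log on stdin and print, for each
# console that ConMan reported as disconnected, the number of notices seen.
conman_disconnects() {
	sed -n 's/.*<ConMan> Console \[\([^]]*\)\] disconnected from .*/\1/p' |
		sort | uniq -c
}
```

It would be used as `conman_disconnects < console.log`, where console.log stands for whichever console log is being examined.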