[LU-10751] sanity test 27o fails with 'able to create /mnt/lustre/d27o.sanity/f27o.sanity' Created: 01/Mar/18 Updated: 23/Sep/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | dne |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
sanity test_27o is failing because it is able to create a file after it exhausts all precreations on all OSTs. The error message for this failure is

'able to create /mnt/lustre/d27o.sanity/f27o.sanity'

For each OST, the test, in exhaust_all_precreations(), collects osc.*.prealloc_last_id and osc.*.prealloc_next_id and creates (last_id - next_id + 2) files to exhaust all file precreations.

Looking at the suite_log for the failure at https://testing.hpdd.intel.com/test_sets/dabd9962-0d65-11e8-bd00-52540065bddc, we see that this works. For example, for OST1:

OSTIDX=1 MDTIDX=3
CMD: trevis-15vm5 lctl get_param -n osc.lustre-OST0001-osc-MDT0003.prealloc_last_id
CMD: trevis-15vm5 lctl get_param -n osc.lustre-OST0001-osc-MDT0003.prealloc_next_id
CMD: trevis-15vm5 lctl get_param osc.*OST*-osc-MDT0003.prealloc*
…
osc.lustre-OST0001-osc-MDT0003.prealloc_last_id=97
osc.lustre-OST0001-osc-MDT0003.prealloc_last_seq=0x380000401
osc.lustre-OST0001-osc-MDT0003.prealloc_next_id=69
osc.lustre-OST0001-osc-MDT0003.prealloc_next_seq=0x380000401
osc.lustre-OST0001-osc-MDT0003.prealloc_reserved=0
osc.lustre-OST0001-osc-MDT0003.prealloc_status=-28
…
striped dir -i3 -c2 /mnt/lustre/d27o.sanity/lustre-OST0001
CMD: trevis-15vm3 lctl set_param fail_val=-1 fail_loc=0x215
fail_val=-1
fail_loc=0x215
Creating to objid 97 on ost lustre-OST0001...
open(/mnt/lustre/d27o.sanity/lustre-OST0001/f71) error: No space left on device
total: 2 open/close in 0.01 seconds: 210.13 ops/second

So, OST1 is “full”, and we see this for all OSTs except for one:

OSTIDX=0 MDTIDX=3
CMD: trevis-15vm5 lctl get_param -n osc.lustre-OST0000-osc-MDT0003.prealloc_last_id
CMD: trevis-15vm5 lctl get_param -n osc.lustre-OST0000-osc-MDT0003.prealloc_next_id
CMD: trevis-15vm5 lctl get_param osc.*OST*-osc-MDT0003.prealloc*
osc.lustre-OST0000-osc-MDT0003.prealloc_last_id=129
osc.lustre-OST0000-osc-MDT0003.prealloc_last_seq=0x300000401
osc.lustre-OST0000-osc-MDT0003.prealloc_next_id=85
osc.lustre-OST0000-osc-MDT0003.prealloc_next_seq=0x300000401
osc.lustre-OST0000-osc-MDT0003.prealloc_reserved=0
osc.lustre-OST0000-osc-MDT0003.prealloc_status=0
…
striped dir -i3 -c2 /mnt/lustre/d27o.sanity/lustre-OST0000
CMD: trevis-15vm3 lctl set_param fail_val=-1 fail_loc=0x215
fail_val=-1
fail_loc=0x215
Creating to objid 129 on ost lustre-OST0000...
total: 46 open/close in 0.07 seconds: 615.37 ops/second

We don’t see OST0 fill up or return the “No space left on device” error. Unfortunately, we see the same thing for sanity test 27o when it passes.

Although this might be expected due to the fail_loc, in the dmesg log for MDS1/3, we see

[ 1181.147030] Lustre: DEBUG MARKER: == sanity test 27o: create file with all full OSTs (should error) ==================================== 20:59:07 (1518123547)
[ 1181.756807] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
[ 1194.103935] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
[ 1205.580992] LustreError: 28563:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4be:0x0]: have 0 want 1
[ 1206.377129] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
[ 1217.863538] LustreError: 30940:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4bf:0x0]: have 0 want 1
[ 1218.669304] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
[ 1230.138926] LustreError: 28565:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4c0:0x0]: have 0 want 1
[ 1230.931360] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
[ 1242.447227] LustreError: 32075:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4c1:0x0]: have 0 want 1
[ 1243.258795] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
[ 1254.768848] LustreError: 32075:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4c2:0x0]: have 0 want 1
[ 1255.579674] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
[ 1267.067337] LustreError: 30940:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4c3:0x0]: have 0 want 1
[ 1267.873867] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
[ 1280.214333] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
[ 1290.618993] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_27o: @@@@@@ FAIL: able to create \/mnt\/lustre\/d27o.sanity\/f27o.sanity

sanity test 27o started failing with this error message on 2018-01-25 and, so far, only fails for DNE testing.

Logs for failures are at |
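The (last_id - next_id + 2) arithmetic from the description can be checked offline against the parameter values quoted from the suite_log. The following is only a sketch for reading such logs, not part of sanity.sh: the helper names parse_prealloc(), files_to_create() and not_full() are hypothetical, and treating prealloc_status=-28 as "full" is an assumption based on errno 28 being ENOSPC on Linux.

```python
import re

# Parameter lines copied from the suite_log excerpts in the description.
SAMPLE = """\
osc.lustre-OST0001-osc-MDT0003.prealloc_last_id=97
osc.lustre-OST0001-osc-MDT0003.prealloc_next_id=69
osc.lustre-OST0001-osc-MDT0003.prealloc_status=-28
osc.lustre-OST0000-osc-MDT0003.prealloc_last_id=129
osc.lustre-OST0000-osc-MDT0003.prealloc_next_id=85
osc.lustre-OST0000-osc-MDT0003.prealloc_status=0
"""

# Matches "osc.<device>.prealloc_<key>=<value>" as printed by lctl get_param.
PARAM_RE = re.compile(r"^osc\.(?P<dev>[\w-]+)\.prealloc_(?P<key>\w+)=(?P<val>\S+)$")

ENOSPC = 28  # assumption: Linux errno value for "No space left on device"


def parse_prealloc(text):
    """Group prealloc_* values by OSC device name (hypothetical helper)."""
    params = {}
    for line in text.splitlines():
        m = PARAM_RE.match(line.strip())
        if m:
            # Base 0 accepts both decimal ids and 0x-prefixed sequence numbers.
            params.setdefault(m["dev"], {})[m["key"]] = int(m["val"], 0)
    return params


def files_to_create(p):
    """Number of creates the test issues for one OST, per the description."""
    return p["last_id"] - p["next_id"] + 2


def not_full(params):
    """Devices whose prealloc_status is not -ENOSPC (assumed meaning of 'full')."""
    return sorted(dev for dev, p in params.items() if p.get("status") != -ENOSPC)


if __name__ == "__main__":
    parsed = parse_prealloc(SAMPLE)
    for dev in sorted(parsed):
        print(dev, files_to_create(parsed[dev]), parsed[dev]["status"])
    print("not full:", not_full(parsed))
```

For OST0 this gives 129 - 85 + 2 = 46, which agrees with the "total: 46 open/close" line in the log. For OST1 it gives 30, but the log shows the create loop stopping at f71 with "No space left on device" after 2 open/close operations.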
| Comments |
| Comment by John Hammond [ 23/Sep/21 ] |
|
I don't think this test is testing anything other than whether the test function exhaust_all_precreations() can reliably exhaust all precreations. |