Details
Bug
Resolution: Unresolved
Minor
None
Lustre 2.11.0
3
Description
sanity test_27o fails because it is able to create a file after it has exhausted all precreated objects on all OSTs. The error message for this failure is:
'able to create /mnt/lustre/d27o.sanity/f27o.sanity'
For each OST, the test (in exhaust_all_preallocations()) collects osc..prealloc_last_id and osc..prealloc_next_id and creates (last_id - next_id + 2) files to exhaust all precreated objects. The suite_log for the failure at https://testing.hpdd.intel.com/test_sets/dabd9962-0d65-11e8-bd00-52540065bddc shows this working. For example, for OST1:
    OSTIDX=1 MDTIDX=3
    CMD: trevis-15vm5 lctl get_param -n osc.lustre-OST0001-osc-MDT0003.prealloc_last_id
    CMD: trevis-15vm5 lctl get_param -n osc.lustre-OST0001-osc-MDT0003.prealloc_next_id
    CMD: trevis-15vm5 lctl get_param osc.*OST*-osc-MDT0003.prealloc*
    …
    osc.lustre-OST0001-osc-MDT0003.prealloc_last_id=97
    osc.lustre-OST0001-osc-MDT0003.prealloc_last_seq=0x380000401
    osc.lustre-OST0001-osc-MDT0003.prealloc_next_id=69
    osc.lustre-OST0001-osc-MDT0003.prealloc_next_seq=0x380000401
    osc.lustre-OST0001-osc-MDT0003.prealloc_reserved=0
    osc.lustre-OST0001-osc-MDT0003.prealloc_status=-28
    …
    striped dir -i3 -c2 /mnt/lustre/d27o.sanity/lustre-OST0001
    CMD: trevis-15vm3 lctl set_param fail_val=-1 fail_loc=0x215
    fail_val=-1
    fail_loc=0x215
    Creating to objid 97 on ost lustre-OST0001...
    open(/mnt/lustre/d27o.sanity/lustre-OST0001/f71) error: No space left on device
    total: 2 open/close in 0.01 seconds: 210.13 ops/second
So OST1 is “full”, and we see the same for every OST except one:
    OSTIDX=0 MDTIDX=3
    CMD: trevis-15vm5 lctl get_param -n osc.lustre-OST0000-osc-MDT0003.prealloc_last_id
    CMD: trevis-15vm5 lctl get_param -n osc.lustre-OST0000-osc-MDT0003.prealloc_next_id
    CMD: trevis-15vm5 lctl get_param osc.*OST*-osc-MDT0003.prealloc*
    osc.lustre-OST0000-osc-MDT0003.prealloc_last_id=129
    osc.lustre-OST0000-osc-MDT0003.prealloc_last_seq=0x300000401
    osc.lustre-OST0000-osc-MDT0003.prealloc_next_id=85
    osc.lustre-OST0000-osc-MDT0003.prealloc_next_seq=0x300000401
    osc.lustre-OST0000-osc-MDT0003.prealloc_reserved=0
    osc.lustre-OST0000-osc-MDT0003.prealloc_status=0
    …
    striped dir -i3 -c2 /mnt/lustre/d27o.sanity/lustre-OST0000
    CMD: trevis-15vm3 lctl set_param fail_val=-1 fail_loc=0x215
    fail_val=-1
    fail_loc=0x215
    Creating to objid 129 on ost lustre-OST0000...
    total: 46 open/close in 0.07 seconds: 615.37 ops/second
OST0 never fills up or returns “No space left on device”. Unfortunately, we see the same behavior in runs where sanity test 27o passes.
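The per-OST arithmetic described above can be checked against the logged values. The following Python sketch is an illustration only: it parses `lctl get_param ... prealloc*` output of the form shown in the logs, computes how many files the test would create (last_id - next_id + 2), and reports whether the OSC already shows ENOSPC (prealloc_status=-28). The helpers `parse_prealloc` and `files_to_exhaust` are hypothetical and are not part of sanity.sh; the assumption that created files are named f<next_id> onward is inferred from the f71 failure in the OST0001 log, not taken from the test source.

```python
import errno


def parse_prealloc(text: str, ost: str) -> dict[str, int]:
    """Hypothetical helper: collect the prealloc_* values for one OST from
    `lctl get_param osc.*OST*-osc-MDTxxxx.prealloc*` style output."""
    values: dict[str, int] = {}
    for token in text.split():
        key, sep, val = token.partition("=")
        if not sep or not key.startswith(f"osc.{ost}-osc-"):
            continue  # not a key=value pair for this OST
        field = key.rsplit(".", 1)[1]  # e.g. "prealloc_last_id"
        values[field] = int(val, 0)    # base 0 also accepts the 0x... sequence values
    return values


def files_to_exhaust(last_id: int, next_id: int) -> int:
    """(last_id - next_id + 2): one more file than the objects still
    precreated, i.e. next_id..last_id inclusive."""
    return last_id - next_id + 2


# Values copied from the OST0000 and OST0001 logs in this ticket.
LOG = """
osc.lustre-OST0000-osc-MDT0003.prealloc_last_id=129
osc.lustre-OST0000-osc-MDT0003.prealloc_last_seq=0x300000401
osc.lustre-OST0000-osc-MDT0003.prealloc_next_id=85
osc.lustre-OST0000-osc-MDT0003.prealloc_status=0
osc.lustre-OST0001-osc-MDT0003.prealloc_last_id=97
osc.lustre-OST0001-osc-MDT0003.prealloc_last_seq=0x380000401
osc.lustre-OST0001-osc-MDT0003.prealloc_next_id=69
osc.lustre-OST0001-osc-MDT0003.prealloc_status=-28
"""

for ost in ("lustre-OST0000", "lustre-OST0001"):
    p = parse_prealloc(LOG, ost)
    n = files_to_exhaust(p["prealloc_last_id"], p["prealloc_next_id"])
    first = p["prealloc_next_id"]  # assumed: file names start at f<next_id>
    last = first + n - 1
    enospc = p["prealloc_status"] == -errno.ENOSPC  # ENOSPC is 28 on Linux
    print(f"{ost}: create {n} files f{first}..f{last}, already ENOSPC: {enospc}")
```

With the logged values this gives 46 creates for OST0000 (f85..f130) and 30 for OST0001 (f69..f98). The 46 matches the "total: 46" that OST0000 completed without an error, while OST0001 already reports prealloc_status=-28 and stopped after 2 creates.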
Although this might be expected because of the fail_loc, the dmesg log for MDS1/3 shows:
    [ 1181.147030] Lustre: DEBUG MARKER: == sanity test 27o: create file with all full OSTs (should error) ==================================== 20:59:07 (1518123547)
    [ 1181.756807] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
    [ 1194.103935] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
    [ 1205.580992] LustreError: 28563:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4be:0x0]: have 0 want 1
    [ 1206.377129] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
    [ 1217.863538] LustreError: 30940:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4bf:0x0]: have 0 want 1
    [ 1218.669304] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
    [ 1230.138926] LustreError: 28565:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4c0:0x0]: have 0 want 1
    [ 1230.931360] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
    [ 1242.447227] LustreError: 32075:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4c1:0x0]: have 0 want 1
    [ 1243.258795] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
    [ 1254.768848] LustreError: 32075:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4c2:0x0]: have 0 want 1
    [ 1255.579674] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
    [ 1267.067337] LustreError: 30940:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4c3:0x0]: have 0 want 1
    [ 1267.873867] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
    [ 1280.214333] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
    [ 1290.618993] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_27o: @@@@@@ FAIL: able to create \/mnt\/lustre\/d27o.sanity\/f27o.sanity
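To compare the MDS messages across runs, a small Python sketch can pull the lod_alloc_specific() errors out of the dmesg text and count them. The regular expression is written against the message format quoted in this log only; it is an assumption about that format, not a documented Lustre interface, and `find_lstripe_failures` is a hypothetical helper.

```python
import re

# Written against the dmesg lines quoted in this ticket, e.g.
# LustreError: 28563:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4be:0x0]: have 0 want 1
LSTRIPE_RE = re.compile(
    r"lod_alloc_specific\(\)\) can't lstripe objid (\[[^\]]+\]): have (\d+) want (\d+)"
)


def find_lstripe_failures(dmesg: str) -> list[tuple[str, int, int]]:
    """Hypothetical helper: return (fid, have, want) for every
    lod_alloc_specific() "can't lstripe" error in the text."""
    return [(fid, int(have), int(want)) for fid, have, want in LSTRIPE_RE.findall(dmesg)]


# Excerpt of the MDS1/3 dmesg from this ticket (marker lines are ignored by the regex).
DMESG = """\
[ 1205.580992] LustreError: 28563:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4be:0x0]: have 0 want 1
[ 1206.377129] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
[ 1217.863538] LustreError: 30940:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4bf:0x0]: have 0 want 1
[ 1230.138926] LustreError: 28565:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4c0:0x0]: have 0 want 1
[ 1242.447227] LustreError: 32075:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4c1:0x0]: have 0 want 1
[ 1254.768848] LustreError: 32075:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4c2:0x0]: have 0 want 1
[ 1267.067337] LustreError: 30940:0:(lod_qos.c:1352:lod_alloc_specific()) can't lstripe objid [0x2000013a2:0xf4c3:0x0]: have 0 want 1
[ 1280.214333] Lustre: DEBUG MARKER: lctl get_param -n lov.*.qos_maxage
"""

failures = find_lstripe_failures(DMESG)
print(f"{len(failures)} lstripe failures")
for fid, have, want in failures:
    print(f"{fid}: have {have} want {want}")
```

On this excerpt it finds six failures, each "have 0 want 1", roughly 12 seconds apart between the qos_maxage markers. The last marker (1280.214333) is followed by the FAIL marker rather than another lstripe error.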
sanity test 27o started failing with this error message on 2018-01-25 and, so far, only fails for DNE testing.
Logs for failures are at
https://testing.hpdd.intel.com/test_sets/959d7148-1c58-11e8-a10a-52540065bddc