Details
-
Bug
-
Resolution: Fixed
-
Minor
-
None
-
Lustre 2.13.0, Lustre 2.10.7
-
None
-
3
Description
ost-pools test_24 fails with 'Pool '' not on /mnt/lustre/d24.ost-pools/dir3/f24.ost-pools0:test_85b'. We only see this test fail with this error message in full test sessions, so some test suite that runs before ost-pools may be failing or not cleaning up after itself. The pool name reported for the file is strange: 'test_85b'. Test 24 fails at line 1440 in the following excerpt of the ost-pools test script:
1415    for i in 1 2 3 4; do
1416        dir=${POOL_ROOT}/dir${i}
1417        local pool
1418        local pool1
1419        local count
1420        local count1
1421        local index
1422        local size
1423        local size1
1424
1425        createmany -o $dir/${tfile} $numfiles ||
1426            error "createmany $dir/${tfile} failed!"
1427        pool=$($LFS getstripe --pool $dir)
1428        index=$($LFS getstripe -i $dir)
1429        size=$($LFS getstripe -S $dir)
1430        count=$($LFS getstripe -c $dir)
1431
1432        for file in $dir/*; do
1433            if [ "$pool" != "" ]; then
1434                check_file_in_pool $file $pool
1435            fi
1436            pool1=$($LFS getstripe --pool $file)
1437            count1=$($LFS getstripe -c $file)
1438            size1=$($LFS getstripe -S $file)
1439            [[ "$pool" != "$pool1" ]] &&
1440                error "Pool '$pool' not on $file:$pool1"
1441            [[ "$count" != "$count1" ]] &&
1442                [[ "$count" != "-1" ]] &&
1443                    error "Stripe count $count not on"\
1444                        "$file:$count1"
1445            [[ "$count1" != "$OSTCOUNT" ]] &&
1446                [[ "$count" = "-1" ]] &&
1447                    error "Stripe count $count1 not on"\
1448                        "$file:$OSTCOUNT"
1449            [[ "$size" != "$size1" ]] && [[ "$size" != "0" ]] &&
1450                error "Stripe size $size not on $file:$size1"
1451        done
1452    done
Looking at a recent 2.10.7 RC1 test failure, https://testing.whamcloud.com/test_sets/8bc401ce-4320-11e9-8e92-52540065bddc, ost-pools is the only failure out of all test suites. In the MDS (vm4) debug log, we see lod_find_pool() being called with an unknown pool name (test_85b) about 10 times:
00010000:00010000:1.0:1552189237.557826:0:12918:0:(ldlm_request.c:504:ldlm_cli_enqueue_local()) ### client-side local enqueue handler, new lock created ns: mdt-lustre-MDT0000_UUID lock: ffff8f36d7c0b200/0xe4fa7e4c800ca51f lrc: 3/0,1 mode: PW/PW res: [0x20006bac1:0x1972a:0x0].0xa55d7462 bits 0x2 rrc: 2 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 12918 timeout: 0 lvb_type: 0
00020000:01000000:1.0:1552189237.557892:0:12918:0:(lod_pool.c:920:lod_find_pool()) lustre-MDT0000-osd: request for an unknown pool (test_85b)
00000004:00080000:1.0:1552189237.557972:0:12918:0:(osp_object.c:1517:osp_create()) lustre-OST0000-osc-MDT0000: Wrote last used FID: [0x100000000:0x20219:0x0], index 0: 0
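To check how often the MDS hits this path in other failed sessions, the debug log can be scanned for the lod_find_pool() message shown above. This is only a minimal sketch, assuming the debug log has already been dumped to a text file; the helper name count_unknown_pools is hypothetical and not part of the Lustre test framework.

```shell
# Hypothetical helper: count "request for an unknown pool (NAME)" messages
# per pool name in a Lustre debug log that was dumped to a text file.
# Prints one "NAME COUNT" pair per line.
count_unknown_pools() {
	# $1: path to the text debug log (assumed to exist)
	sed -n 's/.*request for an unknown pool (\([^)]*\)).*/\1/p' "$1" |
		sort | uniq -c | awk '{ print $2, $1 }'
}
```

It would be called as count_unknown_pools followed by the path to the dumped log. Lines without the message are ignored.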
We do see that replay-single test 85b creates a test_85b pool, but it looks like the pool is destroyed before the test finishes:
== replay-single test 85b: check the cancellation of unused locks during recovery(EXTENT) ============ 12:13:45 (1552162425)
CMD: trevis-39vm4 lctl pool_new lustre.test_85b
trevis-39vm4: Pool lustre.test_85b created
CMD: trevis-39vm4 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.test_85b 2>/dev/null || echo foo
CMD: trevis-39vm4 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.test_85b 2>/dev/null || echo foo
CMD: trevis-39vm1 lctl get_param -n lov.lustre-*.pools.test_85b 2>/dev/null || echo foo
CMD: trevis-39vm1 lctl get_param -n lov.lustre-*.pools.test_85b 2>/dev/null || echo foo
CMD: trevis-39vm4 /usr/sbin/lctl pool_add lustre.test_85b lustre-OST0000
trevis-39vm4: OST lustre-OST0000_UUID added to pool lustre.test_85b
before recovery: unused locks count = 100
...
after recovery: unused locks count = 0
CMD: trevis-39vm4 /usr/sbin/lctl pool_remove lustre.test_85b lustre-OST0000
trevis-39vm4: OST lustre-OST0000_UUID removed from pool lustre.test_85b
CMD: trevis-39vm4 /usr/sbin/lctl pool_destroy lustre.test_85b
trevis-39vm4: Pool lustre.test_85b destroyed
...
CMD: trevis-39vm1,trevis-39vm2,trevis-39vm3,trevis-39vm4 dmesg
Destroy the created pools: test_85b
CMD: trevis-39vm4 /usr/sbin/lctl pool_list lustre
PASS 85b (50s)
Yet the output from later replay-single tests shows that the test_85b pool is still referenced in file layouts. In test 90, lmm_pool is test_85b:
Check getstripe: /usr/bin/lfs getstripe -r --obd lustre-OST0006_UUID
/mnt/lustre/d90.replay-single/all
lmm_stripe_count: 7
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 3
lmm_pool: test_85b
obdidx objid objid group
6 4930 0x1342 0 *
/mnt/lustre/d90.replay-single/f6
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 6
lmm_pool: test_85b
obdidx objid objid group
6 4931 0x1343 0 *
/mnt/lustre/d90.replay-single/all
/mnt/lustre/d90.replay-single/f6
Failover ost7 to trevis-39vm3
Test 132a shows the same pool name in both layout components:
/mnt/lustre/f132a.replay-single
lcm_layout_gen: 3
lcm_entry_count: 2
lcme_id: 1
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 1048576
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 3
lmm_pool: test_85b
lmm_objects:
- 0: { l_ost_idx: 3, l_fid: [0x100030000:0x1382:0x0] }
lcme_id: 2
lcme_flags: init
lcme_extent.e_start: 1048576
lcme_extent.e_end: EOF
lmm_stripe_count: 2
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 4
lmm_pool: test_85b
lmm_objects:
- 0: { l_ost_idx: 4, l_fid: [0x100040000:0x1382:0x0] }
- 1: { l_ost_idx: 5, l_fid: [0x100050000:0x14e2:0x0] }
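Checking every later test's getstripe output by eye is tedious, so the lmm_pool values can be pulled out of saved output with a short filter. This is a rough sketch that only processes text captured from lfs getstripe; the helper name list_layout_pools is hypothetical, and it assumes the "lmm_pool: NAME" line format shown in the output above.

```shell
# Hypothetical helper: print the distinct lmm_pool values found in a file
# containing saved "lfs getstripe" output, one pool name per line.
list_layout_pools() {
	# $1: path to the saved getstripe output
	# awk splits on whitespace, so indented "lmm_pool:" lines match too.
	awk '$1 == "lmm_pool:" { print $2 }' "$1" | sort -u
}
```

Any pool name printed here that lctl pool_list no longer reports would point to a stale pool reference like the one seen with test_85b.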
Looking at the results since June 2018, ost-pools test 24 did not fail with the error "Pool '' not on /mnt/lustre/d24.ost-pools/dir3/f24.ost-pools0:test_85b" on any branch from June 2018 through January 2019. In February 2019, the test started failing with this error again. Here are all the failures with this error in 2019:
12-FEB 2.12.51.28 - https://testing.whamcloud.com/test_sets/d9cbd72c-2ea6-11e9-a700-52540065bddc
27-FEB 2.10.6.63 - https://testing.whamcloud.com/test_sets/489d0546-3b35-11e9-913f-52540065bddc
27-FEB 2.12.51.51 - https://testing.whamcloud.com/test_sets/f7d5b686-3b15-11e9-b88b-52540065bddc
28-FEB 2.10.6.63 - https://testing.whamcloud.com/test_sets/6790308a-3b1d-11e9-a5c6-52540065bddc
28-FEB 2.10.6.63 - https://testing.whamcloud.com/test_sets/cec55e7c-3b83-11e9-9646-52540065bddc
28-FEB 2.10.6.63 - https://testing.whamcloud.com/test_sets/aa662b4a-3b62-11e9-913f-52540065bddc
28-FEB 2.10.6.63 - https://testing.whamcloud.com/test_sets/a446daf4-3b56-11e9-913f-52540065bddc
28-FEB 2.10.6.63 - https://testing.whamcloud.com/test_sets/17558fde-3b54-11e9-8f69-52540065bddc
28-FEB 2.10.6.63 - https://testing.whamcloud.com/test_sets/d13c77de-3b3e-11e9-913f-52540065bddc
3-MAR 2.12.51.79 server/2.12.0 clients - https://testing.whamcloud.com/test_sets/c816dce2-3e47-11e9-9720-52540065bddc
6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/07c7f1ca-4053-11e9-9646-52540065bddc
6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/aae422be-405f-11e9-a256-52540065bddc
6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/58314dc4-4061-11e9-9720-52540065bddc
6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/dd263538-4063-11e9-9720-52540065bddc
6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/8aca8b3e-4065-11e9-9720-52540065bddc
6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/e3b30308-406e-11e9-9646-52540065bddc
6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/848095a4-4077-11e9-8e92-52540065bddc
6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/a32fa184-40a4-11e9-92fe-52540065bddc
6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/2f488558-40ac-11e9-8e92-52540065bddc
7-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/a4b30b40-40cc-11e9-a256-52540065bddc
9-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/f6d552f6-42ee-11e9-b98a-52540065bddc
9-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/12f341da-4300-11e9-9646-52540065bddc
9-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/df7dc85c-430e-11e9-b98a-52540065bddc
10-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/8bc401ce-4320-11e9-8e92-52540065bddc
10-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/5a54930a-4335-11e9-a256-52540065bddc
10-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/d29d65d4-433b-11e9-b98a-52540065bddc
10-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/68760eee-4341-11e9-a256-52540065bddc
10-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/f906e83a-434f-11e9-a256-52540065bddc