  1. Lustre
  2. LU-12060

ost-pools test 24 fails with 'Pool '' not on /mnt/lustre/d24.ost-pools/dir3/f24.ost-pools0:test_85b'

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.13.0, Lustre 2.10.7
    • Labels: None
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      ost-pools test_24 fails with 'Pool '' not on /mnt/lustre/d24.ost-pools/dir3/f24.ost-pools0:test_85b'. We only see this failure in full test sessions, so some test suite that runs before ost-pools may be failing or not cleaning up after itself. The pool name in the message is unexpected: the directory reports no pool (''), yet the file created in it reports the pool 'test_85b'. Test 24 fails at line 1440 of the ost-pools test script, shown here:

      1415         for i in 1 2 3 4; do
      1416                 dir=${POOL_ROOT}/dir${i}
      1417                 local pool
      1418                 local pool1
      1419                 local count
      1420                 local count1
      1421                 local index
      1422                 local size
      1423                 local size1
      1424 
      1425                 createmany -o $dir/${tfile} $numfiles ||
      1426                         error "createmany $dir/${tfile} failed!"
      1427                 pool=$($LFS getstripe --pool $dir)
      1428                 index=$($LFS getstripe -i $dir)
      1429                 size=$($LFS getstripe -S $dir)
      1430                 count=$($LFS getstripe -c $dir)
      1431 
      1432                 for file in $dir/*; do
      1433                         if [ "$pool" != "" ]; then
      1434                                 check_file_in_pool $file $pool
      1435                         fi
      1436                         pool1=$($LFS getstripe --pool $file)
      1437                         count1=$($LFS getstripe -c $file)
      1438                         size1=$($LFS getstripe -S $file)
      1439                         [[ "$pool" != "$pool1" ]] &&
      1440                                 error "Pool '$pool' not on $file:$pool1"
      1441                         [[ "$count" != "$count1" ]] &&
      1442                                 [[ "$count" != "-1" ]] &&
      1443                                         error "Stripe count $count not on"\
      1444                                                 "$file:$count1"
      1445                         [[ "$count1" != "$OSTCOUNT" ]] &&
      1446                                 [[ "$count" = "-1" ]] &&
      1447                                         error "Stripe count $count1 not on"\
      1448                                                 "$file:$OSTCOUNT"
      1449                         [[ "$size" != "$size1" ]] && [[ "$size" != "0" ]] &&
      1450                                 error "Stripe size $size not on $file:$size1"
      1451                 done
      1452         done
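
      The comparison at lines 1439-1440 can be reproduced in isolation to show how the reported message is formed. This is only a sketch: check_pool_match is a hypothetical stand-in, not a function from the Lustre test framework, and the values passed to it are taken from the failure message rather than read from a live file system.

```shell
#!/bin/bash
# Hypothetical stand-in for the pool comparison at lines 1439-1440.
# check_pool_match is NOT part of the Lustre test framework.
check_pool_match() {
	local pool=$1	# pool reported for the directory
	local pool1=$2	# pool reported for a file in that directory
	local file=$3

	if [[ "$pool" != "$pool1" ]]; then
		echo "Pool '$pool' not on $file:$pool1"
		return 1
	fi
	return 0
}

# Values from the failure message: the directory has no pool, but the
# file created in it reports test_85b.
check_pool_match "" "test_85b" \
	/mnt/lustre/d24.ost-pools/dir3/f24.ost-pools0 || true
```

      With an empty directory pool and a file pool of test_85b, the message matches the one in the test failure, which suggests the stale pool name comes from the file's layout and not from the directory's default.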
      

      In a recent 2.10.7 RC1 test failure, https://testing.whamcloud.com/test_sets/8bc401ce-4320-11e9-8e92-52540065bddc, ost-pools is the only failure out of all test suites. The MDS (vm4) debug log shows lod_find_pool() being called with an unknown pool name (test_85b) about 10 times:

      00010000:00010000:1.0:1552189237.557826:0:12918:0:(ldlm_request.c:504:ldlm_cli_enqueue_local()) ### client-side local enqueue handler, new lock created ns: mdt-lustre-MDT0000_UUID lock: ffff8f36d7c0b200/0xe4fa7e4c800ca51f lrc: 3/0,1 mode: PW/PW res: [0x20006bac1:0x1972a:0x0].0xa55d7462 bits 0x2 rrc: 2 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 12918 timeout: 0 lvb_type: 0
      00020000:01000000:1.0:1552189237.557892:0:12918:0:(lod_pool.c:920:lod_find_pool()) lustre-MDT0000-osd: request for an unknown pool (test_85b)
      00000004:00080000:1.0:1552189237.557972:0:12918:0:(osp_object.c:1517:osp_create()) lustre-OST0000-osc-MDT0000: Wrote last used FID: [0x100000000:0x20219:0x0], index 0: 0
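
      To count these requests per pool name, a debug log that has already been dumped to a text file can be filtered as sketched below. count_unknown_pools is a hypothetical helper, not a Lustre tool, and it assumes the message format shown in the lod_find_pool() line above.

```shell
#!/bin/bash
# Count "request for an unknown pool" messages per pool name in a
# Lustre debug log that was already dumped to a text file.
# count_unknown_pools is a hypothetical helper, not a Lustre tool.
count_unknown_pools() {
	local logfile=$1

	# Print only the pool name from matching lines, then tally.
	sed -n 's/.*request for an unknown pool (\([^)]*\)).*/\1/p' \
		"$logfile" | sort | uniq -c
}
```

      Each output line is a count followed by a pool name, so a single stale name such as test_85b stands out immediately.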
      

      replay-single test 85b does create a test_85b pool, but the test output shows the pool being destroyed before the test ends:

      == replay-single test 85b: check the cancellation of unused locks during recovery(EXTENT) ============ 12:13:45 (1552162425)
      CMD: trevis-39vm4 lctl pool_new lustre.test_85b
      trevis-39vm4: Pool lustre.test_85b created
      CMD: trevis-39vm4 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.test_85b 				2>/dev/null || echo foo
      CMD: trevis-39vm4 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.test_85b 				2>/dev/null || echo foo
      CMD: trevis-39vm1 lctl get_param -n lov.lustre-*.pools.test_85b 		2>/dev/null || echo foo
      CMD: trevis-39vm1 lctl get_param -n lov.lustre-*.pools.test_85b 		2>/dev/null || echo foo
      CMD: trevis-39vm4 /usr/sbin/lctl pool_add lustre.test_85b lustre-OST0000
      trevis-39vm4: OST lustre-OST0000_UUID added to pool lustre.test_85b
      before recovery: unused locks count = 100
      ...
      after recovery: unused locks count = 0
      CMD: trevis-39vm4 /usr/sbin/lctl pool_remove lustre.test_85b lustre-OST0000
      trevis-39vm4: OST lustre-OST0000_UUID removed from pool lustre.test_85b
      CMD: trevis-39vm4 /usr/sbin/lctl pool_destroy lustre.test_85b
      trevis-39vm4: Pool lustre.test_85b destroyed
      ...
      CMD: trevis-39vm1,trevis-39vm2,trevis-39vm3,trevis-39vm4 dmesg
      Destroy the created pools: test_85b
      CMD: trevis-39vm4 /usr/sbin/lctl pool_list lustre
      PASS 85b (50s)
      

      Yet output from later replay-single tests shows that the test_85b pool is still in use. In test 90, lmm_pool is test_85b:

      Check getstripe: /usr/bin/lfs getstripe -r --obd lustre-OST0006_UUID
      /mnt/lustre/d90.replay-single/all
      lmm_stripe_count:  7
      lmm_stripe_size:   1048576
      lmm_pattern:       1
      lmm_layout_gen:    0
      lmm_stripe_offset: 3
      lmm_pool:          test_85b
      	obdidx		 objid		 objid		 group
      	     6	          4930	       0x1342	             0 *
      
      /mnt/lustre/d90.replay-single/f6
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       1
      lmm_layout_gen:    0
      lmm_stripe_offset: 6
      lmm_pool:          test_85b
      	obdidx		 objid		 objid		 group
      	     6	          4931	       0x1343	             0 *
      /mnt/lustre/d90.replay-single/all
      /mnt/lustre/d90.replay-single/f6
      Failover ost7 to trevis-39vm3
      

      Test 132a shows the same:

      /mnt/lustre/f132a.replay-single
        lcm_layout_gen:  3
        lcm_entry_count: 2
          lcme_id:             1
          lcme_flags:          init
          lcme_extent.e_start: 0
          lcme_extent.e_end:   1048576
            lmm_stripe_count:  1
            lmm_stripe_size:   1048576
            lmm_pattern:       1
            lmm_layout_gen:    0
            lmm_stripe_offset: 3
            lmm_pool:          test_85b
            lmm_objects:
            - 0: { l_ost_idx: 3, l_fid: [0x100030000:0x1382:0x0] }
      
          lcme_id:             2
          lcme_flags:          init
          lcme_extent.e_start: 1048576
          lcme_extent.e_end:   EOF
            lmm_stripe_count:  2
            lmm_stripe_size:   1048576
            lmm_pattern:       1
            lmm_layout_gen:    0
            lmm_stripe_offset: 4
            lmm_pool:          test_85b
            lmm_objects:
            - 0: { l_ost_idx: 4, l_fid: [0x100040000:0x1382:0x0] }
            - 1: { l_ost_idx: 5, l_fid: [0x100050000:0x14e2:0x0] }
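
      To check other tests for the same stale pool name, the lmm_pool values in saved getstripe output can be tallied as sketched below. pools_in_getstripe is a hypothetical helper, and it assumes the "lmm_pool: <name>" line format shown in the output above.

```shell
#!/bin/bash
# Tally lmm_pool values found in lfs getstripe output.
# pools_in_getstripe is a hypothetical helper. It reads the files
# named as arguments, or standard input when none are given.
pools_in_getstripe() {
	awk '$1 == "lmm_pool:" { count[$2]++ }
	     END { for (p in count) print p, count[p] }' "$@"
}
```

      Each output line is a pool name followed by the number of layouts or components that reference it. Files with no pool print no lmm_pool line in the output above, so they are not counted.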
      

      In the results since June 2018, ost-pools test 24 did not fail with the error "Pool '' not on /mnt/lustre/d24.ost-pools/dir3/f24.ost-pools0:test_85b" on any branch from June 2018 through January 2019. In February 2019, the test started failing with this error again. All failures with this error in 2019:
      12-FEB 2.12.51.28 - https://testing.whamcloud.com/test_sets/d9cbd72c-2ea6-11e9-a700-52540065bddc

      27-FEB 2.10.6.63 - https://testing.whamcloud.com/test_sets/489d0546-3b35-11e9-913f-52540065bddc
      27-FEB 2.12.51.51 - https://testing.whamcloud.com/test_sets/f7d5b686-3b15-11e9-b88b-52540065bddc
      28-FEB 2.10.6.63 - https://testing.whamcloud.com/test_sets/6790308a-3b1d-11e9-a5c6-52540065bddc
      28-FEB 2.10.6.63 - https://testing.whamcloud.com/test_sets/cec55e7c-3b83-11e9-9646-52540065bddc
      28-FEB 2.10.6.63 - https://testing.whamcloud.com/test_sets/aa662b4a-3b62-11e9-913f-52540065bddc
      28-FEB 2.10.6.63 - https://testing.whamcloud.com/test_sets/a446daf4-3b56-11e9-913f-52540065bddc
      28-FEB 2.10.6.63 - https://testing.whamcloud.com/test_sets/17558fde-3b54-11e9-8f69-52540065bddc
      28-FEB 2.10.6.63 - https://testing.whamcloud.com/test_sets/d13c77de-3b3e-11e9-913f-52540065bddc

      3-MAR 2.12.51.79 server/2.12.0 clients - https://testing.whamcloud.com/test_sets/c816dce2-3e47-11e9-9720-52540065bddc
      6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/07c7f1ca-4053-11e9-9646-52540065bddc
      6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/aae422be-405f-11e9-a256-52540065bddc
      6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/58314dc4-4061-11e9-9720-52540065bddc
      6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/dd263538-4063-11e9-9720-52540065bddc
      6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/8aca8b3e-4065-11e9-9720-52540065bddc
      6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/e3b30308-406e-11e9-9646-52540065bddc
      6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/848095a4-4077-11e9-8e92-52540065bddc
      6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/a32fa184-40a4-11e9-92fe-52540065bddc
      6-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/2f488558-40ac-11e9-8e92-52540065bddc
      7-MAR 2.10.6.79 - https://testing.whamcloud.com/test_sets/a4b30b40-40cc-11e9-a256-52540065bddc
      9-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/f6d552f6-42ee-11e9-b98a-52540065bddc
      9-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/12f341da-4300-11e9-9646-52540065bddc
      9-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/df7dc85c-430e-11e9-b98a-52540065bddc
      10-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/8bc401ce-4320-11e9-8e92-52540065bddc
      10-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/5a54930a-4335-11e9-a256-52540065bddc
      10-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/d29d65d4-433b-11e9-b98a-52540065bddc
      10-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/68760eee-4341-11e9-a256-52540065bddc
      10-MAR 2.10.7 RC1 - https://testing.whamcloud.com/test_sets/f906e83a-434f-11e9-a256-52540065bddc

            People

              pfarrell Patrick Farrell (Inactive)
              jamesanunez James Nunez (Inactive)