Lustre / LU-938

sanity test_27c: @@@@@@ FAIL: two-stripe file doesn't have two stripes

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • None
    • Affects Version/s: Lustre 2.1.0
    • None
    • Environment: ONLY=27c CLIENTONLY=true PDSH="pdsh -R ssh -S -w" mds_HOST=mds1 mgs_HOST=mds2 ost1_HOST=oss1 ost2_HOST=oss2 OSTDEV1="/dev/vdb" OSTDEV2="/dev/vdb" bash /usr/lib64/lustre/tests/sanity.sh
    • 3
    • 6054

    Description

      When I run sanity test 27c, it fails to create a multi-striped file. This is consistent with my own findings on the same cluster, which is in fact what prompted me to run sanity. To illustrate:

      # lfs setstripe -c 2 bigfile5
      [root@client1 lustre]# lfs getstripe bigfile5
      bigfile5
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_stripe_offset:  1
          obdidx           objid          objid          group
               1           18853         0x49a5              0

      # dd if=/dev/zero of=bigfile2 bs=10M count=100
      100+0 records in
      100+0 records out
      1048576000 bytes (1.0 GB) copied, 62.157 s, 16.9 MB/s

      # lfs getstripe bigfile5
      bigfile5
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_stripe_offset:  1
          obdidx           objid          objid          group
               1           18853         0x49a5              0

      So, as you can see, my attempts to create a two-striped file fail.
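
      Boiled down, the failing check is just "ask for two stripes, then read back lmm_stripe_count". A minimal sketch of that check (the file name stripetest is made up, and it assumes this lfs supports getstripe -c; otherwise grep lmm_stripe_count from the full output):

        cd /mnt/lustre
        rm -f stripetest
        lfs setstripe -c 2 stripetest
        count=$(lfs getstripe -c stripetest)    # -c prints only the stripe count
        [ "$count" -eq 2 ] || echo "FAIL: expected 2 stripes, got $count"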

      Two OSTs are available and have capacity:

      # lfs df
      UUID                  1K-blocks      Used  Available Use% Mounted on
      lustre-MDT0000_UUID    78629560    473256   72913424   1% /mnt/lustre[MDT:0]
      lustre-OST0000_UUID   208927160    472988  197968348   0% /mnt/lustre[OST:0]
      lustre-OST0001_UUID   208927160   1494412  196946924   1% /mnt/lustre[OST:1]

      filesystem summary:   417854320   1967400  394915272   0% /mnt/lustre

      # lfs df -i
      UUID                     Inodes     IUsed      IFree IUse% Mounted on
      lustre-MDT0000_UUID    52428800        40   52428760    0% /mnt/lustre[MDT:0]
      lustre-OST0000_UUID     3097600        58    3097542    0% /mnt/lustre[OST:0]
      lustre-OST0001_UUID     3097600        86    3097514    0% /mnt/lustre[OST:1]

      filesystem summary:    52428800        40   52428760    0% /mnt/lustre
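
      For completeness, the client-side view of the OSTs can also be checked directly; a rough sketch (commands only, output omitted here):

        lctl dl | grep osc            # each lustre-OST000x-osc-* device should be in the UP state
        lctl get_param osc.*.active   # 1 = active, 0 = administratively deactivated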

      Servers are running:

      kernel-2.6.32-131.6.1.el6_lustre.g65156ed.x86_64
      kernel-firmware-2.6.32-131.6.1.el6_lustre.g65156ed.x86_64
      lustre-2.1.0-2.6.32_131.6.1.el6_lustre.g65156ed.x86_64_g9d71fe8.x86_64
      lustre-ldiskfs-3.3.0-2.6.32_131.6.1.el6_lustre.g65156ed.x86_64_g9d71fe8.x86_64
      lustre-modules-2.1.0-2.6.32_131.6.1.el6_lustre.g65156ed.x86_64_g9d71fe8.x86_64
      lustre-repo-2.1-1.noarch
      lustre-tests-2.1.0-2.6.32_131.6.1.el6_lustre.g65156ed.x86_64_g9d71fe8.x86_64

      which should be 2.1.0 GA. The client is running b2_1, which is only two commits ahead of 2.1.0.

      I will attach the 27c portion of the lustre debug log from the MDT for analysis.
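
      A log like that is typically captured by clearing the MDS debug buffer, running only test 27c, and then dumping the buffer; roughly (a sketch only, with the MDS hostname and env vars abbreviated from the environment line above, and an arbitrary dump path):

        pdsh -R ssh -w mds1 'lctl clear'                                    # empty the kernel debug buffer on the MDS
        ONLY=27c CLIENTONLY=true bash /usr/lib64/lustre/tests/sanity.sh     # plus the remaining env vars shown above
        pdsh -R ssh -w mds1 'lctl dk /tmp/27c.dk'                           # dump the debug log for attachment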

      Attachments

        1. 27c.dk
          71 kB
        2. after_setstripe.log
          213 kB


          Activity


            niu Niu Yawei (Inactive) added a comment - Let's reopen it if it can be reproduced.

            niu Niu Yawei (Inactive) added a comment - Brian, is it reproducible? Or, if it's no longer relevant, can we close it now?

            niu Niu Yawei (Inactive) added a comment - Thank you, Brian. It looks like it's not caused by an inactive OSC. Could you recapture the log with both D_IOCTL and D_INFO enabled? From that log we could probably figure out why the MDS changed the stripe count to 1.
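
            Adding those masks and recapturing would look roughly like this on the MDS (a sketch; 'ioctl' and 'info' are the lowercase forms of D_IOCTL/D_INFO, and the exact +/- syntax may need adjusting for this release):

            lctl set_param debug="+ioctl +info"   # add the requested masks to the current debug set
            lctl clear                            # start from an empty buffer
            # ... rerun the lfs setstripe -c 2 reproduction from the client ...
            lctl dk /tmp/after_setstripe.dk       # dump the log for attachment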

            brian Brian Murrell (Inactive) added a comment -

            > Could you try 'lctl get_param osc.*.active' (on mds node) to see if all OSCs are active?

            Sure:

            # lctl get_param osc.*.active
            osc.lustre-OST0000-osc-MDT0000.active=1
            osc.lustre-OST0001-osc-MDT0000.active=1
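
            Since both report active=1, an administratively deactivated OSC is ruled out. For reference only, this is roughly how an MDT-side OSC would be toggled, which makes the MDS stop allocating objects on that OST (a sketch; nothing below was run on this system):

            lctl set_param osc.lustre-OST0001-osc-MDT0000.active=0   # deactivate: MDS skips OST0001 for new files
            lctl set_param osc.lustre-OST0001-osc-MDT0000.active=1   # reactivate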

            niu Niu Yawei (Inactive) added a comment -

            00020000:00000001:0.0:1325259847.301275:0:28375:0:(lov_pack.c:319:lov_alloc_memmd()) Process leaving (rc=72 : 72 : 48)

            The MDS allocated a memmd for only 1 stripe. Is it possible that one of the MDT OSCs was inactive? Could you try 'lctl get_param osc.*.active' (on the MDS node) to see if all OSCs are active?
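
            The rc=72 value is itself consistent with a one-stripe layout: assuming rc is the size of the LOV EA being allocated, with a 48-byte v3 header plus 24 bytes per object, one stripe gives 72 and two stripes would have given 96. A back-of-envelope check (the struct sizes are assumptions, not read from this log):

            # assumed sizes: struct lov_mds_md_v3 = 48 bytes, struct lov_ost_data_v1 = 24 bytes
            echo $((48 + 1 * 24))   # 72 -> an EA with room for exactly one stripe
            echo $((48 + 2 * 24))   # 96 -> what a two-stripe allocation would have returned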

            brian Brian Murrell (Inactive) added a comment -

            Hi Niu,

            I will attach after_setstripe.log, which is an lctl dk taken immediately after issuing lfs setstripe -c 2, per the example below:

            # lfs setstripe -c 2 bigfile8
            [root@client1 lustre]# lfs getstripe bigfile8
            bigfile8
            lmm_stripe_count:   1
            lmm_stripe_size:    1048576
            lmm_stripe_offset:  1
            	obdidx		 objid		objid		 group
            	     1	         18884	       0x49c4	             0
            

            niu Niu Yawei (Inactive) added a comment -

            I didn't see anything abnormal in the 27c.dk log, and I can't reproduce it with b2_1 code either.

            Hi Brian, could you reproduce test_27c with D_TRACE enabled and capture the log? I think that might be helpful.
            pjones Peter Jones added a comment -

            Niu

            Could you please look into this one?

            Thanks

            Peter


            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: brian Brian Murrell (Inactive)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: