[LU-938] sanity test_27c: @@@@@@ FAIL: two-stripe file doesn't have two stripes Created: 17/Dec/11  Updated: 05/Nov/12  Resolved: 05/Nov/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Brian Murrell (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

ONLY=27c CLIENTONLY=true PDSH="pdsh -R ssh -S -w" mds_HOST=mds1 mgs_HOST=mds2 ost1_HOST=oss1 ost2_HOST=oss2 OSTDEV1="/dev/vdb" OSTDEV2="/dev/vdb" bash /usr/lib64/lustre/tests/sanity.sh


Attachments: File 27c.dk     File after_setstripe.log    
Severity: 3
Rank (Obsolete): 6054

 Description   

When I run sanity test 27c it fails to create a multi-striped file. This is consistent with my own findings on the same cluster, which is in fact what prompted me to run sanity. To illustrate:

# lfs setstripe -c 2 bigfile5
[root@client1 lustre]# lfs getstripe bigfile5
bigfile5
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_stripe_offset:  1
    obdidx     objid    objid     group
         1     18853   0x49a5         0

# dd if=/dev/zero of=bigfile2 bs=10M count=100
100+0 records in
100+0 records out
1048576000 bytes (1.0 GB) copied, 62.157 s, 16.9 MB/s

# lfs getstripe bigfile5
bigfile5
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_stripe_offset:  1
    obdidx     objid    objid     group
         1     18853   0x49a5         0

So as you can see, my attempts to create a two-striped file fail.
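
For comparison, a minimal version of the check that test_27c performs could look like the sketch below (an illustrative snippet only, assuming the working directory is a Lustre mount and lfs is in PATH; the file name is made up):

# request a 2-stripe file and verify the stripe count actually applied
tfile=twostripe_check
lfs setstripe -c 2 "$tfile" || { echo "setstripe failed"; exit 1; }
count=$(lfs getstripe -c "$tfile")
if [ "$count" -ne 2 ]; then
    echo "FAIL: asked for 2 stripes, got $count"
else
    echo "PASS: $tfile has $count stripes"
fi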

Two OSTs are available and have capacity:

# lfs df
UUID                  1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID    78629560      473256    72913424   1% /mnt/lustre[MDT:0]
lustre-OST0000_UUID   208927160      472988   197968348   0% /mnt/lustre[OST:0]
lustre-OST0001_UUID   208927160     1494412   196946924   1% /mnt/lustre[OST:1]

filesystem summary:   417854320     1967400   394915272   0% /mnt/lustre

# lfs df -i
UUID                     Inodes       IUsed       IFree IUse% Mounted on
lustre-MDT0000_UUID    52428800          40    52428760    0% /mnt/lustre[MDT:0]
lustre-OST0000_UUID     3097600          58     3097542    0% /mnt/lustre[OST:0]
lustre-OST0001_UUID     3097600          86     3097514    0% /mnt/lustre[OST:1]

filesystem summary:    52428800          40    52428760    0% /mnt/lustre
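
As a side note, the same capacity check can be scripted; a rough sketch (assuming the standard lfs df output format shown above) that flags any OST at or above 90% usage:

# warn about any OST line in 'lfs df' that is at or above 90% usage
lfs df | awk '/\[OST:/ { use = $5; sub(/%/, "", use); if (use + 0 >= 90) print "nearly full:", $1 }'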

Servers are running:

kernel-2.6.32-131.6.1.el6_lustre.g65156ed.x86_64
kernel-firmware-2.6.32-131.6.1.el6_lustre.g65156ed.x86_64
lustre-2.1.0-2.6.32_131.6.1.el6_lustre.g65156ed.x86_64_g9d71fe8.x86_64
lustre-ldiskfs-3.3.0-2.6.32_131.6.1.el6_lustre.g65156ed.x86_64_g9d71fe8.x86_64
lustre-modules-2.1.0-2.6.32_131.6.1.el6_lustre.g65156ed.x86_64_g9d71fe8.x86_64
lustre-repo-2.1-1.noarch
lustre-tests-2.1.0-2.6.32_131.6.1.el6_lustre.g65156ed.x86_64_g9d71fe8.x86_64

which should be 2.1.0 GA, and the client is running b2_1, which is only two commits ahead of 2.1.0.

I will attach the 27c portion of the Lustre debug log from the MDT for analysis.



 Comments   
Comment by Peter Jones [ 29/Dec/11 ]

Niu

Could you please look into this one?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 29/Dec/11 ]

I didn't see anything abnormal in the 27c.dk log, and I can't reproduce it with the b2_1 code either.

Hi Brian, could you reproduce test_27c with D_TRACE debug enabled and attach the resulting log? I think that might be helpful.
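
For example, the log could be captured along these lines (a rough sketch; the output path is just an example):

# on the MDS: add D_TRACE to the debug mask and clear the debug buffer
lctl set_param debug=+trace
lctl clear
# ... re-run test_27c (or the lfs setstripe -c 2 reproducer) from the client ...
# then dump the kernel debug log
lctl dk > /tmp/27c_trace.dk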

Comment by Brian Murrell (Inactive) [ 30/Dec/11 ]

Hi Niu,

I will attach after_setstripe.log, which is an lctl dk capture taken immediately after issuing an lfs setstripe -c 2, per the example below:

# lfs setstripe -c 2 bigfile8
[root@client1 lustre]# lfs getstripe bigfile8
bigfile8
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_stripe_offset:  1
	obdidx		 objid		objid		 group
	     1	         18884	       0x49c4	             0
Comment by Niu Yawei (Inactive) [ 30/Dec/11 ]

00020000:00000001:0.0:1325259847.301275:0:28375:0:(lov_pack.c:319:lov_alloc_memmd()) Process leaving (rc=72 : 72 : 48)

The MDS allocated the memmd for only 1 stripe. Is it possible that one of the MDT OSCs was inactive? Could you try 'lctl get_param osc.*.active' (on the MDS node) to see if all OSCs are active?

Comment by Brian Murrell (Inactive) [ 31/Dec/11 ]

Could you try 'lctl get_param osc.*.active' (on mds node) to see if all OSCs are active?

Sure:

# lctl get_param osc.*.active
osc.lustre-OST0000-osc-MDT0000.active=1
osc.lustre-OST0001-osc-MDT0000.active=1
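
(On a system with many OSTs, a quick filter like the sketch below, run on the MDS, would print only the inactive ones; this is just an illustrative one-liner.)

# print only MDT-side OSC devices reporting active=0
lctl get_param osc.*.active | awk -F= '$2 == 0 { print "inactive:", $1 }'
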
Comment by Niu Yawei (Inactive) [ 03/Jan/12 ]

Thank you, Brian. It looks like it's not caused by an inactive OSC. Could you recapture the log with both D_IOCTL & D_INFO enabled? From that log we could probably figure out why the MDS changed the stripe count to 1.
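
For example (a rough sketch following the same pattern as before; the output path is just an example):

# on the MDS: add D_IOCTL and D_INFO to the debug mask and clear the buffer
lctl set_param debug=+ioctl
lctl set_param debug=+info
lctl clear
# ... repeat the lfs setstripe -c 2 reproducer from the client ...
lctl dk > /tmp/after_setstripe_ioctl_info.log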

Comment by Niu Yawei (Inactive) [ 10/Apr/12 ]

Brian, is this reproducible? Or, if it's no longer relevant, can we close it now?

Comment by Niu Yawei (Inactive) [ 05/Nov/12 ]

Let's reopen it if it can be reproduced.
