[LU-6602] ASSERTION( rec->lrh_len <= 8192 ) failed Created: 14/May/15  Updated: 30/Aug/19  Resolved: 16/Jul/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Critical
Reporter: Robert Read (Inactive) Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: dne2

Issue Links:
Blocker
is blocking LU-6737 many stripe testing of DNE2 Resolved
Related
is related to LU-6831 The ticket for tracking all DNE2 bugs Reopened
is related to LU-7666 llog_cat_new_log() should use chunk s... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

  Testing this build: https://build.hpdd.intel.com/job/lustre-reviews/32021/

In an AWS environment with 64 MDTs (8 MDSs with 8 MDTs each).

  1. cd /mnt/lustre
  2. lfs mkdir -c 8 8stripedir
  3. lfs mkdir -c 64 64stripedir
    <hang>
    On MDS0
    LustreError: 1291:0:(llog_cat.c:319:llog_cat_add_rec()) ASSERTION( rec->lrh_len <= 8192 ) failed: 
    LustreError: 1291:0:(llog_cat.c:319:llog_cat_add_rec()) LBUG
    Pid: 1291, comm: mdt00_002
    
    Call Trace:
     [<ffffffffa00f2875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
     [<ffffffffa00f2e77>] lbug_with_loc+0x47/0xb0 [libcfs]
     [<ffffffffa0207848>] llog_cat_add_rec+0x3e8/0x450 [obdclass]
     [<ffffffffa01ff039>] llog_add+0x89/0x1c0 [obdclass]
     [<ffffffffa187b6f4>] sub_updates_write+0x154/0x600 [ptlrpc]
     [<ffffffffa187c247>] top_trans_stop+0x6a7/0xb40 [ptlrpc]
     [<ffffffffa1d8cd21>] lod_trans_stop+0x61/0x70 [lod]
     [<ffffffffa1e3149a>] mdd_trans_stop+0x1a/0xac [mdd]
     [<ffffffffa1e20909>] mdd_create+0x13a9/0x1750 [mdd]
     [<ffffffffa1cdb65c>] ? mdt_version_save+0x8c/0x1a0 [mdt]
     [<ffffffffa1cdf9ec>] mdt_reint_create+0xbbc/0xcc0 [mdt]
     [<ffffffffa1cdab1d>] mdt_reint_rec+0x5d/0x200 [mdt]
     [<ffffffffa1cbffcb>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
     [<ffffffffa1cc073b>] mdt_reint+0x6b/0x120 [mdt]
     [<ffffffffa1868e8e>] tgt_request_handle+0x8be/0xfe0 [ptlrpc]
     [<ffffffffa1818aa1>] ptlrpc_main+0xe41/0x1970 [ptlrpc]
     [<ffffffff81060c3f>] ? finish_task_switch+0x4f/0xf0
     [<ffffffffa1817c60>] ? ptlrpc_main+0x0/0x1970 [ptlrpc]
     [<ffffffff8109e71e>] kthread+0x9e/0xc0
     [<ffffffff8100c20a>] child_rip+0xa/0x20
     [<ffffffff8100b294>] ? int_ret_from_sys_call+0x7/0x1b
     [<ffffffff8100ba1d>] ? retint_restore_args+0x5/0x6
     [<ffffffff8100c200>] ? child_rip+0x0/0x20
    
    Kernel panic - not syncing: LBUG
    Pid: 1291, comm: mdt00_002 Not tainted 2.6.32-504.16.2.el6_lustre.gd805a88.x86_64 #1
    Call Trace:
     [<ffffffff81529fbc>] ? panic+0xa7/0x16f
     [<ffffffffa00f2ecb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
     [<ffffffffa0207848>] ? llog_cat_add_rec+0x3e8/0x450 [obdclass]
     [<ffffffffa01ff039>] ? llog_add+0x89/0x1c0 [obdclass]
     [<ffffffffa187b6f4>] ? sub_updates_write+0x154/0x600 [ptlrpc]
     [<ffffffffa187c247>] ? top_trans_stop+0x6a7/0xb40 [ptlrpc]
     [<ffffffffa1d8cd21>] ? lod_trans_stop+0x61/0x70 [lod]
     [<ffffffffa1e3149a>] ? mdd_trans_stop+0x1a/0xac [mdd]
     [<ffffffffa1e20909>] ? mdd_create+0x13a9/0x1750 [mdd]
     [<ffffffffa1cdb65c>] ? mdt_version_save+0x8c/0x1a0 [mdt]
     [<ffffffffa1cdf9ec>] ? mdt_reint_create+0xbbc/0xcc0 [mdt]
     [<ffffffffa1cdab1d>] ? mdt_reint_rec+0x5d/0x200 [mdt]
     [<ffffffffa1cbffcb>] ? mdt_reint_internal+0x4cb/0x7a0 [mdt]
     [<ffffffffa1cc073b>] ? mdt_reint+0x6b/0x120 [mdt]
     [<ffffffffa1868e8e>] ? tgt_request_handle+0x8be/0xfe0 [ptlrpc]
     [<ffffffffa1818aa1>] ? ptlrpc_main+0xe41/0x1970 [ptlrpc]
     [<ffffffff81060c3f>] ? finish_task_switch+0x4f/0xf0
     [<ffffffffa1817c60>] ? ptlrpc_main+0x0/0x1970 [ptlrpc]
     [<ffffffff8109e71e>] ? kthread+0x9e/0xc0
     [<ffffffff8100c20a>] ? child_rip+0xa/0x20
     [<ffffffff8100b294>] ? int_ret_from_sys_call+0x7/0x1b
     [<ffffffff8100ba1d>] ? retint_restore_args+0x5/0x6
     [<ffffffff8100c200>] ? child_rip+0x0/0x20
    

After each reboot/recovery cycle, the MDS would LBUG again with the same error right after recovery completed. Presumably the client was resending the mkdir. Once I killed lfs, the crashes stopped.



 Comments   
Comment by Andreas Dilger [ 14/May/15 ]

First off, we need to work out what the size of the llog record is as a function of the stripe count. The LASSERT() should be turned into an LASSERTF() for future reference, and we will also need to put in a safety check for the maximum stripe count on the MDS to avoid hitting this in the future.

One option is to simply increase the maximum llog record size. These llog records are no longer sent over the network for unlink and setattr processing (except ChangeLogs, which would also benefit from an increase in llog size), and DNE2 OUT only accesses them like regular files, so this should be less of a problem than in the past, when many > 8KB buffer allocations on the clients could be problematic. Some code changes are needed in the llog code to handle multiple different llog chunk sizes. A reasonable maximum size would be 32KB, which is our current RPC size limit for 2000-stripe LOV EAs.

Of course, it is also desirable to shrink the llog redo record size if possible, so that it can scale as much as possible.
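
The safety check suggested above could look roughly like the following Python sketch. It is only an illustration, not the Lustre implementation: `check_record_len`, `MIN_CHUNK_SIZE`, and `MAX_CHUNK_SIZE` are hypothetical names, the 8192-byte limit is taken from the failed assertion, and the 32KB ceiling is taken from the proposal in this comment. The idea is to report an oversized record as an error, with both sizes in the message as an LASSERTF() would print them, instead of crashing the MDS.

```python
import errno

MIN_CHUNK_SIZE = 8192        # limit from the failed assertion in llog_cat_add_rec()
MAX_CHUNK_SIZE = 32 * 1024   # ceiling proposed in this comment (an assumption)


def check_record_len(lrh_len, chunk_size=MIN_CHUNK_SIZE):
    """Return (rc, message): rc is 0 if the record fits, -E2BIG otherwise."""
    if not MIN_CHUNK_SIZE <= chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError(f"unsupported chunk size {chunk_size}")
    if lrh_len > chunk_size:
        # Report both values so the failing size is visible in the log.
        return -errno.E2BIG, f"lrh_len {lrh_len} > chunk size {chunk_size}"
    return 0, "ok"
```

With the default 8192-byte chunk, a 9000-byte record is rejected with `-E2BIG`; with a 32768-byte chunk the same record is accepted.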

Comment by Gerrit Updater [ 18/May/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/14842
Subject: LU-6602 llog: increase llog chunk size
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c87627011845ad48e96ed7d2e5fb9046c322cb94

Comment by Gerrit Updater [ 20/May/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/14883
Subject: LU-6602 obdclass: variable llog chunk size
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5430e1ec551cb066d68bf52352a236ae61762626

Comment by Robert Read (Inactive) [ 26/May/15 ]

Testing this build https://build.hpdd.intel.com/job/lustre-reviews/32346/

I started with creating a 64 stripe directory:

[root@client00 scratch]# lfs mkdir -c 64 64stripes

This hangs, and I saw a soft lockup on MDT0:

BUG: soft lockup - CPU#0 stuck for 67s! [mdt00_006:12371]
Modules linked in: 8021q garp stp llc osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ldiskfs(U) ipv6 xen_netfront ext4 jbd2 mbcache xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
CPU 0 
Modules linked in: 8021q garp stp llc osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ldiskfs(U) ipv6 xen_netfront ext4 jbd2 mbcache xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 12371, comm: mdt00_006 Not tainted 2.6.32-504.16.2.el6_lustre.g2f99b7f.x86_64 #1  
RIP: e030:[<ffffffffa0c6a1e5>]  [<ffffffffa0c6a1e5>] lod_sub_object_index_insert+0xc5/0x330 [lod]
RSP: e02b:ffff88076e32b810  EFLAGS: 00000202
RAX: 0000000000000142 RBX: ffff8805d6dbbc70 RCX: ffff8805d3916030
RDX: ffff8805d3916000 RSI: ffff8805d3916030 RDI: ffff8805d3917e62
RBP: ffff88076e32b8a0 R08: 0000000000000152 R09: 0000000000001e32
R10: ffff8805d3916010 R11: 0000000000000003 R12: ffff8805dd068740
R13: ffff8805d6b03bc0 R14: ffff88074cb47f80 R15: ffff8805d6f9c558
FS:  00007fac721f1740(0000) GS:ffff880028050000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fac721fe000 CR3: 00000005f7678000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mdt00_006 (pid: 12371, threadinfo ffff88076e32a000, task ffff8805d7a84ab0)
Stack:
 ffff88076e32b860 ffff8805e1a523c0 ffff8805d6f9c6a8 ffff8805d6f9c558
<d> 0000000000002000 0000000000002000 ffff8805d6f9c6a8 ffff8805f9a7f6c0
<d> 6a04000002000000 000000000000001a 00000000000002dc 01ff8805d6f9c688
Call Trace:
 [<ffffffffa0c5e460>] lod_dir_striping_create_internal+0xd60/0x1610 [lod]
 [<ffffffffa0c5f3e5>] lod_xattr_set+0x365/0x3e0 [lod]
 [<ffffffffa0cc1e06>] mdo_xattr_set.clone.0+0xc6/0x170 [mdd]
 [<ffffffffa0ccebb3>] mdd_object_create+0x493/0x8d0 [mdd]
 [<ffffffffa059b07a>] ? top_trans_start+0x3fa/0x880 [ptlrpc]
 [<ffffffffa0cd1e88>] mdd_create+0xc98/0x1750 [mdd]
 [<ffffffffa0b8d5ec>] ? mdt_version_save+0x8c/0x1a0 [mdt]
 [<ffffffffa0b9197c>] mdt_reint_create+0xbbc/0xcc0 [mdt]
 [<ffffffffa031fc00>] ? lu_ucred+0x20/0x30 [obdclass]
 [<ffffffffa0b6dd95>] ? mdt_ucred+0x15/0x20 [mdt]
 [<ffffffffa0b888ec>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
 [<ffffffffa054bcb2>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
 [<ffffffffa0b8c99d>] mdt_reint_rec+0x5d/0x200 [mdt]
 [<ffffffffa0b71f6b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
 [<ffffffffa0b726db>] mdt_reint+0x6b/0x120 [mdt]
 [<ffffffffa058742e>] tgt_request_handle+0x95e/0x10b0 [ptlrpc]
 [<ffffffffa0536b71>] ptlrpc_main+0xe41/0x1970 [ptlrpc]
 [<ffffffff81060c3f>] ? finish_task_switch+0x4f/0xf0
 [<ffffffffa0535d30>] ? ptlrpc_main+0x0/0x1970 [ptlrpc]
 [<ffffffff8109e71e>] kthread+0x9e/0xc0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8100b294>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8100ba1d>] ? retint_restore_args+0x5/0x6
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
Code: 43 08 48 89 45 98 44 8b 42 28 45 85 c0 0f 84 67 01 00 00 48 8d 72 30 31 c0 45 31 c9 48 89 f1 48 89 f7 0f 1f 40 00 44 0f b7 5f 12 <83> c0 01 4f 8d 5c 1b 14 4d 01 d9 4c 01 df 41 39 c0 77 e8 31 c0 
Call Trace:
 [<ffffffffa0c5e460>] lod_dir_striping_create_internal+0xd60/0x1610 [lod]
 [<ffffffffa0c5f3e5>] lod_xattr_set+0x365/0x3e0 [lod]
 [<ffffffffa0cc1e06>] mdo_xattr_set.clone.0+0xc6/0x170 [mdd]
 [<ffffffffa0ccebb3>] mdd_object_create+0x493/0x8d0 [mdd]
 [<ffffffffa059b07a>] ? top_trans_start+0x3fa/0x880 [ptlrpc]
 [<ffffffffa0cd1e88>] mdd_create+0xc98/0x1750 [mdd]
 [<ffffffffa0b8d5ec>] ? mdt_version_save+0x8c/0x1a0 [mdt]
 [<ffffffffa0b9197c>] mdt_reint_create+0xbbc/0xcc0 [mdt]
 [<ffffffffa031fc00>] ? lu_ucred+0x20/0x30 [obdclass]
 [<ffffffffa0b6dd95>] ? mdt_ucred+0x15/0x20 [mdt]
 [<ffffffffa0b888ec>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
 [<ffffffffa054bcb2>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
 [<ffffffffa0b8c99d>] mdt_reint_rec+0x5d/0x200 [mdt]
 [<ffffffffa0b71f6b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
 [<ffffffffa0b726db>] mdt_reint+0x6b/0x120 [mdt]
 [<ffffffffa058742e>] tgt_request_handle+0x95e/0x10b0 [ptlrpc]
 [<ffffffffa0536b71>] ptlrpc_main+0xe41/0x1970 [ptlrpc]
 [<ffffffff81060c3f>] ? finish_task_switch+0x4f/0xf0
 [<ffffffffa0535d30>] ? ptlrpc_main+0x0/0x1970 [ptlrpc]
 [<ffffffff8109e71e>] kthread+0x9e/0xc0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8100b294>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8100ba1d>] ? retint_restore_args+0x5/0x6
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
[root@mdt00 ~]# 
Message from syslogd@mdt00 at May 26 18:56:44 ...
 kernel:BUG: soft lockup - CPU#0 stuck for 67s! [mdt00_006:12371]

Message from syslogd@mdt00 at May 26 18:58:08 ...
 kernel:BUG: soft lockup - CPU#0 stuck for 67s! [mdt00_006:12371]
Comment by Di Wang [ 28/May/15 ]

As discussed with Andreas and Alex, I changed the patch a bit. Unfortunately, due to the landing process, I cannot push an independent patch there, so I pushed the whole set of dne3 patches here.

http://review.whamcloud.com/#/c/13942

With this patch, I can create a 48-stripe directory. But let me try this build on an OpenSFS node first.

Comment by Di Wang [ 28/May/15 ]

OK, I was able to create a striped_dir with 68 stripes on the OpenSFS cluster with the build
https://build.hpdd.intel.com/job/lustre-reviews/32529/

[root@c17 tests]# MDSCOUNT=68 sh llmount.sh 
Stopping clients: c17 /mnt/lustre (opts:)
Stopping clients: c17 /mnt/lustre2 (opts:)
Loading modules from /usr/lib64/lustre/tests/..
detected 8 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=vfstrace rpctrace dlmtrace neterror ha config 		      ioctl super lfsck
subsystem_debug=all -lnet -lnd -pinger
Formatting mgs, mds, osts
Format mds1: /tmp/lustre-mdt1
Format mds2: /tmp/lustre-mdt2
Format mds3: /tmp/lustre-mdt3
Format mds4: /tmp/lustre-mdt4
Format mds5: /tmp/lustre-mdt5
Format mds6: /tmp/lustre-mdt6
Format mds7: /tmp/lustre-mdt7
Format mds8: /tmp/lustre-mdt8
Format mds9: /tmp/lustre-mdt9
Format mds10: /tmp/lustre-mdt10
Format mds11: /tmp/lustre-mdt11
Format mds12: /tmp/lustre-mdt12
Format mds13: /tmp/lustre-mdt13
Format mds14: /tmp/lustre-mdt14
Format mds15: /tmp/lustre-mdt15
Format mds16: /tmp/lustre-mdt16
Format mds17: /tmp/lustre-mdt17
Format mds18: /tmp/lustre-mdt18
Format mds19: /tmp/lustre-mdt19
Format mds20: /tmp/lustre-mdt20
Format mds21: /tmp/lustre-mdt21
Format mds22: /tmp/lustre-mdt22
Format mds23: /tmp/lustre-mdt23
Format mds24: /tmp/lustre-mdt24
Format mds25: /tmp/lustre-mdt25
Format mds26: /tmp/lustre-mdt26
Format mds27: /tmp/lustre-mdt27
Format mds28: /tmp/lustre-mdt28
Format mds29: /tmp/lustre-mdt29
Format mds30: /tmp/lustre-mdt30
Format mds31: /tmp/lustre-mdt31
Format mds32: /tmp/lustre-mdt32
Format mds33: /tmp/lustre-mdt33
Format mds34: /tmp/lustre-mdt34
Format mds35: /tmp/lustre-mdt35
Format mds36: /tmp/lustre-mdt36
Format mds37: /tmp/lustre-mdt37
Format mds38: /tmp/lustre-mdt38
Format mds39: /tmp/lustre-mdt39
Format mds40: /tmp/lustre-mdt40
Format mds41: /tmp/lustre-mdt41
Format mds42: /tmp/lustre-mdt42
Format mds43: /tmp/lustre-mdt43
Format mds44: /tmp/lustre-mdt44
Format mds45: /tmp/lustre-mdt45
Format mds46: /tmp/lustre-mdt46
Format mds47: /tmp/lustre-mdt47
Format mds48: /tmp/lustre-mdt48
Format mds49: /tmp/lustre-mdt49
Format mds50: /tmp/lustre-mdt50
Format mds51: /tmp/lustre-mdt51
Format mds52: /tmp/lustre-mdt52
Format mds53: /tmp/lustre-mdt53
Format mds54: /tmp/lustre-mdt54
Format mds55: /tmp/lustre-mdt55
Format mds56: /tmp/lustre-mdt56
Format mds57: /tmp/lustre-mdt57
Format mds58: /tmp/lustre-mdt58
Format mds59: /tmp/lustre-mdt59
Format mds60: /tmp/lustre-mdt60
Format mds61: /tmp/lustre-mdt61
Format mds62: /tmp/lustre-mdt62
Format mds63: /tmp/lustre-mdt63
Format mds64: /tmp/lustre-mdt64
Format mds65: /tmp/lustre-mdt65
Format mds66: /tmp/lustre-mdt66
Format mds67: /tmp/lustre-mdt67
Format mds68: /tmp/lustre-mdt68
Format ost1: /tmp/lustre-ost1
Format ost2: /tmp/lustre-ost2
Checking servers environments
Checking clients c17 environments
Loading modules from /usr/lib64/lustre/tests/..
detected 8 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=vfstrace rpctrace dlmtrace neterror ha config 		      ioctl super lfsck
subsystem_debug=all -lnet -lnd -pinger
Setup mgs, mdt, osts
Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/mds1
Started lustre-MDT0000
Starting mds2:   -o loop /tmp/lustre-mdt2 /mnt/mds2
Started lustre-MDT0001
Starting mds3:   -o loop /tmp/lustre-mdt3 /mnt/mds3
Started lustre-MDT0002
Starting mds4:   -o loop /tmp/lustre-mdt4 /mnt/mds4
Started lustre-MDT0003
Starting mds5:   -o loop /tmp/lustre-mdt5 /mnt/mds5
Started lustre-MDT0004
Starting mds6:   -o loop /tmp/lustre-mdt6 /mnt/mds6
Started lustre-MDT0005
Starting mds7:   -o loop /tmp/lustre-mdt7 /mnt/mds7
Started lustre-MDT0006
Starting mds8:   -o loop /tmp/lustre-mdt8 /mnt/mds8
Started lustre-MDT0007
Starting mds9:   -o loop /tmp/lustre-mdt9 /mnt/mds9
Started lustre-MDT0008
Starting mds10:   -o loop /tmp/lustre-mdt10 /mnt/mds10
Started lustre-MDT0009
Starting mds11:   -o loop /tmp/lustre-mdt11 /mnt/mds11
Started lustre-MDT000a
Starting mds12:   -o loop /tmp/lustre-mdt12 /mnt/mds12
Started lustre-MDT000b
Starting mds13:   -o loop /tmp/lustre-mdt13 /mnt/mds13
Started lustre-MDT000c
Starting mds14:   -o loop /tmp/lustre-mdt14 /mnt/mds14
Started lustre-MDT000d
Starting mds15:   -o loop /tmp/lustre-mdt15 /mnt/mds15
Started lustre-MDT000e
Starting mds16:   -o loop /tmp/lustre-mdt16 /mnt/mds16
Started lustre-MDT000f
Starting mds17:   -o loop /tmp/lustre-mdt17 /mnt/mds17
Started lustre-MDT0010
Starting mds18:   -o loop /tmp/lustre-mdt18 /mnt/mds18
Started lustre-MDT0011
Starting mds19:   -o loop /tmp/lustre-mdt19 /mnt/mds19
Started lustre-MDT0012
Starting mds20:   -o loop /tmp/lustre-mdt20 /mnt/mds20
Started lustre-MDT0013
Starting mds21:   -o loop /tmp/lustre-mdt21 /mnt/mds21
Started lustre-MDT0014
Starting mds22:   -o loop /tmp/lustre-mdt22 /mnt/mds22
Started lustre-MDT0015
Starting mds23:   -o loop /tmp/lustre-mdt23 /mnt/mds23
Started lustre-MDT0016
Starting mds24:   -o loop /tmp/lustre-mdt24 /mnt/mds24
Started lustre-MDT0017
Starting mds25:   -o loop /tmp/lustre-mdt25 /mnt/mds25
Started lustre-MDT0018
Starting mds26:   -o loop /tmp/lustre-mdt26 /mnt/mds26
Started lustre-MDT0019
Starting mds27:   -o loop /tmp/lustre-mdt27 /mnt/mds27
Started lustre-MDT001a
Starting mds28:   -o loop /tmp/lustre-mdt28 /mnt/mds28
Started lustre-MDT001b
Starting mds29:   -o loop /tmp/lustre-mdt29 /mnt/mds29
Started lustre-MDT001c
Starting mds30:   -o loop /tmp/lustre-mdt30 /mnt/mds30
Started lustre-MDT001d
Starting mds31:   -o loop /tmp/lustre-mdt31 /mnt/mds31
Started lustre-MDT001e
Starting mds32:   -o loop /tmp/lustre-mdt32 /mnt/mds32
Started lustre-MDT001f
Starting mds33:   -o loop /tmp/lustre-mdt33 /mnt/mds33
Started lustre-MDT0020
Starting mds34:   -o loop /tmp/lustre-mdt34 /mnt/mds34
Started lustre-MDT0021
Starting mds35:   -o loop /tmp/lustre-mdt35 /mnt/mds35
Started lustre-MDT0022
Starting mds36:   -o loop /tmp/lustre-mdt36 /mnt/mds36
Started lustre-MDT0023
Starting mds37:   -o loop /tmp/lustre-mdt37 /mnt/mds37
Started lustre-MDT0024
Starting mds38:   -o loop /tmp/lustre-mdt38 /mnt/mds38
Started lustre-MDT0025
Starting mds39:   -o loop /tmp/lustre-mdt39 /mnt/mds39
Started lustre-MDT0026
Starting mds40:   -o loop /tmp/lustre-mdt40 /mnt/mds40
Started lustre-MDT0027
Starting mds41:   -o loop /tmp/lustre-mdt41 /mnt/mds41
Started lustre-MDT0028
Starting mds42:   -o loop /tmp/lustre-mdt42 /mnt/mds42
Started lustre-MDT0029
Starting mds43:   -o loop /tmp/lustre-mdt43 /mnt/mds43
Started lustre-MDT002a
Starting mds44:   -o loop /tmp/lustre-mdt44 /mnt/mds44
Started lustre-MDT002b
Starting mds45:   -o loop /tmp/lustre-mdt45 /mnt/mds45
Started lustre-MDT002c
Starting mds46:   -o loop /tmp/lustre-mdt46 /mnt/mds46
Started lustre-MDT002d
Starting mds47:   -o loop /tmp/lustre-mdt47 /mnt/mds47
Started lustre-MDT002e
Starting mds48:   -o loop /tmp/lustre-mdt48 /mnt/mds48
Started lustre-MDT002f
Starting mds49:   -o loop /tmp/lustre-mdt49 /mnt/mds49
Started lustre-MDT0030
Starting mds50:   -o loop /tmp/lustre-mdt50 /mnt/mds50
Started lustre-MDT0031
Starting mds51:   -o loop /tmp/lustre-mdt51 /mnt/mds51
Started lustre-MDT0032
Starting mds52:   -o loop /tmp/lustre-mdt52 /mnt/mds52
Started lustre-MDT0033
Starting mds53:   -o loop /tmp/lustre-mdt53 /mnt/mds53
Started lustre-MDT0034
Starting mds54:   -o loop /tmp/lustre-mdt54 /mnt/mds54
Started lustre-MDT0035
Starting mds55:   -o loop /tmp/lustre-mdt55 /mnt/mds55
Started lustre-MDT0036
Starting mds56:   -o loop /tmp/lustre-mdt56 /mnt/mds56
Started lustre-MDT0037
Starting mds57:   -o loop /tmp/lustre-mdt57 /mnt/mds57
Started lustre-MDT0038
Starting mds58:   -o loop /tmp/lustre-mdt58 /mnt/mds58
Started lustre-MDT0039
Starting mds59:   -o loop /tmp/lustre-mdt59 /mnt/mds59
Started lustre-MDT003a
Starting mds60:   -o loop /tmp/lustre-mdt60 /mnt/mds60
Started lustre-MDT003b
Starting mds61:   -o loop /tmp/lustre-mdt61 /mnt/mds61
Started lustre-MDT003c
Starting mds62:   -o loop /tmp/lustre-mdt62 /mnt/mds62
Started lustre-MDT003d
Starting mds63:   -o loop /tmp/lustre-mdt63 /mnt/mds63
Started lustre-MDT003e
Starting mds64:   -o loop /tmp/lustre-mdt64 /mnt/mds64
Started lustre-MDT003f
Starting mds65:   -o loop /tmp/lustre-mdt65 /mnt/mds65
Started lustre-MDT0040
Starting mds66:   -o loop /tmp/lustre-mdt66 /mnt/mds66
Started lustre-MDT0041
Starting mds67:   -o loop /tmp/lustre-mdt67 /mnt/mds67
Started lustre-MDT0042
Starting mds68:   -o loop /tmp/lustre-mdt68 /mnt/mds68
Started lustre-MDT0043
Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
Started lustre-OST0000
Starting ost2:   -o loop /tmp/lustre-ost2 /mnt/ost2
Started lustre-OST0001
Starting client: c17:  -o user_xattr,flock c17@tcp:/lustre /mnt/lustre
Using TIMEOUT=20
seting jobstats to procname_uid
Setting lustre.sys.jobid_var from disable to procname_uid
warning: 'lctl conf_param' is deprecated, use 'lctl set_param -P' instead
Waiting 90 secs for update
Updated after 8s: wanted 'procname_uid' got 'procname_uid'
disable quota as required
[root@c17 tests]# 
[root@c17 tests]# 
[root@c17 tests]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1             20642428   5641124  13952728  29% /
tmpfs                 16426748         0  16426748   0% /dev/shm
192.168.0.1:/scratch 416433152 272369664 122909696  69% /scratch
192.168.0.1:/home    416433152 272369664 122909696  69% /home
/dev/loop0              133560      8912    114936   8% /mnt/mds1
/dev/loop1              133560      1628    122208   2% /mnt/mds2
/dev/loop2              133560      1632    122204   2% /mnt/mds3
/dev/loop3              133560      1636    122200   2% /mnt/mds4
/dev/loop4              133560      1640    122196   2% /mnt/mds5
/dev/loop5              133560      1648    122188   2% /mnt/mds6
/dev/loop6              133560      1648    122188   2% /mnt/mds7
/dev/loop7              133560      1652    122184   2% /mnt/mds8
/dev/loop8              133560      1656    122180   2% /mnt/mds9
/dev/loop9              133560      1664    122172   2% /mnt/mds10
/dev/loop10             133560      1664    122172   2% /mnt/mds11
/dev/loop11             133560      1668    122168   2% /mnt/mds12
/dev/loop12             133560      1676    122160   2% /mnt/mds13
/dev/loop13             133560      1684    122152   2% /mnt/mds14
/dev/loop14             133560      1688    122148   2% /mnt/mds15
/dev/loop15             133560      1688    122148   2% /mnt/mds16
/dev/loop16             133560      1692    122144   2% /mnt/mds17
/dev/loop17             133560      1696    122140   2% /mnt/mds18
/dev/loop18             133560      1704    122132   2% /mnt/mds19
/dev/loop19             133560      1708    122128   2% /mnt/mds20
/dev/loop20             133560      1708    122128   2% /mnt/mds21
/dev/loop21             133560      1712    122124   2% /mnt/mds22
/dev/loop22             133560      1720    122116   2% /mnt/mds23
/dev/loop23             133560      1724    122112   2% /mnt/mds24
/dev/loop24             133560      1724    122112   2% /mnt/mds25
/dev/loop25             133560      1728    122108   2% /mnt/mds26
/dev/loop26             133560      1736    122100   2% /mnt/mds27
/dev/loop27             133560      1740    122096   2% /mnt/mds28
/dev/loop28             133560      1744    122092   2% /mnt/mds29
/dev/loop29             133560      1744    122092   2% /mnt/mds30
/dev/loop30             133560      1752    122084   2% /mnt/mds31
/dev/loop31             133560      1756    122080   2% /mnt/mds32
/dev/loop32             133560      1760    122076   2% /mnt/mds33
/dev/loop33             133560      1764    122072   2% /mnt/mds34
/dev/loop34             133560      1768    122068   2% /mnt/mds35
/dev/loop35             133560      1772    122064   2% /mnt/mds36
/dev/loop36             133560      1776    122060   2% /mnt/mds37
/dev/loop37             133560      1780    122056   2% /mnt/mds38
/dev/loop38             133560      1780    122056   2% /mnt/mds39
/dev/loop39             133560      1792    122044   2% /mnt/mds40
/dev/loop40             133560      1796    122040   2% /mnt/mds41
/dev/loop41             133560      1800    122036   2% /mnt/mds42
/dev/loop42             133560      1804    122032   2% /mnt/mds43
/dev/loop43             133560      1808    122028   2% /mnt/mds44
/dev/loop44             133560      1812    122024   2% /mnt/mds45
/dev/loop45             133560      1816    122020   2% /mnt/mds46
/dev/loop46             133560      1820    122016   2% /mnt/mds47
/dev/loop47             133560      1828    122008   2% /mnt/mds48
/dev/loop48             133560      1828    122008   2% /mnt/mds49
/dev/loop49             133560      1832    122004   2% /mnt/mds50
/dev/loop50             133560      1836    122000   2% /mnt/mds51
/dev/loop51             133560      1844    121992   2% /mnt/mds52
/dev/loop52             133560      1844    121992   2% /mnt/mds53
/dev/loop53             133560      1848    121988   2% /mnt/mds54
/dev/loop54             133560      1852    121984   2% /mnt/mds55
/dev/loop55             133560      1860    121976   2% /mnt/mds56
/dev/loop56             133560      1864    121972   2% /mnt/mds57
/dev/loop57             133560      1864    121972   2% /mnt/mds58
/dev/loop58             133560      1868    121968   2% /mnt/mds59
/dev/loop59             133560      1876    121960   2% /mnt/mds60
/dev/loop60             133560      1880    121956   2% /mnt/mds61
/dev/loop61             133560      1884    121952   2% /mnt/mds62
/dev/loop62             133560      1884    121952   2% /mnt/mds63
/dev/loop63             133560      1888    121948   2% /mnt/mds64
/dev/loop64             133560      1896    121940   2% /mnt/mds65
/dev/loop65             133560      1900    121936   2% /mnt/mds66
/dev/loop66             133560      1900    121936   2% /mnt/mds67
/dev/loop67             133560      1904    121932   2% /mnt/mds68
/dev/loop68             171080     18728    141948  12% /mnt/ost1
/dev/loop69             171080     18728    141948  12% /mnt/ost2
c17@tcp:/lustre         342160     37456    283896  12% /mnt/lustre
[root@c17 tests]# lfs setdirstripe -c68 /mnt/lustre/test1

[root@c17 tests]# 
[root@c17 tests]# lfs getdirstripe /mnt/lustre/test1
/mnt/lustre/test1
lmv_stripe_count: 68 lmv_stripe_offset: 0
mdtidx		 FID[seq:oid:ver]
     0		 [0x200000400:0x2:0x0]		
     1		 [0x240000406:0x2:0x0]		
     2		 [0x280000406:0x2:0x0]		
     3		 [0x2c0000406:0x2:0x0]		
     4		 [0x300000406:0x2:0x0]		
     5		 [0x340000406:0x2:0x0]		
     6		 [0x380000406:0x2:0x0]		
     7		 [0x3c000040b:0x2:0x0]		
     8		 [0x40000040b:0x2:0x0]		
     9		 [0x44000040b:0x2:0x0]		
    10		 [0x48000040b:0x2:0x0]		
    11		 [0x4c000040b:0x2:0x0]		
    12		 [0x500000412:0x2:0x0]		
    13		 [0x540000412:0x2:0x0]		
    14		 [0x580000412:0x2:0x0]		
    15		 [0x5c0000412:0x2:0x0]		
    16		 [0x600000412:0x2:0x0]		
    17		 [0x640000412:0x2:0x0]		
    18		 [0x68000040b:0x2:0x0]		
    19		 [0x6c0000417:0x2:0x0]		
    20		 [0x700000417:0x2:0x0]		
    21		 [0x740000417:0x2:0x0]		
    22		 [0x780000417:0x2:0x0]		
    23		 [0x7c0000404:0x2:0x0]		
    24		 [0x80000041b:0x2:0x0]		
    25		 [0x84000041b:0x2:0x0]		
    26		 [0x88000041b:0x2:0x0]		
    27		 [0x8c000040c:0x2:0x0]		
    28		 [0x900000403:0x2:0x0]		
    29		 [0x940000421:0x2:0x0]		
    30		 [0x980000421:0x2:0x0]		
    31		 [0x9c0000421:0x2:0x0]		
    32		 [0xa0000041a:0x2:0x0]		
    33		 [0xa40000403:0x2:0x0]		
    34		 [0xa80000427:0x2:0x0]		
    35		 [0xac0000427:0x2:0x0]		
    36		 [0xb00000427:0x2:0x0]		
    37		 [0xb40000427:0x2:0x0]		
    38		 [0xb80000416:0x2:0x0]		
    39		 [0xbc0000402:0x2:0x0]		
    40		 [0xc0000042e:0x2:0x0]		
    41		 [0xc4000042e:0x2:0x0]		
    42		 [0xc8000042f:0x2:0x0]		
    43		 [0xcc000042f:0x2:0x0]		
    44		 [0xd00000428:0x2:0x0]		
    45		 [0xd4000041c:0x2:0x0]		
    46		 [0xd80000411:0x2:0x0]		
    47		 [0xdc0000402:0x2:0x0]		
    48		 [0xe00000434:0x2:0x0]		
    49		 [0xe40000434:0x2:0x0]		
    50		 [0xe8000042e:0x2:0x0]		
    51		 [0xec0000422:0x2:0x0]		
    52		 [0xf00000415:0x2:0x0]		
    53		 [0xf40000409:0x2:0x0]		
    54		 [0xf80000439:0x2:0x0]		
    55		 [0xfc0000439:0x2:0x0]		
    56		 [0x1000000428:0x2:0x0]		
    57		 [0x1040000419:0x2:0x0]		
    58		 [0x1080000409:0x2:0x0]		
    59		 [0x10c000043d:0x2:0x0]		
    60		 [0x110000042d:0x2:0x0]		
    61		 [0x114000041a:0x2:0x0]		
    62		 [0x1180000403:0x2:0x0]		
    63		 [0x11c0000441:0x2:0x0]		
    64		 [0x1200000432:0x2:0x0]		
    65		 [0x124000041d:0x2:0x0]		
    66		 [0x1280000407:0x2:0x0]		
    67		 [0x12c0000443:0x2:0x0]		
[root@c17 tests]# 
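
One way to check output like the listing above is to compare `lmv_stripe_count` with the number of stripe rows actually printed. The Python sketch below assumes the text layout shown in this transcript. `parse_getdirstripe` and `stripes_consistent` are hypothetical helper names, not part of `lfs`.

```python
import re

# "lmv_stripe_count: 68 lmv_stripe_offset: 0"
COUNT_LINE = re.compile(r"lmv_stripe_count:\s*(\d+)\s+lmv_stripe_offset:\s*(-?\d+)")
# "     0   [0x200000400:0x2:0x0]"
STRIPE_ROW = re.compile(
    r"^\s*(\d+)\s+\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]\s*$"
)


def parse_getdirstripe(text):
    """Return (stripe_count, rows); each row is (mdtidx, seq, oid, ver)."""
    count = None
    rows = []
    for line in text.splitlines():
        m = COUNT_LINE.search(line)
        if m:
            count = int(m.group(1))
            continue
        m = STRIPE_ROW.match(line)
        if m:
            rows.append((int(m.group(1)),
                         int(m.group(2), 16),
                         int(m.group(3), 16),
                         int(m.group(4), 16)))
    return count, rows


def stripes_consistent(text):
    """True if the reported count matches the rows and no MDT index repeats."""
    count, rows = parse_getdirstripe(text)
    indexes = {row[0] for row in rows}
    return count is not None and count == len(rows) == len(indexes)
```

To use it, capture the output of `lfs getdirstripe /mnt/lustre/test1` as a string and pass it to `stripes_consistent`.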
Comment by Di Wang [ 29/May/15 ]

I can now create a striped directory with a stripe count of 86. Robert: Could you please try this build? Thanks.

[root@c17 tests]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1             20642428   9385264  10208588  48% /
tmpfs                 16426748         0  16426748   0% /dev/shm
192.168.0.1:/scratch 416433152 272370688 122909696  69% /scratch
192.168.0.1:/home    416433152 272370688 122909696  69% /home
/dev/loop0              133560     12716    111140  11% /mnt/mds1
/dev/loop1              133560      1636    122200   2% /mnt/mds2
/dev/loop2              133560      1640    122196   2% /mnt/mds3
/dev/loop3              133560      1644    122192   2% /mnt/mds4
/dev/loop4              133560      1648    122188   2% /mnt/mds5
/dev/loop5              133560      1656    122180   2% /mnt/mds6
/dev/loop6              133560      1656    122180   2% /mnt/mds7
/dev/loop7              133560      1660    122176   2% /mnt/mds8
/dev/loop8              133560      1664    122172   2% /mnt/mds9
/dev/loop9              133560      1672    122164   2% /mnt/mds10
/dev/loop10             133560      1672    122164   2% /mnt/mds11
/dev/loop11             133560      1676    122160   2% /mnt/mds12
/dev/loop12             133560      1684    122152   2% /mnt/mds13
/dev/loop13             133560      1692    122144   2% /mnt/mds14
/dev/loop14             133560      1696    122140   2% /mnt/mds15
/dev/loop15             133560      1696    122140   2% /mnt/mds16
/dev/loop16             133560      1700    122136   2% /mnt/mds17
/dev/loop17             133560      1704    122132   2% /mnt/mds18
/dev/loop18             133560      1712    122124   2% /mnt/mds19
/dev/loop19             133560      1716    122120   2% /mnt/mds20
/dev/loop20             133560      1716    122120   2% /mnt/mds21
/dev/loop21             133560      1720    122116   2% /mnt/mds22
/dev/loop22             133560      1728    122108   2% /mnt/mds23
/dev/loop23             133560      1732    122104   2% /mnt/mds24
/dev/loop24             133560      1732    122104   2% /mnt/mds25
/dev/loop25             133560      1736    122100   2% /mnt/mds26
/dev/loop26             133560      1744    122092   2% /mnt/mds27
/dev/loop27             133560      1748    122088   2% /mnt/mds28
/dev/loop28             133560      1752    122084   2% /mnt/mds29
/dev/loop29             133560      1752    122084   2% /mnt/mds30
/dev/loop30             133560      1760    122076   2% /mnt/mds31
/dev/loop31             133560      1764    122072   2% /mnt/mds32
/dev/loop32             133560      1768    122068   2% /mnt/mds33
/dev/loop33             133560      1772    122064   2% /mnt/mds34
/dev/loop34             133560      1776    122060   2% /mnt/mds35
/dev/loop35             133560      1780    122056   2% /mnt/mds36
/dev/loop36             133560      1784    122052   2% /mnt/mds37
/dev/loop37             133560      1788    122048   2% /mnt/mds38
/dev/loop38             133560      1788    122048   2% /mnt/mds39
/dev/loop39             133560      1800    122036   2% /mnt/mds40
/dev/loop40             133560      1804    122032   2% /mnt/mds41
/dev/loop41             133560      1808    122028   2% /mnt/mds42
/dev/loop42             133560      1812    122024   2% /mnt/mds43
/dev/loop43             133560      1816    122020   2% /mnt/mds44
/dev/loop44             133560      1820    122016   2% /mnt/mds45
/dev/loop45             133560      1824    122012   2% /mnt/mds46
/dev/loop46             133560      1828    122008   2% /mnt/mds47
/dev/loop47             133560      1836    122000   2% /mnt/mds48
/dev/loop48             133560      1836    122000   2% /mnt/mds49
/dev/loop49             133560      1840    121996   2% /mnt/mds50
/dev/loop50             133560      1844    121992   2% /mnt/mds51
/dev/loop51             133560      1852    121984   2% /mnt/mds52
/dev/loop52             133560      1852    121984   2% /mnt/mds53
/dev/loop53             133560      1856    121980   2% /mnt/mds54
/dev/loop54             133560      1860    121976   2% /mnt/mds55
/dev/loop55             133560      1868    121968   2% /mnt/mds56
/dev/loop56             133560      1872    121964   2% /mnt/mds57
/dev/loop57             133560      1872    121964   2% /mnt/mds58
/dev/loop58             133560      1876    121960   2% /mnt/mds59
/dev/loop59             133560      1884    121952   2% /mnt/mds60
/dev/loop60             133560      1888    121948   2% /mnt/mds61
/dev/loop61             133560      1892    121944   2% /mnt/mds62
/dev/loop62             133560      1892    121944   2% /mnt/mds63
/dev/loop63             133560      1896    121940   2% /mnt/mds64
/dev/loop64             133560      1904    121932   2% /mnt/mds65
/dev/loop65             133560      1908    121928   2% /mnt/mds66
/dev/loop66             133560      1908    121928   2% /mnt/mds67
/dev/loop67             133560      1912    121924   2% /mnt/mds68
/dev/loop68             133560      1920    121916   2% /mnt/mds69
/dev/loop69             133560      1924    121912   2% /mnt/mds70
/dev/loop70             133560      1928    121908   2% /mnt/mds71
/dev/loop71             133560      1928    121908   2% /mnt/mds72
/dev/loop72             133560      1936    121900   2% /mnt/mds73
/dev/loop73             133560      1940    121896   2% /mnt/mds74
/dev/loop74             133560      1944    121892   2% /mnt/mds75
/dev/loop75             133560      1948    121888   2% /mnt/mds76
/dev/loop76             133560      1952    121884   2% /mnt/mds77
/dev/loop77             133560      1956    121880   2% /mnt/mds78
/dev/loop78             133560      1960    121876   2% /mnt/mds79
/dev/loop79             133560      1964    121872   2% /mnt/mds80
/dev/loop80             133560      1968    121868   2% /mnt/mds81
/dev/loop81             133560      1972    121864   2% /mnt/mds82
/dev/loop82             133560      1976    121860   2% /mnt/mds83
/dev/loop83             133560      1980    121856   2% /mnt/mds84
/dev/loop84             133560      1988    121848   2% /mnt/mds85
/dev/loop85             133560      1988    121848   2% /mnt/mds86
/dev/loop86             171080     21232    139448  14% /mnt/ost1
/dev/loop87             171080     21232    139448  14% /mnt/ost2
c17@tcp:/lustre         342160     42464    278896  14% /mnt/lustre
[root@c17 tests]# ../uti^C
[root@c17 tests]# echo 0 > /proc/sys/lnet/panic_on_lbug 
[root@c17 tests]# ../util^C
[root@c17 tests]# MDSC^C
[root@c17 tests]# lfs mkdir -c86 /mnt/lustre/test1
[root@c17 tests]# lfs getdirstripe /mnt/lustre/test1
/mnt/lustre/test1
lmv_stripe_count: 86 lmv_stripe_offset: 0
mdtidx		 FID[seq:oid:ver]
     0		 [0x200000400:0x2:0x0]		
     1		 [0x240000406:0x2:0x0]		
     2		 [0x280000406:0x2:0x0]		
     3		 [0x2c0000406:0x2:0x0]		
     4		 [0x300000406:0x2:0x0]		
     5		 [0x340000406:0x2:0x0]		
     6		 [0x380000409:0x2:0x0]		
     7		 [0x3c000040b:0x2:0x0]		
     8		 [0x40000040b:0x2:0x0]		
     9		 [0x44000040b:0x2:0x0]		
    10		 [0x48000040b:0x2:0x0]		
    11		 [0x4c000040b:0x2:0x0]		
    12		 [0x500000408:0x2:0x0]		
    13		 [0x540000413:0x2:0x0]		
    14		 [0x580000413:0x2:0x0]		
    15		 [0x5c0000413:0x2:0x0]		
    16		 [0x600000413:0x2:0x0]		
    17		 [0x640000413:0x2:0x0]		
    18		 [0x680000413:0x2:0x0]		
    19		 [0x6c0000402:0x2:0x0]		
    20		 [0x70000041a:0x2:0x0]		
    21		 [0x74000041a:0x2:0x0]		
    22		 [0x78000041a:0x2:0x0]		
    23		 [0x7c000041a:0x2:0x0]		
    24		 [0x80000041a:0x2:0x0]		
    25		 [0x84000041a:0x2:0x0]		
    26		 [0x88000040d:0x2:0x0]		
    27		 [0x8c0000404:0x2:0x0]		
    28		 [0x90000041f:0x2:0x0]		
    29		 [0x94000041f:0x2:0x0]		
    30		 [0x98000041f:0x2:0x0]		
    31		 [0x9c0000408:0x2:0x0]		
    32		 [0xa00000426:0x2:0x0]		
    33		 [0xa40000426:0x2:0x0]		
    34		 [0xa80000426:0x2:0x0]		
    35		 [0xac0000426:0x2:0x0]		
    36		 [0xb00000426:0x2:0x0]		
    37		 [0xb4000041b:0x2:0x0]		
    38		 [0xb8000040f:0x2:0x0]		
    39		 [0xbc0000405:0x2:0x0]		
    40		 [0xc0000042d:0x2:0x0]		
    41		 [0xc4000042d:0x2:0x0]		
    42		 [0xc8000042d:0x2:0x0]		
    43		 [0xcc0000425:0x2:0x0]		
    44		 [0xd00000418:0x2:0x0]		
    45		 [0xd4000040b:0x2:0x0]		
    46		 [0xd80000403:0x2:0x0]		
    47		 [0xdc0000432:0x2:0x0]		
    48		 [0xe0000042c:0x2:0x0]		
    49		 [0xe4000041f:0x2:0x0]		
    50		 [0xe80000410:0x2:0x0]		
    51		 [0xec0000403:0x2:0x0]		
    52		 [0xf00000437:0x2:0x0]		
    53		 [0xf40000431:0x2:0x0]		
    54		 [0xf80000422:0x2:0x0]		
    55		 [0xfc0000413:0x2:0x0]		
    56		 [0x1000000403:0x2:0x0]		
    57		 [0x104000043c:0x2:0x0]		
    58		 [0x1080000433:0x2:0x0]		
    59		 [0x10c0000422:0x2:0x0]		
    60		 [0x110000040f:0x2:0x0]		
    61		 [0x114000043f:0x2:0x0]		
    62		 [0x1180000438:0x2:0x0]		
    63		 [0x11c0000425:0x2:0x0]		
    64		 [0x1200000404:0x2:0x0]		
    65		 [0x1240000443:0x2:0x0]		
    66		 [0x1280000430:0x2:0x0]		
    67		 [0x12c0000404:0x2:0x0]		
    68		 [0x1300000446:0x2:0x0]		
    69		 [0x1340000439:0x2:0x0]		
    70		 [0x138000041e:0x2:0x0]		
    71		 [0x13c0000401:0x2:0x0]		
    72		 [0x140000043e:0x2:0x0]		
    73		 [0x1440000420:0x2:0x0]		
    74		 [0x148000040f:0x2:0x0]		
    75		 [0x14c0000439:0x2:0x0]		
    76		 [0x150000040d:0x2:0x0]		
    77		 [0x154000044e:0x2:0x0]		
    78		 [0x158000042d:0x2:0x0]		
    79		 [0x15c0000402:0x2:0x0]		
    80		 [0x1600000440:0x2:0x0]		
    81		 [0x1640000412:0x2:0x0]		
    82		 [0x1680000453:0x2:0x0]		
    83		 [0x16c000042b:0x2:0x0]		
    84		 [0x170000040a:0x2:0x0]		
    85		 [0x1740000438:0x2:0x0]		
Comment by Robert Read (Inactive) [ 29/May/15 ]

OK, I'll try this build next.

Comment by Andreas Dilger [ 29/May/15 ]

Di, do you have any thoughts on what the maximum stripe count will now be? Presumably there is no longer a limit because of the size of a single replay record, but there might still be another limit from the size of a single OUT record that lists e.g. the FIDs of all stripes, or is each stripe separate?

Comment by Andreas Dilger [ 29/May/15 ]

Di, I was going to look at your patch for the OUT logging change, but I only see the combined change 13942. Is there a different patch that contains only the changes for the redo logs?

Comment by Di Wang [ 29/May/15 ]

Andreas: Sorry, there is no independent patch yet. I have been waiting for Oleg to land some of my patches (he is testing them right now, so hopefully they can land tomorrow). Then I will push those changes independently, which will actually be 3 patches.

Comment by Di Wang [ 29/May/15 ]

Andreas: the maximum stripe count is now limited by the 1MB RPC size, which has to hold the updates to create a single stripe (about 4k, at most 8k, plus some RPC overhead) and the update logs (1016k at most; let's say 1000k to be safe).

As we calculated before, each stripe costs about 390 bytes, and the extra update records take about 2k, so (1000k - 2k) / 390 = 2620 stripes.

There might be some other overhead. For example, if there are ACLs or a default LOV etc., then each stripe creation might include more updates. Let's say 3 more xattr set updates (32 * 3 = 96 bytes); then creating a single stripe takes 390 + 96 = 486 bytes in total. (Note: in setxattr, the EA itself will be in the common params area, because it is the same for all stripes, so it is not repeated.) The total is then (1000k - 2k) / 486 = 2102 stripes.

So I would think the maximum is about 2000 stripes in theory. It would be interesting to create 512 or 1024 stripes to see if it works.
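
The estimate in this comment can be reproduced with a short calculation. This is only a sketch of the arithmetic: the function name `max_stripes` and the variable names are made up for illustration, and the byte figures (1000k log budget, 2k of extra update records, 390 bytes per stripe, 32 bytes per extra xattr update) are the approximate numbers quoted in the comment, not values read from the Lustre source.

```python
KIB = 1024  # "k" in the comment is taken to mean 1024 bytes


def max_stripes(log_budget_bytes, fixed_overhead_bytes, per_stripe_bytes):
    """Whole number of stripes that fit once the fixed overhead is subtracted."""
    return (log_budget_bytes - fixed_overhead_bytes) // per_stripe_bytes


# Base case: about 390 bytes of update log per stripe.
base = max_stripes(1000 * KIB, 2 * KIB, 390)

# With 3 extra xattr set updates of about 32 bytes each per stripe.
with_xattrs = max_stripes(1000 * KIB, 2 * KIB, 390 + 3 * 32)

print(base, with_xattrs)
```

Both results sit a little above the rounded figure of 2000 stripes given in the comment.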

Comment by James A Simmons [ 29/May/15 ]

Actually, you can increase the RPC size to 4MB, so in theory it could be 8000 stripes. It is very possible that we will have 16MB RPCs a few years from now.

Comment by Robert Read (Inactive) [ 03/Jun/15 ]

With build 32529, I was able to create striped directories with up to 84 stripes, but at 96 stripes I hit another assertion on an MDS:

Jun  3 18:22:19 mdt03 kernel: LustreError: 1745:0:(llog_osd.c:147:llog_osd_pad()) ASSERTION( len >= (24) && (len & 0x7) == 0 ) failed: 
Jun  3 18:22:19 mdt03 kernel: LustreError: 1745:0:(llog_osd.c:147:llog_osd_pad()) LBUG
Jun  3 18:22:19 mdt03 kernel: Pid: 1745, comm: mdt01_002
Jun  3 18:22:19 mdt03 kernel: 
Jun  3 18:22:19 mdt03 kernel: Call Trace:
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa0145875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa0145e77>] lbug_with_loc+0x47/0xb0 [libcfs]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa023d48e>] llog_osd_write_rec+0x167e/0x17a0 [obdclass]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa022d220>] llog_write_rec+0xb0/0x270 [obdclass]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa0235a6f>] llog_cat_add_rec+0x9f/0x460 [obdclass]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa022d039>] llog_add+0x89/0x1c0 [obdclass]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa051b17c>] sub_updates_write+0x9dc/0x1380 [ptlrpc]
Jun  3 18:22:19 mdt03 kernel: [<ffffffff8115cdbc>] ? __vunmap+0x9c/0x120
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa051c26c>] top_trans_stop+0x74c/0xb30 [ptlrpc]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa0c5b0ff>] ? lod_attr_set+0x12f/0xab0 [lod]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa029ce40>] ? lu_ucred+0x20/0x30 [obdclass]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa0c449bf>] lod_trans_stop+0x2bf/0x330 [lod]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa0ce867d>] mdd_trans_stop+0x1d/0xb0 [mdd]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa0cd7a8c>] mdd_create+0x13ac/0x1760 [mdd]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa0b9080c>] ? mdt_version_save+0x8c/0x1a0 [mdt]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa0b94bec>] mdt_reint_create+0xbcc/0xce0 [mdt]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa029ce40>] ? lu_ucred+0x20/0x30 [obdclass]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa0b70dd5>] ? mdt_ucred+0x15/0x20 [mdt]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa0b8baec>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa04cbd42>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa0b8fb9d>] mdt_reint_rec+0x5d/0x200 [mdt]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa0b74fab>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa0b7571b>] mdt_reint+0x6b/0x120 [mdt]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa05074be>] tgt_request_handle+0x95e/0x10b0 [ptlrpc]
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa04b6c31>] ptlrpc_main+0xe41/0x1970 [ptlrpc]
Jun  3 18:22:19 mdt03 kernel: [<ffffffff81014959>] ? sched_clock+0x9/0x10
Jun  3 18:22:19 mdt03 kernel: [<ffffffff81060c3f>] ? finish_task_switch+0x4f/0xf0
Jun  3 18:22:19 mdt03 kernel: [<ffffffffa04b5df0>] ? ptlrpc_main+0x0/0x1970 [ptlrpc]
Jun  3 18:22:19 mdt03 kernel: [<ffffffff8109e71e>] kthread+0x9e/0xc0
Jun  3 18:22:19 mdt03 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Jun  3 18:22:19 mdt03 kernel: [<ffffffff8100b294>] ? int_ret_from_sys_call+0x7/0x1b
Jun  3 18:22:19 mdt03 kernel: [<ffffffff8100ba1d>] ? retint_restore_args+0x5/0x6
Jun  3 18:22:19 mdt03 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jun  3 18:22:19 mdt03 kernel: 
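
For readers unfamiliar with the failed check, the condition printed in the LBUG above, `len >= (24) && (len & 0x7) == 0`, requires the length passed to `llog_osd_pad()` to be at least 24 bytes and a multiple of 8. The snippet below is a minimal sketch of that predicate only; the helper name `pad_len_is_valid` and the constant name are hypothetical, and nothing here says which length the failing call actually received.

```python
MIN_PAD_LEN = 24  # lower bound copied from the assertion text; the name is illustrative


def pad_len_is_valid(length):
    """Mirror the asserted condition: at least 24 bytes and 8-byte aligned."""
    return length >= MIN_PAD_LEN and (length & 0x7) == 0


for candidate in (16, 24, 28, 32):
    print(candidate, pad_len_is_valid(candidate))
```

A length of 16 fails because it is too small and 28 fails because it is not 8-byte aligned, while 24 and 32 satisfy both parts of the condition.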
Comment by Di Wang [ 03/Jun/15 ]

Ah, I probably already fixed this problem last night. I just updated the patch to make DNE create multiple stripes on a single MDT, as Andreas suggested. Right now, I am able to create a striped directory with 512 stripes in my local environment. I will push the patch soon. Thanks for testing.

Comment by Robert Read (Inactive) [ 03/Jun/15 ]

I had 128 MDTs though, so it didn't need to create more than one stripe on a single MDT.

Comment by Di Wang [ 03/Jun/15 ]

Oh, yes, creating multiple stripes on a single MDT is only for testing purposes, IMHO, i.e. for verifying large stripe counts in a local environment.

Comment by Robert Read (Inactive) [ 03/Jun/15 ]

Got it.

Comment by Di Wang [ 04/Jun/15 ]

I just updated the patch http://review.whamcloud.com/#/c/13942
With the patch, I can create a striped directory with 512 stripes locally.

== sanity test 300k: test large striped directory == 01:30:04 (1433320204)
fail_loc=0x1703
fail_loc=0
/mnt/lustre/d300k.sanity/striped_dir
lmv_stripe_count: 512 lmv_stripe_offset: 0
mdtidx		 FID[seq:oid:ver]
     0		 [0x200000400:0x104:0x0]		
     1		 [0x240000403:0x104:0x0]		
     2		 [0x280000403:0x104:0x0]		
     3		 [0x2c0000403:0x104:0x0]		
     0		 [0x200000400:0x105:0x0]		
     1		 [0x240000403:0x105:0x0]		
     2		 [0x280000403:0x105:0x0]		
     3		 [0x2c0000403:0x105:0x0]		
     0		 [0x200000400:0x106:0x0]		
     1		 [0x240000403:0x106:0x0]		
     2		 [0x280000403:0x106:0x0]		
     3		 [0x2c0000403:0x106:0x0]		
     0		 [0x200000400:0x107:0x0]		
     1		 [0x240000403:0x107:0x0]		
     2		 [0x280000403:0x107:0x0]		
     3		 [0x2c0000403:0x107:0x0]		
     0		 [0x200000400:0x108:0x0]		
     1		 [0x240000403:0x108:0x0]		
     2		 [0x280000403:0x108:0x0]		
     3		 [0x2c0000403:0x108:0x0]		
     0		 [0x200000400:0x109:0x0]		
     1		 [0x240000403:0x109:0x0]		
     2		 [0x280000403:0x109:0x0]		
     3		 [0x2c0000403:0x109:0x0]		
     0		 [0x200000400:0x10a:0x0]		
     1		 [0x240000403:0x10a:0x0]		
     2		 [0x280000403:0x10a:0x0]		
     3		 [0x2c0000403:0x10a:0x0]		
     0		 [0x200000400:0x10b:0x0]		
     1		 [0x240000403:0x10b:0x0]		
     2		 [0x280000403:0x10b:0x0]		
     3		 [0x2c0000403:0x10b:0x0]		
     0		 [0x200000400:0x10c:0x0]		
     1		 [0x240000403:0x10c:0x0]		
     2		 [0x280000403:0x10c:0x0]		
     3		 [0x2c0000403:0x10c:0x0]		
     0		 [0x200000400:0x10d:0x0]		
     1		 [0x240000403:0x10d:0x0]		
     2		 [0x280000403:0x10d:0x0]		
     3		 [0x2c0000403:0x10d:0x0]		
     0		 [0x200000400:0x10e:0x0]		
     1		 [0x240000403:0x10e:0x0]		
     2		 [0x280000403:0x10e:0x0]		
     3		 [0x2c0000403:0x10e:0x0]		
     0		 [0x200000400:0x10f:0x0]		
     1		 [0x240000403:0x10f:0x0]		
     2		 [0x280000403:0x10f:0x0]		
     3		 [0x2c0000403:0x10f:0x0]		
     0		 [0x200000400:0x110:0x0]		
     1		 [0x240000403:0x110:0x0]		
     2		 [0x280000403:0x110:0x0]		
     3		 [0x2c0000403:0x110:0x0]		
     0		 [0x200000400:0x111:0x0]		
     1		 [0x240000403:0x111:0x0]		
     2		 [0x280000403:0x111:0x0]		
     3		 [0x2c0000403:0x111:0x0]		
     0		 [0x200000400:0x112:0x0]		
     1		 [0x240000403:0x112:0x0]		
     2		 [0x280000403:0x112:0x0]		
     3		 [0x2c0000403:0x112:0x0]		
     0		 [0x200000400:0x113:0x0]		
     1		 [0x240000403:0x113:0x0]		
     2		 [0x280000403:0x113:0x0]		
     3		 [0x2c0000403:0x113:0x0]		
     0		 [0x200000400:0x114:0x0]		
     1		 [0x240000403:0x114:0x0]		
     2		 [0x280000403:0x114:0x0]		
     3		 [0x2c0000403:0x114:0x0]		
     0		 [0x200000400:0x115:0x0]		
     1		 [0x240000403:0x115:0x0]		
     2		 [0x280000403:0x115:0x0]		
     3		 [0x2c0000403:0x115:0x0]		
     0		 [0x200000400:0x116:0x0]		
     1		 [0x240000403:0x116:0x0]		
     2		 [0x280000403:0x116:0x0]		
     3		 [0x2c0000403:0x116:0x0]		
     0		 [0x200000400:0x117:0x0]		
     1		 [0x240000403:0x117:0x0]		
     2		 [0x280000403:0x117:0x0]		
     3		 [0x2c0000403:0x117:0x0]		
     0		 [0x200000400:0x118:0x0]		
     1		 [0x240000403:0x118:0x0]		
     2		 [0x280000403:0x118:0x0]		
     3		 [0x2c0000403:0x118:0x0]		
     0		 [0x200000400:0x119:0x0]		
     1		 [0x240000403:0x119:0x0]		
     2		 [0x280000403:0x119:0x0]		
     3		 [0x2c0000403:0x119:0x0]		
     0		 [0x200000400:0x11a:0x0]		
     1		 [0x240000403:0x11a:0x0]		
     2		 [0x280000403:0x11a:0x0]		
     3		 [0x2c0000403:0x11a:0x0]		
     0		 [0x200000400:0x11b:0x0]		
     1		 [0x240000403:0x11b:0x0]		
     2		 [0x280000403:0x11b:0x0]		
     3		 [0x2c0000403:0x11b:0x0]		
     0		 [0x200000400:0x11c:0x0]		
     1		 [0x240000403:0x11c:0x0]		
     2		 [0x280000403:0x11c:0x0]		
     3		 [0x2c0000403:0x11c:0x0]		
     0		 [0x200000400:0x11d:0x0]		
     1		 [0x240000403:0x11d:0x0]		
     2		 [0x280000403:0x11d:0x0]		
     3		 [0x2c0000403:0x11d:0x0]		
     0		 [0x200000400:0x11e:0x0]		
     1		 [0x240000403:0x11e:0x0]		
     2		 [0x280000403:0x11e:0x0]		
     3		 [0x2c0000403:0x11e:0x0]		
     0		 [0x200000400:0x11f:0x0]		
     1		 [0x240000403:0x11f:0x0]		
     2		 [0x280000403:0x11f:0x0]		
     3		 [0x2c0000403:0x11f:0x0]		
     0		 [0x200000400:0x120:0x0]		
     1		 [0x240000403:0x120:0x0]		
     2		 [0x280000403:0x120:0x0]		
     3		 [0x2c0000403:0x120:0x0]		
     0		 [0x200000400:0x121:0x0]		
     1		 [0x240000403:0x121:0x0]		
     2		 [0x280000403:0x121:0x0]		
     3		 [0x2c0000403:0x121:0x0]		
     0		 [0x200000400:0x122:0x0]		
     1		 [0x240000403:0x122:0x0]		
     2		 [0x280000403:0x122:0x0]		
     3		 [0x2c0000403:0x122:0x0]		
     0		 [0x200000400:0x123:0x0]		
     1		 [0x240000403:0x123:0x0]		
     2		 [0x280000403:0x123:0x0]		
     3		 [0x2c0000403:0x123:0x0]		
     0		 [0x200000400:0x124:0x0]		
     1		 [0x240000403:0x124:0x0]		
     2		 [0x280000403:0x124:0x0]		
     3		 [0x2c0000403:0x124:0x0]		
     0		 [0x200000400:0x125:0x0]		
     1		 [0x240000403:0x125:0x0]		
     2		 [0x280000403:0x125:0x0]		
     3		 [0x2c0000403:0x125:0x0]		
     0		 [0x200000400:0x126:0x0]		
     1		 [0x240000403:0x126:0x0]		
     2		 [0x280000403:0x126:0x0]		
     3		 [0x2c0000403:0x126:0x0]		
     0		 [0x200000400:0x127:0x0]		
     1		 [0x240000403:0x127:0x0]		
     2		 [0x280000403:0x127:0x0]		
     3		 [0x2c0000403:0x127:0x0]		
     0		 [0x200000400:0x128:0x0]		
     1		 [0x240000403:0x128:0x0]		
     2		 [0x280000403:0x128:0x0]		
     3		 [0x2c0000403:0x128:0x0]		
     0		 [0x200000400:0x129:0x0]		
     1		 [0x240000403:0x129:0x0]		
     2		 [0x280000403:0x129:0x0]		
     3		 [0x2c0000403:0x129:0x0]		
     0		 [0x200000400:0x12a:0x0]		
     1		 [0x240000403:0x12a:0x0]		
     2		 [0x280000403:0x12a:0x0]		
     3		 [0x2c0000403:0x12a:0x0]		
     0		 [0x200000400:0x12b:0x0]		
     1		 [0x240000403:0x12b:0x0]		
     2		 [0x280000403:0x12b:0x0]		
     3		 [0x2c0000403:0x12b:0x0]		
     0		 [0x200000400:0x12c:0x0]		
     1		 [0x240000403:0x12c:0x0]		
     2		 [0x280000403:0x12c:0x0]		
     3		 [0x2c0000403:0x12c:0x0]		
     0		 [0x200000400:0x12d:0x0]		
     1		 [0x240000403:0x12d:0x0]		
     2		 [0x280000403:0x12d:0x0]		
     3		 [0x2c0000403:0x12d:0x0]		
     0		 [0x200000400:0x12e:0x0]		
     1		 [0x240000403:0x12e:0x0]		
     2		 [0x280000403:0x12e:0x0]		
     3		 [0x2c0000403:0x12e:0x0]		
     0		 [0x200000400:0x12f:0x0]		
     1		 [0x240000403:0x12f:0x0]		
     2		 [0x280000403:0x12f:0x0]		
     3		 [0x2c0000403:0x12f:0x0]		
     0		 [0x200000400:0x130:0x0]		
     1		 [0x240000403:0x130:0x0]		
     2		 [0x280000403:0x130:0x0]		
     3		 [0x2c0000403:0x130:0x0]		
     0		 [0x200000400:0x131:0x0]		
     1		 [0x240000403:0x131:0x0]		
     2		 [0x280000403:0x131:0x0]		
     3		 [0x2c0000403:0x131:0x0]		
     0		 [0x200000400:0x132:0x0]		
     1		 [0x240000403:0x132:0x0]		
     2		 [0x280000403:0x132:0x0]		
     3		 [0x2c0000403:0x132:0x0]		
     0		 [0x200000400:0x133:0x0]		
     1		 [0x240000403:0x133:0x0]		
     2		 [0x280000403:0x133:0x0]		
     3		 [0x2c0000403:0x133:0x0]		
     0		 [0x200000400:0x134:0x0]		
     1		 [0x240000403:0x134:0x0]		
     2		 [0x280000403:0x134:0x0]		
     3		 [0x2c0000403:0x134:0x0]		
     0		 [0x200000400:0x135:0x0]		
     1		 [0x240000403:0x135:0x0]		
     2		 [0x280000403:0x135:0x0]		
     3		 [0x2c0000403:0x135:0x0]		
     0		 [0x200000400:0x136:0x0]		
     1		 [0x240000403:0x136:0x0]		
     2		 [0x280000403:0x136:0x0]		
     3		 [0x2c0000403:0x136:0x0]		
     0		 [0x200000400:0x137:0x0]		
     1		 [0x240000403:0x137:0x0]		
     2		 [0x280000403:0x137:0x0]		
     3		 [0x2c0000403:0x137:0x0]		
     0		 [0x200000400:0x138:0x0]		
     1		 [0x240000403:0x138:0x0]		
     2		 [0x280000403:0x138:0x0]		
     3		 [0x2c0000403:0x138:0x0]		
     0		 [0x200000400:0x139:0x0]		
     1		 [0x240000403:0x139:0x0]		
     2		 [0x280000403:0x139:0x0]		
     3		 [0x2c0000403:0x139:0x0]		
     0		 [0x200000400:0x13a:0x0]		
     1		 [0x240000403:0x13a:0x0]		
     2		 [0x280000403:0x13a:0x0]		
     3		 [0x2c0000403:0x13a:0x0]		
     0		 [0x200000400:0x13b:0x0]		
     1		 [0x240000403:0x13b:0x0]		
     2		 [0x280000403:0x13b:0x0]		
     3		 [0x2c0000403:0x13b:0x0]		
     0		 [0x200000400:0x13c:0x0]		
     1		 [0x240000403:0x13c:0x0]		
     2		 [0x280000403:0x13c:0x0]		
     3		 [0x2c0000403:0x13c:0x0]		
     0		 [0x200000400:0x13d:0x0]		
     1		 [0x240000403:0x13d:0x0]		
     2		 [0x280000403:0x13d:0x0]		
     3		 [0x2c0000403:0x13d:0x0]		
     0		 [0x200000400:0x13e:0x0]		
     1		 [0x240000403:0x13e:0x0]		
     2		 [0x280000403:0x13e:0x0]		
     3		 [0x2c0000403:0x13e:0x0]		
     0		 [0x200000400:0x13f:0x0]		
     1		 [0x240000403:0x13f:0x0]		
     2		 [0x280000403:0x13f:0x0]		
     3		 [0x2c0000403:0x13f:0x0]		
     0		 [0x200000400:0x140:0x0]		
     1		 [0x240000403:0x140:0x0]		
     2		 [0x280000403:0x140:0x0]		
     3		 [0x2c0000403:0x140:0x0]		
     0		 [0x200000400:0x141:0x0]		
     1		 [0x240000403:0x141:0x0]		
     2		 [0x280000403:0x141:0x0]		
     3		 [0x2c0000403:0x141:0x0]		
     0		 [0x200000400:0x142:0x0]		
     1		 [0x240000403:0x142:0x0]		
     2		 [0x280000403:0x142:0x0]		
     3		 [0x2c0000403:0x142:0x0]		
     0		 [0x200000400:0x143:0x0]		
     1		 [0x240000403:0x143:0x0]		
     2		 [0x280000403:0x143:0x0]		
     3		 [0x2c0000403:0x143:0x0]		
     0		 [0x200000400:0x144:0x0]		
     1		 [0x240000403:0x144:0x0]		
     2		 [0x280000403:0x144:0x0]		
     3		 [0x2c0000403:0x144:0x0]		
     0		 [0x200000400:0x145:0x0]		
     1		 [0x240000403:0x145:0x0]		
     2		 [0x280000403:0x145:0x0]		
     3		 [0x2c0000403:0x145:0x0]		
     0		 [0x200000400:0x146:0x0]		
     1		 [0x240000403:0x146:0x0]		
     2		 [0x280000403:0x146:0x0]		
     3		 [0x2c0000403:0x146:0x0]		
     0		 [0x200000400:0x147:0x0]		
     1		 [0x240000403:0x147:0x0]		
     2		 [0x280000403:0x147:0x0]		
     3		 [0x2c0000403:0x147:0x0]		
     0		 [0x200000400:0x148:0x0]		
     1		 [0x240000403:0x148:0x0]		
     2		 [0x280000403:0x148:0x0]		
     3		 [0x2c0000403:0x148:0x0]		
     0		 [0x200000400:0x149:0x0]		
     1		 [0x240000403:0x149:0x0]		
     2		 [0x280000403:0x149:0x0]		
     3		 [0x2c0000403:0x149:0x0]		
     0		 [0x200000400:0x14a:0x0]		
     1		 [0x240000403:0x14a:0x0]		
     2		 [0x280000403:0x14a:0x0]		
     3		 [0x2c0000403:0x14a:0x0]		
     0		 [0x200000400:0x14b:0x0]		
     1		 [0x240000403:0x14b:0x0]		
     2		 [0x280000403:0x14b:0x0]		
     3		 [0x2c0000403:0x14b:0x0]		
     0		 [0x200000400:0x14c:0x0]		
     1		 [0x240000403:0x14c:0x0]		
     2		 [0x280000403:0x14c:0x0]		
     3		 [0x2c0000403:0x14c:0x0]		
     0		 [0x200000400:0x14d:0x0]		
     1		 [0x240000403:0x14d:0x0]		
     2		 [0x280000403:0x14d:0x0]		
     3		 [0x2c0000403:0x14d:0x0]		
     0		 [0x200000400:0x14e:0x0]		
     1		 [0x240000403:0x14e:0x0]		
     2		 [0x280000403:0x14e:0x0]		
     3		 [0x2c0000403:0x14e:0x0]		
     0		 [0x200000400:0x14f:0x0]		
     1		 [0x240000403:0x14f:0x0]		
     2		 [0x280000403:0x14f:0x0]		
     3		 [0x2c0000403:0x14f:0x0]		
     0		 [0x200000400:0x150:0x0]		
     1		 [0x240000403:0x150:0x0]		
     2		 [0x280000403:0x150:0x0]		
     3		 [0x2c0000403:0x150:0x0]		
     0		 [0x200000400:0x151:0x0]		
     1		 [0x240000403:0x151:0x0]		
     2		 [0x280000403:0x151:0x0]		
     3		 [0x2c0000403:0x151:0x0]		
     0		 [0x200000400:0x152:0x0]		
     1		 [0x240000403:0x152:0x0]		
     2		 [0x280000403:0x152:0x0]		
     3		 [0x2c0000403:0x152:0x0]		
     0		 [0x200000400:0x153:0x0]		
     1		 [0x240000403:0x153:0x0]		
     2		 [0x280000403:0x153:0x0]		
     3		 [0x2c0000403:0x153:0x0]		
     0		 [0x200000400:0x154:0x0]		
     1		 [0x240000403:0x154:0x0]		
     2		 [0x280000403:0x154:0x0]		
     3		 [0x2c0000403:0x154:0x0]		
     0		 [0x200000400:0x155:0x0]		
     1		 [0x240000403:0x155:0x0]		
     2		 [0x280000403:0x155:0x0]		
     3		 [0x2c0000403:0x155:0x0]		
     0		 [0x200000400:0x156:0x0]		
     1		 [0x240000403:0x156:0x0]		
     2		 [0x280000403:0x156:0x0]		
     3		 [0x2c0000403:0x156:0x0]		
     0		 [0x200000400:0x157:0x0]		
     1		 [0x240000403:0x157:0x0]		
     2		 [0x280000403:0x157:0x0]		
     3		 [0x2c0000403:0x157:0x0]		
     0		 [0x200000400:0x158:0x0]		
     1		 [0x240000403:0x158:0x0]		
     2		 [0x280000403:0x158:0x0]		
     3		 [0x2c0000403:0x158:0x0]		
     0		 [0x200000400:0x159:0x0]		
     1		 [0x240000403:0x159:0x0]		
     2		 [0x280000403:0x159:0x0]		
     3		 [0x2c0000403:0x159:0x0]		
     0		 [0x200000400:0x15a:0x0]		
     1		 [0x240000403:0x15a:0x0]		
     2		 [0x280000403:0x15a:0x0]		
     3		 [0x2c0000403:0x15a:0x0]		
     0		 [0x200000400:0x15b:0x0]		
     1		 [0x240000403:0x15b:0x0]		
     2		 [0x280000403:0x15b:0x0]		
     3		 [0x2c0000403:0x15b:0x0]		
     0		 [0x200000400:0x15c:0x0]		
     1		 [0x240000403:0x15c:0x0]		
     2		 [0x280000403:0x15c:0x0]		
     3		 [0x2c0000403:0x15c:0x0]		
     0		 [0x200000400:0x15d:0x0]		
     1		 [0x240000403:0x15d:0x0]		
     2		 [0x280000403:0x15d:0x0]		
     3		 [0x2c0000403:0x15d:0x0]		
     0		 [0x200000400:0x15e:0x0]		
     1		 [0x240000403:0x15e:0x0]		
     2		 [0x280000403:0x15e:0x0]		
     3		 [0x2c0000403:0x15e:0x0]		
     0		 [0x200000400:0x15f:0x0]		
     1		 [0x240000403:0x15f:0x0]		
     2		 [0x280000403:0x15f:0x0]		
     3		 [0x2c0000403:0x15f:0x0]		
     0		 [0x200000400:0x160:0x0]		
     1		 [0x240000403:0x160:0x0]		
     2		 [0x280000403:0x160:0x0]		
     3		 [0x2c0000403:0x160:0x0]		
     0		 [0x200000400:0x161:0x0]		
     1		 [0x240000403:0x161:0x0]		
     2		 [0x280000403:0x161:0x0]		
     3		 [0x2c0000403:0x161:0x0]		
     0		 [0x200000400:0x162:0x0]		
     1		 [0x240000403:0x162:0x0]		
     2		 [0x280000403:0x162:0x0]		
     3		 [0x2c0000403:0x162:0x0]		
     0		 [0x200000400:0x163:0x0]		
     1		 [0x240000403:0x163:0x0]		
     2		 [0x280000403:0x163:0x0]		
     3		 [0x2c0000403:0x163:0x0]		
     0		 [0x200000400:0x164:0x0]		
     1		 [0x240000403:0x164:0x0]		
     2		 [0x280000403:0x164:0x0]		
     3		 [0x2c0000403:0x164:0x0]		
     0		 [0x200000400:0x165:0x0]		
     1		 [0x240000403:0x165:0x0]		
     2		 [0x280000403:0x165:0x0]		
     3		 [0x2c0000403:0x165:0x0]		
     0		 [0x200000400:0x166:0x0]		
     1		 [0x240000403:0x166:0x0]		
     2		 [0x280000403:0x166:0x0]		
     3		 [0x2c0000403:0x166:0x0]		
     0		 [0x200000400:0x167:0x0]		
     1		 [0x240000403:0x167:0x0]		
     2		 [0x280000403:0x167:0x0]		
     3		 [0x2c0000403:0x167:0x0]		
     0		 [0x200000400:0x168:0x0]		
     1		 [0x240000403:0x168:0x0]		
     2		 [0x280000403:0x168:0x0]		
     3		 [0x2c0000403:0x168:0x0]		
     0		 [0x200000400:0x169:0x0]		
     1		 [0x240000403:0x169:0x0]		
     2		 [0x280000403:0x169:0x0]		
     3		 [0x2c0000403:0x169:0x0]		
     0		 [0x200000400:0x16a:0x0]		
     1		 [0x240000403:0x16a:0x0]		
     2		 [0x280000403:0x16a:0x0]		
     3		 [0x2c0000403:0x16a:0x0]		
     0		 [0x200000400:0x16b:0x0]		
     1		 [0x240000403:0x16b:0x0]		
     2		 [0x280000403:0x16b:0x0]		
     3		 [0x2c0000403:0x16b:0x0]		
     0		 [0x200000400:0x16c:0x0]		
     1		 [0x240000403:0x16c:0x0]		
     2		 [0x280000403:0x16c:0x0]		
     3		 [0x2c0000403:0x16c:0x0]		
     0		 [0x200000400:0x16d:0x0]		
     1		 [0x240000403:0x16d:0x0]		
     2		 [0x280000403:0x16d:0x0]		
     3		 [0x2c0000403:0x16d:0x0]		
     0		 [0x200000400:0x16e:0x0]		
     1		 [0x240000403:0x16e:0x0]		
     2		 [0x280000403:0x16e:0x0]		
     3		 [0x2c0000403:0x16e:0x0]		
     0		 [0x200000400:0x16f:0x0]		
     1		 [0x240000403:0x16f:0x0]		
     2		 [0x280000403:0x16f:0x0]		
     3		 [0x2c0000403:0x16f:0x0]		
     0		 [0x200000400:0x170:0x0]		
     1		 [0x240000403:0x170:0x0]		
     2		 [0x280000403:0x170:0x0]		
     3		 [0x2c0000403:0x170:0x0]		
     0		 [0x200000400:0x171:0x0]		
     1		 [0x240000403:0x171:0x0]		
     2		 [0x280000403:0x171:0x0]		
     3		 [0x2c0000403:0x171:0x0]		
     0		 [0x200000400:0x172:0x0]		
     1		 [0x240000403:0x172:0x0]		
     2		 [0x280000403:0x172:0x0]		
     3		 [0x2c0000403:0x172:0x0]		
     0		 [0x200000400:0x173:0x0]		
     1		 [0x240000403:0x173:0x0]		
     2		 [0x280000403:0x173:0x0]		
     3		 [0x2c0000403:0x173:0x0]		
     0		 [0x200000400:0x174:0x0]		
     1		 [0x240000403:0x174:0x0]		
     2		 [0x280000403:0x174:0x0]		
     3		 [0x2c0000403:0x174:0x0]		
     0		 [0x200000400:0x175:0x0]		
     1		 [0x240000403:0x175:0x0]		
     2		 [0x280000403:0x175:0x0]		
     3		 [0x2c0000403:0x175:0x0]		
     0		 [0x200000400:0x176:0x0]		
     1		 [0x240000403:0x176:0x0]		
     2		 [0x280000403:0x176:0x0]		
     3		 [0x2c0000403:0x176:0x0]		
     0		 [0x200000400:0x177:0x0]		
     1		 [0x240000403:0x177:0x0]		
     2		 [0x280000403:0x177:0x0]		
     3		 [0x2c0000403:0x177:0x0]		
     0		 [0x200000400:0x178:0x0]		
     1		 [0x240000403:0x178:0x0]		
     2		 [0x280000403:0x178:0x0]		
     3		 [0x2c0000403:0x178:0x0]		
     0		 [0x200000400:0x179:0x0]		
     1		 [0x240000403:0x179:0x0]		
     2		 [0x280000403:0x179:0x0]		
     3		 [0x2c0000403:0x179:0x0]		
     0		 [0x200000400:0x17a:0x0]		
     1		 [0x240000403:0x17a:0x0]		
     2		 [0x280000403:0x17a:0x0]		
     3		 [0x2c0000403:0x17a:0x0]		
     0		 [0x200000400:0x17b:0x0]		
     1		 [0x240000403:0x17b:0x0]		
     2		 [0x280000403:0x17b:0x0]		
     3		 [0x2c0000403:0x17b:0x0]		
     0		 [0x200000400:0x17c:0x0]		
     1		 [0x240000403:0x17c:0x0]		
     2		 [0x280000403:0x17c:0x0]		
     3		 [0x2c0000403:0x17c:0x0]		
     0		 [0x200000400:0x17d:0x0]		
     1		 [0x240000403:0x17d:0x0]		
     2		 [0x280000403:0x17d:0x0]		
     3		 [0x2c0000403:0x17d:0x0]		
     0		 [0x200000400:0x17e:0x0]		
     1		 [0x240000403:0x17e:0x0]		
     2		 [0x280000403:0x17e:0x0]		
     3		 [0x2c0000403:0x17e:0x0]		
     0		 [0x200000400:0x17f:0x0]		
     1		 [0x240000403:0x17f:0x0]		
     2		 [0x280000403:0x17f:0x0]		
     3		 [0x2c0000403:0x17f:0x0]		
     0		 [0x200000400:0x180:0x0]		
     1		 [0x240000403:0x180:0x0]		
     2		 [0x280000403:0x180:0x0]		
     3		 [0x2c0000403:0x180:0x0]		
     0		 [0x200000400:0x181:0x0]		
     1		 [0x240000403:0x181:0x0]		
     2		 [0x280000403:0x181:0x0]		
     3		 [0x2c0000403:0x181:0x0]		
     0		 [0x200000400:0x182:0x0]		
     1		 [0x240000403:0x182:0x0]		
     2		 [0x280000403:0x182:0x0]		
     3		 [0x2c0000403:0x182:0x0]		
     0		 [0x200000400:0x183:0x0]		
     1		 [0x240000403:0x183:0x0]		
     2		 [0x280000403:0x183:0x0]		
     3		 [0x2c0000403:0x183:0x0]		
Resetting fail_loc on all nodes...done.
PASS 300k (7s)
== sanity test complete, duration 9 sec == 01:30:11 (1433320211)
Comment by Gerrit Updater [ 05/Jun/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/15161
Subject: LU-6602 llog: increase llog chunk size
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 99b0ee58d13a032feb4d580b23dc46771add65d2

Comment by Gerrit Updater [ 05/Jun/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/15162
Subject: LU-6602 update: split update llog record
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7a64f0dace7892a83cf456255d9f420e2a457ed8

Comment by Robert Read (Inactive) [ 05/Jun/15 ]

Let me know which build you want me to test.

Comment by Di Wang [ 05/Jun/15 ]

Robert: Please use this build https://build.hpdd.intel.com/job/lustre-reviews/32667/ Thanks.

Comment by Robert Read (Inactive) [ 10/Jun/15 ]

Not sure if this is related to DNE or not, but during setup one of the MDS nodes panicked with this trace right after mounting an MDT:

LDISKFS-fs (xvdg1): mounted filesystem with ordered data mode. quota=on. Opts: 
LDISKFS-fs (xvdg1): mounted filesystem with ordered data mode. quota=on. Opts: 
LDISKFS-fs (xvdg1): mounted filesystem with ordered data mode. quota=on. Opts: 
BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
IP: [<ffffffffa0232eb6>] llog_cat_process_or_fork+0x46/0x300 [obdclass]
PGD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/vbd-2145/block/xvdg1/dev
CPU 7 
Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) lquota(U) ldiskfs(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ipv6 xen_netfront ext4 jbd2 mbcache xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 7552, comm: lod0061_rec0063 Not tainted 2.6.32-504.16.2.el6_lustre.g2f99b7f.x86_64 #1  
RIP: e030:[<ffffffffa0232eb6>]  [<ffffffffa0232eb6>] llog_cat_process_or_fork+0x46/0x300 [obdclass]
RSP: e02b:ffff8806694b7da0  EFLAGS: 00010246
RAX: ffff88067e6aa378 RBX: ffff88068d020380 RCX: ffff88069c9bc240
RDX: ffffffffa0c48be0 RSI: ffff88068d020380 RDI: ffff8806694b7e70
RBP: ffff8806694b7e20 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88067232eec0 R11: 1000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8806694b7e70 R15: ffff8806694b7e70
FS:  00007fe3a4090700(0000) GS:ffff880028122000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000024 CR3: 0000000001a85000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process lod0061_rec0063 (pid: 7552, threadinfo ffff8806694b6000, task ffff8806694b5520)
Stack:
 ffff8806694b7e70 ffff88067232a800 ffff8806694b7e40 ffffffffa0c72dc0
<d> 0000000000000000 ffff8806694b7e70 ffff88066a66db40 ffff8806694b7e08
<d> ffff88067e6aa078 ffff88067fb62030 ffff88067fb622b8 ffff88069c9bc240
Call Trace:
 [<ffffffffa0c72dc0>] ? lod_sub_prep_llog+0x4f0/0x7b0 [lod]
 [<ffffffffa0233189>] llog_cat_process+0x19/0x20 [obdclass]
 [<ffffffffa0c4870a>] lod_sub_recovery_thread+0x4ba/0x990 [lod]
 [<ffffffff81007d82>] ? check_events+0x12/0x20
 [<ffffffff8152dbbc>] ? _spin_unlock_irqrestore+0x1c/0x20
 [<ffffffffa0c48250>] ? lod_sub_recovery_thread+0x0/0x990 [lod]
 [<ffffffff8109e71e>] kthread+0x9e/0xc0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8100b294>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8100ba1d>] ? retint_restore_args+0x5/0x6
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
Code: f8 0f 1f 44 00 00 f6 05 8c ae f3 ff 01 44 0f b6 6d 10 49 89 fe 48 89 f3 4c 8b 66 38 74 0d f6 05 70 ae f3 ff 40 0f 85 9a 01 00 00 <41> f6 44 24 24 02 0f 84 6b 02 00 00 48 89 4d a0 48 89 55 a8 44 
RIP  [<ffffffffa0232eb6>] llog_cat_process_or_fork+0x46/0x300 [obdclass]
 RSP <ffff8806694b7da0>
CR2: 0000000000000024
---[ end trace 2a9e4e41d6fdd5e2 ]---
Kernel panic - not syncing: Fatal exception
Pid: 7552, comm: lod0061_rec0063 Tainted: G      D    ---------------    2.6.32-504.16.2.el6_lustre.g2f99b7f.x86_64 #1
Call Trace:
 [<ffffffff81529fbc>] ? panic+0xa7/0x16f
 [<ffffffff8152dbbc>] ? _spin_unlock_irqrestore+0x1c/0x20
 [<ffffffff8152ed94>] ? oops_end+0xe4/0x100
 [<ffffffff8104c80b>] ? no_context+0xfb/0x260
 [<ffffffff8104ca95>] ? __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff8104cb63>] ? bad_area_nosemaphore+0x13/0x20
 [<ffffffff8104d25c>] ? __do_page_fault+0x30c/0x500
 [<ffffffff81007d82>] ? check_events+0x12/0x20
 [<ffffffff810075dd>] ? xen_force_evtchn_callback+0xd/0x10
 [<ffffffffa0511869>] ? out_update_pack+0xc9/0x190 [ptlrpc]
 [<ffffffff810075dd>] ? xen_force_evtchn_callback+0xd/0x10
 [<ffffffff81007d82>] ? check_events+0x12/0x20
 [<ffffffff81530cbe>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8152e075>] ? page_fault+0x25/0x30
 [<ffffffffa0c48be0>] ? lod_process_recovery_updates+0x0/0x420 [lod]
 [<ffffffffa0232eb6>] ? llog_cat_process_or_fork+0x46/0x300 [obdclass]
 [<ffffffffa0c72dc0>] ? lod_sub_prep_llog+0x4f0/0x7b0 [lod]
 [<ffffffffa0233189>] ? llog_cat_process+0x19/0x20 [obdclass]
 [<ffffffffa0c4870a>] ? lod_sub_recovery_thread+0x4ba/0x990 [lod]
 [<ffffffff81007d82>] ? check_events+0x12/0x20
 [<ffffffff8152dbbc>] ? _spin_unlock_irqrestore+0x1c/0x20
 [<ffffffffa0c48250>] ? lod_sub_recovery_thread+0x0/0x990 [lod]
 [<ffffffff8109e71e>] ? kthread+0x9e/0xc0
 [<ffffffff8100c20a>] ? child_rip+0xa/0x20
 [<ffffffff8100b294>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8100ba1d>] ? retint_restore_args+0x5/0x6
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
Comment by Di Wang [ 10/Jun/15 ]

Robert: Thanks for testing. It does indeed look like a DNE issue. I will check the code. Thanks.

Comment by Gerrit Updater [ 11/Jun/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14883/
Subject: LU-6602 obdclass: variable llog chunk size
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: dc689955366c895f9cdcc86d78f4221866fe0926

Comment by Gerrit Updater [ 14/Jun/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/15274
Subject: LU-6602 osp: change lgh_hdr_lock to mutex
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6e46cf6e89a578b2e8a236ac4e00433d0bed0bba

Comment by Di Wang [ 17/Jun/15 ]

Robert: could you please try https://build.hpdd.intel.com/job/lustre-reviews/32865/ Thanks.

Comment by Gerrit Updater [ 03/Jul/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15161/
Subject: LU-6602 llog: increase update llog chunk size
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b45e500e5a996d8529ab3d85d542908c93b1e1ce

Comment by Gerrit Updater [ 04/Jul/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15162/
Subject: LU-6602 update: split update llog record
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: fb80ae7c7601a03c1181de381f067f553e7b8c6f

Comment by James A Simmons [ 07/Jul/15 ]

One patch left!!

Comment by Gerrit Updater [ 16/Jul/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15274/
Subject: LU-6602 osp: change lgh_hdr_lock to mutex
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: fffe8ac7e42b6638bff9fe19c4bfeb6635023c92

Comment by James A Simmons [ 29/Aug/19 ]

Sorry, typo. Meant to be LU-6202.
