[LU-6602] ASSERTION( rec->lrh_len <= 8192 ) failed Created: 14/May/15 Updated: 30/Aug/19 Resolved: 16/Jul/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Robert Read (Inactive) | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dne2 | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Testing this build: https://build.hpdd.intel.com/job/lustre-reviews/32021/ in an AWS environment with 64 MDTs (8 MDSs with 8 MDTs each).
After each reboot/recovery cycle, the MDS would LBUG again with the same error right after recovery completed. Presumably the client was resending the mkdir. Once I killed lfs, the crashes stopped. |
| Comments |
| Comment by Andreas Dilger [ 14/May/15 ] |
|
First, we need to work out the size of the llog record as a function of the stripe count. The LASSERT() should be turned into an LASSERTF() so that the record length is printed for future reference, and we also need a safety check on the MDS for the maximum stripe count so that we do not hit this again. One option is simply to increase the maximum llog record size. For unlink and setattr processing these llog records are no longer sent over the network (except ChangeLogs, which would also benefit from a larger llog record size), and DNE2 OUT accesses them only like regular files, so a larger size will not be as big a problem as it was in the past, when many >8KB buffer allocations on the clients could be problematic. The llog code will need some changes to handle multiple llog chunk sizes. A reasonable maximum size would be 32KB, which is our current RPC size limit for 2000-stripe LOV EAs. Of course, it is also desirable to shrink the llog redo record where possible, so that it scales as far as possible. |
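A minimal sketch of the two checks suggested above, assuming the redo record size can be modelled as a fixed overhead plus a per-stripe cost (both parameters are placeholders to be measured, not Lustre constants). SKETCH_LASSERTF, redo_rec_len(), check_stripe_count() and write_rec() are hypothetical user-space stand-ins; in the kernel the real code would use the libcfs LASSERTF() macro and return an errno such as -E2BIG.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for libcfs LASSERTF(): unlike a bare
 * LASSERT(), it prints the offending values before aborting. */
#define SKETCH_LASSERTF(cond, fmt, ...)                                 \
	do {                                                            \
		if (!(cond)) {                                          \
			fprintf(stderr, "ASSERTION( %s ) failed: " fmt, \
				#cond, __VA_ARGS__);                    \
			abort();                                        \
		}                                                       \
	} while (0)

/* Limits discussed in this ticket: the current 8KB record limit from the
 * failed assertion, and the proposed 32KB maximum. */
#define SKETCH_REC_MAX_CUR  8192ul
#define SKETCH_REC_MAX_NEW 32768ul

/* Hypothetical size model: fixed overhead plus a per-stripe cost. */
static unsigned long redo_rec_len(unsigned int stripes,
				  unsigned int per_stripe,
				  unsigned int overhead)
{
	return (unsigned long)stripes * per_stripe + overhead;
}

/* Safety check at request time: reject an oversized stripe count with an
 * error (think -E2BIG) instead of crashing the MDS later. */
static int check_stripe_count(unsigned int stripes, unsigned int per_stripe,
			      unsigned int overhead, unsigned long rec_max)
{
	return redo_rec_len(stripes, per_stripe, overhead) <= rec_max ? 0 : -1;
}

/* Write path: the size was already validated, so this should never fire;
 * if it does, the message carries the actual length and limit. */
static void write_rec(unsigned long len, unsigned long rec_max)
{
	SKETCH_LASSERTF(len <= rec_max, "len %lu > max %lu\n", len, rec_max);
	/* ... write the record ... */
}
```

With a made-up cost of 100 bytes per stripe and 50 bytes of overhead, 100 stripes (10050 bytes) would be refused under the 8KB limit but accepted under 32KB.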
| Comment by Gerrit Updater [ 18/May/15 ] |
|
wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/14842 |
| Comment by Gerrit Updater [ 20/May/15 ] |
|
wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/14883 |
| Comment by Robert Read (Inactive) [ 26/May/15 ] |
|
Testing this build https://build.hpdd.intel.com/job/lustre-reviews/32346/, I started by creating a 64-stripe directory:
[root@client00 scratch]# lfs mkdir -c 64 64stripes
This hangs, and I saw a soft lockup on MDT0:
BUG: soft lockup - CPU#0 stuck for 67s! [mdt00_006:12371]
Modules linked in: 8021q garp stp llc osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ldiskfs(U) ipv6 xen_netfront ext4 jbd2 mbcache xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
CPU 0
Modules linked in: 8021q garp stp llc osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ldiskfs(U) ipv6 xen_netfront ext4 jbd2 mbcache xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 12371, comm: mdt00_006 Not tainted 2.6.32-504.16.2.el6_lustre.g2f99b7f.x86_64 #1
RIP: e030:[<ffffffffa0c6a1e5>] [<ffffffffa0c6a1e5>] lod_sub_object_index_insert+0xc5/0x330 [lod]
RSP: e02b:ffff88076e32b810 EFLAGS: 00000202
RAX: 0000000000000142 RBX: ffff8805d6dbbc70 RCX: ffff8805d3916030
RDX: ffff8805d3916000 RSI: ffff8805d3916030 RDI: ffff8805d3917e62
RBP: ffff88076e32b8a0 R08: 0000000000000152 R09: 0000000000001e32
R10: ffff8805d3916010 R11: 0000000000000003 R12: ffff8805dd068740
R13: ffff8805d6b03bc0 R14: ffff88074cb47f80 R15: ffff8805d6f9c558
FS: 00007fac721f1740(0000) GS:ffff880028050000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fac721fe000 CR3: 00000005f7678000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mdt00_006 (pid: 12371, threadinfo ffff88076e32a000, task ffff8805d7a84ab0)
Stack:
ffff88076e32b860 ffff8805e1a523c0 ffff8805d6f9c6a8 ffff8805d6f9c558
<d> 0000000000002000 0000000000002000 ffff8805d6f9c6a8 ffff8805f9a7f6c0
<d> 6a04000002000000 000000000000001a 00000000000002dc 01ff8805d6f9c688
Call Trace:
[<ffffffffa0c5e460>] lod_dir_striping_create_internal+0xd60/0x1610 [lod]
[<ffffffffa0c5f3e5>] lod_xattr_set+0x365/0x3e0 [lod]
[<ffffffffa0cc1e06>] mdo_xattr_set.clone.0+0xc6/0x170 [mdd]
[<ffffffffa0ccebb3>] mdd_object_create+0x493/0x8d0 [mdd]
[<ffffffffa059b07a>] ? top_trans_start+0x3fa/0x880 [ptlrpc]
[<ffffffffa0cd1e88>] mdd_create+0xc98/0x1750 [mdd]
[<ffffffffa0b8d5ec>] ? mdt_version_save+0x8c/0x1a0 [mdt]
[<ffffffffa0b9197c>] mdt_reint_create+0xbbc/0xcc0 [mdt]
[<ffffffffa031fc00>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa0b6dd95>] ? mdt_ucred+0x15/0x20 [mdt]
[<ffffffffa0b888ec>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
[<ffffffffa054bcb2>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
[<ffffffffa0b8c99d>] mdt_reint_rec+0x5d/0x200 [mdt]
[<ffffffffa0b71f6b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
[<ffffffffa0b726db>] mdt_reint+0x6b/0x120 [mdt]
[<ffffffffa058742e>] tgt_request_handle+0x95e/0x10b0 [ptlrpc]
[<ffffffffa0536b71>] ptlrpc_main+0xe41/0x1970 [ptlrpc]
[<ffffffff81060c3f>] ? finish_task_switch+0x4f/0xf0
[<ffffffffa0535d30>] ? ptlrpc_main+0x0/0x1970 [ptlrpc]
[<ffffffff8109e71e>] kthread+0x9e/0xc0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8100b294>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff8100ba1d>] ? retint_restore_args+0x5/0x6
[<ffffffff8100c200>] ? child_rip+0x0/0x20
Code: 43 08 48 89 45 98 44 8b 42 28 45 85 c0 0f 84 67 01 00 00 48 8d 72 30 31 c0 45 31 c9 48 89 f1 48 89 f7 0f 1f 40 00 44 0f b7 5f 12 <83> c0 01 4f 8d 5c 1b 14 4d 01 d9 4c 01 df 41 39 c0 77 e8 31 c0
Call Trace:
[<ffffffffa0c5e460>] lod_dir_striping_create_internal+0xd60/0x1610 [lod]
[<ffffffffa0c5f3e5>] lod_xattr_set+0x365/0x3e0 [lod]
[<ffffffffa0cc1e06>] mdo_xattr_set.clone.0+0xc6/0x170 [mdd]
[<ffffffffa0ccebb3>] mdd_object_create+0x493/0x8d0 [mdd]
[<ffffffffa059b07a>] ? top_trans_start+0x3fa/0x880 [ptlrpc]
[<ffffffffa0cd1e88>] mdd_create+0xc98/0x1750 [mdd]
[<ffffffffa0b8d5ec>] ? mdt_version_save+0x8c/0x1a0 [mdt]
[<ffffffffa0b9197c>] mdt_reint_create+0xbbc/0xcc0 [mdt]
[<ffffffffa031fc00>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa0b6dd95>] ? mdt_ucred+0x15/0x20 [mdt]
[<ffffffffa0b888ec>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
[<ffffffffa054bcb2>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
[<ffffffffa0b8c99d>] mdt_reint_rec+0x5d/0x200 [mdt]
[<ffffffffa0b71f6b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
[<ffffffffa0b726db>] mdt_reint+0x6b/0x120 [mdt]
[<ffffffffa058742e>] tgt_request_handle+0x95e/0x10b0 [ptlrpc]
[<ffffffffa0536b71>] ptlrpc_main+0xe41/0x1970 [ptlrpc]
[<ffffffff81060c3f>] ? finish_task_switch+0x4f/0xf0
[<ffffffffa0535d30>] ? ptlrpc_main+0x0/0x1970 [ptlrpc]
[<ffffffff8109e71e>] kthread+0x9e/0xc0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8100b294>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff8100ba1d>] ? retint_restore_args+0x5/0x6
[<ffffffff8100c200>] ? child_rip+0x0/0x20
[root@mdt00 ~]#
Message from syslogd@mdt00 at May 26 18:56:44 ...
kernel:BUG: soft lockup - CPU#0 stuck for 67s! [mdt00_006:12371]
Message from syslogd@mdt00 at May 26 18:58:08 ...
kernel:BUG: soft lockup - CPU#0 stuck for 67s! [mdt00_006:12371] |
| Comment by Di Wang [ 28/May/15 ] |
|
As discussed with Andreas and Alex, I changed the patch a bit. Unfortunately, because of the landing process, I cannot push it as an independent patch there, so I pushed the whole DNE3 patch series here: http://review.whamcloud.com/#/c/13942 With this patch, I can create a 48-stripe directory. But let me try this build on the OpenSFS node first. |
| Comment by Di Wang [ 28/May/15 ] |
|
OK, I was able to create a striped directory with 68 stripes on the OpenSFS cluster with this build:
[root@c17 tests]# MDSCOUNT=68 sh llmount.sh
Stopping clients: c17 /mnt/lustre (opts:)
Stopping clients: c17 /mnt/lustre2 (opts:)
Loading modules from /usr/lib64/lustre/tests/..
detected 8 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck
subsystem_debug=all -lnet -lnd -pinger
Formatting mgs, mds, osts
Format mds1: /tmp/lustre-mdt1
Format mds2: /tmp/lustre-mdt2
Format mds3: /tmp/lustre-mdt3
Format mds4: /tmp/lustre-mdt4
Format mds5: /tmp/lustre-mdt5
Format mds6: /tmp/lustre-mdt6
Format mds7: /tmp/lustre-mdt7
Format mds8: /tmp/lustre-mdt8
Format mds9: /tmp/lustre-mdt9
Format mds10: /tmp/lustre-mdt10
Format mds11: /tmp/lustre-mdt11
Format mds12: /tmp/lustre-mdt12
Format mds13: /tmp/lustre-mdt13
Format mds14: /tmp/lustre-mdt14
Format mds15: /tmp/lustre-mdt15
Format mds16: /tmp/lustre-mdt16
Format mds17: /tmp/lustre-mdt17
Format mds18: /tmp/lustre-mdt18
Format mds19: /tmp/lustre-mdt19
Format mds20: /tmp/lustre-mdt20
Format mds21: /tmp/lustre-mdt21
Format mds22: /tmp/lustre-mdt22
Format mds23: /tmp/lustre-mdt23
Format mds24: /tmp/lustre-mdt24
Format mds25: /tmp/lustre-mdt25
Format mds26: /tmp/lustre-mdt26
Format mds27: /tmp/lustre-mdt27
Format mds28: /tmp/lustre-mdt28
Format mds29: /tmp/lustre-mdt29
Format mds30: /tmp/lustre-mdt30
Format mds31: /tmp/lustre-mdt31
Format mds32: /tmp/lustre-mdt32
Format mds33: /tmp/lustre-mdt33
Format mds34: /tmp/lustre-mdt34
Format mds35: /tmp/lustre-mdt35
Format mds36: /tmp/lustre-mdt36
Format mds37: /tmp/lustre-mdt37
Format mds38: /tmp/lustre-mdt38
Format mds39: /tmp/lustre-mdt39
Format mds40: /tmp/lustre-mdt40
Format mds41: /tmp/lustre-mdt41
Format mds42: /tmp/lustre-mdt42
Format mds43: /tmp/lustre-mdt43
Format mds44: /tmp/lustre-mdt44
Format mds45: /tmp/lustre-mdt45
Format mds46: /tmp/lustre-mdt46
Format mds47: /tmp/lustre-mdt47
Format mds48: /tmp/lustre-mdt48
Format mds49: /tmp/lustre-mdt49
Format mds50: /tmp/lustre-mdt50
Format mds51: /tmp/lustre-mdt51
Format mds52: /tmp/lustre-mdt52
Format mds53: /tmp/lustre-mdt53
Format mds54: /tmp/lustre-mdt54
Format mds55: /tmp/lustre-mdt55
Format mds56: /tmp/lustre-mdt56
Format mds57: /tmp/lustre-mdt57
Format mds58: /tmp/lustre-mdt58
Format mds59: /tmp/lustre-mdt59
Format mds60: /tmp/lustre-mdt60
Format mds61: /tmp/lustre-mdt61
Format mds62: /tmp/lustre-mdt62
Format mds63: /tmp/lustre-mdt63
Format mds64: /tmp/lustre-mdt64
Format mds65: /tmp/lustre-mdt65
Format mds66: /tmp/lustre-mdt66
Format mds67: /tmp/lustre-mdt67
Format mds68: /tmp/lustre-mdt68
Format ost1: /tmp/lustre-ost1
Format ost2: /tmp/lustre-ost2
Checking servers environments
Checking clients c17 environments
Loading modules from /usr/lib64/lustre/tests/..
detected 8 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck
subsystem_debug=all -lnet -lnd -pinger
Setup mgs, mdt, osts
Starting mds1: -o loop /tmp/lustre-mdt1 /mnt/mds1
Started lustre-MDT0000
Starting mds2: -o loop /tmp/lustre-mdt2 /mnt/mds2
Started lustre-MDT0001
Starting mds3: -o loop /tmp/lustre-mdt3 /mnt/mds3
Started lustre-MDT0002
Starting mds4: -o loop /tmp/lustre-mdt4 /mnt/mds4
Started lustre-MDT0003
Starting mds5: -o loop /tmp/lustre-mdt5 /mnt/mds5
Started lustre-MDT0004
Starting mds6: -o loop /tmp/lustre-mdt6 /mnt/mds6
Started lustre-MDT0005
Starting mds7: -o loop /tmp/lustre-mdt7 /mnt/mds7
Started lustre-MDT0006
Starting mds8: -o loop /tmp/lustre-mdt8 /mnt/mds8
Started lustre-MDT0007
Starting mds9: -o loop /tmp/lustre-mdt9 /mnt/mds9
Started lustre-MDT0008
Starting mds10: -o loop /tmp/lustre-mdt10 /mnt/mds10
Started lustre-MDT0009
Starting mds11: -o loop /tmp/lustre-mdt11 /mnt/mds11
Started lustre-MDT000a
Starting mds12: -o loop /tmp/lustre-mdt12 /mnt/mds12
Started lustre-MDT000b
Starting mds13: -o loop /tmp/lustre-mdt13 /mnt/mds13
Started lustre-MDT000c
Starting mds14: -o loop /tmp/lustre-mdt14 /mnt/mds14
Started lustre-MDT000d
Starting mds15: -o loop /tmp/lustre-mdt15 /mnt/mds15
Started lustre-MDT000e
Starting mds16: -o loop /tmp/lustre-mdt16 /mnt/mds16
Started lustre-MDT000f
Starting mds17: -o loop /tmp/lustre-mdt17 /mnt/mds17
Started lustre-MDT0010
Starting mds18: -o loop /tmp/lustre-mdt18 /mnt/mds18
Started lustre-MDT0011
Starting mds19: -o loop /tmp/lustre-mdt19 /mnt/mds19
Started lustre-MDT0012
Starting mds20: -o loop /tmp/lustre-mdt20 /mnt/mds20
Started lustre-MDT0013
Starting mds21: -o loop /tmp/lustre-mdt21 /mnt/mds21
Started lustre-MDT0014
Starting mds22: -o loop /tmp/lustre-mdt22 /mnt/mds22
Started lustre-MDT0015
Starting mds23: -o loop /tmp/lustre-mdt23 /mnt/mds23
Started lustre-MDT0016
Starting mds24: -o loop /tmp/lustre-mdt24 /mnt/mds24
Started lustre-MDT0017
Starting mds25: -o loop /tmp/lustre-mdt25 /mnt/mds25
Started lustre-MDT0018
Starting mds26: -o loop /tmp/lustre-mdt26 /mnt/mds26
Started lustre-MDT0019
Starting mds27: -o loop /tmp/lustre-mdt27 /mnt/mds27
Started lustre-MDT001a
Starting mds28: -o loop /tmp/lustre-mdt28 /mnt/mds28
Started lustre-MDT001b
Starting mds29: -o loop /tmp/lustre-mdt29 /mnt/mds29
Started lustre-MDT001c
Starting mds30: -o loop /tmp/lustre-mdt30 /mnt/mds30
Started lustre-MDT001d
Starting mds31: -o loop /tmp/lustre-mdt31 /mnt/mds31
Started lustre-MDT001e
Starting mds32: -o loop /tmp/lustre-mdt32 /mnt/mds32
Started lustre-MDT001f
Starting mds33: -o loop /tmp/lustre-mdt33 /mnt/mds33
Started lustre-MDT0020
Starting mds34: -o loop /tmp/lustre-mdt34 /mnt/mds34
Started lustre-MDT0021
Starting mds35: -o loop /tmp/lustre-mdt35 /mnt/mds35
Started lustre-MDT0022
Starting mds36: -o loop /tmp/lustre-mdt36 /mnt/mds36
Started lustre-MDT0023
Starting mds37: -o loop /tmp/lustre-mdt37 /mnt/mds37
Started lustre-MDT0024
Starting mds38: -o loop /tmp/lustre-mdt38 /mnt/mds38
Started lustre-MDT0025
Starting mds39: -o loop /tmp/lustre-mdt39 /mnt/mds39
Started lustre-MDT0026
Starting mds40: -o loop /tmp/lustre-mdt40 /mnt/mds40
Started lustre-MDT0027
Starting mds41: -o loop /tmp/lustre-mdt41 /mnt/mds41
Started lustre-MDT0028
Starting mds42: -o loop /tmp/lustre-mdt42 /mnt/mds42
Started lustre-MDT0029
Starting mds43: -o loop /tmp/lustre-mdt43 /mnt/mds43
Started lustre-MDT002a
Starting mds44: -o loop /tmp/lustre-mdt44 /mnt/mds44
Started lustre-MDT002b
Starting mds45: -o loop /tmp/lustre-mdt45 /mnt/mds45
Started lustre-MDT002c
Starting mds46: -o loop /tmp/lustre-mdt46 /mnt/mds46
Started lustre-MDT002d
Starting mds47: -o loop /tmp/lustre-mdt47 /mnt/mds47
Started lustre-MDT002e
Starting mds48: -o loop /tmp/lustre-mdt48 /mnt/mds48
Started lustre-MDT002f
Starting mds49: -o loop /tmp/lustre-mdt49 /mnt/mds49
Started lustre-MDT0030
Starting mds50: -o loop /tmp/lustre-mdt50 /mnt/mds50
Started lustre-MDT0031
Starting mds51: -o loop /tmp/lustre-mdt51 /mnt/mds51
Started lustre-MDT0032
Starting mds52: -o loop /tmp/lustre-mdt52 /mnt/mds52
Started lustre-MDT0033
Starting mds53: -o loop /tmp/lustre-mdt53 /mnt/mds53
Started lustre-MDT0034
Starting mds54: -o loop /tmp/lustre-mdt54 /mnt/mds54
Started lustre-MDT0035
Starting mds55: -o loop /tmp/lustre-mdt55 /mnt/mds55
Started lustre-MDT0036
Starting mds56: -o loop /tmp/lustre-mdt56 /mnt/mds56
Started lustre-MDT0037
Starting mds57: -o loop /tmp/lustre-mdt57 /mnt/mds57
Started lustre-MDT0038
Starting mds58: -o loop /tmp/lustre-mdt58 /mnt/mds58
Started lustre-MDT0039
Starting mds59: -o loop /tmp/lustre-mdt59 /mnt/mds59
Started lustre-MDT003a
Starting mds60: -o loop /tmp/lustre-mdt60 /mnt/mds60
Started lustre-MDT003b
Starting mds61: -o loop /tmp/lustre-mdt61 /mnt/mds61
Started lustre-MDT003c
Starting mds62: -o loop /tmp/lustre-mdt62 /mnt/mds62
Started lustre-MDT003d
Starting mds63: -o loop /tmp/lustre-mdt63 /mnt/mds63
Started lustre-MDT003e
Starting mds64: -o loop /tmp/lustre-mdt64 /mnt/mds64
Started lustre-MDT003f
Starting mds65: -o loop /tmp/lustre-mdt65 /mnt/mds65
Started lustre-MDT0040
Starting mds66: -o loop /tmp/lustre-mdt66 /mnt/mds66
Started lustre-MDT0041
Starting mds67: -o loop /tmp/lustre-mdt67 /mnt/mds67
Started lustre-MDT0042
Starting mds68: -o loop /tmp/lustre-mdt68 /mnt/mds68
Started lustre-MDT0043
Starting ost1: -o loop /tmp/lustre-ost1 /mnt/ost1
Started lustre-OST0000
Starting ost2: -o loop /tmp/lustre-ost2 /mnt/ost2
Started lustre-OST0001
Starting client: c17: -o user_xattr,flock c17@tcp:/lustre /mnt/lustre
Using TIMEOUT=20
seting jobstats to procname_uid
Setting lustre.sys.jobid_var from disable to procname_uid
warning: 'lctl conf_param' is deprecated, use 'lctl set_param -P' instead
Waiting 90 secs for update
Updated after 8s: wanted 'procname_uid' got 'procname_uid'
disable quota as required
[root@c17 tests]#
[root@c17 tests]#
[root@c17 tests]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 20642428 5641124 13952728 29% /
tmpfs 16426748 0 16426748 0% /dev/shm
192.168.0.1:/scratch 416433152 272369664 122909696 69% /scratch
192.168.0.1:/home 416433152 272369664 122909696 69% /home
/dev/loop0 133560 8912 114936 8% /mnt/mds1
/dev/loop1 133560 1628 122208 2% /mnt/mds2
/dev/loop2 133560 1632 122204 2% /mnt/mds3
/dev/loop3 133560 1636 122200 2% /mnt/mds4
/dev/loop4 133560 1640 122196 2% /mnt/mds5
/dev/loop5 133560 1648 122188 2% /mnt/mds6
/dev/loop6 133560 1648 122188 2% /mnt/mds7
/dev/loop7 133560 1652 122184 2% /mnt/mds8
/dev/loop8 133560 1656 122180 2% /mnt/mds9
/dev/loop9 133560 1664 122172 2% /mnt/mds10
/dev/loop10 133560 1664 122172 2% /mnt/mds11
/dev/loop11 133560 1668 122168 2% /mnt/mds12
/dev/loop12 133560 1676 122160 2% /mnt/mds13
/dev/loop13 133560 1684 122152 2% /mnt/mds14
/dev/loop14 133560 1688 122148 2% /mnt/mds15
/dev/loop15 133560 1688 122148 2% /mnt/mds16
/dev/loop16 133560 1692 122144 2% /mnt/mds17
/dev/loop17 133560 1696 122140 2% /mnt/mds18
/dev/loop18 133560 1704 122132 2% /mnt/mds19
/dev/loop19 133560 1708 122128 2% /mnt/mds20
/dev/loop20 133560 1708 122128 2% /mnt/mds21
/dev/loop21 133560 1712 122124 2% /mnt/mds22
/dev/loop22 133560 1720 122116 2% /mnt/mds23
/dev/loop23 133560 1724 122112 2% /mnt/mds24
/dev/loop24 133560 1724 122112 2% /mnt/mds25
/dev/loop25 133560 1728 122108 2% /mnt/mds26
/dev/loop26 133560 1736 122100 2% /mnt/mds27
/dev/loop27 133560 1740 122096 2% /mnt/mds28
/dev/loop28 133560 1744 122092 2% /mnt/mds29
/dev/loop29 133560 1744 122092 2% /mnt/mds30
/dev/loop30 133560 1752 122084 2% /mnt/mds31
/dev/loop31 133560 1756 122080 2% /mnt/mds32
/dev/loop32 133560 1760 122076 2% /mnt/mds33
/dev/loop33 133560 1764 122072 2% /mnt/mds34
/dev/loop34 133560 1768 122068 2% /mnt/mds35
/dev/loop35 133560 1772 122064 2% /mnt/mds36
/dev/loop36 133560 1776 122060 2% /mnt/mds37
/dev/loop37 133560 1780 122056 2% /mnt/mds38
/dev/loop38 133560 1780 122056 2% /mnt/mds39
/dev/loop39 133560 1792 122044 2% /mnt/mds40
/dev/loop40 133560 1796 122040 2% /mnt/mds41
/dev/loop41 133560 1800 122036 2% /mnt/mds42
/dev/loop42 133560 1804 122032 2% /mnt/mds43
/dev/loop43 133560 1808 122028 2% /mnt/mds44
/dev/loop44 133560 1812 122024 2% /mnt/mds45
/dev/loop45 133560 1816 122020 2% /mnt/mds46
/dev/loop46 133560 1820 122016 2% /mnt/mds47
/dev/loop47 133560 1828 122008 2% /mnt/mds48
/dev/loop48 133560 1828 122008 2% /mnt/mds49
/dev/loop49 133560 1832 122004 2% /mnt/mds50
/dev/loop50 133560 1836 122000 2% /mnt/mds51
/dev/loop51 133560 1844 121992 2% /mnt/mds52
/dev/loop52 133560 1844 121992 2% /mnt/mds53
/dev/loop53 133560 1848 121988 2% /mnt/mds54
/dev/loop54 133560 1852 121984 2% /mnt/mds55
/dev/loop55 133560 1860 121976 2% /mnt/mds56
/dev/loop56 133560 1864 121972 2% /mnt/mds57
/dev/loop57 133560 1864 121972 2% /mnt/mds58
/dev/loop58 133560 1868 121968 2% /mnt/mds59
/dev/loop59 133560 1876 121960 2% /mnt/mds60
/dev/loop60 133560 1880 121956 2% /mnt/mds61
/dev/loop61 133560 1884 121952 2% /mnt/mds62
/dev/loop62 133560 1884 121952 2% /mnt/mds63
/dev/loop63 133560 1888 121948 2% /mnt/mds64
/dev/loop64 133560 1896 121940 2% /mnt/mds65
/dev/loop65 133560 1900 121936 2% /mnt/mds66
/dev/loop66 133560 1900 121936 2% /mnt/mds67
/dev/loop67 133560 1904 121932 2% /mnt/mds68
/dev/loop68 171080 18728 141948 12% /mnt/ost1
/dev/loop69 171080 18728 141948 12% /mnt/ost2
c17@tcp:/lustre 342160 37456 283896 12% /mnt/lustre
[root@c17 tests]# lfs setdirstripe -c68 /mnt/lustre/test1
[root@c17 tests]#
[root@c17 tests]# lfs getdirstripe /mnt/lustre/test1
/mnt/lustre/test1
lmv_stripe_count: 68 lmv_stripe_offset: 0
mdtidx FID[seq:oid:ver]
0 [0x200000400:0x2:0x0]
1 [0x240000406:0x2:0x0]
2 [0x280000406:0x2:0x0]
3 [0x2c0000406:0x2:0x0]
4 [0x300000406:0x2:0x0]
5 [0x340000406:0x2:0x0]
6 [0x380000406:0x2:0x0]
7 [0x3c000040b:0x2:0x0]
8 [0x40000040b:0x2:0x0]
9 [0x44000040b:0x2:0x0]
10 [0x48000040b:0x2:0x0]
11 [0x4c000040b:0x2:0x0]
12 [0x500000412:0x2:0x0]
13 [0x540000412:0x2:0x0]
14 [0x580000412:0x2:0x0]
15 [0x5c0000412:0x2:0x0]
16 [0x600000412:0x2:0x0]
17 [0x640000412:0x2:0x0]
18 [0x68000040b:0x2:0x0]
19 [0x6c0000417:0x2:0x0]
20 [0x700000417:0x2:0x0]
21 [0x740000417:0x2:0x0]
22 [0x780000417:0x2:0x0]
23 [0x7c0000404:0x2:0x0]
24 [0x80000041b:0x2:0x0]
25 [0x84000041b:0x2:0x0]
26 [0x88000041b:0x2:0x0]
27 [0x8c000040c:0x2:0x0]
28 [0x900000403:0x2:0x0]
29 [0x940000421:0x2:0x0]
30 [0x980000421:0x2:0x0]
31 [0x9c0000421:0x2:0x0]
32 [0xa0000041a:0x2:0x0]
33 [0xa40000403:0x2:0x0]
34 [0xa80000427:0x2:0x0]
35 [0xac0000427:0x2:0x0]
36 [0xb00000427:0x2:0x0]
37 [0xb40000427:0x2:0x0]
38 [0xb80000416:0x2:0x0]
39 [0xbc0000402:0x2:0x0]
40 [0xc0000042e:0x2:0x0]
41 [0xc4000042e:0x2:0x0]
42 [0xc8000042f:0x2:0x0]
43 [0xcc000042f:0x2:0x0]
44 [0xd00000428:0x2:0x0]
45 [0xd4000041c:0x2:0x0]
46 [0xd80000411:0x2:0x0]
47 [0xdc0000402:0x2:0x0]
48 [0xe00000434:0x2:0x0]
49 [0xe40000434:0x2:0x0]
50 [0xe8000042e:0x2:0x0]
51 [0xec0000422:0x2:0x0]
52 [0xf00000415:0x2:0x0]
53 [0xf40000409:0x2:0x0]
54 [0xf80000439:0x2:0x0]
55 [0xfc0000439:0x2:0x0]
56 [0x1000000428:0x2:0x0]
57 [0x1040000419:0x2:0x0]
58 [0x1080000409:0x2:0x0]
59 [0x10c000043d:0x2:0x0]
60 [0x110000042d:0x2:0x0]
61 [0x114000041a:0x2:0x0]
62 [0x1180000403:0x2:0x0]
63 [0x11c0000441:0x2:0x0]
64 [0x1200000432:0x2:0x0]
65 [0x124000041d:0x2:0x0]
66 [0x1280000407:0x2:0x0]
67 [0x12c0000443:0x2:0x0]
[root@c17 tests]#
|
| Comment by Di Wang [ 29/May/15 ] |
|
I can now create a striped directory with a stripe count of 86. Robert: could you please try this build? Thanks.
[root@c17 tests]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 20642428 9385264 10208588 48% /
tmpfs 16426748 0 16426748 0% /dev/shm
192.168.0.1:/scratch 416433152 272370688 122909696 69% /scratch
192.168.0.1:/home 416433152 272370688 122909696 69% /home
/dev/loop0 133560 12716 111140 11% /mnt/mds1
/dev/loop1 133560 1636 122200 2% /mnt/mds2
/dev/loop2 133560 1640 122196 2% /mnt/mds3
/dev/loop3 133560 1644 122192 2% /mnt/mds4
/dev/loop4 133560 1648 122188 2% /mnt/mds5
/dev/loop5 133560 1656 122180 2% /mnt/mds6
/dev/loop6 133560 1656 122180 2% /mnt/mds7
/dev/loop7 133560 1660 122176 2% /mnt/mds8
/dev/loop8 133560 1664 122172 2% /mnt/mds9
/dev/loop9 133560 1672 122164 2% /mnt/mds10
/dev/loop10 133560 1672 122164 2% /mnt/mds11
/dev/loop11 133560 1676 122160 2% /mnt/mds12
/dev/loop12 133560 1684 122152 2% /mnt/mds13
/dev/loop13 133560 1692 122144 2% /mnt/mds14
/dev/loop14 133560 1696 122140 2% /mnt/mds15
/dev/loop15 133560 1696 122140 2% /mnt/mds16
/dev/loop16 133560 1700 122136 2% /mnt/mds17
/dev/loop17 133560 1704 122132 2% /mnt/mds18
/dev/loop18 133560 1712 122124 2% /mnt/mds19
/dev/loop19 133560 1716 122120 2% /mnt/mds20
/dev/loop20 133560 1716 122120 2% /mnt/mds21
/dev/loop21 133560 1720 122116 2% /mnt/mds22
/dev/loop22 133560 1728 122108 2% /mnt/mds23
/dev/loop23 133560 1732 122104 2% /mnt/mds24
/dev/loop24 133560 1732 122104 2% /mnt/mds25
/dev/loop25 133560 1736 122100 2% /mnt/mds26
/dev/loop26 133560 1744 122092 2% /mnt/mds27
/dev/loop27 133560 1748 122088 2% /mnt/mds28
/dev/loop28 133560 1752 122084 2% /mnt/mds29
/dev/loop29 133560 1752 122084 2% /mnt/mds30
/dev/loop30 133560 1760 122076 2% /mnt/mds31
/dev/loop31 133560 1764 122072 2% /mnt/mds32
/dev/loop32 133560 1768 122068 2% /mnt/mds33
/dev/loop33 133560 1772 122064 2% /mnt/mds34
/dev/loop34 133560 1776 122060 2% /mnt/mds35
/dev/loop35 133560 1780 122056 2% /mnt/mds36
/dev/loop36 133560 1784 122052 2% /mnt/mds37
/dev/loop37 133560 1788 122048 2% /mnt/mds38
/dev/loop38 133560 1788 122048 2% /mnt/mds39
/dev/loop39 133560 1800 122036 2% /mnt/mds40
/dev/loop40 133560 1804 122032 2% /mnt/mds41
/dev/loop41 133560 1808 122028 2% /mnt/mds42
/dev/loop42 133560 1812 122024 2% /mnt/mds43
/dev/loop43 133560 1816 122020 2% /mnt/mds44
/dev/loop44 133560 1820 122016 2% /mnt/mds45
/dev/loop45 133560 1824 122012 2% /mnt/mds46
/dev/loop46 133560 1828 122008 2% /mnt/mds47
/dev/loop47 133560 1836 122000 2% /mnt/mds48
/dev/loop48 133560 1836 122000 2% /mnt/mds49
/dev/loop49 133560 1840 121996 2% /mnt/mds50
/dev/loop50 133560 1844 121992 2% /mnt/mds51
/dev/loop51 133560 1852 121984 2% /mnt/mds52
/dev/loop52 133560 1852 121984 2% /mnt/mds53
/dev/loop53 133560 1856 121980 2% /mnt/mds54
/dev/loop54 133560 1860 121976 2% /mnt/mds55
/dev/loop55 133560 1868 121968 2% /mnt/mds56
/dev/loop56 133560 1872 121964 2% /mnt/mds57
/dev/loop57 133560 1872 121964 2% /mnt/mds58
/dev/loop58 133560 1876 121960 2% /mnt/mds59
/dev/loop59 133560 1884 121952 2% /mnt/mds60
/dev/loop60 133560 1888 121948 2% /mnt/mds61
/dev/loop61 133560 1892 121944 2% /mnt/mds62
/dev/loop62 133560 1892 121944 2% /mnt/mds63
/dev/loop63 133560 1896 121940 2% /mnt/mds64
/dev/loop64 133560 1904 121932 2% /mnt/mds65
/dev/loop65 133560 1908 121928 2% /mnt/mds66
/dev/loop66 133560 1908 121928 2% /mnt/mds67
/dev/loop67 133560 1912 121924 2% /mnt/mds68
/dev/loop68 133560 1920 121916 2% /mnt/mds69
/dev/loop69 133560 1924 121912 2% /mnt/mds70
/dev/loop70 133560 1928 121908 2% /mnt/mds71
/dev/loop71 133560 1928 121908 2% /mnt/mds72
/dev/loop72 133560 1936 121900 2% /mnt/mds73
/dev/loop73 133560 1940 121896 2% /mnt/mds74
/dev/loop74 133560 1944 121892 2% /mnt/mds75
/dev/loop75 133560 1948 121888 2% /mnt/mds76
/dev/loop76 133560 1952 121884 2% /mnt/mds77
/dev/loop77 133560 1956 121880 2% /mnt/mds78
/dev/loop78 133560 1960 121876 2% /mnt/mds79
/dev/loop79 133560 1964 121872 2% /mnt/mds80
/dev/loop80 133560 1968 121868 2% /mnt/mds81
/dev/loop81 133560 1972 121864 2% /mnt/mds82
/dev/loop82 133560 1976 121860 2% /mnt/mds83
/dev/loop83 133560 1980 121856 2% /mnt/mds84
/dev/loop84 133560 1988 121848 2% /mnt/mds85
/dev/loop85 133560 1988 121848 2% /mnt/mds86
/dev/loop86 171080 21232 139448 14% /mnt/ost1
/dev/loop87 171080 21232 139448 14% /mnt/ost2
c17@tcp:/lustre 342160 42464 278896 14% /mnt/lustre
[root@c17 tests]# echo 0 > /proc/sys/lnet/panic_on_lbug
[root@c17 tests]# lfs mkdir -c86 /mnt/lustre/test1
[root@c17 tests]# lfs getdirstripe /mnt/lustre/test1
/mnt/lustre/test1
lmv_stripe_count: 86 lmv_stripe_offset: 0
mdtidx FID[seq:oid:ver]
0 [0x200000400:0x2:0x0]
1 [0x240000406:0x2:0x0]
2 [0x280000406:0x2:0x0]
3 [0x2c0000406:0x2:0x0]
4 [0x300000406:0x2:0x0]
5 [0x340000406:0x2:0x0]
6 [0x380000409:0x2:0x0]
7 [0x3c000040b:0x2:0x0]
8 [0x40000040b:0x2:0x0]
9 [0x44000040b:0x2:0x0]
10 [0x48000040b:0x2:0x0]
11 [0x4c000040b:0x2:0x0]
12 [0x500000408:0x2:0x0]
13 [0x540000413:0x2:0x0]
14 [0x580000413:0x2:0x0]
15 [0x5c0000413:0x2:0x0]
16 [0x600000413:0x2:0x0]
17 [0x640000413:0x2:0x0]
18 [0x680000413:0x2:0x0]
19 [0x6c0000402:0x2:0x0]
20 [0x70000041a:0x2:0x0]
21 [0x74000041a:0x2:0x0]
22 [0x78000041a:0x2:0x0]
23 [0x7c000041a:0x2:0x0]
24 [0x80000041a:0x2:0x0]
25 [0x84000041a:0x2:0x0]
26 [0x88000040d:0x2:0x0]
27 [0x8c0000404:0x2:0x0]
28 [0x90000041f:0x2:0x0]
29 [0x94000041f:0x2:0x0]
30 [0x98000041f:0x2:0x0]
31 [0x9c0000408:0x2:0x0]
32 [0xa00000426:0x2:0x0]
33 [0xa40000426:0x2:0x0]
34 [0xa80000426:0x2:0x0]
35 [0xac0000426:0x2:0x0]
36 [0xb00000426:0x2:0x0]
37 [0xb4000041b:0x2:0x0]
38 [0xb8000040f:0x2:0x0]
39 [0xbc0000405:0x2:0x0]
40 [0xc0000042d:0x2:0x0]
41 [0xc4000042d:0x2:0x0]
42 [0xc8000042d:0x2:0x0]
43 [0xcc0000425:0x2:0x0]
44 [0xd00000418:0x2:0x0]
45 [0xd4000040b:0x2:0x0]
46 [0xd80000403:0x2:0x0]
47 [0xdc0000432:0x2:0x0]
48 [0xe0000042c:0x2:0x0]
49 [0xe4000041f:0x2:0x0]
50 [0xe80000410:0x2:0x0]
51 [0xec0000403:0x2:0x0]
52 [0xf00000437:0x2:0x0]
53 [0xf40000431:0x2:0x0]
54 [0xf80000422:0x2:0x0]
55 [0xfc0000413:0x2:0x0]
56 [0x1000000403:0x2:0x0]
57 [0x104000043c:0x2:0x0]
58 [0x1080000433:0x2:0x0]
59 [0x10c0000422:0x2:0x0]
60 [0x110000040f:0x2:0x0]
61 [0x114000043f:0x2:0x0]
62 [0x1180000438:0x2:0x0]
63 [0x11c0000425:0x2:0x0]
64 [0x1200000404:0x2:0x0]
65 [0x1240000443:0x2:0x0]
66 [0x1280000430:0x2:0x0]
67 [0x12c0000404:0x2:0x0]
68 [0x1300000446:0x2:0x0]
69 [0x1340000439:0x2:0x0]
70 [0x138000041e:0x2:0x0]
71 [0x13c0000401:0x2:0x0]
72 [0x140000043e:0x2:0x0]
73 [0x1440000420:0x2:0x0]
74 [0x148000040f:0x2:0x0]
75 [0x14c0000439:0x2:0x0]
76 [0x150000040d:0x2:0x0]
77 [0x154000044e:0x2:0x0]
78 [0x158000042d:0x2:0x0]
79 [0x15c0000402:0x2:0x0]
80 [0x1600000440:0x2:0x0]
81 [0x1640000412:0x2:0x0]
82 [0x1680000453:0x2:0x0]
83 [0x16c000042b:0x2:0x0]
84 [0x170000040a:0x2:0x0]
85 [0x1740000438:0x2:0x0]
|
| Comment by Robert Read (Inactive) [ 29/May/15 ] |
|
OK, I'll try this build next. |
| Comment by Andreas Dilger [ 29/May/15 ] |
|
Di, do you have any thoughts on what the maximum stripe count will now be? Presumably there is no longer a limit because of the size of a single replay record, but there might still be another limit from the size of a single OUT record that lists e.g. the FIDs of all stripes, or is each stripe separate? |
| Comment by Andreas Dilger [ 29/May/15 ] |
|
Di, I was going to look at your patch for the OUT logging change, but I only see the combined change 13942. Is there a separate patch that contains only the changes for the redo logs? |
| Comment by Di Wang [ 29/May/15 ] |
|
Andreas: Sorry, there is no independent patch yet. I was waiting for Oleg to land some of my patches (he is testing them right now; hopefully they can land tomorrow), and then I will push these changes independently, which actually comprise 3 patches. |
| Comment by Di Wang [ 29/May/15 ] |
|
Andreas: the maximum stripe count is now limited by the 1MB RPC size, which includes the updates to create the single stripe (about 4k, at most 8k, plus some RPC overhead) and the update logs (1016k at most; let's say 1000k to be safe). As we calculated before, each stripe costs about 390 bytes, and the extra update records are about 2k, so (1000k - 2k)/390 = 2620 stripes. There might be some other overhead; for example, if there are ACLs or a default LOV EA, each stripe creation might include more updates. Let's say 3 more xattr set updates (32 * 3 = 96 bytes); then creating a single stripe costs 390 + 96 = 486 bytes (note: in setxattr, the EA itself goes in the common params area because it is the same for all stripes, so it is not repeated). The total is then (1000k - 2k)/486 = 2102 stripes. So I would expect the maximum to be about 2000 stripes in theory. It would be interesting to see whether we can create 512 or 1024 stripes and whether that works. |
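The arithmetic above can be reproduced with a short C sketch. It assumes "k" means 1024 bytes (which gives exactly 2620 and 2102), and takes the 390-byte per-stripe cost, the 2k of extra update records, and the 32-byte cost per extra setxattr update from the comment; max_stripes() and per_stripe_cost() are hypothetical helper names, not Lustre functions.

```c
/* "k" in the estimate is taken as 1024 bytes (an assumption that
 * reproduces the quoted results exactly). */
#define KIB 1024ul

/* Whole stripes that fit in the update-log budget after subtracting the
 * fixed extra update records; integer division rounds down because a
 * partially fitting stripe cannot be created. */
static unsigned long max_stripes(unsigned long log_budget,
				 unsigned long extra,
				 unsigned long per_stripe)
{
	if (log_budget <= extra || per_stripe == 0)
		return 0;
	return (log_budget - extra) / per_stripe;
}

/* Per-stripe cost with n_xattr extra setxattr updates of about 32 bytes
 * each. The xattr value itself lives in the shared params area, so it is
 * not counted per stripe. */
static unsigned long per_stripe_cost(unsigned long base, unsigned int n_xattr)
{
	return base + 32ul * n_xattr;
}
```

For example, max_stripes(1000 * KIB, 2 * KIB, 390) yields 2620, and max_stripes(1000 * KIB, 2 * KIB, per_stripe_cost(390, 3)) yields 2102, matching the figures above.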
| Comment by James A Simmons [ 29/May/15 ] |
|
Actually, you can increase the RPC size to 4MB, so in theory it could be 8000 stripes. It is quite possible that we will have 16MB RPCs a few years from now. |
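As a rough check of the 4MB figure, Di's estimate can be reapplied under the assumption (not stated in the thread) that the update-log budget grows to about 4000k while the 2k of extra records and the 486-byte per-stripe cost stay unchanged, with k = 1024 bytes:

```latex
N_{4\,\mathrm{MB}} \approx \left\lfloor \frac{4000 \cdot 1024 - 2 \cdot 1024}{486} \right\rfloor
  = \left\lfloor \frac{4\,093\,952}{486} \right\rfloor = 8423
```

That is roughly four times the 1MB result of 2102, consistent with the estimate of about 8000 stripes.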
| Comment by Robert Read (Inactive) [ 03/Jun/15 ] |
|
With build 32529, I was able to create striped directories with up to 84 stripes, but at 96 stripes I hit another assertion on an MDS:
Jun 3 18:22:19 mdt03 kernel: LustreError: 1745:0:(llog_osd.c:147:llog_osd_pad()) ASSERTION( len >= (24) && (len & 0x7) == 0 ) failed:
Jun 3 18:22:19 mdt03 kernel: LustreError: 1745:0:(llog_osd.c:147:llog_osd_pad()) LBUG
Jun 3 18:22:19 mdt03 kernel: Pid: 1745, comm: mdt01_002
Jun 3 18:22:19 mdt03 kernel:
Jun 3 18:22:19 mdt03 kernel: Call Trace:
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa0145875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa0145e77>] lbug_with_loc+0x47/0xb0 [libcfs]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa023d48e>] llog_osd_write_rec+0x167e/0x17a0 [obdclass]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa022d220>] llog_write_rec+0xb0/0x270 [obdclass]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa0235a6f>] llog_cat_add_rec+0x9f/0x460 [obdclass]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa022d039>] llog_add+0x89/0x1c0 [obdclass]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa051b17c>] sub_updates_write+0x9dc/0x1380 [ptlrpc]
Jun 3 18:22:19 mdt03 kernel: [<ffffffff8115cdbc>] ? __vunmap+0x9c/0x120
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa051c26c>] top_trans_stop+0x74c/0xb30 [ptlrpc]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa0c5b0ff>] ? lod_attr_set+0x12f/0xab0 [lod]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa029ce40>] ? lu_ucred+0x20/0x30 [obdclass]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa0c449bf>] lod_trans_stop+0x2bf/0x330 [lod]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa0ce867d>] mdd_trans_stop+0x1d/0xb0 [mdd]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa0cd7a8c>] mdd_create+0x13ac/0x1760 [mdd]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa0b9080c>] ? mdt_version_save+0x8c/0x1a0 [mdt]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa0b94bec>] mdt_reint_create+0xbcc/0xce0 [mdt]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa029ce40>] ? lu_ucred+0x20/0x30 [obdclass]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa0b70dd5>] ? mdt_ucred+0x15/0x20 [mdt]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa0b8baec>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa04cbd42>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa0b8fb9d>] mdt_reint_rec+0x5d/0x200 [mdt]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa0b74fab>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa0b7571b>] mdt_reint+0x6b/0x120 [mdt]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa05074be>] tgt_request_handle+0x95e/0x10b0 [ptlrpc]
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa04b6c31>] ptlrpc_main+0xe41/0x1970 [ptlrpc]
Jun 3 18:22:19 mdt03 kernel: [<ffffffff81014959>] ? sched_clock+0x9/0x10
Jun 3 18:22:19 mdt03 kernel: [<ffffffff81060c3f>] ? finish_task_switch+0x4f/0xf0
Jun 3 18:22:19 mdt03 kernel: [<ffffffffa04b5df0>] ? ptlrpc_main+0x0/0x1970 [ptlrpc]
Jun 3 18:22:19 mdt03 kernel: [<ffffffff8109e71e>] kthread+0x9e/0xc0
Jun 3 18:22:19 mdt03 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Jun 3 18:22:19 mdt03 kernel: [<ffffffff8100b294>] ? int_ret_from_sys_call+0x7/0x1b
Jun 3 18:22:19 mdt03 kernel: [<ffffffff8100ba1d>] ? retint_restore_args+0x5/0x6
Jun 3 18:22:19 mdt03 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jun 3 18:22:19 mdt03 kernel:
|
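The assertion in llog_osd_pad() constrains the length of a padding record: it must be at least 24 bytes and a multiple of 8. The 24 appears to be the minimum llog record size, i.e. a 16-byte record header plus an 8-byte record tail. The minimal sketch below spells out that check under those assumptions; the struct definitions are modeled on that layout and `pad_len_valid()` is a hypothetical helper, not Lustre code. It only shows which pad lengths the check accepts, not why build 32529 computed an invalid one at 96 stripes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: a 16-byte record header and an 8-byte record tail,
 * modeled on llog_rec_hdr / llog_rec_tail (names here are illustrative). */
struct rec_hdr  { uint32_t len, index, type, id; };
struct rec_tail { uint32_t len, index; };

/* Smallest record that can hold both a header and a tail. */
#define MIN_REC_SIZE (sizeof(struct rec_hdr) + sizeof(struct rec_tail))
static_assert(MIN_REC_SIZE == 24, "header + tail should be 24 bytes");

/* Hypothetical helper: true iff `len` satisfies the condition
 * ASSERTION( len >= (24) && (len & 0x7) == 0 ) in llog_osd_pad(). */
static inline bool pad_len_valid(uint32_t len)
{
    return len >= MIN_REC_SIZE && (len & 0x7) == 0;
}
```

For example, `pad_len_valid(24)` and `pad_len_valid(4096)` hold, while 16 is rejected as too short to carry a header and tail, and 28 is rejected because it is not 8-byte aligned.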
| Comment by Di Wang [ 03/Jun/15 ] |
|
Ah, I probably already fixed this problem last night. I just updated the patch to let DNE create multiple stripes on a single MDT, as Andreas suggested. Right now, I am able to create a striped directory with 512 stripes in my local environment. I will push the patch soon. Thanks for testing. |
| Comment by Robert Read (Inactive) [ 03/Jun/15 ] |
|
I had 128 MDTs though, so it didn't need to create more than one stripe on a single MDT. |
| Comment by Di Wang [ 03/Jun/15 ] |
|
Oh, yes, creating multiple stripes on one MDT is only for testing purposes, IMHO, i.e. verifying large stripe counts in a local environment. |
| Comment by Robert Read (Inactive) [ 03/Jun/15 ] |
|
Got it. |
| Comment by Di Wang [ 04/Jun/15 ] |
|
I just updated the patch http://review.whamcloud.com/#/c/13942. Output of sanity test 300k:
== sanity test 300k: test large striped directory == 01:30:04 (1433320204)
fail_loc=0x1703
fail_loc=0
/mnt/lustre/d300k.sanity/striped_dir
lmv_stripe_count: 512 lmv_stripe_offset: 0
mdtidx FID[seq:oid:ver]
0 [0x200000400:0x104:0x0]
1 [0x240000403:0x104:0x0]
2 [0x280000403:0x104:0x0]
3 [0x2c0000403:0x104:0x0]
0 [0x200000400:0x105:0x0]
1 [0x240000403:0x105:0x0]
2 [0x280000403:0x105:0x0]
3 [0x2c0000403:0x105:0x0]
0 [0x200000400:0x106:0x0]
1 [0x240000403:0x106:0x0]
2 [0x280000403:0x106:0x0]
3 [0x2c0000403:0x106:0x0]
0 [0x200000400:0x107:0x0]
1 [0x240000403:0x107:0x0]
2 [0x280000403:0x107:0x0]
3 [0x2c0000403:0x107:0x0]
0 [0x200000400:0x108:0x0]
1 [0x240000403:0x108:0x0]
2 [0x280000403:0x108:0x0]
3 [0x2c0000403:0x108:0x0]
0 [0x200000400:0x109:0x0]
1 [0x240000403:0x109:0x0]
2 [0x280000403:0x109:0x0]
3 [0x2c0000403:0x109:0x0]
0 [0x200000400:0x10a:0x0]
1 [0x240000403:0x10a:0x0]
2 [0x280000403:0x10a:0x0]
3 [0x2c0000403:0x10a:0x0]
0 [0x200000400:0x10b:0x0]
1 [0x240000403:0x10b:0x0]
2 [0x280000403:0x10b:0x0]
3 [0x2c0000403:0x10b:0x0]
0 [0x200000400:0x10c:0x0]
1 [0x240000403:0x10c:0x0]
2 [0x280000403:0x10c:0x0]
3 [0x2c0000403:0x10c:0x0]
0 [0x200000400:0x10d:0x0]
1 [0x240000403:0x10d:0x0]
2 [0x280000403:0x10d:0x0]
3 [0x2c0000403:0x10d:0x0]
0 [0x200000400:0x10e:0x0]
1 [0x240000403:0x10e:0x0]
2 [0x280000403:0x10e:0x0]
3 [0x2c0000403:0x10e:0x0]
0 [0x200000400:0x10f:0x0]
1 [0x240000403:0x10f:0x0]
2 [0x280000403:0x10f:0x0]
3 [0x2c0000403:0x10f:0x0]
0 [0x200000400:0x110:0x0]
1 [0x240000403:0x110:0x0]
2 [0x280000403:0x110:0x0]
3 [0x2c0000403:0x110:0x0]
0 [0x200000400:0x111:0x0]
1 [0x240000403:0x111:0x0]
2 [0x280000403:0x111:0x0]
3 [0x2c0000403:0x111:0x0]
0 [0x200000400:0x112:0x0]
1 [0x240000403:0x112:0x0]
2 [0x280000403:0x112:0x0]
3 [0x2c0000403:0x112:0x0]
0 [0x200000400:0x113:0x0]
1 [0x240000403:0x113:0x0]
2 [0x280000403:0x113:0x0]
3 [0x2c0000403:0x113:0x0]
0 [0x200000400:0x114:0x0]
1 [0x240000403:0x114:0x0]
2 [0x280000403:0x114:0x0]
3 [0x2c0000403:0x114:0x0]
0 [0x200000400:0x115:0x0]
1 [0x240000403:0x115:0x0]
2 [0x280000403:0x115:0x0]
3 [0x2c0000403:0x115:0x0]
0 [0x200000400:0x116:0x0]
1 [0x240000403:0x116:0x0]
2 [0x280000403:0x116:0x0]
3 [0x2c0000403:0x116:0x0]
0 [0x200000400:0x117:0x0]
1 [0x240000403:0x117:0x0]
2 [0x280000403:0x117:0x0]
3 [0x2c0000403:0x117:0x0]
0 [0x200000400:0x118:0x0]
1 [0x240000403:0x118:0x0]
2 [0x280000403:0x118:0x0]
3 [0x2c0000403:0x118:0x0]
0 [0x200000400:0x119:0x0]
1 [0x240000403:0x119:0x0]
2 [0x280000403:0x119:0x0]
3 [0x2c0000403:0x119:0x0]
0 [0x200000400:0x11a:0x0]
1 [0x240000403:0x11a:0x0]
2 [0x280000403:0x11a:0x0]
3 [0x2c0000403:0x11a:0x0]
0 [0x200000400:0x11b:0x0]
1 [0x240000403:0x11b:0x0]
2 [0x280000403:0x11b:0x0]
3 [0x2c0000403:0x11b:0x0]
0 [0x200000400:0x11c:0x0]
1 [0x240000403:0x11c:0x0]
2 [0x280000403:0x11c:0x0]
3 [0x2c0000403:0x11c:0x0]
0 [0x200000400:0x11d:0x0]
1 [0x240000403:0x11d:0x0]
2 [0x280000403:0x11d:0x0]
3 [0x2c0000403:0x11d:0x0]
0 [0x200000400:0x11e:0x0]
1 [0x240000403:0x11e:0x0]
2 [0x280000403:0x11e:0x0]
3 [0x2c0000403:0x11e:0x0]
0 [0x200000400:0x11f:0x0]
1 [0x240000403:0x11f:0x0]
2 [0x280000403:0x11f:0x0]
3 [0x2c0000403:0x11f:0x0]
0 [0x200000400:0x120:0x0]
1 [0x240000403:0x120:0x0]
2 [0x280000403:0x120:0x0]
3 [0x2c0000403:0x120:0x0]
0 [0x200000400:0x121:0x0]
1 [0x240000403:0x121:0x0]
2 [0x280000403:0x121:0x0]
3 [0x2c0000403:0x121:0x0]
0 [0x200000400:0x122:0x0]
1 [0x240000403:0x122:0x0]
2 [0x280000403:0x122:0x0]
3 [0x2c0000403:0x122:0x0]
0 [0x200000400:0x123:0x0]
1 [0x240000403:0x123:0x0]
2 [0x280000403:0x123:0x0]
3 [0x2c0000403:0x123:0x0]
0 [0x200000400:0x124:0x0]
1 [0x240000403:0x124:0x0]
2 [0x280000403:0x124:0x0]
3 [0x2c0000403:0x124:0x0]
0 [0x200000400:0x125:0x0]
1 [0x240000403:0x125:0x0]
2 [0x280000403:0x125:0x0]
3 [0x2c0000403:0x125:0x0]
0 [0x200000400:0x126:0x0]
1 [0x240000403:0x126:0x0]
2 [0x280000403:0x126:0x0]
3 [0x2c0000403:0x126:0x0]
0 [0x200000400:0x127:0x0]
1 [0x240000403:0x127:0x0]
2 [0x280000403:0x127:0x0]
3 [0x2c0000403:0x127:0x0]
0 [0x200000400:0x128:0x0]
1 [0x240000403:0x128:0x0]
2 [0x280000403:0x128:0x0]
3 [0x2c0000403:0x128:0x0]
0 [0x200000400:0x129:0x0]
1 [0x240000403:0x129:0x0]
2 [0x280000403:0x129:0x0]
3 [0x2c0000403:0x129:0x0]
0 [0x200000400:0x12a:0x0]
1 [0x240000403:0x12a:0x0]
2 [0x280000403:0x12a:0x0]
3 [0x2c0000403:0x12a:0x0]
0 [0x200000400:0x12b:0x0]
1 [0x240000403:0x12b:0x0]
2 [0x280000403:0x12b:0x0]
3 [0x2c0000403:0x12b:0x0]
0 [0x200000400:0x12c:0x0]
1 [0x240000403:0x12c:0x0]
2 [0x280000403:0x12c:0x0]
3 [0x2c0000403:0x12c:0x0]
0 [0x200000400:0x12d:0x0]
1 [0x240000403:0x12d:0x0]
2 [0x280000403:0x12d:0x0]
3 [0x2c0000403:0x12d:0x0]
0 [0x200000400:0x12e:0x0]
1 [0x240000403:0x12e:0x0]
2 [0x280000403:0x12e:0x0]
3 [0x2c0000403:0x12e:0x0]
0 [0x200000400:0x12f:0x0]
1 [0x240000403:0x12f:0x0]
2 [0x280000403:0x12f:0x0]
3 [0x2c0000403:0x12f:0x0]
0 [0x200000400:0x130:0x0]
1 [0x240000403:0x130:0x0]
2 [0x280000403:0x130:0x0]
3 [0x2c0000403:0x130:0x0]
0 [0x200000400:0x131:0x0]
1 [0x240000403:0x131:0x0]
2 [0x280000403:0x131:0x0]
3 [0x2c0000403:0x131:0x0]
0 [0x200000400:0x132:0x0]
1 [0x240000403:0x132:0x0]
2 [0x280000403:0x132:0x0]
3 [0x2c0000403:0x132:0x0]
0 [0x200000400:0x133:0x0]
1 [0x240000403:0x133:0x0]
2 [0x280000403:0x133:0x0]
3 [0x2c0000403:0x133:0x0]
0 [0x200000400:0x134:0x0]
1 [0x240000403:0x134:0x0]
2 [0x280000403:0x134:0x0]
3 [0x2c0000403:0x134:0x0]
0 [0x200000400:0x135:0x0]
1 [0x240000403:0x135:0x0]
2 [0x280000403:0x135:0x0]
3 [0x2c0000403:0x135:0x0]
0 [0x200000400:0x136:0x0]
1 [0x240000403:0x136:0x0]
2 [0x280000403:0x136:0x0]
3 [0x2c0000403:0x136:0x0]
0 [0x200000400:0x137:0x0]
1 [0x240000403:0x137:0x0]
2 [0x280000403:0x137:0x0]
3 [0x2c0000403:0x137:0x0]
0 [0x200000400:0x138:0x0]
1 [0x240000403:0x138:0x0]
2 [0x280000403:0x138:0x0]
3 [0x2c0000403:0x138:0x0]
0 [0x200000400:0x139:0x0]
1 [0x240000403:0x139:0x0]
2 [0x280000403:0x139:0x0]
3 [0x2c0000403:0x139:0x0]
0 [0x200000400:0x13a:0x0]
1 [0x240000403:0x13a:0x0]
2 [0x280000403:0x13a:0x0]
3 [0x2c0000403:0x13a:0x0]
0 [0x200000400:0x13b:0x0]
1 [0x240000403:0x13b:0x0]
2 [0x280000403:0x13b:0x0]
3 [0x2c0000403:0x13b:0x0]
0 [0x200000400:0x13c:0x0]
1 [0x240000403:0x13c:0x0]
2 [0x280000403:0x13c:0x0]
3 [0x2c0000403:0x13c:0x0]
0 [0x200000400:0x13d:0x0]
1 [0x240000403:0x13d:0x0]
2 [0x280000403:0x13d:0x0]
3 [0x2c0000403:0x13d:0x0]
0 [0x200000400:0x13e:0x0]
1 [0x240000403:0x13e:0x0]
2 [0x280000403:0x13e:0x0]
3 [0x2c0000403:0x13e:0x0]
0 [0x200000400:0x13f:0x0]
1 [0x240000403:0x13f:0x0]
2 [0x280000403:0x13f:0x0]
3 [0x2c0000403:0x13f:0x0]
0 [0x200000400:0x140:0x0]
1 [0x240000403:0x140:0x0]
2 [0x280000403:0x140:0x0]
3 [0x2c0000403:0x140:0x0]
0 [0x200000400:0x141:0x0]
1 [0x240000403:0x141:0x0]
2 [0x280000403:0x141:0x0]
3 [0x2c0000403:0x141:0x0]
0 [0x200000400:0x142:0x0]
1 [0x240000403:0x142:0x0]
2 [0x280000403:0x142:0x0]
3 [0x2c0000403:0x142:0x0]
0 [0x200000400:0x143:0x0]
1 [0x240000403:0x143:0x0]
2 [0x280000403:0x143:0x0]
3 [0x2c0000403:0x143:0x0]
0 [0x200000400:0x144:0x0]
1 [0x240000403:0x144:0x0]
2 [0x280000403:0x144:0x0]
3 [0x2c0000403:0x144:0x0]
0 [0x200000400:0x145:0x0]
1 [0x240000403:0x145:0x0]
2 [0x280000403:0x145:0x0]
3 [0x2c0000403:0x145:0x0]
0 [0x200000400:0x146:0x0]
1 [0x240000403:0x146:0x0]
2 [0x280000403:0x146:0x0]
3 [0x2c0000403:0x146:0x0]
0 [0x200000400:0x147:0x0]
1 [0x240000403:0x147:0x0]
2 [0x280000403:0x147:0x0]
3 [0x2c0000403:0x147:0x0]
0 [0x200000400:0x148:0x0]
1 [0x240000403:0x148:0x0]
2 [0x280000403:0x148:0x0]
3 [0x2c0000403:0x148:0x0]
0 [0x200000400:0x149:0x0]
1 [0x240000403:0x149:0x0]
2 [0x280000403:0x149:0x0]
3 [0x2c0000403:0x149:0x0]
0 [0x200000400:0x14a:0x0]
1 [0x240000403:0x14a:0x0]
2 [0x280000403:0x14a:0x0]
3 [0x2c0000403:0x14a:0x0]
0 [0x200000400:0x14b:0x0]
1 [0x240000403:0x14b:0x0]
2 [0x280000403:0x14b:0x0]
3 [0x2c0000403:0x14b:0x0]
0 [0x200000400:0x14c:0x0]
1 [0x240000403:0x14c:0x0]
2 [0x280000403:0x14c:0x0]
3 [0x2c0000403:0x14c:0x0]
0 [0x200000400:0x14d:0x0]
1 [0x240000403:0x14d:0x0]
2 [0x280000403:0x14d:0x0]
3 [0x2c0000403:0x14d:0x0]
0 [0x200000400:0x14e:0x0]
1 [0x240000403:0x14e:0x0]
2 [0x280000403:0x14e:0x0]
3 [0x2c0000403:0x14e:0x0]
0 [0x200000400:0x14f:0x0]
1 [0x240000403:0x14f:0x0]
2 [0x280000403:0x14f:0x0]
3 [0x2c0000403:0x14f:0x0]
0 [0x200000400:0x150:0x0]
1 [0x240000403:0x150:0x0]
2 [0x280000403:0x150:0x0]
3 [0x2c0000403:0x150:0x0]
0 [0x200000400:0x151:0x0]
1 [0x240000403:0x151:0x0]
2 [0x280000403:0x151:0x0]
3 [0x2c0000403:0x151:0x0]
0 [0x200000400:0x152:0x0]
1 [0x240000403:0x152:0x0]
2 [0x280000403:0x152:0x0]
3 [0x2c0000403:0x152:0x0]
0 [0x200000400:0x153:0x0]
1 [0x240000403:0x153:0x0]
2 [0x280000403:0x153:0x0]
3 [0x2c0000403:0x153:0x0]
0 [0x200000400:0x154:0x0]
1 [0x240000403:0x154:0x0]
2 [0x280000403:0x154:0x0]
3 [0x2c0000403:0x154:0x0]
0 [0x200000400:0x155:0x0]
1 [0x240000403:0x155:0x0]
2 [0x280000403:0x155:0x0]
3 [0x2c0000403:0x155:0x0]
0 [0x200000400:0x156:0x0]
1 [0x240000403:0x156:0x0]
2 [0x280000403:0x156:0x0]
3 [0x2c0000403:0x156:0x0]
0 [0x200000400:0x157:0x0]
1 [0x240000403:0x157:0x0]
2 [0x280000403:0x157:0x0]
3 [0x2c0000403:0x157:0x0]
0 [0x200000400:0x158:0x0]
1 [0x240000403:0x158:0x0]
2 [0x280000403:0x158:0x0]
3 [0x2c0000403:0x158:0x0]
0 [0x200000400:0x159:0x0]
1 [0x240000403:0x159:0x0]
2 [0x280000403:0x159:0x0]
3 [0x2c0000403:0x159:0x0]
0 [0x200000400:0x15a:0x0]
1 [0x240000403:0x15a:0x0]
2 [0x280000403:0x15a:0x0]
3 [0x2c0000403:0x15a:0x0]
0 [0x200000400:0x15b:0x0]
1 [0x240000403:0x15b:0x0]
2 [0x280000403:0x15b:0x0]
3 [0x2c0000403:0x15b:0x0]
0 [0x200000400:0x15c:0x0]
1 [0x240000403:0x15c:0x0]
2 [0x280000403:0x15c:0x0]
3 [0x2c0000403:0x15c:0x0]
0 [0x200000400:0x15d:0x0]
1 [0x240000403:0x15d:0x0]
2 [0x280000403:0x15d:0x0]
3 [0x2c0000403:0x15d:0x0]
0 [0x200000400:0x15e:0x0]
1 [0x240000403:0x15e:0x0]
2 [0x280000403:0x15e:0x0]
3 [0x2c0000403:0x15e:0x0]
0 [0x200000400:0x15f:0x0]
1 [0x240000403:0x15f:0x0]
2 [0x280000403:0x15f:0x0]
3 [0x2c0000403:0x15f:0x0]
0 [0x200000400:0x160:0x0]
1 [0x240000403:0x160:0x0]
2 [0x280000403:0x160:0x0]
3 [0x2c0000403:0x160:0x0]
0 [0x200000400:0x161:0x0]
1 [0x240000403:0x161:0x0]
2 [0x280000403:0x161:0x0]
3 [0x2c0000403:0x161:0x0]
0 [0x200000400:0x162:0x0]
1 [0x240000403:0x162:0x0]
2 [0x280000403:0x162:0x0]
3 [0x2c0000403:0x162:0x0]
0 [0x200000400:0x163:0x0]
1 [0x240000403:0x163:0x0]
2 [0x280000403:0x163:0x0]
3 [0x2c0000403:0x163:0x0]
0 [0x200000400:0x164:0x0]
1 [0x240000403:0x164:0x0]
2 [0x280000403:0x164:0x0]
3 [0x2c0000403:0x164:0x0]
0 [0x200000400:0x165:0x0]
1 [0x240000403:0x165:0x0]
2 [0x280000403:0x165:0x0]
3 [0x2c0000403:0x165:0x0]
0 [0x200000400:0x166:0x0]
1 [0x240000403:0x166:0x0]
2 [0x280000403:0x166:0x0]
3 [0x2c0000403:0x166:0x0]
0 [0x200000400:0x167:0x0]
1 [0x240000403:0x167:0x0]
2 [0x280000403:0x167:0x0]
3 [0x2c0000403:0x167:0x0]
0 [0x200000400:0x168:0x0]
1 [0x240000403:0x168:0x0]
2 [0x280000403:0x168:0x0]
3 [0x2c0000403:0x168:0x0]
0 [0x200000400:0x169:0x0]
1 [0x240000403:0x169:0x0]
2 [0x280000403:0x169:0x0]
3 [0x2c0000403:0x169:0x0]
0 [0x200000400:0x16a:0x0]
1 [0x240000403:0x16a:0x0]
2 [0x280000403:0x16a:0x0]
3 [0x2c0000403:0x16a:0x0]
0 [0x200000400:0x16b:0x0]
1 [0x240000403:0x16b:0x0]
2 [0x280000403:0x16b:0x0]
3 [0x2c0000403:0x16b:0x0]
0 [0x200000400:0x16c:0x0]
1 [0x240000403:0x16c:0x0]
2 [0x280000403:0x16c:0x0]
3 [0x2c0000403:0x16c:0x0]
0 [0x200000400:0x16d:0x0]
1 [0x240000403:0x16d:0x0]
2 [0x280000403:0x16d:0x0]
3 [0x2c0000403:0x16d:0x0]
0 [0x200000400:0x16e:0x0]
1 [0x240000403:0x16e:0x0]
2 [0x280000403:0x16e:0x0]
3 [0x2c0000403:0x16e:0x0]
0 [0x200000400:0x16f:0x0]
1 [0x240000403:0x16f:0x0]
2 [0x280000403:0x16f:0x0]
3 [0x2c0000403:0x16f:0x0]
0 [0x200000400:0x170:0x0]
1 [0x240000403:0x170:0x0]
2 [0x280000403:0x170:0x0]
3 [0x2c0000403:0x170:0x0]
0 [0x200000400:0x171:0x0]
1 [0x240000403:0x171:0x0]
2 [0x280000403:0x171:0x0]
3 [0x2c0000403:0x171:0x0]
0 [0x200000400:0x172:0x0]
1 [0x240000403:0x172:0x0]
2 [0x280000403:0x172:0x0]
3 [0x2c0000403:0x172:0x0]
0 [0x200000400:0x173:0x0]
1 [0x240000403:0x173:0x0]
2 [0x280000403:0x173:0x0]
3 [0x2c0000403:0x173:0x0]
0 [0x200000400:0x174:0x0]
1 [0x240000403:0x174:0x0]
2 [0x280000403:0x174:0x0]
3 [0x2c0000403:0x174:0x0]
0 [0x200000400:0x175:0x0]
1 [0x240000403:0x175:0x0]
2 [0x280000403:0x175:0x0]
3 [0x2c0000403:0x175:0x0]
0 [0x200000400:0x176:0x0]
1 [0x240000403:0x176:0x0]
2 [0x280000403:0x176:0x0]
3 [0x2c0000403:0x176:0x0]
0 [0x200000400:0x177:0x0]
1 [0x240000403:0x177:0x0]
2 [0x280000403:0x177:0x0]
3 [0x2c0000403:0x177:0x0]
0 [0x200000400:0x178:0x0]
1 [0x240000403:0x178:0x0]
2 [0x280000403:0x178:0x0]
3 [0x2c0000403:0x178:0x0]
0 [0x200000400:0x179:0x0]
1 [0x240000403:0x179:0x0]
2 [0x280000403:0x179:0x0]
3 [0x2c0000403:0x179:0x0]
0 [0x200000400:0x17a:0x0]
1 [0x240000403:0x17a:0x0]
2 [0x280000403:0x17a:0x0]
3 [0x2c0000403:0x17a:0x0]
0 [0x200000400:0x17b:0x0]
1 [0x240000403:0x17b:0x0]
2 [0x280000403:0x17b:0x0]
3 [0x2c0000403:0x17b:0x0]
0 [0x200000400:0x17c:0x0]
1 [0x240000403:0x17c:0x0]
2 [0x280000403:0x17c:0x0]
3 [0x2c0000403:0x17c:0x0]
0 [0x200000400:0x17d:0x0]
1 [0x240000403:0x17d:0x0]
2 [0x280000403:0x17d:0x0]
3 [0x2c0000403:0x17d:0x0]
0 [0x200000400:0x17e:0x0]
1 [0x240000403:0x17e:0x0]
2 [0x280000403:0x17e:0x0]
3 [0x2c0000403:0x17e:0x0]
0 [0x200000400:0x17f:0x0]
1 [0x240000403:0x17f:0x0]
2 [0x280000403:0x17f:0x0]
3 [0x2c0000403:0x17f:0x0]
0 [0x200000400:0x180:0x0]
1 [0x240000403:0x180:0x0]
2 [0x280000403:0x180:0x0]
3 [0x2c0000403:0x180:0x0]
0 [0x200000400:0x181:0x0]
1 [0x240000403:0x181:0x0]
2 [0x280000403:0x181:0x0]
3 [0x2c0000403:0x181:0x0]
0 [0x200000400:0x182:0x0]
1 [0x240000403:0x182:0x0]
2 [0x280000403:0x182:0x0]
3 [0x2c0000403:0x182:0x0]
0 [0x200000400:0x183:0x0]
1 [0x240000403:0x183:0x0]
2 [0x280000403:0x183:0x0]
3 [0x2c0000403:0x183:0x0]
Resetting fail_loc on all nodes...done.
PASS 300k (7s)
== sanity test complete, duration 9 sec == 01:30:11 (1433320211)
|
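In the listing above, the 512 stripes cycle over MDT indices 0 to 3, and each MDT receives 128 consecutive object IDs, 0x104 through 0x183. The small sketch below reproduces that placement pattern, assuming plain round-robin: stripe i goes to MDT `i % mdt_count` and takes object ID `first_oid + i / mdt_count`. `stripe_location()` and `struct stripe_loc` are hypothetical names; the per-MDT FID sequences (0x200000400, 0x240000403, ...) are not modeled, and this is not the allocation code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical (MDT index, object ID) pair for one stripe. */
struct stripe_loc {
    unsigned mdt_idx;
    uint32_t oid;
};

/* Round-robin placement as observed in the sanity 300k output:
 * consecutive stripes rotate across the MDTs, and each full rotation
 * advances the per-MDT object ID by one. */
static inline struct stripe_loc stripe_location(unsigned stripe,
                                                unsigned mdt_count,
                                                uint32_t first_oid)
{
    assert(mdt_count > 0);
    struct stripe_loc loc = {
        .mdt_idx = stripe % mdt_count,
        .oid = first_oid + stripe / mdt_count,
    };
    return loc;
}
```

With `mdt_count = 4` and `first_oid = 0x104`, stripe 0 maps to (0, 0x104), stripe 5 to (1, 0x105), and stripe 511 to (3, 0x183), matching the corresponding rows of the listing.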
| Comment by Gerrit Updater [ 05/Jun/15 ] |
|
wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/15161 |
| Comment by Gerrit Updater [ 05/Jun/15 ] |
|
wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/15162 |
| Comment by Robert Read (Inactive) [ 05/Jun/15 ] |
|
Let me know which build you want me to test. |
| Comment by Di Wang [ 05/Jun/15 ] |
|
Robert: Please use this build https://build.hpdd.intel.com/job/lustre-reviews/32667/ Thanks. |
| Comment by Robert Read (Inactive) [ 10/Jun/15 ] |
|
Not sure if this is related to DNE or not, but during setup one of the MDS nodes panicked with this trace right after mounting an MDT:
LDISKFS-fs (xvdg1): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (xvdg1): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (xvdg1): mounted filesystem with ordered data mode. quota=on. Opts:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
IP: [<ffffffffa0232eb6>] llog_cat_process_or_fork+0x46/0x300 [obdclass]
PGD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-2145/block/xvdg1/dev
CPU 7
Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) lquota(U) ldiskfs(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ipv6 xen_netfront ext4 jbd2 mbcache xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 7552, comm: lod0061_rec0063 Not tainted 2.6.32-504.16.2.el6_lustre.g2f99b7f.x86_64 #1
RIP: e030:[<ffffffffa0232eb6>] [<ffffffffa0232eb6>] llog_cat_process_or_fork+0x46/0x300 [obdclass]
RSP: e02b:ffff8806694b7da0 EFLAGS: 00010246
RAX: ffff88067e6aa378 RBX: ffff88068d020380 RCX: ffff88069c9bc240
RDX: ffffffffa0c48be0 RSI: ffff88068d020380 RDI: ffff8806694b7e70
RBP: ffff8806694b7e20 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88067232eec0 R11: 1000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8806694b7e70 R15: ffff8806694b7e70
FS: 00007fe3a4090700(0000) GS:ffff880028122000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000024 CR3: 0000000001a85000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process lod0061_rec0063 (pid: 7552, threadinfo ffff8806694b6000, task ffff8806694b5520)
Stack:
ffff8806694b7e70 ffff88067232a800 ffff8806694b7e40 ffffffffa0c72dc0
<d> 0000000000000000 ffff8806694b7e70 ffff88066a66db40 ffff8806694b7e08
<d> ffff88067e6aa078 ffff88067fb62030 ffff88067fb622b8 ffff88069c9bc240
Call Trace:
[<ffffffffa0c72dc0>] ? lod_sub_prep_llog+0x4f0/0x7b0 [lod]
[<ffffffffa0233189>] llog_cat_process+0x19/0x20 [obdclass]
[<ffffffffa0c4870a>] lod_sub_recovery_thread+0x4ba/0x990 [lod]
[<ffffffff81007d82>] ? check_events+0x12/0x20
[<ffffffff8152dbbc>] ? _spin_unlock_irqrestore+0x1c/0x20
[<ffffffffa0c48250>] ? lod_sub_recovery_thread+0x0/0x990 [lod]
[<ffffffff8109e71e>] kthread+0x9e/0xc0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8100b294>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff8100ba1d>] ? retint_restore_args+0x5/0x6
[<ffffffff8100c200>] ? child_rip+0x0/0x20
Code: f8 0f 1f 44 00 00 f6 05 8c ae f3 ff 01 44 0f b6 6d 10 49 89 fe 48 89 f3 4c 8b 66 38 74 0d f6 05 70 ae f3 ff 40 0f 85 9a 01 00 00 <41> f6 44 24 24 02 0f 84 6b 02 00 00 48 89 4d a0 48 89 55 a8 44
RIP [<ffffffffa0232eb6>] llog_cat_process_or_fork+0x46/0x300 [obdclass]
RSP <ffff8806694b7da0>
CR2: 0000000000000024
---[ end trace 2a9e4e41d6fdd5e2 ]---
Kernel panic - not syncing: Fatal exception
Pid: 7552, comm: lod0061_rec0063 Tainted: G D --------------- 2.6.32-504.16.2.el6_lustre.g2f99b7f.x86_64 #1
Call Trace:
[<ffffffff81529fbc>] ? panic+0xa7/0x16f
[<ffffffff8152dbbc>] ? _spin_unlock_irqrestore+0x1c/0x20
[<ffffffff8152ed94>] ? oops_end+0xe4/0x100
[<ffffffff8104c80b>] ? no_context+0xfb/0x260
[<ffffffff8104ca95>] ? __bad_area_nosemaphore+0x125/0x1e0
[<ffffffff8104cb63>] ? bad_area_nosemaphore+0x13/0x20
[<ffffffff8104d25c>] ? __do_page_fault+0x30c/0x500
[<ffffffff81007d82>] ? check_events+0x12/0x20
[<ffffffff810075dd>] ? xen_force_evtchn_callback+0xd/0x10
[<ffffffffa0511869>] ? out_update_pack+0xc9/0x190 [ptlrpc]
[<ffffffff810075dd>] ? xen_force_evtchn_callback+0xd/0x10
[<ffffffff81007d82>] ? check_events+0x12/0x20
[<ffffffff81530cbe>] ? do_page_fault+0x3e/0xa0
[<ffffffff8152e075>] ? page_fault+0x25/0x30
[<ffffffffa0c48be0>] ? lod_process_recovery_updates+0x0/0x420 [lod]
[<ffffffffa0232eb6>] ? llog_cat_process_or_fork+0x46/0x300 [obdclass]
[<ffffffffa0c72dc0>] ? lod_sub_prep_llog+0x4f0/0x7b0 [lod]
[<ffffffffa0233189>] ? llog_cat_process+0x19/0x20 [obdclass]
[<ffffffffa0c4870a>] ? lod_sub_recovery_thread+0x4ba/0x990 [lod]
[<ffffffff81007d82>] ? check_events+0x12/0x20
[<ffffffff8152dbbc>] ? _spin_unlock_irqrestore+0x1c/0x20
[<ffffffffa0c48250>] ? lod_sub_recovery_thread+0x0/0x990 [lod]
[<ffffffff8109e71e>] ? kthread+0x9e/0xc0
[<ffffffff8100c20a>] ? child_rip+0xa/0x20
[<ffffffff8100b294>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff8100ba1d>] ? retint_restore_args+0x5/0x6
[<ffffffff8100c200>] ? child_rip+0x0/0x20
|
| Comment by Di Wang [ 10/Jun/15 ] |
|
Robert: Thanks for testing. It does look like a DNE issue. I will check the code. Thanks. |
| Comment by Gerrit Updater [ 11/Jun/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14883/ |
| Comment by Gerrit Updater [ 14/Jun/15 ] |
|
wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/15274 |
| Comment by Di Wang [ 17/Jun/15 ] |
|
Robert: could you please try https://build.hpdd.intel.com/job/lustre-reviews/32865/ Thanks. |
| Comment by Gerrit Updater [ 03/Jul/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15161/ |
| Comment by Gerrit Updater [ 04/Jul/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15162/ |
| Comment by James A Simmons [ 07/Jul/15 ] |
|
One patch left!! |
| Comment by Gerrit Updater [ 16/Jul/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15274/ |
| Comment by James A Simmons [ 29/Aug/19 ] |
|
Sorry, typo. This was meant to be LU-6202. |