[LU-8043] MDS running lustre 2.5.5+ OOM when running with Lustre 2.8 GA clients Created: 19/Apr/16 Updated: 03/Sep/16 Resolved: 20/May/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.5 |
| Fix Version/s: | Lustre 2.5.5 |
| Type: | Bug | Priority: | Critical |
| Reporter: | James A Simmons | Assignee: | Alex Zhuravlev |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Cray clients running an unpatched Lustre 2.8 GA client. Server side running Lustre 2.5.5 with a patch set in a RHEL6.7 environment. |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Today we performed a test shot on our smaller Cray Aries cluster (700 nodes) with a non-patched Lustre 2.8 GA client specially built for this system. The tests were run against our atlas file system, which runs a RHEL6.7 distro with Lustre version 2.5.5 plus patches. During the test shot, while running an IOR single-shared-file test across all nodes with a stripe count of 1008, the MDS server ran out of memory. I attached the dmesg output to this ticket. |
| Comments |
| Comment by Oleg Drokin [ 20/Apr/16 ] |
|
From James: BTW I ran kmem -S on the vmcore.
CACHE            NAME                   OBJSIZE ALLOCATED    TOTAL   SLABS SSIZE
ffff883f77c23c40 osp_obj                    200  35894144 35894895 1889205 4k
ffff883f811f3c00 mdd_obj                     80  12196492 14344704  298848 4k
ffff883f810f3bc0 lod_obj                    112  12196492 13903654  408931 4k
ffff883f940a3b80 mdt_cdt_agent_req          168         0        0       0 4k
ffff883f94653b40 mdt_cdt_restore_handle     112         0        0       0 4k
ffff883f94033b00 mdt_obj                    256  12196492 13096710  873114 4k
ffff883fa7b83ac0 dynlock_cache               72         0      265       5 4k
ffff883fa7ab3a80 upd_kmem                    96         0        0       0 4k
ffff883fa7aa3a40 lqe_kmem                   200        18       19       1 4k
ffff883fa7a23a00 ldiskfs_inode_cache       1040  24405903 24405930 8135310 4k
ffff883fa7a139c0 ldiskfs_xattr               88         0        0       0 4k
ffff883fa7a03980 ldiskfs_free_data           64         1       59       1 4k
ffff883fa79f3940 ldiskfs_alloc_context      136         0       84       3 4k
ffff883fa79e3900 ldiskfs_prealloc_space     112       121      238       7 4k
ffff883fa79d38c0 ldiskfs_system_zone         40         0        0       0 4k
ffff883fa78d3880 jbd2_journal_handle         48         0      385       5 4k
ffff883fa7863840 jbd2_journal_head          112      3985     8126     239 4k
ffff883fa7853800 jbd2_revoke_table           16         2      202       1 4k
ffff883fa7f737c0 jbd2_revoke_record          32         0        0       0 4k
ffff883fb05c3780 xattr_kmem                  40         0        0       0 4k |
| Comment by James A Simmons [ 21/Apr/16 ] |
|
The memory exhaustion problem exposed |
| Comment by Jian Yu [ 21/Apr/16 ] |
|
Hi James, |
| Comment by Alex Zhuravlev [ 21/Apr/16 ] |
|
Can you please confirm that you used the same 1008 stripes with the old clients and it was doing fine? |
| Comment by James A Simmons [ 21/Apr/16 ] |
|
Yes, we tested the 2.5 client on the same platform, and the MDS did not OOM. |
| Comment by James A Simmons [ 21/Apr/16 ] |
|
Oh, I do see it now. We still encountered that same issue even with the patch:
[4805210.570311] LustreError: 15316:0:(mdt_handler.c:3224:mdt_lock_handle_fini()) ASSERTION( !lustre_handle_is_used(&lh->mlh_reg_lh) ) failed: |
| Comment by Matt Ezell [ 21/Apr/16 ] |
|
Our assertion was for a regular lock, not a pdo lock. |
| Comment by Matt Ezell [ 21/Apr/16 ] |
|
I dumped the debug log from the crash dump, but I didn't see anything interesting. The most recent data is:
00000004:00080000:2.0:1461078294.488964:0:15455:0:(mdt_open.c:661:mdt_mfd_set_mode()) [0x2002d12a5:0x47ab:0x0] Change mfd mode 00 -> 0302.
00000004:00080000:2.0:1461078294.488967:0:15455:0:(mdt_handler.c:2853:mdt_save_lock()) request = ffff8834683bd400 reply state = ffff881fa5957000 transno = 448115532568
00000004:00080000:2.0:1461078294.503955:0:15455:0:(mdt_open.c:661:mdt_mfd_set_mode()) [0x2002ce5a3:0x5f:0x0] Change mfd mode 00 -> 01.
00000004:00080000:2.0:1461078294.504528:0:15455:0:(mdt_open.c:661:mdt_mfd_set_mode()) [0x2002d12a5:0x47ac:0x0] Change mfd mode 00 -> 0302.
00000004:00080000:2.0:1461078294.504532:0:15455:0:(mdt_handler.c:2853:mdt_save_lock()) request = ffff88048acf6000 reply state = ffff8816e7196000 transno = 448115532572
00000004:00080000:3.0:1461078294.522700:0:15239:0:(mdt_open.c:661:mdt_mfd_set_mode()) [0x2002ccbf6:0x55:0x0] Change mfd mode 00 -> 01.
00000004:00080000:3.0:1461078294.523266:0:15239:0:(mdt_open.c:661:mdt_mfd_set_mode()) [0x2002d12a5:0x47ad:0x0] Change mfd mode 00 -> 0302.
00000004:00080000:3.0:1461078294.523269:0:15239:0:(mdt_handler.c:2853:mdt_save_lock()) request = ffff880339962400 reply state = ffff8817f1155000 transno = 448115532576
00000004:00080000:1.0:1461078294.538529:0:15601:0:(mdt_open.c:661:mdt_mfd_set_mode()) [0x2002d111a:0x1f4b0:0x0] Change mfd mode 00 -> 01.
00000004:00080000:2.0:1461078294.538598:0:15371:0:(mdt_handler.c:2853:mdt_save_lock()) request = ffff880873ac7000 reply state = ffff8834db7b5000 transno = 448115532578
00000004:00040000:0.0:1461078294.538823:0:15316:0:(mdt_handler.c:3224:mdt_lock_handle_fini()) ASSERTION( !lustre_handle_is_used(&lh->mlh_reg_lh) ) failed:
00000004:00080000:2.0:1461078294.538953:0:15371:0:(mdt_handler.c:2853:mdt_save_lock()) request = ffff880873ac7000 reply state = ffff8834db7b5000 transno = 448115532578
00000004:00040000:2.0:1461078294.538990:0:15659:0:(mdt_handler.c:3224:mdt_lock_handle_fini()) ASSERTION( !lustre_handle_is_used(&lh->mlh_reg_lh) ) failed:
00000004:00040000:2.0:1461078294.538993:0:15659:0:(mdt_handler.c:3224:mdt_lock_handle_fini()) LBUG |
| Comment by Gerrit Updater [ 21/Apr/16 ] |
|
Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/19717 |
| Comment by Alex Zhuravlev [ 21/Apr/16 ] |
|
In all the cases it's stats:
<3>[2806622.047940] LustreError: 14895:0:(lvfs_lib.c:157:lprocfs_stats_alloc_one()) LNET: out of memory at /data/buildsystem/jsimmons-atlas/rpmbuild/BUILD/lustre-2.5.5/lustre/lvfs/lvfs_lib.c:157 (tried to alloc '(stats->ls_percpu[cpuid])' = 4224)
lprocfs_stats_alloc_one() is trying to allocate a per-cpu slot: |
| Comment by Alex Zhuravlev [ 21/Apr/16 ] |
|
The patch above hasn't been tested yet; I uploaded it to start testing in Maloo. |
| Comment by Matt Ezell [ 21/Apr/16 ] |
|
I took a look in the crash dump and found that mdt_thread_info.mti_lh[MDT_LH_OLD] is: {
mlh_type = MDT_NUL_LOCK,
mlh_reg_lh = {
cookie = 16144317094821187230
},
mlh_reg_mode = LCK_CR,
mlh_pdo_lh = {
cookie = 0
},
mlh_pdo_mode = LCK_MINMODE,
mlh_pdo_hash = 0,
mlh_rreg_lh = {
cookie = 0
},
mlh_rreg_mode = LCK_MINMODE
}
Also looking at the mdt_thread_info, I see: mti_rr = {
rr_opcode = REINT_OPEN,
rr_handle = 0xffff883f36707350,
rr_fid1 = 0xffff883f36707330,
rr_fid2 = 0xffff883f36707340,
rr_name = 0xffff883f36707480 "testfile.out",
rr_namelen = 12,
rr_tgt = 0x0,
rr_tgtlen = 0,
rr_eadata = 0xffff883f36707490,
rr_eadatalen = 4096,
rr_logcookielen = 0,
rr_logcookies = 0x0,
rr_flags = 0
}
Why does a create reint have a lock handle for "old"? Isn't that just used for renames and layout swaps? |
| Comment by James A Simmons [ 25/Apr/16 ] |
|
After a discussion internally, we would really like to see the assertion present in the dmesg log fixed as well. It is possible that the assertion in the log is independent of the memory exhaustion problem. |
| Comment by Jian Yu [ 25/Apr/16 ] |
|
Hi John, |
| Comment by John Hammond [ 26/Apr/16 ] |
|
Hi Matt, could you post the debug log? Or if it's too big, could you post the lines from tasks 15316 and 15659? |
| Comment by John Hammond [ 26/Apr/16 ] |
|
> Also looking at the mdt_thread_info, I see

The values in mti_rr may be from a previous request. I think mdt_intent_layout() is more likely here. Note that MDT_LH_LAYOUT has the same value as MDT_LH_OLD. |
| Comment by Matt Ezell [ 26/Apr/16 ] |
|
I just attached mylog.dk.gz. Unfortunately it was only running with "normal" debug levels. James Simmons was running IOR when it crashed. He said it was a single shared file striped across 1008 OSTs from ~700 clients. "testfile.out" is likely the name of the file he was using, but I understand mti_rr might be from a previous request. Let me know if there's anything you want to see from the crash. We can't upload it to Intel for you to look at, but I can run commands for you and provide the output. Thanks!
#5
mod libcfs, name lbug_with_loc, RIP 0xffffffffa0420eeb
frame start 0xffff883f8d5a3ce8, end 0xffff883f8d5a3d08, *base 0xffff883f8d5a3d10
XBT_RBX = ffffffffa0e15620
msgdata = ffffffffa0e15620
#6
mod mdt, name mdt_lock_handle_fini, RIP 0xffffffffa0d9e64b
frame start 0xffff883f8d5a3d08, end 0xffff883f8d5a3d18, *base 0xffff883f8d5a3d40
XBT_RBX = ffff883f8d4ff000
#7
mod mdt, name mdt_thread_info_fini, RIP 0xffffffffa0da4cc0
frame start 0xffff883f8d5a3d18, end 0xffff883f8d5a3d48, *base 0xffff883f8d5a3d90
XBT_RBX = ffff883f8d4ff000
info = ffff883f8d4ff000
#8
mod mdt, name mdt_handle_common, RIP 0xffffffffa0daa473
frame start 0xffff883f8d5a3d48, end 0xffff883f8d5a3d98, *base 0xffff883f8d5a3da0
XBT_RBX = ffff8837a0061000
XBT_R12 = ffff883f8d4ff000
XBT_R13 = ffffffffa0e23ee0
req = ffff8837a0061000
&supported = ffff883f8d5a3d58
supported = ffffffff00000002 ...
info = ffff883f8d4ff000
&supported = ffff883f8d5a3d58
supported = ffffffff00000002 ...
info = ffff883f8d4ff000
req = ffff8837a0061000
set = 1
id = 123
quiet = 0
subsystem = 4
mask = 1
#9
mod mdt, name mds_regular_handle, RIP 0xffffffffa0de98c5
frame start 0xffff883f8d5a3d98, end 0xffff883f8d5a3da8, *base 0xffff883f8d5a3ee0
XBT_RBX = ffff883f8df0d140
XBT_R12 = ffff883fa7a96800
XBT_R13 = ffff8837a0061000
XBT_R14 = 42
XBT_R15 = ffff883f8d418940
#10
mod ptlrpc, name ptlrpc_main, RIP 0xffffffffa076d07e
frame start 0xffff883f8d5a3da8, end 0xffff883f8d5a3ee8, *base 0xffff883f8d5a3f40
XBT_RBX = ffff883f8df0d140
XBT_R12 = ffff883fa7a96800
XBT_R13 = ffff8837a0061000
XBT_R14 = 42
XBT_R15 = ffff883f8d418940
arg = ffff883f8df0d140
thread = ffff883f8df0d140
svcpt = ffff883fa7a96800
&svc = ffff883f8d5a3e50
svc = ffff883fa7f96080 ...
&rs = ffff883f8d5a3e68
rs = ffff883fa7a96868 ...
&env = ffff883f8d5a3e80
env = ffff883fa7f96080 ...
counter = 42
&rc = ffff883f8d5a3e64
rc = 0 ...
flags = 8050
size = 38
&ret = ffff883f8d5a3e80
ret = ffff883fa7f96080 ...
i = 1
id = c00
quiet = 0
set = 0
value = 0
subsystem = 100
mask = 10
flags = 8250
thread = ffff883f8df0d140
flags = 4
thread = ffff883f8df0d140
flags = 8
thread = ffff883f8df0d140
&lock = ffff883f8d5a3e88
lock = ffff883fa7a96830 ...
svcpt = ffff883fa7a96800
head = ffff883fa7a96a18
new = ffff883fa7a96878
subsystem = 100
mask = 200
svcpt = ffff883fa7a96800
thread = ffff883f8df0d140
svcpt = ffff883fa7a96800
&svc = ffff883f8d5a3e80
svc = ffff883fa7f96080 ...
request = ffff8837a0061000
&work_start = ffff883f8d5a3ea0
work_start = 57164916 ...
&work_end = ffff883f8d5a3e90
work_end = 57164916 ...
small = ffff8837a00611e0
large = ffff883f8d5a3ea0
&r = ffff883f8d5a3e40
r = 4 ...
result = 0
id = 50e
quiet = 0
set = 0
value = 0
req = ffff8837a0061000
id = 512
quiet = 0
set = 0
value = 0
subsystem = 100
mask = 1
subsystem = 100
mask = 100000
subsystem = 100
mask = 200
req = ffff8837a0061000
thread = ffff883f8df0d140
svcpt = ffff883fa7a96800
svcpt = ffff883fa7a96800
svcpt = ffff883fa7a96800
force = 0
svcpt = ffff883fa7a96800
|
| Comment by Matt Ezell [ 26/Apr/16 ] |
crash> struct ptlrpc_request.rq_pill ffff8837a0061000
rq_pill = {
rc_req = 0xffff8837a0061000,
rc_fmt = 0xffffffffa0819240 <RQF_LDLM_INTENT_LAYOUT>,
rc_loc = RCL_SERVER,
rc_area = {{4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295}, {4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295}}
}
|
| Comment by James A Simmons [ 03/May/16 ] |
|
Any progress on fixing the assertion? |
| Comment by John Hammond [ 03/May/16 ] |
|
I understood that ORNL was going to reproduce with a stronger debug mask. Has that been done? |
| Comment by Matt Ezell [ 03/May/16 ] |
|
We have been unable to reproduce on our testbed systems, and we haven't had an opportunity to reproduce on the production systems. |
| Comment by John Hammond [ 03/May/16 ] |
|
It's hard to say for sure without more information, but the failed assertion may be addressed by http://review.whamcloud.com/#/c/18060/. |
| Comment by Matt Ezell [ 19/May/16 ] |
|
We attempted to reproduce this assertion on Tuesday using the same conditions as last time, but it never crashed. After that, we moved to a server with patches 19717 and 18060 applied, to hopefully prevent it in the future. I think we can close this ticket and reopen it if we see the problem again. Thanks. |
| Comment by Jian Yu [ 20/May/16 ] |
|
Thank you, Matt. |
| Comment by Gerrit Updater [ 19/Jul/16 ] |
|
Comment deleted (wrong LU in commit message). |
| Comment by Gerrit Updater [ 19/Jul/16 ] |
|
Comment deleted (wrong LU in commit message). |