[LU-6077] MDS OOM Created: 31/Dec/14  Updated: 01/Jun/15  Resolved: 01/Jun/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.3
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Mahmoud Hanafi Assignee: Niu Yawei (Inactive)
Resolution: Duplicate Votes: 0
Labels: None

Attachments: HTML File service160    
Severity: 3
Rank (Obsolete): 16909

 Description   

We have had a number of crashes where the MDS goes OOM with the ldlm_locks slab using most of the memory. Attached you'll find the console logs and a backtrace.

<code>
crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM  12289376      46.9 GB         ----
         FREE    348961       1.3 GB    2% of TOTAL MEM
         USED  11940415      45.5 GB   97% of TOTAL MEM
       SHARED    251654       983 MB    2% of TOTAL MEM
      BUFFERS    250789     979.6 MB    2% of TOTAL MEM
       CACHED       864       3.4 MB    0% of TOTAL MEM
         SLAB   9196563      35.1 GB   74% of TOTAL MEM

   TOTAL SWAP    500013       1.9 GB         ----
    SWAP USED      2913      11.4 MB    0% of TOTAL SWAP
    SWAP FREE    497100       1.9 GB   99% of TOTAL SWAP

crash> kmem -s
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff880ba5641980 osp_obj 216 38190 85302 4739 4k
ffff880babc51940 lod_obj 120 21141 64096 2003 4k
ffff880bb1951900 mdt_obj 248 21141 54528 3408 4k
ffff880bb34a18c0 fsfilt_ldiskfs_fcb 56 0 0 0 4k
ffff880bb3631880 dynlock_cache 128 0 0 0 4k
ffff880bb3621840 ldiskfs_inode_cache 1056 22187 33852 11284 4k
ffff880bb3611800 ldiskfs_xattr 88 0 0 0 4k
ffff880bb36017c0 ldiskfs_free_data 64 0 0 0 4k
ffff880bb35f1780 ldiskfs_alloc_context 136 0 0 0 4k
ffff880bb35e1740 ldiskfs_prealloc_space 112 37 170 5 4k
ffff880bb35d1700 ldiskfs_system_zone 40 0 0 0 4k
ffff880bb35516c0 upd_kmem 96 0 0 0 4k
ffff880bb3541680 lqe_kmem 192 3130 3180 159 4k
ffff880bb3491640 jbd2_journal_handle 48 0 0 0 4k
ffff880bb3481600 jbd2_journal_head 112 0 0 0 4k
ffff880bb3b715c0 jbd2_revoke_table 16 4 404 2 4k
ffff880bb3b81580 jbd2_revoke_record 32 0 0 0 4k
ffff880bb3461540 mdd_obj 96 21141 68200 1705 4k
ffff8805fd5a2040 ccc_req_kmem 40 0 0 0 4k
ffff8805fd592000 ccc_session_kmem 184 589 1890 90 4k
ffff8805fd581fc0 ccc_thread_kmem 352 71 176 16 4k
ffff8805fdfb1f80 ccc_object_kmem 264 0 0 0 4k
ffff8805fdfa1f40 ccc_lock_kmem 40 0 0 0 4k
ffff8805fdf91f00 vvp_session_kmem 104 589 2183 59 4k
ffff8805fdf81ec0 vvp_thread_kmem 488 71 136 17 4k
ffff8805fde31e80 ll_rmtperm_hash_cache 256 0 0 0 4k
ffff8805fde21e40 ll_remote_perm_cache 40 0 0 0 4k
ffff8805fe391e00 ll_file_data 192 0 0 0 4k
ffff880601741dc0 lustre_inode_cache 1216 0 0 0 4k
ffff8805fdf71d80 lov_oinfo 128 0 0 0 4k
ffff8805fdf61d40 lov_lock_link_kmem 32 0 0 0 4k
ffff8805fdf51d00 lovsub_req_kmem 40 0 0 0 4k
ffff8805fdf41cc0 lovsub_object_kmem 240 0 0 0 4k
ffff8805fdf31c80 lovsub_lock_kmem 64 0 0 0 4k
ffff8805fdf21c40 lov_req_kmem 40 0 0 0 4k
ffff8805fdd11c00 lov_session_kmem 400 589 1110 111 4k
ffff8805fdd01bc0 lov_thread_kmem 288 71 195 15 4k
ffff8805fdf11b80 lov_object_kmem 240 0 0 0 4k
ffff8805fdcf1b40 lov_lock_kmem 104 0 0 0 4k
ffff8805fde11b00 osc_quota_kmem 24 0 0 0 4k
ffff8805fde01ac0 osc_extent_kmem 168 0 0 0 4k
ffff8805fddf1a80 osc_req_kmem 40 0 0 0 4k
ffff8805fdde1a40 osc_session_kmem 424 589 1080 120 4k
ffff8805fddd1a00 osc_thread_kmem 984 71 96 24 4k
ffff8805fddc19c0 osc_object_kmem 288 0 0 0 4k
ffff8805fddb1980 osc_lock_kmem 192 0 0 0 4k
ffff8805fe371940 interval_node 128 0 0 0 4k
ffff8805fe361900 ldlm_locks 576 49731039 49796635 7113805 4k
</code>
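
For context, rough arithmetic from the ldlm_locks row above (a back-of-the-envelope sketch only; the figures are taken directly from the kmem -s output):

<code>
# 7,113,805 slabs x 4 KiB per slab ~= 27 GiB held by the ldlm_locks cache,
# i.e. the bulk of the 35.1 GB reported under SLAB by kmem -i.
echo "$((7113805 * 4096 / 1024 / 1024 / 1024)) GiB"   # prints "27 GiB"
</code>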



 Comments   
Comment by John Fuchs-Chesney (Inactive) [ 01/Jan/15 ]

Niu,
Could you please advise on this issue.

Thanks,
~ jfc.

Comment by Niu Yawei (Inactive) [ 04/Jan/15 ]

1. I see lots of network errors in the log:

<code>
<4>LNet: 2036:0:(o2iblnd_cb.c:2348:kiblnd_passive_connect()) Conn stale 10.151.28.220@o2ib [old ver: 12, new ver: 12]
<4>LNet: 2036:0:(o2iblnd_cb.c:2348:kiblnd_passive_connect()) Conn stale 10.151.49.230@o2ib [old ver: 12, new ver: 12]
<4>LNet: 2036:0:(o2iblnd_cb.c:2348:kiblnd_passive_connect()) Skipped 1 previous similar message
<4>LNet: 2036:0:(o2iblnd_cb.c:2348:kiblnd_passive_connect()) Conn stale 10.151.49.233@o2ib [old ver: 12, new ver: 12]
<4>LNet: 2036:0:(o2iblnd_cb.c:2348:kiblnd_passive_connect()) Skipped 2 previous similar messages
<4>Lustre: MGS: haven't heard from client 115bc340-65eb-e4c8-5212-3d07e8fe9c9b (at 10.151.46.238@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff880432122c00, cur 1419472730 expire 1419472580 last 1419472503
</code>

You should probably first check that the network is working properly.

2. Do you have any special patches applied on 2.4.3?

3. I'm afraid the ldlm pool shrink mechanism can't cope well with a heavy workload. Could you try disabling lru_resize to see whether the OOM can be resolved? (See the Lustre manual, section 32.8, Configuring Locking.)
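
For reference, a minimal sketch of that approach, assuming the standard lctl tunables (the namespace glob and value below are examples only; per the manual, a non-zero lru_size pins the LRU and disables lru_resize, while 0 re-enables it):

<code>
# On a client: cap the ldlm LRU at a fixed size, which disables lru_resize
# for the matched namespaces (example value; tune per client).
lctl set_param ldlm.namespaces.*.lru_size=1200

# Setting it back to 0 re-enables dynamic LRU sizing (lru_resize).
lctl set_param ldlm.namespaces.*.lru_size=0
</code>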

Comment by Peter Jones [ 05/Jan/15 ]

Niu

The NASA tree is on github - https://github.com/jlan/lustre-nas. NASA will have to advise as to the exact version in use.

Peter

Comment by Jay Lan (Inactive) [ 05/Jan/15 ]

Service160 was running 2.4.3-8nasS. The tag corresponds to
LU-4019 ofd: setattr don't udpate lvbo with object referenced
in the nas-2.4.3 branch.

Comment by Mahmoud Hanafi [ 05/Jan/15 ]

The network errors you pointed out are normal; we see those all the time. We have a large number of nodes that are sometimes rebooted after a job.

The documentation is not very clear. Do we run this on every client? If we have clients with different CPU counts, how do we deal with that?
$ lctl set_param ldlm.namespaces.osc.lru_size=$((NR_CPU*100))

What are the side effects of disabling lru_resize?

Comment by Niu Yawei (Inactive) [ 06/Jan/15 ]

Thank you, Jay. I didn't see any suspicious commit in the log.

The documentation is not very clear. Do we run this on every client? If we have clients with different CPU counts, how do we deal with that?
$ lctl set_param ldlm.namespaces.osc.lru_size=$((NR_CPU*100))

Yes, you have to run this on every client. You can use a script to get NR_CPU on each client and then set lru_size accordingly, or you can simply use an average value for all clients.
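
A minimal sketch of such a per-client script (hypothetical; the namespace glob and the NR_CPU * 100 multiplier follow the manual's example):

<code>
#!/bin/sh
# Derive the CPU count on this client and cap the ldlm LRU accordingly;
# a non-zero lru_size disables lru_resize for the matched namespaces.
NR_CPU=$(grep -c ^processor /proc/cpuinfo)
lctl set_param ldlm.namespaces.*.lru_size=$((NR_CPU * 100))
</code>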

What are the side effects of disabling lru_resize?

When lru_resize is enabled, each client has a dynamic ldlm cache size: the number of cached locks on each client depends on the workload and on the memory available on the client and server (an active client can cache more locks, an idle client caches fewer). When lru_resize is disabled, each client can cache at most lru_size (NR_CPU * 100) ldlm locks.
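
To see how a client's cache behaves in practice, the per-namespace counters can be inspected, assuming the standard lctl parameter tree:

<code>
# Number of ldlm locks currently cached, and the LRU limit, per namespace.
lctl get_param ldlm.namespaces.*.lock_count
lctl get_param ldlm.namespaces.*.lru_size
</code>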

Comment by Peter Jones [ 15/Jan/15 ]

Niu

Could this be related to LU-5726?

Peter

Comment by Niu Yawei (Inactive) [ 16/Jan/15 ]

I think they are different issues. In this ticket the ldlm lock cache grew very large and consumed most of the memory, whereas in LU-5726 it was kernel buffers that consumed the memory.

Comment by Niu Yawei (Inactive) [ 01/Jun/15 ]

I think this is the same problem as LU-6529; it will be fixed under LU-6529.

Comment by Niu Yawei (Inactive) [ 01/Jun/15 ]

dup of LU-6529.
