[LU-8246] Leaks on ldlm granted locks counter on MDS leading to canceling loop Created: 07/Jun/16 Updated: 14/Jun/18 Resolved: 30/Oct/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.3, Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Sebastien Piechurski | Assignee: | Oleg Drokin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
A performance problem at one of our customers led us to find that the granted ldlm locks counter (found in /proc/fs/lustre/ldlm/namespaces/mdt-fsname-MDT0000_UUID/pool/granted) is actually missing some decrements under conditions that are yet to be determined. After some time this counter largely exceeds the value found in /proc/fs/lustre/ldlm/namespaces/mdt-fsname-MDT0000_UUID/pool/limit. See here:

[root@prolixmds1 pool]# pwd
/proc/fs/lustre/ldlm/namespaces/mdt-scratch-MDT0000_UUID/pool
[root@prolixmds1 pool]# cat limit
3203616
[root@prolixmds1 pool]# cat granted
54882822

However, summing up all granted locks as seen by all the clients, we get only 16k locks, which is also consistent with the slab consumption on the MDS.

Once above the limit, the MDS constantly tries to cancel locks, even those which have not exceeded max_age. Clients then reacquire the locks, but lose time in the process, which shows up as the performance problem. Note that since only the counter is wrong, there is no resource overconsumption tied to this problem.

We found that this problem is also seen on 2.8. I also wonder if there is any relation with the last comment from Shuichi Ihara in |
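To make the "counter leak without a resource leak" concrete, here is a small user-space sketch. It is a simplified, hypothetical model of the pool bookkeeping, not the actual Lustre source: the names (pl_granted, the add/del helpers) only loosely follow the real code. It shows how an extra increment per lock makes the statistic drift upward while the lock itself is still freed normally.

```c
/* Simplified, hypothetical model of the ldlm pool "granted" bookkeeping.
 * Names loosely follow the real code (pl_granted, ldlm_pool_add/del), but
 * this is not the actual Lustre implementation. */
#include <stdatomic.h>
#include <stdio.h>

struct ldlm_pool { atomic_int pl_granted; };

static void pool_add(struct ldlm_pool *pl) { atomic_fetch_add(&pl->pl_granted, 1); }
static void pool_del(struct ldlm_pool *pl) { atomic_fetch_sub(&pl->pl_granted, 1); }

int main(void)
{
    struct ldlm_pool pool = { .pl_granted = 0 };

    /* Grant path #1: a cancel reprocesses the waiting queue and grants the lock. */
    pool_add(&pool);

    /* Grant path #2: the original enqueue thread also grants the same lock
     * (the race described later in this ticket) -- second increment, same lock. */
    pool_add(&pool);

    /* Cancel: the lock is freed once, so the counter is decremented only once. */
    pool_del(&pool);

    /* Counter now reads 1 although no locks are held: the leak is purely
     * in the statistic, not in memory. */
    printf("granted counter = %d\n", atomic_load(&pool.pl_granted));
    return 0;
}
```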
| Comments |
| Comment by Peter Jones [ 07/Jun/16 ] |
|
Oleg Could you please advise? Peter |
| Comment by Oleg Drokin [ 16/Jun/16 ] |
|
Hm, you are actually right. I checked some of my nodes and see the numbers are out of whack too, so I added a simple refcounter into every lock to see how many times we call ldlm_pool_add/del, and it seems to be a common occurrence that we call ldlm_pool_add (from ldlm_grant_lock()) twice, but free it only once. The second addition comes via:

[ 353.612043] Increaing pl_granted but lock ffff880046a3cc40 refc is already 1
[ 353.612660] CPU: 7 PID: 3654 Comm: mdt01_002 Tainted: G OE ------------ 3.10.0-debug #1
[ 353.613777] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 353.614385] ffff880046a3cc40 0000000095009a39 ffff880080ef7840 ffffffff816fe7c0
[ 353.616879] ffff880080ef7890 ffffffffa0556e5d ffff880046a3ceb8 ffff880080ef78a8
[ 353.618213] ffff880080ef7890 0000000095009a39 ffff880046a3cc40 ffff880042a36e80
[ 353.619201] Call Trace:
[ 353.619815] [<ffffffff816fe7c0>] dump_stack+0x19/0x1b
[ 353.620385] [<ffffffffa0556e5d>] ldlm_pool_add+0x16d/0x1f0 [ptlrpc]
[ 353.620931] [<ffffffffa052162e>] ldlm_grant_lock+0x11e/0x780 [ptlrpc]
[ 353.621564] [<ffffffffa0554382>] ldlm_process_inodebits_lock+0x362/0x420 [ptlrpc]
[ 353.622676] [<ffffffffa052323e>] ldlm_lock_enqueue+0x3fe/0x940 [ptlrpc]
[ 353.623327] [<ffffffffa053f126>] ldlm_cli_enqueue_local+0x1c6/0x850 [ptlrpc]
[ 353.623984] [<ffffffffa053c480>] ? ldlm_expired_completion_wait+0x250/0x250 [ptlrpc]
[ 353.625077] [<ffffffffa0b9a570>] ? mdt_register_seq_exp+0x2e0/0x2e0 [mdt]
[ 353.625710] [<ffffffffa0ba947a>] mdt_object_local_lock+0x52a/0xb00 [mdt]
[ 353.626344] [<ffffffffa0b9a570>] ? mdt_register_seq_exp+0x2e0/0x2e0 [mdt]
[ 353.627001] [<ffffffffa053c480>] ? ldlm_expired_completion_wait+0x250/0x250 [ptlrpc]
[ 353.628095] [<ffffffffa0ba9aae>] mdt_object_lock_internal+0x5e/0x2f0 [mdt]
[ 353.628728] [<ffffffffa0ba9ec0>] mdt_reint_object_lock+0x20/0x60 [mdt]
[ 353.629362] [<ffffffffa0bc167b>] mdt_reint_setattr+0x67b/0x1130 [mdt]
[ 353.630001] [<ffffffffa0bc21b0>] mdt_reint_rec+0x80/0x210 [mdt]
[ 353.630612] [<ffffffffa0ba4f8c>] mdt_reint_internal+0x5dc/0x940 [mdt]
[ 353.631246] [<ffffffffa0baee87>] mdt_reint+0x67/0x140 [mdt]
[ 353.631888] [<ffffffffa05ce835>] tgt_request_handle+0x925/0x1330 [ptlrpc]
[ 353.632567] [<ffffffffa057c261>] ptlrpc_server_handle_request+0x231/0xab0 [ptlrpc]
[ 353.633678] [<ffffffffa057aae8>] ? ptlrpc_wait_event+0xb8/0x370 [ptlrpc]
[ 353.634353] [<ffffffffa0580068>] ptlrpc_main+0xa58/0x1dc0 [ptlrpc]
[ 353.635023] [<ffffffff81707f27>] ? _raw_spin_unlock_irq+0x27/0x50
[ 353.635657] [<ffffffffa057f610>] ? ptlrpc_register_service+0xe70/0xe70 [ptlrpc]
[ 353.636433] [<ffffffff810a404a>] kthread+0xea/0xf0
[ 353.636860] [<ffffffff810a3f60>] ? kthread_create_on_node+0x140/0x140
[ 353.637301] [<ffffffff81711758>] ret_from_fork+0x58/0x90
[ 353.637724] [<ffffffff810a3f60>] ? kthread_create_on_node+0x140/0x140
[ 353.638163] Original trace:
[ 353.638573] [<ffffffffa0556da0>] ldlm_pool_add+0xb0/0x1f0 [ptlrpc]
[ 353.639218] [<ffffffffa052162e>] ldlm_grant_lock+0x11e/0x780 [ptlrpc]
[ 353.639950] [<ffffffffa0554323>] ldlm_process_inodebits_lock+0x303/0x420 [ptlrpc]
[ 353.641051] [<ffffffffa05238a3>] ldlm_reprocess_queue+0x123/0x230 [ptlrpc]
[ 353.642311] [<ffffffffa0523e50>] ldlm_reprocess_all+0x120/0x310 [ptlrpc]
[ 353.643068] [<ffffffffa05466a5>] ldlm_request_cancel+0x475/0x710 [ptlrpc]
[ 353.643810] [<ffffffffa054c78a>] ldlm_handle_cancel+0xba/0x250 [ptlrpc]
[ 353.644555] [<ffffffffa054ca61>] ldlm_cancel_handler+0x141/0x490 [ptlrpc]
[ 353.645330] [<ffffffffa057c261>] ptlrpc_server_handle_request+0x231/0xab0 [ptlrpc]
[ 353.651045] [<ffffffffa0580068>] ptlrpc_main+0xa58/0x1dc0 [ptlrpc]
[ 353.651478] [<ffffffff810a404a>] kthread+0xea/0xf0
[ 353.651899] [<ffffffff81711758>] ret_from_fork+0x58/0x90
[ 353.652327] [<ffffffffffffffff>] 0xffffffffffffffff |
| Comment by Oleg Drokin [ 16/Jun/16 ] |
|
So... a race between a lock request and a lock cancel that ends up with two threads granting the same lock, I guess? |
| Comment by Oleg Drokin [ 16/Jun/16 ] |
|
OK, so the race plays like this: we do the first enqueue (in ldlm_process_inodebits_lock()), meet a conflicting lock, add ourselves to the waiting list and call ldlm_run_ast_work(). Judging by the two traces above, a cancel of the conflicting lock then reprocesses the queue and grants our waiting lock (the "Original trace"), and the enqueue path grants the same lock a second time. Every grant call increases the pl_granted counter, but since it is the same lock there is only one decrease when it is eventually cancelled. Hm, in fact there is already a check for exactly that, but only for extent locks, fixed as part of bug 11300. I'll try to do a similar patch for plain and ibits locks. The flock case is more convoluted and would not play into this particular problem. |
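To make the proposed check concrete, here is a user-space sketch continuing the simplified model from the description. It is not the landed patch (see http://review.whamcloud.com/20839 for that); it only illustrates the idea of the extent-lock style guard, assuming a comparison of l_req_mode and l_granted_mode as the "already granted" test.

```c
/* Sketch of the "do not grant the same lock twice" guard discussed above.
 * Toy model only; the real fix lives in the ibits/plain lock processing
 * code (http://review.whamcloud.com/20839) and may differ in detail. */
#include <stdatomic.h>
#include <stdio.h>

enum ldlm_mode { LCK_NONE = 0, LCK_PR = 1 };   /* toy subset of lock modes */

struct ldlm_pool { atomic_int pl_granted; };
struct ldlm_lock {
    enum ldlm_mode l_req_mode;      /* mode the client asked for */
    enum ldlm_mode l_granted_mode;  /* mode actually granted so far */
};

/* Grant the lock and account for it in the pool, unless another path
 * (e.g. cancel-driven queue reprocessing) already granted it. */
static void grant_lock_once(struct ldlm_lock *lk, struct ldlm_pool *pl)
{
    if (lk->l_granted_mode == lk->l_req_mode)
        return;                     /* already granted: skip the second add */
    lk->l_granted_mode = lk->l_req_mode;
    atomic_fetch_add(&pl->pl_granted, 1);
}

static void cancel_lock(struct ldlm_lock *lk, struct ldlm_pool *pl)
{
    lk->l_granted_mode = LCK_NONE;
    atomic_fetch_sub(&pl->pl_granted, 1);
}

int main(void)
{
    struct ldlm_pool pool = { .pl_granted = 0 };
    struct ldlm_lock lock = { .l_req_mode = LCK_PR, .l_granted_mode = LCK_NONE };

    grant_lock_once(&lock, &pool);  /* cancel/reprocess path grants first */
    grant_lock_once(&lock, &pool);  /* racing enqueue path: guard skips it */
    cancel_lock(&lock, &pool);      /* single decrement balances the single add */

    printf("granted counter = %d\n", atomic_load(&pool.pl_granted)); /* prints 0 */
    return 0;
}
```

In the real server the two grant paths run in separate threads under the resource lock; the sequential toy run above just shows that the guard keeps one increment per lock, so the counter balances with the single decrement at cancel time.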
| Comment by Gerrit Updater [ 16/Jun/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/20839 |
| Comment by Oleg Drokin [ 16/Jun/16 ] |
|
By the way, to gauge the number of locks the MDT has actually granted in total, instead of counting on all the clients, just check /proc/fs/lustre/ldlm/namespaces/mdt-lustre-MDT0000_UUID/lock_count |
| Comment by Sebastien Piechurski [ 21/Jun/16 ] |
|
Great! Thanks Oleg for the analysis. |
| Comment by Gerrit Updater [ 30/Oct/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20839/ |
| Comment by Peter Jones [ 30/Oct/16 ] |
|
Landed for 2.9 |