[LU-11518] lock_count is exceeding lru_size Created: 15/Oct/18 Updated: 28/Dec/22 Resolved: 25/Sep/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Gu Zheng (Inactive) | Assignee: | Gu Zheng (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | LTS12, ORNL |
| Environment: | Client: |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Mitsuhiro reported that lock_count exceeds lru_size (with LRU resize disabled) on the DDN es3.x branch, and when we tested with master we found the same problem on upstream as well.

# for i in `seq -w 10001 20000`; do dd if=/dev/zero of=/mnt/lustre/1KB_file-${i} bs=1K count=1 > /dev/null 2>&1 ; echo ${i} ; done

# lctl get_param ldlm.namespaces.*.lru_size
ldlm.namespaces.lustre-MDT0000-mdc-ffff8d0afcc55000.lru_size=800
ldlm.namespaces.lustre-OST0000-osc-ffff8d0afcc55000.lru_size=800
ldlm.namespaces.lustre-OST0001-osc-ffff8d0afcc55000.lru_size=800

# lctl get_param ldlm.namespaces.*.lock_count
ldlm.namespaces.lustre-MDT0000-mdc-ffff8d0afcc55000.lock_count=800
ldlm.namespaces.lustre-OST0000-osc-ffff8d0afcc55000.lock_count=4090
ldlm.namespaces.lustre-OST0001-osc-ffff8d0afcc55000.lock_count=4096 |
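For context, a minimal sketch of how the "LRU resize disabled" setup above can be produced on a client (an assumption based on the standard ldlm tunables, not something stated explicitly in the report; setting any fixed, non-zero lru_size switches that namespace from dynamic LRU sizing to a fixed-size LRU):

# lctl set_param ldlm.namespaces.*.lru_size=800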
| Comments |
| Comment by Gerrit Updater [ 15/Oct/18 ] |
|
Gu Zheng (gzheng@ddn.com) uploaded a new patch: https://review.whamcloud.com/33371 |
| Comment by Andreas Dilger [ 15/Oct/18 ] |
|
Can you please link the LU ticket and patch that removed ELC for extent locks? While I agree this patch is good to avoid overflowing the LRU size, canceling locks with ELC avoids sending extra RPCs. I'd like to understand why ELC was turned off. |
| Comment by Oleg Drokin [ 04/Jan/19 ] |
|
what about |
| Comment by Li Xi [ 29/Jan/19 ] |
|
I'd suggest adding some debug messages in ldlm_prep_elc_req() and checking whether it releases locks from time to time. In theory, if ldlm_prep_elc_req() releases locks as expected, the LRU shouldn't grow beyond the limit. |
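A hedged sketch of a non-invasive alternative, assuming the existing D_DLMTRACE messages in the enqueue/cancel path carry enough detail (the commands below are the stock lctl debug interface, and the log file name is illustrative):

# lctl set_param debug=+dlmtrace
# lctl clear
(re-run the small-file workload from the description)
# lctl dk /tmp/ldlm-elc.log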
| Comment by Li Xi [ 29/Jan/19 ] |
|
Is it possible that the enqueue RPC doesn't have a lot of space to save the ELC locks? For example, in a given I/O pattern, maybe all the payload space has already been taken and there is no room for ELC at all? |
| Comment by Gu Zheng (Inactive) [ 26/Feb/19 ] |
|
Hi Oleg, Andreas. I added some debug messages around the ELC path and collected the following output:
[ 1203.258882] #####ldlm cli enqulock:102
[ 1203.259923] ####: avail:408, to_free:1, count:0,pack:0 <--osc
[ 1203.263990] ####: avail:368, to_free:1, count:1,pack:1 <--mdc
[ 1203.266082] #####ldlm cli enqulock:103
[ 1203.267089] ####: avail:408, to_free:1, count:0,pack:0
count += ldlm_cancel_lru_local(ns, cancels, to_free,
avail - count, 0,
lru_flags);
lru_size defaults to 100 on my testing VMs. So when enqueuing locks over the lru_size limit, we can see the MDC cancel locks one by one, but the OSC doesn't. Sorry for my limited understanding; please correct me if I'm missing something.
|
| Comment by Gu Zheng (Inactive) [ 26/Mar/19 ] |
|
With lru_resize disabled, I traced ldlm_prep_elc_req and ldlm_prepare_lru_list and found that EXTENT locks are skipped even though lock_count has already exceeded lru_size. That's the situation we hit above: there is no chance for extent locks to be cancelled at enqueue time. Although we can set lru_size=clear to clean up all of them, we can't always be aware of this condition immediately. |
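For reference, a sketch of the manual workaround mentioned above (assuming the overflowing namespaces are the OSC ones shown in the description; lru_size=clear drops all unused locks in the client LRU):

# lctl set_param ldlm.namespaces.*osc*.lru_size=clear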
| Comment by Patrick Farrell (Inactive) [ 27/Mar/19 ] |
|
Gu Zheng, do you know why they are being skipped? It seems like we should fix that, rather than doing the async stuff. |
| Comment by Gu Zheng (Inactive) [ 29/Mar/19 ] |
|
Hi Patrick,
I don't know the root cause, but according to the earlier debugging, osc_ldlm_weigh_ast returns non-zero, which means the lock can't be cancelled early. |
| Comment by Gu Zheng (Inactive) [ 03/Apr/19 ] |
|
ELC calls osc_ldlm_weigh_ast -> weigh_cb to judge whether the lock is safe to cancel early. It checks every page under the ldlm extent, and if any page (vmpage) is dirty, ELC will skip this lock. As mentioned in the summary, this condition is very common when there is a burst of write I/O, so ELC has no positive effect here. |
| Comment by Patrick Farrell (Inactive) [ 03/Apr/19 ] |
|
It would be good to get -1 trace on one of these calls to osc_ldlm_weigh_ast - I am skeptical that you actually have a dirty page under all your locks. That is probably only possible with a workload that is dirtying many, many files with a small amount of data, very quickly. I guess if you are creating many small files that might be possible... I wonder if you are actually hitting this case:

lock_res_and_lock(dlmlock);
obj = dlmlock->l_ast_data;
if (obj)
        cl_object_get(osc2cl(obj));
unlock_res_and_lock(dlmlock);

if (obj == NULL)
        GOTO(out, weight = 1);

Because if we cannot find the OSC object, we assume we should not cancel the lock. So if a bunch of OSC objects are being destroyed, all of their associated LDLM locks will be ignored for ELC. This seems incorrect to me... This was copied over from the old CLIO code, before Jinshan's simplification, but when we copied it, we changed from 0 to 1 (which means 'keep the lock') in this case:

-       lock = osc_ast_data_get(dlmlock);
-       if (lock == NULL) {
-               /* cl_lock was destroyed because of memory pressure.
-                * It is much reasonable to assign this type of lock
-                * a lower cost.
-                */
-               GOTO(out, weight = 0);

I'll push a patch. Anyway, you should be able to figure out if this patch will help just by using trace debug to check how you're exiting osc_ldlm_weigh_ast. You could also check your lock used vs unused counts in LDLM; that might be interesting too. |
| Comment by Gerrit Updater [ 03/Apr/19 ] |
|
Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34584 |
| Comment by Gu Zheng (Inactive) [ 04/Apr/19 ] |
|
Hi Patrick,

> That is probably only possible with a workload that is dirtying many, many files with a small amount of data, very quickly. I guess if you are creating many small files that might be possible...

Yeah, correct, we hit this problem when creating lots of small files for testing; please see the reproduce script in the summary.

As for https://review.whamcloud.com/34584, I have no objections to this patch, but it doesn't help to resolve the above problem. |
| Comment by Gerrit Updater [ 21/Jun/19 ] |
|
Andriy Skulysh (c17819@cray.com) uploaded a new patch: https://review.whamcloud.com/35285 |
| Comment by Gerrit Updater [ 01/Jul/19 ] |
|
Gu Zheng (gzheng@ddn.com) uploaded a new patch: https://review.whamcloud.com/35396 |
| Comment by Gerrit Updater [ 01/Jul/19 ] |
|
Gu Zheng (gzheng@ddn.com) uploaded a new patch: https://review.whamcloud.com/35397 |
| Comment by Gerrit Updater [ 03/Jul/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35285/ |
| Comment by Gerrit Updater [ 12/Jul/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35396/ |
| Comment by Gerrit Updater [ 12/Jul/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35489 |
| Comment by Gerrit Updater [ 20/Jul/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35489/ |
| Comment by Gu Zheng (Inactive) [ 03/Sep/19 ] |
|
Hi pfarrell, adilger

[ 1106.410531] Call Trace:
[ 1106.411660] [<ffffffff8d363107>] dump_stack+0x19/0x1b
[ 1106.413944] [<ffffffffc0a1e8dc>] osc_extent_wait+0x5dc/0x750 [osc]
[ 1106.416400] [<ffffffff8ccd6b60>] ? wake_up_state+0x20/0x20
[ 1106.418593] [<ffffffffc0a20b57>] osc_cache_wait_range+0x2e7/0x940 [osc]
[ 1106.421190] [<ffffffffc0a21d7e>] osc_cache_writeback_range+0xbce/0x1260 [osc]
[ 1106.423971] [<ffffffffc03aad98>] ? libcfs_debug_msg+0x688/0xab0 [libcfs]
[ 1106.426865] [<ffffffffc053c4bb>] ? cl_env_get+0x1bb/0x270 [obdclass]
[ 1106.429221] [<ffffffffc0a0dd85>] osc_lock_flush+0x195/0x290 [osc]
[ 1106.431463] [<ffffffffc0a0e243>] osc_ldlm_blocking_ast+0x2e3/0x3a0 [osc]
[ 1106.433930] [<ffffffffc0530001>] ? lustre_fill_super+0x6b1/0x950 [obdclass]
[ 1106.436432] [<ffffffffc07c6b5a>] ldlm_cancel_callback+0x8a/0x330 [ptlrpc]
[ 1106.438807] [<ffffffffc053bd30>] ? cl_env_put+0x140/0x1d0 [obdclass]
[ 1106.441008] [<ffffffffc0a0d978>] ? osc_ldlm_weigh_ast+0x118/0x390 [osc]
[ 1106.443332] [<ffffffffc07c6e56>] ldlm_lock_cancel+0x56/0x1f0 [ptlrpc]
[ 1106.445551] [<ffffffffc07e2275>] ldlm_cli_cancel_list_local+0x85/0x280 [ptlrpc]
[ 1106.447985] [<ffffffffc07e2a0b>] ldlm_cancel_lru_local+0x2b/0x30 [ptlrpc]
[ 1106.450215] [<ffffffffc07e5797>] ldlm_replay_locks+0x5c7/0x880 [ptlrpc]
[ 1106.452153] [<ffffffffc03a4cce>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[ 1106.454474] [<ffffffffc082e145>] ptlrpc_import_recovery_state_machine+0x5a5/0x920 [ptlrpc]
[ 1106.457066] [<ffffffffc0830c91>] ptlrpc_connect_interpret+0xe21/0x2150 [ptlrpc]
[ 1106.459344] [<ffffffffc0805921>] ptlrpc_check_set.part.23+0x481/0x1e10 [ptlrpc]
[ 1106.461638] [<ffffffffc080730b>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
[ 1106.463659] [<ffffffffc08333cb>] ptlrpcd_check+0x4ab/0x590 [ptlrpc]
[ 1106.465622] [<ffffffffc083374b>] ptlrpcd+0x29b/0x550 [ptlrpc]
[ 1106.467359] [<ffffffff8ccd6b60>] ? wake_up_state+0x20/0x20
[ 1106.469039] [<ffffffffc08334b0>] ? ptlrpcd_check+0x590/0x590 [ptlrpc]
[ 1106.471004] [<ffffffff8ccc1da1>] kthread+0xd1/0xe0
[ 1106.472420] [<ffffffff8ccc1cd0>] ? insert_kthread_work+0x40/0x40
[ 1106.474233] [<ffffffff8d375c37>] ret_from_fork_nospec_begin+0x21/0x21
[ 1106.476078] [<ffffffff8ccc1cd0>] ? insert_kthread_work+0x40/0x40

osc_extent_wait_range is waiting for the dirty pages of the target extent lock to be written out, but it seems no one flushed them, so the thread stalled there. |
| Comment by Patrick Farrell (Inactive) [ 03/Sep/19 ] |
|
Zheng, Hmm. I can try to take a look. |
| Comment by Gu Zheng (Inactive) [ 05/Sep/19 ] |
|
pfarrell thanks. |
| Comment by Gu Zheng (Inactive) [ 05/Sep/19 ] |
|
Hi pfarrell adilger |
| Comment by Andreas Dilger [ 06/Sep/19 ] |
|
I was not really against the 33371 patch as much as Oleg was. Allowing locks above lru_size to be cancelled in a batch of LDLM_LRU_EXCESS_LIMIT=16 locks makes sense to me, for efficiency reasons. If Oleg really feels that we shouldn't exceed lru_size at all (though without this patch we can already exceed lru_size by many thousands of locks), then we could shrink a batch of locks to below lru_size. Of course, even better, as Oleg and Vitaly suggest, would be to fix dynamic LRU sizing, but as yet it isn't really working properly (AFAICS) under pressure (only via aging), and it still makes sense to fix the fixed-LRU-size case since there are a number of sites already using this. |
| Comment by Gu Zheng (Inactive) [ 06/Sep/19 ] |
|
Thanks adilger |
| Comment by Gerrit Updater [ 31/Jul/20 ] |
|
Vitaly Fertman (vitaly.fertman@hpe.com) uploaded a new patch: https://review.whamcloud.com/39560 |
| Comment by Gerrit Updater [ 31/Jul/20 ] |
|
Vitaly Fertman (vitaly.fertman@hpe.com) uploaded a new patch: https://review.whamcloud.com/39561 |
| Comment by Gerrit Updater [ 31/Jul/20 ] |
|
Vitaly Fertman (vitaly.fertman@hpe.com) uploaded a new patch: https://review.whamcloud.com/39562 |
| Comment by Gerrit Updater [ 31/Jul/20 ] |
|
Vitaly Fertman (vitaly.fertman@hpe.com) uploaded a new patch: https://review.whamcloud.com/39563 |
| Comment by Gerrit Updater [ 31/Jul/20 ] |
|
Vitaly Fertman (vitaly.fertman@hpe.com) uploaded a new patch: https://review.whamcloud.com/39564 |
| Comment by Gerrit Updater [ 25/Aug/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39560/ |
| Comment by Gerrit Updater [ 25/Aug/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39561/ |
| Comment by Gerrit Updater [ 19/Sep/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39562/ |
| Comment by Gerrit Updater [ 19/Sep/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39563/ |
| Comment by Gerrit Updater [ 19/Sep/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39564/ |
| Comment by Peter Jones [ 19/Sep/20 ] |
|
I'm finding it hard to follow all the activity tracked under this ticket. There is still https://review.whamcloud.com/#/c/34584/ unlanded - is that still needed? |
| Comment by Cory Spitz [ 22/Sep/20 ] |
|
Peter, https://review.whamcloud.com/#/c/34584 appears to be active. |
| Comment by Andreas Dilger [ 23/Sep/20 ] |
Both 33371 and 35397 have been abandoned, only 34584 remains to be landed. |
| Comment by Gerrit Updater [ 25/Sep/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34584/ |
| Comment by Peter Jones [ 25/Sep/20 ] |
|
Wow. Finally. |
| Comment by Gerrit Updater [ 16/Dec/20 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/41005 |
| Comment by Gerrit Updater [ 16/Dec/20 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/41006 |
| Comment by Gerrit Updater [ 16/Dec/20 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/41007 |
| Comment by Gerrit Updater [ 16/Dec/20 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/41008 |
| Comment by Gerrit Updater [ 16/Dec/20 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/41009 |
| Comment by Gerrit Updater [ 16/Dec/20 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/41010 |
| Comment by Gerrit Updater [ 16/Dec/20 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/41011 |
| Comment by James A Simmons [ 12/Mar/21 ] |
|
Any chance to land these to 2.12 LTS? |
| Comment by Gerrit Updater [ 06/Apr/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41005/ |
| Comment by Gerrit Updater [ 06/Apr/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41006/ |
| Comment by Gerrit Updater [ 13/Sep/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/41007/ |