[LU-11276] racer: mdc_dev.c:1346:mdc_req_attr_set()) uncovered page Created: 22/Aug/18 Updated: 10/Nov/23 Resolved: 26/Sep/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.14.0, Lustre 2.12.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | DoM2, LTS12 | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||||||
| Description |
|
Saw this in master-next on Ag 12th first once, and then in the past two day many more occurences all of a sudden.
[27046.472503] LustreError: 12291:0:(ldlm_resource.c:1725:ldlm_resource_dump()) --- Resource: [0x200000406:0xe97:0x0].0x0 (ffff88000cdd1c80) refcount = 4 [27046.474149] LustreError: 12291:0:(ldlm_resource.c:1728:ldlm_resource_dump()) Granted locks (in reverse order): [27046.475547] LustreError: 12291:0:(ldlm_resource.c:1731:ldlm_resource_dump()) ### ### ns: lustre-MDT0000-mdc-ffff880050df3800 lock: ffff880008d22d80/0x85f55ba2c9d6a1c8 lrc: 3/1,0 mode: CR/CR res: [0x200000406:0xe97:0x0].0x0 bits 0x8/0x0 rrc: 5 type: IBT flags: 0x0 nid: local remote: 0x85f55ba2c9d6a1cf expref: -99 pid: 12291 timeout: 0 lvb_type: 3 [27046.479190] LustreError: 12291:0:(ldlm_resource.c:1731:ldlm_resource_dump()) ### ### ns: lustre-MDT0000-mdc-ffff880050df3800 lock: ffff8800a46fad80/0x85f55ba2c9d6a040 lrc: 1/0,0 mode: PR/PR res: [0x200000406:0xe97:0x0].0x0 bits 0x20/0x0 rrc: 5 type: IBT flags: 0x0 nid: local remote: 0x85f55ba2c9d6a04e expref: -99 pid: 12291 timeout: 0 lvb_type: 0 [27046.482737] Pid: 12291, comm: dd 3.10.0-7.5-debug #2 SMP Sat Jun 2 18:41:57 EDT 2018 [27046.484162] Call Trace: [27046.484763] [<ffffffffa01b77dc>] libcfs_call_trace+0x8c/0xc0 [libcfs] [27046.485651] [<ffffffffa01b7836>] libcfs_debug_dumpstack+0x26/0x30 [libcfs] [27046.486498] [<ffffffffa088781b>] mdc_get_lock_handle+0xcb/0xe0 [mdc] [27046.487299] [<ffffffffa0887c50>] mdc_req_attr_set+0x90/0x170 [mdc] [27046.487994] [<ffffffffa0379170>] cl_req_attr_set+0x60/0x150 [obdclass] [27046.488818] [<ffffffffa07fb4e6>] osc_build_rpc+0x496/0xfe0 [osc] [27046.489621] [<ffffffffa0815597>] osc_io_unplug0+0xe17/0x1920 [osc] [27046.490470] [<ffffffffa081a050>] osc_cache_writeback_range+0xc20/0x1260 [osc] [27046.491804] [<ffffffffa0884c96>] mdc_io_fsync_start+0x56/0x170 [mdc] [27046.493774] [<ffffffffa03793e5>] cl_io_start+0x65/0x130 [obdclass] [27046.494603] [<ffffffffa08accd5>] lov_io_call.isra.5+0x85/0x140 [lov] [27046.495669] [<ffffffffa08ace96>] lov_io_start+0x56/0x150 [lov] [27046.496550] [<ffffffffa03793e5>] cl_io_start+0x65/0x130 [obdclass] [27046.497361] [<ffffffffa037b6a1>] cl_io_loop+0x131/0xb30 [obdclass] [27046.498087] [<ffffffffa144e9eb>] cl_sync_file_range+0x2db/0x380 [lustre] [27046.500281] [<ffffffffa1494fd4>] vvp_prune+0x64/0x2b0 [lustre] [27046.501184] [<ffffffffa03718e5>] cl_object_prune+0x55/0x110 [obdclass] [27046.503100] [<ffffffffa08c1e01>] lov_conf_set+0x531/0xa70 [lov] [27046.504495] [<ffffffffa0371a00>] cl_conf_set+0x60/0x120 [obdclass] [27046.505423] [<ffffffffa1450e81>] ll_layout_conf+0x81/0x410 [lustre] [27046.507428] [<ffffffffa14517c5>] ll_layout_refresh+0x5b5/0xa60 [lustre] [27046.508276] [<ffffffffa149468f>] vvp_io_init+0x35f/0x460 [lustre] [27046.509231] [<ffffffffa03797e8>] cl_io_init0.isra.17+0x88/0x160 [obdclass] [27046.526478] [<ffffffffa0379983>] cl_io_init+0x43/0x80 [obdclass] [27046.527149] [<ffffffffa037a383>] cl_io_rw_init+0x93/0x2a0 [obdclass] [27046.527843] [<ffffffffa1447d3d>] ll_file_io_generic+0x20d/0xd10 [lustre] [27046.528490] [<ffffffffa1448db1>] ll_file_aio_write+0x391/0x830 [lustre] [27046.529163] [<ffffffffa14492f4>] ll_file_write+0xa4/0x170 [lustre] [27046.529567] [<ffffffff8120daad>] vfs_write+0xbd/0x1e0 [27046.530010] [<ffffffff8120e8cf>] SyS_write+0x7f/0xf0 [27046.530450] [<ffffffff817838e9>] system_call_fastpath+0x16/0x1b [27046.530895] [<ffffffffffffffff>] 0xffffffffffffffff [27046.531358] LustreError: 12291:0:(mdc_dev.c:1346:mdc_req_attr_set()) page@ffff88000dd4ae00[2 ffff880018299718 2 1 (null)] [27046.532734] LustreError: 12291:0:(mdc_dev.c:1346:mdc_req_attr_set()) vvp-page@ffff88000dd4ae50(0:0) vm@ffffea00020884e0 1fffff0000282c 2:0 ffff88000dd4ae00 0 lru [27046.534974] LustreError: 12291:0:(mdc_dev.c:1346:mdc_req_attr_set()) lov-page@ffff88000dd4ae90, comp index: 0, gen: 2 [27046.536852] LustreError: 12291:0:(mdc_dev.c:1346:mdc_req_attr_set()) osc-page@ffff88000dd4aec8 0: 1< 0x845fed 258 0 + + > 2< 0 0 4096 0x5 0x520 | (null) ffff88000990ac48 ffff880008e60e60 > 3< 1 64 0 > 4< 0 0 8 11886592 - | - - + - > 5< - - + - | 0 - | 0 - -> [27046.540751] LustreError: 12291:0:(mdc_dev.c:1346:mdc_req_attr_set()) end page@ffff88000dd4ae00 [27046.542709] LustreError: 12291:0:(mdc_dev.c:1346:mdc_req_attr_set()) uncovered page! [27046.544045] LustreError: 12291:0:(mdc_dev.c:1347:mdc_req_attr_set()) LBUG [27046.544824] Pid: 12291, comm: dd 3.10.0-7.5-debug #2 SMP Sat Jun 2 18:41:57 EDT 2018 [27046.546276] Call Trace: [27046.546712] [<ffffffffa01b77dc>] libcfs_call_trace+0x8c/0xc0 [libcfs] [27046.547165] [<ffffffffa01b788c>] lbug_with_loc+0x4c/0xa0 [libcfs] [27046.547704] [<ffffffffa0887d24>] mdc_req_attr_set+0x164/0x170 [mdc] [27046.548206] [<ffffffffa0379170>] cl_req_attr_set+0x60/0x150 [obdclass] [27046.548675] [<ffffffffa07fb4e6>] osc_build_rpc+0x496/0xfe0 [osc] [27046.549099] [<ffffffffa0815597>] osc_io_unplug0+0xe17/0x1920 [osc] [27046.549834] [<ffffffffa081a050>] osc_cache_writeback_range+0xc20/0x1260 [osc] [27046.550585] [<ffffffffa0884c96>] mdc_io_fsync_start+0x56/0x170 [mdc] [27046.551013] [<ffffffffa03793e5>] cl_io_start+0x65/0x130 [obdclass] [27046.551433] [<ffffffffa08accd5>] lov_io_call.isra.5+0x85/0x140 [lov] [27046.551839] [<ffffffffa08ace96>] lov_io_start+0x56/0x150 [lov] [27046.552439] [<ffffffffa03793e5>] cl_io_start+0x65/0x130 [obdclass] [27046.553102] [<ffffffffa037b6a1>] cl_io_loop+0x131/0xb30 [obdclass] [27046.553823] [<ffffffffa144e9eb>] cl_sync_file_range+0x2db/0x380 [lustre] [27046.554653] [<ffffffffa1494fd4>] vvp_prune+0x64/0x2b0 [lustre] [27046.555426] [<ffffffffa03718e5>] cl_object_prune+0x55/0x110 [obdclass] [27046.556088] [<ffffffffa08c1e01>] lov_conf_set+0x531/0xa70 [lov] [27046.556528] [<ffffffffa0371a00>] cl_conf_set+0x60/0x120 [obdclass] [27046.556978] [<ffffffffa1450e81>] ll_layout_conf+0x81/0x410 [lustre] [27046.557386] [<ffffffffa14517c5>] ll_layout_refresh+0x5b5/0xa60 [lustre] [27046.557836] [<ffffffffa149468f>] vvp_io_init+0x35f/0x460 [lustre] [27046.558475] [<ffffffffa03797e8>] cl_io_init0.isra.17+0x88/0x160 [obdclass] [27046.559128] [<ffffffffa0379983>] cl_io_init+0x43/0x80 [obdclass] [27046.559655] [<ffffffffa037a383>] cl_io_rw_init+0x93/0x2a0 [obdclass] [27046.560233] [<ffffffffa1447d3d>] ll_file_io_generic+0x20d/0xd10 [lustre] [27046.560650] [<ffffffffa1448db1>] ll_file_aio_write+0x391/0x830 [lustre] [27046.561267] [<ffffffffa14492f4>] ll_file_write+0xa4/0x170 [lustre] [27046.561948] [<ffffffff8120daad>] vfs_write+0xbd/0x1e0 [27046.562603] [<ffffffff8120e8cf>] SyS_write+0x7f/0xf0 [27046.563264] [<ffffffff817838e9>] system_call_fastpath+0x16/0x1b [27046.563704] [<ffffffffffffffff>] 0xffffffffffffffff [27046.564385] Kernel panic - not syncing: LBUG |
| Comments |
| Comment by Gerrit Updater [ 07/Sep/18 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33125 |
| Comment by Gerrit Updater [ 10/Sep/18 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33129 |
| Comment by Gerrit Updater [ 12/Oct/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33125/ |
| Comment by Andrew Perepechko [ 04/Mar/19 ] |
With this patch, unlinkmany (after createmany) performance within a DoM directory drops by ~35%: HEAD is now at 70a01a6... LU-11276 ldlm: don't apply ELC to converting and DOM locks total: 20000 unlinks in 52 seconds: 384.615387 unlinks/second HEAD is now at 7237248... LU-11347 osd: do not use pagecache for I/O total: 20000 unlinks in 34 seconds: 588.235291 unlinks/second HEAD is now at 8b9105d... LU-11199 mdt: Attempt lookup lock on open total: 20000 unlinks in 34 seconds: 588.235291 unlinks/second HEAD is now at 697e8fe... LU-11473 doc: add lfs-getsom man page total: 20000 unlinks in 34 seconds: 588.235291 unlinks/second HEAD is now at ed0c19d... LU-1095 misc: quiet console messages at startup Is there a way to improve it to keep unlinks faster? |
| Comment by Mikhail Pershin [ 05/Mar/19 ] |
|
Andrew, I will check that, thanks for this finding. |
| Comment by Andrew Perepechko [ 05/Mar/19 ] |
|
Thank you Mikhail! |
| Comment by Gerrit Updater [ 25/Mar/19 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34500 |
| Comment by Mikhail Pershin [ 27/Mar/19 ] |
|
Andrew, I've created separate ticket for problem you've reported - |
| Comment by Gerrit Updater [ 16/Oct/19 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36466 |
| Comment by Gerrit Updater [ 23/Jan/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36466/ |
| Comment by Andreas Dilger [ 04/Jun/20 ] |
|
Can this ticket be resolved for 2.14.0 with the landing of patch 36466? Is patch 34500 still needed or is it obsolete? |
| Comment by Mikhail Pershin [ 05/Jun/20 ] |
|
yes, it can be set as resolved. Patch 34500 is just optimisation and not related to this particular issue, if I will refresh it, I'll create separate ticket for it |
| Comment by Gerrit Updater [ 09/Sep/20 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39854 |
| Comment by Gerrit Updater [ 22/Oct/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39854/ |
| Comment by Alex Zhuravlev [ 01/Nov/22 ] |
[ 1851.605177] LustreError: 330093:0:(mdc_dev.c:1428:mdc_req_attr_set()) uncovered page! [ 1851.605222] LustreError: 330093:0:(mdc_dev.c:1429:mdc_req_attr_set()) LBUG [ 1851.605260] Pid: 330093, comm: kworker/u4:0 4.18.0 #2 SMP Sun Oct 23 17:58:04 UTC 2022 [ 1851.605305] Call Trace TBD: [ 1851.605329] [<0>] libcfs_call_trace+0x67/0x90 [libcfs] [ 1851.605373] [<0>] lbug_with_loc+0x3e/0x80 [libcfs] [ 1851.605416] [<0>] mdc_req_attr_set+0x153/0x160 [mdc] [ 1851.605482] [<0>] cl_req_attr_set+0x56/0x150 [obdclass] [ 1851.605533] [<0>] osc_build_rpc+0x3e0/0x1210 [osc] [ 1851.605718] [<0>] osc_io_unplug0+0x167e/0x18b0 [osc] [ 1851.605773] [<0>] osc_cache_writeback_range+0xf05/0xfa0 [osc] [ 1851.605826] [<0>] mdc_io_fsync_start+0x4a/0x170 [mdc] [ 1851.605886] [<0>] cl_io_start+0x55/0x110 [obdclass] [ 1851.605935] [<0>] lov_io_call.isra.9+0x7c/0x140 [lov] [ 1851.605994] [<0>] cl_io_start+0x55/0x110 [obdclass] [ 1851.606057] [<0>] cl_io_loop+0x95/0x200 [obdclass] [ 1851.606114] [<0>] cl_sync_file_range+0x27c/0x2f0 [lustre] [ 1851.606169] [<0>] ll_writepages+0x6e/0x1e0 [lustre] [ 1851.606214] [<0>] do_writepages+0x29/0xc0 [ 1851.606237] [<0>] __writeback_single_inode+0x30/0x190 [ 1851.606268] [<0>] writeback_sb_inodes+0x1c0/0x3d0 [ 1851.606299] [<0>] __writeback_inodes_wb+0x5a/0xc0 [ 1851.606329] [<0>] wb_writeback+0x1b7/0x1c0 [ 1851.606352] [<0>] wb_workfn+0x1da/0x320 [ 1851.606376] [<0>] process_one_work+0x23f/0x420 [ 1851.606407] [<0>] worker_thread+0x35/0x3e0 old evil returns in racer? |
| Comment by Alex Zhuravlev [ 10/Nov/23 ] |
|
still hitting in racer: [ 1885.498891] LustreError: 70454:0:(mdc_dev.c:1449:mdc_req_attr_set()) page@000000003d88dd0e[2 00000000b08be5c2 3 1 000000004fac1a50] [ 1885.499034] LustreError: 70454:0:(mdc_dev.c:1449:mdc_req_attr_set()) vmpage @00000000c1af35f0 2480000005028 2:0 ffffa218584fdd98 0 lru [ 1885.499181] LustreError: 70454:0:(mdc_dev.c:1449:mdc_req_attr_set()) osc-page@0000000071245808 0: 1< 2 + + > 2< 0 0 52 0x5 0x40420 | 000000004fac1a50 00000000a2ae15e2 00000000ab94b77b > 3< 1 0 > 4< 0 0 8 30150656 - | - - + - > 5< - - + - | 0 - | 0 - -> [ 1885.499444] LustreError: 70454:0:(mdc_dev.c:1449:mdc_req_attr_set()) end page@000000003d88dd0e [ 1885.499560] LustreError: 70454:0:(mdc_dev.c:1449:mdc_req_attr_set()) uncovered page! [ 1885.499656] LustreError: 70454:0:(mdc_dev.c:1450:mdc_req_attr_set()) LBUG [ 1885.499736] Pid: 70454, comm: kworker/u8:5 4.18.0 #2 SMP Sun Oct 23 17:58:04 UTC 2022 [ 1885.499829] Call Trace TBD: [ 1885.499876] [<0>] libcfs_call_trace+0x5e/0x80 [libcfs] [ 1885.499946] [<0>] lbug_with_loc+0x3e/0x80 [libcfs] [ 1885.500036] [<0>] mdc_req_attr_set+0x150/0x150 [mdc] [ 1885.500126] [<0>] cl_req_attr_set+0x56/0x150 [obdclass] [ 1885.500203] [<0>] osc_build_rpc+0x448/0x1340 [osc] [ 1885.500277] [<0>] osc_io_unplug0+0x16f9/0x1930 [osc] [ 1885.500351] [<0>] osc_cache_writeback_range+0xf05/0xfa0 [osc] [ 1885.500438] [<0>] mdc_io_fsync_start+0xbb/0x240 [mdc] [ 1885.500527] [<0>] cl_io_start+0x55/0x110 [obdclass] [ 1885.500604] [<0>] lov_io_call.isra.9+0x6a/0x150 [lov] [ 1885.500688] [<0>] cl_io_start+0x55/0x110 [obdclass] [ 1885.500775] [<0>] cl_io_loop+0x95/0x200 [obdclass] [ 1885.500859] [<0>] cl_sync_file_range+0x27c/0x2f0 [lustre] [ 1885.500950] [<0>] ll_writepages+0xb3/0x260 [lustre] [ 1885.501021] [<0>] do_writepages+0x29/0xc0 [ 1885.501072] [<0>] __writeback_single_inode+0x30/0x190 [ 1885.501138] [<0>] writeback_sb_inodes+0x1c0/0x3d0 [ 1885.501203] [<0>] __writeback_inodes_wb+0x5a/0xc0 [ 1885.501268] [<0>] wb_writeback+0x1b7/0x1c0 [ 1885.501317] [<0>] wb_workfn+0x1da/0x320 [ 1885.501368] [<0>] process_one_work+0x23f/0x420 [ 1885.501433] [<0>] worker_thread+0x35/0x3e0 [ 1885.501482] [<0>] kthread+0x129/0x140 [ 1885.501540] [<0>] ret_from_fork+0x1f/0x30 |