[LU-8221] Changes required to allow USE_LU_REF/lu_ref feature enabling and usage Created: 31/May/16 Updated: 17/Mar/20 Resolved: 17/Mar/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Bruno Faccini (Inactive) | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Description |
|
While trying to enable the USE_LU_REF/lu_ref feature (by specifying the additional "--enable-lu_ref" configure option) to help track an internal "reference" issue in one of my patches under development, I ran into pre-existing bugs directly related to the feature. They were introduced by earlier changes and went undetected because nothing is built or tested with the feature enabled. |
| Comments |
| Comment by Gerrit Updater [ 31/May/16 ] |
|
Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/20519 |
| Comment by Bruno Faccini (Inactive) [ 18/Feb/20 ] |
|
While trying to build and run with USE_LU_REF defined ("configure --enable-lu_ref") to track a leaked reference, I found that, since my previous patch (Gerrit change #20519) never had a chance to land, the same bug is still present in the current master branch and still triggers an LBUG. There are in fact 2 bugs in osc_extent_put(): it adds a lu_ref for "osc_extent" where it should delete it, and it uses LDLM_LOCK_PUT() where LDLM_LOCK_RELEASE() is needed to avoid trying to delete a nonexistent lu_ref for "handle", hence the LBUG. Here are some extracts from the "crash" tool analysis that detail the problem:
crash> dmesg | less
.......................................
[193321.462161] Lustre: DEBUG MARKER: == sanity test 13: creat .../d13/f; dd .../d13/f; > .../d13/f ======================================== 15:35:49 (1581953749)
[193321.521103] LustreError: 145892:0:(lu_ref.c:96:lu_ref_print()) lu_ref: ffff903e66732f20 7 0 ldlm_lock_new:495
[193321.535104] LustreError: 145892:0:(lu_ref.c:98:lu_ref_print()) link: hash ffff903e66732d00
[193321.557773] LustreError: 145892:0:(lu_ref.c:257:lu_ref_del()) ASSERTION( 0 ) failed:
[193321.569081] LustreError: 145892:0:(lu_ref.c:257:lu_ref_del()) LBUG
[193321.578372] Pid: 145892, comm: dd 3.10.0-862.14.4.el7_lustre_ClientSymlink_279c264.x86_64 #1 SMP Thu Oct 17 10:54:24 UTC 2019
[193321.595651] Call Trace:
[193321.600603] [<ffffffffc0f190ec>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[193321.610154] [<ffffffffc0f1919c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[193321.619154] [<ffffffffc0b0eab0>] lu_ref_set_at+0x0/0x160 [obdclass]
[193321.628279] [<ffffffffc0d89366>] osc_extent_put+0x96/0x360 [osc]
[193321.637048] [<ffffffffc0d9c29d>] osc_extent_find+0x14d1/0x1beb [osc]
[193321.646086] [<ffffffffc0d974e7>] osc_queue_async_io+0x747/0x1b30 [osc]
[193321.655219] [<ffffffffc0d7f1f2>] osc_page_cache_add+0x52/0x140 [osc]
[193321.664128] [<ffffffffc0d870b8>] osc_io_commit_async+0x2c8/0x500 [osc]
[193321.673206] [<ffffffffc0b0ca6a>] cl_io_commit_async+0x7a/0x140 [obdclass]
[193321.682573] [<ffffffffc0e456d0>] lov_io_commit_async+0x2c0/0x4c0 [lov]
[193321.691618] [<ffffffffc0b0ca6a>] cl_io_commit_async+0x7a/0x140 [obdclass]
[193321.700936] [<ffffffffc11fa373>] vvp_io_write_commit+0x173/0x8c0 [lustre]
[193321.710261] [<ffffffffc11fb149>] vvp_io_write_start+0x689/0xa70 [lustre]
[193321.719429] [<ffffffffc0b0bf08>] cl_io_start+0x68/0x130 [obdclass]
[193321.727997] [<ffffffffc0b0dcfc>] cl_io_loop+0xcc/0x1c0 [obdclass]
[193321.736441] [<ffffffffc11aa49a>] ll_file_io_generic+0x5da/0xb00 [lustre]
[193321.745535] [<ffffffffc11ab0e1>] ll_file_aio_write+0x491/0x780 [lustre]
[193321.754507] [<ffffffffc11ab4d0>] ll_file_write+0x100/0x1c0 [lustre]
[193321.763062] [<ffffffff8941f240>] vfs_write+0xc0/0x1f0
[193321.770225] [<ffffffff8942006f>] SyS_write+0x7f/0xf0
[193321.777260] [<ffffffff8992579b>] system_call_fastpath+0x22/0x27
[193321.785342] [<ffffffffffffffff>] 0xffffffffffffffff
[193321.792234] Kernel panic - not syncing: LBUG
[193321.798290] CPU: 37 PID: 145892 Comm: dd Kdump: loaded Tainted: G IOE ------------ 3.10.0-862.14.4.el7_lustre_ClientSymlink_279c264.x86_64 #1
[193321.816097] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
[193321.829044] Call Trace:
[193321.833045] [<ffffffff89913754>] dump_stack+0x19/0x1b
[193321.840032] [<ffffffff8990d29f>] panic+0xe8/0x21f
[193321.846607] [<ffffffffc0f191eb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[193321.854732] [<ffffffffc0b0eab0>] lu_ref_del+0x230/0x230 [obdclass]
[193321.862912] [<ffffffffc0d89366>] osc_extent_put+0x96/0x360 [osc]
[193321.870874] [<ffffffffc0d9c29d>] osc_extent_find+0x14d1/0x1beb [osc]
[193321.879196] [<ffffffffc0d974e7>] osc_queue_async_io+0x747/0x1b30 [osc]
[193321.887697] [<ffffffffc11fa000>] ? vvp_set_pagevec_dirty+0x390/0x390 [lustre]
[193321.896854] [<ffffffffc11fa000>] ? vvp_set_pagevec_dirty+0x390/0x390 [lustre]
[193321.905966] [<ffffffffc0d7f1f2>] osc_page_cache_add+0x52/0x140 [osc]
[193321.914181] [<ffffffffc0d870b8>] osc_io_commit_async+0x2c8/0x500 [osc]
[193321.922575] [<ffffffffc11fa000>] ? vvp_set_pagevec_dirty+0x390/0x390 [lustre]
[193321.931627] [<ffffffffc11fa000>] ? vvp_set_pagevec_dirty+0x390/0x390 [lustre]
[193321.940651] [<ffffffffc0b0ca6a>] cl_io_commit_async+0x7a/0x140 [obdclass]
[193321.949254] [<ffffffffc0e456d0>] lov_io_commit_async+0x2c0/0x4c0 [lov]
[193321.957566] [<ffffffffc11fa000>] ? vvp_set_pagevec_dirty+0x390/0x390 [lustre]
[193321.966566] [<ffffffffc11fa000>] ? vvp_set_pagevec_dirty+0x390/0x390 [lustre]
[193321.975764] [<ffffffffc0b0ca6a>] cl_io_commit_async+0x7a/0x140 [obdclass]
[193321.984547] [<ffffffffc11fa373>] vvp_io_write_commit+0x173/0x8c0 [lustre]
[193321.993304] [<ffffffffc11fb149>] vvp_io_write_start+0x689/0xa70 [lustre]
[193322.001967] [<ffffffffc0b0bf08>] cl_io_start+0x68/0x130 [obdclass]
[193322.010053] [<ffffffffc0b0dcfc>] cl_io_loop+0xcc/0x1c0 [obdclass]
[193322.018018] [<ffffffffc11aa49a>] ll_file_io_generic+0x5da/0xb00 [lustre]
[193322.026674] [<ffffffffc11ab0e1>] ll_file_aio_write+0x491/0x780 [lustre]
[193322.035235] [<ffffffffc11ab4d0>] ll_file_write+0x100/0x1c0 [lustre]
[193322.043390] [<ffffffff8941f240>] vfs_write+0xc0/0x1f0
[193322.050190] [<ffffffff8942006f>] SyS_write+0x7f/0xf0
[193322.056876] [<ffffffff8992579b>] system_call_fastpath+0x22/0x27
(END)
crash> bt
PID: 145892 TASK: ffff903f28371fa0 CPU: 37 COMMAND: "dd"
#0 [ffff903e667d3418] machine_kexec at ffffffff89262a0a
#1 [ffff903e667d3478] __crash_kexec at ffffffff893166c2
#2 [ffff903e667d3548] panic at ffffffff8990d2aa
#3 [ffff903e667d35c8] lbug_with_loc at ffffffffc0f191eb [libcfs]
#4 [ffff903e667d3620] osc_extent_put at ffffffffc0d89366 [osc]
#5 [ffff903e667d3640] osc_extent_find at ffffffffc0d9c29d [osc]
#6 [ffff903e667d37d0] osc_queue_async_io at ffffffffc0d974e7 [osc]
#7 [ffff903e667d3938] osc_page_cache_add at ffffffffc0d7f1f2 [osc]
#8 [ffff903e667d3968] osc_io_commit_async at ffffffffc0d870b8 [osc]
#9 [ffff903e667d39e0] cl_io_commit_async at ffffffffc0b0ca6a [obdclass]
#10 [ffff903e667d3a28] lov_io_commit_async at ffffffffc0e456d0 [lov]
#11 [ffff903e667d3a88] cl_io_commit_async at ffffffffc0b0ca6a [obdclass]
#12 [ffff903e667d3ad0] vvp_io_write_commit at ffffffffc11fa373 [lustre]
#13 [ffff903e667d3b28] vvp_io_write_start at ffffffffc11fb149 [lustre]
#14 [ffff903e667d3bf8] cl_io_start at ffffffffc0b0bf08 [obdclass]
#15 [ffff903e667d3c20] cl_io_loop at ffffffffc0b0dcfc [obdclass]
#16 [ffff903e667d3c50] ll_file_io_generic at ffffffffc11aa49a [lustre]
#17 [ffff903e667d3d58] ll_file_aio_write at ffffffffc11ab0e1 [lustre]
#18 [ffff903e667d3de0] ll_file_write at ffffffffc11ab4d0 [lustre]
#19 [ffff903e667d3ec8] vfs_write at ffffffff8941f240
#20 [ffff903e667d3f08] sys_write at ffffffff8942006f
#21 [ffff903e667d3f50] system_call_fastpath at ffffffff8992579b
RIP: 00007feb0b2d19b0 RSP: 00007ffed85aaaa8 RFLAGS: 00010206
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000010000
RDX: 0000000000000200 RSI: 0000000001e4e000 RDI: 0000000000000001
RBP: 0000000000000000 R8: 0000000001e4e1f0 R9: 0000000000000000
R10: 00007ffed85aa4a0 R11: 0000000000000246 R12: 0000000000000200
R13: 0000000001e4e000 R14: 0000000001e4e200 R15: 0000000000000800
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash>
So I will rebase #20519 and hope reviewers will have a look.
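To illustrate the failure mode described in this comment, here is a minimal sketch of a scope-tagged reference tracker. It is not Lustre code: `struct toy_ref`, `toy_ref_add()`, `toy_ref_del()`, `toy_put_buggy()` and `toy_put_fixed()` are hypothetical names invented for the illustration, and the real lu_ref implementation differs (in particular, it asserts and LBUGs where this sketch returns false). The sketch only assumes what the comment states: the release path added an "osc_extent" entry where it should have deleted one, and then tried to delete a "handle" entry that had never been added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_REF_MAX 8

/* Hypothetical stand-in for a debug reference tracker (not struct lu_ref). */
struct toy_ref {
	const char *scope[TOY_REF_MAX];	/* tag of each tracked reference */
	size_t      count;		/* number of tracked references */
};

/* Record one reference tagged with scope; false if the table is full. */
static bool toy_ref_add(struct toy_ref *ref, const char *scope)
{
	assert(ref != NULL && scope != NULL);
	if (ref->count == TOY_REF_MAX)
		return false;
	ref->scope[ref->count++] = scope;
	return true;
}

/*
 * Remove one reference tagged with scope; false if no such entry exists.
 * This sketch reports the mismatch to the caller instead of asserting.
 */
static bool toy_ref_del(struct toy_ref *ref, const char *scope)
{
	assert(ref != NULL && scope != NULL);
	for (size_t i = 0; i < ref->count; i++) {
		if (strcmp(ref->scope[i], scope) == 0) {
			/* Fill the hole with the last entry. */
			ref->scope[i] = ref->scope[--ref->count];
			return true;
		}
	}
	return false;
}

/* Buggy release path, modelled on the two mistakes described above. */
static bool toy_put_buggy(struct toy_ref *lock_ref)
{
	toy_ref_add(lock_ref, "osc_extent");	/* should have been a delete */
	return toy_ref_del(lock_ref, "handle");	/* entry was never added */
}

/* Fixed release path: delete exactly the entry the take path added. */
static bool toy_put_fixed(struct toy_ref *lock_ref)
{
	return toy_ref_del(lock_ref, "osc_extent");
}
```

With one "osc_extent" entry recorded beforehand, the buggy path fails on the "handle" delete and leaves two "osc_extent" entries behind, while the fixed path succeeds and leaves the tracker empty.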
|
| Comment by Gerrit Updater [ 17/Mar/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/20519/ |
| Comment by Peter Jones [ 17/Mar/20 ] |
|
Landed for 2.14 |