[LU-8221] Changes required to allow USE_LU_REF/lu_ref feature enabling and usage Created: 31/May/16  Updated: 17/Mar/20  Resolved: 17/Mar/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Bruno Faccini (Inactive) Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Cloners
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When trying to enable (with the additional "--enable-lu_ref" configure option) and use the USE_LU_REF/lu_ref feature to help track an internal "reference" issue in one of my patches under development, I ran into pre-existing bugs directly related to the feature. They were introduced by earlier changes and went undetected because nothing is built or tested with the feature enabled.



 Comments   
Comment by Gerrit Updater [ 31/May/16 ]

Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/20519
Subject: LU-8221 clio: required changes for lu_ref feature to work
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: de559bfd52e07c8993800a6f03b622fa50b2d2c7

Comment by Bruno Faccini (Inactive) [ 18/Feb/20 ]

While trying to build and run with USE_LU_REF defined ("configure --enable-lu_ref") to track a leaked reference, I found that, since my previous patch (Gerrit change #20519) never had a chance to land, the same bug is still present in the current master branch and still triggers an LBUG! There are in fact 2 bugs in osc_extent_put(): it adds a lu_ref for "osc_extent" where it should delete it, and it uses LDLM_LOCK_PUT() where LDLM_LOCK_RELEASE() should be used so as not to try to delete a nonexistent lu_ref for "handle", hence the LBUG.
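To make the two mistakes described above concrete, here is a minimal toy model of scope/source reference tracking. It is only a sketch under stated assumptions: every name in it (toy_ref, toy_ref_add, toy_ref_del, toy_put_buggy, toy_put_fixed) is hypothetical and it is not Lustre's lu_ref implementation. In particular, the toy delete returns -1 when no matching link exists, whereas the real tracker hits an assertion at that point, as the lu_ref_del() LBUG in the log below shows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_LINKS 16

/* One tracked reference: why it is held (scope) and by whom (source). */
struct toy_link {
	const char *scope;
	const void *source;
};

/* Hypothetical stand-in for a reference tracker; NOT Lustre's struct lu_ref. */
struct toy_ref {
	struct toy_link links[TOY_MAX_LINKS];
	size_t count;
};

static void toy_ref_init(struct toy_ref *ref)
{
	ref->count = 0;
}

/* Record that `source` holds a reference for the reason named by `scope`. */
static void toy_ref_add(struct toy_ref *ref, const char *scope,
			const void *source)
{
	assert(ref->count < TOY_MAX_LINKS);
	ref->links[ref->count].scope = scope;
	ref->links[ref->count].source = source;
	ref->count++;
}

/*
 * Remove the link matching (scope, source).
 * Returns 0 on success, -1 if no such link was ever recorded
 * (the toy equivalent of the assertion failure in the ticket).
 */
static int toy_ref_del(struct toy_ref *ref, const char *scope,
		       const void *source)
{
	for (size_t i = 0; i < ref->count; i++) {
		if (ref->links[i].source == source &&
		    strcmp(ref->links[i].scope, scope) == 0) {
			/* Fill the hole with the last link and shrink. */
			ref->links[i] = ref->links[ref->count - 1];
			ref->count--;
			return 0;
		}
	}
	return -1;
}

/*
 * Put path with both mistakes from the ticket: it adds an "osc_extent"
 * link where it should delete one, then tries to delete a "handle"
 * link that was never recorded.
 */
static int toy_put_buggy(struct toy_ref *ref, const void *ext,
			 const void *task)
{
	toy_ref_add(ref, "osc_extent", ext);
	return toy_ref_del(ref, "handle", task);
}

/* Corrected put path: delete the "osc_extent" link, leave "handle" alone. */
static int toy_put_fixed(struct toy_ref *ref, const void *ext)
{
	return toy_ref_del(ref, "osc_extent", ext);
}
```

With one "osc_extent" link recorded beforehand, toy_put_buggy() fails and leaves two stale links behind, while toy_put_fixed() succeeds and leaves the tracker empty.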

Here are some extracts from a "crash" tool analysis that detail the problem:

crash> dmesg | less
.......................................
[193321.462161] Lustre: DEBUG MARKER: == sanity test 13: creat .../d13/f; dd .../d13/f; > .../d13/f ======================================== 15:35:49 (1581953749)
[193321.521103] LustreError: 145892:0:(lu_ref.c:96:lu_ref_print()) lu_ref: ffff903e66732f20 7 0 ldlm_lock_new:495
[193321.535104] LustreError: 145892:0:(lu_ref.c:98:lu_ref_print())      link: hash ffff903e66732d00
[193321.557773] LustreError: 145892:0:(lu_ref.c:257:lu_ref_del()) ASSERTION( 0 ) failed: 
[193321.569081] LustreError: 145892:0:(lu_ref.c:257:lu_ref_del()) LBUG
[193321.578372] Pid: 145892, comm: dd 3.10.0-862.14.4.el7_lustre_ClientSymlink_279c264.x86_64 #1 SMP Thu Oct 17 10:54:24 UTC 2019
[193321.595651] Call Trace:
[193321.600603]  [<ffffffffc0f190ec>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[193321.610154]  [<ffffffffc0f1919c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[193321.619154]  [<ffffffffc0b0eab0>] lu_ref_set_at+0x0/0x160 [obdclass]
[193321.628279]  [<ffffffffc0d89366>] osc_extent_put+0x96/0x360 [osc]
[193321.637048]  [<ffffffffc0d9c29d>] osc_extent_find+0x14d1/0x1beb [osc]
[193321.646086]  [<ffffffffc0d974e7>] osc_queue_async_io+0x747/0x1b30 [osc]
[193321.655219]  [<ffffffffc0d7f1f2>] osc_page_cache_add+0x52/0x140 [osc]
[193321.664128]  [<ffffffffc0d870b8>] osc_io_commit_async+0x2c8/0x500 [osc]
[193321.673206]  [<ffffffffc0b0ca6a>] cl_io_commit_async+0x7a/0x140 [obdclass]
[193321.682573]  [<ffffffffc0e456d0>] lov_io_commit_async+0x2c0/0x4c0 [lov]
[193321.691618]  [<ffffffffc0b0ca6a>] cl_io_commit_async+0x7a/0x140 [obdclass]
[193321.700936]  [<ffffffffc11fa373>] vvp_io_write_commit+0x173/0x8c0 [lustre]
[193321.710261]  [<ffffffffc11fb149>] vvp_io_write_start+0x689/0xa70 [lustre]
[193321.719429]  [<ffffffffc0b0bf08>] cl_io_start+0x68/0x130 [obdclass]
[193321.727997]  [<ffffffffc0b0dcfc>] cl_io_loop+0xcc/0x1c0 [obdclass]
[193321.736441]  [<ffffffffc11aa49a>] ll_file_io_generic+0x5da/0xb00 [lustre]
[193321.745535]  [<ffffffffc11ab0e1>] ll_file_aio_write+0x491/0x780 [lustre]
[193321.754507]  [<ffffffffc11ab4d0>] ll_file_write+0x100/0x1c0 [lustre]
[193321.763062]  [<ffffffff8941f240>] vfs_write+0xc0/0x1f0
[193321.770225]  [<ffffffff8942006f>] SyS_write+0x7f/0xf0
[193321.777260]  [<ffffffff8992579b>] system_call_fastpath+0x22/0x27
[193321.785342]  [<ffffffffffffffff>] 0xffffffffffffffff
[193321.792234] Kernel panic - not syncing: LBUG
[193321.798290] CPU: 37 PID: 145892 Comm: dd Kdump: loaded Tainted: G          IOE  ------------   3.10.0-862.14.4.el7_lustre_ClientSymlink_279c264.x86_64 #1
[193321.816097] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
[193321.829044] Call Trace:
[193321.833045]  [<ffffffff89913754>] dump_stack+0x19/0x1b
[193321.840032]  [<ffffffff8990d29f>] panic+0xe8/0x21f
[193321.846607]  [<ffffffffc0f191eb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[193321.854732]  [<ffffffffc0b0eab0>] lu_ref_del+0x230/0x230 [obdclass]
[193321.862912]  [<ffffffffc0d89366>] osc_extent_put+0x96/0x360 [osc]
[193321.870874]  [<ffffffffc0d9c29d>] osc_extent_find+0x14d1/0x1beb [osc]
[193321.879196]  [<ffffffffc0d974e7>] osc_queue_async_io+0x747/0x1b30 [osc]
[193321.887697]  [<ffffffffc11fa000>] ? vvp_set_pagevec_dirty+0x390/0x390 [lustre]
[193321.896854]  [<ffffffffc11fa000>] ? vvp_set_pagevec_dirty+0x390/0x390 [lustre]
[193321.905966]  [<ffffffffc0d7f1f2>] osc_page_cache_add+0x52/0x140 [osc]
[193321.914181]  [<ffffffffc0d870b8>] osc_io_commit_async+0x2c8/0x500 [osc]
[193321.922575]  [<ffffffffc11fa000>] ? vvp_set_pagevec_dirty+0x390/0x390 [lustre]
[193321.931627]  [<ffffffffc11fa000>] ? vvp_set_pagevec_dirty+0x390/0x390 [lustre]
[193321.940651]  [<ffffffffc0b0ca6a>] cl_io_commit_async+0x7a/0x140 [obdclass]
[193321.949254]  [<ffffffffc0e456d0>] lov_io_commit_async+0x2c0/0x4c0 [lov]
[193321.957566]  [<ffffffffc11fa000>] ? vvp_set_pagevec_dirty+0x390/0x390 [lustre]
[193321.966566]  [<ffffffffc11fa000>] ? vvp_set_pagevec_dirty+0x390/0x390 [lustre]
[193321.975764]  [<ffffffffc0b0ca6a>] cl_io_commit_async+0x7a/0x140 [obdclass]
[193321.984547]  [<ffffffffc11fa373>] vvp_io_write_commit+0x173/0x8c0 [lustre]
[193321.993304]  [<ffffffffc11fb149>] vvp_io_write_start+0x689/0xa70 [lustre]
[193322.001967]  [<ffffffffc0b0bf08>] cl_io_start+0x68/0x130 [obdclass]
[193322.010053]  [<ffffffffc0b0dcfc>] cl_io_loop+0xcc/0x1c0 [obdclass]
[193322.018018]  [<ffffffffc11aa49a>] ll_file_io_generic+0x5da/0xb00 [lustre]
[193322.026674]  [<ffffffffc11ab0e1>] ll_file_aio_write+0x491/0x780 [lustre]
[193322.035235]  [<ffffffffc11ab4d0>] ll_file_write+0x100/0x1c0 [lustre]
[193322.043390]  [<ffffffff8941f240>] vfs_write+0xc0/0x1f0
[193322.050190]  [<ffffffff8942006f>] SyS_write+0x7f/0xf0
[193322.056876]  [<ffffffff8992579b>] system_call_fastpath+0x22/0x27
crash> bt
PID: 145892  TASK: ffff903f28371fa0  CPU: 37  COMMAND: "dd"
 #0 [ffff903e667d3418] machine_kexec at ffffffff89262a0a
 #1 [ffff903e667d3478] __crash_kexec at ffffffff893166c2
 #2 [ffff903e667d3548] panic at ffffffff8990d2aa
 #3 [ffff903e667d35c8] lbug_with_loc at ffffffffc0f191eb [libcfs]
 #4 [ffff903e667d3620] osc_extent_put at ffffffffc0d89366 [osc]
 #5 [ffff903e667d3640] osc_extent_find at ffffffffc0d9c29d [osc]
 #6 [ffff903e667d37d0] osc_queue_async_io at ffffffffc0d974e7 [osc]
 #7 [ffff903e667d3938] osc_page_cache_add at ffffffffc0d7f1f2 [osc]
 #8 [ffff903e667d3968] osc_io_commit_async at ffffffffc0d870b8 [osc]
 #9 [ffff903e667d39e0] cl_io_commit_async at ffffffffc0b0ca6a [obdclass]
#10 [ffff903e667d3a28] lov_io_commit_async at ffffffffc0e456d0 [lov]
#11 [ffff903e667d3a88] cl_io_commit_async at ffffffffc0b0ca6a [obdclass]
#12 [ffff903e667d3ad0] vvp_io_write_commit at ffffffffc11fa373 [lustre]
#13 [ffff903e667d3b28] vvp_io_write_start at ffffffffc11fb149 [lustre]
#14 [ffff903e667d3bf8] cl_io_start at ffffffffc0b0bf08 [obdclass]
#15 [ffff903e667d3c20] cl_io_loop at ffffffffc0b0dcfc [obdclass]
#16 [ffff903e667d3c50] ll_file_io_generic at ffffffffc11aa49a [lustre]
#17 [ffff903e667d3d58] ll_file_aio_write at ffffffffc11ab0e1 [lustre]
#18 [ffff903e667d3de0] ll_file_write at ffffffffc11ab4d0 [lustre]
#19 [ffff903e667d3ec8] vfs_write at ffffffff8941f240
#20 [ffff903e667d3f08] sys_write at ffffffff8942006f
#21 [ffff903e667d3f50] system_call_fastpath at ffffffff8992579b
    RIP: 00007feb0b2d19b0  RSP: 00007ffed85aaaa8  RFLAGS: 00010206
    RAX: 0000000000000001  RBX: 0000000000000000  RCX: 0000000000010000
    RDX: 0000000000000200  RSI: 0000000001e4e000  RDI: 0000000000000001
    RBP: 0000000000000000   R8: 0000000001e4e1f0   R9: 0000000000000000
    R10: 00007ffed85aa4a0  R11: 0000000000000246  R12: 0000000000000200
    R13: 0000000001e4e000  R14: 0000000001e4e200  R15: 0000000000000800
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
crash> 

So I will rebase #20519 and hope reviewers will have a look.

Comment by Gerrit Updater [ 17/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/20519/
Subject: LU-8221 osc: fix for lu_ref feature in osc_extent_put()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: dfa0036b5536ebc89177d70e994244877dca167e

Comment by Peter Jones [ 17/Mar/20 ]

Landed for 2.14
