[LU-5522] ofd_prolong_extent_locks()) ASSERTION( lock->l_flags & 0x0000000000000020ULL ) failed Created: 20/Aug/14  Updated: 27/Apr/15  Resolved: 29/Oct/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.7.0

Type: Bug
Priority: Minor
Reporter: Johann Lombardi (Inactive)
Assignee: Johann Lombardi (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-6236 ost_prolong_locks()) ASSERTION( lock-... Resolved
Severity: 3
Rank (Obsolete): 15379

 Description   

An LBUG was hit on lola while running soak testing (IOR on the compute nodes, with messages dropped on the router):

<4>Lustre: 6834:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1408450946/real 1408450946]  req@ffff8802f1fc3000 x1476486607266588/t0(0) o105->soaked-OST0000@192.168.1.121@o2ib1:15/16 lens 360/224 e 0 to 1 dl 1408450953 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
<0>LustreError: 6767:0:(ofd_dev.c:1809:ofd_prolong_extent_locks()) ASSERTION( lock->l_flags & 0x0000000000000020ULL ) failed: 
<0>LustreError: 6767:0:(ofd_dev.c:1809:ofd_prolong_extent_locks()) LBUG
<0>Kernel panic - not syncing: LBUG in interrupt.
<0>
<4>Pid: 6767, comm: ll_ost_io02_005 Tainted: P        W  ---------------    2.6.32-431.23.3.el6_lustre.gc0c4f13.x86_64 #1
<4>Call Trace:
<4> [<ffffffff81528dbc>] ? panic+0xa7/0x16f
<4> [<ffffffffa1089edd>] ? lbug_with_loc+0x8d/0xb0 [libcfs]
<4> [<ffffffffa1a0f7ac>] ? ofd_prolong_extent_locks+0x35c/0x390 [ofd]
<4> [<ffffffffa1a0fce9>] ? ofd_rw_hpreq_check+0xe9/0x350 [ofd]
<4> [<ffffffffa1444345>] ? req_capsule_client_get+0x15/0x20 [ptlrpc]
<4> [<ffffffffa1a0fb87>] ? ofd_hp_brw+0xd7/0x150 [ofd]
<4> [<ffffffffa147d2a2>] ? tgt_hpreq_handler+0xf2/0x2f0 [ptlrpc]
<4> [<ffffffffa1426cf6>] ? ptlrpc_server_handle_req_in+0x7c6/0xcd0 [ptlrpc]
<4> [<ffffffffa142ddfc>] ? ptlrpc_main+0x9ec/0x1990 [ptlrpc]
<4> [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
<4> [<ffffffff810623a9>] ? find_busiest_queue+0x69/0x150
<4> [<ffffffff815294ce>] ? thread_return+0x4e/0x760
<4> [<ffffffffa142d410>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
<4> [<ffffffff8109abf6>] ? kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
<4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20


 Comments   
Comment by Johann Lombardi (Inactive) [ 20/Aug/14 ]

Noticed the following warning earlier in the log:

<4>WARNING: at lib/list_debug.c:26 __list_add+0x6d/0xa0() (Tainted: P           ---------------   )
<4>Hardware name: S2600GZ ..........
<4>list_add corruption. next->prev should be prev (ffff8808305fae70), but was ffff8805da213328. (next=ffff8805da213328).
<4>Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U) osd_zfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) libcfs(U) sha512_generic sha256_generic crc32c_intel nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 microcode iTCO_wdt iTCO_vendor_support sb_edac edac_core zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate lpc_ich mfd_core i2c_i801 ioatdma ses enclosure sg mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sd_mod crc_t10dif ahci isci libsas mpt2sas scsi_transport_sas raid_class wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
<4>Pid: 6804, comm: ll_ost03_011 Tainted: P           ---------------    2.6.32-431.23.3.el6_lustre.gc0c4f13.x86_64 #1
<4>Call Trace:
<4> [<ffffffff81071b37>] ? warn_slowpath_common+0x87/0xc0
<4> [<ffffffff81071c26>] ? warn_slowpath_fmt+0x46/0x50
<4> [<ffffffff8129589d>] ? __list_add+0x6d/0xa0
<4> [<ffffffffa13fba1a>] ? ldlm_add_waiting_lock+0x1fa/0x310 [ptlrpc]
<4> [<ffffffffa13fd823>] ? ldlm_server_blocking_ast+0x423/0x8a0 [ptlrpc]
<4> [<ffffffffa147975b>] ? tgt_blocking_ast+0x7b/0x7e0 [ptlrpc]
<4> [<ffffffffa02ef9e9>] ? dbuf_dirty+0x4c9/0x840 [zfs]
<4> [<ffffffffa13ce4ea>] ? ldlm_add_bl_work_item+0x8a/0x1e0 [ptlrpc]
<4> [<ffffffffa13ce695>] ? ldlm_add_ast_work_item+0x55/0x150 [ptlrpc]
<4> [<ffffffffa13d0e3d>] ? ldlm_work_bl_ast_lock+0xdd/0x290 [ptlrpc]
<4> [<ffffffffa1411f8c>] ? ptlrpc_set_wait+0x6c/0x860 [ptlrpc]
<4> [<ffffffffa1622257>] ? kiblnd_launch_tx+0xf7/0xa80 [ko2iblnd]
<4> [<ffffffffa140d842>] ? ptlrpc_prep_set+0x112/0x2e0 [ptlrpc]
<4> [<ffffffffa13d0d60>] ? ldlm_work_bl_ast_lock+0x0/0x290 [ptlrpc]
<4> [<ffffffffa13d2f6b>] ? ldlm_run_ast_work+0x1db/0x470 [ptlrpc]
<4> [<ffffffffa13ea315>] ? ldlm_process_extent_lock+0x155/0xab0 [ptlrpc]
<4> [<ffffffffa13d283d>] ? ldlm_lock_enqueue+0x41d/0x970 [ptlrpc]
<4> [<ffffffffa13f1fc6>] ? ldlm_cli_enqueue_local+0x186/0x5d0 [ptlrpc]
<4> [<ffffffffa13f2410>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
<4> [<ffffffffa13f0db0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
<4> [<ffffffffa1a23014>] ? ofd_destroy_by_fid+0x344/0x620 [ofd]
<4> [<ffffffffa13f0db0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
<4> [<ffffffffa13f2410>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
<4> [<ffffffffa1a1c96a>] ? ofd_destroy_hdl+0x2fa/0xb60 [ofd]
<4> [<ffffffffa147f21e>] ? tgt_request_handle+0x71e/0xb10 [ptlrpc]
<4> [<ffffffffa142e274>] ? ptlrpc_main+0xe64/0x1990 [ptlrpc]
Comment by Johann Lombardi (Inactive) [ 28/Aug/14 ]

http://review.whamcloud.com/11634

Comment by Johann Lombardi (Inactive) [ 29/Oct/14 ]

Patch landed.

Comment by Gerrit Updater [ 12/Feb/15 ]

Grégoire Pichon (gregoire.pichon@bull.net) uploaded a new patch: http://review.whamcloud.com/13744
Subject: LU-5522 ldlm: remove expired lock from per-export list
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: b95a885461dab9c07410ee52e24a7d25318f55a1
