[LU-3073] Test failure on test suite sanity test_120a Created: 31/Mar/13  Updated: 20/Apr/13  Resolved: 20/Apr/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 7477

 Description   

This issue was created by maloo for nasf <fan.yong@intel.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/9521ae5c-99ca-11e2-8bb4-52540035b04c.

== sanity test 120a: Early Lock Cancel: mkdir test == 13:08:00 (1364674080)
ldlm.namespaces.lustre-MDT0000-mdc-ffff88006796e400.lru_size=100
ldlm.namespaces.lustre-OST0000-osc-ffff88006796e400.lru_size=100
ldlm.namespaces.lustre-OST0001-osc-ffff88006796e400.lru_size=100
ldlm.namespaces.lustre-OST0002-osc-ffff88006796e400.lru_size=100
ldlm.namespaces.lustre-OST0003-osc-ffff88006796e400.lru_size=100
ldlm.namespaces.lustre-OST0004-osc-ffff88006796e400.lru_size=100
ldlm.namespaces.lustre-OST0005-osc-ffff88006796e400.lru_size=100
ldlm.namespaces.lustre-OST0006-osc-ffff88006796e400.lru_size=100
error: get_param: /proc/

{fs,sys}/{lnet,lustre}/ldlm/services/ldlm_canceld/stats: Found no match
error: get_param: /proc/{fs,sys}

/

{lnet,lustre}

/ldlm/services/ldlm_canceld/stats: Found no match
sanity test_120a: @@@@@@ FAIL: 1 blocking RPC occured.



 Comments   
Comment by Jian Yu [ 07/Apr/13 ]

Another instance:
https://maloo.whamcloud.com/test_sets/54fc0ca8-9f7b-11e2-86dc-52540035b04c

Comment by Jian Yu [ 14/Apr/13 ]

I hit this issue again while testing patch http://review.whamcloud.com/#change,6027 :
https://maloo.whamcloud.com/test_sets/e59db8e4-a40b-11e2-80ee-52540035b04c

Comment by Keith Mannthey (Inactive) [ 16/Apr/13 ]

Another here:
https://maloo.whamcloud.com/test_sets/7ed97bdc-a41c-11e2-80ee-52540035b04c

Comment by Hongchao Zhang [ 16/Apr/13 ]

this ticket is caused by the asynchronous destroy at MDT, which triggers a blocking AST at client side.

the log of OST

00000100:00100000:0.0:1364674080.749546:0:25979:0:(service.c:1975:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost00_029:lustre-MDT0000-mdtlov_UUID+5:2920:x1430961061754764:12345-10.10.16.143@tcp:6
.......

00010000:00000001:0.0:1364674080.749677:0:25979:0:(ldlm_lock.c:751:ldlm_add_ast_work_item()) Process entered
00010000:00010000:0.0:1364674080.749680:0:25979:0:(ldlm_lock.c:714:ldlm_add_bl_work_item()) ### lock incompatible; sending blocking AST. ns: filter-ffff880037068000 lock: ffff880069322400/0x77cac32f14b081ad lrc: 2/0,0 mode: PW/PW res: 2005/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x0 nid: 10.10.16.146@tcp remote: 0x24fbfe568768a43c expref: 73 pid: 6272 timeout: 0 lvb_type: 0
00010000:00000001:0.0:1364674080.749687:0:25979:0:(ldlm_lock.c:757:ldlm_add_ast_work_item()) Process leaving
00010000:00000001:0.0:1364674080.749689:0:25979:0:(ldlm_extent.c:363:ldlm_extent_compat_cb()) Process leaving (rc=1 : 1 : 1)
00010000:00010000:0.0:1364674080.749691:0:25979:0:(ldlm_extent.c:314:ldlm_check_contention()) contended locks = 1
00010000:00000001:0.0:1364674080.749693:0:25979:0:(ldlm_extent.c:635:ldlm_extent_compat_queue()) Process leaving (rc=0 : 0 : 0)
00010000:00000001:0.0:1364674080.749695:0:25979:0:(ldlm_extent.c:392:ldlm_extent_compat_queue()) Process entered
00010000:00010000:0.0:1364674080.749696:0:25979:0:(ldlm_extent.c:314:ldlm_check_contention()) contended locks = 1
00010000:00000001:0.0:1364674080.749697:0:25979:0:(ldlm_extent.c:635:ldlm_extent_compat_queue()) Process leaving (rc=1 : 1 : 1)

.......

00010000:00010000:0.0:1364674080.749850:0:25979:0:(ldlm_lockd.c:896:ldlm_server_blocking_ast()) ### server preparing blocking AST ns: filter-ffff880037068000 lock: ffff880069322400/0x77cac32f14b081ad lrc: 3/0,0 mode: PW/PW res: 2005/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x10020 nid: 10.10.16.146@tcp remote: 0x24fbfe568768a43c expref: 73 pid: 6272 timeout: 0 lvb_type: 0
00010000:00010000:0.0:1364674080.749860:0:25979:0:(ldlm_lockd.c:506:ldlm_add_waiting_lock()) ### adding to wait list(timeout: 100, AT: on) ns: filter-ffff880037068000 lock: ffff880069322400/0x77cac32f14b081ad lrc: 4/0,0 mode: PW/PW res: 2005/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x10020 nid: 10.10.16.146@tcp remote: 0x24fbfe568768a43c expref: 73 pid: 6272 timeout: 4298369097 lvb_type: 0
00010000:000000

the log of Client

00010000:00010000:0.0:1364674080.750726:0:22778:0:(ldlm_lockd.c:1686:ldlm_handle_bl_callback()) ### client blocking AST callback handler ns: lustre-OST0000-osc-ffff88006796e400 lock: ffff8800678afc00/0x24fbfe568768a43c lrc: 3/0,0 mode: PW/PW res: 2005/0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x420000010000 nid: local remote: 0x77cac32f14b081ad expref: -99 pid: 23659 timeout: 0 lvb_type: 1
00010000:00010000:0.0:1364674080.750730:0:22778:0:(ldlm_lockd.c:1699:ldlm_handle_bl_callback()) Lock ffff8800678afc00 already unused, calling callback (ffffffffa099b370)
00000020:00010000:0.0:1364674080.750764:0:22778:0:(cl_lock.c:150:cl_lock_trace0()) cancel lock: ffff880068a3f898@(1 ffff88007a3beae0 1 5 0 0 0 0)(ffff880079972db8/1/1) at cl_lock_cancel():1838
00000008:00010000:0.0:1364674080.750771:0:22778:0:(osc_lock.c:1319:osc_lock_flush()) ### lock ffff880068a3f898: 0 pages were discarded.
ns: lustre-OST0000-osc-ffff88006796e400 lock: ffff8800678afc00/0x24fbfe568768a43c lrc: 3/0,0 mode: PW/PW res: 2005/0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x420400010000 nid: local remote: 0x77cac32f14b081ad expref: -99 pid: 23659 timeout: 0 lvb_type: 1
00000080:00200000:0.0:1364674080.751365:0:22778:0:(vvp_io.c:128:vvp_io_fini()) ignore/verify layout 1/0, layout version 0.
00010000:00010000:0.0:1364674080.751368:0:22778:0:(ldlm_request.c:1127:ldlm_cli_cancel_local()) ### client-side cancel ns: lustre-OST0000-osc-ffff88006796e400 lock: ffff8800678afc00/0x24fbfe568768a43c lrc: 4/0,0 mode: PW/PW res: 2005/0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x428400010000 nid: local remote: 0x77cac32f14b081ad expref: -99 pid: 23659 timeout: 0 lvb_type: 1

Comment by Hongchao Zhang [ 18/Apr/13 ]

the patch is tracked at http://review.whamcloud.com/#change,6088

Comment by Peter Jones [ 20/Apr/13 ]

Landed for 2.4

Generated at Sat Feb 10 01:30:44 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.