[LU-9171] GPF in osc_page_gang_lookup doing ELC with ldlm_cancel_no_wait_policy() Created: 01/Mar/17  Updated: 10/Jan/19  Resolved: 19/Apr/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug
Priority: Minor
Reporter: Andriy Skulysh
Assignee: WC Triage
Resolution: Fixed
Votes: 0
Labels: None

Attachments: File MRP-4179-test.diff    
Issue Links:
Duplicate
Related
Severity: 3

 Description   

> [ 9337.400497] general protection fault: 0000 [#1] SMP
> [ 9337.513717] CPU: 14 PID: 115720 Comm: ll_agl_31230 Tainted: G W OE NX 3.12.60-52.57.1.11767.0.PTF.996988-default #1
> [ 9337.526131] Hardware name: /0CNCJW, BIOS 1.3.6 06/03/2015
> [ 9337.532441] task: ffff88027aaa8240 ti: ffff8802d9458000 task.ti: ffff8802d9458000
> [ 9337.540785] RIP: 0010:[<ffffffffa0b20416>] [<ffffffffa0b20416>] osc_page_gang_lookup+0x156/0x870 [osc]
> [ 9337.551282] RSP: 0018:ffff8802d9459990 EFLAGS: 00010202
> [ 9337.557204] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffff886358d97470
> [ 9337.565161] RDX: 0000000000000000 RSI: ffff8802d9459950 RDI: ffff8802d65c7170
> [ 9337.573117] RBP: ffff886358d97648 R08: 0000000000000000 R09: 0000000000000001
> [ 9337.581074] R10: 0000000000000000 R11: 0000000000000001 R12: 5a5a5a5a5a5a5a5a
> [ 9337.589030] R13: ffff8802d73ae2d0 R14: ffff886358d97470 R15: ffff8802d65c7060
> [ 9337.596986] FS: 0000000000000000(0000) GS:ffff88607f4e0000(0000) knlGS:0000000000000000
> [ 9337.606021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 9337.612438] CR2: 00007f6942c05de8 CR3: 00000063488c2000 CR4: 00000000001407e0
> [ 9337.620393] Stack:
> [ 9337.622632] ffffffffffffffff ffff880106b62000 ffff8862ff8cd680 ffff886358d97550
> [ 9337.630921] ffffffffa0b07140 00ff8862ff8cd680 ffff886300000001 0000000000000000
> [ 9337.639203] ffff8802d65c7180 ffff886358d97470 ffff8802d65c7170 ffff886358d97650
> [ 9337.647486] Call Trace:
> [ 9337.650248] [<ffffffffa0b09aa0>] osc_ldlm_weigh_ast+0x340/0x480 [osc]
> [ 9337.657560] [<ffffffffa0af67fb>] osc_cancel_weight+0x3b/0xa0 [osc]
> [ 9337.664586] [<ffffffffa0886d9b>] ldlm_cancel_no_wait_policy+0x2b/0x90 [ptlrpc]
> [ 9337.672765] [<ffffffffa08887a1>] ldlm_prepare_lru_list+0x221/0x500 [ptlrpc]
> [ 9337.680653] [<ffffffffa088cfc5>] ldlm_cancel_lru_local+0x15/0x40 [ptlrpc]
> [ 9337.688347] [<ffffffffa088d1fc>] ldlm_prep_elc_req+0x20c/0x480 [ptlrpc]
> [ 9337.695847] [<ffffffffa088d494>] ldlm_prep_enqueue_req+0x24/0x30 [ptlrpc]
> [ 9337.703529] [<ffffffffa0affbd1>] osc_enqueue_base+0x1c1/0x6e0 [osc]
> [ 9337.710622] [<ffffffffa0b09097>] osc_lock_enqueue+0x357/0xa00 [osc]
> [ 9337.717756] [<ffffffffa0bff793>] cl_lock_enqueue+0x63/0x120 [obdclass]
> [ 9337.725180] [<ffffffffa0a9cecc>] lov_lock_enqueue+0x9c/0x170 [lov]
> [ 9337.732179] [<ffffffffa0bff793>] cl_lock_enqueue+0x63/0x120 [obdclass]
> [ 9337.739600] [<ffffffffa0bffce2>] cl_lock_request+0x62/0x1e0 [obdclass]
> [ 9337.747040] [<ffffffffa0e6e487>] cl_glimpse_lock+0x337/0x3d0 [lustre]
> [ 9337.754359] [<ffffffffa0e6e7e7>] cl_glimpse_size0+0x1b7/0x1c0 [lustre]
> [ 9337.761769] [<ffffffffa0e69b65>] ll_agl_trigger+0x115/0x4a0 [lustre]
> [ 9337.768982] [<ffffffffa0e6a04d>] ll_agl_thread+0x15d/0x4b0 [lustre]
> [ 9337.776098] [<ffffffff81077874>] kthread+0xb4/0xc0
> [ 9337.781530] [<ffffffff81523498>] ret_from_fork+0x58/0x90



 Comments   
Comment by Gerrit Updater [ 01/Mar/17 ]

Andriy Skulysh (andriy.skulysh@seagate.com) uploaded a new patch: https://review.whamcloud.com/25700
Subject: LU-9171 osc: GPF while doing ELC with no_wait_policy
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c609c91808fee9f308fbbfcf487397918d03065a

Comment by Andriy Skulysh [ 01/Mar/17 ]

The attached test reproduces the crash, but it works only with the 2.6.32 kernel.
Here is the resulting failure:
{noformat}
crash> bt
PID: 28473  TASK: ffff880058a9e040  CPU: 0   COMMAND: "stat"
 #0 [ffff880059c0b5f0] machine_kexec at ffffffff81038f3b
 #1 [ffff880059c0b650] crash_kexec at ffffffff810c5b02
 #2 [ffff880059c0b720] oops_end at ffffffff81529030
 #3 [ffff880059c0b750] die at ffffffff81010e0b
 #4 [ffff880059c0b780] do_general_protection at ffffffff81528b32
 #5 [ffff880059c0b7b0] general_protection at ffffffff81528305
    [exception RIP: osc_ldlm_weigh_ast+306]
    RIP: ffffffffa09bcec2  RSP: ffff880059c0b868  RFLAGS: 00010292
    RAX: 5a5a5a5a5a5a5a02  RBX: ffff88004c729c78  RCX: ffff88004c729da0
    RDX: 5a5a5a5a5a5a5a5a  RSI: 0000000000000000  RDI: ffff88004c729db0
    RBP: ffff880059c0b8d8   R8: 0000000000000000   R9: ffffffff816457a0
    R10: 0000000000000001  R11: 0000000000000000  R12: ffff88005a690768
    R13: ffff880059c0b8a6  R14: ffff880057b9eac0  R15: ffff88004c729db0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
{noformat}
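
Both register dumps in this ticket contain values made up almost entirely of the byte 0x5a (R12 in the description, RAX and RDX in the crash output above). The ticket does not say what that pattern means; a common reading, stated here only as an assumption, is that it is a fill pattern written over freed memory, which would point at a use-after-free. The following is a minimal sketch of how such values could be picked out of a dump automatically. The helper name `poisoned_registers`, the `POISON_BYTE` constant, and the `min_bytes` threshold are hypothetical and are not part of Lustre or the crash utility.

```python
import re

# Assumption: 0x5a is the fill byte of interest; the ticket itself does not say so.
POISON_BYTE = 0x5A

# Matches register names such as RAX, RDI, R8 or R12 followed by a 64-bit hex value.
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}):\s*([0-9a-fA-F]{16})\b")


def poisoned_registers(dump, min_bytes=7):
    """Return {register: value} for values holding at least min_bytes fill bytes."""
    hits = {}
    for name, value in REG_RE.findall(dump):
        raw = bytes.fromhex(value)
        if raw.count(POISON_BYTE) >= min_bytes:
            hits[name] = value
    return hits


# Three register lines copied from the crash output above.
sample = """
RAX: 5a5a5a5a5a5a5a02  RBX: ffff88004c729c78  RCX: ffff88004c729da0
RDX: 5a5a5a5a5a5a5a5a  RSI: 0000000000000000  RDI: ffff88004c729db0
R10: 0000000000000001  R11: 0000000000000000  R12: ffff88005a690768
"""

if __name__ == "__main__":
    for reg, val in poisoned_registers(sample).items():
        print(reg, val)
```

The threshold of seven bytes out of eight is a deliberate choice in this sketch: it still flags a value such as RAX, where the low byte differs, while ignoring ordinary kernel addresses that happen to contain a single 0x5a byte.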

Comment by Gerrit Updater [ 19/Apr/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25700/
Subject: LU-9171 osc: GPF while doing ELC with no_wait_policy
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e0850b31ccd5597b564d4385a424bfa6be6f2f3e

Comment by Peter Jones [ 19/Apr/17 ]

Landed for 2.10
