[LU-4581] ASSERTION( (!(page->cp_type == CPT_CACHEABLE) || (!PageWriteback(cl_page_vmpage(env, page)))) ) failed: Created: 04/Feb/14  Updated: 05/Aug/15  Resolved: 31/Jul/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.1
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Mahmoud Hanafi Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

client: slessp2 2.4.1-3nas


Attachments: Text File iwc151.lbug.txt    
Issue Links:
Duplicate
duplicates LU-2779 LBUG in discard_cb: !(page->cp_type =... Resolved
Related
is related to LU-4555 Patched (LU-2779) 2.4.1 Lustre Client... Resolved
Severity: 3
Rank (Obsolete): 12522

 Description   
[3891131.052019] LustreError: 27617:0:(cl_lock.c:1964:discard_cb()) ASSERTION( (!(page->cp_type == CPT_CACHEABLE) || (!PageWriteback(cl_page_vmpage(env, page)))) ) failed:
[3891131.067131] LustreError: 27617:0:(cl_lock.c:1964:discard_cb()) LBUG
[3891131.073568] Pid: 27617, comm: geos.xco2.osse.


PID: 27617  TASK: ffff8809be7a04c0  CPU: 21  COMMAND: "geos.xco2.osse."
 #0 [ffff88103321f620] panic at ffffffff81450800
 #1 [ffff88103321f6a0] lbug_with_loc at ffffffffa04c2dc3 [libcfs]
 #2 [ffff88103321f6c0] discard_cb at ffffffffa0633774 [obdclass]
 #3 [ffff88103321f6f0] cl_page_gang_lookup at ffffffffa06309d6 [obdclass]
 #4 [ffff88103321f780] cl_lock_discard_pages at ffffffffa063346a [obdclass]
 #5 [ffff88103321f7c0] osc_lock_flush at ffffffffa08fe272 [osc]
 #6 [ffff88103321f810] osc_lock_cancel at ffffffffa08fe4d9 [osc]
 #7 [ffff88103321f850] cl_lock_cancel0 at ffffffffa0631155 [obdclass]
 #8 [ffff88103321f870] cl_lock_cancel at ffffffffa0631eab [obdclass]
 #9 [ffff88103321f890] osc_lock_blocking at ffffffffa08ff01d [osc]
#10 [ffff88103321f8c0] osc_dlm_blocking_ast0 at ffffffffa08ff8c9 [osc]
#11 [ffff88103321f900] osc_ldlm_blocking_ast at ffffffffa08ffa2c [osc]
#12 [ffff88103321f940] ldlm_cancel_callback at ffffffffa0742e0f [ptlrpc]
#13 [ffff88103321f950] ldlm_cli_cancel_local at ffffffffa075121f [ptlrpc]
#14 [ffff88103321f970] ldlm_cli_cancel_list_local at ffffffffa07545b2 [ptlrpc]
#15 [ffff88103321f9d0] ldlm_prep_elc_req at ffffffffa07563bf [ptlrpc]
#16 [ffff88103321fa40] ldlm_prep_enqueue_req at ffffffffa075648f [ptlrpc]
#17 [ffff88103321fa50] osc_enqueue_base at ffffffffa08e525f [osc]
#18 [ffff88103321faf0] osc_lock_enqueue at ffffffffa08ff440 [osc]
#19 [ffff88103321fb60] cl_enqueue_kick at ffffffffa0632652 [obdclass]
#20 [ffff88103321fb90] cl_enqueue_try at ffffffffa0635961 [obdclass]
#21 [ffff88103321fbc0] lov_lock_enqueue_one at ffffffffa0995d8d [lov]
#22 [ffff88103321fbf0] lov_lock_enqueue at ffffffffa09984bb [lov]
#23 [ffff88103321fc60] cl_enqueue_kick at ffffffffa0632652 [obdclass]
#24 [ffff88103321fc90] cl_enqueue_try at ffffffffa0635961 [obdclass]
#25 [ffff88103321fcc0] cl_enqueue_locked at ffffffffa0636717 [obdclass]
#26 [ffff88103321fcf0] cl_lock_request at ffffffffa06373e9 [obdclass]
#27 [ffff88103321fd40] cl_glimpse_lock at ffffffffa0a6ae5d [lustre]
#28 [ffff88103321fda0] cl_glimpse_size0 at ffffffffa0a6b31f [lustre]
#29 [ffff88103321fdf0] ll_glimpse_size at ffffffffa0a19695 [lustre]
#30 [ffff88103321fe10] ll_inode_revalidate_it at ffffffffa0a1eb68 [lustre]
#31 [ffff88103321fe40] ll_getattr_it at ffffffffa0a1ebae [lustre]
#32 [ffff88103321fe70] ll_getattr at ffffffffa0a1ecff [lustre]
#33 [ffff88103321fed0] vfs_fstat at ffffffff81155d27
#34 [ffff88103321fef0] sys_newfstat at ffffffff81155d6f
#35 [ffff88103321ff80] system_call_fastpath at ffffffff8145b412
    RIP: 00002aaaacfe9394  RSP: 00007fffffff5a20  RFLAGS: 00000202
    RAX: 0000000000000005  RBX: ffffffff8145b412  RCX: 0000000000000001
    RDX: 00007fffffdafbf8  RSI: 00007fffffdafbf8  RDI: 0000000000000009
    RBP: 00007fffffdb1cc0   R8: 0000000000000000   R9: 000000000000000a
    R10: 00007fffffdac960  R11: 0000000000000246  R12: 0000000000000001
    R13: 00007fffffdb1dc0  R14: 0000000000000000  R15: 00007fffffdb1cc0
    ORIG_RAX: 0000000000000005  CS: 0033  SS: 002b
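
For readers unfamiliar with the form of the failed check: the ASSERTION text reads as "!A || !B", which is the same as "if the page is cacheable, it must not be under writeback". The following is only a minimal user-space sketch of that condition. The enum, struct, and function names are hypothetical stand-ins for illustration and are not the Lustre or kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the Lustre types. */
enum demo_page_type { DEMO_CACHEABLE, DEMO_TRANSIENT };

struct demo_page {
    enum demo_page_type type;   /* plays the role of page->cp_type */
    bool under_writeback;       /* plays the role of PageWriteback() */
};

/*
 * Same logical shape as the failed ASSERTION:
 *   !(type == CACHEABLE) || !writeback
 * i.e. a cacheable page must not be under writeback when discarded.
 */
static bool discard_invariant_holds(const struct demo_page *p)
{
    return !(p->type == DEMO_CACHEABLE) || !p->under_writeback;
}
```

Under these assumptions, the only combination that violates the condition is a cacheable page that is still under writeback, which is the state discard_cb() reported when it hit the LBUG.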



 Comments   
Comment by Peter Jones [ 04/Feb/14 ]

Jinshan will advise on this one

Comment by Jinshan Xiong (Inactive) [ 04/Feb/14 ]

Is this a reproduction of LU-2779?

Can you please try to apply patch http://review.whamcloud.com/5419 to your branch and see if it can help?

Jinshan

Comment by Bruno Faccini (Inactive) [ 05/Feb/14 ]

Hello Jinshan,
It looks like the same issue has been reported in LU-4555 for 2.4.1, where patch http://review.whamcloud.com/5419 from LU-2779 had already been applied …

Comment by Mahmoud Hanafi [ 05/Feb/14 ]

We didn't have the LU-2779 patch applied, but it doesn't look like it will fix the issue anyway.
We have a crash dump from the client, which can be provided to a US citizen at Intel.

Comment by Jinshan Xiong (Inactive) [ 05/Feb/14 ]

Hi Mahmoud,

In that case, please apply the patch from LU-2779 and then reproduce the problem; I think that will help us locate it.

Hi Oz,

If you happen to have a stack trace, please post it here. Thanks.

Jinshan

Comment by Cliff White (Inactive) [ 05/Feb/14 ]

You can provide it to me (cliff.white@intel.com), as you did with the previous dump. Can you get a dump with the LU-2779 patch applied? We believe that would give us further data to diagnose the issue.

Comment by Oz Rentas [ 13/Feb/14 ]

From the customer -
"Unfortunately, there was no crash dump associated with this crash.
All I can find on the client is the error messages from the console output (sent previously):

LustreError: 17450:0:(cl_lock.c:1964:discard_cb()) ASSERTION( (!(page->cp_type == CPT_CACHEABLE) || (!PageWriteback(cl_page_vmpage(env, page)))) ) failed:
LustreError: 17450:0:(cl_lock.c:1964:discard_cb()) LBUG Kernel panic - not syncing: LBUG
Pid: 17450, comm: tar Tainted: GF

Please confirm direction in the absence of a crash."

Comment by Jinshan Xiong (Inactive) [ 13/Feb/14 ]

In that case, please apply patch 5419; we can probably get a new stack trace that way and work from there.

Comment by Oz Rentas [ 18/Feb/14 ]

Hi, this isn't clear to me.

The patch from LU-2779 (5419) has already been applied.

That is the issue: the problem recurs even after the patch has been applied.

This issue was initially reported in LU-4555, but LU-4555 was closed as a duplicate of this ticket.

Comment by Jinshan Xiong (Inactive) [ 19/Feb/14 ]

Hi Oz,

Is it possible to enable D_CACHE on the client side and collect some logs?

Jinshan

Comment by Cliff White (Inactive) [ 19/Feb/14 ]

Mahmoud,
As stated above, you can send crash dumps to me, cliff.white@intel.com. I am a US citizen, and handled the last dump analysis. Please advise by email when you do this. We would very much like to get some further debugging information: logs from the clients with D_CACHE enabled would be good, and a crash dump with the 5419 patch would also be useful.

Comment by Mahmoud Hanafi [ 19/Feb/14 ]

With the patch we have not hit this LBUG, but we only hit it a few times even without the patch. I can send you the dump for the initial crash if you still need it.

Comment by Cliff White (Inactive) [ 19/Feb/14 ]

Mahmoud, I think at the moment we are okay.

Comment by John Fuchs-Chesney (Inactive) [ 14/Mar/14 ]

Hello Mahmoud,
I see from the comments above that the patch we supplied has been merged.
Do you want us to keep this ticket open?
Thanks,
~ jfc.

Comment by John Fuchs-Chesney (Inactive) [ 29/Mar/14 ]

Looks like the patch we supplied has fixed this problem, as far as we can tell.
~ jfc.
