[LU-5284] GPF in radix_tree_lookup_slot on OSS Created: 02/Jul/14  Updated: 07/Jun/17  Resolved: 07/Jun/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sebastien Piechurski Assignee: Bruno Faccini (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: p4b
Environment:

bullx supercomputer suite


Attachments: Text File Dump-analysis-140310-BS0206.txt    
Severity: 3
Rank (Obsolete): 14747

 Description   

OSS hits a general protection fault with the following trace:

PID: 22785  TASK: ffff880612f90830  CPU: 1   COMMAND: "ll_ost_io_1011"
 #0 [ffff880612fd7620] machine_kexec at ffffffff8102902b
 #1 [ffff880612fd7680] crash_kexec at ffffffff810a5292
 #2 [ffff880612fd7750] oops_end at ffffffff8149a050
 #3 [ffff880612fd7780] die at ffffffff8100714b
 #4 [ffff880612fd77b0] do_general_protection at ffffffff81499be2
 #5 [ffff880612fd77e0] general_protection at ffffffff814993b5
    [exception RIP: radix_tree_lookup_slot+5]
    RIP: ffffffff81261465  RSP: ffff880612fd7890  RFLAGS: 00010286
    RAX: e940201000000010  RBX: e940201000000008  RCX: 0000000000000000
    RDX: 00000000000200d2  RSI: 0000000000000000  RDI: e940201000000008
    RBP: ffff880612fd78b0   R8: ffff880612fdc140   R9: 0000000000000008
    R10: 0000000000001000  R11: 0000000000000001  R12: 0000000000000000
    R13: 0000000000000000  R14: e940201000000000  R15: 20105fa000080221
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff880612fd7898] find_get_page at ffffffff810ffe8e
 #7 [ffff880612fd78b8] find_lock_page at ffffffff8110112a
 #8 [ffff880612fd78e8] find_or_create_page at ffffffff8110129f
 #9 [ffff880612fd7938] filter_get_page at ffffffffa0c4b065 [obdfilter]
#10 [ffff880612fd7968] filter_preprw_read at ffffffffa0c4d64d [obdfilter]
#11 [ffff880612fd7a98] filter_preprw at ffffffffa0c4dedc [obdfilter]
#12 [ffff880612fd7ad8] obd_preprw at ffffffffa0c09051 [ost]
#13 [ffff880612fd7b48] ost_brw_read at ffffffffa0c10091 [ost]
#14 [ffff880612fd7c88] ost_handle at ffffffffa0c16423 [ost]
#15 [ffff880612fd7da8] ptlrpc_main at ffffffffa07fd4e6 [ptlrpc]
#16 [ffff880612fd7f48] kernel_thread at ffffffff8100412a

Taking a look at the dump, it seems there is a race leading to corruption of the struct inode that is passed from filter_preprw_read() to filter_get_page().
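
To illustrate where the fault fires, here is a minimal sketch of the failing call path (assuming a 2.6.32-era kernel and the Lustre 2.1 obdfilter; the names and arguments are simplified, not verbatim source):

/* filter_get_page() hands inode->i_mapping to the generic pagecache
 * lookup, so a corrupted inode yields a garbage address_space pointer
 * long before anything touches the page itself. */
static struct page *filter_get_page(struct obd_device *obd,
                                    struct inode *inode, obd_off offset)
{
        return find_or_create_page(inode->i_mapping,   /* garbage if inode is corrupted */
                                   offset >> PAGE_CACHE_SHIFT,
                                   GFP_NOFS);          /* illustrative gfp mask */
}

/* The generic code then dereferences the bogus mapping:
 *   find_or_create_page() -> find_lock_page() -> find_get_page()
 *     -> radix_tree_lookup_slot(&mapping->page_tree, index)
 * In struct address_space, page_tree sits 8 bytes after the leading
 * host pointer, which matches the exception registers above:
 *   R14 = 0xe940201000000000  (bogus mapping, non-canonical on x86_64)
 *   RDI = 0xe940201000000008  (&mapping->page_tree)
 * hence a general protection fault at radix_tree_lookup_slot+5 rather
 * than an ordinary page fault. */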

I have attached my complete dump-analysis log.

I can also upload the dump itself on request.



 Comments   
Comment by Peter Jones [ 02/Jul/14 ]

Bruno

Could you please look into this one?

Thanks

Peter

Comment by Bruno Faccini (Inactive) [ 02/Jul/14 ]

Hello Seb,
Thanks for your crash-dump analysis log! Can you check for me whether the %rbp value at the time of the filter_get_page() call in filter_preprw_read() is really 0xffff880040aa57c0? It should be saved at stack location 0xffff880612fd7970, but it is better to confirm this by disassembling the first instructions of filter_get_page().
Also, can you check whether the inode address you found (0xffff880040aa57c0) is part of one of the inode slab areas/caches?

Or maybe you can simply upload the crash-dump, if you prefer?
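
For reference, an illustrative crash(8) session for those two checks (standard sub-commands; the addresses come from the trace above):

crash> dis filter_get_page 4            # disassemble the prologue to see where %rbp is saved
crash> kmem -s ffff880040aa57c0         # check whether the address falls inside a slab cache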

Comment by Sebastien Piechurski [ 02/Jul/14 ]

As seen in the stack of filter_get_page(), the base pointer of filter_preprw_read() is really 0xffff880612fd7a90 (I think you swapped the address in your comment with that of the inode struct). In the excerpts below, each frame shows the saved %rbp immediately followed by the return address:

#9 [ffff880612fd7938] filter_get_page at ffffffffa0c4b065 [obdfilter]
    ffff880612fd7940: 00000000531db489 000000001c4be826 
    ffff880612fd7950: ffff880612fd7c4c ffff880610f480b8 
    ffff880612fd7960: ffff880612fd7a90 ffffffffa0c4d64d 
#10 [ffff880612fd7968] filter_preprw_read at ffffffffa0c4d64d [obdfilter]
[...]
    ffff880612fd7a80: ffff880091cab1c8 ffff8805235be548 
    ffff880612fd7a90: ffff880612fd7ad0 ffffffffa0c4dedc 
#11 [ffff880612fd7a98] filter_preprw at ffffffffa0c4dedc [obdfilter]

Looking up the struct address in the slab caches, it would seem that the ldiskfs_inode_cache slab itself is corrupted:

crash> kmem -s ffff880040aa57c0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad next pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad prev pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad inuse counter: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5280  bad next pointer: 0
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5280  bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad next pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5280  bad next pointer: 0
kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5280  bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5280  bad next pointer: 0
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5280  bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad next pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5280  bad next pointer: 0
kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5280  bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: address not found in cache: ffff880040aa57c0
kmem: 16 errors encountered

I am also uploading the dump to the FTP site. It should be available tomorrow.

Comment by Bruno Faccini (Inactive) [ 03/Jul/14 ]

Seb, yes, you are right: I should have written 0xffff880612fd7960 as the location where the %rbp value is saved at the time of the filter_get_page() call in filter_preprw_read(). Since that address was not available in your crash extracts for cut+paste, I simply copied 0xffff880612fd7970 and then forgot to adjust it manually!

And by the way, the "kmem" sub-command confirms the validity of the inode address, and its errors confirm the slab corruption!
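
A possible follow-up check (illustrative crash commands; the exact offsets depend on the running kernel's debuginfo) would be to dump the suspect inode's i_mapping directly and compare it with the bogus pointer seen in R14/RDI:

crash> struct inode.i_mapping ffff880040aa57c0
crash> struct -o address_space | grep page_tree   # expect offset 8, matching RDI = mapping + 8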

Thanks for uploading the crash-dump.

Comment by Bruno Faccini (Inactive) [ 28/Jul/14 ]

Crash-dump analysis findings:

_ There are two additional threads holding references to the same page/slab/inode address range; they both triggered the same GPF, with the same stack:

crash> bt 21929 22191
PID: 21929  TASK: ffff880c12206830  CPU: 2   COMMAND: "ll_ost_io_155"
 #0 [ffff880c12277780] die at ffffffff8100711e
 #1 [ffff880c122777b0] do_general_protection at ffffffff81499be2
 #2 [ffff880c122777e0] general_protection at ffffffff814993b5
    [exception RIP: radix_tree_lookup_slot+5]
    RIP: ffffffff81261465  RSP: ffff880c12277890  RFLAGS: 00010286
    RAX: e940201000000010  RBX: e940201000000008  RCX: 0000000000000000
    RDX: 00000000000200d2  RSI: 0000000000000008  RDI: e940201000000008
    RBP: ffff880c122778b0   R8: ffff880614e926c0   R9: 00000000000000f8
    R10: 0000000000001000  R11: 0000000000000001  R12: 0000000000000008
    R13: 0000000000000008  R14: e940201000000000  R15: 20105fa000080221
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #3 [ffff880c12277898] find_get_page at ffffffff810ffe8e
 #4 [ffff880c122778b8] find_lock_page at ffffffff8110112a
 #5 [ffff880c122778e8] find_or_create_page at ffffffff8110129f
 #6 [ffff880c12277938] filter_get_page at ffffffffa0c4b065 [obdfilter]
 #7 [ffff880c12277968] filter_preprw_read at ffffffffa0c4d64d [obdfilter]
 #8 [ffff880c12277a98] filter_preprw at ffffffffa0c4dedc [obdfilter]
 #9 [ffff880c12277ad8] obd_preprw at ffffffffa0c09051 [ost]
#10 [ffff880c12277b48] ost_brw_read at ffffffffa0c10091 [ost]
#11 [ffff880c12277c88] ost_handle at ffffffffa0c16423 [ost]
#12 [ffff880c12277da8] ptlrpc_main at ffffffffa07fd4e6 [ptlrpc]
#13 [ffff880c12277f48] kernel_thread at ffffffff8100412a

PID: 22191  TASK: ffff8806144fd0c0  CPU: 0   COMMAND: "ll_ost_io_417"
 #0 [ffff880614617780] die at ffffffff8100711e
 #1 [ffff8806146177b0] do_general_protection at ffffffff81499be2
 #2 [ffff8806146177e0] general_protection at ffffffff814993b5
    [exception RIP: radix_tree_lookup_slot+5]
    RIP: ffffffff81261465  RSP: ffff880614617890  RFLAGS: 00010286
    RAX: e940201000000010  RBX: e940201000000008  RCX: 0000000000000000
    RDX: 00000000000200d2  RSI: 0000000000000100  RDI: e940201000000008
    RBP: ffff8806146178b0   R8: ffff88061461c140   R9: 0000000000000008
    R10: 0000000000001000  R11: 0000000000000001  R12: 0000000000000100
    R13: 0000000000000100  R14: e940201000000000  R15: 20105fa000080221
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #3 [ffff880614617898] find_get_page at ffffffff810ffe8e
 #4 [ffff8806146178b8] find_lock_page at ffffffff8110112a
 #5 [ffff8806146178e8] find_or_create_page at ffffffff8110129f
 #6 [ffff880614617938] filter_get_page at ffffffffa0c4b065 [obdfilter]
 #7 [ffff880614617968] filter_preprw_read at ffffffffa0c4d64d [obdfilter]
 #8 [ffff880614617a98] filter_preprw at ffffffffa0c4dedc [obdfilter]
 #9 [ffff880614617ad8] obd_preprw at ffffffffa0c09051 [ost]
#10 [ffff880614617b48] ost_brw_read at ffffffffa0c10091 [ost]
#11 [ffff880614617c88] ost_handle at ffffffffa0c16423 [ost]
#12 [ffff880614617da8] ptlrpc_main at ffffffffa07fd4e6 [ptlrpc]
#13 [ffff880614617f48] kernel_thread at ffffffff8100412a
crash> 

_ The ldiskfs_inode_cache slab page corruption looks like this:

crash> rd 0xffff880040aa5000 512 | grep -v "0000000000000000 0000000000000000" | head
ffff880040aa5600:  0000041100000401 20025bc900000421   ........!....[. 
ffff880040aa5610:  0000000000040002 d514200200000000   ............. ..
ffff880040aa5620:  0000041200000402 2010176b00000622   ........"...k.. 
ffff880040aa5630:  0000000000050000 3e44201000000000   ............. D>
ffff880040aa5640:  0000041300000403 20103b4500000823   ........#...E;. 
ffff880040aa5650:  0000000000050000 ec37201000000000   ............. 7.
ffff880040aa5660:  0000041400000404 20102f7700000a24   ........$...w/. 
ffff880040aa5670:  0000000000050000 3928201000000000   ............. (9
ffff880040aa5680:  0000041500000405 2010299500000c25   ........%....). 
ffff880040aa5690:  0000000000050000 d013201000000000   ............. ..
crash> 
crash> rd 0xffff880040aa5000 512 | grep -v "0000000000000000 0000000000000000" | tail
ffff880040aa5f60:  0020001b0020000b 201080000020162b   .. ... .+. .... 
ffff880040aa5f70:  0000000000070000 2a74201000000000   ............. t*
ffff880040aa5f80:  0020001c0020000c 201080000020182c   .. ... .,. .... 
ffff880040aa5f90:  0000000000070000 6e46201000000000   ............. Fn
ffff880040aa5fa0:  0020001d0020000d 2010800000201a2d   .. ... .-. .... 
ffff880040aa5fb0:  0000000000070000 024d201000000000   ............. M.
ffff880040aa5fc0:  0020001e0020000e 2010800000201c2e   .. ... ... .... 
ffff880040aa5fd0:  0000000000070000 b650201000000000   ............. P.
ffff880040aa5fe0:  0020001f0020000f 2010800000201e2f   .. ... ./. .... 
ffff880040aa5ff0:  0000000000070000 da5b201000000000   ............. [.
crash> 

_ The dump is missing the two preceding/following 4K virtual/physical pages (inside the same 2M page!) because they are unmapped/free.

Comment by Bruno Faccini (Inactive) [ 28/Jul/14 ]

Even though this will impact crash-dump time and on-disk space, would it be possible to set the kdump configuration to dump full memory (all pages) next time?
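
For reference, on a RHEL6-era node this is typically controlled by the makedumpfile dump level in /etc/kdump.conf (illustrative snippet; exact options depend on the local setup):

# -d 31 filters out zero, cache, user and free pages;
# -d 0 (or omitting -d) keeps every page, at the cost of dump size/time
core_collector makedumpfile -c -d 0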

Comment by Bruno Faccini (Inactive) [ 27/Aug/14 ]

Seb, any news or a new occurrence?

Comment by Sebastien Piechurski [ 07/Jun/17 ]

Given that we have not seen this issue for almost 3 years, and all our customers have migrated away from 2.1, I guess you can close this ticket.

Comment by Peter Jones [ 07/Jun/17 ]

ok, thanks
