[LU-5400] inode structure corruption leading to OSS crash Created: 23/Jul/14  Updated: 07/Jun/17  Resolved: 07/Jun/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.6
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Sebastien Piechurski Assignee: Bruno Faccini (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Bull environment


Severity: 3
Rank (Obsolete): 15033

 Description   

One of our customers hit a kernel NULL pointer dereference in __iget.
The backtrace is as follows:

PID: 29825  TASK: ffff88044c49e7b0  CPU: 3   COMMAND: "ll_ost_583"
[...]
    [exception RIP: __iget+45]
    RIP: ffffffff81180cfd  RSP: ffff88044c517ac0  RFLAGS: 00010246
    RAX: ffff880040aa5550  RBX: ffff880040aa5540  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff88040c7bd3a9  RDI: ffff880040aa5540
    RBP: ffff88044c517ac0   R8: 00000000fffffff3   R9: 00000000fffffff6
    R10: 0000000000000008  R11: 0000000000000096  R12: ffff8800b59f0a80
    R13: ffff88040c7bd300  R14: ffff8804243b22f8  R15: 000000000000000b
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff88044c517ac8] igrab at ffffffff81180fd8
#10 [ffff88044c517ae8] filter_lvbo_init at ffffffffa0bdc795 [obdfilter]
#11 [ffff88044c517b18] ldlm_resource_get at ffffffffa07c33a4 [ptlrpc]
#12 [ffff88044c517b88] ldlm_lock_create at ffffffffa07bcb85 [ptlrpc]
#13 [ffff88044c517bd8] ldlm_handle_enqueue0 at ffffffffa07e40a4 [ptlrpc]
#14 [ffff88044c517c48] ldlm_handle_enqueue at ffffffffa07e4ef6 [ptlrpc]
#15 [ffff88044c517c88] ost_handle at ffffffffa0964e83 [ost]
#16 [ffff88044c517da8] ptlrpc_main at ffffffffa08134e6 [ptlrpc]
#17 [ffff88044c517f48] kernel_thread at ffffffff8100412a

The crash occurs here:

crash> dis __iget
0xffffffff81180cd0 <__iget>:    push   %rbp
0xffffffff81180cd1 <__iget+1>:  mov    %rsp,%rbp
0xffffffff81180cd4 <__iget+4>:  nopl   0x0(%rax,%rax,1)
0xffffffff81180cd9 <__iget+9>:  mov    0x48(%rdi),%eax
0xffffffff81180cdc <__iget+12>: test   %eax,%eax
0xffffffff81180cde <__iget+14>: jne    0xffffffff81180d30 <__iget+96>
0xffffffff81180ce0 <__iget+16>: lock incl 0x48(%rdi)
0xffffffff81180ce4 <__iget+20>: testq  $0x107,0x218(%rdi)
0xffffffff81180cef <__iget+31>: jne    0xffffffff81180d22 <__iget+82>
0xffffffff81180cf1 <__iget+33>: mov    0x18(%rdi),%rdx
0xffffffff81180cf5 <__iget+37>: mov    0x10(%rdi),%rcx
0xffffffff81180cf9 <__iget+41>: lea    0x10(%rdi),%rax
0xffffffff81180cfd <__iget+45>: mov    %rdx,0x8(%rcx)    <=== HERE

which corresponds to :

	if (!(inode->i_state & (I_DIRTY|I_SYNC)))
		list_move(&inode->i_list, &inode_in_use);

The %rcx register is supposed to hold &inode->i_list, but it is NULL.
Looking at the inode structure, all first fields contain zeros:

struct inode {
  i_hash = {
    next = 0x0, 
    pprev = 0x0
  }, 
  i_list = {
    next = 0x0, 
    prev = 0x0
  }, 
  i_sb_list = {
    next = 0x0, 
    prev = 0x0
  }, 
  i_dentry = {
    next = 0x0, 
    prev = 0x0
  }, 
  i_ino = 0, 
  i_count = {
    counter = 1
  }, 
  i_nlink = 0, 
....

Looking at the dentry structure from which the inode address comes, it looks OK:

crash> struct dentry ffff88040c7bd300
struct dentry {
  d_count = {
    counter = 1
  }, 
  d_flags = 8, 
  d_lock = {
    raw_lock = {
      slock = 2555943
    }
  }, 
  d_mounted = -559087616, 
  d_inode = 0xffff880040aa5540, 
  d_hash = {
    next = 0xffff88039a004f18, 
    pprev = 0xffff8803b4f8c558
  }, 
  d_parent = 0xffff8804235ea9c0, 
  d_name = {
    hash = 72921089, 
    len = 9, 
    name = 0xffff88040c7bd3a0 "120408088"
  }, 
  d_lru = {
    next = 0xffff88040c7bd400, 
    prev = 0xffff88040c7bd280
  }, 
  d_u = {
    d_child = {
      next = 0xffff88040c31dc10, 
      prev = 0xffff88054d37b950
    }, 
    d_rcu = {
      next = 0xffff88040c31dc10, 
      func = 0xffff88054d37b950
    }
  }, 
  d_subdirs = {
    next = 0xffff88040c7bd360, 
    prev = 0xffff88040c7bd360
  }, 
  d_alias = {
    next = 0xffff880040aa5570, 
    prev = 0xffff880040aa5570
  }, 
  d_time = 0, 
  d_op = 0x0, 
  d_sb = 0xffff880bc74dd400, 
  d_fsdata = 0x0, 
  d_iname = "120408088\000\000\000\000\000\000\000\000\000\b\000\000\000\000\000\000\000\000\000\000\000\000"
}

and is consistent with its parent directory:

crash> struct dentry.d_name ffff8804235ea9c0
  d_name = {
    hash = 2243934, 
    len = 3, 
    name = 0xffff8804235eaa60 "d24"
  }

Can you find out how this corruption happened?



 Comments   
Comment by Bruno Faccini (Inactive) [ 24/Jul/14 ]

Hello Seb,
I think you meant that "%rcx is supposed to hold inode->i_list.next", right?

You say the beginning of the inode at address 0xffff880040aa5540 has been zeroed, but do the later fields look OK? I already see that i_count's value is 1!

Also, can you check the memory content just before this inode structure, and which slab kmem_cache it belongs to?

Comment by Sebastien Piechurski [ 24/Jul/14 ]

Hi Bruno,

I meant %rcx == inode->i_list (its value, not &inode->i_list), with the crash occurring when trying to access i_list.prev in the inlined call to list_move.

I looked at the 400 bytes preceding the inode address, and everything is set to zero.
The i_count value was incremented just a few instructions before the crash, which is why it is the only non-zero value.
However, not all fields in the inode struct are zeros: only the beginning is, and the remaining fields are inconsistent.
I attach the complete inode struct dump as well as the hex dump.
You will notice in the hex dump that there seems to be an incrementing pattern every 32 bytes, like:

ffff880040aa6220: 0030001100300001 00000be300300221
ffff880040aa6230: 0000000000040252 65a7000000000000
ffff880040aa6240: 0030001200300002 0000092b00300422
ffff880040aa6250: 00000000000401ec 921e000000000000
ffff880040aa6260: 0030001300300003 00000f1800300623

Finally, it looks like we have a ldiskfs_inode slab corruption:

crash> kmem -s ffff880040aa5540
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa72c0  bad next pointer: 0
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa72c0  bad prev pointer: 0
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa72c0  bad inuse counter: 0
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa72c0  bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5000  bad next pointer: 0
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5000  bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5000  bad next pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5000  bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5000  bad next pointer: 0
kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5000  bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5000  bad next pointer: 0
kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5000  bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5000  bad next pointer: 0
kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5000  bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5000  bad next pointer: 0
kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5000  bad s_mem pointer: 0
kmem: ldiskfs_inode_cache: address not found in cache: ffff880040aa5540

To be linked to LU-5284? The crash comes from the same customer ...

Comment by Bruno Faccini (Inactive) [ 24/Jul/14 ]

Hmm, I am late on the LU-5284 crash-dump analysis, but this can certainly be suspected!
Let me check further and I will get back with some more ideas.

Comment by Bruno Faccini (Inactive) [ 25/Jul/14 ]

After some work on the crash-dump for LU-5284, I can already confirm that the corruption looks very similar (a series of 4 quad-words with similarities/increments).

Comment by Bruno Faccini (Inactive) [ 25/Jul/14 ]

Hmm, and the corruption address range is very close too! Could this be the same node that failed both times?

Comment by Sebastien Piechurski [ 08/Sep/14 ]

No, the occurrences here and in LU-5284 were not on the same node, though they were at the same site.

Comment by Bruno Faccini (Inactive) [ 16/Sep/14 ]

Hmm, this is very strange. I know it is not an easy question to answer, but is there something specific (HW/SW configs, workload, ...) about these 2 nodes compared to the others?

Did you encounter any new crashes?

Comment by Sebastien Piechurski [ 07/Jun/17 ]

Same thing here: old ticket, and the problem disappeared.

Please close.

Comment by Peter Jones [ 07/Jun/17 ]

Thanks!

Generated at Sat Feb 10 01:51:10 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.