[LU-8286] kernel BUG at fs/inode.c:1358! Created: 15/Jun/16  Updated: 22/Sep/16  Resolved: 22/Sep/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.3
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Mahmoud Hanafi Assignee: Bob Glossman (Inactive)
Resolution: Duplicate Votes: 0
Labels: None

Severity: 1
Rank (Obsolete): 9223372036854775807

 Description   

This is a duplicate of LU-6911, but we need a backport to 2.5.3 and 2.7.

The MDS crashed with the error below.
We rebooted the MDS and ran fsck on the MDT.
After recovery it crashed again.
We rebooted the MDS again; after recovery it crashed again at the same spot!
The system is currently DOWN.

------------[ cut here ]------------
kernel BUG at fs/inode.c:1358!
Entering kdb (current=0xffff8807f4ab1520, pid 7311) on processor 6 due to KDB_ENTER()
[6]kdb> bt
Stack traceback for pid 7311
0xffff8807f4ab1520     7311        2  1    6   R  0xffff8807f4ab1bb8 *mdt02_000
sp                ip                Function (args)
0xffff8807f4ab3948 0xffffffff811a58c9 iput+0x69 (0xffff880f34d75860)
kdb_bb: address 0x0000000000000018 not recognised
Using old style backtrace, unreliable with no arguments
sp                ip                Function (args)
0xffff8807f4ab3948 0xffffffff811a58c9 iput+0x69
0xffff8807f4ab3988 0xffffffffa0e020f5 [ldiskfs]ldiskfs_xattr_inode_array_free+0x75
0xffff8807f4ab39d8 0xffffffffa0dd8c39 [ldiskfs]ldiskfs_delete_inode+0x1e9
0xffff8807f4ab3a08 0xffffffffa0dd8a50 [ldiskfs]ldiskfs_delete_inode
0xffff8807f4ab3a18 0xffffffff811a691e generic_delete_inode+0xde
0xffff8807f4ab3a48 0xffffffff811a6a75 generic_drop_inode+0x65
0xffff8807f4ab3a68 0xffffffff811a58c2 iput+0x62
0xffff8807f4ab3a88 0xffffffffa0ec3a7c [osd_ldiskfs]osd_object_delete+0x1fc
0xffff8807f4ab3ad8 0xffffffffa06e57f1 [obdclass]lu_object_free+0x81
0xffff8807f4ab3b18 0xffffffffa059f442 [libcfs]cfs_hash_bd_from_key+0x42
0xffff8807f4ab3b58 0xffffffffa06e5f2d [obdclass]lu_object_put+0xbd
0xffff8807f4ab3bc8 0xffffffffa10bb1ff [mdt]mdt_object_put+0x3f
0xffff8807f4ab3be8 0xffffffffa10c3e19 [mdt]mdt_reint_unlink+0x7e9
0xffff8807f4ab3bf8 0xffffffffa0705f40 [obdclass]lu_ucred+0x20
0xffff8807f4ab3c98 0xffffffffa10bb01d [mdt]mdt_reint_rec+0x5d
0xffff8807f4ab3cc8 0xffffffffa109ed9b [mdt]mdt_reint_internal+0x4cb
0xffff8807f4ab3d08 0xffffffffa109f50b [mdt]mdt_reint+0x6b
0xffff8807f4ab3d48 0xffffffffa0975d9e [ptlrpc]tgt_request_handle+0x8be
0xffff8807f4ab3da8 0xffffffffa091fca1 [ptlrpc]ptlrpc_main+0xf41
0xffff8807f4ab3ec8 0xffffffffa091ed60 [ptlrpc]ptlrpc_main
0xffff8807f4ab3ee8 0xffffffff8109dc8e kthread+0x9e
0xffff8807f4ab3f48 0xffffffff8100c28a child_rip+0xa
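
For triage, the old-style backtrace above can be split into (module, function, offset) tuples to see which modules the crash path crosses. The following is a minimal Python sketch, assuming the kdb frame layout shown in this ticket (stack pointer, instruction pointer, then an optional `[module]` prefix on the symbol). `FRAME_RE`, `parse_frames`, and `SAMPLE` are hypothetical names for illustration, not part of Lustre or kdb.

```python
import re

# Matches old-style kdb frames such as:
#   0xffff8807f4ab39d8 0xffffffffa0dd8c39 [ldiskfs]ldiskfs_delete_inode+0x1e9
# Lines with trailing "(args)" or other layouts are skipped on purpose.
FRAME_RE = re.compile(
    r"^0x[0-9a-f]+\s+0x[0-9a-f]+\s+"
    r"(?:\[(?P<module>\w+)\])?(?P<func>\w+)(?:\+(?P<offset>0x[0-9a-f]+))?$"
)

def parse_frames(text):
    """Return a list of (module, function, offset) for each frame line.

    Frames without a [module] prefix are attributed to "kernel";
    frames without an offset get None in the third position.
    """
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(
                (match.group("module") or "kernel",
                 match.group("func"),
                 match.group("offset"))
            )
    return frames

# A few frames copied from the trace in this ticket.
SAMPLE = """\
sp                ip                Function (args)
0xffff8807f4ab3948 0xffffffff811a58c9 iput+0x69
0xffff8807f4ab3988 0xffffffffa0e020f5 [ldiskfs]ldiskfs_xattr_inode_array_free+0x75
0xffff8807f4ab39d8 0xffffffffa0dd8c39 [ldiskfs]ldiskfs_delete_inode+0x1e9
0xffff8807f4ab3a08 0xffffffffa0dd8a50 [ldiskfs]ldiskfs_delete_inode
"""

if __name__ == "__main__":
    for module, func, offset in parse_frames(SAMPLE):
        print(module, func, offset)
```

The header line and any line that does not follow the two-address layout are ignored, so the full kdb output can be passed in unmodified.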


 Comments   
Comment by Mahmoud Hanafi [ 15/Jun/16 ]

Correction: we only need the backport to Lustre 2.7.1.

Comment by Mahmoud Hanafi [ 15/Jun/16 ]

On the third attempt, doing an abort recovery, it stayed up. So the filesystem is back up, but we still need the backport of the patch to 2.7.1 and 2.7.2.

Comment by Jay Lan (Inactive) [ 15/Jun/16 ]

Back port to b2_7_fe.

Conflicts were in
ldiskfs/kernel_patches/patches/rhel7/ext4-large-eas.patch

Comment by Peter Jones [ 15/Jun/16 ]

Engineering agrees with your assessment, and Bob is working on a port.

Comment by Oleg Drokin [ 15/Jun/16 ]

If you don't use rhel7, I think you can ignore the rhel7 conflict for now to get you in business faster.

Comment by Jay Lan (Inactive) [ 15/Jun/16 ]

Sounds like a good alternative, Oleg. I will ignore the rhel7 conflict and move ahead if we get hit with this bug again before a b2_7_fe backport is available.

Comment by Peter Jones [ 14/Jul/16 ]

http://review.whamcloud.com/#/c/20820/

Comment by Jay Lan (Inactive) [ 14/Jul/16 ]

Thank you, Peter.

Comment by Mahmoud Hanafi [ 22/Sep/16 ]

Close this case.
