Details
- Bug
- Resolution: Fixed
- Blocker
- Lustre 2.4.0, Lustre 2.5.0
- 3
- 8703
Description
Hi,
We have been testing v2.4 and have hit this LBUG, which we never experienced in v1.8.x under similar workloads. It appears to be triggered by an rm/unlink on certain files. I had to abort recovery and stop the ongoing file deletion to keep the MDS from repeatedly crashing with the same LBUG. We can supply more debug info should you need it.
Cheers,
Daire
<0>LustreError: 6274:0:(linkea.c:169:linkea_links_find()) ASSERTION( ldata->ld_leh != ((void *)0) ) failed:
<0>LustreError: 6274:0:(linkea.c:169:linkea_links_find()) LBUG
<4>Pid: 6274, comm: mdt01_004
<4>
<4>Call Trace:
<4> [<ffffffffa044b895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa044be97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa05b47d6>] linkea_links_find+0x186/0x190 [obdclass]
<4> [<ffffffffa0b87206>] ? mdo_xattr_get+0x26/0x30 [mdd]
<4> [<ffffffffa0b8a645>] mdd_linkea_prepare+0x95/0x430 [mdd]
<4> [<ffffffffa0b8ab01>] mdd_links_rename+0x121/0x540 [mdd]
<4> [<ffffffffa0b8eae6>] mdd_unlink+0xb86/0xe30 [mdd]
<4> [<ffffffffa0e0db98>] mdo_unlink+0x18/0x50 [mdt]
<4> [<ffffffffa0e10f40>] mdt_reint_unlink+0x820/0x1010 [mdt]
<4> [<ffffffffa0e0d891>] mdt_reint_rec+0x41/0xe0 [mdt]
<4> [<ffffffffa0df2b03>] mdt_reint_internal+0x4c3/0x780 [mdt]
<4> [<ffffffffa0df2e04>] mdt_reint+0x44/0xe0 [mdt]
<4> [<ffffffffa0df7ab8>] mdt_handle_common+0x648/0x1660 [mdt]
<4> [<ffffffffa0e31165>] mds_regular_handle+0x15/0x20 [mdt]
<4> [<ffffffffa0730388>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
<4> [<ffffffffa044c5de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
<4> [<ffffffffa045dd8f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
<4> [<ffffffffa07276e9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
<4> [<ffffffff81055ab3>] ? __wake_up+0x53/0x70
<4> [<ffffffffa073171e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
<4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
<4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
<4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
<4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
<4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 6274, comm: mdt01_004 Tainted: G --------------- T 2.6.32-358.6.2.el6_lustre.g230b174.x86_64 #1
<4>Call Trace:
<4> [<ffffffff8150d878>] ? panic+0xa7/0x16f
<4> [<ffffffffa044beeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa05b47d6>] ? linkea_links_find+0x186/0x190 [obdclass]
<4> [<ffffffffa0b87206>] ? mdo_xattr_get+0x26/0x30 [mdd]
<4> [<ffffffffa0b8a645>] ? mdd_linkea_prepare+0x95/0x430 [mdd]
<4> [<ffffffffa0b8ab01>] ? mdd_links_rename+0x121/0x540 [mdd]
<4> [<ffffffffa0b8eae6>] ? mdd_unlink+0xb86/0xe30 [mdd]
<4> [<ffffffffa0e0db98>] ? mdo_unlink+0x18/0x50 [mdt]
<4> [<ffffffffa0e10f40>] ? mdt_reint_unlink+0x820/0x1010 [mdt]
<4> [<ffffffffa0e0d891>] ? mdt_reint_rec+0x41/0xe0 [mdt]
<4> [<ffffffffa0df2b03>] ? mdt_reint_internal+0x4c3/0x780 [mdt]
<4> [<ffffffffa0df2e04>] ? mdt_reint+0x44/0xe0 [mdt]
<4> [<ffffffffa0df7ab8>] ? mdt_handle_common+0x648/0x1660 [mdt]
<4> [<ffffffffa0e31165>] ? mds_regular_handle+0x15/0x20 [mdt]
<4> [<ffffffffa0730388>] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
<4> [<ffffffffa044c5de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
<4> [<ffffffffa045dd8f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
<4> [<ffffffffa07276e9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
<4> [<ffffffff81055ab3>] ? __wake_up+0x53/0x70
<4> [<ffffffffa073171e>] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
<4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
<4> [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
<4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
<4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
<4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Issue Links
- is related to: LU-5145 kernel slab data memory coruption on MDS due to a file with truncated link extended attribute (Resolved)
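For context on the failure class: the assertion in the log above fires in linkea_links_find() because ldata->ld_leh is NULL, and the linked LU-5145 mentions a truncated link extended attribute. The sketch below is not Lustre code and is not the change under review. It uses hypothetical names (link_header, link_buf, find_link_strict, find_link_checked) only to contrast asserting on an unset header pointer with returning an error to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; NOT the Lustre structures. */
struct link_header {
    unsigned int count;          /* number of entry names that follow */
};

struct link_buf {
    struct link_header *hdr;     /* stays NULL if the attribute was never loaded */
    const char *const *names;    /* hdr->count entry names */
};

/* Strict variant: aborts when the header is missing, similar in spirit
 * to the failed assertion in the log. */
static int find_link_strict(const struct link_buf *buf, const char *name)
{
    assert(buf->hdr != NULL);
    for (unsigned int i = 0; i < buf->hdr->count; i++)
        if (strcmp(buf->names[i], name) == 0)
            return (int)i;       /* index of the matching entry */
    return -ENOENT;              /* header present, name not found */
}

/* Checked variant: reports a missing header to the caller instead of
 * aborting, so the caller can decide how to handle it. */
static int find_link_checked(const struct link_buf *buf, const char *name)
{
    if (buf == NULL || buf->hdr == NULL)
        return -ENODATA;
    return find_link_strict(buf, name);
}
```

With the checked variant, a buffer whose header was never set yields -ENODATA rather than an abort, while a populated buffer behaves the same in both variants.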
Wow, I am sorry Daire. I don't know how this happened, but patch-set #3 of http://review.whamcloud.com/6676 contained a regression from patch-sets #1/#2 (in fact, it did not contain the main change from patch-set #1 that must be in to prevent the LBUG!). Can you give patch-set #4 a try? It should be the definitive one.