Details
Bug - Resolution: Duplicate - Minor - None - Lustre 2.0.0 - None - 3 - 6454
Description
The MDS panic stack/log looks like the following:
==================================================
LustreError: 64654:0:(md_object.h:740:mo_version_get()) ASSERTION(m->mo_ops->moo_version_get) failed
LustreError: 64654:0:(md_object.h:740:mo_version_get()) LBUG
Pid: 64654, comm: mdt_19
Call Trace:
[<ffffffffa04b8857>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
[<ffffffffa04b8e95>] lbug_with_loc+0x75/0xe0 [libcfs]
[<ffffffffa04c48b6>] libcfs_assertion_failed+0x66/0x70 [libcfs]
[<ffffffffa07235e9>] cml_version_get+0xa9/0xc0 [cmm]
[<ffffffffa0a62bcb>] mdt_obj_version_get+0x14b/0x220 [mdt]
[<ffffffffa0a63089>] mdt_version_get_check_save+0x39/0xe0 [mdt]
[<ffffffffa0a63204>] mdt_reint_unlink+0xd4/0x940 [mdt]
[<ffffffffa0a6084f>] ? mdt_unlink_unpack+0x41f/0x5a0 [mdt]
[<ffffffffa0a1a256>] ? md_ucred+0x26/0x60 [mdd]
[<ffffffffa0a1a256>] ? md_ucred+0x26/0x60 [mdd]
[<ffffffffa0a6167f>] mdt_reint_rec+0x3f/0x100 [mdt]
[<ffffffffa0668004>] ? lustre_msg_get_flags+0x34/0xa0 [ptlrpc]
[<ffffffffa0a58a34>] mdt_reint_internal+0x6d4/0x9f0 [mdt]
[<ffffffffa0a4f756>] ? mdt_reint_opcode+0x96/0x160 [mdt]
[<ffffffffa0a58d9c>] mdt_reint+0x4c/0x120 [mdt]
[<ffffffffa0667ad8>] ? lustre_msg_check_version+0xc8/0xe0 [ptlrpc]
[<ffffffffa0a4d9f5>] mdt_handle_common+0x8d5/0x1810 [mdt]
[<ffffffffa0665734>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
[<ffffffffa0a4ea05>] mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa0674641>] ptlrpc_server_handle_request+0x421/0xef0 [ptlrpc]
[<ffffffff810408fe>] ? activate_task+0x2e/0x40
[<ffffffff8104d906>] ? try_to_wake_up+0x276/0x380
[<ffffffff8104da22>] ? default_wake_function+0x12/0x20
[<ffffffff810411a9>] ? __wake_up_common+0x59/0x90
[<ffffffffa04b963e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa06759e2>] ptlrpc_main+0x8d2/0x1550 [ptlrpc]
[<ffffffff8104da10>] ? default_wake_function+0x0/0x20
[<ffffffff8100d1aa>] child_rip+0xa/0x20
[<ffffffffa0675110>] ? ptlrpc_main+0x0/0x1550 [ptlrpc]
[<ffffffff8100d1a0>] ? child_rip+0x0/0x20
Kernel panic - not syncing: LBUG
Pid: 64654, comm: mdt_19 Not tainted 2.6.32-71.24.1.el6.Bull.23.x86_64 #1
Call Trace:
[<ffffffff81466b14>] panic+0x78/0x137
[<ffffffffa04b8eeb>] lbug_with_loc+0xcb/0xe0 [libcfs]
[<ffffffffa04c48b6>] libcfs_assertion_failed+0x66/0x70 [libcfs]
[<ffffffffa07235e9>] cml_version_get+0xa9/0xc0 [cmm]
[<ffffffffa0a62bcb>] mdt_obj_version_get+0x14b/0x220 [mdt]
[<ffffffffa0a63089>] mdt_version_get_check_save+0x39/0xe0 [mdt]
[<ffffffffa0a63204>] mdt_reint_unlink+0xd4/0x940 [mdt]
[<ffffffffa0a6084f>] ? mdt_unlink_unpack+0x41f/0x5a0 [mdt]
[<ffffffffa0a1a256>] ? md_ucred+0x26/0x60 [mdd]
[<ffffffffa0a1a256>] ? md_ucred+0x26/0x60 [mdd]
[<ffffffffa0a6167f>] mdt_reint_rec+0x3f/0x100 [mdt]
[<ffffffffa0668004>] ? lustre_msg_get_flags+0x34/0xa0 [ptlrpc]
[<ffffffffa0a58a34>] mdt_reint_internal+0x6d4/0x9f0 [mdt]
[<ffffffffa0a4f756>] ? mdt_reint_opcode+0x96/0x160 [mdt]
[<ffffffffa0a58d9c>] mdt_reint+0x4c/0x120 [mdt]
[<ffffffffa0667ad8>] ? lustre_msg_check_version+0xc8/0xe0 [ptlrpc]
[<ffffffffa0a4d9f5>] mdt_handle_common+0x8d5/0x1810 [mdt]
[<ffffffffa0665734>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
[<ffffffffa0a4ea05>] mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa0674641>] ptlrpc_server_handle_request+0x421/0xef0 [ptlrpc]
[<ffffffff810408fe>] ? activate_task+0x2e/0x40
[<ffffffff8104d906>] ? try_to_wake_up+0x276/0x380
[<ffffffff8104da22>] ? default_wake_function+0x12/0x20
[<ffffffff810411a9>] ? __wake_up_common+0x59/0x90
[<ffffffffa04b963e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa06759e2>] ptlrpc_main+0x8d2/0x1550 [ptlrpc]
[<ffffffff8104da10>] ? default_wake_function+0x0/0x20
[<ffffffff8100d1aa>] child_rip+0xa/0x20
[<ffffffffa0675110>] ? ptlrpc_main+0x0/0x1550 [ptlrpc]
[<ffffffff8100d1a0>] ? child_rip+0x0/0x20
==================================================
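To make the failing assertion easier to read: the trace shows mo_version_get() asserting that m->mo_ops->moo_version_get is set, so some object involved in the unlink apparently has an operations table without that method. The following is only a minimal user-space sketch of that dispatch pattern, not the Lustre code. The struct layouts, regular_ops, special_ops, version_get_checked() and the choice of -EOPNOTSUPP are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; the real Lustre structures have many more fields. */
struct md_object;

struct md_object_operations {
    uint64_t (*moo_version_get)(const struct md_object *obj);
};

struct md_object {
    const struct md_object_operations *mo_ops;
    uint64_t version;
};

static uint64_t regular_version_get(const struct md_object *obj)
{
    return obj->version;
}

/* Ops table of an ordinary object: the method is present. */
static const struct md_object_operations regular_ops = {
    .moo_version_get = regular_version_get,
};

/* Hypothetical ops table of a special object that never set the method. */
static const struct md_object_operations special_ops = {
    .moo_version_get = NULL,
};

/*
 * Checked variant of the dispatch. Where the wrapper in the trace asserts
 * (and LBUGs) on a missing method, this sketch returns an error instead.
 * -EOPNOTSUPP is an arbitrary choice made for this illustration.
 */
static int version_get_checked(const struct md_object *obj, uint64_t *out)
{
    assert(obj != NULL && out != NULL);
    if (obj->mo_ops == NULL || obj->mo_ops->moo_version_get == NULL)
        return -EOPNOTSUPP;
    *out = obj->mo_ops->moo_version_get(obj);
    return 0;
}
```

With an object using regular_ops the call succeeds and fills in the version. With special_ops it returns an error, whereas an unconditional assertion at the same spot would bring the whole server down, as in the log above.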
A reliable reproducer, run from a Lustre client, is very simple:
======================================================================
- touch /ccc/work/.../foo
- lfs path2fid /ccc/work/.../foo
[0x200002cb4:0x3cbb:0x0]
- rm /ccc/work/.lustre/fid/[0x200002cb4:0x3cbb:0x0]
<<< MDS crashes as described above >>>
======================================================================
In-depth analysis of the crash dump and of the relevant source code clearly indicates that rm/unlink by FID is not fully implemented, but is also not properly checked and prohibited.
My assumption is that it should be checked and prohibited with the following change:
==================================================================================
[root@gaia1 lustre-2.0.0.1] # diff -urN lustre/mdd/mdd_dir.c.orig lustre/mdd/mdd_dir.c.bfi
--- lustre/mdd/mdd_dir.c.orig 2011-11-04 18:46:46.000000000 +0100
+++ lustre/mdd/mdd_dir.c.bfi 2012-02-01 17:18:37.074912630 +0100
@@ -415,7 +415,7 @@
RETURN(rc);
}
- if (mdd_is_append(pobj))
+ if (mdd_is_immutable(pobj) || mdd_is_append(pobj))
RETURN(-EPERM);
}
[root@gaia1 lustre-2.0.0.1] #
==================================================================================
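The effect of the one-line change can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not the MDD code. It assumes, as the patch implies, that the parent directory used for by-FID access carries an immutable flag. The flag values, struct sketch_obj and the may_modify_parent_*() helpers are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the real MDD code keeps its own definitions. */
#define SKETCH_IMMUTABLE_FL 0x1u
#define SKETCH_APPEND_FL    0x2u

/* Stand-in for the parent object (pobj) tested in the patched hunk. */
struct sketch_obj {
    unsigned int flags;
};

static bool sketch_is_immutable(const struct sketch_obj *o)
{
    return (o->flags & SKETCH_IMMUTABLE_FL) != 0;
}

static bool sketch_is_append(const struct sketch_obj *o)
{
    return (o->flags & SKETCH_APPEND_FL) != 0;
}

/* Check as in the original hunk: only append-only parents are refused. */
static int may_modify_parent_orig(const struct sketch_obj *pobj)
{
    assert(pobj != NULL);
    if (sketch_is_append(pobj))
        return -EPERM;
    return 0;
}

/* Check as in the proposed hunk: immutable parents are refused as well. */
static int may_modify_parent_patched(const struct sketch_obj *pobj)
{
    assert(pobj != NULL);
    if (sketch_is_immutable(pobj) || sketch_is_append(pobj))
        return -EPERM;
    return 0;
}
```

In this model an immutable parent passes the original check but is rejected with -EPERM by the patched one, so the unlink would be refused before reaching the version lookup that asserts.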
What do you think?
Also, our first tests with Lustre 2.1.0 show the same behavior.