[LU-1111] "rm" by FID causes MDS LBUG ("(md_object.h:740:mo_version_get()) ASSERTION(m->mo_ops->moo_version_get) failed") Created: 16/Feb/12  Updated: 08/Aug/18  Resolved: 29/Feb/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.0.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Alexandre Louvet Assignee: Lai Siyao
Resolution: Duplicate Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 6454

 Description   

The MDS panic stack/log looks like the following:
==================================================
LustreError: 64654:0:(md_object.h:740:mo_version_get()) ASSERTION(m->mo_ops->moo_version_get) failed
LustreError: 64654:0:(md_object.h:740:mo_version_get()) LBUG
Pid: 64654, comm: mdt_19

Call Trace:
[<ffffffffa04b8857>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
[<ffffffffa04b8e95>] lbug_with_loc+0x75/0xe0 [libcfs]
[<ffffffffa04c48b6>] libcfs_assertion_failed+0x66/0x70 [libcfs]
[<ffffffffa07235e9>] cml_version_get+0xa9/0xc0 [cmm]
[<ffffffffa0a62bcb>] mdt_obj_version_get+0x14b/0x220 [mdt]
[<ffffffffa0a63089>] mdt_version_get_check_save+0x39/0xe0 [mdt]
[<ffffffffa0a63204>] mdt_reint_unlink+0xd4/0x940 [mdt]
[<ffffffffa0a6084f>] ? mdt_unlink_unpack+0x41f/0x5a0 [mdt]
[<ffffffffa0a1a256>] ? md_ucred+0x26/0x60 [mdd]
[<ffffffffa0a1a256>] ? md_ucred+0x26/0x60 [mdd]
[<ffffffffa0a6167f>] mdt_reint_rec+0x3f/0x100 [mdt]
[<ffffffffa0668004>] ? lustre_msg_get_flags+0x34/0xa0 [ptlrpc]
[<ffffffffa0a58a34>] mdt_reint_internal+0x6d4/0x9f0 [mdt]
[<ffffffffa0a4f756>] ? mdt_reint_opcode+0x96/0x160 [mdt]
[<ffffffffa0a58d9c>] mdt_reint+0x4c/0x120 [mdt]
[<ffffffffa0667ad8>] ? lustre_msg_check_version+0xc8/0xe0 [ptlrpc]
[<ffffffffa0a4d9f5>] mdt_handle_common+0x8d5/0x1810 [mdt]
[<ffffffffa0665734>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
[<ffffffffa0a4ea05>] mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa0674641>] ptlrpc_server_handle_request+0x421/0xef0 [ptlrpc]
[<ffffffff810408fe>] ? activate_task+0x2e/0x40
[<ffffffff8104d906>] ? try_to_wake_up+0x276/0x380
[<ffffffff8104da22>] ? default_wake_function+0x12/0x20
[<ffffffff810411a9>] ? __wake_up_common+0x59/0x90
[<ffffffffa04b963e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa06759e2>] ptlrpc_main+0x8d2/0x1550 [ptlrpc]
[<ffffffff8104da10>] ? default_wake_function+0x0/0x20
[<ffffffff8100d1aa>] child_rip+0xa/0x20
[<ffffffffa0675110>] ? ptlrpc_main+0x0/0x1550 [ptlrpc]
[<ffffffff8100d1a0>] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
Pid: 64654, comm: mdt_19 Not tainted 2.6.32-71.24.1.el6.Bull.23.x86_64 #1
Call Trace:
[<ffffffff81466b14>] panic+0x78/0x137
[<ffffffffa04b8eeb>] lbug_with_loc+0xcb/0xe0 [libcfs]
[<ffffffffa04c48b6>] libcfs_assertion_failed+0x66/0x70 [libcfs]
[<ffffffffa07235e9>] cml_version_get+0xa9/0xc0 [cmm]
[<ffffffffa0a62bcb>] mdt_obj_version_get+0x14b/0x220 [mdt]
[<ffffffffa0a63089>] mdt_version_get_check_save+0x39/0xe0 [mdt]
[<ffffffffa0a63204>] mdt_reint_unlink+0xd4/0x940 [mdt]
[<ffffffffa0a6084f>] ? mdt_unlink_unpack+0x41f/0x5a0 [mdt]
[<ffffffffa0a1a256>] ? md_ucred+0x26/0x60 [mdd]
[<ffffffffa0a1a256>] ? md_ucred+0x26/0x60 [mdd]
[<ffffffffa0a6167f>] mdt_reint_rec+0x3f/0x100 [mdt]
[<ffffffffa0668004>] ? lustre_msg_get_flags+0x34/0xa0 [ptlrpc]
[<ffffffffa0a58a34>] mdt_reint_internal+0x6d4/0x9f0 [mdt]
[<ffffffffa0a4f756>] ? mdt_reint_opcode+0x96/0x160 [mdt]
[<ffffffffa0a58d9c>] mdt_reint+0x4c/0x120 [mdt]
[<ffffffffa0667ad8>] ? lustre_msg_check_version+0xc8/0xe0 [ptlrpc]
[<ffffffffa0a4d9f5>] mdt_handle_common+0x8d5/0x1810 [mdt]
[<ffffffffa0665734>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
[<ffffffffa0a4ea05>] mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa0674641>] ptlrpc_server_handle_request+0x421/0xef0 [ptlrpc]
[<ffffffff810408fe>] ? activate_task+0x2e/0x40
[<ffffffff8104d906>] ? try_to_wake_up+0x276/0x380
[<ffffffff8104da22>] ? default_wake_function+0x12/0x20
[<ffffffff810411a9>] ? __wake_up_common+0x59/0x90
[<ffffffffa04b963e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa06759e2>] ptlrpc_main+0x8d2/0x1550 [ptlrpc]
[<ffffffff8104da10>] ? default_wake_function+0x0/0x20
[<ffffffff8100d1aa>] child_rip+0xa/0x20
[<ffffffffa0675110>] ? ptlrpc_main+0x0/0x1550 [ptlrpc]
[<ffffffff8100d1a0>] ? child_rip+0x0/0x20
==================================================

A reliable reproducer, run from a Lustre client, is very simple:
======================================================================

  1. touch /ccc/work/.../foo
  2. lfs path2fid /ccc/work/.../foo
     [0x200002cb4:0x3cbb:0x0]
  3. rm /ccc/work/.lustre/fid/[0x200002cb4:0x3cbb:0x0]
     <<< the MDS crashes as described above >>>
======================================================================

In-depth analysis of the crash dump and the relevant source code clearly indicates that "rm/unlink by FID" is not fully implemented, and is also not correctly checked and prohibited.

My assumption is that this should be fixed (i.e. checked and prohibited) with the following change:
==================================================================================
[root@gaia1 lustre-2.0.0.1] # diff -urN lustre/mdd/mdd_dir.c.orig lustre/mdd/mdd_dir.c.bfi
--- lustre/mdd/mdd_dir.c.orig 2011-11-04 18:46:46.000000000 +0100
+++ lustre/mdd/mdd_dir.c.bfi 2012-02-01 17:18:37.074912630 +0100
@@ -415,7 +415,7 @@
                                 RETURN(rc);
                 }
 
-                if (mdd_is_append(pobj))
+                if (mdd_is_immutable(pobj) || mdd_is_append(pobj))
                         RETURN(-EPERM);
         }
 
[root@gaia1 lustre-2.0.0.1] #
==================================================================================

What do you think?

Also, our first tests with Lustre 2.1.0 show the same behavior.



 Comments   
Comment by Oleg Drokin [ 16/Feb/12 ]

Indeed, unlink by FID (and also, for example, rename by FID or into a FID) is not supported, but the logic to guard against it is missing.
I imagine the original implementer did not anticipate such a use case.

Comment by Peter Jones [ 16/Feb/12 ]

Lai

Please can you take care of this one

Thanks

Peter

Comment by Lai Siyao [ 29/Feb/12 ]

Bruno, the immutable check you proposed is already done in mdd_permission_internal_locked().

Comment by Lai Siyao [ 29/Feb/12 ]

Since this is also an op-by-FID issue, I fixed it together with LU-1110; please check the patch at http://review.whamcloud.com/#change,2224.

Comment by Peter Jones [ 29/Feb/12 ]

Tracked under LU-1110

Comment by Gerrit Updater [ 08/Aug/18 ]

Alexey Lyashkov (c17817@cray.com) uploaded a new patch: https://review.whamcloud.com/32959
Subject: LU-1111 lfsck: skip orphan processing
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0329d94594949c0049654e5a91be212d3ed927e6

Generated at Sat Feb 10 01:13:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.