[LU-4040] mdt_object_find() and mdt_body_unpack() assert on some FIDs Created: 01/Oct/13  Updated: 18/Jun/14  Resolved: 18/Jun/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: John Hammond Assignee: Bruno Faccini (Inactive)
Resolution: Not a Bug Votes: 0
Labels: HSM, mdt

Severity: 3
Rank (Obsolete): 10849

 Description   

Passing a valid local storage FID to mdt_object_find() or mdt_body_unpack() will cause a successful lu_object lookup but trigger a failed assertion in mdt_obj():

static struct mdt_object *mdt_obj(struct lu_object *o)
{
        LASSERT(lu_device_is_mdt(o->lo_dev));
        return container_of0(o, struct mdt_object, mot_obj);
}
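For illustration, here is a minimal user-space sketch of the assert-then-cast pattern above, next to a variant that reports the mismatch to the caller instead of asserting. The types, the dev_kind tag, and the names mdt_obj_asserting() and mdt_obj_checked() are simplified, hypothetical stand-ins and not the Lustre definitions; the standard assert() stands in for LASSERT().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the Lustre types. */
enum dev_kind { DEV_MDT, DEV_LOCAL_STORAGE };

struct lu_device {
        enum dev_kind kind;             /* which layer owns the object */
};

struct lu_object {
        struct lu_device *lo_dev;
};

struct mdt_object {
        int              mot_flags;     /* placeholder field */
        struct lu_object mot_obj;       /* embedded generic object */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

static int lu_device_is_mdt(const struct lu_device *d)
{
        return d != NULL && d->kind == DEV_MDT;
}

/* Same shape as the function in this ticket: assert, then cast. */
static struct mdt_object *mdt_obj_asserting(struct lu_object *o)
{
        assert(lu_device_is_mdt(o->lo_dev));
        return container_of(o, struct mdt_object, mot_obj);
}

/* Hypothetical alternative: return NULL so the caller can fail the request. */
static struct mdt_object *mdt_obj_checked(struct lu_object *o)
{
        if (o == NULL || !lu_device_is_mdt(o->lo_dev))
                return NULL;
        return container_of(o, struct mdt_object, mot_obj);
}
```

With the checked variant, an object that belongs to a non-MDT device yields NULL rather than aborting, so a handler could turn a bad FID into an error reply. Whether that fits the real MDT code paths is an assumption of this sketch.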

An easy way to exploit this is through the new HSM ioctls which ship unvalidated FIDs to the MDT:

# sys_hsm archive /mnt/lustre '[0xa:0x0:0x0]'

Message from syslogd@t at Oct  1 15:15:22 ...
 kernel:LustreError: 12435:0:(mdt_handler.c:2395:mdt_obj()) ASSERTION( lu_device_is_mdt(o->lo_dev) ) failed: 

Message from syslogd@t at Oct  1 15:15:22 ...
 kernel:LustreError: 12435:0:(mdt_handler.c:2395:mdt_obj()) LBUG

Call Trace:
 [<ffffffffa0d93895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0d93e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0534365>] mdt_obj+0x55/0x80 [mdt]
 [<ffffffffa0538b06>] mdt_object_find+0x66/0x170 [mdt]
 [<ffffffffa058aa8b>] mdt_hsm_get_md_hsm+0x6b/0x480 [mdt]
 [<ffffffffa0582765>] mdt_hsm_add_actions+0x435/0x10a0 [mdt]
 [<ffffffff81168bab>] ? cache_alloc_refill+0x15b/0x240
 [<ffffffff81169b7c>] ? __kmalloc+0x20c/0x220
 [<ffffffffa0579e17>] mdt_hsm_request+0x647/0xba0 [mdt]
 [<ffffffffa0542a8a>] mdt_handle_common+0x52a/0x1470 [mdt]
 [<ffffffffa057ca25>] mds_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa106ee45>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
 [<ffffffffa0da527f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [<ffffffffa10664e9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
 [<ffffffffa10701ad>] ptlrpc_main+0xaed/0x1740 [ptlrpc]
 [<ffffffffa106f6c0>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
 [<ffffffff81096a36>] kthread+0x96/0xa0
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff810969a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

# sys_hsm release /mnt/lustre '[0xa:0x0:0x0]'

Message from syslogd@t at Oct  1 15:24:08 ...
 kernel:LustreError: 3456:0:(mdt_handler.c:2395:mdt_obj()) ASSERTION( lu_device_is_mdt(o->lo_dev) ) failed: 

Message from syslogd@t at Oct  1 15:24:08 ...
 kernel:LustreError: 3456:0:(mdt_handler.c:2395:mdt_obj()) LBUG

Kernel panic - not syncing: LBUG
Pid: 3456, comm: mdt01_002 Not tainted 2.6.32-358.18.1.el6.lustre.x86_64 #1
Call Trace:
 [<ffffffff8150f018>] ? panic+0xa7/0x16f
 [<ffffffffa02a9eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0ac8365>] ? mdt_obj+0x55/0x80 [mdt]
 [<ffffffffa0accb06>] ? mdt_object_find+0x66/0x170 [mdt]
 [<ffffffffa0accecc>] ? mdt_unpack_req_pack_rep+0x2bc/0x4d0 [mdt]
 [<ffffffffa0ad6a54>] ? mdt_handle_common+0x4f4/0x1470 [mdt]
 [<ffffffffa0b10a25>] ? mds_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa05dfe45>] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
 [<ffffffffa02bb27f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [<ffffffffa05d74e9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
 [<ffffffffa05e11ad>] ? ptlrpc_main+0xaed/0x1740 [ptlrpc]
 [<ffffffffa05e06c0>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
 [<ffffffff81096a36>] ? kthread+0x96/0xa0
 [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
 [<ffffffff810969a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
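
Since both reproducers pass the FID string '[0xa:0x0:0x0]' straight through, a caller-side sanity check is one possible mitigation. The following is only a sketch under stated assumptions: struct fid, fid_parse(), and fid_seq_at_least() are hypothetical names, and the minimum sequence is a caller-supplied value, not a Lustre constant.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical plain-C mirror of a FID: sequence, object id, version. */
struct fid {
        uint64_t seq;
        uint32_t oid;
        uint32_t ver;
};

/*
 * Parse a FID written as "[seq:oid:ver]" with hexadecimal fields.
 * Returns 0 on success, -1 if the string is malformed or has trailing text.
 */
static int fid_parse(const char *s, struct fid *f)
{
        unsigned long long seq;
        unsigned int oid, ver;
        int end = 0;

        /* %n records how much input was consumed once ']' has matched. */
        if (sscanf(s, "[%llx:%x:%x]%n", &seq, &oid, &ver, &end) != 3 ||
            end == 0 || s[end] != '\0')
                return -1;

        f->seq = seq;
        f->oid = oid;
        f->ver = ver;
        return 0;
}

/* Hypothetical policy: accept only FIDs at or above a caller-chosen sequence. */
static int fid_seq_at_least(const struct fid *f, uint64_t min_seq)
{
        return f->seq >= min_seq;
}
```

A tool could call fid_parse() on its argument and refuse to send the request when fid_seq_at_least() fails for whatever lower bound the deployment considers a client-visible FID. This does not replace validation on the server, which still has to handle arbitrary FIDs from the wire.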


 Comments   
Comment by Bruno Faccini (Inactive) [ 11/Dec/13 ]

John, I wonder if this is still an issue with current master?
BTW, the current mdt_obj() code doesn't have the "LASSERT(lu_device_is_mdt(o->lo_dev))" any more.
Also, do you remember which of "the new HSM ioctls which ship unvalidated FIDs to the MDT" you used to trigger the LBUG, so that we can retry against the new code and see if there are other/new issues when using unvalidated FID arguments?

Comment by John Hammond [ 20/Feb/14 ]

Bruno, you're correct about mdt_obj(). There may still be some bugs with bad FIDs but that's not new for the MDT handlers. Let's close this one.
