[LU-1531] Accessing .lustre/fid/[0x1:0x0:0x0] triggers LBUG in osd_compat_objid_lookup() Created: 15/Jun/12  Updated: 06/Nov/12  Resolved: 06/Nov/12

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: Richard Henwood (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:
  1. uname -r
    2.6.32-220.13.1.el6.l22.x86_64
  2. cat /proc/fs/lustre/version
    lustre: 2.2.55
    kernel: patchless_client
    build: 2.2.55--CHANGED-2.6.32-220.13.1.el6.l22.x86_64

Issue Links:
Related
is related to LU-1518 Missing/bad operations in mdd_{obf,do... Resolved
Severity: 3
Rank (Obsolete): 6380

 Description   

[sanity@r62-lustre lustre]$ cat .lustre/fid/[0x1:0x0:0x0]

LustreError: 2316:0:(osd_compat.c:383:osd_compat_objid_lookup()) ASSERTION( map ) failed:
LustreError: 2316:0:(osd_compat.c:383:osd_compat_objid_lookup()) LBUG
Pid: 2316, comm: mdt_00

Call Trace:
[<ffffffffa02c2905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa02c2f17>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0b7f889>] osd_compat_objid_lookup+0x479/0x520 [osd_ldiskfs]
[<ffffffffa0b74165>] osd_oi_lookup+0x55/0xb0 [osd_ldiskfs]
[<ffffffffa0b6ea90>] osd_object_init+0x260/0x6a0 [osd_ldiskfs]
[<ffffffffa048ab0c>] ? lu_object_add+0x2c/0x30 [obdclass]
[<ffffffffa048c5d5>] lu_object_alloc+0xd5/0x310 [obdclass]
[<ffffffffa048cba8>] ? htable_lookup+0x108/0x1c0 [obdclass]
[<ffffffffa048ce61>] lu_object_find_at+0x201/0x450 [obdclass]
[<ffffffff81273c47>] ? vsscanf+0x617/0x7c0
[<ffffffffa048d0ef>] lu_object_find_slice+0x1f/0x80 [obdclass]
[<ffffffffa0a4f160>] mdd_object_find+0x10/0x70 [mdd]
[<ffffffffa0a79006>] obf_lookup+0xd6/0x270 [mdd]
[<ffffffffa0b3f136>] cml_lookup+0x66/0x1b0 [cmm]
[<ffffffffa0ad50e7>] ? mdt_version_get_check+0x47/0xe0 [mdt]
[<ffffffffa0ae9c37>] mdt_reint_open+0x6f7/0x18b0 [mdt]
[<ffffffffa0a7782e>] ? md_ucred+0x1e/0x60 [mdd]
[<ffffffffa0ab81a5>] ? mdt_ucred+0x15/0x20 [mdt]
[<ffffffffa0acf21c>] ? mdt_root_squash+0x2c/0x3e0 [mdt]
[<ffffffffa0ad3b21>] mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0acd37a>] mdt_reint_internal+0x50a/0x810 [mdt]
[<ffffffffa0acd94d>] mdt_intent_reint+0x1ed/0x500 [mdt]
[<ffffffffa0ac9c91>] mdt_intent_policy+0x371/0x6a0 [mdt]
[<ffffffffa05ae831>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
[<ffffffffa05d611a>] ldlm_handle_enqueue0+0x48a/0xf40 [ptlrpc]
[<ffffffffa0ac9836>] mdt_enqueue+0x46/0x130 [mdt]
[<ffffffffa0abf2a2>] mdt_handle_common+0x922/0x1740 [mdt]
[<ffffffffa0ac0195>] mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa0604782>] ptlrpc_server_handle_request+0x412/0xeb0 [ptlrpc]
[<ffffffffa02c365e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa02d3d9f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
[<ffffffffa05fd5e2>] ? ptlrpc_wait_event+0xb2/0x2c0 [ptlrpc]
[<ffffffff81051ab3>] ? __wake_up+0x53/0x70
[<ffffffffa06059f7>] ptlrpc_main+0x7d7/0x1610 [ptlrpc]
[<ffffffffa0605220>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffffa0605220>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
[<ffffffffa0605220>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
[<ffffffff8100c140>] ? child_rip+0x0/0x20

LustreError: dumping log to /tmp/lustre-log.1339790039.2316



 Comments   
Comment by Andreas Dilger [ 15/Jun/12 ]

This is code that was landed to master from orion in commit 4980567857699c7f902ebda336ea98fdc4b83100.

It definitely shouldn't be possible for regular users to trigger an LASSERT(), or otherwise access invalid FIDs via .lustre, so the MDS needs to validate the FID is sane and allowed to be accessed before passing it down to the lower layers.
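
As a rough illustration of the up-front validation described above, here is a minimal sketch in C. It is not the Lustre implementation: struct demo_fid, DEMO_SEQ_CLIENT_MIN and demo_fid_check() are hypothetical names, and the threshold value is an assumption chosen only so that the FIDs from this report are rejected. The real rules for which sequences a client may name are more involved.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a FID, written as [seq:oid:ver]. */
struct demo_fid {
	uint64_t seq;	/* sequence number */
	uint32_t oid;	/* object id within the sequence */
	uint32_t ver;	/* version */
};

/* Assumed lowest sequence a client may name; illustrative value only. */
#define DEMO_SEQ_CLIENT_MIN 0x200000400ULL

/*
 * Return 0 if the FID may be passed to the lower layers, or -EINVAL
 * if it should be rejected before any object lookup is attempted.
 */
int demo_fid_check(const struct demo_fid *fid)
{
	if (fid == NULL)
		return -EINVAL;

	/* An object id of zero does not name an object in this sketch. */
	if (fid->oid == 0)
		return -EINVAL;

	/* Sequences below the assumed client range are refused. */
	if (fid->seq < DEMO_SEQ_CLIENT_MIN)
		return -EINVAL;

	return 0;
}
```

Under these assumptions, a caller would run demo_fid_check() on the FID parsed from the client request and return the error to the client on failure, so that a request for [0x1:0x0:0x0] yields "Invalid argument" rather than reaching an assertion in the lower layers.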

Comment by John Hammond [ 09/Jul/12 ]

You can also get here using lfs fid2path, in fact by using the sample FID that its own error message provides.

[root]# lfs fid2path /mnt/lustre PANTS
bad FID format [PANTS], should be [0x1:0x2:0x0]

fid2path error: Invalid argument
[root]# lfs fid2path /mnt/lustre [0x1:0x2:0x0]

LustreError: 2446:0:(osd_compat.c:381:osd_compat_objid_lookup()) ASSERTION( map ) failed:
LustreError: 2446:0:(osd_compat.c:381:osd_compat_objid_lookup()) LBUG
Pid: 2446, comm: mdt00_002

Call Trace:
[<ffffffffa0308905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0308f17>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0b73c59>] osd_compat_objid_lookup+0x479/0x520 [osd_ldiskfs]
[<ffffffff81275d46>] ? vsnprintf+0x2b6/0x5f0
[<ffffffffa0b68565>] osd_oi_lookup+0x55/0xc0 [osd_ldiskfs]
[<ffffffffa0b5dbf8>] osd_object_init+0x338/0xd80 [osd_ldiskfs]
[<ffffffffa048a22e>] ? dt_object_init+0xe/0x10 [obdclass]
[<ffffffffa0486f65>] lu_object_alloc+0xd5/0x310 [obdclass]
[<ffffffffa0487538>] ? htable_lookup+0x108/0x1c0 [obdclass]
[<ffffffffa04877f1>] lu_object_find_at+0x201/0x450 [obdclass]
[<ffffffffa05f4271>] ? lustre_pack_reply_v2+0x1e1/0x280 [ptlrpc]
[<ffffffffa05f330d>] ? lustre_msg_buf+0x5d/0x60 [ptlrpc]
[<ffffffffa0487a56>] lu_object_find+0x16/0x20 [obdclass]
[<ffffffffa0a71da6>] mdt_object_find+0x56/0x170 [mdt]
[<ffffffffa0a75fda>] mdt_get_info+0x22a/0xa90 [mdt]
[<ffffffffa0a71f0d>] ? mdt_unpack_req_pack_rep+0x4d/0x4d0 [mdt]
[<ffffffffa0a7a922>] mdt_handle_common+0x922/0x1740 [mdt]
[<ffffffffa0a7b815>] mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa060457d>] ptlrpc_server_handle_request+0x40d/0xea0 [ptlrpc]
[<ffffffffa030965e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa05fba37>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
[<ffffffff81051ba3>] ? __wake_up+0x53/0x70
[<ffffffffa0605b79>] ptlrpc_main+0xb69/0x1870 [ptlrpc]
[<ffffffffa0605010>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffffa0605010>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
[<ffffffffa0605010>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
[<ffffffff8100c140>] ? child_rip+0x0/0x20

Comment by Richard Henwood (Inactive) [ 03/Oct/12 ]

I couldn't reproduce this issue with a recent Master build (924).

# cat .lustre/fid/[0x1:0x0:0x0]
cat: .lustre/fid/[0x1:0x0:0x0]: Invalid argument

Comment by Richard Henwood (Inactive) [ 03/Oct/12 ]

Seems like this issue has been fixed by LU-1518.

Comment by Andreas Dilger [ 03/Oct/12 ]

Richard, it would probably be more appropriate to have closed this as "Fixed" rather than "Cannot Reproduce", since it was a real bug that was fixed with a code change. The "Cannot Reproduce" label indicates that a reported bug was not seen in later testing for whatever reason, and no change was made to the code to resolve the issue.

In any case, I'm glad this problem is gone.

Comment by Richard Henwood (Inactive) [ 03/Oct/12 ]

Correcting resolution message. This issue was a real problem that was /fixed/.

Comment by Richard Henwood (Inactive) [ 03/Oct/12 ]

Double-checking whether this is actually fixed on Lustre 2.3 ...

Comment by Richard Henwood (Inactive) [ 03/Oct/12 ]

Given Andreas's comment that this issue was introduced by an Orion commit, 2.3 should not be affected. I've checked it anyway, and I couldn't reproduce.

However, this issue is fixed.

Comment by Richard Henwood (Inactive) [ 05/Oct/12 ]

This issue is present on Master.

lfs fid2path /mnt/lustre/ [0x1:0x2:0x0]
LustreError: 31926:0:(mdt_handler.c:2447:mdt_obj()) ASSERTION( lu_device_is_mdt(o->lo_dev) ) failed: 
LustreError: 31926:0:(mdt_handler.c:2447:mdt_obj()) LBUG
Pid: 31926, comm: mdt00_000

Call Trace:
 [<ffffffffa03d3905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa03d3f17>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0b851cf>] mdt_obj+0x5f/0x80 [mdt]
 [<ffffffffa0b892e6>] mdt_object_find+0x66/0x170 [mdt]
 [<ffffffffa0b8dcaa>] mdt_get_info+0x22a/0xa90 [mdt]
 [<ffffffffa0b8943d>] ? mdt_unpack_req_pack_rep+0x4d/0x4c0 [mdt]
 [<ffffffffa0b91322>] mdt_handle_common+0x932/0x1740 [mdt]
 [<ffffffffa0b92205>] mdt_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa06ed7ac>] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
 [<ffffffffa03d465e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [<ffffffffa06e4b87>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
 [<ffffffff810533f3>] ? __wake_up+0x53/0x70
 [<ffffffffa06eed81>] ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
 [<ffffffffa06ee190>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffffa06ee190>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
 [<ffffffffa06ee190>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
Pid: 31926, comm: mdt00_000 Not tainted 2.6.32-279.5.1.el6_lustre.x86_64 #1
Call Trace:
 [<ffffffff814fd58a>] ? panic+0xa0/0x168
 [<ffffffffa03d3f6b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0b851cf>] ? mdt_obj+0x5f/0x80 [mdt]
 [<ffffffffa0b892e6>] ? mdt_object_find+0x66/0x170 [mdt]
 [<ffffffffa0b8dcaa>] ? mdt_get_info+0x22a/0xa90 [mdt]
 [<ffffffffa0b8943d>] ? mdt_unpack_req_pack_rep+0x4d/0x4c0 [mdt]
 [<ffffffffa0b91322>] ? mdt_handle_common+0x932/0x1740 [mdt]
 [<ffffffffa0b92205>] ? mdt_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa06ed7ac>] ? ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
 [<ffffffffa03d465e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [<ffffffffa06e4b87>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
 [<ffffffff810533f3>] ? __wake_up+0x53/0x70
 [<ffffffffa06eed81>] ? ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
 [<ffffffffa06ee190>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffffa06ee190>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
 [<ffffffffa06ee190>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20

Comment by Richard Henwood (Inactive) [ 06/Nov/12 ]

A fix has landed in master:

http://review.whamcloud.com/#change,4255
