[LU-539] small size for RMF_CONNECT_DATA caused out of bound memory crash - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Minor
Fix Version/s: Lustre 2.1.0
Affects Version/s: Lustre 2.1.0
Labels:
None

Severity:
3
Rank (Obsolete):
4936

Description

For interoperability between 1.8 and 2.x, the minimum size of "RMF_CONNECT_DATA" is set to that of the smaller structure "obd_connect_data_v1", as follows:

==========
struct req_msg_field RMF_CONNECT_DATA =
        DEFINE_MSGF("cdata",
                    RMF_F_NO_SIZE_CHECK /* we allow extra space for interop */,
#if LUSTRE_VERSION_CODE > OBD_OCD_VERSION(2, 9, 0, 0)
                    sizeof(struct obd_connect_data),
#else
/* For interoperability with 1.8 and 2.0 clients/servers.
 * The RPC verification code allows larger RPC buffers, but not
 * smaller buffers. Until we no longer need to keep compatibility
 * with older servers/clients we can only check that the buffer
 * size is at least as large as obd_connect_data_v1. That is not
 * in itself harmful, since the chance of just corrupting this
 * field is low. See JIRA LU-16 for details. */
                    sizeof(struct obd_connect_data_v1),
#endif
                    lustre_swab_connect, NULL);
============
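
To illustrate why a minimum-size-only check leaves a gap, here is a minimal sketch in C. The struct layouts and the helper names (`connect_data_v1`, `connect_data_full`, `size_check_passes`, `passes_but_too_short`) are hypothetical simplifications for this example and are not the real Lustre definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layouts; the real Lustre structs differ. */
struct connect_data_v1 {
    uint64_t flags;
    uint32_t version;
    uint32_t grant;
};

struct connect_data_full {
    uint64_t flags;
    uint32_t version;
    uint32_t grant;
    uint64_t extra[4];      /* stands in for fields added after v1 */
};

/* Mimics a "minimum size only" check: larger buffers pass, smaller fail. */
static int size_check_passes(size_t buf_len)
{
    return buf_len >= sizeof(struct connect_data_v1);
}

/* Returns 1 when a buffer passes the check above but is still too short
 * to be accessed safely as the full-size structure. */
static int passes_but_too_short(size_t buf_len)
{
    /* The example only makes sense if the full struct is larger. */
    assert(sizeof(struct connect_data_v1) < sizeof(struct connect_data_full));
    return size_check_passes(buf_len) &&
           buf_len < sizeof(struct connect_data_full);
}
```

Any buffer length for which `passes_but_too_short()` returns 1 is accepted by the check yet must not be dereferenced as the larger structure.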

However, when the server processes a connection in "target_handle_connect()", it treats the field as the larger structure "obd_connect_data". Assigning to members that lie beyond the end of a v1-sized buffer can overwrite out-of-bounds memory and then crash the server, as follows:

============
LustreError: 9439:0:(pack_generic.c:800:lustre_msghdr_get_flags()) ASSERTION(0) failed: incorrect message magic: 00000000
LustreError: 9439:0:(pack_generic.c:800:lustre_msghdr_get_flags()) LBUG
Pid: 9439, comm: ll_mgs_00

Call Trace:
[<00000000f8bff5c0>] libcfs_debug_dumpstack+0x50/0x70 [libcfs]
[<00000000f8bffd5d>] lbug_with_loc+0x6d/0xd0 [libcfs]
[<00000000f9d3bbc0>] reply_in_callback+0x0/0x850 [ptlrpc]
[<00000000f9d32fe2>] lustre_msghdr_get_flags+0x82/0x90 [ptlrpc]
[<00000000f9d3bf80>] reply_in_callback+0x3c0/0x850 [ptlrpc]
[<00000000f9069751>] ldiskfs_mark_iloc_dirty+0x341/0x560 [ldiskfs]
[<00000000f9d3bbc0>] reply_in_callback+0x0/0x850 [ptlrpc]
[<00000000f9d3a527>] ptlrpc_master_callback+0x47/0xa0 [ptlrpc]
[<00000000f8c51a0a>] lnet_enq_event_locked+0x5a/0xb0 [lnet]
[<00000000f8c51ad8>] lnet_finalize+0x78/0x200 [lnet]
[<00000000f8c60fef>] lolnd_recv+0x5f/0x100 [lnet]
[<00000000f8c55e09>] lnet_ni_recv+0xf9/0x260 [lnet]
[<00000000f8c56059>] lnet_recv_put+0xe9/0x130 [lnet]
[<00000000f8c5c560>] lnet_parse+0x14e0/0x2620 [lnet]
[<00000000c048ccd6>] dput+0x72/0xed
[<00000000f8da1baf>] llog_free_handle+0x9f/0x330 [obdclass]
[<00000000c04906be>] mntput_no_expire+0x11/0x6a
[<00000000f8cc34f5>] pop_ctxt+0xe5/0x320 [lvfs]
[<00000000f8db9870>] __llog_ctxt_put+0x20/0x2e0 [obdclass]
[<00000000f8da3ce2>] llog_close+0x72/0x440 [obdclass]
[<00000000f8c610d1>] lolnd_send+0x41/0x90 [lnet]
[<00000000f8c55c9b>] lnet_ni_send+0x4b/0xc0 [lnet]
[<00000000f8c5804c>] lnet_send+0x1fc/0xd90 [lnet]
[<00000000f8db9870>] __llog_ctxt_put+0x20/0x2e0 [obdclass]
[<00000000f8c5e665>] LNetPut+0x565/0xef0 [lnet]
[<00000000f9d28924>] ptl_send_buf+0x1f4/0xab0 [ptlrpc]
[<00000000f9d39e26>] lustre_msg_set_timeout+0x96/0x110 [ptlrpc]
[<00000000f9d2942c>] ptlrpc_send_reply+0x24c/0x8b0 [ptlrpc]
[<00000000f9cdb934>] target_send_reply+0x94/0x910 [ptlrpc]
[<00000000f9d38d2c>] lustre_msg_get_conn_cnt+0xfc/0x1e0 [ptlrpc]
[<00000000f92e851e>] mgs_handle+0x31e/0x1f10 [mgs]
[<00000000f9d3234c>] lustre_msg_get_opc+0x10c/0x1f0 [ptlrpc]
[<00000000f9d4ce07>] ptlrpc_main+0x1217/0x27b0 [ptlrpc]
[<00000000c044d29c>] audit_syscall_exit+0x2d4/0x2ea
[<00000000f9d4bbf0>] ptlrpc_main+0x0/0x27b0 [ptlrpc]
[<00000000c0405c87>] kernel_thread_helper+0x7/0x10
<IRQ>
Kernel panic - not syncing: LBUG
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved

============
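
One general way to avoid this class of overwrite is to copy the received bytes into a zeroed, full-size local structure and never write through the wire buffer as the larger type. The sketch below shows that pattern under stated assumptions: the struct layouts and `load_connect_data()` are hypothetical names for illustration, and this is not necessarily how the issue was fixed in Lustre.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layouts; the real Lustre structs differ. */
struct connect_data_v1 {
    uint64_t flags;
    uint32_t version;
    uint32_t grant;
};

struct connect_data_full {
    uint64_t flags;
    uint32_t version;
    uint32_t grant;
    uint64_t extra[4];      /* stands in for fields added after v1 */
};

/* Copy a received buffer into a full-size structure owned by the caller.
 * Bytes the peer did not send are left as zero, so later code can read
 * or assign any member of *out without touching memory past buf.
 * Returns 0 on success, -1 if the buffer is shorter than the v1 layout. */
static int load_connect_data(struct connect_data_full *out,
                             const void *buf, size_t buf_len)
{
    size_t n;

    assert(out != NULL && buf != NULL);
    if (buf_len < sizeof(struct connect_data_v1))
        return -1;

    /* Never copy more than the destination can hold. */
    n = buf_len < sizeof(*out) ? buf_len : sizeof(*out);
    memset(out, 0, sizeof(*out));
    memcpy(out, buf, n);
    return 0;
}
```

With this pattern, a v1-sized request yields a full structure whose extended members are zero, and an undersized request is rejected before any copy takes place.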

Issue Links

is duplicated by

LU-557 llmount.sh: lustre_msghdr_get_flags(): ASSERTION(0) failed: incorrect message magic: 00000000

Resolved

People

Assignee: Zhenyu Xu

Reporter: nasf (Inactive)

Votes: 0

Watchers: 4

Dates

Created: 26/Jul/11 1:26 PM

Updated: 25/Apr/13 9:34 AM

Resolved: 01/Aug/11 10:55 AM