Details
- Bug
- Resolution: Fixed
- Minor
- Lustre 2.1.0
- None
- 3
- 4936
Description
For interoperability between 1.8 and 2.x, the size declared for "RMF_CONNECT_DATA" is that of the smaller structure "obd_connect_data_v1", as follows:
==========
struct req_msg_field RMF_CONNECT_DATA =
        DEFINE_MSGF("cdata",
                    RMF_F_NO_SIZE_CHECK /* we allow extra space for interop */,
#if LUSTRE_VERSION_CODE > OBD_OCD_VERSION(2, 9, 0, 0)
                    sizeof(struct obd_connect_data),
#else
/* For interoperability with 1.8 and 2.0 clients/servers.
 * The RPC verification code allows larger RPC buffers, but not
 * smaller buffers. Until we no longer need to keep compatibility
 * with older servers/clients we can only check that the buffer
 * size is at least as large as obd_connect_data_v1. That is not
 * in itself harmful, since the chance of just corrupting this
 * field is low. See JIRA LU-16 for details. */
                    sizeof(struct obd_connect_data_v1),
#endif
                    lustre_swab_connect, NULL);
============
But when the server processes a connection in "target_handle_connect()", it treats the buffer as the larger structure "obd_connect_data". If the peer sent only an "obd_connect_data_v1"-sized buffer, assigning to the fields that exist only in the larger structure may write beyond the end of the buffer, corrupt adjacent memory, and cause a crash such as the following:
============
LustreError: 9439:0:(pack_generic.c:800:lustre_msghdr_get_flags()) ASSERTION(0) failed: incorrect message magic: 00000000
LustreError: 9439:0:(pack_generic.c:800:lustre_msghdr_get_flags()) LBUG
Pid: 9439, comm: ll_mgs_00
Call Trace:
[<00000000f8bff5c0>] libcfs_debug_dumpstack+0x50/0x70 [libcfs]
[<00000000f8bffd5d>] lbug_with_loc+0x6d/0xd0 [libcfs]
[<00000000f9d3bbc0>] reply_in_callback+0x0/0x850 [ptlrpc]
[<00000000f9d32fe2>] lustre_msghdr_get_flags+0x82/0x90 [ptlrpc]
[<00000000f9d3bf80>] reply_in_callback+0x3c0/0x850 [ptlrpc]
[<00000000f9069751>] ldiskfs_mark_iloc_dirty+0x341/0x560 [ldiskfs]
[<00000000f9d3bbc0>] reply_in_callback+0x0/0x850 [ptlrpc]
[<00000000f9d3a527>] ptlrpc_master_callback+0x47/0xa0 [ptlrpc]
[<00000000f8c51a0a>] lnet_enq_event_locked+0x5a/0xb0 [lnet]
[<00000000f8c51ad8>] lnet_finalize+0x78/0x200 [lnet]
[<00000000f8c60fef>] lolnd_recv+0x5f/0x100 [lnet]
[<00000000f8c55e09>] lnet_ni_recv+0xf9/0x260 [lnet]
[<00000000f8c56059>] lnet_recv_put+0xe9/0x130 [lnet]
[<00000000f8c5c560>] lnet_parse+0x14e0/0x2620 [lnet]
[<00000000c048ccd6>] dput+0x72/0xed
[<00000000f8da1baf>] llog_free_handle+0x9f/0x330 [obdclass]
[<00000000c04906be>] mntput_no_expire+0x11/0x6a
[<00000000f8cc34f5>] pop_ctxt+0xe5/0x320 [lvfs]
[<00000000f8db9870>] __llog_ctxt_put+0x20/0x2e0 [obdclass]
[<00000000f8da3ce2>] llog_close+0x72/0x440 [obdclass]
[<00000000f8c610d1>] lolnd_send+0x41/0x90 [lnet]
[<00000000f8c55c9b>] lnet_ni_send+0x4b/0xc0 [lnet]
[<00000000f8c5804c>] lnet_send+0x1fc/0xd90 [lnet]
[<00000000f8db9870>] __llog_ctxt_put+0x20/0x2e0 [obdclass]
[<00000000f8c5e665>] LNetPut+0x565/0xef0 [lnet]
[<00000000f9d28924>] ptl_send_buf+0x1f4/0xab0 [ptlrpc]
[<00000000f9d39e26>] lustre_msg_set_timeout+0x96/0x110 [ptlrpc]
[<00000000f9d2942c>] ptlrpc_send_reply+0x24c/0x8b0 [ptlrpc]
[<00000000f9cdb934>] target_send_reply+0x94/0x910 [ptlrpc]
[<00000000f9d38d2c>] lustre_msg_get_conn_cnt+0xfc/0x1e0 [ptlrpc]
[<00000000f92e851e>] mgs_handle+0x31e/0x1f10 [mgs]
[<00000000f9d3234c>] lustre_msg_get_opc+0x10c/0x1f0 [ptlrpc]
[<00000000f9d4ce07>] ptlrpc_main+0x1217/0x27b0 [ptlrpc]
[<00000000c044d29c>] audit_syscall_exit+0x2d4/0x2ea
[<00000000f9d4bbf0>] ptlrpc_main+0x0/0x27b0 [ptlrpc]
[<00000000c0405c87>] kernel_thread_helper+0x7/0x10
<IRQ>
Kernel panic - not syncing: LBUG
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved
============
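The failure mode described above, writing a field that lies beyond the end of a v1-sized buffer, can be avoided by comparing the buffer length with the field's end offset before the assignment. The following is a minimal sketch of that idea only, not the actual fix that went into Lustre. The structures "demo_connect_data_v1" and "demo_connect_data", the field "maxbytes" as used here, and the helper "demo_set_maxbytes()" are all hypothetical stand-ins; the real structures carry more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the small (1.8-era) layout. */
struct demo_connect_data_v1 {
        uint64_t flags;
        uint32_t version;
        uint32_t grant;
};

/* Hypothetical stand-in for the larger (2.x) layout: same prefix,
 * plus one field that does not exist in the v1 layout. */
struct demo_connect_data {
        uint64_t flags;
        uint32_t version;
        uint32_t grant;
        uint64_t maxbytes;
};

/* Store 'value' into the maxbytes slot of 'buf' only if the buffer,
 * which is 'len' bytes long, is large enough to contain that field.
 * Returns 1 if the field was written, 0 if it was skipped. */
static int demo_set_maxbytes(void *buf, size_t len, uint64_t value)
{
        const size_t off = offsetof(struct demo_connect_data, maxbytes);

        if (len < off + sizeof(value))
                return 0;       /* v1-sized buffer: the field is not there */

        memcpy((char *)buf + off, &value, sizeof(value));
        return 1;
}
```

With a v1-sized buffer the helper leaves memory untouched and reports that the field was skipped; with a full-sized buffer it performs the store. The caller is assumed to know the real length of the received buffer rather than the declared size of the message field.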
Issue Links
- is duplicated by
  LU-557 llmount.sh: lustre_msghdr_get_flags(): ASSERTION(0) failed: incorrect message magic: 00000000 (Resolved)