[LU-557] llmount.sh: lustre_msghdr_get_flags(): ASSERTION(0) failed: incorrect message magic: 00000000
Created: 01/Aug/11  Updated: 25/Apr/13  Resolved: 25/Apr/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: Lustre 2.1.0

Type: Bug
Priority: Minor
Reporter: Li Wei (Inactive)
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

A local CentOS 5 i686 VM, with a separate MGS device.


Issue Links:
Duplicate
duplicates LU-539 small size for RMF_CONNECT_DATA cause... Resolved
Severity: 3
Rank (Obsolete): 7879

 Description   

Commit: adc0fa37a44fce26e4c161176612c3c360a4dfbf

I was trying to mount Lustre with a separate MGS device on my VM:

[root@h221f tests]# MGSDEV=/tmp/lustre-mgs ./llmount.sh 
Stopping clients: h221f /mnt/lustre (opts:)
Stopping clients: h221f /mnt/lustre2 (opts:)
Loading modules from /root/lustre-release/lustre/tests/..
debug=0x33f0404
subsystem_debug=0xffb7e3ff
../lnet/lnet/lnet options: 'networks=tcp(eth1) accept=all'
gss/krb5 is not supported
quota/lquota options: 'hash_lqs_cur_bits=3'
Formatting mgs, mds, osts

   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

formatting backing filesystem ldiskfs on /dev/loop0
	target name  MGS
	4k blocks     50000
	options        -q -O uninit_bg,dir_nlink,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L MGS  -q -O uninit_bg,dir_nlink,huge_file,flex_bg -E lazy_journal_init -F /dev/loop0 50000
Writing CONFIGS/mountdata
Format mds1: /tmp/lustre-mdt1
Format ost1: /tmp/lustre-ost1
Format ost2: /tmp/lustre-ost2
Checking servers environments
Checking clients h221f environments
Loading modules from /root/lustre-release/lustre/tests/..
debug=0x33f0404
subsystem_debug=0xffb7e3ff
gss/krb5 is not supported
Setup mgs, mdt, osts
Starting mgs: -o loop,user_xattr,acl  /tmp/lustre-mgs /mnt/mgs
Read from remote host 192.168.56.4: Connection reset by peer
Connection to 192.168.56.4 closed.

I'll keep the crash dump for a few days. From the crash log:

Lustre: 2828:0:(debug.c:323:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
Lustre: OBD class driver, http://wiki.whamcloud.com/
Lustre:         Lustre Version: 2.0.66
Lustre:         Build Version: ../lustre/scripts--PRISTINE-2.6.18-238.12.1.el5.2943701
Lustre: Lustre LU module (e0eb6020).
Lustre: Added LNI 192.168.56.4@tcp [8/256/0/180]
Lustre: Accept all, port 988
Lustre: Lustre OSC module (e1157ee0).
Lustre: Lustre LOV module (e11f3e40).
init dynlocks cache
ldiskfs created from ext4-2.6-rhel5
Lustre: Lustre client module (e15a5be0).
LDISKFS-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended
LDISKFS-fs (loop0): mounted filesystem with ordered data mode
LDISKFS-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended
LDISKFS-fs (loop0): mounted filesystem with ordered data mode
LDISKFS-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended
LDISKFS-fs (loop0): mounted filesystem with ordered data mode
LDISKFS-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended
LDISKFS-fs (loop0): mounted filesystem with ordered data mode
Lustre: 3578:0:(debug.c:323:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
Lustre: 3578:0:(debug.c:323:libcfs_debug_str2mask()) Skipped 1 previous similar message
LDISKFS-fs (loop0): mounted filesystem with ordered data mode
LDISKFS-fs (loop0): mounted filesystem with ordered data mode
Lustre: MGS MGS started
Lustre: 3702:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import MGC192.168.56.4@tcp->MGC192.168.56.4@tcp_0 netid 90000: select flavor null
Lustre: 3727:0:(ldlm_lib.c:874:target_handle_connect()) MGS: connection from 739be4f1-ebe7-82f6-16d5-337bd19bdfcd@0@lo t0 exp 00000000 cur 1312188896 last 0
LustreError: 3727:0:(pack_generic.c:800:lustre_msghdr_get_flags()) ASSERTION(0) failed: incorrect message magic: 00000000
LustreError: 3727:0:(pack_generic.c:800:lustre_msghdr_get_flags()) LBUG
Pid: 3727, comm: ll_mgs_02

Call Trace:
 [<00000000e0be15b0>] libcfs_debug_dumpstack+0x50/0x70 [libcfs]
 [<00000000e0be1d4d>] lbug_with_loc+0x6d/0xd0 [libcfs]
 [<00000000e0f40a00>] reply_in_callback+0x0/0x850 [ptlrpc]
 [<00000000e0f37e22>] lustre_msghdr_get_flags+0x82/0x90 [ptlrpc]
 [<00000000e0f40dc0>] reply_in_callback+0x3c0/0x850 [ptlrpc]
 [<00000000e1203851>] ldiskfs_mark_iloc_dirty+0x341/0x560 [ldiskfs]
 [<00000000e0f40a00>] reply_in_callback+0x0/0x850 [ptlrpc]
 [<00000000e0f3f367>] ptlrpc_master_callback+0x47/0xa0 [ptlrpc]
 [<00000000e0c33a0a>] lnet_enq_event_locked+0x5a/0xb0 [lnet]
 [<00000000e0c33ad8>] lnet_finalize+0x78/0x200 [lnet]
 [<00000000e0c42fcf>] lolnd_recv+0x5f/0x100 [lnet]
 [<00000000e0c37e09>] lnet_ni_recv+0xf9/0x260 [lnet]
 [<00000000e0c38059>] lnet_recv_put+0xe9/0x130 [lnet]
 [<00000000e0c3e560>] lnet_parse+0x14e0/0x2620 [lnet]
 [<00000000c048ca3d>] dput+0x72/0xed
 [<00000000e0db3baf>] llog_free_handle+0x9f/0x330 [obdclass]
 [<00000000c0490402>] mntput_no_expire+0x11/0x6a
 [<00000000e0b914f5>] pop_ctxt+0xe5/0x320 [lvfs]
 [<00000000e0dcb810>] __llog_ctxt_put+0x20/0x2e0 [obdclass]
 [<00000000e0db5c82>] llog_close+0x72/0x440 [obdclass]
 [<00000000e0c430b1>] lolnd_send+0x41/0x90 [lnet]
 [<00000000e0c37c9b>] lnet_ni_send+0x4b/0xc0 [lnet]
 [<00000000e0c3a04c>] lnet_send+0x1fc/0xd90 [lnet]
 [<00000000e0dcb810>] __llog_ctxt_put+0x20/0x2e0 [obdclass]
 [<00000000e0c40665>] LNetPut+0x565/0xef0 [lnet]
 [<00000000e0f2d764>] ptl_send_buf+0x1f4/0xab0 [ptlrpc]
 [<00000000e0f3ec66>] lustre_msg_set_timeout+0x96/0x110 [ptlrpc]
 [<00000000e0f2e26c>] ptlrpc_send_reply+0x24c/0x8b0 [ptlrpc]
 [<00000000e0ee0874>] target_send_reply+0x94/0x910 [ptlrpc]
 [<00000000e0f3db6c>] lustre_msg_get_conn_cnt+0xfc/0x1e0 [ptlrpc]
 [<00000000e124b51e>] mgs_handle+0x31e/0x1f10 [mgs]
 [<00000000e0f3718c>] lustre_msg_get_opc+0x10c/0x1f0 [ptlrpc]
 [<00000000e0f51b67>] ptlrpc_main+0x1217/0x27b0 [ptlrpc]
 [<00000000c044cf34>] audit_syscall_exit+0x2d4/0x2ea
 [<00000000e0f50950>] ptlrpc_main+0x0/0x27b0 [ptlrpc]
 [<00000000c0405c87>] kernel_thread_helper+0x7/0x10
 <IRQ> 
Kernel panic - not syncing: LBUG
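
For readers unfamiliar with the assertion, the check that fires in lustre_msghdr_get_flags() can be modelled roughly as follows. This is only an illustrative sketch in Python, not the code from pack_generic.c. The magic constants and the header layout (eight 32-bit fields, magic in the third, flags in the sixth) are assumptions recalled from lustre_idl.h and should be checked against the tree at the commit above. The names HEADER, BadMagic and msghdr_get_flags are hypothetical.

```python
import struct

# Assumed values, recalled from lustre_idl.h; verify against the source tree.
LUSTRE_MSG_MAGIC_V2 = 0x0BD00BD3

# Assumed lustre_msg_v2 header layout: eight little-endian 32-bit fields
# (bufcount, secflvr, magic, repsize, cksum, flags, padding, padding).
HEADER = struct.Struct("<8I")
MAGIC_INDEX = 2
FLAGS_INDEX = 5


class BadMagic(Exception):
    """Stands in for the LASSERTF/LBUG raised by the kernel code."""


def msghdr_get_flags(buf: bytes) -> int:
    """Return the header flags, or fail if the magic is not recognised."""
    fields = HEADER.unpack_from(buf)
    magic = fields[MAGIC_INDEX]
    if magic == LUSTRE_MSG_MAGIC_V2:
        return fields[FLAGS_INDEX]
    # Any other value, including an all-zero header, hits the failure path.
    raise BadMagic("incorrect message magic: %08x" % magic)
```

Under these assumptions, calling msghdr_get_flags(bytes(32)) on an all-zero header raises BadMagic with the same text as the LustreError line above, which is consistent with reply_in_callback() being handed a reply buffer whose header was never filled in.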


 Comments   
Comment by Mikhail Pershin [ 01/Aug/11 ]

This is a duplicate of LU-539.

Comment by Andreas Dilger [ 25/Apr/13 ]

Per Mike's last comment this is a duplicate of LU-539.
