Lustre / LU-557

llmount.sh: lustre_msghdr_get_flags(): ASSERTION(0) failed: incorrect message magic: 00000000


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Affects Version/s: Lustre 2.1.0
    • Fix Version/s: Lustre 2.1.0
    • Labels: None
    • Environment: A local CentOS 5 i686 VM, with a separate MGS device.
    • Severity: 3
    • Rank (Obsolete): 7879

Description

      Commit: adc0fa37a44fce26e4c161176612c3c360a4dfbf

      I was trying to mount Lustre with a separate MGS device on my VM:

      [root@h221f tests]# MGSDEV=/tmp/lustre-mgs ./llmount.sh 
      Stopping clients: h221f /mnt/lustre (opts:)
      Stopping clients: h221f /mnt/lustre2 (opts:)
      Loading modules from /root/lustre-release/lustre/tests/..
      debug=0x33f0404
      subsystem_debug=0xffb7e3ff
      ../lnet/lnet/lnet options: 'networks=tcp(eth1) accept=all'
      gss/krb5 is not supported
      quota/lquota options: 'hash_lqs_cur_bits=3'
      Formatting mgs, mds, osts
      
         Permanent disk data:
      Target:     MGS
      Index:      unassigned
      Lustre FS:  lustre
      Mount type: ldiskfs
      Flags:      0x74
                    (MGS needs_index first_time update )
      Persistent mount opts: user_xattr,errors=remount-ro
      Parameters:
      
      formatting backing filesystem ldiskfs on /dev/loop0
      	target name  MGS
      	4k blocks     50000
      	options        -q -O uninit_bg,dir_nlink,huge_file,flex_bg -E lazy_journal_init -F
      mkfs_cmd = mke2fs -j -b 4096 -L MGS  -q -O uninit_bg,dir_nlink,huge_file,flex_bg -E lazy_journal_init -F /dev/loop0 50000
      Writing CONFIGS/mountdata
      Format mds1: /tmp/lustre-mdt1
      Format ost1: /tmp/lustre-ost1
      Format ost2: /tmp/lustre-ost2
      Checking servers environments
      Checking clients h221f environments
      Loading modules from /root/lustre-release/lustre/tests/..
      debug=0x33f0404
      subsystem_debug=0xffb7e3ff
      gss/krb5 is not supported
      Setup mgs, mdt, osts
      Starting mgs: -o loop,user_xattr,acl  /tmp/lustre-mgs /mnt/mgs
      Read from remote host 192.168.56.4: Connection reset by peer
      Connection to 192.168.56.4 closed.
      

      I'll keep the crash dump for a few days. From the crash log:

      Lustre: 2828:0:(debug.c:323:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
      Lustre: OBD class driver, http://wiki.whamcloud.com/
      Lustre:         Lustre Version: 2.0.66
      Lustre:         Build Version: ../lustre/scripts--PRISTINE-2.6.18-238.12.1.el5.2943701
      Lustre: Lustre LU module (e0eb6020).
      Lustre: Added LNI 192.168.56.4@tcp [8/256/0/180]
      Lustre: Accept all, port 988
      Lustre: Lustre OSC module (e1157ee0).
      Lustre: Lustre LOV module (e11f3e40).
      init dynlocks cache
      ldiskfs created from ext4-2.6-rhel5
      Lustre: Lustre client module (e15a5be0).
      LDISKFS-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended
      LDISKFS-fs (loop0): mounted filesystem with ordered data mode
      LDISKFS-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended
      LDISKFS-fs (loop0): mounted filesystem with ordered data mode
      LDISKFS-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended
      LDISKFS-fs (loop0): mounted filesystem with ordered data mode
      LDISKFS-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended
      LDISKFS-fs (loop0): mounted filesystem with ordered data mode
      Lustre: 3578:0:(debug.c:323:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
      Lustre: 3578:0:(debug.c:323:libcfs_debug_str2mask()) Skipped 1 previous similar message
      LDISKFS-fs (loop0): mounted filesystem with ordered data mode
      LDISKFS-fs (loop0): mounted filesystem with ordered data mode
      Lustre: MGS MGS started
      Lustre: 3702:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import MGC192.168.56.4@tcp->MGC192.168.56.4@tcp_0 netid 90000: select flavor null
      Lustre: 3727:0:(ldlm_lib.c:874:target_handle_connect()) MGS: connection from 739be4f1-ebe7-82f6-16d5-337bd19bdfcd@0@lo t0 exp 00000000 cur 1312188896 last 0
      LustreError: 3727:0:(pack_generic.c:800:lustre_msghdr_get_flags()) ASSERTION(0) failed: incorrect message magic: 00000000
      LustreError: 3727:0:(pack_generic.c:800:lustre_msghdr_get_flags()) LBUG
      Pid: 3727, comm: ll_mgs_02
      
      Call Trace:
       [<00000000e0be15b0>] libcfs_debug_dumpstack+0x50/0x70 [libcfs]
       [<00000000e0be1d4d>] lbug_with_loc+0x6d/0xd0 [libcfs]
       [<00000000e0f40a00>] reply_in_callback+0x0/0x850 [ptlrpc]
       [<00000000e0f37e22>] lustre_msghdr_get_flags+0x82/0x90 [ptlrpc]
       [<00000000e0f40dc0>] reply_in_callback+0x3c0/0x850 [ptlrpc]
       [<00000000e1203851>] ldiskfs_mark_iloc_dirty+0x341/0x560 [ldiskfs]
       [<00000000e0f40a00>] reply_in_callback+0x0/0x850 [ptlrpc]
       [<00000000e0f3f367>] ptlrpc_master_callback+0x47/0xa0 [ptlrpc]
       [<00000000e0c33a0a>] lnet_enq_event_locked+0x5a/0xb0 [lnet]
       [<00000000e0c33ad8>] lnet_finalize+0x78/0x200 [lnet]
       [<00000000e0c42fcf>] lolnd_recv+0x5f/0x100 [lnet]
       [<00000000e0c37e09>] lnet_ni_recv+0xf9/0x260 [lnet]
       [<00000000e0c38059>] lnet_recv_put+0xe9/0x130 [lnet]
       [<00000000e0c3e560>] lnet_parse+0x14e0/0x2620 [lnet]
       [<00000000c048ca3d>] dput+0x72/0xed
       [<00000000e0db3baf>] llog_free_handle+0x9f/0x330 [obdclass]
       [<00000000c0490402>] mntput_no_expire+0x11/0x6a
       [<00000000e0b914f5>] pop_ctxt+0xe5/0x320 [lvfs]
       [<00000000e0dcb810>] __llog_ctxt_put+0x20/0x2e0 [obdclass]
       [<00000000e0db5c82>] llog_close+0x72/0x440 [obdclass]
       [<00000000e0c430b1>] lolnd_send+0x41/0x90 [lnet]
       [<00000000e0c37c9b>] lnet_ni_send+0x4b/0xc0 [lnet]
       [<00000000e0c3a04c>] lnet_send+0x1fc/0xd90 [lnet]
       [<00000000e0dcb810>] __llog_ctxt_put+0x20/0x2e0 [obdclass]
       [<00000000e0c40665>] LNetPut+0x565/0xef0 [lnet]
       [<00000000e0f2d764>] ptl_send_buf+0x1f4/0xab0 [ptlrpc]
       [<00000000e0f3ec66>] lustre_msg_set_timeout+0x96/0x110 [ptlrpc]
       [<00000000e0f2e26c>] ptlrpc_send_reply+0x24c/0x8b0 [ptlrpc]
       [<00000000e0ee0874>] target_send_reply+0x94/0x910 [ptlrpc]
       [<00000000e0f3db6c>] lustre_msg_get_conn_cnt+0xfc/0x1e0 [ptlrpc]
       [<00000000e124b51e>] mgs_handle+0x31e/0x1f10 [mgs]
       [<00000000e0f3718c>] lustre_msg_get_opc+0x10c/0x1f0 [ptlrpc]
       [<00000000e0f51b67>] ptlrpc_main+0x1217/0x27b0 [ptlrpc]
       [<00000000c044cf34>] audit_syscall_exit+0x2d4/0x2ea
       [<00000000e0f50950>] ptlrpc_main+0x0/0x27b0 [ptlrpc]
       [<00000000c0405c87>] kernel_thread_helper+0x7/0x10
       <IRQ> 
      Kernel panic - not syncing: LBUG
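
      For reference, the assertion that fires is the message-magic check in
      lustre_msghdr_get_flags() (pack_generic.c:800): any header whose lm_magic
      is neither the v1 nor the v2 wire magic trips LASSERTF(0) and LBUGs. The
      sketch below is a standalone approximation of that check, not the exact
      code at this commit; struct lustre_msg and LASSERTF are trimmed mocks of
      the real Lustre definitions, and the magic constants are the usual v1/v2
      values from lustre_idl.h.

      #include <stdio.h>
      #include <stdlib.h>
      #include <stdint.h>

      #define LUSTRE_MSG_MAGIC_V1         0x0BD00BD0
      #define LUSTRE_MSG_MAGIC_V1_SWABBED 0xD00BD00B
      #define LUSTRE_MSG_MAGIC_V2         0x0BD00BD3

      struct lustre_msg {
              uint32_t lm_magic;
              uint32_t lm_flags;
      };

      /* Mock: the real LASSERTF logs the message and triggers an LBUG (panic). */
      #define LASSERTF(cond, fmt, ...)                                        \
              do {                                                            \
                      if (!(cond)) {                                          \
                              fprintf(stderr, "ASSERTION(" #cond ") failed: " \
                                      fmt, ##__VA_ARGS__);                    \
                              abort();                                        \
                      }                                                       \
              } while (0)

      static uint32_t lustre_msghdr_get_flags(struct lustre_msg *msg)
      {
              switch (msg->lm_magic) {
              case LUSTRE_MSG_MAGIC_V1:
              case LUSTRE_MSG_MAGIC_V1_SWABBED:
                      return 0;               /* v1 headers carry no flags */
              case LUSTRE_MSG_MAGIC_V2:
                      return msg->lm_flags;   /* v2: flags live in the header */
              default:
                      /* A zeroed buffer (lm_magic == 0) lands here, as in the log. */
                      LASSERTF(0, "incorrect message magic: %08x\n",
                               msg->lm_magic);
                      return 0;
              }
      }

      int main(void)
      {
              /* A reply buffer whose header was never filled in. */
              struct lustre_msg zeroed = { 0, 0 };
              lustre_msghdr_get_flags(&zeroed);
              return 0;
      }

      Given that the trace shows ptlrpc_master_callback() dispatching to
      reply_in_callback() via the loopback LND (the MGC and MGS are on the same
      node, connected over 0@lo), it looks like the reply landed in a buffer
      whose header was still all zeroes when the event fired.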
      

Attachments

Issue Links

Activity

People

    Assignee: WC Triage (wc-triage)
    Reporter: Li Wei (liwei, Inactive)
    Votes: 0
    Watchers: 2

Dates

    Created:
    Updated:
    Resolved: