Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3333

lustre_msg_get_opc()) incorrect message magic: a0b03b5 LBUG

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.6.0, Lustre 2.5.2
    • Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.5, Lustre 1.8.9
    • None
    • 3
    • 8241

    Description

      I was suspicious when I hit this for the first time, but I hit it once more, so I am filing a bug.

      First time was in sanity test 182

      [78415.059119] LustreError: 17692:0:(mdt_recovery.c:390:mdt_last_rcvd_update()) Trying to overwrite bigger transno:on-disk: 25769825268, new: 25769825267 replay: 0. see LU-617.
      [78415.061562] LustreError: 17692:0:(mdt_recovery.c:390:mdt_last_rcvd_update()) Skipped 8 previous similar messages
      [78417.103412] LustreError: 31938:0:(pack_generic.c:1032:lustre_msg_get_opc()) incorrect message magic: a0b03b5e(msg:ffff8800aba18df0)
      [78417.103916] LustreError: 31938:0:(pack_generic.c:1033:lustre_msg_get_opc()) LBUG
      [78417.104332] Pid: 31938, comm: createmany
      [78417.104542] 
      [78417.104543] Call Trace:
      [78417.104925]  [<ffffffffa0b028a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [78417.105213]  [<ffffffffa0b02ea7>] lbug_with_loc+0x47/0xb0 [libcfs]
      [78417.105490]  [<ffffffffa0d5a4ba>] lustre_msg_get_opc+0x8a/0x110 [ptlrpc]
      [78417.105778]  [<ffffffffa0d5d97b>] _debug_req+0x57b/0x680 [ptlrpc]
      [78417.106087]  [<ffffffff81047fde>] ? kernel_map_pages+0xfe/0x110
      [78417.106345]  [<ffffffffa0d4965e>] ptlrpc_free_committed+0x20e/0x620 [ptlrpc]
      [78417.106619]  [<ffffffffa0d4b313>] after_reply+0x7a3/0xd90 [ptlrpc]
      [78417.106878]  [<ffffffffa0d502c3>] ptlrpc_check_set+0x1093/0x1da0 [ptlrpc]
      [78417.107137]  [<ffffffff8107eca0>] ? process_timeout+0x0/0x10
      [78417.107389]  [<ffffffffa0d512fa>] ptlrpc_set_wait+0x32a/0x880 [ptlrpc]
      [78417.107639]  [<ffffffff8105ad10>] ? default_wake_function+0x0/0x20
      [78417.107907]  [<ffffffffa0d5b4c6>] ? lustre_msg_set_jobid+0xb6/0x140 [ptlrpc]
      [78417.108187]  [<ffffffffa0d518cf>] ptlrpc_queue_wait+0x7f/0x220 [ptlrpc]
      [78417.108453]  [<ffffffffa0d2d9c5>] ldlm_cli_enqueue+0x365/0x770 [ptlrpc]
      [78417.119499]  [<ffffffff814fc8bc>] ? __mutex_lock_slowpath+0x21c/0x2c0
      [78417.119779]  [<ffffffffa0d326e0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
      [78417.120071]  [<ffffffffa0f389e0>] ? ll_md_blocking_ast+0x0/0x750 [lustre]
      [78417.120334]  [<ffffffffa0415f55>] mdc_enqueue+0x795/0x18c0 [mdc]
      [78417.120583]  [<ffffffffa041727d>] mdc_intent_lock+0x1fd/0x5b4 [mdc]
      [78417.120847]  [<ffffffffa0f389e0>] ? ll_md_blocking_ast+0x0/0x750 [lustre]
      [78417.121427]  [<ffffffffa0d326e0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
      [78417.121688]  [<ffffffffa03c8667>] ? lmv_fid_alloc+0x117/0x4b0 [lmv]
      [78417.121952]  [<ffffffffa03e02d6>] lmv_intent_open+0x1f6/0x8c0 [lmv]
      [78417.122206]  [<ffffffffa0f389e0>] ? ll_md_blocking_ast+0x0/0x750 [lustre]
      [78417.122464]  [<ffffffff811661ca>] ? cache_alloc_debugcheck_after+0x14a/0x210
      [78417.122720]  [<ffffffff81168671>] ? __kmalloc+0x191/0x2a0
      [78417.122958]  [<ffffffffa03e0c4b>] lmv_intent_lock+0x2ab/0x370 [lmv]
      [78417.123211]  [<ffffffffa0f389e0>] ? ll_md_blocking_ast+0x0/0x750 [lustre]
      [78417.123471]  [<ffffffffa0f37a9e>] ? ll_i2gids+0x2e/0xd0 [lustre]
      [78417.123718]  [<ffffffffa0f1e4ba>] ? ll_prep_md_op_data+0xfa/0x3a0 [lustre]
      [78417.123983]  [<ffffffffa0f3d014>] ll_lookup_it+0x3a4/0xbf0 [lustre]
      [78417.124273]  [<ffffffffa0f389e0>] ? ll_md_blocking_ast+0x0/0x750 [lustre]
      [78417.124539]  [<ffffffffa0f3d8e9>] ll_lookup_nd+0x89/0x3b0 [lustre]
      [78417.124784]  [<ffffffff8119b456>] ? d_alloc+0x166/0x1d0
      [78417.125087]  [<ffffffff8118e8e2>] __lookup_hash+0x102/0x160
      [78417.125322]  [<ffffffff8118ef8a>] lookup_hash+0x3a/0x50
      [78417.125550]  [<ffffffff81192edb>] do_filp_open+0x2eb/0xe00
      [78417.125790]  [<ffffffff814fae88>] ? thread_return+0x4e/0x776
      [78417.126034]  [<ffffffff8118f37b>] ? getname+0x3b/0x250
      [78417.126259]  [<ffffffff8119fd7b>] ? alloc_fd+0xab/0x160
      [78417.126490]  [<ffffffff8117e399>] do_sys_open+0x69/0x140
      [78417.126717]  [<ffffffff8100c575>] ? math_state_restore+0x45/0x60
      [78417.126965]  [<ffffffff8117e4b0>] sys_open+0x20/0x30
      [78417.127190]  [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
      [78417.127428] 
      [78417.169930] Kernel panic - not syncing: LBUG
      

      Crashdumps and modules for this one are in:
      /exports/crashdumps/192.168.10.224-2013-05-09-20\:43\:05

      The other crash was like this:

      [79955.565718] Lustre: DEBUG MARKER: == sanity test 71: Running dbench on lustre (don't segment fault) ====== 21:01:15 (1368493275)
      [80393.197212] LustreError: 25536:0:(pack_generic.c:1032:lustre_msg_get_opc()) incorrect message magic: a0b03b5e(msg:ffff8800464ccdf0)
      [80393.197711] LustreError: 25536:0:(pack_generic.c:1033:lustre_msg_get_opc()) LBUG
      [80393.209355] Pid: 25536, comm: dbench
      [80393.209559] 
      [80393.209559] Call Trace:
      [80393.209964]  [<ffffffffa0b028a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [80393.210228]  [<ffffffffa0b02ea7>] lbug_with_loc+0x47/0xb0 [libcfs]
      [80393.210502]  [<ffffffffa0d5a4ba>] lustre_msg_get_opc+0x8a/0x110 [ptlrpc]
      [80393.212853]  [<ffffffffa0d5d97b>] _debug_req+0x57b/0x680 [ptlrpc]
      [80393.213149]  [<ffffffff81047fde>] ? kernel_map_pages+0xfe/0x110
      [80393.213412]  [<ffffffffa0d4965e>] ptlrpc_free_committed+0x20e/0x620 [ptlrpc]
      [80393.213689]  [<ffffffffa0d4b313>] after_reply+0x7a3/0xd90 [ptlrpc]
      [80393.213967]  [<ffffffffa0d502c3>] ptlrpc_check_set+0x1093/0x1da0 [ptlrpc]
      [80393.214225]  [<ffffffff8107eca0>] ? process_timeout+0x0/0x10
      [80393.214477]  [<ffffffffa0d512fa>] ptlrpc_set_wait+0x32a/0x880 [ptlrpc]
      [80393.214733]  [<ffffffff8105ad10>] ? default_wake_function+0x0/0x20
      [80393.214995]  [<ffffffffa0d5b4c6>] ? lustre_msg_set_jobid+0xb6/0x140 [ptlrpc]
      [80393.215271]  [<ffffffffa0d518cf>] ptlrpc_queue_wait+0x7f/0x220 [ptlrpc]
      [80393.215526]  [<ffffffffa0401f59>] mdc_sync+0xb9/0x230 [mdc]
      [80393.215773]  [<ffffffffa03d576e>] lmv_sync+0x2ae/0x550 [lmv]
      [80393.216022]  [<ffffffffa0f0a6ab>] ll_fsync+0x1ab/0x860 [lustre]
      [80393.216290]  [<ffffffff811b2401>] vfs_fsync_range+0xa1/0xe0
      [80393.216525]  [<ffffffff811b24ad>] vfs_fsync+0x1d/0x20
      [80393.216777]  [<ffffffff811b24ee>] do_fsync+0x3e/0x60
      [80393.217003]  [<ffffffff811b2540>] sys_fsync+0x10/0x20
      [80393.217230]  [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
      [80393.217471] 
      [80393.228801] Kernel panic - not syncing: LBUG
      

      /exports/crashdumps/192.168.10.224-2013-05-13-21\:08\:35/vmcore, same modules as before.

      Attachments

        Issue Links

          Activity

            People

              green Oleg Drokin
              green Oleg Drokin
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: