Lustre / LU-7488

_req_capsule_get()) ASSERTION( fmt != ((void *)(long)0x5a5a5a5a5a5a5a5a) ) failed:

Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • Lustre 2.8.0
    • None
    • lola
      build: 2.7.63-4-gf84e06e, a7eface85ea2d2aa6198681264b082a0244855d4 + patches
    • 3

    Description

      The error occurred during soak testing of the master branch build '20151122' (see https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20151122). DNE is enabled, and the MDSes are set up in an active-active failover configuration.

      Sequence of events:

      • 2015-11-26 10:32 Failover resources (mdt-0,1) lola-8 --> lola-9 started
      • 2015-11-26 11:40 Failback resources (mdt-0,1) lola-9 --> lola-8 completed successfully
      • 2015-11-26 11:44 LBUG on lola-9. See the following message.
        Nov 26 11:44:53 lola-9 kernel: LustreError: 7588:0:(layout.c:1989:__req_capsule_get()) ASSERTION( fmt != ((void *)(long)0x5a5a5a5a5a5a5a5a) ) failed: 
        Nov 26 11:44:53 lola-9 kernel: LustreError: 7588:0:(layout.c:1989:__req_capsule_get()) LBUG
        Nov 26 11:44:53 lola-9 kernel: Pid: 7588, comm: mdt02_000
        Nov 26 11:44:53 lola-9 kernel: 
        Nov 26 11:44:53 lola-9 kernel: Call Trace:
        Nov 26 11:44:53 lola-9 kernel: [<ffffffffa07c1875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
        Nov 26 11:44:53 lola-9 kernel: [<ffffffffa07c1e77>] lbug_with_loc+0x47/0xb0 [libcfs]
        Nov 26 11:44:53 lola-9 kernel: [<ffffffffa0b2eed7>] __req_capsule_get+0x617/0x6e0 [ptlrpc]
        Nov 26 11:44:53 lola-9 kernel: [<ffffffffa08bb595>] ? class_handle2object+0x95/0x190 [obdclass]
        Nov 26 11:44:53 lola-9 kernel: [<ffffffffa0b2f0a8>] req_capsule_server_get+0x18/0x20 [ptlrpc]
        Nov 26 11:44:53 lola-9 kernel: [<ffffffffa0ad2f52>] ldlm_cli_enqueue_fini+0x1d2/0xe30 [ptlrpc]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0af55b4>] ? ptlrpc_set_destroy+0x414/0x570 [ptlrpc]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0ad3f71>] ldlm_cli_enqueue+0x3c1/0x870 [ptlrpc]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0ad9010>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa122c2f0>] ? mdt_remote_blocking_ast+0x0/0x210 [mdt]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa14105c5>] osp_md_object_lock+0x185/0x240 [osp]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa131d557>] lod_object_lock+0x147/0x860 [lod]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa08dfa0f>] ? lu_object_find_try+0x9f/0x260 [obdclass]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa139f92b>] mdd_object_lock+0x3b/0xd0 [mdd]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa1239b2a>] mdt_remote_object_lock+0x14a/0x310 [mdt]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b05925>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b2ea22>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa1239e19>] mdt_object_lock_internal+0x129/0x2d0 [mdt]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa123a081>] mdt_object_lock+0x11/0x20 [mdt]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa124bb2a>] mdt_reint_create+0x6fa/0xcc0 [mdt]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa08fb870>] ? lu_ucred+0x20/0x30 [obdclass]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa122b675>] ? mdt_ucred+0x15/0x20 [mdt]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa12448dc>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b2ea22>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffff81294a3a>] ? strlcpy+0x4a/0x60
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa1248a1d>] mdt_reint_rec+0x5d/0x200 [mdt]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa123477b>] mdt_reint_internal+0x62b/0xb80 [mdt]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa123516b>] mdt_reint+0x6b/0x120 [mdt]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b70e1c>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b18711>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
        Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b178d0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
        Nov 26 11:44:54 lola-9 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
        Nov 26 11:44:54 lola-9 kernel: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
        Nov 26 11:44:54 lola-9 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
        Nov 26 11:44:54 lola-9 kernel: 
        Nov 26 11:44:54 lola-9 kernel: LustreError: dumping log to /tmp/lustre-log.1448567094.7588
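
        In Linux kernel call traces, frames prefixed with "?" are addresses found on the stack that the unwinder could not confirm as part of the call chain. To read the trace above, it can help to filter them out. The following is a minimal Python sketch, not part of the original report; the function name reliable_frames and the sample excerpt are illustrative assumptions, and the regular expression only targets the frame format shown in this log.

        ```python
        import re

        # Matches one frame as printed in the log above, for example:
        # "[<ffffffffa0b2eed7>] __req_capsule_get+0x617/0x6e0 [ptlrpc]"
        FRAME_RE = re.compile(
            r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
            r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
            r"(?:\s+\[(?P<module>\w+)\])?"
        )

        def reliable_frames(log_text):
            """Return (function, module) pairs for frames not marked with '?'."""
            frames = []
            for line in log_text.splitlines():
                m = FRAME_RE.search(line)
                if m and not m.group("guess"):
                    # Frames without a [module] suffix belong to the core kernel.
                    frames.append((m.group("func"), m.group("module") or "kernel"))
            return frames

        # A four-line excerpt copied from the trace above.
        sample = """\
        Nov 26 11:44:53 lola-9 kernel: [<ffffffffa0b2eed7>] __req_capsule_get+0x617/0x6e0 [ptlrpc]
        Nov 26 11:44:53 lola-9 kernel: [<ffffffffa08bb595>] ? class_handle2object+0x95/0x190 [obdclass]
        Nov 26 11:44:53 lola-9 kernel: [<ffffffffa0b2f0a8>] req_capsule_server_get+0x18/0x20 [ptlrpc]
        Nov 26 11:44:54 lola-9 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
        """

        for func, module in reliable_frames(sample):
            print(f"{module}: {func}")
        ```

        Applied to the full trace, this leaves the chain from ptlrpc_main through mdt_reint_create, mdt_remote_object_lock, osp_md_object_lock and ldlm_cli_enqueue_fini down to __req_capsule_get, where the assertion fired.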
        

        Attached are the messages and console log files of the MDS (lola-9) and the debug log file mentioned in the LBUG error message. Lustre messages extracted from the client nodes are attached as well. No errors occurred on the OSS nodes.
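
        The constant in the failed assertion, 0x5a5a5a5a5a5a5a5a, is the byte 0x5a repeated across a 64-bit pointer. Assuming this is the poison pattern Lustre writes over memory when it is freed (libcfs defines a pointer-sized poison value of this form; the ticket itself does not say so), the assertion suggests that the request format pointer was read from an already freed request. The Python sketch below only simulates that idea on a byte buffer; all names in it are hypothetical and it is not Lustre code.

        ```python
        POISON_BYTE = 0x5A
        ASSERTED_VALUE = 0x5A5A5A5A5A5A5A5A  # constant from the LBUG message

        def poison(buf):
            """Overwrite a buffer in place with the poison byte, as a free routine might."""
            buf[:] = bytes([POISON_BYTE]) * len(buf)

        def read_pointer(buf, offset=0, width=8):
            """Read a little-endian pointer-sized field from the buffer."""
            return int.from_bytes(buf[offset:offset + width], "little")

        def looks_poisoned(value, width=8):
            """True if every byte of the value equals the poison byte."""
            return value.to_bytes(width, "little") == bytes([POISON_BYTE]) * width

        # Hypothetical 32-byte "request" whose first field holds a valid pointer.
        request = bytearray(32)
        request[0:8] = (0xFFFF880012345678).to_bytes(8, "little")

        poison(request)               # simulate freeing the request
        stale = read_pointer(request) # simulate a later read through a stale reference
        print(hex(stale), looks_poisoned(stale))
        ```

        A field read after the simulated free compares equal to the asserted constant, which is the condition the assertion in __req_capsule_get() is written to catch.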

            People

              di.wang Di Wang
              heckes Frank Heckes (Inactive)