Lustre / LU-5314

Lustre 2.4.2 MDS hit LBUG and crashed

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • Lustre 2.4.3
    • Linux meerkat-mds-10-1.local 2.6.32-358.23.2.el6_lustre.x86_64 #1 SMP Thu Dec 19 19:57:45 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
    • 3
    • 14850

    Description

      Our MDS hit an LBUG and crashed this evening. Here is the relevant excerpt from /var/log/messages:

      Jul 9 18:40:22 meerkat-mds-10-1 kernel: Lustre: meerkat-MDT0000: Client e27741dc-f76c-ea5a-c426-4c6b5e86a758 (at 198.202.118.120@tcp) reconnecting
      Jul 9 18:40:24 meerkat-mds-10-1 kernel: Lustre: meerkat-MDT0000: Client 42a919ae-9df5-e771-0e9a-7ee82fdc33d9 (at 198.202.119.106@tcp) reconnecting
      Jul 9 18:40:24 meerkat-mds-10-1 kernel: Lustre: Skipped 1 previous similar message
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: Lustre: 3457:0:(mdt_handler.c:1338:mdt_getattr_name_lock()) Although resent, but still not get child lockparent:[0x200003fb3:0xaa9f:0x0] child:[0x200003fb3:0xb42e:0x0]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: LustreError: 3457:0:(mdt_handler.c:3568:mdt_intent_lock_replace()) ASSERTION( lustre_msg_get_flags(req->rq_reqmsg) & 0x0002 ) failed:
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: LustreError: 3457:0:(mdt_handler.c:3568:mdt_intent_lock_replace()) LBUG
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: Pid: 3457, comm: mdt03_003
      Jul 9 18:40:29 meerkat-mds-10-1 kernel:
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: Call Trace:
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa02e7895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa02e7e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0c9c3b1>] mdt_intent_lock_replace+0x391/0x400 [mdt]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0cb34c6>] mdt_intent_getattr+0x3b6/0x490 [mdt]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0c9ff1e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0575831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa059c1cf>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0ca03a6>] mdt_enqueue+0x46/0xe0 [mdt]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0ca6a97>] mdt_handle_common+0x647/0x16d0 [mdt]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05beb8c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0ce06a5>] mds_regular_handle+0x15/0x20 [mdt]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05ce3a8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa02e85de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa02f9d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05c5709>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffff81063990>] ? default_wake_function+0x0/0x20
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05cf73e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05cec70>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05cec70>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05cec70>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jul 9 18:40:29 meerkat-mds-10-1 kernel:
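
When triaging a trace like the one above, it can help to pull out just the failed assertion and the reliable stack frames. The following is a minimal sketch, not part of the original report: the function name `summarize_lbug`, both regular expressions, and the `SAMPLE` list are illustrative assumptions, and the patterns are only tuned to the log format shown here. Frames prefixed with `?` are skipped because the kernel marks them as unreliable. The assertion tests bit `0x0002` of the request message flags, which, judging by the preceding "Although resent" message, is presumably the resent flag; that mapping should be checked against the Lustre headers for this release.

```python
import re

# Matches a kernel stack frame such as:
#   [<ffffffffa0c9c3b1>] mdt_intent_lock_replace+0x391/0x400 [mdt]
# A leading "?" marks a frame the kernel unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?"
)

# Matches the LustreError source location and the failed assertion text.
ASSERT_RE = re.compile(
    r"LustreError: \d+:\d+:\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"ASSERTION\( (?P<expr>.*) \) failed"
)


def summarize_lbug(lines):
    """Return (first failed assertion as a dict or None, reliable frames)."""
    assertion = None
    frames = []
    for text in lines:
        match = ASSERT_RE.search(text)
        if match and assertion is None:
            assertion = match.groupdict()
            continue
        frame = FRAME_RE.search(text)
        if frame and not frame.group("unreliable"):
            # module is None for frames in the core kernel image
            frames.append((frame.group("func"), frame.group("module")))
    return assertion, frames


# A few lines copied from the excerpt above, used only for illustration.
SAMPLE = [
    "Jul 9 18:40:29 meerkat-mds-10-1 kernel: LustreError: 3457:0:"
    "(mdt_handler.c:3568:mdt_intent_lock_replace()) ASSERTION( "
    "lustre_msg_get_flags(req->rq_reqmsg) & 0x0002 ) failed:",
    "Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0c9c3b1>] "
    "mdt_intent_lock_replace+0x391/0x400 [mdt]",
    "Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05beb8c>] ? "
    "lustre_msg_get_transno+0x8c/0x100 [ptlrpc]",
    "Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffff8100c0ca>] "
    "child_rip+0xa/0x20",
]

if __name__ == "__main__":
    failed, stack = summarize_lbug(SAMPLE)
    print(failed)
    for func, module in stack:
        print(func, module)
```

To run it against a full log, pass the open file object instead of `SAMPLE`, for example `summarize_lbug(open("/var/log/messages"))`.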

            People

              Assignee: Niu Yawei (Inactive)
              Reporter: Haisong Cai (Inactive)
              Votes: 0
              Watchers: 6