Details
Bug
Resolution: Cannot Reproduce
Major
None
Lustre 2.4.3
Linux meerkat-mds-10-1.local 2.6.32-358.23.2.el6_lustre.x86_64 #1 SMP Thu Dec 19 19:57:45 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
3
14850
Description
Our MDS hit an LBUG and crashed this evening. Here is the relevant excerpt from /var/log/messages:
Jul 9 18:40:22 meerkat-mds-10-1 kernel: Lustre: meerkat-MDT0000: Client e27741dc-f76c-ea5a-c426-4c6b5e86a758 (at 198.202.118.120@tcp) reconnecting
Jul 9 18:40:24 meerkat-mds-10-1 kernel: Lustre: meerkat-MDT0000: Client 42a919ae-9df5-e771-0e9a-7ee82fdc33d9 (at 198.202.119.106@tcp) reconnecting
Jul 9 18:40:24 meerkat-mds-10-1 kernel: Lustre: Skipped 1 previous similar message
Jul 9 18:40:29 meerkat-mds-10-1 kernel: Lustre: 3457:0:(mdt_handler.c:1338:mdt_getattr_name_lock()) Although resent, but still not get child lockparent:[0x200003fb3:0xaa9f:0x0] child:[0x200003fb3:0xb42e:0x0]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: LustreError: 3457:0:(mdt_handler.c:3568:mdt_intent_lock_replace()) ASSERTION( lustre_msg_get_flags(req->rq_reqmsg) & 0x0002 ) failed:
Jul 9 18:40:29 meerkat-mds-10-1 kernel: LustreError: 3457:0:(mdt_handler.c:3568:mdt_intent_lock_replace()) LBUG
Jul 9 18:40:29 meerkat-mds-10-1 kernel: Pid: 3457, comm: mdt03_003
Jul 9 18:40:29 meerkat-mds-10-1 kernel:
Jul 9 18:40:29 meerkat-mds-10-1 kernel: Call Trace:
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa02e7895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa02e7e97>] lbug_with_loc+0x47/0xb0 [libcfs]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0c9c3b1>] mdt_intent_lock_replace+0x391/0x400 [mdt]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0cb34c6>] mdt_intent_getattr+0x3b6/0x490 [mdt]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0c9ff1e>] mdt_intent_policy+0x39e/0x720 [mdt]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0575831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa059c1cf>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0ca03a6>] mdt_enqueue+0x46/0xe0 [mdt]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0ca6a97>] mdt_handle_common+0x647/0x16d0 [mdt]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05beb8c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0ce06a5>] mds_regular_handle+0x15/0x20 [mdt]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05ce3a8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa02e85de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa02f9d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05c5709>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffff81063990>] ? default_wake_function+0x0/0x20
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05cf73e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05cec70>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05cec70>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05cec70>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Jul 9 18:40:29 meerkat-mds-10-1 kernel:
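
For triage, it can help to reduce a trace like the one above to the frames the kernel reported as reliable (those without a leading `?`). The following is a minimal sketch, not part of the original report: the names `FRAME_RE`, `Frame`, `parse_frames`, and `SAMPLE` are hypothetical, and the regular expression assumes the `[<address>] symbol+0xoffset/0xsize [module]` layout seen in this log.

```python
import re
from typing import List, NamedTuple, Optional

# Matches one call-trace frame as it appears in the log above, e.g.
#   [<ffffffffa0c9c3b1>] mdt_intent_lock_replace+0x391/0x400 [mdt]
# A "?" before the symbol marks a frame the unwinder could not verify.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


class Frame(NamedTuple):
    symbol: str
    module: Optional[str]  # None for symbols in the core kernel image
    reliable: bool         # False when the frame was printed with "?"


def parse_frames(lines: List[str]) -> List[Frame]:
    """Extract call-trace frames from syslog lines, skipping non-frame lines."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append(
                Frame(m["symbol"], m["module"], m["unreliable"] is None)
            )
    return frames


# Three lines copied from the trace above, used as sample input.
SAMPLE = [
    "Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa0c9c3b1>] mdt_intent_lock_replace+0x391/0x400 [mdt]",
    "Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffffa05beb8c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]",
    "Jul 9 18:40:29 meerkat-mds-10-1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20",
]

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        if frame.reliable:
            print(frame.symbol, frame.module or "(kernel)")
```

Feeding the full log through `parse_frames` and keeping only the reliable frames would give the path from `ptlrpc_main` down to `mdt_intent_lock_replace`, where the assertion fired.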