Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.10.0
-
Soak stress cluster
-
3
Description
Loaded the latest master build and started soak. The only fault induced was a router drop.
soak-9 is the second MDS (MDT0001).
soak-9 hard-crashed during normal operation:
Soak started and the mount completed.
[ 893.779340] LustreError: 11-0: soaked-OST000c-osc-MDT0001: operation ost_connect to node 192.168.1.102@o2ib10 failed: rc = -16
[ 893.801270] LustreError: Skipped 139 previous similar messages
[ 894.095801] Lustre: soaked-MDT0003-osp-MDT0001: Connection restored to 192.168.1.111@o2ib10 (at 192.168.1.111@o2ib10)
[ 894.110919] Lustre: Skipped 7 previous similar messages
[ 894.253639] Lustre: soaked-MDT0001: recovery is timed out, evict stale exports
[ 894.265333] Lustre: soaked-MDT0001: disconnecting 28 stale clients
[ 894.278063] Lustre: soaked-MDT0001: Recovery over after 5:01, of 31 clients 3 recovered and 28 were evicted.
[ 1465.548946] Lustre: soaked-MDT0001: Client 8e539072-a775-2171-7825-433ade3d0c39 (at 192.168.1.132@o2ib100) reconnecting
[ 1465.563698] Lustre: soaked-MDT0001: Connection restored to 8e539072-a775-2171-7825-433ade3d0c39 (at 192.168.1.132@o2ib100)
[ 1465.579478] Lustre: Skipped 30 previous similar messages
[ 1757.516066] Lustre: soaked-MDT0001: Client 6fb4512c-e89d-8233-88ad-b696d11c9821 (at 192.168.1.138@o2ib100) reconnecting
[ 1757.531619] Lustre: soaked-MDT0001: Connection restored to 6fb4512c-e89d-8233-88ad-b696d11c9821 (at 192.168.1.138@o2ib100)
[ 1904.507160] Lustre: soaked-MDT0001: Client 6ae0a03c-6567-9e19-d483-8743882e83e1 (at 192.168.1.116@o2ib100) reconnecting
[ 1942.000624] Lustre: soaked-MDT0001: Client 0a5dee9d-f606-8dd1-e9d4-d42a75c735e1 (at 192.168.1.129@o2ib100) reconnecting
[ 2436.463475] Lustre: soaked-MDT0001: Client 2d2ea61b-cf5f-add6-6e06-30458f85a726 (at 192.168.1.139@o2ib100) reconnecting
[ 2436.478139] Lustre: soaked-MDT0001: Connection restored to 2d2ea61b-cf5f-add6-6e06-30458f85a726 (at 192.168.1.139@o2ib100)
[ 2436.493810] Lustre: Skipped 2 previous similar messages
[ 2733.438247] Lustre: soaked-MDT0001: Client cb715667-fb6c-b895-e632-274b232c5bc9 (at 192.168.1.119@o2ib100) reconnecting
[ 3117.401693] Lustre: soaked-MDT0001: Client 0a5dee9d-f606-8dd1-e9d4-d42a75c735e1 (at 192.168.1.129@o2ib100) reconnecting
[ 3117.416360] Lustre: soaked-MDT0001: Connection restored to 0a5dee9d-f606-8dd1-e9d4-d42a75c735e1 (at 192.168.1.129@o2ib100)
[ 3117.430856] Lustre: Skipped 1 previous similar message
[ 3359.388740] Lustre: soaked-MDT0001: Client 6ae0a03c-6567-9e19-d483-8743882e83e1 (at 192.168.1.116@o2ib100) reconnecting
[ 3497.321905] Lustre: soaked-MDT0001: Client 6ae0a03c-6567-9e19-d483-8743882e83e1 (at 192.168.1.116@o2ib100) reconnecting
[ 3951.338201] Lustre: soaked-MDT0001: Client 6ae0a03c-6567-9e19-d483-8743882e83e1 (at 192.168.1.116@o2ib100) reconnecting
[ 3951.353148] Lustre: soaked-MDT0001: Connection restored to 6ae0a03c-6567-9e19-d483-8743882e83e1 (at 192.168.1.116@o2ib100)
[ 3951.368647] Lustre: Skipped 2 previous similar messages
[ 4415.324113] Lustre: soaked-MDT0001: Client 6ae0a03c-6567-9e19-d483-8743882e83e1 (at 192.168.1.116@o2ib100) reconnecting
[ 4501.233379] Lustre: 3876:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1494873743/real 1494873743] req@ffff8803b3329b00 x1567484370368192/t0(0) o104->soaked-MDT0001@192.168.1.139@o2ib100:15/16 lens 296/224 e 0 to 1 dl 1494873750 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[ 4501.273401] Lustre: 3876:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
[ 4632.525998] LustreError: 3049:0:(service.c:2229:ptlrpc_handle_rs()) ASSERTION( lock != ((void *)0) ) failed:
[ 4632.539820] LustreError: 3049:0:(service.c:2229:ptlrpc_handle_rs()) LBUG
[ 4632.550400] Pid: 3049, comm: ptlrpc_hr01_004
[ 4632.557537]
[ 4632.557537] Call Trace:
[ 4632.566532] [<ffffffffa080e7ee>] libcfs_call_trace+0x4e/0x60 [libcfs]
[ 4632.575923] [<ffffffffa080e87c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[ 4632.585006] [<ffffffffa0ba2bed>] ptlrpc_hr_main+0x83d/0x8f0 [ptlrpc]
[ 4632.594107] [<ffffffff810c8345>] ? sched_clock_cpu+0x85/0xc0
[ 4632.602416] [<ffffffff810c54c0>] ? default_wake_function+0x0/0x20
[ 4632.611086] [<ffffffffa0ba23b0>] ? ptlrpc_hr_main+0x0/0x8f0 [ptlrpc]
[ 4632.620009] [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[ 4632.627027] [<ffffffff810b0980>] ? kthread+0x0/0xe0
[ 4632.634106] [<ffffffff81697318>] ret_from_fork+0x58/0x90
[ 4632.641571] [<ffffffff810b0980>] ? kthread+0x0/0xe0
[ 4632.648510]
[ 4632.651470] Kernel panic - not syncing: LBUG
[ 4632.657497] CPU: 11 PID: 3049 Comm: ptlrpc_hr01_004 Tainted: P OE ------------ 3.10.0-514.16.1.el7_lustre.x86_64 #1
[ 4632.673033] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[ 4632.686895] ffffffffa082cdac 00000000329e47aa ffff8808212b7d30 ffffffff81686d1f
[ 4632.696571] ffff8808212b7db0 ffffffff8168014a ffffffff00000008 ffff8808212b7dc0
[ 4632.706233] ffff8808212b7d60 00000000329e47aa 00000000329e47aa ffff88082d8cf838
[ 4632.715875] Call Trace:
[ 4632.719901] [<ffffffff81686d1f>] dump_stack+0x19/0x1b
[ 4632.726949] [<ffffffff8168014a>] panic+0xe3/0x1f2
[ 4632.733611] [<ffffffffa080e894>] lbug_with_loc+0x64/0xb0 [libcfs]
[ 4632.741856] [<ffffffffa0ba2bed>] ptlrpc_hr_main+0x83d/0x8f0 [ptlrpc]
[ 4632.750326] [<ffffffff810c8345>] ? sched_clock_cpu+0x85/0xc0
[ 4632.757970] [<ffffffff810c54c0>] ? wake_up_state+0x20/0x20
[ 4632.765508] [<ffffffffa0ba23b0>] ? ptlrpc_svcpt_stop_threads+0x590/0x590 [ptlrpc]
[ 4632.775290] [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[ 4632.781973] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[ 4632.790485] [<ffffffff81697318>] ret_from_fork+0x58/0x90
[ 4632.797662] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
The system then crashed; a crash dump is available on the node.
vmcore-dmesg is attached.
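For anyone triaging the attached vmcore-dmesg, here is a rough sketch of how the console output above could be summarized: it counts "reconnecting" messages per client NID and pulls out the first failed assertion. The helper names (`clean`, `summarize`) and the regular expressions are my own assumptions, written only against the message formats visible in this log; they are not part of any Lustre tooling and may need adjusting for other log formats.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the lines quoted in this ticket.
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s?(?P<msg>.*)$")
RECONNECT_RE = re.compile(
    r"Client (?P<uuid>[0-9a-f-]{36}) \(at (?P<nid>\S+)\) reconnecting"
)
ASSERT_RE = re.compile(
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"ASSERTION\( (?P<expr>.*) \) failed"
)


def clean(line):
    """Drop a trailing carriage return, whether real or a literal '^M'."""
    line = line.rstrip("\r\n")
    if line.endswith("^M"):
        line = line[:-2]
    return line


def summarize(lines):
    """Return (reconnect count per NID, first failed assertion or None)."""
    reconnects = Counter()
    assertion = None
    for raw in lines:
        m = LINE_RE.match(clean(raw))
        if not m:
            continue  # not a timestamped console line
        msg = m["msg"]
        r = RECONNECT_RE.search(msg)
        if r:
            reconnects[r["nid"]] += 1
        a = ASSERT_RE.search(msg)
        if a and assertion is None:
            assertion = {
                "time": float(m["ts"]),
                "file": a["file"],
                "line": int(a["line"]),
                "func": a["func"],
                "expr": a["expr"],
            }
    return reconnects, assertion


# Three lines copied from the console output in this ticket.
SAMPLE = [
    "[ 3951.338201] Lustre: soaked-MDT0001: Client 6ae0a03c-6567-9e19-d483-8743882e83e1 (at 192.168.1.116@o2ib100) reconnecting^M",
    "[ 4415.324113] Lustre: soaked-MDT0001: Client 6ae0a03c-6567-9e19-d483-8743882e83e1 (at 192.168.1.116@o2ib100) reconnecting^M",
    "[ 4632.525998] LustreError: 3049:0:(service.c:2229:ptlrpc_handle_rs()) ASSERTION( lock != ((void *)0) ) failed: ^M",
]

if __name__ == "__main__":
    counts, failed = summarize(SAMPLE)
    for nid, n in counts.most_common():
        print(nid, n)
    print(failed)
```

To run it over the full attachment, pass the lines of the file to `summarize()` instead of `SAMPLE`. The reconnect counts are only a hint about which clients were flapping before the LBUG; they do not by themselves identify the cause.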