Details
- Bug
- Resolution: Fixed
- Blocker
- Lustre 2.4.0
- 3
- Orion
- 3037
Description
Observed after rebooting grove-mds2 onto our latest kernel and lustre-orion tag. The server was rebooted while under a fairly heavy test load. No detailed investigation has been done yet. After the node was rebooted a second time, the MDS restarted successfully and reports that recovery completed.
— First boot —
Mounting grove-mds2/mgs on /mnt/lustre/local/lstest-MGS0000
Lustre: Lustre: Build Version: orion-2_2_49_57_3-49chaos-49chaos--PRISTINE-2.6.32-220.17.1.3chaos.ch5.x86_64
Lustre: MGS: Mounted grove-mds2/mgs
Mounting grove-mds2/mdt0 on /mnt/lustre/local/lstest-MDT0000
LustreError: 11-0: MGC172.20.5.2@o2ib500: Communicating with 0@lo, operation llog_origin_handle_create failed with -2
LustreError: 4503:0:(mgc_request.c:250:do_config_log_add()) failed processing sptlrpc log: -2
Lustre: 4508:0:(fld_index.c:356:fld_index_init()) srv-lstest-MDT0000: File "fld" doesn't support range lookup, using stub. DNE and FIDs on OST will not work with this backend
Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.3.48@o2ib500
Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.2.199@o2ib500
LNet: 26244:0:(o2iblnd_cb.c:2340:kiblnd_passive_connect()) Conn race 172.20.4.51@o2ib500
LNet: 26244:0:(o2iblnd_cb.c:2340:kiblnd_passive_connect()) Conn race 172.20.4.44@o2ib500
Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.3.53@o2ib500
Lustre: Skipped 186 previous similar messages
LustreError: 4567:0:(llog_cat.c:186:llog_cat_id2handle()) error opening log id 0x5a5a5a5a5a5a5a5a:5a5a5a5a: rc -2
LustreError: 4567:0:(llog_cat.c:503:llog_cat_cancel_records()) Cannot find log 0x5a5a5a5a5a5a5a5a
LustreError: 4567:0:(llog_cat.c:533:llog_cat_cancel_records()) Cancel 0 of 1 llog-records failed: -2
LustreError: 4567:0:(osp_sync.c:705:osp_sync_process_committed()) lstest-OST0281-osc-MDT0000: can't cancel record: -2
LustreError: 4567:0:(llog_cat.c:186:llog_cat_id2handle()) error opening log id 0x5a5a5a5a5a5a5a5a:5a5a5a5a: rc -2
LustreError: 4567:0:(llog_cat.c:503:llog_cat_cancel_records()) Cannot find log 0x5a5a5a5a5a5a5a5a
LustreError: 4567:0:(llog_cat.c:533:llog_cat_cancel_records()) Cancel 0 of 1 llog-records failed: -2
LustreError: 4567:0:(osp_sync.c:705:osp_sync_process_committed()) lstest-OST0281-osc-MDT0000: can't cancel record: -2
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/module/sg/initstate
CPU 7
Pid: 4567, comm: osp-syn-641 Tainted: P W ---------------- 2.6.32-220.17.1.3chaos.ch5.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
RIP: 0010:[<ffffffffa0694252>] [<ffffffffa0694252>] __llog_process_thread+0x2a2/0xc80 [obdclass]
RSP: 0018:ffff882f66e9db60 EFLAGS: 00010206
RAX: 5a5a5a5a5a5a5a5a RBX: 0000000000008701 RCX: 0000000000000000
RDX: 000000000021e000 RSI: 0000000000000001 RDI: ffff882f63cfe000
RBP: ffff882f66e9dc10 R08: ffff88179d8f1900 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88179caf4058
R13: 000000000000fcff R14: ffff882f63cfc000 R15: ffff882f63cfe000
FS: 00007ffff7fdc700(0000) GS:ffff881894820000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007ffff7ff9000 CR3: 0000000001a85000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process osp-syn-641 (pid: 4567, threadinfo ffff882f66e9c000, task ffff882ff9988ae0)
Stack:
0000000000002000 0000000000000050 0000000000000010 0000000000000000
<0> ffff882f63cfc001 000086fe00000000 0000000000000000 ffff882f640124c0
<0> ffff882f66e9de80 0000000000000000 000000000021e000 0000fd009aee4b80
Call Trace:
[<ffffffffa0dd4570>] ? osp_sync_process_queues+0x0/0xf60 [osp]
[<ffffffffa0694d33>] __llog_process+0x103/0x4d0 [obdclass]
[<ffffffffa06961bb>] llog_cat_process_cb+0x21b/0x290 [obdclass]
[<ffffffffa06946fe>] __llog_process_thread+0x74e/0xc80 [obdclass]
[<ffffffff810618d4>] ? enqueue_task_fair+0x64/0x100
[<ffffffffa0695fa0>] ? llog_cat_process_cb+0x0/0x290 [obdclass]
[<ffffffffa0694d33>] __llog_process+0x103/0x4d0 [obdclass]
[<ffffffffa06953d8>] __llog_cat_process+0x98/0x260 [obdclass]
[<ffffffffa0dd4570>] ? osp_sync_process_queues+0x0/0xf60 [osp]
[<ffffffff81051ba3>] ? __wake_up+0x53/0x70
[<ffffffffa0dd64f2>] osp_sync_thread+0x1c2/0x620 [osp]
[<ffffffffa0dd6330>] ? osp_sync_thread+0x0/0x620 [osp]
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffffa0dd6330>] ? osp_sync_thread+0x0/0x620 [osp]
[<ffffffffa0dd6330>] ? osp_sync_thread+0x0/0x620 [osp]
[<ffffffff8100c140>] ? child_rip+0x0/0x20
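The llog IDs in the errors above and the RAX register at the time of the fault both hold 0x5a5a5a5a5a5a5a5a, a single byte repeated. A value like that usually suggests poisoned or uninitialized memory rather than a real log ID, and a general protection fault is consistent with dereferencing such a non-canonical address. As a triage aid, the sketch below scans a console line for hex values made of one repeated byte. The function name and the default byte set are assumptions for illustration: 0x5a is included because it appears in this log, and 0x6b and 0xa5 are the Linux slab POISON_FREE and POISON_END values. Nothing here is part of Lustre.

```python
import re

# Assumed poison bytes: 0x5a appears in this log; 0x6b and 0xa5 are the
# Linux slab POISON_FREE and POISON_END values (include/linux/poison.h).
POISON_BYTES = frozenset({"5a", "6b", "a5"})

# Match hex runs of at least 8 digits, with or without a 0x prefix.
HEX_RUN = re.compile(r"(?:0x)?([0-9a-fA-F]{8,})")


def find_poison_values(line, poison_bytes=POISON_BYTES):
    """Return hex values in `line` that consist of one repeated poison byte."""
    hits = []
    for match in HEX_RUN.finditer(line):
        digits = match.group(1).lower()
        if len(digits) % 2:
            continue  # odd digit count cannot be split into whole bytes
        pairs = {digits[i:i + 2] for i in range(0, len(digits), 2)}
        if len(pairs) == 1 and pairs <= poison_bytes:
            hits.append(digits)
    return hits


if __name__ == "__main__":
    sample = "RAX: 5a5a5a5a5a5a5a5a RBX: 0000000000008701 RCX: 0000000000000000"
    print(find_poison_values(sample))
```

Feeding the whole first-boot console log through this helper line by line would flag the llog_cat errors and the RAX register line, which may help confirm that the bad log ID and the faulting pointer share the same source.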
— Second boot —
Mounting grove-mds2/mgs on /mnt/lustre/local/lstest-MGS0000
Lustre: Lustre: Build Version: orion-2_2_49_57_3-49chaos-49chaos--PRISTINE-2.6.32-220.17.1.3chaos.ch5.x86_64
Lustre: MGS: Mounted grove-mds2/mgs
Mounting grove-mds2/mdt0 on /mnt/lustre/local/lstest-MDT0000
LustreError: 11-0: MGC172.20.5.2@o2ib500: Communicating with 0@lo, operation llog_origin_handle_create failed with -2
LustreError: 4525:0:(mgc_request.c:250:do_config_log_add()) failed processing sptlrpc log: -2
Lustre: 4530:0:(fld_index.c:356:fld_index_init()) srv-lstest-MDT0000: File "fld" doesn't support range lookup, using stub. DNE and FIDs on OST will not work with this backend
Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.3.183@o2ib500
Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.3.22@o2ib500
Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.3.24@o2ib500
Lustre: Skipped 3 previous similar messages
Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.3.36@o2ib500
Lustre: Skipped 2 previous similar messages
Lustre: lstest-MDT0000: Mounted grove-mds2/mdt0
Lustre: lstest-MDT0000: Will be in recovery for at least 5:00, or until 255 clients reconnect.
LustreError: 11-0: lstest-OST0282-osc-MDT0000: Communicating with 172.20.4.42@o2ib500, operation ost_connect failed with -16
Lustre: lstest-MDT0000: Recovery over after 1:10, of 255 clients 255 recovered and 0 were evicted.
LustreError: 11-0: lstest-OST0282-osc-MDT0000: Communicating with 172.20.4.42@o2ib500, operation ost_connect failed with -16
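The negative return codes in both boot logs (-2 and -16) are kernel errno values. A minimal sketch for decoding them during triage, using Python's standard errno module, follows. The helper name describe_rc is hypothetical, and the mapping assumes Linux errno numbering.

```python
import errno
import os


def describe_rc(rc):
    """Map a negative kernel-style return code to its errno name and message."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc}: {name} ({os.strerror(code)})"


if __name__ == "__main__":
    # Return codes seen in the logs above.
    for rc in (-2, -16):
        print(describe_rc(rc))
```

On Linux, -2 decodes to ENOENT and -16 to EBUSY, so the llog errors report an object that could not be found and the ost_connect failures report the target as busy.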
Issue Links
- is related to LU-2362 MDS crash in llog_process_thread (Closed)