Details
- Bug
- Resolution: Duplicate
- Major
- None
- Lustre 2.12.4
- None
- 2
- 9223372036854775807
Description
The OSS hit an LBUG: an assertion failure in lnet_destroy_peer_locked(). This is the first time we have seen it.
[1574769.939126] LNetError: 7420:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds
[1574769.972906] LNetError: 7420:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.11.102@o2ib (293): c: 32, oc: 0, rc: 32
[1574968.944839] LNetError: 7420:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds
[1574968.978608] LNetError: 7420:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages
[1574969.012379] LNetError: 7420:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.24.203@o2ib (247): c: 32, oc: 0, rc: 32
[1574969.053585] LNetError: 7420:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages
[1575256.183968] Lustre: nbp8-OST0103: Connection restored to 10dfb7c6-2481-1ba8-d8c9-5458677b6b29 (at 10.151.31.52@o2ib)
[1575256.183973] Lustre: Skipped 15281 previous similar messages
[1575337.223394] LNetError: 8719:0:(peer.c:280:lnet_destroy_peer_locked()) ASSERTION( list_empty(&lp->lp_peer_nets) ) failed:
[1575337.260035] LNetError: 8719:0:(peer.c:280:lnet_destroy_peer_locked()) LBUG
[1575337.283229] Pid: 8719, comm: lnet_discovery 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020
[1575337.283233] Call Trace:
[1575337.283243] [<ffffffffc0cbd7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[1575337.305316] [<ffffffffc0cbd87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[1575337.305340] [<ffffffffc0d56a8a>] lnet_destroy_peer_locked+0x24a/0x350 [lnet]
[1575337.305351] [<ffffffffc0d570c5>] lnet_peer_discovery_complete+0x2a5/0x350 [lnet]
[1575337.305361] [<ffffffffc0d5bd20>] lnet_peer_discovery+0x6c0/0x1150 [lnet]
[1575337.305365] [<ffffffffb20c61f1>] kthread+0xd1/0xe0
[1575337.305368] [<ffffffffb278dd37>] ret_from_fork_nospec_end+0x0/0x39
[1575337.305389] [<ffffffffffffffff>] 0xffffffffffffffff
[1575337.305391] Kernel panic - not syncing: LBUG
[1575337.305393] CPU: 11 PID: 8719 Comm: lnet_discovery Kdump: loaded Tainted: G OE ------------ 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1
[1575337.305394] Hardware name: SGI.COM SUMMIT/S2600GZ, BIOS SE5C600.86B.02.01.0002.082220131453 08/22/2013
[1575337.305395] Call Trace:
[1575337.305399] [<ffffffffb277ac43>] dump_stack+0x19/0x1b
[1575337.305402] [<ffffffffb2774987>] panic+0xe8/0x21f
[1575337.305408] [<ffffffffc0cbd8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[1575337.305417] [<ffffffffc0d56a8a>] lnet_destroy_peer_locked+0x24a/0x350 [lnet]
[1575337.305425] [<ffffffffc0d570c5>] lnet_peer_discovery_complete+0x2a5/0x350 [lnet]
[1575337.305434] [<ffffffffc0d5bd20>] lnet_peer_discovery+0x6c0/0x1150 [lnet]
[1575337.305436] [<ffffffffb20c72e0>] ? wake_up_atomic_t+0x30/0x30
[1575337.305444] [<ffffffffc0d5b660>] ? lnet_peer_merge_data+0xde0/0xde0 [lnet]
[1575337.305446] [<ffffffffb20c61f1>] kthread+0xd1/0xe0
[1575337.305448] [<ffffffffb20c6120>] ? insert_kthread_work+0x40/0x40
[1575337.305450] [<ffffffffb278dd37>] ret_from_fork_nospec_begin+0x21/0x21
[1575337.305452] [<ffffffffb20c6120>] ? insert_kthread_work+0x40/0x40
Issue Links
- is related to
- LU-9971 MR: ABA problem in lnet_discover_peer_locked
- Resolved