Details
-
Bug
-
Resolution: Cannot Reproduce
-
Critical
-
None
-
Lustre 2.13.0
-
b2_13-ib build #2
-
3
-
9223372036854775807
Description
During OSS failover testing, the failover pair node hung during failback after system running for about 35 hours
soak-7
[17882.991912] Lustre: soaked-OST0006: deleting orphan objects from 0x0:1148098 to 0x0:1149825 [17884.486957] Lustre: Failing over soaked-OST0002 [17884.728160] Lustre: server umount soaked-OST0002 complete [17895.395319] Lustre: Failing over soaked-OST000a [17895.413977] Lustre: server umount soaked-OST000a complete [17896.790424] LustreError: 137-5: soaked-OST0002_UUID: not available for connect from 192.168.1.124@o2ib (no target). If you are running an HA pair che ck that the target is mounted on the other server. [17896.810226] LustreError: Skipped 437 previous similar messages [17902.199718] Lustre: Failing over soaked-OST0006 [17902.346578] Lustre: server umount soaked-OST0006 complete [17908.627101] Lustre: Failing over soaked-OST000e [17908.758771] Lustre: server umount soaked-OST000e complete [17923.789823] Lustre: soaked-OST0007: Export ffff9724a6452c00 already connecting from 192.168.1.118@o2ib [17931.042012] Lustre: soaked-OST0007: Export ffff9724d6813400 already connecting from 192.168.1.126@o2ib [17937.914063] Lustre: soaked-OST0007: Export ffff9728a5360800 already connecting from 192.168.1.130@o2ib [17940.946368] Lustre: soaked-OST0007: Export ffff9728d851d800 already connecting from 192.168.1.137@o2ib [17940.956765] Lustre: Skipped 1 previous similar message [17946.942948] Lustre: soaked-OST0007: Export ffff9728d81df400 already connecting from 192.168.1.124@o2ib [17973.969095] Lustre: soaked-OST0007: Export ffff9724a6452c00 already connecting from 192.168.1.118@o2ib [17973.979489] Lustre: Skipped 13 previous similar messages [17991.122858] Lustre: soaked-OST0007: Export ffff9728d851d800 already connecting from 192.168.1.137@o2ib [17991.133257] Lustre: Skipped 3 previous similar messages [18024.148290] Lustre: soaked-OST0007: Export ffff9724a6452c00 already connecting from 192.168.1.118@o2ib [18024.158682] Lustre: Skipped 14 previous similar messages [18064.712164] Lustre: ll_ost00_008: service thread pid 18052 was inactive for 200.107 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [18064.733418] Pid: 18052, comm: ll_ost00_008 3.10.0-1062.1.1.el7_lustre.x86_64 #1 SMP Fri Nov 8 18:37:40 UTC 2019 [18064.744672] Call Trace: [18064.747423] [<ffffffffc091e2d5>] cv_wait_common+0x125/0x150 [spl] [18064.754345] [<ffffffffc091e315>] __cv_wait+0x15/0x20 [spl] [18064.760577] [<ffffffffc0c992ef>] txg_wait_synced+0xef/0x140 [zfs] [18064.767530] [<ffffffffc0c4ecc5>] dmu_tx_wait+0x275/0x3c0 [zfs] [18064.774174] [<ffffffffc0c4eea2>] dmu_tx_assign+0x92/0x490 [zfs] [18064.780910] [<ffffffffc16c2fd9>] osd_trans_start+0x199/0x440 [osd_zfs] [18064.788312] [<ffffffffc14e8430>] tgt_server_data_update+0x3c0/0x510 [ptlrpc] [18064.796347] [<ffffffffc14ea40d>] tgt_client_del+0x29d/0x6a0 [ptlrpc] [18064.803581] [<ffffffffc180523c>] ofd_obd_disconnect+0x1ac/0x220 [ofd] [18064.810885] [<ffffffffc144f176>] target_handle_disconnect+0xd6/0x450 [ptlrpc] [18064.818985] [<ffffffffc14f0d38>] tgt_disconnect+0x58/0x170 [ptlrpc] [18064.826127] [<ffffffffc14f983a>] tgt_request_handle+0x98a/0x1630 [ptlrpc] [18064.833851] [<ffffffffc149ba96>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc] [18064.842439] [<ffffffffc149f5cc>] ptlrpc_main+0xbac/0x1540 [ptlrpc] [18064.849482] [<ffffffff93cc50d1>] kthread+0xd1/0xe0 [18064.854943] [<ffffffff9438cd37>] ret_from_fork_nospec_end+0x0/0x39 [18064.861953] [<ffffffffffffffff>] 0xffffffffffffffff [18065.736218] Lustre: ll_ost00_006: service thread pid 18039 was inactive for 200.435 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [18065.757478] Pid: 18039, comm: ll_ost00_006 3.10.0-1062.1.1.el7_lustre.x86_64 #1 SMP Fri Nov 8 18:37:40 UTC 2019 [18065.768742] Call Trace: [18065.771481] [<ffffffffc091e2d5>] cv_wait_common+0x125/0x150 [spl] [18065.778397] [<ffffffffc091e315>] __cv_wait+0x15/0x20 [spl] [18065.784636] [<ffffffffc0c992ef>] txg_wait_synced+0xef/0x140 [zfs] [18065.791576] [<ffffffffc0c4ecc5>] dmu_tx_wait+0x275/0x3c0 [zfs] [18065.798219] [<ffffffffc0c4eea2>] dmu_tx_assign+0x92/0x490 [zfs] [18065.804957] [<ffffffffc16c2fd9>] osd_trans_start+0x199/0x440 [osd_zfs] [18065.812358] [<ffffffffc14e8430>] tgt_server_data_update+0x3c0/0x510 [ptlrpc] [18065.820373] [<ffffffffc14ea40d>] tgt_client_del+0x29d/0x6a0 [ptlrpc] [18065.827613] [<ffffffffc180523c>] ofd_obd_disconnect+0x1ac/0x220 [ofd] [18065.834923] [<ffffffffc144f176>] target_handle_disconnect+0xd6/0x450 [ptlrpc] [18065.843022] [<ffffffffc14f0d38>] tgt_disconnect+0x58/0x170 [ptlrpc] [18065.850165] [<ffffffffc14f983a>] tgt_request_handle+0x98a/0x1630 [ptlrpc] [18065.857886] [<ffffffffc149ba96>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc] [18065.866478] [<ffffffffc149f5cc>] ptlrpc_main+0xbac/0x1540 [ptlrpc] [18065.873511] [<ffffffff93cc50d1>] kthread+0xd1/0xe0 [18065.878972] [<ffffffff9438cd37>] ret_from_fork_nospec_end+0x0/0x39 [18065.885982] [<ffffffffffffffff>] 0xffffffffffffffff ... [21128.204987] Lustre: Skipped 1 previous similar message [21138.287341] Lustre: soaked-OST0002: Not available for connect from 192.168.1.137@o2ib (not set up) [21138.297347] Lustre: Skipped 2 previous similar messages [21167.633849] Lustre: soaked-OST0002: Not available for connect from 192.168.1.111@o2ib (not set up) [21167.643858] Lustre: Skipped 9 previous similar messages [21195.356109] Lustre: 23774:0:(service.c:1442:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/5), not sending early reply req@ffff97278de0e780 x1650366648233984/t0(0) o5->soaked-MDT0002-mdtlov_UUID@192.168.1.110@o2ib:545/0 lens 432/432 e 1 to 0 dl 1573925640 ref 2 fl Interpret:/0/0 rc 0/0 job:'' [21195.386391] Lustre: 23774:0:(service.c:1442:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages [21199.340302] ptlrpc_watchdog_fire: 3 callbacks suppressed [21199.346241] Lustre: ll_ost00_024: service thread pid 5497 was inactive for 1200.379 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [21199.367494] Pid: 5497, comm: ll_ost00_024 3.10.0-1062.1.1.el7_lustre.x86_64 #1 SMP Fri Nov 8 18:37:40 UTC 2019 [21199.378663] Call Trace: [21199.381399] [<ffffffffc1801873>] ofd_create_hdl+0xcc3/0x2100 [ofd] [21199.388410] [<ffffffffc14f983a>] tgt_request_handle+0x98a/0x1630 [ptlrpc] [21199.396155] [<ffffffffc149ba96>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc] [21199.404748] [<ffffffffc149f5cc>] ptlrpc_main+0xbac/0x1540 [ptlrpc] [21199.411790] [<ffffffff93cc50d1>] kthread+0xd1/0xe0 [21199.417239] [<ffffffff9438cd37>] ret_from_fork_nospec_end+0x0/0x39 [21199.424249] [<ffffffffffffffff>] 0xffffffffffffffff [21203.436516] Pid: 18121, comm: ll_ost01_023 3.10.0-1062.1.1.el7_lustre.x86_64 #1 SMP Fri Nov 8 18:37:40 UTC 2019 [21203.447781] Call Trace: [21203.450514] [<ffffffffc1801873>] ofd_create_hdl+0xcc3/0x2100 [ofd] [21203.457536] [<ffffffffc14f983a>] tgt_request_handle+0x98a/0x1630 [ptlrpc] [21203.465297] [<ffffffffc149ba96>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc] [21203.473893] [<ffffffffc149f5cc>] ptlrpc_main+0xbac/0x1540 [ptlrpc] [21203.480928] [<ffffffff93cc50d1>] kthread+0xd1/0xe0 [21203.486390] [<ffffffff9438cd37>] ret_from_fork_nospec_end+0x0/0x39 [21203.493401] [<ffffffffffffffff>] 0xffffffffffffffff [21217.812639] Lustre: soaked-OST0002: Not available for connect from 192.168.1.111@o2ib (not set up) [21217.822646] Lustre: Skipped 23 previous similar messages
Attachments
Issue Links
- mentioned in
-
Page Loading...