[LU-8428] Very long recovery times for MDTs after MDS restart Created: 21/Jul/16 Updated: 05/Oct/16 Resolved: 05/Oct/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Frank Heckes (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | soak |
| Environment: | lola |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This effect occurred during soak testing of build '20160713' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160713). Very long recovery times were observed after the restart of two MDS nodes. By contrast, an OSS failover/failback injected by the framework while the MDTs of the restarted MDS nodes were still in recovery finished successfully in 48 seconds. Sequence of events:
Clients were in the following state during recovery:

[root@lola-16 ~]# pdsh -g clients 'lctl get_param *.*.state | grep -A 2 -E "MDT0001|MDT0002"' | dshbak -c | less -i
---------------- lola-13 ----------------
mdc.soaked-MDT0001-mdc-ffff88028add7800.state=
current_state: REPLAY_WAIT
state_history:
--
mdc.soaked-MDT0002-mdc-ffff88028add7800.state=
current_state: REPLAY_WAIT
state_history:
---------------- lola-16 ----------------
mdc.soaked-MDT0001-mdc-ffff880415a56400.state=
current_state: REPLAY_WAIT
state_history:
--
mdc.soaked-MDT0002-mdc-ffff880415a56400.state=
current_state: REPLAY_WAIT
state_history:
---------------- lola-19 ----------------
mdc.soaked-MDT0001-mdc-ffff88021b0fa800.state=
current_state: REPLAY_WAIT
state_history:
--
mdc.soaked-MDT0002-mdc-ffff88021b0fa800.state=
current_state: REPLAY_WAIT
state_history:
---------------- lola-20 ----------------
mdc.soaked-MDT0001-mdc-ffff880c9b13f400.state=
current_state: REPLAY_WAIT
state_history:
--
mdc.soaked-MDT0002-mdc-ffff880c9b13f400.state=
current_state: REPLAY_WAIT
state_history:
---------------- lola-21 ----------------
mdc.soaked-MDT0001-mdc-ffff881033e32400.state=
current_state: REPLAY_WAIT
state_history:
--
mdc.soaked-MDT0002-mdc-ffff881033e32400.state=
current_state: REPLAY_WAIT
state_history:
---------------- lola-22 ----------------
mdc.soaked-MDT0001-mdc-ffff88082e455000.state=
current_state: REPLAY_WAIT
state_history:
--
mdc.soaked-MDT0002-mdc-ffff88082e455000.state=
current_state: REPLAY_WAIT
state_history:
---------------- lola-23 ----------------
mdc.soaked-MDT0001-mdc-ffff8808e6af4400.state=
current_state: REPLAY_WAIT
state_history:
--
mdc.soaked-MDT0002-mdc-ffff8808e6af4400.state=
current_state: REPLAY_WAIT
state_history:
---------------- lola-24 ----------------
mdc.soaked-MDT0001-mdc-ffff88102e910800.state=
current_state: REPLAY_WAIT
state_history:
--
mdc.soaked-MDT0002-mdc-ffff88102e910800.state=
current_state: REPLAY_WAIT
state_history:
---------------- lola-25 ----------------
mdc.soaked-MDT0001-mdc-ffff88102f38a400.state=
current_state: REPLAY_WAIT
state_history:
--
mdc.soaked-MDT0002-mdc-ffff88102f38a400.state=
current_state: REPLAY_WAIT
state_history:
---------------- lola-26 ----------------
mdc.soaked-MDT0001-mdc-ffff880f7857fc00.state=
current_state: REPLAY_WAIT
state_history:
--
mdc.soaked-MDT0002-mdc-ffff880f7857fc00.state=
current_state: REPLAY_WAIT
state_history:
---------------- lola-27 ----------------
mdc.soaked-MDT0001-mdc-ffff88081a0eec00.state=
current_state: REPLAY_WAIT
state_history:
--
mdc.soaked-MDT0002-mdc-ffff88081a0eec00.state=
current_state: REPLAY_WAIT
state_history:
---------------- lola-29 ----------------
Attached files: |
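A quick way to digest this kind of dump is to tally the import states per value. Below is a minimal sketch; `summarize_states` is a hypothetical helper name, and the only assumption is the `current_state: <STATE>` line format produced by `lctl get_param *.*.state` as shown above:

```shell
# Tally MDC import states from `lctl get_param *.*.state` output read
# on stdin, printing one "<STATE> <count>" line per distinct state.
# Hypothetical helper; assumes only the "current_state: <STATE>" line
# format shown in the pdsh/dshbak dump above.
summarize_states() {
    awk '$1 == "current_state:" { count[$2]++ }
         END { for (s in count) print s, count[s] }'
}
```

For example, `pdsh -g clients 'lctl get_param mdc.*.state' | summarize_states` would collapse the dump above into a single `REPLAY_WAIT` line with a count, making stuck clients obvious at a glance.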
| Comments |
| Comment by Frank Heckes (Inactive) [ 21/Jul/16 ] |
|
Debug kernel log files with suffix '_0136' were created during recovery; those with suffix '_0225' were created after recovery completed. |
| Comment by Cliff White (Inactive) [ 17/Aug/16 ] |
|
We are still seeing the long recovery times. Example:

---------------- lola-10 ----------------
mdt.soaked-MDT0002.recovery_status=
status: COMPLETE
recovery_start: 1471407102
recovery_duration: 1644
completed_clients: 21/21
replayed_requests: 0
last_transno: 150323855360
VBR: ENABLED
IR: ENABLED |
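The 1644-second `recovery_duration` above is easier to read when converted to minutes. A small sketch (`recovery_summary` is a hypothetical helper name; it assumes only the field names shown in the `recovery_status` output above):

```shell
# Condense `lctl get_param mdt.*.recovery_status` output (read on
# stdin) into a one-line summary. Hypothetical helper; assumes the
# "status:", "recovery_duration:" and "completed_clients:" fields
# shown in the comment above.
recovery_summary() {
    awk '$1 == "status:"            { st  = $2 }
         $1 == "recovery_duration:" { dur = $2 }
         $1 == "completed_clients:" { cl  = $2 }
         END { printf "%s in %dm%02ds (%s clients)\n",
                      st, dur / 60, dur % 60, cl }'
}
```

For the output above it prints `COMPLETE in 27m24s (21/21 clients)`, which makes the contrast with the 48-second OSS failover immediate.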
| Comment by nasf (Inactive) [ 20/Sep/16 ] |
|
According to the given log (lustre-log-lola-10-20160720_0136), lola-10 tried to connect to MDT0003 many times:

00000020:00000080:23.0:1468994840.970247:0:6167:0:(obd_config.c:1148:class_process_config()) processing cmd: cf003
00000004:00080000:23.0:1468994840.970280:0:6167:0:(osp_dev.c:1121:osp_init0()) soaked-MDT0003-osp-MDT0002: connect to soaked-MDT0002-osd (soaked-MDT0003-osp-MDT0002)
00080000:01000000:23.0:1468994840.970287:0:6167:0:(osd_handler.c:6820:osd_obd_connect()) connect #5
00000020:00000080:23.0:1468994840.970290:0:6167:0:(genops.c:1169:class_connect()) connect: client soaked-MDT0002-osd_UUID, cookie 0x24fc9dac120838ee
00010000:00080000:23.0:1468994840.970307:0:6167:0:(ldlm_lib.c:112:import_set_conn()) imp ffff8803fc39b800@soaked-MDT0003-osp-MDT0002: add connection 192.168.1.111@o2ib10 at head
00000020:00000080:23.0:1468994840.971931:0:6167:0:(obd_config.c:552:class_setup()) finished setup of obd soaked-MDT0003-osp-MDT0002 (uuid soaked-MDT0002-mdtlov_UUID)
00000020:00000080:23.0:1468994840.971939:0:6167:0:(obd_config.c:1148:class_process_config()) processing cmd: cf005
00000020:00000080:23.0:1468994840.971943:0:6167:0:(obd_config.c:1159:class_process_config()) adding mapping from uuid 192.168.1.111@o2ib10 to nid 0x5000ac0a8016f (192.168.1.111@o2ib10)
00000020:00000080:23.0:1468994840.971948:0:6167:0:(obd_config.c:1148:class_process_config()) processing cmd: cf00b
00010000:00080000:23.0:1468994840.971957:0:6167:0:(ldlm_lib.c:95:import_set_conn()) imp ffff8803fc39b800@soaked-MDT0003-osp-MDT0002: found existing conn 192.168.1.111@o2ib10
00000020:00000080:23.0:1468994840.971962:0:6167:0:(obd_config.c:1148:class_process_config()) processing cmd: cf005
00000020:00000080:23.0:1468994840.971967:0:6167:0:(obd_config.c:1159:class_process_config()) adding mapping from uuid 192.168.1.110@o2ib10 to nid 0x5000ac0a8016e (192.168.1.110@o2ib10)
00000020:00000080:23.0:1468994840.971971:0:6167:0:(obd_config.c:1148:class_process_config()) processing cmd: cf00b
00010000:00080000:23.0:1468994840.971976:0:6167:0:(ldlm_lib.c:112:import_set_conn()) imp ffff8803fc39b800@soaked-MDT0003-osp-MDT0002: add connection 192.168.1.110@o2ib10 at tail
00000020:00000080:23.0:1468994840.971980:0:6167:0:(obd_config.c:1148:class_process_config()) processing cmd: cf014
00000004:01000000:23.0:1468994840.971993:0:6167:0:(lod_lov.c:225:lod_add_device()) osp:soaked-MDT0003_UUID idx:3 gen:1
00000004:01000000:23.0:1468994840.971997:0:6167:0:(osp_dev.c:1411:osp_obd_connect()) connect #0
00000020:00000080:23.0:1468994840.972004:0:6167:0:(genops.c:1169:class_connect()) connect: client soaked-MDT0002-mdtlov_UUID, cookie 0x24fc9dac120838fc
00000100:00080000:23.0:1468994840.972006:0:6167:0:(import.c:679:ptlrpc_connect_import()) ffff8803fc39b800 soaked-MDT0003_UUID: changing import state from NEW to CONNECTING
00000100:00080000:23.0:1468994840.972010:0:6167:0:(import.c:523:import_select_connection()) soaked-MDT0003-osp-MDT0002: connect to NID 192.168.1.111@o2ib10 last attempt 0
00000100:00080000:23.0:1468994840.972014:0:6167:0:(import.c:601:import_select_connection()) soaked-MDT0003-osp-MDT0002: import ffff8803fc39b800 using connection 192.168.1.111@o2ib10/192.168.1.111@o2ib10
...
00000100:02020000:1.0:1468994840.996289:0:4562:0:(client.c:1266:ptlrpc_check_status()) 11-0: soaked-MDT0003-osp-MDT0002: operation mds_connect to node 192.168.1.111@o2ib10 failed: rc = -114
00000100:00080000:1.0:1468994840.996295:0:4562:0:(import.c:1290:ptlrpc_connect_interpret()) ffff8803fc39b800 soaked-MDT0003_UUID: changing import state from CONNECTING to DISCONN
00000100:00080000:1.0:1468994840.996298:0:4562:0:(import.c:1336:ptlrpc_connect_interpret()) recovery of soaked-MDT0003_UUID on 192.168.1.111@o2ib10 failed (-114)
...
00000100:00080000:17.0:1468994866.072408:0:4595:0:(recover.c:62:ptlrpc_initiate_recovery()) soaked-MDT0003_UUID: starting recovery
00000100:00080000:17.0:1468994866.072410:0:4595:0:(import.c:679:ptlrpc_connect_import()) ffff8803fc39b800 soaked-MDT0003_UUID: changing import state from DISCONN to CONNECTING
00000100:00080000:17.0:1468994866.072415:0:4595:0:(import.c:523:import_select_connection()) soaked-MDT0003-osp-MDT0002: connect to NID 192.168.1.111@o2ib10 last attempt 4295034989
00000100:00080000:17.0:1468994866.072418:0:4595:0:(import.c:523:import_select_connection()) soaked-MDT0003-osp-MDT0002: connect to NID 0@lo last attempt 0
00000100:00080000:17.0:1468994866.072428:0:4595:0:(import.c:593:import_select_connection()) soaked-MDT0003-osp-MDT0002: Connection changing to soaked-MDT0003 (at 0@lo)
00000100:00080000:17.0:1468994866.072431:0:4595:0:(import.c:601:import_select_connection()) soaked-MDT0003-osp-MDT0002: import ffff8803fc39b800 using connection 192.168.1.110@o2ib10/0@lo
00000100:00080000:17.0:1468994866.072578:0:4595:0:(pinger.c:221:ptlrpc_pinger_process_import()) soaked-MDT0000-lwp-MDT0002_UUID->soaked-MDT0000_UUID: level DISCONN/3 force 0 force_next 0 deactive 0 pingable 0 suppress 0
00000100:00080000:17.0:1468994866.072584:0:4595:0:(recover.c:62:ptlrpc_initiate_recovery()) soaked-MDT0000_UUID: starting recovery
00000100:00080000:17.0:1468994866.072586:0:4595:0:(import.c:679:ptlrpc_connect_import()) ffff880832815000 soaked-MDT0000_UUID: changing import state from DISCONN to CONNECTING
00000100:00080000:17.0:1468994866.072592:0:4595:0:(import.c:523:import_select_connection()) soaked-MDT0000-lwp-MDT0002: connect to NID 192.168.1.108@o2ib10 last attempt 4295035090
00000100:00080000:17.0:1468994866.072597:0:4595:0:(import.c:523:import_select_connection()) soaked-MDT0000-lwp-MDT0002: connect to NID 192.168.1.109@o2ib10 last attempt 0
00000100:00080000:17.0:1468994866.072602:0:4595:0:(import.c:593:import_select_connection()) soaked-MDT0000-lwp-MDT0002: Connection changing to soaked-MDT0000 (at 192.168.1.109@o2ib10)
00000100:00080000:17.0:1468994866.072607:0:4595:0:(import.c:601:import_select_connection()) soaked-MDT0000-lwp-MDT0002: import ffff880832815000 using connection 192.168.1.109@o2ib10/192.168.1.109@o2ib10
00010000:02020000:1.0:1468994866.072759:0:6154:0:(ldlm_lib.c:962:target_handle_connect()) 137-5: soaked-MDT0003_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
00000100:00080000:17.0:1468994866.089880:0:4562:0:(import.c:1290:ptlrpc_connect_interpret()) ffff8803fc39b800 soaked-MDT0003_UUID: changing import state from CONNECTING to DISCONN
00000100:00080000:17.0:1468994866.089885:0:4562:0:(import.c:1336:ptlrpc_connect_interpret()) recovery of soaked-MDT0003_UUID on 192.168.1.110@o2ib10 failed (-19)
...
00000100:00080000:17.0:1468994891.072459:0:4595:0:(recover.c:62:ptlrpc_initiate_recovery()) soaked-MDT0003_UUID: starting recovery
00000100:00080000:17.0:1468994891.072461:0:4595:0:(import.c:679:ptlrpc_connect_import()) ffff8803fc39b800 soaked-MDT0003_UUID: changing import state from DISCONN to CONNECTING
00000100:00080000:17.0:1468994891.072466:0:4595:0:(import.c:523:import_select_connection()) soaked-MDT0003-osp-MDT0002: connect to NID 192.168.1.111@o2ib10 last attempt 4295034989
00000100:00080000:17.0:1468994891.072470:0:4595:0:(import.c:523:import_select_connection()) soaked-MDT0003-osp-MDT0002: connect to NID 0@lo last attempt 4295060090
00000100:00080000:17.0:1468994891.072475:0:4595:0:(import.c:567:import_select_connection()) soaked-MDT0003-osp-MDT0002: tried all connections, increasing latency to 6s
00000100:00080000:17.0:1468994891.072488:0:4595:0:(import.c:593:import_select_connection()) soaked-MDT0003-osp-MDT0002: Connection changing to soaked-MDT0003 (at 192.168.1.111@o2ib10)
00000100:00080000:17.0:1468994891.072492:0:4595:0:(import.c:601:import_select_connection()) soaked-MDT0003-osp-MDT0002: import ffff8803fc39b800 using connection 192.168.1.111@o2ib10/192.168.1.111@o2ib10
00000100:02020000:2.0:1468994891.072768:0:4562:0:(client.c:1266:ptlrpc_check_status()) 11-0: soaked-MDT0000-osp-MDT0002: operation mds_connect to node 192.168.1.108@o2ib10 failed: rc = -114
00000100:00080000:2.0:1468994891.089071:0:4562:0:(import.c:1290:ptlrpc_connect_interpret()) ffff8804256c8000 soaked-MDT0000_UUID: changing import state from CONNECTING to DISCONN
00000100:00080000:2.0:1468994891.089075:0:4562:0:(import.c:1336:ptlrpc_connect_interpret()) recovery of soaked-MDT0000_UUID on 192.168.1.108@o2ib10 failed (-114)
00000100:02020000:2.0:1468994891.089104:0:4562:0:(client.c:1266:ptlrpc_check_status()) 11-0: soaked-MDT0003-osp-MDT0002: operation mds_connect to node 192.168.1.111@o2ib10 failed: rc = -114
00000100:00080000:2.0:1468994891.089108:0:4562:0:(import.c:1290:ptlrpc_connect_interpret()) ffff8803fc39b800 soaked-MDT0003_UUID: changing import state from CONNECTING to DISCONN
00000100:00080000:2.0:1468994891.089110:0:4562:0:(import.c:1336:ptlrpc_connect_interpret()) recovery of soaked-MDT0003_UUID on 192.168.1.111@o2ib10 failed (-114)
...

As shown, at the beginning (timestamp 1468994840) the MDT0002 on lola-10 selected lola-11 (192.168.1.111@o2ib10) for the soaked-MDT0003-osp-MDT0002 connection. Unfortunately, lola-11 returned -114 to lola-10, which means another instance of soaked-MDT0003-osp-MDT0002 had already connected (or was connecting) to the MDT0003 on lola-11. Because lola-10 and lola-11 are an HA pair, it is quite possible that lola-11 took over lola-10's role and served MDT0002 during the lola-10 failover. Since MDT0002 had been mounted on lola-11, the connection between MDT0002 and MDT0003 had already been established at that time, so it is normal that the MDT0002 on lola-10 got the -114 failure. 
After the -114 failure, the MDT0002 on lola-10 selected the other candidate for the soaked-MDT0003-osp-MDT0002 connection: lola-10 itself (0@lo). But MDT0003 was obviously running on lola-11, so the MDT0002 on lola-10 got a -19 failure. The MDT0002 then had to find another peer for the soaked-MDT0003-osp-MDT0002 connection; there were only two choices, lola-11 or lola-10, so it selected lola-11 again, the same as the first choice, and hit the -114 failure again. The next attempt against lola-10 failed with -19 again. So the MDT0002 on lola-10 tried repeatedly to connect to MDT0003 for a long time (from timestamp 1468994840 to 1469003821, 8981 seconds) and still could not connect. This connection failure blocked the cross-MDT recovery. From the given logs I cannot find when the soaked-MDT0003-osp-MDT0002 connection was finally established. Because we have no logs from lola-11, we do not know what happened there, but it is suspected to be related to another MDT0002 instance on lola-11 because of the HA pair configuration. Since the HA is controlled by third-party HA software, such as Heartbeat or Pacemaker, can we check whether it works correctly in the failover case? Can we switch the resource by force in such a case? In any case, more logs from lola-11 would be helpful. Thanks! |
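The retry behaviour described above, alternating between the two candidate NIDs and bumping the retry latency after each full pass (as in the "tried all connections, increasing latency to 6s" log line), can be modelled with a toy loop. This is not Lustre code: the per-NID error codes are hard-wired to mirror the log (-114 from lola-11, -19 from 0@lo), and the `simulate_connect` name and the +1s backoff rule are illustrative assumptions:

```shell
# Toy model of the connection selection seen in the log: round-robin
# over the two candidate connections, and after every full pass over
# all of them, grow the retry latency. Error codes are fixed per NID
# to mirror the log above; the +1s backoff rule is an assumption, not
# the real ptlrpc logic.
simulate_connect() {
    # candidate connections and the rc each one returns
    nids="192.168.1.111@o2ib10:-114 0@lo:-19"
    n=2
    latency=5
    i=0
    while [ "$i" -lt "$1" ]; do
        # pick candidate number (i mod n) from the list
        k=0
        for conn in $nids; do
            [ "$k" -eq $(( i % n )) ] && break
            k=$(( k + 1 ))
        done
        echo "connect via ${conn%%:*} failed: rc = ${conn##*:}"
        i=$(( i + 1 ))
        # once every candidate has been tried, grow the retry latency
        if [ $(( i % n )) -eq 0 ]; then
            latency=$(( latency + 1 ))
            echo "tried all connections, increasing latency to ${latency}s"
        fi
    done
}
```

Running `simulate_connect 4` shows the ping-pong: both candidates fail on every pass, so the import never gets out of DISCONN no matter how large the latency grows, which is exactly the 8981-second stall observed here.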
| Comment by Frank Heckes (Inactive) [ 20/Sep/16 ] |
|
The HA configuration reads as follows:
MDTs are configured for failover at format time. HA is controlled by the soak test framework. For the scenario above, the following process flow was executed:
The console and message files for lola-11 aren't available anymore; for capacity reasons, all files older than 6 weeks are deleted from the cluster's shared FS. For build '20160916' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160916) this effect did not occur. |
| Comment by nasf (Inactive) [ 24/Sep/16 ] |
|
How did you do that? Did you umount the MDT just after the MDT mount completed? With or without "-f"? It seems that, from lola-11's view, the connection from the MDT0002 was not cleaned up, so it refused the new connection from the MDT0002 on lola-10. One possible solution is to allow the new connection to replace the old one. I wonder why the patch http://review.whamcloud.com/#/c/18800/ did not take effect (or had some side effect) in this case. |
| Comment by Peter Jones [ 05/Oct/16 ] |
|
Di believes this issue is a duplicate of |