[LU-4043] clients unable to reconnect after OST failover Created: 02/Oct/13 Updated: 03/Jun/14 Resolved: 13/Mar/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.1 |
| Fix Version/s: | Lustre 2.4.2, Lustre 2.5.1 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Kit Westneat (Inactive) | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 10853 |
| Description |
|
BP has been running into an issue on their 2.4.1 system where clients are unable to reconnect to the OSTs after a failover. Judging from the debug logs on the MDS, the problem is that when the OSTs register, both service-node NIDs are assigned to a single UUID, which is named after nid[0]. The imperative recovery code, however, uses the UUID name instead of the NID name when creating a connection, so imperative recovery keeps trying to connect to the first service node even after the MGS reports that the target is now running on the second:

20000000:01000000:0.0:1380683267.395792:0:14496:0:(mgs_handler.c:344:mgs_handle_target_reg()) Server pfs-OST0006 is running on 10.10.160.26@tcp1
...
00000100:00000040:0.0:1380683274.383862:0:14498:0:(lustre_peer.c:200:class_check_uuid()) check if uuid 10.10.160.25@tcp1 has 10.10.160.26@tcp1.
10000000:00000040:0.0:1380683274.383865:0:14498:0:(mgc_request.c:1408:mgc_apply_recover_logs()) Find uuid 10.10.160.25@tcp1 by nid 10.10.160.26@tcp1
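To make the mismatch concrete, here is a toy model of what registration and the UUID-keyed lookup end up doing (plain userspace C with invented names and types, not the actual Lustre code):

    #include <stdio.h>
    #include <string.h>

    #define MAX_NIDS 2

    struct failover_group {
            char uuid[64];            /* UUID string, named after the first NID */
            char nids[MAX_NIDS][64];  /* all service-node NIDs in the group */
            int  nid_count;
    };

    /* Register one NID; the first NID seen also becomes the group's
     * UUID name, mirroring what mgs_write_log_failnids() does. */
    static void register_nid(struct failover_group *g, const char *nid)
    {
            if (g->nid_count == 0) {
                    strncpy(g->uuid, nid, sizeof(g->uuid) - 1);
                    g->uuid[sizeof(g->uuid) - 1] = '\0';
            }
            strncpy(g->nids[g->nid_count], nid, sizeof(g->nids[0]) - 1);
            g->nids[g->nid_count][sizeof(g->nids[0]) - 1] = '\0';
            g->nid_count++;
    }

    /* A lookup keyed on the UUID name cannot tell the group's members
     * apart, so it degenerates to the NID the UUID was named after. */
    static const char *connect_by_uuid(const struct failover_group *g,
                                       const char *uuid)
    {
            if (strcmp(g->uuid, uuid) == 0)
                    return g->nids[0];
            return NULL;
    }

    int main(void)
    {
            struct failover_group g = { .nid_count = 0 };

            register_nid(&g, "10.10.160.25@tcp1"); /* first service node */
            register_nid(&g, "10.10.160.26@tcp1"); /* failover node */

            /* The MGS reports the target on 10.10.160.26@tcp1, but the
             * recovery path resolves the connection by UUID name: */
            printf("uuid %s -> connecting to %s\n",
                   g.uuid, connect_by_uuid(&g, g.uuid));
            return 0;
    }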
Here is the tunefs line used to create the OST:

Does the import really need to use the UUID, or can it use the NID? Alternatively, should the registration code really be using all of the servicenode NIDs for the same UUID? This also leads into a concern I have with the fix for

Also, I'm sort of confused as to why this wasn't caught in failover testing; is the servicenode parameter not part of the test suite?

Thanks, |
| Comments |
| Comment by Kit Westneat (Inactive) [ 02/Oct/13 ] |
|
Here's the code that adds all the failnodes to a single UUID:

mgs_write_log_failnids:
...
        while (class_find_param(ptr, PARAM_FAILNODE, &ptr) == 0) {
                while (class_parse_nid(ptr, &nid, &ptr) == 0) {
                        if (failnodeuuid == NULL) {
                                /* We don't know the failover node name,
                                 * so just use the first nid as the uuid */
                                rc = name_create(&failnodeuuid,
                                                 libcfs_nid2str(nid), "");
                                if (rc)
                                        return rc;
                        }
                        CDEBUG(D_MGS, "add nid %s for failover uuid %s, "
                               "client %s\n", libcfs_nid2str(nid),
                               failnodeuuid, cliname);
                        rc = record_add_uuid(env, llh, nid, failnodeuuid);
                }
                if (failnodeuuid)
                        rc = record_add_conn(env, llh, cliname, failnodeuuid);
        }
... |
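For comparison, the alternative raised in the description, naming each failover connection after its own NID instead of reusing the first NID's name, might look roughly like the sketch below. This is only an illustration against the helpers used above (nidname is an invented local), not the patch that eventually landed:

        while (class_parse_nid(ptr, &nid, &ptr) == 0) {
                char *nidname = NULL;

                /* name each failover connection after its own NID
                 * rather than after the first NID in the group */
                rc = name_create(&nidname, libcfs_nid2str(nid), "");
                if (rc)
                        return rc;
                rc = record_add_uuid(env, llh, nid, nidname);
                if (rc == 0)
                        rc = record_add_conn(env, llh, cliname, nidname);
                name_destroy(&nidname);
                if (rc)
                        return rc;
        }

With per-NID names, a lookup like the one in mgc_apply_recover_logs() could resolve the connection the MGS actually reported instead of always falling back to nid[0].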
| Comment by Kit Westneat (Inactive) [ 02/Oct/13 ] |
|
I tested this with the patch for |
| Comment by Jinshan Xiong (Inactive) [ 02/Oct/13 ] |
|
It looks like the failover NIDs were wrongly added to the first connection. Please try the patch at http://review.whamcloud.com/7835, but test it before applying it on production, because I didn't test it myself. |
| Comment by Stephen Champion [ 06/Mar/14 ] |
|
This bug appears to have been introduced by It should be fixed by So this can probably be closed. |
| Comment by Andreas Dilger [ 13/Mar/14 ] |
|
This was fixed via http://review.whamcloud.com/8372 from |