[LU-4043] clients unable to reconnect after OST failover - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Duplicate
Priority: Critical
Fix Version/s: Lustre 2.4.2, Lustre 2.5.1
Affects Version/s: Lustre 2.4.1
Labels:
None

Severity:
3
Rank (Obsolete):
10853

Description

BP has been running into an issue on their 2.4.1 system where clients are not able to connect to the OSTs after a failover.

Looking at the debug logs on the MDS, it looks like the problem is that when the OSTs register, both service node NIDs are assigned to one UUID, which is named after nid[0]. The imperative recovery code, however, uses the UUID name instead of the NID name when creating a connection. This causes imperative recovery to keep trying to connect to the first service node, even when the MGS tells it to connect to the second.

20000000:01000000:0.0:1380683267.395792:0:14496:0:(mgs_handler.c:344:mgs_handle_target_reg()) Server pfs-OST0006 is running on 10.10.160.26@tcp1
...
00000100:00000040:0.0:1380683274.383862:0:14498:0:(lustre_peer.c:200:class_check_uuid()) check if uuid 10.10.160.25@tcp1 has 10.10.160.26@tcp1.
10000000:00000040:0.0:1380683274.383865:0:14498:0:(mgc_request.c:1408:mgc_apply_recover_logs()) Find uuid 10.10.160.25@tcp1 by nid 10.10.160.26@tcp1
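
To make the behaviour in the log lines above concrete, here is a minimal, self-contained sketch in C. It is not Lustre code: `struct uuid_entry`, `find_uuid_by_nid()`, `pick_by_uuid_name()` and `pick_by_nid()` are hypothetical names invented for this illustration. It assumes only what is described above, namely that one UUID named after the first NID holds both service node NIDs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NIDS 4

/* Hypothetical model of one UUID that has several NIDs attached to it. */
struct uuid_entry {
	const char *uuid;            /* named after the first NID */
	const char *nids[MAX_NIDS];  /* every NID registered under this UUID */
	size_t      nid_count;
};

/* Return the entry that lists the given NID, or NULL if none does. */
static const struct uuid_entry *
find_uuid_by_nid(const struct uuid_entry *tbl, size_t n, const char *nid)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < tbl[i].nid_count; j++)
			if (strcmp(tbl[i].nids[j], nid) == 0)
				return &tbl[i];
	return NULL;
}

/* Models the reported behaviour: the target comes from the UUID name,
 * so the NID named in the notification is ignored. */
static const char *
pick_by_uuid_name(const struct uuid_entry *e, const char *notified_nid)
{
	assert(e != NULL);
	(void)notified_nid;
	return e->uuid;
}

/* Models the alternative: use the notified NID when the entry lists it,
 * and fall back to the UUID name otherwise. */
static const char *
pick_by_nid(const struct uuid_entry *e, const char *notified_nid)
{
	assert(e != NULL);
	for (size_t j = 0; j < e->nid_count; j++)
		if (strcmp(e->nids[j], notified_nid) == 0)
			return e->nids[j];
	return e->uuid;
}
```

With a single entry named 10.10.160.25@tcp1 that lists both 10.10.160.25@tcp1 and 10.10.160.26@tcp1, looking up 10.10.160.26@tcp1 finds that entry. `pick_by_uuid_name()` then returns 10.10.160.25@tcp1, which mirrors the reconnect attempts going to the first service node, while `pick_by_nid()` returns 10.10.160.26@tcp1.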

Here is the tunefs line used to create the OST:
tunefs.lustre --erase-params --writeconf --mgsnode=10.10.160.21@tcp1 --mgsnode=10.10.160.22@tcp1 --servicenode=10.10.160.25@tcp1 --servicenode=10.10.160.26@tcp1 --param ost.quota_type=ug /dev/mapper/ost_pfs_6

Does import really need to use the uuid, or can it use nid? Alternatively, should the registration code really be using all the servicenode nids for the same UUID?

This also leads into a concern I have with the fix for LU-3445. There doesn't appear to be any way to distinguish between multiple NIDs for the same node and NIDs for different nodes.

Also, I'm somewhat confused as to why this wasn't caught in failover testing. Is the servicenode parameter not part of the test suite?

Thanks,
Kit

Attachments

Issue Links

duplicates

LU-4243 multiple servicenodes or failnids: wrong client llog registration

Resolved

Activity


Andreas Dilger added a comment - 13/Mar/14 6:40 PM

This was fixed via http://review.whamcloud.com/8372 from LU-4243.


Stephen Champion added a comment - 06/Mar/14 3:24 AM - edited

This bug appears to have been introduced by LU-1301, commit d9d27cadcab87d242072f3ff9d8fd4625415fc10

It should be fixed by LU-4243:
b2_4 commit d61cd9a816f8e9ce959b7e32e73ad2871a8ffe7f
b2_5 commit 79dd406672d87f8c14a14cfe6a3715d562b8838a
master commit f157ebea33edb656b123bb34d4cb06f970bb4bb6
which appears functionally identical to http://review.whamcloud.com/7835

So this can probably be closed.


Jinshan Xiong (Inactive) added a comment - 02/Oct/13 5:52 PM

It looks like the failover NIDs were wrongly added to the first connection.

Please try patch http://review.whamcloud.com/7835, but test it before applying it in production, because I didn't test it myself.


Kit Westneat (Inactive) added a comment - 02/Oct/13 5:01 PM

I tested this with the patch for LU-3829 and it seems to work around the issue, so I think the priority on this ticket can be lowered. IR with service nodes still appears to be broken, though: it still attempts to connect to the first service node rather than the connected service node.


Kit Westneat (Inactive) added a comment - 02/Oct/13 12:46 PM

Here's the code that adds all the failnodes to a single uuid:

mgs_write_log_failnids:
...
        while (class_find_param(ptr, PARAM_FAILNODE, &ptr) == 0) {
                while (class_parse_nid(ptr, &nid, &ptr) == 0) {
                        if (failnodeuuid == NULL) {
                                /* We don't know the failover node name,
                                   so just use the first nid as the uuid */
                                rc = name_create(&failnodeuuid,
                                                 libcfs_nid2str(nid), "");
                                if (rc)
                                        return rc;
                        }
                        CDEBUG(D_MGS, "add nid %s for failover uuid %s, "
                               "client %s\n", libcfs_nid2str(nid),
                               failnodeuuid, cliname);
                        rc = record_add_uuid(env, llh, nid, failnodeuuid);
                }
                if (failnodeuuid)
                        rc = record_add_conn(env, llh, cliname, failnodeuuid);
        }
...
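
The effect of keeping one failnodeuuid across loop iterations can be sketched with a small standalone model in C. Everything here is hypothetical and only mimics the control flow of the excerpt above: `struct failnode_param`, `struct conn_log`, `write_failnids()` and the `reset_per_param` switch are invented for illustration and are not claimed to match the actual patch. The model assumes each failover parameter carries its own list of NIDs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CONNS 8

/* Hypothetical record of the uuid used for each "add connection" entry. */
struct conn_log {
	const char *conn_uuid[MAX_CONNS];
	size_t      conn_count;
};

/* Hypothetical model of one failover parameter: a list of NIDs. */
struct failnode_param {
	const char *const *nids;
	size_t             nid_count;
};

/*
 * Mimics the loop structure of the excerpt. With reset_per_param == 0 the
 * uuid is named once, after the very first NID, and reused for every
 * parameter. With reset_per_param != 0 each parameter gets a uuid named
 * after its own first NID.
 */
static void
write_failnids(const struct failnode_param *params, size_t nparams,
	       int reset_per_param, struct conn_log *log)
{
	const char *uuid = NULL;

	assert(log != NULL);
	log->conn_count = 0;
	for (size_t p = 0; p < nparams; p++) {
		for (size_t i = 0; i < params[p].nid_count; i++) {
			if (uuid == NULL)
				uuid = params[p].nids[i];
			/* an "add uuid" record for (nid, uuid) would go here */
		}
		if (uuid != NULL && log->conn_count < MAX_CONNS)
			log->conn_uuid[log->conn_count++] = uuid;
		if (reset_per_param)
			uuid = NULL;
	}
}
```

For two parameters holding 10.10.160.25@tcp1 and 10.10.160.26@tcp1 respectively, the model with `reset_per_param` set to 0 logs two connections that both use the uuid 10.10.160.25@tcp1. With `reset_per_param` set to 1, the second connection uses a uuid named after 10.10.160.26@tcp1 instead.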

