Details
- Type: Question/Request
- Resolution: Unresolved
- Priority: Blocker
- Affects Version: Lustre 2.10.2
- Environment:
  Servers: Lustre-2.10.2, Kernel: 3.10.0-693.5.2.el7_lustre.x86_64
  Clients: Lustre-2.10.3, Kernel: 3.10.0-693.21.1.el7.x86_64
  Client/Server OS: CentOS Linux release 7.4.1708
Description
I ran tunefs.lustre and it reported success changing the failover NIDs to what they should be. The problem is happening on several OSTs, but the same fix should apply to all of them; I'm assuming I forgot a step when I ran tunefs.lustre. The command I used was:
tunefs.lustre --erase-param failover.node --param failover.node=172.17.1.103@o2ib,172.16.1.103@tcp1 /dev/mapper/mpathg
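Since the same change is needed on the other affected OSTs, the command can simply be repeated per device. A rough sketch only, where the device names are placeholders and the failover NIDs would need to match the actual failover partner of each OST:

# Device names below are hypothetical examples; substitute the real OST
# devices, and adjust the failover NIDs per server pair as appropriate.
for dev in /dev/mapper/mpathg /dev/mapper/mpathh; do
    tunefs.lustre --erase-param failover.node \
        --param failover.node=172.17.1.103@o2ib,172.16.1.103@tcp1 "$dev"
done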
The OST OST0017 is mounted on 172.17.1.103 with the following parameters:
[root@apslstr03 ~]# tunefs.lustre --dryrun /dev/mapper/mpathg
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: lustrefc-OST0017
Index: 23
Lustre FS: lustrefc
Mount type: ldiskfs
Flags: 0x2
(OST )
Persistent mount opts: ,errors=remount-ro
Parameters: failover.node=172.17.1.103@o2ib,172.16.1.103@tcp1 mgsnode=172.17.1.112@o2ib,172.16.1.112@tcp1
mgsnode=172.17.1.113@o2ib,172.16.1.113@tcp1
Permanent disk data:
Target: lustrefc-OST0017
Index: 23
Lustre FS: lustrefc
Mount type: ldiskfs
Flags: 0x2
(OST )
Persistent mount opts: ,errors=remount-ro
Parameters: failover.node=172.17.1.103@o2ib,172.16.1.103@tcp1 mgsnode=172.17.1.112@o2ib,172.16.1.112@tcp1
mgsnode=172.17.1.113@o2ib,172.16.1.113@tcp1
exiting before disk write.
[root@apslstr03 ~]#
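Since tunefs.lustre only rewrites the target's on-disk CONFIGS/mountdata, it may also be worth checking whether the MGS configuration logs (which are what the clients actually read the NIDs from) still carry the old values. A hedged sketch, assuming it is run on the MGS node and that the filesystem name is lustrefc:

# On the MGS node: print the client configuration log and look for records
# that still reference the old NIDs for OST0017.
lctl --device MGS llog_print lustrefc-client | grep -i OST0017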
However, the clients are still displaying errors like this:
May 8 11:43:33 localhost kernel: Lustre: 2028:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1557333772/real 0] req@ffff880bd9296f00 x1632920191594624/t0(0) o8->lustrefc-OST0017-osc-ffff8817ef372000@172.17.1.106@o2ib:28/4 lens 520/544 e 0 to 1 dl 1557333813 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
May 8 11:43:33 localhost kernel: Lustre: 2028:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 65 previous similar messages
May 8 11:45:26 localhost kernel: LNet: 1994:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 172.17.1.106@o2ib: 3 seconds
May 8 11:45:26 localhost kernel: LNet: 1994:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Skipped 39 previous similar messages
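For reference, one way to see which NIDs a client currently has on record for this OST (assuming lctl is available on the client and the osc device name matches) is to dump the import state:

# On a client: show the import for OST0017, including the connection
# currently in use and the failover NIDs the client knows about.
lctl get_param osc.lustrefc-OST0017-*.import
# The NID of the server the client is currently trying to reach:
lctl get_param osc.lustrefc-OST0017-*.ost_conn_uuid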
As the name suggests, the failover.node parameter specifies the failover NIDs for a target; it does not reflect the target's primary NID.
Now that you mention that your cluster is down, I am wondering whether your targets have been moved so that their primary NID is now different. If the targets did not move and they are all running on their primary node, then this change to the failover node parameter should not lead to any downtime.
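A quick way to confirm whether a target has moved (a sketch, assuming shell access to the OSS nodes) is to check on each server which Lustre targets it currently has mounted and which NIDs it is answering on:

# Run on each OSS node: list the mounted Lustre targets and confirm which
# server currently has lustrefc-OST0017 mounted.
mount -t lustre
# List the NIDs this server is configured with.
lctl list_nids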