[LU-5583] clients receive IO error after MDT failover Created: 04/Sep/14 Updated: 07/Jun/16 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Frederik Ferner (Inactive) | Assignee: | Zhenyu Xu |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
RHEL6 servers, RHEL6 clients; servers connected to both IB and ethernet, clients connected either to both IB and ethernet or to ethernet only |
||
| Attachments: |
|
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 15574 |
| Description |
|
After our active MDS became completely unresponsive earlier, we attempted to fail over to the second MDS. This appeared to succeed: the MGS and MDT mounted successfully, as far as we can tell all clients reconnected, and recovery completed. However, at this stage any operation on the file system (for example ls) on any client connected only via ethernet either hung or returned I/O errors, while all clients using IB were operating normally. We then discovered that there seemed to be a problem between the MDT and all OSTs, as lctl get_param lod.lustre03-MDT0000-mdtlov.target_obd came back empty. Failing the MDT back to the (now rebooted) previous MDS worked and the file system is now operating normally again.

Sample errors in syslog on one of the ethernet-only clients while ls /mnt/lustre03 was returning I/O errors:

Sep 4 09:56:18 cs04r-sc-serv-06 kernel: Lustre: MGC172.23.144.1@tcp: Connection restored to MGS (at 172.23.144.2@tcp)
Sep 4 09:57:58 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
Sep 4 09:58:23 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
Sep 4 09:58:48 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
Sep 4 09:59:13 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
Sep 4 09:59:38 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
Sep 4 10:00:03 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
Sep 4 10:00:28 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
Sep 4 10:01:18 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
Sep 4 10:01:18 cs04r-sc-serv-06 kernel: LustreError: Skipped 1 previous similar message
Sep 4 10:02:33 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
Sep 4 10:02:33 cs04r-sc-serv-06 kernel: LustreError: Skipped 2 previous similar messages
Sep 4 10:05:03 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
Sep 4 10:05:03 cs04r-sc-serv-06 kernel: LustreError: Skipped 5 previous similar messages
Sep 4 10:09:38 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
Sep 4 10:09:38 cs04r-sc-serv-06 kernel: LustreError: Skipped 10 previous similar messages
Sep 4 10:33:15 cs04r-sc-serv-06 kernel: LustreError: 32662:0:(dir.c:422:ll_get_dir_page()) read cache page: [0xe900001:0x3b1189d1:0x0] at 0: rc -4
Sep 4 10:33:15 cs04r-sc-serv-06 kernel: LustreError: 32662:0:(dir.c:584:ll_dir_read()) error reading dir [0xe900001:0x3b1189d1:0x0] at 0: rc -4
Sep 4 10:34:00 cs04r-sc-serv-06 kernel: LustreError: 32717:0:(dir.c:398:ll_get_dir_page()) dir page locate: [0xe900001:0x3b1189d1:0x0] at 0: rc -5
Sep 4 10:34:00 cs04r-sc-serv-06 kernel: LustreError: 32717:0:(dir.c:584:ll_dir_read()) error reading dir [0xe900001:0x3b1189d1:0x0] at 0: rc -5
Sep 4 10:37:44 cs04r-sc-serv-06 kernel: LustreError: 487:0:(mdc_locks.c:918:mdc_enqueue()) ldlm_cli_enqueue: -4
Sep 4 10:37:44 cs04r-sc-serv-06 kernel: LustreError: 487:0:(mdc_locks.c:918:mdc_enqueue()) Skipped 879 previous similar messages
Sep 4 10:37:57 cs04r-sc-serv-06 kernel: LustreError: 508:0:(dir.c:398:ll_get_dir_page()) dir page locate: [0xe900001:0x3b1189d1:0x0] at 0: rc -5
Sep 4 10:37:57 cs04r-sc-serv-06 kernel: LustreError: 508:0:(dir.c:584:ll_dir_read()) error reading dir [0xe900001:0x3b1189d1:0x0] at 0: rc -5
Sep 4 10:37:58 cs04r-sc-serv-06 kernel: LustreError: 510:0:(dir.c:398:ll_get_dir_page()) dir page locate: [0xe900001:0x3b1189d1:0x0] at 0: rc -5
Sep 4 10:37:59 cs04r-sc-serv-06 kernel: LustreError: 512:0:(dir.c:584:ll_dir_read()) error reading dir [0xe900001:0x3b1189d1:0x0] at 0: rc -5
Sep 4 10:37:59 cs04r-sc-serv-06 kernel: LustreError: 512:0:(dir.c:584:ll_dir_read()) Skipped 1 previous similar message
Sep 4 10:43:34 cs04r-sc-serv-06 kernel: LustreError: 875:0:(dir.c:398:ll_get_dir_page()) dir page locate: [0xe900001:0x3b1189d1:0x0] at 0: rc -5
Sep 4 10:43:34 cs04r-sc-serv-06 kernel: LustreError: 875:0:(dir.c:398:ll_get_dir_page()) Skipped 2 previous similar messages
Sep 4 10:43:34 cs04r-sc-serv-06 kernel: LustreError: 875:0:(dir.c:584:ll_dir_read()) error reading dir [0xe900001:0x3b1189d1:0x0] at 0: rc -5
Sep 4 10:43:34 cs04r-sc-serv-06 kernel: LustreError: 875:0:(dir.c:584:ll_dir_read()) Skipped 1 previous similar message
Sep 4 10:47:19 cs04r-sc-serv-06 kernel: LustreError: 1122:0:(dir.c:398:ll_get_dir_page()) dir page locate: [0xe900001:0x3b1189d1:0x0] at 0: rc -5
Sep 4 10:47:19 cs04r-sc-serv-06 kernel: LustreError: 1122:0:(dir.c:584:ll_dir_read()) error reading dir [0xe900001:0x3b1189d1:0x0] at 0: rc -5

I'll attach the full MDT syslog as a file, starting with the mount and ending when we unmounted it again to fail back to the previous MDS.

Note that IB and LNet over IB were added to this file system recently, following the instructions in the manual on changing server NIDs: unmounting everything, completely unloading the Lustre modules on the servers, running tunefs.lustre --writeconf --erase-param with the new NIDs, and then mounting MGS, MDT and OSTs in this order. (Some ethernet-only clients might still have been up during this, but the client I used for testing while it wasn't working had certainly been unmounted and rebooted a few times since.)

We are concerned that this will happen again if we have to do another failover of the MDT, so we want to solve this. Let us know what other information we should provide.
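For reference, the connection state that came back empty here can be re-checked after a future failover with a few lctl queries (a sketch only; the lod parameter is the one quoted above, the other parameter names are standard Lustre ones and were not part of this report):

# On the active MDS: should list every OST as ACTIVE (came back empty in this incident)
lctl get_param lod.lustre03-MDT0000-mdtlov.target_obd
# On the active MDS: confirm recovery really completed
lctl get_param mdt.lustre03-MDT0000.recovery_status
# On an affected ethernet-only client: inspect the MDC import state
lctl get_param mdc.lustre03-MDT0000-mdc-*.import

If target_obd lists all OSTs and the client import state is FULL, the MDT side is healthy. |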
| Comments |
| Comment by Peter Jones [ 05/Sep/14 ] |
|
Bobijam, could you please advise on this issue? Thanks, Peter |
| Comment by Zhenyu Xu [ 05/Sep/14 ] |
|
From the MDS log, it appears that for some unknown reason the MGS does not work correctly:

Sep 4 09:55:21 cs04r-sc-mds03-02 kernel: Lustre: MGS: non-config logname received: params
Sep 4 09:55:22 cs04r-sc-mds03-02 kernel: Lustre: MGS: non-config logname received: params
...
Sep 4 09:56:35 cs04r-sc-mds03-02 kernel: LustreError: 43873:0:(obd_mount_server.c:1136:server_register_target()) lustre03-MDT0000: error registering with the MGS: rc = -5 (not fatal)
...
Sep 4 09:57:11 cs04r-sc-mds03-02 kernel: LustreError: 13a-8: Failed to get MGS log params and no local copy.

Is the MGT a separate device, and is it mounted elsewhere while this MDS node is trying to mount it?
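A quick way to rule out a double mount of the MGT would be to list the Lustre mounts on both MDS nodes (a sketch; the hostnames are the ones used elsewhere in this ticket):

# The MGT should be mounted on at most one of the two MDS nodes
ssh cs04r-sc-mds03-01 mount -t lustre
ssh cs04r-sc-mds03-02 mount -t lustre

If it shows up on both, that alone could explain the failure to read the MGS logs. |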
| Comment by Frederik Ferner (Inactive) [ 05/Sep/14 ] |
|
Yes, the MGS is a separate partition; it would also be affected by any MDT failover, and our scripts mount both of them on the same server. Though now that I think about it, I'm not convinced any mount order is enforced in those scripts. It has always worked so far...
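For illustration, a failover script that enforces the MGT-before-MDT order could be as simple as the sketch below (device paths as shown in the tunefs output later in this ticket; the mount points /lustre/mgs and /lustre/mdt are placeholders):

# Mount the MGT first; only attempt the MDT once that has succeeded
mount -t lustre /dev/vg_lustre03/mgs /lustre/mgs && \
mount -t lustre /dev/vg_lustre03/mdt /lustre/mdt |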
| Comment by Frederik Ferner (Inactive) [ 09/Sep/14 ] |
|
So I've just tried again, with the same result, at least as far as the logs and the MDT are concerned, this time ignoring any scripts and doing all steps manually after a fresh reboot of the failover MDS. The following steps seem to reproduce it in this setup every time:
The same errors as above are in the logs, though this time all clients that I've tried seem to work and new clients can mount the file system, so the MGS appears to work at least for them. I've then (on the same failover MDS) attempted to unmount/mount the MDT; this fails with the following log messages:

Sep 9 23:46:26 cs04r-sc-mds03-02 kernel: Lustre: server umount lustre03-MDT0000 complete
Sep 9 23:46:37 cs04r-sc-mds03-02 kernel: LDISKFS-fs (dm-6): mounted filesystem with ordered data mode. quota=off. Opts:
Sep 9 23:46:38 cs04r-sc-mds03-02 kernel: LustreError: 40807:0:(genops.c:320:class_newdev()) Device MGC10.144.144.1@o2ib already exists at 4, won't add
Sep 9 23:46:38 cs04r-sc-mds03-02 kernel: LustreError: 40807:0:(obd_config.c:374:class_attach()) Cannot create device MGC10.144.144.1@o2ib of type mgc : -17
Sep 9 23:46:38 cs04r-sc-mds03-02 kernel: LustreError: 40807:0:(obd_mount.c:195:lustre_start_simple()) MGC10.144.144.1@o2ib attach error -17
Sep 9 23:46:38 cs04r-sc-mds03-02 kernel: LustreError: 40807:0:(obd_mount_server.c:861:lustre_disconnect_lwp()) lustre03-MDT0000-lwp-MDT0000: Can't end config log lustre03-client.
Sep 9 23:46:38 cs04r-sc-mds03-02 kernel: LustreError: 40807:0:(obd_mount_server.c:1436:server_put_super()) lustre03-MDT0000: failed to disconnect lwp. (rc=-2)
Sep 9 23:46:38 cs04r-sc-mds03-02 kernel: LustreError: 40807:0:(obd_mount_server.c:1466:server_put_super()) no obd lustre03-MDT0000
Sep 9 23:46:38 cs04r-sc-mds03-02 kernel: LustreError: 40807:0:(obd_mount_server.c:135:server_deregister_mount()) lustre03-MDT0000 not registered
Sep 9 23:46:38 cs04r-sc-mds03-02 kernel: Lustre: server umount lustre03-MDT0000 complete

Repeating the steps above on the initially active MDS does not generate the last two log entries you highlighted (only the first two), and cycling through umount/mount for just the MDT works as expected, succeeding in mounting the MDT every time.

Looking at the tunefs.lustre output (below), I don't see any typo in the IP addresses for the mgs node, but maybe there's another problem, so I'll put it here in case it's relevant and/or helps:

[bnh65367@cs04r-sc-mds03-01 ~]$ sudo tunefs.lustre --print /dev/vg_lustre03/mdt
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: lustre03-MDT0000
Index: 0
Lustre FS: lustre03
Mount type: ldiskfs
Flags: 0x1401
(MDT no_primnode )
Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro,acl
Parameters: mgsnode=10.144.144.1@o2ib,172.23.144.1@tcp mgsnode=10.144.144.2@o2ib,172.23.144.2@tcp failover.node=10.144.144.1@o2ib,172.23.144.1@tcp failover.node=10.144.144.2@o2ib,172.23.144.2@tcp mdt.quota_type=ug mdt.group_upcall=/usr/sbin/l_getgroups
Permanent disk data:
Target: lustre03-MDT0000
Index: 0
Lustre FS: lustre03
Mount type: ldiskfs
Flags: 0x1401
(MDT no_primnode )
Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro,acl
Parameters: mgsnode=10.144.144.1@o2ib,172.23.144.1@tcp mgsnode=10.144.144.2@o2ib,172.23.144.2@tcp failover.node=10.144.144.1@o2ib,172.23.144.1@tcp failover.node=10.144.144.2@o2ib,172.23.144.2@tcp mdt.quota_type=ug mdt.group_upcall=/usr/sbin/l_getgroups
exiting before disk write.
[bnh65367@cs04r-sc-mds03-01 ~]$ sudo tunefs.lustre --print /dev/vg_lustre03/mgs
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: MGS
Index: unassigned
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x4
(MGS )
Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
Parameters:
Permanent disk data:
Target: MGS
Index: unassigned
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x4
(MGS )
Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
Parameters:
exiting before disk write.
[bnh65367@cs04r-sc-mds03-01 ~]$
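The "Device MGC10.144.144.1@o2ib already exists" / attach error -17 (EEXIST) messages above suggest a stale MGC device left behind by the earlier mount attempt; one way to confirm that (a sketch, not something that was run here) is to list the configured OBD devices on the failover MDS:

# A leftover MGC entry for 10.144.144.1@o2ib would explain the -17 attach error
lctl dl | grep -i mgc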
Additional help in debugging this would be appreciated. |
| Comment by Zhenyu Xu [ 10/Sep/14 ] |
|
this looks like |
| Comment by Frederik Ferner (Inactive) [ 10/Sep/14 ] |
|
This patch has fixed the umount/remount issue on the failover MDS. It didn't fix the issue with the entries below, but I don't think you expected that to be fixed by the patch; just stating it for clarity:

kernel: LustreError: 13a-8: Failed to get MGS log params and no local copy.

On the other hand, I'm no longer able to reproduce the initial issue: it looks like all clients can talk to the MDT, and lctl get_param lod.lustre03-MDT0000-mdtlov.target_obd returns all OSTs as active. |
| Comment by Zhenyu Xu [ 10/Sep/14 ] |
|
Yes, you are right, #11765 is just for the umount/remount issue. I think the failure to get the MGS log could be related to the MGT device being mounted in multiple places at the same time. |
| Comment by Frederik Ferner (Inactive) [ 10/Sep/14 ] |
|
This is how we had been running before the upgrade without any problem, and as far as I can see it has only been an issue since the upgrade. Is the recommendation these days to have a separate MGS on a separate machine? Anyway, is the failure to get the MGS log a problem we need to worry about, or is it mainly cosmetic? |
| Comment by Zhenyu Xu [ 10/Sep/14 ] |
|
You can use one MGS node for different filesystems, but you need a separate MGT device for each filesystem.
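For illustration only, creating a dedicated MGT and an MDT that registers against it would look roughly like this (a sketch; the device paths are placeholders, the mgsnode NIDs are the ones from the tunefs output above):

# Dedicated MGT device (formatted once, carries no filesystem name)
mkfs.lustre --mgs /dev/vgX/mgs
# MDT for one filesystem, pointing at both MGS failover nodes
mkfs.lustre --fsname=lustre03 --mdt --index=0 --mgsnode=10.144.144.1@o2ib,172.23.144.1@tcp --mgsnode=10.144.144.2@o2ib,172.23.144.2@tcp /dev/vgX/mdt

The same --mgsnode options then go onto each OST as well. |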
| Comment by Frederik Ferner (Inactive) [ 10/Sep/14 ] |
|
I've done a few more tests on this file system while I can (planned maintenance, nearly over now). I'll try to summarise the results here; hopefully they'll be useful for something (at least they'll remind us what has been tested if we get back to this later).

In this file system we have one MGT and one MDT; both share the same disk backend and are on the same LVM VG, as separate LVs. We have two MDS servers able to access this storage (cs04r-sc-mds03-01 and cs04r-sc-mds03-02), both with LNet configured to use tcp and o2ib. The MDT is configured to access the MGS on either of the servers, via two mgsnode parameters, each listing o2ib and tcp IP addresses.

When the MGT and MDT are mounted in this order on cs04r-sc-mds03-01 all seems to be well: no messages in syslog about failure to get MGS log params or anything. So it seems the MGT works on cs04r-sc-mds03-01 but not on cs04r-sc-mds03-02.
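A follow-up check that might narrow this down next time (a sketch; the client address is a placeholder, the server NIDs are from this ticket) is to verify LNet from cs04r-sc-mds03-02 itself:

# On cs04r-sc-mds03-02: both the tcp and o2ib NIDs should be listed
lctl list_nids
# LNet-level ping of an ethernet-only client (placeholder address) and of the other MDS
lctl ping 172.23.x.y@tcp
lctl ping 10.144.144.1@o2ib

If the tcp pings fail from this node only, that would point at an LNet or network problem on cs04r-sc-mds03-02 rather than at the MGS itself. |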