LU-5583

clients receive IO error after MDT failover

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.5.2
    • Labels: None
    • Environment: RHEL6 servers, RHEL6 clients; servers connected to IB and Ethernet, clients connected either to IB and Ethernet or to Ethernet only
    • Severity: 3
    • Rank (Obsolete): 15574

    Description

      After our active MDS became completely unresponsive earlier, we attempted to fail over to the second MDS. This appeared to succeed: the MGS and MDT mounted successfully, as far as we can tell all clients reconnected, and recovery completed. However, at this stage any operation on the file system (for example ls) on any client connected only via Ethernet either hung or returned I/O errors, while all clients using IB were operating normally.

      We then discovered that there seemed to be a problem between the MDT and all OSTs, as lctl get_param lod.lustre03-MDT0000-mdtlov.target_obd came back empty. Failing back to the (now rebooted) previous MDS worked and the file system is now operating normally again.
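
      For reference, checks along these lines can be used to confirm the MDT-to-OST and client-to-MDT connections (a sketch, not a transcript of our exact session; the wildcard in the client device name is a placeholder, the NID is the failover MDS from the logs below):

      # On the acting MDS: list the OST targets the MDT can see
      # (this came back empty while the problem was occurring)
      lctl get_param lod.lustre03-MDT0000-mdtlov.target_obd

      # On an Ethernet-only client: check the state of the MDC import
      # (the wildcard stands in for the per-mount device suffix)
      lctl get_param -n mdc.lustre03-MDT0000-mdc-*.import | grep -E 'state|failover'

      # Basic LNet reachability check from a client to the MDS tcp NID
      lctl ping 172.23.144.2@tcp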

      Sample errors in syslog on one of the Ethernet-only clients while ls /mnt/lustre03 was returning I/O errors:

      Sep  4 09:56:18 cs04r-sc-serv-06 kernel: Lustre: MGC172.23.144.1@tcp: Connection restored to MGS (at 172.23.144.2@tcp)
      Sep  4 09:57:58 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
      Sep  4 09:58:23 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
      Sep  4 09:58:48 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
      Sep  4 09:59:13 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
      Sep  4 09:59:38 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
      Sep  4 10:00:03 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
      Sep  4 10:00:28 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
      Sep  4 10:01:18 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
      Sep  4 10:01:18 cs04r-sc-serv-06 kernel: LustreError: Skipped 1 previous similar message
      Sep  4 10:02:33 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
      Sep  4 10:02:33 cs04r-sc-serv-06 kernel: LustreError: Skipped 2 previous similar messages
      Sep  4 10:05:03 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
      Sep  4 10:05:03 cs04r-sc-serv-06 kernel: LustreError: Skipped 5 previous similar messages
      Sep  4 10:09:38 cs04r-sc-serv-06 kernel: LustreError: 11-0: lustre03-MDT0000-mdc-ffff880073fec800: Communicating with 172.23.144.2@tcp, operation mds_connect failed with -16.
      Sep  4 10:09:38 cs04r-sc-serv-06 kernel: LustreError: Skipped 10 previous similar messages
      Sep  4 10:33:15 cs04r-sc-serv-06 kernel: LustreError: 32662:0:(dir.c:422:ll_get_dir_page()) read cache page: [0xe900001:0x3b1189d1:0x0] at 0: rc -4
      Sep  4 10:33:15 cs04r-sc-serv-06 kernel: LustreError: 32662:0:(dir.c:584:ll_dir_read()) error reading dir [0xe900001:0x3b1189d1:0x0] at 0: rc -4
      Sep  4 10:34:00 cs04r-sc-serv-06 kernel: LustreError: 32717:0:(dir.c:398:ll_get_dir_page()) dir page locate: [0xe900001:0x3b1189d1:0x0] at 0: rc -5
      Sep  4 10:34:00 cs04r-sc-serv-06 kernel: LustreError: 32717:0:(dir.c:584:ll_dir_read()) error reading dir [0xe900001:0x3b1189d1:0x0] at 0: rc -5
      Sep  4 10:37:44 cs04r-sc-serv-06 kernel: LustreError: 487:0:(mdc_locks.c:918:mdc_enqueue()) ldlm_cli_enqueue: -4
      Sep  4 10:37:44 cs04r-sc-serv-06 kernel: LustreError: 487:0:(mdc_locks.c:918:mdc_enqueue()) Skipped 879 previous similar messages
      Sep  4 10:37:57 cs04r-sc-serv-06 kernel: LustreError: 508:0:(dir.c:398:ll_get_dir_page()) dir page locate: [0xe900001:0x3b1189d1:0x0] at 0: rc -5
      Sep  4 10:37:57 cs04r-sc-serv-06 kernel: LustreError: 508:0:(dir.c:584:ll_dir_read()) error reading dir [0xe900001:0x3b1189d1:0x0] at 0: rc -5
      Sep  4 10:37:58 cs04r-sc-serv-06 kernel: LustreError: 510:0:(dir.c:398:ll_get_dir_page()) dir page locate: [0xe900001:0x3b1189d1:0x0] at 0: rc -5
      Sep  4 10:37:59 cs04r-sc-serv-06 kernel: LustreError: 512:0:(dir.c:584:ll_dir_read()) error reading dir [0xe900001:0x3b1189d1:0x0] at 0: rc -5
      Sep  4 10:37:59 cs04r-sc-serv-06 kernel: LustreError: 512:0:(dir.c:584:ll_dir_read()) Skipped 1 previous similar message
      Sep  4 10:43:34 cs04r-sc-serv-06 kernel: LustreError: 875:0:(dir.c:398:ll_get_dir_page()) dir page locate: [0xe900001:0x3b1189d1:0x0] at 0: rc -5
      Sep  4 10:43:34 cs04r-sc-serv-06 kernel: LustreError: 875:0:(dir.c:398:ll_get_dir_page()) Skipped 2 previous similar messages
      Sep  4 10:43:34 cs04r-sc-serv-06 kernel: LustreError: 875:0:(dir.c:584:ll_dir_read()) error reading dir [0xe900001:0x3b1189d1:0x0] at 0: rc -5
      Sep  4 10:43:34 cs04r-sc-serv-06 kernel: LustreError: 875:0:(dir.c:584:ll_dir_read()) Skipped 1 previous similar message
      Sep  4 10:47:19 cs04r-sc-serv-06 kernel: LustreError: 1122:0:(dir.c:398:ll_get_dir_page()) dir page locate: [0xe900001:0x3b1189d1:0x0] at 0: rc -5
      Sep  4 10:47:19 cs04r-sc-serv-06 kernel: LustreError: 1122:0:(dir.c:584:ll_dir_read()) error reading dir [0xe900001:0x3b1189d1:0x0] at 0: rc -5
      

      I'll attach the full MDT syslog as a file, starting with the mount and ending when we unmounted again to fail back to the previous MDS.

      Note that IB and LNet over IB were added to this file system recently, following the instructions in the manual on changing server NIDs: unmounting everything, completely unloading the Lustre modules on the servers, running tunefs.lustre --writeconf --erase-param with the new NIDs, and then mounting the MGS, MDT and OSTs in that order. (Some Ethernet-only clients might still have been mounted during this, but the client I used for testing while it wasn't working had certainly been unmounted at that point and rebooted a few times since.)
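
      For completeness, the procedure we followed was roughly the following (a sketch only; the device paths and the o2ib addresses are placeholders, not our real values):

      # File system stopped and Lustre modules unloaded on all servers first.
      # Regenerate the configuration logs with the new NID list (placeholder paths/NIDs):
      tunefs.lustre --erase-param \
          --mgsnode=172.23.144.1@tcp,10.144.23.1@o2ib \
          --mgsnode=172.23.144.2@tcp,10.144.23.2@o2ib \
          --writeconf /dev/vg_mds03/lustre03-mdt
      # ...the same --writeconf step is repeated for each OST device...

      # Then remount in order: MGS first, then MDT, then the OSTs
      mount -t lustre /dev/vg_mds03/lustre03-mgs /mnt/lustre/mgs
      mount -t lustre /dev/vg_mds03/lustre03-mdt /mnt/lustre/mdt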

      We are currently concerned that this will happen again if we have to do another failover of the MDT, so we want to solve this. Let us know what other information we should provide.

      Attachments

        Issue Links

          Activity

            [LU-5583] clients receive IO error after MDT failover

            I've done a few more tests on this file system while I can (planned maintenance, nearly over now).

            I'll try to summarise the results here; hopefully they'll be useful (at the very least they'll help us remember what has been tested if we come back to this later).

            In this file system we have one MGT and one MDT; both share the same disk backend and are separate LVs in the same LVM VG. We have two MDS servers able to access this storage (cs04r-sc-mds03-01 and cs04r-sc-mds03-02), and both have LNet configured to use tcp and o2ib. The MDT is configured to access the MGS on either of the servers, via two mgsnode parameters, each listing an o2ib and a tcp IP address.
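
            For reference, the configuration stored on the MDT device can be checked without changing anything, roughly like this (the device path and the o2ib addresses are placeholders for ours):

            # Print the parameters stored on the MDT device (read-only)
            tunefs.lustre --dryrun /dev/vg_mds03/lustre03-mdt
            # The Parameters line should list both MGS nodes, each with both NIDs, e.g.:
            #   mgsnode=10.144.23.1@o2ib,172.23.144.1@tcp mgsnode=10.144.23.2@o2ib,172.23.144.2@tcp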

            • MGT and MDT mounted, in this order, on cs04r-sc-mds03-01: all seems to be well, no messages in syslog about failing to get the MGS log params or anything.
            • MGT and then MDT mounted, in this order, on cs04r-sc-mds03-02: we get the messages about failing to get the MGS log params, but other than the first time the MDT appears to be working fine.
            • MGT mounted on cs04r-sc-mds03-01 and the MDT mounted later on cs04r-sc-mds03-02: also works fine, no errors in syslog.
            • MGT mounted on cs04r-sc-mds03-02 and the MDT mounted later on cs04r-sc-mds03-01: generates the messages about failing to get the MGS log params on cs04r-sc-mds03-01.

            So, it seems the MGT works on cs04r-sc-mds03-01 but not on cs04r-sc-mds03-02.

            ferner Frederik Ferner (Inactive) added a comment -
            bobijam Zhenyu Xu added a comment -

            You can use one MGS node for different file systems, but you need a separate MGT device for each file system.
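
            For illustration, a dedicated MGT can be formatted separately from a file system's MDT roughly like this (a sketch only; device paths and NIDs are placeholders, not a tested recipe):

            # Dedicated MGT on its own device
            mkfs.lustre --mgs /dev/mgt_device

            # The file system's MDT then points at that MGS (both failover NIDs)
            mkfs.lustre --fsname=lustre03 --mdt --index=0 \
                --mgsnode=10.144.23.1@o2ib,172.23.144.1@tcp \
                --mgsnode=10.144.23.2@o2ib,172.23.144.2@tcp \
                /dev/mdt_device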


            This is how we had been running before the upgrade without any problems, and as far as I can see it has only been an issue since the upgrade. Is the recommendation these days to have a separate MGS on a separate machine?

            Anyway, is the failure to get the MGS log a problem we need to worry about, or is it mainly cosmetic?

            ferner Frederik Ferner (Inactive) added a comment -
            bobijam Zhenyu Xu added a comment -

            Yes, you are right, change #11765 is just for the umount/remount issue. I think the failure to get the MGS log could be related to the MGT device being mounted multiple times at the same time.


            This patch has fixed the umount/remount issue on the failover MDS.

            It didn't fix the issue with the entries below, but I don't think you expected the patch to fix that; I'm just stating it for clarity:

            kernel: LustreError: 13a-8: Failed to get MGS log params and no local copy.
            

            On the other hand, I'm no longer able to fully reproduce the initial issue; it looks like all clients can talk to the MDT, and lctl get_param lod.lustre03-MDT0000-mdtlov.target_obd returns all OSTs as active.

            ferner Frederik Ferner (Inactive) added a comment -
            bobijam Zhenyu Xu added a comment -

            This looks like the LU-4943 issue (the MGC device does not get cleaned up before another mount). Would you mind trying patch http://review.whamcloud.com/#/c/11765/ on the MDS?
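
            If it helps, the change can be pulled from Gerrit and applied on top of a lustre-release checkout roughly like this (the patch set number at the end of the ref is a placeholder; check the change page for the latest one):

            # Clone the Lustre tree and fetch change 11765 from Whamcloud Gerrit
            git clone git://git.whamcloud.com/fs/lustre-release.git
            cd lustre-release
            # Gerrit ref layout: refs/changes/<last two digits>/<change number>/<patch set>
            git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/65/11765/1
            git cherry-pick FETCH_HEAD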


            People

              Assignee: bobijam Zhenyu Xu
              Reporter: ferner Frederik Ferner (Inactive)
              Votes: 0
              Watchers: 3
