LU-890

MDS Failover Issue - Clients not reconnecting after MGT/MDT fail over to other MDS.

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major

    Description

      The production compute nodes and login nodes can access both filesystems when the MGT/MDT is running on the primary MDS of scratch1. When the MGT and MDT are failed over to the backup MDS, the clients fail to reconnect.

      The basic configuration information is as follows:

      The primary MDS for scratch1 is named lfs-mds-1-1 and the secondary MDS is named lfs-mds-1-2.
      /etc/modprobe.d/lustre.conf:
      options lnet networks="o2ib0(ib0), o2ib1(ib2), o2ib2(ib3)"
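      (Note that the LNet network names are decoupled from the interface names here: o2ib1 is bound to ib2 and o2ib2 to ib3.) A quick sanity check that each NID landed on the intended interface, sketched with standard tools:

      lctl list_nids            # NIDs LNet actually configured
      ip -o -4 addr show ib2    # address should match the o2ib1 NID (10.174.79.x)
      ip -o -4 addr show ib3    # address should match the o2ib2 NID (10.174.80.x)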

      lfs-mds-1-1:
      ib0 inet addr:10.174.31.241 Bcast:10.174.31.255 Mask:255.255.224.0
      ib1 inet addr:10.175.31.241 Bcast:10.175.31.255 Mask:255.255.224.0
      ib2 inet addr:10.174.79.241 Bcast:10.174.79.255 Mask:255.255.240.0
      ib3 inet addr:10.174.80.40 Bcast:10.174.111.255 Mask:255.255.240.0

      [root@lfs-mds-1-1 config]# lctl list_nids
      10.174.31.241@o2ib
      10.174.79.241@o2ib1
      10.174.80.40@o2ib2

      lfs-mds-1-2:
      ib0 inet addr:10.174.31.251 Bcast:10.174.31.255 Mask:255.255.224.0
      ib1 inet addr:10.175.31.251 Bcast:10.175.31.255 Mask:255.255.224.0
      ib2 inet addr:10.174.79.251 Bcast:10.174.79.255 Mask:255.255.240.0
      ib3 inet addr:10.174.80.41 Bcast:10.174.111.255 Mask:255.255.240.0

      [root@lfs-mds-1-2 ~]# lctl list_nids
      10.174.31.251@o2ib
      10.174.79.251@o2ib1
      10.174.80.41@o2ib2

      r1i0n0 config (compute node):
      ib0 inet addr:10.174.0.55 Bcast:10.174.31.255 Mask:255.255.224.0
      ib1 inet addr:10.175.0.55 Bcast:10.175.31.255 Mask:255.255.224.0

      /etc/modprobe.d/lustre.conf
      options lnet networks="o2ib0(ib0), o2ib1(ib1)"

      [root@r1i0n0 ~]# lctl list_nids
      10.174.0.55@o2ib
      10.175.0.55@o2ib1

      [root@r1i0n0 ~]# lctl ping 10.174.31.241@o2ib
      12345-0@lo
      12345-10.174.31.241@o2ib
      12345-10.174.79.241@o2ib1
      12345-10.174.80.40@o2ib2
      [root@r1i0n0 ~]# lctl ping 10.174.31.251@o2ib
      12345-0@lo
      12345-10.174.31.251@o2ib
      12345-10.174.79.251@o2ib1
      12345-10.174.80.41@o2ib2

      fe1 (login node):
      ib0 inet addr:10.174.0.37 Bcast:10.255.255.255 Mask:255.255.224.0
      ib1 inet addr:10.175.0.37 Bcast:10.255.255.255 Mask:255.255.224.0
      ib2 inet addr:10.174.81.10 Bcast:10.174.95.255 Mask:255.255.240.0

      /etc/modprobe.d/lustre.conf
      options lnet networks="o2ib0(ib0), o2ib1(ib1), o2ib2(ib2)"

      [root@fe1 ~]# lctl list_nids
      10.174.0.37@o2ib
      10.175.0.37@o2ib1
      10.174.81.10@o2ib2

      [root@fe1 ~]# lctl ping 10.174.80.40@o2ib2
      12345-0@lo
      12345-10.174.31.241@o2ib
      12345-10.174.79.241@o2ib1
      12345-10.174.80.40@o2ib2
      [root@fe1 ~]# lctl ping 10.174.80.41@o2ib2
      12345-0@lo
      12345-10.174.31.251@o2ib
      12345-10.174.79.251@o2ib1
      12345-10.174.80.41@o2ib2

      [root@lfs-mds-1-1 ~]# tunefs.lustre --dryrun /dev/vg_scratch1/mdt
      checking for existing Lustre data: found CONFIGS/mountdata
      Reading CONFIGS/mountdata

      Read previous values:
      Target: scratch1-MDT0000
      Index: 0
      Lustre FS: scratch1
      Mount type: ldiskfs
      Flags: 0x1401
      (MDT no_primnode )
      Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
      Parameters: mgsnode=10.174.31.241@o2ib,10.174.79.241@o2ib1,10.174.80.40@o2ib2 mgsnode=10.174.31.251@o2ib,10.174.79.251@o2ib1,10.174.80.41@o2ib2 failover.node=10.174.31.241@o2ib,10.174.79.241@o2ib1,10.174.80.40@o2ib2 failover.node=10.174.31.251@o2ib,10.174.79.251@o2ib1,10.174.80.41@o2ib2 mdt.quota_type=ug

      Permanent disk data:
      Target: scratch1-MDT0000
      Index: 0
      Lustre FS: scratch1
      Mount type: ldiskfs
      Flags: 0x1401
      (MDT no_primnode )
      Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
      Parameters: mgsnode=10.174.31.241@o2ib,10.174.79.241@o2ib1,10.174.80.40@o2ib2 mgsnode=10.174.31.251@o2ib,10.174.79.251@o2ib1,10.174.80.41@o2ib2 failover.node=10.174.31.241@o2ib,10.174.79.241@o2ib1,10.174.80.40@o2ib2 failover.node=10.174.31.251@o2ib,10.174.79.251@o2ib1,10.174.80.41@o2ib2 mdt.quota_type=ug

      exiting before disk write.
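
      For reference, the doubled mgsnode and failover.node entries (together with the no_primnode flag) are what the --mgsnode/--servicenode options produce at format or tunefs time. A hedged reconstruction, not necessarily the exact command used on this system:

      tunefs.lustre \
          --mgsnode=10.174.31.241@o2ib,10.174.79.241@o2ib1,10.174.80.40@o2ib2 \
          --mgsnode=10.174.31.251@o2ib,10.174.79.251@o2ib1,10.174.80.41@o2ib2 \
          --servicenode=10.174.31.241@o2ib,10.174.79.241@o2ib1,10.174.80.40@o2ib2 \
          --servicenode=10.174.31.251@o2ib,10.174.79.251@o2ib1,10.174.80.41@o2ib2 \
          /dev/vg_scratch1/mdt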

      After failing over the MGT and MDT to the backup MDS (lfs-mds-1-2), recovery appears never to have started:

      [root@lfs-mds-1-2 lustre]# cat /proc/fs/lustre/mds/scratch1-MDT0000/recovery_status
      status: RECOVERING
      recovery_start: 0
      time_remaining: 0
      connected_clients: 0/2275
      delayed_clients: 0/2275
      completed_clients: 0/2275
      replayed_requests: 0/??
      queued_requests: 0
      next_transno: 55834575147
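
      (recovery_start: 0 together with connected_clients: 0/2275 means recovery never actually began; as noted in the comments below, the server does not start recovery until the first client attempts to connect.) During a failover test the status can simply be polled on the active MDS:

      watch -n 5 cat /proc/fs/lustre/mds/scratch1-MDT0000/recovery_status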

      Once I moved the MGT and MDT back to the original system, the clients reconnected in less than a minute:

      [root@lfs-mds-1-1 ~]# cat /proc/fs/lustre/mds/scratch1-MDT0000/recovery_status
      status: RECOVERING
      recovery_start: 1322752821
      time_remaining: 267
      connected_clients: 1896/2275
      delayed_clients: 0/2275
      completed_clients: 1896/2275
      replayed_requests: 0/??
      queued_requests: 0
      next_transno: 55834575147
      [root@lfs-mds-1-1 ~]# cat /proc/fs/lustre/mds/scratch1-MDT0000/recovery_status
      status: COMPLETE
      recovery_start: 1322752821
      recovery_duration: 56
      delayed_clients: 0/2275
      completed_clients: 2275/2275
      replayed_requests: 0
      last_transno: 55834575146

      The log file on fe1 showed this:
      Dec 1 15:08:21 fe1 kernel: Lustre: 7508:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1386944264150314 sent from scratch1-MDT0000-mdc-ffff880be72aec00 to NID 10.174.31.241@o2ib 7s ago has timed out (7s prior to deadline).
      Dec 1 15:08:21 fe1 kernel: req@ffff880bee44fc00 x1386944264150314/t0 o35->scratch1-MDT0000_UUID@10.174.31.241@o2ib:23/10 lens 408/9864 e 0 to 1 dl 1322752101 ref 1 fl Rpc:/0/0 rc 0/0
      Dec 1 15:08:21 fe1 kernel: Lustre: 7508:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 19 previous similar messages
      Dec 1 15:08:21 fe1 kernel: Lustre: scratch1-MDT0000-mdc-ffff880be72aec00: Connection to service scratch1-MDT0000 via nid 10.174.31.241@o2ib was lost; in progress operations using this service will wait for recovery to complete.
      Dec 1 15:08:36 fe1 kernel: Lustre: 5587:0:(import.c:517:import_select_connection()) scratch1-MDT0000-mdc-ffff880be72aec00: tried all connections, increasing latency to 2s
      Dec 1 15:08:38 fe1 kernel: Lustre: 5585:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1386944264150337 sent from MGC10.174.80.40@o2ib2 to NID 10.174.80.40@o2ib2 17s ago has timed out (17s prior to deadline).
      Dec 1 15:08:38 fe1 kernel: req@ffff880becc30000 x1386944264150337/t0 o400->MGS@MGC10.174.80.40@o2ib2_0:26/25 lens 192/384 e 0 to 1 dl 1322752117 ref 1 fl Rpc:N/0/0 rc 0/0
      Dec 1 15:08:38 fe1 kernel: Lustre: 5585:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
      Dec 1 15:08:38 fe1 kernel: LustreError: 166-1: MGC10.174.80.40@o2ib2: Connection to service MGS via nid 10.174.80.40@o2ib2 was lost; in progress operations using this service will fail.
      Dec 1 15:08:52 fe1 kernel: Lustre: 5587:0:(import.c:517:import_select_connection()) scratch1-MDT0000-mdc-ffff880be72aec00: tried all connections, increasing latency to 3s
      Dec 1 15:08:59 fe1 kernel: Lustre: 5586:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1386944264151143 sent from MGC10.174.80.40@o2ib2 to NID 10.174.80.40@o2ib2 6s ago has timed out (6s prior to deadline).
      Dec 1 15:08:59 fe1 kernel: req@ffff880bed70c400 x1386944264151143/t0 o250->MGS@MGC10.174.80.40@o2ib2_0:26/25 lens 368/584 e 0 to 1 dl 1322752139 ref 1 fl Rpc:N/0/0 rc 0/0
      Dec 1 15:08:59 fe1 kernel: Lustre: 5586:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
      Dec 1 15:09:00 fe1 kernel: Lustre: 5586:0:(import.c:855:ptlrpc_connect_interpret()) MGS@MGC10.174.80.40@o2ib2_1 changed server handle from 0x242210f6584197b7 to 0xa6cae1b09294c1a2
      Dec 1 15:09:00 fe1 kernel: Lustre: MGC10.174.80.40@o2ib2: Reactivating import
      Dec 1 15:09:00 fe1 kernel: Lustre: MGC10.174.80.40@o2ib2: Connection restored to service MGS using nid 10.174.80.41@o2ib2.
      Dec 1 15:09:11 fe1 kernel: Lustre: 5587:0:(import.c:517:import_select_connection()) scratch1-MDT0000-mdc-ffff880be72aec00: tried all connections, increasing latency to 4s
      Dec 1 15:09:31 fe1 kernel: Lustre: 5587:0:(import.c:517:import_select_connection()) scratch1-MDT0000-mdc-ffff880be72aec00: tried all connections, increasing latency to 5s
      Dec 1 15:09:41 fe1 kernel: Lustre: 5586:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1386944264151550 sent from scratch1-MDT0000-mdc-ffff880be72aec00 to NID 10.174.31.241@o2ib 10s ago has timed out (10s prior to deadline).
      Dec 1 15:09:41 fe1 kernel: req@ffff8817eea7d000 x1386944264151550/t0 o38->scratch1-MDT0000_UUID@10.174.31.241@o2ib:12/10 lens 368/584 e 0 to 1 dl 1322752181 ref 1 fl Rpc:N/0/0 rc 0/0
      Dec 1 15:09:41 fe1 kernel: Lustre: 5586:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
      Dec 1 15:10:17 fe1 kernel: Lustre: 5587:0:(import.c:517:import_select_connection()) scratch1-MDT0000-mdc-ffff880be72aec00: tried all connections, increasing latency to 7s
      Dec 1 15:10:17 fe1 kernel: Lustre: 5587:0:(import.c:517:import_select_connection()) Skipped 1 previous similar message
      Dec 1 15:10:56 fe1 kernel: Lustre: 5586:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1386944264152753 sent from scratch1-MDT0000-mdc-ffff880be72aec00 to NID 10.174.31.241@o2ib 13s ago has timed out (13s prior to deadline).
      Dec 1 15:10:56 fe1 kernel: req@ffff881808992800 x1386944264152753/t0 o38->scratch1-MDT0000_UUID@10.174.31.241@o2ib:12/10 lens 368/584 e 0 to 1 dl 1322752256 ref 1 fl Rpc:N/0/0 rc 0/0
      Dec 1 15:10:56 fe1 kernel: Lustre: 5586:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
      Dec 1 15:11:41 fe1 kernel: Lustre: 5587:0:(import.c:517:import_select_connection()) scratch1-MDT0000-mdc-ffff880be72aec00: tried all connections, increasing latency to 10s
      Dec 1 15:11:41 fe1 kernel: Lustre: 5587:0:(import.c:517:import_select_connection()) Skipped 2 previous similar messages
      Dec 1 15:13:41 fe1 kernel: Lustre: 5586:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1386944264155556 sent from scratch1-MDT0000-mdc-ffff880be72aec00 to NID 10.174.31.241@o2ib 18s ago has timed out (18s prior to deadline).
      Dec 1 15:13:41 fe1 kernel: req@ffff880be9f5e000 x1386944264155556/t0 o38->scratch1-MDT0000_UUID@10.174.31.241@o2ib:12/10 lens 368/584 e 0 to 1 dl 1322752421 ref 1 fl Rpc:N/0/0 rc 0/0
      Dec 1 15:13:41 fe1 kernel: Lustre: 5586:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
      Dec 1 15:14:41 fe1 kernel: Lustre: 5587:0:(import.c:517:import_select_connection()) scratch1-MDT0000-mdc-ffff880be72aec00: tried all connections, increasing latency to 15s
      Dec 1 15:14:41 fe1 kernel: Lustre: 5587:0:(import.c:517:import_select_connection()) Skipped 4 previous similar messages
      Dec 1 15:18:56 fe1 kernel: Lustre: 5586:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1386944264160358 sent from scratch1-MDT0000-mdc-ffff880be72aec00 to NID 10.174.31.241@o2ib 25s ago has timed out (25s prior to deadline).
      Dec 1 15:18:56 fe1 kernel: req@ffff880beb4a4000 x1386944264160358/t0 o38->scratch1-MDT0000_UUID@10.174.31.241@o2ib:12/10 lens 368/584 e 0 to 1 dl 1322752736 ref 1 fl Rpc:N/0/0 rc 0/0
      Dec 1 15:18:56 fe1 kernel: Lustre: 5586:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 13 previous similar messages
      Dec 1 15:20:17 fe1 kernel: LustreError: 166-1: MGC10.174.80.40@o2ib2: Connection to service MGS via nid 10.174.80.41@o2ib2 was lost; in progress operations using this service will fail.
      Dec 1 15:20:18 fe1 kernel: Lustre: 5587:0:(import.c:517:import_select_connection()) scratch1-MDT0000-mdc-ffff880be72aec00: tried all connections, increasing latency to 22s
      Dec 1 15:20:18 fe1 kernel: Lustre: 5587:0:(import.c:517:import_select_connection()) Skipped 6 previous similar messages
      Dec 1 15:20:24 fe1 kernel: Lustre: 5586:0:(import.c:855:ptlrpc_connect_interpret()) MGS@MGC10.174.80.40@o2ib2_0 changed server handle from 0xa6cae1b09294c1a2 to 0x242210f65845423c
      Dec 1 15:20:24 fe1 kernel: Lustre: MGC10.174.80.40@o2ib2: Reactivating import
      Dec 1 15:20:24 fe1 kernel: Lustre: MGC10.174.80.40@o2ib2: Connection restored to service MGS using nid 10.174.80.40@o2ib2.
      Dec 1 15:21:14 fe1 kernel: LustreError: 5586:0:(client.c:2347:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff880be96a8000 x1386944264100092/t55834575126 o101->scratch1-MDT0000_UUID@10.174.31.241@o2ib:12/10 lens 512/4880 e 0 to 1 dl 1322752939 ref 2 fl Interpret:RP/4/0 rc 301/301
      Dec 1 15:21:17 fe1 kernel: Lustre: scratch1-MDT0000-mdc-ffff880be72aec00: Connection restored to service scratch1-MDT0000 using nid 10.174.31.241@o2ib.
      Dec 1 15:21:17 fe1 kernel: LustreError: 11-0: an error occurred while communicating with 10.174.31.241@o2ib. The mds_close operation failed with -116
      Dec 1 15:21:17 fe1 kernel: LustreError: Skipped 7 previous similar messages
      Dec 1 15:21:17 fe1 kernel: LustreError: 7508:0:(file.c:116:ll_close_inode_openhandle()) inode 1905262791 mdc close failed: rc = -116

      The log files on lfs-mds-1-1 and lfs-mds-1-2 are devoid of any useful data.

      Attachments

        1. lustre1_uuids.txt
          139 kB
        2. lustre2_uuids.txt
          347 kB
        3. lustre-scratch1
          1.44 MB
        4. lustre-scratch1
          826 kB
        5. lustre-scratch1
          1.44 MB
        6. lustre-scratch1
          9.71 MB

        Activity

          [LU-890] MDS Failover Issue - Clients not reconnecting after MGT/MDT fail over to other MDS.

          Recreating config logs with writeconf fixed the failover issue

          cliffw Cliff White (Inactive) added a comment
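
          For anyone hitting the same symptom, the usual writeconf procedure is roughly as follows (a sketch; the MDT device path is taken from the description above, the OST path is a placeholder):

          # 1. unmount all clients, then all OSTs, then the MDT/MGT
          # 2. regenerate the configuration logs (MDT first, then every OST)
          tunefs.lustre --writeconf /dev/vg_scratch1/mdt
          tunefs.lustre --writeconf /dev/<ost_device>    # repeat for each OST
          # 3. remount in order: MGT/MDT first, then the OSTs, then the clients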

          As Johann said, we think there was an issue with the initial creation of the config log, and recreating it fixed the problem. We are also working on reproducing the issue in the lab.
          Closing

          cliffw Cliff White (Inactive) added a comment

          I believe this can be declared resolved by the writeconf. I am curious though if anyone has any insight on what might have gone wrong. We did not change any parameters, yet after the writeconf, it now works. I still have a client connectivity issue being worked in LU-899 which may or may not be related.

          dnelson@ddn.com Dennis Nelson added a comment

          I am fixing the incorrect broadcast addresses. I'm not sure that will fix the issue, but they are wrong and need to be fixed. I'll report back with new info once that is completed.

          dnelson@ddn.com Dennis Nelson added a comment

          Hold on. I just noticed that we have a broadcast address problem.

          dnelson@ddn.com Dennis Nelson added a comment

          OK, here is the server info once again:

          lfs-mds-1-2:
          ib0 inet addr:10.174.31.251 Bcast:10.174.31.255 Mask:255.255.224.0
          ib1 inet addr:10.175.31.251 Bcast:10.175.31.255 Mask:255.255.224.0
          ib2 inet addr:10.174.79.251 Bcast:10.174.79.255 Mask:255.255.240.0
          ib3 inet addr:10.174.80.41 Bcast:10.174.111.255 Mask:255.255.240.0

          Although ib1 is configured with an IP address, the scratch1 filesystem does not use the ib1 fabric. On lfs-mds-2-x, the ib0 fabric is not used.

          [root@lfs-mds-1-2 ~]# cat /etc/modprobe.d/lustre.conf
          options lnet networks="o2ib0(ib0), o2ib1(ib2), o2ib2(ib3)"

          [root@lfs-mds-1-2 ~]# lctl list_nids
          10.174.31.251@o2ib
          10.174.79.251@o2ib1
          10.174.80.41@o2ib2

          Client fe1 (login node)
          ib0 inet addr:10.174.0.37 Bcast:10.255.255.255 Mask:255.255.224.0
          ib1 inet addr:10.175.0.37 Bcast:10.255.255.255 Mask:255.255.224.0
          ib2 inet addr:10.174.81.10 Bcast:10.174.95.255 Mask:255.255.240.0

          Although the login nodes have a connection to the ib0 and ib1 fabrics (the same as ib0 and ib1 on the Lustre servers), the design of the system was such that the login nodes should use the ib2 port (the same fabric as ib3 on the Lustre servers) for mounting the Lustre filesystems. Having all three entries in the file might be an issue. I had difficulties making the mount work with just ib2 defined in modprobe.d/lustre.conf. This configuration allows the mounts to work, although scratch2 does take a while to mount (about 2.5 minutes). (An ip2nets alternative is sketched after the lctl ping output below.)

          [root@fe1 ~]# cat /etc/modprobe.d/lustre.conf

          # Lustre module configuration file
          options lnet networks="o2ib0(ib0), o2ib1(ib1), o2ib2(ib2)"

          [root@fe1 ~]# lctl list_nids
          10.174.0.37@o2ib
          10.175.0.37@o2ib1
          10.174.81.10@o2ib2

          [root@fe1 ~]# mount
          10.174.80.40@o2ib2:10.174.80.41@o2ib2:/scratch1 on /mnt/lustre1 type lustre (rw,flock)
          10.174.80.42@o2ib2:10.174.80.43@o2ib2:/scratch2 on /mnt/lustre2 type lustre (rw,flock)

          [root@fe1 ~]# lctl ping 10.174.80.40@o2ib2
          12345-0@lo
          12345-10.174.31.241@o2ib
          12345-10.174.79.241@o2ib1
          12345-10.174.80.40@o2ib2
          [root@fe1 ~]# lctl ping 10.174.80.41@o2ib2
          12345-0@lo
          12345-10.174.31.251@o2ib
          12345-10.174.79.251@o2ib1
          12345-10.174.80.41@o2ib2
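
          One way to avoid maintaining different networks= lines per node class is LNet's ip2nets option, which selects networks by matching the node's own IP addresses, so a single lustre.conf can be shared. A sketch only, with assumed address patterns and the client-side interface names (the servers, whose o2ib1/o2ib2 ride on ib2/ib3, would need their own entries):

          # shared /etc/modprobe.d/lustre.conf: each node brings up only the
          # networks whose pattern matches one of its local addresses
          options lnet ip2nets="o2ib0(ib0) 10.174.[0-31].*; o2ib1(ib1) 10.175.*.*; o2ib2(ib2) 10.174.[80-95].*"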

          Client dtn1 (data transfer node):
          dtn1 accesses the filesystems just like the login nodes, but it has no interfaces connected to the ib0 and ib1 fabrics of the servers.

          ib0 inet addr:10.174.81.1 Bcast:10.174.95.255 Mask:255.255.240.0

          [root@dtn1 ~]# cat /etc/modprobe.d/lustre.conf

          # Lustre module configuration file
          options lnet networks="o2ib2(ib0)"

          [root@dtn1 ~]# lctl list_nids
          10.174.81.1@o2ib2

          [root@dtn1 ~]# lctl ping 10.174.80.40@o2ib2
          12345-0@lo
          12345-10.174.31.241@o2ib
          12345-10.174.79.241@o2ib1
          12345-10.174.80.40@o2ib2
          [root@dtn1 ~]# lctl ping 10.174.80.41@o2ib2
          12345-0@lo
          12345-10.174.31.251@o2ib
          12345-10.174.79.251@o2ib1
          12345-10.174.80.41@o2ib2

          [root@dtn1 ~]# mount /mnt/lustre1
          mount.lustre: mount 10.174.80.40@o2ib2:10.174.80.41@o2ib2:/scratch1 at /mnt/lustre1 failed: No such file or directory
          Is the MGS specification correct?
          Is the filesystem name correct?
          If upgrading, is the copied client log valid? (see upgrade docs)

          [root@dtn1 ~]# cat /etc/fstab
          10.174.80.40@o2ib2:10.174.80.41@o2ib2:/scratch1 /mnt/lustre1 lustre defaults,flock 0 0
          10.174.80.42@o2ib2:10.174.80.43@o2ib2:/scratch2 /mnt/lustre2 lustre defaults,flock 0 0

          [root@dtn1 ~]# mount
          10.174.80.42@o2ib2:10.174.80.43@o2ib2:/scratch2 on /mnt/lustre2 type lustre (rw,flock)

          Did I miss anything that you wanted to see?

          dnelson@ddn.com Dennis Nelson added a comment

          I have been reviewing this again and wanted to add some clarification: server recovery will NOT start until the first client attempts a connection.
          We do this so that a node with a dead network won't have a failed recovery; the server waits for the network to be restored and for a client connection attempt before starting recovery. In your case, I think this is telling us that clients cannot find the backup MDS, since we do not see connection attempts. Are you certain all network routing, masks, etc. are correct for clients to reach lfs-mds-1-2? It might be worth a re-check and another round of lctl pings.

          cliffw Cliff White (Inactive) added a comment
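
          A quick way to run that re-check from each client class (a sketch; the NIDs are lfs-mds-1-2's from the listings above, and pings on networks a given client is not configured for are expected to fail):

          # the backup MDS should answer on every network the client actually uses
          for nid in 10.174.31.251@o2ib 10.174.79.251@o2ib1 10.174.80.41@o2ib2; do
              echo "== $nid =="; lctl ping $nid
          done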

          Sorry for the delay. I had laptop issues. Here are the uuids.

          dnelson@ddn.com Dennis Nelson added a comment

          As with the comment in LU-899, could you please run the following commands and attach the config files to JIRA:

          umount /mnt/mgs
          mount -t ldiskfs /dev/your_mgs_device /mnt/mgs

          The config files are in the directory /mnt/mgs/CONFIGS/.

          Thanks.

          hongchao.zhang Hongchao Zhang added a comment
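
          Once the MGS device is mounted as ldiskfs, the client configuration log that a client mount fetches can also be inspected directly with llog_reader from the lustre utilities (a sketch; the file name follows the <fsname>-client convention):

          ls /mnt/mgs/CONFIGS/
          llog_reader /mnt/mgs/CONFIGS/scratch1-client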

          In the last debug log, the connection to the MDT failover node 10.174.31.251 is added to the MDC, but it does not appear in the logs in the description section of this ticket, which only used the primary MDT node 10.174.31.241. Has anything changed on the system? Could you please retest whether this node (fe2) can fail over to 10.174.31.251? Thanks!

          hongchao.zhang Hongchao Zhang added a comment

          People

            Assignee: hongchao.zhang Hongchao Zhang
            Reporter: dnelson@ddn.com Dennis Nelson
            Votes: 0
            Watchers: 2
