Lustre / LU-899

Client Connectivity Issues in Complex Lustre Environment

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major

    Description

      Connectivity Issues:
      Although the login nodes are able to mount both production systems, mounting of the second filesystem takes several minutes:

      Client fe2 - mount test:

      [root@fe2 ~]# date
      Mon Dec 5 17:31:17 UTC 2011
      [root@fe2 ~]# logger "Start Testing"
      [root@fe2 ~]# date;mount /mnt/lustre1;date
      Mon Dec 5 17:31:50 UTC 2011
      Mon Dec 5 17:31:51 UTC 2011
      [root@fe2 ~]# date;mount /mnt/lustre2;date
      Mon Dec 5 17:32:09 UTC 2011
      Mon Dec 5 17:34:24 UTC 2011
      [root@fe2 ~]# logger "End Testing"
      Log file attached - fe2.log

      Client fe2:
      ib0: inet addr:10.174.0.38 Bcast:10.255.255.255 Mask:255.255.224.0
      ib1: inet addr:10.175.0.38 Bcast:10.255.255.255 Mask:255.255.224.0
      ib2: inet addr:10.174.81.11 Bcast:10.174.95.255 Mask:255.255.240.0

      [root@fe2 ~]# cat /etc/modprobe.d/lustre.conf

      # Lustre module configuration file
      options lnet networks="o2ib0(ib0), o2ib1(ib1), o2ib2(ib2)"

      [root@fe2 ~]# lctl list_nids
      10.174.0.38@o2ib
      10.175.0.38@o2ib1
      10.174.81.11@o2ib2

      [root@fe2 ~]# cat /etc/fstab | grep lustre
      10.174.80.40@o2ib2:10.174.80.41@o2ib2:/scratch1 /mnt/lustre1 lustre defaults,flock 0 0
      10.174.80.42@o2ib2:10.174.80.43@o2ib2:/scratch2 /mnt/lustre2 lustre defaults,flock 0 0

      [root@fe2 ~]# df -h | grep lustre
      2.5P 4.7T 2.5P 1% /mnt/lustre1
      3.1P 3.0T 3.1P 1% /mnt/lustre2
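
      (Editor's note: a minimal sketch of one way to see where the extra minutes go during the lustre2 mount, using NIDs that appear elsewhere in this ticket; pings that hang or fail point at server NIDs the mount attempts before falling back to a reachable one.)

      time lctl ping 10.174.80.42@o2ib2     # scratch2 MGS NID from fstab - expected to answer
      time lctl ping 10.175.31.242@o2ib     # same network number, different IP subnet - may time out
      tail -f /var/log/messages             # watch for LNet/Lustre timeout messages while mounting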

      The data transfer nodes differ from the login nodes in that they have only one active IB port instead of three. Even so, both node types use the same IB fabric to reach the production filesystems. The DTN nodes can mount the scratch2 filesystem without issue, but cannot mount scratch1.

      dtn1:
      ib0: inet addr:10.174.81.1 Bcast:10.174.95.255 Mask:255.255.240.0

      [root@dtn1 ~]# cat /etc/modprobe.d/lustre.conf

      # Lustre module configuration file
      options lnet networks="o2ib2(ib0)"

      [root@dtn1 ~]# lctl list_nids
      10.174.81.1@o2ib2

      [root@dtn1 ~]# lctl ping 10.174.80.40@o2ib2
      12345-0@lo
      12345-10.174.31.241@o2ib
      12345-10.174.79.241@o2ib1
      12345-10.174.80.40@o2ib2
      [root@dtn1 ~]# lctl ping 10.174.80.41@o2ib2
      12345-0@lo
      12345-10.174.31.251@o2ib
      12345-10.174.79.251@o2ib1
      12345-10.174.80.41@o2ib2
      [root@dtn1 ~]# lctl ping 10.174.80.42@o2ib2
      12345-0@lo
      12345-10.175.31.242@o2ib
      12345-10.174.79.242@o2ib1
      12345-10.174.80.42@o2ib2
      [root@dtn1 ~]# lctl ping 10.174.80.43@o2ib2
      12345-0@lo
      12345-10.175.31.252@o2ib
      12345-10.174.79.252@o2ib1
      12345-10.174.80.43@o2ib2

      [root@dtn1 ~]# mount /mnt/lustre2
      [root@dtn1 ~]# df -h
      Filesystem Size Used Avail Use% Mounted on
      /dev/mapper/vg_dtn1-lv_root
      50G 9.4G 38G 20% /
      tmpfs 24G 88K 24G 1% /dev/shm
      /dev/sda1 485M 52M 408M 12% /boot
      10.181.1.2:/contrib 132G 2.9G 129G 3% /contrib
      10.181.1.2:/apps/v1 482G 38G 444G 8% /apps
      10.181.1.2:/home 4.1T 404G 3.7T 10% /home
      10.174.80.42@o2ib2:10.174.80.43@o2ib2:/scratch2
      3.1P 3.0T 3.1P 1% /mnt/lustre2

      [root@dtn1 ~]# mount /mnt/lustre1
      mount.lustre: mount 10.174.80.40@o2ib2:10.174.80.41@o2ib2:/scratch1 at /mnt/lustre1 failed: No such file or directory
      Is the MGS specification correct?
      Is the filesystem name correct?
      If upgrading, is the copied client log valid? (see upgrade docs)

      [root@dtn1 ~]# cat /etc/fstab | grep lustre
      10.174.80.40@o2ib2:10.174.80.41@o2ib2:/scratch1 /mnt/lustre1 lustre defaults,flock 0 0
      10.174.80.42@o2ib2:10.174.80.43@o2ib2:/scratch2 /mnt/lustre2 lustre defaults,flock 0 0

      Finally, the TDS compute nodes cannot access the production filesystems. They have the TDS filesystems mounted (lustre1 and lustre2).
      This may be a simple networking issue. Still investigating.
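
      (Editor's note: a minimal first check on a TDS compute node, sketched with the production NIDs shown elsewhere in this ticket; the goal is to separate an LNet network-number mismatch from plain physical unreachability.)

      lctl list_nids                     # which LNet networks this TDS compute node is configured on
      lctl ping 10.174.80.40@o2ib2       # scratch1 MGS NID as mounted by the login nodes
      lctl ping 10.174.79.241@o2ib1      # scratch1 MGS NID on o2ib1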

      Attachments

        1. fe2.log
          9 kB
        2. log.client
          243 kB
        3. log1
          88 kB
        4. log2
          5.75 MB
        5. lustre1_uuids.txt
          139 kB
        6. lustre2_uuids.txt
          347 kB
        7. lustre-scratch1
          826 kB
        8. scratch1.log
          243 kB
        9. scratch2.log
          612 kB

        Activity

          pjones Peter Jones added a comment -

          Great - thanks Dennis!


          dnelson@ddn.com Dennis Nelson added a comment -

          Yes. I already suggested that LU-890 be closed and it was closed by Cliff. This one can be also.
          pjones Peter Jones added a comment -

          Dennis

          Thanks for the update. So can we close both this ticket and LU-890?

          Peter


          dnelson@ddn.com Dennis Nelson added a comment -

          I made the change on the TDS servers and had to perform a writeconf in order to get it mounted up again. Everything seems to be working now.

          Thank you very much for all of your help!
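
          (Editor's note: for reference, a rough outline of the standard writeconf procedure referred to above; the device paths below are placeholders, as the actual TDS target devices are not shown in this ticket.)

          # 1. Unmount all clients, then all OSTs, then the MDT/MGS.
          # 2. Regenerate the configuration logs on each target:
          tunefs.lustre --writeconf /dev/<mgs_mdt_device>    # on the MGS/MDS
          tunefs.lustre --writeconf /dev/<ost_device>        # on each OSS, for each OST
          # 3. Remount in order: MGS/MDT first, then the OSTs, then the clients.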


          dnelson@ddn.com Dennis Nelson added a comment -

          Ah, no. I will have to schedule some time with the customer to do that. I have one node that is not currently in the job queue that I can use for testing. To take the whole filesystem down, I will have to schedule it.

          I will get that scheduled today.

          liang Liang Zhen (Inactive) added a comment -

          Have you also changed the MDS/MGS and the other servers in the TDS filesystem to o2ib3 as well (e.g. mds01)? Since you are using o2ib3 as the TDS network number, all clients and servers on the TDS network should use that network number (o2ib3).
          Also, using "lctl ping" to verify that the network is reachable is always a good idea.
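
          (Editor's note: a small verification sketch along those lines, run from a TDS client after the renumbering; the mds01 address is taken from elsewhere in this ticket, and under the proposal its NID would be expected to appear on o2ib3.)

          lctl list_nids                      # confirm the client now reports a ...@o2ib3 NID
          lctl ping 10.174.96.138@o2ib3       # TDS MGS/MDS (mds01) on the renumbered network
          lctl ping 10.174.79.241@o2ib1       # production scratch1 MGS on the shared fabric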


          dnelson@ddn.com Dennis Nelson added a comment -

          OK, I tried the following:

          [root@r1i3n15 ~]# cat /etc/modprobe.d/lustre.conf

          # Lustre module configuration file
          options lnet networks="o2ib3(ib1), o2ib1(ib0)"

          [root@r1i3n15 ~]# lctl list_nids
          10.174.96.65@o2ib3
          10.174.64.65@o2ib1

          [root@r1i3n15 ~]# cat /etc/fstab
          ...
          10.174.96.138@o2ib3:/lustre1 /mnt/tds_lustre1 lustre defaults,flock 0 0
          10.174.96.138@o2ib3:/lustre2 /mnt/tds_lustre2 lustre defaults,flock 0 0
          10.174.79.241@o2ib1:10.174.79.251@o2ib1:/scratch1 /mnt/lsc_lustre1 lustre defaults,flock 0 0
          10.174.79.242@o2ib1:10.174.79.252@o2ib1:/scratch2 /mnt/lsc_lustre2 lustre defaults,flock 0 0

          Now, the production filesystems (scratch1, scratch2) mount and the TDS filesystems fail to mount.

          [root@r1i3n15 ~]# mount -at lustre
          mount.lustre: mount 10.174.96.138@o2ib3:/lustre1 at /mnt/tds_lustre1 failed: Cannot send after transport endpoint shutdown
          mount.lustre: mount 10.174.96.138@o2ib3:/lustre2 at /mnt/tds_lustre2 failed: File exists
          [root@r1i3n15 ~]# df
          Filesystem 1K-blocks Used Available Use% Mounted on
          tmpfs 153600 1708 151892 2% /tmp
          10.181.1.2:/contrib 137625600 3002528 134623072 3% /contrib
          10.181.1.2:/testapps/v1
          45875200 35991488 9883712 79% /apps
          10.181.1.2:/testhome 550764544 166799968 383964576 31% /home
          10.174.79.241@o2ib1:10.174.79.251@o2ib1:/scratch1
          2688660012544 29627611556 2632114228424 2% /mnt/lsc_lustre1
          10.174.79.242@o2ib1:10.174.79.252@o2ib1:/scratch2
          3360825015680 785492156 3326396150596 1% /mnt/lsc_lustre2

          liang Liang Zhen (Inactive) added a comment (edited) -

          Here is my understanding of your setup; please correct me if I am wrong:

           
          client                      TDS MDS                    Production MDS
          ---------                   ---------                  -------
          rli3n15                     mds01                      lfs-mds-1-1 (scratch1)
          10.174.96.64@o2ib0(ib1)     10.174.96.138@o2ib0 [y]    10.174.31.241@o2ib0 [n]
          10.174.64.65@o2ib1(ib0)                                10.174.79.241@o2ib1 [y]
          
          [y] == [yes], means we can reach that NID via "lctl ping" from rli3n15
          [n] == [no],  means we can not reach that NID via "lctl ping" from rli3n15
          
          

          So between rli3n15 and lfs-mds-1-1:

          • 10.174.64.65@o2ib1(ib0) and 10.174.79.241@o2ib1 are on the same LNet network, and they can physically reach each other
          • 10.174.96.64@o2ib0(ib1) and 10.174.31.241@o2ib0 are on the same LNet network, but they are physically unreachable from each other

          I think if you try to mount scratch1 from rli3n15, it will first look at all NIDs of lfs-mds-1-1. It finds that both itself and lfs-mds-1-1 have local NIDs on o2ib0 and o2ib1 (although they cannot reach each other on o2ib0), and since the LNet hop count of these two NIDs is the same and both interfaces are healthy, ptlrpc will choose the first NID of lfs-mds-1-1, 10.174.31.241@o2ib0, which is actually unreachable from rli3n15.

          I would suggest trying this on rli3n15:
          options lnet networks="o2ib1(ib0)"

          and then trying to mount scratch1 and scratch2. If that works, I would suggest using a configuration like this:

           
          client                      TDS MDS                    Production MDS
          ---------                   ---------                  -------
          rli3n15                     mds01                      lfs-mds-1-1 (scratch1)
          10.174.96.64@o2ib3(ib1)     10.174.96.138@o2ib3 [y]    
          10.174.64.65@o2ib1(ib0)                                10.174.79.241@o2ib1 [y]
                                                                 10.174.31.241@o2ib0 [y]
          
          

          The only change made here is that o2ib0 on rli3n15 and mds01 is replaced by o2ib3. Of course, if this works, you will have to change all nodes on the TDS to o2ib3...
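
          (Editor's note: a sketch of what the per-node lnet options might look like under this proposal; the client line matches what is tried later in this ticket, while the interface name on mds01 is an assumption, since it is not shown here.)

          # rli3n15 (TDS client) - /etc/modprobe.d/lustre.conf
          options lnet networks="o2ib3(ib1), o2ib1(ib0)"

          # mds01 (TDS MGS/MDS) - /etc/modprobe.d/lustre.conf
          options lnet networks="o2ib3(ib0)"        # ib0 is a placeholder interface name

          # The production servers keep their existing o2ib0/o2ib1/o2ib2 configuration.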


          dnelson@ddn.com Dennis Nelson added a comment -

          I realized that I did not answer one question. There is only one MDS on the TDS filesystem and it has only one NID:

          [root@mds01 ~]# lctl list_nids
          10.174.96.138@o2ib

          [root@r1i3n15 ~]# netstat -rn
          Kernel IP routing table
          Destination Gateway Genmask Flags MSS Window irtt Iface
          10.181.1.0 10.174.64.67 255.255.255.0 UG 0 0 0 ib0
          192.168.159.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
          10.174.64.0 0.0.0.0 255.255.240.0 U 0 0 0 ib0
          10.174.96.0 0.0.0.0 255.255.240.0 U 0 0 0 ib1
          169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 ib1
          169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0

          As you can see, there is no route to 10.174.31.241.

          [root@r1i3n15 ~]# ping 10.174.31.241
          connect: Network is unreachable

          [root@r1i3n15 ~]# cat /etc/modprobe.d/lustre.conf

          # Lustre module configuration file
          options lnet networks="o2ib(ib1)"

          [root@r1i3n15 ~]# df
          Filesystem 1K-blocks Used Available Use% Mounted on
          tmpfs 153600 60 153540 1% /tmp
          ...
          10.174.96.138@o2ib:/lustre1
          30523086656 2401014176 27816223672 8% /mnt/tds_lustre1
          10.174.96.138@o2ib:/lustre2
          30523086656 268760500 29948475720 1% /mnt/tds_lustre2

          If I unmount the TDS filesystems and change the modprobe.d/lustre.conf file to only include the ib0 port:

          [root@r1i3n15 ~]# cat /etc/modprobe.d/lustre.conf

          # Lustre module configuration file
          options lnet networks="o2ib(ib0)"

          I cannot communicate with the MDS. I get this error:
          [root@r1i3n15 ~]# lctl list_nids
          10.174.64.65@o2ib
          [root@r1i3n15 ~]# cat /etc/fstab
          ...
          10.174.79.241@o2ib:10.174.79.251@o2ib:/scratch1 /mnt/lsc_lustre1 lustre defaults,flock 0 0
          10.174.79.242@o2ib:10.174.79.252@o2ib:/scratch2 /mnt/lsc_lustre2 lustre defaults,flock 0 0

          [root@r1i3n15 ~]# mount -at lustre
          mount.lustre: mount 10.174.79.241@o2ib:10.174.79.251@o2ib:/scratch1 at /mnt/lsc_lustre1 failed: Cannot send after transport endpoint shutdown
          mount.lustre: mount 10.174.79.242@o2ib:10.174.79.252@o2ib:/scratch2 at /mnt/lsc_lustre2 failed: Cannot send after transport endpoint shutdown

          Dec 14 12:17:23 r1i3n15 kernel: Lustre: 27084:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1388172991791113 sent from MGC10.174.79.241@o2ib to NID 10.174.79.241@o2ib 0s ago has failed due to network error (5s prior to deadline).
          Dec 14 12:17:23 r1i3n15 kernel: req@ffff880639bf3400 x1388172991791113/t0 o250->MGS@MGC10.174.79.241@o2ib_0:26/25 lens 368/584 e 0 to 1 dl 1323865048 ref 1 fl Rpc:N/0/0 rc 0/0
          Dec 14 12:17:23 r1i3n15 kernel: LustreError: 1280:0:(o2iblnd_cb.c:2532:kiblnd_rejected()) 10.174.79.241@o2ib rejected: o2iblnd fatal error
          Dec 14 12:17:48 r1i3n15 kernel: Lustre: 27084:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1388172991791115 sent from MGC10.174.79.241@o2ib to NID 10.174.79.251@o2ib 0s ago has failed due to network error (5s prior to deadline).
          Dec 14 12:17:48 r1i3n15 kernel: req@ffff880639347c00 x1388172991791115/t0 o250->MGS@MGC10.174.79.241@o2ib_1:26/25 lens 368/584 e 0 to 1 dl 1323865073 ref 1 fl Rpc:N/0/0 rc 0/0
          Dec 14 12:17:48 r1i3n15 kernel: LustreError: 27292:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff880639bf3000 x1388172991791116/t0 o501->MGS@MGC10.174.79.241@o2ib_1:26/25 lens 264/432 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
          Dec 14 12:17:48 r1i3n15 kernel: LustreError: 1280:0:(o2iblnd_cb.c:2532:kiblnd_rejected()) 10.174.79.251@o2ib rejected: o2iblnd fatal error
          Dec 14 12:17:48 r1i3n15 kernel: LustreError: 15c-8: MGC10.174.79.241@o2ib: The configuration from log 'scratch1-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
          Dec 14 12:17:48 r1i3n15 kernel: LustreError: 27292:0:(llite_lib.c:1095:ll_fill_super()) Unable to process log: -108
          Dec 14 12:17:48 r1i3n15 kernel: Lustre: client ffff8803386abc00 umount complete
          Dec 14 12:17:48 r1i3n15 kernel: LustreError: 27292:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount (-108)
          Dec 14 12:18:13 r1i3n15 kernel: Lustre: 27084:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1388172991791119 sent from MGC10.174.79.242@o2ib to NID 10.174.79.252@o2ib 0s ago has failed due to network error (5s prior to deadline).
          Dec 14 12:18:13 r1i3n15 kernel: req@ffff88063be32400 x1388172991791119/t0 o250->MGS@MGC10.174.79.242@o2ib_1:26/25 lens 368/584 e 0 to 1 dl 1323865098 ref 1 fl Rpc:N/0/0 rc 0/0
          Dec 14 12:18:13 r1i3n15 kernel: Lustre: 27084:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 1 previous similar message
          Dec 14 12:18:13 r1i3n15 kernel: LustreError: 27343:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff8803247b2000 x1388172991791120/t0 o501->MGS@MGC10.174.79.242@o2ib_1:26/25 lens 264/432 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
          Dec 14 12:18:13 r1i3n15 kernel: LustreError: 15c-8: MGC10.174.79.242@o2ib: The configuration from log 'scratch2-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
          Dec 14 12:18:13 r1i3n15 kernel: LustreError: 27343:0:(llite_lib.c:1095:ll_fill_super()) Unable to process log: -108
          Dec 14 12:18:13 r1i3n15 kernel: LustreError: 1293:0:(o2iblnd_cb.c:2532:kiblnd_rejected()) 10.174.79.252@o2ib rejected: o2iblnd fatal error
          Dec 14 12:18:13 r1i3n15 kernel: Lustre: client ffff880322a0c400 umount complete
          Dec 14 12:18:13 r1i3n15 kernel: LustreError: 27343:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount (-108)
          Dec 14 12:18:13 r1i3n15 kernel: LustreError: 1293:0:(o2iblnd_cb.c:2532:kiblnd_rejected()) Skipped 1 previous similar message

          From what I can see, there is no indication of a network problem:

          [root@lfs-mds-1-1 ~]# ibstat
          CA 'mlx4_0'
          CA type: MT26428
          Number of ports: 2
          Firmware version: 2.9.1000
          Hardware version: b0
          Node GUID: 0x0002c9030010c5f4
          System image GUID: 0x0002c9030010c5f7
          Port 1:
          State: Active
          Physical state: LinkUp
          Rate: 40
          Base lid: 152
          LMC: 0
          SM lid: 1
          Capability mask: 0x02510868
          Port GUID: 0x0002c9030010c5f5
          Link layer: IB
          Port 2:
          State: Active
          Physical state: LinkUp
          Rate: 40
          Base lid: 5232
          LMC: 0
          SM lid: 2
          Capability mask: 0x02510868
          Port GUID: 0x0002c9030010c5f6
          Link layer: IB
          CA 'mlx4_1'
          CA type: MT26428
          Number of ports: 2
          Firmware version: 2.9.1000
          Hardware version: b0
          Node GUID: 0x0002c9030010c6b0
          System image GUID: 0x0002c9030010c6b3
          Port 1:
          State: Active
          Physical state: LinkUp
          Rate: 40
          Base lid: 104
          LMC: 0
          SM lid: 106
          Capability mask: 0x02510868
          Port GUID: 0x0002c9030010c6b1
          Link layer: IB
          Port 2:
          State: Active
          Physical state: LinkUp
          Rate: 40
          Base lid: 64
          LMC: 0
          SM lid: 1
          Capability mask: 0x02510868
          Port GUID: 0x0002c9030010c6b2
          Link layer: IB
          [root@lfs-mds-1-1 ~]# ibping -C mlx4_1 -P 1 -S

          [root@r1i3n15 ~]# ibping -G 0x0002c9030010c6b1
          Pong from lfs-mds-1-1.(none) (Lid 104): time 0.156 ms
          Pong from lfs-mds-1-1.(none) (Lid 104): time 0.154 ms
          Pong from lfs-mds-1-1.(none) (Lid 104): time 0.137 ms
          Pong from lfs-mds-1-1.(none) (Lid 104): time 0.134 ms
          Pong from lfs-mds-1-1.(none) (Lid 104): time 0.056 ms
          Pong from lfs-mds-1-1.(none) (Lid 104): time 0.131 ms
          ^C
          --- lfs-mds-1-1.(none) (Lid 104) ibping statistics ---
          6 packets transmitted, 6 received, 0% packet loss, time 5123 ms
          rtt min/avg/max = 0.056/0.128/0.156 ms

          Yet, lctl ping fails:

          [root@r1i3n15 ~]# lctl ping 10.174.79.241@o2ib
          failed to ping 10.174.79.241@o2ib: Input/output error
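
          (Editor's note: ibping exercises the IB fabric at the management-datagram level, while lctl ping goes through the configured LNet network; o2iblnd sets up its RDMA connections using the IPoIB address in the NID, on the interface the local o2ib network is bound to. A quick sketch for separating the two layers, using addresses from this ticket:)

          ping -I ib0 10.174.79.241        # plain IPoIB reachability from the interface LNet is using
          lctl list_nids                   # which LNet network numbers this node is configured on
          lctl ping 10.174.79.241@o2ib1    # LNet-level check against the server's matching network number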

          If I go back to the original configuration:

          [root@r1i3n15 ~]# cat /etc/modprobe.d/lustre.conf

          # Lustre module configuration file
          options lnet networks="o2ib(ib1), o2ib1(ib0)"

          [root@r1i3n15 ~]# lctl list_nids
          10.174.96.65@o2ib
          10.174.64.65@o2ib1

          [root@r1i3n15 ~]# cat /etc/fstab

          # <file system> <mount point> <type> <options> <dump> <pass>
          ...
          10.174.96.138@o2ib:/lustre1 /mnt/tds_lustre1 lustre defaults,flock 0 0
          10.174.96.138@o2ib:/lustre2 /mnt/tds_lustre2 lustre defaults,flock 0 0
          10.174.79.241@o2ib1:10.174.79.251@o2ib1:/scratch1 /mnt/lsc_lustre1 lustre defaults,flock 0 0
          10.174.79.242@o2ib1:10.174.79.252@o2ib1:/scratch2 /mnt/lsc_lustre2 lustre defaults,flock 0 0

          The TDS filesystems mount (lustre1, lustre2) and the production filesystems (scratch1, scratch2) just hang while performing the mount.


          dnelson@ddn.com Dennis Nelson added a comment -

          No, these clients cannot lctl ping, or ping, the 10.174.31.241 address. That NID exists on the servers to support the scratch1 filesystem for the production clients.

          Yes, r1i3n15 can mount the TDS filesystem.


          People

            Assignee: cliffw Cliff White (Inactive)
            Reporter: dnelson@ddn.com Dennis Nelson
            Votes: 0
            Watchers: 5
