Lustre / LU-14381

Client stuck using single NID when multiple are available


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.14.0
    • Component/s: None
    • Severity: 3

    Description

      Filing this as a bug, but it might be user error. The problem I'm seeing is that clients only attempt to connect to one of the two available NIDs on the servers. Both clients and servers are configured with two NIDs, one on @tcp and one on @tcp1. Discovery/multi-rail is disabled. If I simulate failure of the @tcp NID on the OSS, then I would expect the client to eventually try the @tcp1 NID, but that never happens. I tried modifying test-framework.sh so that it formats the OST with --servicenode=192.168.2.34@tcp,192.168.2.35@tcp1, but that didn't make any difference. Is this working as expected? Is this a bug? Am I missing some configuration necessary to allow the client to use either NID?
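      For reference, the --servicenode attempt corresponds to formatting the OST along these lines. This is a sketch only; the --fsname, --index, and --mgsnode values are inferred from elsewhere in this report, not copied from the actual test-framework command:

```shell
# Sketch only: formatting the OST with both server NIDs as service
# nodes. Device, fsname, index, and mgsnode are inferred from this
# report, not taken from the actual test-framework.sh invocation.
mkfs.lustre --ost --fsname=lustre --index=0 \
    --mgsnode=192.168.2.30@tcp,192.168.2.31@tcp1 \
    --servicenode=192.168.2.34@tcp,192.168.2.35@tcp1 \
    /dev/sde
```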

      Edit 1: There was a suggestion to put the interfaces on separate subnets. I tried that and it did not resolve the issue. See this comment.

      Edit 2: Attached a -1 debug log from the client, client.dklog. Note that this debug log was captured after I changed the configuration to put the interfaces on separate networks, as noted in this comment.

      Details on how to reproduce follow.

      Version under test:

      sles15s01:/home/hornc/fs2 # ./LUSTRE-VERSION-GEN
      2.13.57_71_gb538826
      sles15s01:/home/hornc/fs2 #
      

      LNet configuration is tcp(eth0) and tcp1(eth1) with LNet peer discovery disabled:

      sles15s01:/home/hornc/fs2/lustre/tests # cat /etc/modprobe.d/lustre.conf
      options libcfs libcfs_debug=320735104
      options libcfs libcfs_subsystem_debug=-2049
      options lnet lnet_peer_discovery_disabled=1
      options lnet ip2nets="tcp(eth0) 192.168.2.[30,32,34,36,38,39,40,41]; tcp1(eth1) 192.168.2.[31,33,35,37,42,43,44,45]"
      sles15s01:/home/hornc/fs2/lustre/tests #
      
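      As an aside, the host-to-network split that this ip2nets rule expresses can be sketched as a small shell function. This is a hypothetical illustration of the mapping, not Lustre tooling:

```shell
# Hypothetical illustration of the ip2nets rule above: map the last
# octet of a 192.168.2.x address to the LNet network it lands on.
net_for_ip() {
    octet=${1##*.}    # keep only the last dotted octet
    case " 30 32 34 36 38 39 40 41 " in
        *" $octet "*) echo tcp; return ;;
    esac
    case " 31 33 35 37 42 43 44 45 " in
        *" $octet "*) echo tcp1; return ;;
    esac
    echo unknown
}

net_for_ip 192.168.2.34    # eth0 on the OSS
net_for_ip 192.168.2.35    # eth1 on the OSS
```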

      Test-framework config:

      sles15s01:/home/hornc/fs2/lustre/tests # cat cfg/hornc.sh
      # facet hosts
      MDSCOUNT=1
      
      mds_HOST=sles15s01
      MDSDEV1=/dev/sdc
      
      OSTCOUNT=1
      
      ost_HOST=sles15s03
      OSTDEV1=/dev/sde
      
      CLIENTCOUNT=1
      RCLIENTS="sles15c01"
      PDSH="pdsh -S -Rssh -w"
      
      SHARED_DIRECTORY="/shared/testing"
      MGSNID="192.168.2.30@tcp,192.168.2.31@tcp1"
      
      . /home/hornc/fs2/lustre/tests/cfg/ncli.sh
      sles15s01:/home/hornc/fs2/lustre/tests #
      

      Use llmount.sh to stand up filesystem:

      sles15s01:/home/hornc/fs2/lustre/tests # NAME=hornc LOAD_MODULES_REMOTE=true VERBOSE=true /home/hornc/fs2/lustre/tests/llmount.sh
      ...
      sles15s01:/home/hornc/fs2/lustre/tests # pdsh -w sles15s0[1,3],sles15c01 lctl list_nids | dshbak -c
      ----------------
      sles15s01
      ----------------
      192.168.2.30@tcp
      192.168.2.31@tcp1
      ----------------
      sles15c01
      ----------------
      192.168.2.38@tcp
      192.168.2.42@tcp1
      ----------------
      sles15s03
      ----------------
      192.168.2.34@tcp
      192.168.2.35@tcp1
      sles15s01:/home/hornc/fs2 # pdsh -w sles15s0[1,3],sles15c01 'lnetctl global show' | dshbak -c
      ----------------
      sles15c01,sles15s[01,03]
      ----------------
      global:
          numa_range: 0
          max_intf: 200
          discovery: 0
          drop_asym_route: 0
          retry_count: 2
          transaction_timeout: 50
          health_sensitivity: 100
          recovery_interval: 1
          router_sensitivity: 100
          lnd_timeout: 16
          response_tracking: 3
      sles15s01:/home/hornc/fs2 #
      
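      The discovery: 0 line above is what confirms peer discovery is off on all three nodes. As a convenience, that field can be checked mechanically; this hypothetical helper parses a captured sample of the YAML rather than live lnetctl output:

```shell
# Hypothetical helper: parse "lnetctl global show" YAML (a captured
# sample is inlined here) and report whether peer discovery is disabled.
lnetctl_out='global:
    numa_range: 0
    max_intf: 200
    discovery: 0
    retry_count: 2'

discovery=$(printf '%s\n' "$lnetctl_out" | awk '$1 == "discovery:" { print $2 }')
if [ "$discovery" = "0" ]; then
    echo "peer discovery disabled"
else
    echo "peer discovery enabled"
fi
```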

      Unmount Lustre from the client, then re-mount with a drop rule in place so that any traffic the client sends to eth0 on the OSS (a.k.a. 192.168.2.34@tcp) is dropped:

      sles15c01:~ # umount /mnt/lustre
      sles15c01:~ # lctl net down; lustre_rmmod
      LNET busy
      sles15c01:~ # tf_start.sh
      Loading modules from /home/hornc/fs2/lustre
      detected 1 online CPUs by sysfs
      libcfs will create CPU partition based on online CPUs
      ../libcfs/libcfs/libcfs options: 'libcfs_debug=320735104 libcfs_subsystem_debug=-2049'
      ../lnet/lnet/lnet options: 'lnet_peer_discovery_disabled=1 ip2nets="tcp(eth0) 192.168.2.[30,32,34,36,38,39,40,41]; tcp1(eth1) 192.168.2.[31,33,35,37,42,43,44,45]" accept=all'
      quota/lquota options: 'hash_lqs_cur_bits=3'
      sles15c01:~ # lctl set_param debug=+'net rpctrace'
      debug=+net rpctrace
      sles15c01:~ # lctl get_param debug
      debug=
      super ioctl neterror net warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck
      sles15c01:~ # lctl net_drop_add -s *@tcp -d 192.168.2.34@tcp -r 1 -e remote_timeout
      Added drop rule 255.255.255.255@tcp->192.168.2.34@tcp (1/1)
      sles15c01:~ # mount -t lustre -o user_xattr,flock 192.168.2.30@tcp,192.168.2.31@tcp1:/lustre /mnt/lustre
      sles15c01:~ # lfs check servers
      lfs check: error: check 'lustre-OST0000-osc-ffff9c00a8b70000': Resource temporarily unavailable (11)
      lustre-MDT0000-mdc-ffff9c00a8b70000 active.
      sles15c01:~ #
      

      Wait a couple of minutes, then check again and dump the debug log:

      sles15c01:~ # sleep 120; lfs check servers; lctl dk > /tmp/dk.log
      lfs check: error: check 'lustre-OST0000-osc-ffff9c00a8b70000': Resource temporarily unavailable (11)
      lustre-MDT0000-mdc-ffff9c00a8b70000 active.
      sles15c01:~ #
      

      With net trace enabled, LNet logs every send that it performs. Those entries look like this:

      00000400:00000200:0.0:1611862288.864372:0:11737:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 192.168.2.38@tcp(192.168.2.38@tcp:<?>) -> 192.168.2.34@tcp(192.168.2.34@tcp:192.168.2.34@tcp) <?> : GET try# 0
      

      So if LNet had sent any message to an @tcp1 NID, we would have a record of it in the log.

      We can see in the debug log that we haven't sent any messages to any tcp1 NIDs:

      sles15c01:~ # grep tcp1 /tmp/dk.log
      00000400:02000000:0.0:1611862138.860810:0:11727:0:(api-ni.c:2336:lnet_startup_lndni()) Added LNI 192.168.2.42@tcp1 [8/256/0/180]
      00000020:01000004:0.0:1611862159.890388:0:12205:0:(obd_mount.c:968:lmd_print()) device:  192.168.2.30@tcp,192.168.2.31@tcp1:/lustre
      00000020:00000080:0.0:1611862159.890446:0:12205:0:(obd_config.c:1383:class_process_config()) adding mapping from uuid MGC192.168.2.30@tcp_0 to nid 0x20001c0a8021f (192.168.2.31@tcp1)
      00000020:00000080:0.0:1611862159.900401:0:12218:0:(obd_config.c:1383:class_process_config()) adding mapping from uuid 192.168.2.30@tcp to nid 0x20001c0a8021f (192.168.2.31@tcp1)
      00000020:01000004:0.0:1611862159.903289:0:12218:0:(obd_mount.c:1004:lustre_check_exclusion()) Check exclusion lustre-OST0000 (0) in 0 of 192.168.2.30@tcp,192.168.2.31@tcp1:/lustre
      00000020:00000080:0.0:1611862159.903298:0:12218:0:(obd_config.c:1383:class_process_config()) adding mapping from uuid 192.168.2.34@tcp to nid 0x20001c0a80223 (192.168.2.35@tcp1)
      00000020:00000004:0.0:1611862160.910881:0:12205:0:(obd_mount.c:1683:lustre_fill_super()) Mount 192.168.2.30@tcp,192.168.2.31@tcp1:/lustre complete
      sles15c01:~ # grep 192.168.2.35 /tmp/dk.log
      00000020:00000080:0.0:1611862159.903298:0:12218:0:(obd_config.c:1383:class_process_config()) adding mapping from uuid 192.168.2.34@tcp to nid 0x20001c0a80223 (192.168.2.35@tcp1)
      sles15c01:~ #
      
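      For anyone repeating this analysis, the destination NID can also be pulled out of each lnet_handle_send() TRACE line mechanically. This is a log-parsing convenience, not part of Lustre, shown here against the sample line from earlier in this report:

```shell
# Log-parsing convenience: extract the destination NID from an
# lnet_handle_send() TRACE line. Sample line copied from this report.
line='00000400:00000200:0.0:1611862288.864372:0:11737:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 192.168.2.38@tcp(192.168.2.38@tcp:<?>) -> 192.168.2.34@tcp(192.168.2.34@tcp:192.168.2.34@tcp) <?> : GET try# 0'

dst=$(printf '%s\n' "$line" | awk '/lnet_handle_send/ {
    # the token after "->" is "dst_nid(primary_nid:local_nid)"
    for (i = 1; i <= NF; i++)
        if ($i == "->") { split($(i + 1), a, "("); print a[1] }
}')
echo "$dst"    # the NID this send targeted
```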

      Statistics on the OSS confirm no traffic on tcp1:

      sles15s03:~ # lnetctl net show --net tcp1 -v 2 | egrep -e send_count -e recv_count
                    send_count: 0
                    recv_count: 0
      sles15s03:~ #
      
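      The same zero counters can be tallied mechanically. This is a hypothetical check over lnetctl-style output, with the sample counter lines inlined rather than captured live; a nonzero total would mean tcp1 saw traffic:

```shell
# Hypothetical tally: sum send/recv counters from "lnetctl net show
# --net tcp1 -v 2" style output. Sample lines are inlined here rather
# than captured live from the OSS.
stats='              send_count: 0
              recv_count: 0'

total=$(printf '%s\n' "$stats" |
    awk '$1 == "send_count:" || $1 == "recv_count:" { sum += $2 } END { print sum + 0 }')
echo "tcp1 messages seen: $total"
```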


          People

            Assignee: WC Triage
            Reporter: Chris Horn