[LU-10245] lnetctl --cpt does not associate cpt properly Created: 15/Nov/17  Updated: 01/Dec/17  Resolved: 01/Dec/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.1
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Elena Gryaznova Assignee: WC Triage
Resolution: Not a Bug Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
[root@fre805 tests]# cat  /sys/devices/system/cpu/online
0-1

[root@fre805 tests]# lnetctl net show --verbose
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          tunables:
              peer_timeout: 0
              peer_credits: 0
              peer_buffer_credits: 0
              credits: 0
          lnd tunables:
          tcp bonding: 0
          dev cpt: 0
          CPT: "[0]"
    - net type: tcp
      local NI(s):
        - nid: 192.168.108.5@tcp
          status: up
          interfaces:
              0: eth0
          statistics:
              send_count: 276723
              recv_count: 316655
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
          lnd tunables:
          tcp bonding: 0
          dev cpt: -1
          CPT: "[0]"
        - nid: 192.168.118.5@tcp
          status: up
          interfaces:
              0: eth1
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
          lnd tunables:
          tcp bonding: 0
          dev cpt: -1
          CPT: "[0]"
[root@fre805 tests]# 

[root@fre805 tests]# lnetctl net add --net tcp --if eth2 --cpt [0, 1]

[root@fre805 tests]# lnetctl net show --verbose
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          tunables:
              peer_timeout: 0
              peer_credits: 0
              peer_buffer_credits: 0
              credits: 0
          lnd tunables:
          tcp bonding: 0
          dev cpt: 0
          CPT: "[0]"
    - net type: tcp
      local NI(s):
        - nid: 192.168.108.5@tcp
          status: up
          interfaces:
              0: eth0
          statistics:
              send_count: 276843
              recv_count: 316775
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
          lnd tunables:
          tcp bonding: 0
          dev cpt: -1
          CPT: "[0]"
        - nid: 192.168.118.5@tcp
          status: up
          interfaces:
              0: eth1
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
          lnd tunables:
          tcp bonding: 0
          dev cpt: -1
          CPT: "[0]"
        - nid: 192.168.128.5@tcp
          status: up
          interfaces:
              0: eth2
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
          lnd tunables:
          tcp bonding: 0
          dev cpt: -1
          CPT: "[0]"
[root@fre805 tests]# 

The new NID 192.168.128.5@tcp is shown with CPT "[0]" only; it is not associated with CPT 1, even though --cpt [0, 1] was requested.



 Comments   
Comment by Amir Shehata (Inactive) [ 15/Nov/17 ]

Take a look at your libcfs CPU partitions. Having 2 CPUs does not necessarily mean that you'll end up with two CPTs. Both CPUs can be part of the same NUMA node, and by default the libcfs CPU partition pattern is set to "N" unless you explicitly set it to something else. "N" means to use the NUMA architecture, i.e. create one CPT per NUMA node that has at least one CPU attached to it.

From the output you shared above, I think that's the issue: when you configure a network without explicitly specifying a cpt option, it gets attached to CPT 0 only, which means that you have just one CPT in your system.
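
To illustrate the "one CPT per NUMA node with at least one CPU" rule described above, here is a minimal Python sketch. It is not libcfs code: the function names (parse_cpulist, numa_cpts, read_node_cpulists) are hypothetical, the contiguous CPT numbering is an assumption, and it only reads the standard Linux sysfs files under /sys/devices/system/node to estimate how many CPTs the "N" pattern would produce.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-1,4' into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty string means the node has no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def numa_cpts(node_cpulists):
    """Map CPT index -> CPU list, one CPT per NUMA node that has at least one CPU.

    node_cpulists maps a NUMA node id to its cpulist string.
    Assumption: CPTs are numbered contiguously in node-id order.
    """
    populated = [parse_cpulist(s) for _, s in sorted(node_cpulists.items())]
    populated = [cpus for cpus in populated if cpus]
    return dict(enumerate(populated))


def read_node_cpulists(root="/sys/devices/system/node"):
    """Read the cpulist of every NUMA node exposed by sysfs."""
    result = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*", "cpulist")):
        node_id = int(re.search(r"node(\d+)", path).group(1))
        with open(path) as fh:
            result[node_id] = fh.read()
    return result


if __name__ == "__main__":
    for cpt, cpus in numa_cpts(read_node_cpulists()).items():
        print(f"CPT {cpt}: CPUs {cpus}")
```

On a machine like the one in this ticket (CPUs 0-1 on a single NUMA node), the sketch would report a single CPT holding both CPUs, which matches the CPT: "[0]" lines in the lnetctl output.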

To get your test to work, you might need to add the following line to your lustre.conf:

options libcfs cpu_pattern="0[0], 1[1]"

That'll create two CPTs. CPT 0 will have CPU 0 and CPT 1 will have CPU 1.

The lnetctl syntax you used should then work.
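
As a rough illustration of what such a pattern expresses, the following Python sketch parses a cpu_pattern string of the form "cpt[cpus] cpt[cpus]" into a CPT-to-CPU table and then checks which CPTs requested via --cpt actually exist. This is only a sketch under assumptions: parse_cpu_pattern and missing_cpts are hypothetical helper names, and the parsing is a simplification for illustration, not the libcfs parser.

```python
import re


def parse_cpu_pattern(pattern):
    """Parse a pattern such as '0[0] 1[1]' into {cpt_index: [cpu, ...]}.

    Simplified, illustrative grammar: '<cpt>[<cpu>[-<cpu>][,<cpu>...]]'.
    """
    table = {}
    for cpt, body in re.findall(r"(\d+)\s*\[([^\]]*)\]", pattern):
        cpus = set()
        for part in body.split(","):
            part = part.strip()
            if not part:
                continue
            if "-" in part:
                lo, hi = part.split("-")
                cpus.update(range(int(lo), int(hi) + 1))
            else:
                cpus.add(int(part))
        table[int(cpt)] = sorted(cpus)
    return table


def missing_cpts(requested, table):
    """Return the requested CPT indices that are not defined in the table."""
    return [cpt for cpt in requested if cpt not in table]


if __name__ == "__main__":
    single = {0: [0, 1]}                    # one CPT holding both CPUs
    split = parse_cpu_pattern("0[0] 1[1]")  # the pattern from this ticket
    print(missing_cpts([0, 1], single))
    print(missing_cpts([0, 1], split))
```

With only one CPT defined, the request for CPTs 0 and 1 leaves CPT 1 unmatched, which is consistent with the NID being reported under CPT "[0]" alone; with the two-CPT pattern nothing is missing.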

Comment by Elena Gryaznova [ 01/Dec/17 ]

Amir,
thank you.
Yes, this works if the following is added to lustre.conf:

options libcfs cpu_pattern="0[0] 1[1]"

Comment by Elena Gryaznova [ 01/Dec/17 ]

Ticket can be closed.

Comment by Peter Jones [ 01/Dec/17 ]

thanks Elena

Generated at Sat Feb 10 02:33:20 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.