[LU-15746] UDSP: udsp_single_net_04 failed as "uneven tx traffic distribution across interfaces" Created: 14/Apr/22  Updated: 11/Oct/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: Cyril Bordage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

version=2.15.0_RC2_6_g09fe899

The following is the LUTF output for the failing run:

lutf>>> suites['udsp'].scripts['udsp_single_net_04'].run()

nids:  ['10.240.43.102@tcp', '10.240.43.109@tcp', '10.240.43.110@tcp', '10.240.43.117@tcp']

None

[{'net type': 'tcp', 'local NI(s)': [{'nid': '10.240.43.102@tcp', 'statistics': {'send_count': 2, 'recv_count': 2, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': 0}, 'sent_stats': {'put': 1, 'get': 1, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 0, 'get': 0, 'reply': 1, 'ack': 1, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}, {'nid': '10.240.43.109@tcp', 'statistics': {'send_count': 1, 'recv_count': 1, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': 0}, 'sent_stats': {'put': 1, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 1, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}, {'nid': '10.240.43.110@tcp', 'statistics': {'send_count': 0, 'recv_count': 0, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': -1}, 'sent_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}, {'nid': '10.240.43.117@tcp', 'statistics': {'send_count': 0, 'recv_count': 0, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': -1}, 'sent_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}]}]

[{'net type': 'tcp', 'local NI(s)': [{'nid': '10.240.43.102@tcp', 'statistics': {'send_count': 7, 'recv_count': 7, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': 0}, 'sent_stats': {'put': 1, 'get': 6, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 0, 'get': 0, 'reply': 6, 'ack': 1, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}, {'nid': '10.240.43.109@tcp', 'statistics': {'send_count': 6, 'recv_count': 6, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': 0}, 'sent_stats': {'put': 1, 'get': 5, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 1, 'get': 0, 'reply': 5, 'ack': 0, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}, {'nid': '10.240.43.110@tcp', 'statistics': {'send_count': 0, 'recv_count': 0, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': -1}, 'sent_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}, {'nid': '10.240.43.117@tcp', 'statistics': {'send_count': 0, 'recv_count': 0, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': -1}, 'sent_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}]}]

{0: 2, 1: 1} {0: 7, 1: 6}

[{'net type': 'tcp', 'local NI(s)': [{'nid': '10.240.43.102@tcp', 'statistics': {'send_count': 7, 'recv_count': 7, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': -1}, 'sent_stats': {'put': 1, 'get': 6, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 0, 'get': 0, 'reply': 6, 'ack': 1, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}, {'nid': '10.240.43.109@tcp', 'statistics': {'send_count': 6, 'recv_count': 6, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': -1}, 'sent_stats': {'put': 1, 'get': 5, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 1, 'get': 0, 'reply': 5, 'ack': 0, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}, {'nid': '10.240.43.110@tcp', 'statistics': {'send_count': 0, 'recv_count': 0, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': -1}, 'sent_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}, {'nid': '10.240.43.117@tcp', 'statistics': {'send_count': 0, 'recv_count': 0, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': -1}, 'sent_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}]}]

[{'net type': 'tcp', 'local NI(s)': [{'nid': '10.240.43.102@tcp', 'statistics': {'send_count': 9, 'recv_count': 9, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': -1}, 'sent_stats': {'put': 1, 'get': 8, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 0, 'get': 0, 'reply': 8, 'ack': 1, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}, {'nid': '10.240.43.109@tcp', 'statistics': {'send_count': 8, 'recv_count': 8, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': -1}, 'sent_stats': {'put': 1, 'get': 7, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 1, 'get': 0, 'reply': 7, 'ack': 0, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}, {'nid': '10.240.43.110@tcp', 'statistics': {'send_count': 3, 'recv_count': 3, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': -1}, 'sent_stats': {'put': 0, 'get': 3, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 0, 'get': 0, 'reply': 3, 'ack': 0, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}, {'nid': '10.240.43.117@tcp', 'statistics': {'send_count': 3, 'recv_count': 3, 'drop_count': 0}, 'udsp info': {'net priority': -1, 'nid priority': -1}, 'sent_stats': {'put': 0, 'get': 3, 'reply': 0, 'ack': 0, 'hello': 0}, 'received_stats': {'put': 0, 'get': 0, 'reply': 3, 'ack': 0, 'hello': 0}, 'dropped_stats': {'put': 0, 'get': 0, 'reply': 0, 'ack': 0, 'hello': 0}, 'health stats': {'fatal_error': 0, 'health value': 1000, 'interrupts': 0, 'dropped': 0, 'aborted': 0, 'no route': 0, 'timeouts': 0, 'error': 0, 'ping_count': 0, 'next_ping': 0}, 'lnd tunables': {'conns_per_peer': 1}, 'dev cpt': -1, 'CPT': '[0]'}]}]

{0: 7, 1: 6, 2: 0, 3: 0} {0: 9, 1: 8, 2: 3, 3: 3}

uneven tx traffic distribution across interfaces
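
For reference, the per-interface increase in send_count during the second traffic phase can be worked out from the two summary dictionaries printed just before the failure message. This is only a minimal sketch: the helper names tx_deltas and spread are hypothetical, and the evenness criterion that LUTF actually applies is not shown in this ticket.

```python
# Summary dictionaries copied from the output above: interface index -> send_count.
before = {0: 7, 1: 6, 2: 0, 3: 0}
after = {0: 9, 1: 8, 2: 3, 3: 3}


def tx_deltas(before, after):
    """Return the per-interface increase in send_count between two samples."""
    return {idx: after[idx] - before[idx] for idx in before}


def spread(deltas):
    """Return the gap between the busiest and the least busy interface."""
    return max(deltas.values()) - min(deltas.values())


deltas = tx_deltas(before, after)
print(deltas)          # {0: 2, 1: 2, 2: 3, 3: 3}
print(spread(deltas))  # 1
```

With these counters the four interfaces sent 2, 2, 3 and 3 messages in the second phase, so whether that counts as uneven depends on the tolerance the test applies.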

According to the UDSP test plan, single_net_04 tests the following:

Setup: configure a single network with 3 NIDs on the network
Add a UDSP rule that gives two of the interfaces the highest priority
Start traffic
Stop traffic
Verify that the two NIDs with the highest priority were used
Add a UDSP rule that lowers the priority of both highest-priority NIDs back to the default
Start traffic
Stop traffic
Verify that all NIDs were used
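
The two "Verify" steps of the plan can be checked by hand against the send_count values in the four statistics dumps above. The sketch below does that under stated assumptions: the helper used_nids is hypothetical, "used" is taken to mean that send_count increased during the traffic phase, and the prioritized set is taken to be the two NIDs reported with 'nid priority': 0. It is not the check LUTF itself runs.

```python
# send_count per NID, copied from the four statistics dumps in the output above.
nids = ['10.240.43.102@tcp', '10.240.43.109@tcp',
        '10.240.43.110@tcp', '10.240.43.117@tcp']
prioritized = set(nids[:2])  # the two NIDs shown with 'nid priority': 0

phase1_before = dict(zip(nids, [2, 1, 0, 0]))
phase1_after = dict(zip(nids, [7, 6, 0, 0]))
phase2_before = dict(zip(nids, [7, 6, 0, 0]))
phase2_after = dict(zip(nids, [9, 8, 3, 3]))


def used_nids(before, after):
    """Return the set of NIDs whose send_count grew between two samples."""
    return {nid for nid in before if after[nid] > before[nid]}


# First verification: only the two prioritized NIDs carried traffic.
print(used_nids(phase1_before, phase1_after) == prioritized)
# Second verification: every NID carried traffic after the priority reset.
print(used_nids(phase2_before, phase2_after) == set(nids))
```

Under these assumptions both comparisons hold for the counters in this run, which suggests the reported failure comes from an evenness check rather than from a NID being left unused.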

Generated at Sat Feb 10 03:20:56 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.