[LU-17004] sanity-lnet test_222: FAIL: Failed to delete tcp2 route Created: 31/Jul/23  Updated: 22/Jan/24  Resolved: 22/Jan/24

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for jianyu <yujian@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/fd28874f-00e5-4527-a75a-1fe1521a95ca

test_222 failed with the following error:

CMD: onyx-52vm3 /usr/sbin/lnetctl route show -v
/usr/sbin/lnetctl ping 10.240.24.130@tcp2
manage:
    - ping:
          errno: -1
          descr: failed to ping 10.240.24.130@tcp2: Network is down
                 
CMD: onyx-52vm3 /usr/sbin/lnetctl ping 10.240.24.128@tcp1
onyx-52vm3: manage:
onyx-52vm3:     - ping:
onyx-52vm3:           errno: -1
onyx-52vm3:           descr: failed to ping 10.240.24.128@tcp1: No route to host
onyx-52vm3:                  
CMD: onyx-52vm1.onyx.whamcloud.com if /usr/sbin/lnetctl route show --net tcp2 --gateway 10.240.24.129@tcp1; then /usr/sbin/lnetctl route del --net tcp2 --gateway 10.240.24.129@tcp1; else exit 0; fi
del:
    - route:
          errno: -100
          descr: route operation failed: Network is down
 sanity-lnet test_222: @@@@@@ FAIL: Failed to delete tcp2 route 

Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/96504 - 3.10.0-1160.90.1.el7.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/96504 - 3.10.0-1160.95.1.el7_lustre.x86_64


VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-lnet test_222 - Failed to delete tcp2 route



 Comments   
Comment by Jian Yu [ 04/Aug/23 ]

Hi ssmirnov,
Could you please take a look at this failure?
Review testing of the RHEL 7.9 kernel update patch https://review.whamcloud.com/51801 is blocked by this failure.

Comment by Serguei Smirnov [ 09/Aug/23 ]

Looks like there's currently a difference in behaviour between CentOS 7.9 and later kernels:

"lnetctl route show" executed without any nets configured (i.e. "lnetctl lnet configure" was never run) results in some sort of error on other kernels, while on CentOS 7.9 it returns nothing.

Sanity-lnet test cases that cover routing use a sub-routine (do_route_del) during cleanup, which executes "lnetctl route show" on a specific route before attempting to delete it. Sometimes do_route_del is called when the network is not configured (or has been unconfigured). In that case, on CentOS 7.9, "route show" does not fail, so the sub-routine attempts to remove a non-existent route and reports an error.
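
For illustration only, one way a cleanup helper could guard against this is to check the output of "route show" as well as its exit status. This is a rough sketch, not the actual do_route_del and not necessarily what the eventual fix does: the function name safe_route_del and the overridable LNETCTL variable are hypothetical, and it assumes that "route show" on CentOS 7.9 exits with status 0 and prints nothing when no such route exists.

```shell
#!/bin/bash
# Hypothetical sketch; not the real sanity-lnet helper.
# LNETCTL can be overridden, e.g. to point at a stub when experimenting.
LNETCTL=${LNETCTL:-/usr/sbin/lnetctl}

# Delete a route only if "route show" both succeeds and prints something.
# Checking the output in addition to the exit status covers the case
# (assumed here for CentOS 7.9) where "route show" on an unconfigured
# LNet returns success with empty output instead of failing.
safe_route_del() {
	local net=$1 gw=$2 out

	# "route show" failed: nothing to delete, treat as success.
	out=$($LNETCTL route show --net "$net" --gateway "$gw" 2>/dev/null) ||
		return 0

	# "route show" succeeded but printed nothing: no such route.
	[[ -n "$out" ]] || return 0

	$LNETCTL route del --net "$net" --gateway "$gw"
}
```

With this shape, a call such as "safe_route_del tcp2 10.240.24.129@tcp1" would return 0 without attempting the delete when the network is not configured, on either kernel.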

Comment by James A Simmons [ 22/Jan/24 ]

https://review.whamcloud.com/c/fs/lustre-release/+/53366 resolves this issue

Generated at Sat Feb 10 03:31:48 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.