[LU-12688] sanity-sec test 31 fails with 'unable to configure NID o2ib999' Created: 23/Aug/19  Updated: 29/May/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.3
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment: IB network

Issue Links:
Related
is related to LU-12312 sanity-sec: test_31: 'network' mount ... Reopened
Severity: 3

Description

sanity-sec test_31 fails on IB networks: it cannot add the o2ib999 network with lnetctl.

In the client test_log, we see:

== sanity-sec test 31: client mount option '-o network' ============================================== 04:29:59 (1565756999)
192.168.5.148@o2ib:/lustre /mnt/lustre lustre rw,flock,user_xattr,lazystatfs 0 0
CMD: onyx-64vm1.onyx.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
Stopping client onyx-64vm1.onyx.whamcloud.com /mnt/lustre (opts:)
CMD: onyx-64vm1.onyx.whamcloud.com lsof -t /mnt/lustre
CMD: onyx-64vm1.onyx.whamcloud.com umount  /mnt/lustre 2>&1
CMD: onyx-64vm4 lctl get_param -n *.MGS*.exports.'192.168.5.145@o2ib'.uuid 2>/dev/null |
		      grep -q -
CMD: onyx-64vm3,onyx-64vm4 lctl get_param -n *.lustre*.exports.'192.168.5.145@o2ib'.uuid 		  2>/dev/null | grep -q -
CMD: onyx-64vm1.onyx.whamcloud.com,onyx-64vm2,onyx-64vm3,onyx-64vm4 /usr/sbin/lnetctl lnet configure && /usr/sbin/lnetctl net add --if 		  $(/usr/sbin/lnetctl net show --net o2ib | awk 'BEGIN{inf=0} 		  {if (inf==1) print $2; fi; inf=0} /interfaces/{inf=1}') 		  --net o2ib999
onyx-64vm1: add:
onyx-64vm1:     - net:
onyx-64vm1:           errno: -100
onyx-64vm1:           descr: "cannot add network: Network is down"
onyx-64vm2: add:
onyx-64vm2:     - net:
onyx-64vm2:           errno: -100
onyx-64vm2:           descr: "cannot add network: Network is down"
onyx-64vm4: add:
onyx-64vm4:     - net:
onyx-64vm4:           errno: -100
onyx-64vm4:           descr: "cannot add network: Network is down"
onyx-64vm3: add:
onyx-64vm3:     - net:
onyx-64vm3:           errno: -100
onyx-64vm3:           descr: "cannot add network: Network is down"
 sanity-sec test_31: @@@@@@ FAIL: unable to configure NID o2ib999 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5829:error()
  = /usr/lib64/lustre/tests/sanity-sec.sh:2238:test_31()
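
The lnetctl command in the log picks the interface for the new network by running awk over the output of `lnetctl net show --net o2ib` and printing the second field of the line that follows an `interfaces` key. The sketch below reproduces that extraction on a sample input. The sample text is an assumed shape of the `lnetctl net show` output, not a capture from the failing nodes, and `extract_iface` and `sample_net_show` are hypothetical helper names. The awk program is a simplified equivalent of the one in the log.

```shell
#!/bin/sh
# Sketch: print the interface name on the line after an "interfaces" key.
# extract_iface is a hypothetical helper; it mirrors the awk logic in the
# test log (print field 2 of the line that follows /interfaces/).
extract_iface() {
    awk 'prev == 1 { print $2 } { prev = 0 } /interfaces/ { prev = 1 }'
}

# Assumed shape of `lnetctl net show --net o2ib` output (illustrative only).
sample_net_show() {
    cat <<'EOF'
net:
    - net type: o2ib
      local NI(s):
        - nid: 192.168.5.145@o2ib
          status: up
          interfaces:
              0: ib0
EOF
}

sample_net_show | extract_iface   # prints: ib0
```

With this input the extraction yields ib0, which matches the interface that appears in the expanded command on the client console.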

The console logs for all nodes show very similar output. For example, on a client console, we see:

[31452.506785] Lustre: DEBUG MARKER: == sanity-sec test 31: client mount option '-o network' ============================================== 04:29:59 (1565756999)
[31452.635972] Lustre: DEBUG MARKER: grep -c /mnt/lustre' ' /proc/mounts
[31452.645394] Lustre: DEBUG MARKER: lsof -t /mnt/lustre
[31452.773901] Lustre: DEBUG MARKER: umount /mnt/lustre 2>&1
[31452.817588] Lustre: Unmounted lustre-client
[31452.818414] Lustre: Skipped 4 previous similar messages
[31453.822622] Lustre: DEBUG MARKER: /usr/sbin/lnetctl lnet configure && /usr/sbin/lnetctl net add --if 		  ib0 		  --net o2ib999
[31454.062442] LNetError: 10968:0:(o2iblnd.c:2766:kiblnd_dev_failover()) Failed to bind ib0:192.168.5.145 to device(ffff93f2f7830000): -98
[31454.064594] LNetError: 10968:0:(o2iblnd.c:3256:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -98
[31454.066261] LNetError: 105-4: Error -100 starting up LNI o2ib
[31454.297769] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-sec test_31: @@@@@@ FAIL: unable to configure NID o2ib999 
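
The return codes in these logs are negative Linux errno values. A minimal sketch for decoding the two that appear here follows; `decode_errno` is a hypothetical helper, and the numeric values are taken from the generic Linux errno table, so they may differ on other platforms.

```shell
#!/bin/sh
# Sketch: map the error codes seen in the logs to Linux errno names.
# decode_errno is a hypothetical helper; values assume generic Linux errno numbering.
decode_errno() {
    code=${1#-}   # strip a leading minus sign, e.g. -98 -> 98
    case "$code" in
        98)  echo "EADDRINUSE (Address already in use)" ;;
        100) echo "ENETDOWN (Network is down)" ;;
        *)   echo "unknown errno $code" ;;
    esac
}

decode_errno -98    # bind failure in kiblnd_dev_failover()
decode_errno -100   # error reported when starting up the LNI
```

Read this way, the bind of ib0:192.168.5.145 fails with EADDRINUSE, and the failure is then reported as ENETDOWN. That would be consistent with the address already being in use by the existing o2ib network, although the logs alone do not confirm the root cause.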

This failure is new to us because autotesting only recently started running on IB networks.

Here are logs for a few failures:
https://testing.whamcloud.com/test_sets/d9f710b0-b662-11e9-9f36-52540065bddc
https://testing.whamcloud.com/test_sets/fedcdcb8-bb0b-11e9-97d5-52540065bddc
https://testing.whamcloud.com/test_sets/44209a52-bc3e-11e9-98c8-52540065bddc

