[LU-12688] sanity-sec test 31 fails with 'unable to configure NID o2ib999' Created: 23/Aug/19 Updated: 29/May/20 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0, Lustre 2.12.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: | IB network |
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
sanity-sec test_31 fails to configure a network on IB networks. In the client test_log, we see
== sanity-sec test 31: client mount option '-o network' ============================================== 04:29:59 (1565756999)
192.168.5.148@o2ib:/lustre /mnt/lustre lustre rw,flock,user_xattr,lazystatfs 0 0
CMD: onyx-64vm1.onyx.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
Stopping client onyx-64vm1.onyx.whamcloud.com /mnt/lustre (opts:)
CMD: onyx-64vm1.onyx.whamcloud.com lsof -t /mnt/lustre
CMD: onyx-64vm1.onyx.whamcloud.com umount /mnt/lustre 2>&1
CMD: onyx-64vm4 lctl get_param -n *.MGS*.exports.'192.168.5.145@o2ib'.uuid 2>/dev/null | grep -q -
CMD: onyx-64vm3,onyx-64vm4 lctl get_param -n *.lustre*.exports.'192.168.5.145@o2ib'.uuid 2>/dev/null | grep -q -
CMD: onyx-64vm1.onyx.whamcloud.com,onyx-64vm2,onyx-64vm3,onyx-64vm4 /usr/sbin/lnetctl lnet configure && /usr/sbin/lnetctl net add --if $(/usr/sbin/lnetctl net show --net o2ib | awk 'BEGIN{inf=0} {if (inf==1) print $2; fi; inf=0} /interfaces/{inf=1}') --net o2ib999
onyx-64vm1: add:
onyx-64vm1: - net:
onyx-64vm1: errno: -100
onyx-64vm1: descr: "cannot add network: Network is down"
onyx-64vm2: add:
onyx-64vm2: - net:
onyx-64vm2: errno: -100
onyx-64vm2: descr: "cannot add network: Network is down"
onyx-64vm4: add:
onyx-64vm4: - net:
onyx-64vm4: errno: -100
onyx-64vm4: descr: "cannot add network: Network is down"
onyx-64vm3: add:
onyx-64vm3: - net:
onyx-64vm3: errno: -100
onyx-64vm3: descr: "cannot add network: Network is down"
sanity-sec test_31: @@@@@@ FAIL: unable to configure NID o2ib999
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:5829:error()
= /usr/lib64/lustre/tests/sanity-sec.sh:2238:test_31()
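For readers unfamiliar with the awk filter in the `lnetctl net add` command above: it prints the second field of whichever line follows a line matching `interfaces`, and that value is passed to `--if`. The stray `fi;` appears to be a harmless expression statement in awk. The following is a minimal Python sketch of the same selection logic. The sample `lnetctl net show` output is an assumption written for illustration (only the NID and the `ib0` interface name come from the logs in this ticket), and `interfaces_after_marker` is a hypothetical helper name.

```python
# Hypothetical sample of `lnetctl net show --net o2ib` output. The layout is an
# assumption; only the NID and the ib0 interface name appear in this ticket.
SAMPLE_NET_SHOW = """\
net:
    - net type: o2ib
      local NI(s):
        - nid: 192.168.5.145@o2ib
          status: up
          interfaces:
              0: ib0
"""


def interfaces_after_marker(text):
    """Mimic the awk filter: for each line that directly follows a line
    containing 'interfaces', collect its second whitespace-separated field
    (an empty string if the line has fewer than two fields, as awk would)."""
    found = []
    previous_matched = False
    for line in text.splitlines():
        if previous_matched:
            fields = line.split()
            found.append(fields[1] if len(fields) > 1 else "")
        # Equivalent of `inf=0` followed by `/interfaces/{inf=1}`.
        previous_matched = "interfaces" in line
    return found


print(interfaces_after_marker(SAMPLE_NET_SHOW))
```

With the assumed sample, the helper selects `ib0`, which matches the `--if ib0` argument visible in the console log.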
We see very similar output on the console logs for all nodes. For example on a client console, we see
[31452.506785] Lustre: DEBUG MARKER: == sanity-sec test 31: client mount option '-o network' ============================================== 04:29:59 (1565756999)
[31452.635972] Lustre: DEBUG MARKER: grep -c /mnt/lustre' ' /proc/mounts
[31452.645394] Lustre: DEBUG MARKER: lsof -t /mnt/lustre
[31452.773901] Lustre: DEBUG MARKER: umount /mnt/lustre 2>&1
[31452.817588] Lustre: Unmounted lustre-client
[31452.818414] Lustre: Skipped 4 previous similar messages
[31453.822622] Lustre: DEBUG MARKER: /usr/sbin/lnetctl lnet configure && /usr/sbin/lnetctl net add --if ib0 --net o2ib999
[31454.062442] LNetError: 10968:0:(o2iblnd.c:2766:kiblnd_dev_failover()) Failed to bind ib0:192.168.5.145 to device(ffff93f2f7830000): -98
[31454.064594] LNetError: 10968:0:(o2iblnd.c:3256:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -98
[31454.066261] LNetError: 105-4: Error -100 starting up LNI o2ib
[31454.297769] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-sec test_31: @@@@@@ FAIL: unable to configure NID o2ib999
We just started seeing this issue because we started running autotesting with IB networks. Here are logs for a few failures |
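The two return codes in the console log above can be decoded with the standard Linux errno numbering: 98 is EADDRINUSE and 100 is ENETDOWN, the latter matching the "Network is down" text that `lnetctl` reports. A small sketch of that lookup follows. The table is hard-coded rather than taken from Python's `errno` module so that the result does not depend on the platform running the script, and `decode_rc` is a hypothetical helper name. The ticket itself does not state why the bind returns -98.

```python
# Linux errno values (asm-generic numbering). Hard-coded on purpose: Python's
# errno module reflects the platform the script runs on, which may differ.
LINUX_ERRNO = {
    98: ("EADDRINUSE", "Address already in use"),
    100: ("ENETDOWN", "Network is down"),
}


def decode_rc(rc):
    """Translate a negative kernel-style return code into its errno name."""
    name, text = LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in this table"))
    return f"rc = {rc}: {name} ({text})"


# The two codes that appear in the console log for this failure.
for rc in (-98, -100):
    print(decode_rc(rc))
```

Read this way, the log shows the bind of `ib0:192.168.5.145` failing with EADDRINUSE in `kiblnd_dev_failover()`, after which the LNI startup is reported as ENETDOWN.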