Details
Type: Bug
Resolution: Cannot Reproduce
Priority: Minor
None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.3
None
Environment: IB network
Severity: 3
Description
sanity-sec test_31 fails to configure the o2ib999 network when it runs on IB networks.
In the client test_log, we see:
== sanity-sec test 31: client mount option '-o network' ============================================== 04:29:59 (1565756999)
192.168.5.148@o2ib:/lustre /mnt/lustre lustre rw,flock,user_xattr,lazystatfs 0 0
CMD: onyx-64vm1.onyx.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
Stopping client onyx-64vm1.onyx.whamcloud.com /mnt/lustre (opts:)
CMD: onyx-64vm1.onyx.whamcloud.com lsof -t /mnt/lustre
CMD: onyx-64vm1.onyx.whamcloud.com umount /mnt/lustre 2>&1
CMD: onyx-64vm4 lctl get_param -n *.MGS*.exports.'192.168.5.145@o2ib'.uuid 2>/dev/null | grep -q -
CMD: onyx-64vm3,onyx-64vm4 lctl get_param -n *.lustre*.exports.'192.168.5.145@o2ib'.uuid 2>/dev/null | grep -q -
CMD: onyx-64vm1.onyx.whamcloud.com,onyx-64vm2,onyx-64vm3,onyx-64vm4 /usr/sbin/lnetctl lnet configure && /usr/sbin/lnetctl net add --if $(/usr/sbin/lnetctl net show --net o2ib | awk 'BEGIN{inf=0} {if (inf==1) print $2; fi; inf=0} /interfaces/{inf=1}') --net o2ib999
onyx-64vm1: add:
onyx-64vm1: - net:
onyx-64vm1: errno: -100
onyx-64vm1: descr: "cannot add network: Network is down"
onyx-64vm2: add:
onyx-64vm2: - net:
onyx-64vm2: errno: -100
onyx-64vm2: descr: "cannot add network: Network is down"
onyx-64vm4: add:
onyx-64vm4: - net:
onyx-64vm4: errno: -100
onyx-64vm4: descr: "cannot add network: Network is down"
onyx-64vm3: add:
onyx-64vm3: - net:
onyx-64vm3: errno: -100
onyx-64vm3: descr: "cannot add network: Network is down"
sanity-sec test_31: @@@@@@ FAIL: unable to configure NID o2ib999
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:5829:error()
= /usr/lib64/lustre/tests/sanity-sec.sh:2238:test_31()
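The `lnetctl net add --if $(...)` command in this log picks the interface name by piping `lnetctl net show --net o2ib` through awk: for every line that immediately follows a line containing `interfaces`, awk prints that line's second field. The Python sketch below mirrors that awk program so the extraction step can be checked on its own. The helper name `interfaces_after_marker` is hypothetical, and the `SAMPLE` text is an assumed, abbreviated `net show` layout written for illustration, not output captured from these nodes. On the failing client the substitution resolved to `ib0`, as the client console log in this ticket shows.

```python
from typing import List


def interfaces_after_marker(net_show_output: str) -> List[str]:
    """Python mirror of the awk program used by sanity-sec test_31:

        awk 'BEGIN{inf=0} {if (inf==1) print $2; fi; inf=0} /interfaces/{inf=1}'

    For every line that immediately follows a line containing "interfaces",
    emit that line's second whitespace-separated field (awk's $2).
    The stray "fi;" in the awk text is a no-op expression statement.
    """
    names: List[str] = []
    inf = False
    for line in net_show_output.splitlines():
        fields = line.split()  # awk's default whitespace field splitting
        if inf:
            # awk prints an empty string when $2 does not exist
            names.append(fields[1] if len(fields) > 1 else "")
        inf = False  # first rule always resets the flag
        if "interfaces" in line:  # second rule: /interfaces/{inf=1}
            inf = True
    return names


# Assumed, abbreviated `lnetctl net show --net o2ib` output (illustrative only).
SAMPLE = """net:
    - net type: o2ib
      local NI(s):
        - nid: 192.168.5.145@o2ib
          status: up
          interfaces:
              0: ib0
"""

if __name__ == "__main__":
    print(interfaces_after_marker(SAMPLE))  # ['ib0']
```

Because only the line directly after each `interfaces:` marker is read, an NI with several interfaces contributes only its first one, while several NIs on the o2ib net would each contribute one name to the `--if` argument.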
We see very similar output in the console logs of all nodes. For example, on a client console we see:
[31452.506785] Lustre: DEBUG MARKER: == sanity-sec test 31: client mount option '-o network' ============================================== 04:29:59 (1565756999)
[31452.635972] Lustre: DEBUG MARKER: grep -c /mnt/lustre' ' /proc/mounts
[31452.645394] Lustre: DEBUG MARKER: lsof -t /mnt/lustre
[31452.773901] Lustre: DEBUG MARKER: umount /mnt/lustre 2>&1
[31452.817588] Lustre: Unmounted lustre-client
[31452.818414] Lustre: Skipped 4 previous similar messages
[31453.822622] Lustre: DEBUG MARKER: /usr/sbin/lnetctl lnet configure && /usr/sbin/lnetctl net add --if ib0 --net o2ib999
[31454.062442] LNetError: 10968:0:(o2iblnd.c:2766:kiblnd_dev_failover()) Failed to bind ib0:192.168.5.145 to device(ffff93f2f7830000): -98
[31454.064594] LNetError: 10968:0:(o2iblnd.c:3256:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -98
[31454.066261] LNetError: 105-4: Error -100 starting up LNI o2ib
[31454.297769] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-sec test_31: @@@@@@ FAIL: unable to configure NID o2ib999
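The return codes in these logs are negated Linux errno values. The short Python sketch below (the helper name `decode_lnet_rc` is hypothetical) decodes them with the standard `errno` and `os` modules: -98 is EADDRINUSE, the code `kiblnd_dev_failover()` reports when binding ib0:192.168.5.145, and -100 is ENETDOWN, which `lnetctl` surfaces as "Network is down". In the console log the -100 is reported only after the -98 bind failure. The numeric values are Linux-specific, so the printed mapping assumes a Linux host.

```python
import errno
import os
from typing import Tuple


def decode_lnet_rc(rc: int) -> Tuple[str, str]:
    """Turn a negated kernel errno (e.g. rc = -98) into (symbol, message)."""
    code = -rc if rc < 0 else rc
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    # Return codes taken from the console log and lnetctl output in this ticket.
    # The numbers are Linux errno values; other platforms number them differently.
    for rc in (-98, -100):
        name, msg = decode_lnet_rc(rc)
        print(f"{rc}: {name} ({msg})")
    # On Linux this prints:
    # -98: EADDRINUSE (Address already in use)
    # -100: ENETDOWN (Network is down)
```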
We have only just started seeing this issue because autotest only recently began running on IB networks.
Here are logs for a few of the failures:
https://testing.whamcloud.com/test_sets/d9f710b0-b662-11e9-9f36-52540065bddc
https://testing.whamcloud.com/test_sets/fedcdcb8-bb0b-11e9-97d5-52540065bddc
https://testing.whamcloud.com/test_sets/44209a52-bc3e-11e9-98c8-52540065bddc
Issue Links
- is related to LU-12312 sanity-sec: test_31: 'network' mount option cannot be taken into account (Resolved)