[LU-15363] Don't use lustre modules to test LNet with sanity-lnet Created: 12/Dec/21  Updated: 29/Jul/23  Resolved: 03/Apr/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: Lustre 2.15.0

Type: Improvement Priority: Minor
Reporter: James A Simmons Assignee: James A Simmons
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12511 Prepare lustre for adoption into the ... Open
Rank (Obsolete): 9223372036854775807

 Description   

A few sanity-lnet tests were failing for the native Linux client. By default the LNet stack is initialized with LNET_PID_ANY, which doesn't automatically set up LNet from the module parameters. Currently sanity-lnet works around this by loading the lustre modules, which initialize the LNet stack with LNET_PID_LUSTRE, and that does set up the LNet stack properly. This workaround doesn't work for the native Linux client, since there Lustre starts LNet at mount time rather than at module load, and sanity-lnet never mounts.



 Comments   
Comment by Gerrit Updater [ 12/Dec/21 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/45834
Subject: LU-15363 tests: don't use lustre module to test lnet
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7fd4a2145aed2df28e454b37a64870179cc2e2f7

Comment by Chris Horn [ 15/Dec/21 ]

Patch breaks running the test suite out of a build directory:

sles15build01:/home/hornc/lustre-filesystem/lustre/tests # ./auster -N -v sanity-lnet
Started at Wed Dec 15 14:34:08 CST 2021
sles15build01: executing check_logdir /tmp/test_logs/2021-12-15/143407
sles15build01: ../libcfs/libcfs/libcfs options: 'libcfs_debug=320735104 libcfs_subsystem_debug=-2049'
Logging to shared log directory: /tmp/test_logs/2021-12-15/143407
sles15build01: executing yml_node
IOC_LIBCFS_GET_NI error 22: Invalid argument
Client: 2.14.55.170
MDS: 2.14.55.170
OSS: 2.14.55.170
running: sanity-lnet
run_suite sanity-lnet /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh
-----============= acceptance-small: sanity-lnet ============----- Wed Dec 15 14:34:11 CST 2021
Running: bash /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh
excepting tests:
opening /dev/obd failed: No such file or directory
hint: the kernel modules may not be loaded
Stopping clients: sles15build01 /mnt/lustre (opts:-f)
Stopping clients: sles15build01 /mnt/lustre2 (opts:-f)
modules unloaded.
ip netns exec test_ns ip addr add 10.1.2.3/31 dev test1pg
ip netns exec test_ns ip link set test1pg up
libkmod: kmod_module_get_holders: could not open '/sys/module/x86_pkg_temp_thermal/holders': No such file or directory
libkmod: kmod_module_get_holders: could not open '/sys/module/pcc_cpufreq/holders': No such file or directory
../libcfs/libcfs/libcfs options: 'libcfs_debug=320735104 libcfs_subsystem_debug=-2049'
../lnet/lnet/lnet options: 'config_on_load=1'
IOC_LIBCFS_GET_NI error 100: Network is down
 sanity-lnet : @@@@@@ FAIL: No NID configured after module load
  Trace dump:
  = /home/hornc/lustre-filesystem/lustre/tests/test-framework.sh:6336:error()
  = /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh:255:main()
Dumping lctl log to /tmp/test_logs/2021-12-15/143407/sanity-lnet..*.1639600455.log
Dumping logs only on local client.
sanity-lnet returned 1
Finished at Wed Dec 15 14:34:15 CST 2021 in 8s
./auster: completed with rc 0
sles15build01:/home/hornc/lustre-filesystem/lustre/tests # dmesg | tail
[3728796.892739] Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-lnet ============----- Wed Dec 15 14:34:11 CST 2021
[3728797.623735] Lustre: DEBUG MARKER: excepting tests:
[3728797.872111] device-mapper: uevent: version 1.0.3
[3728797.872371] device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com
[3728798.910153] IPv6: ADDRCONF(NETDEV_UP): test1pl: link is not ready
[3728798.924060] IPv6: ADDRCONF(NETDEV_CHANGE): test1pl: link becomes ready
[3728799.060046] LNet: HW NUMA nodes: 4, HW CPU cores: 32, npartitions: 4
[3728799.063595] alg: No test for adler32 (adler32-zlib)
[3728799.954055] LNetError: 5456:0:(api-ni.c:2574:lnet_startup_lndnet()) Can't load LND tcp, module ksocklnd, rc=256
[3728800.129812] Lustre: DEBUG MARKER: sanity-lnet : @@@@@@ FAIL: No NID configured after module load
sles15build01:/home/hornc/lustre-filesystem/lustre/tests #

Comment by James A Simmons [ 01/Feb/22 ]

The reason for this is that LNet internally calls request_module() to load the LNDs. That only works if the LND modules are in the standard /lib/modules, so with the sandbox approach this breaks. I'm playing with the idea of making the config_on_load module parameter something that can be set later, after both the lnet core and the LND drivers are loaded. Then the setup could be done, and request_module() is never called.
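
For illustration only, the ordering described above (load the lnet core and the LND by explicit path first, configure afterwards) could be sketched as below. This is not the patch under review. The helper names load_lnet_from_tree and run, the DRY_RUN switch, and the ksocklnd.ko location in the build tree are assumptions; it also assumes LNet does not fall back to request_module() for an LND that is already loaded.

```shell
#!/bin/bash
# Hypothetical sketch: bring up LNet from a build tree without relying on
# request_module(), by loading every module by explicit path first.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: top of the build tree (assumed layout, see the note above).
load_lnet_from_tree() {
    local build_dir=$1

    # Load in dependency order: libcfs, then the lnet core, then the LND.
    run insmod "$build_dir/libcfs/libcfs/libcfs.ko"
    run insmod "$build_dir/lnet/lnet/lnet.ko"
    run insmod "$build_dir/lnet/klnds/socklnd/ksocklnd.ko"

    # Only now configure LNet from the module parameters, so the LND
    # is already present when the network is started.
    run lnetctl lnet configure --all
}
```

Running `load_lnet_from_tree /path/to/build` prints the four commands in order; with `DRY_RUN=0` and root privileges it would execute them instead.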

Comment by Gerrit Updater [ 03/Apr/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45834/
Subject: LU-15363 tests: don't use lustre module to test lnet
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e41f91dc90a0977f7ea85b199b7e5809c56b810e

Comment by Peter Jones [ 03/Apr/22 ]

Landed for 2.15

Generated at Sat Feb 10 03:17:40 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.