Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Description
This issue was created by maloo for sbansal <sbansal@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/91397e75-b4cb-4d4d-820b-2f408aca5262
test_253 failed with the following error:
Failed cleanup
Test session details:
clients: https://build.whamcloud.com/job/lustre-b_es6_0/730 - 4.18.0-513.24.1.el8_9.aarch64
servers: https://build.whamcloud.com/job/lustre-b_es6_0/730 - 3.10.0-1160.102.1.el7_lustre.ddn17.x86_64
Error: 'Failed cleanup'
Failure Rate: 0.00% of most recent 15 runs, 85 skipped (all branches)
MDS 1 (trevis-102vm9)
sanity-lnet.test_253.debug_log.trevis-102vm9.1732368402.log
sanity-lnet.test_253.dmesg.trevis-102vm9.1732368402.log
Client 1 (trevis-108vm7)
sanity-lnet.test_253.debug_log.trevis-108vm7.1732368402.log
sanity-lnet.test_253.dmesg.trevis-108vm7.1732368402.log
sanity-lnet.test_253.test_log.trevis-108vm7.log
local NI(s):
- nid: 10.240.44.210@tcp
status: up
interfaces:
0: eth0 - primary nid: 10.240.44.52@tcp
- nid: 10.240.44.52@tcp
health stats:
health value: 1000
debug=+net
/usr/sbin/lnetctl set transaction_timeout 10
Added delay rule 10.240.44.210@tcp->10.240.44.52@tcp (1/1)
Issued 8 pings to 10.240.44.52@tcp from 10.240.44.210@tcp
Removed 1 delay rules
manage: - ping:
errno: -1
descr: failed to ping 10.240.44.52@tcp: Connection timed out
ping:
- primary nid: 10.240.44.52@tcp
Multi-Rail: True
peer ni: - nid: 10.240.44.52@tcp
/usr/sbin/lnetctl set transaction_timeout 150
CMD: trevis-108vm7.trevis.whamcloud.com lsmod | grep lnet > /dev/null &&
lctl dl | grep ' ST ' || true
modules unloaded.
CMD: trevis-102vm9 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/opt/iozone/bin:/usr/lib64/openmpi/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh unload_modules_local
trevis-102vm9: trevis-102vm9.trevis.whamcloud.com: executing unload_modules_local
trevis-102vm9: [43789.003225] LustreError: 24308:0:(class_obd.c:841:obdclass_exit()) obd_memory max: 520491885, leaked: 32784
trevis-102vm9:
trevis-102vm9: mv: cannot stat '/tmp/debug': No such file or directory
trevis-102vm9: Memory leaks detected
pdsh@trevis-108vm7: trevis-102vm9: ssh exited with exit code 254
sanity-lnet test_253: @@@@@@ FAIL: Failed cleanup
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:6759:error()
= /usr/lib64/lustre/tests/sanity-lnet.sh:2016:cleanup_health_test()
= /usr/lib64/lustre/tests/sanity-lnet.sh:3635:test_253()
= /usr/lib64/lustre/tests/test-framework.sh:7114:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:7175:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:6998:run_test()
= /usr/lib64/lustre/tests/sanity-lnet.sh:3637:main()
Dumping lctl log to /autotest/autotest-2/2024-11-23/lustre-b_es6_0_full-part-1_730_209_a14f71b0-879e-45fb-9158-8cf8f4936f5e//sanity-lnet.test_253.*.1732368402.log
CMD: trevis-102vm9,trevis-108vm7.trevis.whamcloud.com,trevis-108vm8,trevis-47vm4 /usr/sbin/lctl dk > /autotest/autotest-2/2024-11-23/lustre-b_es6_0_full-part-1_730_209_a14f71b0-879e-45fb-9158-8cf8f4936f5e//sanity-lnet.test_253.debug_log.$(hostname -s).1732368402.log;
dmesg > /autotest/autotest-2/2024-11-23/lustre-b_es6_0_full-part-1_730_209_a14f71b0-879e-45fb-9158-8cf8f4936f5e//sanity-lnet.test_253.dmesg.$(hostname -s).1732368402.log
trevis-102vm9: invalid parameter 'dump_kernel'
trevis-102vm9: open(dump_kernel) failed: No such file or directory
pdsh@trevis-108vm7: trevis-102vm9: ssh exited with exit code 2
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-lnet test_253 - Failed cleanup
Alex's patch https://review.whamcloud.com/54552 ("LU-17671 libcfs: track each OBD_ALLOC()") may help debug this if it is reproducible.
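When checking reruns for the same leak, a small script along these lines could pull the obd_memory figures out of the collected test or dmesg logs. This is only a sketch: the regex is inferred from the single obdclass_exit() line quoted in this ticket, and the names LeakReport and find_leaks are made up for illustration; they are not part of the Lustre test framework.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class LeakReport(NamedTuple):
    """One obdclass_exit() leak message (hypothetical helper type)."""
    node: Optional[str]   # pdsh-style "host:" prefix, if the line has one
    max_bytes: int        # value after "obd_memory max:"
    leaked_bytes: int     # value after "leaked:"


# Pattern assumed from the one log line in this ticket; other Lustre
# versions may format the message differently.
_LEAK_RE = re.compile(
    r"^(?:(?P<node>[\w.-]+):\s+(?=\[))?"      # optional "host: " before "[timestamp]"
    r".*obdclass_exit\(\)\)\s+"
    r"obd_memory max:\s*(?P<max>\d+),\s*leaked:\s*(?P<leaked>\d+)"
)


def find_leaks(lines: Iterable[str]) -> List[LeakReport]:
    """Return a LeakReport for every line reporting a non-zero leak."""
    reports = []
    for line in lines:
        m = _LEAK_RE.match(line.strip())
        if m and int(m.group("leaked")) > 0:
            reports.append(
                LeakReport(m.group("node"), int(m.group("max")), int(m.group("leaked")))
            )
    return reports


if __name__ == "__main__":
    sample = [
        "trevis-102vm9: [43789.003225] LustreError: 24308:0:"
        "(class_obd.c:841:obdclass_exit()) obd_memory max: 520491885, leaked: 32784",
        "trevis-102vm9: Memory leaks detected",
    ]
    for report in find_leaks(sample):
        print(f"{report.node}: leaked {report.leaked_bytes} bytes "
              f"(max {report.max_bytes})")
```

Feeding it the lines of a downloaded test_log or dmesg file would show whether the leaked byte count (32784 here) is the same on every occurrence, which may help narrow down which allocation is involved.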