[LU-18494] sanity-lnet test_253: Error: 'Failed cleanup' also memory leaks detected

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • 3

    Description

      This issue was created by maloo for sbansal <sbansal@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/91397e75-b4cb-4d4d-820b-2f408aca5262

      test_253 failed with the following error:

      Failed cleanup
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-b_es6_0/730 - 4.18.0-513.24.1.el8_9.aarch64
      servers: https://build.whamcloud.com/job/lustre-b_es6_0/730 - 3.10.0-1160.102.1.el7_lustre.ddn17.x86_64

      Error: 'Failed cleanup'
      Failure Rate: 0.00% of most recent 15 runs, 85 skipped (all branches)
      MDS 1 (trevis-102vm9)
      sanity-lnet.test_253.debug_log.trevis-102vm9.1732368402.log
      sanity-lnet.test_253.dmesg.trevis-102vm9.1732368402.log
      Client 1 (trevis-108vm7)
      sanity-lnet.test_253.debug_log.trevis-108vm7.1732368402.log
      sanity-lnet.test_253.dmesg.trevis-108vm7.1732368402.log
      sanity-lnet.test_253.test_log.trevis-108vm7.log
      local NI(s):

      • nid: 10.240.44.210@tcp
        status: up
        interfaces:
        0: eth0
      • primary nid: 10.240.44.52@tcp
      • nid: 10.240.44.52@tcp
        health stats:
        health value: 1000
        debug=+net
        /usr/sbin/lnetctl set transaction_timeout 10
        Added delay rule 10.240.44.210@tcp->10.240.44.52@tcp (1/1)
        Issued 8 pings to 10.240.44.52@tcp from 10.240.44.210@tcp
        Removed 1 delay rules
        manage:
      • ping:
        errno: -1
        descr: failed to ping 10.240.44.52@tcp: Connection timed out

      ping:

      • primary nid: 10.240.44.52@tcp
        Multi-Rail: True
        peer ni:
      • nid: 10.240.44.52@tcp
        /usr/sbin/lnetctl set transaction_timeout 150
        CMD: trevis-108vm7.trevis.whamcloud.com lsmod | grep lnet > /dev/null &&
        lctl dl | grep ' ST ' || true
        modules unloaded.
        CMD: trevis-102vm9 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/opt/iozone/bin:/usr/lib64/openmpi/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh unload_modules_local
        trevis-102vm9: trevis-102vm9.trevis.whamcloud.com: executing unload_modules_local
        trevis-102vm9: [43789.003225] LustreError: 24308:0:(class_obd.c:841:obdclass_exit()) obd_memory max: 520491885, leaked: 32784
        trevis-102vm9:
        trevis-102vm9: mv: cannot stat '/tmp/debug': No such file or directory
        trevis-102vm9: Memory leaks detected
        pdsh@trevis-108vm7: trevis-102vm9: ssh exited with exit code 254
        sanity-lnet test_253: @@@@@@ FAIL: Failed cleanup
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6759:error()
        = /usr/lib64/lustre/tests/sanity-lnet.sh:2016:cleanup_health_test()
        = /usr/lib64/lustre/tests/sanity-lnet.sh:3635:test_253()
        = /usr/lib64/lustre/tests/test-framework.sh:7114:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:7175:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:6998:run_test()
        = /usr/lib64/lustre/tests/sanity-lnet.sh:3637:main()
        Dumping lctl log to /autotest/autotest-2/2024-11-23/lustre-b_es6_0_full-part-1_730_209_a14f71b0-879e-45fb-9158-8cf8f4936f5e//sanity-lnet.test_253.*.1732368402.log
        CMD: trevis-102vm9,trevis-108vm7.trevis.whamcloud.com,trevis-108vm8,trevis-47vm4 /usr/sbin/lctl dk > /autotest/autotest-2/2024-11-23/lustre-b_es6_0_full-part-1_730_209_a14f71b0-879e-45fb-9158-8cf8f4936f5e//sanity-lnet.test_253.debug_log.$(hostname -s).1732368402.log;
        dmesg > /autotest/autotest-2/2024-11-23/lustre-b_es6_0_full-part-1_730_209_a14f71b0-879e-45fb-9158-8cf8f4936f5e//sanity-lnet.test_253.dmesg.$(hostname -s).1732368402.log
        trevis-102vm9: invalid parameter 'dump_kernel'
        trevis-102vm9: open(dump_kernel) failed: No such file or directory
        pdsh@trevis-108vm7: trevis-102vm9: ssh exited with exit code 2

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-lnet test_253 - Failed cleanup

          Activity

            Andreas Dilger (adilger) added a comment: Alex's patch https://review.whamcloud.com/54552 ("LU-17671 libcfs: track each OBD_ALLOC()") may help debug this if it is reproducible.
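
            For triaging further occurrences, the obdclass_exit() leak summary can be pulled out of a saved dmesg or test log with a small script. The following is only a sketch: the function name find_obd_leaks and the regular expression are illustrative, and it assumes the line format seen in the test log of this ticket (other branches may print it differently).

```python
import re

# Assumed format, copied from the test log in this ticket:
#   ...(class_obd.c:841:obdclass_exit()) obd_memory max: 520491885, leaked: 32784
LEAK_RE = re.compile(
    r"obdclass_exit\(\)\)\s+obd_memory max:\s*(?P<max>\d+),\s*leaked:\s*(?P<leaked>\d+)"
)


def find_obd_leaks(lines):
    """Yield (max_bytes, leaked_bytes) for each leak summary line found."""
    for line in lines:
        match = LEAK_RE.search(line)
        if match:
            yield int(match.group("max")), int(match.group("leaked"))


# Two lines taken verbatim from the log above.
sample = [
    "trevis-102vm9: [43789.003225] LustreError: 24308:0:(class_obd.c:841:obdclass_exit()) "
    "obd_memory max: 520491885, leaked: 32784",
    "trevis-102vm9: Memory leaks detected",
]
leaks = list(find_obd_leaks(sample))
print(leaks)
```

            Running it over the per-node dmesg logs from a test session would show which nodes report a non-zero leaked count and how large it is.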

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Maloo (maloo)