Lustre / LU-9938

unload_modules() should fail on remote node errors or memory leaks

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor

    Description

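      The current memory leak detection scheme in test-framework.sh (TF) is not very effective. Comparing single-node runs with what I see in autotest (AT), I think we fail to report an error when a memory leak occurs on a remote node. In the excerpt below, the exit status of both do_rpc_nodes calls is discarded, whereas the local check_mem_leak returns 254 on failure.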

      unload_modules() {
              wait_exit_ST client # bug 12845

              $LUSTRE_RMMOD ldiskfs || return 2

              if $LOAD_MODULES_REMOTE; then
                      local list=$(comma_list $(remote_nodes_list))
                      if [ -n "$list" ]; then
                              echo "unloading modules on: '$list'"
                              do_rpc_nodes "$list" $LUSTRE_RMMOD ldiskfs
                              do_rpc_nodes "$list" check_mem_leak
                      fi
              fi

              local sbin_mount=$(readlink -f /sbin)/mount.lustre
              if grep -qe "$sbin_mount " /proc/mounts; then
                      umount $sbin_mount || true
                      [ -s $sbin_mount ] && ! grep -q "STUB MARK" $sbin_mount ||
                              rm -f $sbin_mount
              fi

              check_mem_leak || return 254
              ...
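
One possible shape for the remote branch is sketched below. It is not a tested patch. It assumes that do_rpc_nodes returns non-zero when the command fails on any remote node, which has not been verified here. The do_rpc_nodes and check_mem_leak definitions are local stubs, and unload_remote_modules and REMOTE_LEAK are hypothetical names used only so that the sketch runs outside the test framework.

```shell
#!/bin/bash
# Hypothetical stand-ins: the real do_rpc_nodes and check_mem_leak live in
# test-framework.sh. Their exit-status behaviour is ASSUMED here.
REMOTE_LEAK=${REMOTE_LEAK:-false}	# illustrative switch: simulate a remote leak

do_rpc_nodes() {
	local nodes=$1
	shift
	"$@"	# stub: run the command locally instead of on $nodes
}

check_mem_leak() {
	if [ "$REMOTE_LEAK" = true ]; then
		return 1	# stub: pretend a leak was detected
	fi
	return 0
}

# Sketch of the remote branch of unload_modules() with the exit status
# propagated instead of discarded. "true" stands in for "$LUSTRE_RMMOD ldiskfs".
unload_remote_modules() {
	local list=$1

	[ -n "$list" ] || return 0
	echo "unloading modules on: '$list'"
	do_rpc_nodes "$list" true || return 2
	# Reuse 254, the code the local check_mem_leak path already returns.
	do_rpc_nodes "$list" check_mem_leak || return 254
}
```

With the stubs above, `REMOTE_LEAK=true unload_remote_modules "oss1,mds1"` returns 254, and the same call with `REMOTE_LEAK=false` returns 0. Whether a remote leak should share the local return code or get its own is a design choice left open here.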
      

      Furthermore, cleanupall() does not check the return value of unload_modules(), so we may be missing memory leaks when we clean up at the end of most test scripts.
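
A minimal sketch of a caller that does check the return value follows. The function name cleanupall_checked and the UNLOAD_RC variable are hypothetical, unload_modules is stubbed so the example is self-contained, and the steps that the real cleanupall() performs before unloading modules are not reproduced.

```shell
#!/bin/bash
# Hypothetical stub: returns whatever UNLOAD_RC is set to (default 0).
# The real unload_modules() is the function quoted in this ticket.
unload_modules() {
	return "${UNLOAD_RC:-0}"
}

# Sketch of a cleanup wrapper that surfaces a failed module unload
# (for example rc=254 from check_mem_leak) to its caller.
cleanupall_checked() {
	local rc=0

	# ... the existing cleanup steps would run here (not shown) ...

	unload_modules || rc=$?
	if [ "$rc" -ne 0 ]; then
		echo "unload_modules failed: rc=$rc" >&2
		return "$rc"
	fi
}
```

A test script ending in `cleanupall_checked || exit $?` would then fail when a leak is reported, rather than passing silently.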

          People

            Assignee: WC Triage (wc-triage)
            Reporter: John Hammond (jhammond)
            Votes: 0
            Watchers: 2