[LU-9938] unload_modules() should fail on remote node errors or memory leaks Created: 01/Sep/17  Updated: 29/Jan/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: John Hammond Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: test

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The current memory leak detection scheme in test-framework (TF) is not very effective. Comparing single-node runs with what I see in autotest (AT), I think we are failing to fail when a memory leak occurs on a remote node.

unload_modules() {
	wait_exit_ST client # bug 12845

	$LUSTRE_RMMOD ldiskfs || return 2

	if $LOAD_MODULES_REMOTE; then
		local list=$(comma_list $(remote_nodes_list))
		if [ -n "$list" ]; then
			echo "unloading modules on: '$list'"
			do_rpc_nodes "$list" $LUSTRE_RMMOD ldiskfs
			do_rpc_nodes "$list" check_mem_leak
		fi
	fi

	local sbin_mount=$(readlink -f /sbin)/mount.lustre
	if grep -qe "$sbin_mount " /proc/mounts; then
		umount $sbin_mount || true
		[ -s $sbin_mount ] && ! grep -q "STUB MARK" $sbin_mount ||
			rm -f $sbin_mount
	fi

	check_mem_leak || return 254
	...
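
A minimal, self-contained sketch of the exit-status problem in the excerpt above. In the excerpt, the status of `do_rpc_nodes "$list" check_mem_leak` is never tested, so only the local `check_mem_leak` can produce the 254 return. The functions remote_check_mem_leak, local_check_mem_leak, unload_modules_current and unload_modules_proposed below are hypothetical stand-ins, not test-framework.sh code. The sketch also assumes that do_rpc_nodes returns non-zero when check_mem_leak fails on any remote node, which would need to be confirmed before relying on it.

```shell
#!/bin/bash
# Hypothetical stand-ins, NOT the real do_rpc_nodes/check_mem_leak.
# They only model the exit statuses involved.
remote_check_mem_leak() {
	# Pretend a remote node reported a leak (non-zero status).
	return 1
}

local_check_mem_leak() {
	# Pretend the local node is clean.
	return 0
}

# Current pattern: the remote check's status is discarded, so only the
# local check decides the return value.
unload_modules_current() {
	remote_check_mem_leak
	local_check_mem_leak || return 254
	return 0
}

# Proposed pattern (sketch): remember a remote failure, still run the
# local check, then report any failure to the caller.
unload_modules_proposed() {
	local rc=0

	remote_check_mem_leak || rc=254
	local_check_mem_leak || rc=254
	return "$rc"
}

rc=0; unload_modules_current || rc=$?
echo "current:  rc=$rc"    # prints "current:  rc=0"
rc=0; unload_modules_proposed || rc=$?
echo "proposed: rc=$rc"    # prints "proposed: rc=254"
```

Recording the remote failure in rc instead of returning immediately lets the rest of the unload path (the mount.lustre stub cleanup and the local leak check) still run, so a remote leak does not leave the local node half cleaned up.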

Furthermore, cleanupall() does not check the return value of unload_modules(), so we may be missing memory leaks that are detected when we clean up at the end of most test scripts.
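
Along the same lines, a sketch of a cleanup routine that passes the unload_modules() status back to its caller instead of dropping it. cleanupall_sketch, stop_services_stub and unload_modules_stub are hypothetical placeholders, not the real cleanupall() or its helpers. The stub returns 254 to mimic the leak path in the excerpt above.

```shell
#!/bin/bash
# Hypothetical placeholders, NOT the real test-framework.sh functions.
stop_services_stub() {
	# Stands in for whatever cleanup runs before module unload.
	return 0
}

unload_modules_stub() {
	# Mimic unload_modules() hitting "check_mem_leak || return 254".
	return 254
}

# Sketch: keep the unload status and hand it back to the caller.
cleanupall_sketch() {
	local rc=0

	stop_services_stub || return $?
	unload_modules_stub || rc=$?
	if [ "$rc" -ne 0 ]; then
		echo "unload_modules failed with rc=$rc (possible memory leak)" >&2
	fi
	return "$rc"
}

rc=0; cleanupall_sketch || rc=$?
echo "cleanupall rc=$rc"    # prints "cleanupall rc=254"
```

With the status propagated, the calling test script can decide whether to fail the run outright or just report the leak.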


Generated at Sat Feb 10 02:30:37 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.