[LU-9938] unload_modules() should fail on remote node errors or memory leaks Created: 01/Sep/17 Updated: 29/Jan/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | John Hammond | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | test | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The current memory leak detection scheme in TF is not very effective. Comparing single-node runs with what I see in AT, I think we are failing to fail when a memory leak occurs on a remote node.
unload_modules() {
    wait_exit_ST client # bug 12845
    $LUSTRE_RMMOD ldiskfs || return 2
    if $LOAD_MODULES_REMOTE; then
        local list=$(comma_list $(remote_nodes_list))
        if [ -n "$list" ]; then
            echo "unloading modules on: '$list'"
            do_rpc_nodes "$list" $LUSTRE_RMMOD ldiskfs
            do_rpc_nodes "$list" check_mem_leak
        fi
    fi
    local sbin_mount=$(readlink -f /sbin)/mount.lustre
    if grep -qe "$sbin_mount " /proc/mounts; then
        umount $sbin_mount || true
        [ -s $sbin_mount ] && ! grep -q "STUB MARK" $sbin_mount ||
            rm -f $sbin_mount
    fi
    check_mem_leak || return 254
...
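In the excerpt above, the two do_rpc_nodes calls are the only steps whose exit status is not checked, unlike the local "|| return 2" and "|| return 254" lines. A minimal sketch of propagating the remote status might look like the following. It assumes do_rpc_nodes returns non-zero when the command fails on any node, which I have not verified. The stubbed do_rpc_nodes, the REMOTE_LEAK switch, and the unload_modules_remote name are hypothetical and exist only so the sketch runs on its own.

```shell
# Stubs (assumptions for illustration, not the real test-framework.sh helpers).
LUSTRE_RMMOD=${LUSTRE_RMMOD:-true}
REMOTE_LEAK=${REMOTE_LEAK:-false}   # hypothetical switch to simulate a remote leak

do_rpc_nodes() {
    local nodes=$1
    shift
    # Simulate check_mem_leak failing on a remote node when REMOTE_LEAK=true.
    if [ "$1" = "check_mem_leak" ] && [ "$REMOTE_LEAK" = "true" ]; then
        echo "stub: memory leak reported on $nodes" >&2
        return 1
    fi
    return 0
}

# Hypothetical helper: the remote half of unload_modules(), with each
# remote call's exit status propagated instead of discarded.
unload_modules_remote() {
    local list=$1

    # Nothing to do when there are no remote nodes.
    [ -n "$list" ] || return 0

    echo "unloading modules on: '$list'"
    # Same return codes as the local path: 2 for rmmod, 254 for a leak.
    do_rpc_nodes "$list" $LUSTRE_RMMOD ldiskfs || return 2
    do_rpc_nodes "$list" check_mem_leak || return 254
}
```

With this shape, unload_modules() could call the helper as "unload_modules_remote "$list" || return $?" so that a remote leak yields the same 254 as a local one.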
Furthermore, cleanupall() does not check the return value of unload_modules(), so we may be missing memory leaks when we clean up at the end of most test scripts. |
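As a rough illustration of the second point, here is a sketch of a cleanup wrapper that checks the return value of unload_modules(). This is not the actual body of cleanupall(): the cleanupall_checked name, the stubbed unload_modules, and the UNLOAD_RC variable are hypothetical. The only detail taken from the excerpt is that 254 is the code returned when check_mem_leak fails.

```shell
# Stub (assumption): stands in for the real unload_modules() and returns
# whatever UNLOAD_RC is set to, so both outcomes can be exercised.
UNLOAD_RC=${UNLOAD_RC:-0}
unload_modules() {
    return "$UNLOAD_RC"
}

# Hypothetical variant of cleanupall() that reports and propagates the
# result of unload_modules() instead of ignoring it.
cleanupall_checked() {
    local rc=0

    # Other teardown steps would run before this point.
    unload_modules || rc=$?

    if [ "$rc" -eq 254 ]; then
        echo "memory leak detected during module unload" >&2
    elif [ "$rc" -ne 0 ]; then
        echo "unload_modules failed: rc=$rc" >&2
    fi
    return "$rc"
}
```

A test script could then end with something like "cleanupall_checked || exit $?" so the leak is reflected in the script's exit status.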