Details
- Bug
- Resolution: Fixed
- Minor
- Lustre 2.4.0
- None
- 3
- 7281
Description
It is easy to reproduce this on a single VM by running only conf-sanity test 66:
== conf-sanity test 66: replace nids == 15:30:00 (1363678200)
Loading modules from /root/lustre-master/lustre/tests/..
detected 1 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=-1
subsystem_debug=all -lnet -lnd -pinger
gss/krb5 is not supported
quota/lquota options: 'hash_lqs_cur_bits=3'
start mds service on linux
Starting mds1: -o loop /tmp/lustre-mdt1 /mnt/mds1
Started lustre-MDT0000
start ost1 service on linux
Starting ost1: -o loop /tmp/lustre-ost1 /mnt/ost1
Started lustre-OST0000
mount lustre on /mnt/lustre.....
Starting client: linux: -o user_xattr,flock linux@tcp:/lustre /mnt/lustre
replace_nids should fail if MDS, OSTs and clients are UP
error: replace_nids: Operation now in progress
umount lustre on /mnt/lustre.....
Stopping client linux /mnt/lustre (opts:)
sh: lsof: command not found
replace_nids should fail if MDS and OSTs are UP
error: replace_nids: Operation now in progress
stop ost1 service on linux
Stopping /mnt/ost1 (opts:-f) on linux
replace_nids should fail if MDS is UP
error: replace_nids: Operation now in progress
stop mds service on linux
Stopping /mnt/mds1 (opts:-f) on linux
start mds service on linux
Starting mds1: -o nosvc,loop /tmp/lustre-mdt1 /mnt/mds1
Started lustre-MDT0000
command should accept two parameters
replace primary NIDs for a device
usage: replace_nids <device> <nid1>[,nid2,nid3]
correct device name should be passed
error: replace_nids: Invalid argument
wrong nids list should not destroy the system
replace primary NIDs for a device
usage: replace_nids <device> <nid1>[,nid2,nid3]
replace OST nid
command should accept two parameters
replace primary NIDs for a device
usage: replace_nids <device> <nid1>[,nid2,nid3]
wrong nids list should not destroy the system
replace primary NIDs for a device
usage: replace_nids <device> <nid1>[,nid2,nid3]
replace MDS nid
stop mds service on linux
Stopping /mnt/mds1 (opts:-f) on linux
start mds service on linux
Starting mds1: -o loop /tmp/lustre-mdt1 /mnt/mds1
Started lustre-MDT0000
start ost1 service on linux
Starting ost1: -o loop /tmp/lustre-ost1 /mnt/ost1
Started lustre-OST0000
mount lustre on /mnt/lustre.....
Starting client: linux: -o user_xattr,flock linux@tcp:/lustre /mnt/lustre
setup single mount lustre success
umount lustre on /mnt/lustre.....
Stopping client linux /mnt/lustre (opts:)
sh: lsof: command not found
stop ost1 service on linux
Stopping /mnt/ost1 (opts:-f) on linux
stop mds service on linux
Stopping /mnt/mds1 (opts:-f) on linux
Modules still loaded:
  ldiskfs/ldiskfs/ldiskfs.o
  lustre/mdd/mdd.o
  lustre/mgs/mgs.o
  lustre/quota/lquota.o
  lustre/mgc/mgc.o
  lustre/fid/fid.o
  lustre/fld/fld.o
  lustre/ptlrpc/ptlrpc.o
  lustre/obdclass/obdclass.o
  lustre/lvfs/lvfs.o
  lnet/klnds/socklnd/ksocklnd.o
  lnet/lnet/lnet.o
  libcfs/libcfs/libcfs.o
Stopping clients: linux /mnt/lustre (opts:)
Stopping clients: linux /mnt/lustre2 (opts:)
Loading modules from /root/lustre-master/lustre/tests/..
detected 1 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=-1
subsystem_debug=all -lnet -lnd -pinger
gss/krb5 is not supported
Formatting mgs, mds, osts
Format mds1: /tmp/lustre-mdt1
Format ost1: /tmp/lustre-ost1
Format ost2: /tmp/lustre-ost2
Resetting fail_loc on all nodes...done.
PASS 66 (69s)
............== conf-sanity test complete, duration 113 sec == 15:31:10 (1363678270)
The "Modules still loaded" messages above show that module references are left behind, and this prevents some of my new tests, placed after test 66, from removing and reloading the Lustre kernel modules. The root cause is that the "lctl replace_nids" implementation can leak lu_envs on certain error paths.
I'll post a patch shortly.
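For reference, a minimal sketch of the leak pattern is below. Only lu_env_init(), lu_env_fini(), LCT_MG_THREAD and struct obd_device are taken from the Lustre tree; the function name, argument checks and error values are made up for illustration and are not the actual MGS implementation.

/*
 * Hypothetical sketch of the error-path leak; not the real MGS code.
 */
#include <linux/errno.h>
#include <obd.h>		/* struct obd_device */
#include <lu_object.h>		/* lu_env_init(), lu_env_fini(), LCT_MG_THREAD */

static int replace_nids_sketch(struct obd_device *mgs_obd,
			       char *devname, char *nids)
{
	struct lu_env env;
	int rc;

	rc = lu_env_init(&env, LCT_MG_THREAD);
	if (rc != 0)
		return rc;	/* env never initialized, nothing to undo */

	if (devname == NULL || nids == NULL) {
		/*
		 * The leak: returning directly from an error path like this
		 * ("return -EINVAL;") skips lu_env_fini() below, so the env
		 * keeps its context references and the modules stay pinned.
		 */
		rc = -EINVAL;
		goto out;
	}

	/* ... locate the target device and rewrite its NIDs in the MGS
	 * configuration logs (omitted) ... */

out:
	lu_env_fini(&env);	/* must run on every path after a successful init */
	return rc;
}

The fix is simply to route every post-init error path through the cleanup label so that lu_env_fini() always runs before returning.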