[LU-2988] conf-sanity 66: Modules still loaded Created: 19/Mar/13  Updated: 02/Apr/13  Resolved: 02/Apr/13

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Minor
Reporter: Li Wei (Inactive) Assignee: Li Wei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 7281

 Description   

It is easy to reproduce this on a single VM by running only conf-sanity test 66:

== conf-sanity test 66: replace nids == 15:30:00 (1363678200)
Loading modules from /root/lustre-master/lustre/tests/..
detected 1 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=-1
subsystem_debug=all -lnet -lnd -pinger
gss/krb5 is not supported
quota/lquota options: 'hash_lqs_cur_bits=3'
start mds service on linux
Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/mds1
Started lustre-MDT0000
start ost1 service on linux
Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
Started lustre-OST0000
mount lustre on /mnt/lustre.....
Starting client: linux: -o user_xattr,flock linux@tcp:/lustre /mnt/lustre
replace_nids should fail if MDS, OSTs and clients are UP
error: replace_nids: Operation now in progress
umount lustre on /mnt/lustre.....
Stopping client linux /mnt/lustre (opts:)
sh: lsof: command not found
replace_nids should fail if MDS and OSTs are UP
error: replace_nids: Operation now in progress
stop ost1 service on linux
Stopping /mnt/ost1 (opts:-f) on linux
replace_nids should fail if MDS is UP
error: replace_nids: Operation now in progress
stop mds service on linux
Stopping /mnt/mds1 (opts:-f) on linux
start mds service on linux
Starting mds1: -o nosvc,loop  /tmp/lustre-mdt1 /mnt/mds1
Started lustre-MDT0000
command should accept two parameters
replace primary NIDs for a device
usage: replace_nids <device> <nid1>[,nid2,nid3]
correct device name should be passed
error: replace_nids: Invalid argument
wrong nids list should not destroy the system
replace primary NIDs for a device
usage: replace_nids <device> <nid1>[,nid2,nid3]
replace OST nid
command should accept two parameters
replace primary NIDs for a device
usage: replace_nids <device> <nid1>[,nid2,nid3]
wrong nids list should not destroy the system
replace primary NIDs for a device
usage: replace_nids <device> <nid1>[,nid2,nid3]
replace MDS nid
stop mds service on linux
Stopping /mnt/mds1 (opts:-f) on linux
start mds service on linux
Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/mds1
Started lustre-MDT0000
start ost1 service on linux
Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
Started lustre-OST0000
mount lustre on /mnt/lustre.....
Starting client: linux: -o user_xattr,flock linux@tcp:/lustre /mnt/lustre
setup single mount lustre success
umount lustre on /mnt/lustre.....
Stopping client linux /mnt/lustre (opts:)
sh: lsof: command not found
stop ost1 service on linux
Stopping /mnt/ost1 (opts:-f) on linux
stop mds service on linux
Stopping /mnt/mds1 (opts:-f) on linux
Modules still loaded: 
ldiskfs/ldiskfs/ldiskfs.o lustre/mdd/mdd.o lustre/mgs/mgs.o lustre/quota/lquota.o lustre/mgc/mgc.o lustre/fid/fid.o lustre/fld/fld.o lustre/ptlrpc/ptlrpc.o lustre/obdclass/obdclass.o lustre/lvfs/lvfs.o lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
Stopping clients: linux /mnt/lustre (opts:)
Stopping clients: linux /mnt/lustre2 (opts:)
Loading modules from /root/lustre-master/lustre/tests/..
detected 1 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=-1
subsystem_debug=all -lnet -lnd -pinger
gss/krb5 is not supported
Formatting mgs, mds, osts
Format mds1: /tmp/lustre-mdt1
Format ost1: /tmp/lustre-ost1
Format ost2: /tmp/lustre-ost2
Resetting fail_loc on all nodes...done.
PASS 66 (69s)
............== conf-sanity test complete, duration 113 sec == 15:31:10 (1363678270)

This prevents some of my new tests, which run after test 66, from unloading and reloading the Lustre kernel modules. The root cause is that the "lctl replace_nids" implementation may leak lu_envs on certain error paths: a leaked lu_env is never finalized, which leaves references behind that keep the modules busy at unload time (sketched below).
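For illustration, here is a minimal sketch of the init/fini pairing the description implies; it is not the actual code touched by the patch, and the function names replace_nids_sketch(), parse_nid_list() and update_llog_nids() are hypothetical. The real APIs lu_env_init()/lu_env_fini() must be paired on every exit path, including error paths; an early return between them leaks the env.

static int replace_nids_sketch(struct obd_device *obd, char *nids)
{
        struct lu_env env;
        int rc;

        rc = lu_env_init(&env, LCT_MG_THREAD);
        if (rc != 0)
                return rc;              /* init failed, nothing to clean up */

        rc = parse_nid_list(nids);      /* hypothetical helper */
        if (rc != 0)
                goto out;               /* a bare "return rc" here would leak env */

        rc = update_llog_nids(&env, obd, nids);  /* hypothetical helper */
out:
        lu_env_fini(&env);              /* paired with lu_env_init() above */
        return rc;
}

A leaked env presumably keeps the lu_context keys of the modules it uses referenced, which would be consistent with the long "Modules still loaded" list above.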

I'll post a patch shortly.



 Comments   
Comment by Li Wei (Inactive) [ 19/Mar/13 ]

http://review.whamcloud.com/5765

Comment by Li Wei (Inactive) [ 02/Apr/13 ]

The patch has landed to master.
