[LU-7230] memory leak in sanityn.sh 90 & 91 Created: 30/Sep/15  Updated: 03/Nov/15  Resolved: 03/Nov/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Critical
Reporter: Di Wang Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-7231 ENOSPC on remote MDT might create a i... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
[root@testnode tests]# MDSCOUNT=4 sh llmount.sh 
Stopping clients: testnode /mnt/lustre (opts:)
Stopping clients: testnode /mnt/lustre2 (opts:)
Loading modules from /work/lustre-release_new/lustre/tests/..
detected 8 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=vfstrace rpctrace dlmtrace neterror ha config 		      ioctl super lfsck
subsystem_debug=all -lnet -lnd -pinger
quota/lquota options: 'hash_lqs_cur_bits=3'
Formatting mgs, mds, osts
Format mds1: /tmp/lustre-mdt1
Format mds2: /tmp/lustre-mdt2
Format mds3: /tmp/lustre-mdt3
Format mds4: /tmp/lustre-mdt4
Format ost1: /tmp/lustre-ost1
Format ost2: /tmp/lustre-ost2
Checking servers environments
Checking clients testnode environments
Loading modules from /work/lustre-release_new/lustre/tests/..
detected 8 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=vfstrace rpctrace dlmtrace neterror ha config 		      ioctl super lfsck
subsystem_debug=all -lnet -lnd -pinger
Setup mgs, mdt, osts
Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/mds1
Started lustre-MDT0000
Starting mds2:   -o loop /tmp/lustre-mdt2 /mnt/mds2
Started lustre-MDT0001
Starting mds3:   -o loop /tmp/lustre-mdt3 /mnt/mds3
Started lustre-MDT0002
Starting mds4:   -o loop /tmp/lustre-mdt4 /mnt/mds4
Started lustre-MDT0003
Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
Started lustre-OST0000
Starting ost2:   -o loop /tmp/lustre-ost2 /mnt/ost2
Started lustre-OST0001
Starting client: testnode:  -o user_xattr,flock testnode@tcp:/lustre /mnt/lustre
Using TIMEOUT=20
seting jobstats to procname_uid
Setting lustre.sys.jobid_var from disable to procname_uid
Waiting 90 secs for update
Updated after 3s: wanted 'procname_uid' got 'procname_uid'
disable quota as required
[root@testnode tests]# MDSCOUNT=4 ONLY="90 91" sh sanityn.sh
Logging to shared log directory: /tmp/test_logs/1443440481
Starting client testnode:  -o user_xattr,flock testnode@tcp:/lustre /mnt/lustre2
Started clients testnode: 
testnode@tcp:/lustre on /mnt/lustre2 type lustre (rw,user_xattr,flock)
testnode: Checking config lustre mounted on /mnt/lustre
Checking servers environments
Checking clients testnode environments
Using TIMEOUT=20
disable quota as required
osd-ldiskfs.track_declares_assert=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.073335 s, 14.3 MB/s
running as uid/gid/euid/egid 500/500/500/500, groups:
 [touch] [/mnt/lustre/d0_runas_test/f6126]
excepting tests: 14b 18c 19 28 29 35
skipping tests SLOW=no: 33a
192.168.1.61@tcp:/lustre /mnt/lustre2 lustre rw,flock,user_xattr 0 0


== sanityn test 90: open/create and unlink striped directory == 04:41:22 (1443440482)
start pid 7559 to open/create under striped directory
start pid 7560 to unlink striped directory
sanityn.sh: line 3395:  7559 Terminated              ( cd $DIR1; while true; do
    $LFS mkdir -c$MDSCOUNT $tdir > /dev/null 2>&1; touch $tdir/f{0..3} > /dev/null 2>&1;
done )
sanityn.sh: line 3395:  7560 Terminated              ( cd $DIR2; while true; do
    rm -rf $tdir > /dev/null 2>&1;
done )
Resetting fail_loc on all nodes...done.
PASS 90 (180s)

== sanityn test 91: chmod and unlink striped directory == 04:44:22 (1443440662)
start pid 83746 to chmod striped directory
start pid 83747 to unlink striped directory
sanityn.sh: line 3432: 83746 Terminated              ( cd $DIR1; while true; do
    $LFS mkdir -c$MDSCOUNT $tdir > /dev/null 2>&1; chmod go+w $tdir > /dev/null 2>&1;
done )
sanityn.sh: line 3432: 83747 Terminated              ( cd $DIR2; while true; do
    rm -rf $tdir > /dev/null 2>&1;
done )
Resetting fail_loc on all nodes...done.
PASS 91 (180s)
cleanup: ======================================================
== sanityn test complete, duration 362 sec == 04:47:23 (1443440843)
Stopping clients: testnode /mnt/lustre2 (opts:)
[root@testnode tests]# MDSCOUNT=4 sh llmountcleanup.sh 
Stopping clients: testnode /mnt/lustre (opts:-f)
Stopping client testnode /mnt/lustre opts:-f
Stopping clients: testnode /mnt/lustre2 (opts:-f)
Stopping /mnt/mds1 (opts:-f) on testnode
Stopping /mnt/mds2 (opts:-f) on testnode
Stopping /mnt/mds3 (opts:-f) on testnode
Stopping /mnt/mds4 (opts:-f) on testnode
Stopping /mnt/ost1 (opts:-f) on testnode
Stopping /mnt/ost2 (opts:-f) on testnode
waited 0 for 31 ST ost OSS OSS_uuid 0
LustreError: 19915:0:(class_obd.c:638:cleanup_obdclass()) obd_memory max: 947765874, leaked: 4111712

Memory leaks detected
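
For context, the "obd_memory max: ..., leaked: ..." line above is printed by obdclass's allocation accounting when cleanup_obdclass() runs at module unload: tracked allocations add their size to a counter, frees subtract it, and whatever is still accounted for at unload is reported as leaked. The following userspace program is only an illustrative sketch of that accounting pattern, not the actual Lustre code:

#include <stdio.h>
#include <stdlib.h>

static long mem_cur;    /* bytes currently accounted for */
static long mem_max;    /* high-water mark */

static void *tracked_alloc(size_t size)
{
    void *ptr = calloc(1, size);

    if (ptr != NULL) {
        mem_cur += (long)size;
        if (mem_cur > mem_max)
            mem_max = mem_cur;
    }
    return ptr;
}

static void tracked_free(void *ptr, size_t size)
{
    free(ptr);
    mem_cur -= (long)size;
}

static void report_leaks(void)
{
    /* Analogue of the check in cleanup_obdclass(): anything still
     * accounted for at unload time is printed as a leak. */
    if (mem_cur != 0)
        printf("obd_memory max: %ld, leaked: %ld\n", mem_max, mem_cur);
}

int main(void)
{
    void *kept = tracked_alloc(4096);   /* never freed: simulates the leak */
    void *freed = tracked_alloc(512);

    tracked_free(freed, 512);
    (void)kept;
    report_leaks();
    return 0;
}

In this ticket, the patch referenced in the comments points at directory stripe metadata retained on the client (llite) side as the source of the leaked bytes.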


 Comments   
Comment by Gerrit Updater [ 30/Sep/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/16677
Subject: LU-7230 llite: clear dir stripe md in ll_iget
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6a7dd3bcb58d65f3e8fe1dd4c8973f5a88f21b5b
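
The patch subject points at the client-side inode cache: when ll_iget() initialises or reuses an inode for a striped directory, a previously attached directory stripe descriptor has to be dropped, otherwise repeatedly creating and unlinking the directory (exactly what tests 90 and 91 do) leaks one descriptor per iteration. The sketch below only illustrates that idea; the structure and helper names are simplified stand-ins, not the actual llite code:

#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the client inode info and its directory stripe
 * descriptor; the real structures live in llite/lmv. */
struct stripe_md {
    int sm_stripe_count;
};

struct inode_info {
    struct stripe_md *ii_dir_stripe_md;    /* NULL for unstriped directories */
};

/* Idea behind "clear dir stripe md in ll_iget": drop any stale stripe
 * descriptor before attaching new metadata to the inode, so nothing is
 * left behind when a striped directory is unlinked and recreated. */
static void clear_dir_stripe_md(struct inode_info *ii)
{
    free(ii->ii_dir_stripe_md);
    ii->ii_dir_stripe_md = NULL;
}

static void init_inode(struct inode_info *ii, struct stripe_md *new_md)
{
    clear_dir_stripe_md(ii);    /* the step the leak was missing */
    ii->ii_dir_stripe_md = new_md;
}

int main(void)
{
    struct inode_info ii;
    struct stripe_md *md1 = calloc(1, sizeof(*md1));
    struct stripe_md *md2 = calloc(1, sizeof(*md2));

    memset(&ii, 0, sizeof(ii));
    init_inode(&ii, md1);
    init_inode(&ii, md2);    /* without the clear above, md1 would leak */
    clear_dir_stripe_md(&ii);
    return 0;
}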

Comment by Di Wang [ 02/Oct/15 ]

This patch fixes quite a serious top->sub transaction issue (the original implementation forgot to transfer th_local/th_tag to the sub transaction for local transactions). Let's make it critical.
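
For background, a cross-MDT (DNE) update runs as one top-level transaction with a sub transaction per target, and the point here is that attributes the caller sets on the top handle (th_local, th_tag) were not being carried over to the sub handles for local transactions. The sketch below only illustrates the propagation; the structure and function names are stand-ins, not the actual target/update code:

#include <stdio.h>

/* Simplified stand-ins for the top-level and per-target transaction handles
 * used by cross-MDT updates; the real structure is Lustre's struct thandle. */
struct sketch_thandle {
    unsigned int th_local:1;    /* transaction is local-only */
    unsigned int th_tags;       /* journal handle tags */
    int          th_started;
};

/* The gist of the fix described above: before a sub transaction is started
 * on its target, copy the attributes the caller set on the top transaction,
 * so a local top transaction keeps th_local/th_tag(s) on every sub handle. */
static int sub_trans_start(struct sketch_thandle *top_th,
                           struct sketch_thandle *sub_th)
{
    sub_th->th_local = top_th->th_local;
    sub_th->th_tags  = top_th->th_tags;
    sub_th->th_started = 1;    /* stand-in for starting it via dt_trans_start() */
    return 0;
}

int main(void)
{
    struct sketch_thandle top = { .th_local = 1, .th_tags = 2 };
    struct sketch_thandle sub = { 0 };

    sub_trans_start(&top, &sub);
    printf("sub: local=%d tags=%u started=%d\n",
           (int)sub.th_local, sub.th_tags, sub.th_started);
    return 0;
}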

Comment by Gerrit Updater [ 03/Nov/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16677/
Subject: LU-7230 llite: clear dir stripe md in ll_iget
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d05051c34077c74e3c08f3723bf6084554c0daf8

Comment by Peter Jones [ 03/Nov/15 ]

Landed for 2.8
