Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.5.0
-
3
-
10871
Description
Running sanity-hsm I see that we're leaking a log_handle. You can see it by mounting, running full sanity-hsm, then unmounting or by mounting, running 107 222a 222b 224 226 250, then unmounting.
# export OSTSIZE=1000000 # export agt1_HOST=t # export PTLDEBUG="vfstrace rpctrace dlmtrace neterror ha config ioctl super hsm other malloc info" # export DEBUG_SIZE=2048 # dir=/tmp/hsm-leak.$$ # mkdir $direxport OSTSIZE=1000000 # export agt1_HOST=t # export PTLDEBUG="vfstrace rpctrace dlmtrace neterror ha config ioctl super hsm other malloc info" # export DEBUG_SIZE=2048 # dir=/tmp/hsm-leak.$$ # mkdir $dir || exit 1 # echo $dir # rm -rf /tmp/debug-leak.* /tmp/lustre-log.* /tmp/sanity-hsm.log /tmp/test_logs/ # export ONLY="107 222a 222b 224 226 250" # dmesg -c > /dev/null # llmount.sh # sh ~/lustre-release/lustre/tests/sanity-hsm.sh # llmountcleanup.sh # dmesg > $dir/dmesg # mv /tmp/debug-leak.* /tmp/lustre-log.* /tmp/sanity-hsm.log /tmp/test_logs/ $dir/
Stopping clients: t /mnt/lustre (opts:) Stopping clients: t /mnt/lustre2 (opts:) ... only running test 107 222a 222b 224 226 250 excepting tests: 31a 34 35 36 200 201 221 223a 223b 225 Killing existing copytools on t Set HSM on and start Changed after 0s: from '' to 'stopped' Waiting 20 secs for update Updated after 7s: wanted 'enabled' got 'enabled' Start copytool Purging archive on t Starting copytool agt1 on t Set sanity-hsm HSM policy == sanity-hsm test 107: Copytool re-register after MDS restart == 16:48:19 (1380750499) Wakeup copytool agt1 on t Changed after 0s: from '' to 'STARTED' Waiting 100 secs for update Failing mds1 on t Stopping /mnt/mds1 (opts:) on t reboot facets: mds1 Failover mds1 to t 16:48:31 (1380750511) waiting for t network 900 secs ... 16:48:31 (1380750511) network interface is UP mount facets: mds1 Starting mds1: -o loop /tmp/lustre-mdt1 /mnt/mds1 Started lustre-MDT0000 t: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Changed after 0s: from '' to 'STARTED' Waiting 100 secs for update Copytool is stopped on t Resetting fail_loc on all nodes...done. PASS 107 (16s) == sanity-hsm test 222a: Changelog for explicit restore == 16:48:35 (1380750515) Purging archive on t Starting copytool agt1 on t lhsmtool_posix[18491]: action=1 src=d0.sanity-hsm/d222/f.sanity-hsm.222a dst=/mnt/lustre/d0.sanity-hsm/d222/f.sanity-hsm.222a mount_point=/mnt/lustre lhsmtool_posix[18491]: importing '/mnt/lustre/d0.sanity-hsm/d222/f.sanity-hsm.222a' from '/tmp/arc1/d0.sanity-hsm/d222/f.sanity-hsm.222a' lhsmtool_posix[18491]: imported '/mnt/lustre/d0.sanity-hsm/d222/f.sanity-hsm.222a' from '/tmp/arc1/0006/0000/0400/0000/0002/0000/0x200000400:0x6:0x0'=='/tmp/arc1/d0.sanity-hsm/d222/f.sanity-hsm.222a' lhsmtool_posix[18491]: process finished, errs: 0 major, 0 minor, rc=0 (Success) mdd.lustre-MDT0000.changelog_mask=+hsm Changed after 0s: from '' to 'STARTED' Waiting 100 secs for update Copytool is stopped on t lustre-MDT0000: Deregistered changelog user 'cl1' Resetting fail_loc on all nodes...done. PASS 222a (3s) == sanity-hsm test 222b: Changelog for implicit restore == 16:48:38 (1380750518) Purging archive on t Starting copytool agt1 on t mdd.lustre-MDT0000.changelog_mask=+hsm Changed after 0s: from '' to 'STARTED' Waiting 100 secs for update 8e700881220db1269deb14847f7ffd4d /mnt/lustre/d0.sanity-hsm/d222/f.sanity-hsm.222b Copytool is stopped on t lustre-MDT0000: Deregistered changelog user 'cl2' Resetting fail_loc on all nodes...done. PASS 222b (2s) == sanity-hsm test 224: Changelog for remove == 16:48:40 (1380750520) Purging archive on t Starting copytool agt1 on t mdd.lustre-MDT0000.changelog_mask=+hsm Changed after 0s: from '' to 'STARTED' Waiting 100 secs for update Copytool is stopped on t lustre-MDT0000: Deregistered changelog user 'cl3' Resetting fail_loc on all nodes...done. PASS 224 (3s) == sanity-hsm test 226: changelog for last rm/mv with exiting archive == 16:48:43 (1380750523) Purging archive on t Starting copytool agt1 on t 0x200000401:0x7:0x0 mdd.lustre-MDT0000.changelog_mask=+hsm Changed after 0s: from '' to 'STARTED' Waiting 100 secs for update Changed after 0s: from '' to 'STARTED' Waiting 100 secs for update Copytool is stopped on t lustre-MDT0000: Deregistered changelog user 'cl4' Resetting fail_loc on all nodes...done. PASS 226 (4s) == sanity-hsm test 250: Coordinator max request == 16:48:47 (1380750527) Purging archive on t Starting copytool agt1 on t mdt.lustre-MDT0000.hsm_control=disabled 10+0 records in 10+0 records out 10485760 bytes (10 MB) copied, 1.87602 s, 5.6 MB/s 10+0 records in 10+0 records out 10485760 bytes (10 MB) copied, 1.69116 s, 6.2 MB/s 10+0 records in 10+0 records out 10485760 bytes (10 MB) copied, 1.75049 s, 6.0 MB/s 10+0 records in 10+0 records out 10485760 bytes (10 MB) copied, 1.83329 s, 5.7 MB/s 10+0 records in 10+0 records out 10485760 bytes (10 MB) copied, 1.66484 s, 6.3 MB/s 10+0 records in 10+0 records out 10485760 bytes (10 MB) copied, 1.65743 s, 6.3 MB/s 10+0 records in 10+0 records out 10485760 bytes (10 MB) copied, 1.65997 s, 6.3 MB/s 10+0 records in 10+0 records out 10485760 bytes (10 MB) copied, 1.65964 s, 6.3 MB/s 10+0 records in 10+0 records out 10485760 bytes (10 MB) copied, 1.65768 s, 6.3 MB/s mdt.lustre-MDT0000.hsm_control=enabled max=3 started=2 waiting=6 ... max=3 started=0 waiting=0 Copytool is stopped on t Resetting fail_loc on all nodes...done. PASS 250 (107s) == sanity-hsm test complete, duration 145 sec == 16:50:34 (1380750634) Stopping clients: t /mnt/lustre2 (opts:) Stopping client t /mnt/lustre2 opts: Stopping clients: t /mnt/lustre (opts:-f) Stopping client t /mnt/lustre opts:-f Stopping clients: t /mnt/lustre2 (opts:-f) Stopping /mnt/mds1 (opts:-f) on t Stopping /mnt/ost1 (opts:-f) on t Stopping /mnt/ost2 (opts:-f) on t LustreError: 21879:0:(class_obd.c:730:cleanup_obdclass()) obd_memory max: 70547704, leaked: 8376
# cd /tmp/hsm-leak.* # perl /root/lustre-release/lustre/tests/leak_finder.pl debug-leak.* 2>&1 | grep Leak *** Leak: 184 bytes allocated at ffff88019d691840 (llog.c:llog_alloc_handle:66, debug file line 77331) *** Leak: 8192 bytes allocated at ffff88018e8c2000 (llog.c:llog_init_handle:215, debug file line 77334)