[LU-11956] conf-sanity test_32a failed with 1 Created: 11/Feb/19 Updated: 29/May/20 Resolved: 29/May/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.4 |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.4 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Dongyang Li |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
This issue was created by maloo for jianyu <yujian@whamcloud.com> This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/a63d2a9e-2b9c-11e9-b3df-52540065bddc test_32a failed with the following error: CMD: trevis-24vm6 mount -t lustre -o loop,exclude=t32fs-OST0000 /tmp/t32/mdt /tmp/t32/mnt/mdt CMD: trevis-24vm6 umount -d /tmp/t32/mnt/mdt CMD: trevis-24vm6 rm -rf /tmp/t32 conf-sanity test_32a: @@@@@@ FAIL: test_32a failed with 1 VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
| Comments |
| Comment by James Nunez (Inactive) [ 12/Feb/19 ] |
|
In the logs for this failure, we see the following in MDS1 console logs [40671.635023] LNet: 31304:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect [40704.635048] LNet: Removed LNI 10.9.5.40@tcp [40704.708067] LustreError: 31346:0:(class_obd.c:824:obdclass_exit()) obd_memory max: 203787793, leaked: 7294919 [40705.248049] LNet: HW NUMA nodes: 1, HW CPU cores: 2, npartitions: 1 Could this be related to |
| Comment by Sarah Liu [ 10/Apr/19 ] |
|
another one on b2_12 branch SLES12 SP3 DNE MDS console [34732.473160] Lustre: DEBUG MARKER: dmesg [34733.623267] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == conf-sanity test 32a: Upgrade \(not live\) ========================================================== 10:04:14 \(1554743054\) [34733.670594] Lustre: DEBUG MARKER: == conf-sanity test 32a: Upgrade (not live) ========================================================== 10:04:14 (1554743054) [34733.692376] Lustre: DEBUG MARKER: which tunefs.lustre [34733.730553] Lustre: DEBUG MARKER: find /usr/lib64/lustre/tests -maxdepth 1 -name 'disk*-ldiskfs.tar.bz2' [34735.594154] Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids [34735.630206] Lustre: DEBUG MARKER: mkdir -p /tmp/t32/mnt/mdt /tmp/t32/mnt/mdt1 /tmp/t32/mnt/ost [34735.663544] Lustre: DEBUG MARKER: tar xjvf /usr/lib64/lustre/tests/disk2_9-ldiskfs.tar.bz2 -S -C /tmp/t32 [34735.781726] Lustre: DEBUG MARKER: cat /tmp/t32/commit [34735.815364] Lustre: DEBUG MARKER: cat /tmp/t32/kernel [34735.848358] Lustre: DEBUG MARKER: cat /tmp/t32/arch [34735.881322] Lustre: DEBUG MARKER: cat /tmp/t32/bspace [34735.914886] Lustre: DEBUG MARKER: cat /tmp/t32/ispace [34735.948408] Lustre: DEBUG MARKER: test -f /tmp/t32/blimit [34735.982034] Lustre: DEBUG MARKER: cat /tmp/t32/blimit [34736.015054] Lustre: DEBUG MARKER: test -f /tmp/t32/ilimit [34736.048105] Lustre: DEBUG MARKER: cat /tmp/t32/ilimit [34736.174414] Lustre: DEBUG MARKER: test -f /tmp/t32/mdt2 [34736.207377] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug=-1 [34736.241592] Lustre: DEBUG MARKER: tunefs.lustre --dryrun /tmp/t32/mdt [34736.957068] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust [34737.260424] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-35vm4: executing \/usr\/sbin\/lctl device_list [34737.308969] Lustre: DEBUG MARKER: trevis-35vm4: executing /usr/sbin/lctl device_list [34737.337722] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust [34737.630641] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-35vm4: executing losetup -a [34737.677227] Lustre: DEBUG MARKER: trevis-35vm4: executing losetup -a [34737.796388] Lustre: DEBUG MARKER: mount -t lustre -o loop,exclude=t32fs-OST0000 /tmp/t32/mdt /tmp/t32/mnt/mdt [34738.028666] loop: module loaded [34738.385785] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc [34738.790192] Lustre: 10306:0:(obd_mount.c:972:lustre_check_exclusion()) Excluding t32fs-OST0000 (on exclusion list) [34753.912623] Lustre: t32fs-MDT0000: Imperative Recovery not enabled, recovery window 60-180 [34753.912630] Lustre: Skipped 7 previous similar messages [34754.784409] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.t32fs-MDT0000.uuid [34754.819328] Lustre: DEBUG MARKER: tunefs.lustre --dryrun /tmp/t32/ost [34755.614690] Lustre: DEBUG MARKER: mount -t lustre -onomgs -oloop,mgsnode=10.9.5.121@tcp /tmp/t32/ost /tmp/t32/mnt/ost [34755.924426] LDISKFS-fs (loop1): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc [34759.777761] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n obdfilter.t32fs-OST0000.uuid [34759.812960] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.failover.node=10.9.5.121@tcp [34759.851786] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.failover.node=10.9.5.121@tcp [34759.887341] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.osc.max_dirty_mb=15 [34759.923865] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdc.max_rpcs_in_flight=9 [34759.959701] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.lov.stripesize=4M [34759.995805] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdd.atime_diff=70 [34760.031367] Lustre: DEBUG MARKER: /usr/sbin/lctl pool_new t32fs.interop [34763.312597] LustreError: 10284:0:(mgc_request.c:1572:mgc_apply_recover_logs()) mgc: cannot find uuid by nid 10.9.5.121@tcp [34763.312609] Lustre: 10284:0:(mgc_request.c:1792:mgc_process_recover_nodemap_log()) MGC10.9.5.121@tcp: error processing recovery log t32fs-mdtir: rc = -2 [34763.312630] LustreError: 10284:0:(mgc_request.c:2127:mgc_process_log()) MGC10.9.5.121@tcp: recover log t32fs-mdtir failed, not fatal: rc = -2 [34763.320299] Lustre: 10570:0:(obd_mount.c:972:lustre_check_exclusion()) Excluding t32fs-OST0000 (on exclusion list) [34763.812009] Lustre: t32fs-OST0000: deleting orphan objects from 0x0:33 to 0x0:97 [34764.072473] Lustre: DEBUG MARKER: umount /tmp/t32/mnt/mdt [34764.093785] Lustre: Failing over t32fs-MDT0000 [34764.311493] LustreError: 10312:0:(osp_precreate.c:656:osp_precreate_send()) t32fs-OST0000-osc-MDT0000: can't precreate: rc = -5 [34764.311513] LustreError: 10312:0:(osp_precreate.c:1312:osp_precreate_thread()) t32fs-OST0000-osc-MDT0000: cannot precreate objects: rc = -5 [34764.789363] blk_update_request: I/O error, dev loop0, sector 0 [34764.794440] blk_update_request: I/O error, dev loop0, sector 0 [34764.799324] blk_update_request: I/O error, dev loop0, sector 0 [34764.802428] Lustre: DEBUG MARKER: umount /tmp/t32/mnt/ost [34764.806207] blk_update_request: I/O error, dev loop0, sector 0 [34764.811405] blk_update_request: I/O error, dev loop0, sector 0 [34764.815350] blk_update_request: I/O error, dev loop0, sector 0 [34770.821690] Lustre: 10627:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1554743086/real 1554743086] req@ffff880061977c80 x1630230038952912/t0(0) o39->t32fs-MDT0000-lwp-OST0000@0@lo:12/10 lens 224/224 e 0 to 1 dl 1554743092 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 [34770.821698] Lustre: 10627:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 4 previous similar messages [34778.313094] blk_update_request: I/O error, dev loop1, sector 0 [34778.317682] blk_update_request: I/O error, dev loop1, sector 0 [34778.322686] blk_update_request: I/O error, dev loop1, sector 0 [34778.328470] blk_update_request: I/O error, dev loop1, sector 0 [34778.332926] blk_update_request: I/O error, dev loop1, sector 0 [34778.334269] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust [34778.343911] blk_update_request: I/O error, dev loop1, sector 0 [34778.621078] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-35vm4: executing \/usr\/sbin\/lustre_rmmod ldiskfs [34778.661091] Lustre: DEBUG MARKER: trevis-35vm4: executing /usr/sbin/lustre_rmmod ldiskfs [34780.437293] LNet: 11109:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 2 peers to disconnect [34784.437136] LNet: 11109:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 2 peers to disconnect [34792.436817] LNet: 11109:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 2 peers to disconnect [34808.436197] LNet: 11109:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 2 peers to disconnect [34840.434930] LNet: 11109:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 2 peers to disconnect [34872.433661] LNet: Removed LNI 10.9.5.121@tcp [34872.552055] LustreError: 11153:0:(class_obd.c:821:obdclass_exit()) obd_memory max: 201964618, leaked: 216 [34872.924983] LNet: HW NUMA nodes: 1, HW CPU cores: 2, npartitions: 1 [34872.928217] alg: No test for adler32 (adler32-zlib) [34873.749768] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-35vm4: executing check_mem_leak [34873.790449] Lustre: DEBUG MARKER: trevis-35vm4: executing check_mem_leak [34873.922632] Lustre: DEBUG MARKER: /usr/sbin/lctl mark conf-sanity test_32a: @@@@@@ FAIL: Reloading modules |
| Comment by Gerrit Updater [ 27/Jun/19 ] |
|
Li Dongyang (dongyangli@ddn.com) uploaded a new patch: https://review.whamcloud.com/35333 |
| Comment by Gerrit Updater [ 20/Sep/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35333/ |
| Comment by Peter Jones [ 20/Sep/19 ] |
|
Landed for 2.13 |
| Comment by Gerrit Updater [ 16/Dec/19 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37029 |
| Comment by Gerrit Updater [ 03/Jan/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37029/ |
| Comment by James Nunez (Inactive) [ 27/Jan/20 ] |
|
I'm still seeing memory leaks for conf-sanity test 32a at https://testing.whamcloud.com/test_sets/f4e2fb40-3ead-11ea-ac52-52540065bddc for 2.12.3.100. In the client test_log, we see ... trevis-63vm8: trevis-63vm8.trevis.whamcloud.com: executing check_mem_leak trevis-63vm8: [47023.722414] LustreError: 4730:0:(class_obd.c:821:obdclass_exit()) obd_memory max: 226318582, leaked: 33040 trevis-63vm8: trevis-63vm8: Memory leaks detected trevis-63vm8: mv: cannot stat '/tmp/debug': No such file or directory conf-sanity test_32a: @@@@@@ FAIL: Reloading modules Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:5888:error_noexit() = /usr/lib64/lustre/tests/conf-sanity.sh:2329:t32_test() … CMD: trevis-63vm8 /usr/sbin/lctl list_nids trevis-63vm8: IOC_LIBCFS_GET_NI error 22: Invalid argument CMD: trevis-63vm8 mkdir -p /tmp/t32/mnt/mdt /tmp/t32/mnt/mdt1 /tmp/t32/mnt/ost … exiting before disk write. CMD: trevis-63vm8 mount -t lustre -o loop,exclude=t32fs-OST0000 /tmp/t32/mdt /tmp/t32/mnt/mdt CMD: trevis-63vm8 umount -d /tmp/t32/mnt/mdt CMD: trevis-63vm8 rm -rf /tmp/t32 conf-sanity test_32a: @@@@@@ FAIL: test_32a failed with 1 Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:5900:error() |
| Comment by Andreas Dilger [ 28/Apr/20 ] |
|
Hit this when testing e2fsprogs against SLES12.3, but it was running 2.12.58 on the server, which does not have commit v2_12_58-70-g94a5bc1bcb. I've filed ATM-1728 to have e2fsprogs run against master, so that this issue is not hit. |
| Comment by Andreas Dilger [ 29/May/20 ] |
|
I'm going to mark this issue resolved. Memory leaks could happen in a number of different ways, so it makes sense to open separate tickets for those issues. |