[LU-11956] conf-sanity test_32a failed with 1 Created: 11/Feb/19  Updated: 29/May/20  Resolved: 29/May/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.4
Fix Version/s: Lustre 2.13.0, Lustre 2.12.4

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Dongyang Li
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12469 sanity test_230b crash: LBUG of lu_bu... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for jianyu <yujian@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/a63d2a9e-2b9c-11e9-b3df-52540065bddc

test_32a failed with the following error:

CMD: trevis-24vm6 mount -t lustre -o loop,exclude=t32fs-OST0000 /tmp/t32/mdt /tmp/t32/mnt/mdt
CMD: trevis-24vm6 umount -d /tmp/t32/mnt/mdt
CMD: trevis-24vm6 rm -rf /tmp/t32
 conf-sanity test_32a: @@@@@@ FAIL: test_32a failed with 1 

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
conf-sanity test_32a - test_32a failed with 1



 Comments   
Comment by James Nunez (Inactive) [ 12/Feb/19 ]

In the logs for this failure, we see the following in MDS1 console logs

[40671.635023] LNet: 31304:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect
[40704.635048] LNet: Removed LNI 10.9.5.40@tcp
[40704.708067] LustreError: 31346:0:(class_obd.c:824:obdclass_exit()) obd_memory max: 203787793, leaked: 7294919
[40705.248049] LNet: HW NUMA nodes: 1, HW CPU cores: 2, npartitions: 1

Could this be related to LU-11920 ?

Comment by Sarah Liu [ 10/Apr/19 ]

another one on b2_12 branch SLES12 SP3 DNE
https://testing.whamcloud.com/test_sets/d9df000a-5ac9-11e9-a256-52540065bddc

MDS console

[34732.473160] Lustre: DEBUG MARKER: dmesg
[34733.623267] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == conf-sanity test 32a: Upgrade \(not live\) ========================================================== 10:04:14 \(1554743054\)
[34733.670594] Lustre: DEBUG MARKER: == conf-sanity test 32a: Upgrade (not live) ========================================================== 10:04:14 (1554743054)
[34733.692376] Lustre: DEBUG MARKER: which tunefs.lustre
[34733.730553] Lustre: DEBUG MARKER: find /usr/lib64/lustre/tests -maxdepth 1 -name 'disk*-ldiskfs.tar.bz2'
[34735.594154] Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids
[34735.630206] Lustre: DEBUG MARKER: mkdir -p /tmp/t32/mnt/mdt /tmp/t32/mnt/mdt1 /tmp/t32/mnt/ost
[34735.663544] Lustre: DEBUG MARKER: tar xjvf /usr/lib64/lustre/tests/disk2_9-ldiskfs.tar.bz2 -S -C /tmp/t32
[34735.781726] Lustre: DEBUG MARKER: cat /tmp/t32/commit
[34735.815364] Lustre: DEBUG MARKER: cat /tmp/t32/kernel
[34735.848358] Lustre: DEBUG MARKER: cat /tmp/t32/arch
[34735.881322] Lustre: DEBUG MARKER: cat /tmp/t32/bspace
[34735.914886] Lustre: DEBUG MARKER: cat /tmp/t32/ispace
[34735.948408] Lustre: DEBUG MARKER: test -f /tmp/t32/blimit
[34735.982034] Lustre: DEBUG MARKER: cat /tmp/t32/blimit
[34736.015054] Lustre: DEBUG MARKER: test -f /tmp/t32/ilimit
[34736.048105] Lustre: DEBUG MARKER: cat /tmp/t32/ilimit
[34736.174414] Lustre: DEBUG MARKER: test -f /tmp/t32/mdt2
[34736.207377] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug=-1
[34736.241592] Lustre: DEBUG MARKER: tunefs.lustre --dryrun /tmp/t32/mdt
[34736.957068] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
[34737.260424] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-35vm4: executing \/usr\/sbin\/lctl device_list
[34737.308969] Lustre: DEBUG MARKER: trevis-35vm4: executing /usr/sbin/lctl device_list
[34737.337722] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
[34737.630641] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-35vm4: executing losetup -a
[34737.677227] Lustre: DEBUG MARKER: trevis-35vm4: executing losetup -a
[34737.796388] Lustre: DEBUG MARKER: mount -t lustre -o loop,exclude=t32fs-OST0000 /tmp/t32/mdt /tmp/t32/mnt/mdt
[34738.028666] loop: module loaded
[34738.385785] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[34738.790192] Lustre: 10306:0:(obd_mount.c:972:lustre_check_exclusion()) Excluding t32fs-OST0000 (on exclusion list)
[34753.912623] Lustre: t32fs-MDT0000: Imperative Recovery not enabled, recovery window 60-180
[34753.912630] Lustre: Skipped 7 previous similar messages
[34754.784409] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.t32fs-MDT0000.uuid
[34754.819328] Lustre: DEBUG MARKER: tunefs.lustre --dryrun /tmp/t32/ost
[34755.614690] Lustre: DEBUG MARKER: mount -t lustre -onomgs -oloop,mgsnode=10.9.5.121@tcp /tmp/t32/ost /tmp/t32/mnt/ost
[34755.924426] LDISKFS-fs (loop1): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
[34759.777761] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n obdfilter.t32fs-OST0000.uuid
[34759.812960] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.failover.node=10.9.5.121@tcp
[34759.851786] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.failover.node=10.9.5.121@tcp
[34759.887341] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.osc.max_dirty_mb=15
[34759.923865] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdc.max_rpcs_in_flight=9
[34759.959701] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.lov.stripesize=4M
[34759.995805] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdd.atime_diff=70
[34760.031367] Lustre: DEBUG MARKER: /usr/sbin/lctl pool_new t32fs.interop
[34763.312597] LustreError: 10284:0:(mgc_request.c:1572:mgc_apply_recover_logs()) mgc: cannot find uuid by nid 10.9.5.121@tcp
[34763.312609] Lustre: 10284:0:(mgc_request.c:1792:mgc_process_recover_nodemap_log()) MGC10.9.5.121@tcp: error processing recovery log t32fs-mdtir: rc = -2
[34763.312630] LustreError: 10284:0:(mgc_request.c:2127:mgc_process_log()) MGC10.9.5.121@tcp: recover log t32fs-mdtir failed, not fatal: rc = -2
[34763.320299] Lustre: 10570:0:(obd_mount.c:972:lustre_check_exclusion()) Excluding t32fs-OST0000 (on exclusion list)
[34763.812009] Lustre: t32fs-OST0000: deleting orphan objects from 0x0:33 to 0x0:97
[34764.072473] Lustre: DEBUG MARKER: umount /tmp/t32/mnt/mdt
[34764.093785] Lustre: Failing over t32fs-MDT0000
[34764.311493] LustreError: 10312:0:(osp_precreate.c:656:osp_precreate_send()) t32fs-OST0000-osc-MDT0000: can't precreate: rc = -5
[34764.311513] LustreError: 10312:0:(osp_precreate.c:1312:osp_precreate_thread()) t32fs-OST0000-osc-MDT0000: cannot precreate objects: rc = -5
[34764.789363] blk_update_request: I/O error, dev loop0, sector 0
[34764.794440] blk_update_request: I/O error, dev loop0, sector 0
[34764.799324] blk_update_request: I/O error, dev loop0, sector 0
[34764.802428] Lustre: DEBUG MARKER: umount /tmp/t32/mnt/ost
[34764.806207] blk_update_request: I/O error, dev loop0, sector 0
[34764.811405] blk_update_request: I/O error, dev loop0, sector 0
[34764.815350] blk_update_request: I/O error, dev loop0, sector 0
[34770.821690] Lustre: 10627:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1554743086/real 1554743086]  req@ffff880061977c80 x1630230038952912/t0(0) o39->t32fs-MDT0000-lwp-OST0000@0@lo:12/10 lens 224/224 e 0 to 1 dl 1554743092 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[34770.821698] Lustre: 10627:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[34778.313094] blk_update_request: I/O error, dev loop1, sector 0
[34778.317682] blk_update_request: I/O error, dev loop1, sector 0
[34778.322686] blk_update_request: I/O error, dev loop1, sector 0
[34778.328470] blk_update_request: I/O error, dev loop1, sector 0
[34778.332926] blk_update_request: I/O error, dev loop1, sector 0
[34778.334269] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
[34778.343911] blk_update_request: I/O error, dev loop1, sector 0
[34778.621078] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-35vm4: executing \/usr\/sbin\/lustre_rmmod ldiskfs
[34778.661091] Lustre: DEBUG MARKER: trevis-35vm4: executing /usr/sbin/lustre_rmmod ldiskfs
[34780.437293] LNet: 11109:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 2 peers to disconnect
[34784.437136] LNet: 11109:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 2 peers to disconnect
[34792.436817] LNet: 11109:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 2 peers to disconnect
[34808.436197] LNet: 11109:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 2 peers to disconnect
[34840.434930] LNet: 11109:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 2 peers to disconnect
[34872.433661] LNet: Removed LNI 10.9.5.121@tcp
[34872.552055] LustreError: 11153:0:(class_obd.c:821:obdclass_exit()) obd_memory max: 201964618, leaked: 216
[34872.924983] LNet: HW NUMA nodes: 1, HW CPU cores: 2, npartitions: 1
[34872.928217] alg: No test for adler32 (adler32-zlib)
[34873.749768] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-35vm4: executing check_mem_leak
[34873.790449] Lustre: DEBUG MARKER: trevis-35vm4: executing check_mem_leak
[34873.922632] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_32a: @@@@@@ FAIL: Reloading modules 
Comment by Gerrit Updater [ 27/Jun/19 ]

Li Dongyang (dongyangli@ddn.com) uploaded a new patch: https://review.whamcloud.com/35333
Subject: LU-11956 mdd: do not reset original lu_buf.lb_len
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: edd6b2321fff95feb52b70cbc9c2542bc4f9eccb

Comment by Gerrit Updater [ 20/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35333/
Subject: LU-11956 mdd: do not reset original lu_buf.lb_len
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 94a5bc1bcb6c6373ead5b091ff5915dfe452377b

Comment by Peter Jones [ 20/Sep/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 16/Dec/19 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37029
Subject: LU-11956 mdd: do not reset original lu_buf.lb_len
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 0a01e15afaf12c544d061bd1e14a0bf08c274cc3

Comment by Gerrit Updater [ 03/Jan/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37029/
Subject: LU-11956 mdd: do not reset original lu_buf.lb_len
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 9ba87725d1395dc729db7c90f72145e3a339f9c0

Comment by James Nunez (Inactive) [ 27/Jan/20 ]

I'm still seeing memory leaks for conf-sanity test 32a at https://testing.whamcloud.com/test_sets/f4e2fb40-3ead-11ea-ac52-52540065bddc for 2.12.3.100.

In the client test_log, we see

...
trevis-63vm8: trevis-63vm8.trevis.whamcloud.com: executing check_mem_leak
trevis-63vm8: [47023.722414] LustreError: 4730:0:(class_obd.c:821:obdclass_exit()) obd_memory max: 226318582, leaked: 33040
trevis-63vm8: 
trevis-63vm8: Memory leaks detected
trevis-63vm8: mv: cannot stat '/tmp/debug': No such file or directory
 conf-sanity test_32a: @@@@@@ FAIL: Reloading modules 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5888:error_noexit()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2329:t32_test()
…
CMD: trevis-63vm8 /usr/sbin/lctl list_nids
trevis-63vm8: IOC_LIBCFS_GET_NI error 22: Invalid argument
CMD: trevis-63vm8 mkdir -p /tmp/t32/mnt/mdt /tmp/t32/mnt/mdt1 /tmp/t32/mnt/ost
…
exiting before disk write.
CMD: trevis-63vm8 mount -t lustre -o loop,exclude=t32fs-OST0000 /tmp/t32/mdt /tmp/t32/mnt/mdt
CMD: trevis-63vm8 umount -d /tmp/t32/mnt/mdt
CMD: trevis-63vm8 rm -rf /tmp/t32
 conf-sanity test_32a: @@@@@@ FAIL: test_32a failed with 1 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5900:error()
Comment by Andreas Dilger [ 28/Apr/20 ]

Hit this when testing e2fsprogs against SLES12.3, but it was running 2.12.58 on the server, which does not have commit v2_12_58-70-g94a5bc1bcb. I've filed ATM-1728 to have e2fsprogs run against master, so that this issue is not hit.

Comment by Andreas Dilger [ 29/May/20 ]

I'm going to mark this issue resolved. Memory leaks could happen in a number of different ways, so it makes sense to open separate tickets for those issues.

Generated at Sat Feb 10 02:48:24 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.