[LU-6828] conf-sanity test_32a: Setting MDT failover.node Created: 09/Jul/15  Updated: 04/Nov/15  Resolved: 29/Jul/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Critical
Reporter: Maloo Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: None
Environment:

client and server: lustre-master build # 3094 RHEL7


Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/07512048-25ec-11e5-866a-5254006e85c2.

The sub-test test_32a failed with the following error:

Setting MDT failover.node

The test log is below. The MDS log is missing; TEI-3677 tracks that issue. This may be similar to LU-6807.

CMD: shadow-43vm3 mount -t lustre -o loop,mgsnode=10.1.6.3@tcp /tmp/t32/ost /tmp/t32/mnt/ost
CMD: shadow-43vm3 /usr/sbin/lctl get_param -n obdfilter.t32fs-OST0000.uuid
CMD: shadow-43vm3 /usr/sbin/lctl conf_param t32fs-OST0000.osc.max_dirty_mb=15
shadow-43vm3: warning: 'lctl conf_param' is deprecated, use 'lctl set_param -P' instead
CMD: shadow-43vm3 /usr/sbin/lctl conf_param t32fs-OST0000.failover.node=10.1.6.3@tcp
shadow-43vm3: warning: 'lctl conf_param' is deprecated, use 'lctl set_param -P' instead
CMD: shadow-43vm3 /usr/sbin/lctl conf_param t32fs-MDT0000.mdc.max_rpcs_in_flight=9
shadow-43vm3: warning: 'lctl conf_param' is deprecated, use 'lctl set_param -P' instead
CMD: shadow-43vm3 /usr/sbin/lctl conf_param t32fs-MDT0000.failover.node=10.1.6.3@tcp
pdsh@shadow-43vm6: shadow-43vm3: mcmd: xpoll (setting up stderr): Interrupted system call
 conf-sanity test_32a: @@@@@@ FAIL: Setting MDT "failover.node" 
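For triage, the last command issued before the failure marker is usually the first thing to look at. The following is a minimal sketch, not part of the Lustre test framework: the function name is hypothetical, and it assumes only the `CMD:` prefix and the `@@@@@@ FAIL` marker visible in the log above.

```python
def last_command_before_failure(log_text):
    """Return the last 'CMD:' payload seen before the first FAIL marker.

    Returns None if the log contains no FAIL marker, or if no command
    preceded it.
    """
    last_cmd = None
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("CMD:"):
            # Remember the most recent command (node name plus command line).
            last_cmd = stripped[len("CMD:"):].strip()
        elif "@@@@@@ FAIL" in stripped:
            return last_cmd
    return None


if __name__ == "__main__":
    sample = "\n".join([
        "CMD: shadow-43vm3 /usr/sbin/lctl conf_param t32fs-MDT0000.mdc.max_rpcs_in_flight=9",
        "CMD: shadow-43vm3 /usr/sbin/lctl conf_param t32fs-MDT0000.failover.node=10.1.6.3@tcp",
        ' conf-sanity test_32a: @@@@@@ FAIL: Setting MDT "failover.node" ',
    ])
    print(last_command_before_failure(sample))
```

Applied to the log above, this points at the `conf_param t32fs-MDT0000.failover.node` call on shadow-43vm3, the node that then stopped responding.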


 Comments   
Comment by Peter Jones [ 09/Jul/15 ]

Fan Yong

Please could you make this your top priority? This is disrupting the RHEL7 server testing.

Thanks

Peter

Comment by nasf (Inactive) [ 11/Jul/15 ]

test_32a failed because the MDS (shadow-43vm3) was lost. It seems that the MDS was corrupted, but there are no MDS logs on Maloo. The same is true for LU-6807. We need the MDS logs for further investigation, which depends on TEI-3677.

Comment by Sarah Liu [ 13/Jul/15 ]

I found something at this link: https://testing.hpdd.intel.com/test_logs/14bab794-25ec-11e5-866a-5254006e85c2/show_text

MDS console

01:48:26:[ 2959.156834] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == conf-sanity test 32a: Upgrade \(not live\) ========================================================== 01:48:14 \(1436406494\)
01:48:26:[ 2959.269087] Lustre: DEBUG MARKER: == conf-sanity test 32a: Upgrade (not live) ========================================================== 01:48:14 (1436406494)
01:48:26:[ 2959.365721] Lustre: DEBUG MARKER: which tunefs.lustre
01:48:26:[ 2959.547961] Lustre: DEBUG MARKER: find /usr/lib64/lustre/tests -maxdepth 1 -name 'disk*-ldiskfs.tar.bz2'
01:48:26:[ 2959.717579] Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids
01:48:26:[ 2960.008631] Lustre: DEBUG MARKER: mkdir -p /tmp/t32/mnt/mdt /tmp/t32/mnt/ost
01:48:26:[ 2960.184272] Lustre: DEBUG MARKER: tar xjvf /usr/lib64/lustre/tests/disk1_8_up_2_5-ldiskfs.tar.bz2 -S -C /tmp/t32
01:48:26:[ 2960.465708] Lustre: DEBUG MARKER: cat /tmp/t32/commit
01:48:26:[ 2960.629076] Lustre: DEBUG MARKER: cat /tmp/t32/kernel
01:48:26:[ 2960.789007] Lustre: DEBUG MARKER: cat /tmp/t32/arch
01:48:26:[ 2960.953119] Lustre: DEBUG MARKER: cat /tmp/t32/bspace
01:48:26:[ 2961.135406] Lustre: DEBUG MARKER: cat /tmp/t32/ispace
01:48:37:[ 2961.835195] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug=-1
01:48:37:[ 2962.064981] Lustre: DEBUG MARKER: tunefs.lustre --dryrun /tmp/t32/mdt
01:48:37:[ 2962.293098] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
01:48:37:[ 2962.683563] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
01:48:37:[ 2963.112844] Lustre: DEBUG MARKER: mount -t lustre -o loop,exclude=t32fs-OST0000 /tmp/t32/mdt /tmp/t32/mnt/mdt
01:48:37:[ 2963.220105] loop: module loaded
01:48:37:[ 2963.240628] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. Opts: errors=remount-ro,user_xattr,acl,no_mbcache
01:48:37:[ 2963.241781] Lustre: 29175:0:(osd_handler.c:5773:osd_mount()) t32fs-MDT0000-osd: device /dev/loop0 was upgraded from Lustre-1.x without enabling the dirdata feature. If you do not want to downgrade to Lustre-1.x again, you can enable it via 'tune2fs -O dirdata device'
01:48:37:[ 2963.280558] Lustre: 29221:0:(mgs_llog.c:235:mgs_fsdb_handler()) MDT using 1.8 OSC name scheme
01:48:37:[ 2963.375900] Lustre: 29222:0:(obd_config.c:1497:class_config_llog_handler()) For 1.8 interoperability, rename obd type from mds to mdt
01:48:37:[ 2963.490512] Lustre: 29222:0:(obd_mount.c:885:lustre_check_exclusion()) Excluding t32fs-OST0000 (on exclusion list)
01:48:37:[ 2963.668973] Lustre: ctl-t32fs-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
01:48:37:[ 2963.882424] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.t32fs-MDT0000.uuid
01:48:37:[ 2964.054176] Lustre: DEBUG MARKER: tunefs.lustre --dryrun /tmp/t32/ost
01:48:37:[ 2964.259355] Lustre: DEBUG MARKER: mount -t lustre -o loop,mgsnode=10.1.6.3@tcp /tmp/t32/ost /tmp/t32/mnt/ost
01:48:37:[ 2964.393345] LDISKFS-fs (loop1): mounted filesystem with ordered data mode. Opts: errors=remount-ro,,no_mbcache
01:48:37:[ 2964.427745] Lustre: srv-t32fs-OST0000: No data found on store. Initialize space
01:48:37:[ 2964.429269] Lustre: Skipped 1 previous similar message
01:48:37:[ 2964.755144] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n obdfilter.t32fs-OST0000.uuid
01:48:37:[ 2964.934554] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.osc.max_dirty_mb=15
01:48:37:[ 2965.037764] Lustre: Setting parameter t32fs-OST0000-osc.osc.max_dirty_mb in log t32fs-client
01:48:37:[ 2965.038645] Lustre: Skipped 26 previous similar messages
01:48:37:[ 2965.129618] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.failover.node=10.1.6.3@tcp
01:48:37:[ 2965.319212] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdc.max_rpcs_in_flight=9
01:48:37:[ 2965.512505] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.failover.node=10.1.6.3@tcp
01:48:37:[ 2965.542134] ------------[ cut here ]------------
01:48:37:[ 2965.543026] kernel BUG at mm/slub.c:3379!
01:48:37:[ 2965.543026] invalid opcode: 0000 [#1] SMP 
01:48:37:[ 2965.544007] Modules linked in: ofd(OF) ost(OF) loop lustre(OF) lmv(OF) mdc(OF) lov(OF) osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgs(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) sha512_generic libcfs(OF) ldiskfs(OF) dm_mod nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ppdev ib_sa serio_raw ib_mad pcspkr virtio_balloon i2c_piix4 parport_pc parport ib_core ib_addr ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ata_piix ttm 8139too virtio_pci 8139cp virtio_ring virtio mii drm libata i2c_core floppy
01:48:37:[ 2965.544007] CPU: 0 PID: 12 Comm: rcuos/0 Tainted: GF          O--------------   3.10.0-229.4.2.el7_lustre.x86_64 #1
01:48:37:[ 2965.544007] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
01:48:37:[ 2965.544007] task: ffff88007c050000 ti: ffff88007c04c000 task.ti: ffff88007c04c000
01:48:37:[ 2965.544007] RIP: 0010:[<ffffffff811ab1f3>]  [<ffffffff811ab1f3>] kfree+0x133/0x140
01:48:37:[ 2965.544007] RSP: 0018:ffff88007c04fd98  EFLAGS: 00010246
01:48:37:[ 2965.544007] RAX: 001fffff00000000 RBX: ffff88007b220064 RCX: 0000000180150001
01:48:37:[ 2965.544007] RDX: 001fffff00000000 RSI: 0000000000000002 RDI: ffff88007b220064
01:48:37:[ 2965.544007] RBP: ffff88007c04fdb0 R08: ffff880079016540 R09: 0000000180150001
01:48:37:[ 2965.544007] R10: ffffea0001ec8800 R11: ffffffff810b6df0 R12: ffff8800654fc000
01:48:37:[ 2965.544007] R13: ffffffff810b6e02 R14: ffffffff81a25a00 R15: ffff8800654fc0f8
01:48:37:[ 2965.544007] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
01:48:37:[ 2965.544007] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
01:48:37:[ 2965.544007] CR2: 00007f3e36b98000 CR3: 00000000366ea000 CR4: 00000000000006f0
01:48:37:[ 2965.544007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
01:48:37:[ 2965.544007] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
01:48:37:[ 2965.544007] Stack:
01:48:37:[ 2965.544007]  0000000000000002 ffff8800654fc000 0000000000000002 ffff88007c04fde0
01:48:37:[ 2965.544007]  ffffffff810b6e02 ffff8800654fc000 0000000000000002 0000000000000000
01:48:37:[ 2965.544007]  0000000000000013 ffff88007c04fdf8 ffffffff810a2b72 ffff8800654fc0f8
01:48:37:[ 2965.544007] Call Trace:
01:48:37:[ 2965.544007]  [<ffffffff810b6e02>] free_fair_sched_group+0xa2/0xc0
01:48:37:[ 2965.544007]  [<ffffffff810a2b72>] free_sched_group+0x12/0x30
01:48:37:[ 2965.544007]  [<ffffffff810a2ba5>] free_sched_group_rcu+0x15/0x20
01:48:37:[ 2965.544007]  [<ffffffff81113289>] rcu_nocb_kthread+0x229/0x370
01:48:37:[ 2965.544007]  [<ffffffff81098350>] ? wake_up_bit+0x30/0x30
01:48:37:[ 2965.544007]  [<ffffffff81113060>] ? rcu_start_gp+0x40/0x40
01:48:37:[ 2965.544007]  [<ffffffff8109739f>] kthread+0xcf/0xe0
01:48:37:[ 2965.544007]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
01:48:37:[ 2965.544007]  [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
01:48:37:[ 2965.544007]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
01:48:37:[ 2965.544007] Code: 49 8b 02 31 f6 f6 c4 40 74 04 41 8b 72 68 4c 89 d7 e8 a2 42 fb ff eb 8f 4c 8b 50 30 48 8b 10 80 e6 80 4c 0f 44 d0 e9 32 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 
01:48:37:[ 2965.544007] RIP  [<ffffffff811ab1f3>] kfree+0x133/0x140
01:48:37:[ 2965.544007]  RSP <ffff88007c04fd98>
01:48:37:[ 2965.544007] ------------[ cut here ]------------
01:48:37:[ 2965.544007] kernel BUG at mm/vmalloc.c:1339!
01:48:37:[ 2965.544007] invalid opcode: 0000 [#2] SMP 
01:48:37:[ 2965.544007] Modules linked in: ofd(OF) ost(OF) loop lustre(OF) lmv(OF) mdc(OF) lov(OF) osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgs(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) sha512_generic libcfs(OF) ldiskfs(OF) dm_mod nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ppdev ib_sa serio_raw ib_mad pcspkr virtio_balloon i2c_piix4 parport_pc parport ib_core ib_addr ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ata_piix ttm 8139too virtio_pci 8139cp virtio_ring virtio mii drm libata i2c_core floppy
01:48:37:[ 2965.544007] CPU: 0 PID: 12 Comm: rcuos/0 Tainted: GF          O--------------   3.10.0-229.4.2.el7_lustre.x86_64 #1
01:48:37:[ 2965.544007] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
01:48:37:[ 2965.544007] task: ffff88007c050000 ti: ffff88007c04c000 task.ti: ffff88007c04c000
01:48:37:[ 2965.544007] RIP: 0010:[<ffffffff811903ae>]  [<ffffffff811903ae>] __get_vm_area_node+0x1ce/0x1d0
01:48:37:[ 2965.544007] RSP: 0018:ffff88007c04f390  EFLAGS: 00010006
01:48:37:[ 2965.544007] RAX: ffff88007c04ffd8 RBX: 00000000ffffffff RCX: ffffc90000000000
01:48:37:[ 2965.544007] RDX: 0000000000000022 RSI: 0000000000000001 RDI: 0000000000002000
01:48:37:[ 2965.544007] RBP: ffff88007c04f3f0 R08: ffffe8ffffffffff R09: 00000000ffffffff
01:48:37:[ 2965.544007] R10: ffff880062d55180 R11: ffff8800366a78d0 R12: ffffffffa01092c9
01:48:37:[ 2965.544007] R13: 0000000000001200 R14: 00000000000080d2 R15: ffffea0000d951c0
01:48:37:[ 2965.544007] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
01:48:37:[ 2965.544007] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
01:48:37:[ 2965.544007] CR2: 00007f3e36b98000 CR3: 00000000366ea000 CR4: 00000000000006f0
01:48:37:[ 2965.544007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
01:48:37:[ 2965.544007] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
01:48:37:[ 2965.544007] Stack:
01:48:37:[ 2965.544007]  ffffffff81191b1d 00000000000080d2 ffffffffa01092c9 8000000000000163
01:48:37:[ 2965.544007]  000080d200000000 0000000000000000 00000000e0857f38 ffff880062d55180
01:48:37:[ 2965.544007]  ffff8800366a78b0 0000000000240000 0000000000000080 ffffea0000d951c0
01:48:37:[ 2965.544007] Call Trace:
01:48:37:[ 2965.544007]  [<ffffffff81191b1d>] ? __vmalloc_node_range+0x7d/0x270
01:48:37:[ 2965.544007]  [<ffffffffa01092c9>] ? ttm_tt_init+0x69/0xb0 [ttm]
01:48:37:[ 2965.544007]  [<ffffffff81191d51>] __vmalloc+0x41/0x50
01:48:37:[ 2965.544007]  [<ffffffffa01092c9>] ? ttm_tt_init+0x69/0xb0 [ttm]
01:48:37:[ 2965.544007]  [<ffffffffa01092c9>] ttm_tt_init+0x69/0xb0 [ttm]
01:48:37:[ 2965.544007]  [<ffffffffa00d34e8>] cirrus_ttm_tt_create+0x58/0x90 [cirrus]
01:48:37:[ 2965.544007]  [<ffffffffa0109a7d>] ttm_bo_add_ttm+0x8d/0xc0 [ttm]
01:48:37:[ 2965.544007]  [<ffffffffa010b0f1>] ttm_bo_handle_move_mem+0x571/0x5b0 [ttm]
01:48:37:[ 2965.544007]  [<ffffffff81601994>] ? __slab_free+0x10e/0x277
01:48:37:[ 2965.544007]  [<ffffffffa010b74a>] ? ttm_bo_mem_space+0x10a/0x310 [ttm]
01:48:37:[ 2965.544007]  [<ffffffffa010be17>] ttm_bo_validate+0x247/0x260 [ttm]
01:48:37:[ 2965.544007]  [<ffffffff81059e69>] ? iounmap+0x79/0xa0
01:48:37:[ 2965.544007]  [<ffffffff81050069>] ? kgdb_arch_late+0xe9/0x180
01:48:37:[ 2965.544007]  [<ffffffffa00d3ac2>] cirrus_bo_push_sysram+0x82/0xe0 [cirrus]
01:48:37:[ 2965.544007]  [<ffffffffa00d1c84>] cirrus_crtc_do_set_base.isra.8.constprop.10+0x84/0x430 [cirrus]
01:48:37:[ 2965.544007]  [<ffffffffa00d2479>] cirrus_crtc_mode_set+0x449/0x4d0 [cirrus]
01:48:37:[ 2965.544007]  [<ffffffffa012a939>] drm_crtc_helper_set_mode+0x2e9/0x520 [drm_kms_helper]
01:48:37:[ 2965.544007]  [<ffffffffa012b6bf>] drm_crtc_helper_set_config+0x87f/0xaa0 [drm_kms_helper]
01:48:37:[ 2965.544007]  [<ffffffff8160919b>] ? __ww_mutex_lock+0x1b/0xa0
01:48:37:[ 2965.544007]  [<ffffffffa0088711>] drm_mode_set_config_internal+0x61/0xe0 [drm]
01:48:37:[ 2965.544007]  [<ffffffffa0133a94>] drm_fb_helper_pan_display+0x94/0xf0 [drm_kms_helper]
01:48:37:[ 2965.544007]  [<ffffffff81326259>] fb_pan_display+0xc9/0x190
01:48:37:[ 2965.544007]  [<ffffffff813352e0>] bit_update_start+0x20/0x50
01:48:37:[ 2965.544007]  [<ffffffff81334d0d>] fbcon_switch+0x39d/0x5a0
01:48:37:[ 2965.544007]  [<ffffffff813a3549>] redraw_screen+0x1a9/0x270
01:48:37:[ 2965.544007]  [<ffffffff8132645e>] ? fb_blank+0xae/0xc0
01:48:37:[ 2965.544007]  [<ffffffff8133222a>] fbcon_blank+0x22a/0x2f0
01:48:37:[ 2965.544007]  [<ffffffff81070384>] ? wake_up_klogd+0x34/0x50
01:48:37:[ 2965.544007]  [<ffffffff810705a8>] ? console_unlock+0x208/0x400
01:48:37:[ 2965.544007]  [<ffffffff8107ee63>] ? __internal_add_timer+0x113/0x130
01:48:37:[ 2965.544007]  [<ffffffff8107f057>] ? internal_add_timer+0x17/0x40
01:48:37:[ 2965.544007]  [<ffffffff81080bfd>] ? mod_timer+0x11d/0x240
01:48:37:[ 2965.544007]  [<ffffffff813a3c08>] do_unblank_screen+0xb8/0x1f0
01:48:37:[ 2965.544007]  [<ffffffff813a3d50>] unblank_screen+0x10/0x20
01:48:37:[ 2965.544007]  [<ffffffff812e4b59>] bust_spinlocks+0x19/0x40
01:48:37:[ 2965.544007]  [<ffffffff8160d6a8>] oops_end+0x38/0x150
01:48:37:[ 2965.544007]  [<ffffffff810173eb>] die+0x4b/0x70
01:48:37:[ 2965.544007]  [<ffffffff8160ce60>] do_trap+0x60/0x170
01:48:37:[ 2965.544007]  [<ffffffff81014224>] do_invalid_op+0xb4/0x130
01:48:37:[ 2965.544007]  [<ffffffff811ab1f3>] ? kfree+0x133/0x140
01:48:37:[ 2965.544007]  [<ffffffff810a67a5>] ? check_preempt_curr+0x85/0xa0
01:48:37:[ 2965.544007]  [<ffffffff810a67d9>] ? ttwu_do_wakeup+0x19/0xc0
01:48:37:[ 2965.544007]  [<ffffffff810b6e02>] ? free_fair_sched_group+0xa2/0xc0
01:48:37:[ 2965.544007]  [<ffffffff8161675e>] invalid_op+0x1e/0x30
01:48:37:[ 2965.544007]  [<ffffffff810b6e02>] ? free_fair_sched_group+0xa2/0xc0
01:48:37:[ 2965.544007]  [<ffffffff810b6df0>] ? free_fair_sched_group+0x90/0xc0
01:48:37:[ 2965.544007]  [<ffffffff811ab1f3>] ? kfree+0x133/0x140
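The console log above contains two nested oopses, and the first one (the `kfree` BUG in `mm/slub.c`) is the one that matters. A small sketch for extracting such crash signatures from a console dump follows. The helper name and the regular expressions are assumptions based only on the line formats shown in this log, not on any Lustre or kernel tooling.

```python
import re

# Matches e.g. "kernel BUG at mm/slub.c:3379!"
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
# Matches e.g. "RIP: 0010:[<ffffffff811ab1f3>]  [<ffffffff811ab1f3>] kfree+0x133/0x140"
RIP_RE = re.compile(r"RIP: \S+\s+\[<[0-9a-f]+>\]\s+(?P<symbol>[\w.]+)\+0x")


def crash_signatures(console_text):
    """Pair each 'kernel BUG at' line with the first RIP symbol after it.

    Returns a list of (source_file, line_number, symbol) tuples in the
    order the oopses appear in the log.
    """
    signatures = []
    pending = None
    for line in console_text.splitlines():
        bug = BUG_RE.search(line)
        if bug:
            pending = (bug.group("file"), int(bug.group("line")))
            continue
        rip = RIP_RE.search(line)
        if rip and pending:
            signatures.append((*pending, rip.group("symbol")))
            pending = None
    return signatures


if __name__ == "__main__":
    sample = "\n".join([
        "01:48:37:[ 2965.543026] kernel BUG at mm/slub.c:3379!",
        "01:48:37:[ 2965.544007] RIP: 0010:[<ffffffff811ab1f3>]  [<ffffffff811ab1f3>] kfree+0x133/0x140",
        "01:48:37:[ 2965.544007] kernel BUG at mm/vmalloc.c:1339!",
        "01:48:37:[ 2965.544007] RIP: 0010:[<ffffffff811903ae>]  [<ffffffff811903ae>] __get_vm_area_node+0x1ce/0x1d0",
    ])
    for signature in crash_signatures(sample):
        print(signature)
```

The first tuple returned is normally the root failure; later ones, like the `__get_vm_area_node` BUG here, are raised while the kernel is already handling the first oops.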

Comment by Gerrit Updater [ 17/Jul/15 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/15635
Subject: LU-6828 lod: fix memory leak in lod_connect_to_osd
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c7d37a8c321e7d358b3d64c68a9411ca63babc99

Comment by Gerrit Updater [ 29/Jul/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15635/
Subject: LU-6828 lod: fix memory leak in lod_connect_to_osd
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: bb813cfb78affdeafb6cd6e748e5df8c0330b600

Comment by Peter Jones [ 29/Jul/15 ]

Landed for 2.8

Comment by Andreas Dilger [ 03/Nov/15 ]

Did this bug suddenly become visible on RHEL 7 because it has 16-byte slab allocation sizes? I think the smallest slab allocation size for RHEL6 is 32 bytes, which would mean that even the smaller allocation would be safe there, since it is rounded up to 32 bytes, and the bug would only be visible if CONFIG_DEBUG_SLAB or similar is enabled. If the slab size is only 16 bytes, then (strlen("lustre-mdtlov") + 1) is only 14, but (strlen("lustre-MDT0000-osd") + 1) is 19.

Comment by Yang Sheng [ 04/Nov/15 ]

Yes, the smallest size is 8 bytes in RHEL7, so it is easier to crash because of the memory leak.
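The slab-size reasoning in the two comments above can be checked with a rough model. This is a sketch under stated assumptions, not the kernel's allocator: it treats size classes as powers of two starting at a configurable minimum (real kmalloc caches also include sizes such as 96 and 192), the function names are hypothetical, and it assumes the scenario the comments appear to describe, where a buffer sized for the shorter name ends up holding the longer one.

```python
def slab_bucket(size, min_size):
    """Smallest power-of-two bucket >= size, starting from min_size.

    Simplified model: real kmalloc size classes are not strictly
    powers of two.
    """
    bucket = min_size
    while bucket < size:
        bucket *= 2
    return bucket


def write_overruns(alloc_name, written_name, min_size):
    """True if storing written_name (plus NUL) in a buffer sized for
    alloc_name (plus NUL) would run past the end of its slab bucket."""
    requested = len(alloc_name) + 1   # strlen() + 1 for the NUL terminator
    written = len(written_name) + 1
    return written > slab_bucket(requested, min_size)


if __name__ == "__main__":
    short_name = "lustre-mdtlov"
    long_name = "lustre-MDT0000-osd"
    # Minimum bucket sizes mentioned in the comments: 8 and 16 (RHEL 7), 32 (RHEL 6).
    for min_size in (8, 16, 32):
        bucket = slab_bucket(len(short_name) + 1, min_size)
        overrun = write_overruns(short_name, long_name, min_size)
        print(f"min={min_size:2d} bucket={bucket:2d} overrun={overrun}")
```

Under this model the request lands in a 16-byte bucket when the minimum size is 8 or 16 bytes, so the longer string spills into the neighbouring object, while a 32-byte minimum absorbs it silently. That is consistent with the problem surfacing only on RHEL 7.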

Generated at Sat Feb 10 02:03:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.