  1. Lustre
  2. LU-5376

slab error in kmem_cache_destroy(): cache `xattr_kmem': Can't free all objects


Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • Lustre 2.7.0
    • None
    • 3
    • 14983

    Description

      Running racer with an extra patch to increase the number of operations: http://review.whamcloud.com/#/c/5936/5

      I appear to be hitting a memory leak in the xattr code:

      <3>[362074.759838] slab error in kmem_cache_destroy(): cache `xattr_kmem': Can't free all objects
      <4>[362074.760559] Pid: 3154, comm: rmmod Not tainted 2.6.32-rhe6.5-debug #2
      <4>[362074.760962] Call Trace:
      <4>[362074.761282]  [<ffffffff8116d069>] ? __slab_error+0x29/0x30
      <4>[362074.761882]  [<ffffffff81171506>] ? kmem_cache_destroy+0xa6/0xf0
      <4>[362074.762308]  [<ffffffffa0e4aabd>] ? lu_kmem_fini+0x2d/0x50 [obdclass]
      <4>[362074.762878]  [<ffffffffa0c4c015>] ? ll_xattr_fini+0x15/0x20 [lustre]
      <4>[362074.763507]  [<ffffffffa0c6a3c2>] ? exit_lustre_lite+0xe/0xd3 [lustre]
      <4>[362074.764163]  [<ffffffff810b81b4>] ? sys_delete_module+0x194/0x260
      <4>[362074.764813]  [<ffffffff8151989e>] ? do_page_fault+0x3e/0xa0
      <4>[362074.765379]  [<ffffffff8100b0b2>] ? system_call_fastpath+0x16/0x1b
      <6>[362080.901752] LNet: Removed LNI 192.168.10.220@tcp
      <3>[362081.031031] LustreError: 3254:0:(class_obd.c:708:cleanup_obdclass()) obd_memory max: 149654698, leaked: 551
      
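When triaging a long console log, the two messages above are the ones worth extracting first. A minimal sketch, assuming the log is available as plain text; the function name `scan_unload_log` and the regular expressions are illustrative and not part of any Lustre tooling:

```python
import re

# Patterns for the two messages quoted in this report.
SLAB_ERR = re.compile(r"slab error in kmem_cache_destroy\(\): cache `([^']+)'")
OBD_LEAK = re.compile(r"obd_memory max: (\d+), leaked: (\d+)")


def scan_unload_log(text):
    """Return (names of caches that failed to destroy, (max, leaked) or None)."""
    caches = SLAB_ERR.findall(text)
    match = OBD_LEAK.search(text)
    leak = (int(match.group(1)), int(match.group(2))) if match else None
    return caches, leak


# Sample input: the two relevant lines copied from the log above.
sample = (
    "<3>[362074.759838] slab error in kmem_cache_destroy(): "
    "cache `xattr_kmem': Can't free all objects\n"
    "<3>[362081.031031] LustreError: 3254:0:(class_obd.c:708:cleanup_obdclass()) "
    "obd_memory max: 149654698, leaked: 551\n"
)
caches, leak = scan_unload_log(sample)
print(caches, leak)
```

Running the same function over the full console log would show whether any cache other than `xattr_kmem` also failed to be destroyed.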

      After this, the Lustre modules fail to load again, and the node then dies on an invalid pointer dereference, which suggests that something is handling allocation failures incorrectly:

      
      

      <6>[362101.036189] Lustre: Lustre: Build Version: 2.6.50-gc5e9f13-CHANGED-2.6.32-rhe6.5-debug
      <3>[362101.039867] SLAB: cache with size 64 has lost its name
      ...
      <6>[362104.420882] LNet: Added LNI 192.168.10.220@tcp [8/256/0/180]
      <6>[362104.421503] LNet: Accept secure, port 988
      <3>[362104.424302] SLAB: cache with size 64 has lost its name
      ... (repeated many times)
      <3>[362111.779957] SLAB: cache with size 64 has lost its name
      <3>[362115.274250] kmem_cache_create: duplicate cache xattr_kmem
      <4>[362115.274649] Pid: 4220, comm: insmod Not tainted 2.6.32-rhe6.5-debug #2
      <4>[362115.275063] Call Trace:
      <4>[362115.275425] [<ffffffff81172465>] ? kmem_cache_create+0x655/0x6e0
      <4>[362115.275845] [<ffffffffa11ec66e>] ? lu_env_init+0x1e/0x30 [obdclass]
      <4>[362115.276281] [<ffffffffa0c6048f>] ? ccc_global_init+0x5f/0xb0 [lustre]
      <4>[362115.276941] [<ffffffffa11f48fd>] ? cl_env_new+0x15d/0x350 [obdclass]
      <4>[362115.277604] [<ffffffffa11e8b28>] ? lu_kmem_init+0x48/0x80 [obdclass]
      <4>[362115.278317] [<ffffffffa0c4c035>] ? ll_xattr_init+0x15/0x20 [lustre]
      <4>[362115.278989] [<ffffffffa0a171e7>] ? init_lustre_lite+0x1e7/0x280 [lustre]
      <4>[362115.279449] [<ffffffffa0a17000>] ? init_lustre_lite+0x0/0x280 [lustre]
      <4>[362115.279910] [<ffffffff8100204c>] ? do_one_initcall+0x3c/0x1d0
      <4>[362115.280346] [<ffffffff810bb291>] ? sys_init_module+0xe1/0x250
      <4>[362115.280890] [<ffffffff8100b0b2>] ? system_call_fastpath+0x16/0x1b
      <1>[362115.428554] BUG: unable to handle kernel paging request at ffffffffa0c97860
      <1>[362115.429294] IP: [<ffffffffa11e7e84>] keys_fill+0x54/0x190 [obdclass]
      <4>[362115.430109] PGD 1a27067 PUD 1a2b063 PMD b7af4067 PTE 0
      <4>[362115.431365] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      <4>[362115.431365] last sysfs file: /sys/devices/system/cpu/online
      <4>[362115.432021] CPU 2
      <4>[362115.432112] Modules linked in: ofd osp lod ost mdt mdd mgs nodemap osd_ldiskfs ldiskfs lquota lfsck obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 jbd2 mbcache virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
      <4>[362115.432512]
      <4>[362115.432512] Pid: 3956, comm: ptlrpcd_3 Not tainted 2.6.32-rhe6.5-debug #2 Red Hat KVM
      <4>[362115.432512] RIP: 0010:[<ffffffffa11e7e84>] [<ffffffffa11e7e84>] keys_fill+0x54/0x190 [obdclass]
      <4>[362115.432512] RSP: 0018:ffff880079cb3cf0 EFLAGS: 00010286
      <4>[362115.432512] RAX: ffff880056b0bdf0 RBX: 00000000000000e0 RCX: 0000000000000000
      <4>[362115.442964] RDX: ffff88000c5a0f70 RSI: ffff880026cd63b0 RDI: ffff880079cb3e00
      <4>[362115.443718] RBP: ffff880079cb3d30 R08: 0000000000000000 R09: 0000000000000000
      <4>[362115.444089] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880079cb3e00
      <4>[362115.444089] R13: ffffffffa0c97860 R14: 0000000000000000 R15: 0000000000000000
      <4>[362115.444089] FS: 0000000000000000(0000) GS:ffff880006280000(0000) knlGS:0000000000000000
      <4>[362115.447312] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      <4>[362115.447312] CR2: ffffffffa0c97860 CR3: 0000000001a25000 CR4: 00000000000006e0
      <4>[362115.449181] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>[362115.449181] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>[362115.449181] Process ptlrpcd_3 (pid: 3956, threadinfo ffff880079cb2000, task ffff8800b57c6500)
      <4>[362115.449181] Stack:
      <4>[362115.449181] ffff880079cb3d10 ffffffff810829b2 ffff8800bb1e8000 ffff880079cb3e00
      <4>[362115.449181] <d> ffff880026cd63b0 ffff880079cb3e00 0000000000000000 0000000000000000
      <4>[362115.449181] <d> ffff880079cb3d40 ffffffffa11e7fdd ffff880079cb3d60 ffffffffa11e8006
      <4>[362115.449181] Call Trace:
      <4>[362115.449181] [<ffffffff810829b2>] ? del_timer_sync+0x22/0x30
      <4>[362115.449181] [<ffffffffa11e7fdd>] lu_context_refill+0x1d/0x30 [obdclass]
      <4>[362115.449181] [<ffffffffa11e8006>] lu_env_refill+0x16/0x30 [obdclass]
      <4>[362115.449181] [<ffffffffa13e859f>] ptlrpcd_check+0x4f/0x590 [ptlrpc]
      <4>[362115.449181] [<ffffffffa13e906d>] ptlrpcd+0x2ad/0x3f0 [ptlrpc]
      <4>[362115.449181] [<ffffffff8105de00>] ? default_wake_function+0x0/0x20
      <4>[362115.449181] [<ffffffffa13e8dc0>] ? ptlrpcd+0x0/0x3f0 [ptlrpc]
      <4>[362115.449181] [<ffffffff81098c06>] kthread+0x96/0xa0
      <4>[362115.449181] [<ffffffff8100c24a>] child_rip+0xa/0x20
      <4>[362115.449181] [<ffffffff81098b70>] ? kthread+0x0/0xa0
      <4>[362115.449181] [<ffffffff8100c240>] ? child_rip+0x0/0x20
      <4>[362115.449181] Code: 08 48 81 fb 40 01 00 00 41 89 44 24 28 0f 84 c4 00 00 00 49 8b 44 24 10 4c 8b ab 20 7e 27 a1 48 83 3c 18 00 75 d1 4d 85 ed 74 cc <41> 8b 45 00 41 85 04 24 74 c2 a9 00 00 00 40 75 bb 4c 89 ee 4c
      <1>[362115.449181] RIP [<ffffffffa11e7e84>] keys_fill+0x54/0x190 [obdclass]

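The sequence in the two logs (a cache that cannot be destroyed on unload, then a duplicate-name complaint on reload) can be illustrated with a toy model. This is a simplified sketch, not kernel code: the class `ToySlabRegistry` and its methods are hypothetical, and it only assumes that a named cache with outstanding objects stays registered when destruction fails.

```python
class ToySlabRegistry:
    """Hypothetical stand-in for the kernel's list of named slab caches."""

    def __init__(self):
        self.live = {}  # cache name -> number of objects still allocated

    def create(self, name):
        if name in self.live:
            raise ValueError(f"duplicate cache {name}")
        self.live[name] = 0

    def alloc(self, name):
        self.live[name] += 1

    def free(self, name):
        self.live[name] -= 1

    def destroy(self, name):
        # A cache with outstanding objects is left registered (assumption).
        if self.live[name] > 0:
            return False
        del self.live[name]
        return True


reg = ToySlabRegistry()
reg.create("xattr_kmem")
reg.alloc("xattr_kmem")                # an object that is never freed (the leak)
destroyed = reg.destroy("xattr_kmem")  # corresponds to module unload
try:
    reg.create("xattr_kmem")           # corresponds to module reload
    reload_error = None
except ValueError as exc:
    reload_error = str(exc)
print(destroyed, reload_error)
```

In this model a single leaked object is enough to make the reload fail, so the leak and the caller's handling of the failed cache creation are two separate problems to look at.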
          People

            wc-triage WC Triage
            green Oleg Drokin