[LU-2073] procfs symlinks are apparently never freed Created: 02/Oct/12  Updated: 07/Nov/12  Resolved: 07/Nov/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Minor
Reporter: Oleg Drokin Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4327

 Description   

It looks like we need to somehow free the Lustre subdevice symlinks. After running with the kmemleak detector, I see a bunch of entries like this:

unreferenced object 0xffff880131381458 (size 64):
  comm "llog_process_th", pid 12528, jiffies 4296618724
  hex dump (first 32 bytes):
    2e 2e 2f 6f 73 70 2f 6c 75 73 74 72 65 2d 4f 53  ../osp/lustre-OS
    54 30 30 30 31 2d 6f 73 63 2d 4d 44 54 30 30 30  T0001-osc-MDT000
  backtrace:
    [<ffffffff814e4aee>] kmemleak_alloc+0x5e/0xd0
    [<ffffffff81164533>] __kmalloc+0x1c3/0x2f0
    [<ffffffff811eb433>] proc_symlink+0x53/0xb0
    [<ffffffffa0370c5a>] 0xffffffffa0370c5a
...

unreferenced object 0xffff8800dc8cc2a0 (size 64):
  comm "llog_process_th", pid 12528, jiffies 4296618722
  hex dump (first 32 bytes):
    2e 2e 2f 6f 73 70 2f 6c 75 73 74 72 65 2d 4f 53  ../osp/lustre-OS
    54 30 30 30 30 2d 6f 73 63 2d 4d 44 54 30 30 30  T0000-osc-MDT000
  backtrace:
    [<ffffffff814e4aee>] kmemleak_alloc+0x5e/0xd0
    [<ffffffff81164533>] __kmalloc+0x1c3/0x2f0
    [<ffffffff811eb433>] proc_symlink+0x53/0xb0
    [<ffffffffa0370c5a>] 0xffffffffa0370c5a
...
unreferenced object 0xffff880250630400 (size 64):
  comm "mount.lustre", pid 12525, jiffies 4296617672
  hex dump (first 32 bytes):
    2e 2e 2f 2e 2e 2f 2e 2e 2f 6d 64 63 2f 6c 75 73  ../../../mdc/lus
    74 72 65 2d 4d 44 54 30 30 30 30 2d 6d 64 63 2d  tre-MDT0000-mdc-
  backtrace:
    [<ffffffff814e4aee>] kmemleak_alloc+0x5e/0xd0
    [<ffffffff81164533>] __kmalloc+0x1c3/0x2f0
    [<ffffffff811eb433>] proc_symlink+0x53/0xb0
    [<ffffffffa0370c5a>] 0xffffffffa0370c5a
...
unreferenced object 0xffff88014c3e7ea0 (size 32):
  comm "llog_process_th", pid 12225, jiffies 4296616700
  hex dump (first 32 bytes):
    2e 2e 2f 6c 6f 64 2f 6c 75 73 74 72 65 2d 4d 44  ../lod/lustre-MD
    54 30 30 30 30 2d 6d 64 74 6c 6f 76 00 5a 5a a5  T0000-mdtlov.ZZ.
  backtrace:
    [<ffffffff814e4aee>] kmemleak_alloc+0x5e/0xd0
    [<ffffffff81164533>] __kmalloc+0x1c3/0x2f0
    [<ffffffff811eb433>] proc_symlink+0x53/0xb0
    [<ffffffffa0370c5a>] 0xffffffffa0370c5a
...
unreferenced object 0xffff8800d85ab820 (size 64):
  comm "mount.lustre", pid 12178, jiffies 4296616645
  hex dump (first 32 bytes):
    2e 2e 2f 2e 2e 2f 6f 73 64 2d 6c 64 69 73 6b 66  ../../osd-ldiskf
    73 2f 6c 75 73 74 72 65 2d 4d 44 54 30 30 30 30  s/lustre-MDT0000
  backtrace:
    [<ffffffff814e4aee>] kmemleak_alloc+0x5e/0xd0
    [<ffffffff81164533>] __kmalloc+0x1c3/0x2f0
    [<ffffffff811eb433>] proc_symlink+0x53/0xb0
    [<ffffffffa0370c5a>] 0xffffffffa0370c5a
...

and so on.



 Comments   
Comment by Alex Zhuravlev [ 02/Oct/12 ]

There are the following lines in osp:

if (m->opd_symlink)
        lprocfs_remove(&m->opd_symlink);

Comment by Oleg Drokin [ 02/Oct/12 ]

Looking at lprocfs_remove_nolock we see this:

#ifdef HAVE_PROCFS_USERS
                /* if procfs uses user count to synchronize deletion of
                 * proc entry, there is no protection for rm_entry->data,
                 * then lprocfs_fops_read and lprocfs_fops_write maybe
                 * call proc_dir_entry->read_proc (or write_proc) with
                 * proc_dir_entry->data == NULL, then cause kernel Oops.
                 * see bug19706 for detailed information */

                /* procfs won't free rm_entry->data if it isn't a LINK,
                 * and Lustre won't use rm_entry->data if it is a LINK */
                if (S_ISLNK(rm_entry->mode))
                        rm_entry->data = NULL;
#else

Now, if we look into proc_symlink, we see that entry->data is kmalloc'ed, so that's how the leak appears.

In free_proc_entry we can see:

        if (S_ISLNK(de->mode))
                kfree(de->data);

As such I think the assignment of NULL is in error.

Last person touching this code (and probably even adding the NULL assignment) was YangSheng, so perhaps he can chime in here?

Comment by Peter Jones [ 04/Oct/12 ]

Yangsheng

Could you please look into this one?

Thanks

Peter

Comment by Yang Sheng [ 02/Nov/12 ]

Patch committed to: http://review.whamcloud.com/4434

Comment by Peter Jones [ 07/Nov/12 ]

Landed for 2.4
