[LU-4702] crash in idmap_destroy() when unloading module Created: 04/Mar/14  Updated: 11/Mar/14  Resolved: 11/Mar/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Blocker
Reporter: Niu Yawei (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 12937

 Description   

It is easy to reproduce: sh llmount.sh; sh llmountcleanup.sh

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa035545e>] idmap_destroy+0xe/0x1d0 [nodemap]
PGD 6069a067 PUD ce0e067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0 
Modules linked in: nodemap(-) exportfs lquota lfsck jbd obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass ksocklnd lnet sha512_generic sha256_generic crc32c_intel libcfs ebtable_nat ebtables xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT iptable_filter ip_tables bridge stp llc autofs4 sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 fuse dm_mirror dm_region_hash dm_log dm_mod uinput ppdev parport_pc parport e1000 snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc sg vmware_balloon i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix [last unloaded: mgs]

Pid: 20413, comm: rmmod Not tainted 2.6.32431 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffffa035545e>]  [<ffffffffa035545e>] idmap_destroy+0xe/0x1d0 [nodemap]
RSP: 0018:ffff88007d3fdd98  EFLAGS: 00010292
RAX: 0000000000000000 RBX: ffffffffffffffe0 RCX: 0000000000000003
RDX: 0000000000000001 RSI: ffff880037e4f930 RDI: ffffffffffffffe0
RBP: ffff88007d3fdda8 R08: ffffffff81c064c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffffffffe0
R13: ffff880037e4f8e8 R14: 0000000000000000 R15: ffff880037e4f930
FS:  00007f3e0c93e700(0000) GS:ffff88000c400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000060162000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 20413, threadinfo ffff88007d3fc000, task ffff88007a18d500)
Stack:
 ffff88007d3fde58 ffffffffffffffe0 ffff88007d3fddd8 ffffffffa0355663
<d> 0000000000000000 ffffffff810d3419 ffff880037e4f8c0 ffff880037e4f8e8
<d> ffff88007d3fde08 ffffffffa03532c6 ffff88007d3fdde8 ffff88007d443780
Call Trace:
 [<ffffffffa0355663>] idmap_delete_tree+0x43/0x60 [nodemap]
 [<ffffffff810d3419>] ? __stop_cpus+0x59/0x80
 [<ffffffffa03532c6>] nodemap_destroy+0x56/0x210 [nodemap]
 [<ffffffffa03534ad>] nodemap_putref+0x2d/0xa0 [nodemap]
 [<ffffffffa0353532>] nodemap_hs_put_locked+0x12/0x20 [nodemap]
 [<ffffffffa040ac21>] cfs_hash_bd_del_locked+0x91/0x140 [libcfs]
 [<ffffffffa040c1d1>] cfs_hash_putref+0x191/0x480 [libcfs]
 [<ffffffffa03545ba>] nodemap_cleanup_all+0x2a/0x30 [nodemap]
 [<ffffffffa03545ce>] nodemap_mod_exit+0xe/0x20 [nodemap]
 [<ffffffff810b94d4>] sys_delete_module+0x194/0x260
 [<ffffffff810e16c7>] ? audit_syscall_entry+0x1d7/0x200
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: 74 d9 e9 6c ff ff ff 66 0f 1f 44 00 00 48 83 c7 48 e9 4c ff ff ff 0f 1f 80 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 <48> 8b 47 20 48 8d 57 20 48 89 fb 48 83 e0 fc 48 39 d0 0f 84 73 
RIP  [<ffffffffa035545e>] idmap_destroy+0xe/0x1d0 [nodemap]
 RSP <ffff88007d3fdd98>
CR2: 0000000000000000
---[ end trace 1c64b1dd0883bf35 ]---
Kernel panic - not syncing: Fatal exception
Pid: 20413, comm: rmmod Tainted: G      D    ---------------    2.6.32431 #1
Call Trace:
 [<ffffffff81526137>] ? panic+0xa7/0x16f
 [<ffffffff8152a474>] ? oops_end+0xe4/0x100
 [<ffffffff8104a04b>] ? no_context+0xfb/0x260
 [<ffffffff8104a2d5>] ? __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff8104a3fe>] ? bad_area+0x4e/0x60
 [<ffffffff8104abaf>] ? __do_page_fault+0x3cf/0x480
 [<ffffffff81059b61>] ? update_curr+0xe1/0x1f0
 [<ffffffff81526850>] ? thread_return+0x4e/0x76e
 [<ffffffff81014979>] ? sched_clock+0x9/0x10
 [<ffffffff8152c39e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff81529755>] ? page_fault+0x25/0x30
 [<ffffffffa035545e>] ? idmap_destroy+0xe/0x1d0 [nodemap]
 [<ffffffffa0355663>] ? idmap_delete_tree+0x43/0x60 [nodemap]
 [<ffffffff810d3419>] ? __stop_cpus+0x59/0x80
 [<ffffffffa03532c6>] ? nodemap_destroy+0x56/0x210 [nodemap]
 [<ffffffffa03534ad>] ? nodemap_putref+0x2d/0xa0 [nodemap]
 [<ffffffffa0353532>] ? nodemap_hs_put_locked+0x12/0x20 [nodemap]
 [<ffffffffa040ac21>] ? cfs_hash_bd_del_locked+0x91/0x140 [libcfs]
 [<ffffffffa040c1d1>] ? cfs_hash_putref+0x191/0x480 [libcfs]
 [<ffffffffa03545ba>] ? nodemap_cleanup_all+0x2a/0x30 [nodemap]
 [<ffffffffa03545ce>] ? nodemap_mod_exit+0xe/0x20 [nodemap]
 [<ffffffff810b94d4>] ? sys_delete_module+0x194/0x260
 [<ffffffff810e16c7>] ? audit_syscall_entry+0x1d7/0x200
 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b


 Comments   
Comment by Niu Yawei (Inactive) [ 04/Mar/14 ]

Looking at the code:

#define nm_rbtree_postorder_for_each_entry_safe(pos, n,                 \
                                                root, field)            \
        for (pos = rb_entry(nm_rb_first_postorder(root), typeof(*pos),  \
                            field),                                     \
                n = rb_entry(nm_rb_next_postorder(&pos->field),         \
                typeof(*pos), field);                                   \
                &pos->field;                                            \
                pos = n,                                                \
                n = rb_entry(nm_rb_next_postorder(&pos->field),         \
                             typeof(*pos), field))

Shouldn't we check whether nm_rb_first_postorder(root) and nm_rb_next_postorder() return NULL before passing the result to rb_entry()?
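The failure mode behind this question can be reproduced outside the kernel. With an empty tree the first-postorder helper returns NULL, and rb_entry(NULL, ...) yields NULL minus the member offset rather than NULL. The value 0xffffffffffffffe0 in RDI/RBX/R12 above is consistent with NULL minus a 0x20 offset, and the faulting instruction in the Code: line (`48 8b 47 20`, a load from RDI+0x20) then reads address 0, matching CR2. The `&pos->field` loop test cannot be relied on to catch this, because a compiler may assume a member address is never NULL. Below is a minimal sketch of a NULL-safe variant, modelled on the `rb_entry_safe()` / `rbtree_postorder_for_each_entry_safe()` pattern in the Linux kernel's `include/linux/rbtree.h`: the node pointer is tested before it is converted, and `n` is fetched before the loop body can free `pos`. All `sketch_*` names are hypothetical stand-ins (a bare parent/left/right node without colouring), not Lustre's `nm_rb_*` helpers and not the change posted for review. Like the kernel, it relies on the GNU C statement-expression and `__typeof__` extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rb_node: only the links that a
 * postorder walk needs (no colour bits). */
struct sketch_node {
	struct sketch_node *parent, *left, *right;
};

struct sketch_root {
	struct sketch_node *node;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Evaluate ptr once and map NULL to NULL, instead of NULL - offsetof(). */
#define sketch_entry_safe(ptr, type, member)                   \
	({ __typeof__(ptr) ____ptr = (ptr);                     \
	   ____ptr ? container_of(____ptr, type, member) : NULL; })

/* Descend, preferring left children, until reaching a leaf. */
static struct sketch_node *left_deepest(struct sketch_node *node)
{
	for (;;) {
		if (node->left)
			node = node->left;
		else if (node->right)
			node = node->right;
		else
			return node;
	}
}

static struct sketch_node *sketch_first_postorder(const struct sketch_root *root)
{
	return root->node ? left_deepest(root->node) : NULL;
}

static struct sketch_node *sketch_next_postorder(const struct sketch_node *node)
{
	struct sketch_node *parent;

	if (!node)
		return NULL;
	parent = node->parent;
	/* Coming up from a left child: the right subtree is visited next. */
	if (parent && node == parent->left && parent->right)
		return left_deepest(parent->right);
	/* Both subtrees are done: visit the parent (NULL above the root). */
	return parent;
}

/* NULL-safe iterator: pos is tested before any dereference, and n is
 * computed in the condition, i.e. before the body may free pos. */
#define sketch_postorder_for_each_entry_safe(pos, n, root, field)        \
	for (pos = sketch_entry_safe(sketch_first_postorder(root),       \
				     __typeof__(*pos), field);           \
	     pos && ({ n = sketch_entry_safe(                            \
				sketch_next_postorder(&pos->field),      \
				__typeof__(*pos), field); 1; });         \
	     pos = n)

/* Hypothetical entry type; the padding places the node at a non-zero
 * offset (0x20 on a typical LP64 target). */
struct sketch_idmap {
	long id;
	long pad[3];
	struct sketch_node node;
};

/* Free every entry children-first and return how many were freed;
 * an empty tree returns 0 without dereferencing anything. */
static int sketch_delete_tree(struct sketch_root *root)
{
	struct sketch_idmap *pos, *n = NULL;
	int freed = 0;

	sketch_postorder_for_each_entry_safe(pos, n, root, node) {
		assert(pos != NULL); /* guaranteed by the loop condition */
		free(pos);
		freed++;
	}
	root->node = NULL;
	return freed;
}
```

With an empty root the loop body never runs; with a populated tree each entry is freed only after its children, and the root pointer is cleared at the end.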

Comment by Niu Yawei (Inactive) [ 05/Mar/14 ]

http://review.whamcloud.com/9500

Comment by Niu Yawei (Inactive) [ 11/Mar/14 ]

Patch landed for Lustre 2.6.

Generated at Sat Feb 10 01:45:06 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.