[LU-8001] Null pointer dereference in nm_member_reclassify_nodemap Created: 11/Apr/16  Updated: 25/Jul/16  Resolved: 27/Apr/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Bug
Priority: Major
Reporter: Oleg Drokin
Assignee: Kit Westneat
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Hit this in my testing today (recovery-small test 111):

<1>[231830.373084] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
<1>[231830.374591] IP: [<ffffffffa08966a1>] nm_member_reclassify_nodemap+0x71/0x130 [ptlrpc]
<4>[231830.376120] PGD 414cc067 PUD 2ba8f067 PMD 0 
<4>[231830.376513] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
<4>[231830.376513] last sysfs file: /sys/devices/virtual/block/loop0/queue/scheduler
<4>[231830.376513] CPU 1 
<4>[231830.376513] Modules linked in: lustre ofd osp lod ost mdt mdd mgs osd_ldiskfs ldiskfs lquota lfsck obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass ksocklnd lnet libcfs zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl zlib_deflate exportfs jbd sha512_generic sha256_generic ext4 jbd2 mbcache virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
<4>[231830.376513] 
<4>[231830.376513] Pid: 31113, comm: mount.lustre Tainted: P           -- ------------    2.6.32-rhe6.7-debug #1 Bochs Bochs
<4>[231830.376513] RIP: 0010:[<ffffffffa08966a1>]  [<ffffffffa08966a1>] nm_member_reclassify_nodemap+0x71/0x130 [ptlrpc]
<4>[231830.376513] RSP: 0018:ffff8800825cb628  EFLAGS: 00010286
<4>[231830.376513] RAX: 0000000000000000 RBX: ffff8800338337f0 RCX: 0000000000000000
<4>[231830.376513] RDX: 0000000000000000 RSI: 0000000000000071 RDI: 0000000000000246
<4>[231830.376513] RBP: ffff8800825cb678 R08: 0000000000000ed9 R09: ffff880000000000
<4>[231830.376513] R10: 0000000000000001 R11: 0000000087654321 R12: ffff880033833b58
<4>[231830.376513] R13: ffff88008e29d7f0 R14: ffff880095cf4ef0 R15: ffff880095cf4fa8
<4>[231830.376513] FS:  00007f35b8cbf7a0(0000) GS:ffff880006280000(0000) knlGS:0000000000000000
<4>[231830.376513] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>[231830.376513] CR2: 0000000000000018 CR3: 000000007d53f000 CR4: 00000000000006e0
<4>[231830.376513] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[231830.376513] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[231830.376513] Process mount.lustre (pid: 31113, threadinfo ffff8800825c8000, task ffff880023028080)
<4>[231830.376513] Stack:
<4>[231830.376513]  0000000000000010 ffffffffa09173c0 ffff880095cf4f18 ffff880095cf4f60
<4>[231830.376513] <d> ffff8800292eae88 ffff880095cf4ef0 ffff880095cf4f18 ffff880095cf4f18
<4>[231830.376513] <d> ffff8800292eae88 ffff880095cf4ef0 ffff8800825cb698 ffffffffa0891efa
<4>[231830.376513] Call Trace:
<4>[231830.376513]  [<ffffffffa0891efa>] nodemap_putref+0x7a/0x2f0 [ptlrpc]
<4>[231830.376513]  [<ffffffffa08923aa>] nodemap_config_cleanup+0xda/0x120 [ptlrpc]
<4>[231830.376513]  [<ffffffffa0892406>] nodemap_config_dealloc+0x16/0xf0 [ptlrpc]
<4>[231830.376513]  [<ffffffffa089264e>] nodemap_config_set_active+0x14e/0x270 [ptlrpc]
<4>[231830.376513]  [<ffffffffa0897b76>] nm_config_file_register+0x966/0xf50 [ptlrpc]
<4>[231830.376513]  [<ffffffffa0c0892d>] ? iam_container_setup+0xad/0x110 [osd_ldiskfs]
<4>[231830.376513]  [<ffffffffa0c20000>] ? osd_inode_iteration+0xa30/0xd80 [osd_ldiskfs]
<4>[231830.376513]  [<ffffffffa0721043>] mgs_fs_setup+0x2f3/0x6d0 [mgs]
<4>[231830.376513]  [<ffffffffa072028f>] mgs_init0+0xe1f/0x16e0 [mgs]
<4>[231830.376513]  [<ffffffffa0719689>] ? mgs_type_start+0x19/0x20 [mgs]
<4>[231830.376513]  [<ffffffffa0720be8>] mgs_device_alloc+0x98/0x140 [mgs]
<4>[231830.376513]  [<ffffffffa058c37f>] obd_setup+0x1bf/0x290 [obdclass]
<4>[231830.376513]  [<ffffffffa058c6a8>] class_setup+0x258/0x930 [obdclass]
<4>[231830.376513]  [<ffffffffa0592e91>] class_process_config+0x1151/0x23f0 [obdclass]
<4>[231830.376513]  [<ffffffffa0427fb1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
<4>[231830.376513]  [<ffffffffa059afcf>] do_lcfg+0x2cf/0x8e0 [obdclass]
<4>[231830.376513]  [<ffffffffa059b674>] lustre_start_simple+0x94/0x200 [obdclass]
<4>[231830.376513]  [<ffffffffa04247f8>] ? libcfs_log_return+0x28/0x40 [libcfs]
<4>[231830.376513]  [<ffffffffa05ca1d7>] server_fill_super+0xc37/0x106c [obdclass]
<4>[231830.376513]  [<ffffffffa0427fb1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
<4>[231830.376513]  [<ffffffffa059d558>] lustre_fill_super+0x328/0x8a0 [obdclass]
<4>[231830.376513]  [<ffffffffa059d230>] ? lustre_fill_super+0x0/0x8a0 [obdclass]
<4>[231830.376513]  [<ffffffff811965cf>] get_sb_nodev+0x5f/0xa0
<4>[231830.376513]  [<ffffffffa0597505>] lustre_get_sb+0x25/0x30 [obdclass]
<4>[231830.376513]  [<ffffffff81195bfb>] vfs_kern_mount+0x7b/0x1b0
<4>[231830.376513]  [<ffffffff81195da2>] do_kern_mount+0x52/0x130
<4>[231830.376513]  [<ffffffff811b7ceb>] do_mount+0x2fb/0x920
<4>[231830.376513]  [<ffffffff811b83a0>] sys_mount+0x90/0xe0
<4>[231830.376513]  [<ffffffff8100b112>] system_call_fastpath+0x16/0x1b
<4>[231830.376513] Code: 03 00 00 0f 84 bf 00 00 00 49 81 ed 68 03 00 00 eb 12 0f 1f 84 00 00 00 00 00 4c 89 eb 4c 8d a8 98 fc ff ff 48 8b 83 10 01 00 00 <48> 8b 78 18 e8 96 b7 ff ff 49 39 c6 74 70 48 8b 8b 68 03 00 00 
<1>[231830.376513] RIP  [<ffffffffa08966a1>] nm_member_reclassify_nodemap+0x71/0x130 [ptlrpc]
<4>[231830.376513]  RSP <ffff8800825cb628>
<4>[231830.376513] CR2: 0000000000000018
(gdb) l *(nm_member_reclassify_nodemap+0x71)
0xd76d1 is in nm_member_reclassify_nodemap (/home/green/git/lustre-release/lustre/ptlrpc/nodemap_member.c:153).
148		list_for_each_entry_safe(exp, tmp, &nodemap->nm_member_list,
149					 exp_target_data.ted_nodemap_member) {
150			lnet_nid_t nid = exp->exp_connection->c_peer.nid;
151	
152			/* nodemap_classify_nid requires nmc_range_tree_lock */
153			new_nodemap = nodemap_classify_nid(nid);
154			if (new_nodemap != nodemap) {
155				/* don't use member_del because ted_nodemap
156				 * should never be null
157				 */

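I guess exp->exp_connection could be NULL after all: the faulting access is at address 0x18 with RAX = 0, which fits the nid load on line 150 going through a NULL connection pointer.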
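
For illustration only, here is a minimal sketch of the kind of guard that avoids this class of crash: check the connection pointer before reading the nid from it. This is not the change that landed in review 19595. Every name below (struct export, struct connection, classify_nid, reclassify_members) is a hypothetical, simplified stand-in for the Lustre structures, and skipping exports without a connection is an assumed policy, not a confirmed one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real Lustre definitions. */
typedef uint64_t nid_t;

struct connection {
	nid_t peer_nid;
};

struct export {
	struct connection *conn;	/* assumed to be NULL for some exports */
	int nodemap_id;			/* id of the nodemap this export belongs to */
	struct export *next;		/* plain singly linked member list */
};

/* Hypothetical classifier: nids >= 100 map to nodemap 1, all others to 0. */
static int classify_nid(nid_t nid)
{
	return nid >= 100 ? 1 : 0;
}

/*
 * Walk the member list of nodemap `current_id` and move every export whose
 * nid now classifies into a different nodemap. Exports with no connection
 * are skipped instead of being dereferenced. Returns the number moved.
 */
static int reclassify_members(struct export *head, int current_id)
{
	int moved = 0;

	assert(current_id >= 0);

	for (struct export *exp = head; exp != NULL; exp = exp->next) {
		int new_id;

		/* The guard: without it, exp->conn->peer_nid faults on NULL. */
		if (exp->conn == NULL)
			continue;

		new_id = classify_nid(exp->conn->peer_nid);
		if (new_id != current_id) {
			exp->nodemap_id = new_id;
			moved++;
		}
	}
	return moved;
}
```

With a three-entry list where the middle export has conn == NULL, reclassify_members() leaves that export untouched and only moves the entries whose nid classifies differently.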



 Comments   
Comment by Gerrit Updater [ 15/Apr/16 ]

Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/19595
Subject: LU-8001 nodemap: fix null deref when reclassifying
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5ecc58393ce2c4008941dbf4370e9bc81dbbb548

Comment by Gerrit Updater [ 22/Apr/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19595/
Subject: LU-8001 nodemap: fix null deref when reclassifying
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 475576e74ec315e316fe6528c97c437511b18872

Comment by Joseph Gmitter (Inactive) [ 27/Apr/16 ]

Landed to master for 2.9.0
