Details
-
Bug
-
Resolution: Duplicate
-
Critical
-
None
-
Lustre 2.4.0
-
None
-
Lustre 2.3.69 servers, 2.3.58 clients, zfs (though), all on VMs, with LNet running on eth1 only.
-
3
-
7050
Description
Running lctl get_param '..*' on the MDS/MGS results in:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffffa07a9c8e>] lprocfs_rd_import+0x38e/0x6e0 [obdclass]
PGD 17e62067 PUD 1af63067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:01.0/net/eth0/broadcast
CPU 0
Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) osd_zfs(U) lquota(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) jbd sha512_generic sha256_generic nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate ppdev parport_pc parport btusb bluetooth rfkill e1000 snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc vmware_balloon sg i2c_piix4 i2c_core shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]

Pid: 27042, comm: lctl Tainted: P --------------- 2.6.32-279.14.1.el6_lustre.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffffa07a9c8e>] [<ffffffffa07a9c8e>] lprocfs_rd_import+0x38e/0x6e0 [obdclass]
RSP: 0018:ffff880017ec5d98 EFLAGS: 00010246
RAX: ffff88001b6c20b8 RBX: ffff88001a527000 RCX: 0000000000000001
RDX: ffff880017ec5dd8 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff880017ec5e38 R08: 00000000ffffff0a R09: 00000000fffffffe
R10: 0000000000000000 R11: 2d6e692020202020 R12: 0000000000000001
R13: 0000000000000170 R14: 0000000000000000 R15: 0000000000001000
FS: 00007f477761a700(0000) GS:ffff880002e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 0000000014ca8000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process lctl (pid: 27042, threadinfo ffff880017ec4000, task ffff8800037f1500)
Stack:
ffff880000000000 0000000000000000 ffff880000000001 ffff88001a527000
<d> ffff88001b6c25e0 ffff88001a527268 ffff88001b6c20b8 ffff88000e61a000
<d> 0000000000000000 ffffffff81abf980 0000000000000000 fffffffffffffffb
Call Trace:
[<ffffffff8115c7ea>] ? alloc_pages_current+0xaa/0x110
[<ffffffffa07a59e3>] lprocfs_fops_read+0xf3/0x1f0 [obdclass]
[<ffffffff811e11fe>] proc_reg_read+0x7e/0xc0
[<ffffffff8117bee5>] vfs_read+0xb5/0x1a0
[<ffffffff810d6d42>] ? audit_syscall_entry+0x272/0x2a0
[<ffffffff8117c021>] sys_read+0x51/0x90
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: f6 31 c0 44 89 34 24 e8 91 39 ad e0 46 8d 2c 28 48 8b 45 88 c7 00 00 00 00 00 48 8b 45 90 48 8d 55 a0 31 f6 48 8b b8 e8 19 00 00 <4c> 8b 67 10 e8 09 f8 ff ff 48 8b 4d a0 48 85 c9 0f 85 cc 01 00
RIP [<ffffffffa07a9c8e>] lprocfs_rd_import+0x38e/0x6e0 [obdclass]
RSP <ffff880017ec5d98>
CR2: 0000000000000010
---[ end trace 0cb74fb73d5c7aba ]---
Kernel panic - not syncing: Fatal exception
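This traces back to lustre/obdclass/lprocfs_status.c:1061, which dereferences obd->obd_svc_stats->ls_cnt_header without checking whether obd->obd_svc_stats is NULL. Every other use of obd->obd_svc_stats is guarded against that case. The oops is consistent with this: the faulting instruction (4c 8b 67 10) loads from offset 0x10 of RDI, RDI is 0, and CR2 is 0000000000000010.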
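A minimal C sketch of the kind of guard the report says is missing, for illustration only. The structures are hypothetical, heavily simplified stand-ins that borrow the field names from the report (obd_svc_stats, ls_cnt_header). The real Lustre definitions differ, and ls_cnt_header is modeled here as a plain string. rd_import_svc_stats() is an invented helper, not the actual lprocfs_rd_import() code and not the fix that landed for this ticket.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the Lustre structures named in
 * the report. The real layouts differ; ls_cnt_header is modeled as a
 * string purely for illustration. */
struct lprocfs_stats {
    unsigned int ls_num;        /* number of counters (assumed field) */
    const char  *ls_cnt_header; /* simplified stand-in for header data */
};

struct obd_device {
    struct lprocfs_stats *obd_svc_stats; /* may legitimately be NULL */
};

/* Hypothetical helper: append the service-stats part of an import report
 * to buf. It writes nothing and returns 0 when obd_svc_stats was never
 * set up; otherwise it returns whatever snprintf returns. */
static int rd_import_svc_stats(const struct obd_device *obd,
                               char *buf, size_t len)
{
    /* The NULL check the report says is absent at the crashing site. */
    if (obd == NULL || obd->obd_svc_stats == NULL)
        return 0;

    return snprintf(buf, len, "    service_stats: %s\n",
                    obd->obd_svc_stats->ls_cnt_header);
}
```

With the guard in place, reading the import file for a device whose obd_svc_stats is NULL would skip that section instead of oopsing. Whether skipping is the right behaviour for the real proc file is a judgment the sketch does not settle.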