[LU-622] Oops: RIP: libcfs:libcfs_debug_vmsg2+0x40b/0x9f0 Created: 23/Aug/11  Updated: 29/Aug/11  Resolved: 24/Aug/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: Lustre 2.1.0

Type: Bug
Priority: Blocker
Reporter: Jian Yu
Assignee: Zhenyu Xu
Resolution: Fixed
Votes: 0
Labels: None
Environment:

Lustre Tag: v2_1_0_0_RC0
Lustre Build: http://newbuild.whamcloud.com/job/lustre-master/267/
Distro/Arch: RHEL5/x86_64 (kernel version: 2.6.18-238.19.1.el5)


Severity: 3
Rank (Obsolete): 4266

 Description   

While running conf-sanity test 36 on a single node, the following Oops occurred:

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: 
 [<ffffffff887e379b>] :libcfs:libcfs_debug_vmsg2+0x40b/0x9f0
PGD 31bcee067 PUD 236062067 PMD 0 
Oops: 0000 [1] SMP 
last sysfs file: /block/sdb/queue/max_sectors_kb
CPU 3 
Modules linked in: llite_lloop(U) lustre(U) obdfilter(U) ost(U) osd_ldiskfs(U) cmm(U) fsfilt_ldiskfs(U) mdt(U) mdd(U) mds(U) mgs(U) ldiskfs(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) lquota(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) nfs(U) fscache(U) nfs_acl(U) exportfs(U) jbd2(U) crc16(U) autofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U) lockd(U) sunrpc(U) cpufreq_ondemand(U) powernow_k8(U) freq_table(U) mperf(U) be2iscsi(U) iscsi_tcp(U) bnx2i(U) cnic(U) uio(U) cxgb3i(U) iw_cxgb3(U) cxgb3(U) libiscsi_tcp(U) ib_iser(U) libiscsi2(U) scsi_transport_iscsi2(U) scsi_transport_iscsi(U) ib_srp(U) rds(U) ib_sdp(U) ib_ipoib(U) ipoib_helper(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) rdma_ucm(U) rdma_cm(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) ib_cm(U) iw_cm(U) ib_addr(U) ib_sa(U) loop(U) dm_mirror(U) dm_multipath(U) scsi_dh(U) video(U) backlight(U) sbs(U) power_meter(U) i2c_ec(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) mlx4_ib(U) ib_mad(U) ib_core(U) mlx4_en(U) shpchp(U) igb(U) 8021q(U) sg(U) i2c_piix4(U) mlx4_core(U) i2c_core(U) k10temp(U) dca(U) pcspkr(U) hwmon(U) serio_raw(U) tpm_tis(U) tpm(U) tpm_bios(U) amd64_edac_mod(U) edac_mc(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) ahci(U) libata(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 30667, comm: mount.lustre Tainted: G      2.6.18-238.19.1.el5_lustre.g65156ed #1
RIP: 0010:[<ffffffff887e379b>]  [<ffffffff887e379b>] :libcfs:libcfs_debug_vmsg2+0x40b/0x9f0
RSP: 0018:ffff8100d0fc77a8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000001fff
RDX: ffff810102f42600 RSI: 0000000000000000 RDI: ffffffff88fa6a4e
RBP: ffff8100d5fce420 R08: ffff8100d80aedf8 R09: 00000000000003c9
R10: ffff8100d5fce3e0 R11: 0000000000000020 R12: ffff81010c9a0180
R13: 000000000000004a R14: 0000000000000055 R15: 0000000000000000
FS:  00002acf763916e0(0000) GS:ffff8102239255c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000031b441000 CR4: 00000000000006e0
Process mount.lustre (pid: 30667, threadinfo ffff8100d0fc6000, task ffff81010b0ab100)
Stack:  ffff8100d0fc7868 ffff8100d0fc77e8 000003c900000010 ffffffff88fa6a4e
 ffffffff88fabdc2 0000009f00000010 0000000000000055 0000000000000000
 0000000000000000 0000001000000080 4e53554600000003 0000000000098d66
Call Trace:
 [<ffffffff800477e7>] sprintf+0x51/0x59
 [<ffffffff888d1738>] :obdclass:class_get_profile+0xf8/0x1a0
 [<ffffffff88f6d55a>] :lustre:ll_fill_super+0x148a/0x7770
 [<ffffffff888e79cf>] :obdclass:lustre_start_mgc+0x27cf/0x2860
 [<ffffffff888eec70>] :obdclass:lustre_fill_super+0x0/0x590
 [<ffffffff888ef012>] :obdclass:lustre_fill_super+0x3a2/0x590
 [<ffffffff800e6d2c>] get_sb_nodev+0x4f/0x97
 [<ffffffff888de6fc>] :obdclass:lustre_get_sb+0x1c/0x30
 [<ffffffff800e6604>] vfs_kern_mount+0x93/0x11a
 [<ffffffff800e66cd>] do_kern_mount+0x36/0x4d
 [<ffffffff800f0fd5>] do_mount+0x6a9/0x719
 [<ffffffff8000d044>] do_lookup+0x65/0x1e6
 [<ffffffff8002cc10>] mntput_no_expire+0x19/0x89
 [<ffffffff8000a81a>] __link_path_walk+0xf79/0xfb9
 [<ffffffff8002cc10>] mntput_no_expire+0x19/0x89
 [<ffffffff8000eb94>] link_path_walk+0xa6/0xb2
 [<ffffffff800ce765>] zone_statistics+0x3e/0x6d
 [<ffffffff8000f41e>] __alloc_pages+0x78/0x308
 [<ffffffff8004c757>] sys_mount+0x8a/0xcd
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0


Code: 48 8b 06 8b 4d 18 48 83 e0 fc 48 29 c2 48 b8 b7 6d db b6 6d 
RIP  [<ffffffff887e379b>] :libcfs:libcfs_debug_vmsg2+0x40b/0x9f0
 RSP <ffff8100d0fc77a8>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Fatal exception

Maloo report: https://maloo.whamcloud.com/test_sets/97f5748e-cd5b-11e0-8d02-52540025f9af



 Comments   
Comment by Peter Jones [ 23/Aug/11 ]

Bobijam

Could you please look into this issue as your top priority?

Thanks

Peter

Comment by Andreas Dilger [ 23/Aug/11 ]

There is a good chance that this is related to LU-556 (commit cbc4ca2e8dc37d54bb7d3c9b02ab20b63e60f592), which was changing this code recently.

Comment by Oleg Drokin [ 23/Aug/11 ]

As a data point: I tried to reproduce this locally and was unable to.

Comment by Oleg Drokin [ 23/Aug/11 ]

My digging around the dump seems to imply the crash actually happened somewhere in OBD_ALLOC, when we were trying to allocate md. Not sure how it ended up dying in the printing code, then.

Anyway, I think this line in the patch is not exactly right:
const int instlen = sizeof(cfg->cfg_instance) * 2 + 1;

It should be +2: 1 for the dash between the name and the address, and another 1 for the NUL terminator.

The code before the patch used +2.
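
To illustrate the sizing argument above, here is a minimal userspace sketch, not the actual Lustre code. It assumes the instance name has the form "<name>-<address>", with the address printed as two hex digits per byte of the pointer. The helper name make_instance_name() is hypothetical, and PRIxPTR with an explicit width stands in for the kernel's %p formatting.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not Lustre code): build "<name>-<hex address>".
 * Returns a malloc()ed string, or NULL on allocation failure or truncation.
 */
static char *make_instance_name(const char *name, const void *instance)
{
    /* Two hex digits per pointer byte, +1 for the dash, +1 for the NUL. */
    const size_t instlen = sizeof(instance) * 2 + 2;
    size_t len;
    char *buf;
    int n;

    assert(name != NULL);

    len = strlen(name) + instlen;
    buf = malloc(len);
    if (buf == NULL)
        return NULL;

    /* snprintf() never writes more than len bytes, NUL included. */
    n = snprintf(buf, len, "%s-%0*" PRIxPTR, name,
                 (int)(sizeof(instance) * 2), (uintptr_t)instance);

    /* n >= len means the buffer was too small and the output was cut. */
    if (n < 0 || (size_t)n >= len) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

With "+ 1" instead of "+ 2" the buffer in this sketch would be one byte short: snprintf() would report truncation, while an unbounded sprintf() would write the NUL terminator one byte past the end of the allocation.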

Comment by Zhenyu Xu [ 24/Aug/11 ]

The patch is being tracked at http://review.whamcloud.com/1280

Comment by Jinshan Xiong (Inactive) [ 24/Aug/11 ]

My fault. Sorry

Comment by Build Master (Inactive) [ 24/Aug/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #273
LU-622 Alloc enough cfg buffer space

Oleg Drokin : ed2e4d8205e5b6bc9dc2ad8319ad7666e49e5dfe
Files :

  • lustre/llite/llite_lib.c
  • lustre/obdclass/obd_config.c
  • lustre/liblustre/super.c
Comment by Build Master (Inactive) [ 24/Aug/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #273
LU-622 Alloc enough cfg buffer space

Oleg Drokin : ed2e4d8205e5b6bc9dc2ad8319ad7666e49e5dfe
Files :

  • lustre/llite/llite_lib.c
  • lustre/liblustre/super.c
  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 24/Aug/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #273
LU-622 Alloc enough cfg buffer space

Oleg Drokin : ed2e4d8205e5b6bc9dc2ad8319ad7666e49e5dfe
Files :

  • lustre/liblustre/super.c
  • lustre/obdclass/obd_config.c
  • lustre/llite/llite_lib.c
Comment by Build Master (Inactive) [ 24/Aug/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #273
LU-622 Alloc enough cfg buffer space

Oleg Drokin : ed2e4d8205e5b6bc9dc2ad8319ad7666e49e5dfe
Files :

  • lustre/obdclass/obd_config.c
  • lustre/liblustre/super.c
  • lustre/llite/llite_lib.c
Comment by Build Master (Inactive) [ 24/Aug/11 ]

Integrated in lustre-master » i686,client,el5,ofa #273
LU-622 Alloc enough cfg buffer space

Oleg Drokin : ed2e4d8205e5b6bc9dc2ad8319ad7666e49e5dfe
Files :

  • lustre/liblustre/super.c
  • lustre/obdclass/obd_config.c
  • lustre/llite/llite_lib.c
Comment by Build Master (Inactive) [ 24/Aug/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #273
LU-622 Alloc enough cfg buffer space

Oleg Drokin : ed2e4d8205e5b6bc9dc2ad8319ad7666e49e5dfe
Files :

  • lustre/llite/llite_lib.c
  • lustre/obdclass/obd_config.c
  • lustre/liblustre/super.c
Comment by Build Master (Inactive) [ 24/Aug/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #273
LU-622 Alloc enough cfg buffer space

Oleg Drokin : ed2e4d8205e5b6bc9dc2ad8319ad7666e49e5dfe
Files :

  • lustre/liblustre/super.c
  • lustre/obdclass/obd_config.c
  • lustre/llite/llite_lib.c
Comment by Build Master (Inactive) [ 24/Aug/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #273
LU-622 Alloc enough cfg buffer space

Oleg Drokin : ed2e4d8205e5b6bc9dc2ad8319ad7666e49e5dfe
Files :

  • lustre/obdclass/obd_config.c
  • lustre/liblustre/super.c
  • lustre/llite/llite_lib.c
Comment by Build Master (Inactive) [ 24/Aug/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #273
LU-622 Alloc enough cfg buffer space

Oleg Drokin : ed2e4d8205e5b6bc9dc2ad8319ad7666e49e5dfe
Files :

  • lustre/liblustre/super.c
  • lustre/llite/llite_lib.c
  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 24/Aug/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #273
LU-622 Alloc enough cfg buffer space

Oleg Drokin : ed2e4d8205e5b6bc9dc2ad8319ad7666e49e5dfe
Files :

  • lustre/llite/llite_lib.c
  • lustre/obdclass/obd_config.c
  • lustre/liblustre/super.c
Comment by Build Master (Inactive) [ 24/Aug/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #273
LU-622 Alloc enough cfg buffer space

Oleg Drokin : ed2e4d8205e5b6bc9dc2ad8319ad7666e49e5dfe
Files :

  • lustre/liblustre/super.c
  • lustre/obdclass/obd_config.c
  • lustre/llite/llite_lib.c
Comment by Build Master (Inactive) [ 24/Aug/11 ]

Integrated in lustre-master » i686,server,el5,ofa #273
LU-622 Alloc enough cfg buffer space

Oleg Drokin : ed2e4d8205e5b6bc9dc2ad8319ad7666e49e5dfe
Files :

  • lustre/obdclass/obd_config.c
  • lustre/liblustre/super.c
  • lustre/llite/llite_lib.c
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #274
LU-622 Alloc enough cfg buffer space

Oleg Drokin : ed2e4d8205e5b6bc9dc2ad8319ad7666e49e5dfe
Files :

  • lustre/obdclass/obd_config.c
  • lustre/llite/llite_lib.c
  • lustre/liblustre/super.c
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #274
LU-622 Alloc enough cfg buffer space

Oleg Drokin : ed2e4d8205e5b6bc9dc2ad8319ad7666e49e5dfe
Files :

  • lustre/liblustre/super.c
  • lustre/obdclass/obd_config.c
  • lustre/llite/llite_lib.c