Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.1.0
-
None
-
Lustre Branch: master
Lustre Build: http://newbuild.whamcloud.com/job/lustre-master/192/arch=x86_64,build_type=server,distro=el5,ib_stack=inkernel/
e2fsprogs Build: http://newbuild.whamcloud.com/job/e2fsprogs-master/42/arch=x86_64,distro=el5/
Distro/Arch: CentOS5.6/x86_64
Kernel Version: 2.6.18-238.12.1.el5_lustre.g6a3d997
-
3
-
4949
Description
While formatting a 128TB OST on a DDN SFA10KE with the following command:
mkfs.lustre --reformat --fsname=largefs --ost --mgsnode=10.0.2.15@tcp --mountfsoptions='errors=remount-ro,extents,mballoc,force_over_16tb' /dev/large_vg/ost_lv
mkfs.lustre hit the following kernel panic:
Lustre: DEBUG MARKER: ===================== format the OST /dev/large_vg/ost_lv =====================
init dynlocks cache
ldiskfs created from ext4-2.6-rhel5
LDISKFS-fs (dm-3): warning: maximal mount count reached, running e2fsck is recommended
LDISKFS-fs: can't allocate buddy meta group
LDISKFS-fs (dm-3): failed to initalize mballoc (-12)
LDISKFS-fs (dm-3): mount failed
Unable to handle kernel NULL pointer dereference at 00000000000001c8 RIP: [<ffffffff887421f1>] :ldiskfs:ldiskfs_clear_inode+0x81/0xb0
PGD 7c3436067 PUD 7c0051067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /block/ram0/dev
CPU 3
Modules linked in: ldiskfs(U) jbd2(U) crc16(U) lnet(U) libcfs(U) raid0(U) autofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U) lockd(U) sunrpc(U) be2iscsi(U) ib_iser(U) rdma_cm(U) ib_cm(U) iw_cm(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) iscsi_tcp(U) bnx2i(U) cnic(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) uio(U) cxgb3i(U) cxgb3(U) 8021q(U) libiscsi_tcp(U) libiscsi2(U) scsi_transport_iscsi2(U) scsi_transport_iscsi(U) dm_multipath(U) scsi_dh(U) video(U) backlight(U) sbs(U) power_meter(U) hwmon(U) i2c_ec(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) lp(U) floppy(U) 8139too(U) mlx4_en(U) tpm_tis(U) ide_cd(U) i2c_piix4(U) tpm(U) parport_pc(U) sfablkdrvr(U) parport(U) 8139cp(U) mlx4_core(U) serio_raw(U) tpm_bios(U) cdrom(U) pcspkr(U) i2c_core(U) mii(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_mem_cache(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_log(U) dm_mod(U) ata_piix(U) libata(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 3290, comm: mkfs.lustre Tainted: G 2.6.18-238.12.1.el5_lustre.g6a3d997 #1
RIP: 0010:[<ffffffff887421f1>] [<ffffffff887421f1>] :ldiskfs:ldiskfs_clear_inode+0x81/0xb0
RSP: 0018:ffff8107c0bc7ad8 EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffff8104d1978990 RCX: ffff8107c01b2cc0
RDX: ffff8107c01b2cc0 RSI: ffff8104d1978b98 RDI: ffff8104d1978990
RBP: ffff8104d1978890 R08: ffff810000032600 R09: 7fffffffffffffff
R10: ffff8107c0bc78a8 R11: ffffffff80039e56 R12: ffff8107c0050948
R13: 0000000000000000 R14: ffff8107d908d000 R15: ffffffff88742600
FS: 00002aaed55fa6e0(0000) GS:ffff81011bbdb640(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000001c8 CR3: 00000007c6925000 CR4: 00000000000006e0
Process mkfs.lustre (pid: 3290, threadinfo ffff8107c0bc6000, task ffff8107d77507a0)
Stack: 7fffffffffffffff ffff8104d1978990 ffff8107c01b2c00 ffffffff8002303b ffff8104d1978990 ffffffff80039f9c 0000000000000000 ffff8107c00508e8 0000000000000000 ffffffff800ede72 ffff8107c01b2c00 ffffffff88764d00
Call Trace:
 [<ffffffff8002303b>] clear_inode+0xd2/0x123
 [<ffffffff80039f9c>] generic_drop_inode+0x146/0x15a
 [<ffffffff800ede72>] shrink_dcache_for_umount_subtree+0x1f2/0x21e
 [<ffffffff800ee40c>] shrink_dcache_for_umount+0x35/0x43
 [<ffffffff800e636b>] generic_shutdown_super+0x1b/0xfb
 [<ffffffff800e647c>] kill_block_super+0x31/0x45
 [<ffffffff800e654a>] deactivate_super+0x6a/0x82
 [<ffffffff800e6c6f>] get_sb_bdev+0x121/0x16c
 [<ffffffff800e65f5>] vfs_kern_mount+0x93/0x11a
 [<ffffffff800e66be>] do_kern_mount+0x36/0x4d
 [<ffffffff800f0fc6>] do_mount+0x6a9/0x719
 [<ffffffff8002b502>] flush_tlb_page+0xac/0xda
 [<ffffffff8001125b>] do_wp_page+0x3f8/0x91e
 [<ffffffff88030d09>] :jbd:do_get_write_access+0x4f9/0x530
 [<ffffffff80019de3>] __getblk+0x25/0x236
 [<ffffffff800096d4>] __handle_mm_fault+0xf6b/0x1039
 [<ffffffff88030804>] :jbd:journal_stop+0x249/0x255
 [<ffffffff800ce756>] zone_statistics+0x3e/0x6d
 [<ffffffff8000f41e>] __alloc_pages+0x78/0x308
 [<ffffffff800eadb4>] sys_mkdirat+0xd1/0xe4
 [<ffffffff8004c74a>] sys_mount+0x8a/0xcd
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Code: 48 8b b8 c8 01 00 00 48 85 ff 74 13 48 83 c4 08 48 8d b5 30
RIP [<ffffffff887421f1>] :ldiskfs:ldiskfs_clear_inode+0x81/0xb0
RSP <ffff8107c0bc7ad8>
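For readers of the log above: the (-12) reported by the mballoc initialization is -ENOMEM, i.e. a memory allocation failed during the mount, and the NULL pointer dereference happens afterwards on the mount failure path (deactivate_super down to ldiskfs_clear_inode). As a rough illustration of the scale involved, the sketch below decodes the error code and estimates how many block groups a 128TB device contains. The 4 KiB block size, the 32768 blocks per group, and reading "128TB" as 128 TiB are assumptions (common ext4 defaults), not values taken from this report, and the helper names are hypothetical.

```python
import errno

# Assumptions, not stated in the report: 4 KiB blocks, the ext4 default of
# 8 * block_size blocks per group, and "128TB" read as 128 TiB.
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE      # 32768 blocks, i.e. 128 MiB per group
OST_BYTES = 128 * 2**40


def block_group_count(device_bytes, block_size=BLOCK_SIZE,
                      blocks_per_group=BLOCKS_PER_GROUP):
    """Estimate the number of block groups on a device (hypothetical helper)."""
    total_blocks = device_bytes // block_size
    # Round up so that a partial trailing group is still counted.
    return -(-total_blocks // blocks_per_group)


def decode_kernel_errno(code):
    """Map a negative kernel return code such as -12 to its symbolic name."""
    return errno.errorcode[-code]


if __name__ == "__main__":
    print("mballoc init failed with:", decode_kernel_errno(-12))
    print("estimated block groups:", block_group_count(OST_BYTES))
```

Under these assumptions the device has on the order of a million block groups, which gives a sense of how much per-group state has to be allocated at mount time; the estimate does not by itself identify which allocation failed.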
The issue was also described in LU-136 #comment-14649, #comment-17082 and #comment-14650.