Details
Bug
Resolution: Fixed
Blocker
Lustre 2.4.1
3
11490
Description
Nov 6 16:38:07 lustre-mds-0-0 kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
Nov 6 16:38:07 lustre-mds-0-0 kernel: IP: [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: PGD 0
Nov 6 16:38:07 lustre-mds-0-0 kernel: Oops: 0000 [#1] SMP
Nov 6 16:38:07 lustre-mds-0-0 kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:09.0/0000:19:00.0/0000:1a:04.0/0000:1c:00.0/irq
Nov 6 16:38:07 lustre-mds-0-0 kernel: CPU 4
Nov 6 16:38:07 lustre-mds-0-0 kernel: Modules linked in: osp(U) lod(U) mdt(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U) mdd(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic crc32c_intel libcfs(U) ldiskfs(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core ioatdma raid10 myri10ge ses enclosure sg igb dca ptp pps_core sr_mod cdrom ext4 mbcache jbd2 sd_mod crc_t10dif usb_storage ahci mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Nov 6 16:38:07 lustre-mds-0-0 kernel:
Nov 6 16:38:07 lustre-mds-0-0 kernel: Pid: 4408, comm: mdt02_002 Not tainted 2.6.32-358.14.1.el6_lustre.g0a46394.x86_64 #1 SUN MICROSYSTEMS SUN FIRE X4170 SERVER /ASSY,MOTHERBOARD,X4170
Nov 6 16:38:07 lustre-mds-0-0 kernel: RIP: 0010:[<ffffffffa0cfb246>] [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: RSP: 0018:ffff88066bf87a20 EFLAGS: 00010282
Nov 6 16:38:07 lustre-mds-0-0 kernel: RAX: 0000000000000003 RBX: ffff88066bf7e000 RCX: ffffc9002118d6f0
Nov 6 16:38:07 lustre-mds-0-0 kernel: RDX: ffff88066914bc00 RSI: 0000000000000000 RDI: 0000000000000040
Nov 6 16:38:07 lustre-mds-0-0 kernel: RBP: ffff88066bf87a70 R08: 0000000000008001 R09: ffff88066bf7e510
Nov 6 16:38:07 lustre-mds-0-0 kernel: R10: ffff88067451c49c R11: ffffffffa03b89b0 R12: ffff880669236070
Nov 6 16:38:07 lustre-mds-0-0 kernel: R13: ffff8806793c77a0 R14: 0000000000000038 R15: ffff880669208a68
Nov 6 16:38:07 lustre-mds-0-0 kernel: FS: 00007f00c33bf700(0000) GS:ffff88038ac00000(0000) knlGS:0000000000000000
Nov 6 16:38:07 lustre-mds-0-0 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 6 16:38:07 lustre-mds-0-0 kernel: CR2: 000000000000001c CR3: 00000006789ec000 CR4: 00000000000007e0
Nov 6 16:38:07 lustre-mds-0-0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 6 16:38:07 lustre-mds-0-0 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 6 16:38:07 lustre-mds-0-0 kernel: Process mdt02_002 (pid: 4408, threadinfo ffff88066bf86000, task ffff88066d264080)
Nov 6 16:38:07 lustre-mds-0-0 kernel: Stack:
Nov 6 16:38:07 lustre-mds-0-0 kernel: ffff88066bf7e000 ffff880677269000 ffff88066bf87a70 ffffffffa0ce1832
Nov 6 16:38:07 lustre-mds-0-0 kernel: <d> ffff880669236070 ffff88066bf7e000 ffff880669236070 ffff8806793c77a0
Nov 6 16:38:07 lustre-mds-0-0 kernel: <d> 0000000000000038 ffff880669208a68 ffff88066bf87b00 ffffffffa0cf4b0f
Nov 6 16:38:07 lustre-mds-0-0 kernel: Call Trace:
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ce1832>] ? mdt_pack_attr2body+0xe2/0x270 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0cf4b0f>] mdt_getattr_internal+0x56f/0x1210 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0cf661e>] mdt_getattr_name_lock+0xe6e/0x1980 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06bd135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06e5646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06bf3c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0cf73cd>] mdt_intent_getattr+0x29d/0x490 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ce3f3e>] mdt_intent_policy+0x39e/0x720 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0675831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa069c1ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ce43c6>] mdt_enqueue+0x46/0xe0 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ceaab7>] mdt_handle_common+0x647/0x16d0 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06bebac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0d243f5>] mds_regular_handle+0x15/0x20 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06ce3c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa03e85de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa03f9d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06c5729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cf75e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cec90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cec90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cec90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Nov 6 16:38:07 lustre-mds-0-0 kernel: Code: 41 ab 9e ff 48 89 83 70 04 00 00 e9 2d ff ff ff 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 28 0f 1f 44 00 00 <44> 0f b7 66 1c 41 89 fe 41 89 fd 48 89 f3 41 81 e6 00 04 06 02
Nov 6 16:38:07 lustre-mds-0-0 kernel: RIP [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: RSP <ffff88066bf87a20>
Nov 6 16:38:07 lustre-mds-0-0 kernel: CR2: 000000000000001c
Nov 6 16:38:07 lustre-mds-0-0 kernel: ---[ end trace 0dadd51afe1c36b7 ]---
Nov 6 16:38:07 lustre-mds-0-0 kernel: Kernel panic - not syncing: Fatal exception
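For triage, the key fields of the oops above can be pulled out with a small script. This is only a sketch: the OOPS string is an excerpt copied from the log with the syslog prefix dropped, and parse_oops is a hypothetical helper, not part of any Lustre or kernel tooling. The marked bytes in the Code: line (<44> 0f b7 66 1c) appear to decode to a 16-bit load from 0x1c(%rsi). On x86-64, RSI normally carries the second function argument, so the script checks that CR2 minus RSI equals that displacement. If it does, the trace is consistent with mdt_dump_lmm being handed a NULL pointer as its second argument. That is a reading of the trace, not a confirmed root cause.

```python
import re

# Excerpt of the oops above, syslog prefix removed for brevity.
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
IP: [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
RAX: 0000000000000003 RBX: ffff88066bf7e000 RCX: ffffc9002118d6f0
RDX: ffff88066914bc00 RSI: 0000000000000000 RDI: 0000000000000040
CR2: 000000000000001c CR3: 00000006789ec000 CR4: 00000000000007e0
"""

IP_RE = re.compile(
    r"IP: \[<[0-9a-f]+>\] (\w+)\+0x([0-9a-f]+)/0x[0-9a-f]+ \[(\w+)\]"
)
REG_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def parse_oops(text):
    """Hypothetical helper: return (symbol, offset, module, registers)."""
    m = IP_RE.search(text)
    if m is None:
        raise ValueError("no IP line found in oops text")
    # Map register name -> integer value, e.g. {"RSI": 0, "CR2": 0x1c, ...}
    regs = {name: int(value, 16) for name, value in REG_RE.findall(text)}
    return m.group(1), int(m.group(2), 16), m.group(3), regs


symbol, offset, module, regs = parse_oops(OOPS)

# Displacement between the faulting address and the second-argument register.
fault_offset = regs["CR2"] - regs["RSI"]

print(f"faulting function: {symbol}+{offset:#x} [{module}]")
print(f"RSI={regs['RSI']:#x} CR2={regs['CR2']:#x} offset={fault_offset:#x}")
```

A zero RSI with CR2 equal to the instruction's displacement suggests the first place to look is the caller in the trace, mdt_getattr_internal, and whether it can reach mdt_dump_lmm without a valid layout buffer.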
We were trying to set up active/active MDS/MDT on a cluster with two MDSs and two MDTs. While mounting the 1.8.9 clients, we hit this panic on the MDS.
Our plan was: start from a 1.8.x server -> upgrade to 2.4.1 -> back up and restore the single MDT to a new system with one MDT -> add a second MDT on a different MDS as a remote MDT.
The last step is to use tunefs.lustre to configure active/active HA on the MDSs.