[LU-4222] Oops in mdt_dump_lmm+0x16/0x410 [mdt] Created: 07/Nov/13  Updated: 20/Jan/14  Resolved: 07/Jan/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.1
Fix Version/s: Lustre 2.6.0, Lustre 2.5.1

Type: Bug Priority: Blocker
Reporter: Minh Diep Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: mn4, sdsc

Severity: 3
Rank (Obsolete): 11490

 Description   

Nov 6 16:38:07 lustre-mds-0-0 kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
Nov 6 16:38:07 lustre-mds-0-0 kernel: IP: [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: PGD 0
Nov 6 16:38:07 lustre-mds-0-0 kernel: Oops: 0000 [#1] SMP
Nov 6 16:38:07 lustre-mds-0-0 kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:09.0/0000:19:00.0/0000:1a:04.0/0000:1c:00.0/irq
Nov 6 16:38:07 lustre-mds-0-0 kernel: CPU 4
Nov 6 16:38:07 lustre-mds-0-0 kernel: Modules linked in: osp(U) lod(U) mdt(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U) mdd(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic crc32c_intel libcfs(U) ldiskfs(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core ioatdma raid10 myri10ge ses enclosure sg igb dca ptp pps_core sr_mod cdrom ext4 mbcache jbd2 sd_mod crc_t10dif usb_storage ahci mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Nov 6 16:38:07 lustre-mds-0-0 kernel:
Nov 6 16:38:07 lustre-mds-0-0 kernel: Pid: 4408, comm: mdt02_002 Not tainted 2.6.32-358.14.1.el6_lustre.g0a46394.x86_64 #1 SUN MICROSYSTEMS SUN FIRE X4170 SERVER /ASSY,MOTHERBOARD,X4170
Nov 6 16:38:07 lustre-mds-0-0 kernel: RIP: 0010:[<ffffffffa0cfb246>] [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: RSP: 0018:ffff88066bf87a20 EFLAGS: 00010282
Nov 6 16:38:07 lustre-mds-0-0 kernel: RAX: 0000000000000003 RBX: ffff88066bf7e000 RCX: ffffc9002118d6f0
Nov 6 16:38:07 lustre-mds-0-0 kernel: RDX: ffff88066914bc00 RSI: 0000000000000000 RDI: 0000000000000040
Nov 6 16:38:07 lustre-mds-0-0 kernel: RBP: ffff88066bf87a70 R08: 0000000000008001 R09: ffff88066bf7e510
Nov 6 16:38:07 lustre-mds-0-0 kernel: R10: ffff88067451c49c R11: ffffffffa03b89b0 R12: ffff880669236070
Nov 6 16:38:07 lustre-mds-0-0 kernel: R13: ffff8806793c77a0 R14: 0000000000000038 R15: ffff880669208a68
Nov 6 16:38:07 lustre-mds-0-0 kernel: FS: 00007f00c33bf700(0000) GS:ffff88038ac00000(0000) knlGS:0000000000000000
Nov 6 16:38:07 lustre-mds-0-0 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 6 16:38:07 lustre-mds-0-0 kernel: CR2: 000000000000001c CR3: 00000006789ec000 CR4: 00000000000007e0
Nov 6 16:38:07 lustre-mds-0-0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 6 16:38:07 lustre-mds-0-0 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 6 16:38:07 lustre-mds-0-0 kernel: Process mdt02_002 (pid: 4408, threadinfo ffff88066bf86000, task ffff88066d264080)
Nov 6 16:38:07 lustre-mds-0-0 kernel: Stack:
Nov 6 16:38:07 lustre-mds-0-0 kernel: ffff88066bf7e000 ffff880677269000 ffff88066bf87a70 ffffffffa0ce1832
Nov 6 16:38:07 lustre-mds-0-0 kernel: <d> ffff880669236070 ffff88066bf7e000 ffff880669236070 ffff8806793c77a0
Nov 6 16:38:07 lustre-mds-0-0 kernel: <d> 0000000000000038 ffff880669208a68 ffff88066bf87b00 ffffffffa0cf4b0f
Nov 6 16:38:07 lustre-mds-0-0 kernel: Call Trace:
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ce1832>] ? mdt_pack_attr2body+0xe2/0x270 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0cf4b0f>] mdt_getattr_internal+0x56f/0x1210 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0cf661e>] mdt_getattr_name_lock+0xe6e/0x1980 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06bd135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06e5646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06bf3c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0cf73cd>] mdt_intent_getattr+0x29d/0x490 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ce3f3e>] mdt_intent_policy+0x39e/0x720 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0675831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa069c1ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ce43c6>] mdt_enqueue+0x46/0xe0 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0ceaab7>] mdt_handle_common+0x647/0x16d0 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06bebac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa0d243f5>] mds_regular_handle+0x15/0x20 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06ce3c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa03e85de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa03f9d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06c5729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cf75e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cec90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cec90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffffa06cec90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov 6 16:38:07 lustre-mds-0-0 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Nov 6 16:38:07 lustre-mds-0-0 kernel: Code: 41 ab 9e ff 48 89 83 70 04 00 00 e9 2d ff ff ff 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 28 0f 1f 44 00 00 <44> 0f b7 66 1c 41 89 fe 41 89 fd 48 89 f3 41 81 e6 00 04 06 02
Nov 6 16:38:07 lustre-mds-0-0 kernel: RIP [<ffffffffa0cfb246>] mdt_dump_lmm+0x16/0x410 [mdt]
Nov 6 16:38:07 lustre-mds-0-0 kernel: RSP <ffff88066bf87a20>
Nov 6 16:38:07 lustre-mds-0-0 kernel: CR2: 000000000000001c
Nov 6 16:38:07 lustre-mds-0-0 kernel: ---[ end trace 0dadd51afe1c36b7 ]---
Nov 6 16:38:07 lustre-mds-0-0 kernel: Kernel panic - not syncing: Fatal exception

We were trying to set up an active/active MDS/MDT configuration on a cluster with two MDSs and two MDTs. While trying to mount the 1.8.9 clients, we hit this panic on the MDS.

Our goal was to start from a 1.8.x server -> upgrade to 2.4.1 -> back up and restore the single MDT to a new system with one MDT -> add another MDT on a different MDS as a remote MDT.

The last step is to use tunefs.lustre to configure active/active HA on the MDSs.
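
(For context, active/active HA is typically configured by listing both MDS NIDs as service nodes on each MDT target; a minimal sketch with hypothetical NIDs and device paths, not the exact commands used here:)

tunefs.lustre --servicenode=mds0@tcp --servicenode=mds1@tcp /dev/mdt0dev
tunefs.lustre --servicenode=mds0@tcp --servicenode=mds1@tcp /dev/mdt1dev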



 Comments   
Comment by Peter Jones [ 07/Nov/13 ]

Hi Bobijam

Could you please look into this issue?

Peter

Comment by Minh Diep [ 22/Nov/13 ]

We hit this again:

Nov 21 11:51:30 lustre-mds-0-0 kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
Nov 21 11:51:30 lustre-mds-0-0 kernel: IP: [<ffffffffa0cfa246>] mdt_dump_lmm+0x16/0x410 [mdt]
Nov 21 11:51:30 lustre-mds-0-0 kernel: PGD 375648067 PUD 375600067 PMD 0
Nov 21 11:51:30 lustre-mds-0-0 kernel: Oops: 0000 [#1] SMP
Nov 21 11:51:30 lustre-mds-0-0 kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:1f:05.0/local_cpus
Nov 21 11:51:30 lustre-mds-0-0 kernel: CPU 0
Nov 21 11:51:30 lustre-mds-0-0 kernel: Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl osp(U) lod(U) mdt(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) ldiskfs(U) lquota(U) mdd(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic crc32c_intel libcfs(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core raid10 myri10ge ses enclosure sg igb dca ptp pps_core sr_mod cdrom ext4 jbd2 mbcache sd_mod crc_t10dif usb_storage ahci mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Nov 21 11:51:30 lustre-mds-0-0 kernel:
Nov 21 11:51:30 lustre-mds-0-0 kernel: Pid: 26979, comm: mdt00_001 Not tainted 2.6.32-358.18.1.el6_lustre.x86_64 #1 SUN MICROSYSTEMS SUN FIRE X4170 SERVER /ASSY,MOTHERBOARD,X4170
Nov 21 11:51:30 lustre-mds-0-0 kernel: RIP: 0010:[<ffffffffa0cfa246>] [<ffffffffa0cfa246>] mdt_dump_lmm+0x16/0x410 [mdt]
Nov 21 11:51:30 lustre-mds-0-0 kernel: RSP: 0018:ffff8806791e3a20 EFLAGS: 00010282
Nov 21 11:51:30 lustre-mds-0-0 kernel: RAX: 0000000000000003 RBX: ffff88036f1ed000 RCX: ffffc9001e8bada0
Nov 21 11:51:30 lustre-mds-0-0 kernel: RDX: ffff88036d465600 RSI: 0000000000000000 RDI: 0000000000000040
Nov 21 11:51:30 lustre-mds-0-0 kernel: RBP: ffff8806791e3a70 R08: 00000000fffffffb R09: 00000000fffffffe
Nov 21 11:51:30 lustre-mds-0-0 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff88063b3f8d08
Nov 21 11:51:30 lustre-mds-0-0 kernel: R13: ffff88036b368ba0 R14: 0000000000000038 R15: ffff880370762a68
Nov 21 11:51:30 lustre-mds-0-0 kernel: FS: 00007f177ce6a700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
Nov 21 11:51:30 lustre-mds-0-0 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 21 11:51:30 lustre-mds-0-0 kernel: CR2: 000000000000001c CR3: 0000000376129000 CR4: 00000000000007f0
Nov 21 11:51:30 lustre-mds-0-0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 21 11:51:30 lustre-mds-0-0 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 21 11:51:30 lustre-mds-0-0 kernel: Process mdt00_001 (pid: 26979, threadinfo ffff8806791e2000, task ffff8806791e1500)
Nov 21 11:51:30 lustre-mds-0-0 kernel: Stack:
Nov 21 11:51:30 lustre-mds-0-0 kernel: ffff88036f1ed000 ffff88036b295000 ffff8806791e3a70 ffffffffa0ce0812
Nov 21 11:51:30 lustre-mds-0-0 kernel: <d> ffff8806000081ed ffff88036f1ed000 ffff88063b3f8d08 ffff88036b368ba0
Nov 21 11:51:30 lustre-mds-0-0 kernel: <d> 0000000000000038 ffff880370762a68 ffff8806791e3b00 ffffffffa0cf3aef
Nov 21 11:51:30 lustre-mds-0-0 kernel: Call Trace:
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa0ce0812>] ? mdt_pack_attr2body+0xe2/0x270 [mdt]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa0cf3aef>] mdt_getattr_internal+0x56f/0x1210 [mdt]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa0371d98>] ? libcfs_log_return+0x28/0x40 [libcfs]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa0cf55fe>] mdt_getattr_name_lock+0xe6e/0x1980 [mdt]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa063c135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa0664646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa0cf63ad>] mdt_intent_getattr+0x29d/0x490 [mdt]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa0ce2f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa05f4831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa061b1ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa0ce33a6>] mdt_enqueue+0x46/0xe0 [mdt]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa0ce9a97>] mdt_handle_common+0x647/0x16d0 [mdt]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa0d233f5>] mds_regular_handle+0x15/0x20 [mdt]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa064d3c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa0378e05>] ? lc_watchdog_touch+0xd5/0x170 [libcfs]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa0644729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa03772d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa064e75e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa064dc90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa064dc90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffffa064dc90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Nov 21 11:51:30 lustre-mds-0-0 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Nov 21 11:51:30 lustre-mds-0-0 kernel: Code: 41 ab 96 ff 48 89 83 70 04 00 00 e9 2d ff ff ff 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 28 0f 1f 44 00 00 <44> 0f b7 66 1c 41 89 fe 41 89 fd 48 89 f3 41 81 e6 00 04 06 02
Nov 21 11:51:30 lustre-mds-0-0 kernel: RIP [<ffffffffa0cfa246>] mdt_dump_lmm+0x16/0x410 [mdt]
Nov 21 11:51:30 lustre-mds-0-0 kernel: RSP <ffff8806791e3a20>
Nov 21 11:51:30 lustre-mds-0-0 kernel: CR2: 000000000000001c
Nov 21 11:51:30 lustre-mds-0-0 kernel: ---[ end trace 3314141bd29618c1 ]---

Comment by Di Wang [ 23/Nov/13 ]

It seems ma_lmm is NULL even though MA_LOV is set and ma_lmm_size > 0. I posted a debug patch (http://review.whamcloud.com/8377); Minh will try it later, and hopefully we can get more information.
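
(For reference, judging from the assertion reported later in this ticket, the debug check is along these lines; a sketch of the idea only, not the exact content of change 8377, assuming ma is the md_attr and o the mdt_object in the getattr path:)

        /* Debug sketch: trip an assertion instead of dereferencing a NULL
         * ma_lmm when a LOV EA is expected but no buffer is attached. */
        LASSERTF(ma->ma_lmm_size != 0 && ma->ma_lmm != NULL,
                 "Invalid EA %d:%p "DFID"\n", ma->ma_lmm_size, ma->ma_lmm,
                 PFID(mdt_object_fid(o)));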

Comment by Ned Bass [ 11/Dec/13 ]

We started hitting this same oops on a production filesystem. The MDS is stuck in a crash/reboot cycle; it was upgraded from 2.1 last week. Has there been any progress on this bug?

Comment by Ned Bass [ 11/Dec/13 ]

Our tree is here: https://github.com/chaos/lustre/commits/2.4.0-19chaos

The backtrace looks like this:

mdt_getattr_internal
mdt_getattr_name_lock
mdt_intent_getattr
mdt_intent_policy
ldlm_lock_enqueue
ldlm_handle_enqueue0
mdt_enqueue
mdt_handle_common
mds_regular_handle
ptlrpc_server_handle_request
ptlrpc_main

I can't provide crash dumps, but I can carry out debugging procedures and post results.

Comment by Ned Bass [ 11/Dec/13 ]

Interestingly, I just learned that two different filesystems started hitting this within minutes of each other. Perhaps a common client is sending problematic requests to both MDSs. On the other system, many of the NULL pointer derefs are preceded by a message:

LustreError: 4367:0:(mdt_lvb.c:175:mdt_lvbo_fill()) ls6-MDT0000: expected 80 actual 0.

Comment by Ned Bass [ 11/Dec/13 ]

I started the MDS with the debug patch and got this:

ASSERTION( ma->ma_lmm_size != 0 && ma->ma_lmm != NULL ) failed: Invalid EA 80:(null) [0x1a89081cdcb:0x13b:0x0]

Comment by Minh Diep [ 11/Dec/13 ]

Hi Ned,

Could you do:
echo 0 > /proc/sys/lnet/panic_on_lbug
echo -1 > /proc/sys/lnet/debug
echo 50 > /proc/sys/lnet/debug_mb

After it crashes, there should be a debug log in /tmp; please attach it.
If you can identify the client that causes it, please also collect a -1 debug log on that client when it crashes.

Thanks

Comment by Ned Bass [ 11/Dec/13 ]

Hi Minh, sorry this system is classified so I can't provide debug logs.

As a test, I modified mdt_getattr_internal() to log an error and return EFAULT if ma->ma_lmm == NULL. In that case, it logged the error for the same FID twice. I was able to use fid2path to get the file name, and I was able to read the file and get its striping EA info. But I don't know how safe it is to run with that patch.
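
(Roughly what that test change looks like, as a sketch near the LOV EA handling in mdt_getattr_internal(); ma is the md_attr from mdt_thread_info and o the mdt_object, but this is illustrative rather than the exact patch:)

        /* Workaround sketch: refuse the request instead of oopsing when
         * MA_LOV is set but no LOV EA buffer was attached. */
        if ((ma->ma_valid & MA_LOV) && ma->ma_lmm == NULL) {
                CERROR("NULL ma_lmm for "DFID", returning -EFAULT\n",
                       PFID(mdt_object_fid(o)));
                RETURN(-EFAULT);
        }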

Comment by Di Wang [ 11/Dec/13 ]

Are there any 1.8 clients attached to this server? The reason for this LBUG, which happened at SDSC, is that some 1.8 clients send RPCs in an invalid format to the 2.4 server, which 2.4 does not handle well enough.

Comment by Ned Bass [ 12/Dec/13 ]

I don't think there are any 1.8 clients, but do you know of a way to determine that from the server side?

Comment by Di Wang [ 12/Dec/13 ]

Hmm, check the debug log to find out which client the RPC is from? I can probably provide a patch to stop the LBUG, but we probably need to find out why the client sends an invalid RPC here.

Comment by Ned Bass [ 12/Dec/13 ]

Di, OK I'm trying to get a debug log.

Comment by Di Wang [ 12/Dec/13 ]

Here is a patch to check the RPC on the MDT.

http://review.whamcloud.com/8550

With this patch, it might not LBUG again, but we still need to figure out why the client sends this RPC.

Comment by Ned Bass [ 12/Dec/13 ]

Hmm, it still crashes in the same place with patch 8550. My only guess is that it changes the ma->ma_lmm pointer in mdt_attr_get_lov().

Comment by Ned Bass [ 12/Dec/13 ]

I moved the check from patch 8550 to just above the call to mdt_dump_lmm(). Now I get the debug message with the client UUID. On both MDSs the bad request came from the same x86_64 client cluster, which is running lustre-2.4.0-19chaos.
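
(A sketch of how the client is identified in that check, using the export attached to the request in mdt_thread_info; illustrative only, not the exact code:)

        /* Sketch: report which client sent a getattr that left ma_lmm NULL. */
        if (ma->ma_lmm == NULL) {
                CERROR("%s: NULL ma_lmm in getattr from client %s\n",
                       info->mti_exp->exp_obd->obd_name,
                       obd_uuid2str(&info->mti_exp->exp_client_uuid));
                RETURN(-EFAULT);
        }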

Comment by Ned Bass [ 12/Dec/13 ]

I can reproduce the bad request by opening any file from some, but not all, nodes on that client cluster (graph). The open on the client fails with EFAULT and logs

LustreError: ... ll_inode_revalidate_fini() ls6: revalidate FID [0x...] error: rc = -14

Dropping caches or remounting the filesystem doesn't fix it. Rebooting the client node does fix it. That client cluster was updated today, and used to run Lustre 2.1. We updated another client cluster (zin) to the same software levels today, but I don't see the problem from zin. So it seems there is something funny going on with graph.

Comment by Di Wang [ 12/Dec/13 ]

There is a code snippet at the end of mdt_attr_get_complex:

        int mdt_attr_get_complex(struct mdt_thread_info *info,
                         struct mdt_object *o, struct md_attr *ma)
        .....
        ma->ma_need = need;
        CDEBUG(D_INODE, "after getattr rc = %d, ma_valid = "LPX64" ma_lmm=%p\n",
               rc, ma->ma_valid, ma->ma_lmm);
        RETURN(rc);

Could you please tell me what you get when this -14 happens?

Comment by Di Wang [ 12/Dec/13 ]

"I moved the check from patch 8550 to just above the call to mdt_dump_lmm()."

How did you check there? You cannot check buffer->lb_buf anymore if you move the check above mdt_dump_lmm, because buffer->lb_buf comes from thread_info, which will be reused by the following functions.

Comment by Ned Bass [ 12/Dec/13 ]

I checked ma->ma_lmm.

I'll post the other debug information you requested tomorrow morning PST when I get back to the office.

Comment by Ned Bass [ 12/Dec/13 ]

Di, here is the CDEBUG message.

mdt_attr_get_complex() after getattr rc = 0, ma_valid = 0x3 ma_lmm=(null)

Comment by Di Wang [ 12/Dec/13 ]

Ned, please try this patch: http://review.whamcloud.com/#/c/8550/ . It will not retrieve the LOV EA if the client does not require it, and it also adds some other debug information in case that does not resolve the issue.

Comment by Ned Bass [ 12/Dec/13 ]

Di, with that patch I get:

mdt_getattr_internal() ls4-MDT0000: RPC from <UUID>: does not need LOVEA
mdt_attr_get_lov() [<FID>] retrieve lovEA with (null):0
mdt_attr_get_lov() [<FID>] got lovEA with (null):80

This would have crashed, but I kept the check before mdt_dump_lmm() to log an error and return EFAULT if ma->ma_lmm is NULL.

Comment by Di Wang [ 12/Dec/13 ]

Hmm, so the RPC does not need the LOV EA, but somehow the server is still trying to get it.

Comment by Ned Bass [ 12/Dec/13 ]

My only guess is that mdt_getattr_name_lock() sets MA_LOV in ma_need:

                if (try_layout) {
                        child_bits |= MDS_INODELOCK_LAYOUT;
                        /* try layout lock, it may fail to be granted due to
                         * contention at LOOKUP or UPDATE */
                        if (!mdt_object_lock_try(info, child, lhc, child_bits,
                                                 MDT_CROSS_LOCK)) {
                                child_bits &= ~MDS_INODELOCK_LAYOUT;
                                LASSERT(child_bits != 0);
                                rc = mdt_object_lock(info, child, lhc,
                                                child_bits, MDT_CROSS_LOCK);
                        } else {
                                ma_need |= MA_LOV;
        ...
        rc = mdt_getattr_internal(info, child, ma_need);

Comment by Di Wang [ 12/Dec/13 ]

Hmm, it seems to me that cl_max_md_size is not being set correctly in time in some cases, and the client relies on it to pack the getattr RPC. I updated the patch, but it requires remounting the client. As you said, a remount can fix the problem? So after a remount you cannot reproduce the problem anymore? Anyway, can you try this patch? Thanks.

Comment by Ned Bass [ 12/Dec/13 ]

After rebooting more clients, many did come up still having this problem. So we should be able to verify pretty easily if the patch works.

Comment by Ned Bass [ 13/Dec/13 ]

Di, initial results are that the client-side patch does not fix the problem.

Comment by Di Wang [ 13/Dec/13 ]

Hmm, I think there are two problems here:
1. The MDS should not retrieve the LOV EA if there is no room for it in the request; this is easy to fix (see the sketch below).
2. Why do clients send getattr RPCs without reserving space for the LOV EA? IMHO, if cl_max_md_size is set correctly, every getattr RPC should have enough space for the LOV EA. Hmm, could you check the debug log on the client side to see where the RPC comes from? Is it from mdc_intent_getattr_pack?
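
(For the first point, a minimal sketch of that kind of server-side guard in the MDT getattr path, assuming the standard request-capsule helpers; the actual change in 8550 may differ:)

        /* Sketch: only fetch the LOV EA if the client reserved room for it
         * in the reply buffer; otherwise drop MA_LOV from the needed attrs. */
        if (req_capsule_get_size(info->mti_pill, &RMF_MDT_MD, RCL_SERVER) == 0)
                ma->ma_need &= ~MA_LOV;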

Comment by Ned Bass [ 13/Dec/13 ]

Di, this client normally mounts 6 filesystems. The problem seems more likely to happen if 5 or more of those filesystems are mounted. I have yet to see it if I only mount 4 or fewer of the filesystems. I'm using about 100 client nodes to test so the evidence is pretty compelling.

Comment by Ned Bass [ 13/Dec/13 ]

My test looks something like this, where <hostlist> contains about 100 nodes (pdsh is a distributed remote command invoker):

pdsh -w <hostlist> 'umount -a -t lustre ; for x in 1 2 3 4 5 6 ; do mount /p/lscratch$x ; done ; cat /p/lscratch5/bass6/x 2>&1'

If I mount all six filesystems, I always get some subset of nodes on which the cat command fails with 'Bad address'.

If I mount only five, I usually but not always get failures.

If I mount four or fewer, I never get failures.

Comment by Di Wang [ 13/Dec/13 ]

Ned, could you try this patch:

diff --git a/lustre/mdc/mdc_locks.c b/lustre/mdc/mdc_locks.c
index 28c07b4..4541f1a 100644
--- a/lustre/mdc/mdc_locks.c
+++ b/lustre/mdc/mdc_locks.c
@@ -434,6 +434,10 @@ static struct ptlrpc_request *mdc_intent_getattr_pack(struct obd_export *exp,
         lit = req_capsule_client_get(&req->rq_pill, &RMF_LDLM_INTENT);
         lit->opc = (__u64)it->it_op;
 
+       if (obddev->u.cli.cl_max_mds_easize == 0) {
+               CERROR("%s: cl_max_mds_easize is zero!\n", obddev->obd_name);
+               RETURN(ERR_PTR(-EINVAL));
+       }
         /* pack the intended request */
         mdc_getattr_pack(req, valid, it->it_flags, op_data,
                          obddev->u.cli.cl_max_mds_easize);

to see whether you get the error message on the client side?

Comment by Ned Bass [ 13/Dec/13 ]

I'll try it.

Regarding my earlier comment, I do sometimes see failures with 3 or 4 mounts, but usually only on one client out of 100. So it seems to become more likely with more mounts. Also, the order of mounts seems to matter: I don't get errors if I mount all six in reverse order :/ I can't figure out what the pattern is yet.

Comment by Ned Bass [ 13/Dec/13 ]

No, the error does not show up on the client when I reproduce the bug.

Comment by Di Wang [ 13/Dec/13 ]

Hmm, this probably means the cl_max_md_size initialization is being delayed when multiple client mounts happen at the same time, since cl_max_md_size is updated when the OSC is connected. So we need to initialize cl_max_md_size synchronously during the mount process.

Comment by Di Wang [ 13/Dec/13 ]

Hmm, could you please try this plus 8550

diff --git a/lustre/mdc/mdc_request.c b/lustre/mdc/mdc_request.c
index cc592a7..f9ef390 100644
--- a/lustre/mdc/mdc_request.c
+++ b/lustre/mdc/mdc_request.c
@@ -189,14 +189,13 @@ static int mdc_getattr_common(struct obd_export *exp,
 
         CDEBUG(D_NET, "mode: %o\n", body->mode);
 
-        if (body->eadatasize != 0) {
-                mdc_update_max_ea_from_body(exp, body);
-
-                eadata = req_capsule_server_sized_get(pill, &RMF_MDT_MD,
-                                                      body->eadatasize);
-                if (eadata == NULL)
-                        RETURN(-EPROTO);
-        }
+       mdc_update_max_ea_from_body(exp, body);
+       if (body->eadatasize != 0) {
+               eadata = req_capsule_server_sized_get(pill, &RMF_MDT_MD,
+                                                     body->eadatasize);
+               if (eadata == NULL)
+                       RETURN(-EPROTO);
+       }
 
         if (body->valid & OBD_MD_FLRMTPERM) {


Comment by Di Wang [ 13/Dec/13 ]

Hmm, I compared the Lustre b2_4 code with the chaos Lustre 2.4 code in mdc_intent_getattr_pack:

chaos lustre 2.4

        /* pack the intended request */
        mdc_getattr_pack(req, valid, it->it_flags, op_data,
                         obddev->u.cli.cl_default_mds_easize);

my lustre b2_4 branch

     /* pack the intended request */
        mdc_getattr_pack(req, valid, it->it_flags, op_data,
                         obddev->u.cli.cl_max_mds_easize);

Ned, could you please confirm that? Did someone change this code in chaos?

Comment by Di Wang [ 13/Dec/13 ]

Oh, it seems to come from LU-3338. Ned, is that patch in your Lustre source (lustre-2.4.0-19chaos)?

Comment by Ned Bass [ 13/Dec/13 ]

Yes, we are running with the LU-3338 patch.

Comment by Ned Bass [ 13/Dec/13 ]

Initial results with your latest patch to mdc_getattr_common() are good. The problem is not happening so far with that patch.

Comment by Di Wang [ 13/Dec/13 ]

Hmm, the patch tries to get the max_md_size from the MDS initially, which is MAX_MD_SIZE (4 stripes), so initially it will set the default_md_size to 4 stripes instead of one:

         cli->cl_default_mds_easize =
                            min_t(__u32, body->max_mdsize, PAGE_CACHE_SIZE);

I am not so sure this is the right way to fix this.

Comment by Di Wang [ 16/Dec/13 ]

http://review.whamcloud.com/#/c/8550/

Comment by Di Wang [ 17/Dec/13 ]

http://review.whamcloud.com/8599 for master branch

Comment by Peter Jones [ 07/Jan/14 ]

Fix landed for 2.6

Comment by James Beal [ 08/Jan/14 ]

Is it too late for this to hit 2.5.1?

Comment by Peter Jones [ 08/Jan/14 ]

No, it is not too late, so this is still possible.

Comment by James Beal [ 08/Jan/14 ]

Thank you, that would be appreciated.
