[LU-4455] GPF crash on Lustre client acting as NFS server Created: 08/Jan/14 Updated: 05/Aug/20 Resolved: 05/Aug/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Marion Hakanson (Inactive) | Assignee: | Zhenyu Xu |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS 6.4, kernel 2.6.32-358.18.1.el6.x86_64, Lustre client acting as NFS server for 2-node cluster. |
||
| Attachments: |
|
| Rank (Obsolete): | 12211 |
| Description |
|
The cluster head node acts as a Lustre client and re-exports the Lustre mountpoint via NFS to its compute-node NFS clients. When applications start doing significant I/O over NFS (gigabit Ethernet is enough to trigger it), the head node's NFS services hang, and eventually the head node crashes with a general protection fault (GPF) panic. There are messages on the system console and in "dmesg" about both Lustre and nfsd issues (see the attached vmcore-dmesg.txt file). |
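For context, a rough sketch of the setup being described; the mountpoint, export options, and client hostname pattern below are illustrative assumptions rather than values from this ticket (only the fsname lustre1 and the 192.168.5.111@o2ib NID appear in the logs that follow):

    # On the head node: mount Lustre first (assuming the MGS answers at the MDS NID seen in the logs)
    mount -t lustre 192.168.5.111@o2ib:/lustre1 /mnt/lustre1

    # /etc/exports: re-export the Lustre mountpoint to the compute nodes;
    # an explicit fsid= is generally required when exporting a filesystem with no local block device
    /mnt/lustre1  compute*(rw,no_root_squash,fsid=101)

    # publish the export table, then start nfsd
    exportfs -ra
    service nfs start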
| Comments |
| Comment by Keith Mannthey (Inactive) [ 08/Jan/14 ] |
|
These messages are indicative of an NFS issue:

<4>nfsd: peername failed (err 107)!
<4>nfsd: peername failed (err 107)!
<4>nfsd: peername failed (err 107)!
<4>nfsd: peername failed (err 107)!
<4>nfsd: peername failed (err 107)!
<4>nfsd: peername failed (err 107)!

They seem to cause a bad chain of events that leads to this GPF:

<3>LustreError: 18503:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -116
<3>LustreError: 18503:0:(vvp_io.c:1230:vvp_io_init()) lustre1: refresh file layout [0x200000406:0x15c1b:0x0] error -116.
<3>LustreError: 11-0: lustre1-MDT0000-mdc-ffff883ff68fe000: Communicating with 192.168.5.111@o2ib, operation ldlm_enqueue failed with -116.
<3>LustreError: 18503:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -116
<3>LustreError: 18503:0:(vvp_io.c:1230:vvp_io_init()) lustre1: refresh file layout [0x200000406:0x1b5c0:0x0] error -116.
<4>general protection fault: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/pci0000:ff/0000:ff:1e.7/irq
<4>CPU 4
<4>Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic crc32c_intel libcfs(U) nfsd exportfs autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bonding 8021q garp stp llc ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables rdma_ucm rdma_cm iw_cm ib_addr ib_ipoib ib_cm ipv6 ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb4 cxgb4 iw_cxgb3 cxgb3 ib_qib mlx4_en mlx4_ib ib_sa ib_mthca ib_mad ib_core raid1 mlx4_core ixgbe mdio igb dca ptp pps_core microcode sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ext4 jbd2 mbcache isci libsas scsi_transport_sas sr_mod cdrom sd_mod crc_t10dif ahci wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 16743, comm: nfsd Not tainted 2.6.32-358.18.1.el6.x86_64 #1 Intel Corporation S2600GZ/S2600GZ
<4>RIP: 0010:[<ffffffff8105690c>] [<ffffffff8105690c>] update_curr+0x14c/0x1f0
<4>RSP: 0018:ffff880099a83db8 EFLAGS: 00010092
<4>RAX: ffff8840035baaa0 RBX: 0000000000013200 RCX: ffff88201ff12240
<4>RDX: ccccccccccce5fa4 RSI: 0000000000000000 RDI: ffff8840035baad8
<4>RBP: ffff880099a83de8 R08: ffffffff8160bb65 R09: 0000000000000000
<4>R10: 0000000000000010 R11: 0000000000000000 R12: ffff880099a96768
<4>R13: 00000000000f41c8 R14: 0001345e1adea991 R15: ffff8840035baaa0
<4>FS: 0000000000000000(0000) GS:ffff880099a80000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 00007f5b9ec29000 CR3: 00000040111c7000 CR4: 00000000001407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process nfsd (pid: 16743, threadinfo ffff883f26abc000, task ffff8840035baaa0)
<4>Stack:
<4> ffff880099a83dc8 ffffffff81013873 ffff8840035baad8 ffff880099a96768
<4><d> 0000000000000000 0000000000000000 ffff880099a83e18 ffffffff81056ebb
<4><d> ffff880099a96700 0000000000000004 0000000000016700 0000000000000004
<4>Call Trace:
<4> <IRQ>
<4> [<ffffffff81013873>] ? native_sched_clock+0x13/0x80
<4> [<ffffffff81056ebb>] task_tick_fair+0xdb/0x160
<4> [<ffffffff8105ad01>] scheduler_tick+0xc1/0x260
<4> [<ffffffff810a8060>] ? tick_sched_timer+0x0/0xc0
<4> [<ffffffff810812fe>] update_process_times+0x6e/0x90
<4> [<ffffffff810a80c6>] tick_sched_timer+0x66/0xc0
<4> [<ffffffff8109b4ae>] __run_hrtimer+0x8e/0x1a0
<4> [<ffffffff810a219f>] ? ktime_get_update_offsets+0x4f/0xd0
<4> [<ffffffff8107710f>] ? __do_softirq+0x11f/0x1e0
<4> [<ffffffff8109b816>] hrtimer_interrupt+0xe6/0x260
<4> [<ffffffff815177cb>] smp_apic_timer_interrupt+0x6b/0x9b
<4> [<ffffffff8100bb93>] apic_timer_interrupt+0x13/0x20
<4> <EOI>
<4> [<ffffffffa083774c>] ? cl_page_io_start+0xc/0x130 [obdclass]
<4> [<ffffffffa08386be>] cl_page_prep+0x19e/0x210 [obdclass]
<4> [<ffffffffa0b68057>] ? osc_page_transfer_add+0x77/0xb0 [osc]
<4> [<ffffffffa0b6d144>] osc_io_submit+0x194/0x4a0 [osc]
<4> [<ffffffffa08426dc>] cl_io_submit_rw+0x6c/0x160 [obdclass]
<4> [<ffffffffa0c02191>] lov_io_submit+0x351/0xbc0 [lov]
<4> [<ffffffffa08426dc>] cl_io_submit_rw+0x6c/0x160 [obdclass]
<4> [<ffffffffa0844cfe>] cl_io_read_page+0xae/0x170 [obdclass]
<4> [<ffffffffa0838ac7>] ? cl_page_assume+0xf7/0x220 [obdclass]
<4> [<ffffffffa0ca8056>] ll_readpage+0x96/0x1a0 [lustre]
<4> [<ffffffff811b0cf8>] __generic_file_splice_read+0x3a8/0x560
<4> [<ffffffffa06fd977>] ? cfs_hash_bd_lookup_intent+0x37/0x130 [libcfs]
<4> [<ffffffffa06fd977>] ? cfs_hash_bd_lookup_intent+0x37/0x130 [libcfs]
<4> [<ffffffffa083c71b>] ? cl_lock_fits_into+0x6b/0x190 [obdclass]
<4> [<ffffffffa0bfc8cf>] ? lov_lock_fits_into+0x3ef/0x540 [lov]
<4> [<ffffffffa083c6a1>] ? cl_lock_mutex_tail+0x51/0x60 [obdclass]
<4> [<ffffffffa0833825>] ? cl_env_info+0x15/0x20 [obdclass]
<4> [<ffffffff8109ca9f>] ? up+0x2f/0x50
<4> [<ffffffff81075887>] ? current_fs_time+0x27/0x30
<4> [<ffffffff811aeb90>] ? spd_release_page+0x0/0x20
<4> [<ffffffff811b0efa>] generic_file_splice_read+0x4a/0x90
<4> [<ffffffffa0cd52a5>] vvp_io_read_start+0x3c5/0x470 [lustre]
<4> [<ffffffffa084283a>] cl_io_start+0x6a/0x140 [obdclass]
<4> [<ffffffffa0846f74>] cl_io_loop+0xb4/0x1b0 [obdclass]
<4> [<ffffffffa0c7b8cf>] ll_file_io_generic+0x33f/0x600 [lustre]
<4> [<ffffffffa0c7c2e0>] ll_file_splice_read+0xb0/0x1d0 [lustre]
<4> [<ffffffff811af15b>] do_splice_to+0x6b/0xa0
<4> [<ffffffff811af45f>] splice_direct_to_actor+0xaf/0x1c0
<4> [<ffffffffa06a03b0>] ? nfsd_direct_splice_actor+0x0/0x20 [nfsd]
<4> [<ffffffffa06a0e70>] nfsd_vfs_read+0x1a0/0x1c0 [nfsd]
<4> [<ffffffffa06a24a0>] nfsd_read_file+0x90/0xb0 [nfsd]
<4> [<ffffffffa06b60ef>] nfsd4_encode_read+0x13f/0x240 [nfsd]
<4> [<ffffffffa06bbd46>] ? nfs4_preprocess_stateid_op+0x1f6/0x310 [nfsd]
<4> [<ffffffffa06afd35>] nfsd4_encode_operation+0x75/0x180 [nfsd]
<4> [<ffffffffa06add35>] nfsd4_proc_compound+0x195/0x490 [nfsd]
<4> [<ffffffffa069b43e>] nfsd_dispatch+0xfe/0x240 [nfsd]
<4> [<ffffffffa057c614>] svc_process_common+0x344/0x640 [sunrpc]
<4> [<ffffffff81063410>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa057cc50>] svc_process+0x110/0x160 [sunrpc]
<4> [<ffffffffa069bb62>] nfsd+0xc2/0x160 [nfsd]
<4> [<ffffffffa069baa0>] ? nfsd+0x0/0x160 [nfsd]
<4> [<ffffffff81096a36>] kthread+0x96/0xa0
<4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
<4> [<ffffffff810969a0>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
<4>Code: d2 74 34 48 8b 50 08 8b 5a 18 48 8b 90 10 09 00 00 48 8b 4a 50 48 85 c9 74 1d 48 63 db 66 90 48 8b 51 20 48 03 14 dd a0 de bf 81 <4c> 01 2a 48 8b 49 78 48 85 c9 75 e8 48 8b 98 68 07 00 00 48 85
<1>RIP [<ffffffff8105690c>] update_curr+0x14c/0x1f0
<4> RSP <ffff880099a83db8> |
| Comment by Andreas Dilger [ 08/Jan/14 ] |
|
It looks like this is a very deep stack in the NFSd->Lustre IO path, and then there is an interrupt and it overflows the stack and crashes. In particular, __generic_file_splice_read() is consuming about 400 bytes of stack by itself, most of it in *pages and partial, which is bad.

Separately, it would be good to quiet the console messages for -ESTALE:

<3>LustreError: 16799:0:(file.c:2716:ll_inode_revalidate_fini()) lustre1: revalidate FID [0x20000057e:0x76:0x0] error: rc = -116
<3>LustreError: 11-0: lustre1-MDT0000-mdc-ffff883ff68fe000: Communicating with 192.168.5.111@o2ib, operation ldlm_enqueue failed with -116.
<3>LustreError: 18503:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -116
<3>LustreError: 18503:0:(vvp_io.c:1230:vvp_io_init()) lustre1: refresh file layout [0x200000405:0x13c34:0x0] error -116.

That is probably a result of NFS not being cache coherent and a file getting deleted while it is undergoing IO. |
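To make the stack-usage point above concrete, here is a minimal user-space sketch (an approximation, not the RHEL6 kernel source; PIPE_BUFFERS = 16 and the partial_page layout are assumptions taken from mainline 2.6.32) of the two local arrays that __generic_file_splice_read() keeps on the stack:

    /* Approximation of the on-stack arrays in __generic_file_splice_read();
     * PIPE_BUFFERS and struct partial_page are assumed from mainline 2.6.32. */
    #include <stdio.h>

    #define PIPE_BUFFERS 16

    struct page;                          /* opaque here; only pointers are stored */

    struct partial_page {                 /* per-page offset/length bookkeeping */
            unsigned int offset;
            unsigned int len;
            unsigned long private;
    };

    int main(void)
    {
            struct page *pages[PIPE_BUFFERS];          /* 16 * 8 bytes  = 128 bytes */
            struct partial_page partial[PIPE_BUFFERS]; /* 16 * 16 bytes = 256 bytes */

            printf("pages[]   : %zu bytes\n", sizeof(pages));
            printf("partial[] : %zu bytes\n", sizeof(partial));
            printf("total     : %zu bytes of kernel stack in one frame\n",
                   sizeof(pages) + sizeof(partial));
            return 0;
    }

On x86_64 that is roughly 384 bytes from a single frame, sitting on an 8 KiB kernel stack that is already deep in the nfsd -> splice -> cl_io -> lov/osc chain shown in the trace above, so a timer interrupt arriving at that point has very little headroom left.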
| Comment by Marion Hakanson (Inactive) [ 09/Jan/14 ] |
|
New crash. The previous crash occurred with the nfsd thread count at 8, while this one occurred with the thread count raised to 128. We have been advised to try 512, so that setting is in place going forward. |
| Comment by Keith Mannthey (Inactive) [ 10/Jan/14 ] |
|
<4>NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory <================================ NFS starts?
<6>NFSD: starting 90-second grace period
<6>certmonger[5563]: segfault at 0 ip 0000003493527a96 sp 00007ffff778cf78 error 4 in libc-2.12.so[3493400000+18a000]
<5>Bridge firewalling registered
<6>LNet: HW CPU cores: 48, npartitions: 8
<6>alg: No test for crc32 (crc32-table)
<6>alg: No test for adler32 (adler32-zlib)
<6>alg: No test for crc32 (crc32-pclmul)
<5>padlock: VIA PadLock Hash Engine not detected.
<6>Lustre: Lustre: Build Version: 2.4.1-RC2--PRISTINE-2.6.32-358.18.1.el6.x86_64
<6>LNet: Added LNI 192.168.5.120@o2ib [8/512/0/180]
<6>Lustre: Layout lock feature supported.
<4>Lustre: Mounted lustre1-client <========================= NFS Starts
<4>nfsd: last server has exited, flushing export cache
<4>NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
<6>NFSD: starting 90-second grace period

This second crash is very different. Do you hit this on start-up or after some time? In general I would wait to start nfsd until after Lustre has been mounted. |
| Comment by Marion Hakanson (Inactive) [ 10/Jan/14 ] |
|
Both crashes occurred after the node had been up for some time (12+ hours); it seems to happen after a period of steady I/O load from the NFS clients. Currently the Lustre mount does not happen automatically (neither on the head node nor for that path on the NFS clients); that is, none of the NFS clients mount that path until someone manually mounts the Lustre path on the head node. By the way, my earlier comment that this most recent crash happened with 128 nfsd threads active was mistaken. There was another crash yesterday that I was unaware of until examining the "last" logs, so today's crash was with 512 nfsd threads active (in case it makes a difference). |
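For reference on the thread-count changes mentioned above: on CentOS 6 the nfsd thread count is normally set via RPCNFSDCOUNT in /etc/sysconfig/nfs and picked up when the nfs service starts, or changed on a running server with rpc.nfsd. A sketch, assuming the stock EL6 tooling:

    # /etc/sysconfig/nfs -- applied on the next "service nfs restart"
    RPCNFSDCOUNT=512

    # or resize the running server in place, then verify
    rpc.nfsd 512
    cat /proc/fs/nfsd/threads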
| Comment by Peter Jones [ 10/Mar/14 ] |
|
Bobijam, could you please advise on this one? Thanks. Peter |
| Comment by Keith Mannthey (Inactive) [ 14/Mar/14 ] |
|
It has been reported that |
| Comment by Peter Jones [ 28/Apr/14 ] |
|
Marion Have you been able to check whether this issue still exists with the Peter |
| Comment by Marion Hakanson (Inactive) [ 28/Apr/14 ] |
|
Peter, no, we haven't arranged downtime on the cluster to test that yet. Marion |
| Comment by Andreas Dilger [ 05/Aug/20 ] |
|
Close old ticket that has not been seen recently. |