[LU-9868] dcache/namei fixes for lustre Created: 11/Aug/17 Updated: 24/Nov/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Upstream, Lustre 2.11.0, Lustre 2.12.0, Lustre 2.10.5 |
| Fix Version/s: | Upstream |
| Type: | Bug | Priority: | Major |
| Reporter: | James A Simmons | Assignee: | James A Simmons |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | patch, upstream | ||
| Environment: |
Upstream as well as any Lustre 2.11 client. |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
From Neil Brown, who works at SuSE:

I was drawn to look at this code due to the tests on … A particular change here from the previous posting is … This code passes basic "smoke tests".

Note that the cast to "struct dentry *" in the first patch is because … |
| Comments |
| Comment by Gerrit Updater [ 11/Aug/17 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/28486 |
| Comment by Marek Magrys [ 28/Sep/18 ] |
|
I gave it a spin, but opening a file under vim (probably a mix of stat/open/write/chmod operations) hangs the process. The trace is:
Sep 28 17:50:46 p0851 kernel: INFO: rcu_sched self-detected stall on CPU { 7} (t=60000 jiffies g=16256 c=16255 q=13733)
Sep 28 17:50:46 p0851 kernel: Task dump for CPU 7:
Sep 28 17:50:46 p0851 kernel: vim R running task 0 28301 28299 0x0000000b
Sep 28 17:50:46 p0851 kernel: Call Trace:
Sep 28 17:50:46 p0851 kernel: <IRQ> [<ffffffffa6ed1268>] sched_show_task+0xa8/0x110
Sep 28 17:50:46 p0851 kernel: [<ffffffffa6ed4e39>] dump_cpu_task+0x39/0x70
Sep 28 17:50:46 p0851 kernel: [<ffffffffa6f4bb00>] rcu_dump_cpu_stacks+0x90/0xd0
Sep 28 17:50:46 p0851 kernel: [<ffffffffa6f4f1a2>] rcu_check_callbacks+0x442/0x730
Sep 28 17:50:46 p0851 kernel: [<ffffffffa6f046c0>] ? tick_sched_do_timer+0x50/0x50
Sep 28 17:50:46 p0851 kernel: [<ffffffffa6ea7a16>] update_process_times+0x46/0x80
Sep 28 17:50:46 p0851 kernel: [<ffffffffa6f044c0>] tick_sched_handle+0x30/0x70
Sep 28 17:50:46 p0851 kernel: [<ffffffffa6f046f9>] tick_sched_timer+0x39/0x80
Sep 28 17:50:46 p0851 kernel: [<ffffffffa6ec2163>] __hrtimer_run_queues+0xf3/0x270
Sep 28 17:50:46 p0851 kernel: [<ffffffffa6ec26ef>] hrtimer_interrupt+0xaf/0x1d0
Sep 28 17:50:46 p0851 kernel: [<ffffffffa6e596ab>] local_apic_timer_interrupt+0x3b/0x60
Sep 28 17:50:46 p0851 kernel: [<ffffffffa752a083>] smp_apic_timer_interrupt+0x43/0x60
Sep 28 17:50:46 p0851 kernel: [<ffffffffa75267b2>] apic_timer_interrupt+0x162/0x170
Sep 28 17:50:46 p0851 kernel: <EOI> [<ffffffffa6f0bb43>] ? native_queued_spin_lock_slowpath+0x1d3/0x200
Sep 28 17:50:46 p0851 kernel: [<ffffffffa750dc5f>] queued_spin_lock_slowpath+0xb/0xf
Sep 28 17:50:46 p0851 kernel: [<ffffffffa751b750>] _raw_spin_lock+0x20/0x30
Sep 28 17:50:46 p0851 kernel: [<ffffffffa703bf3e>] igrab+0x1e/0x60
Sep 28 17:50:46 p0851 kernel: [<ffffffffc1026040>] ll_inode_from_resource_lock+0xa0/0xb0 [lustre]
Sep 28 17:50:46 p0851 kernel: [<ffffffffc103ed92>] ll_md_blocking_ast+0x52/0x730 [lustre]
Sep 28 17:50:46 p0851 kernel: [<ffffffffa6f9f8e4>] ? free_one_page+0x2e4/0x310
Sep 28 17:50:46 p0851 kernel: [<ffffffffc0e1321a>] ldlm_cancel_callback+0x8a/0x330 [ptlrpc]
Sep 28 17:50:46 p0851 kernel: [<ffffffffc0e81f22>] ? null_free_repbuf+0xd2/0x200 [ptlrpc]
Sep 28 17:50:46 p0851 kernel: [<ffffffffa6fa01fd>] ? __free_pages+0x1d/0x30
Sep 28 17:50:46 p0851 kernel: [<ffffffffa6fa0412>] ? __free_memcg_kmem_pages+0x22/0x50
Sep 28 17:50:46 p0851 kernel: [<ffffffffc0e1e9b0>] ldlm_cli_cancel_local+0xa0/0x420 [ptlrpc]
Sep 28 17:50:46 p0851 kernel: [<ffffffffc0e2303a>] ldlm_cli_cancel_list_local+0xea/0x280 [ptlrpc]
Sep 28 17:50:46 p0851 kernel: [<ffffffffc0e2335b>] ldlm_cancel_resource_local+0x18b/0x280 [ptlrpc]
Sep 28 17:50:46 p0851 kernel: [<ffffffffc0a93e92>] mdc_resource_get_unused+0x142/0x2a0 [mdc]
Sep 28 17:50:46 p0851 kernel: [<ffffffffc0a94217>] mdc_setattr+0x227/0x4a0 [mdc]
Sep 28 17:50:46 p0851 kernel: [<ffffffffa7037779>] ? dput+0x29/0x160
Sep 28 17:50:46 p0851 kernel: [<ffffffffc0f5ff54>] lmv_setattr+0x1c4/0x570 [lmv]
Sep 28 17:50:46 p0851 kernel: [<ffffffffc103115b>] ll_setattr_raw+0x28b/0x1350 [lustre]
Sep 28 17:50:46 p0851 kernel: [<ffffffffc103228c>] ll_setattr+0x6c/0xd0 [lustre]
Sep 28 17:50:46 p0851 kernel: [<ffffffffa703dc54>] notify_change+0x2c4/0x420
Sep 28 17:50:46 p0851 kernel: [<ffffffffa701c837>] chmod_common+0x137/0x160
Sep 28 17:50:46 p0851 kernel: [<ffffffffa701dac7>] SyS_fchmodat+0x57/0xc0
Sep 28 17:50:46 p0851 kernel: [<ffffffffa701db49>] SyS_chmod+0x19/0x20
Sep 28 17:50:46 p0851 kernel: [<ffffffffa7525a1b>] tracesys+0xa3/0xc9
Tested on CentOS 7.5 with the 3.10.0-862.14.4.el7.x86_64 kernel and Lustre 2.10.5.
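For reference, the CPU in this trace is spinning in native_queued_spin_lock_slowpath() under igrab(), i.e. trying to take inode->i_lock. A paraphrase of the mainline igrab() (fs/inode.c, comments added here) shows the lock in question; the stall means some other path is holding that inode's i_lock and never dropping it:

/* Paraphrased from mainline fs/inode.c; comments are mine. */
struct inode *igrab(struct inode *inode)
{
	spin_lock(&inode->i_lock);	/* <- the trace above spins here */
	if (!(inode->i_state & (I_FREEING | I_WILL_FREE))) {
		__iget(inode);		/* bump i_count under i_lock */
		spin_unlock(&inode->i_lock);
	} else {
		/* inode is being freed; caller gets NULL back */
		spin_unlock(&inode->i_lock);
		inode = NULL;
	}
	return inode;
}
|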
| Comment by James A Simmons [ 28/Sep/18 ] |
|
Yeah, the original patch was based on an earlier version posted for upstream. A later patch set was done and landed. I just pushed the new patch. My testing shows it's in much better shape. Sorry about the breakage. |
| Comment by Marek Magrys [ 01/Oct/18 ] |
|
We moved patch set 17 into production after some initial testing, and so far it has been running for over 3000 node-days without any problems. Is there any plan to have the patch set included in b2_10 or master? |
| Comment by James A Simmons [ 01/Oct/18 ] |
|
Wow, that is awesome news. Before your report of it fixing an actual production system, there wasn't really a push to merge this for 2.12. Now it looks like it's an important fix for production systems. The way it works is that we have to port patches exactly from upstream to OpenSFS, so we are looking at 5 patches instead of the one test patch I currently have. Since this fixes real problems, I will push it to 2.10.x once it lands in the latest Lustre. |
| Comment by Gerrit Updater [ 01/Oct/18 ] |
|
|
| Comment by Gerrit Updater [ 01/Oct/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33259 |
| Comment by Gerrit Updater [ 01/Oct/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33260 |
| Comment by Gerrit Updater [ 01/Oct/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33261 |
| Comment by James A Simmons [ 02/Oct/18 ] |
|
I see that while this patch fixes things on newer kernels, it breaks ancient kernels like RHEL's. |
| Comment by James A Simmons [ 09/Oct/18 ] |
|
I found a workaround for the bug that breaks older kernels. I'm testing it now. |
| Comment by Peter Jones [ 11/Oct/18 ] |
|
We think that any changes in this area would need to soak for a while and get tested broadly before being included in a release. |
| Comment by James A Simmons [ 15/Oct/18 ] |
|
I agree. This is a complex problem so caution is required here. |
| Comment by Lukasz Flis [ 17/Oct/18 ] |
|
We are running https://review.whamcloud.com/#/c/28486 - patch set 17 with b2_10, and as Marek said there were no major issues, except for a few nodes with processes blocked in a rename operation invoked by software called OOMMF. We are not sure it is related, but I thought it was worth reporting. Stack trace:
[455219.524004] oxs D ffff984d68ba1fa0 0 3818 3806 0x00000000
[455219.558389] Call Trace:
[455219.571231] [<ffffffffa3f19e59>] schedule_preempt_disabled+0x29/0x70
[455219.602131] [<ffffffffa3f17c17>] __mutex_lock_slowpath+0xc7/0x1d0
[455219.631887] [<ffffffffa3f16fff>] mutex_lock+0x1f/0x2f
[455219.656923] [<ffffffffa3a2b50a>] lock_rename+0xda/0xe0
[455219.681973] [<ffffffffa3a315bf>] SYSC_renameat2+0x22f/0x5a0
[455219.709742] [<ffffffffa39caefd>] ? handle_mm_fault+0x39d/0x9b0
[455219.739183] [<ffffffffa3f2056c>] ? __do_page_fault+0x1bc/0x4f0
[455219.767746] [<ffffffffa3a327ae>] SyS_renameat2+0xe/0x10
[455219.793358] [<ffffffffa3a327ee>] SyS_rename+0x1e/0x20
[455219.818174] [<ffffffffa3f2579b>] system_call_fastpath+0x22/0x27

According to our logs, the issue started to appear a few days after deployment of clients patched with https://review.whamcloud.com/#/c/28486 - patch set 17. I have noticed that the current patch set is 22 - do you recommend giving the upgrade a try?
UPDATE #1: Or would it be better to patch the RHEL kernel, as SuSE did, to address the issues now?
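For context, the stuck task in the trace above is waiting in mutex_lock() called from lock_rename(). A condensed paraphrase of lock_rename() from a 3.10-era fs/namei.c (lockdep annotations and the directory-ancestor special cases omitted) shows the locks involved; the hang means another task is holding the per-superblock rename mutex or a parent directory's i_mutex and never releasing it:

/*
 * Condensed paraphrase of lock_rename() (fs/namei.c, ~3.10 era);
 * lockdep subclasses and ancestor handling omitted.  Shown only to
 * identify the locks the trace above is blocked on.
 */
struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
{
	if (p1 == p2) {
		/* same parent: one i_mutex is enough */
		mutex_lock(&p1->d_inode->i_mutex);
		return NULL;
	}

	/* cross-directory rename: serialize all renames on this sb */
	mutex_lock(&p1->d_inode->i_sb->s_vfs_rename_mutex);

	/* then take both parents' i_mutex */
	mutex_lock(&p1->d_inode->i_mutex);
	mutex_lock(&p2->d_inode->i_mutex);
	return NULL;
}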
|
| Comment by James A Simmons [ 17/Oct/18 ] |
|
I broke the patch up, so it's more than one now. Also, a bug in Lustre was exposed while testing this fix: on newer kernels it reports an error, but on older ones it will crash the node. Once I figure everything out, I will post a backported patch for 2.10. The rename issue is new. |
| Comment by Lukasz Flis [ 17/Oct/18 ] |
|
OK. If so, we will revert patch set 17 of https://review.whamcloud.com/#/c/28486 and see if the rename issue is gone, as the OOMMF crash is easily reproducible (however, we don't have a separate reproducer yet). |
| Comment by Lukasz Flis [ 19/Oct/18 ] |
|
Reverting patch set 17 of https://review.whamcloud.com/#/c/28486 fixed the problem with clients hanging on rename in OOMMF.

We have four processes interacting: probably one handles file writes, then signals another one which does the rename operations that lead to client lock-up after several dozen such operations.

[pid 5643] futex(0x2b774caf1d20, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 5643] sendto(17, "reply 1762 0 0\r\n", 16, 0, NULL, 0) = 16
[pid 5643] recvfrom(18, "query 13 datafile {/net/scratch/"..., 4096, 0, NULL, NULL) = 305
[pid 5643] stat("/net/scratch/people/plgczuchr/OOMMF/dwieBariery/./dwieBariery/freeAniz_20000_Voltage_0.1500_mtop_-1_mfree_1_mbottom_1_VProfileType_0_/dwieBariery-Oxs_TimeDriver-Magnetization-00-0044000.omf", {st_mode=S_IFREG|0644, st_size=388216, ...}) = 0
[pid 5643] lstat("/net/scratch/people/plgczuchr/slurm_jobdir/13738101/tmp/dwieBariery-p1311.prometheus-5582.omf", {st_mode=S_IFREG|0644, st_size=388216, ...}) = 0
[pid 5643] lstat("/net/scratch/people/plgczuchr/OOMMF/dwieBariery/./dwieBariery/freeAniz_20000_Voltage_0.1500_mtop_-1_mfree_1_mbottom_1_VProfileType_0_/dwieBariery-Oxs_TimeDriver-Magnetization-00-0044000.omf", {st_mode=S_IFREG|0644, st_size=388216, ...}) = 0
[pid 5643] stat("/net/scratch/people/plgczuchr/OOMMF/dwieBariery/./dwieBariery/freeAniz_20000_Voltage_0.1500_mtop_-1_mfree_1_mbottom_1_VProfileType_0_/dwieBariery-Oxs_TimeDriver-Magnetization-00-0044000.omf", {st_mode=S_IFREG|0644, st_size=388216, ...}) = 0
[pid 5643] chmod("/net/scratch/people/plgczuchr/OOMMF/dwieBariery/./dwieBariery/freeAniz_20000_Voltage_0.1500_mtop_-1_mfree_1_mbottom_1_VProfileType_0_/dwieBariery-Oxs_TimeDriver-Magnetization-00-0044000.omf", 0644) = 0
[pid 5643] rename("/net/scratch/people/plgczuchr/slurm_jobdir/13738101/tmp/dwieBariery-p1311.prometheus-5582.omf", "/net/scratch/people/plgczuchr/OOMMF/dwieBariery/./dwieBariery/freeAniz_20000_Voltage_0.1500_mtop_-1_mfree_1_mbottom_1_VProfileType_0_/dwieBariery-Oxs_TimeDriver-Magnetization-00-0044000.omf") = 0
[pid 5643] write(9, "\0", 1 <unfinished ...>
[pid 5652] <... select resumed> ) = 1 (in [8])
[pid 5643] <... write resumed> ) = 1
|
| Comment by James A Simmons [ 23/Oct/18 ] |
|
Thank you for this info and also for testing out this work. I will talk with Neil about this problem to see if we can resolve it. |
| Comment by James A Simmons [ 24/Oct/18 ] |
|
Which version of Lustre are you running, and is it patched only with https://review.whamcloud.com/#/c/28486/17? Better yet, can you share your RPMs with us? |
| Comment by Gerrit Updater [ 11/Feb/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/28486/ |
| Comment by Gerrit Updater [ 03/Apr/19 ] |
|
Sebastien Piechurski (sebastien.piechurski@atos.net) uploaded a new patch: https://review.whamcloud.com/34582 |
| Comment by Sebastien Piechurski [ 03/Apr/19 ] |
|
Any chance of having the patch included in 2.12.1? |
| Comment by Peter Jones [ 04/Apr/19 ] |
|
spiechurski: it would be possible, but what is the driver for doing so? |
| Comment by James A Simmons [ 04/Apr/19 ] |
|
Note the work is not complete. I need to solve the dcache hard link breakage issue. I have an idea on how to fix it but it will take some time to resolve it. |
| Comment by Sebastien Piechurski [ 02/Oct/19 ] |
|
pjones: one of our customers (Cardiff University) is hitting this issue from time to time. simmonsja: did you make any progress on this topic since April? Maybe in another ticket? |
| Comment by James A Simmons [ 02/Oct/19 ] |
|
Oh I thought with … |
| Comment by Sebastien Piechurski [ 02/Oct/19 ] |
|
Indeed with …

No pressure though, I was just looking for information on progress. Thanks, Seb. |
| Comment by James A Simmons [ 30/Oct/19 ] |
|
I have been looking at how to resolve these issues and thinking of some ideas.

The breakage you see is due to the way Lustre manages its dcache so that the special .lustre/fid directory works. This special handling is what is causing these problems. The patches from upstream resolve the issues you see, but they end up making the .lustre/fid handling not work; I cover what that breakage is under ticket LU-11501. Additionally, the special .lustre/fid directory doesn't work with file sets, so I really need to think of a way to handle all these conditions. Before we can land the patches under these tickets, .lustre/fid needs to be resolved first.

What I have been thinking of is an overlayfs-type approach, which uses d_real() to map a dentry to the real dentry under the hood. This way a dentry for .lustre/fid/$FID can be mapped to the real dentry. That means moving the .lustre/fid dentries out of the dcache into their own private cache for lookup. This is needed anyway, since you don't want a mounted file set to be able to reach a FID outside of its tree. A sketch of the idea follows below.

Also, at the same time, work is being done in the tools to move away from using .lustre/fid toward using fhandles. This might help to resolve some of the issues as well.
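A minimal sketch of what that d_real()-based mapping could look like: this is hypothetical, ll_dentry_is_fid_alias() and ll_fid_lookup_real() are assumed helpers that do not exist in Lustre today, and the d_real() prototype shown is the ~5.x kernel form (the signature has changed across kernel versions):

/*
 * Hypothetical sketch only.  ll_dentry_is_fid_alias() and
 * ll_fid_lookup_real() are assumed helpers, not existing Lustre
 * functions; d_real()'s prototype here matches ~5.x kernels.
 */
static struct dentry *ll_d_real(struct dentry *dentry,
				const struct inode *inode)
{
	/* Ordinary dentries are already the "real" dentry. */
	if (!ll_dentry_is_fid_alias(dentry))
		return dentry;

	/*
	 * A .lustre/fid/$FID dentry would live in a private cache
	 * rather than the dcache; map it to the real dentry of the
	 * file, much like overlayfs maps to its underlying layer.
	 */
	return ll_fid_lookup_real(dentry, inode);
}

static const struct dentry_operations ll_d_ops = {
	.d_real	= ll_d_real,
};
|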
| Comment by Gerrit Updater [ 13/Dec/19 ] |
|
|
| Comment by Gerrit Updater [ 27/Feb/20 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/37741 |
| Comment by Gerrit Updater [ 05/Mar/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37741/ |
| Comment by Gerrit Updater [ 14/Apr/20 ] |
|
|
| Comment by Gerrit Updater [ 03/Jul/21 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/44135 |
| Comment by Gerrit Updater [ 11/Sep/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44135/ |
| Comment by Gerrit Updater [ 24/Nov/22 ] |
|
"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49237 |