patchless server kernel (LU-20)

[LU-3406] Submit raid5-mmp-unplug-dev patch upstream Created: 27/May/13  Updated: 24/Apr/17  Resolved: 24/Apr/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Technical task Priority: Major
Reporter: Andreas Dilger Assignee: Bruno Faccini (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None

Attachments: Text File 0001-md-submit-MMP-reads-REQ_SYNC-to-bypass-RAID5-cache.patch    
Issue Links:
Related
is related to LU-20 patchless server kernel Resolved
Bugzilla ID: 17895
Rank (Obsolete): 8423

 Description   

In order to submit the raid5-mmp-unplug-dev patch upstream, this needs to be updated for the latest kernel. Unfortunately, the affected code seems to have changed since the patch was written, so I'm not sure whether a simple "best guess update" of the patch will be correct.

I'll attach my "best guess" patch, but it needs to be verified by someone who actually understands the MD RAID code better.



 Comments   
Comment by Andreas Dilger [ 27/May/13 ]

Prototype patch against current Linux Git HEAD.

Comment by James A Simmons [ 28/May/13 ]

Since this patch is more of a bug fix, it might even make it into 3.10. Time to send it to dm-devel@redhat.com.

Comment by Andreas Dilger [ 29/May/13 ]

James, a problem I've just noticed, now that I've looked at this patch in detail for upstream submission, is that overloading READ_SYNC/REQ_SYNC to mean "bypass MD RAID cache" can negatively impact other filesystems such as Btrfs, OCFS, and others that now use READ_SYNC to distinguish reads that processes are waiting on from readahead requests.

That is fine for Lustre server kernels, but isn't necessarily fine for general usage. One option is to make a separate REQ_NOCACHE flag or similar, but that needs a much bigger patch that is only useful for ext4 MMP.

Hopefully someone who understands the RAID and block layer details better can come up with a solution.

Comment by Bruno Faccini (Inactive) [ 31/May/13 ]

Why not use the _META flag for this purpose?

Comment by Andreas Dilger [ 01/Jun/13 ]

I don't think _META is less used than _SYNC. It is often used for metadata IO requests to increase the priority in the scheduler. I think it is probably best to try with a new REQ flag and see what the upstream MD maintainer thinks. This is a generic bug with ext4 MMP, and not Lustre specific, so there should be some kind of solution possible.

Comment by Bruno Faccini (Inactive) [ 05/Jun/13 ]

OK, so let's go for a new flag.
But having looked at the MD/RAID5 source code, I am now concerned about whether the md_wakeup_thread() call at the end of make_request() is really needed when the flag is set. It seems to me that it should already have been called within the release_stripe[_plug]() call and its underlying routines. Thus, even if it has no effect, it could be considered useless and costly.

Comment by Bruno Faccini (Inactive) [ 05/Jun/13 ]

BTW, for the currently supported kernels and the current patch version, my earlier comment applies to md_raid5_unplug_device() instead of md_wakeup_thread(). That call was already in the original patch submitted for BZ #17895 by Jinshan, so maybe he can help me confirm whether it is necessary or not.

Comment by Bruno Faccini (Inactive) [ 11/Jun/13 ]

I spent some time digging in the latest (3.9.4) kernel sources, and I can confirm there is still no way to bypass the MD/RAID5 stripe cache on a read request.
I am first testing a patch (which introduces a new flag) against the current Lustre-server supported kernel version, to be exercised under HA/MMP tests.

Comment by Bruno Faccini (Inactive) [ 01/Aug/13 ]

Oops, I forgot to indicate that the patch is at http://review.whamcloud.com/6652.
Also, auto-tests never started for patch-set #1 for unexplained reasons, so I submitted patch-set #2 with less restrictive test parameters just in case.
In parallel, I am working on a local test platform and use cases to ensure that MMP works fine over SW RAID.

Comment by James A Simmons [ 07/Jul/14 ]

Patch http://review.whamcloud.com/6652 has been updated to latest master. If it proves stable we should look to pushing it upstream to get feedback to see what the final result is.

Comment by Bruno Faccini (Inactive) [ 08/Jul/14 ]

James,
I need to apologize for not having updated this ticket, or the associated change #6652, for months now, even though I have been assigned to higher-priority tasks since. In fact, what I really forgot was to give a detailed update on where I was on this, so I will try to do it now!

After I had pushed patch-set #1 of change #6652, I ran local tests on an ad-hoc HA platform to verify the patch's functionality and correct behavior, but then I discovered that no debug trace was generated by the raid456 module upon MMP block reads! Yet traces were generated during MMP block writes, which seems just impossible when reading both the ext4/ldiskfs and md/raid5 source code, but was also confirmed by iostat/blktrace monitoring.
This is the reason why the next patch-sets 2-5 (I don't remember why I removed the "fortestonly" param ...) only add more debug traces to help understand what's going on ...

BTW, at that time, and before giving up due to higher priorities, I tried to verify the behavior of the current/original patch, and it exhibited the same unexpected results.

So that is where I was, and still am, on this. If you pursue rebasing my patch, I strongly suggest verifying the patch's functionality/behavior again at the lowest level.

Comment by James A Simmons [ 03/Nov/14 ]

I submitted a similar patch to dm-devel@redhat.com. You can see the message at https://www.redhat.com/archives/dm-devel/2014-November/msg00004.html. I'd like to see what the feedback is so we can develop an approach acceptable upstream and then backport it to supported distros.

Comment by James A Simmons [ 24/Apr/17 ]

Since this will not be fixed, I suggest we delete the current patches we carry. Another option is to keep the patches in contrib until someone wants to work on a version to get accepted upstream, especially since upstream has shown potential corruption with our current patch.

Comment by Andreas Dilger [ 24/Apr/17 ]

I think that makes sense. Alternately, the patch could just be removed from the patch series files and left in kernel_patches in case anyone wants to use it. That would be easier to find than in the contrib directory.

Generated at Sat Feb 10 01:33:38 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.