[LU-4416] support for 3.12 linux kernel Created: 27/Dec/13 Updated: 20/Nov/15 Resolved: 16/Jul/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Improvement | Priority: | Critical |
| Reporter: | Bob Glossman (Inactive) | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB, patch | ||
| Environment: | fc19, fc20 |
| Rank (Obsolete): | 12130 |
| Description |
|
Tracker for 3.12 kernel support. Now that the kernel used in fc19 and fc20 is 3.12, we will need to support it soon. |
| Comments |
| Comment by Bob Glossman (Inactive) [ 27/Dec/13 ] |
|
At a minimum there are a few upstream commits from Peng Tao that need to be backported:
https://git.kernel.org/cgit/linux/kernel/git/gregkh/staging.git/commit/drivers/staging/lustre?h=staging-next&id=ea8352c289294e21ee13bdb105f55dc63497acff staging/lustre/libcfs: cleanup linux-mem.h
https://git.kernel.org/cgit/linux/kernel/git/gregkh/staging.git/commit/drivers/staging/lustre?h=staging-next&id=3bb22ec53e2bd12a241ed84359bffd591a40ab87 staging/lustre/ptlrpc: convert to new shrinker API
https://git.kernel.org/cgit/linux/kernel/git/gregkh/staging.git/commit/drivers/staging/lustre?h=staging-next&id=fe92a0557a6f332119c51fdd2f3d574040989447 staging/lustre/obdclass: convert lu_object shrinker to count/scan API
https://git.kernel.org/cgit/linux/kernel/git/gregkh/staging.git/commit/drivers/staging/lustre?h=staging-next&id=cbc3769ecd74b183d3ba5e11264cf484d8572a00 staging/lustre/ldlm: convert to shrinkers to count/scan API
These upstream mods are only good for the exact kernel version they reside in. Backports will need autoconf support to adapt to other kernel versions as well as 3.12. |
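The count/scan conversion and the need to adapt to more than one kernel version can be illustrated with a minimal userspace sketch. This is not the Lustre patch: the `demo_*` names and the `DEMO_HAVE_SHRINKER_COUNT` macro are hypothetical stand-ins for whatever an autoconf check would define. The sketch assumes the older interface used a single callback in which `nr_to_scan == 0` means "only report the count", while the newer interface splits that into a count callback and a scan callback.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel types; all names are hypothetical. */
struct demo_shrink_control {
	unsigned long nr_to_scan; /* old API: 0 means "only report the count" */
};

struct demo_cache {
	unsigned long nr_objects; /* freeable objects currently cached */
};

/* New-style callback 1: report how many objects could be freed. */
unsigned long demo_cache_count(const struct demo_cache *cache)
{
	return cache->nr_objects;
}

/* New-style callback 2: free up to nr_to_scan objects, return how many. */
unsigned long demo_cache_scan(struct demo_cache *cache,
			      const struct demo_shrink_control *sc)
{
	unsigned long freed = sc->nr_to_scan < cache->nr_objects ?
			      sc->nr_to_scan : cache->nr_objects;

	cache->nr_objects -= freed;
	return freed;
}

#ifndef DEMO_HAVE_SHRINKER_COUNT
/*
 * Old-style single callback for kernels without count/scan, built on the
 * two functions above: scan if asked to, then return the remaining count.
 */
unsigned long demo_cache_shrink(struct demo_cache *cache,
				const struct demo_shrink_control *sc)
{
	if (sc->nr_to_scan != 0)
		demo_cache_scan(cache, sc);
	return demo_cache_count(cache);
}
#endif
```

Writing the logic once as count and scan and deriving the single-callback form from them keeps the version-specific code down to one small wrapper selected at configure time.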
| Comment by Yang Sheng [ 10/Jan/14 ] |
|
http://review.whamcloud.com/8799 |
| Comment by Jinshan Xiong (Inactive) [ 11/Feb/14 ] |
|
another patch is at: http://review.whamcloud.com/9230 from |
| Comment by Yang Sheng [ 17/Feb/14 ] |
|
Last sanity test result:
sanity: FAIL: test_17m e2fsck should not report error upon short/long symlink MDT: rc=4
test_182 deadlocks the kernel |
| Comment by James A Simmons [ 17/Feb/14 ] |
|
Is this for ZFS? Do you have a shrinker patch? |
| Comment by Yang Sheng [ 18/Feb/14 ] |
|
It is ldiskfs. Yes, I'll commit it shortly. |
| Comment by Yang Sheng [ 18/Feb/14 ] |
|
http://review.whamcloud.com/9300 shrinker change. |
| Comment by Yang Sheng [ 10/Mar/14 ] |
|
Last sanity status:
sanity: FAIL: test_17k rsync failed with xattrs enabled
test_209 stuck in shrinker process. |
| Comment by Yang Sheng [ 17/Mar/14 ] |
|
Last sanity status:
sanity: FAIL: test_17m e2fsck should not report error upon short/long symlink MDT: rc=4
test_209 stuck on shrink. |
| Comment by Yang Sheng [ 18/Apr/14 ] |
|
Link with test_103 failed. |
| Comment by Jeff Mahoney [ 29/Apr/14 ] |
|
SLES12 is also 3.12 based, so I can lend a hand here as well. |
| Comment by Jeff Mahoney [ 30/Apr/14 ] |
|
I have it building against SLES12 Beta5. No testing done yet, but my recipe looks like: Existing commits: New commits: I'll start some testing in the morning. |
| Comment by Bob Glossman (Inactive) [ 30/Apr/14 ] |
|
I think you left out http://review.whamcloud.com/8010 from your recipe. |
| Comment by Jeff Mahoney [ 30/Apr/14 ] |
|
Yep, missed that one when pasting. I have it applied in my local repo. |
| Comment by James A Simmons [ 30/Apr/14 ] |
|
Peter can you link this ticket to |
| Comment by Bob Glossman (Inactive) [ 30/Apr/14 ] |
|
James, I can & will do that. fwiw, I too have been able to do a successful server build following Jeff's recipe including #8010. Build only so far, no actual functional results to report. |
| Comment by Bob Glossman (Inactive) [ 30/Apr/14 ] |
|
When building with the latest zfs, http://review.whamcloud.com/10064 is also needed. |
| Comment by James A Simmons [ 30/Apr/14 ] |
|
Welcome back Jeff. I also just changed the recipe. You will now need http://review.whamcloud.com/#/c/2677 from As for 3.12 client support only the shrinker change is needed. Everything else is merged |
| Comment by James A Simmons [ 30/Apr/14 ] |
|
Forgot to mention the kernel patch series. The only patches you need to port are raid5-mmp-unplug-dev and bh_lru_size_config.patch. There are also some quota scaling fixes, but I don't know if those have been picked up mainstream and ended up in the SLES12 kernels. Recent tests show the block tunable patches might not be worth it anymore. Also, the dev_read_only kernel hack will be going away. Please try out patch http://review.whamcloud.com/#/c/7200 instead. Feedback on that patch would be great. |
| Comment by Bob Glossman (Inactive) [ 30/Apr/14 ] |
|
James, for now I have been using the old 3.x-fc18.series for kernel patches. It applies cleanly but still includes the dev_read_only hack. Will look into your alternative. |
| Comment by James A Simmons [ 30/Apr/14 ] |
|
We have a collision with the ext4_map_block patch from |
| Comment by Jeff Mahoney [ 30/Apr/14 ] |
|
After reviewing both, I'd like to perform testing with my version. It's cleaner and ext4_map_blocks is being called correctly. ext4_map_blocks will return the number of blocks mapped and if it returns less than map.m_len, it needs to be called multiple times to properly map the entire range. The version in |
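The calling convention described above (a return value smaller than the requested length means the caller must call again for the remainder) can be sketched in userspace as follows. This is an illustration only, not either patch under review: `demo_map_blocks`, `demo_map_range`, the 4-block extent limit and the hole starting at block 1000 are all hypothetical, chosen so the loop has something to iterate over.

```c
#include <stddef.h>

#define DEMO_EXTENT_MAX 4u     /* mock: at most 4 blocks mapped per call */
#define DEMO_HOLE_START 1000ul /* mock: blocks from here on are unmapped */

/* Hypothetical stand-in for the request structure. */
struct demo_map {
	unsigned long m_lblk; /* first logical block requested */
	unsigned int m_len;   /* in: blocks requested, out: blocks mapped */
};

/*
 * Mock mapper: returns the number of blocks mapped starting at m_lblk,
 * which may be fewer than requested, or 0 if nothing is mapped there.
 */
int demo_map_blocks(struct demo_map *map)
{
	unsigned int n;

	if (map->m_lblk >= DEMO_HOLE_START)
		return 0;
	n = map->m_len < DEMO_EXTENT_MAX ? map->m_len : DEMO_EXTENT_MAX;
	if (map->m_lblk + n > DEMO_HOLE_START)
		n = (unsigned int)(DEMO_HOLE_START - map->m_lblk);
	map->m_len = n;
	return (int)n;
}

/*
 * Map [start, start + len) by calling the mapper until the whole range is
 * covered, an error is returned, or nothing more is mapped. Returns the
 * number of blocks mapped or a negative error; *calls counts mapper calls.
 */
int demo_map_range(unsigned long start, unsigned int len, unsigned int *calls)
{
	unsigned int done = 0;

	while (done < len) {
		struct demo_map map = {
			.m_lblk = start + done,
			.m_len = len - done,
		};
		int rc = demo_map_blocks(&map);

		(*calls)++;
		if (rc < 0)
			return rc;
		if (rc == 0)
			break; /* unmapped region: stop, caller decides */
		done += (unsigned int)rc;
	}
	return (int)done;
}
```

With these mock limits, mapping 10 blocks from block 100 takes three calls (4 + 4 + 2), while a single call would have covered only the first 4 blocks.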
| Comment by Jeff Mahoney [ 30/Apr/14 ] |
|
Ok, so I just ran acceptance-small against my recipe and ran into: All of those are just annoyances and can be fixed pretty easily. But the BUG_ON(de->name_len != 1); check in dx_get_dx_info is getting triggered as well, which has halted testing for now. I'll dig into these a bit. |
| Comment by Bob Glossman (Inactive) [ 30/Apr/14 ] |
|
Jeff, the procfs warnings from nrs_tbf_quantum not existing during umount have already been reported elsewhere, so it's not something new in your patches. see |
| Comment by Jeff Mahoney [ 30/Apr/14 ] |
|
Oh, yeah, I'm not expecting that they were introduced by my patches - most of these are checks that have gone into the upstream kernel and haven't been addressed yet. |
| Comment by Jeff Mahoney [ 30/Apr/14 ] |
|
The scheduling while atomic issue was introduced here: The author claimed that the allocation was unnecessary, but it was necessary to avoid holding the atomic lock across a sleeping operation. Given that the operations using this value aren't performance critical (if they were, we wouldn't be doing call_usermodehelper), we can replace the lock with a sleeping lock and keep the allocations out of it. |
| Comment by James A Simmons [ 30/Apr/14 ] |
|
Oh so that is the source of the upcall backtrace I get. I have been ignoring it thinking it was due to a configuration issue on my part. The patch for the TBF NR is totally broken. When I created the patch I had no docs on how to actually test it. Li has left me some pointers in the gerrit review. |
| Comment by Jeff Mahoney [ 30/Apr/14 ] |
|
Found another scheduling while atomic issue: ll_statahead_interpret takes &lli->lli_sa_lock ... this one is a bit more involved than I'm comfortable with. |
| Comment by Jeff Mahoney [ 30/Apr/14 ] |
|
I added a bit of debugging to the dx_get_dx_info BUG_ON and got back the following. AFAICT it's supposed to be "." Looks like that's a bug in the ldiskfs patch set. |
| Comment by James A Simmons [ 01/May/14 ] |
|
Jeff could you reverse the order of dependency for http://review.whamcloud.com/#/c/10164 and http://review.whamcloud.com/#/c/10163. The reason is 10164 will be much easier to land and 10163 is currently broken which makes 10164 broken. Also we need to investigate to see if a performance difference happens when moving to ext4_map_blocks. |
| Comment by Jeff Mahoney [ 01/May/14 ] |
|
Ok, reversed the order and fixed the ext4_map_blocks patch. The entire set builds now. |
| Comment by Jeff Mahoney [ 05/May/14 ] |
|
There was a bug in my ext4_map_blocks patch that caused pretty much everything to return zeroes. I have a fixed version that works properly. I also have a patch that's required for any 3.10+ kernel since ext4_truncate will check that i_mutex is held. I'm going to run acceptance-small again and push the fixes. |
| Comment by Jeff Mahoney [ 10/May/14 ] |
|
After a quick look over, I think the main performance difference is going to come from the holes case. ext4_map_blocks handles it, but not efficiently. It should be doable to use the FIEMAP code to handle the read-only holes case and use ext4_map_blocks for the write case since the allocations paper over the performance issue AFAICT. Not that I'm volunteering to do it. |
| Comment by James A Simmons [ 20/May/14 ] |
|
Peter can you link http://review.whamcloud.com/#/c/10325 - These 3 patches don't have dependency on each other and it is much faster to land patches that lack dependencies. Since time is short we really need to focus on making RHEL7 and SLES12 client support available for Lustre 2.6. We can do the server support for Lustre 2.7. This means focusing on getting 9300, |
| Comment by Jeff Mahoney [ 20/May/14 ] |
|
Works for me, though we've got demand for 2.6 server support on SLE12 AFAIK. So I'll still be plugging away on these (as they pass acceptance except for the namei.c:517 crash). |
| Comment by Bob Glossman (Inactive) [ 20/May/14 ] |
|
James, I put in the link you asked for. I agree that focusing on the small set of patches needed for client support and without dependencies on each other makes sense. Really want to get at least those landed before 2.6 closes. Don't at all suggest we should slow down or stop progress on all the other patches, just that we do what's needed for clients first. |
| Comment by James A Simmons [ 21/May/14 ] |
|
Thank you Bob. Jeff please keep doing the awesome work. |
| Comment by James A Simmons [ 21/May/14 ] |
|
Patches needed for RHEL7 client support:
http://review.whamcloud.com/#/c/10325 -
Additional patches needed for [SuSE12 client ] / [ generic 3.12 ] support:
http://review.whamcloud.com/#/c/10160 -
Patches needed for ZFS server support include the above plus:
http://review.whamcloud.com/#/c/7934 |
| Comment by Bob Glossman (Inactive) [ 23/May/14 ] |
|
Think there's a typo in your list. Don't you mean: http://review.whamcloud.com/#/c/10160 - ? |
| Comment by James A Simmons [ 23/May/14 ] |
|
fixed. |
| Comment by James A Simmons [ 30/May/14 ] |
|
For the latest master, all patches needed for client support of 3.12 kernels have landed. This covers both RHEL7 and SuSE 12 clients. If you want ZFS back end support with 3.12 kernels you will need the following two patches: http://review.whamcloud.com/#/c/7934 - Currently ldiskfs development is in the early stages for 3.12 kernel support. The list of patches currently under development can be reviewed with the following link: http://review.whamcloud.com/#/q/status:open+message:LU-4416,n,z In addition, for RHEL7 ldiskfs support the following patch needs to be applied: |
| Comment by James A Simmons [ 13/Jun/14 ] |
|
Currently clients on newer kernels require you to modprobe lustre before you do a client mount. This is fixed in the patch
For people wanting to use an external OFED stack (3.12 or Mellanox) you will need patch http://review.whamcloud.com/#/c/10571 |
| Comment by James A Simmons [ 16/Jun/14 ] |
|
Time for an update. All the needed Currently osd-ldiskfs is still in development but should be ready for 2.7. Patches for that work For client support the patch for |
| Comment by James A Simmons [ 16/Jun/14 ] |
|
Update : The patch for |
| Comment by James A Simmons [ 18/Jun/14 ] |
|
I have done some experimenting with the ldiskfs patch for SuSE12 and discovered there is only one patch in that series that can't be applied to the 3.12.22 upstream linux kernel tree. Since this is the case, I'd like to propose we create a 3.12 series file and extend SuSE12 and RHEL7 off of it. The reason for this is so we can start organizing the patches we should be sending upstream to the |
| Comment by Bob Glossman (Inactive) [ 19/Jun/14 ] |
|
I think the recent landing of fix for |
| Comment by James A Simmons [ 19/Jun/14 ] |
|
Looking into it. Can you post what build error you get? |
| Comment by Bob Glossman (Inactive) [ 19/Jun/14 ] |
|
added a bit more detail in |
| Comment by James A Simmons [ 20/Jun/14 ] |
|
As Bob has mentioned a new issue has merged in master that broke client support for RHEL7 and SuSE12. I have a patch to resolve this at http://review.whamcloud.com/#/c/10761 which is related to |
| Comment by James A Simmons [ 03/Jul/14 ] |
|
As of today all patches needed for client and zfs support have landed. Many patches for ldiskfs have landed but more testing and work needs to be done. The patches left for ldiskfs support are: http://review.whamcloud.com/#/c/8116 - http://review.whamcloud.com/#/c/10249 - |
| Comment by James A Simmons [ 11/Aug/14 ] |
|
More patches have landed. The patches left for ldiskfs support are: http://review.whamcloud.com/#/c/10376 - http://review.whamcloud.com/#/c/10249 - |
| Comment by James A Simmons [ 11/Sep/14 ] |
|
Down to two patches. http://review.whamcloud.com/#/c/10165 ldiskfs for SuSE12 support |
| Comment by James A Simmons [ 24/Sep/14 ] |
|
All that is left to land is http://review.whamcloud.com/#/c/10165 which covers ldiskfs for SuSE12 support. |
| Comment by James A Simmons [ 03/Nov/14 ] |
|
Jeff, I'd like to discuss how to go about making a patchless server for SLES12 support. Currently three primary patches are needed:
bh_lru_size_config.patch - a simple patch which might change based on what will go upstream. Andrew Morton discussed just having BH_LRU_SIZE
quota enhancement patches - These have already been merged upstream as the following commits
commit b9ba6f94b2382ef832f97122976b73004f714714
It would be really nice to have those merged into your tree. Lastly the work under |
| Comment by Jeff Mahoney [ 03/Nov/14 ] |
|
Hi James, I'm afraid we're too late to incorporate the quota changes. SLE12 GA was released last week. The quota changes would change the kABI so we can't accept them into the release. Unless |
| Comment by James A Simmons [ 09/Jan/15 ] |
|
Jeff since SLES11SP4 is not out just yet would you consider having the BH_LRU_SIZE and quota improvements merged into your kernel? |
| Comment by Jeff Mahoney [ 16/Jan/15 ] |
|
I can propose it. We're technically past our feature request deadline, but I can present it as a late change request. |
| Comment by Yang Sheng [ 22/Jan/15 ] |
|
I think the main issue is to push http://review.whamcloud.com/7200 so that it can be landed asap. |
| Comment by Gerrit Updater [ 21/Apr/15 ] |
|
Bob Glossman (bob.glossman@intel.com) uploaded a new patch: http://review.whamcloud.com/14532 |
| Comment by Gerrit Updater [ 05/Jun/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14532/ |
| Comment by Yang Sheng [ 15/Jul/15 ] |
|
Looks like this ticket can be closed after http://review.whamcloud.com/#/c/10165/ has landed. |
| Comment by Gerrit Updater [ 16/Jul/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/10165/ |
| Comment by Peter Jones [ 16/Jul/15 ] |
|
Landed for 2.8 |