[LU-8056] Support for linux 4.5 kernels Created: 22/Apr/16  Updated: 06/Mar/17  Resolved: 22/Aug/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.9.0

Type: Improvement Priority: Minor
Reporter: James A Simmons Assignee: James A Simmons
Resolution: Fixed Votes: 0
Labels: None
Environment:

Lustre built against linux kernel version 4.5.X


Attachments: File dump.log.gz    
Issue Links:
Related
is related to LU-6215 Sync Lustre external tree with lustre... Resolved
is related to LU-8210 OST Read Cache does not work in Cento... Resolved
is related to LU-8560 Support for linux 4.6 kernels Resolved
is related to LU-9183 Support for linux 4.9 kernels Resolved
Rank (Obsolete): 9223372036854775807

 Description   

This ticket tracks the changes needed to support Lustre on Linux 4.5.x kernels.



 Comments   
Comment by Peter Jones [ 25/Apr/16 ]

Thanks for opening a ticket to track this activity, James.

Comment by Gerrit Updater [ 05/May/16 ]

Dmitry Eremin (dmitry.eremin@intel.com) uploaded a new patch: http://review.whamcloud.com/20009
Subject: LU-8056 libcfs: Support for linux 4.2 kernels
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5bee7cfe779a0ae8e6ba2771e8c7275545031347

Comment by Gerrit Updater [ 16/May/16 ]

Li Dongyang (dongyang.li@anu.edu.au) uploaded a new patch: http://review.whamcloud.com/20221
Subject: LU-8056 o2iblnd: ib_query_device removed in 4.5
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2a70db9bd0b6d33b5a1d9e3756a2a376e0924a5f

Comment by Gerrit Updater [ 16/May/16 ]

Li Dongyang (dongyang.li@anu.edu.au) uploaded a new patch: http://review.whamcloud.com/20222
Subject: LU-8056 socklnd: NETIF_F_ALL_CSUM renamed to NETIF_F_CSUM_MASK
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a7f5223e622ad3aee027a1b602a37a622af26cdd

Comment by Gerrit Updater [ 16/May/16 ]

Li Dongyang (dongyang.li@anu.edu.au) uploaded a new patch: http://review.whamcloud.com/20223
Subject: LU-8056 llite: use inode_lock to access i_mutex
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 37f52d82396adc41d451a29d0b814715e0da7b50

Comment by Gerrit Updater [ 16/May/16 ]

Li Dongyang (dongyang.li@anu.edu.au) uploaded a new patch: http://review.whamcloud.com/20224
Subject: LU-8056 llite: inode_operations interface changed in 4.5
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a711d0c976238311678c6133d7421f4bfca926d3

Comment by Gerrit Updater [ 16/May/16 ]

Li Dongyang (dongyang.li@anu.edu.au) uploaded a new patch: http://review.whamcloud.com/20225
Subject: LU-8056 llite: POSIX_ACL_XATTR_{ACCESS,DEFAULT} removed in 4.5
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1dd4e8c307251c96f85784e44761a976ae0e6671

Comment by Gerrit Updater [ 03/Jun/16 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: http://review.whamcloud.com/20619
Subject: LU-8056 lloop: fix bio_for_each_segment_all for newer kernels
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2864b64e1ef2541e04da4816318e3876c6f826c2

Comment by Gerrit Updater [ 14/Jun/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20009/
Subject: LU-8056 libcfs: Support for linux 4.2 kernels
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: beca050380b592477153fe16b79b7b6bb3aacbf2

Comment by Alexey Shvetsov [ 18/Jun/16 ]

Hi!

I tried running Lustre 2.8.0 and 2.8.54 with patches on kernels 4.4 and 4.5 (mainline and LTS mainline) with the in-kernel IB stack drivers.
With these kernels I get problems writing to the Lustre file system. The client can only write files smaller than ~32M. If I try to write a file larger than this limit, the client hangs.

[ 24.133660] Lustre: Lustre: Build Version: 2.8.54_1_g9b478d6
[ 24.196967] Lustre: 2854:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1466197626/real 1466197626] req@ffff88086ac78000 x1537419641880592/t0(0) o250->MGC10.252.246.12@o2ib@10.252.246.12@o2ib:26/25 lens 520/544 e 0 to 1 dl 1466197631 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[ 49.258153] Lustre: Server MGS version (2.1.0.0) is much older than client. Consider upgrading server (2.8.54_1_g9b478d6)
[ 49.294582] LustreError: 2909:0:(obd_config.c:1387:class_process_proc_param()) lov: home-clilov-ffff88106c9de800 unknown param qos_threshold_rr=100
[ 49.312588] LustreError: 2909:0:(obd_config.c:1392:class_process_proc_param()) lov.: error writing proc entry 'stripeoffset': rc = -34
[ 49.407596] Lustre: home-clilov-ffff88106c9de800: disabling xattr cache due to unknown maximum xattr size.
[ 49.425897] Lustre: Mounted home-client
[ 72.514684] NFSD: starting 90-second grace period (net ffffffff81b58e00)
[ 112.190423] Lustre: 2858:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1466197698/real 1466197698] req@ffff880074610600 x1537419641883984/t0(0) o4->home-OST0010-osc-ffff88106c9de800@10.252.246.18@o2ib:6/4 lens 608/448 e 0 to 1 dl 1466197714 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
[ 112.222944] Lustre: home-OST0010-osc-ffff88106c9de800: Connection to home-OST0010 (at 10.252.246.18@o2ib) was lost; in progress operations using this service will wait for recovery to complete
[ 112.295469] Lustre: home-OST0010-osc-ffff88106c9de800: Connection restored to 10.252.246.18@o2ib (at 10.252.246.18@o2ib)
[ 128.294527] Lustre: 2859:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1466197714/real 1466197714] req@ffff880074610c00 x1537419641884176/t0(0) o4->home-OST0010-osc-ffff88106c9de800@10.252.246.18@o2ib:6/4 lens 608/448 e 0 to 1 dl 1466197730 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
[ 128.294531] Lustre: 2857:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1466197714/real 1466197714] req@ffff880074610300 x1537419641884160/t0(0) o4->home-OST0010-osc-ffff88106c9de800@10.252.246.18@o2ib:6/4 lens 608/448 e 0 to 1 dl 1466197730 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
[ 128.294537] Lustre: 2857:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[ 128.294560] Lustre: home-OST0010-osc-ffff88106c9de800: Connection to home-OST0010 (at 10.252.246.18@o2ib) was lost; in progress operations using this service will wait for recovery to complete
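
The `ptlrpc_expire_one_request()` records above pack several fields into a single line. As a rough reading aid, the Python sketch below extracts the send time, the deadline (`dl`), the opcode (`oN`) and the target from two lines copied from this log. The opcode names (4 = OST_WRITE, 250 = MGS_CONNECT) are an assumption taken from the Lustre wire protocol definitions, and the regular expression and the name `parse_expired` are illustrative, not part of any Lustre tool.

```python
import re

# Two records copied verbatim from the dmesg output above.
LOG = """\
[ 24.196967] Lustre: 2854:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1466197626/real 1466197626] req@ffff88086ac78000 x1537419641880592/t0(0) o250->MGC10.252.246.12@o2ib@10.252.246.12@o2ib:26/25 lens 520/544 e 0 to 1 dl 1466197631 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[ 112.190423] Lustre: 2858:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1466197698/real 1466197698] req@ffff880074610600 x1537419641883984/t0(0) o4->home-OST0010-osc-ffff88106c9de800@10.252.246.18@o2ib:6/4 lens 608/448 e 0 to 1 dl 1466197714 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
"""

# Assumed opcode names; only the two opcodes seen in this log are listed.
OPCODE_NAMES = {4: "OST_WRITE", 250: "MGS_CONNECT"}

# Illustrative pattern: send time, opcode, target, and deadline of one record.
PATTERN = re.compile(
    r"\[sent (?P<sent>\d+)/real \d+\].*? o(?P<opc>\d+)->(?P<target>\S+):\d+/\d+ "
    r".*? dl (?P<dl>\d+)"
)


def parse_expired(line):
    """Return the key fields of one expired-request record, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    opc = int(m["opc"])
    return {
        "opcode": opc,
        "name": OPCODE_NAMES.get(opc, "unknown"),
        "target": m["target"],
        # Seconds between the send time and the deadline.
        "deadline_after_s": int(m["dl"]) - int(m["sent"]),
    }


RECORDS = [r for r in map(parse_expired, LOG.splitlines()) if r]
for r in RECORDS:
    print(r["name"], r["target"], r["deadline_after_s"])
```

Read this way, the requests that time out against home-OST0010 are opcode 4 requests with a 16-second deadline, which would be consistent with the hanging large writes described above if the opcode mapping assumed here is right.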

Comment by James A Simmons [ 20/Jun/16 ]

Your server side is at 2.1, which is too old. You need to run at least a 2.5 server.

Comment by Alexey Shvetsov [ 23/Jun/16 ]

It does not seem to be related to the server side, because it works if I use the stock CentOS 7.2 kernel
(server is 2.1.0 and client is 2.8.0). So it seems to be a bug in one of the patches...
[root@backup1 ~]# cat /proc/fs/lustre/version
lustre: 2.8.0
kernel: patchless_client
build: jenkins-arch=x86_64,build_type=client,distro=el7,ib_stack=inkernel-12-1-PRISTINE-3.10.0-327.18.2.el7.x86_64

Comment by Alexey Shvetsov [ 25/Jun/16 ]

Just to add... It seems to be a kernel bug introduced after Linux 4.1, since mainline kernel 4.1.27 works just fine with Lustre 2.8.0 and 2.8.54 =\
I'll try to bisect it.

Comment by Alexey Shvetsov [ 28/Jun/16 ]

It seems I was wrong. 2.8.5x doesn't work with 4.1.27 (it either doesn't build because of the bio changes, or it builds but doesn't work, i.e. it can only write 32M).

Comment by Alexey Shvetsov [ 28/Jun/16 ]

A git bisect of Lustre master shows that the client fails after http://review.whamcloud.com/#/c/19368/ landed.

Tests were done with the 4.1.27 mainline LTS kernel.

Comment by Alexey Shvetsov [ 28/Jun/16 ]

alexxy@scc-106-02 ~/Src/lustre $ git bisect log
git bisect start

# good: [50ac302a72c91853d64469d90111cb0baf223358] New version 2.8.52
git bisect good 50ac302a72c91853d64469d90111cb0baf223358
# bad: [d5865df5c1419457d0aa7a1636aa72de499d7b38] New tag 2.8.53
git bisect bad d5865df5c1419457d0aa7a1636aa72de499d7b38
# good: [71d2ea0fde17ecde0bf237f486d4bafb5d54fe3f] LU-8023 lbuild: add find-requires parameter to rpmbuild
git bisect good 71d2ea0fde17ecde0bf237f486d4bafb5d54fe3f
# good: [d2a310d698df167ccb41c1ae0ac24d4ef9846f45] LU-8051 tests: Skipping sanity tests 300[a-d] for old servers.
git bisect good d2a310d698df167ccb41c1ae0ac24d4ef9846f45
# good: [3cbdb896b6b380ff7c843f9a7104456e7e80b347] LU-7904 osd: honor LOC_F_NEW
git bisect good 3cbdb896b6b380ff7c843f9a7104456e7e80b347
# bad: [c28933602a6971739cb5ec3a1e920409ff19b01e] LU-8017 obd: report correct health state of a node
git bisect bad c28933602a6971739cb5ec3a1e920409ff19b01e
# bad: [037202f194cfed3bd195619d374f88adbd74ea38] LU-4423 libcfs: use ahash for libcfs crypto layer
git bisect bad 037202f194cfed3bd195619d374f88adbd74ea38
# bad: [d8467ab8a2ca15fbbd5be3429c9cf9ceb0fa78b8] LU-7990 clio: revise readahead to support 16MB IO
git bisect bad d8467ab8a2ca15fbbd5be3429c9cf9ceb0fa78b8
# first bad commit: [d8467ab8a2ca15fbbd5be3429c9cf9ceb0fa78b8] LU-7990 clio: revise readahead to support 16MB IO
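
The bisect log above can be summarised mechanically. The sketch below is a small Python helper (the name `summarize_bisect` is made up for this example) that reads `git bisect log` output, where each verdict is recorded on a `good:`, `bad:` or `first bad commit:` line, and reports the culprit. The sample input is an excerpt of the log above.

```python
import re

# Excerpt of the bisect log above (first two marks and the final result).
SAMPLE = """\
git bisect start
# good: [50ac302a72c91853d64469d90111cb0baf223358] New version 2.8.52
git bisect good 50ac302a72c91853d64469d90111cb0baf223358
# bad: [d5865df5c1419457d0aa7a1636aa72de499d7b38] New tag 2.8.53
git bisect bad d5865df5c1419457d0aa7a1636aa72de499d7b38
# bad: [d8467ab8a2ca15fbbd5be3429c9cf9ceb0fa78b8] LU-7990 clio: revise readahead to support 16MB IO
git bisect bad d8467ab8a2ca15fbbd5be3429c9cf9ceb0fa78b8
# first bad commit: [d8467ab8a2ca15fbbd5be3429c9cf9ceb0fa78b8] LU-7990 clio: revise readahead to support 16MB IO
"""

# Verdict lines start with '#'. A list number such as '9.' is also accepted,
# since rendered copies of a log may show that in place of the '#'.
ENTRY = re.compile(
    r"^\s*(?:#|\d+\.)\s+(good|bad|first bad commit): \[([0-9a-f]{40})\] (.*)$"
)


def summarize_bisect(log_text):
    """Return (list of (verdict, sha, subject) marks, first bad commit or None)."""
    marks = []
    first_bad = None
    for line in log_text.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # skip 'git bisect ...' command lines
        verdict, sha, subject = m.groups()
        if verdict == "first bad commit":
            first_bad = (sha, subject)
        else:
            marks.append((verdict, sha, subject))
    return marks, first_bad


MARKS, FIRST_BAD = summarize_bisect(SAMPLE)
print("first bad commit:", FIRST_BAD)
```

Saving the log with `git bisect log > bisect.txt` and passing the file contents to the helper would give the same summary for the full run.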
Comment by James A Simmons [ 29/Jun/16 ]

Thanks for tracking this down. This is a serious bug that could impact more than just users of newer kernels. It points to the ptlrpc layer as the problem rather than LNet/ko2iblnd, which is good, since the upstream kernel client has the same LNet stack as the Intel/OpenSFS branch. Since it is easy to reproduce, can you collect debug logs? Run 'lctl set_param debug=all', reproduce the problem, and then run 'lctl dk > /tmp/dump.log'. Then post the log here. We will gladly work on this problem.
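
The capture procedure requested above could be scripted roughly as follows. This is a sketch only: it assumes `lctl` is on the PATH and that the script runs as root, and the helper names (`debug_capture_commands`, `capture_debug_log`) and the `reproduce` callback are invented for the example. Only the two `lctl` invocations come from the comment above.

```python
import subprocess


def debug_capture_commands():
    """The two lctl invocations named in the comment above, in order."""
    return [
        ["lctl", "set_param", "debug=all"],  # enable all debug flags
        ["lctl", "dk"],                      # dump the kernel debug buffer to stdout
    ]


def capture_debug_log(reproduce, out_path="/tmp/dump.log", run=subprocess.run):
    """Enable full debugging, trigger the problem, then save the debug log.

    `reproduce` is a caller-supplied function that triggers the failure
    (for example, writing a large file to the Lustre mount). `run` can be
    replaced with a stand-in for dry runs.
    """
    enable, dump = debug_capture_commands()
    run(enable, check=True)
    reproduce()
    result = run(dump, check=True, capture_output=True, text=True)
    with open(out_path, "w") as f:
        f.write(result.stdout)
    return out_path
```

Keeping the command list in its own function makes it easy to print the commands and run them by hand on a client where Python is not available.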

Comment by Gerrit Updater [ 05/Jul/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20221/
Subject: LU-8056 o2iblnd: ib_query_device removed in 4.5
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 31d6445718b896290198f5d127f86c174d499c6c

Comment by Gerrit Updater [ 05/Jul/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20619/
Subject: LU-8056 lloop: fix bio_for_each_segment_all for newer kernels
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 87c5e1cd7f45b336d40028af986bf09d2e4922b3

Comment by Alexey Shvetsov [ 06/Jul/16 ]

Dump log from lustre

Comment by Gerrit Updater [ 11/Jul/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20222/
Subject: LU-8056 socklnd: NETIF_F_ALL_CSUM renamed to NETIF_F_CSUM_MASK
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 11e4f77fa6c2b120864f19d27a303cdfe6f57d44

Comment by Gerrit Updater [ 11/Jul/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20223/
Subject: LU-8056 llite: use inode_lock to access i_mutex
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2c7da05ca58b4146fa47cfcbc86de51099cf452a

Comment by Gerrit Updater [ 11/Jul/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20224/
Subject: LU-8056 llite: inode_operations interface changed in 4.5
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e8e440b36aaf8e45649e72f7a92d2ebc1ae8d874

Comment by Gerrit Updater [ 20/Jul/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20225/
Subject: LU-8056 llite: POSIX_ACL_XATTR_{ACCESS,DEFAULT} removed in 4.5
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7af958a556ef2be0219ba279631b05a7b2a58cee

Comment by James A Simmons [ 20/Jul/16 ]

One last patch to push for the ChangeLog.

Comment by Gerrit Updater [ 20/Jul/16 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: http://review.whamcloud.com/21453
Subject: LU-8056 lprocfs: treat seq_printf as void function
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6e6b2f9ec8f4dfdc5ae85c0522a2fce65ae520c6

Comment by Gerrit Updater [ 20/Jul/16 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: http://review.whamcloud.com/21454
Subject: LU-8056 xattr: update server code for POSIX xattr rename
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3890149aa255340bcfa2c2a1cec3580116f90fa1

Comment by James A Simmons [ 20/Jul/16 ]

I gave the latest master a run using ZFS with a Linux 4.5.7 kernel and found a new regression. One of the changes in the 4.5 kernel is the removal of GFP_IOFS. In the past I submitted a patch to remove this flag from the server code, but I was told it was needed. Now, with newer kernels, the lack of GFP_IOFS needs to be addressed.

Comment by Gerrit Updater [ 27/Jul/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21453/
Subject: LU-8056 lprocfs: treat seq_printf as void function
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 008b5ffc75a22859ebe052d381594de7a51d95f5

Comment by Gerrit Updater [ 06/Aug/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21454/
Subject: LU-8056 xattr: update server code for POSIX xattr rename
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7509d5f6fbec7a38c51d696ffdabc0516b6baeff

Comment by Peter Jones [ 06/Aug/16 ]

So is there more work to come tracked under this ticket?

Comment by James A Simmons [ 06/Aug/16 ]

One more patch is needed.

Comment by Gerrit Updater [ 08/Aug/16 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: http://review.whamcloud.com/21781
Subject: LU-8056 mem: handle GFP_IOFS removal in newer kernels
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9b0cfaa1386423e6f538184c1910806a1a0ea93f

Comment by Gerrit Updater [ 22/Aug/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21781/
Subject: LU-8056 mem: handle GFP_IOFS removal in newer kernels
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 654586b8f498550056d4a5949768a70736f07677

Comment by Gerrit Updater [ 22/Aug/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21970/
Subject: LU-8056 build: announce linux kernel 4.5.7 support
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 198c2eda4dec4e19a71845d325c8af3acdd30f81

Comment by James A Simmons [ 22/Aug/16 ]

Support for Linux 4.5 kernels is complete.

Generated at Sat Feb 10 02:14:14 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.