[LU-7578] Push latest gnilnd changes Created: 17/Dec/15  Updated: 05/Feb/16  Resolved: 05/Feb/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.8.0

Type: Task
Priority: Minor
Reporter: Chris Horn
Assignee: Chris Horn
Resolution: Fixed
Votes: 0
Labels: None

Rank (Obsolete): 9223372036854775807

 Description   

There have been a few gnilnd changes since the last time we sync'd up. I'll be pushing up the latest commits.



 Comments   
Comment by Gerrit Updater [ 17/Dec/15 ]

Chris Horn (hornc@cray.com) uploaded a new patch: http://review.whamcloud.com/17663
Subject: LU-7578 gnilnd: Modify allocator flags to prevent waiting
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d172877a683e8f0a980265c96edccf8f214fd674

Comment by Gerrit Updater [ 17/Dec/15 ]

Chris Horn (hornc@cray.com) uploaded a new patch: http://review.whamcloud.com/17664
Subject: LU-7578 gnilnd: Add module parameter reg_fail_timeout
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a1909655ff47f5276ca789dad6e88b6bc167fd3f

Comment by Gerrit Updater [ 17/Dec/15 ]

Chris Horn (hornc@cray.com) uploaded a new patch: http://review.whamcloud.com/17665
Subject: LU-7578 gnilnd: Handle new return code in gni_mem_register()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 237a4857ae42ef2fb569664a1e5f24398ae53687

Comment by Gerrit Updater [ 17/Dec/15 ]

Chris Horn (hornc@cray.com) uploaded a new patch: http://review.whamcloud.com/17666
Subject: LU-7578 gnilnd: Return correct error on GNI_RC_ERROR_NOMEM
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b2d35706d02c0cf16ef404f865211e7fde14cfb1

Comment by Gerrit Updater [ 17/Dec/15 ]

Chris Horn (hornc@cray.com) uploaded a new patch: http://review.whamcloud.com/17667
Subject: LU-7578 gnilnd: Revert max_immediate setting
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a382968a06574400bd48e2e0beb848ad1ba81304

Comment by James A Simmons [ 21/Jan/16 ]

Chris, one of these patches is causing a regression in my testing. I'm seeing an increase in memory pressure that is causing jobs to fail.

Comment by Chris Horn [ 21/Jan/16 ]

James, I've passed along your comments to our gnilnd engineers and asked them to weigh in on this ticket.

Comment by Chuck Fossen [ 25/Jan/16 ]

James, are you saying that gnilnd is now using more memory or that allocations are failing when the node is under high memory pressure?
Also, I assume you are seeing this issue on compute nodes. Is that true?
I don't see how these changes would cause gnilnd to use more memory.
http://review.whamcloud.com/17663 changed the vmalloc allocation flags so that an allocation fails instead of waiting forever for memory.
We have seen heartbeat failures when a node needs to allocate memory to establish a connection while Lustre is trying to write to disk in order to free memory.

Comment by James A Simmons [ 01/Feb/16 ]

I just did another round of testing and didn't see any problems this time. Strange. Some unrelated change must have landed that fixed the problem the latest Gemini changes must have been exposing.

Comment by Gerrit Updater [ 02/Feb/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17667/
Subject: LU-7578 gnilnd: Revert max_immediate setting
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 928c5050f7d2a8a2cabb6eeb3993b29166fdaf1e

Comment by Gerrit Updater [ 04/Feb/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17663/
Subject: LU-7578 gnilnd: Modify allocator flags to prevent waiting
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4e7994f45811e66f50a5d174b1b5dfc20c65269b

Comment by Gerrit Updater [ 05/Feb/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17664/
Subject: LU-7578 gnilnd: Add module parameter reg_fail_timeout
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5b787cb7a375372c7a4f3c405d38137a7a867677

Comment by Gerrit Updater [ 05/Feb/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17665/
Subject: LU-7578 gnilnd: Handle new return code in gni_mem_register()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 37e5f21ee4db9cb3df063d5537511ec15c1196b3

Comment by Gerrit Updater [ 05/Feb/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17666/
Subject: LU-7578 gnilnd: Return correct error on GNI_RC_ERROR_NOMEM
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 919b8968d84d0d6ad57e2e6e5e1a8ccb02a1bd2c

Comment by James A Simmons [ 05/Feb/16 ]

All outstanding patches have landed.

Comment by Joseph Gmitter (Inactive) [ 05/Feb/16 ]

Patches have landed for 2.8.
