[LU-4147] LNET parameter recommendations Created: 25/Oct/13  Updated: 18/Nov/13  Resolved: 18/Nov/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.1
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Jason Hill (Inactive) Assignee: James Nunez (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: Titan-Atlas-Eos router calcs 9-26-2013.xlsx
Severity: 3
Rank (Obsolete): 11262

 Description   

Attached you'll find a spreadsheet of our thoughts on setting ko2iblnd parameters and kgnilnd parameters for our center. I apologize that I've dropped the ball on this; we're planning on implementing these settings (in the agreed upon column) on Tuesday 10/29. There are some points where we need some guidance on the recommended values. They are noted.

If there are any questions please do not hesitate to ask. Likely there will be more comments from ORNL folks as I publicize this jira ticket.



 Comments   
Comment by James Nunez (Inactive) [ 25/Oct/13 ]

Liang,

Would you please comment on the proposed lnet/ko2iblnd/kgnilnd values for ORNL's systems?

Thank you,
James

Comment by Liang Zhen (Inactive) [ 30/Oct/13 ]

Sorry, I just came back from vacation. It may be too late to add comments, but…

  • server
    • o2iblnd::fmr/pmr_pool_size and fmr_flush_trigger: these are unused unless "map_on_demand" is set or the node is running on some Chelsio InfiniBand hardware.
    • o2iblnd::ntx, I think 4096 is a good value in this case.
    • o2iblnd::keepalive and timeout: because timeout (the timeout value of a TX) is 100 in your settings, it probably makes sense to have keepalive >= timeout.
    • o2iblnd::peer_credits: is there any reason you want to increase it to 63? It's OK if it's an empirical value; otherwise I think 16 can saturate most links (I remember some users reporting that a peer_credits value of 16 can saturate FDR InfiniBand).
    • o2iblnd::peertimeout: this is only checked on routers.
  • RTR
    • o2iblnd::ntx: double the credits value should be fine; it wouldn't consume too much memory.
    • o2iblnd::peer_credits, same as comment for server
    • o2iblnd::peer_buffer_credits: setting this means LNet will ignore lnet::peer_buffer_credits, but lnet::peer_buffer_credits is also set in the proposal. Also, is the current value (128) an empirical value?
  • o2iblnd client
    • same comment as server
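
The points above can be turned into a quick sanity check on a proposed parameter set. The following is only an illustrative sketch: the function name check_lnd_params, the dict layout, and the sample keepalive and fmr_pool_size values are assumptions made for the example, not values from the spreadsheet. It also ignores the Chelsio case mentioned for the FMR/PMR settings.

```python
def check_lnd_params(o2ib, lnet=None, role="server"):
    """Return a list of warnings for a proposed set of module parameters.

    o2ib and lnet are plain dicts mapping parameter name -> integer value.
    role is "server", "client" or "router" (hypothetical labels).
    """
    lnet = lnet or {}
    warnings = []

    # FMR/PMR pool settings are unused unless map_on_demand is set
    # (the Chelsio exception is not modelled here).
    if not o2ib.get("map_on_demand", 0):
        for name in ("fmr_pool_size", "pmr_pool_size", "fmr_flush_trigger"):
            if name in o2ib:
                warnings.append(f"{name} is unused while map_on_demand is off")

    # Suggested relation between keepalive and the TX timeout.
    if "keepalive" in o2ib and "timeout" in o2ib:
        if o2ib["keepalive"] < o2ib["timeout"]:
            warnings.append("keepalive is smaller than timeout")

    # 16 peer credits is reported to saturate most links.
    if o2ib.get("peer_credits", 0) > 16:
        warnings.append("peer_credits above 16: confirm the value is empirical")

    if role == "router":
        # On routers, ntx of about twice credits is suggested.
        if "ntx" in o2ib and "credits" in o2ib:
            if o2ib["ntx"] < 2 * o2ib["credits"]:
                warnings.append("ntx is below twice credits")
        # The o2iblnd setting makes LNet ignore its own peer_buffer_credits.
        if "peer_buffer_credits" in o2ib and "peer_buffer_credits" in lnet:
            warnings.append("lnet peer_buffer_credits is ignored because o2iblnd sets it")

    return warnings


if __name__ == "__main__":
    # timeout=100 and peer_credits=63 come from the discussion above;
    # keepalive and fmr_pool_size are made-up sample values.
    proposal = {"timeout": 100, "keepalive": 64, "peer_credits": 63, "fmr_pool_size": 512}
    for warning in check_lnd_params(proposal):
        print(warning)
```

The function only reports; it does not decide. A value flagged here may still be the right choice if it was arrived at by benchmarking.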
Comment by James A Simmons [ 30/Oct/13 ]

When would you consider using map_on_demand?

Comment by Liang Zhen (Inactive) [ 30/Oct/13 ]

I would say: unless there is a performance issue when benchmarking with lnet_selftest bulk, or a memory allocation failure in o2iblnd when running without map_on_demand (both can happen on some HCAs), please just keep it turned off.

Comment by James Nunez (Inactive) [ 18/Nov/13 ]

Jason or James,

Are there any more LNET parameter questions/issues that we need to address or should we close this ticket?

Thanks,
James

Comment by Jason Hill (Inactive) [ 18/Nov/13 ]

Go ahead and close this. We've implemented the changes already. Some of the notes from Liang came too late to influence what we put in production, but that does not seem to have caused any harm.

Comment by James Nunez (Inactive) [ 18/Nov/13 ]

Request completed.
