Serguei,
Thanks for the feedback, and sorry for the slow response. To answer your question: orelic is running the TOSS 3 OS, based on RHEL 7, and we are using the default LNet credit settings with the exception of:
ko2iblnd credits=1024
ksocklnd credits=512
Additional LNet-related tunings on orelic are:
lustre_common.conf:options libcfs libcfs_panic_on_lbug=1
lustre_common.conf:options libcfs libcfs_debug=0x3060580
lustre_common.conf:options ptlrpc at_min=45
lustre_common.conf:options ptlrpc at_max=600
lustre_common.conf:options ksocklnd keepalive_count=100
lustre_common.conf:options ksocklnd keepalive_idle=30
lustre_common.conf:options lnet check_routers_before_use=1
lustre_common.conf:options lnet lnet_peer_discovery_disabled=1
lustre_common.lustre212.conf:options lnet lnet_retry_count=0
lustre_common.lustre212.conf:options lnet lnet_health_sensitivity=0
lustre_router.conf:options lnet forwarding="enabled"
lustre_router.conf:options lnet tiny_router_buffers=2048
lustre_router.conf:options lnet small_router_buffers=16384
lustre_router.conf:options lnet large_router_buffers=2048
Ruby and nearly every other system in the center run the TOSS 4 OS, based on RHEL 8, and they have extra tunings in addition to the two credit settings above. The routers on ruby set the following, which appears to take effect for both IB and OPA interfaces:
ko2iblnd-opa peer_credits=32 peer_credits_hiw=16 credits=1024 concurrent_sends=64 ntx=2048 map_on_demand=256 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
ko2iblnd credits=1024
lustre_router.conf:options ksocklnd credits=512
In case it's helpful, the other LNet-related settings on the Ruby routers are:
lustre_common.conf:options libcfs libcfs_panic_on_lbug=1
lustre_common.conf:options libcfs libcfs_debug=0x3060580
lustre_common.conf:options ptlrpc at_min=45
lustre_common.conf:options ptlrpc at_max=600
lustre_common.conf:options ksocklnd keepalive_count=100
lustre_common.conf:options ksocklnd keepalive_idle=30
lustre_common.conf:options lnet check_routers_before_use=1
lustre_common.conf:options lnet lnet_health_sensitivity=0
lustre_common.conf:options lnet lnet_peer_discovery_disabled=1
lustre_router.conf:options lnet forwarding="enabled"
lustre_router.conf:options lnet tiny_router_buffers=2048
lustre_router.conf:options lnet small_router_buffers=16384
lustre_router.conf:options lnet large_router_buffers=2048
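For a quick side-by-side view, here is a minimal Python sketch that parses the two lists above and reports what differs between orelic and the ruby routers. The names `parse_options` and `diff_settings` are made up for illustration, and treating `ko2iblnd-opa` as a separate module name from `ko2iblnd` is an assumption of the sketch, not a statement about how the module loader resolves it.

```python
ORELIC = """
ko2iblnd credits=1024
ksocklnd credits=512
lustre_common.conf:options libcfs libcfs_panic_on_lbug=1
lustre_common.conf:options libcfs libcfs_debug=0x3060580
lustre_common.conf:options ptlrpc at_min=45
lustre_common.conf:options ptlrpc at_max=600
lustre_common.conf:options ksocklnd keepalive_count=100
lustre_common.conf:options ksocklnd keepalive_idle=30
lustre_common.conf:options lnet check_routers_before_use=1
lustre_common.conf:options lnet lnet_peer_discovery_disabled=1
lustre_common.lustre212.conf:options lnet lnet_retry_count=0
lustre_common.lustre212.conf:options lnet lnet_health_sensitivity=0
lustre_router.conf:options lnet forwarding="enabled"
lustre_router.conf:options lnet tiny_router_buffers=2048
lustre_router.conf:options lnet small_router_buffers=16384
lustre_router.conf:options lnet large_router_buffers=2048
""".splitlines()

RUBY = """
ko2iblnd-opa peer_credits=32 peer_credits_hiw=16 credits=1024 concurrent_sends=64 ntx=2048 map_on_demand=256 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
ko2iblnd credits=1024
lustre_router.conf:options ksocklnd credits=512
lustre_common.conf:options libcfs libcfs_panic_on_lbug=1
lustre_common.conf:options libcfs libcfs_debug=0x3060580
lustre_common.conf:options ptlrpc at_min=45
lustre_common.conf:options ptlrpc at_max=600
lustre_common.conf:options ksocklnd keepalive_count=100
lustre_common.conf:options ksocklnd keepalive_idle=30
lustre_common.conf:options lnet check_routers_before_use=1
lustre_common.conf:options lnet lnet_health_sensitivity=0
lustre_common.conf:options lnet lnet_peer_discovery_disabled=1
lustre_router.conf:options lnet forwarding="enabled"
lustre_router.conf:options lnet tiny_router_buffers=2048
lustre_router.conf:options lnet small_router_buffers=16384
lustre_router.conf:options lnet large_router_buffers=2048
""".splitlines()


def parse_options(lines):
    """Return {(module, parameter): value} for each 'module key=value ...' line."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Drop an optional "file.conf:" prefix together with the "options" keyword.
        if "options " in line:
            line = line.split("options ", 1)[1]
        module, *params = line.split()
        for param in params:
            key, _, value = param.partition("=")
            settings[(module, key)] = value.strip('"')
    return settings


def diff_settings(a, b):
    """Return (only in a, only in b, present in both with different values)."""
    only_a = {k: a[k] for k in a.keys() - b.keys()}
    only_b = {k: b[k] for k in b.keys() - a.keys()}
    changed = {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}
    return only_a, only_b, changed


only_orelic, only_ruby, changed = diff_settings(
    parse_options(ORELIC), parse_options(RUBY)
)

for label, group in (("only on orelic", only_orelic),
                     ("only on ruby", only_ruby),
                     ("different values", changed)):
    print(label)
    for (module, key), value in sorted(group.items()):
        print(f"  {module} {key} = {value}")
```

Run against the lists above, the only differences it reports are `lnet_retry_count=0`, which is set only on orelic, and the `ko2iblnd-opa` parameters, which are set only on ruby. No shared parameter has a different value between the two.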
Cameron,
Some clarifications:
Thanks,
Serguei