[LU-12288] Preferred flag of route selection policy does not work Created: 13/May/19  Updated: 12/Nov/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.1
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Tatsushi Takamura Assignee: Tatsushi Takamura
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Route selection is determined by four criteria (Preferred, Health Value, Credits and Seq Number), and Preferred has top priority.

        /*
         * Look at the peer NIs for the destination peer that connect
         * to the chosen net. If a peer_ni is preferred when using the
         * best_ni to communicate, we use that one. If there is no
         * preferred peer_ni, or there are multiple preferred peer_ni,
         * the available transmit credits are used. If the transmit
         * credits are equal, we round-robin over the peer_ni.
         */

However, if there are two or more peers and the first peer is Preferred, there are cases where the Preferred peer (the first one) is not selected, i.e. the Preferred flag is ignored.
Also, there is no need to set the Preferred flag during recovery.

lnet_select_peer_ni()

    /* pick the healthiest peer ni */
    if (lpni_healthv < best_lpni_healthv) {
        continue;
    } else if (lpni_healthv > best_lpni_healthv) {
        best_lpni_healthv = lpni_healthv; // peer1 (suppose ni_is_pref) is selected here, but the preferred flag is not set
    /* if this is a preferred peer use it */
    } else if (!preferred && ni_is_pref) {
            preferred = true;            
    } else if (preferred && !ni_is_pref) { 
            continue;
    } else if (lpni->lpni_txcredits < best_lpni_credits) { // peer2 is judged by the other metrics
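
The fragment above can be reproduced outside the kernel with a small model. The following is only a sketch, not the Lustre source: struct model_lpni and the model_* functions are hypothetical names, the initial values (0 for the best health value, INT_MIN for the best credits) are assumptions, and the equal-credits tie-break on the sequence number is assumed from the round-robin comment quoted earlier. It shows the case described here: peer1 is preferred, peer2 is not, both have the same health value, and peer2 has more credits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the peer NI fields the branch chain reads. */
struct model_lpni {
	bool ni_is_pref;	/* preferred for the chosen best_ni */
	int  healthv;		/* health value, higher is healthier */
	int  txcredits;		/* available transmit credits */
	int  seq;		/* round-robin sequence number */
};

/* Returns the index of the selected entry, following the quoted branch order. */
static int model_select_current(const struct model_lpni *v, size_t n)
{
	int best = -1;
	int best_healthv = 0;		/* assumed initial value */
	int best_credits = INT_MIN;	/* assumed initial value */
	bool preferred = false;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		if (v[i].healthv < best_healthv) {
			continue;
		} else if (v[i].healthv > best_healthv) {
			/* New health maximum: taken, but 'preferred' stays unchanged. */
			best_healthv = v[i].healthv;
		} else if (!preferred && v[i].ni_is_pref) {
			preferred = true;
		} else if (preferred && !v[i].ni_is_pref) {
			continue;
		} else if (v[i].txcredits < best_credits) {
			continue;
		} else if (v[i].txcredits == best_credits) {
			/* Assumed tie-break: keep the lower sequence number. */
			if (best >= 0 && v[best].seq <= v[i].seq)
				continue;
		}
		best = (int)i;
		best_credits = v[i].txcredits;
	}
	return best;
}

/* peer1 (index 0) is preferred, peer2 (index 1) is not; health is equal. */
static int model_demo_current(void)
{
	const struct model_lpni peers[] = {
		{ .ni_is_pref = true,  .healthv = 1000, .txcredits = 4, .seq = 0 },
		{ .ni_is_pref = false, .healthv = 1000, .txcredits = 8, .seq = 0 },
	};

	return model_select_current(peers, 2);
}
```

In this model, model_demo_current() returns 1: peer1 enters through the health branch without setting preferred, so peer2 is compared on credits alone and wins.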

We fixed this so that route selection evaluates the criteria in the following order:

  1. Preferred
  2. Health Value
  3. Credits
  4. Seq Number
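
The ordering above can be written as a single lexicographic comparison. The sketch below is an illustration under assumptions, not the content of any uploaded patch: struct model_lpni, model_better() and model_select_ordered() are hypothetical names, and it assumes that higher health, more credits and a lower sequence number are better.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the peer NI fields used by the four criteria. */
struct model_lpni {
	bool ni_is_pref;	/* 1. Preferred */
	int  healthv;		/* 2. Health Value */
	int  txcredits;		/* 3. Credits */
	int  seq;		/* 4. Seq Number */
};

/* True when candidate a should replace the current best b. */
static bool model_better(const struct model_lpni *a, const struct model_lpni *b)
{
	if (a->ni_is_pref != b->ni_is_pref)
		return a->ni_is_pref;
	if (a->healthv != b->healthv)
		return a->healthv > b->healthv;
	if (a->txcredits != b->txcredits)
		return a->txcredits > b->txcredits;
	return a->seq < b->seq;
}

/* Returns the index of the best entry under the ordering above. */
static int model_select_ordered(const struct model_lpni *v, size_t n)
{
	int best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++)
		if (model_better(&v[i], &v[best]))
			best = (int)i;
	return best;
}

/* Preferred peer first, non-preferred peer with more credits second. */
static int model_demo_ordered(void)
{
	const struct model_lpni peers[] = {
		{ .ni_is_pref = true,  .healthv = 1000, .txcredits = 4, .seq = 0 },
		{ .ni_is_pref = false, .healthv = 1000, .txcredits = 8, .seq = 0 },
	};

	return model_select_ordered(peers, 2);
}

/* Two preferred peers with equal health: credits decide. */
static int model_demo_ordered_credits(void)
{
	const struct model_lpni peers[] = {
		{ .ni_is_pref = true, .healthv = 1000, .txcredits = 4, .seq = 0 },
		{ .ni_is_pref = true, .healthv = 1000, .txcredits = 8, .seq = 0 },
	};

	return model_select_ordered(peers, 2);
}
```

With this comparison the preferred peer wins even when a non-preferred peer has more credits, and the later criteria only break ties among peers that are equal on the earlier ones. Note that under this ordering a preferred peer also wins over a healthier non-preferred one.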


 Comments   
Comment by Amir Shehata (Inactive) [ 16/May/19 ]

The intended design is to always have health take precedence. In this way the healthiest interface is always used. Would there be a scenario where we should use the preferred interface, even though it's not healthy, while another healthier interface can be used?

Comment by Tatsushi Takamura [ 30/Aug/19 ]

Amir Shehata,

Sorry for the late reply.

Suppose there are 2 preferred routes as follows (ni_is_pref is 1 for both, and both have the same healthv):

 

00000400:00000200:13.0:1539748979.363069:0:14461:0:(lib-move.c:1755:lnet_select_peer_ni()) 192.168.128.202@o2ib[ffff880bf773a400]->192.168.128.201@o2ib[ffff88060a785c00] ni_is_pref = 1, healthv = 1000
00000400:00000200:13.0:1539748979.363072:0:14461:0:(lib-move.c:1755:lnet_select_peer_ni()) 192.168.128.202@o2ib[ffff880bf773a400]->192.168.130.201@o2ib[ffff880c1fa97e00] ni_is_pref = 1, healthv = 1000

 

 

/* pick the healthiest peer ni */
if (lpni_healthv < best_lpni_healthv) {
        continue;
} else if (lpni_healthv > best_lpni_healthv) {
        best_lpni_healthv = lpni_healthv;
        // the first route is selected temporarily, but the preferred flag is not set to true
/* if this is a preferred peer use it */
} else if (!preferred && ni_is_pref) {
        preferred = true;
        // the preferred flag is set to true and the second route is selected
        // So, the first route is never selected.
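
The two-route case can be traced with a small model that records which branch each route takes. This is a sketch only: struct model_route, enum model_branch and the model_* functions are hypothetical, the initial values are assumptions, and the round-robin tie-break on equal credits is left out (a later route with equal credits simply replaces the earlier one here).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the branch a route takes in the chain. */
enum model_branch { BR_SKIPPED, BR_HEALTH, BR_PREFERRED, BR_CREDITS };

struct model_route {
	bool ni_is_pref;
	int  healthv;
	int  txcredits;
};

/* Walks the chain, stores the branch per route, returns the selected index. */
static int model_trace(const struct model_route *r, size_t n,
		       enum model_branch *taken)
{
	int best = -1;
	int best_healthv = 0;		/* assumed initial value */
	int best_credits = INT_MIN;	/* assumed initial value */
	bool preferred = false;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		taken[i] = BR_SKIPPED;
		if (r[i].healthv < best_healthv) {
			continue;
		} else if (r[i].healthv > best_healthv) {
			best_healthv = r[i].healthv;
			taken[i] = BR_HEALTH;
		} else if (!preferred && r[i].ni_is_pref) {
			preferred = true;
			taken[i] = BR_PREFERRED;
		} else if (preferred && !r[i].ni_is_pref) {
			continue;
		} else if (r[i].txcredits < best_credits) {
			continue;
		} else {
			taken[i] = BR_CREDITS;
		}
		best = (int)i;
		best_credits = r[i].txcredits;
	}
	return best;
}

/* Both routes preferred, same healthv; the first one has more credits. */
static const struct model_route model_routes[2] = {
	{ .ni_is_pref = true, .healthv = 1000, .txcredits = 8 },
	{ .ni_is_pref = true, .healthv = 1000, .txcredits = 4 },
};

static int model_demo_selected(void)
{
	enum model_branch taken[2];

	return model_trace(model_routes, 2, taken);
}

static enum model_branch model_demo_branch(size_t i)
{
	enum model_branch taken[2];

	assert(i < 2);
	model_trace(model_routes, 2, taken);
	return taken[i];
}
```

In this model the first route takes the health branch and the second takes the preferred branch, so model_demo_selected() returns 1 even though the first route has more credits.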

 

I'll post the patch soon. Could you review it?

Comment by Gerrit Updater [ 30/Aug/19 ]

Tatsushi Takamura (takamr.tatsushi@jp.fujitsu.com) uploaded a new patch: https://review.whamcloud.com/36002
Subject: LU-12288 lnet: preferred flag policy does not work
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 69a4f558b37aa697d69e5aa2c8d030a09f52cdc0

Comment by Gerrit Updater [ 12/Nov/20 ]

Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/40635
Subject: LU-12288 lnet: Properly account for preferred peer NI
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d873d2ed60b0fa7f66c3d0703454d78f2de07f4c

Generated at Sat Feb 10 02:51:16 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.