[LU-12453] ko2iblnd: problem handling link failures on bonded interfaces Created: 19/Jun/19 Updated: 03/Oct/19 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Lukasz Flis | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
RDMA over Ethernet:
|
||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
We have encountered a problem when running RoCEv2 over bonded interfaces: a link failure on the primary interface of the bond is not handled. In such a case, the only solutions are to re-enable/fix the primary interface or to restart LNet by reloading the kernel modules. The problem has been seen on ES7990 as well as on vanilla Lustre 2.10.*.

Normally, when a bond is created on top of two ports belonging to the same HCA, the mlx driver handles a link failure by moving the QPs. In the case described above, the link failure must be handled by the ko2iblnd driver instead.

The following message is logged when the problem occurs:

e0-oss03 kernel: LNetError: 4598:0:(o2iblnd.c:831:kiblnd_create_conn()) cmid HCA(mlx5_0), kib_dev(bond0.881) need failover
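The log message appears to come from a sanity check in kiblnd_create_conn(): a new connection is refused when the HCA that the RDMA CM ID resolved to differs from the HCA the LNet device (here bond0.881) was set up on, and the device is flagged as needing failover. Below is a minimal userspace sketch of that logic, assuming this interpretation; it is not the actual ko2iblnd code, and struct model_dev, model_create_conn() and model_failover() are hypothetical names. It illustrates why, if the failover step never completes, every new connection on the bonded interface keeps failing until the original link comes back or the modules are reloaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of an LNet device on top of a bond.
 * This is NOT the real ko2iblnd device structure. */
struct model_dev {
    char ifname[32];       /* netdev name, e.g. "bond0.881" */
    char hca[32];          /* HCA the device was set up on, e.g. "mlx5_0" */
    bool failover_pending; /* set when an HCA mismatch is detected */
};

/* Hypothetical connection check: succeeds (returns 0) only if the CM ID
 * resolved to the same HCA the device is bound to. On a mismatch (e.g. the
 * bond's active slave moved to a port on another HCA) it flags the device
 * for failover, logs a message shaped like the one above, and returns -1. */
static int model_create_conn(struct model_dev *dev, const char *cmid_hca)
{
    assert(dev != NULL && cmid_hca != NULL);

    if (strcmp(dev->hca, cmid_hca) != 0) {
        dev->failover_pending = true;
        fprintf(stderr, "cmid HCA(%s), kib_dev(%s) need failover\n",
                cmid_hca, dev->ifname);
        return -1;
    }
    return 0;
}

/* Hypothetical failover step: rebind the device to the HCA now used by the
 * bond. Until this runs, model_create_conn() keeps failing for that HCA. */
static void model_failover(struct model_dev *dev, const char *new_hca)
{
    assert(dev != NULL && new_hca != NULL);

    if (!dev->failover_pending)
        return;
    snprintf(dev->hca, sizeof(dev->hca), "%s", new_hca);
    dev->failover_pending = false;
}
```

In this model, connections keep failing for as long as failover_pending stays set and model_failover() is not called, which matches the reported behaviour of having to restore the primary link or reload the modules.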
|