[LU-12287] Unable to detect device faults from IB event queue Created: 13/May/19  Updated: 13/Oct/22  Resolved: 08/Jul/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.1
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Major
Reporter: Tatsushi Takamura Assignee: Tatsushi Takamura
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
Related
Epic/Theme: lnet
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

LU-9120 added handling of device failures, but the following IB events cannot be handled by the QP event handler:

  • IB_EVENT_DEVICE_FATAL
  • IB_EVENT_PORT_ERR
  • IB_EVENT_PORT_ACTIVE
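The reason the QP event handler never sees these events is that the RDMA stack routes asynchronous events by scope. QP-level events go to the QP's own handler. Device and port events such as the three listed above go only to handlers registered on the device, for example with the kernel's `ib_register_event_handler()`. The sketch below illustrates that routing in user space. The enum is a local stand-in that reuses a subset of the kernel's `enum ib_event_type` names with values that are not the kernel's. `enum event_scope` and `scope_of()` are hypothetical names for illustration only.

```c
/* Local stand-in for a subset of the kernel's enum ib_event_type
 * (include/rdma/ib_verbs.h). Names mirror the kernel; the numeric
 * values here are NOT the kernel's and are for illustration only. */
enum ib_event_type {
	IB_EVENT_QP_FATAL,
	IB_EVENT_QP_ACCESS_ERR,
	IB_EVENT_COMM_EST,
	IB_EVENT_CQ_ERR,
	IB_EVENT_DEVICE_FATAL,
	IB_EVENT_PORT_ERR,
	IB_EVENT_PORT_ACTIVE,
};

/* Hypothetical classification of which kind of handler receives an event. */
enum event_scope {
	SCOPE_QP,     /* delivered to the QP's event handler */
	SCOPE_CQ,     /* delivered to the CQ's event handler */
	SCOPE_DEVICE, /* delivered only to device-level event handlers */
};

/* Map an asynchronous event to the handler scope that will see it. */
static enum event_scope scope_of(enum ib_event_type ev)
{
	switch (ev) {
	case IB_EVENT_QP_FATAL:
	case IB_EVENT_QP_ACCESS_ERR:
	case IB_EVENT_COMM_EST:
		return SCOPE_QP;
	case IB_EVENT_CQ_ERR:
		return SCOPE_CQ;
	case IB_EVENT_DEVICE_FATAL:
	case IB_EVENT_PORT_ERR:
	case IB_EVENT_PORT_ACTIVE:
		return SCOPE_DEVICE;
	}
	/* Out-of-range values are treated as device-scoped in this sketch. */
	return SCOPE_DEVICE;
}
```

A handler attached to a QP only ever receives `SCOPE_QP` events. HCA failures and link state changes therefore need a separate device-level handler.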

 

We are implementing an IB event handler that handles device errors such as hardware errors and link-down events. With this IB event handler, we intend to detect these fatal device errors.
We will submit a patch within a few weeks.
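The ticket does not show the patch's code. The following is a minimal user-space sketch of the idea, assuming a device with two ports. A device-level handler tracks per-port link state from `IB_EVENT_PORT_ERR` / `IB_EVENT_PORT_ACTIVE` and a sticky fatal flag from `IB_EVENT_DEVICE_FATAL`, then reports whether the device can still carry traffic. All `sketch_*` names and the port count are hypothetical and do not correspond to Lustre's o2iblnd structures. The enum reuses kernel event names with non-kernel values. In the kernel, such a callback would be attached to the `ib_device` through `ib_register_event_handler()`, and the actual patch may differ.

```c
#include <stdbool.h>

/* Local stand-ins mirroring three kernel enum ib_event_type names;
 * the numeric values are NOT the kernel's. */
enum ib_event_type {
	IB_EVENT_DEVICE_FATAL,
	IB_EVENT_PORT_ERR,
	IB_EVENT_PORT_ACTIVE,
};

#define SKETCH_MAX_PORTS 2 /* hypothetical port count */

/* Hypothetical per-device state (not Lustre's kib_dev). */
struct sketch_dev {
	bool fatal;                         /* sticky: never cleared here */
	bool port_up[SKETCH_MAX_PORTS + 1]; /* IB port numbers start at 1 */
};

/* Hypothetical event callback; ev and port_num play the roles of the
 * event type and element.port_num carried by a struct ib_event. */
static void sketch_handle_event(struct sketch_dev *dev,
				enum ib_event_type ev, unsigned int port_num)
{
	switch (ev) {
	case IB_EVENT_DEVICE_FATAL:
		/* The whole HCA failed: mark it fatal and every port down. */
		dev->fatal = true;
		for (unsigned int p = 1; p <= SKETCH_MAX_PORTS; p++)
			dev->port_up[p] = false;
		break;
	case IB_EVENT_PORT_ERR:
	case IB_EVENT_PORT_ACTIVE:
		if (port_num < 1 || port_num > SKETCH_MAX_PORTS)
			return; /* ignore ports this sketch does not track */
		/* A port on a fatal device stays down even if reported active. */
		dev->port_up[port_num] = !dev->fatal &&
					 ev == IB_EVENT_PORT_ACTIVE;
		break;
	}
}

/* Usable only if the device is not fatal and at least one port is up. */
static bool sketch_dev_usable(const struct sketch_dev *dev)
{
	if (dev->fatal)
		return false;
	for (unsigned int p = 1; p <= SKETCH_MAX_PORTS; p++)
		if (dev->port_up[p])
			return true;
	return false;
}
```

In this sketch the fatal flag is deliberately sticky. After `IB_EVENT_DEVICE_FATAL` the device generally has to be reset or removed, so a later port-active report should not make it look healthy again.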



 Comments   
Comment by Gerrit Updater [ 03/Jun/19 ]

Tatsushi Takamura (takamr.tatsushi@jp.fujitsu.com) uploaded a new patch: https://review.whamcloud.com/35036
Subject: LU-12287 lnet: handling device failure by IB event handler
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5b4011209ab1c4d2f664323f17ef986046957079

Comment by Gerrit Updater [ 03/Jun/19 ]

Tatsushi Takamura (takamr.tatsushi@jp.fujitsu.com) uploaded a new patch: https://review.whamcloud.com/35037
Subject: LU-12287 lnet: handling device failure by IB event handler
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 237762159a58a5722b0dbfc0542d93585807a44f

Comment by Gerrit Updater [ 08/Feb/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35037/
Subject: LU-12287 lnet: handling device failure by IB event handler
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c6e4c21c4f8b04abc53c1010a697eb3ada4fb315

Comment by Andreas Dilger [ 08/Jul/20 ]

Patch included in 2.13.52.

Comment by Gerrit Updater [ 13/Oct/22 ]

"Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48843
Subject: LU-12287 lnet: handling device failure by IB event handler
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: b5466c56f1bf49c4b3b3727584f8092acfb1a8ba
