[LU-1742] Fix 'Timed out tx' error message Created: 13/Aug/12 Updated: 29/Oct/20 Resolved: 29/Oct/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Trivial |
| Reporter: | Brian Behlendorf | Assignee: | Cyril Bordage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | easy, llnl |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9757 |
| Description |
|
Misleading error message from kiblnd_check_txs_locked(). The value reported in the error message is the number of seconds by which we exceeded the deadline. What I (and everyone else here) would have expected before reading the source is the total time the tx was outstanding before the RDMA timed out.

LNetError: 3073:0:(o2iblnd_cb.c:2988:kiblnd_check_txs_locked()) Timed out tx: active_txs, 10 seconds |
| Comments |
| Comment by Brian Behlendorf [ 13/Aug/12 ] |
| Comment by Johann Lombardi (Inactive) [ 13/Aug/12 ] |
|
Hi Brian, did you really intend to file this bug as a severity 1? |
| Comment by Isaac Huang (Inactive) [ 13/Aug/12 ] |
|
Another problem with this error message is that it doesn't tell us how long the tx has actually been on the wire. For example, the error message above tells us that a tx expired 60 seconds (10 + the default ko2iblnd timeout) after it was queued, BUT:
It'd be very useful to be able to distinguish the two cases. |
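
A minimal sketch in C of the two values discussed in this ticket: the seconds past the deadline (what the message reports) and the seconds since the tx was queued (what readers expected). This is illustrative only and is not the o2iblnd code or the landed patch. The struct `tx_times`, the helper names, and the message wording are hypothetical, and the 50-second timeout is an assumption inferred from "60 = 10 + default ko2iblnd timeout" above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a queued transmit; NOT the real o2iblnd tx type. */
struct tx_times {
    long queued_at; /* seconds: when the tx was queued */
    long deadline;  /* seconds: queued_at + timeout */
};

/* Seconds past the deadline: the value the current message reports. */
static long tx_overdue(const struct tx_times *tx, long now)
{
    return now - tx->deadline;
}

/* Seconds since the tx was queued: the value readers expected to see. */
static long tx_outstanding(const struct tx_times *tx, long now)
{
    return now - tx->queued_at;
}

/* Format a message that reports both values (wording is an assumption). */
static int format_timeout_msg(char *buf, size_t len,
                              const struct tx_times *tx, long now)
{
    /* A deadline can never precede the time the tx was queued. */
    assert(tx->deadline >= tx->queued_at);
    return snprintf(buf, len,
                    "Timed out tx: %ld seconds past deadline, "
                    "%ld seconds since queued",
                    tx_overdue(tx, now), tx_outstanding(tx, now));
}
```

With `queued_at = 1000`, `deadline = 1050` (assumed 50-second timeout) and `now = 1060`, `tx_overdue()` gives 10 and `tx_outstanding()` gives 60, matching the "10 seconds" in the log line and the 60 seconds in the comment above. The sketch does not cover how long the tx was on the wire, which would need a separate timestamp taken when the tx is posted.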
| Comment by Peter Jones [ 13/Aug/12 ] |
|
Brian B, I have dropped the severity because I assume that LLNL is not down as a result of this issue. Please speak up if I am mistaken. Isaac, can you please take care of this one? |
| Comment by Brian Behlendorf [ 13/Aug/12 ] |
|
Sorry, this was accidentally filed as high priority. The fix to update the error message is of course not critical. However, ORI-735 is a big deal for us since it's currently preventing us from running IOR on Sequoia. So we absolutely need to get to the bottom of why that is happening; we're just starting to investigate in the context of ORI-735. I'll probably update the patch based on Isaac's suggestion so we can get some more visibility into what's actually going wrong. |
| Comment by Gerrit Updater [ 26/Sep/18 ] |
|
Sonia Sharma (sharmaso@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33235 |
| Comment by Gerrit Updater [ 10/Jun/20 ] |
|
Patch has been reverted due to |
| Comment by Peter Jones [ 17/Jun/20 ] |
|
Is this still a live issue for LLNL? |
| Comment by Gerrit Updater [ 29/Oct/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/3622/ |
| Comment by Peter Jones [ 29/Oct/20 ] |
|
Landed for 2.14 |