[LU-7434] lost bulk leads to a hang Created: 17/Nov/15 Updated: 04/Mar/20 Resolved: 17/Jun/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Vitaly Fertman | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnlfixready, patch | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
When request_out_callback() and reply_in_callback() run in reverse order, the RPC is put into the UNREGISTERING state, where it waits for both the RPC MD and the bulk MD to be unlinked, although only the RPC MD unlink has been called so far. If the bulk is lost, the bulk MD is never unlinked, and expired_set does not check requests in the UNREGISTERING state either, so the request hangs. The same happens for a write if the server returns an error.
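The following Python sketch is a hypothetical, heavily simplified model of the hang described above. `Phase`, `ModelRequest`, `expired_check`, `run`, and the other names are illustrative assumptions and do not correspond to the real ptlrpc code. It shows why a request that enters UNREGISTERING with its bulk MD still linked never completes when the expiry scan skips that state, and how an expiry path that does unlink the bulk MD lets it finish.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Phase(Enum):
    """Hypothetical, simplified request phases (not the real ptlrpc enum)."""
    RPC = auto()
    UNREGISTERING = auto()
    COMPLETE = auto()


@dataclass
class ModelRequest:
    phase: Phase = Phase.RPC
    reply_md_linked: bool = True   # RPC (request/reply) MD still attached
    bulk_md_linked: bool = True    # bulk MD still attached
    deadline_passed: bool = False


def reply_in_before_request_out(req: ModelRequest) -> None:
    # Reversed callback order: the request moves to UNREGISTERING and waits
    # for both MDs, but only the RPC MD unlink has been issued.
    req.phase = Phase.UNREGISTERING
    req.reply_md_linked = False


def expired_check(req: ModelRequest, check_unregistering: bool) -> None:
    # check_unregistering=False mimics the reported bug: the expiry scan
    # ignores requests that are already UNREGISTERING.
    if not req.deadline_passed or req.phase is not Phase.UNREGISTERING:
        return
    if check_unregistering:
        req.bulk_md_linked = False  # hypothetical fix: unlink the stuck bulk MD


def try_finish(req: ModelRequest) -> None:
    # The request can only complete once both MDs are unlinked.
    if (req.phase is Phase.UNREGISTERING
            and not req.reply_md_linked and not req.bulk_md_linked):
        req.phase = Phase.COMPLETE


def run(check_unregistering: bool, max_ticks: int = 100,
        deadline_tick: int = 3) -> bool:
    """Return True if the request completes within max_ticks, False if it hangs."""
    req = ModelRequest()
    reply_in_before_request_out(req)
    # The bulk transfer is lost, so no network event ever unlinks the bulk MD.
    for tick in range(max_ticks):
        if tick == deadline_tick:
            req.deadline_passed = True
        expired_check(req, check_unregistering)
        try_finish(req)
        if req.phase is Phase.COMPLETE:
            return True
    return False


print("expiry skips UNREGISTERING:", "completes" if run(False) else "hangs")
print("expiry handles UNREGISTERING:", "completes" if run(True) else "hangs")
```

In this model the failed-write case behaves the same way: what matters is only that the request reaches UNREGISTERING while the bulk MD is still linked and nothing else will unlink it.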
| Comments |
| Comment by Gerrit Updater [ 17/Nov/15 ] |
|
Vitaly Fertman (vitaly.fertman@seagate.com) uploaded a new patch: http://review.whamcloud.com/17221 |
| Comment by Gerrit Updater [ 15/Mar/16 ] |
|
Vitaly Fertman (vitaly.fertman@seagate.com) uploaded a new patch: http://review.whamcloud.com/18934 |
| Comment by Gerrit Updater [ 21/Apr/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17221/ |
| Comment by Bob Glossman (Inactive) [ 22/Apr/16 ] |
|
another on master: |
| Comment by Bob Glossman (Inactive) [ 22/Apr/16 ] |
|
another on master: this test run included the fix from http://review.whamcloud.com/17221, so the problem is still happening |
| Comment by Bob Glossman (Inactive) [ 23/Apr/16 ] |
|
another on master: |
| Comment by Gerrit Updater [ 26/Apr/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/19778 |
| Comment by Gerrit Updater [ 26/Apr/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19778/ |
| Comment by Andreas Dilger [ 26/Apr/16 ] |
|
When resubmitting the http://review.whamcloud.com/17221 patch, please include an additional testing request for the failing test in the patch commit message, to ensure that it is not still failing intermittently. The test failed in about 1/4 of recent test runs, so if the patch can pass 8 recovery-small test runs in a row it should be good:
Test-Parameters: testlist=recovery-small,recovery-small,recovery-small,recovery-small,recovery-small,recovery-small |
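The reasoning behind the "8 runs in a row" threshold can be checked with a quick calculation, assuming each recovery-small run fails independently with probability about 1/4 while the bug is still present (the independence assumption is ours, not stated in the comment). The chance of n consecutive passes is then (3/4)^n. The sketch below computes it exactly for 8 runs, and for 6 runs since the Test-Parameters line as captured lists six entries.

```python
from fractions import Fraction


def chance_of_passing_streak(failure_rate: Fraction, runs: int) -> Fraction:
    """Probability that `runs` independent test runs all pass when each
    fails with probability `failure_rate`."""
    return (1 - failure_rate) ** runs


for n in (6, 8):
    p = chance_of_passing_streak(Fraction(1, 4), n)
    print(f"{n} clean runs with the bug still present: {p} ~= {float(p):.3f}")
```

Under these assumptions, 8 clean runs would happen by luck only about 10% of the time (6561/65536), versus roughly 18% (729/4096) for 6 runs.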
| Comment by Gerrit Updater [ 28/Apr/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18934/ |
| Comment by Vitaly Fertman [ 03/May/16 ] |
| Comment by Gerrit Updater [ 03/May/16 ] |
|
Chris Horn (hornc@cray.com) uploaded a new patch: http://review.whamcloud.com/19953 |
| Comment by Cory Spitz [ 13/Jun/16 ] |
|
Can we land this in time for 2.9.0? Is there something more to do to pave the way? |
| Comment by Gerrit Updater [ 16/Jun/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19953/ |