[LU-87] (filter.c:151:filter_finish_transno()) LBUG Created: 18/Feb/11 Updated: 28/Jun/11 Resolved: 21/Feb/11 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.6 |
| Fix Version/s: | Lustre 1.8.6 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Shuichi Ihara (Inactive) | Assignee: | Johann Lombardi (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Bugzilla ID: | 20,394 |
| Rank (Obsolete): | 10332 |
| Description |
|
We hit an LBUG in filter_finish_transno() on an OSS. The OSS came under heavy load and eventually went down. After we rebooted it and recovery started, we hit the same LBUG again. To get back into production, we started the OST with abort_recov, and it is now working well. I found the same bug on Bugzilla (bug 20394, which DDN hit and filed before); it should be fixed in 1.8.6. Our branch, lustre-1.8.4.ddn2, is based on 1.8.4, but this patch has already been applied to it, so I don't know why we hit the same LBUG in filter_finish_transno(). Please investigate this. |
| Comments |
| Comment by Mikhail Pershin [ 18/Feb/11 ] |
|
Please pay attention to bug 24420. It was also about that LBUG. The patch there keeps the assertion only for a wrong transno assignment, and instead tries to evict a client that causes a wrong transno order during recovery. This is just a workaround, not a complete solution, because it is still unclear, and looks wrong, that transaction ordering can be broken during OSS recovery. The patch does at least eliminate the assertion on wire data. |
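The behaviour described in the comment above can be illustrated with a minimal, self-contained sketch. It is not the code from bug 24420 or from filter.c: `struct client_state`, `record_transno()` and the `in_recovery` flag are hypothetical names invented for this example, and the sketch only assumes that a replayed transno arrives from the client (wire data) while a normal transno is assigned by the server.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-client state; not the real Lustre structure. */
struct client_state {
    uint64_t last_transno; /* highest transno already recorded for this client */
    bool     evicted;      /* set when the client is dropped during recovery */
};

enum replay_result { REPLAY_OK, REPLAY_EVICTED };

/*
 * Record a transno for a client.
 * - Increasing transno: accept and remember it.
 * - Non-increasing transno during recovery: the value came from the wire,
 *   so log it and evict the client instead of crashing the server.
 * - Non-increasing transno outside recovery: the server assigned it itself,
 *   so treat it as a server bug and abort.
 */
static enum replay_result record_transno(struct client_state *c,
                                         uint64_t transno, bool in_recovery)
{
    assert(c != NULL);

    if (transno > c->last_transno) {
        c->last_transno = transno;
        return REPLAY_OK;
    }

    if (in_recovery) {
        fprintf(stderr,
                "replayed transno %" PRIu64 " <= last transno %" PRIu64
                ", evicting client\n", transno, c->last_transno);
        c->evicted = true;
        return REPLAY_EVICTED;
    }

    fprintf(stderr, "server assigned a non-increasing transno %" PRIu64 "\n",
            transno);
    abort();
}
```

In this sketch only the server-side assignment path can still terminate the process; bad ordering seen during replay costs one client its connection rather than taking down the whole server.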
| Comment by Johann Lombardi (Inactive) [ 18/Feb/11 ] |
|
The patch we landed for 1.8.6 adds a LASSERT/CERROR to print the values of last_rcvd & lcd_last_transno, and I don't see such a message in your logs. Moreover, the line number in the assertion (i.e. filter.c:151) seems to confirm that the patch was not applied. |
| Comment by Shuichi Ihara (Inactive) [ 18/Feb/11 ] |
|
Hi Johann, good to see and talk with you again here. Sorry, this was my fault and you are correct. The latest version of our branch definitely includes this patch; I've just double-checked this. However, the patched branch was not the one used on the failed OSS. This is why we hit bug 20394 on this customer's OSS. Thanks, Johann, for checking this, and please close this ticket. |
| Comment by Johann Lombardi (Inactive) [ 19/Feb/11 ] |
|
Hey Ihara. You are welcome.