[LU-87] (filter.c:151:filter_finish_transno()) LBUG Created: 18/Feb/11  Updated: 28/Jun/11  Resolved: 21/Feb/11

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.6
Fix Version/s: Lustre 1.8.6

Type: Bug Priority: Minor
Reporter: Shuichi Ihara (Inactive) Assignee: Johann Lombardi (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File t2s007055.messages    
Severity: 3
Bugzilla ID: 20394
Rank (Obsolete): 10332

 Description   

We hit an LBUG in filter_finish_transno() on an OSS. The OSS came under heavy load and eventually went down. After we rebooted it and recovery started, we hit the same LBUG again. To get back into production, we started the OST with abort_recov, and it is now working well.

I found the same bug on Bugzilla (bug 20394; DDN hit it before and filed it), and it should be fixed in 1.8.6. Our branch, lustre-1.8.4.ddn2, is based on 1.8.4, but this patch is already included and applied to our branch. So I don't know why we got the same LBUG in filter_finish_transno(). Please investigate this.



 Comments   
Comment by Mikhail Pershin [ 18/Feb/11 ]

Please pay attention to bug 24420, which was also about this LBUG. The patch there keeps the assertion only for a wrong transno assignment, and tries to evict a client that causes wrong transno order during recovery. This is just a workaround, not a complete solution, because it is still unclear, and looks wrong, that transaction ordering can be broken during OSS recovery. But the patch at least eliminates the assertion on wire data.
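
To illustrate the idea behind that workaround, here is a minimal, self-contained sketch. It is not the Lustre code or the actual patch: the structures, the function finish_transno_sketch() and the result codes are hypothetical stand-ins, and it assumes that a non-zero transno means the value arrived on the wire (replay) while zero means the server assigns the next one. A server-assigned transno that goes backwards is treated as a real bug; a replayed one that goes backwards is logged and the client is evicted instead of asserting.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT the real Lustre structures. */
struct client_state {
    uint64_t last_transno;  /* plays the role of lcd_last_transno */
    int      evicted;       /* set when the sketch decides to evict */
};

struct server_state {
    uint64_t last_transno;  /* plays the role of the server's last transno */
};

enum transno_result {
    TRANSNO_OK      = 0,
    TRANSNO_EVICTED = 1,    /* bad transno came from the wire: evict client */
    TRANSNO_BUG     = 2     /* server-assigned transno went backwards */
};

/*
 * wire_transno == 0: new request, the server assigns the next transno.
 * wire_transno != 0: replayed request, the transno comes from the client.
 */
static enum transno_result
finish_transno_sketch(struct server_state *srv, struct client_state *cli,
                      uint64_t wire_transno)
{
    int from_wire = (wire_transno != 0);
    uint64_t transno;

    if (!from_wire) {
        transno = ++srv->last_transno;
    } else {
        transno = wire_transno;
        if (transno > srv->last_transno)
            srv->last_transno = transno;
    }

    if (transno < cli->last_transno) {
        /* Print both values so the log shows what went out of order. */
        fprintf(stderr, "transno %" PRIu64 " < client last transno %" PRIu64 "\n",
                transno, cli->last_transno);
        if (from_wire) {
            cli->evicted = 1;       /* do not assert on wire data */
            return TRANSNO_EVICTED;
        }
        return TRANSNO_BUG;         /* real code would assert here */
    }

    cli->last_transno = transno;
    return TRANSNO_OK;
}
```

In this sketch an out-of-order replay leaves the client's recorded transno untouched and only marks the client for eviction, so the server keeps running.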

Comment by Johann Lombardi (Inactive) [ 18/Feb/11 ]

The patch we landed for 1.8.6 adds a LASSERT/CERROR to print the values of last_rcvd & lcd_last_transno, and I don't see such a message in your logs. Moreover, the line number in the assertion (i.e. filter.c:151) seems to confirm that the patch was not applied.
Are you sure you are running a version which has the patch applied?

Comment by Shuichi Ihara (Inactive) [ 18/Feb/11 ]

Hi Johann, good to see and talk with you again here

Sorry, this was my fault and you are correct. The latest version of our branch definitely includes this patch; I've just double-checked this. However, the patched branch was not used on the failed OSS, which is why we hit bug 20394 on this customer's OSS. Thanks, Johann, for checking this, and please close this ticket.

Comment by Johann Lombardi (Inactive) [ 19/Feb/11 ]

Hey Ihara. You are welcome.
