(filter.c:151:filter_finish_transno()) LBUG

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 1.8.6
    • Lustre 1.8.6
    • None
    • 3
    • 20,394
    • 10332

    Description

      We hit an LBUG in filter_finish_transno() on an OSS. The OSS came under heavy load and eventually went down. After we rebooted it and recovery started, we hit the same LBUG again. To get back into production, we started the OST with abort_recov, and it is now working well.

      I found the same bug in Bugzilla (bug 20394; DDN hit it before and filed it), and it should be fixed in 1.8.6. Our branch, lustre-1.8.4.ddn2, is based on 1.8.4, but this patch has already been applied to it. So I don't know why we hit the same LBUG in filter_finish_transno(). Please investigate.

      Attachments

        Activity

          [LU-87] (filter.c:151:filter_finish_transno()) LBUG
          pjones Peter Jones made changes -
          Affects Version/s New: Lustre 1.8.6 [ 10022 ]
          Affects Version/s Original: Lustre 1.8.x [ 10010 ]
          pjones Peter Jones made changes -
          Status Original: Resolved [ 5 ] New: Closed [ 6 ]
          pjones Peter Jones made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
          pjones Peter Jones made changes -
          Assignee Original: Robert Read [ rread ] New: Johann Lombardi [ johann ]
          Resolution Original: Fixed [ 1 ]
          Status Original: Closed [ 6 ] New: Reopened [ 4 ]
          johann Johann Lombardi (Inactive) made changes -
          Fix Version/s New: Lustre 1.8.6 [ 10022 ]
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Closed [ 6 ]

          johann Johann Lombardi (Inactive) added a comment - Hey Ihara. You're welcome.

          ihara Shuichi Ihara (Inactive) added a comment - Hi Johann, good to see and talk with you again here. Sorry, this was my fault, and you are correct. Our latest branch definitely includes this patch; I've just double-checked. However, the patched branch was not the one running on the failed OSS, which is why this customer's OSS hit bug 20394. Thanks, Johann, for checking. Please close this ticket.

          johann Johann Lombardi (Inactive) added a comment - The patch we landed for 1.8.6 adds a LASSERT/CERROR to print the values of last_rcvd & lcd_last_transno, and I don't see such a message in your logs. Moreover, the line number in the assertion (i.e. filter.c:151) seems to confirm that the patch was not applied. Are you sure you are running a version that has the patch applied?
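
For readers unfamiliar with this kind of diagnostic, the following is a minimal standalone sketch of a check that prints both values before failing, so that a message naming last_rcvd and lcd_last_transno would show up in the logs. It is not the actual Lustre patch: the struct, the function name check_transno, the message text, and the direction of the comparison are all assumptions made for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-client data; only the field
 * mentioned in the comment above is modelled here. */
struct client_data {
    uint64_t lcd_last_transno;
};

/* Returns 1 if last_rcvd is not behind the client's recorded transno,
 * 0 otherwise. On failure, both values are printed first (assumed
 * condition and message format, not taken from the real patch). */
int check_transno(const struct client_data *lcd, uint64_t last_rcvd)
{
    if (last_rcvd < lcd->lcd_last_transno) {
        fprintf(stderr,
                "last_rcvd (%" PRIu64 ") < lcd_last_transno (%" PRIu64 ")\n",
                last_rcvd, lcd->lcd_last_transno);
        return 0;
    }
    return 1;
}
```

With a check of this shape, the absence of the printed values in a log is a hint that the code without the check was running.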

          tappro Mikhail Pershin added a comment - Please also see bug 24420, which was about the same LBUG. The patch there keeps the assertion only for a wrong transno assignment, and instead tries to evict the client that causes the wrong transno order during recovery. This is just a workaround, not a complete solution, because it is still unclear (and looks wrong) that transaction ordering can be broken during OSS recovery, but the patch at least eliminates the assertion on wire data.
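
The policy described in the comment above can be sketched roughly as follows. This is only an illustration of the idea and not the code from bug 24420: the enum, the function name classify_transno, and the from_replay flag are hypothetical, and the sketch assumes that an out-of-order transno is one lower than the client's last recorded value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes for a transno check. */
enum transno_action {
    TRANSNO_OK,           /* ordering is fine, carry on */
    TRANSNO_EVICT_CLIENT, /* bad value arrived on the wire during replay */
    TRANSNO_ASSERT        /* bad value was assigned locally: a server bug */
};

/* Decide how to react to a transno relative to the client's last
 * recorded one. Values replayed by a client during recovery come from
 * wire data, so they lead to eviction rather than an assertion. */
enum transno_action classify_transno(uint64_t transno,
                                     uint64_t last_transno,
                                     bool from_replay)
{
    if (transno >= last_transno)
        return TRANSNO_OK;
    return from_replay ? TRANSNO_EVICT_CLIENT : TRANSNO_ASSERT;
}
```

The design point is that the server never asserts on values it did not produce itself; a misbehaving client costs only that client its connection.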
          ihara Shuichi Ihara (Inactive) created issue -

          People

            Assignee: johann Johann Lombardi (Inactive)
            Reporter: ihara Shuichi Ihara (Inactive)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: