Lustre: LU-14876

OUT: possible concurrent execution of UPDATE request and its resend


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.12.8, Lustre 2.15.0
    • Affects Version/s: Lustre 2.12.7, Lustre 2.15.0
    • Labels: None
    • Severity: 3

    Description

      There is a possible LBUG() in out_reconstruct():

      lustre_update.h:246:object_update_result_insert()) LBUG
      

      The bug happened because the export's lcd_last_xid became equal to the request's rq_xid in the middle of processing the resent OUT UPDATE request.

      1. When update 0 was checked, the request's XID did not yet match lcd_last_xid, so it was added as a normal update to be processed.
      2. When update 1 was checked, lcd_last_xid now matched, so it entered out_reconstruct() for the first time. There, object_update_result_get() found no ourp_lens[0] entry for the previous index 0 and returned NULL, which triggered the assertion.
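To make the failure concrete, here is a minimal, self-contained C model of the sequence above. It is not Lustre code: struct model_reply, model_result_offset() and model_process_per_update() are hypothetical names. It assumes, as the description implies, that looking up the reply slot for update i requires every earlier slot (ourp_lens[0..i-1]) to be filled already, and it simulates the original request finishing concurrently by changing lcd_last_xid right after update 0 has been queued.

```c
#include <stddef.h>

#define MODEL_MAX_UPDATES 4

/* Hypothetical, simplified stand-in for the OUT reply buffer:
 * ourp_lens[i] stays 0 until the result for update i has been written. */
struct model_reply {
	size_t ourp_count;
	size_t ourp_lens[MODEL_MAX_UPDATES];
};

/* Models the lookup described for object_update_result_get(): to find
 * slot `index`, walk all earlier slots; an empty earlier slot means
 * "no result" (-1 here, NULL in the real code). */
static long model_result_offset(const struct model_reply *reply, size_t index)
{
	size_t offset = 0;

	if (index >= reply->ourp_count)
		return -1;
	for (size_t i = 0; i < index; i++) {
		if (reply->ourp_lens[i] == 0)
			return -1;
		offset += reply->ourp_lens[i];
	}
	return (long)offset;
}

/* Flawed pattern: the "is this the last XID?" check is repeated for every
 * update. Returns the index of the update whose reconstruction could not
 * find its slot (where the real code hits the assertion), or -1. */
static int model_process_per_update(unsigned long long rq_xid,
				    unsigned long long *lcd_last_xid,
				    size_t count)
{
	struct model_reply reply = { .ourp_count = count };

	for (size_t i = 0; i < count && i < MODEL_MAX_UPDATES; i++) {
		if (rq_xid == *lcd_last_xid) {
			/* Treated as a resend: reconstruct result i. */
			if (model_result_offset(&reply, i) < 0)
				return (int)i;
			reply.ourp_lens[i] = 8;	/* arbitrary result size */
		}
		/* Otherwise update i is only queued for normal execution, so
		 * its result is written later and ourp_lens[i] is still 0. */

		/* Simulated race: the original request completes right after
		 * update 0 is queued, so lcd_last_xid catches up to rq_xid. */
		if (i == 0)
			*lcd_last_xid = rq_xid;
	}
	return -1;
}
```

With rq_xid = 42 and lcd_last_xid initially 41, model_process_per_update() returns 1: update 0 is queued normally, update 1 is the first one reconstructed, and its lookup fails because slot 0 is still empty. If lcd_last_xid already equals 42 on entry, every update is reconstructed in order and the function returns -1.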

      I am fairly confident about this sequence, and the log messages confirm it. It reveals at least two issues with OUT UPDATE resend handling.

      1. The req_xid_is_last() check shouldn't be done for each update. All updates belong to the same request with the same XID, so the request either is the last one or is not, and that should be checked only once, before the updates are processed.

      2. In step 2 of the scenario, lcd_last_xid changed and became equal to the request's XID. This is the real problem: it means the original request was still being processed while its resend also started processing. For ordinary clients this is prevented by checking exp_rpc_count in target_handle_connect(), but for MDS-MDS reconnection that protection seems to be flawed somewhere.
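Both points can be illustrated with a small hypothetical C model (not Lustre code; struct model_export, model_reconnect() and model_process() are made-up names). It assumes that refusing a reconnect while RPCs from the old connection are still in progress is what keeps an ordinary client's resend from overlapping its original (the model returns -EBUSY in that case), and it takes the req_xid_is_last()-style decision once per request instead of once per update.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical export model: last_xid plays the role of lcd_last_xid and
 * rpc_count the role of exp_rpc_count (RPCs still being processed). */
struct model_export {
	unsigned long long last_xid;
	int rpc_count;
};

/* Reconnect guard in the spirit of the exp_rpc_count check: refuse the
 * reconnect, and so any resend over it, while old RPCs are in flight. */
static int model_reconnect(const struct model_export *exp)
{
	return exp->rpc_count > 0 ? -EBUSY : 0;
}

/* Handles `count` updates of one request. The "is this the last XID?"
 * decision is made once, before the loop, so a concurrent change of
 * last_xid (simulated at update index `change_at`) cannot make some
 * updates execute and others reconstruct. Returns the number of updates
 * handled by reconstruction: always 0 or `count`. */
static size_t model_process(struct model_export *exp,
			    unsigned long long rq_xid,
			    size_t count, size_t change_at)
{
	bool reconstruct = (rq_xid == exp->last_xid);	/* checked once */
	size_t reconstructed = 0;

	for (size_t i = 0; i < count; i++) {
		if (i == change_at)
			exp->last_xid = rq_xid;	/* simulated concurrent change */
		if (reconstruct)
			reconstructed++;	/* would rebuild reply slot i */
		/* else: would execute update i normally */
	}
	return reconstructed;
}
```

Here model_process() reports either 0 or count reconstructed updates, never a mix, even when last_xid changes mid-loop. Checking once only removes the inconsistent per-update mix; per item 2, the original request and its resend must still never run concurrently, which is the role the reconnect guard plays in this model.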


          People

            Assignee: Mikhail Pershin (tappro)
            Reporter: Mikhail Pershin (tappro)
            Votes: 0
            Watchers: 3
