  1. Lustre
  2. LU-10993

Fix for LU-10826 is problematic and skips recovery


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.0
    • Labels: None
    • Severity: 2
    • Rank (Obsolete): 9223372036854775807

    Description

      I think the patch https://review.whamcloud.com/#/c/31690/ for LU-10826 is itself problematic.
      After applying the patch https://review.whamcloud.com/#/c/31690/ and setting test_req_buffer_pressure=1, the OOM is prevented, but some clients are skipped during recovery.

      [root@voss05 ~]#  lctl get_param obdfilter.*.recovery_status
      obdfilter.scratch-OST0024.recovery_status=
      status: COMPLETE
      recovery_start: 1525317355
      recovery_duration: 54
      completed_clients: 7249/7249
      replayed_requests: 0
      last_transno: 98784247808
      VBR: DISABLED
      IR: ENABLED
      obdfilter.scratch-OST0025.recovery_status=
      status: COMPLETE
      recovery_start: 1525317353
      recovery_duration: 56
      completed_clients: 7031/7031
      replayed_requests: 0
      last_transno: 98784247808
      VBR: DISABLED
      IR: ENABLED
      obdfilter.scratch-OST0026.recovery_status=
      status: COMPLETE
      recovery_start: 1525317352
      recovery_duration: 57
      completed_clients: 8168/8168
      replayed_requests: 0
      last_transno: 98784247808
      VBR: DISABLED
      IR: ENABLED
      obdfilter.scratch-OST0027.recovery_status=
      status: COMPLETE
      recovery_start: 1525317350
      recovery_duration: 59
      completed_clients: 8195/8195
      replayed_requests: 0
      last_transno: 98784247808
      VBR: DISABLED
      IR: ENABLED
      obdfilter.scratch-OST0028.recovery_status=
      status: COMPLETE
      recovery_start: 1525317355
      recovery_duration: 54
      completed_clients: 7984/7984
      replayed_requests: 0
      last_transno: 98784247808
      VBR: DISABLED
      IR: ENABLED
      obdfilter.scratch-OST0029.recovery_status=
      status: COMPLETE
      recovery_start: 1525317352
      recovery_duration: 57
      completed_clients: 7985/7985
      replayed_requests: 0
      last_transno: 98784247808
      VBR: DISABLED
      IR: ENABLED
      obdfilter.scratch-OST002a.recovery_status=
      status: COMPLETE
      recovery_start: 1525317354
      recovery_duration: 55
      completed_clients: 8329/8329
      replayed_requests: 0
      last_transno: 98784247808
      VBR: DISABLED
      IR: ENABLED
      obdfilter.scratch-OST002b.recovery_status=
      status: COMPLETE
      recovery_start: 1525317351
      recovery_duration: 58
      completed_clients: 8291/8291
      replayed_requests: 0
      last_transno: 98784247808
      VBR: DISABLED
      IR: ENABLED
      obdfilter.scratch-OST002c.recovery_status=
      status: COMPLETE
      recovery_start: 1525317350
      recovery_duration: 59
      completed_clients: 8286/8286
      replayed_requests: 0
      last_transno: 94489280512
      VBR: DISABLED
      IR: ENABLED
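
For comparing targets side by side, the output above can be tabulated with a short script. This is only a sketch, assuming the text has the same `key: value` layout as the paste; `parse_recovery_status` and `summarize` are hypothetical helper names and not part of Lustre or `lctl`.

```python
import re

# Header line that starts each target's block, e.g.
# "obdfilter.scratch-OST0024.recovery_status="
TARGET_RE = re.compile(r"^obdfilter\.(\S+)\.recovery_status=$")


def parse_recovery_status(text):
    """Parse pasted `lctl get_param obdfilter.*.recovery_status` output
    into {target_name: {field: value}} (all values kept as strings)."""
    targets = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = TARGET_RE.match(line)
        if m:
            current = targets.setdefault(m.group(1), {})
            continue
        # Field lines look like "completed_clients: 7249/7249".
        if current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return targets


def summarize(targets):
    """Return (target, status, completed, total, replayed) tuples, sorted by target."""
    rows = []
    for name, fields in sorted(targets.items()):
        done, total = (int(x) for x in fields["completed_clients"].split("/"))
        rows.append((name, fields["status"], done, total,
                     int(fields["replayed_requests"])))
    return rows


# Two of the targets from the output above, copied verbatim.
SAMPLE = """\
obdfilter.scratch-OST0024.recovery_status=
status: COMPLETE
recovery_start: 1525317355
recovery_duration: 54
completed_clients: 7249/7249
replayed_requests: 0
last_transno: 98784247808
VBR: DISABLED
IR: ENABLED
obdfilter.scratch-OST002c.recovery_status=
status: COMPLETE
recovery_start: 1525317350
recovery_duration: 59
completed_clients: 8286/8286
replayed_requests: 0
last_transno: 94489280512
VBR: DISABLED
IR: ENABLED
"""

if __name__ == "__main__":
    for row in summarize(parse_recovery_status(SAMPLE)):
        print(row)
```

In the paste above, every target reports `status: COMPLETE` with all clients counted as completed and `replayed_requests: 0`; a table like this makes that pattern easy to check across many OSTs.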
      

      Also, recovery is sometimes never triggered at all, e.g. in a failover situation.
      I see the following messages after restarting the OSTs:

      [ 9169.158440] Lustre: 14598:0:(events.c:368:request_in_callback()) All ost request buffers busy
      [ 9169.158447] Lustre: 14598:0:(events.c:368:request_in_callback()) Skipped 3508 previous similar messages
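
To get a rough idea of how often the request buffers were reported busy, the console log can be scanned for these two message forms. This is a sketch under assumptions: it treats the "Skipped N previous similar messages" line from `request_in_callback()` as referring to the same "request buffers busy" message, and `count_busy_events` is a hypothetical helper name.

```python
import re

# The printed message, e.g. "... request_in_callback()) All ost request buffers busy"
BUSY_RE = re.compile(r"request_in_callback\(\)\) All \S+ request buffers busy")
# The rate-limit summary line, e.g. "... Skipped 3508 previous similar messages"
SKIPPED_RE = re.compile(
    r"request_in_callback\(\)\) Skipped (\d+) previous similar messages")


def count_busy_events(log_text):
    """Return (printed, skipped): the number of 'request buffers busy' lines
    that appear in the log and the total count reported as skipped.
    Assumption: the skipped lines refer to the same message."""
    printed = 0
    skipped = 0
    for line in log_text.splitlines():
        if BUSY_RE.search(line):
            printed += 1
            continue
        m = SKIPPED_RE.search(line)
        if m:
            skipped += int(m.group(1))
    return printed, skipped


# The two console lines quoted above, copied verbatim.
LOG = """\
[ 9169.158440] Lustre: 14598:0:(events.c:368:request_in_callback()) All ost request buffers busy
[ 9169.158447] Lustre: 14598:0:(events.c:368:request_in_callback()) Skipped 3508 previous similar messages
"""

if __name__ == "__main__":
    printed, skipped = count_busy_events(LOG)
    print(f"printed={printed} skipped={skipped}")
```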
      

            People

              Assignee: tappro Mikhail Pershin
              Reporter: ihara Shuichi Ihara (Inactive)