Lustre / LU-11020

Data corruption during MDT<>OST recovery


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.12.0
    • Affects Versions: Lustre 2.5.0, Lustre 2.11.0
    • Labels: None
    • Environment: found in IEEL3 testing, but the bug appears to exist from 2.5.0 through master
    • Severity: 3
    • 9223372036854775807

    Description

      Data corruption was found during an mdtest run with MDT failover testing in parallel.
      The visible effect is that some files lack their own OST objects after the delete-orphan phase.
      This can be the result of unreplayed requests or of a wrong OST read after failover.
      Testing originally started with power-loss failover tests, but the problem was also reproduced with a normal shutdown + failover, which made it possible to capture logs during failover / recovery.
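
      For illustration, here is a minimal user-space sketch of why a stale "last used" object ID is destructive during the delete-orphan phase: the OST removes precreated objects beyond what the MDT claims to have used, so a too-small value takes referenced objects with it. Names and logic below are hypothetical and simplified, not Lustre code.

      /* Sketch: orphan cleanup destroys every object above the last ID the
       * MDT reports as used.  If that value is stale (smaller than what was
       * really allocated), objects still referenced by files are destroyed,
       * which matches the observed "files lack their OST objects". */
      #include <stdint.h>
      #include <stdio.h>

      static void delete_orphans_sketch(uint64_t ost_last_created,
                                        uint64_t mdt_last_used)
      {
              for (uint64_t oid = mdt_last_used + 1; oid <= ost_last_created; oid++)
                      printf("destroying object 0x%llx\n",
                             (unsigned long long)oid);
      }

      int main(void)
      {
              /* With a stale last-used value of 0x91561aae instead of
               * 0x91561aaf (the FIDs seen later in the log excerpts),
               * object 0x91561aaf is wrongly treated as an orphan even
               * though a file already references it. */
              delete_orphans_sketch(0x91561aaf, 0x91561aae);
              return 0;
      }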
      Excerpts from the logs follow.
      The server takes a request from the client:

      00000100:00100000:3.0:1524592151.996038:0:18109:0:(service.c:1955:ptlrpc_server_handle_req_in()) got req x1598582492486880
      00000100:00100000:3.0:1524592151.996042:0:18109:0:(nrs_fifo.c:179:nrs_fifo_req_get()) NRS start fifo request from 12345-54@gni, seq: 1564278792
      00000100:00100000:3.0:1524592151.996043:0:18109:0:(service.c:2103:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc mdt01_003:061cd4a0-cd22-5048-946a-d8f4ec53917a+6881:15972:x1598582492486880:12345-54@gni:101
      

      The server starts generating ENODEV once obd_fail is set during unmount:

      00000100:00080000:1.0:1524592151.996485:0:2924:0:(niobuf.c:583:ptlrpc_send_reply()) sending ENODEV from failed obd 16
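
      The ENODEV above comes from the reply path refusing to send a normal reply once the obd has been marked failed for unmount. A simplified sketch of that check follows; the structures are heavily abbreviated and renamed, and this is not the actual ptlrpc_send_reply() code from niobuf.c, just the shape of it.

      #include <errno.h>
      #include <stddef.h>

      /* Abbreviated stand-ins for the real structures. */
      struct obd_device_sketch {
              int obd_fail;       /* set once the device is being failed over */
      };

      struct reply_sketch {
              int rq_status;      /* status that will be returned to the client */
      };

      static int send_reply_sketch(struct obd_device_sketch *obd,
                                   struct reply_sketch *rep)
      {
              if (obd != NULL && obd->obd_fail) {
                      /* The device is going away: report ENODEV instead of a
                       * normal reply, so the client treats the request as not
                       * executed and resends it after failover. */
                      rep->rq_status = -ENODEV;
              }
              /* ... pack and transmit the reply ... */
              return rep->rq_status;
      }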
      

      The server issues a sync as the next step of failover, and last_committed is updated:

      00010000:00080000:3.0:1524592151.996589:0:5591:0:(ldlm_lib.c:2492:target_committed_to_req()) 0c6ce748-6618-ff9d-ad84-6e6f761558b2 last_committed 2937774893859, transno 2937774968804, xid 1598584386635888
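
      For reference, here is what the last_committed / transno pair in this line means for the client, as a small sketch using the numbers above (field names simplified and renamed; not the actual target_committed_to_req() logic).

      #include <stdbool.h>
      #include <stdint.h>

      struct reply_nums_sketch {
              uint64_t transno;          /* transno assigned to this request   */
              uint64_t last_committed;   /* highest transno known to be stable */
      };

      /* The server copies its current last-committed transno into every
       * reply.  A request whose transno is above that value is not yet
       * stable on disk, so the client must keep it for replay. */
      static bool must_keep_for_replay(const struct reply_nums_sketch *rep)
      {
              /* In the excerpt: 2937774968804 > 2937774893859, so that
               * request is still replayable at this point. */
              return rep->transno > rep->last_committed;
      }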
      

      This create is on disk:

      00000004:00080000:13.0:1524592151.996630:0:8166:0:(osp_precreate.c:1476:osp_precreate_get_fid()) next [0x240000400:0x91561aae:0x0] last created [0x240000400:0x91562c9d:0x0]
      00000004:00080000:13.0:1524592151.996632:0:8166:0:(osp_object.c:1547:osp_object_create()) last fid for osp_object is [0x240000400:0x91561aae:0x0]
      00000004:00080000:13.0:1524592151.996867:0:8166:0:(osp_object.c:1601:osp_object_create()) snx11205-OST0000-osc-MDT0001: Wrote last used FID: [0x240000400:0x91561aae:0x0], index 0: 0
      

      But this one is not:

      00000004:00080000:3.0:1524592151.996637:0:18109:0:(osp_precreate.c:1476:osp_precreate_get_fid()) next [0x240000400:0x91561aaf:0x0] last created [0x240000400:0x91562c9d:0x0]
      00000004:00080000:3.0:1524592151.996639:0:18109:0:(osp_object.c:1547:osp_object_create()) last fid for osp_object is [0x240000400:0x91561aaf:0x0]
      00000004:00080000:3.0:1524592151.996641:0:18109:0:(osp_object.c:1601:osp_object_create()) snx11205-OST0000-osc-MDT0001: Wrote last used FID: [0x240000400:0x91561aaf:0x0], index 0: 0
      

      Looking at the timestamps of the osp_object_create calls, I found they completed in reverse order: the write that started first finished second, so the newer last-used FID on disk was overwritten with the stale value.
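
      The pattern is a lost update on the persistent last-used FID: allocation of the next FID is ordered, but the writes that record it are not, so the writer holding the older FID can finish last and leave a stale value behind. A small user-space reproduction of that ordering problem (pthreads, hypothetical names, not Lustre code):

      #include <pthread.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <unistd.h>

      static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;
      static uint64_t next_oid = 0x91561aae;   /* values taken from the log */
      static uint64_t on_disk_last_used;       /* stands in for the last-used FID write */

      static void *do_create(void *unused)
      {
              uint64_t my_oid;
              (void)unused;

              /* FID allocation itself is serialized ... */
              pthread_mutex_lock(&alloc_lock);
              my_oid = next_oid++;
              pthread_mutex_unlock(&alloc_lock);

              /* ... but persisting the value is not.  Delay the writer that
               * holds the older FID, as happened between threads 8166 and
               * 18109 in the excerpts above. */
              if (my_oid == 0x91561aae)
                      usleep(1000);
              on_disk_last_used = my_oid;
              return NULL;
      }

      int main(void)
      {
              pthread_t t[2];

              for (int i = 0; i < 2; i++)
                      pthread_create(&t[i], NULL, do_create, NULL);
              for (int i = 0; i < 2; i++)
                      pthread_join(t[i], NULL);

              /* Prints 0x91561aae: the newer value 0x91561aaf was
               * overwritten by the stale one. */
              printf("last used on disk: 0x%llx\n",
                     (unsigned long long)on_disk_last_used);
              return 0;
      }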

      00000100:00100000:3.0:1524592151.996703:0:18109:0:(service.c:2153:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc mdt01_003:061cd4a0-cd22-5048-946a-d8f4ec53917a+6888:15972:x1598582492486880:12345-54@gni:101 Request procesed in 660us (669us total) trans 2937774968811 rc -19/-19
      

      Since the affected create was processed after obd_fail had been set, it finished with ENODEV, which is handled as a ptlrpc-recoverable error, so the request was resent after the connection was restored and recovery finished.
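
      A hedged sketch of the client-side classification this relies on: an ENODEV (-19) reply status is treated as recoverable, so the reply is discarded and the request is resent once the import reconnects. Names are simplified and this is not the exact ptlrpc client code.

      #include <errno.h>
      #include <stdbool.h>

      static bool reply_status_is_recoverable(int rc)
      {
              /* ENODEV is what the failed obd sent above (rc -19/-19). */
              return rc == -ENODEV || rc == -ENOTCONN;
      }

      static bool should_resend_sketch(int reply_status, bool import_reconnected)
      {
              /* The affected create therefore goes back on the wire after
               * the connection is restored and recovery finishes. */
              return reply_status_is_recoverable(reply_status) &&
                     import_reconnected;
      }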

      Attachments

        Activity

          People

            Assignee: Alexey Lyashkov (shadow)
            Reporter: Alexey Lyashkov (shadow)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: