Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-13974

Wrong behavior for out operations create+write especially for update log

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.12.9, Lustre 2.15.0
    • Lustre 2.14.0
    • None
    • 3
    • 9223372036854775807

    Description

      I've catch a next problem with update log.

      When MDT writes to a new update log, after failover, a writer creates two update request. A first update with llog CREATE and  catalog modification. A second update with WRITE header and record with index 1. Both are sending through a out queue. 

      After first update request osp_create_interpreter set zero to obj->opo_attr.la_valid. Any new adding llog record would  call dt_attr_get(cahe is invalid) for llog before dt_write. A size is 0, and llog record with index 2 would be written right after a llog header.

      So any new llog add  between end of osp_create_interpreter(1 update req) and osp_write_interpreter(2 update req) would lead to update log corruption.

      Here is a log of the problem

      40000000:00000040:1.0:1600183643.158176:0:15651:0:(fid_request.c:400:seq_client_alloc_fid()) cli-cli-lustre-MDT0000-osp-MDT0001: Allocated FID [0x2000061c1:0x1:0x0]
      00000004:00000040:1.0:1600183643.158181:0:15651:0:(mdt_handler.c:5793:mdt_object_init()) object init, fid = [0x2000061c1:0x1:0x0]
      00000040:00001000:1.0:1600183643.158557:0:15651:0:(llog_osd.c:400:llog_osd_write_rec()) new record 106a0000 to [0x2000061c1:0x1:0x0]
      00000004:00000040:1.0:1600183643.158559:0:15651:0:(osp_md_object.c:1177:osp_md_write()) write [0x2000061c1:0x1:0x0] offset = 0 length = 32768
      00000004:00000040:1.0:1600183643.158597:0:15651:0:(osp_md_object.c:1177:osp_md_write()) write [0x2000061c1:0x1:0x0] offset = 32768 length = 464
      00000040:00080000:1.0:1600183643.158600:0:15651:0:(llog_osd.c:708:llog_osd_write_rec()) added record [0x2000061c1:0x1:0x0].1, 464 off33232
      00000004:00000040:1.0:1600183643.158614:0:29981:0:(osp_trans.c:266:object_update_request_dump()) i = 0 fid = [0x2000061c1:0x1:0x0] op = create params = 1 batchid = 0 size = 256 repsize 0
      00000004:00000040:0.0:1600183643.158743:0:15657:0:(mdt_handler.c:5793:mdt_object_init()) object init, fid = [0x2000061c1:0x1:0x0]
      00000020:00000040:0.0:1600183643.158763:0:15657:0:(out_handler.c:574:out_index_insert()) lustre-MDT0000: [0x20000000a:0x0:0x0] index insert [0x2000061c1:0x1:0x0]: rc = 0
      00000020:00001000:0.0:1600183643.158777:0:15657:0:(out_lib.c:548:out_tx_create_exec()) lustre-MDT0000-osd: create [0x2000061c1:0x1:0x0]: dof 0, mode 100000
      00000020:00000040:0.0:1600183643.158819:0:15657:0:(out_lib.c:1092:out_obj_index_insert()) lustre-MDT0000-osd: index insert [0x20000000a:0x0:0x0] name: [0x2000061c1:0x1:0x0] fid [0x2000061c1:0x1:0x0], type 32768
      00000040:00001000:1.0:1600183643.163358:0:16581:0:(llog_osd.c:400:llog_osd_write_rec()) new record 106a0000 to [0x2000061c1:0x1:0x0]
      00000004:00000040:1.0:1600183643.163374:0:16581:0:(osp_trans.c:266:object_update_request_dump()) i = 0 fid = [0x2000061c1:0x1:0x0] op = attr_get params = 0 batchid = 0 size = 40 repsize 208
      00000004:00000040:1.0:1600183643.163641:0:16581:0:(osp_md_object.c:1177:osp_md_write()) write [0x2000061c1:0x1:0x0] offset = 0 length = 32768
      00000004:00000040:1.0:1600183643.163686:0:16581:0:(osp_md_object.c:1177:osp_md_write()) write [0x2000061c1:0x1:0x0] offset = 32768 length = 424
      00000040:00080000:1.0:1600183643.163690:0:16581:0:(llog_osd.c:708:llog_osd_write_rec()) added record [0x2000061c1:0x1:0x0].2, 424 off33192
      00000004:00000040:1.0:1600183643.163790:0:29981:0:(osp_trans.c:266:object_update_request_dump()) i = 0 fid = [0x2000061c1:0x1:0x0] op = write params = 2 batchid = 0 size = 32832 repsize 0
      00000004:00000040:1.0:1600183643.163793:0:29981:0:(osp_trans.c:266:object_update_request_dump()) i = 1 fid = [0x2000061c1:0x1:0x0] op = write params = 2 batchid = 0 size = 528 repsize 0
      00000020:00000040:0.0:1600183643.164058:0:15657:0:(out_lib.c:695:out_tx_write_exec()) write [0x2000061c1:0x1:0x0] pos 0 buf ffff9bdfea590038, len 32768
      00000020:00000040:0.0:1600183643.164174:0:15657:0:(out_lib.c:695:out_tx_write_exec()) write [0x2000061c1:0x1:0x0] pos 32768 buf ffff9bdfea598078, len 464
      00000004:00000040:1.0:1600183643.164381:0:29981:0:(osp_trans.c:266:object_update_request_dump()) i = 0 fid = [0x2000061c1:0x1:0x0] op = write params = 2 batchid = 0 size = 32832 repsize 0
      00000004:00000040:1.0:1600183643.164383:0:29981:0:(osp_trans.c:266:object_update_request_dump()) i = 1 fid = [0x2000061c1:0x1:0x0] op = write params = 2 batchid = 0 size = 488 repsize 0
      00000020:00000040:0.0:1600183643.164533:0:15657:0:(out_lib.c:695:out_tx_write_exec()) write [0x2000061c1:0x1:0x0] pos 0 buf ffff9bdfea590038, len 32768
      00000020:00000040:0.0:1600183643.164545:0:15657:0:(out_lib.c:695:out_tx_write_exec()) write [0x2000061c1:0x1:0x0] pos 32768 buf ffff9bdfea598078, len 424
      00000020:00080000:0.0:1600183643.164547:0:15657:0:(out_lib.c:718:out_tx_write_exec()) llog wrong idx [0x2000061c1:0x1:0x0] pos 32768, len 424, size 32768 idx 2->275382272
      

      At log record 1 and 2 was added at the same offset.

      llog_osd_write_rec()) added record [0x2000061c1:0x1:0x0].1, 464 off33232
      llog_osd_write_rec()) added record [0x2000061c1:0x1:0x0].2, 424 off33192

      So our with index 2 would rewrite the 1.
      Any thoughts on how better to fix it?

      Attachments

        Issue Links

          Activity

            [LU-13974] Wrong behavior for out operations create+write especially for update log

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46864/
            Subject: LU-13974 tests: update log corruption
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: a2d27236181e4736f7b0689baca3e2db55387bdc

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46864/ Subject: LU-13974 tests: update log corruption Project: fs/lustre-release Branch: b2_12 Current Patch Set: Commit: a2d27236181e4736f7b0689baca3e2db55387bdc

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46862/
            Subject: LU-13974 llog: check stale osp object
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 6ead672b83d9f12f9760a4af0958ef5e2b9de082

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46862/ Subject: LU-13974 llog: check stale osp object Project: fs/lustre-release Branch: b2_12 Current Patch Set: Commit: 6ead672b83d9f12f9760a4af0958ef5e2b9de082

            "Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46864
            Subject: LU-13974 tests: update log corruption
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 853d5e4a25f393033b132659d24b7aad6916e3b8

            gerrit Gerrit Updater added a comment - "Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46864 Subject: LU-13974 tests: update log corruption Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: 853d5e4a25f393033b132659d24b7aad6916e3b8

            "Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46862
            Subject: LU-13974 llog: check stale osp object
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 1dd5621baa836beb131af894efe751670170463b

            gerrit Gerrit Updater added a comment - "Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46862 Subject: LU-13974 llog: check stale osp object Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: 1dd5621baa836beb131af894efe751670170463b

            "Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46804
            Subject: LU-13974 tests: update log corruption
            Project: fs/lustre-release
            Branch: b2_14
            Current Patch Set: 1
            Commit: 35a62ff76299ab6437acd5c5c0db8f130025973f

            gerrit Gerrit Updater added a comment - "Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46804 Subject: LU-13974 tests: update log corruption Project: fs/lustre-release Branch: b2_14 Current Patch Set: 1 Commit: 35a62ff76299ab6437acd5c5c0db8f130025973f

            "Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46803
            Subject: LU-13974 llog: check stale osp object
            Project: fs/lustre-release
            Branch: b2_14
            Current Patch Set: 1
            Commit: 147f74c142de3db8bb4025af825dbcdd04c1fbbb

            gerrit Gerrit Updater added a comment - "Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46803 Subject: LU-13974 llog: check stale osp object Project: fs/lustre-release Branch: b2_14 Current Patch Set: 1 Commit: 147f74c142de3db8bb4025af825dbcdd04c1fbbb
            pjones Peter Jones added a comment -

            Landed for 2.15

            pjones Peter Jones added a comment - Landed for 2.15

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40743/
            Subject: LU-13974 tests: update log corruption
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 562837124ec7bffeba7edb4b4b899bc271833374

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40743/ Subject: LU-13974 tests: update log corruption Project: fs/lustre-release Branch: master Current Patch Set: Commit: 562837124ec7bffeba7edb4b4b899bc271833374

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40742/
            Subject: LU-13974 llog: check stale osp object
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 82c6e42d6137f39a1f2394b7bc6e8d600eb36181

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40742/ Subject: LU-13974 llog: check stale osp object Project: fs/lustre-release Branch: master Current Patch Set: Commit: 82c6e42d6137f39a1f2394b7bc6e8d600eb36181

            Alexander Boyko (alexander.boyko@hpe.com) uploaded a new patch: https://review.whamcloud.com/40743
            Subject: LU-13974 tests: update log corruption
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 9b1a4b48de586c391a944e53c0dc916ec0ce84bb

            gerrit Gerrit Updater added a comment - Alexander Boyko (alexander.boyko@hpe.com) uploaded a new patch: https://review.whamcloud.com/40743 Subject: LU-13974 tests: update log corruption Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 9b1a4b48de586c391a944e53c0dc916ec0ce84bb

            People

              aboyko Alexander Boyko
              aboyko Alexander Boyko
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: