[LU-13974] Wrong behavior for out operations create+write especially for update log Created: 21/Sep/20  Updated: 14/Oct/22  Resolved: 11/Mar/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: Lustre 2.12.9, Lustre 2.15.0

Type: Bug Priority: Major
Reporter: Alexander Boyko Assignee: Alexander Boyko
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-15761 cannot finish MDS recovery Resolved
Epic/Theme: dne
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I've catch a next problem with update log.

When MDT writes to a new update log, after failover, a writer creates two update request. A first update with llog CREATE and  catalog modification. A second update with WRITE header and record with index 1. Both are sending through a out queue. 

After first update request osp_create_interpreter set zero to obj->opo_attr.la_valid. Any new adding llog record would  call dt_attr_get(cahe is invalid) for llog before dt_write. A size is 0, and llog record with index 2 would be written right after a llog header.

So any new llog add  between end of osp_create_interpreter(1 update req) and osp_write_interpreter(2 update req) would lead to update log corruption.

Here is a log of the problem

40000000:00000040:1.0:1600183643.158176:0:15651:0:(fid_request.c:400:seq_client_alloc_fid()) cli-cli-lustre-MDT0000-osp-MDT0001: Allocated FID [0x2000061c1:0x1:0x0]
00000004:00000040:1.0:1600183643.158181:0:15651:0:(mdt_handler.c:5793:mdt_object_init()) object init, fid = [0x2000061c1:0x1:0x0]
00000040:00001000:1.0:1600183643.158557:0:15651:0:(llog_osd.c:400:llog_osd_write_rec()) new record 106a0000 to [0x2000061c1:0x1:0x0]
00000004:00000040:1.0:1600183643.158559:0:15651:0:(osp_md_object.c:1177:osp_md_write()) write [0x2000061c1:0x1:0x0] offset = 0 length = 32768
00000004:00000040:1.0:1600183643.158597:0:15651:0:(osp_md_object.c:1177:osp_md_write()) write [0x2000061c1:0x1:0x0] offset = 32768 length = 464
00000040:00080000:1.0:1600183643.158600:0:15651:0:(llog_osd.c:708:llog_osd_write_rec()) added record [0x2000061c1:0x1:0x0].1, 464 off33232
00000004:00000040:1.0:1600183643.158614:0:29981:0:(osp_trans.c:266:object_update_request_dump()) i = 0 fid = [0x2000061c1:0x1:0x0] op = create params = 1 batchid = 0 size = 256 repsize 0
00000004:00000040:0.0:1600183643.158743:0:15657:0:(mdt_handler.c:5793:mdt_object_init()) object init, fid = [0x2000061c1:0x1:0x0]
00000020:00000040:0.0:1600183643.158763:0:15657:0:(out_handler.c:574:out_index_insert()) lustre-MDT0000: [0x20000000a:0x0:0x0] index insert [0x2000061c1:0x1:0x0]: rc = 0
00000020:00001000:0.0:1600183643.158777:0:15657:0:(out_lib.c:548:out_tx_create_exec()) lustre-MDT0000-osd: create [0x2000061c1:0x1:0x0]: dof 0, mode 100000
00000020:00000040:0.0:1600183643.158819:0:15657:0:(out_lib.c:1092:out_obj_index_insert()) lustre-MDT0000-osd: index insert [0x20000000a:0x0:0x0] name: [0x2000061c1:0x1:0x0] fid [0x2000061c1:0x1:0x0], type 32768
00000040:00001000:1.0:1600183643.163358:0:16581:0:(llog_osd.c:400:llog_osd_write_rec()) new record 106a0000 to [0x2000061c1:0x1:0x0]
00000004:00000040:1.0:1600183643.163374:0:16581:0:(osp_trans.c:266:object_update_request_dump()) i = 0 fid = [0x2000061c1:0x1:0x0] op = attr_get params = 0 batchid = 0 size = 40 repsize 208
00000004:00000040:1.0:1600183643.163641:0:16581:0:(osp_md_object.c:1177:osp_md_write()) write [0x2000061c1:0x1:0x0] offset = 0 length = 32768
00000004:00000040:1.0:1600183643.163686:0:16581:0:(osp_md_object.c:1177:osp_md_write()) write [0x2000061c1:0x1:0x0] offset = 32768 length = 424
00000040:00080000:1.0:1600183643.163690:0:16581:0:(llog_osd.c:708:llog_osd_write_rec()) added record [0x2000061c1:0x1:0x0].2, 424 off33192
00000004:00000040:1.0:1600183643.163790:0:29981:0:(osp_trans.c:266:object_update_request_dump()) i = 0 fid = [0x2000061c1:0x1:0x0] op = write params = 2 batchid = 0 size = 32832 repsize 0
00000004:00000040:1.0:1600183643.163793:0:29981:0:(osp_trans.c:266:object_update_request_dump()) i = 1 fid = [0x2000061c1:0x1:0x0] op = write params = 2 batchid = 0 size = 528 repsize 0
00000020:00000040:0.0:1600183643.164058:0:15657:0:(out_lib.c:695:out_tx_write_exec()) write [0x2000061c1:0x1:0x0] pos 0 buf ffff9bdfea590038, len 32768
00000020:00000040:0.0:1600183643.164174:0:15657:0:(out_lib.c:695:out_tx_write_exec()) write [0x2000061c1:0x1:0x0] pos 32768 buf ffff9bdfea598078, len 464
00000004:00000040:1.0:1600183643.164381:0:29981:0:(osp_trans.c:266:object_update_request_dump()) i = 0 fid = [0x2000061c1:0x1:0x0] op = write params = 2 batchid = 0 size = 32832 repsize 0
00000004:00000040:1.0:1600183643.164383:0:29981:0:(osp_trans.c:266:object_update_request_dump()) i = 1 fid = [0x2000061c1:0x1:0x0] op = write params = 2 batchid = 0 size = 488 repsize 0
00000020:00000040:0.0:1600183643.164533:0:15657:0:(out_lib.c:695:out_tx_write_exec()) write [0x2000061c1:0x1:0x0] pos 0 buf ffff9bdfea590038, len 32768
00000020:00000040:0.0:1600183643.164545:0:15657:0:(out_lib.c:695:out_tx_write_exec()) write [0x2000061c1:0x1:0x0] pos 32768 buf ffff9bdfea598078, len 424
00000020:00080000:0.0:1600183643.164547:0:15657:0:(out_lib.c:718:out_tx_write_exec()) llog wrong idx [0x2000061c1:0x1:0x0] pos 32768, len 424, size 32768 idx 2->275382272

At log record 1 and 2 was added at the same offset.

llog_osd_write_rec()) added record [0x2000061c1:0x1:0x0].1, 464 off33232
llog_osd_write_rec()) added record [0x2000061c1:0x1:0x0].2, 424 off33192

So our with index 2 would rewrite the 1.
Any thoughts on how better to fix it?



 Comments   
Comment by Alexander Boyko [ 22/Sep/20 ]

I've pushed a debug patch to catch these type of errors
https://review.whamcloud.com/#/c/39995/

Comment by Alexander Boyko [ 06/Oct/20 ]

One more update log corruption for a case

fs1-MDT0001-osp-MDT0003: [0x2:0x400024d0:0x2] Invalid record: index 112926 but expected 112925

  • Some out update request was failed, it had llog updates.
  • A new out request was declared (number 2)
  • llog osp object was invalidated(base on failure), here we have an invalid llog_handle->llog_log_hdr (count, last_index, bitmap)
  • out request(N2) called llog_add, dt_attr_get made an osp request to have a valid size and marked object valid
  • llog record was added with wrong index, at file recX, recX+2
  • got -29 from a debug patch, it detected wrong previous index. Without it error would happened during llog_process_thread as Invalid record etc if failover take a place.

The main condition is invalidation of llog osp object between declare a new out request and llog_add for it.
If declaration follow by invalidation, it does not create an error, cause llog header would be re read. 

00000020:00080000:3.0:1599041250.939177:0:51269:0:(update_trans.c:94:top_multiple_thandle_dump())  cookie [0x2:0x80006997:0x5].145572
00000040:00080000:1.0:1599041250.940534:0:51191:0:(llog_osd.c:709:llog_osd_write_rec()) added record [0x580006997:0x2:0x0].145575, 600 off74673832
00000020:00080000:2.0:1599041250.940900:0:30740:0:(update_trans.c:94:top_multiple_thandle_dump())  cookie [0x2:0x80006997:0x5].145575
00000020:00080000:2.0:1599041250.940911:0:30740:0:(update_trans.c:94:top_multiple_thandle_dump())  cookie [0x2:0x80006997:0x5].145573
00000020:00080000:2.0:1599041250.940925:0:30740:0:(update_trans.c:94:top_multiple_thandle_dump())  cookie [0x2:0x80006997:0x5].145574
00000004:00080000:2.0:1599041250.940935:0:30740:0:(osp_md_object.c:1138:osp_write_interpreter()) error [0x580006997:0x2:0x0]: rc = -2
00000004:00080000:2.0:1599041250.940936:0:30740:0:(osp_md_object.c:1138:osp_write_interpreter()) error [0x580006997:0x2:0x0]: rc = -2
00000004:00080000:2.0:1599041250.940937:0:30740:0:(osp_md_object.c:1138:osp_write_interpreter()) error [0x580006997:0x2:0x0]: rc = -2
00000004:00080000:2.0:1599041250.940938:0:30740:0:(osp_md_object.c:1138:osp_write_interpreter()) error [0x580006997:0x2:0x0]: rc = -2
00000004:00080000:2.0:1599041250.940939:0:30740:0:(osp_object.c:1347:osp_invalidate()) Invalidate osp_object [0x580006997:0x2:0x0]
00000040:00080000:1.0:1599041250.964601:0:51211:0:(llog_osd.c:709:llog_osd_write_rec()) added record [0x580006997:0x2:0x0].145576, 552 off74673784  <--- 145576 record from the same offset as 145575, llog bitmap still have 145575 record
00000020:00080000:0.0:1599041251.006396:0:30740:0:(update_trans.c:94:top_multiple_thandle_dump())  cookie [0x2:0x80006997:0x5].145576
00000004:00080000:0.0:1599041251.006402:0:30740:0:(osp_md_object.c:1138:osp_write_interpreter()) error [0x580006997:0x2:0x0]: rc = -29

 

 

Comment by Alexander Boyko [ 20/Oct/20 ]

@Peter Jones, I don't have a good solution for these bugs. And unfortunately, I did not get any response, comments or ideas from WC side(from Mike or Alex). I'm not a right person for assignee here.

Comment by Gerrit Updater [ 24/Nov/20 ]

Alexander Boyko (alexander.boyko@hpe.com) uploaded a new patch: https://review.whamcloud.com/40742
Subject: LU-13974 llog: check stale osp object
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 12804cc17552bd01968e447ca71cae5deff6d4b1

Comment by Gerrit Updater [ 24/Nov/20 ]

Alexander Boyko (alexander.boyko@hpe.com) uploaded a new patch: https://review.whamcloud.com/40743
Subject: LU-13974 tests: update log corruption
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9b1a4b48de586c391a944e53c0dc916ec0ce84bb

Comment by Gerrit Updater [ 26/Feb/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40742/
Subject: LU-13974 llog: check stale osp object
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 82c6e42d6137f39a1f2394b7bc6e8d600eb36181

Comment by Gerrit Updater [ 10/Mar/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40743/
Subject: LU-13974 tests: update log corruption
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 562837124ec7bffeba7edb4b4b899bc271833374

Comment by Peter Jones [ 11/Mar/21 ]

Landed for 2.15

Comment by Gerrit Updater [ 11/Mar/22 ]

"Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46803
Subject: LU-13974 llog: check stale osp object
Project: fs/lustre-release
Branch: b2_14
Current Patch Set: 1
Commit: 147f74c142de3db8bb4025af825dbcdd04c1fbbb

Comment by Gerrit Updater [ 11/Mar/22 ]

"Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46804
Subject: LU-13974 tests: update log corruption
Project: fs/lustre-release
Branch: b2_14
Current Patch Set: 1
Commit: 35a62ff76299ab6437acd5c5c0db8f130025973f

Comment by Gerrit Updater [ 18/Mar/22 ]

"Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46862
Subject: LU-13974 llog: check stale osp object
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 1dd5621baa836beb131af894efe751670170463b

Comment by Gerrit Updater [ 18/Mar/22 ]

"Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46864
Subject: LU-13974 tests: update log corruption
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 853d5e4a25f393033b132659d24b7aad6916e3b8

Comment by Gerrit Updater [ 05/May/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46862/
Subject: LU-13974 llog: check stale osp object
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 6ead672b83d9f12f9760a4af0958ef5e2b9de082

Comment by Gerrit Updater [ 05/May/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46864/
Subject: LU-13974 tests: update log corruption
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: a2d27236181e4736f7b0689baca3e2db55387bdc

Generated at Sat Feb 10 03:05:47 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.