[LU-16419] update log failure shouldn't cause operation failure Created: 21/Dec/22  Updated: 21/Dec/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Lai Siyao Assignee: Lai Siyao
Resolution: Unresolved Votes: 1
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A failure to write the update log shouldn't cause the operation itself to fail. The update log is used for recovery, and the operation can still succeed without it. Failing the operation at this point may instead leave the system inconsistent and make some directories/files inaccessible.

The following error messages were seen:

Dec 13 16:10:41 vos08 kernel: LustreError: 7960:0:(llog_cat.c:597:llog_cat_add_rec()) llog_write_rec -2: lh=ffff9a2d43d99000
Dec 13 16:10:41 vos08 kernel: LustreError: 7960:0:(update_trans.c:1075:top_trans_stop()) exa5-MDT0003-osp-MDT0007: write updates failed: rc = -2

The mv operation then failed, which left the renamed directory inaccessible.
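
The requested behaviour can be illustrated with a small stand-alone model. This is only a sketch, not Lustre code: struct toy_trans and toy_trans_stop() are hypothetical names invented for illustration, and the real top_trans_stop() is considerably more involved. The point shown is that a failed update-log write is reported but the result of the operation itself is what the caller sees.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a distributed transaction; not a Lustre type. */
struct toy_trans {
	int op_rc;         /* result of the operation itself (e.g. a rename) */
	int update_log_rc; /* result of writing the recovery update record */
	int log_written;   /* 1 if a recovery record exists for this operation */
};

/*
 * Hypothetical: finish a transaction while treating the update log as
 * best-effort. A log write failure is reported, but only the operation's
 * own result is returned to the caller.
 */
static int toy_trans_stop(struct toy_trans *t)
{
	if (t->update_log_rc != 0) {
		fprintf(stderr, "write updates failed: rc = %d (not fatal)\n",
			t->update_log_rc);
		t->log_written = 0;
	} else {
		t->log_written = 1;
	}

	return t->op_rc;
}
```

The trade-off in this sketch is that an operation completed without a log record cannot be replayed from the update log during recovery, so the failure still needs to be visible to the administrator.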



 Comments   
Comment by Alex Zhuravlev [ 21/Dec/22 ]

Such a -2 would mean that many subsequent operations will get -2 as well, so no records get written to the update llog.

Comment by Lai Siyao [ 21/Dec/22 ]

Yes, IMO llog_cat_add_rec() should set chd_current_log = NULL upon this error, so that subsequent operations will pick the next log handle to write their records.
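
The idea in this comment can be sketched with a toy catalog. This is a minimal model under stated assumptions, not the actual llog code: struct toy_log, struct toy_cat, toy_log_write() and toy_cat_add_rec() are hypothetical, and the "current" pointer merely plays the role that chd_current_log plays in the comment above. When a write to the current log returns -ENOENT, the pointer is cleared so that the next call picks a fresh log instead of failing in the same way again.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_NLOGS 3

/* Hypothetical plain log; 'broken' simulates a log whose object is missing. */
struct toy_log {
	int broken;
	int nrecs;
};

/* Hypothetical catalog; 'current' stands in for chd_current_log. */
struct toy_cat {
	struct toy_log logs[TOY_NLOGS];
	size_t next;             /* index of the next unused log */
	struct toy_log *current; /* NULL means: pick a new log on next write */
};

/* Append one record to a plain log, or fail with -ENOENT if it is broken. */
static int toy_log_write(struct toy_log *log)
{
	if (log->broken)
		return -ENOENT;
	log->nrecs++;
	return 0;
}

/* Add a record via the catalog, dropping the current log on -ENOENT. */
static int toy_cat_add_rec(struct toy_cat *cat)
{
	int rc;

	if (cat->current == NULL) {
		if (cat->next >= TOY_NLOGS)
			return -ENOSPC;
		cat->current = &cat->logs[cat->next++];
	}

	rc = toy_log_write(cat->current);
	if (rc == -ENOENT)
		cat->current = NULL; /* later writers move on to the next log */

	return rc;
}
```

In this model the first write against a broken log still fails, but the failure no longer repeats for every later operation, which is the concern raised in the earlier comment.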
