[LU-13582] open/rmdir/close vs xattr set race may cause a transaction exec abort Created: 18/May/20  Updated: 02/Feb/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Alexander Zarochentsev Assignee: Alexander Zarochentsev
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

An attempt to close a file handle to an orphan striped directory may race with a delayed xattr set or a plain chown. Because mdt_close() takes no DLM locks, LDLM does not prevent the race, and it is only detected at the transaction exec phase. For a striped directory, the xattr set transaction is a distributed operation whose rollback is complex and potentially buggy, so it should be avoided whenever possible.

The following debug log messages show the transaction exec error:

/tmp/test_logs/1589821660/sanityn.test_107.debug_log.devvm1.1589821723.log:00000020:00000040:1.0:1589821723.399086:0:14322:0:(out_lib.c:752:out_tx_xattr_set_exec()) lustre-MDT0002-osd: set xattr buf ffff8d9e52e05548 name user.17 flag 0
/tmp/test_logs/1589821660/sanityn.test_107.debug_log.devvm1.1589821723.log:00000020:02000000:1.0:1589821723.399089:0:14322:0:(out_lib.c:756:out_tx_xattr_set_exec()) No object found [0x280000402:0x2:0x0]
/tmp/test_logs/1589821660/sanityn.test_107.debug_log.devvm1.1589821723.log:00000020:00000001:1.0:1589821723.399098:0:14322:0:(out_lib.c:843:out_tx_xattr_set_exec()) Process leaving via out (rc=18446744073709551614 : -2 : 0xfffffffffffffffe)
/tmp/test_logs/1589821660/sanityn.test_107.debug_log.devvm1.1589821723.log:00000020:00000040:1.0:1589821723.399100:0:14322:0:(out_lib.c:847:out_tx_xattr_set_exec()) lustre-MDT0002-osd: insert xattr set reply ffff8d9de08361e0 index 0: rc = -2

I got them with a reproducer that I will post later.



 Comments   
Comment by Gerrit Updater [ 18/May/20 ]

Alexander Zarochentsev (alexander.zarochentsev@hpe.com) uploaded a new patch: https://review.whamcloud.com/38652
Subject: LU-13582 tests: reproducer
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5e829eb0636e62bd99480a28a61574954517009a

Comment by Gerrit Updater [ 02/Feb/24 ]

"Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53891
Subject: LU-13582 dne: make tx abort verbose
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6eb98aef2d7fffa6ff84d33130484f2386429264

Generated at Sat Feb 10 03:02:29 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.