[LU-15869] In a striped directory, there is a cache inconsistency problem on multiple clients when renaming files Created: 18/May/22  Updated: 18/May/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.4
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Hui Chen Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Attachments: PNG File image-2022-05-18-10-07-27-084.png     PNG File image-2022-05-18-10-15-46-006.png     PNG File image-2022-05-18-10-19-08-816.png     PNG File image-2022-05-18-10-26-53-310.png     PNG File image-2022-05-18-10-27-01-414.png    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

 

This problem can be reproduced reliably. The reproduction steps are as follows:

1. Create a striped directory on client A

lfs mkdir -c -1 testdir

2. Create two files on client A

cd testdir
echo "a" > file
echo "b" > tmpfile

3. Stat the files just created, on client A

stat file
stat tmpfile

4. Stat the same files on client B

cd testdir
stat file
stat tmpfile

5. Rename file to file.1 on client A

mv file file.1

6. Stat file on both clients

# client A
stat file
# client B
stat file

7. Rename tmpfile to file on client A

mv tmpfile file

8. Stat file on both clients

# client A
stat file
# client B
stat file

9. Rename file to file.2 on client A

mv file file.2

10. The results of all the above steps appear to be correct. However, when file is then statted on both clients, client B returns the wrong dentry for file

# client A
stat file
# client B: stat still returns the dentry of the file that has already been renamed
stat file
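
To make the expected result of step 10 explicit, the check can be wrapped in a small helper and run on each client. This is only a sketch: check_absent is a hypothetical helper name, not part of Lustre, and it assumes GNU stat (for -c %i). Since file was renamed to file.2 in step 9, the expectation is that the name file no longer resolves on either client.

```shell
#!/bin/sh
# check_absent NAME (hypothetical helper): report whether NAME still resolves.
# Prints "absent" and returns 0 when stat fails on NAME.
# Prints "STALE inode=<n>" and returns 1 when NAME can still be statted.
# Note: any stat failure (not only ENOENT) is reported as "absent".
check_absent() {
    if ino=$(stat -c %i -- "$1" 2>/dev/null); then
        echo "STALE inode=$ino"
        return 1
    fi
    echo "absent"
    return 0
}
```

Running check_absent file inside testdir on client A and on client B after step 9 should print "absent" on both if the dentry cache is consistent. A "STALE" line on client B would correspond to the behaviour described in this ticket.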

 

 

Screenshots of all steps are attached to this ticket.

Some clues found so far:

1. In the last stat on client B, the dentry of file is read from the cache.

2. Following the client processing flow, the cached dentry is not released.

3. SystemTap trace

4. It looks like some locks are not cancelled. Client B is granted a lock with bits=27 during its penultimate stat of file (step 8). When client B later receives the ldlm_callback caused by the rename, there is no callback for that bits=27 lock.
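
For reference, the bits value in clue 4 can be decoded into inodebits lock names with a few lines of shell. This is a sketch under assumptions: decode_ibits is a hypothetical helper, the bit values are the MDS_INODELOCK_* constants as I understand them from lustre_idl.h (please verify against the 2.12.4 tree), and the ticket does not say whether 27 is decimal or hexadecimal, so both readings are shown.

```shell
#!/bin/sh
# decode_ibits VALUE (hypothetical helper): print the names of the MDS
# inodebits set in VALUE. VALUE may be decimal (27) or hex (0x27).
# The mask:name pairs below are assumed from lustre_idl.h (MDS_INODELOCK_*).
decode_ibits() {
    bits=$(( $1 ))
    out=""
    for pair in 1:LOOKUP 2:UPDATE 4:OPEN 8:LAYOUT 16:PERM 32:XATTR 64:DOM; do
        mask=${pair%%:*}   # numeric mask before the colon
        name=${pair#*:}    # bit name after the colon
        if [ $(( bits & mask )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}

decode_ibits 27     # decimal reading of bits=27
decode_ibits 0x27   # hexadecimal reading of bits=27
```

Under the decimal reading (0x1b) the lock would cover LOOKUP, UPDATE, LAYOUT and PERM, which includes the LOOKUP bit that protects the dentry. If that reading is right, a rename that does not revoke this lock on client B would be consistent with the stale dentry seen in step 10.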

 


Generated at Sat Feb 10 03:21:59 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.