[LU-7383]  migrate failed: Device or resource busy (-16) after "ls" dir Created: 04/Nov/15  Updated: 02/Feb/18  Resolved: 09/Dec/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: None
Environment:

client and server: lustre-master build #3226 RHEL7


Issue Links:
Related
is related to LU-10597 lfs mv vs stat deadlock Open
Severity: 3

 Description   

Following the directory migration test plan, here is what I did:

1. Setup lustre with 4 MDTs, 4 OSTs and 1 client.
2. Create 5 directories /mnt/lustre/migrate{1..5} and 100 files under each directory on MDT0
3. Create another directory /mnt/lustre/other_dir, also on MDT0, then create some symbolic links and hard links in it that point to files under /mnt/lustre/migrate{1..5}
4. ls /mnt/lustre/other_dir
5. Migrate /mnt/lustre/migrate2 to MDT1 with:
lfs migrate -m 1 /mnt/lustre/migrate2

The migration failed with:

[root@onyx-27 lustre]# lfs migrate -m 1 /mnt/lustre/migrate2
/mnt/lustre/migrate2/1 migrate failed: Device or resource busy (-16)
error: migrate: migrate file '/mnt/lustre/migrate2' failed

After I talked with Di, he said there are two ways to work around this problem.
Either touch another file under other_dir:

[root@onyx-27 lustre]# touch other_dir/a
[root@onyx-27 lustre]# lfs migrate -m 1 /mnt/lustre/migrate2
[root@onyx-27 lustre]#

or unmount and remount the client.



 Comments   
Comment by Di Wang [ 04/Nov/15 ]

The cause is that the lock on the migrating object is cached on the client side by

ls /mnt/lustre/other_dir

During migration, to avoid deadlock, we use mdt_object_lock_try to enqueue locks on all of the objects in the linkEA of a file with multiple links, and it returns -EBUSY in this case.

So there are two options:

1. If lock_try fails, enqueue the lock with the normal mdt_object_lock. However, this will cause deadlocks, especially if you add migration to racer. Sigh, I wish we had some way to order the locks.

2. Leave it as it is now: the lock on the migrating object is cached on the client side, so the object is indeed busy as far as migration is concerned (though this is a bit subtle). Note: this only happens for files with multiple links. Normal files should be fine.
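
The try-lock behaviour described in this comment can be illustrated with a small model. This is a Python sketch, not Lustre code: `try_lock_all` and `unlock_all` are hypothetical helpers, and `threading.Lock` stands in for the real locks only loosely. It shows why a try-lock pass fails fast with -EBUSY instead of blocking, which is what rules out deadlock but also produces the error in this report.

```python
import errno
import threading


def try_lock_all(locks):
    """Take every lock without blocking, or take none.

    Returns 0 on success and -errno.EBUSY if any lock is already held.
    This models the try-lock idea only; it is not Lustre code.
    """
    taken = []
    for lock in locks:
        if not lock.acquire(blocking=False):
            # Someone else holds this lock: back out instead of waiting,
            # so two callers can never end up waiting on each other.
            for held in reversed(taken):
                held.release()
            return -errno.EBUSY
        taken.append(lock)
    return 0


def unlock_all(locks):
    """Release locks previously taken by a successful try_lock_all()."""
    for lock in reversed(locks):
        lock.release()


# Hypothetical stand-ins for the objects named in a multi-link file's linkEA.
parents = [threading.Lock(), threading.Lock()]

parents[1].acquire()              # models the lock cached by "ls other_dir"
busy_rc = try_lock_all(parents)   # fails fast; parents[0] is released again

parents[1].release()              # models the cached lock being dropped
ok_rc = try_lock_all(parents)     # every lock is free now
unlock_all(parents)
```

In this model the first call returns -EBUSY without waiting and the second returns 0 once the competing lock is gone, which loosely mirrors the workarounds in the description (dropping the cached lock before retrying the migration).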

Comment by Oleg Drokin [ 04/Nov/15 ]

You could probably add an EBUSY handler inside lfs migrate that attempts to invalidate the lock once and then repeats the operation, in the hope that it works, before returning an error.
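
A minimal sketch of the retry-once idea suggested here, written in Python for illustration. The names `migrate_with_retry`, `migrate`, and `invalidate_lock` are hypothetical, and the callables are supplied by the caller; this is not how lfs or the MDT implements it.

```python
import errno


def migrate_with_retry(migrate, invalidate_lock, path, retries=1):
    """Call migrate(path); on EBUSY, invalidate the lock and try again.

    migrate and invalidate_lock are hypothetical callables. Any error
    other than EBUSY, or EBUSY on the final attempt, is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return migrate(path)
        except OSError as exc:
            if exc.errno != errno.EBUSY or attempt == retries:
                raise
            invalidate_lock(path)


# Fake migrate that is busy on the first call and succeeds on the second.
calls = []


def fake_migrate(path):
    calls.append(path)
    if len(calls) == 1:
        raise OSError(errno.EBUSY, "Device or resource busy")
    return 0


invalidated = []
result = migrate_with_retry(fake_migrate, invalidated.append,
                            "/mnt/lustre/migrate2")
```

Retrying only once keeps the worst case bounded: if the object is still busy after the lock has been invalidated, the caller sees the original EBUSY error.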

Comment by Gerrit Updater [ 05/Nov/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/17048
Subject: LU-7383 mdt: retry for busy lock during migration
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 51bffb2cbe7df7f76a7b66293f15c8566c86feaf

Comment by Gerrit Updater [ 09/Dec/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17048/
Subject: LU-7383 mdt: retry for busy lock during migration
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5cb0a721aea53ffc8230190c3a0b35e71a47d35b

Comment by Peter Jones [ 09/Dec/15 ]

Landed for 2.8
