[LU-1882] conf-sanity test_45: umount2: Device or resource busy Created: 11/Sep/12 Updated: 18/Nov/21 Resolved: 08/Oct/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0, Lustre 2.4.0, Lustre 2.4.1, Lustre 2.5.0, Lustre 2.6.0, Lustre 2.5.1, Lustre 2.7.0, Lustre 2.5.3 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Bob Glossman (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 5222 |
| Description |
|
This issue was created by maloo for yujian <yujian@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/aaac51d6-fa70-11e1-887d-52540035b04c.

Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/16

The sub-test test_45 failed with the following error:

Stopping /mnt/mds1 (opts:-f) on client-30vm3
CMD: client-30vm3 umount -d -f /mnt/mds1
CMD: client-30vm3 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
sleep 60 sec
CMD: client-30vm6.lab.whamcloud.com lctl set_param fail_loc=0x50f
fail_loc=0x50f
sleep 10 sec
manual umount lustre on /mnt/lustre....
CMD: client-30vm6.lab.whamcloud.com umount -d --force /mnt/lustre
umount2: Device or resource busy
umount: /mnt/lustre: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
df: umount2: Device or resource busy
conf-sanity test_45: @@@@@@ FAIL: test_45 failed with 3
`/mnt/lustre': Cannot send after transport endpoint shutdown
df: no file systems processed
Info required for matching: conf-sanity 45 |
| Comments |
| Comment by Jian Yu [ 11/Sep/12 ] |
|
This test passed in another test session against the same distro/arch and Lustre build: |
| Comment by Jian Yu [ 09/Aug/13 ] |
|
More instances on Lustre b2_4 branch: |
| Comment by Peter Jones [ 12/Aug/13 ] |
|
Bob is looking at this one |
| Comment by Bob Glossman (Inactive) [ 12/Aug/13 ] |
|
It seems like all the reported instances happen in 'full' tests, never in review tests. I suspect something in the environment gets used up only in full tests. Even in full tests the failure is quite intermittent. |
| Comment by Peter Jones [ 16/Aug/13 ] |
|
Thanks Bob. It sounds like this is lower priority for now |
| Comment by Jian Yu [ 09/Sep/13 ] |
|
Lustre client: http://build.whamcloud.com/job/lustre-b2_1/215/ (2.1.6) conf-sanity test 45 hit the same failure: |
| Comment by Jian Yu [ 03/Dec/13 ] |
|
More instances on Lustre b2_4 branch: |
| Comment by Jian Yu [ 03/Dec/13 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/4/ The same failure occurred: |
| Comment by Jian Yu [ 14/Dec/13 ] |
|
More instance on Lustre b2_4 branch: |
| Comment by Jian Yu [ 06/Jan/14 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/5/ The same failure occurred: |
| Comment by Jian Yu [ 12/Jan/14 ] |
|
More instances on Lustre b2_5 branch: |
| Comment by Sarah Liu [ 07/Feb/14 ] |
|
Hit this error in master tag-2.5.55 DNE testing.

https://maloo.whamcloud.com/test_sets/b84701bc-8e06-11e3-b27d-52540035b04c

Client console shows:

21:29:42:Lustre: Skipped 127 previous similar messages
21:29:42:LustreError: 31078:0:(import.c:323:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: rc = -110 waiting for callback (3 != 0)
21:29:42:LustreError: 31078:0:(import.c:323:ptlrpc_invalidate_import()) Skipped 127 previous similar messages
21:29:42:LustreError: 31078:0:(import.c:349:ptlrpc_invalidate_import()) @@@ still on sending list req@ffff88007a8e3800 x1459084469469476/t0(0) o38->lustre-MDT0000-mdc-ffff8800737cec00@10.10.4.198@tcp:12/10 lens 400/544 e 0 to 0 dl 1391491497 ref 1 fl Unregistering:EN/0/ffffffff rc -5/-1
21:29:42:LustreError: 31078:0:(import.c:349:ptlrpc_invalidate_import()) Skipped 127 previous similar messages
21:29:42:LustreError: 31078:0:(import.c:365:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: RPCs in "Unregistering" phase found (3). Network is sluggish? Waiting them to error out.
21:29:42:LustreError: 31078:0:(import.c:365:ptlrpc_invalidate_import()) Skipped 127 previous similar messages
21:29:42:LustreError: 30653:0:(mdc_locks.c:917:mdc_enqueue()) ldlm_cli_enqueue: -5
21:29:42:LustreError: 30653:0:(vvp_io.c:1230:vvp_io_init()) lustre: refresh file layout [0x200002b10:0x1:0x0] error -5.
21:29:42:LustreError: 30730:0:(mdc_locks.c:917:mdc_enqueue()) ldlm_cli_enqueue: -5
21:29:42:LustreError: 30730:0:(file.c:3087:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -5
21:29:42:Lustre: setting import lustre-MDT0001_UUID INACTIVE by administrator request
21:29:42:Lustre: 29951:0:(llite_lib.c:2512:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.10.4.198@tcp:/lustre/fid: [0x200002b10:0x1:0x0]/ may get corrupted (rc -108)
21:29:42:LustreError: 30730:0:(file.c:3087:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -108
21:29:42:Lustre: DEBUG MARKER: /usr/sbin/lctl mark conf-sanity test_45: @@@@@@ FAIL: test_45 failed with 3
21:29:42:Lustre: DEBUG MARKER: conf-sanity test_45: @@@@@@ FAIL: test_45 failed with 3 |
| Comment by Jian Yu [ 07/Mar/14 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/39/ (2.5.1 RC1) The same failure occurred: |
| Comment by Sarah Liu [ 25/Mar/14 ] |
|
Also hit in interop testing between a 2.5.1 server and a master client (build #1945): https://maloo.whamcloud.com/test_sets/18b5b81c-b243-11e3-a93f-52540035b04c |
| Comment by Jian Yu [ 31/Aug/14 ] |
|
Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/86/ (2.5.3 RC1) The same failure occurred: https://testing.hpdd.intel.com/test_sets/1c5fee1c-3086-11e4-a3d9-5254006e85c2 |
| Comment by Jian Yu [ 02/Dec/14 ] |
|
More instance on Lustre b2_5 branch: |
| Comment by Sarah Liu [ 16/Dec/14 ] |
|
saw this failure again in interop testing between 2.6.0 server and master client: https://testing.hpdd.intel.com/test_sets/76457df4-7fb2-11e4-a9c0-5254006e85c2 |
| Comment by James Nunez (Inactive) [ 11/Feb/15 ] |
|
I see this failure with lustre-master tag 2.6.93. Results are at https://testing.hpdd.intel.com/test_sessions/fff27cc4-addd-11e4-a0b6-5254006e85c2 |
| Comment by James Nunez (Inactive) [ 06/Mar/15 ] |
|
I hit this issue again with 2.7.0-RC4. Results are at https://testing.hpdd.intel.com/test_sessions/193dce6a-c42f-11e4-a0ef-5254006e85c2 |
| Comment by Sushant Mane [ 25/Feb/16 ] |
umount2: Device or resource busy
umount: /mnt/lustre: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy

The reason the umount fails with EBUSY (-16) is a race with "df -h $MOUNT". When "df -h $MOUNT" is executed, mnt->mnt_count of $MOUNT is incremented by 1, so mnt->mnt_count becomes 2. df calls fstat(), which internally calls ll_getattr(). As the MDS is down, ll_getattr() keeps waiting for a reply from the MDS. Then, when "umount -df $MOUNT" is executed, mnt->mnt_count becomes 3. Because this is a force umount, ll_umount_begin() is called, but by the time it completes df is still blocked in ll_getattr() and still holds its mnt_count reference, so the busy check in do_umount() fails.

FAIL case:
fs/namespace.c: do_umount()

	retval = -EBUSY;
	if (flags & MNT_DETACH || !propagate_mount_busy(mnt, 2)) {
		if (!list_empty(&mnt->mnt_list))
			umount_tree(mnt, 1, &umount_list);
		retval = 0;
	}
As a result, sys_umount() returns EBUSY. |
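To make the reference counting above concrete, here is a small user-space simulation (an illustration only; mnt_count, mount_is_busy() and the threshold of 2 stand in for the kernel's mnt->mnt_count and propagate_mount_busy(mnt, 2) and are not real kernel APIs):

#include <stdio.h>

static int mnt_count = 1;		/* reference held by the mount itself */

static int mount_is_busy(void)
{
	/* do_umount() tolerates 2 references: the mount's own and the
	 * umount syscall's path lookup */
	return mnt_count > 2;
}

int main(void)
{
	mnt_count++;	/* df -h $MOUNT: user_path_at() takes a reference,
			 * then df blocks in ll_getattr() waiting on the MDS */
	mnt_count++;	/* umount -df $MOUNT: the umount path lookup */

	/* ll_umount_begin() invalidates the RPCs but cannot drop the
	 * reference that the blocked df still holds */
	printf("first umount: %s (mnt_count=%d)\n",
	       mount_is_busy() ? "EBUSY" : "ok", mnt_count);

	mnt_count--;	/* df finally errors out: path_put() drops its ref */
	printf("retry umount: %s (mnt_count=%d)\n",
	       mount_is_busy() ? "EBUSY" : "ok", mnt_count);
	return 0;
}

This matches the retry behaviour reported below: once the blocked df errors out and drops its reference, a second "umount -f" succeeds.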
| Comment by Rahul Deshmukh (Inactive) [ 01/Apr/16 ] |
|
About the problem: When I started investigating this issue, it appeared that there is some kind of race between ll_umount and ll_getattr. Here we stop the MDS and then run "df -h $MOUNT &", which makes the VFS increment the mnt_count and call ll_getattr(). We then issue the command to forcefully umount the client. If ll_getattr() finishes before the umount, the umount succeeds; otherwise it fails with EBUSY. If we try to umount the client again after some time, ll_getattr() has completed and we are able to umount the client.

I am able to re-create this bug on my local setup and tried the following script change to verify:

- manual_umount_client --force || return 3
+ manual_umount_client --force
+ rc=$?
+ # Try again as getattr might have not done yet
+ if [ $rc != 0 ]; then
+	echo "Client umount fails, try again !!"
+	log "try again: sleep 10 sec"
+	sleep 10
+	manual_umount_client --force || return 3
+ fi

Output collected from our test board:

sleep 10 sec
manual umount lustre on /mnt/lustre....
umount2: Device or resource busy <======================= EBUSY: as getattr not done yet and umount reached first
umount: /mnt/lustre: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy
Client umount fails, try again !! <============================ Tried again and success, TEST PASS
try again: sleep 10 sec
df: `/mnt/lustre': Cannot send after transport endpoint shutdown
df: no file systems processed
manual umount lustre on /mnt/lustre....
fail_loc=0x0
start mds service on fre0205
Now about the fix:

1. We can fix this at the test case level as described above, i.e. the admin needs to handle it.

OR

2. We need to handle it somewhere in ll_umount_begin()? Or any other thoughts?

ll_umount_begin()
{
	:
	:
	/* Really, we'd like to wait until there are no requests outstanding,
	 * and then continue. For now, we just invalidate the requests,
	 * schedule() and sleep one second if needed, and hope.
	 */
	schedule(); <===========
	EXIT;
}
|
| Comment by Andreas Dilger [ 13/Apr/16 ] |
|
My preference would be to fix this internal to Lustre rather than at the test script level, so that users who are calling "umount -f" will get what they expect (unmounted filesystem) instead of having to retry. This test was made to verify that "umount -f" with a blocked RPC would properly kill the RPCs in flight and unmount, so changing the test is just hiding the problem.

It would seem that the problem is in obd_iocontrol(IOC_OSC_SET_ACTIVE)->ptlrpc_set_import_active->ptlrpc_invalidate_import() not cleaning up the blocked RPC properly, since the comment for that function states:

/**
 * This function will invalidate the import, if necessary, then block
 * for all the RPC completions, and finally notify the obd to
 * invalidate its state (ie cancel locks, clear pending requests,
 * etc).
 */
void ptlrpc_invalidate_import(struct obd_import *imp)

which clearly is not happening properly in this case.

Minor note - any patch for this ticket should fix conf-sanity.sh test_45() to reference OBD_FAIL_PTLRPC_LONG_REPL_UNLINK instead of OBD_FAIL_PTLRPC_LONG_UNLINK, which no longer exists. |
| Comment by Rahul Deshmukh (Inactive) [ 13/Apr/16 ] |
|
Andreas, thanks for looking into it and for the suggestions.

> It would seem that the problem is in obd_iocontrol(IOC_OSC_SET_ACTIVE)

Actually, I have investigated that portion and there is no problem with ptlrpc_invalidate_import(); the race is not with the RPC, which is already killed. The problem here is the fact that vfs_stat (not ll_getattr) has not exited yet and is still holding its mnt_count reference:

vfs_stat
\__ vfs_fstatat
\__ user_path_at --------> mntget i.e. atomic_inc(&mnt->mnt_count);
|
|\__ vfs_getattr
| \__ inode->i_op->getattr --> ll_getattr
|
\__ path_put ----------> mntput i.e. atomic_dec_and_lock(&mnt->mnt_count..
do_umount()
{
	:
	:
	/*
	 * If we may have to abort operations to get out of this
	 * mount, and they will themselves hold resources we must
	 * allow the fs to do things. In the Unix tradition of
	 * 'Gee thats tricky lets do it in userspace' the umount_begin
	 * might fail to complete on the first run through as other tasks
	 * must return, and the like. Thats for the mount program to worry
	 * about for the moment.
	 */
	if (flags & MNT_FORCE && sb->s_op->umount_begin) {
		sb->s_op->umount_begin(sb); ------> ll_umount_begin()
	}
	:
	:
	retval = -EBUSY;
	if (flags & MNT_DETACH || !propagate_mount_busy(mnt, 2)) { <----- If mnt_count not dec then retval = -EBUSY
		if (!list_empty(&mnt->mnt_list))
			umount_tree(mnt, 1, &umount_list);
		retval = 0;
	}
	:
	return retval;
}
So one possible place to handle this is ll_umount_begin(), where we could wait?

ll_umount_begin()
{
	:
	:
	/* Really, we'd like to wait until there are no requests outstanding,
	 * and then continue. For now, we just invalidate the requests,
	 * schedule() and sleep one second if needed, and hope.
	 */
	schedule(); <===========
	EXIT;
}
|
| Comment by Andreas Dilger [ 17/Apr/16 ] |
|
That seems reasonable. The question is what should be waited on here? Is it safe to have an interruptible wait on the mount count? If the statfs is going to finish in some reasonable time, we could just wait. However, if the mount count is elevated for some other reason (e.g. an open file in the filesystem used by a process), then the unmount could hang indefinitely, so we don't want to wait forever. Maybe 5-10s, checking every 0.5s, should be enough? |
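For illustration only, a minimal sketch of the bounded wait Andreas describes above, not the patch that eventually landed. The helper name ll_wait_for_mnt_ref_drop(), the use of mnt_get_count(), and the threshold of 2 references are assumptions; on older kernels the count would be read with atomic_read(&mnt->mnt_count) instead:

#include <linux/delay.h>	/* msleep() */
#include <linux/mount.h>	/* struct vfsmount */

/*
 * Hypothetical helper, called from ll_umount_begin() after the imports
 * have been invalidated: poll the vfsmount reference count for up to
 * ~10s (checking every 0.5s) and return once only the umount's own
 * references remain.  If something else (an open file, a stuck syscall)
 * keeps the count elevated, give up and let do_umount() return -EBUSY
 * as before.
 */
static void ll_wait_for_mnt_ref_drop(struct vfsmount *mnt)
{
	int retries = 20;		/* 20 * 500ms = 10s upper bound */

	while (retries-- > 0 && mnt_get_count(mnt) > 2) {
		/* the blocked df should error out of ll_getattr() soon
		 * after import invalidation, dropping its reference via
		 * path_put() -> mntput() */
		msleep(500);
	}
}

The bound keeps a genuinely busy filesystem from turning "umount -f" into an indefinite hang, while still giving the blocked statfs time to drop its reference.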
| Comment by Gerrit Updater [ 09/May/16 ] |
|
Rahul Deshmukh (rahul.deshmukh@seagate.com) uploaded a new patch: http://review.whamcloud.com/20061 |
| Comment by Rahul Deshmukh (Inactive) [ 13/Jul/16 ] |
|
Can someone review the patch? |
| Comment by Jian Yu [ 14/Jul/16 ] |
|
Hi Rahul, while you are updating the patch to address the review comments from Andreas, could you please incorporate the changes from http://review.whamcloud.com/15702 into the patch? Thank you. |
| Comment by Rahul Deshmukh (Inactive) [ 18/Jul/16 ] |
Sure, will do that. |
| Comment by Lokesh Nagappa Jaliminche (Inactive) [ 08/Sep/16 ] |
|
The problem with the previous patch is that it was unable to get the vfsmount from client_common_fill_super. |
| Comment by Gerrit Updater [ 08/Oct/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20061/ |
| Comment by Peter Jones [ 08/Oct/16 ] |
|
Landed for 2.9 |