[LU-6116] file system can not be umounted after racer. Created: 14/Jan/15  Updated: 15/Jan/15  Resolved: 14/Jan/15

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Di Wang
Assignee: WC Triage
Resolution: Not a Bug
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 17037

Description

I ran racer with a single MDT, and the file system cannot be unmounted afterwards.

[root@testnode tests]# MDSCOUNT=4 sh llmountcleanup.sh 
Stopping clients: testnode /mnt/lustre (opts:-f)
Stopping client testnode /mnt/lustre opts:-f
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second


Comments
Comment by Jinshan Xiong (Inactive) [ 14/Jan/15 ]

Did you check whether a lot of processes are left sleeping after the racer run? If so, you may have hit LU-6095.

Are you working on LU-6088?

Comment by Di Wang [ 14/Jan/15 ]

No, I did not see any threads left. Yes, I am working on LU-6088, LU-4712, and several other tickets found during racer.

Comment by John Hammond [ 14/Jan/15 ]

You used MDSCOUNT=4 above. Are you sure about the single MDT?

There was a bug which leaked a mount reference from sys_link() in one of the RHEL 6.* kernels. What kernel version are you using?

Comment by Oleg Drokin [ 14/Jan/15 ]

I run a single-MDT configuration in a loop all the time and have zero problems with unmount.
I am also suspicious of MDSCOUNT=4.

Comment by Di Wang [ 14/Jan/15 ]

Sorry, I posted the wrong console message here. But I did try racer with a single MDT to see whether this is specific to DNE, and I saw it on a single MDT as well. I used linux-2.6.32-431.3.1.el6.x86_64.

Comment by Di Wang [ 14/Jan/15 ]

I did run racer with this change, though, as John suggested. Anyway, I will check again.

diff --git a/lustre/tests/racer/file_create.sh b/lustre/tests/racer/file_create.sh
index e615365..8d565a4 100755
--- a/lustre/tests/racer/file_create.sh
+++ b/lustre/tests/racer/file_create.sh
@@ -9,7 +9,8 @@ OSTCOUNT=${OSTCOUNT:-$(lfs df $DIR 2> /dev/null | grep -c OST)}
 while /bin/true ; do
        file=$((RANDOM % MAX))
        # $RANDOM is between 0 and 32767, and we want $blockcount in 64kB units
-       blockcount=$((RANDOM * MAX_MB / 32 / 64))
+#      blockcount=$((RANDOM * MAX_MB / 32 / 64))
+       blockcount=$((RANDOM % 4))
        stripecount=$((RANDOM % (OSTCOUNT + 1)))
        [ $OSTCOUNT -gt 0 ] &&
                lfs setstripe -c $stripecount $DIR/$file 2> /dev/null
diff --git a/lustre/tests/racer/racer.sh b/lustre/tests/racer/racer.sh
index deef18e..bb9b4ac 100755
--- a/lustre/tests/racer/racer.sh
+++ b/lustre/tests/racer/racer.sh
@@ -12,12 +12,11 @@ NUM_THREADS=${NUM_THREADS:-3}

 mkdir -p $DIR

-RACER_PROGS="file_create dir_create file_rm file_rename file_link file_symlink \
-file_list file_concat file_exec file_chown file_chmod file_mknod file_truncate \
-file_delxattr file_getxattr file_setxattr"
+RACER_PROGS="file_create dir_create file_rm file_link file_symlink \
+file_list file_concat file_exec file_chown file_chmod file_mknod file_truncate"

 if [ $MDSCOUNT -gt 1 ]; then
-       RACER_PROGS="${RACER_PROGS} dir_remote dir_migrate"
+       RACER_PROGS="${RACER_PROGS} dir_remote"
 fi

 racer_cleanup()

Comment by John Hammond [ 14/Jan/15 ]

Try removing file_link from RACER_PROGS.

Comment by John Hammond [ 14/Jan/15 ]

2.6.32-431.3.1.el6 leaks paths in the -ESTALE retry case in sys_unlinkat(). Please update your kernel and reopen if you still see this behavior.

Comment by Di Wang [ 15/Jan/15 ]

Yes, upgrading to 2.6.32-504.3.3.el6 fixes this problem. Thanks.
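
For anyone hitting the same symptom, the kernel comparison discussed above can be sketched as a small shell check. This is only an illustration under stated assumptions: `kernel_at_least` is a hypothetical helper, not part of the Lustre test suite, and the threshold is simply the release reported as working in this ticket; the first release that actually carries the fix may be earlier. It relies on `sort -V` for version ordering.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Lustre test suite): succeeds when
# kernel release $1 is at least release $2 under version ordering.
# Relies on `sort -V` (version sort) being available.
kernel_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Assumed threshold: the release reported as working in this ticket.
# The first release that actually carries the fix may be earlier.
known_good="2.6.32-504.3.3.el6"

# Strip the architecture suffix so both strings have the same shape.
running=$(uname -r)
running=${running%.x86_64}

if kernel_at_least "$running" "$known_good"; then
    echo "kernel $running is at or past $known_good"
else
    echo "kernel $running predates $known_good; consider upgrading"
fi
```

On the kernel from the original report (2.6.32-431.3.1.el6) this takes the second branch, since 431 sorts before 504.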
