[LU-6116] file system can not be umounted after racer. Created: 14/Jan/15 Updated: 15/Jan/15 Resolved: 14/Jan/15 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Di Wang | Assignee: | WC Triage |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 17037 |
| Description |
|
I ran racer with a single MDT, and the file system could not be unmounted afterward.
[root@testnode tests]# MDSCOUNT=4 sh llmountcleanup.sh
Stopping clients: testnode /mnt/lustre (opts:-f)
Stopping client testnode /mnt/lustre opts:-f
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second
/mnt/lustre is still busy, wait one second |
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 14/Jan/15 ] |
|
Did you check whether a lot of processes are left sleeping after the racer run? In that case, you may hit Are you working on |
| Comment by Di Wang [ 14/Jan/15 ] |
|
No, I did not see any threads left. Yes, 6088, 4712 and several other tickets found during racer. |
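The check for leftover processes discussed above can be sketched roughly as follows. This is an illustration only, not part of the Lustre test suite: the function names `list_dstate` and `report_busy` are hypothetical, and `/mnt/lustre` is simply the mount point shown in the console output of the description. It prints processes in uninterruptible sleep (state `D`) and, if `fuser` from psmisc is installed, processes that still hold the mount point.

```shell
#!/bin/bash
# Hypothetical helper: read "PID STAT COMMAND" lines on stdin and print
# only those whose state starts with D (uninterruptible sleep).
list_dstate() {
    awk '$2 ~ /^D/ { print }'
}

# Hypothetical helper: report what might be keeping a mount point busy.
# $1 is the mount point (assumed here to be /mnt/lustre).
report_busy() {
    local mnt="$1"

    # Processes stuck in uninterruptible sleep, if ps is available.
    if command -v ps >/dev/null 2>&1; then
        ps -eo pid=,stat=,comm= | list_dstate
    fi

    # Processes holding files on the mount point, if fuser is available.
    if command -v fuser >/dev/null 2>&1; then
        fuser -vm "$mnt" 2>&1 || true
    fi
}

report_busy "${1:-/mnt/lustre}"
```

If both commands print nothing while the unmount still reports the mount as busy, no user process is holding it, which is consistent with a reference leaked inside the kernel rather than a leftover racer process.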
| Comment by John Hammond [ 14/Jan/15 ] |
|
You used MDSCOUNT=4 above. Are you sure about the single MDT? There was a bug which leaked a mount reference from sys_link() in one of the RHEL 6.* kernels. What kernel version are you using? |
| Comment by Oleg Drokin [ 14/Jan/15 ] |
|
I run a single-MDT configuration in a loop all the time and have zero problems with unmount. |
| Comment by Di Wang [ 14/Jan/15 ] |
|
Sorry, I posted the wrong console message here. But I did try racer with a single MDT to see whether this is specific to DNE, and I saw the same problem on a single MDT as well. I used linux-2.6.32-431.3.1.el6.x86_64. |
| Comment by Di Wang [ 14/Jan/15 ] |
|
Though I ran racer with this change, as John suggested. Anyway, I will check again.
diff --git a/lustre/tests/racer/file_create.sh b/lustre/tests/racer/file_create.sh
index e615365..8d565a4 100755
--- a/lustre/tests/racer/file_create.sh
+++ b/lustre/tests/racer/file_create.sh
@@ -9,7 +9,8 @@ OSTCOUNT=${OSTCOUNT:-$(lfs df $DIR 2> /dev/null | grep -c OST)}
while /bin/true ; do
file=$((RANDOM % MAX))
# $RANDOM is between 0 and 32767, and we want $blockcount in 64kB units
- blockcount=$((RANDOM * MAX_MB / 32 / 64))
+# blockcount=$((RANDOM * MAX_MB / 32 / 64))
+ blockcount=$((RANDOM % 4))
stripecount=$((RANDOM % (OSTCOUNT + 1)))
[ $OSTCOUNT -gt 0 ] &&
lfs setstripe -c $stripecount $DIR/$file 2> /dev/null
diff --git a/lustre/tests/racer/racer.sh b/lustre/tests/racer/racer.sh
index deef18e..bb9b4ac 100755
--- a/lustre/tests/racer/racer.sh
+++ b/lustre/tests/racer/racer.sh
@@ -12,12 +12,11 @@ NUM_THREADS=${NUM_THREADS:-3}
mkdir -p $DIR
-RACER_PROGS="file_create dir_create file_rm file_rename file_link file_symlink \
-file_list file_concat file_exec file_chown file_chmod file_mknod file_truncate \
-file_delxattr file_getxattr file_setxattr"
+RACER_PROGS="file_create dir_create file_rm file_link file_symlink \
+file_list file_concat file_exec file_chown file_chmod file_mknod file_truncate"
if [ $MDSCOUNT -gt 1 ]; then
- RACER_PROGS="${RACER_PROGS} dir_remote dir_migrate"
+ RACER_PROGS="${RACER_PROGS} dir_remote"
fi
racer_cleanup()
|
| Comment by John Hammond [ 14/Jan/15 ] |
|
Try removing file_link from RACER_PROGS. |
| Comment by John Hammond [ 14/Jan/15 ] |
|
2.6.32-431.3.1.el6 leaks paths in the -ESTALE retry case in sys_unlinkat(). Please update your kernel and reopen if you still see this behavior. |
| Comment by Di Wang [ 15/Jan/15 ] |
|
Yes, upgrading to 2.6.32-504.3.3.el6 fixed this problem. Thanks. |
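For anyone hitting the same symptom, a pre-flight check on the kernel release might look like the sketch below. It is only a rough aid under stated assumptions: `version_at_least` is a hypothetical helper, the comparison relies on GNU `sort -V`, and `2.6.32-504.3.3.el6` is used as the threshold only because it is the release reported as working in this ticket. The ticket does not say which release first contained the fix, and the comparison is meaningful only for RHEL 6 kernels.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when version $1 is at least version $2,
# using GNU sort -V (version sort) to order the two strings.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Release reported in this ticket as no longer showing the problem.
KNOWN_GOOD="2.6.32-504.3.3.el6"

# Drop the trailing architecture (for example ".x86_64") before comparing.
running="$(uname -r)"
running="${running%.$(uname -m)}"

if version_at_least "$running" "$KNOWN_GOOD"; then
    echo "kernel $running: at or above $KNOWN_GOOD"
else
    echo "kernel $running: older than $KNOWN_GOOD, unmount may hang after racer"
fi
```

With the reporter's original kernel, 2.6.32-431.3.1.el6, the helper returns false and the warning branch is taken.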