[LU-1908] Test failure on test suite racer: mount busy, vfscount=18 Created: 12/Sep/12  Updated: 29/May/17  Resolved: 29/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Lai Siyao
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 10179

 Description   

This issue was created by maloo for yujian <yujian@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/7944d89e-fcb3-11e1-b09c-52540035b04c.

Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/17

After racer test_1 passed, the whole test run hung in the cleanup phase:

Stopping /mnt/ost3 (opts:-f) on client-21
CMD: client-21 umount -d -f /mnt/ost3

Console log on client-21:

11:56:44:Lustre: DEBUG MARKER: grep -c /mnt/ost3' ' /proc/mounts
11:56:44:Lustre: DEBUG MARKER: umount -d -f /mnt/ost3
11:56:44:LustreError: 13768:0:(obd_mount.c:257:server_put_mount()) lustre-OST0002: mount busy, vfscount=18!
11:57:15:Lustre: 13784:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1347389810/real 1347389810]  req@ffff88032a38e800 x1412838553052287/t0(0) o250->MGC10.10.4.20@tcp@10.10.4.20@tcp:26/25 lens 400/544 e 0 to 1 dl 1347389826 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
11:57:15:Lustre: 13784:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
11:57:15:Lustre: Mount still busy with 18 refs after 30 secs.
11:57:46:Lustre: Mount still busy with 18 refs after 60 secs.
11:58:18:Lustre: Mount still busy with 18 refs after 90 secs.
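
For reference, a minimal shell sketch of how this hang can be observed by hand on the affected node. The mount point, commands, and 30-second interval are taken from the log above; the polling loop itself is only illustrative:

# Run the forced umount in the background, as the cleanup phase does:
umount -d -f /mnt/ost3 &

# Poll the kernel log at the same 30-second interval as the messages above:
for i in 1 2 3; do
    sleep 30
    dmesg | grep -E 'mount busy|Mount still busy' | tail -n 2
done

# While the umount is stuck, the target still shows up in /proc/mounts:
grep -c '/mnt/ost3 ' /proc/mounts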


 Comments   
Comment by Jian Yu [ 12/Sep/12 ]

Another instance on b2_3 build #17:
https://maloo.whamcloud.com/test_sets/b6e229d4-fc83-11e1-a4a6-52540035b04c

Comment by Peter Jones [ 13/Sep/12 ]

Lai

Could you please look into this one?

Thanks

Peter

Comment by Lai Siyao [ 14/Sep/12 ]

If I run llmount.sh, then racer.sh, and finally llmountcleanup.sh, it always passes. But if I run racer.sh from the beginning, it may fail with this error (the two sequences are sketched below). I also observed another failure: the mdc device is still referenced at cleanup, so it cannot be removed on the client.

I'll do more testing and check the logs.
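
For reference, the two sequences described above, as they would be run from lustre/tests in the build tree; the exact invocations are assumptions (autotest may drive the scripts differently):

cd lustre/tests

# Sequence A: explicit setup, racer load, explicit teardown.
# This combination always passed.
sh llmount.sh
sh racer.sh
sh llmountcleanup.sh

# Sequence B: racer.sh run from the beginning, doing its own setup
# and teardown. This is the combination that can hit the
# "mount busy, vfscount=N" failure at cleanup.
sh racer.sh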

Comment by Jian Yu [ 15/Sep/12 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/19

https://maloo.whamcloud.com/test_sets/79405a5a-ff04-11e1-bce0-52540035b04c

Comment by Jian Yu [ 17/Sep/12 ]

Lustre build: http://build.whamcloud.com/job/lustre-b2_3/19
Distro/Arch: SLES11SP1/x86_64(client), RHEL6.3/x86_64(server)

The same issue occurred again: https://maloo.whamcloud.com/test_sets/3be06432-005f-11e2-9f3c-52540035b04c

Comment by Lai Siyao [ 17/Sep/12 ]

I added some debug messages, but strangely I can't reproduce it any longer. I've updated to the latest code and will try to reproduce it again.

Comment by Jian Yu [ 17/Sep/12 ]

Another instance: https://maloo.whamcloud.com/test_sets/1e78a072-ff82-11e1-bce0-52540035b04c

Comment by Jian Yu [ 17/Sep/12 ]

The same issue was reported earlier in LU-1705, so we need to close one of the two tickets as a duplicate of the other.

Comment by Lai Siyao [ 19/Sep/12 ]

When I tried to reproduce it, the run often failed on some other issue. Some notes on this issue:

  • the mount refcount is not zero at umount time, and the extra count varies between runs.
  • this happens on OSTs only.

I'll continue testing to collect more information; a first-pass triage for the busy mount is sketched below.
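
As a first pass (a hypothetical sketch, not something actually run in this ticket), standard tools can rule out userspace holders of the busy mount:

fuser -vm /mnt/ost3     # processes with open files or cwd under the mount
lsof +f -- /mnt/ost3    # the same view via lsof, one line per open file

# Both are expected to come up empty here: the references reported as
# "vfscount=18" by server_put_mount() are vfsmount counts held inside the
# kernel, which userspace tools do not see. That points at a kernel-side
# mntget()/mntput() imbalance rather than a stray process.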

Comment by Lai Siyao [ 21/Sep/12 ]

I tried for a whole day to reproduce it, but couldn't. I'll work on this later.

Comment by Peter Jones [ 21/Sep/12 ]

It seems like we should really drop the priority of this one for now.

Comment by Jian Yu [ 08/Oct/12 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/28
Distro/Arch: RHEL6.3/x86_64
The issue occurred again: https://maloo.whamcloud.com/test_sets/d8ba1f7e-0e50-11e2-91a3-52540035b04c

Comment by Jian Yu [ 10/Oct/12 ]

Lustre Tag: v2_3_0_RC2
Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/32
Distro/Arch: RHEL6.3/x86_64(server), SLES11SP1/x86_64(client)

The issue occurred again: https://maloo.whamcloud.com/test_sets/326933b8-129b-11e2-bd97-52540035b04c

Comment by Jian Yu [ 10/Oct/12 ]

Lustre Tag: v2_3_0_RC2
Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/32
Distro/Arch: RHEL6.3/x86_64
This issue occurred again: https://maloo.whamcloud.com/test_sets/529be552-12a2-11e2-bd97-52540035b04c

Comment by Jian Yu [ 10/Oct/12 ]

Lustre Tag: v2_3_0_RC2
Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/32
Distro/Arch: RHEL6.3/x86_64

The issue occurred again: https://maloo.whamcloud.com/test_sets/50c12122-129b-11e2-a23c-52540035b04c

Comment by Jodi Levi (Inactive) [ 10/Oct/12 ]

Reducing from blocker per Oleg's comments in the 2.3 channel.

Comment by Jian Yu [ 15/Oct/12 ]

Lustre Tag: v2_3_0_RC3
Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/36

This issue occurred consistently on both manual and autotest runs:
https://maloo.whamcloud.com/test_sets/9dce3a5a-167d-11e2-80d0-52540035b04c
https://maloo.whamcloud.com/test_sets/edd3d94e-1676-11e2-9c65-52540035b04c
https://maloo.whamcloud.com/test_sets/0fc78996-16af-11e2-962d-52540035b04c
https://maloo.whamcloud.com/test_sets/5974eac8-1698-11e2-afe1-52540035b04c
https://maloo.whamcloud.com/test_sets/a55cdfaa-1697-11e2-962d-52540035b04c

Comment by Andreas Dilger [ 29/May/17 ]

Close old ticket.
