[LU-4638] racer test hung: /mnt/lustre is still busy, wait one second

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • None
    • Affects Version/s: Lustre 2.5.1
    • None

    • Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/26/
      Distro/Arch: RHEL6.5/x86_64 (kernel version: 2.6.32-431.3.1.el6)
    • 3
    • 12688

    Description

      While running the racer test, it hung as follows:

      Stopping client client-24vm1.lab.whamcloud.com /mnt/lustre opts:-f
      Stopping client client-24vm2.lab.whamcloud.com /mnt/lustre opts:-f
      COMMAND   PID USER   FD   TYPE      DEVICE  SIZE/OFF               NODE NAME
      cat     12523 root    1w   REG 1273,181606   3146752 144115205306064232 /mnt/lustre/racer/5
      cat     12523 root    3r   REG 1273,181606    482309 144115205306064282 /mnt/lustre/racer/6/6/6
      dd      16254 root    1w   REG 1273,181606 209190912 144115205255734078 /mnt/lustre/racer/15
      dd      19402 root    1w   REG 1273,181606 240656384 144115205289279515 /mnt/lustre2/racer/3
      dd      19523 root    1w   REG 1273,181606 151068672 144115205255734554 /mnt/lustre/racer/12 (deleted)
      dd      19757 root    1w   REG 1273,181606 142558208 144115205306065539 /mnt/lustre/racer/4
      dd      31064 root    1w   REG 1273,181606 188440576 144115205272508837 /mnt/lustre2/racer/7/14
      COMMAND   PID USER   FD   TYPE      DEVICE  SIZE/OFF               NODE NAME
      dd       3063 root    1w   REG 1273,181606   3146752 144115205306064232 /mnt/lustre/racer/14
      cat      3485 root    1w   REG 1273,181606    482309 144115205306064282 /mnt/lustre/racer/6/6/6 (deleted)
      cat      3485 root    3r   REG 1273,181606   3149824 144115205306064232 /mnt/lustre/racer/5
      dd      14671 root    1w   REG 1273,181606 240656384 144115205289279515 /mnt/lustre/racer/3
      dd      16043 root    1w   REG 1273,181606  93643776 144115205289289486 /mnt/lustre2/racer/2
      dd      17535 root    1w   REG 1273,181606 105392128 144115205272513127 /mnt/lustre/racer/11
      dd      31434 root    1w   REG 1273,181606 263391232 144115205272504946 /mnt/lustre2/racer/14
      /mnt/lustre is still busy, wait one second
      /mnt/lustre is still busy, wait one second
      /mnt/lustre is still busy, wait one second
      /mnt/lustre is still busy, wait one second
      

      Maloo report: https://maloo.whamcloud.com/test_sets/18534f84-977b-11e3-b941-52540035b04c
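      The two lsof listings above (one per client) show which processes still held files open under the mount points when the forced unmount was attempted, which is why the unmount loop kept reporting the mount as busy. The helper below is a hypothetical sketch, not part of the Lustre test framework, for grouping such a listing by process. It assumes lsof's default nine-column layout shown above, where NAME is the last field and may carry a " (deleted)" suffix for files unlinked while still open; it needs Python 3.9+ for str.removesuffix.

```python
from collections import defaultdict


def open_files_by_process(lsof_text: str, mount: str) -> dict:
    """Hypothetical helper: group lsof rows whose NAME lies under `mount`.

    Assumes lsof's default columns:
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
    Returns {(command, pid): [name, ...]}.
    """
    root = mount.rstrip("/")
    prefix = root + "/"
    held = defaultdict(list)
    for line in lsof_text.splitlines():
        # maxsplit=8 keeps NAME (including any " (deleted)" suffix) intact
        fields = line.split(None, 8)
        # Skip header rows and non-lsof lines such as "Stopping client ..."
        if len(fields) < 9 or not fields[1].isdigit():
            continue
        command, pid, name = fields[0], int(fields[1]), fields[8].rstrip()
        # Unlinked-but-open files are reported as "<path> (deleted)"
        path = name.removesuffix(" (deleted)")
        # The trailing-slash prefix keeps /mnt/lustre2 from matching /mnt/lustre
        if path == root or path.startswith(prefix):
            held[(command, pid)].append(name)
    return dict(held)


if __name__ == "__main__":
    # A few rows copied from the first listing in this report
    listing = "\n".join([
        "COMMAND   PID USER   FD   TYPE      DEVICE  SIZE/OFF               NODE NAME",
        "cat     12523 root    1w   REG 1273,181606   3146752 144115205306064232 /mnt/lustre/racer/5",
        "dd      19402 root    1w   REG 1273,181606 240656384 144115205289279515 /mnt/lustre2/racer/3",
        "dd      19523 root    1w   REG 1273,181606 151068672 144115205255734554 /mnt/lustre/racer/12 (deleted)",
    ])
    for (cmd, pid), names in open_files_by_process(listing, "/mnt/lustre").items():
        print(cmd, pid, names)
```

      Filtering on /mnt/lustre reports the cat and dd writers under /mnt/lustre/racer (including the deleted-but-open file), while the /mnt/lustre2 entry is excluded because the prefix match requires a trailing slash.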

          Activity

            adilger Andreas Dilger made changes -
            Link New: This issue is duplicated by LDEV-403 [ LDEV-403 ]
            pjones Peter Jones made changes -
            Resolution New: Duplicate [ 3 ]
            Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment - This is fixed by the latest RHEL6.5 update - LU-5025
            pjones Peter Jones made changes -
            Labels Original: mq214

            bogl Bob Glossman (Inactive) added a comment - So far it looks like running racer on the new, unpatched el6 kernel does in fact work: I am not seeing the racer hang reported in this bug. I will run it a few more times to accumulate more evidence, but I think the kernel bug fix really does fix the problem, as we expected it to.
            bogl Bob Glossman (Inactive) made changes -
            Link New: This issue is related to LU-5025 [ LU-5025 ]

            The brand new kernel update in LU-5025 includes the upstream kernel fix we've been waiting for. It should fix this problem once and for all in both clients and servers. No kernel patching required.

            bogl Bob Glossman (Inactive) added a comment - The brand-new kernel update in LU-5025 includes the upstream kernel fix we've been waiting for. It should fix this problem once and for all on both clients and servers, with no kernel patching required.
            pjones Peter Jones made changes -
            Link New: This issue is related to AC-3 [ AC-3 ]

            As mentioned in previous comments this is entirely due to a linux bug. It will continue to happen when using lustre client builds until the known fix is in an upstream linux release and we start building against it.

            bogl Bob Glossman (Inactive) added a comment - As mentioned in previous comments, this is entirely due to a Linux bug. It will continue to happen with Lustre client builds until the known fix is in an upstream Linux release and we start building against it.
            yujian Jian Yu made changes -
            Labels Original: mq114 New: mq214

            People

              bogl Bob Glossman (Inactive)
              yujian Jian Yu
