Lustre / LU-6610

lfs df -h query hangs when OST1 is unmounted/offline manually


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0
    • Labels: None
    • Environment: Scientific Linux release 6.6 (Carbon)
    • Severity: 3
    • Source: Test Infrastructure

    Description

      Hi,

      Test setup: a single Scientific Linux VM with 1 GB of memory and
      50 GB of disk space. This VM has all Lustre components configured in it, i.e.:
      ===========
      1 MDS,
      2 MDTs,
      2 OSTs,
      and a client.
      ============================================
      Note: HA is not configured for the OSTs at the backend.
      All the MDTs and OSTs are created on loop devices.
      =============================================
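      For reference, a minimal sketch of how one loop-backed target can be
      created and mounted by hand (the file names, sizes, and mount points
      below are illustrative assumptions; llmount.sh automates the equivalent
      steps for all targets):
      =====================================================
        # Hypothetical sketch; names and sizes are assumptions, not the
        # exact commands used in this setup.
        dd if=/dev/zero of=/tmp/mdt0.img bs=1M count=1 seek=8191  # ~8 GB sparse file
        losetup /dev/loop0 /tmp/mdt0.img
        # Format a combined MGS+MDT at index 0, then bring it online.
        mkfs.lustre --fsname=lustre --mgs --mdt --index=0 /dev/loop0
        mkdir -p /mnt/mds1 /mnt/lustre
        mount -t lustre /dev/loop0 /mnt/mds1
        # Mount the client once the targets are up.
        mount -t lustre localhost@tcp:/lustre /mnt/lustre
      =====================================================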

      1. lfs df -h output before I started the test:
        =====================================================
        [root@localhost lustre]# lfs df -h
        UUID                   bytes    Used  Available  Use%  Mounted on
        lustre-MDT0000_UUID     7.2G  435.8M       6.2G    6%  /mnt/lustre[MDT:0]
        lustre-MDT0001_UUID     9.0G  536.8M       7.9G    6%  /mnt/lustre[MDT:1]
        lustre-OST0000_UUID    14.9G  441.2M      13.7G    3%  /mnt/lustre[OST:0]
        lustre-OST0001_UUID    14.9G  441.2M      13.7G    3%  /mnt/lustre[OST:1]

        filesystem summary:    29.9G  882.5M      27.5G    3%  /mnt/lustre
        =========================================================

      2. mount command output:
        =====================
        /dev/mapper/VolGroup-lv_root on / type ext4 (rw)
        proc on /proc type proc (rw)
        sysfs on /sys type sysfs (rw)
        devpts on /dev/pts type devpts (rw,gid=5,mode=620)
        tmpfs on /dev/shm type tmpfs (rw)
        /dev/sda1 on /boot type ext4 (rw)
        none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
        sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
        /dev/loop0 on /mnt/mds1 type lustre (rw,loop=/dev/loop0)
        /dev/loop1 on /mnt/ost1 type lustre (rw,loop=/dev/loop1)
        /dev/loop2 on /mnt/ost2 type lustre (rw,loop=/dev/loop2)
        localhost@tcp:/lustre on /mnt/lustre type lustre (rw,user_xattr,flock)
        /dev/loop7 on /mnt/mds2 type lustre (rw)
        ==================================================
      3. Steps to reproduce the issue:

      1. Mounted the Lustre filesystem on a client by executing the script
      </lustre/tests/llmount.sh>.
      2. Checked the lfs df -h command output. All was fine; the MDTs/OSTs
      were displayed nicely.
      3. From the client, manually unmounted/offlined the device on which
      OST1 is configured.
      4. Typed the command lfs df -h on the client; it hangs. (A bounded
      reproduction sketch follows this list.)
      5. /var/log/messages and dmesg continuously print the message
      "LustreError: : lustre-OST0000_UUID: not available for connect from
      0@lo (no target). If you are running an HA pair check that the target
      is mounted on the other server."
      6. The command <lfs df -h> should come out of its retry loop and print
      an error message to user space saying the OST is unavailable; or the
      user/client should not be allowed to unmount an OST with a plain Unix
      umount command. If that is allowed, the error condition should be
      handled.
      ======================
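      A minimal reproduction sketch with a bounded query, assuming the
      llmount.sh mount points shown above (the 30 s bound and the use of
      coreutils timeout are illustrative choices, not part of the original
      test):
      =====================================================
        # Hypothetical sketch; mount points follow the layout above.
        umount /mnt/ost1                 # take OST1 offline with a plain umount

        # Bound the statfs query instead of letting it hang indefinitely.
        timeout 30 lfs df -h /mnt/lustre || echo "lfs df did not return in 30s"

        # Non-blocking view of which OSC imports are still connected:
        lctl get_param osc.*.import | grep -E 'target:|state:'
      =====================================================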

      4. lfs df -h output when it was stuck:
        ========================
        [root@localhost lustre]# lfs df -h
        UUID                   bytes    Used  Available  Use%  Mounted on
        lustre-MDT0000_UUID     7.2G  435.8M       6.2G    6%  /mnt/lustre[MDT:0]
        lustre-MDT0001_UUID     9.0G  536.8M       7.9G    6%  /mnt/lustre[MDT:1]

      *********************HUNG**************************************

      Attaching /var/log/messages and dmesg
      ===================================
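      While lfs df was hung, a way to see where it is blocked (assuming a
      kernel built with /proc/<pid>/stack support; the output path is
      illustrative):
      =====================================================
        # Hypothetical diagnostics sketch for the hung lfs process.
        pid=$(pidof lfs)
        cat /proc/$pid/stack          # kernel stack: where lfs is blocked
        dmesg > /tmp/dmesg.out        # capture the LustreError messages
      =====================================================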

      Attachments

        Issue Links

          Activity

            People

              Assignee: WC Triage
              Reporter: Paramita Varma (Inactive)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: