Description
Hi,
Test setup: a single Scientific Linux VM with 1 GB of memory and
50 GB of disk space. All Lustre components are configured on this one VM:
===========
1 MDS,
2 MDTs,
2 OSTs,
and a client.
============================================
Note: HA is not configured for the OSTs on the backend.
All MDTs and OSTs are created on loop devices.
=============================================
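For context, a hedged sketch of how loop-backed Lustre targets like these are typically created. The backing-file paths, sizes, and index values below are illustrative assumptions, not taken from the actual test setup; only the fsname `lustre` and the mount points come from the report.

```shell
# Illustrative sketch only -- file paths and sizes are assumptions.
# Create sparse backing files for one MDT and one OST:
dd if=/dev/zero of=/tmp/mdt0.img bs=1M count=1 seek=8191    # ~8 GB sparse file
dd if=/dev/zero of=/tmp/ost0.img bs=1M count=1 seek=16383   # ~16 GB sparse file

# Format them as Lustre targets:
mkfs.lustre --fsname=lustre --mgs --mdt --index=0 /tmp/mdt0.img
mkfs.lustre --fsname=lustre --ost --index=0 --mgsnode=localhost@tcp /tmp/ost0.img

# Mounting a file-backed target attaches a loop device (e.g. /dev/loop0),
# which is why the mount output below shows /dev/loopN devices:
mount -t lustre -o loop /tmp/mdt0.img /mnt/mds1
mount -t lustre -o loop /tmp/ost0.img /mnt/ost1
```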
- lfs df -h output before starting the test:
=====================================================
[root@localhost lustre]# lfs df -h
UUID bytes Used Available Use% Mounted on
lustre-MDT0000_UUID 7.2G 435.8M 6.2G 6% /mnt/lustre[MDT:0]
lustre-MDT0001_UUID 9.0G 536.8M 7.9G 6% /mnt/lustre[MDT:1]
lustre-OST0000_UUID 14.9G 441.2M 13.7G 3% /mnt/lustre[OST:0]
lustre-OST0001_UUID 14.9G 441.2M 13.7G 3% /mnt/lustre[OST:1]
filesystem summary: 29.9G 882.5M 27.5G 3% /mnt/lustre
=========================================================
- mount command output:
=====================
/dev/mapper/VolGroup-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/loop0 on /mnt/mds1 type lustre (rw,loop=/dev/loop0)
/dev/loop1 on /mnt/ost1 type lustre (rw,loop=/dev/loop1)
/dev/loop2 on /mnt/ost2 type lustre (rw,loop=/dev/loop2)
localhost@tcp:/lustre on /mnt/lustre type lustre (rw,user_xattr,flock)
/dev/loop7 on /mnt/mds2 type lustre (rw)
==================================================
- Steps to reproduce the issue:
1. Mounted the Lustre filesystem on a client by executing the script
</lustre/tests/llmount.sh>
2. Checked the output of lfs df -h. All was fine; the MDTs/OSTs were
displayed correctly.
3. From the client, manually unmounted/offlined the device on which OST1 is
configured.
4. Ran lfs df -h on the client; it hangs.
5. /var/log/messages and dmesg continuously print the message "LustreError: : lustre-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server."
6. Expected behavior: <lfs df -h> should break out of the loop and report an error in user space stating that the OST is unavailable. Alternatively, the user/client should not be allowed to unmount an OST with a plain Unix umount command; if that is allowed, the error condition should be handled.
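The steps above can be sketched as a shell session. The llmount.sh path is as reported; the mount points follow the mount output above, and the `timeout` wrapper at the end is only my illustration of the behavior expected in step 6, not part of the original report.

```shell
# Reproduction sketch (assumes the llmount.sh test setup described above)
sh /lustre/tests/llmount.sh   # step 1: format and mount the test filesystem
lfs df -h                     # step 2: all MDTs/OSTs are listed correctly

umount /mnt/ost1              # step 3: take OST1 offline behind Lustre's back
lfs df -h                     # step 4: hangs after printing the MDT lines

# Step 6's expectation can be approximated today by bounding the wait:
timeout 30 lfs df -h || echo "lfs df timed out: an OST appears to be unavailable"
```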
======================
- lfs df -h output while it was stuck:
========================
[root@localhost lustre]# lfs df -h
UUID bytes Used Available Use% Mounted on
lustre-MDT0000_UUID 7.2G 435.8M 6.2G 6% /mnt/lustre[MDT:0]
lustre-MDT0001_UUID 9.0G 536.8M 7.9G 6% /mnt/lustre[MDT:1]
*********************HUNG**************************************
Attaching /var/log/messages and dmesg output.
===================================
Attachments
Issue Links
- is related to LU-8544: recovery-double-scale test_pairwise_fail: start client on trevis-54vm5 failed (Resolved)