Type: Bug
Resolution: Cannot Reproduce
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.1.4
Labels:
- llnl
Environment:
Lustre 2.1.4-5chaos on client, Lustre 2.1.4-5chaos on ldiskfs servers, Lustre 2.4.0-15chaos on zfs servers

Severity:
3
Rank (Obsolete):
10483

With lustre 2.1.4-5chaos, we are finding that clients are not honoring umount correctly. The sysadmins are using the normal "umount" command with no additional options, and it returns relatively quickly.

Linux no longer has a record of the mount in /proc/mounts after the command returns, and the mount point (/p/lscratchrza) appears to be empty. However the directory clearly still has a reference and cannot be removed:

# rzzeus26 /p > ls -la lscratchrza
total 0
drwxr-xr-x 2 root root  40 Aug 13 11:24 .
drwxr-xr-x 4 root root 140 Aug 13 11:24 ..
# rzzeus26 /p > rmdir lscratchrza
rmdir: failed to remove `lscratchrza': Device or resource busy

When we look in /proc/fs/lustre it is clear that most, if not all, objects for this filesystem are still present in llite, osc, mdc, ldlm/namespace, etc.
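The two observations above (no entry in /proc/mounts, yet objects still present under /proc/fs/lustre) could be checked together with a small script. The following is only a sketch under stated assumptions: the helper names `is_mounted` and `leftover_entries` are hypothetical, the filesystem name `lsa` is inferred from the "Unmounted lsa-client" console message, and the assumption that per-filesystem objects appear as directories named `<fsname>-...` should be verified against the actual /proc/fs/lustre layout on the node.

```python
import os


def is_mounted(mounts_text, mount_point):
    """Return True if mount_point is a mount target in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # The second whitespace-separated field of each line is the mount point.
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


def leftover_entries(proc_root, fsname):
    """List directories under proc_root whose names start with '<fsname>-'.

    Assumption: Lustre objects for a filesystem are exposed as directories
    named after the filesystem (e.g. 'lsa-...'); adjust if the layout differs.
    """
    prefix = fsname + "-"
    found = []
    for dirpath, dirnames, _filenames in os.walk(proc_root):
        for name in dirnames:
            if name.startswith(prefix):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

On an affected client, one might read /proc/mounts, pass its contents to `is_mounted` with "/p/lscratchrza", and call `leftover_entries("/proc/fs/lustre", "lsa")`. A result of "not mounted" together with a non-empty list would match the state described in this report.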

The sysadmins issued the "umount /p/lscratchrza" command at around 9:42am, but this message did not appear on one of the nodes until over five hours later:

2013-09-13 15:18:11 Lustre: Unmounted lsa-client

So there appear to be at least two problems here:

- umount is taking far too long
- umount for Lustre is not blocking until the unmount is complete (it is exhibiting umount "lazy" behavior)

I should note that this Lustre client node mounts two Lustre filesystems, and only one was being unmounted. I don't yet know whether this is significant, but the servers for the filesystem that we were trying to unmount are running Lustre 2.1.4-5chaos with ldiskfs, and the servers for the other filesystem are running Lustre 2.4.0-15chaos with zfs.

Neither running the sync command nor "echo 3 > /proc/sys/vm/drop_caches" appeared to speed up the umount process.

I did a "foreach bt" under crash, but I don't see any processes that are obviously stuck sleeping in umount related call paths.

Real user applications are running on the client nodes while the umounts are going on. "lsof" does not list any usage under /p/lscratchrza (the filesystem that we are trying to unmount).

Attachments:
- iwc175.console.txt (10 kB, 10/Jul/14 3:54 PM)
- iwc175.dump.txt.gz (5.65 MB, 10/Jul/14 3:54 PM)

Assignee: Hongchao Zhang

Reporter: Christopher Morrone (Inactive)

Votes: 0

Watchers: 7

Created: 13/Sep/13 11:46 PM

Updated: 13/Oct/21 3:14 AM

Resolved: 13/Oct/21 3:14 AM
