[LU-430] Issues with mount.lustre and automounter Created: 18/Jun/11  Updated: 04/Feb/15  Resolved: 04/Feb/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.6
Fix Version/s: None

Type: Bug Priority: Trivial
Reporter: Lukasz Flis Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None
Environment:

Lustre, RHEL5.6, Automounter


Severity: 3
Rank (Obsolete): 10417

 Description   

We use the automounter to mount some of our Lustre filesystems on worker nodes across the cluster.
These filesystems get unmounted after a long period of inactivity.

The problem I'd like to point out occurs infrequently, but it may affect other users as well.
In some cases, when the filesystem is unmounted, its entry is not removed from /etc/mtab.
This leaves the automounter unable to mount the Lustre filesystem again.

Part of strace log from automount daemon:

...
[pid 19154] execve("/sbin/mount.lustre", ["/sbin/mount.lustre", "10.8.1.101:/scratch", "/mnt/auto/scratch-lustre", "-f", "-o", "rw,nosuid,nodev,localflock"], [/* 14 vars */]) = 0

...
[pid 19154] write(2, "mount.lustre: according to /etc/mtab 10.8.1.101:/scratch is already mounted on /mnt/auto/scratch-lustre\n", 104) = 104
...

To make mounting possible again, the stale entry has to be removed from /etc/mtab.
I am not sure which part of the Lustre/automounter pair is misbehaving here.
Is the automounter failing to remove the entry from /etc/mtab, or is mount.lustre itself not checking
the actual mount status in /proc/mounts?

More details:

[root@n2-1-1 ~]# grep -e lustre /etc/mtab
10.8.1.101:/scratch /mnt/auto/scratch-lustre lustre rw,nosuid,nodev,localflock 0 0
10.8.1.101:/storage /mnt/auto/storage-lustre lustre rw,nosuid,nodev,localflock 0 0
172.16.193.1@o2ib:/scratch /mnt/lustre/scratch lustre rw,nosuid,nodev,user_xattr,flock,acl,user_xattr,flock,acl 0 0

[root@n2-1-1 ~]# grep -e lustre /proc/mounts
10.8.1.101@tcp:/storage /mnt/auto/storage-lustre lustre rw,nosuid,nodev,localflock,acl 0 0
172.16.193.1@o2ib:/scratch /mnt/lustre/scratch lustre rw,nosuid,nodev,flock,acl 0 0
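The comparison above can be automated. Below is a minimal sketch, assuming the standard whitespace-separated mtab format; the helper names (`parse_mounts`, `stale_lustre_entries`, `drop_entries`) are hypothetical and not part of Lustre or autofs. It matches entries by mount point rather than by device, because /etc/mtab records the device as given on the command line (`10.8.1.101:/storage`) while /proc/mounts shows the kernel's form with the network type (`10.8.1.101@tcp:/storage`). The sample data is copied from the output above.

```python
def parse_mounts(text):
    """Parse mtab / /proc/mounts style text into (device, mountpoint, fstype) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:  # skip blank or malformed lines
            entries.append((fields[0], fields[1], fields[2]))
    return entries


def stale_lustre_entries(mtab_text, proc_mounts_text):
    """Return (device, mountpoint) for Lustre entries in mtab that the kernel no longer has mounted."""
    live = {mnt for _, mnt, fstype in parse_mounts(proc_mounts_text) if fstype == "lustre"}
    return [(dev, mnt) for dev, mnt, fstype in parse_mounts(mtab_text)
            if fstype == "lustre" and mnt not in live]


def drop_entries(mtab_text, mountpoints):
    """Return mtab text with Lustre lines for the given mount points removed (hypothetical cleanup)."""
    kept = []
    for line in mtab_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "lustre" and fields[1] in mountpoints:
            continue  # stale entry: leave it out
        kept.append(line)
    return "\n".join(kept) + "\n"


# Sample data taken from the grep output in this report.
MTAB = """\
10.8.1.101:/scratch /mnt/auto/scratch-lustre lustre rw,nosuid,nodev,localflock 0 0
10.8.1.101:/storage /mnt/auto/storage-lustre lustre rw,nosuid,nodev,localflock 0 0
172.16.193.1@o2ib:/scratch /mnt/lustre/scratch lustre rw,nosuid,nodev,user_xattr,flock,acl,user_xattr,flock,acl 0 0
"""
PROC_MOUNTS = """\
10.8.1.101@tcp:/storage /mnt/auto/storage-lustre lustre rw,nosuid,nodev,localflock,acl 0 0
172.16.193.1@o2ib:/scratch /mnt/lustre/scratch lustre rw,nosuid,nodev,flock,acl 0 0
"""

if __name__ == "__main__":
    stale = stale_lustre_entries(MTAB, PROC_MOUNTS)
    print(stale)
    print(drop_entries(MTAB, {mnt for _, mnt in stale}), end="")
```

On this data only `/mnt/auto/scratch-lustre` is reported as stale, which matches the entry that blocks the automounter in the strace output. Writing the cleaned text back to /etc/mtab is left out deliberately; on a live node that should be done with care while nothing else is mounting.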

Best Regards

Lukasz Flis
ACC Cyfronet



 Comments   
Comment by Peter Jones [ 19/Jun/11 ]

Lukasz

Thanks for your submission. Could you please clarify about the release you are running. You have selected 1.8.6 as the Lustre version this occurs on. What is your source for this release? Also, did this issue also occur on earlier 1.8.x releases or is it a regression since you upgraded?

Regards

Peter

Comment by Lukasz Flis [ 19/Jun/11 ]

Peter,

Thank you for quick reply
The problem i've described was known for us since we're using lustre.

I remember that all lustre versions from 1.8.x are known to have this issue here so it doesn't look like a regression to me.
We're not sure how about 1.6.x as we have never used it with automounter.

The most recent version we have right now is:
1.8.5.56-2.cyfronet.2.6.18_238.12.1.el5
This is our internal RPM package, built from the b1_8 branch in git with the LU-376 patches applied just before they landed on the branch.

Regards

Lukasz

Comment by Peter Jones [ 19/Jun/11 ]

Lukasz

Ah, thanks for clarifying which code you are running. Based on the information you have supplied, I do not think this warrants being a blocker for 1.8.6-wc (which is in release testing), but it is something we could consider fixing in a future release.

Peter

Comment by Andreas Dilger [ 04/Feb/15 ]

Apparently there was a bug in the RHEL5 automounter (autofs) that was fixed in RHEL 5.7:

https://bugzilla.redhat.com/show_bug.cgi?id=520745
https://bugzilla.redhat.com/show_bug.cgi?id=632006
The autofs utility failed to mount Lustre metadata target (MDT) failover mounts because it could not understand the mount point syntax. With this update, the mount point syntax is processed correctly and the failover is mounted as expected.
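To illustrate why the failover syntax can trip up a generic mount-map parser, here is a minimal sketch, assuming the usual Lustre client device form `mgsnid[,mgsnid]:mgsnid[,mgsnid]:/fsname`, where colons separate failover MGS nodes and commas separate alternative NIDs of one node. The function names and the second NID `10.8.1.102@tcp` are hypothetical, and this is not how autofs actually parses maps; it only contrasts a Lustre-aware split with a naive NFS-style `host:/path` split at the first colon.

```python
def split_lustre_device(spec):
    """Split 'nid[,nid]:nid[,nid]:/fsname' into failover NID groups and the fsname."""
    nids, sep, fsname = spec.rpartition(":/")  # the fsname follows the LAST ':/'
    if not sep or not nids or not fsname:
        raise ValueError(f"not a Lustre device spec: {spec!r}")
    # Each colon-separated group is one MGS node; commas list that node's NIDs.
    groups = [group.split(",") for group in nids.split(":")]
    return groups, fsname


def naive_host_path_split(spec):
    """NFS-style split at the FIRST colon, shown for comparison."""
    host, _, path = spec.partition(":")
    return host, path


if __name__ == "__main__":
    failover = "10.8.1.101@tcp:10.8.1.102@tcp:/scratch"  # hypothetical failover pair
    print(split_lustre_device(failover))
    print(naive_host_path_split(failover))
```

For the failover spec, the naive split yields a "path" of `10.8.1.102@tcp:/scratch`, which is not an absolute path, whereas the Lustre-aware split recovers both MGS nodes and the filesystem name `scratch`.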

Generated at Sat Feb 10 01:06:57 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.