[LU-4084] ll_inode_revalidate_fini()) failure -13 Created: 10/Oct/13 Updated: 16/Nov/13 Resolved: 16/Nov/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Story | Priority: | Major |
| Reporter: | Patrick Chan (Inactive) | Assignee: | Lai Siyao |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre 2.1.5 on CentOS 6.3 |
||
| Rank (Obsolete): | 10976 |
| Description |
|
Our Lustre client continuously reported the following errors: Oct 10 16:06:13 avmmst1a kernel: Lustre: Mounted modelfs-client We searched online but found no hints. What is the cause of this? Thanks. Regards, |
| Comments |
| Comment by Lai Siyao [ 10/Oct/13 ] |
|
Could you check messages on MDS, is there similar error reported? |
| Comment by Bruno Faccini (Inactive) [ 10/Oct/13 ] |
|
Could you also check whether the clients and servers share the same UID/GID databases? |
| Comment by Patrick Chan (Inactive) [ 11/Oct/13 ] |
|
Lai Siyao, there are no logs on the MDS. Bruno, the MDS is also a NIS server; both the servers and the clients use the same NIS server. Patrick |
| Comment by Lai Siyao [ 11/Oct/13 ] |
|
The next time you see this error, could you use `lctl dk` on MDS to dump debug logs right after that? |
| Comment by Patrick Chan (Inactive) [ 15/Oct/13 ] |
|
I ran 'lctl dk' on the MDS and uploaded the file lustre_debug.log.gz to ftp.whamcloud.com/uploads. By the way, I tried to find out whether inode 159383553 is the culprit. I scanned the whole Lustre file system for inode 159383553 but could not find any matching file. client> find /lustre_mnt_point -inum 159383553 |
| Comment by Lai Siyao [ 15/Oct/13 ] |
|
I can't find any error messages in this log, which means that -13 (-EACCES) does not come from the disk filesystem, because mdt_getattr_internal() prints an error if it fails to get attributes from disk. Did you dump the debug logs right after you saw this failure? The debug log size is limited, so it only contains the most recent logs. To further understand this failure, could you enable more debugging on the MDS with `lctl set_param debug=+inode` and `lctl set_param debug=+trace`, which enable debug output for inode access and function tracing? You can use `lctl get_param debug_mb` and `lctl set_param debug_mb=<debug_size>` to check and increase the debug memory size. You can also dump the debug logs on the client, which may help you find the file name. |
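The debug settings suggested above could be applied on the MDS in one step, roughly as sketched here. The function name `enable_mds_debug` and the `LCTL` override variable are illustrative and not part of Lustre; only the `lctl set_param` and `lctl get_param` invocations come from the comment above.

```shell
#!/bin/sh
# LCTL can be overridden for a dry run (assumption: lctl is on PATH on the MDS).
LCTL=${LCTL:-lctl}

# Enable inode and trace debugging, show the current debug buffer size,
# then set it to the size (in MB) given as the first argument.
enable_mds_debug() {
    size_mb=${1:?usage: enable_mds_debug <size_in_mb>}
    "$LCTL" set_param debug=+inode &&
    "$LCTL" set_param debug=+trace &&
    "$LCTL" get_param debug_mb &&
    "$LCTL" set_param debug_mb="$size_mb"
}
```

Calling `enable_mds_debug 512` as root on the MDS would request a 512 MB debug buffer; running it with `LCTL=echo` only prints the commands instead of executing them.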
| Comment by Patrick Chan (Inactive) [ 15/Oct/13 ] |
|
The 'll_inode_revalidate_fini())' failure message appears only occasionally, so I need to watch /var/log/messages on the client and capture the debug buffer on the MDS promptly. I have uploaded the log file lustre_debug2.log.gz. This debug buffer was dumped about 10 seconds after 'll_inode_revalidate_fini()) failure' appeared on the client. As you suggested, I've added inode and trace to the debug parameter and increased the debug buffer size to 512MB. |
| Comment by Patrick Chan (Inactive) [ 15/Oct/13 ] |
|
I have just discovered that both inodes mentioned in the errors (159383553 and 452984833) are owned by the top-level directory of the mount point. |
| Comment by Lai Siyao [ 16/Oct/13 ] |
|
It would be better to do this in a script if the system is busy, in case the old logs get discarded. |
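A script along these lines might look like the sketch below. It reads log lines on standard input and runs a caller-supplied command each time the failure message appears. The function name `dump_on_match`, and the host name `mds-host` and output path in the usage note, are hypothetical; how the client reaches the MDS (assumed here to be ssh) depends on the site.

```shell
#!/bin/sh
# Message text taken from the failure named in this ticket.
PATTERN='ll_inode_revalidate_fini'

# Read log lines on stdin and run the given command once per matching line.
# Hypothetical helper, not part of Lustre; the command comes from the caller.
dump_on_match() {
    while IFS= read -r line; do
        case $line in
            # Redirect stdin so the command (e.g. ssh) cannot swallow log lines.
            *"$PATTERN"*) "$@" < /dev/null ;;
        esac
    done
}
```

On the client, one possible invocation is `tail -F -n 0 /var/log/messages | dump_on_match ssh mds-host lctl dk /tmp/lustre_debug.log`. Each match overwrites the same output file, so a timestamped file name may be preferable if several failures are expected.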
| Comment by Patrick Chan (Inactive) [ 16/Oct/13 ] |
|
We've solved the problem. The error appears when Nagios periodically runs the check_disk plugin on the Lustre client to check the disk capacity of the Lustre file system. However, the user nagios exists only on the master node (the Lustre client); the MDS has no such user. After we added a local user nagios on the MDS, the error no longer appeared. |
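A quick way to spot this kind of mismatch is to check whether a given account resolves on each node. The sketch below is only an illustration: the function name `show_uid` is made up here, and it relies on the standard `id -u` utility, which goes through the node's name service (local files or NIS).

```shell
#!/bin/sh
# Print the user name and the numeric UID that this node resolves for it,
# or report failure if the user is unknown here.
show_uid() {
    if uid=$(id -u "$1" 2>/dev/null); then
        printf '%s %s\n' "$1" "$uid"
    else
        printf '%s is not known on this node\n' "$1" >&2
        return 1
    fi
}
```

Running `show_uid nagios` on both the client and the MDS and comparing the results would show whether the account is missing or has a different UID on one side.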
| Comment by Peter Jones [ 16/Nov/13 ] |
|
That's great! Thanks for letting us know. |