[LU-5254] readdir missing a directory Created: 25/Jun/14  Updated: 07/Oct/14  Resolved: 16/Jul/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.1
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Jeremy Filizetti Assignee: Nathaniel Clark
Resolution: Cannot Reproduce Votes: 0
Labels: llite, llnl, zfs
Environment:

Lustre with ZFS for MDT and OSTs


Attachments: File ROOT.zdb.dump    
Issue Links:
Related
is related to LU-3573 lustre-rsync-test test_8: @@@@@@ FAIL... Resolved
is related to LU-5475 readdir missing a directory Resolved
Epic/Theme: zfs
Severity: 3
Rank (Obsolete): 14658

 Description   

When a directory was copied to the mount point of the Lustre file system, it does not show up in a directory listing from ls or otherwise, but the directory exists and is accessible if the path is given directly. The directory in question is called tftpboot below.

[root@r01svr1 hydra60]# ls /lustre/hydra60/tftpboot
memtest  pxelinux.0  pxelinux.cfg
[root@r01svr1 hydra60]# stat /lustre/hydra60/tftpboot
  File: `/lustre/hydra60/tftpboot'
  Size: 5632            Blocks: 11         IO Block: 131072 directory
Device: 680a27c4h/1745496004d   Inode: 144115205641631522  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2014-06-10 22:06:07.000000000 +0000
Modify: 2014-06-25 00:22:31.000000000 +0000
Change: 2014-06-25 00:22:31.000000000 +0000
[root@r01svr1 hydra60]# lfs path2fid /lustre/hydra60/tftpboot
[0x200000417:0x7722:0x0]
[root@r01svr1 hydra60]# ls /lustre/hydra60/ | grep tftp
[root@r01svr1 hydra60]# ls /lustre/hydra60/ 
af06            mnt      r01svr10  r01svr13  r01svr16  r01svr19  r01svr3  r01svr6  r01svr9  tmp
centos_updates  notes    r01svr11  r01svr14  r01svr17  r01svr2   r01svr4  r01svr7  test2    tmp_striped
misc            r01svr1  r01svr12  r01svr15  r01svr18  r01svr20  r01svr5  r01svr8  test3
[root@r01svr1 hydra60]# 
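
The symptom above (a name that resolves by direct path lookup but is absent from the parent's listing) can be checked for mechanically. The following is a minimal sketch, not part of the original report; the function name `find_hidden_entries` and its arguments are hypothetical, and it assumes the names to check are already known, since by definition they cannot be discovered from the listing itself.

```python
import os


def find_hidden_entries(parent, candidate_names):
    """Return the candidate names that exist under `parent` by direct
    path lookup but do not appear in the listing of `parent`.

    `candidate_names` must come from an outside source (for example a
    copy manifest), because a hidden entry is invisible to readdir.
    """
    # Names the client's readdir path returns for the parent directory.
    listed = set(os.listdir(parent))

    hidden = []
    for name in candidate_names:
        path = os.path.join(parent, name)
        # lexists() performs an lstat() on the full path, which exercises
        # lookup rather than readdir.
        if os.path.lexists(path) and name not in listed:
            hidden.append(name)
    return hidden
```

For the case in this ticket, a call such as `find_hidden_entries("/lustre/hydra60", ["tftpboot"])` would be expected to return `["tftpboot"]` on an affected client and an empty list on an unaffected one.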


 Comments   
Comment by Jeremy Filizetti [ 25/Jun/14 ]

zdb dump of the Lustre ROOT directory showing the tftpboot entry

Comment by Jeremy Filizetti [ 25/Jun/14 ]

Just tested this with a Lustre 1.8.9-wc1 client and the tftpboot directory shows up.

[root@r01svr2 ~]# ls /lustre/hydra60
af06            misc  notes    r01svr10  r01svr12  r01svr14  r01svr16  r01svr18  r01svr2   r01svr3  r01svr5  r01svr7  r01svr9  test3     tmp
centos_updates  mnt   r01svr1  r01svr11  r01svr13  r01svr15  r01svr17  r01svr19  r01svr20  r01svr4  r01svr6  r01svr8  test2    tftpboot  tmp_striped
[root@r01svr2 ~]# ls /lustre/hydra60/tftpboot/
memtest  pxelinux.0  pxelinux.cfg
[root@r01svr2 ~]# 

Comment by Peter Jones [ 25/Jun/14 ]

Nathaniel, could you please look into this one? Thanks, Peter

Comment by Andreas Dilger [ 25/Jun/14 ]

This might have been fixed by a patch landed since 2.5.1. Please check git log before spending too much time digging into the code.

Comment by Christopher Morrone [ 30/Jun/14 ]

I believe that we have seen the same bug in production using our 2.4.0-Xchaos branch. In our case, the invisible directory was not at the top level (mount point) of the Lustre tree; it was deeper in the directory tree. We too are using zfs obds for the filesystem where we saw this.

Comment by Jeremy Filizetti [ 15/Jul/14 ]

Looks like I was testing with 2.5.56, not version 2.5.2 as I thought was installed there. 2.5.2 did not have the issue, and I also tested 2.6.50, which worked as well, so I think we can close this ticket.

Comment by Christopher Morrone [ 15/Jul/14 ]

How do you know the problem is gone? Did you have a very reliable reproducer with 2.5.56?

For us at least the problem is racy, and I don't know how to reproduce it on demand.

Comment by Christopher Morrone [ 15/Jul/14 ]

I'm reopening. We have seen this in production. If you want to argue that the LLNL production instance is unrelated and needs a separate ticket, I suppose you can do that. But you need to convey in words what you are thinking in that case.

Comment by Jeremy Filizetti [ 15/Jul/14 ]

This issue in particular was consistent for the client. The directory entry was either there or not, so the problem wasn't a race condition. The information I posted likely doesn't pertain to what you are seeing at LLNL, so a new bug may be appropriate, but I'm fine either way.

Comment by Nathaniel Clark [ 15/Jul/14 ]

I could not reproduce this with Lustre 2.5.1 (zfs 0.6.2) or 2.4.3 (zfs 0.6.1) with a basic copy test.

Comment by Andreas Dilger [ 16/Jul/14 ]

Chris, I don't think that keeping this issue open with Jeremy's information would help diagnose whatever problem you are seeing with your system, since you are running quite different versions of the code. The client-side directory handling changed a bunch between 2.5.53 and 2.5.60 because of striped directories, and there were several bugs during its development that were fixed in a later release.

I think it is better to keep this bug separate from yours, since I suspect they have different root causes, though you can of course reference this one in your bug if you think that is helpful.

Generated at Sat Feb 10 01:49:52 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.