Details
Type: Bug
Resolution: Fixed
Priority: Critical
Affects Version/s: Lustre 2.4.0, Lustre 2.1.4
Labels: None
3
7394
Description
We are using Lustre 2.1.4-3chaos on our server clusters.
Running a test application, one of our archive storage folks discovered that Lustre's directory listings are rather unreliable. The first thing she noticed is that directory entries can appear multiple times:
> cd /p/lscratchrza/apotts/divt_rzstagg0/htar_1st_27475
> find . -type f > ../test.lst0 ; echo $? ; wc -l ../test.lst0
0
34339 ../test.lst0
> find . -type f > ../test.lst1 ; echo $? ; wc -l ../test.lst1
0
35006 ../test.lst1
When the two directory listings are sorted and run through uniq, there are only 34339 unique entries.
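For anyone repeating the sort-and-uniq check, here is a minimal sketch. It assumes each listing holds one path per line and that no path contains a newline. `listing_stats` is a hypothetical helper name, not something from the original report.

```shell
# Hypothetical helper: summarize a listing file that holds one path per line.
# Prints "<total> <unique> <duplicated>", where <duplicated> is the number of
# distinct paths that occur more than once in the listing.
listing_stats() {
    total=$(wc -l < "$1" | tr -d ' ')
    unique=$(LC_ALL=C sort -u "$1" | wc -l | tr -d ' ')
    dups=$(LC_ALL=C sort "$1" | uniq -d | wc -l | tr -d ' ')
    echo "$total $unique $dups"
}
```

Running `listing_stats ../test.lst1` on a listing affected by this bug should report a total larger than the unique count.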
One of our sysadmins investigated and further found that entries are sometimes missing from the listing altogether. But when the missing files are checked with ls, they are present.
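One way to spot missing entries is to compare a suspect listing against a reference listing of the same directory. This is only a sketch under the same one-path-per-line assumption; `missing_from` is a hypothetical helper name.

```shell
# Hypothetical helper: print the paths that appear in the reference listing
# ($1) but not in the suspect listing ($2). Both files hold one path per line.
missing_from() {
    ref_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || { rm -f "$ref_sorted"; return 1; }
    LC_ALL=C sort -u "$1" > "$ref_sorted"
    LC_ALL=C sort -u "$2" > "$new_sorted"
    # comm -23 keeps only the lines unique to the first (reference) file.
    LC_ALL=C comm -23 "$ref_sorted" "$new_sorted"
    rm -f "$ref_sorted" "$new_sorted"
}
```

Each path it prints can then be checked directly with ls, as described above, to confirm that the file exists even though the listing omitted it.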
This has been noticed with the above find command, and also using "/bin/ls -laR .". Both files and subdirectories have appeared twice in the directory listing.
The Lustre clients that have reproduced this behaviour are running 2.1.2-4chaos and 1.8.5.0-6chaos.
Attachments
Activity
Resolution | New: Fixed [ 1 ] | |
Status | Original: Open [ 1 ] | New: Resolved [ 5 ] |
Fix Version/s | New: Lustre 2.1.6 [ 10292 ] |
Labels | Original: mq213 ptr |
Priority | Original: Blocker [ 1 ] | New: Critical [ 2 ] |
Affects Version/s | New: Lustre 2.4.0 [ 10154 ] |
Fix Version/s | New: Lustre 2.4.0 [ 10154 ] |
Labels | Original: ptr | New: mq213 ptr |
Comment |
[ Also, just to speculate some more. If the case Ned pointed to in the patch is occurring with our hash in question of 1502393138, then info->curr_hash will be reset to 0, and info->curr_minor_hash will be 1502393138 (I _think_). This is different than the behavior prior to the change, which would have produced info->curr_hash = 1502393138, and info->curr_minor_hash = 0. Looking at the code, that change would cause this call chain to then occur:
{noformat}
ldiskfs_dx_readdir:
	if ((!info->curr_node) ||
	    (filp->f_version != inode->i_version)) {
		info->curr_node = NULL;
		free_rb_tree_fname(&info->root);
		filp->f_version = inode->i_version;
		ret = ldiskfs_htree_fill_tree(filp, info->curr_hash,
					      info->curr_minor_hash,
					      &info->next_hash);

ldiskfs_htree_fill_tree:
	if (start_hash < 2 || (start_hash ==2 && start_minor_hash==0)) {
		de = (struct ldiskfs_dir_entry_2 *) frames[0].bh->b_data;
		de = ldiskfs_next_entry(de, dir->i_sb->s_blocksize);
		if ((err = ldiskfs_htree_store_dirent(dir_file, 2, 0, de)) != 0)
			goto errout;
		count++;
	}
{noformat} ] |
Attachment | New: htree_output.txt [ 12498 ] |
Labels | New: ptr |
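To make the speculation in the comment above concrete, the sketch below mirrors only the condition quoted from ldiskfs_htree_fill_tree and evaluates it for the two (hash, minor hash) pairs discussed. It is an illustration, not the kernel logic itself; `takes_low_hash_branch` is a hypothetical name.

```shell
# Mirrors the quoted check:
#   start_hash < 2 || (start_hash == 2 && start_minor_hash == 0)
# Returns 0 (true) when the branch that stores an entry at hash 2 is taken.
takes_low_hash_branch() {
    start_hash=$1
    start_minor_hash=$2
    [ "$start_hash" -lt 2 ] ||
        { [ "$start_hash" -eq 2 ] && [ "$start_minor_hash" -eq 0 ]; }
}

# Pair suspected after the change: curr_hash = 0, curr_minor_hash = 1502393138.
if takes_low_hash_branch 0 1502393138; then echo "after change: branch taken"; fi
# Pair expected before the change: curr_hash = 1502393138, curr_minor_hash = 0.
if ! takes_low_hash_branch 1502393138 0; then echo "before change: branch skipped"; fi
```

If the hash really is reset to 0 as speculated, the fill would take this branch again instead of resuming at hash 1502393138. That would be consistent with entries being returned a second time.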