Details
Bug
Resolution: Fixed
Major
Lustre 1.8.6, Lustre 1.8.x (1.8.0 - 1.8.5)
CentOS 5 with 1.8.6-WC1 clients and 1.8.4 servers.
3
4028
Description
While running some tests, we found that the client hangs. Further investigation shows that the client is looping forever in ll_readdir_page(). The first ll_dir_entry record in the page is correct, but some of the following ll_dir_entry records are entirely zero.
crash> struct ll_dir_entry 0xffff8105f5818000
struct ll_dir_entry {
lde_inode = 748257583,
lde_rec_len = 12,
lde_name_len = 1 '\001',
lde_file_type = 2 '\002',
lde_name = ".\000\000\000\200\202\230,\f\000\002\002..\000\0000\201\231,\024\000\n\001ssciohb.nrat1\201\231,\020\000\005\001krsni8552\201\231,\020\000\005\001nticpemc3\201\231,\020\000\b\001crn.wole4\201\231,\f\000\003\001lsrt5\201\231,\024\000\n\001loita.hdal2.6\201\231,\024\000\n\001feekg.vsri9\0007\201\231,\024\000\f\001lgaumt.ggesd8\201\231,\024\000\v\001eilltn.ncsr.9\201\231,\020\000\b\001sai.blol:\201\231,\024\000\t\001rnbtmru.sing;\201\231,\020\000\006\001eta.fo64<\201\231,\f\000\003\001rkr.=\201\231,\020\000\005\001aco.d13"
}
crash> struct ll_dir_entry 0xffff8105f5818a19
struct ll_dir_entry {
lde_inode = 0,
lde_rec_len = 0,
lde_name_len = 0 '\0',
lde_file_type = 0 '\0',
lde_name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
}
After applying the change below, the tests pass smoothly and the debug message is printed many times.
diff --git a/lustre/llite/dir.c b/lustre/llite/dir.c
index 3154d32..3b9779b 100644
--- a/lustre/llite/dir.c
+++ b/lustre/llite/dir.c
@@ -327,6 +327,12 @@ static int ll_readdir_page(char *addr, __u64 base, unsigned *offset,
de = ll_entry_at(addr, *offset);
end = addr + CFS_PAGE_SIZE - ll_dir_rec_len(1);
for (nr = 0 ;(char*)de <= end; de = ll_dir_next_entry(de)) {
+ if (de->lde_rec_len == 0)
if (de->lde_inode != 0) {
nr++;
*offset = (char *)de - addr;
It may not be the right fix as I didn't figure out why the page is partially zeroed.
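To illustrate why a zeroed record makes the loop spin, here is a minimal userspace sketch. It is not the llite code: `sketch_dir_entry`, `sketch_put()`, `sketch_count_entries()` and `SKETCH_PAGE_SIZE` are hypothetical names, and the struct only mirrors the fixed header fields visible in the crash output above. It assumes the next record is found by advancing by the current record's length, so a length of 0 never moves forward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/* Hypothetical stand-in for the fixed header of struct ll_dir_entry. */
struct sketch_dir_entry {
    uint32_t inode;
    uint16_t rec_len;
    uint8_t  name_len;
    uint8_t  file_type;
};

/* Write one record header at *off and advance *off by rec_len. */
static void sketch_put(char *page, size_t *off, uint32_t inode,
                       uint16_t rec_len)
{
    struct sketch_dir_entry de = { inode, rec_len, 0, 0 };

    assert(*off + sizeof(de) <= SKETCH_PAGE_SIZE);
    memcpy(page + *off, &de, sizeof(de));
    *off += rec_len;
}

/*
 * Walk the records in a page and count those with a non-zero inode.
 * Returns -1 when a record with rec_len == 0 is found, since advancing
 * by 0 bytes would revisit the same record forever.
 */
static int sketch_count_entries(const char *page)
{
    size_t off = 0;
    int nr = 0;

    while (off + sizeof(struct sketch_dir_entry) <= SKETCH_PAGE_SIZE) {
        struct sketch_dir_entry de;

        memcpy(&de, page + off, sizeof(de));
        if (de.rec_len == 0)
            return -1;          /* guard: zeroed record, stop walking */
        if (de.inode != 0)
            nr++;
        off += de.rec_len;      /* step to the next record */
    }
    return nr;
}
```

With the guard removed, a page holding two valid 12-byte records followed by zeros would never leave the third record. Like the patch, the guard only avoids the hang; it does not explain why the page is partially zeroed.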