Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.4.0, Lustre 2.1.6
    • Affects Version/s: Lustre 2.4.0, Lustre 2.1.4
    • None
    • 3
    • 7394

    Description

      We are using lustre 2.1.4-3chaos on our server clusters.

      Running a test application, one of our archive storage folks discovered that Lustre's directory listings are rather unreliable. The first thing she noticed is that directory entries can appear multiple times:

      > cd /p/lscratchrza/apotts/divt_rzstagg0/htar_1st_27475
      > find . -type f > ../test.lst0 ; echo $? ; wc -l ../test.lst0
      0
      34339 ../test.lst0
      > find . -type f > ../test.lst1 ; echo $? ; wc -l ../test.lst1
      0
      35006 ../test.lst1
      

      When the two directory listings are sorted and run through uniq, there are only 34339 unique entries.

      One of our sysadmins investigated, and further found that sometimes entries are missing from the listing altogether. But when the missing files are checked with an ls, they are present.

      This has been noticed with the above find command, and also using "/bin/ls -laR .". Both files and subdirectories have appeared twice in the directory listing.

      The Lustre clients that have reproduced this behaviour are running 2.1.2-4chaos and 1.8.5.0-6chaos.

      Attachments

        Activity

          [LU-3029] Directory listings are unreliable

          nedbass Ned Bass (Inactive) added a comment -

          It turns out an earlier version of the patch I linked to was carried in the ldiskfs patch series for earlier kernels.

          nedbass Ned Bass (Inactive) added a comment -

          I peeked at filp->f_mode in ldiskfs_readdir() with systemtap. It is always 0x600 on the server where we see this problem, so both FMODE_32BITHASH and FMODE_64BITHASH are set. On a server where we've never seen this bug, it is 0x400, i.e. only FMODE_64BITHASH is set. I'm not sure how it's getting set that way, but that explains why the major hash is always zero.
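
          For reference, a minimal standalone sketch of the flag arithmetic behind that observation, assuming the flag values the upstream ext4 patch adds to fs.h (FMODE_32BITHASH = 0x200, FMODE_64BITHASH = 0x400); this is only an illustration, not kernel code:

          #include <stdio.h>

          /* Flag values as added by the upstream ext4 patch (assumption). */
          #define FMODE_32BITHASH 0x200
          #define FMODE_64BITHASH 0x400

          int main(void)
          {
                  unsigned int bad  = 0x600;  /* f_mode on the server that hits the bug */
                  unsigned int good = 0x400;  /* f_mode on the server that does not */

                  printf("bad:  32BITHASH=%d 64BITHASH=%d\n",
                         !!(bad & FMODE_32BITHASH), !!(bad & FMODE_64BITHASH));
                  printf("good: 32BITHASH=%d 64BITHASH=%d\n",
                         !!(good & FMODE_32BITHASH), !!(good & FMODE_64BITHASH));
                  return 0;
          }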

          nedbass Ned Bass (Inactive) added a comment -

          I noticed this ext4 patch appeared in kernel-2.6.32-343.el6. We first saw this bug after updating to an ldiskfs build with this change.

          http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=d1f5273e9adb40724a85272f248f210dc4ce919a

          I don't have any direct evidence to implicate it, but given the suspicious 32-bit vs. 64-bit nature of this bug, it looks worth investigating. In particular, the iterator looks like it's being "reset" to .. in the bad case:

          function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396000) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466160119, .oied_off = 4294967296, .oied_namelen = 2, .oied_type = 4, .oied_name = ..
          

          which made these hunks of the patch jump out at me:

          @@ -441,13 +552,13 @@ static int ext4_dx_readdir(struct file *filp,
                  int     ret;
           
                  if (!info) {
          -               info = ext4_htree_create_dir_info(filp->f_pos);
          +               info = ext4_htree_create_dir_info(filp, filp->f_pos);
                          if (!info)
                                  return -ENOMEM;
                          filp->private_data = info;
                  }
           
          -       if (filp->f_pos == EXT4_HTREE_EOF)
          +       if (filp->f_pos == ext4_get_htree_eof(filp))
                          return 0;       /* EOF */
           
                  /* Some one has messed with f_pos; reset the world */
          @@ -455,8 +566,8 @@ static int ext4_dx_readdir(struct file *filp,
                          free_rb_tree_fname(&info->root);
                          info->curr_node = NULL;
                          info->extra_fname = NULL;
          -               info->curr_hash = pos2maj_hash(filp->f_pos);
          -               info->curr_minor_hash = pos2min_hash(filp->f_pos);
          +               info->curr_hash = pos2maj_hash(filp, filp->f_pos);
          +               info->curr_minor_hash = pos2min_hash(filp, filp->f_pos);
                  }
           
                  /*
          
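          For context, a simplified standalone model (not the kernel code, and the helper names below are made up) of the two f_pos encodings these patched helpers switch between once filp is taken into account. The 64-bit encoding packs the major hash into the top 32 bits of the position, while the 32-bit encoding keeps only the halved major hash, so anything that later reads the top 32 bits back as a major hash sees zero, which matches the earlier f_mode observation:

          #include <stdio.h>
          #include <stdint.h>

          /* Simplified model of the 64-bit encoding: major hash in the top
           * 32 bits, minor hash in the bottom 32 bits. */
          static uint64_t pos_encode_64(uint32_t major, uint32_t minor)
          {
                  return ((uint64_t)(major >> 1) << 32) | minor;
          }

          /* Simplified model of the 32-bit encoding: the position is only the
           * halved major hash; the minor hash is dropped. */
          static uint64_t pos_encode_32(uint32_t major, uint32_t minor)
          {
                  (void)minor;
                  return major >> 1;
          }

          int main(void)
          {
                  uint32_t major = 0xb3196664, minor = 0x12345678;  /* arbitrary example hashes */
                  uint64_t p64 = pos_encode_64(major, minor);
                  uint64_t p32 = pos_encode_32(major, minor);

                  /* Read the "major" part back out of the top 32 bits of each position. */
                  printf("64-bit encoding: pos = %llu, major read back = %llu\n",
                         (unsigned long long)p64, (unsigned long long)(p64 >> 32));
                  printf("32-bit encoding: pos = %llu, major read back = %llu\n",
                         (unsigned long long)p32, (unsigned long long)(p32 >> 32));
                  return 0;
          }
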
          prakash Prakash Surya (Inactive) added a comment - edited

          Sorry for the lack of communication. I've been trying to get the issue to hit with the debug info from nasf's comment above, but hadn't reproduced it until today.

          Using systemtap I've probed a few different places in the call path of interest. Specifically, the entry and exit points of mdd_readpage, osd_it_ea_load, osd_it_ea_key_size, osd_it_ea_store, osd_it_ea_rec, and osd_it_ea_next.

          The entry point of mdd_readpage now prints:

                                                                                
                  CDEBUG(D_INFO,                                                          
                          "LU-3029: rdpg = %p, hash = "LPU64", attr = %x, count = %u, npages = %u\n",
                          rdpg, rdpg->rp_hash, rdpg->rp_attrs, rdpg->rp_count, rdpg->rp_npages);
          

          The exit point of mdd_readpage prints:

                                                                                
                  CDEBUG(D_INFO,                                                          
                          "LU-3029: rdpg = %p, hash = "LPU64", attr = %x, count = %u, npages = %u\n",
                          rdpg, rdpg->rp_hash, rdpg->rp_attrs, rdpg->rp_count, rdpg->rp_npages);
                                                                                          
                  for (i = 0, nob = rdpg->rp_count; i < rdpg->rp_npages && nob > 0;          
                       i++, nob -= CFS_PAGE_SIZE) {                                       
                          struct lu_dirpage *dp = cfs_kmap(rdpg->rp_pages[i]);            
                          CDEBUG(D_INFO,                                                  
                                  "LU-3029: rdpg: %p, lu_dirpage ldp_hash_start "LPU64       
                                  " ldp_hash_end "LPU64" ldp_flags %u\n",                 
                                  rdpg, le64_to_cpu(dp->ldp_hash_start),                  
                                  le64_to_cpu(dp->ldp_hash_end),                          
                                  le32_to_cpu(dp->ldp_flags));                            
                          cfs_kunmap(rdpg->rp_pages[i]);                                  
                  }                                                                       
          

          And the osd_it_ea_* functions print their trace information, and some dump the osd_it_ea structure to D_INFO using:

                                                                                
                  struct osd_it_ea *ea = (struct osd_it_ea *)STAP_ARG_osd_it_ea;          
                  CDEBUG(D_INFO, "LU-3029: "                                              
                          "osd_it_ea(%p) = { .oie_obj = %p, "                             
                                            ".oie_file = {...}, "                         
                                            ".oie_rd_dirent = %i, "                       
                                            ".oie_it_dirent = %i, "                       
                                            ".oie_dirent = %p, "                          
                                            ".oie_buf = %p }\n",                          
                                          ea,                                             
                                          ea->oie_obj,                                    
                                          /*ea->oie_file,*/                               
                                          ea->oie_rd_dirent,                              
                                          ea->oie_it_dirent,                              
                                          ea->oie_dirent,                                 
                                          ea->oie_buf);                                   
          

          and:

                                                                                
                  struct osd_it_ea *ea = (struct osd_it_ea *)STAP_ARG_osd_it_ea;          
                  CDEBUG(D_INFO, "LU-3029: osd_it_ea(%p)->oie_dirent(%p) = "              
                          "{ .oied_fid = "DFID", "                                        
                            ".oied_ino = "LPU64", "                                       
                            ".oied_off = "LPU64", "                                       
                            ".oied_namelen = %i, "                                        
                            ".oied_type = %i, "                                           
                            ".oied_name = %s }\n",                                        
                          ea, ea->oie_dirent,                                             
                          PFID(&ea->oie_dirent->oied_fid),                                
                          ea->oie_dirent->oied_ino,                                       
                          ea->oie_dirent->oied_off,                                       
                          ea->oie_dirent->oied_namelen,                                   
                          ea->oie_dirent->oied_type,                                      
                          ea->oie_dirent->oied_name);                                     
          

          I'll attach the full systemtap script and logs to this bug in a follow-up comment, but I just wanted to give the following debug messages some context.

          So, in the "good" run (readdir produces the correct results), the calls to mdd_readpage look like this on the MDS:

                                                                                
          00000001:00000040:4.0:1366671762.749937:0:4513:0:(stap_27848_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88082cd89b28, hash = 0, attr = 7, count = 4096, npages = 1 
          00000001:00000040:4.0:1366671762.755066:0:4513:0:(stap_27848_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88082cd89b28, hash = 225459756, attr = 7, count = 4096, npages = 1 
          00000001:00000040:4.0:1366671762.760068:0:4513:0:(stap_27848_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88082cd89b28, hash = 466119302, attr = 7, count = 4096, npages = 1 
          00000001:00000040:4.0:1366671762.764754:0:4513:0:(stap_27848_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88082cd89b28, hash = 678250957, attr = 7, count = 4096, npages = 1 
          00000001:00000040:6.0:1366671762.769742:0:4513:0:(stap_27848_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88082cd89b28, hash = 885661942, attr = 7, count = 4096, npages = 1 
          00000001:00000040:6.0:1366671762.774808:0:4513:0:(stap_27848_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88082cd89b28, hash = 1071938997, attr = 7, count = 4096, npages = 1 
          00000001:00000040:4.0:1366671762.779841:0:4513:0:(stap_27848_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88082cd89b28, hash = 1255106162, attr = 7, count = 4096, npages = 1 
          00000001:00000040:4.0:1366671762.784969:0:4513:0:(stap_27848_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88082cd89b28, hash = 1502393138, attr = 7, count = 4096, npages = 1 
          00000001:00000040:4.0:1366671762.789711:0:4513:0:(stap_27848_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88082cd89b28, hash = 1713543192, attr = 7, count = 4096, npages = 1 
          00000001:00000040:4.0:1366671762.794429:0:4513:0:(stap_27848_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88082cd89b28, hash = 1968026834, attr = 7, count = 4096, npages = 1 
          

          But in the bad case (readdir produces the incorrect results), the calls look like this:

                                                                                
          00000001:00000040:9.0:1366734926.847245:0:4495:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88022acfe328, hash = 0, attr = 7, count = 4096, npages = 1 
          00000001:00000040:10.0:1366734926.852728:0:4495:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88022acfe328, hash = 225459756, attr = 7, count = 4096, npages = 1 
          00000001:00000040:10.0:1366734926.858534:0:4501:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff880403b28b28, hash = 466119302, attr = 7, count = 4096, npages = 1 
          00000001:00000040:11.0:1366734926.862934:0:4501:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff880403b28b28, hash = 678250957, attr = 7, count = 4096, npages = 1 
          00000001:00000040:6.0:1366734926.872498:0:4501:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff880403b28b28, hash = 885661942, attr = 7, count = 4096, npages = 1 
          00000001:00000040:15.0:1366734926.887832:0:4495:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88022acfe328, hash = 1071938997, attr = 7, count = 4096, npages = 1 
          00000001:00000040:13.0:1366734926.894694:0:4501:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff880403b28b28, hash = 1255106162, attr = 7, count = 4096, npages = 1 
          00000001:00000040:12.0:1366734926.904717:0:31865:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff8801da30fb28, hash = 1502393138, attr = 7, count = 4096, npages = 1 
          00000001:00000040:7.0:1366734926.910386:0:31865:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff8801da30fb28, hash = 972141167607355620, attr = 7, count = 4096, npages = 1 
          00000001:00000040:5.0:1366734926.916303:0:31865:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff8801da30fb28, hash = 2012764699691245150, attr = 7, count = 4096, npages = 1 
          00000001:00000040:5.0:1366734926.925118:0:31865:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff8801da30fb28, hash = 2920144773176655063, attr = 7, count = 4096, npages = 1 
          00000001:00000040:0.0:1366734927.917902:0:4495:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88022acfe328, hash = 3811381997883059066, attr = 7, count = 4096, npages = 1 
          00000001:00000040:0.0:1366734927.922300:0:4495:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88022acfe328, hash = 1366741674, attr = 7, count = 4096, npages = 1 
          00000001:00000040:0.0:1366734927.926697:0:4495:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88022acfe328, hash = 1643777690, attr = 7, count = 4096, npages = 1 
          00000001:00000040:0.0:1366734927.931104:0:4495:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88022acfe328, hash = 1854943286, attr = 7, count = 4096, npages = 1 
          00000001:00000040:0.0:1366734927.935557:0:4495:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88022acfe328, hash = 2076961697, attr = 7, count = 4096, npages = 1 
          

          It looks pretty obvious that things go wrong in the transition from hash 1502393138 to hash 972141167607355620. But that's nothing new. So now, let's look a little closer at some of the more interesting RPCs and the call path they take.

          OK, let's look at the two RPCs where the client is asking for hash 1255106162. In both cases, things seem to work just fine and are consistent with each other. Brief traces of the "good" and "bad" cases are below:

          "good" case:

                                                                                
          00000100:00100000:4.0:1366671762.779825:0:4513:0:(service.c:1724:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc mdt_rdpg_33:119d1f74-b5ac-9653-7a11-c72daccd58b7+11:5521:x1432697691423683:12345-172.16.56.1@tcp1:37
          00000001:00000040:4.0:1366671762.779841:0:4513:0:(stap_27848_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88082cd89b28, hash = 1255106162, attr = 7, count = 4096, npages = 1
          00000004:00000040:4.0:1366671762.783816:0:4513:0:(mdd_object.c:2365:mdd_dir_page_build()) lu_dirpage ldp_hash_start 1255106162 ldp_hash_end 1502393138 ldp_flags 0
          00000001:00000040:4.0:1366671762.783835:0:4513:0:(stap_27848_src.c:842:function_mdd_readpage_return()) LU-3029: rdpg = ffff88082cd89b28, hash = 1255106162, attr = 7, count = 4096, npages = 1
          00000001:00000040:4.0:1366671762.783837:0:4513:0:(stap_27848_src.c:852:function_mdd_readpage_return()) LU-3029: rdpg: ffff88082cd89b28, lu_dirpage ldp_hash_start 1255106162 ldp_hash_end 1502393138 ldp_flags 0
          00000100:00100000:4.0:1366671762.784444:0:4513:0:(service.c:1771:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc mdt_rdpg_33:119d1f74-b5ac-9653-7a11-c72daccd58b7+10:5521:x1432697691423683:12345-172.16.56.1@tcp1:37 Request procesed in 4626us (4676us total) trans 0 rc 0/0
          

          "bad" case:

                                                                                
          00000100:00100000:13.0:1366734926.894677:0:4501:0:(service.c:1724:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc mdt_rdpg_21:119d1f74-b5ac-9653-7a11-c72daccd58b7+177:1767:x1432697831399328:12345-172.16.56.1@tcp1:37
          00000001:00000040:13.0:1366734926.894694:0:4501:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff880403b28b28, hash = 1255106162, attr = 7, count = 4096, npages = 1
          00000004:00000040:13.0:1366734926.898359:0:4501:0:(mdd_object.c:2365:mdd_dir_page_build()) lu_dirpage ldp_hash_start 1255106162 ldp_hash_end 1502393138 ldp_flags 0
          00000001:00000040:13.0:1366734926.898375:0:4501:0:(stap_31938_src.c:842:function_mdd_readpage_return()) LU-3029: rdpg = ffff880403b28b28, hash = 1255106162, attr = 7, count = 4096, npages = 1
          00000001:00000040:13.0:1366734926.898376:0:4501:0:(stap_31938_src.c:852:function_mdd_readpage_return()) LU-3029: rdpg: ffff880403b28b28, lu_dirpage ldp_hash_start 1255106162 ldp_hash_end 1502393138 ldp_flags 0
          00000100:00100000:15.0:1366734926.898804:0:4501:0:(service.c:1771:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc mdt_rdpg_21:119d1f74-b5ac-9653-7a11-c72daccd58b7+176:1767:x1432697831399328:12345-172.16.56.1@tcp1:37 Request procesed in 4136us (4887us total) trans 0 rc 0/0
          

          All good. Now let's look at the same information for the two RPCs where the client is asking for hash 1502393138.

          "good" case:

                                                                                
          00000100:00100000:4.0:1366671762.784957:0:4513:0:(service.c:1724:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc mdt_rdpg_33:119d1f74-b5ac-9653-7a11-c72daccd58b7+11:5521:x1432697691423684:12345-172.16.56.1@tcp1:37
          00000001:00000040:4.0:1366671762.784969:0:4513:0:(stap_27848_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88082cd89b28, hash = 1502393138, attr = 7, count = 4096, npages = 1
          00000004:00000040:4.0:1366671762.788669:0:4513:0:(mdd_object.c:2365:mdd_dir_page_build()) lu_dirpage ldp_hash_start 1502393138 ldp_hash_end 1713543192 ldp_flags 0
          00000001:00000040:4.0:1366671762.788682:0:4513:0:(stap_27848_src.c:842:function_mdd_readpage_return()) LU-3029: rdpg = ffff88082cd89b28, hash = 1502393138, attr = 7, count = 4096, npages = 1
          00000001:00000040:4.0:1366671762.788684:0:4513:0:(stap_27848_src.c:852:function_mdd_readpage_return()) LU-3029: rdpg: ffff88082cd89b28, lu_dirpage ldp_hash_start 1502393138 ldp_hash_end 1713543192 ldp_flags 0
          00000100:00100000:4.0:1366671762.789180:0:4513:0:(service.c:1771:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc mdt_rdpg_33:119d1f74-b5ac-9653-7a11-c72daccd58b7+10:5521:x1432697691423684:12345-172.16.56.1@tcp1:37 Request procesed in 4228us (4255us total) trans 0 rc 0/0
          

          "bad" case:

                                                                                
          00000100:00100000:12.0:1366734926.904700:0:31865:0:(service.c:1724:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc mdt_rdpg_43:119d1f74-b5ac-9653-7a11-c72daccd58b7+177:1767:x1432697831399329:12345-172.16.56.1@tcp1:37
          00000001:00000040:12.0:1366734926.904717:0:31865:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff8801da30fb28, hash = 1502393138, attr = 7, count = 4096, npages = 1
          00000004:00000040:12.0:1366734926.908287:0:31865:0:(mdd_object.c:2365:mdd_dir_page_build()) lu_dirpage ldp_hash_start 4294967296 ldp_hash_end 972141167607355620 ldp_flags 0
          00000001:00000040:12.0:1366734926.908306:0:31865:0:(stap_31938_src.c:842:function_mdd_readpage_return()) LU-3029: rdpg = ffff8801da30fb28, hash = 1502393138, attr = 7, count = 4096, npages = 1
          00000001:00000040:12.0:1366734926.908308:0:31865:0:(stap_31938_src.c:852:function_mdd_readpage_return()) LU-3029: rdpg: ffff8801da30fb28, lu_dirpage ldp_hash_start 1502393138 ldp_hash_end 972141167607355620 ldp_flags 0
          00000100:00100000:12.0:1366734926.908847:0:31865:0:(service.c:1771:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc mdt_rdpg_43:119d1f74-b5ac-9653-7a11-c72daccd58b7+176:1767:x1432697831399329:12345-172.16.56.1@tcp1:37 Request procesed in 4158us (9365us total) trans 0 rc 0/0
          

          Whoa, this is strange: the two RPCs don't look the same at all here. In the "good" case a lu_dirpage is built up with a hash_start of 1502393138 and a hash_end of 1713543192. In the "bad" case the lu_dirpage is built up with a hash_start of 4294967296 and a hash_end of 972141167607355620.

          So why is the "bad" case building a lu_dirpage with these "wrong" dirents and values for hash_start and hash_end? These bogus values are just used verbatim on the client even though they don't appear to be "correct".
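
          As a quick way to eyeball those values: 4294967296 is exactly 1 << 32, and assuming the usual packing of a 32-bit major hash in the top word and a 32-bit minor hash in the bottom word, the suspicious 64-bit numbers can be split apart with a throwaway sketch like this:

          #include <stdio.h>
          #include <stdint.h>

          int main(void)
          {
                  /* Values taken from the traces in this comment. */
                  uint64_t vals[] = {
                          4294967296ULL,          /* hash_start in the bad case */
                          972141167607355620ULL,  /* hash_end in the bad case */
                          1502393138ULL,          /* the hash the client actually asked for */
                  };
                  unsigned int i;

                  for (i = 0; i < sizeof(vals) / sizeof(vals[0]); i++)
                          printf("%20llu -> major = 0x%08llx, minor = 0x%08llx\n",
                                 (unsigned long long)vals[i],
                                 (unsigned long long)(vals[i] >> 32),
                                 (unsigned long long)(vals[i] & 0xffffffffULL));
                  return 0;
          }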

          Digging into the RPC asking for hash 1502393138 in greater detail, it looks like the osd_it_ea_load call isn't loading the correct iterator information in the "bad" case. When "good" asks to load hash = 1502393138, the iterator is loaded with oied_off = 1502393138. But when "bad" attempts to load hash = 1502393138, it gets oied_off = 4294967296 (the hash_start value of its lu_dirpage).

          "good" case:

                                                                                
          00000100:00100000:4.0:1366671762.784957:0:4513:0:(service.c:1724:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc mdt_rdpg_33:119d1f74-b5ac-9653-7a11-c72daccd58b7+11:5521:x1432697691423684:12345-172.16.56.1@tcp1:37
          00000001:00000040:4.0:1366671762.784969:0:4513:0:(stap_27848_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff88082cd89b28, hash = 1502393138, attr = 7, count = 4096, npages = 1
          00000001:00000040:4.0:1366671762.784973:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_load entering
          00000001:00000040:4.0:1366671762.784974:0:4513:0:(stap_27848_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88082cd8a400) = { .oie_obj = ffff88037ddc6d80, .oie_file = {...}, .oie_rd_dirent = 0, .oie_it_dirent = 0, .oie_dirent = (null), .oie_buf = ffff8808341e6000 }
          00000001:00000040:4.0:1366671762.784976:0:4513:0:(stap_27848_src.c:741:function_dinfo_osd_it_ea_dirent()) LU-3029: ea->oie_dirent == NULL
          00000001:00000040:4.0:1366671762.785057:0:4513:0:(stap_27848_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88082cd8a400)->oie_dirent(ffff8808341e6000) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466162723, .oied_off = 1502393138, .oied_namelen = 5, .oied_type = 8, .oied_name = 01319YZ }
          00000001:00000040:4.0:1366671762.785059:0:4513:0:(stap_27848_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88082cd8a400) = { .oie_obj = ffff88037ddc6d80, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 1, .oie_dirent = ffff8808341e6000, .oie_buf = ffff8808341e6000 }
          00000001:00000040:4.0:1366671762.785062:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_load leaving (rc = return=0x1)
          00000001:00000040:4.0:1366671762.785064:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_key_size entering
          00000001:00000040:4.0:1366671762.785066:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_key_size leaving (rc = return=0x5)
          00000001:00000040:4.0:1366671762.785068:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_store entering
          00000001:00000040:4.0:1366671762.785070:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_store leaving (rc = return=0x598cb332)
          00000001:00000040:4.0:1366671762.785072:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_rec entering
          00000001:00000040:4.0:1366671762.785073:0:4513:0:(stap_27848_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88082cd8a400) = { .oie_obj = ffff88037ddc6d80, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 1, .oie_dirent = ffff8808341e6000, .oie_buf = ffff8808341e6000 }
          00000001:00000040:4.0:1366671762.785075:0:4513:0:(stap_27848_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88082cd8a400)->oie_dirent(ffff8808341e6000) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466162723, .oied_off = 1502393138, .oied_namelen = 5, .oied_type = 8, .oied_name = 01319YZ }
          00000001:00000040:4.0:1366671762.785082:0:4513:0:(stap_27848_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88082cd8a400)->oie_dirent(ffff8808341e6000) = { .oied_fid = [0x5a59cc33e0:0x1c51a:0x0], .oied_ino = 466162723, .oied_off = 1502393138, .oied_namelen = 5, .oied_type = 8, .oied_name = 01319YZ }
          00000001:00000040:4.0:1366671762.785084:0:4513:0:(stap_27848_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88082cd8a400) = { .oie_obj = ffff88037ddc6d80, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 1, .oie_dirent = ffff8808341e6000, .oie_buf = ffff8808341e6000 }
          00000001:00000040:4.0:1366671762.785087:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_rec leaving (rc = return=0x0)
          00000001:00000040:4.0:1366671762.785089:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_next entering
          00000001:00000040:4.0:1366671762.785089:0:4513:0:(stap_27848_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88082cd8a400) = { .oie_obj = ffff88037ddc6d80, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 1, .oie_dirent = ffff8808341e6000, .oie_buf = ffff8808341e6000 }
          00000001:00000040:4.0:1366671762.785091:0:4513:0:(stap_27848_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88082cd8a400)->oie_dirent(ffff8808341e6000) = { .oied_fid = [0x5a59cc33e0:0x1c51a:0x0], .oied_ino = 466162723, .oied_off = 1502393138, .oied_namelen = 5, .oied_type = 8, .oied_name = 01319YZ }
          00000001:00000040:4.0:1366671762.785094:0:4513:0:(stap_27848_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88082cd8a400)->oie_dirent(ffff8808341e6030) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466164047, .oied_off = 1502967108, .oied_namelen = 5, .oied_type = 8, .oied_name = 01907 }
          00000001:00000040:4.0:1366671762.785096:0:4513:0:(stap_27848_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88082cd8a400) = { .oie_obj = ffff88037ddc6d80, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 2, .oie_dirent = ffff8808341e6030, .oie_buf = ffff8808341e6000 }
          00000001:00000040:4.0:1366671762.785099:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_next leaving (rc = return=0x0)
          00000001:00000040:4.0:1366671762.785101:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_key_size entering
          00000001:00000040:4.0:1366671762.785102:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_key_size leaving (rc = return=0x5)
          00000001:00000040:4.0:1366671762.785104:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_store entering
          00000001:00000040:4.0:1366671762.785106:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_store leaving (rc = return=0x59957544)
          00000001:00000040:4.0:1366671762.785107:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_rec entering
          00000001:00000040:4.0:1366671762.785108:0:4513:0:(stap_27848_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88082cd8a400) = { .oie_obj = ffff88037ddc6d80, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 2, .oie_dirent = ffff8808341e6030, .oie_buf = ffff8808341e6000 }
          00000001:00000040:4.0:1366671762.785110:0:4513:0:(stap_27848_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88082cd8a400)->oie_dirent(ffff8808341e6030) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466164047, .oied_off = 1502967108, .oied_namelen = 5, .oied_type = 8, .oied_name = 01907 }
          00000001:00000040:4.0:1366671762.785117:0:4513:0:(stap_27848_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88082cd8a400)->oie_dirent(ffff8808341e6030) = { .oied_fid = [0x5a59cc33e0:0x1c766:0x0], .oied_ino = 466164047, .oied_off = 1502967108, .oied_namelen = 5, .oied_type = 8, .oied_name = 01907 }
          00000001:00000040:4.0:1366671762.785119:0:4513:0:(stap_27848_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88082cd8a400) = { .oie_obj = ffff88037ddc6d80, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 2, .oie_dirent = ffff8808341e6030, .oie_buf = ffff8808341e6000 }
          00000001:00000040:4.0:1366671762.785122:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_rec leaving (rc = return=0x0)
          00000001:00000040:4.0:1366671762.785124:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_next entering
          00000001:00000040:4.0:1366671762.785124:0:4513:0:(stap_27848_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88082cd8a400) = { .oie_obj = ffff88037ddc6d80, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 2, .oie_dirent = ffff8808341e6030, .oie_buf = ffff8808341e6000 }
          00000001:00000040:4.0:1366671762.785126:0:4513:0:(stap_27848_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88082cd8a400)->oie_dirent(ffff8808341e6030) = { .oied_fid = [0x5a59cc33e0:0x1c766:0x0], .oied_ino = 466164047, .oied_off = 1502967108, .oied_namelen = 5, .oied_type = 8, .oied_name = 01907 }
          00000001:00000040:4.0:1366671762.785129:0:4513:0:(stap_27848_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88082cd8a400)->oie_dirent(ffff8808341e6060) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466163711, .oied_off = 1508584937, .oied_namelen = 5, .oied_type = 8, .oied_name = 01737 }
          00000001:00000040:4.0:1366671762.785131:0:4513:0:(stap_27848_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88082cd8a400) = { .oie_obj = ffff88037ddc6d80, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 3, .oie_dirent = ffff8808341e6060, .oie_buf = ffff8808341e6000 }
          00000001:00000040:4.0:1366671762.785134:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_next leaving (rc = return=0x0)
          

          "bad" case:

                                                                                
          00000100:00100000:12.0:1366734926.904700:0:31865:0:(service.c:1724:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc mdt_rdpg_43:119d1f74-b5ac-9653-7a11-c72daccd58b7+177:1767:x1432697831399329:12345-172.16.56.1@tcp1:37
          00000001:00000040:12.0:1366734926.904717:0:31865:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff8801da30fb28, hash = 1502393138, attr = 7, count = 4096, npages = 1
          00000001:00000040:12.0:1366734926.904726:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_load entering
          00000001:00000040:12.0:1366734926.904728:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 0, .oie_it_dirent = 0, .oie_dirent = (null), .oie_buf = ffff880229396000 }
          00000001:00000040:12.0:1366734926.904730:0:31865:0:(stap_31938_src.c:741:function_dinfo_osd_it_ea_dirent()) LU-3029: ea->oie_dirent == NULL
          00000001:00000040:12.0:1366734926.904831:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396000) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466160119, .oied_off = 4294967296, .oied_namelen = 2, .oied_type = 4, .oied_name = .. }
          00000001:00000040:12.0:1366734926.904835:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 1, .oie_dirent = ffff880229396000, .oie_buf = ffff880229396000 }
          00000001:00000040:12.0:1366734926.904838:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_load leaving (rc = return=0x1)
          00000001:00000040:12.0:1366734926.904841:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_key_size entering
          00000001:00000040:12.0:1366734926.904843:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_key_size leaving (rc = return=0x2)
          00000001:00000040:12.0:1366734926.904845:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_store entering
          00000001:00000040:12.0:1366734926.904848:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_store leaving (rc = return=0x100000000)
          00000001:00000040:12.0:1366734926.904850:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_rec entering
          00000001:00000040:12.0:1366734926.904851:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 1, .oie_dirent = ffff880229396000, .oie_buf = ffff880229396000 }
          00000001:00000040:12.0:1366734926.904853:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396000) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466160119, .oied_off = 4294967296, .oied_namelen = 2, .oied_type = 4, .oied_name = .. }
          00000001:00000040:12.0:1366734926.904860:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396000) = { .oied_fid = [0x5a59cc33e0:0x1c3d9:0x0], .oied_ino = 466160119, .oied_off = 4294967296, .oied_namelen = 2, .oied_type = 4, .oied_name = .. }
          00000001:00000040:12.0:1366734926.904863:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 1, .oie_dirent = ffff880229396000, .oie_buf = ffff880229396000 }
          00000001:00000040:12.0:1366734926.904865:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_rec leaving (rc = return=0x0)
          00000001:00000040:12.0:1366734926.904867:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_next entering
          00000001:00000040:12.0:1366734926.904868:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 1, .oie_dirent = ffff880229396000, .oie_buf = ffff880229396000 }
          00000001:00000040:12.0:1366734926.904870:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396000) = { .oied_fid = [0x5a59cc33e0:0x1c3d9:0x0], .oied_ino = 466160119, .oied_off = 4294967296, .oied_namelen = 2, .oied_type = 4, .oied_name = .. }
          00000001:00000040:12.0:1366734926.904873:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396028) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466164003, .oied_off = 950193342486109, .oied_namelen = 5, .oied_type = 8, .oied_name = 01877!
          00000001:00000040:12.0:1366734926.904875:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 2, .oie_dirent = ffff880229396028, .oie_buf = ffff880229396000 }
          00000001:00000040:12.0:1366734926.904877:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_next leaving (rc = return=0x0)
          00000001:00000040:12.0:1366734926.904879:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_key_size entering
          00000001:00000040:12.0:1366734926.904882:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_key_size leaving (rc = return=0x5)
          00000001:00000040:12.0:1366734926.904883:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_store entering
          00000001:00000040:12.0:1366734926.904886:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_store leaving (rc = return=0x3603220a5965d)
          00000001:00000040:12.0:1366734926.904887:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_rec entering
          00000001:00000040:12.0:1366734926.904888:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 2, .oie_dirent = ffff880229396028, .oie_buf = ffff880229396000 }
          00000001:00000040:12.0:1366734926.904890:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396028) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466164003, .oied_off = 950193342486109, .oied_namelen = 5, .oied_type = 8, .oied_name = 01877!
          00000001:00000040:12.0:1366734926.904897:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396028) = { .oied_fid = [0x5a59cc33e0:0x1c748:0x0], .oied_ino = 466164003, .oied_off = 950193342486109, .oied_namelen = 5, .oied_type = 8, .oied_name = 01877!
          00000001:00000040:12.0:1366734926.904899:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 2, .oie_dirent = ffff880229396028, .oie_buf = ffff880229396000 }
          00000001:00000040:12.0:1366734926.904902:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_rec leaving (rc = return=0x0)
          00000001:00000040:12.0:1366734926.904904:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_next entering
          00000001:00000040:12.0:1366734926.904904:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 2, .oie_dirent = ffff880229396028, .oie_buf = ffff880229396000 }
          00000001:00000040:12.0:1366734926.904906:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396028) = { .oied_fid = [0x5a59cc33e0:0x1c748:0x0], .oied_ino = 466164003, .oied_off = 950193342486109, .oied_namelen = 5, .oied_type = 8, .oied_name = 01877!
          00000001:00000040:12.0:1366734926.904908:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396058) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466162984, .oied_off = 1048734590608755, .oied_namelen = 5, .oied_type = 8, .oied_name = 01430 }
          00000001:00000040:12.0:1366734926.904911:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 3, .oie_dirent = ffff880229396058, .oie_buf = ffff880229396000 }
          00000001:00000040:12.0:1366734926.904913:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_next leaving (rc = return=0x0)
          
00000001:00000040:4.0:1366671762.785122:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_rec leaving (rc = return=0x0) 00000001:00000040:4.0:1366671762.785124:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_next entering 00000001:00000040:4.0:1366671762.785124:0:4513:0:(stap_27848_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88082cd8a400) = { .oie_obj = ffff88037ddc6d80, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 2, .oie_dirent = ffff8808341e6030, .oie_buf = ffff8808341e6000 } 00000001:00000040:4.0:1366671762.785126:0:4513:0:(stap_27848_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88082cd8a400)->oie_dirent(ffff8808341e6030) = { .oied_fid = [0x5a59cc33e0:0x1c766:0x0], .oied_ino = 466164047, .oied_off = 1502967108, .oied_namelen = 5, .oied_type = 8, .oied_name = 01907 } 00000001:00000040:4.0:1366671762.785129:0:4513:0:(stap_27848_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88082cd8a400)->oie_dirent(ffff8808341e6060) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466163711, .oied_off = 1508584937, .oied_namelen = 5, .oied_type = 8, .oied_name = 01737 } 00000001:00000040:4.0:1366671762.785131:0:4513:0:(stap_27848_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88082cd8a400) = { .oie_obj = ffff88037ddc6d80, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 3, .oie_dirent = ffff8808341e6060, .oie_buf = ffff8808341e6000 } 00000001:00000040:4.0:1366671762.785134:0:4513:0:(stap_27848_src.c:649:function_dinfo()) LU-3029: osd_it_ea_next leaving (rc = return=0x0) "bad" case: 00000100:00100000:12.0:1366734926.904700:0:31865:0:(service.c:1724:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc mdt_rdpg_43:119d1f74-b5ac-9653-7a11-c72daccd58b7+177:1767:x1432697831399329:12345-172.16.56.1@tcp1:37 00000001:00000040:12.0:1366734926.904717:0:31865:0:(stap_31938_src.c:800:function_mdd_readpage_call()) LU-3029: rdpg = ffff8801da30fb28, hash = 1502393138, attr = 7, count = 4096, npages = 1 00000001:00000040:12.0:1366734926.904726:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_load entering 00000001:00000040:12.0:1366734926.904728:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 0, .oie_it_dirent = 0, .oie_dirent = (null), .oie_buf = ffff880229396000 } 00000001:00000040:12.0:1366734926.904730:0:31865:0:(stap_31938_src.c:741:function_dinfo_osd_it_ea_dirent()) LU-3029: ea->oie_dirent == NULL 00000001:00000040:12.0:1366734926.904831:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396000) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466160119, .oied_off = 4294967296, .oied_namelen = 2, .oied_type = 4, .oied_name = .. 
} 00000001:00000040:12.0:1366734926.904835:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 1, .oie_dirent = ffff880229396000, .oie_buf = ffff880229396000 } 00000001:00000040:12.0:1366734926.904838:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_load leaving (rc = return=0x1) 00000001:00000040:12.0:1366734926.904841:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_key_size entering 00000001:00000040:12.0:1366734926.904843:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_key_size leaving (rc = return=0x2) 00000001:00000040:12.0:1366734926.904845:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_store entering 00000001:00000040:12.0:1366734926.904848:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_store leaving (rc = return=0x100000000) 00000001:00000040:12.0:1366734926.904850:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_rec entering 00000001:00000040:12.0:1366734926.904851:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 1, .oie_dirent = ffff880229396000, .oie_buf = ffff880229396000 } 00000001:00000040:12.0:1366734926.904853:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396000) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466160119, .oied_off = 4294967296, .oied_namelen = 2, .oied_type = 4, .oied_name = .. } 00000001:00000040:12.0:1366734926.904860:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396000) = { .oied_fid = [0x5a59cc33e0:0x1c3d9:0x0], .oied_ino = 466160119, .oied_off = 4294967296, .oied_namelen = 2, .oied_type = 4, .oied_name = .. } 00000001:00000040:12.0:1366734926.904863:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 1, .oie_dirent = ffff880229396000, .oie_buf = ffff880229396000 } 00000001:00000040:12.0:1366734926.904865:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_rec leaving (rc = return=0x0) 00000001:00000040:12.0:1366734926.904867:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_next entering 00000001:00000040:12.0:1366734926.904868:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 1, .oie_dirent = ffff880229396000, .oie_buf = ffff880229396000 } 00000001:00000040:12.0:1366734926.904870:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396000) = { .oied_fid = [0x5a59cc33e0:0x1c3d9:0x0], .oied_ino = 466160119, .oied_off = 4294967296, .oied_namelen = 2, .oied_type = 4, .oied_name = .. } 00000001:00000040:12.0:1366734926.904873:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396028) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466164003, .oied_off = 950193342486109, .oied_namelen = 5, .oied_type = 8, .oied_name = 01877! 
00000001:00000040:12.0:1366734926.904875:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 2, .oie_dirent = ffff880229396028, .oie_buf = ffff880229396000 } 00000001:00000040:12.0:1366734926.904877:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_next leaving (rc = return=0x0) 00000001:00000040:12.0:1366734926.904879:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_key_size entering 00000001:00000040:12.0:1366734926.904882:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_key_size leaving (rc = return=0x5) 00000001:00000040:12.0:1366734926.904883:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_store entering 00000001:00000040:12.0:1366734926.904886:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_store leaving (rc = return=0x3603220a5965d) 00000001:00000040:12.0:1366734926.904887:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_rec entering 00000001:00000040:12.0:1366734926.904888:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 2, .oie_dirent = ffff880229396028, .oie_buf = ffff880229396000 } 00000001:00000040:12.0:1366734926.904890:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396028) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466164003, .oied_off = 950193342486109, .oied_namelen = 5, .oied_type = 8, .oied_name = 01877! 00000001:00000040:12.0:1366734926.904897:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396028) = { .oied_fid = [0x5a59cc33e0:0x1c748:0x0], .oied_ino = 466164003, .oied_off = 950193342486109, .oied_namelen = 5, .oied_type = 8, .oied_name = 01877! 00000001:00000040:12.0:1366734926.904899:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 2, .oie_dirent = ffff880229396028, .oie_buf = ffff880229396000 } 00000001:00000040:12.0:1366734926.904902:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_rec leaving (rc = return=0x0) 00000001:00000040:12.0:1366734926.904904:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_next entering 00000001:00000040:12.0:1366734926.904904:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 2, .oie_dirent = ffff880229396028, .oie_buf = ffff880229396000 } 00000001:00000040:12.0:1366734926.904906:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396028) = { .oied_fid = [0x5a59cc33e0:0x1c748:0x0], .oied_ino = 466164003, .oied_off = 950193342486109, .oied_namelen = 5, .oied_type = 8, .oied_name = 01877! 
00000001:00000040:12.0:1366734926.904908:0:31865:0:(stap_31938_src.c:759:function_dinfo_osd_it_ea_dirent()) LU-3029: osd_it_ea(ffff88011cdee400)->oie_dirent(ffff880229396058) = { .oied_fid = [0x0:0x0:0x0], .oied_ino = 466162984, .oied_off = 1048734590608755, .oied_namelen = 5, .oied_type = 8, .oied_name = 01430 } 00000001:00000040:12.0:1366734926.904911:0:31865:0:(stap_31938_src.c:701:function_dinfo_osd_it_ea()) LU-3029: osd_it_ea(ffff88011cdee400) = { .oie_obj = ffff8803188fd300, .oie_file = {...}, .oie_rd_dirent = 106, .oie_it_dirent = 3, .oie_dirent = ffff880229396058, .oie_buf = ffff880229396000 } 00000001:00000040:12.0:1366734926.904913:0:31865:0:(stap_31938_src.c:649:function_dinfo()) LU-3029: osd_it_ea_next leaving (rc = return=0x0)

          nasf, I checked the MDTs with tunefs.ldiskfs -l, and the directory hash seeds look valid, e.g.

          Directory Hash Seed: 9b006d00-8c7b-4ec9-995a-c99307066c69

          I assumed, but didn't verify, that this output prints ldiskfs_super_block::s_hash_seed.

          I believe Prakash has been trying to reproduce the issue with that patch and other debugging in place, but hasn't gotten a hit yet.

          nedbass Ned Bass (Inactive) added a comment

          Can anybody help reproduce the issue with the debug information mentioned in my comment of 19/Apr/13 4:39 PM? Thanks very much!

          yong.fan nasf (Inactive) added a comment

          Andreas, they seem to be unrelated (other than having similar symptoms). LU-3182 was introduced by an lmv cleanup patch between 2.3.58 and 2.3.59, while this bug affects 2.1 and 1.8 systems. Also, LU-3182 should only affect PPC clients because of their larger page size.

          nedbass Ned Bass (Inactive) added a comment

          Is it possible this problem relates to LU-3182 and http://review.whamcloud.com/6071?

          adilger Andreas Dilger added a comment

          For an ldiskfs-based MDS, the cookie (or position, or client-visible 'dirhash') returned to the client for readdir() is composed of two ldiskfs-internal hash values: the major hash (32 bits) and the minor hash (32 bits). If the client supports '64bithash', the cookie is 64 bits wide and contains both the major and the minor hash; otherwise only the major hash is used to generate the cookie. All of this is handled inside ldiskfs and does NOT involve IAM.
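          To make that layout concrete, here is a minimal C sketch (an illustration only, not the actual ldiskfs code) of how such a cookie could be assembled from the two 32-bit htree hash values; the helper name and the exact bit placement are assumptions:

                  /* Illustrative only: combine the ldiskfs major/minor htree hashes
                   * into the readdir cookie handed back to the client. */
                  static inline __u64 dirhash_cookie(__u32 major, __u32 minor,
                                                     int client_64bithash)
                  {
                          if (client_64bithash)
                                  /* 64-bit capable client: both halves returned. */
                                  return ((__u64)major << 32) | minor;
                          /* 32-bit-only client: the minor hash is dropped. */
                          return major;
                  }

          The real ldiskfs code may additionally reserve low-order bits of the hash, so treat the exact shift as an approximation.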
          I have tested a lustre-2.1.4-4chaos server with a lustre-1.8.5.0-6chaos patchless client on a newly formatted ldiskfs (NOT upgraded from lustre-1.6). I found that '64bithash' is supported, the dir cookie exchanged between client and MDS is 64 bits, and each readdir() fetches only one page from the MDS. All as expected, so I could NOT reproduce your issue.

          According to your system log, there are several questionable points:

          1) Your MDT device was formatted with Lustre-1.6. Is it possible that the hash_seed (ldiskfs_super_block::s_hash_seed) was NOT set correctly, or became corrupted while the system was in use? Would you please add some debug info (or just a printk) in ldiskfs to print out the hash_seed? (A hedged sketch of such a printk follows the snippet below.)

          2) On the other hand, we can NOT explain why mdd_dir_page_build() was called twice in the failure cases. I think CFS_PAGE_SIZE is equal to LU_PAGE_SIZE on your system; otherwise mdd_dir_page_build() would also repeat in the successful cases. So I suspect some RPC parameter was NOT set correctly, or was corrupted, in the failed readdir() cases. Would you please add some debug info in lustre/mdt/mdt_handler.c as follows:

          static int mdt_readpage(struct mdt_thread_info *info)
          ...
                                          PTLRPC_MAX_BRW_SIZE);
                  rdpg->rp_npages = (rdpg->rp_count + CFS_PAGE_SIZE - 1) >>
                                    CFS_PAGE_SHIFT;
          +        CDEBUG(D_INFO,
          +               "readdir: hash = %Lx, attr = %x, count = %u, npages = %u\n",
          +               rdpg->rp_hash, rdpg->rp_attrs, rdpg->rp_count, rdpg->rp_npages);
                  OBD_ALLOC(rdpg->rp_pages, rdpg->rp_npages * sizeof rdpg->rp_pages[0]);
                  if (rdpg->rp_pages == NULL)
                          RETURN(-ENOMEM);
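
          For point 1) above, a hedged sketch of the kind of printk being asked for might look like the following; where exactly it is placed (for example, at MDT mount time) and the struct access path are assumptions:

                  /* Illustrative only: dump the on-disk htree hash seed so it can
                   * be compared against the "tunefs.ldiskfs -l" output. */
                  printk(KERN_INFO "LU-3029: %s hash_seed %08x-%08x-%08x-%08x\n",
                         sb->s_id,
                         le32_to_cpu(LDISKFS_SB(sb)->s_es->s_hash_seed[0]),
                         le32_to_cpu(LDISKFS_SB(sb)->s_es->s_hash_seed[1]),
                         le32_to_cpu(LDISKFS_SB(sb)->s_es->s_hash_seed[2]),
                         le32_to_cpu(LDISKFS_SB(sb)->s_es->s_hash_seed[3]));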
          
          yong.fan nasf (Inactive) added a comment

          So as long as "rdpg->rp_count" is larger than CFS_PAGE_SIZE, there will be multiple mdd_dir_page_build() calls for a single readdir RPC. Usually "rdpg->rp_count" is set from the client-side parameter mdt_body::nlink, which should be CFS_PAGE_SIZE. But if something overwrites that field incorrectly, then confusing cases will happen. Considering the unclear 64-bit dirhash issue, I suspect there may be some interoperability problem.

          Right, that's the only way I can see mdd_dir_page_build() getting called a second time as well. I'm unsure why rp_count would be anything other than CFS_PAGE_SIZE when using a 1.8.5-based client, though.

          Also, I don't quite understand the values printed in the second call to mdd_dir_page_build() in the "bad" case when compared to the "good" case. Specifically, in xid x1384931791058559 of the "good" test, we ask for hash 678250957 and the server builds a dirpage like so:

          00000004:00000040:1.0:1366065858.330776:0:4488:0:(mdd_object.c:2365:mdd_dir_page_build()) lu_dirpage ldp_hash_start 678250957 ldp_hash_end 885661942 ldp_flags 0
          

          But when that same hash is asked for in the "bad" test, in xid x1384931780112747, the server builds a dirpage with these values:

          00000004:00000040:10.0:1365797000.195510:0:4492:0:(mdd_object.c:2365:mdd_dir_page_build()) lu_dirpage ldp_hash_start 4294967296 ldp_hash_end 972141167607355620 ldp_flags 0
          

          I'm curious why the server returns different results for the same operation of looking up hash value 678250957. Even with the extra call to mdd_dir_page_build(), it looks like the client still asks for the correct next hash of 678250957, but gets a completely different answer from the server.

          How are these hash values built up? I used debugfs to look at the ldiskfs htree hash values for this directory (as Andreas suggested above), but I'm not sure how those values map to the Lustre hash indices.
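
          One hedged way to eyeball that mapping is to split the readdir cookie back into two 32-bit halves and compare them against the debugfs htree hashes; the helper below is illustrative only and assumes the simple major<<32 | minor layout sketched earlier (ldiskfs may reserve low-order bits):

                  /* Illustrative only: split a Lustre readdir cookie into the
                   * approximate major/minor htree hash halves. */
                  static inline void cookie_to_htree_hashes(__u64 cookie,
                                                            __u32 *major, __u32 *minor)
                  {
                          *major = (__u32)(cookie >> 32);
                          *minor = (__u32)cookie;
                  }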

          prakash Prakash Surya (Inactive) added a comment

          I did not find the 1.8.5-6chaos code, so I checked the 1.8.5.0-6chaos code instead:
          lustre/llite/llite_lib.c

                  data->ocd_connect_flags = OBD_CONNECT_VERSION | OBD_CONNECT_IBITS      |
                                            OBD_CONNECT_JOIN    | OBD_CONNECT_ATTRFID    |
                                            OBD_CONNECT_NODEVOH | OBD_CONNECT_CANCELSET  |
                                            OBD_CONNECT_AT      | OBD_CONNECT_FID        |
                                            OBD_CONNECT_VBR     | OBD_CONNECT_LOV_V3     |
                                            OBD_CONNECT_64BITHASH;
          

          That means the 1.8.5.0-6chaos client supports the 64-bit dirhash.

          I also checked the 2.1.4-4chaos code:
          lustre/mdt/mdt_handler.c

                  if (info->mti_exp->exp_connect_flags & OBD_CONNECT_64BITHASH)
                          rdpg->rp_attrs |= LUDA_64BITHASH;
          

          lustre/osd-ldiskfs/osd_handler.c

          if (attr & LUDA_64BITHASH)
                  it->oie_file.f_mode |= FMODE_64BITHASH;
          else
                  it->oie_file.f_mode |= FMODE_32BITHASH;
          

          That means the 2.1.4-4chaos server supports the 64-bit dirhash as well. So the dirhash values listed above should be 64 bits, yet they are only 32 bits in both the successful and the failing cases. I do not know what happened.
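
          Pulling the two quoted snippets together, the negotiation chain can be sketched as one illustrative helper (the constants are the real ones quoted above; the helper itself does not exist in the tree):

                  /* Illustrative only: connect flag -> per-RPC attr bit ->
                   * f_mode bit that ldiskfs_readdir() eventually sees. */
                  static unsigned int dirhash_fmode(__u64 exp_connect_flags)
                  {
                          __u32 attr = 0;
                          unsigned int fmode = 0;

                          if (exp_connect_flags & OBD_CONNECT_64BITHASH)  /* mdt_readpage() */
                                  attr |= LUDA_64BITHASH;

                          if (attr & LUDA_64BITHASH)                      /* osd-ldiskfs */
                                  fmode |= FMODE_64BITHASH;
                          else
                                  fmode |= FMODE_32BITHASH;

                          return fmode;
                  }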

          As for the additional mdd_dir_page_build() call, it is possible according to the 2.1.4-4chaos implementation:

                  for (i = 0, nob = rdpg->rp_count; rc == 0 && nob > 0;
                       i++, nob -= CFS_PAGE_SIZE) {
                          struct lu_dirpage *dp;
          
                          LASSERT(i < rdpg->rp_npages);
                          pg = rdpg->rp_pages[i];
                          dp = cfs_kmap(pg);
          #if CFS_PAGE_SIZE > LU_PAGE_SIZE
          repeat:
          #endif
                          rc = mdd_dir_page_build(env, mdd, dp,
                                                  min_t(int, nob, LU_PAGE_SIZE),
                                                  iops, it, rdpg->rp_attrs);
                          if (rc > 0) {
                                  /*
                                   * end of directory.
                                   */
                                  dp->ldp_hash_end = cpu_to_le64(MDS_DIR_END_OFF);
                                  nlupgs++;
                          } else if (rc < 0) {
                                  CWARN("build page failed: %d!\n", rc);
                          } else {
                                  nlupgs++;
          #if CFS_PAGE_SIZE > LU_PAGE_SIZE
                                  dp = (struct lu_dirpage *)((char *)dp + LU_PAGE_SIZE);
                                  if ((unsigned long)dp & ~CFS_PAGE_MASK)
                                          goto repeat;
          #endif
                          }
                          cfs_kunmap(pg);
                  }
          

          So as long as "rdpg->rp_count" is larger than CFS_PAGE_SIZE, there will be multiple mdd_dir_page_build() calls for a single readdir RPC. Usually "rdpg->rp_count" is set from the client-side parameter mdt_body::nlink, which should be CFS_PAGE_SIZE. But if something overwrites that field incorrectly, then confusing cases will happen. Considering the unclear 64-bit dirhash issue, I suspect there may be some interoperability problem.
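
          As a concrete, assumed example of that relationship (4096-byte pages):

                  /* Illustrative arithmetic only, using
                   *   rp_npages = (rp_count + CFS_PAGE_SIZE - 1) >> CFS_PAGE_SHIFT:
                   * rp_count = 4096 -> rp_npages = 1 -> one mdd_dir_page_build() call
                   * rp_count = 8192 -> rp_npages = 2 -> two mdd_dir_page_build() calls */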

          yong.fan nasf (Inactive) added a comment

          People

            yong.fan nasf (Inactive)
            morrone Christopher Morrone (Inactive)
            Votes: 0
            Watchers: 15