
2.1.2 servers, 1.8.8 clients ll_mdc_blocking_ast()) ### data mismatch with ino

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 1.8.9
    • Affects Version/s: Lustre 1.8.8
    • None
    • 3
    • 6386

    Description

      When running 2.1.2 servers (RHEL6) with RHEL5/1.8.8 clients, we are seeing this error quite a bit. The test is recovery-scale, but the error is also observed with other tests.

      Jun 6 09:50:44 ehyperion354 Lustre: Server lustre-MDT0000_UUID version (2.1.1.0) is much newer than client version (1.8.8)
      Jun 6 09:50:44 ehyperion354 LustreError: 25302:0:(namei.c:256:ll_mdc_blocking_ast()) ### data mismatch with ino 144115305952584894/0 (ffff8101b4b8e9a0) ns: lustre-MDT0000-mdc-ffff81021762a000 lock: ffff810160e1f400/0x793342fb2430adf9 lrc: 3/0,0 mode: PR/PR res: 8589941618/9406 bits 0x1 rrc: 2 type: IBT flags: 0x2002c90 remote: 0xe77c1b87ad8301fe expref: -99 pid: 25116 timeout: 0
      Jun 6 09:50:44 ehyperion354 LustreError: 25116:0:(mdc_locks.c:653:mdc_enqueue()) ldlm_cli_enqueue error: -4
      Jun 6 09:50:44 ehyperion354 LustreError: 25116:0:(file.c:3331:ll_inode_revalidate_fini()) failure -4 inode 180355073
      Jun 6 09:50:44 ehyperion354 LustreError: 25303:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff810162637400 x1403992855059565/t0 o101->lustre-MDT0000_UUID@192.168.120.126@o2ib:12/10 lens 544/1232 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0

          Activity

            [LU-1488] 2.1.2 servers, 1.8.8 clients ll_mdc_blocking_ast()) ### data mismatch with ino
            ys Yang Sheng added a comment -

            I don't think this issue will cause a real problem. We only reference fid_res_name_eq() in ll_mdc_blocking_ast(), where it just performs a check and outputs an error message. In fact, it is harmless.

            morrone Christopher Morrone (Inactive) added a comment -

            Is there no way to fix this from the 2.X side?
            ys Yang Sheng added a comment - Patch uploaded to http://review.whamcloud.com/3522.
            ys Yang Sheng added a comment -

            This issue is caused by commit ef8bd11416bae8c03a65682f3a10a4da39922b45. fid_build_reg_res_name() and fid_res_name_eq() use different ways to build and compare lu_fid and res. I'll produce a patch to fix it.
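The kind of builder/comparer drift described in that comment can be illustrated with a small round-trip check. This is only a sketch under stated assumptions: `demo_fid`, `demo_res_name`, and all `demo_*` functions are hypothetical stand-ins, not the real Lustre structures or the code touched by the commit, and the "packed" layout is an invented example of a second layout. The idea is simply that a comparer must accept whatever the builder produces, for every input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; NOT the real Lustre types. */
struct demo_fid {
        uint64_t seq;
        uint32_t oid;
        uint32_t ver;
};

struct demo_res_name {
        uint64_t name[4];
};

/* Builder: stores seq, oid and ver in three separate words. */
static void demo_build_res_name(const struct demo_fid *fid,
                                struct demo_res_name *res)
{
        res->name[0] = fid->seq;
        res->name[1] = fid->oid;
        res->name[2] = fid->ver;
        res->name[3] = 0;
}

/* Comparer written against the same layout as the builder. */
static int demo_res_name_eq(const struct demo_fid *fid,
                            const struct demo_res_name *res)
{
        return res->name[0] == fid->seq &&
               res->name[1] == fid->oid &&
               res->name[2] == fid->ver;
}

/* Comparer written against a different (assumed) layout, where ver and
 * oid are packed into name[1]. It disagrees with the builder above
 * whenever ver is non-zero. */
static int demo_res_name_eq_packed(const struct demo_fid *fid,
                                   const struct demo_res_name *res)
{
        uint64_t ver_oid = ((uint64_t)fid->ver << 32) | fid->oid;

        return res->name[0] == fid->seq && res->name[1] == ver_oid;
}

/* Round-trip invariant: the matching comparer must accept the name the
 * builder just produced from the same fid. */
static void demo_check_round_trip(const struct demo_fid *fid)
{
        struct demo_res_name res;

        demo_build_res_name(fid, &res);
        assert(demo_res_name_eq(fid, &res));
}
```

Calling `demo_check_round_trip()` on a range of fids exercises the matching pair. Running the same check with `demo_res_name_eq_packed()` would pass for `ver == 0` and fail for `ver != 0`, which is why such a check needs varied inputs.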
            pjones Peter Jones added a comment -

            Yangsheng will look into this one

            pjones Peter Jones added a comment -

            Bob

            Could you please look into this one?

            Thanks

            Peter


            simmonsja James A Simmons added a comment -

            Just tested Lustre 2.2.0 servers with 1.8.8 clients and I saw the same problem.

            adilger Andreas Dilger added a comment -

            This is interesting, but hopefully not a difficult problem to fix...

            The inode number is reported as ino = 144115305952584894 = 0x200001b720024be, gen = 0, while the lock resource is reported as ino = 8589941618 = 0x200001b72, gen = 9406 = 0x24be. So there is no real problem here (i.e. no mismatch of ino/generation between the inode and the DLM lock), but it just isn't comparing the two values correctly. The inode number is the "flattened" version, while the DLM lock has the proper "FID":

                            fid = ll_inode_lu_fid(inode);
                            :
                            :
                            if (!fid_res_name_eq(fid, &lock->l_resource->lr_name)) {
                                    LDLM_ERROR(lock, "data mismatch with ino %lu/%u (%p)",
                                               inode->i_ino, inode->i_generation, inode);
                            }

            However, this doesn't totally explain the problem away, because the "fid" being compared is from ll_i2info(inode)->ll_fid.f20, but LDLM_ERROR() doesn't actually print the "fid" value. It isn't clear whether the flattened inode number is actually stored in lu_fid, or whether it is just uninitialized (or full of garbage) and failing the comparison for that reason.

            If I had any skill with systemtap it would probably be possible to print out "fid" when this check fails, without having to submit a patch to do the same, build, land, install, and retest.
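The arithmetic in that comment can be checked with a short sketch. It assumes a simplified flattening, `ino = (seq << 24) + oid`, chosen because it reproduces the numbers in this ticket. The real client code may fold in additional bits, and every `demo_*` name here is hypothetical. The sketch is not a claim about what fid_res_name_eq() does; it only shows that the logged inode number and the logged lock resource describe the same object.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified flattening (an assumption that matches this ticket's values):
 * the sequence goes in the high bits, the object id in the low 24 bits. */
static uint64_t demo_flatten(uint64_t seq, uint32_t oid)
{
        return (seq << 24) + oid;
}

/* Inverse of demo_flatten(); only valid while oid fits in 24 bits. */
static void demo_unflatten(uint64_t ino, uint64_t *seq, uint32_t *oid)
{
        *seq = ino >> 24;
        *oid = (uint32_t)(ino & 0xffffff);
}

/* Naive comparison: treats the flattened ino and the generation as if
 * they were the two words of the lock resource name. */
static int demo_naive_eq(uint64_t ino, uint32_t gen,
                         uint64_t res0, uint64_t res1)
{
        return ino == res0 && gen == res1;
}

/* Comparison after recovering seq/oid from the flattened inode number. */
static int demo_unflattened_eq(uint64_t ino, uint64_t res0, uint64_t res1)
{
        uint64_t seq;
        uint32_t oid;

        demo_unflatten(ino, &seq, &oid);
        return seq == res0 && oid == res1;
}

/* Self-check against the values printed in the log line of this ticket. */
static void demo_check_ticket_values(void)
{
        const uint64_t ino  = 144115305952584894ULL; /* 0x200001b720024be */
        const uint64_t res0 = 8589941618ULL;         /* 0x200001b72 */
        const uint64_t res1 = 9406;                  /* 0x24be */

        assert(demo_flatten(res0, (uint32_t)res1) == ino);
        assert(!demo_naive_eq(ino, 0, res0, res1));
        assert(demo_unflattened_eq(ino, res0, res1));
}
```

Under this assumption the logged `ino 144115305952584894/0` and `res: 8589941618/9406` agree once the inode number is unflattened, which is consistent with the observation that the mismatch is in the comparison rather than in the data.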

            People

              ys Yang Sheng
              cliffw Cliff White (Inactive)