[LU-1488] 2.1.2 servers, 1.8.8 clients _mdc_blocking_ast()) ### data mismatch with ino Created: 06/Jun/12  Updated: 22/Feb/13  Resolved: 17/Aug/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.8
Fix Version/s: Lustre 1.8.9

Type: Bug Priority: Major
Reporter: Cliff White (Inactive) Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 6386

 Description   

When running 2.1.2 servers (RHEL6) with 1.8.8 clients (RHEL5), we are seeing this error quite a bit. The test is recovery-scale, but the error has also been observed with other tests.

Jun 6 09:50:44 ehyperion354 Lustre: Server lustre-MDT0000_UUID version (2.1.1.0) is much newer than client version (1.8.8)
Jun 6 09:50:44 ehyperion354 LustreError: 25302:0:(namei.c:256:ll_mdc_blocking_ast()) ### data mismatch with ino 144115305952584894/0 (ffff8101b4b8e9a0) ns: lustre-MDT0000-mdc-ffff81021762a000 lock: ffff810160e1f400/0x793342fb2430adf9 lrc: 3/0,0 mode: PR/PR res: 8589941618/9406 bits 0x1 rrc: 2 type: IBT flags: 0x2002c90 remote: 0xe77c1b87ad8301fe expref: -99 pid: 25116 timeout: 0
Jun 6 09:50:44 ehyperion354 LustreError: 25116:0:(mdc_locks.c:653:mdc_enqueue()) ldlm_cli_enqueue error: -4
Jun 6 09:50:44 ehyperion354 LustreError: 25116:0:(file.c:3331:ll_inode_revalidate_fini()) failure -4 inode 180355073
Jun 6 09:50:44 ehyperion354 LustreError: 25303:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff810162637400 x1403992855059565/t0 o101->lustre-MDT0000_UUID@192.168.120.126@o2ib:12/10 lens 544/1232 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0



 Comments   
Comment by Andreas Dilger [ 06/Jun/12 ]

This is interesting, but hopefully not a difficult problem to fix...

The inode number is reported as ino = 144115305952584894 = 0x200001b720024be, gen = 0, while the lock resource is reported as ino = 8589941618 = 0x200001b72, gen = 9406 = 0x24be. The inode number is simply 0x200001b72 shifted left by 24 bits plus 0x24be, so there is no real problem here (i.e. no mismatch of ino/generation between the inode and the DLM lock); the code just isn't comparing the two values correctly. The inode number is the "flattened" version, while the DLM lock has the proper "FID":

                fid = ll_inode_lu_fid(inode);
                :
                :
                if (!fid_res_name_eq(fid, &lock->l_resource->lr_name)) {
                        LDLM_ERROR(lock, "data mismatch with ino %lu/%u (%p)",
                                   inode->i_ino, inode->i_generation, inode);
                }

However, this doesn't completely explain the problem away: the "fid" being compared comes from ll_i2info(inode)->ll_fid.f20, but LDLM_ERROR() doesn't print the "fid" value itself. It isn't clear whether the flattened inode number is actually stored in lu_fid, whether lu_fid is uninitialized and fails the comparison for that reason, or whether it is simply full of garbage.

If I had any skill with systemtap, it would probably be possible to print out "fid" when this check fails, without having to write a patch that does the same and then build, land, install, and retest.

Comment by James A Simmons [ 12/Jun/12 ]

I just tested Lustre 2.2.0 servers with 1.8.8 clients and saw the same problem.

Comment by Peter Jones [ 26/Jul/12 ]

Bob

Could you please look into this one?

Thanks

Peter

Comment by Peter Jones [ 02/Aug/12 ]

Yangsheng will look into this one

Comment by Yang Sheng [ 03/Aug/12 ]

This issue is caused by commit ef8bd11416bae8c03a65682f3a10a4da39922b45: fid_build_reg_res_name() and fid_res_name_eq() use different methods to build and compare the lu_fid and the resource name. I'll produce a patch to fix it.

Comment by Yang Sheng [ 03/Aug/12 ]

Patch uploaded to http://review.whamcloud.com/3522.

Comment by Christopher Morrone [ 03/Aug/12 ]

Is there no way to fix this from the 2.X side?

Comment by Yang Sheng [ 03/Aug/12 ]

I don't think this issue causes a real problem. fid_res_name_eq() is only referenced in ll_mdc_blocking_ast(), where it just performs a check and prints an error message. In practice, it is harmless.

Comment by Christopher Morrone [ 03/Aug/12 ]

Alright, thanks!

Comment by Yang Sheng [ 06/Aug/12 ]

I think b1_8 has some issue running on the test system. Please take a look at http://review.whamcloud.com/3539: I only changed lustre/ChangeLog, yet it still failed with the same result, while the packages work well when I download them to my local box.

Comment by Peter Jones [ 17/Aug/12 ]

Fix landed for b1_8

Generated at Sat Feb 10 01:17:04 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.