[LU-1488] 2.1.2 servers, 1.8.8 clients _mdc_blocking_ast()) ### data mismatch with ino Created: 06/Jun/12 Updated: 22/Feb/13 Resolved: 17/Aug/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.8 |
| Fix Version/s: | Lustre 1.8.9 |
| Type: | Bug | Priority: | Major |
| Reporter: | Cliff White (Inactive) | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 6386 |
| Description |
|
When running 2.1.2 servers (RHEL6) with RHEL5/1.8.8 clients, we are seeing this error quite a bit. The test is recovery-scale, but the error has also been observed with other tests.
Jun 6 09:50:44 ehyperion354 Lustre: Server lustre-MDT0000_UUID version (2.1.1.0) is much newer than client version (1.8.8) |
| Comments |
| Comment by Andreas Dilger [ 06/Jun/12 ] |
|
This is interesting, but hopefully not a difficult problem to fix... The inode number is reported as ino = 144115305952584894 = 0x200001b720024be, gen = 0, while the lock resource is reported as ino = 8589941618 = 0x200001b72, gen = 9406 = 0x24be. So there is no real problem here (i.e. no mismatch of ino/generation between the inode and the DLM lock), but it just isn't comparing the two values correctly. The inode number is the "flattened" version, while the DLM lock has the proper "FID":
        fid = ll_inode_lu_fid(inode);
        :
        :
        if (!fid_res_name_eq(fid, &lock->l_resource->lr_name)) {
                LDLM_ERROR(lock, "data mismatch with ino %lu/%u (%p)",
                           inode->i_ino, inode->i_generation, inode);
        }
However, this doesn't totally explain the problem away, because the "fid" being compared is from ll_i2info(inode)->ll_fid.f20, but LDLM_ERROR() doesn't actually print the "fid" value. It isn't clear if the flattened inode number is actually stored in lu_fid, or perhaps it is just uninitialized and failing the comparison for that reason, or is just full of garbage? If I had any skill with systemtap it would probably be possible to print out "fid" when this check fails, without having to submit a patch to do the same, build, land, install, and retest. |
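The arithmetic in the comment above can be checked with a small standalone sketch. It assumes, purely from the numbers quoted in this ticket, that the flattened inode number carries the lock resource's "ino" in its upper bits and the resource's "gen" in its low 24 bits. The helper names and the `OID_BITS` constant are hypothetical and are not taken from the Lustre source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the 24-bit split is inferred from the values in this ticket
 * (0x200001b720024be == 0x200001b72 followed by 0x0024be). */
#define OID_BITS 24

/* Upper part of the flattened inode number (the resource "ino"). */
static inline uint64_t flat_ino_high(uint64_t ino)
{
        return ino >> OID_BITS;
}

/* Low 24 bits of the flattened inode number (the resource "gen"). */
static inline uint32_t flat_ino_low(uint64_t ino)
{
        return (uint32_t)(ino & ((1ULL << OID_BITS) - 1));
}

/* Rebuild the flattened value from the two resource fields. */
static inline uint64_t flatten(uint64_t res_ino, uint32_t res_gen)
{
        return (res_ino << OID_BITS) | res_gen;
}

/* Returns 1 if the values reported in the ticket are mutually consistent. */
static inline int ticket_values_match(void)
{
        const uint64_t inode_ino = 144115305952584894ULL; /* from the inode */
        const uint64_t res_ino   = 8589941618ULL;         /* from the lock  */
        const uint32_t res_gen   = 9406;                  /* from the lock  */

        return flat_ino_high(inode_ino) == res_ino &&
               flat_ino_low(inode_ino) == res_gen &&
               flatten(res_ino, res_gen) == inode_ino;
}
```

Calling `ticket_values_match()` from a test program confirms that the inode and the lock resource describe the same object. This is consistent with the reading that only the comparison is at fault.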
| Comment by James A Simmons [ 12/Jun/12 ] |
|
I just tested Lustre 2.2.0 servers with 1.8.8 clients and saw the same problem. |
| Comment by Peter Jones [ 26/Jul/12 ] |
|
Bob, could you please look into this one? Thanks, Peter |
| Comment by Peter Jones [ 02/Aug/12 ] |
|
Yangsheng will look into this one |
| Comment by Yang Sheng [ 03/Aug/12 ] |
|
This issue is caused by commit ef8bd11416bae8c03a65682f3a10a4da39922b45. fid_build_reg_res_name() and fid_res_name_eq() build and compare the lu_fid and the resource name in different ways. I'll produce a patch to fix it. |
| Comment by Yang Sheng [ 03/Aug/12 ] |
|
Patch uploaded to http://review.whamcloud.com/3522. |
| Comment by Christopher Morrone [ 03/Aug/12 ] |
|
Is there no way to fix this from the 2.X side? |
| Comment by Yang Sheng [ 03/Aug/12 ] |
|
I don't think this issue will cause a real problem. We only reference fid_res_name_eq() in ll_mdc_blocking_ast(), where it just performs a check and outputs an error message. In fact, it is harmless. |
| Comment by Christopher Morrone [ 03/Aug/12 ] |
|
Alright, thanks! |
| Comment by Yang Sheng [ 06/Aug/12 ] |
|
I think b1_8 has some issue when run on the test system. Please take a look at http://review.whamcloud.com/3539 . I only changed lustre/ChangeLog, and it still fails with the same result, yet it works well when I download the packages to my local box. |
| Comment by Peter Jones [ 17/Aug/12 ] |
|
Fix landed for b1_8 |