[LU-7447] Incorrect nlink attr for new create directory Created: 18/Nov/15 Updated: 02/Dec/15 Resolved: 02/Dec/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | nasf (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
During LFSCK testing, I found that a newly created directory object's nlink attr is sometimes 3 instead of the expected 2. The root cause is the ordering of ref_add(dir) and ea_insert(..). Three main operations are needed for a newly created directory object: Sometimes the 3rd step runs ahead of the 1st, and sometimes it runs between the 1st and 2nd. Developers would usually think that this order is not important, but that assumption is wrong: ldiskfs sets the newly created directory object's nlink to 2, so if the caller calls ref_add() after that, the directory object's nlink attr becomes 3. |
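The ordering effect described above can be reproduced with a small toy model. This is only a sketch: the Python functions below are hypothetical stand-ins named after the operations in the description, not Lustre or ldiskfs code, and the starting nlink of 1 is an assumption made for illustration.

```python
# Toy model only: hypothetical stand-ins, not Lustre or ldiskfs code.

def create_dir():
    # Assumption: a freshly created directory object starts with nlink 1.
    return {"nlink": 1}

def ref_add(obj):
    # Caller-side reference increment.
    obj["nlink"] += 1

def add_dot_dotdot(obj):
    # Per the description, ldiskfs sets nlink to 2 at this point,
    # overwriting whatever value was there before.
    obj["nlink"] = 2

def nlink_after(steps):
    # Apply the steps in order to a new directory and return its nlink.
    obj = create_dir()
    for step in steps:
        step(obj)
    return obj["nlink"]

ref_first = nlink_after([ref_add, add_dot_dotdot])
ref_last = nlink_after([add_dot_dotdot, ref_add])
print("ref_add before dot/dotdot:", ref_first)
print("ref_add after dot/dotdot:", ref_last)
```

In this model the overwrite absorbs an earlier ref_add(), while a later ref_add() is counted on top of the value 2, which is the nlink of 3 seen in the test.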
| Comments |
| Comment by nasf (Inactive) [ 18/Nov/15 ] |
|
It looks like the ref_add() is redundant. Can we drop it directly? The answer is "no". ldiskfs_add_dot_dotdot() is the function that ldiskfs exports for the OSD to initialise the dot/dotdot entries. Some kernels, such as 2.6, set the nlink attr inside ldiskfs_add_dot_dotdot(), while others, such as 3.12, do that outside of ldiskfs_add_dot_dotdot(). There are several solutions for that:
2) A new ldiskfs patch that makes ldiskfs_add_dot_dotdot() NOT set the nlink attr, and instead does that as the 3.12 kernel does. Because ldiskfs can be used as a local FS, we still need to set the nlink attr to 2 inside ldiskfs after ldiskfs_add_dot_dotdot() is called.
3) If we do not want to maintain an ldiskfs patch, then we can adjust all the upper layer callers so that ref_add() is called before the dotdot entry is inserted. That is not complex. But the real trouble is that others may still be unaware of this ordering side effect, make the same mistake, and break the nlink attr again.
Andreas, what is your suggestion? |
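To make the trade-off between these options concrete, the sketch below compares the two kernel behaviours mentioned in this comment. It is a simplified model under stated assumptions, not ldiskfs code: `final_nlink`, `ORDERS`, and the style labels are hypothetical names, the directory is assumed to start with nlink 1, and in the "3.12-style" case the caller's ref_add() is assumed to be the only thing that raises nlink.

```python
# Simplified model; all names here are hypothetical, not ldiskfs or OSD APIs.

def final_nlink(sets_nlink_inside, order):
    nlink = 1  # assumption: nlink of a freshly created directory object
    for op in order:
        if op == "ref_add":
            nlink += 1
        elif op == "add_dot_dotdot" and sets_nlink_inside:
            # "2.6-style": nlink is forced to 2 while adding dot/dotdot.
            nlink = 2
        # "3.12-style": adding dot/dotdot leaves nlink untouched.
    return nlink

ORDERS = {
    "ref_add first": ("ref_add", "add_dot_dotdot"),
    "ref_add last": ("add_dot_dotdot", "ref_add"),
    "no ref_add": ("add_dot_dotdot",),
}

results = {
    (style, name): final_nlink(inside, order)
    for style, inside in (("2.6-style", True), ("3.12-style", False))
    for name, order in ORDERS.items()
}

for (style, name), nlink in results.items():
    print(f"{style:10} {name:14} nlink={nlink}")
```

Under these assumptions, dropping ref_add() leaves the 3.12-style case at nlink 1, while calling ref_add() first yields 2 in both cases, which matches the reasoning behind option 3.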
| Comment by Alex Zhuravlev [ 18/Nov/15 ] |
|
order should not matter, we should fix this in ldiskfs or osd-ldiskfs by some means.. |
| Comment by nasf (Inactive) [ 18/Nov/15 ] |
|
So do we need ref_add() to be called in the layers above the OSD? If not, we have to handle zfs-osd also, right? |
| Comment by Alex Zhuravlev [ 18/Nov/15 ] |
|
of course we need it, otherwise what would the model be so that both MDD and LFSCK can control this properly? |
| Comment by Andreas Dilger [ 18/Nov/15 ] |
|
Any ldiskfs patch should be made to fix old ldiskfs to work like the new kernel, so that the new kernels do not need a patch and old patches can be deleted in the future. I wouldn't object to the existing code also being changed to call ref_add() first so that it works correctly with all kernels, since this needs very little effort and may avoid bugs for users with Lustre on different kernels. In the future the order should not matter, so there is no risk of adding bugs. It would also be good to check the kernel logs to see why this change was made in 3.12, to understand it better and in case we also need to fix something else. |
| Comment by Gerrit Updater [ 19/Nov/15 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/17269 |
| Comment by Gerrit Updater [ 02/Dec/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17269/ |
| Comment by nasf (Inactive) [ 02/Dec/15 ] |
|
The patch has landed on master. |