[LU-7447] Incorrect nlink attr for new create directory Created: 18/Nov/15  Updated: 02/Dec/15  Resolved: 02/Dec/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Critical
Reporter: nasf (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

During lfsck test, I found that sometime the new created directory object's nlink attr is 3, not the expected 2. The root reason is related with the operations order between ref_add(dir) and ea_insert(..). We need three main operations for the new created directory object:
1) insert dot entry
2) insert dotdot entry
3) ref_add on the directory object.

Sometimes, the 3rd step maybe ahead of 1st, sometimes maybe between 1st and 2nd. Usually, the developers would thought that such order is not important. But in fact, such assumption is not true. That is because the ldiskfs will set the new created directory object's nlink as 2. Then if the callers call ref_add() after that, the directory object's nlink attr will become 3.



 Comments   
Comment by nasf (Inactive) [ 18/Nov/15 ]

It looks like the ref_add() is redundant. Can we drop it directly? The answer is "no". Because the ldiskfs_add_dot_dotdot() is the exported function by the ldiskfs for OSD to initialise dot/dotdot entries, some kernel sets nlink attr inside ldiskfs_add_dot_dotdot(), such as 2.6; but some do that out of ldiskfs_add_dot_dotdot(), such as 3.12.

There are several solutions for that:
1) New ldiskfs patch to make ldiskfs_add_dot_dotdot() to set nlink attr as 2 for all kinds of kernels. The trouble is that we have to maintain such patch for more and more new kernels.

2) New ldiskfs patch to make ldiskfs_add_dot_dotdot() to NOT set nlink attr, instead, do that as 3.12 kernel does. Because ldiskfs can be used as local FS, we still to set the nlink attr as 2 inside ldiskfs after ldiskfs_add_dot_dotdot() called.

3) If we do not want to maintain ldiskfs patch, then we can adjust all the up layer callers to make the ref_add() to be called before the dot dot entry inserted. That is not complex. But the real trouble is that some others may be still not aware of such order's side-effect and make the same wrong then break the nlink attr again.

Andreas, what is your suggestion?

Comment by Alex Zhuravlev [ 18/Nov/15 ]

order should not matter, we should fix this is ldiskfs or osd-ldiskfs by some means..

Comment by nasf (Inactive) [ 18/Nov/15 ]

So do we need the ref_add() to be called in the up layers of OSD? If not, we have to handle zfs-osd also, right?

Comment by Alex Zhuravlev [ 18/Nov/15 ]

of course we need, otherwise what the model would be so that both MDD and LFSCK can control this properly?
DMU (which we use) doesn't really care about nlink and this is great. I'd hope we can do this in osd-ldiskfs as well.
would it be possible to store nlink before calling to ldiskfs_add_dot_dotdot() and then setting it back?
not very elegant, but this is how ldiskfs is structured now..

Comment by Andreas Dilger [ 18/Nov/15 ]

Any ldiskfs patch should be made to fix old ldiskfs to work like the new kernel so that the new kernels do not need a patch and old patches can be deleted in the future. I wouldn't object to the existing code to also be changed to call ref_add() first so that it works correctly with all kernels, since this needs very little effort, but may avoid bugs for users with Lustre on different kernels. In the future the order should not matter, so no risk of adding bugs.

It would also be good to check the kernel logs to see why this change was made in 3.12 to understand it better and in case we also need to fix something else.

Comment by Gerrit Updater [ 19/Nov/15 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/17269
Subject: LU-7447 lfsck: add_ref before dotdot inserted
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b05ae555e21f9d3a21bac56a6c7d69b7320e3874

Comment by Gerrit Updater [ 02/Dec/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17269/
Subject: LU-7447 lfsck: correct nlink attr for new created dir
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5434d9402f6deb46f25bcf4c92726711b9d77d00

Comment by nasf (Inactive) [ 02/Dec/15 ]

The patch has been landed to master.

Generated at Sat Feb 10 02:08:59 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.