[LU-3240] The link count is not updated after the mkdir Created: 28/Apr/13 Updated: 20/Dec/13 Resolved: 11/Oct/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.2.0, Lustre 2.3.0, Lustre 2.1.5 |
| Fix Version/s: | Lustre 2.5.0, Lustre 2.4.2 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Shuichi Ihara (Inactive) | Assignee: | Nathaniel Clark |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | mn1, patch | ||
| Environment: | RHEL6.3 |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 7956 |
| Description |
|
We sometimes see incorrect link counts after creating or removing directories. A simple reproducer for the problem:

#!/bin/sh -x
TESTDIR=/lustre/linkcount_test
[ -d "$TESTDIR" ] && rm -rf $TESTDIR
mkdir $TESTDIR
cd $TESTDIR
ls
mkdir a b
ls
ls -adl $TESTDIR
rmdir b
ls -ald

# ./test.sh
+ TESTDIR=/lustre/linkcount_test
+ '[' -d /lustre/linkcount_test ']'
+ rm -rf /lustre/linkcount_test
+ mkdir /lustre/linkcount_test
+ cd /lustre/linkcount_test
+ ls
+ mkdir a b
+ ls
a b
+ ls -adl /lustre/linkcount_test
drwxr-xr-x 2 root root 4096 Apr 30 02:52 /lustre/linkcount_test
^^^
This should be 4 after the two directories are created.
+ rmdir b
+ ls -ald
drwxr-xr-x 3 root root 4096 Apr 30 02:52 .
^^^
The count is updated after a directory is removed.
This problem happens on b2_1, b2_2 and b2_3, but does not happen if both the server and clients are running current master. |
| Comments |
| Comment by Shuichi Ihara (Inactive) [ 28/Apr/13 ] |
|
After further analysis, we found the commit that makes the problem stop occurring on current master. In particular, the following code change in lustre/llite/namei.c affects this problem: if MDS_INODELOCK_PERM is removed from the code, we see the same problem again.

@@ -284,7 +284,7 @@ int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc,
if (inode->i_sb->s_root &&
inode != inode->i_sb->s_root->d_inode &&
- (bits & MDS_INODELOCK_LOOKUP))
+ (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM)))
ll_invalidate_aliases(inode);
iput(inode);
break;
Here are the test results. On current master:
# ./test.sh
+ TESTDIR=/lustre/linkcount_test
+ '[' -d /lustre/linkcount_test ']'
+ rm -rf /lustre/linkcount_test
+ mkdir /lustre/linkcount_test
+ cd /lustre/linkcount_test
+ ls
+ mkdir a b
+ ls
a b
+ ls -adl /lustre/linkcount_test
drwxr-xr-x 4 root root 4096 Apr 30 07:37 /lustre/linkcount_test
^^^
The link count is updated correctly.
+ rmdir b
+ ls -ald
drwxr-xr-x 3 root root 4096 Apr 30 07:37 .
remove "MDS_INODELOCK_PERM" above codes and revert previous code on the client.
# ./test.sh
+ TESTDIR=/lustre/linkcount_test
+ '[' -d /lustre/linkcount_test ']'
+ rm -rf /lustre/linkcount_test
+ mkdir /lustre/linkcount_test
+ cd /lustre/linkcount_test
+ ls
+ mkdir a b
+ ls
a b
+ ls -adl /lustre/linkcount_test
drwxr-xr-x 2 root root 4096 Apr 30 08:24 /lustre/linkcount_test
^^^
Not updated.
+ rmdir b
+ ls -ald
drwxr-xr-x 3 root root 4096 Apr 30 08:24 .
We still don't know why this change makes the problem disappear, nor, in any case, what the root cause of the problem is... |
| Comment by Peter Jones [ 29/Apr/13 ] |
|
Nathaniel, could you please look into this issue? Thanks, Peter |
| Comment by Andreas Dilger [ 29/Apr/13 ] |
|
I did a quick test on my 2.1.3 system and could reproduce this. Fortunately, the link count is actually correct on another node and on disk; it is just a cache consistency issue on the client. Note that the link count on the client is "read only", so this cannot lead to inconsistency on the server, but it might cause programs like "find" or "ls -R" to miss scanning newly-created subdirectories. |
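A stale link count trips up find because of the classic "leaf" optimization: on POSIX filesystems a directory's st_nlink is 2 plus its number of subdirectories, and scanners use that to bound how many subdirectories they expect to find. A minimal sketch in plain POSIX C, offered only to illustrate the arithmetic (not Lustre-specific):

#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    struct stat st;

    if (argc < 2 || stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }
    /* "." and the parent's entry account for 2 links;
     * each subdirectory's ".." adds one more */
    printf("%s: st_nlink=%lu => expect %ld subdirectories\n",
           argv[1], (unsigned long)st.st_nlink,
           (long)st.st_nlink - 2);
    return 0;
}

With a stale st_nlink of 2, such a scanner concludes the directory has no subdirectories and never descends, which matches the find warning quoted in the following comments.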
| Comment by Shuichi Ihara (Inactive) [ 29/Apr/13 ] |
|
Andreas, the "find" command also detects the incorrect link count; it showed the following warning message. We saw this issue often on a production system. That was the original problem we found, after which we started the analysis and built the reproducer.

find: WARNING: Hard link count is wrong for `./test' (saw only |
| Comment by Shuichi Ihara (Inactive) [ 02/May/13 ] |
|
Hi Nathaniel, Andreas, |
| Comment by Andreas Dilger [ 06/May/13 ] |
|
I think the root of the problem here is that creating a subdirectory should revoke the MDS_INODELOCK_UPDATE bit (while allowing the MDS_INODELOCK_LOOKUP bit to remain on the client in a separate lock), so that the attributes are re-fetched when doing a "stat" on the inode. This appears to be the call path: vfs_getattr()
ll_getattr_it()
ll_inode_revalidate_it(ibits=(MDS_INODELOCK_UPDATE | MDS_INODELOCK_LOOKUP))
but this isn't getting the correct link count from the MDS. Just updating the link count on the creating client is not sufficient, since the same problem could exist on other clients also creating directories concurrently. |
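To make the failure mode concrete, here is a simplified, self-contained sketch of the revalidation decision described above. The bit values match the MDS_INODELOCK_* definitions quoted elsewhere in this ticket, but the helpers are hypothetical stand-ins, not the real ll_inode_revalidate_it()/ll_have_md_lock() logic:

#include <stdio.h>

#define MDS_INODELOCK_LOOKUP 0x1 /* protects the name (dentry) */
#define MDS_INODELOCK_UPDATE 0x2 /* protects attributes such as nlink */

/* hypothetical stand-in for the set of inodebits the client holds */
static unsigned long held_bits = MDS_INODELOCK_UPDATE | MDS_INODELOCK_LOOKUP;

/* hypothetical stand-in for ll_have_md_lock() */
static int holds_lock_covering(unsigned long ibits)
{
    return (held_bits & ibits) == ibits;
}

static void revalidate_for_stat(void)
{
    unsigned long ibits = MDS_INODELOCK_UPDATE | MDS_INODELOCK_LOOKUP;

    if (holds_lock_covering(ibits)) {
        /* no RPC is sent: cached attributes are trusted, so if mkdir
         * failed to revoke the UPDATE bit, the cached nlink is stale */
        printf("stat served from client cache\n");
        return;
    }
    printf("getattr RPC to MDS refreshes attributes, including nlink\n");
}

int main(void)
{
    revalidate_for_stat();              /* lock still held: stale risk */
    held_bits &= ~MDS_INODELOCK_UPDATE; /* MDS revokes UPDATE on mkdir */
    revalidate_for_stat();              /* now the RPC is sent */
    return 0;
}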
| Comment by Manisha salve (Inactive) [ 09/May/13 ] |
|
Checking the MDS_INODELOCK_UPDATE bit in ll_md_blocking_ast() will solve the issue:

if (inode->i_sb->s_root &&
inode != inode->i_sb->s_root->d_inode &&
(bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE)))
ll_invalidate_aliases(inode);
ll_invalidate_aliases() will make the dentry invalid. Then, as part of revalidation, the client will fetch the inode from the MDS and will have the correct nlink count. |
| Comment by Andreas Dilger [ 09/May/13 ] |
|
In addition to the question of whether this change will cause the inode to be invalidated, there is the separate question of whether it will remain cached on the client when the inode is not being modified... Can you please improve the test case to check that repeated "stat" calls on either mountpoint do not result in extra lock enqueues on the MDS? |
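One possible shape for such a test, sketched in C: sample the MDS lock-enqueue counter, run a burst of stat() calls, and check that the counter has not grown. The ldlm_stats path is the one quoted in the following comment; the "ldlm_enqueue" counter name and the export path are assumptions to adjust for the local setup:

#include <stdio.h>
#include <sys/stat.h>

/* example path from this ticket; adjust MDT and export names */
#define STATS "/proc/fs/lustre/mdt/lustre-MDT0000/exports/0@lo/ldlm_stats"

/* read the (assumed) "ldlm_enqueue" counter from ldlm_stats */
static long read_enqueues(void)
{
    FILE *f = fopen(STATS, "r");
    char line[256];
    long n = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL)
        if (sscanf(line, "ldlm_enqueue %ld", &n) == 1)
            break;
    fclose(f);
    return n;
}

int main(void)
{
    struct stat st;
    long before = read_enqueues();
    int i;

    /* repeated stat() on a cached directory should be RPC-free */
    for (i = 0; i < 100; i++)
        if (stat("/mnt/lustre", &st) != 0)
            return 1;
    printf("enqueues: %ld -> %ld (should not grow by ~100)\n",
           before, read_enqueues());
    return 0;
}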
| Comment by Manisha salve (Inactive) [ 10/May/13 ] |
|
Hi Andreas, the following test results verify the above claims (two clients, C1 and C2):

C1: stat /mnt/lustre (one lock request from C1 gets enqueued on the MDS)
C2: mkdir /mnt/lustre/a (one lock request from C2 gets enqueued on the MDS, plus a lock cancel request sent to C1)

[root@pun-lsfs21-node07 lustre-release]# cat /proc/fs/lustre/mdt/lustre-MDT0000/exports/0@lo/ldlm_stats

A subsequent stat on C2 then revalidates the inode and gets the updated inode contents from the MDS: |
| Comment by Alexey Lyashkov [ 15/May/13 ] |
if (inode->i_sb->s_root &&
inode != inode->i_sb->s_root->d_inode &&
(bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE)))
ll_invalidate_aliases(inode);
Using the UPDATE bit to invalidate aliases is wrong, because that bit should protect attributes, not a name; this change instead drops all the information from the dcache, so a new lookup is triggered. It solves this particular issue, but it is bad overall, since it sends an additional RPC in cases where we only need to look up a name without attributes (or need only a minimal attribute subset to process open and security checks). The bug can be reproduced easily:

tmp/test> mkdir newdir; ls; find . -print
newdir
.
find: WARNING: Hard link count is wrong for `.' (saw only st_nlink=2 but we already saw 0 subdirectories): this may be a bug in your file system driver. Automatically turning on find's -noleaf option. Earlier results may have failed to include directories that should have been searched.
./newdir
tmp/test>

Xyratex plans to start work on this issue tomorrow. |
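To restate the cost argument as code, here is a hypothetical dispatch (not Lustre code) showing the asymmetry: a LOOKUP cancel justifies dropping the dentry, while an UPDATE cancel only needs the attributes marked stale, so the next stat pays a getattr RPC instead of a full lookup:

#include <stdio.h>

#define MDS_INODELOCK_LOOKUP 0x1 /* protects the name */
#define MDS_INODELOCK_UPDATE 0x2 /* protects the attributes */

/* hypothetical cancel handler, for illustration only */
static void on_lock_cancel(unsigned long bits)
{
    if (bits & MDS_INODELOCK_LOOKUP)
        printf("name invalidated: next access pays a lookup RPC\n");
    if (bits & MDS_INODELOCK_UPDATE)
        printf("attrs marked stale: next stat pays only a getattr RPC\n");
}

int main(void)
{
    /* mkdir in the parent revokes UPDATE but can leave LOOKUP held */
    on_lock_cancel(MDS_INODELOCK_UPDATE);
    return 0;
}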
| Comment by John Hammond [ 15/May/13 ] |
|
While working on another issue, I noticed that ll_have_md_lock() cannot clear MDS_INODELOCK_PERM from *bits, since MDS_INODELOCK_MAXSHIFT is 4 while MDS_INODELOCK_PERM is 0x10:

int ll_have_md_lock(struct inode *inode, __u64 *bits, ldlm_mode_t l_req_mode)
{
struct lustre_handle lockh;
ldlm_policy_data_t policy;
ldlm_mode_t mode = (l_req_mode == LCK_MINMODE) ?
(LCK_CR|LCK_CW|LCK_PR|LCK_PW) : l_req_mode;
struct lu_fid *fid;
__u64 flags;
int i;
ENTRY;
if (!inode)
RETURN(0);
fid = &ll_i2info(inode)->lli_fid;
CDEBUG(D_INFO, "trying to match res "DFID" mode %s\n", PFID(fid),
ldlm_lockname[mode]);
flags = LDLM_FL_BLOCK_GRANTED | LDLM_FL_CBPENDING | LDLM_FL_TEST_LOCK;
for (i = 0; i < MDS_INODELOCK_MAXSHIFT && *bits != 0; i++) {
policy.l_inodebits.bits = *bits & (1 << i);
if (policy.l_inodebits.bits == 0)
continue;
...
|
| Comment by Alexey Lyashkov [ 16/May/13 ] |
|
Based on the initial investigation by Panda (Andrew Perepechko <andrew_perepechko@xyratex.com>), the problem is
a BUG hit. So dropping the open lock would cause a full inode revalidation and an ll_intent_open call, and the attributes would then be updated correctly. P.S. readdir has not updated the inode attributes since 1.4 (I don't have an earlier version at hand to check). |
| Comment by Alexey Lyashkov [ 17/May/13 ] |
|
In discussion with green, we found that readdir takes an LCK_CR mode lock. That lock mode should not be matched in revalidate (in theory); if we can avoid skipping the revalidation, we will have valid attributes from the stat() syscall. |
| Comment by Alexey Lyashkov [ 17/May/13 ] |
bash-3.2$ git diff
diff --git a/lustre/llite/dir.c b/lustre/llite/dir.c
index 24ba16d..77ee138 100644
--- a/lustre/llite/dir.c
+++ b/lustre/llite/dir.c
@@ -368,7 +368,7 @@ struct page *ll_get_dir_page(struct file *filp, struct inode *dir, __u64 hash,
struct ll_inode_info *lli = ll_i2info(dir);
int hash64 = ll_i2sbi(dir)->ll_flags & LL_SBI_64BIT_HASH;
- mode = LCK_PR;
+ mode = LCK_CR;
rc = md_lock_match(ll_i2sbi(dir)->ll_md_exp, LDLM_FL_BLOCK_GRANTED,
ll_inode2fid(dir), LDLM_IBITS, &policy, mode, &lockh);
if (!rc) {
diff --git a/lustre/mdc/mdc_locks.c b/lustre/mdc/mdc_locks.c
index 84389c8..21dbecd 100644
--- a/lustre/mdc/mdc_locks.c
+++ b/lustre/mdc/mdc_locks.c
@@ -923,15 +923,19 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
struct lustre_handle lockh;
ldlm_policy_data_t policy;
ldlm_mode_t mode;
+ uint64_t lock_mode = LCK_PR|LCK_PW;
ENTRY;
fid_build_reg_res_name(fid, &res_id);
policy.l_inodebits.bits = (it->it_op == IT_GETATTR) ?
MDS_INODELOCK_UPDATE : MDS_INODELOCK_LOOKUP;
+ /* don't match against the open lock, since it gives no guarantee
+ * that the attributes are valid */
+ lock_mode |= it->it_op == IT_GETATTR ? 0 : LCK_CW | LCK_CR;
mode = ldlm_lock_match(exp->exp_obd->obd_namespace,
LDLM_FL_BLOCK_GRANTED, &res_id, LDLM_IBITS,
- &policy, LCK_CR|LCK_CW|LCK_PR|LCK_PW, &lockh, 0);
+ &policy, lock_mode, &lockh, 0);
if (mode) {
it->d.lustre.it_lock_handle = lockh.cookie;
it->d.lustre.it_lock_mode = mode;
That patch restores the ~1.8.6 behavior (requesting LCK_CR instead of LCK_PR in readdir) and fixes the case where an attribute request is matched against the open lock. My results with it:

[root@rhel6-64 tmp]# sh bug1.sh
a b
drwxr-xr-x 4 root root 4096 May 17 09:53 /mnt/lustre/linkcount_test
drwxr-xr-x 3 root root 4096 May 17 09:53 .

Manisha, could you test that patch and verify it? |
| Comment by Alexey Lyashkov [ 17/May/13 ] |
|
The first part of the bug was introduced on the cmd3 branch:
Index: dir.c
===================================================================
RCS file: /lustre/lustre-core/llite/dir.c,v
retrieving revision 1.54.2.7.10.6.2.1.4.4.2.38
retrieving revision 1.54.2.7.10.6.2.1.4.4.2.39
diff -u -r1.54.2.7.10.6.2.1.4.4.2.38 -r1.54.2.7.10.6.2.1.4.4.2.39
--- dir.c 13 Oct 2006 12:32:11 -0000 1.54.2.7.10.6.2.1.4.4.2.38
+++ dir.c 20 Oct 2006 11:27:35 -0000 1.54.2.7.10.6.2.1.4.4.2.39
@@ -278,14 +278,20 @@
ldlm_policy_data_t policy = {.l_inodebits = {MDS_INODELOCK_UPDATE} };
struct address_space *mapping = dir->i_mapping;
struct lustre_handle lockh;
- struct page *page;
struct lu_dirpage *dp;
+ struct page *page;
+ ldlm_mode_t mode;
int rc;
__u32 start;
__u32 end;
+#ifdef CONFIG_PDIROPS
+ mode = LCK_PR;
+#else
+ mode = LCK_CR;
+#endif
rc = md_lock_match(ll_i2sbi(dir)->ll_md_exp, LDLM_FL_BLOCK_GRANTED,
- ll_inode2fid(dir), LDLM_IBITS, &policy, LCK_CR, &lockh);
+ ll_inode2fid(dir), LDLM_IBITS, &policy, mode, &lockh);
if (!rc) {
struct lookup_intent it = { .it_op = IT_READDIR };
struct ptlrpc_request *request;
@@ -296,7 +302,7 @@
return ERR_PTR(-ENOMEM);
rc = md_enqueue(ll_i2sbi(dir)->ll_md_exp, LDLM_IBITS, &it,
- LCK_CR, op_data, &lockh, NULL, 0,
+ mode, op_data, &lockh, NULL, 0,
ldlm_completion_ast, ll_md_blocking_ast, dir,
0);
and it went into HEAD with the b_post_cmd3 merge. |
| Comment by Manisha salve (Inactive) [ 23/May/13 ] |
|
Verified the patch; it works fine. |
| Comment by Andreas Dilger [ 24/May/13 ] |
|
The patch in |
| Comment by Andreas Dilger [ 24/May/13 ] |
|
Shadow, can you please submit that patch to Gerrit? |
| Comment by Alexander Boyko [ 27/May/13 ] |
|
I have submitted patch http://review.whamcloud.com/6460. |
| Comment by Patrick Farrell (Inactive) [ 06/Jun/13 ] |
|
We're monitoring this one from the Cray side... I see the patch failed one of the sanityn tests, specifically test_13. That's "test directory page revocation", which at least seems relevant to this patch. Thoughts/updates? |
| Comment by Alexey Lyashkov [ 11/Jun/13 ] |
|
Correct, that failure is related to the patch: the unlink modification doesn't flush the page cache. That may be a solution to the lock ping-pong when unlinking a file in parallel with a create, but the test fails. |
| Comment by Shuichi Ihara (Inactive) [ 04/Sep/13 ] |
|
Any updates on this, new patches? |
| Comment by Alexey Lyashkov [ 04/Sep/13 ] |
|
I pushed a refreshed version a few days ago. |
| Comment by Patrick Valentin (Inactive) [ 05/Sep/13 ] |
|
I tested Patch Set 4 of http://review.whamcloud.com/6460 on a Lustre 2.4.0 client, and it also fixes the issue described in |
| Comment by Dmitry Eremin (Inactive) [ 05/Oct/13 ] |
|
I tested Patch Set 5 of http://review.whamcloud.com/6460 with master; it partially solves the |
| Comment by Cory Spitz [ 10/Oct/13 ] |
|
Should 'Fix Version/s' == 2.5.0? |
| Comment by Peter Jones [ 11/Oct/13 ] |
|
Cory, yes, that would be clearer. |
| Comment by Jodi Levi (Inactive) [ 11/Oct/13 ] |
|
Eliminating the extra RPC when revalidating remote objects is being handled in |
| Comment by Sebastien Buisson (Inactive) [ 14/Oct/13 ] |
|
I am sorry, but in addition to patch http://review.whamcloud.com/6460 that landed on master, I can also see patch http://review.whamcloud.com/7910 merged at the same time. Is that second patch also required for this fix? TIA, |
| Comment by Patrick Farrell (Inactive) [ 16/Oct/13 ] |
|
Sebastien, as I understand it, that second patch is not required. Oleg commented (I can't find where right now) that it was unnecessary to send the parent FID when doing that operation, as the child FID is sufficient. I don't believe it fixes any specific problem; it's just cleanup/improvement. To summarize: |
| Comment by Bob Glossman (Inactive) [ 18/Nov/13 ] |
|
backport to b2_4 |
| Comment by Jian Yu [ 22/Nov/13 ] |
|
The patch was cherry-picked to Lustre b2_4 branch. |
| Comment by Alexey Lyashkov [ 20/Dec/13 ] |
|
Please reopen the ticket, due to a regression with open introduced by Oleg's last changes. P.S. It's strange, but I am not able to reopen the ticket myself. |