[LU-4690] sanity test_4: Expect error removing in-use dir /mnt/lustre/remote_dir Created: 28/Feb/14  Updated: 13/Jul/16  Resolved: 12/May/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.5.1, Lustre 2.5.2
Fix Version/s: Lustre 2.6.0, Lustre 2.5.2

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: dne2

Issue Links:
Related
is related to LU-4471 Failed 'rmdir' on remote directories ... Resolved
is related to LU-4215 Some expected improvements for OUT Open
is related to LU-3696 sanity test_17m, test_17n: e2fsck una... Resolved
is related to LU-5296 lod_attr_set() skips attr_set on osp ... Resolved
is related to LU-5675 qsd_reint_index(): II_FL_NONUNQ is se... Resolved
is related to LU-5016 clean up use of la_attr for non-attri... Resolved
Severity: 3
Rank (Obsolete): 12888

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/0dd7df32-a0b5-11e3-9f3a-52540035b04c.

The sub-test test_4 failed with the following error:

Expect error removing in-use dir /mnt/lustre/remote_dir

There have been many of these lately. maloo reports:

Failure Rate: 31.00% of last 100 executions [all branches]

A search shows that failures like this started on 2/26, with none earlier.
I suspect that something which landed very recently is causing these failures.

Info required for matching: sanity 4



 Comments   
Comment by Bob Glossman (Inactive) [ 28/Feb/14 ]

I see that test_4 was only added to sanity.sh very recently, in commit 1f55f2a9071d5e7db4042b959723086dee1c379a, LU-4471 mdd: mdd_unlink: do trans_start after sanity check.

That would explain why all failures are recent. Strongly suspect something is very wrong with this new subtest.

Comment by Bob Glossman (Inactive) [ 28/Feb/14 ]

I note that this change was backported to b2_5 at nearly the same time. I'll bet it's causing test failures there too.

Comment by Di Wang [ 01/Mar/14 ]

Hmm, right now we check whether the object is empty during getattr, but that check is missing for striped directories. Also, the current empty check for remote directories is kind of hacky, given that we already have remote object iteration from LFSCK. I will check what I can do here. In the meantime, I would suggest disabling the test.

Comment by Di Wang [ 01/Mar/14 ]

Disable LU-4690 temporarily. http://review.whamcloud.com/#/c/9440/

Comment by Di Wang [ 05/Mar/14 ]

http://review.whamcloud.com/9511

Comment by Jodi Levi (Inactive) [ 10/Mar/14 ]

I have retriggered testing for Change 9511.

Comment by Jodi Levi (Inactive) [ 08/May/14 ]

Does this test need to be re-enabled now that Change 9511 has landed? Or can this ticket be closed?

Comment by Di Wang [ 08/May/14 ]

There is another patch, http://review.whamcloud.com/#/c/10261/, to clean up 9511 a bit. Perhaps close the ticket after 10261 has landed?

Comment by Jodi Levi (Inactive) [ 12/May/14 ]

Patches landed to Master.

Comment by Niu Yawei (Inactive) [ 03/Jul/14 ]

It looks like the patch caused a serious problem with quota:

In lod_attr_set():

        for (i = 0; i < lo->ldo_stripenr; i++) {
                LASSERT(lo->ldo_stripe[i]);
+               if (dt_object_exists(lo->ldo_stripe[i]) == 0)
+                       continue;
                rc = dt_attr_set(env, lo->ldo_stripe[i], attr, handle, capa);

I think dt_object_exists() will always return false for an OSP object, so any chown/chgrp will never be applied to OST objects. Any thoughts?

Because test_34 of s-q was disabled for LU-4515, this problem wasn't caught by Maloo testing. I've submitted a patch to re-enable test_34 in LU-4515.
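
To illustrate the concern above, here is a minimal, self-contained sketch of the skip pattern. It is not Lustre code: mock_stripe and mock_attr_set_stripes are hypothetical names, and the premise that the exists check is false for every OST stripe is the assumption stated in this comment, not something the sketch verifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a stripe sub-object; NOT the real dt_object. */
struct mock_stripe {
	bool reports_exists;	/* what an exists-style check would return */
	unsigned int uid;	/* owner that the attribute update should change */
};

/*
 * Mimics the shape of the patched loop: any stripe whose exists-style
 * check is false is skipped. Returns the number of stripes updated.
 */
static size_t mock_attr_set_stripes(struct mock_stripe *stripes, size_t nr,
				    unsigned int new_uid)
{
	size_t updated = 0;

	assert(stripes != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (!stripes[i].reports_exists)
			continue;	/* the skip added by the patch */
		stripes[i].uid = new_uid;
		updated++;
	}
	return updated;
}
```

If every stripe reports that it does not exist, the function returns 0 and no uid changes, which is the "chown/chgrp never reaches the OST objects" behaviour described here.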

Comment by Di Wang [ 04/Jul/14 ]

I think dt_object_exists() will always return false for an OSP object, so any chown/chgrp will never be applied to OST objects. Any thoughts?

Hmm, we actually separate the remote and exists flags for MD OSP objects, but for OST objects this check might be a problem. The easiest fix might be to add an S_ISDIR(dt->do_lu.lo_header->loh_attr) check here?
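
A rough sketch of what that suggestion could look like, using mock types rather than real Lustre structures. The names mock_sub_object and mock_attr_set_guarded are hypothetical, and parent_is_dir stands in for the result of S_ISDIR(dt->do_lu.lo_header->loh_attr). This only illustrates the idea and is not the fix that was eventually tracked in LU-5296.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sub-object of a striped file or directory. */
struct mock_sub_object {
	bool reports_exists;	/* what an exists-style check would return */
	unsigned int uid;	/* owner that the attribute update should change */
};

/*
 * Skip non-existent sub-objects only when the parent is a directory.
 * For regular files the update is always forwarded, so OST objects
 * still receive chown/chgrp. Returns the number of objects updated.
 */
static size_t mock_attr_set_guarded(bool parent_is_dir,
				    struct mock_sub_object *subs, size_t nr,
				    unsigned int new_uid)
{
	size_t updated = 0;

	assert(subs != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (parent_is_dir && !subs[i].reports_exists)
			continue;	/* directory stripe not created yet */
		subs[i].uid = new_uid;
		updated++;
	}
	return updated;
}
```

With parent_is_dir false, every sub-object is updated regardless of what the exists check reports. With it true, the skip behaves as in the original patch.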

Comment by Niu Yawei (Inactive) [ 04/Jul/14 ]

Thank you, Di. I'll try to compose a patch according to your comment and track the fix in LU-5296.

Generated at Sat Feb 10 01:44:59 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.