LU-8967: directory entries for non-existent files


    Description

      We have several directories with entries for non-existent files. For example:

      [root@quartz2311:~]# ls -l /p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0                                                                                 
      ls: cannot access /p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0/filler.003: No such file or directory
      total 3154
      -rw------- 1 casses1 casses1 1048576 Dec 21 16:43 filler.000
      -rw------- 1 casses1 casses1 1048576 Dec 21 16:43 filler.001
      -rw------- 1 casses1 casses1 1048576 Dec 21 16:43 filler.002
      -????????? ? ?       ?             ?            ? filler.003
      drwx------ 2 casses1 casses1   25600 Dec 21 16:43 ~dmtmp
      

      The directory itself is a remote directory on one MDT:

      [root@quartz2311:~]# lfs getdirstripe -d /p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0
      lmv_stripe_count: 0 lmv_stripe_offset: 3
      

      We are able to get striping information for this file:

      [root@quartz2311:~]# lfs getstripe /p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0/filler.003
      /p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0/filler.003
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_pattern:        1
      lmm_layout_gen:     0
      lmm_stripe_offset:  27
              obdidx           objid           objid           group
                  27        20538776      0x1396598      0xcc0000402
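
      For reference (a small sketch, not captured from the system above): lmm_stripe_offset 27 is 0x1b in hex, so the object lives on lsh-OST001b, the same OST named in the orphan-deletion console messages below.

      # 27 decimal is 0x1b hex, i.e. OST index 001b
      printf 'lsh-OST%04x\n' 27              # prints lsh-OST001b
      lfs osts /p/lscratchh | grep -i 001b   # confirm index 27 maps to lsh-OST001b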
      

      It looks like the OSS serving that OST was rebooted and the OST went through recovery around the time the missing file was created. In particular, we note that the object number falls in the range of orphan objects that were deleted:

      [root@zinci:~]# grep 0xcc0000402 /var/log/conman/console.zinc*
      /var/log/conman/console.zinc43:2016-12-21 16:30:56 [189484.767900] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538706 to 0xcc0000402:20541649
      /var/log/conman/console.zinc43:2016-12-21 16:33:30 [189639.110247] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538766 to 0xcc0000402:20541649
      /var/log/conman/console.zinc43:2016-12-21 16:35:41 [189769.704490] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538766 to 0xcc0000402:20541649
      /var/log/conman/console.zinc43:2016-12-21 16:40:19 [190047.449320] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538766 to 0xcc0000402:20541649
      /var/log/conman/console.zinc43:2016-12-21 16:44:45 [190313.751155] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538820 to 0xcc0000402:20541649
      /var/log/conman/console.zinc44:2016-12-21 16:49:27 [  159.838420] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538820 to 0xcc0000402:20541649
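
      The check can be scripted, for example like this (a sketch only; the object id and sequence 0xcc0000402 are the ones from the lfs getstripe output above):

      objid=20538776        # objid of filler.003 from lfs getstripe
      grep 'deleting orphan objects from 0xcc0000402' /var/log/conman/console.zinc* |
        sed 's/.*from 0xcc0000402:\([0-9]*\) to 0xcc0000402:\([0-9]*\).*/\1 \2/' |
        awk -v id="$objid" '$1+0 <= id+0 && id+0 <= $2+0 { print "objid " id " falls in deleted range " $1 "-" $2 }'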
      

      I will attach server console logs separately.


          Activity

            pjones Peter Jones added a comment -

            AFAIK items tracked under this ticket are complete

            pjones Peter Jones added a comment -

            This is now confirmed as a duplicate of LU-8562, so you should proceed with using Ned's ports of those patches to 2.8 FE. In addition, it is recommended that you pick up the fix for LU-8367.


            bzzz Alex Zhuravlev added a comment -

            A prototype is under testing; I'm going to pass it through Maloo a few more times.

            bzzz Alex Zhuravlev added a comment -

            With the LU-8562 patch I still see the same symptoms, though rarely. There is another patch addressing the same issue; it does a bit better, but the problem can still be reproduced within a few hours. I've been looking for the root cause.
            nedbass Ned Bass (Inactive) added a comment - edited

            Ned, so this issue is solved by LU-8562 in general, but the patch itself contains a defect. I checked your patch; does it solve your problem, or is more work required in that area?

            I have tested the LU-8562 patch and my one-line follow-on patch https://review.whamcloud.com/24758 on a single-node test setup. I am no longer able to reproduce the data loss bug with those patches applied. Without the patches I can reproduce it almost immediately using the LU-8562 test case.

            The remaining work to do in that area is as follows:

            • An explanation is needed for why conf-sanity test_101 is still failing as per LU-8972 (a rerun sketch follows this list). The ongoing test case failure suggests the data loss bug is not completely resolved by the LU-8562 patch. We need high confidence that this bug is resolved before putting user data on Lustre 2.8 FE.
            • Patch https://review.whamcloud.com/24758 needs review by someone who understands the osp_precreate_thread state machine better than me.
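
            As referenced above, a way to rerun just that test in isolation (a sketch; it assumes a standard lustre/tests checkout with a configured test environment, and variable handling may differ between branches):

            # rerun only conf-sanity test_101
            cd lustre/tests
            ONLY=101 sh conf-sanity.sh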


            Interesting that LU-8562 itself is quite a recent change and we didn't observe many issues similar to LU-8967 without it. I wonder what changed in your system when you started seeing it. Was it just a software update, or hardware as well?


            My best guess as to why we started seeing LU-8562 is that we made changes to our pacemaker/corosync HA system. We recently optimized the configuration so Lustre services are started with much less delay than before. This makes it very likely that OST orphan cleanup will be interrupted by the HA partner coming up and failing back the OST. As I understand it, that is the race window for LU-8562 to occur. Before, it took a long time for services to start up, so orphan cleanup was almost always done by the time the partner failed back the OST.
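
            For what it's worth, the window can be approximated by hand, without pacemaker (a rough sketch; the device and mount point names below are placeholders, not from our configuration):

            # on the primary OSS: simulate a failure of the OST
            umount /mnt/lustre-ost1b
            # on the HA partner: take over the OST; recovery runs and the MDS
            # begins deleting orphan objects on it
            mount -t lustre /dev/mapper/ost1b /mnt/lustre-ost1b
            # fail back almost immediately, before orphan cleanup has finished
            umount /mnt/lustre-ost1b
            # back on the primary OSS
            mount -t lustre /dev/mapper/ost1b /mnt/lustre-ost1b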


            tappro Mikhail Pershin added a comment -

            Ned, so this issue is solved by LU-8562 in general, but the patch itself contains a defect. I checked your patch; does it solve your problem, or is more work required in that area?

            Interesting that LU-8562 itself is quite a recent change and we didn't observe many issues similar to LU-8967 without it. I wonder what changed in your system when you started seeing it. Was it just a software update, or hardware as well?

            nedbass Ned Bass (Inactive) added a comment -

            Hi Mikhail. Each occurrence that I've investigated happened immediately after the OST completed recovery. The object numbers of the missing files all fall at the beginning of the range of deleted orphans. It does not continue to occur when all OSTs are up.

            I can remove the files as root. The rm command fails for an unprivileged user because stat() returns ENOENT and rm treats that as fatal unless you're root.
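
            To illustrate (the path is the one from the listing in the description):

            cd /p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0
            stat filler.003     # fails with ENOENT even though the entry is listed
            rm -f filler.003    # succeeds when run as root and removes the dangling entry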

            I have confirmed that I can reproduce LU-8562 on our system using the test case from that patch, and it looks just like this issue. I tested https://review.whamcloud.com/#/c/22211/ on a single-node setup and wasn't able to reproduce the bug. However, I ran into a defect with that patch that causes the osp_precreate thread to hang, as I described in LU-8562.



            tappro Mikhail Pershin added a comment -

            Ned, did these entries occur only once, when the OST was failed over, or do they still continue to occur? Is it possible to remove them?

            I am checking the patches you've mentioned.

            nedbass Ned Bass (Inactive) added a comment -

            I suspect this is related to LU-8562.
            pjones Peter Jones added a comment -

            Mike

            Could you please assist with this issue?

            Thanks

            Peter


            People

              Assignee: tappro Mikhail Pershin
              Reporter: nedbass Ned Bass (Inactive)
              Votes: 0
              Watchers: 9
