[LU-184] Test failure on test suite insanity, subtest test_0 Created: 31/Mar/11  Updated: 18/Apr/11  Resolved: 18/Apr/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.1.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File LU-184-debug.patch    
Severity: 3
Epic: recovery
Rank (Obsolete): 5074

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/b573b3c0-5c15-11e0-a272-52540025f9af.

The sub-test test_0 failed with the following error:

post-failover df: 1



 Comments   
Comment by Sarah Liu [ 31/Mar/11 ]

I have seen this error several times, but if I run insanity by itself it passes. It would be good if someone could take a look at it.

Comment by Peter Jones [ 01/Apr/11 ]

Niu

Could you please help Sarah with this test failure?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 01/Apr/11 ]

It looks like there were still some open files on the clients before insanity test_0 ran, so test_0 triggered open replay on those clients. Some of the open replays failed, and two clients were evicted because of the failed replays; that is why the statfs in test_0 failed on those two clients.

If you run insanity by itself, there are definitely no open files on the clients, so everything passes.

I'll look closer to see why the open replay failed; there might be a defect in open replay.

Comment by Di Wang [ 01/Apr/11 ]

I happened to review the debug log a little. It seems the open replay failed because the VBR check failed (the parent's version does not match the version carried in the request). I am afraid that is a leftover of some other unrecovered failure.

Comment by Niu Yawei (Inactive) [ 02/Apr/11 ]

It looks like some previous tests leaked open replay requests on the clients, and those requests were replayed and failed during insanity test_0, which resulted in the test failure at the end.

Hi, tappro & Di

There is one thing that I don't quite understand: why don't we purge all the replay requests of a client when it is evicted? Won't those leaked replay requests cause trouble in future recovery, especially for open replays?

Comment by Mikhail Pershin [ 02/Apr/11 ]

Niu, these replays shouldn't be replayed, because after eviction the client will get a new connection and recovery is not started for that client. The client learns it was evicted only after reconnecting, so we need to keep that queue until then; after reconnection the queue will be freed during import invalidation, including the open replays.

If you see stalled open replays in the queue after eviction, that is most probably a bug; please check carefully whether that import was actually evicted. Stale replays shouldn't affect recovery because their generation is old and such requests are dropped, which is why I doubt the import was evicted.

Comment by Mikhail Pershin [ 02/Apr/11 ]

BTW, recovery-small test 10 failed earlier; maybe these requests are left over from it.

Comment by Niu Yawei (Inactive) [ 02/Apr/11 ]

Hi, tappro

I don't see how all the replay requests are freed during import invalidation; it looks like only ptlrpc_free_committed() is called on invalidation, so how can we guarantee that all replay requests are purged?

"Stalled replays shouldn't affect recovery because their generation is old and such requests are dropped", What generation did you mean?

Thanks
Niu

Comment by Mikhail Pershin [ 02/Apr/11 ]

ptlrpc_free_committed() will do the job; there is a check like this:

if (req->rq_import_generation < imp->imp_generation) {
        DEBUG_REQ(D_RPCTRACE, req, "free request with old gen");
        GOTO(free_req, 0);
}

so all requests issued with an old import generation will be freed.

The server has another check for old requests - ptlrpc_check_req(). There the conn_cnt (sorry, not the generation as I said) is compared in order to drop requests from an old connection. But it only drops requests sent by the old import, so an old replay will pass if it is resent by the new import.
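
A rough sketch of the server-side check being described; this is an assumption-based illustration of ptlrpc_check_req() from memory of the ptlrpc code, not a quote of the tree (helper and field names such as lustre_msg_get_conn_cnt() and exp_conn_cnt may differ in this version):

/* Hedged sketch: drop a request whose connection count is older than the
 * export's current one - this is the conn_cnt comparison described above. */
static int check_req_conn_cnt_sketch(struct ptlrpc_request *req)
{
        struct obd_export *exp = req->rq_export;

        if (lustre_msg_get_conn_cnt(req->rq_reqmsg) < exp->exp_conn_cnt) {
                DEBUG_REQ(D_ERROR, req, "dropping request from old connection");
                return -EEXIST; /* assumed return code; the comparison is the point */
        }
        return 0;
}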

Comment by Niu Yawei (Inactive) [ 02/Apr/11 ]

ptlrpc_free_committed() doesn't look to me like a safe way to purge all replay requests: if a replay request (or another replay request with a smaller transno) is added just after imp->imp_generation++, then ptlrpc_free_committed() won't free all the requests on the replay list.

I made a debug patch to see whether there are still replay requests after import invalidation. Hi Sarah, could you reproduce this with the patch applied? Then we can go through the client log to see whether any replay requests were leaked during previous tests. Thanks.

Comment by Niu Yawei (Inactive) [ 02/Apr/11 ]

After import invalidation, print the replay requests if there are any.
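
As a rough illustration only (the actual LU-184-debug.patch is attached, not inlined here): after import invalidation one could walk the import's replay list and print anything left. imp_replay_list, rq_replay_list and the locking below are assumed from the 2.x import code, and the plain kernel list/lock helpers stand in for the cfs_ wrappers of that era; the real patch may differ.

/* Hedged sketch, not the actual debug patch: report requests that survived
 * import invalidation on the replay list. */
static void dump_leftover_replays(struct obd_import *imp)
{
        struct ptlrpc_request *req;

        spin_lock(&imp->imp_lock);
        list_for_each_entry(req, &imp->imp_replay_list, rq_replay_list)
                DEBUG_REQ(D_ERROR, req, "replay request left after invalidate");
        spin_unlock(&imp->imp_lock);
}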

Comment by Mikhail Pershin [ 04/Apr/11 ]

----------
if the replay request (or another replay request with smaller transno) is added just after imp->imp_generation++, then ptlrpc_free_committed() won't free all the requests on replay list.
----------

I don't see how that can happen. Any reply after a generation change will be dropped with -EIO; look at ptlrpc_check_set():ptlrpc_import_delay_req(), where there is a check for the wrong generation.

Another question - why are we talking about 'leaked' replays at all? As far as I can see these are just ordinary open replays after reconnection. Insanity test_0 fails the MDS, then recovery starts with 3 clients in the server's last_rcvd, two clients (client 20 and 21) send open replays which fail, and those clients are evicted - that is why the test failed. In other words these clients had old replays from previous tests; that is a normal case, nothing wrong here. In fact I see only a test issue - one of the previous tests wasn't cleaned up properly.

Comment by Niu Yawei (Inactive) [ 04/Apr/11 ]

It looks like ptlrpc_import_delay_req() is called before (re)sending a request, isn't it?

I think that after a client goes through the invalidation process, all replay requests must be dropped; otherwise the retained replays will cause trouble in future recovery (especially open replays, which have no chance to be freed by ptlrpc_free_committed()). I'm not sure whether the open replays in insanity test_0 survived across a client eviction; if they did, then I think they are 'leaked' replays. I just can't see how an ordinary open replay could get a VBR failure in insanity test_0, so I suspect they are 'leaked' replays.

Comment by Mikhail Pershin [ 04/Apr/11 ]

How do you think a new request could be added to the already-freed replay list? That would mean the import received a reply for some old request after reconnection/eviction. Let's suppose this is possible - but would it be repeatable? Let's wait for a run with your debug patch to see.

But that can't cause recovery anyway: the client is evicted, the server will not include it in the recovery process and will not ask the client to recover, so the client starts normal processing. The first reply then triggers after_reply():ptlrpc_free_committed(), which will drop any stalled replays in the replay list. Instead of that, we see normal recovery with all clients included.

As for the open replays we see - they can simply be the result of previous tests that left them in the queue (a missed close, maybe?), so they cannot be re-opened on the server. Note also that a VBR failure can be reported in more cases than just a version mismatch: if the object doesn't exist, that is also a 'version mismatch', so if files were removed but some client tries to open them, it will get a version mismatch as well.

Comment by Niu Yawei (Inactive) [ 04/Apr/11 ]

Yes, I was suspecting that some old reply arrived at the client after reconnection.

Right, the client doesn't do replay immediately after eviction, but it could do recovery later, and open replays can't be freed by ptlrpc_free_committed().

If the object doesn't exist but the open replay is retained on the client, the only valid reason I can think of is an uncommitted open-create, and such an open replay should pass version checking (given that no client was evicted before the open replay in test_0). Is there any other reason that could result in that strange situation (no object, but an open replay request on the client)?

Comment by Mikhail Pershin [ 04/Apr/11 ]

> Yes, I was suspecting that some old reply arrived client after reconnection.

but it would be quite weird to see that every time: the server replied, then evicted the client, then got a reconnect, then replied to the connect request, the connect reply went back to the client and the client started its eviction handling - and all this time the first reply was somewhere in the network, wandering across routers? I don't believe that can happen every time.

> Right, client doesn't do replay immediately after eviction, but it could do recovery later, and open replay can't be freed by ptlrpc_free_committed().

Well, after_reply() calls ptlrpc_free_committed() for every reply, so after any reply from the server the old requests will be purged, even if they somehow appeared in the replay queue after eviction. Note that even if the client does nothing, there are pings, and they will also invoke after_reply(). So we are already making too many assumptions, each less probable than the last, while there are obvious cases for which we don't need to imagine 'leaked' replays - they are not leaked, just normal replays staying in the queue.

In fact we can settle this discussion very easily - let's see what your debug patch shows. Also, we can rule out any possibility of such 'leakage' by checking the generation before adding new replays to the queue, so that no old replay can be added to the queue after eviction; a sketch of such a check follows below. I would agree to add such a check just to be completely sure.
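
If such a guard were added, it could look roughly like the following; ptlrpc_retain_replayable_request() is the function that queues replayable requests, but the early return here is only a sketch of the suggestion above, not an actual patch:

/* Hedged sketch of the suggested check: refuse to queue a replay whose
 * import generation is already stale, so nothing sent before an eviction
 * can linger on imp_replay_list afterwards. */
void ptlrpc_retain_replayable_request(struct ptlrpc_request *req,
                                      struct obd_import *imp)
{
        if (req->rq_import_generation < imp->imp_generation)
                return; /* request predates the current connection */

        /* ... existing code: insert req into imp->imp_replay_list
         * ordered by transno ... */
}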

> If object doesn't exist but open replay retained on client, the only valid reason I can think of is the uncommitted open create

No, an open is kept in the queue even after being committed and is removed only when closed, no matter whether it is a plain open or open|create.

> such open replay should pass version checking (given that no client was evicted before the open replay in test_0).

Not sure I get this. These open replays didn't pass version checking; that is why test_0 failed. So the original request opened the file (maybe with creation), but the replay found its version wrong (the version of a non-existent file is considered as '-1'). These checks are done for both the parent and the child.
That can be the result of problems not with this client but with another one - say one client misses recovery, so other clients may have issues with their replays. Another case: a client sent close() and afterwards all files were deleted by test cleanup, but the close wasn't replied to or was lost, so the open stays in the queue. This is not a usual case, but could a recent recovery-small run cause it via some obd_fail code, maybe?

My point is that there are more ordinary and simple explanations to consider before thinking about 'leaked' replays, which are possible only in very rare and specific cases.

I'd take your patch and ask QE to run insanity together with the preceding tests, e.g. replay-ost-single + insanity, then add another preceding test, and so on. There are two tests I suspect are involved in this - racer and recovery-small. Maybe our opens come from them. In fact we need more info - the debug log level must also have HA and INODE turned on.

Comment by Build Master (Inactive) [ 04/Apr/11 ]

Integrated in lustre-reviews » server,el6-i686 #60
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Build Master (Inactive) [ 04/Apr/11 ]

Integrated in lustre-reviews » client,el5-x86_64 #60
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Build Master (Inactive) [ 04/Apr/11 ]

Integrated in lustre-reviews » client,el6-x86_64 #60
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Build Master (Inactive) [ 04/Apr/11 ]

Integrated in lustre-reviews » client,ubuntu-x86_64 #60
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Build Master (Inactive) [ 04/Apr/11 ]

Integrated in lustre-reviews » client,el6-i686 #60
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Build Master (Inactive) [ 04/Apr/11 ]

Integrated in lustre-reviews » server,el5-i686 #60
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Build Master (Inactive) [ 04/Apr/11 ]

Integrated in lustre-reviews » client,el5-i686 #60
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Build Master (Inactive) [ 04/Apr/11 ]

Integrated in lustre-reviews » server,el5-x86_64 #60
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Sarah Liu [ 05/Apr/11 ]

As above, I have submitted the patch and made a build, but currently the new build cannot be uploaded and installed on Toro due to a problem with the load script. Chris will work on it.

Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » server,el6-i686 #74
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » client,el5-x86_64 #74
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » client,el6-x86_64 #74
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » client,ubuntu-x86_64 #74
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » client,el6-i686 #74
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » server,el5-i686 #74
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » server,el5-x86_64 #74
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » client,el5-i686 #74
LU-184 Debug patch for LU-184

sarah : b8e0de8975135cd7d63d288f3556bfac69d3865c
Files :

  • lustre/ptlrpc/import.c
Comment by Sarah Liu [ 05/Apr/11 ]

https://maloo.whamcloud.com/test_sets/ec05daba-5f91-11e0-a2b4-52540025f9af

Not sure it is the same. Niu, could you please check whether it is the same issue? If not, I will create another issue. Thanks.

Comment by Niu Yawei (Inactive) [ 05/Apr/11 ]

Hi, Sarah

This looks like the same issue, thank you. One thing that confuses me is that the 3 clients did replays during the replay-single test; I thought the replay should happen only on one client. Also, I didn't find any messages printed by this debug patch, so those replays should not be the 'leaked' replays I mentioned before.

This log is better than the previous insanity test's log, since after the last umount on the clients only two tests ran before replay-single - lfsck and racer - so the replay requests might come from the racer test. I will look closer to see what happened there.

Comment by Sarah Liu [ 05/Apr/11 ]

Niu, you are right: if I run racer and then insanity, this bug can be reproduced. Here is the result link, tested on the build with your patch:
https://maloo.whamcloud.com/test_sets/cd1ed9f6-6010-11e0-a2b4-52540025f9af

Comment by Mikhail Pershin [ 05/Apr/11 ]

As I remember, racer does many bad things to create different conflict situations, so we have two options here: either this is a real bug related to open/close handling, or it is just a result of racer state that should be cleaned up when the racer test exits.

It is interesting that this happened not at test 0 but only at 20b. That means these replays were handled successfully in all the previous tests and failed only at 20b. I wonder what changed there and produced the version mismatch.

Comment by Niu Yawei (Inactive) [ 07/Apr/11 ]

Hi, Tappro

Yes, it might be an open replay defect or a racer test defect. What confuses me is that there were open replays at all during the recovery of insanity test_0 (racer + insanity run): the log shows that last_transno is much greater than the transnos of the open replays (so they must be committed opens), and runracer didn't report an error (so all threads were killed and there are no pending closes). I don't see why there are still open replays.

Hi, Sarah
Could you reproduce it (racer + insanity) with the D_HA and D_INODE debug logs enabled? I hope more debug logging can help us figure out where the open replays come from. Thank you.

Comment by Sarah Liu [ 08/Apr/11 ]

Here is the result link with D_HA and D_INODE enabled:
https://maloo.whamcloud.com/test_sets/d36cb418-626d-11e0-a2b4-52540025f9af

Comment by Mikhail Pershin [ 09/Apr/11 ]

This is obviously a bug. From what I see, we have many version mismatches like the following:

00000004:00000002:0.0:1302324530.070202:0:10190:0:(mdt_handler.c:1607:mdt_reint_opcode()) @@@ reint opt = 6 req@ffff8106384fc000 x1365585644912170/t0(4295521982) o-1->917221e7-5188-2444-10d7-07b3e1abdbd0@NET_0x50000c0a80416_UUID:0/0 lens 528/0 e 0 to 0 dl 1302324555 ref 1 fl Complete:/ffffffff/ffffffff rc 0/-1
00000004:00000002:0.0:1302324530.070213:0:10190:0:(mdt_open.c:1194:mdt_reint_open()) I am going to open [0x200000400:0x1:0x0]/(14->[0x200000402:0x66c8:0x0]) cr_flag=02102 mode=0200100644 msg_flag=0x4
<skip>
00000004:00000002:0.0:1302324530.070256:0:10190:0:(mdt_reint.c:114:mdt_obj_version_get()) FID [0x200000400:0x1:0x0] version is 0x100000002
00000004:00000002:0.0:1302324530.070260:0:10190:0:(mdt_reint.c:114:mdt_obj_version_get()) FID [0x200000401:0x4693:0x0] version is 0x10008b4de
00000004:00000002:0.0:1302324530.070261:0:10190:0:(mdt_reint.c:143:mdt_version_check()) Version mismatch 0x1 != 0x10008b4de

So we got an open|create replay that is already committed (that is visible on the client and from the transno). There are also two version checks - the first one is for the parent (passed), the second is for the child (failed).

Note that the expected version is 0x1 - this is the special version for non-existent files, which means the file didn't exist before this open, and that is right - it was just a new create. But now we have a file with some other version and the check fails. That by itself is not an error either - it can happen if the file was later created and used. The real problem is why this code is executed during replay at all; it shouldn't be, and that is what is wrong.

Let's look at mdt_reint_open:

if (req_is_replay(req) ||
    (req->rq_export->exp_libclient && create_flags & MDS_OPEN_HAS_EA)) {
        /* This is a replay request or from liblustre with ea. */
        result = mdt_open_by_fid(info, ldlm_rep);
        if (result != -ENOENT) {
                <skip>
                GOTO(out, result);
        }

So if we have a replay, the first thing we try to do is open the file by FID. If the file exists, we don't need to re-create it; just the open is needed.
Only if we get -ENOENT do we fall through this check and try to re-create the file, and only in that case are the versions checked.
After that we check the parent version - it is fine - and then do a lookup for the child and check its version, expecting it to be 1 (ENOENT_VERSION); but instead we find a child by name and it exists. Therefore we have two options here:

1) FID lookup doesn't work, so we can't find the child by FID but can by name
2) this is another file with the same name, but a different FID.

Look at the debug above:
"I am going to open [0x200000400:0x1:0x0]/(14->[0x200000402:0x66c8:0x0])", so the child FID is [0x200000402:0x66c8:0x0],
but the version is checked for FID [0x200000401:0x4693:0x0].
That is why it fails: we have another file with the same name.

To me that looks like a bug in the close code or the orphan code. We have open replays for files that no longer exist; I also see other messages in the log saying the same, e.g. "OPEN & CREAT not in open replay." - meaning we are doing an open without create but the file doesn't exist. That can only happen if files were deleted but not closed, which is why the replays are staying in the queue.

I'd propose checking the close() paths to make sure closes can't be lost. Another check should be done on the orphan code - maybe the file wasn't closed yet but another client destroyed it. Usually that should produce an orphan which must be re-opened during replay; maybe that orphan wasn't created.

I tend to think this is a close() issue, not an orphan one, because we have a lot of tests for orphan handling in replay-single and I doubt there is a bug there - although the possibility still exists.
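
For reference, the check that fails in the log above boils down to comparing the version recorded in the replayed request with the object's current version; ENOENT_VERSION and the -EOVERFLOW return are taken from the surrounding MDT code, but the function below is only an illustration under those assumptions, not the actual mdt_version_check():

/* Hedged sketch of the VBR comparison discussed above. */
#define ENOENT_VERSION 1ULL   /* version used for objects that do not exist */

static int version_check_sketch(__u64 replayed_version, __u64 current_version)
{
        if (replayed_version != current_version) {
                /* In the failing log: replayed 0x1 (the file did not exist at
                 * the original open|create) vs. current 0x10008b4de, because
                 * the name now resolves to a different FID. */
                return -EOVERFLOW; /* version mismatch */
        }
        return 0;
}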

Comment by Niu Yawei (Inactive) [ 11/Apr/11 ]

Hi, Sarah

I downloaded the logs and found that each client log is only about 3 MB and the server log is about 20 MB, so the messages from the runracer test are missing from the log. What is the debug log size for the tests? Could you enlarge the debug buffer size (for instance, to 50 MB for both client and server) and then reproduce it again? Thanks.

Comment by Sarah Liu [ 11/Apr/11 ]

In my configuration file, the debug size is set to 32; I will enlarge it and rerun the tests.

Comment by Niu Yawei (Inactive) [ 12/Apr/11 ]

I reproduced this issue on Toro, and after analyzing the log I think I see the bug:

At the beginning of insanity test_0, the MDS is failed by umount, and the files open on the MDS are closed during the MDS umount; an orphan file is deleted upon its last close. However, the clients haven't issued their closes yet, so all the open requests are still retained on the clients. After the MDS restarts and enters recovery, the clients reconnect and replay the open requests, and some of the open replays fail because they cannot find orphan objects that were deleted during the MDS umount.

I think we should probably keep the orphans while closing files on umount; they will be cleared after recovery or on the real file close anyway, so we needn't worry about an orphan leak. Tappro, what's your opinion?
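
As a rough sketch of the idea only (not the patch that eventually landed), the close performed from export cleanup could skip the orphan destroy when the umount is a failover one; the helper name and the 'failover' flag below are hypothetical stand-ins for whatever condition the real code uses (e.g. obd_fail / OBD_OPT_FAILOVER):

/* Hedged sketch: on a failover umount, close the open handle but keep the
 * orphan so that open replay can still find the object after restart. */
static void close_open_file_on_umount(struct mdt_file_data *mfd, bool failover)
{
        /* ... release the open handle held by mfd ... */

        if (!failover) {
                /* normal umount/cleanup: destroy the orphan as before */
                /* ... orphan unlink ... */
        }
        /* failover umount: leave the orphan in place; it is removed at the
         * real last close or by post-recovery orphan cleanup. */
}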

Comment by Andreas Dilger [ 12/Apr/11 ]

The open-unlink files should always be linked from the PENDING directory, so that MDS unmount/crash doesn't drop the last reference to the inode that would otherwise only be held in memory. Is there something broken in the 2.x code related to linking orphan inodes into PENDING? I know there have been some changes in this area in 1.8 in the last year or so, and it is possible that there is a race in how the inode nlink count is checked when a file should be added to PENDING.

Also, I added Oleg to this bug, because I hope he may recall a problem related to file close that was causing the client to not send the "real" close to the MDS even after the file was closed by the application and exited. This is a separate defect that shouldn't be causing this test to fail, but it is something that should be fixed anyway.

Comment by Mikhail Pershin [ 12/Apr/11 ]

Andreas, I don't see a defect there: we just close all opened files upon umount, and that causes the orphan removal. mdt_export_cleanup() does that by walking through the list of mfds. 1.8 does the same, as far as I can see.

But there is one difference from the old days - now we don't set rdonly mode. That mode used to prevent orphan removal, but after 22299 it was removed. Personally I don't see a big problem here; umounting the server is not what is supposed to be done to a working server. But our tests may need modification, given that rdonly is no longer the default behavior during umount.

Comment by Di Wang [ 12/Apr/11 ]

In the 2.x code, when an orphan inode is unlinked, only the name entry is deleted; the real inode and the OI entry are kept until the final close, i.e. until the last reference on the lu object is dropped. So we are probably safe for a crash, since replay can still find the object by FID (through the OI). But for umount, the MDT closes the file (without notifying the client), and the object is then deleted from the OI and from disk (triggered by lu_object_put->lu_object_free, different from 1.8), so we can get into trouble on replay.

It seems we do put the inode on the orphan list, but never try to look it up there during open replay. Niu, maybe you should check for the object under the orphan directory before deleting it from the OI and ldiskfs? Hmm, we might end up with an undeleted orphan if the client crashes or is evicted, but it might not be worse than 1.8? Sorry, I was wrong here: mdd_recovery_complete does actually delete the orphan objects. Tappro probably knows more and can give a better suggestion. Please correct me if I am wrong.

Comment by Mikhail Pershin [ 12/Apr/11 ]

I am trying to remember the details of 22299, and I found that there we tried to keep the clients in last_rcvd:
@@ -3054,7 +3054,7 @@ static int filter_disconnect(struct obd_export *exp)

         rc = server_disconnect_export(exp);

-        if (exp->exp_obd->obd_replayable)
+        if (obd->obd_replayable && (!obd->obd_fail || exp->exp_failed))
                 filter_client_del(exp);

I think this is the same problem - should umount be gentle or not? Without rdonly, umount also removed the last_rcvd entries completely because of the client disconnections, but with the change above clients are preserved if there is a failover. That is why we have the problem: the clients are kept during umount, but the orphans are not.

Previously we either had all clients removed - no problem with orphans - or, with rdonly, nothing was actually removed.

So now we either need to preserve orphans when there is a failover (like we do for clients), or we shouldn't preserve anything and must not use an umount/mount pair for failover without setting rdonly mode first.

Comment by Mikhail Pershin [ 12/Apr/11 ]

BTW, from this condition:

if (obd->obd_replayable && (!obd->obd_fail || exp->exp_failed))

it seems an ordinary umount should always cause client removal: obd_fail is set only if 'lctl cleanup' was used, not during umount - or am I missing something? That would mean there should be no recovery on the server after an umount/mount. But we saw recovery; is that expected behavior?

Comment by Mikhail Pershin [ 12/Apr/11 ]

obd_fail is set to 1 by server_stop_servers() just before the class_manual_cleanup() call. That is why we have clients in last_rcvd but no orphans.

Comment by Niu Yawei (Inactive) [ 12/Apr/11 ]

As far as I can see, the problem is that we have to preserve client data on a failover umount for future recovery, yet the orphans are cleared while files are being closed on umount; this inconsistency causes the open replay errors.

1.8 does not have this problem, because the final close on a failover umount does not clear the orphan there.

My proposal is that we should keep orphans on a failover umount, just like 1.8 does. That requires some code changes in the close/orphan handling path, and in the post-recovery orphan cleanup as well. Tappro/Andreas/Wangdi/Oleg: does that sound OK? If you all agree, I will try to make a patch that fixes it this way. Thank you.

Comment by Mikhail Pershin [ 12/Apr/11 ]

Niu, yes, I agree, though I can't find how 1.8 keeps the orphans - can you show me, please?

Comment by Niu Yawei (Inactive) [ 13/Apr/11 ]

Hi, Tappro

In 1.8, the 'unlink_orphan' parameter of mds_mfd_close() should be false when it is called from umount, since the OBD_OPT_FAILOVER flag should have been set in exp->exp_flags when it is not a forced umount.

(see server_put_super() -> class_manual_cleanup() -> class_process_config() -> class_cleanup() -> class_disconnect_exports() -> class_disconnect_export_list() -> mds_disconnect() -> mds_cleanup_mfd() -> mds_mfd_close(). obd_fail is set in server_put_super(), exp_flags is set in class_disconnect_export_list(). )
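
In other words, the 1.8 behaviour described above amounts to deriving the unlink decision from the export flags; a minimal sketch, assuming the 1.8 names exp_flags, OBD_OPT_FAILOVER and mds_mfd_close(), and with a hypothetical wrapper name:

/* Hedged sketch of the 1.8 logic: on a failover umount OBD_OPT_FAILOVER is
 * set in exp_flags, so the close must not unlink the orphan. */
static void cleanup_mfd_sketch(struct obd_export *exp, struct mds_file_data *mfd)
{
        int unlink_orphan = !(exp->exp_flags & OBD_OPT_FAILOVER);

        /* the 1.8 path then calls mds_mfd_close(..., mfd, unlink_orphan) */
        (void)unlink_orphan;
        (void)mfd;
}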

Comment by Mikhail Pershin [ 13/Apr/11 ]

OK, got it, but I wonder whether we shouldn't use a single check for both client removal and orphan cleanup. Right now obd_fail is checked to decide about client record removal, but the OBD_OPT_FAILOVER flag is checked for orphans. I think we need a common check for both cases.

Comment by Mikhail Pershin [ 14/Apr/11 ]

You were able to reproduce that issue - does it disappear with that patch?

Comment by Niu Yawei (Inactive) [ 14/Apr/11 ]

Yes, without this patch it can be reproduced every time by runracer + insanity, and it passed runracer + insanity after the patch was applied.

Comment by Build Master (Inactive) [ 18/Apr/11 ]

Integrated in lustre-master » client,el5-x86_64 #27
LU-184 Keep orphan on failover umount

Oleg Drokin : 5b3eecce0bad98c81a45712594037b6ec7a9024f
Files :

  • lustre/mdt/mdt_handler.c
  • lustre/include/lustre/lustre_idl.h
  • lustre/mdt/mdt_open.c
  • lustre/mdd/mdd_object.c
Comment by Build Master (Inactive) [ 18/Apr/11 ]

Integrated in lustre-master-centos5 #199
LU-184 Keep orphan on failover umount

Oleg Drokin : 5b3eecce0bad98c81a45712594037b6ec7a9024f
Files :

  • lustre/mdt/mdt_open.c
  • lustre/mdt/mdt_handler.c
  • lustre/mdd/mdd_object.c
  • lustre/include/lustre/lustre_idl.h
Comment by Build Master (Inactive) [ 18/Apr/11 ]

Integrated in lustre-master » client,el5-i686 #27
LU-184 Keep orphan on failover umount

Oleg Drokin : 5b3eecce0bad98c81a45712594037b6ec7a9024f
Files :

  • lustre/include/lustre/lustre_idl.h
  • lustre/mdt/mdt_open.c
  • lustre/mdt/mdt_handler.c
  • lustre/mdd/mdd_object.c
Comment by Build Master (Inactive) [ 18/Apr/11 ]

Integrated in lustre-master » client,ubuntu-x86_64 #27
LU-184 Keep orphan on failover umount

Oleg Drokin : 5b3eecce0bad98c81a45712594037b6ec7a9024f
Files :

  • lustre/include/lustre/lustre_idl.h
  • lustre/mdd/mdd_object.c
  • lustre/mdt/mdt_open.c
  • lustre/mdt/mdt_handler.c
Comment by Build Master (Inactive) [ 18/Apr/11 ]

Integrated in lustre-master » server,el6-x86_64 #27
LU-184 Keep orphan on failover umount

Oleg Drokin : 5b3eecce0bad98c81a45712594037b6ec7a9024f
Files :

  • lustre/include/lustre/lustre_idl.h
  • lustre/mdt/mdt_handler.c
  • lustre/mdd/mdd_object.c
  • lustre/mdt/mdt_open.c
Comment by Build Master (Inactive) [ 18/Apr/11 ]

Integrated in lustre-master » client,el6-i686 #27
LU-184 Keep orphan on failover umount

Oleg Drokin : 5b3eecce0bad98c81a45712594037b6ec7a9024f
Files :

  • lustre/include/lustre/lustre_idl.h
  • lustre/mdd/mdd_object.c
  • lustre/mdt/mdt_open.c
  • lustre/mdt/mdt_handler.c
Comment by Build Master (Inactive) [ 18/Apr/11 ]

Integrated in lustre-master » server,el5-x86_64 #27
LU-184 Keep orphan on failover umount

Oleg Drokin : 5b3eecce0bad98c81a45712594037b6ec7a9024f
Files :

  • lustre/mdt/mdt_handler.c
  • lustre/mdd/mdd_object.c
  • lustre/mdt/mdt_open.c
  • lustre/include/lustre/lustre_idl.h
Comment by Build Master (Inactive) [ 18/Apr/11 ]

Integrated in lustre-master » client,el6-x86_64 #27
LU-184 Keep orphan on failover umount

Oleg Drokin : 5b3eecce0bad98c81a45712594037b6ec7a9024f
Files :

  • lustre/mdt/mdt_open.c
  • lustre/mdd/mdd_object.c
  • lustre/include/lustre/lustre_idl.h
  • lustre/mdt/mdt_handler.c
Comment by Build Master (Inactive) [ 18/Apr/11 ]

Integrated in lustre-master » server,el5-i686 #27
LU-184 Keep orphan on failover umount

Oleg Drokin : 5b3eecce0bad98c81a45712594037b6ec7a9024f
Files :

  • lustre/mdt/mdt_open.c
  • lustre/mdt/mdt_handler.c
  • lustre/mdd/mdd_object.c
  • lustre/include/lustre/lustre_idl.h
Comment by Peter Jones [ 18/Apr/11 ]

Sarah confirms that this issue is fixed in the latest build
