[LU-2613] opening and closing file can generate 'unreclaimable slab' space Created: 14/Jan/13 Updated: 12/Feb/14 Resolved: 12/Feb/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.3, Lustre 2.1.4 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexandre Louvet | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | JL, mn4 | ||
| Attachments: |
|
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 6116 |
| Description |
|
We have a lot of nodes with a large amount of unreclaimable memory (over 4GB). Whatever we try (manually shrinking the cache, clearing LRU locks, ...), the memory can't be recovered. The only way to get the memory back is to umount the Lustre filesystem. After some troubleshooting, I was able to write a small reproducer where I just open(2) then close(2) files in O_RDWR (the reproducer opens thousands of files to emphasize the issue). Two programs are attached:
There is no link between the two phases, as rebooting the client between gentree & reproducer doesn't avoid the problem. Running gentree alone (which opens as many files as the reproducer) doesn't show the issue.
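A minimal sketch of what such a reproducer might look like (hypothetical code, not the attached programs; the "f%d" file layout and the count are assumptions):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical reproducer sketch: open(2) then close(2) many existing
 * files in O_RDWR on a Lustre mount, then watch SUnreclaim grow in
 * /proc/meminfo. */
int main(void)
{
        char path[64];
        int i, fd;

        for (i = 0; i < 100000; i++) {
                snprintf(path, sizeof(path), "f%d", i);
                fd = open(path, O_RDWR);        /* no create, no write */
                if (fd < 0)
                        continue;
                close(fd);      /* open request stays in the client replay queue */
        }
        return 0;
}
|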
| Comments |
| Comment by Peter Jones [ 14/Jan/13 ] |
|
Niu, could you please look into this one? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 14/Jan/13 ] |
|
Hi, Alexandre. I suppose you are referring to the Lustre client, right? I guess the memory could be used by page/inode/dentry caches; did you try whether "echo 3 > /proc/sys/vm/drop_caches" works? If it doesn't, could you provide /proc/meminfo & /proc/slabinfo from before running the reproducer, after running the reproducer, and after running "echo 3 > /proc/sys/vm/drop_caches"? Thanks in advance. |
| Comment by Alexandre Louvet [ 15/Jan/13 ] |
|
> I suppose you are referring to the Lustre client, right?
> did you try whether "echo 3 > /proc/sys/vm/drop_caches" works?
I modified the reproducer a little (just limited the number of opened files) to avoid crashing my client.
|
| Comment by Niu Yawei (Inactive) [ 15/Jan/13 ] |
|
I see, thanks Alexandre. The memory was used by the open replay requests. There wasn't any update operation in the reproducer, so the 'last committed transno' was never updated, and the open replay requests queued on the client will never be dropped (since they all have transnos greater than the last committed). You can try making an update operation (touch a file, for example) after running the reproducer, then do "echo 3 > /proc/sys/vm/drop_caches", and you should see the memory being reclaimed. I think such a use case (open a huge number of existing files, close them, no update operations at all) should be very unlikely in the real world, so it doesn't seem to be a serious problem.
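A minimal sketch of that workaround (the mount point path is an assumption, and writing drop_caches requires root):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch: force one real on-disk transaction so the server bumps
 * last_committed, then drop the caches. The path is an assumption. */
int main(void)
{
        int fd = open("/mnt/lustre/.force_commit", O_CREAT | O_WRONLY, 0644);
        FILE *f;

        if (fd >= 0) {
                close(fd);
                unlink("/mnt/lustre/.force_commit");    /* any update operation works */
        }

        f = fopen("/proc/sys/vm/drop_caches", "w");
        if (f != NULL) {
                fputs("3\n", f);        /* same as: echo 3 > /proc/sys/vm/drop_caches */
                fclose(f);
        }
        return 0;
}
|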
| Comment by Alexandre Louvet [ 18/Jan/13 ] |
|
Niu, you were right: writing to the filesystem releases the memory. Thanks. But I disagree with your assertion that this is very unlikely in the real world. This issue was found due to a user job that was running on thousands of nodes and using MPI-IO. The application opens hundreds of thousands of small output files (< 2 MB). Since the application runs on thousands of nodes, and since MPI-IO is an easy way to handle gather operations on a large number of nodes, the developer used it. Now, MPI-IO gathers data and concentrates it on a few nodes to make bigger IOs, and in this particular case everything was concentrated on only 2 MPI ranks (i.e. no more than 2 physical nodes), with the remaining nodes just playing the open(O_RDWR)/close() twist. The rest of the story is that if the next job running on such a node doesn't write to the filesystem, it will not be able to use all the memory and will fail with ENOMEM (in reality, the batch scheduler checks the amount of usable memory before running a new job, detects this situation, and removes the node from production). |
| Comment by Niu Yawei (Inactive) [ 18/Jan/13 ] |
|
I'll try to compose a patch to fix this, thank you. |
| Comment by Niu Yawei (Inactive) [ 22/Jan/13 ] |
|
patch for b2_1: http://review.whamcloud.com/5143 |
| Comment by Mikhail Pershin [ 07/Feb/13 ] |
|
Niu, am I right that this problem exists in master too? |
| Comment by Niu Yawei (Inactive) [ 07/Feb/13 ] |
|
Hi, Mike. Yes, it's same for master. |
| Comment by Mikhail Pershin [ 10/Feb/13 ] |
|
Niu, I'd try to fix that on the client side in master. We know the problem is that all open requests have a transno, even those that don't update anything on disk. The reason for that is just to keep all opens in the replay queue so they can be issued again in case of recovery. Obviously that is not needed after close, and mdc_close() drops the rq_replay flag to 0 so the request can be deleted from the replay queue - but it isn't deleted, because the last_committed value is very old due to the current bug. Meanwhile, we know that the transno is fake for such opens and that check is not needed - the request can be deleted from the replay queue. I'd try to drop the request transno to 0 along with dropping the rq_replay flag in mdc_close() (we need to do this only for non-create opens; can we check the disposition flags?). ptlrpc_free_committed() currently checks that with an LBUG(), so we need to rework it too, making a note that this can be a closed open request and allowing transno 0 with a goto to the free_req label. At least we are solving a client problem on the client side, avoiding any work on the server.
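A rough sketch of that idea (mdc_close(), rq_replay and rq_transno are real names from this discussion; the disposition check and the surrounding context are assumptions, not a tested patch):

/* Sketch only, inside mdc_close(): for a non-create open, drop the fake
 * transno along with the rq_replay flag, so the request no longer waits
 * for last_committed to catch up. */
spin_lock(&mod->mod_open_req->rq_lock);
mod->mod_open_req->rq_replay = 0;
if (!it_disposition(it, DISP_OPEN_CREATE))      /* assumed check */
        mod->mod_open_req->rq_transno = 0;      /* allow immediate free */
spin_unlock(&mod->mod_open_req->rq_lock);

/* ptlrpc_free_committed() would then need to accept rq_transno == 0 for
 * such requests (it currently LBUG()s) and take its free_req path. */
|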
| Comment by Jay Lan (Inactive) [ 21/May/13 ] |
|
We at NASA Ames may have hit this problem yesterday. Some 48.2G of memory were stuck in unreclaimable slab:

  OBJS      ACTIVE  USE OBJ SIZE  SLABS   OBJ/SLAB CACHE SIZE NAME
  109590860 1252200 1%  0.19K     5479543 20       21918172K  cl_page_kmem

It was a lustre client node. Both servers and clients run lustre 2.1.5. |
| Comment by Niu Yawei (Inactive) [ 22/May/13 ] |
|
Jay, your case isn't the same problem (umount didn't help), could you open another ticket and provide more detailed information? Thanks. |
| Comment by Jay Lan (Inactive) [ 22/May/13 ] |
|
The system I reported may have a different problem; however, we do have other systems that match the description in this ticket. Today we tried the write technique mentioned in this ticket and it indeed released the memory stuck in unreclaimable slab. The system that had 48.2G of memory stuck in unreclaimable slab was a Lustre client, as noted above. |
| Comment by Alexey Lyashkov [ 27/May/13 ] |
|
I think the fix is too complex; sending the global transno as part of any request (instead of per-export) is enough. |
| Comment by Niu Yawei (Inactive) [ 27/May/13 ] |
|
Alexey, I don't see why your proposal can fix the problem. For the short-term release, we can fix it by triggering a disk update for every 1000 fake transactions (patchset 2 of http://review.whamcloud.com/5143). For the longer-term release, I think we have to change the protocol a little bit: the server replies with not only the last_committed_transno (including fake transactions), but also the last_committed_ondisk_transno (only real updating transactions); then the client has enough knowledge to release open requests against the last_committed_transno.
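A sketch of what that longer-term reply change might carry (the struct and field names here are hypothetical, not the actual ptlrpc_body layout):

/* Hypothetical reply fields: the client would free open/close requests
 * against the on-disk value, while other replay bookkeeping keeps using
 * the inclusive one. */
struct reply_commit_info {
        __u64   rci_last_committed;             /* includes fake open/close transnos */
        __u64   rci_last_committed_ondisk;      /* real updating transactions only */
};
|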
| Comment by Alexey Lyashkov [ 27/May/13 ] |
|
Niu, currently the MDT sends the transno updates just from the export info, so it is the last committed transaction for THAT client. The root cause analysis in your patch looks fine for the situation when we have a single client with read-only open/close and no other transactions - but even so, that doesn't update the client's export transaction and doesn't flush the replay queue on the client side. |
| Comment by Niu Yawei (Inactive) [ 27/May/13 ] |
|
Alexey, if a client doesn't have its own transactions, then it will not have any cached replay requests; what's the point of returning the global last_committed to that client? |
| Comment by Alexey Lyashkov [ 27/May/13 ] |
|
No |
| Comment by Mikhail Pershin [ 27/May/13 ] |
|
Niu is right: if a client has no own disk update, it shouldn't care about any others. It is just wrong to rely on 'some other server update' to decide that we can drop a closed open request on the client. There can be no updates at all, and then we will have the same problem. Each client should solve its own problem with only its own information; that way we will have a full solution no matter how many clients we have and what they are doing on the server. For this particular problem the issue is on the client side, because it holds closed open requests in the queue though that is not needed - and we know it is not needed - yet we continue to do more tricks on the server to make the client happy instead of solving it on the client itself. |
| Comment by Alexey Lyashkov [ 27/May/13 ] |
|
tappro, why? Before VBR landed, we always sent the last server update to the client. With VBR we started using a per-export committed transno - which is wrong. As for open requests - they may be needed in the open + unlink case to open an orphan, as we don't know whether it's committed or not. |
| Comment by Alex Zhuravlev [ 27/May/13 ] |
|
This is not wrong at all. last_committed can be tracked per export, as this is what the client is interested in - its own requests. And this improves SMP scalability, by definition. |
| Comment by Mikhail Pershin [ 27/May/13 ] |
|
Per-export committed is right; it opens the way to many recovery features and is more flexible. It is not a bug as you think, but was done deliberately by design, and I see no single reason to go back to obd_last_committed, especially just to fix a single issue. Each client has exactly its OWN commit queue; it knows nothing about any other client's requests. The open-unlink case was solved along with VBR years ago; we have no problems with that. If we have an issue related to this, then it is better to think about why it exists and how to fix it. It exists not because last_committed is not cluster-wide - there can be no commits from others, or there can be a single client, and the problem appears again. It exists because the client is not able to drop closed open requests if there is no more server activity (or if it is not seeing that activity). IMHO, this is the fundamental issue: the client shouldn't rely on server activity. I'd start from that point. |
| Comment by Vitaly Fertman [ 27/May/13 ] |
|
I do not think this is a client-side issue. If the server gives a transno out, it is supposed to finally say that transno is committed; that's all. The client should not have to think about whether to drop an RPC from the replay queue or not. As for how to inform the client that the transno is committed - that is up to the server. Informing a client about another client's committed transno is excused just because it is the MDT's logic that creates such fake transactions. However, there are other options: a fake transaction commit, or a direct exp_last_committed update - which happens from time to time... |
| Comment by Alexey Lyashkov [ 27/May/13 ] |
|
Tappro, I don't say it's wrong to track on a per-client basis; I just say the client should know about any committed transno, as that has no cost to implement and was done in previous versions. I agree it leaves a window open when there is a single client - but that's a very, very rare case, and we may use a cluster-wide transno as a short/middle-term solution to fix 2.1 and 2.4. But I will be happy if you are able to deliver a better fix in 2.5 without losing compatibility with older clients. |
| Comment by Andreas Dilger [ 28/May/13 ] |
|
I've reduced the severity of this bug from Blocker to Major. Clearly, if the problem has existed since before 1.8.0 it cannot be affecting a huge number of users, though it definitely appears to be a problem when using RobinHood (or another tool) to open and close a large number of files. Also, there is a simple workaround - periodically touch any file on the client to force a transaction so that the last_committed value is updated and the saved RPCs will be flushed. Presumably, RobinHood cannot be modified to get the information it needs without the open/close (e.g. stat() instead of fstat() and similar)? That would be even less work on the part of the client. In the short term, to work around the client bug, it could also be made to modify some temporary file in the filesystem (e.g. mknod(), then periodic utimes() to generate transactions) until this problem is resolved. The correct long-term solution to this problem is as Mike suggested early on in this bug - to decouple the open-replay handling from the RPC replay mechanism, since it isn't really the RPC layer's job to re-open files that are no longer involved in transactions. The RPC replay is of course correct for an open(O_CREAT) that did not yet commit and/or close, but it doesn't make sense to keep non-creating open/close RPCs around after that time. We had previously discussed moving the file open-replay handling up to the llite layer in the context of "Simplified Interoperability" (http://wiki.lustre.org/images/0/0b/Simplified_InteropRecovery.pdf and https://projectlava.xyratex.com/show_bug.cgi?id=18496). In the short term (2.1 and 2.4) there are a few compromise solutions possible. If the server is doing other transactions, then it might make sense to return a last_committed value < the most recent "fake" transaction number. One possibility is to return last_committed = min(last_committed, inode_version - 1) in the RPC replies so that the clients don't get any state that they do not need, but at least depend on the recovery state of any file they have accessed. Alternately, at one time we discussed returning duplicate (old) transaction numbers for opens that do not create the file. This allows the files to be replayed in the proper order after recovery, but they do not change the state on disk. |
| Comment by Andreas Dilger [ 29/May/13 ] |
|
Thinking about this further, I suspect the following will work correctly for new and old clients. The actual RPC open transno could be inode_version, but will have a later XID from the client, so it will sort correctly in the client replay list. The close RPC transno can be max(inode_version of inode, max inode_version accessed by this export - 1). The client RPC replay ordering should be in [transno, XID] order, so if this client also created the file, then the later open/close RPCs would still be replayed after it. When the close gets a reply, this will also naturally ensure that both the open/close RPC transnos are < last_committed, if the object create itself was committed. For both open and close, it should be enough to return:

        last_committed = min(max(inode_version accessed by this export), last_committed)

so that the client never "sees" a last_committed value newer than the inode_version (i.e. last change transaction) of any object it has accessed, and the actual last_committed value in case the inode was changed since the most recent transaction. I don't think there is any problem with the client or server having duplicate transaction numbers, per the comment in ptlrpc_retain_replayable_request():

        /* We may have duplicate transnos if we create and then
         * open a file, or for closes retained if to match creating
         * opens, so use req->rq_xid as a secondary key.
         * (See bugs 684, 685, and 428.)
         * XXX no longer needed, but all opens need transnos! */

This avoids the ever-growing transaction number for open+close operations that do not actually affect the on-disk state (except for keeping objects open for open-but-unlinked files). So long as the client has the open after the object was first created, and the close before the current commit (which can't affect any open-unlinked state if some other client still has not sent a close RPC).
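A small sketch of the proposed computation on the reply path (the helper name and parameters are assumptions):

/* Hypothetical helper: never let the returned last_committed exceed the
 * newest inode_version this export has actually accessed. */
static __u64 reply_last_committed(__u64 exp_last_committed,
                                  __u64 max_inode_version_accessed)
{
        if (exp_last_committed < max_inode_version_accessed)
                return exp_last_committed;
        return max_inode_version_accessed;
}
|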
| Comment by Niu Yawei (Inactive) [ 29/May/13 ] |
|
Andreas, I think your proposal is based on the assumption that there are other clients doing disk updates in the cluster, right? What if there isn't any client doing disk updates, or the disk update frequency isn't high enough? I think that's the situation described in this ticket. |
| Comment by Alexey Lyashkov [ 29/May/13 ] |
|
Niu, the window opens only when no updates at all exist in the cluster. Any other case solves the problem, and a cluster where no client makes any update is unlikely. If the affected client goes idle after the open/close series, it will get last_committed updates via ping. |
| Comment by Andreas Dilger [ 30/May/13 ] |
|
Niu, I think my proposal should work even if there are no changes being made to the filesystem at all. The open transno would be inode_version and the close transno would be max(inode_version, export last_committed), so it doesn't expose any newer transno to the client. This also ensures that when the files are closed, the open/close RPC transnos are < last_committed, and they will be dropped from the replay list on the client. |
| Comment by Alexey Lyashkov [ 30/May/13 ] |
|
Andreas, that will not work, because if a client mounted Lustre with noatime and executed just an open(O_RDONLY)+close, it will have zero in peer_last_committed, because no updates will be sent to it. |
| Comment by Mikhail Pershin [ 30/May/13 ] |
|
Andreas, that may work, yes. Besides, it would be really good to get rid of the mdt_empty_transno() code. Alexey, in your example the imp_peer_committed_transno will still be updated, because last_committed is returned from the server as:

        last_committed = min(max(inode_version accessed by this export), last_committed)

and 'accessed' means not just updates but open/close too. |
| Comment by Niu Yawei (Inactive) [ 31/May/13 ] |
|
I see, but I'm not sure whether duplicate transnos will bring us trouble. I'll try to cook up a patch to find out later. Thanks, Andreas. |
| Comment by Andreas Dilger [ 31/May/13 ] |
|
See my earlier comment in this bug. We used to have duplicate transnos for open/close at one time in the past, or at least it was something we strongly considered, and the client code is ready for this to happen. |
| Comment by Niu Yawei (Inactive) [ 03/Jun/13 ] |
|
I'm not sure there is any good way to handle duplicate transnos in replay: how is the server supposed to replay two open requests with the same transno during recovery? |
| Comment by Andreas Dilger [ 03/Jun/13 ] |
|
Three options would be:
|
| Comment by Niu Yawei (Inactive) [ 07/Jun/13 ] |
|
Thank you, Andreas. It looks like the first option needs client (or even protocol) changes; if we have to change the client & protocol anyway, we'd choose some cleaner & easier way: reply back two last_committed transnos (last_committed & on-disk last_committed, similar to my patchset 1), or totally change the open replay mechanism as you mentioned. I tried the second way today and discovered another problem: for performance reasons, the client only scans the replay list to free committed/closed requests when its last_committed is bumped, which means that without client code changes the open requests won't be freed promptly, even when their transnos are smaller than last_committed. Even if we change the client code, we'd need to think of some better way than scanning the replay list on every reply. So, as a short-term solution, I think we'd go back to my patchset 2 (update the disk on the server periodically), or ask users to do this from userspace themselves? What do you think? |
| Comment by Andreas Dilger [ 08/Jun/13 ] |
|
Niu, for the second option I thought the client would not even try to put the RPC into the replay list if transno <= last_committed? Is it not checked in the close RPC callback, to drop both requests from this list if both transnos are <= last_committed? If a fake transno is given to the client to avoid a duplicate open+create transno, it should be inode_version+1, and upon close, if the real last_committed is >= inode_version+1, then inode_version+1 should be sent back to the client as last_committed. In any case, I don't mind also fixing this on the client as long as it doesn't break interoperability. The client should be smart enough to cancel matching pairs of open+close if they are <= last_committed. That won't solve the problem by itself, but in conjunction with the server changes not to invent fake transnos > last_committed it should work. I think this is a combination of two bugs - the server giving out transnos that are > last_committed and never committing them, and the client not dropping RPCs when last_committed doesn't change. This should be possible to handle without scanning the list each time, I think. |
| Comment by Niu Yawei (Inactive) [ 09/Jun/13 ] |
|
That's true for the close request but not for the open request; the open request has to be retained regardless of transno.
No, we didn't do that, but it's easy to fix if we change the client code.
One thing that can't be handled well in this manner: if a client never does an update operation, it will always have zero exp_last_committed, which means no replied transno can be larger than 0... We may fix it by generating an update operation upon each client connection (connect becomes an update operation? that looks quite dirty to me), or we could just ignore this rare case (no updates from a client at all) for the moment? Any suggestions? Thanks. |
| Comment by Andreas Dilger [ 09/Jun/13 ] |
|
Per previous comments, the client exp_last_committed should be min(last_committed, max(inode_version of all inodes accessed)). That ensures that if the client is accessing committed inodes, they can be discarded on close (assuming it isn't the very last file created). |
| Comment by Alexey Lyashkov [ 12/Jun/13 ] |
|
Looks like I found one more issue in the same area.

void ptlrpc_commit_replies(struct obd_export *exp)
{
        struct ptlrpc_reply_state *rs, *nxt;
        DECLARE_RS_BATCH(batch);
        ENTRY;

        rs_batch_init(&batch);
        /* Find any replies that have been committed and get their service
         * to attend to complete them. */
        /* CAVEAT EMPTOR: spinlock ordering!!! */
        spin_lock(&exp->exp_uncommitted_replies_lock);
        cfs_list_for_each_entry_safe(rs, nxt, &exp->exp_uncommitted_replies,
                                     rs_obd_list) {
                LASSERT(rs->rs_difficult);
                /* VBR: per-export last_committed */
                LASSERT(rs->rs_export);
                if (rs->rs_transno <= exp->exp_last_committed) {
                        cfs_list_del_init(&rs->rs_obd_list);
                        rs_batch_add(&batch, rs);
                }
        }
        spin_unlock(&exp->exp_uncommitted_replies_lock);
        rs_batch_fini(&batch);
        EXIT;
}

But exp_last_committed is zero until a modification request hits, while we have sent locks via getattr requests without any transaction opened. |
| Comment by Niu Yawei (Inactive) [ 13/Jun/13 ] |
|
Andreas, my point is that the last_committed returned to the client can't be larger than the exp_last_committed in the on-disk slot; otherwise, the client would think the server lost data on the next recovery (see recovery-small.sh test_54). So, if a client doesn't update the disk, its exp_last_committed on disk will always be zero, and we can't return a last_committed larger than 0 in such a case. But I think we may force a disk update in that situation. Another problem is that the client code assumes transnos are unique in some places, such as ptlrpc_replay_next():

        cfs_list_for_each_safe(tmp, pos, &imp->imp_replay_list) {
                req = cfs_list_entry(tmp, struct ptlrpc_request,
                                     rq_replay_list);

                /* If need to resend the last sent transno (because a
                 * reconnect has occurred), then stop on the matching
                 * req and send it again. If, however, the last sent
                 * transno has been committed then we continue replay
                 * from the next request. */
                if (req->rq_transno > last_transno) {
                        if (imp->imp_resend_replay)
                                lustre_msg_add_flags(req->rq_reqmsg,
                                                     MSG_RESENT);
                        break;
                }

                req = NULL;
        }

If there are requests with duplicate transnos, some requests will be skipped during replay, so old clients will have trouble with this approach.
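A hedged sketch of one way the loop above could tolerate duplicates, using the XID as a secondary key per the [transno, XID] ordering mentioned earlier (the imp_last_replay_xid field is an assumption):

        /* Sketch: with duplicate transnos, 'rq_transno > last_transno' skips
         * every request sharing last_transno. A secondary XID comparison would
         * restart replay at the right request. imp_last_replay_xid is assumed. */
        if (req->rq_transno > last_transno ||
            (req->rq_transno == last_transno &&
             req->rq_xid > imp->imp_last_replay_xid)) {
                if (imp->imp_resend_replay)
                        lustre_msg_add_flags(req->rq_reqmsg, MSG_RESENT);
                break;
        }
|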
| Comment by Andreas Dilger [ 13/Jun/13 ] |
|
Does this affect the case where the duplicate transnos are < last_committed (i.e. the open/close RPCs)? I thought those ones are just replayed on the server? Ah, you are referencing the client code that replays the RPCs... If it would make the implementation easier, it would be possible to negotiate between the client and server with MSG_* flags in the ptlrpc_body whether they can handle duplicate transnos or not. If not, then the server would have to do occasional commits to bump the transno; otherwise this could be avoided for newer clients & servers. |
| Comment by Niu Yawei (Inactive) [ 13/Jun/13 ] |
|
Andreas, given that we're going to add a MSG flag, I'm thinking it might be better to pack another transno into the reply. It looks like pb_last_seen in ptlrpc_body is not used; I think we can use it to carry the last committed on-disk transno (while pb_last_committed stores the last committed on-disk/fake transno), so that we can resolve the issue without introducing duplicate transnos. I'll update the patch this way if that's fine with you. Thanks.
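A sketch of how the client side might consume that field (pb_last_seen is the existing ptlrpc_body member; the accessor and the free path below are assumptions):

/* Sketch: read the last committed on-disk transno from the reply and
 * free committed open/close requests against it, instead of against
 * pb_last_committed (which would include fake open/close transnos).
 * lustre_msg_get_last_seen() is an assumed accessor. */
__u64 committed_ondisk = lustre_msg_get_last_seen(req->rq_repmsg);

if (req->rq_transno != 0 && req->rq_transno <= committed_ondisk) {
        cfs_list_del_init(&req->rq_replay_list);        /* drop from replay queue */
        ptlrpc_req_finished(req);                       /* release the list reference */
}
|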
| Comment by Niu Yawei (Inactive) [ 17/Jun/13 ] |
|
patch for master: http://review.whamcloud.com/6665 |
| Comment by Mikhail Pershin [ 01/Jul/13 ] |
|
Niu, Andreas, this is becoming more and more complex as far as I can see. The problem we are trying to solve is that the client keeps closed open requests in the replay queue, right? Meanwhile the client itself wants them to be dropped from that queue, see mdc_close():

        /* We no longer want to preserve this open for replay even
         * though the open was committed. b=3632, b=3633 */
        spin_lock(&mod->mod_open_req->rq_lock);
        mod->mod_open_req->rq_replay = 0;
        spin_unlock(&mod->mod_open_req->rq_lock);

So the problem is that the request is not dropped when the client asks for that. That is because the last_committed check is the only mechanism to drop requests, which means to me that we just need to add another mechanism to drop a request from the replay queue regardless of its transno. E.g. an rq_closed_open flag can be added and checked to drop the request. That could be much simpler. Did I miss something, and are there other cases we are trying to solve? |
| Comment by Niu Yawei (Inactive) [ 01/Jul/13 ] |
|
Mike, right. I mentioned this (the open request can only be freed when the last_committed on the client is bumped) in a previous comment, and it's the same for the close request. Adding another flag for checking open/close requests might work; what about my solution in http://review.whamcloud.com/6665 ? Could you review it? Thanks. |
| Comment by Andreas Dilger [ 01/Jul/13 ] |
|
If the request is not doing a create, couldn't both the open and close RPC be dropped at this time, regardless of the transno? |
| Comment by Niu Yawei (Inactive) [ 02/Jul/13 ] |
|
Andreas, the existing code can only drop open/close requests when the last_committed seen by the client is bumped, no matter whether it's an open_create or not. |
| Comment by Mikhail Pershin [ 15/Jul/13 ] |
|
Niu, exactly, and I propose making that 'existing code' able to drop a closed open regardless of its transno, because it doesn't make sense after close. The current solution is still based on hacking the server side in various ways. In fact this can be solved at the client side, just by letting closed OPENs be dropped despite their transno. |
| Comment by Alex Zhuravlev [ 15/Jul/13 ] |
|
I guess it still makes some sense if the open created a file? |
| Comment by Niu Yawei (Inactive) [ 15/Jul/13 ] |
|
Mike, I think there is no way to achieve this without server-side changes. I can think of two ways so far:
1. The server treats open/close as committed transactions and returns to the client both the last committed transno & the last real (on-disk) transno; the client drops committed open & close requests immediately after close. That's what I did in my patch.
2. The server assigns no transno for open/close, and the client open-replay mechanism must be adapted to this change (as Siyao mentioned in the review comment: track the open handle in the fs layer, rebuild the request when replaying the open; some other changes to the open, close, and open lock code could be required).
The second solution looks cleaner to me, but it requires more code changes, and it'll be a little tricky to handle open-create & plain open differently on the client side. |
| Comment by Alex Zhuravlev [ 15/Jul/13 ] |
|
Yes, I also remember we discussed a way to implement the open handle as an LDLM lock and let LDLM re-enqueue locks at recovery. |
| Comment by Mikhail Pershin [ 16/Jul/13 ] |
|
Niu, in fact we don't need to wait for a commit in the case of a closed open (no create), and exactly that case causes this bug with unreclaimable space. And I don't see why server help is needed here - the client knows there was a close and knows this is a non-create open; that is enough to make the decision to drop the request from the replay queue. I am not sure, though, how easy it is to distinguish the non-create case from OPEN-CREATE; at first sight we need to check the disposition flags for the DISP_OPEN_CREATE bit. So a possible solution can be: check DISP_OPEN_CREATE at close time and drop non-create opens from the replay queue regardless of transno (a rough sketch follows below). Will that work? Am I missing something?
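A rough sketch of that client-side proposal (DISP_OPEN_CREATE and rq_replay are real names from this thread; the placement at close time and the it_disposition() check are assumptions):

/* Sketch: at close time, if the open did not create anything, stop
 * preserving the open request for replay regardless of its transno. */
if (!it_disposition(it, DISP_OPEN_CREATE)) {
        spin_lock(&open_req->rq_lock);
        open_req->rq_replay = 0;                        /* no replay needed */
        spin_unlock(&open_req->rq_lock);

        spin_lock(&imp->imp_lock);
        cfs_list_del_init(&open_req->rq_replay_list);   /* drop immediately */
        spin_unlock(&imp->imp_lock);
}
|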
| Comment by Niu Yawei (Inactive) [ 17/Jul/13 ] |
|
Mike, your solution looks fine to me, I'll update the patch in this way soon. Thanks. |
| Comment by Mikhail Pershin [ 17/Jul/13 ] |
|
Niu, I am not so sure it will be easy to implement; this is just a possible way to go, but if it works, that would be good. |
| Comment by Niu Yawei (Inactive) [ 18/Jul/13 ] |
|
Mike, I realized that not only does the open which creates an object (with DISP_OPEN_CREATE) need to be replayed; the open which creates stripe data needs to be replayed as well (see mdt_create_data()), and I don't see how to identify such an open on the client. Any good ideas? |
| Comment by Niu Yawei (Inactive) [ 18/Jul/13 ] |
|
It seems the server code has to be changed anyway. I introduced a new DISP bit (DISP_OPEN_STRIPE) to identify an open which creates stripes; with this approach, the server/protocol changes are smaller than in the former patch (where the server returned the on-disk transno). Mike, could you take a look at the patch? Thanks |
| Comment by Peter Jones [ 02/Oct/13 ] |
|
Pushing to 2.5.1 because it seems that the patch needs more work |
| Comment by Bob Glossman (Inactive) [ 14/Nov/13 ] |
|
backport to b2_4 |
| Comment by Peter Jones [ 12/Feb/14 ] |
|
Landed for 2.5.1 and 2.6 |