
LU-2613: opening and closing files can generate 'unreclaimable slab' space

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.1.3, Lustre 2.1.4

    Description

      We have a lot of nodes with a large amount of unreclaimable memory (over 4 GB). Whatever we try (manually shrinking the caches, clearing LRU locks, ...), the memory can't be recovered. The only way to get the memory back is to unmount the Lustre filesystem.

      After some troubleshooting, I was able to write a small reproducer that just open(2)s then close(2)s files in O_RDWR (my reproducer opens thousands of files to emphasize the issue).

      Two programs are attached:

      • gentree.c (cc -o gentree gentree.c -lpthread) to generate a tree of known files (no need to use readdir in reproducer.c)
      • reproducer.c (cc -o reproducer reproducer.c -lpthread) to reproduce the issue.
        The macro BASE_DIR has to be adjusted according to the local cluster configuration (it should name a directory located on a Lustre filesystem).

      There is no link between the two phases: rebooting the client between gentree and reproducer doesn't avoid the problem. Running gentree (which opens as many files as reproducer does) doesn't show the issue.
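
      For reference, a minimal sketch of the kind of open/close loop the reproducer performs (the BASE_DIR value and file-name pattern below are illustrative assumptions; the attached reproducer.c is the authoritative, multi-threaded version):

          /* Minimal sketch, assuming gentree pre-created files
           * f000000..fNNNNNN under BASE_DIR; opening O_RDWR and
           * closing is enough to grow unreclaimable slab. */
          #include <fcntl.h>
          #include <stdio.h>
          #include <unistd.h>

          #define BASE_DIR "/mnt/lustre/testdir"  /* adjust to your Lustre mount */
          #define NFILES   100000

          int main(void)
          {
                  char path[256];
                  int i, fd;

                  for (i = 0; i < NFILES; i++) {
                          snprintf(path, sizeof(path), BASE_DIR "/f%06d", i);
                          fd = open(path, O_RDWR);
                          if (fd < 0) {
                                  perror(path);
                                  continue;
                          }
                          close(fd);
                  }
                  return 0;
          }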

      Attachments

        1. gentree.c
          3 kB
        2. logs_01.tar.gz
          7 kB
        3. reproducer.c
          2 kB


          Activity


            Niu, exactly, and I propose to make that 'existing code' able to drop a closed open regardless of its transno, because it doesn't make sense after close. The current solution is still based on hacking the server side in various ways. In fact this can be solved on the client side, just by letting closed OPENs be dropped despite their transno.

            Mike, I think there is no way to achieve this without server-side changes; I can think of two ways so far:

            1. The server treats open/close as committed transactions and returns to the client both the last committed transno and the last real (on-disk) transno; the client drops committed open & close requests immediately after close. That's what I did in my patch.

            2. The server assigns no transno for open/close, and the client open-replay mechanism must be adapted to this change (as Siyao mentioned in the review comment: track the open handle in the fs layer, rebuild the request when replaying the open, and some other changes to the open, close, and open-lock code could be required).

            The second solution looks cleaner to me, but it requires more code changes, and it'll be a little tricky to handle open-create and plain open differently on the client side.

            niu Niu Yawei (Inactive) added a comment
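
            As a rough illustration of solution 1 (all names below are hypothetical sketches, not the actual patch in http://review.whamcloud.com/6665): the client records two watermarks from each reply and may free a closed open as soon as the open/close-inclusive watermark covers it.

                #include <stdbool.h>
                #include <stdint.h>

                /* Hypothetical names throughout; not the actual patch. */
                struct sketch_import {
                        uint64_t peer_committed;       /* watermark counting open/close as committed */
                        uint64_t peer_disk_committed;  /* watermark of what is really on disk */
                };

                struct sketch_request {
                        uint64_t transno;
                        bool     closed;               /* the matching close has been replied to */
                };

                /* on every reply: record both watermarks the server packed */
                static void update_watermarks(struct sketch_import *imp,
                                              uint64_t committed, uint64_t disk_committed)
                {
                        imp->peer_committed = committed;
                        imp->peer_disk_committed = disk_committed;
                }

                /* a committed open whose file has been closed no longer needs replay */
                static bool can_free_open(const struct sketch_import *imp,
                                          const struct sketch_request *req)
                {
                        return req->closed && req->transno <= imp->peer_committed;
                }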

            I guess it still makes some sense if the open created a file?

            bzzz Alex Zhuravlev added a comment

            Niu, exactly, and I propose to make that 'existing code' able to drop a closed open regardless of its transno, because it doesn't make sense after close. The current solution is still based on hacking the server side in various ways. In fact this can be solved on the client side, just by letting closed OPENs be dropped despite their transno.

            tappro Mikhail Pershin added a comment

            Andreas, the existing code can only drop an open/close request when the last_committed seen by the client is bumped, no matter whether it's an open-create or not.

            niu Niu Yawei (Inactive) added a comment
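
            For context, a simplified userspace paraphrase of the sweep Niu describes (the logic of ptlrpc_free_committed(), in spirit; names are only loosely modeled on the real code). Note how a closed open whose transno is beyond last_committed is never reached, which is the leak this ticket is about.

                #include <stdbool.h>
                #include <stdint.h>
                #include <stdlib.h>

                /* Userspace model of a replay queue sorted by transno; 'replay'
                 * stands in for rq_replay (set while the file is still open). */
                struct req {
                        uint64_t    transno;
                        bool        replay;
                        struct req *next;
                };

                /* Free everything the server has committed, except requests still
                 * pinned for replay; stop at the first uncommitted transno. */
                static struct req *sweep_committed(struct req *head,
                                                   uint64_t last_committed)
                {
                        struct req **pp = &head;

                        while (*pp != NULL && (*pp)->transno <= last_committed) {
                                struct req *req = *pp;

                                if (req->replay) {
                                        pp = &req->next;   /* keep: file still open */
                                        continue;
                                }
                                *pp = req->next;           /* unlink and free */
                                free(req);
                        }
                        return head;
                }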

            If the request is not doing a create, couldn't both the open and close RPC be dropped at this time, regardless of the transno?

            adilger Andreas Dilger added a comment

            Mike, right. I mentioned this (the open request can only be freed when the last_committed on the client is bumped) in a previous comment, and it's the same for the close request. Adding another flag for checking open/close requests might work; what about my solution in http://review.whamcloud.com/6665 ? Could you review it? Thanks.

            niu Niu Yawei (Inactive) added a comment

            Niu, Andreas, this is becoming more and more complex, as far as I can see. The problem we are trying to solve is that the client keeps closed open requests in the replay queue, right? Meanwhile the client itself wants them to be dropped from that queue; see mdc_close():

                /* We no longer want to preserve this open for replay even
                 * though the open was committed. b=3632, b=3633 */
                spin_lock(&mod->mod_open_req->rq_lock);
                mod->mod_open_req->rq_replay = 0;
                spin_unlock(&mod->mod_open_req->rq_lock);

            So the problem is that the request is not dropped when the client asks for that. That is because the last_committed check is the only mechanism for dropping requests, which tells me we just need to add another one that drops a request from the replay queue regardless of its transno. E.g. an rq_closed_open flag could be added and checked to drop the request. That could be much simpler. Did I miss something, or are there other cases we are trying to solve?

            tappro Mikhail Pershin added a comment
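
            Continuing the userspace model from the sweep sketch above, Mikhail's proposal might look like this (rq_closed_open is his suggested name; the rest is illustrative): a second pass drops closed opens regardless of their transno.

                /* Assumes struct req from the sketch above gains a 'closed_open'
                 * flag (rq_closed_open in Mikhail's wording) that mdc_close()
                 * would set once the close has been replied to. */
                static struct req *sweep_closed_opens(struct req *head)
                {
                        struct req **pp = &head;

                        while (*pp != NULL) {
                                struct req *req = *pp;

                                if (req->closed_open) {    /* close replied: replay is moot */
                                        *pp = req->next;
                                        free(req);
                                } else {
                                        pp = &req->next;
                                }
                        }
                        return head;
                }
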
            patch for master: http://review.whamcloud.com/6665

            niu Niu Yawei (Inactive) added a comment

            Andreas, given that we're going to add a MSG flag, I'm wondering if it would be better to pack another transno in the reply. It looks like pb_last_seen in ptlrpc_body is not used; I think we can use it to carry the last committed on-disk transno (pb_last_committed stores the last committed on-disk/fake transno), so that we can resolve the issue without introducing duplicate transnos. I'll update the patch in this way if it's fine with you. Thanks.

            niu Niu Yawei (Inactive) added a comment
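
            Schematically, the reply fields involved would be (a simplified sketch; the real struct ptlrpc_body uses __u64 and carries many more members):

                #include <stdint.h>

                /* Simplified sketch of the two watermarks Niu is describing;
                 * not the full ptlrpc_body definition. */
                struct ptlrpc_body_sketch {
                        uint64_t pb_last_committed; /* last committed on-disk/fake transno */
                        uint64_t pb_last_seen;      /* repurposed: last committed on-disk transno */
                        /* ... remaining ptlrpc_body fields elided ... */
                };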

            Does this affect the case where the duplicate transnos are < last_committed (i.e. the open/close RPCs)? I thought those are just replayed on the server? Ah, you are referencing the client code that replays the RPCs...

            If it would make the implementation easier, it would be possible to negotiate between the client and server with MSG_* flags in the ptlrpc_body whether they can handle duplicate transnos or not. If not, then the server would have to do occasional commits to bump the transno; otherwise this could be avoided for newer clients & servers.

            adilger Andreas Dilger added a comment
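
            A sketch of the negotiation Andreas suggests (MSG_DUP_TRANSNO_OK is a hypothetical flag name, and the bit value is chosen only for illustration):

                #include <stdbool.h>
                #include <stdint.h>

                /* Hypothetical MSG_* bit; not an existing Lustre flag. */
                #define MSG_DUP_TRANSNO_OK 0x8000u

                static bool peer_handles_dup_transno(uint32_t peer_msg_flags)
                {
                        /* if the peer advertised support, the server may reuse
                         * transnos for open/close; otherwise it must occasionally
                         * force a commit to keep transnos unique for old peers */
                        return (peer_msg_flags & MSG_DUP_TRANSNO_OK) != 0;
                }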

            Per previous comments, the client exp_last_committed should be min(last_committed, max(inode_version of all inodes accessed)). That ensures that if the client is accessing committed inodes, they can be discarded on close (assuming it isn't the very last file created).

            Andreas, my point is that the last_committed returned to the client can't be larger than the exp_last_committed in the on-disk slot; otherwise, the client would think the server lost data on the next recovery (see recovery-small.sh test_54). So, if a client doesn't update the disk, its exp_last_committed on disk should always be zero, and we can't return a last_committed larger than 0 in that case. But I think we may force a disk update in this case.

            Another problem is that the client code assumes transnos are unique in some places, such as ptlrpc_replay_next():

                cfs_list_for_each_safe(tmp, pos, &imp->imp_replay_list) {
                        req = cfs_list_entry(tmp, struct ptlrpc_request,
                                             rq_replay_list);

                        /* If need to resend the last sent transno (because a
                           reconnect has occurred), then stop on the matching
                           req and send it again. If, however, the last sent
                           transno has been committed then we continue replay
                           from the next request. */
                        if (req->rq_transno > last_transno) {
                                if (imp->imp_resend_replay)
                                        lustre_msg_add_flags(req->rq_reqmsg,
                                                             MSG_RESENT);
                                break;
                        }
                        req = NULL;
                }

            If there are requests with duplicate transnos, some requests will be skipped during replay, so old clients will have trouble with this scheme.

            niu Niu Yawei (Inactive) added a comment (edited)

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: louveta Alexandre Louvet (Inactive)
              Votes: 1
              Watchers: 31
