
opening and closing file can generate 'unreclaimable slab' space

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.1.3, Lustre 2.1.4
    • 3
    • 6116

    Description

      We have a lot of nodes with a large amount of unreclaimable memory (over 4GB). Whatever we try (manually shrinking the caches, clearing the LRU locks, ...), the memory can't be recovered. The only way to get the memory back is to umount the Lustre filesystem.
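The symptom described above can be observed on any Linux client; the sketch below shows the meminfo counter to watch, with the recovery attempts from the description (which did not help here) left as comments since they need root and a Lustre mount:

```shell
# Unreclaimable slab, in kB; on the affected clients this kept growing
grep '^SUnreclaim:' /proc/meminfo
# Recovery attempts that did NOT release the memory in this bug:
#   echo 3 > /proc/sys/vm/drop_caches                  # shrink caches
#   lctl set_param ldlm.namespaces.*.lru_size=clear    # clear LRU locks
```

Only unmounting the filesystem released the memory, which is what pointed at requests being pinned rather than at cached data.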

      After some troubleshooting, I was able to write a small reproducer where I just open(2) then close(2) files with O_RDWR (my reproducer opens thousands of files to emphasize the issue).

      Two programs are attached:

      • gentree.c (cc -o gentree gentree.c -lpthread) to generate a tree of known files (no need to use readdir in reproducer.c)
      • reproducer.c (cc -o reproducer reproducer.c -lpthread) to reproduce the issue.
        The macro BASE_DIR has to be adjusted according to the local cluster configuration (you should provide the name of a directory located on a Lustre filesystem).

      There is no link between the 2 phases, as rebooting the client between gentree & reproducer doesn't avoid the problem. Running gentree (which opens as many files as reproducer) doesn't show the issue.

      Attachments

        1. gentree.c
          3 kB
        2. logs_01.tar.gz
          7 kB
        3. reproducer.c
          2 kB

        Issue Links

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.5.1 and 2.6

            bogl Bob Glossman (Inactive) added a comment - backport to b2_4 http://review.whamcloud.com/8277
            pjones Peter Jones added a comment -

            Pushing to 2.5.1 because it seems that the patch needs more work


            niu Niu Yawei (Inactive) added a comment -

            Seems the server code has to be changed. Anyway, I introduced a new DISP bit (DISP_OPEN_STRIPE) to identify an open that creates stripes; this way, the server/protocol changes are smaller than in the former patch (server returning the on-disk transno). Mike, could you take a look at the patch? Thanks

            niu Niu Yawei (Inactive) added a comment -

            Mike, I realized that not only an open that creates an object (with DISP_OPEN_CREATE) needs to be replayed; an open that creates stripe data needs to be replayed as well (see mdt_create_data()), and I don't see how to identify such an open on the client. Any good idea?

            tappro Mikhail Pershin added a comment -

            Niu, I am not so sure it will be easy to implement; this is just a possible way to go, but if it works, that would be good.

            niu Niu Yawei (Inactive) added a comment -

            Mike, your solution looks fine to me, I'll update the patch in this way soon. Thanks.

            tappro Mikhail Pershin added a comment -

            Niu, in fact we don't need to wait for commit in the case of a closed open (no create), and exactly that case causes this bug with unreclaimable space. And I don't see why server help is needed here - the client knows there was a close and knows this is a non-create open - that is enough to decide to drop the request from the replay queue. I am not sure though how easy it is to distinguish the non-create case from OPEN-CREATE; at first sight we need to check the disposition flag for the DISP_OPEN_CREATE bit. So a possible solution can be:
            1) after the open reply, check the disposition for the DISP_OPEN_CREATE bit and save that information in md_open_data, OR just take the disposition from the already saved mod_open_req during mdc_close()
            2) in mdc_close(), mod->mod_open_req->rq_replay is already set to 0; we also set mod_open_req->rq_commit_nowait or some other new flag for a non-create open.
            3) in ptlrpc_free_committed(), check that rq_commit_nowait flag and free such a request immediately, no matter what transno it has.

            Will that work? Am I missing something?

            bzzz Alex Zhuravlev added a comment -

            Yes, I also remember we discussed a way to implement the openhandle as an LDLM lock and let LDLM re-enqueue locks at recovery.

            niu Niu Yawei (Inactive) added a comment -

            Niu, exactly, and I propose to make that 'existing code' able to drop a closed open regardless of its transno, because it doesn't make sense after close. The current solution is still based on hacking the server side in various ways. In fact this can be solved at the client side, just by letting closed OPENs be dropped despite their transno.

            Mike, I think there is no way to achieve this without server-side changes. I can think of two ways so far:

            1. The server treats open/close as committed transactions and returns the client both the last committed transno & the last real transno (on-disk transno); the client drops committed open & close requests immediately after close. That's what I did in my patch.

            2. The server assigns no transno for open/close, and the client open-replay mechanism must be adapted to this change (like Siyao mentioned in the review comment: track the open handle in the fs layer, rebuild the request when replaying the open, and some other changes to the open, close, and open-lock code could be required).

            The second solution looks cleaner to me, but it requires more code changes, and it'll be a little tricky to handle open-create & open differently on the client side.

            People

              hongchao.zhang Hongchao Zhang
              louveta Alexandre Louvet (Inactive)
              Votes: 1
              Watchers: 31

              Dates

                Created:
                Updated:
                Resolved: