Allow 100k open files on single client

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: Lustre 2.0.0, Lustre 2.1.0
    • None

    Description

      Allow 100k open files per client. Fix the client so that it does not store committed open RPCs in the resend list but instead reopens files from the file handles upon recovery (see Simplified Interop). This avoids a cost that grows with the number of saved RPCs each time a new RPC is added to the RPCs-for-recovery list on the client. Fix the MDS to store "mfd" in a hash table instead of a linked list, to avoid an O(n) search for an open file handle. For debugging, it would be useful to have a /proc entry on the MDS showing the open FIDs for each client export.
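
The hash-table idea in the description can be sketched roughly as follows. This is only an illustration, not the Lustre MDS code: `struct mfd_entry`, `struct mfd_table`, the `cookie` key, and every function name below are hypothetical, and the table size is an arbitrary assumption. It shows a chained hash table keyed by an open-handle cookie, so a lookup walks one short bucket chain rather than one list holding every open file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MFD_HASH_BITS 10                     /* assumed table size: 1024 buckets */
#define MFD_HASH_SIZE (1u << MFD_HASH_BITS)

/* Hypothetical stand-in for the per-open "mfd" record on the MDS. */
struct mfd_entry {
    uint64_t cookie;          /* open file handle cookie, used as the key */
    struct mfd_entry *next;   /* next entry in the same bucket */
};

struct mfd_table {
    struct mfd_entry *buckets[MFD_HASH_SIZE];
    size_t count;
};

/* Multiplicative hash: keep the top MFD_HASH_BITS bits of the product. */
static unsigned mfd_hash(uint64_t cookie)
{
    return (unsigned)((cookie * 0x9E3779B97F4A7C15ULL) >> (64 - MFD_HASH_BITS));
}

static void mfd_table_init(struct mfd_table *t)
{
    for (size_t i = 0; i < MFD_HASH_SIZE; i++)
        t->buckets[i] = NULL;
    t->count = 0;
}

/* Insert a new handle; the caller guarantees the cookie is unique. */
static int mfd_insert(struct mfd_table *t, uint64_t cookie)
{
    struct mfd_entry *e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    unsigned b = mfd_hash(cookie);
    e->cookie = cookie;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    t->count++;
    return 0;
}

/* Walk only the one bucket the cookie hashes to. */
static struct mfd_entry *mfd_lookup(const struct mfd_table *t, uint64_t cookie)
{
    for (struct mfd_entry *e = t->buckets[mfd_hash(cookie)]; e != NULL; e = e->next)
        if (e->cookie == cookie)
            return e;
    return NULL;
}

/* Returns 1 if the handle was found and removed, 0 otherwise. */
static int mfd_remove(struct mfd_table *t, uint64_t cookie)
{
    struct mfd_entry **pp = &t->buckets[mfd_hash(cookie)];
    for (; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->cookie == cookie) {
            struct mfd_entry *dead = *pp;
            *pp = dead->next;
            free(dead);
            assert(t->count > 0);
            t->count--;
            return 1;
        }
    }
    return 0;
}

static void mfd_table_free(struct mfd_table *t)
{
    for (size_t i = 0; i < MFD_HASH_SIZE; i++) {
        while (t->buckets[i] != NULL) {
            struct mfd_entry *dead = t->buckets[i];
            t->buckets[i] = dead->next;
            free(dead);
        }
    }
    t->count = 0;
}
```

With a fixed 1024 buckets and 100k handles, each chain still holds about 100 entries on average, so a real table would presumably be sized or resized to match the expected number of open files.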

          Activity

            [LU-18] Allow 100k open files on single client
            adilger Andreas Dilger made changes -
            Fix Version/s New: Lustre 2.11.0 [ 13091 ]
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]

            adilger Andreas Dilger added a comment - The referenced patches have landed, and this is likely fixed. There doesn't seem to be any value keeping it open longer.
            adilger Andreas Dilger made changes -
            Attachment New: open_100kfiles.patch [ 35404 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-5703 [ LU-5703 ]
            pjones Peter Jones made changes -
            Assignee Original: Niu Yawei [ niu ] New: WC Triage [ wc-triage ]

            adilger Andreas Dilger added a comment - Patch https://review.whamcloud.com/12885 "LU-5964 tests: open a large number of files at once" adds a test case for this, which has exposed some issues with memory usage when many files are open. That issue is being addressed by patch https://review.whamcloud.com/27208 "LU-9514 ptlrpc: free reply buffer for replay RPC".

            While that patch reduces the memory usage of saved open replay RPC buffers, it would be better to fix the open replay code as described here - to regenerate a new RPC to reopen files after the initial create has committed, rather than saving the open RPC indefinitely. Saving the RPCs wastes memory, and makes recovery more complex because the RPC format cannot be changed if the server is upgraded while it is offline.
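
The approach suggested in this comment might look something like the sketch below. It is a simplified model under stated assumptions, not the ptlrpc implementation: `struct open_record`, `release_committed()`, `recovery_action_for()`, and the field names are all hypothetical. The client keeps the original open request only until the server reports it committed, then frees the buffer and keeps just enough state (FID and flags) to build a fresh reopen request during recovery.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-open bookkeeping kept by the client. */
struct open_record {
    uint64_t fid;         /* identifies the file to reopen */
    uint32_t open_flags;  /* flags to reuse when reopening */
    uint64_t transno;     /* transaction number of the original open */
    void    *saved_rpc;   /* retained open request, NULL once released */
    size_t   saved_len;   /* size of the retained request in bytes */
};

enum recovery_action {
    REPLAY_SAVED_RPC,     /* open not yet committed: resend the original */
    REOPEN_FROM_HANDLE    /* open committed: build a fresh reopen request */
};

/* Record a new open and retain its request buffer until it commits. */
static int open_record_init(struct open_record *r, uint64_t fid,
                            uint32_t flags, uint64_t transno, size_t rpc_len)
{
    assert(r != NULL && rpc_len > 0);
    r->fid = fid;
    r->open_flags = flags;
    r->transno = transno;
    r->saved_rpc = malloc(rpc_len);
    r->saved_len = (r->saved_rpc != NULL) ? rpc_len : 0;
    return (r->saved_rpc != NULL) ? 0 : -1;
}

/*
 * Called when the server reports its last committed transaction number.
 * Frees the saved request of every open at or below that number and
 * returns the number of bytes released.
 */
static size_t release_committed(struct open_record *recs, size_t n,
                                uint64_t last_committed)
{
    size_t freed = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].saved_rpc != NULL && recs[i].transno <= last_committed) {
            free(recs[i].saved_rpc);
            recs[i].saved_rpc = NULL;
            freed += recs[i].saved_len;
            recs[i].saved_len = 0;
        }
    }
    return freed;
}

/* Decide how an open is restored after the server restarts. */
static enum recovery_action recovery_action_for(const struct open_record *r)
{
    return (r->saved_rpc != NULL) ? REPLAY_SAVED_RPC : REOPEN_FROM_HANDLE;
}
```

In this model the memory held per committed open drops to the fixed-size record, and a regenerated reopen request can use whatever request format the restarted server expects.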
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-5964 [ LU-5964 ]

            adilger Andreas Dilger added a comment - It looks like there is a copy of the patch at https://bugzilla.lustre.org/show_bug.cgi?id=24217

            niu Niu Yawei (Inactive) added a comment - Unfortunately, I can't find a local copy either (it may have been lost when I replaced my laptop).

            The most significant change in the patch (the client-side changes) has been merged into master along with the fix for LU-2613. The server-side change is about not reusing the open handle on the server during open replay.

            adilger Andreas Dilger added a comment - I was trying to find the patch for this ticket, but it seems it was in the old "lustre" project (not "fs/lustre-release" used today) at http://review.whamcloud.com/171/, which has since been removed.

            Niu, do you still have a copy of this patch that you could upload to fs/lustre-release?

            People

              wc-triage WC Triage
              niu Niu Yawei (Inactive)