
Allow 100k open files on single client

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.11.0
    • Lustre 2.0.0, Lustre 2.1.0
    • None

    Description

      Allow 100k open files per client. Fix the client to not store committed open RPCs in the resend list, but instead reopen files from the file handles upon recovery (see Simplified Interop), so that adding new RPCs to the RPCs-for-recovery list on the client does not get slower as the list grows. Fix the MDS to store "mfd" in a hash table instead of a linked list, so that searching for an open file handle does not require scanning every entry. For debugging it would be useful to have a /proc entry on the MDS showing the open FIDs for each client export.
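
      As an illustration of the hash-table idea above, the following is a minimal sketch of looking up an open-file record by its handle in a chained hash table rather than by walking one long list. All names here (open_file, open_table, open_hash, OPEN_HASH_BITS) are hypothetical stand-ins and are not the Lustre "mfd" structures or the cfs_hash API.

```c
#include <stddef.h>
#include <stdint.h>

#define OPEN_HASH_BITS 10
#define OPEN_HASH_SIZE (1u << OPEN_HASH_BITS)

/* Hypothetical stand-in for an open-file record; not the real "mfd". */
struct open_file {
    uint64_t cookie;            /* handle value used as the lookup key */
    struct open_file *next;     /* next entry in the same bucket */
};

struct open_table {
    struct open_file *buckets[OPEN_HASH_SIZE];
};

/* Multiplicative hash: keep the top OPEN_HASH_BITS bits of the product. */
static unsigned int open_hash(uint64_t cookie)
{
    return (unsigned int)((cookie * 0x9E3779B97F4A7C15ULL) >>
                          (64 - OPEN_HASH_BITS));
}

/* Insert at the head of the bucket; cost does not depend on table size. */
static void open_table_add(struct open_table *t, struct open_file *f)
{
    unsigned int b = open_hash(f->cookie);

    f->next = t->buckets[b];
    t->buckets[b] = f;
}

/* Only the entries in one bucket are scanned, not every open file. */
static struct open_file *open_table_find(const struct open_table *t,
                                         uint64_t cookie)
{
    struct open_file *f;

    for (f = t->buckets[open_hash(cookie)]; f != NULL; f = f->next)
        if (f->cookie == cookie)
            return f;
    return NULL;
}
```

      With 100k entries spread over 1024 buckets, a lookup in this sketch inspects roughly a hundred entries on average instead of up to 100k; a real implementation would size or resize the table to suit the expected load.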

          Activity

            [LU-18] Allow 100k open files on single client

            adilger Andreas Dilger added a comment:

            The referenced patches have landed, and this is likely fixed. There doesn't seem to be any value in keeping it open longer.

            adilger Andreas Dilger added a comment:

            Patch https://review.whamcloud.com/12885 "LU-5964 tests: open a large number of files at once" adds a test case for this, which has exposed some issues with memory usage when many files are open. That issue is being addressed by patch https://review.whamcloud.com/27208 "LU-9514 ptlrpc: free reply buffer for replay RPC".

            While that patch reduces the memory usage of saved open replay RPC buffers, it would be better to fix the open replay code as described here - to regenerate a new RPC to reopen files after the initial create has committed, rather than saving the open RPC indefinitely. Saving the RPCs wastes memory, and makes recovery more complex because the RPC format cannot be changed if the server is upgraded while it is offline.
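
            To make the "regenerate a new RPC" idea concrete, here is a rough sketch, under assumptions, of a client that keeps only a small per-file record and builds reopen requests at recovery time. The types file_id, open_state and reopen_req, the wire_version field, and build_reopen_reqs() are all hypothetical and do not correspond to actual ptlrpc or MDC structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FID-like identifier; not the real Lustre FID layout. */
struct file_id {
    uint64_t seq;
    uint32_t oid;
};

/* Small per-open-file state kept on the client instead of a saved RPC. */
struct open_state {
    struct file_id fid;
    uint32_t flags;             /* open flags to reapply on reopen */
};

/* A request built at recovery time, in whatever format is then current. */
struct reopen_req {
    struct file_id fid;
    uint32_t flags;
    uint32_t wire_version;      /* assumed: format agreed on at reconnect */
};

/*
 * Fill 'out' with one reopen request per open file, up to 'out_cap'.
 * Returns the number of requests written.
 */
static size_t build_reopen_reqs(const struct open_state *opens, size_t nopens,
                                uint32_t wire_version,
                                struct reopen_req *out, size_t out_cap)
{
    size_t n = nopens < out_cap ? nopens : out_cap;
    size_t i;

    for (i = 0; i < n; i++) {
        out[i].fid = opens[i].fid;
        out[i].flags = opens[i].flags;
        out[i].wire_version = wire_version;
    }
    return n;
}
```

            Because the request is only constructed during recovery, the format can follow whatever the upgraded server expects, and the memory held per open file is the size of the small record rather than a full request and reply buffer.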

            adilger Andreas Dilger added a comment:

            It looks like there is a copy of the patch at https://bugzilla.lustre.org/show_bug.cgi?id=24217

            niu Niu Yawei (Inactive) added a comment:

            Unfortunately, I can't find a local copy either (it may have been lost when I replaced my laptop).

            The most significant change in the patch (the client-side changes) has been merged into master along with the fix for LU-2613. The server-side change is about not reusing the open handle on the server when doing open replay.

            adilger Andreas Dilger added a comment:

            I was trying to find the patch for this ticket, but it seems it was in the old "lustre" project (not "fs/lustre-release" used today) at http://review.whamcloud.com/171/ which has since been removed.

            Niu, do you still have a copy of this patch that you could upload to fs/lustre-release?

            niu Niu Yawei (Inactive) added a comment:

            Updated the patch according to the reviewers' comments and submitted it for a second round of review.

            niu Niu Yawei (Inactive) added a comment:

            I have submitted the patch for review.

            niu Niu Yawei (Inactive) added a comment:

            I talked with Andreas and Ericm. To avoid conflicts with the simplified interop work, and to keep patch/feature management easy, I decided to use a separate list for committed opens on the client (as Andreas suggested) as a first stage.

            For the server-side mfd list, I found that in normal operation the mfd can always be found in the general handle hash table (class_handle_hash); the list is only scanned in the following two cases:

            • For a resent open (and setattr in SOM), the mfd is searched for in the list by matching xid.
            • For a replayed close (and setattr/done_writing in SOM), the mfd is searched for in the list by matching mfd_old_handle. (I don't quite understand this: why can't we just keep the old handle for the replayed open? Then the mfd_old_handle trick would go away.)

            So I suppose what we want is:

            • Store the mfd in a cfs_hash instead of the global handle hash table (indexed by handle), which requires modifying the general handle hash code to export a handle generator function.
            • Keep the old handle for the replayed open, so that the mfd_old_handle matching can be avoided.
            • Create another cfs_hash for the mfd, indexed by xid, so that list searching for a resent open can be avoided.

            I have exchanged these ideas with Andreas.
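
            The xid-indexed lookup and the "keep the old handle on replay" points in the plan above could be sketched roughly as follows. This is only an illustration under assumptions: mfd_sketch, xid_index, mfd_find_by_xid() and mfd_open() are hypothetical names, a plain chained table stands in for cfs_hash, and the handle generator is a simple counter that does not guard against colliding with a replayed handle.

```c
#include <stdint.h>
#include <stdlib.h>

#define XID_BUCKETS 256

/* Hypothetical stand-in for an mfd; not the real MDS structure. */
struct mfd_sketch {
    uint64_t xid;                   /* xid of the open request */
    uint64_t handle;                /* handle returned to the client */
    struct mfd_sketch *xid_next;    /* chain within one xid bucket */
};

static struct mfd_sketch *xid_index[XID_BUCKETS];
static uint64_t next_handle = 1;    /* assumed trivial handle generator */

/* Find the mfd created by the open with this xid, scanning one bucket. */
static struct mfd_sketch *mfd_find_by_xid(uint64_t xid)
{
    struct mfd_sketch *m;

    for (m = xid_index[xid % XID_BUCKETS]; m != NULL; m = m->xid_next)
        if (m->xid == xid)
            return m;
    return NULL;
}

/*
 * Handle an open request.  A resent open (same xid seen before) gets the
 * existing mfd back.  A replayed open passes its old handle in
 * 'replay_handle' (non-zero) and keeps it; a new open gets a fresh handle.
 */
static struct mfd_sketch *mfd_open(uint64_t xid, uint64_t replay_handle,
                                   int *was_resend)
{
    struct mfd_sketch *m = mfd_find_by_xid(xid);

    *was_resend = (m != NULL);
    if (m != NULL)
        return m;

    m = malloc(sizeof(*m));
    if (m == NULL)
        return NULL;

    m->xid = xid;
    m->handle = replay_handle != 0 ? replay_handle : next_handle++;
    m->xid_next = xid_index[xid % XID_BUCKETS];
    xid_index[xid % XID_BUCKETS] = m;
    return m;
}
```

            In this sketch, because a replayed open keeps the handle the client already knows, a later replayed close can be matched by that handle directly, which is the reason given above for dropping the mfd_old_handle matching.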

            People

              wc-triage WC Triage
              niu Niu Yawei (Inactive)