Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3319

Adapt to 3.10 upstream kernel proc_dir_entry change

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.6.0
    • Lustre 2.5.0
    • None
    • 8278

    Description

      In kernel 3.10 merge window, there is an important change in procfs subsystem that hides all details of proc_dir_entry structure.

      This affects Lustre in two ways. First is that the read_proc_t and write_proc_t members are now removed so Lustre has to be converted to use seq_file approach. The second one is that Lustre can no longer do child entry lookup inside lprocfs_srch.

      With the private proc_dir_entry structure and related exported APIs, the only possible/safe way to do child lookup is through path walking. Each pde has a atomic counter that is initialized to 1. And remove_proc_entry will decrease the counter to 0 and the free the pde. The counter is only increased by procfs when the corresponding proc dentry is instantiated during path walk. And it is decreased when the proc inode is evicted. There is no way an external module can hold a ref count on the dentry thus a pde can go away at any time if remove_proc_entry is called on it.

      Therefore, callers has to make sure that they don't use any removed pde directly. OTOH, Lustre saves its created pde and passes it around in many places. Later those pointers are used without any protection. Would it be a problem? The situation is true for pre-3.10 kernel as well because Lustre never holds references on proc_dir_entry.

      The existence of LPROCFS_ENTRY_AND_CHECK() also seems confusing. It indicates that a pde may be already invalidated when accessed by Lustre. However, all callers of LPROCFS_ENTRY_AND_CHECK() are in fact part of pde->proc_fops, which is called in side of proc_reg_file_ops where pde is protected by pde->in_use counter (pde_users for older kernels). When pde->in_use is positive, there is no way a pde can be removed. So IMOH LPROCFS_ENTRY_AND_CHECK() seems unnecessary.

      Also, it seems that _lprocfs_lock is to protect child pde insert/remove/search, whereas kernel already has proc_subdir_lock doing the same job. But lprocfs_srch is the only exception. If we can get rid of lprocfs_srch, I think we can get rid of _lprocfs_lock as well. Since 3.10 is still under development, we can make a patch to procfs exporting a search function and send it to upstream. In this way, lprocfs_srch can be removed.

      Also, removing LPROCFS_ENTRY_AND_CHECK/_lprocfs_lock or not, we cannot dereference a removed pde directly. Is there any existing mechanism to make sure it doesn't happen? Many proc_dir_entry are created by Lustre and passed around without protection. It seems dangerous to me but since it is working so far, maybe I am missing something?

      Attachments

        Issue Links

          Activity

            [LU-3319] Adapt to 3.10 upstream kernel proc_dir_entry change

            Patches landed to Master. New ticket to cleanup technical debt created LU-5275

            jlevi Jodi Levi (Inactive) added a comment - Patches landed to Master. New ticket to cleanup technical debt created LU-5275

            The last two server patches have landed. All that is left is to create a cleanup patch to remove the technical debt.

            simmonsja James A Simmons added a comment - The last two server patches have landed. All that is left is to create a cleanup patch to remove the technical debt.

            The issues in NRS TBF have been addressed. Only two patches left to inspect.

            http://review.whamcloud.com/#/c/7934 // ZFS
            http://review.whamcloud.com/#/c/8049 /// MDD/OFD which should have been fixed

            Once we are down to one patch the last cleanup patch will be pushed.

            simmonsja James A Simmons added a comment - The issues in NRS TBF have been addressed. Only two patches left to inspect. http://review.whamcloud.com/#/c/7934 // ZFS http://review.whamcloud.com/#/c/8049 /// MDD/OFD which should have been fixed Once we are down to one patch the last cleanup patch will be pushed.
            simmonsja James A Simmons added a comment - - edited

            Time for a status update. Several patches have been merged. Also many dependence patches have been merged as well. The patches left are:

            http://review.whamcloud.com/#/c/7934 // ZFS
            http://review.whamcloud.com/#/c/7936 // OSP
            http://review.whamcloud.com/#/c/8049 // MDD/OFD currently has a memory leak
            http://review.whamcloud.com/#/c/9384 // NRS TBF investigating how it works

            The first two patches above are ready for review.

            simmonsja James A Simmons added a comment - - edited Time for a status update. Several patches have been merged. Also many dependence patches have been merged as well. The patches left are: http://review.whamcloud.com/#/c/7934 // ZFS http://review.whamcloud.com/#/c/7936 // OSP http://review.whamcloud.com/#/c/8049 // MDD/OFD currently has a memory leak http://review.whamcloud.com/#/c/9384 // NRS TBF investigating how it works The first two patches above are ready for review.

            Yep Hammond reported that bug in LU-4953. We have a patch at http://review.whamcloud.com/#/c/10192.

            simmonsja James A Simmons added a comment - Yep Hammond reported that bug in LU-4953 . We have a patch at http://review.whamcloud.com/#/c/10192 .
            ys Yang Sheng added a comment -

            Hi, James, I found a big issue in http://review.whamcloud.com/9038. Unfortunately, We lost chance to catch it before landed. The issue is that, You bring obd_type->typ_procsym as shared entry between osc & osp, lov & lod. But you also use it in lmv & lov to transfer 'target_obds' entry, That is absolute wrong. Since target_obds is per obd, But typ_procsym is per obd_type. So the 'typ_procsym' will be reused while one obd_type has two more obd_devices(It is usual case). So, Please resotre the obd_proc_private for target_obds. Of course, you can use other name as you like.

            ys Yang Sheng added a comment - Hi, James, I found a big issue in http://review.whamcloud.com/9038 . Unfortunately, We lost chance to catch it before landed. The issue is that, You bring obd_type->typ_procsym as shared entry between osc & osp, lov & lod. But you also use it in lmv & lov to transfer 'target_obds' entry, That is absolute wrong. Since target_obds is per obd, But typ_procsym is per obd_type. So the 'typ_procsym' will be reused while one obd_type has two more obd_devices(It is usual case). So, Please resotre the obd_proc_private for target_obds. Of course, you can use other name as you like.

            Now that LU-3227 landed hopefully the test failures we have been seeing for some of our patches will go away. The recipe has changed a bit. I have included the last patch for LU-1330 into the mix to avoid collisions with the ofd proc changes. LU-1330 is a step forward to syncing us to the upstream client. The patches now needed for server support are:

            http://review.whamcloud.com/#/c/7934
            http://review.whamcloud.com/#/c/7936
            http://review.whamcloud.com/#/c/8010
            http://review.whamcloud.com/#/c/8036
            http://review.whamcloud.com/#/c/2677 - LU-1330
            http://review.whamcloud.com/#/c/8049
            http://review.whamcloud.com/#/c/9384

            simmonsja James A Simmons added a comment - Now that LU-3227 landed hopefully the test failures we have been seeing for some of our patches will go away. The recipe has changed a bit. I have included the last patch for LU-1330 into the mix to avoid collisions with the ofd proc changes. LU-1330 is a step forward to syncing us to the upstream client. The patches now needed for server support are: http://review.whamcloud.com/#/c/7934 http://review.whamcloud.com/#/c/7936 http://review.whamcloud.com/#/c/8010 http://review.whamcloud.com/#/c/8036 http://review.whamcloud.com/#/c/2677 - LU-1330 http://review.whamcloud.com/#/c/8049 http://review.whamcloud.com/#/c/9384

            Looking at the logs it is not the nid stats that are wrong but the reason for the failure of test for ofd/mdt port is that lprocfs_recovery_status_seq_show has a extra newline that is confusing the test. The patch http://review.whamcloud.com/#/c/10007 for LU-3227 fixes this issue.

            simmonsja James A Simmons added a comment - Looking at the logs it is not the nid stats that are wrong but the reason for the failure of test for ofd/mdt port is that lprocfs_recovery_status_seq_show has a extra newline that is confusing the test. The patch http://review.whamcloud.com/#/c/10007 for LU-3227 fixes this issue.

            More of this work has landed. No more patching is needed for client support. For server support you need the following patches:

            http://review.whamcloud.com/#/c/9384 - Currently broken. Needs to fixed and tested.
            http://review.whamcloud.com/#/c/7936
            http://review.whamcloud.com/#/c/8010

            http://review.whamcloud.com/#/c/9059 - LU-4563, following patches depend on it.
            http://review.whamcloud.com/#/c/7934
            http://review.whamcloud.com/#/c/8036
            http://review.whamcloud.com/#/c/8049

            The last two report nid stats wrong which causes some Maloo test to fail. Need to track that down still. Please inspect and test.

            simmonsja James A Simmons added a comment - More of this work has landed. No more patching is needed for client support. For server support you need the following patches: http://review.whamcloud.com/#/c/9384 - Currently broken. Needs to fixed and tested. http://review.whamcloud.com/#/c/7936 http://review.whamcloud.com/#/c/8010 http://review.whamcloud.com/#/c/9059 - LU-4563 , following patches depend on it. http://review.whamcloud.com/#/c/7934 http://review.whamcloud.com/#/c/8036 http://review.whamcloud.com/#/c/8049 The last two report nid stats wrong which causes some Maloo test to fail. Need to track that down still. Please inspect and test.

            Back to finishing up the 3.12 server side support. Currently for seq_file proc support you need:

            client side:

            http://review.whamcloud.com/#/c/9465
            http://review.whamcloud.com/#/c/9038

            server side:

            http://review.whamcloud.com/#/c/9384 # Note this needs to be reworked. Very broken.
            http://review.whamcloud.com/#/c/7934
            http://review.whamcloud.com/#/c/7935
            http://review.whamcloud.com/#/c/7936
            http://review.whamcloud.com/#/c/8010
            http://review.whamcloud.com/#/c/8036
            http://review.whamcloud.com/#/c/8049
            http://review.whamcloud.com/#/c/8232

            Lots of bug reductions recently so it is becoming more usable.

            simmonsja James A Simmons added a comment - Back to finishing up the 3.12 server side support. Currently for seq_file proc support you need: client side: http://review.whamcloud.com/#/c/9465 http://review.whamcloud.com/#/c/9038 server side: http://review.whamcloud.com/#/c/9384 # Note this needs to be reworked. Very broken. http://review.whamcloud.com/#/c/7934 http://review.whamcloud.com/#/c/7935 http://review.whamcloud.com/#/c/7936 http://review.whamcloud.com/#/c/8010 http://review.whamcloud.com/#/c/8036 http://review.whamcloud.com/#/c/8049 http://review.whamcloud.com/#/c/8232 Lots of bug reductions recently so it is becoming more usable.

            People

              laisiyao Lai Siyao
              bergwolf Peng Tao
              Votes:
              0 Vote for this issue
              Watchers:
              19 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: