[LU-3319] Adapt to 3.10 upstream kernel proc_dir_entry change Created: 11/May/13  Updated: 23/Nov/17  Resolved: 01/Jul/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.6.0

Type: Improvement Priority: Minor
Reporter: Peng Tao Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File 0001-proc-add-a-child-pde-search-helper.patch     Text File 0002-staging-lustre-adapt-proc_dir_entry-change.patch     File proc.c     Text File remove_lprocfs_srch.patch    
Issue Links:
Blocker
is blocking LU-3665 obdfilter-survey test_3a: unmount stu... Resolved
Related
is related to LU-4513 sanity test_220: prealloc_last_id: Fo... Resolved
is related to LU-4483 sanity test 133f: Panic Resolved
is related to LU-4510 Oops (use after free) in osp_prealloc... Resolved
is related to LU-4678 obdfilter-survey repeated run causes ... Resolved
is related to LU-5764 Crash of MDS on "apparent buffer over... Resolved
is related to LU-4511 proc_dir_entry 'lustre/osc' already r... Resolved
is related to LU-4532 Test failure on test suite sanity, su... Resolved
is related to LU-5275 clean up technical debt for proc_dir_... Resolved
is related to LU-4416 support for 3.12 linux kernel Resolved
is related to LU-3675 Support for linux 3.10 kernel Resolved
Rank (Obsolete): 8278

 Description   

In the kernel 3.10 merge window, there is an important change in the procfs subsystem that hides all details of the proc_dir_entry structure.

This affects Lustre in two ways. First, the read_proc_t and write_proc_t members have been removed, so Lustre has to be converted to the seq_file approach. Second, Lustre can no longer do child entry lookup inside lprocfs_srch().

With the private proc_dir_entry structure and related exported APIs, the only possible/safe way to do child lookup is through path walking. Each pde has an atomic counter that is initialized to 1. remove_proc_entry() decreases the counter, and when it reaches 0 the pde is freed. The counter is only increased by procfs when the corresponding proc dentry is instantiated during a path walk, and it is decreased when the proc inode is evicted. There is no way for an external module to hold a reference on the dentry, so a pde can go away at any time if remove_proc_entry() is called on it.

Therefore, callers have to make sure they don't use a removed pde directly. OTOH, Lustre saves the pdes it creates and passes them around in many places, and later those pointers are used without any protection. Is that a problem? The same situation exists for pre-3.10 kernels, because Lustre never holds references on proc_dir_entry.

The existence of LPROCFS_ENTRY_AND_CHECK() also seems confusing. It suggests that a pde may already be invalidated when accessed by Lustre. However, all callers of LPROCFS_ENTRY_AND_CHECK() are in fact part of pde->proc_fops, which is called inside proc_reg_file_ops where the pde is protected by the pde->in_use counter (pde_users on older kernels). While pde->in_use is positive, there is no way a pde can be removed. So IMHO LPROCFS_ENTRY_AND_CHECK() seems unnecessary.

Also, it seems that _lprocfs_lock exists to protect child pde insert/remove/search, whereas the kernel already has proc_subdir_lock doing the same job; lprocfs_srch() is the only exception. If we can get rid of lprocfs_srch(), I think we can get rid of _lprocfs_lock as well. Since 3.10 is still under development, we can make a patch against procfs that exports a search function and send it upstream. That way, lprocfs_srch() can be removed.

Also, whether or not we remove LPROCFS_ENTRY_AND_CHECK()/_lprocfs_lock, we cannot dereference a removed pde directly. Is there any existing mechanism to make sure that doesn't happen? Many proc_dir_entry pointers are created by Lustre and passed around without protection. It seems dangerous to me, but since it has worked so far, maybe I am missing something?



 Comments   
Comment by Peter Jones [ 15/May/13 ]

Lai

Could you please look into this one?

Peter

Comment by Peng Tao [ 16/May/13 ]

The first patch is intended to be sent upstream for the 3.10 kernel to export the proc child entry search functionality. For older kernels, where struct proc_dir_entry is not private, Lustre can implement the same function itself.

The second patch is one applied to the staging tree to change the Lustre client there.

PS: the staging tree is the one below:
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git

Comment by Peng Tao [ 16/May/13 ]

Moving private discussion with Andreas here:

On Thu, May 16, 2013 at 1:06 AM, Andreas Dilger <adilger@dilger.ca> wrote:
>
> I have some concern about removing lprocfs_lock. I think the problem is not
> about proc consistency, but rather about a race between Lustre device
> cleanup and proc access by applications. This won't be hit under normal
> usage, but is a concern if there is a userspace monitoring tool keeping the
> proc files open across unmount.
>
Not sure if I fully get your idea, but I think when a user application opens the proc file, it takes a reference on the related proc dentry/inode, which in turn takes a reference on the pde. Therefore, when Lustre calls proc_remove() during umount, the corresponding pde will not be freed, because of the reference taken by the application. proc_remove() also makes sure all pde openers have finished, and it sets pde->in_use to BIAS (a large negative value) so that no future calls into module methods will be made. Userspace applications get -EIO when they try to read/write the proc fd.

The above is for the latest 3.10 kernel. For older kernels, proc_remove() sets pde->proc_fops to NULL, and all proc file ops like read/write/llseek check pde->proc_fops before calling into module methods. It is exactly what Lustre's LPROCFS_ENTRY_AND_CHECK() does, which is further proof that the macro is unnecessary.

As noted in the comments in libcfs/include/libcfs/params_tree.h, lprocfs_lock has not been necessary since the 2.6.23 kernel. The only reason we still need it, IMO, is lprocfs_srch(), which ignores the kernel's existing proc_subdir_lock that provides the same protection. proc_subdir_lock went in with the 2.6.16 kernel, so it is safe to use on all kernels that master supports.

Please see if above analysis makes sense.

Thanks, Tao

Comment by Peng Tao [ 17/May/13 ]

On checking again, proc_subdir_lock is not exported in all kernels. So for old kernels, lprocfs_lock still seems necessary for lprocfs_srch() to be safe. OTOH, for the 3.10 kernel, Lustre cannot access the members of struct proc_dir_entry, so we need to push patch 0001 upstream so that Lustre can still search for child proc entries.

As for a userspace monitoring tool keeping a proc file open across unmount, I have a small test program that verifies procfs properly rejects userspace access after unmount. The program will be attached.

Thoughts?

Thanks,
Tao

Comment by Peng Tao [ 17/May/13 ]

Running the program, we see that the second read() fails because procfs properly rejects calls into module methods after the proc entry is removed.

Comment by Lai Siyao [ 20/May/13 ]

I'm wondering whether we can remove lprocfs_srch() from the Lustre code by associating each Lustre proc entry with the proper data structure; then we can get the wanted proc entry directly without searching. I'll take a deeper look at all the usages of lprocfs_srch().

Comment by Peng Tao [ 20/May/13 ]

For the client only, it seems possible. The usages of lprocfs_srch() in the client source are as follows:
1. In lmv and lov, it is used to find "target_obds" from obd->obd_proc_entry. Looking at the server code, there are many other child pdes, so it is not practical to add them all to the obd_device structure. How about adding a "void *obd_proc_private" member to struct obd_device, and letting each user define its own PDE collection? For example, MDC needs a type-specific pde, lmv/lov need the "target_obds" pde, and ofd needs osd_dir and its child entries.

There are also

mdc_symlink = lprocfs_srch(lmv_proc_dir, mdc_obd->obd_name);

and

osc_symlink = lprocfs_srch(lov_proc_dir, osc_obd->obd_name);

which are followed by lprocfs_remove(). These can call remove_proc_subtree() instead, with a similar function implemented for older kernels.

2. The proc dir entry for the ldlm namespace. We can save the pde in struct ldlm_namespace so that the search-before-use logic can be removed.

3. The generic case of lprocfs_register/lprocfs_add_vars. lprocfs_srch() is used as a safeguard because proc_mkdir/proc_create_data don't fail on duplicate creation. We'll lose the safeguard and have to make sure that whenever lprocfs_register/lprocfs_add_vars are called, they do not create an existing proc dir/entry. Luckily, all lprocfs_register/lprocfs_add_vars callers are initializers, so it should be fine to just remove the safeguard.

Just from the client point of view, it seems OK to remove lprocfs_srch(). However, as I looked at the server code, it seems less OK, because the obd_proc_private structure needed by ofd/osd exactly duplicates the procfs hierarchy and also loses procfs' flexibility (we would have to modify data structures to introduce new child PDEs). But in theory we can get it to work anyway.

I can cook a client patch pretty soon to show how above works. Will update later.

Comment by Peng Tao [ 20/May/13 ]

remove_lprocfs_srch.patch shows the idea of removing lprocfs_srch(). It applies to the staging tree client code on top of the 0002 proc_dir_entry patch. Only roughly tested.

IMO, the approach should work for server as well. What do you think? Thanks.

Comment by Lai Siyao [ 21/May/13 ]

Generally this looks good to me, but shouldn't you put the patches in Gerrit for review so they can land in master?

Comment by Peng Tao [ 21/May/13 ]

Thanks for reviewing. I will. It is just that this is holding up the staging tree client from being built with the upstream kernel, so I will run more tests with the patches applied on the client and, if that passes, send them to Greg KH first. Then I will backport the patch to master and convert the server-side lprocfs callers as well.

Comment by Andreas Dilger [ 10/Jul/13 ]

Tao, can you please begin submitting the upstream patches to Lustre master. The patch in http://review.whamcloud.com/6588 would really benefit from the /proc seq_file change you submitted upstream.

Comment by Peng Tao [ 12/Jul/13 ]

Andreas, sorry for the delay. I will start submitting them after refreshing all the libcfs patches, which should be finished soon.

Comment by Emoly Liu [ 16/Jul/13 ]

Peng Tao, does this proc patch depend on those libcfs patches? As Andreas said, some of our patches would really benefit from the /proc seq_file change. So, if you don't mind, I'd like to help make this change.

Comment by Peng Tao [ 16/Jul/13 ]

liuying, it doesn't depend on any libcfs patches. There might be some conflicts, but I guess it is OK to resolve them later, since libcfs patches tend to conflict with everything else. The procfs change is now in Linus' mainline as commit 73bb1da692d0dc3e93b9c9e29084d6a5dcbc37a6. It would be great if you could help port the patch. Thanks a lot.

Comment by James A Simmons [ 23/Jul/13 ]

liuying, do you have a patch ready? I'm running into this issue getting a Lustre client going on 3.10.2.
Just asking to save us time by avoiding duplicated effort.

Comment by Emoly Liu [ 25/Jul/13 ]

James, yes, I am working on this patch.

Comment by James A Simmons [ 25/Jul/13 ]

The reason I ask is that while I was doing some testing for another project, I had time between tests to create this patch. I have a working patch already. How far along are you?

Comment by Emoly Liu [ 25/Jul/13 ]

Aha, then probably your patch is in better shape than mine. I'm still cooking it.
Could you please push yours to gerrit for review? Thanks.

Comment by James A Simmons [ 26/Jul/13 ]

I'm going to need to break the patch into pieces. Currently it is several thousand lines of changes.

Comment by Emoly Liu [ 29/Jul/13 ]

I see. Thanks!

Comment by Emoly Liu [ 29/Jul/13 ]

This port includes the following sub-patches:

http://review.whamcloud.com/#/c/7084
http://review.whamcloud.com/#/c/7135
http://review.whamcloud.com/#/c/7139
http://review.whamcloud.com/#/c/7140

Comment by Bob Glossman (Inactive) [ 29/Jul/13 ]

I see fc19 just jumped from 3.9 to 3.10 kernels. These changes are now required for fc19 builds.

Comment by James A Simmons [ 31/Jul/13 ]

I opened ticket LU-3675 for 3.10 kernel support. This ticket and LU-2800 will need to be linked to it. We need one patch to handle an API change in 3.10 that impacts llite_loop.c, under the LU-2800 ticket. I just pushed a new http://review.whamcloud.com/#/c/7084 to be the base for the proc seq handling. I have all the patches for a 3.10 client as well. The one question I have, Peng, is that I noticed you sometimes use the void *data field and sometimes struct seq_file's private field. Which should be used?

Comment by Peng Tao [ 01/Aug/13 ]

James, I thought I had made lprocfs always use the seq_file private field when seq_file is used, but I might have missed some. Could you please point out which function(s) you are referring to?

Comment by James A Simmons [ 01/Aug/13 ]

In the current Linux tree, a lot of lprocfs_rd_* functions in lproc_status.c use the data field. I believe that lprocfs_rd_[uint/u64/atomic] do need to use the data field. I will update
http://review.whamcloud.com/#/c/7135/ to reflect this.

Comment by Peng Tao [ 02/Aug/13 ]

The lprocfs_rd_[type] functions are called in macros like LPROC_SEQ_FOPS_RO_TYPE. The void *data parameter passed in is the seq_file's private field. Please see lprocfs_status.h for details.

Comment by James A Simmons [ 02/Aug/13 ]

So in that case it shouldn't really matter. Can you inspect my patch, now at http://review.whamcloud.com/#/c/7135? I'd like to get this right so I can push the other patches soon after.

Comment by James A Simmons [ 06/Aug/13 ]

Andreas, Peng: I have a new patch at http://review.whamcloud.com/#/c/7135 that will not collide with HSM. I did as you asked, Peng, in that this patch introduces two parallel proc handling methods to allow a gradual transition.

Comment by Bob Glossman (Inactive) [ 12/Aug/13 ]

If I'm not mistaken the list of needed patches is now much longer.

http://review.whamcloud.com/#/c/7135
http://review.whamcloud.com/#/c/7139
http://review.whamcloud.com/#/c/7140
http://review.whamcloud.com/#/c/7290
http://review.whamcloud.com/#/c/7291
http://review.whamcloud.com/#/c/7292
http://review.whamcloud.com/#/c/7293
http://review.whamcloud.com/#/c/7294
http://review.whamcloud.com/#/c/7295
http://review.whamcloud.com/#/c/7296

Comment by James A Simmons [ 13/Aug/13 ]

Yep, you need all those patches for 3.10 client support. More will come soon for ZFS support.

Comment by Bob Glossman (Inactive) [ 13/Aug/13 ]

James,
As you said, I've had fair success doing client builds on 3.10 with the listed patches, but not so much with zfs-only server builds. The big problem seems to be with the kernel fs API PDE(); I suspect the define for this moved to a different Linux #include in 3.10.

Comment by James A Simmons [ 13/Aug/13 ]

Can you give me until the end of the week? I might have something for you to try with ZFS.

Comment by James A Simmons [ 13/Aug/13 ]

As a side note you need to get the master branch for spl and zfs since 0.6.1 also has issues with the proc changes.

Comment by Bob Glossman (Inactive) [ 13/Aug/13 ]

That's not a problem. I typically pluck the latest master spl/zfs for my test builds instead of sticking to 0.6.* like our framework builds.

Comment by Bob Glossman (Inactive) [ 21/Aug/13 ]

The current set of working patches I'm trying look like:

http://review.whamcloud.com/#/c/5380
http://review.whamcloud.com/#/c/7135
http://review.whamcloud.com/#/c/7139
http://review.whamcloud.com/#/c/7140
http://review.whamcloud.com/#/c/7290
http://review.whamcloud.com/#/c/7291
http://review.whamcloud.com/#/c/7292
http://review.whamcloud.com/#/c/7293
http://review.whamcloud.com/#/c/7294
http://review.whamcloud.com/#/c/7296
http://review.whamcloud.com/#/c/7295

Note the change in order at the end of the list. This is due to the dependency layout.
Some of these need rebase to fix conflicts with recent landings.

At the moment I can still only do client builds, not server builds.

When using these patches I see a suspicious looking error log at mount time:

LustreError: 20333:0:(obd_config.c:1443:class_process_proc_seq_param()) writing proc entry import err -30

I mention it here because it looks related to proc code, where all the changes are.

Comment by James A Simmons [ 22/Aug/13 ]

Ah, I see what is causing this. Error -30 is -EROFS, which means something is attempting to write a read-only variable. Do you have more detailed logs showing which proc entry this happens with?

Comment by Bob Glossman (Inactive) [ 22/Aug/13 ]

All I know right now is that it happens during a mount command. I suspect it's something invoked during module init or from mount.lustre. Maybe I can gather more context from the debug log or an strace of the mount command. I will see what I can find.

Comment by James A Simmons [ 22/Aug/13 ]

Debugged the problem: it looks like something is using the mgc layer to write to the import proc entry.

Comment by Bob Glossman (Inactive) [ 22/Aug/13 ]

James, excellent. You just saved me a bunch of effort. Thank you.

Comment by James A Simmons [ 22/Aug/13 ]

Yep, the import proc entry is writable in some cases. I missed it because lprocfs_rd_import is in obdclass.c while lprocfs_wr_import is in lproc_ptlrpc.c. A patch will come shortly.

Comment by James A Simmons [ 22/Aug/13 ]

Patch http://review.whamcloud.com/#/c/7295/ has been updated with the fix.

Comment by Bob Glossman (Inactive) [ 22/Aug/13 ]

Verified that the error log message is gone using the new version of #7295.

Comment by Bob Glossman (Inactive) [ 03/Sep/13 ]

The recent refresh of the patches is not working so well on fc19 (3.10.10). I now see errors even during a client-only build. Examples:

CC [M] /home/bogl/lustre-release/lustre/lmv/lmv_obd.o
/home/bogl/lustre-release/lustre/lmv/lmv_obd.c: In function ‘lmv_disconnect_mdc’:
/home/bogl/lustre-release/lustre/lmv/lmv_obd.c:652:9: error: implicit declaration of function ‘lprocfs_srch’ [-Werror=implicit-function-declaration]
lmv_proc_dir = lprocfs_srch(obd->obd_proc_entry, "target_obds");
^
/home/bogl/lustre-release/lustre/lmv/lmv_obd.c:652:22: error: assignment makes pointer from integer without a cast [-Werror]
lmv_proc_dir = lprocfs_srch(obd->obd_proc_entry, "target_obds");
^
/home/bogl/lustre-release/lustre/lmv/lmv_obd.c:656:29: error: assignment makes pointer from integer without a cast [-Werror]
mdc_symlink = lprocfs_srch(lmv_proc_dir, mdc_obd->obd_name);
^
cc1: all warnings being treated as errors
make[6]: *** [/home/bogl/lustre-release/lustre/lmv/lmv_obd.o] Error 1
make[5]: *** [/home/bogl/lustre-release/lustre/lmv] Error 2
make[4]: *** [/home/bogl/lustre-release/lustre] Error 2
make[3]: *** [_module_/home/bogl/lustre-release] Error 2
make[3]: Leaving directory `/home/bogl/rpmbuild/BUILD/kernel-3.10.10.200.l0903'
make[2]: *** [modules] Error 2
make[2]: Leaving directory `/home/bogl/lustre-release'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/bogl/lustre-release'
make: *** [all] Error 2

Looks like some functions that are now only #define'd conditionally are used on 3.10 too.

Comment by James A Simmons [ 03/Sep/13 ]

Oops, a chunk got rejected that I missed. Sorry about that. The new patch should address this.

Comment by Bob Glossman (Inactive) [ 03/Sep/13 ]

Verified the additional refresh of #7291 gets client only builds working again.

Comment by Bob Glossman (Inactive) [ 03/Sep/13 ]

Current working patch set is:

http://review.whamcloud.com/#/c/7135
http://review.whamcloud.com/#/c/5380
http://review.whamcloud.com/#/c/7139
http://review.whamcloud.com/#/c/7140
http://review.whamcloud.com/#/c/7290
http://review.whamcloud.com/#/c/7291
http://review.whamcloud.com/#/c/7292
http://review.whamcloud.com/#/c/7293
http://review.whamcloud.com/#/c/7294
http://review.whamcloud.com/#/c/7296
http://review.whamcloud.com/#/c/7295

Still can only do client builds. Server builds don't work.

Comment by Yang Sheng [ 05/Sep/13 ]

Hi James, do you have plans for the server work? If not, I'll start working on it.

Comment by James A Simmons [ 06/Sep/13 ]

Yes, and I just finished my first pass at ZFS server support. Some parts are still full of bugs, so I will start by pushing what I know works.

Comment by James A Simmons [ 11/Oct/13 ]

I pushed the first set of working patches for server support with seq_files. Not everything has been ported yet, as those parts still have bugs. The following appear to work:

http://review.whamcloud.com/#/c/7928
http://review.whamcloud.com/#/c/7929
http://review.whamcloud.com/#/c/7931

http://review.whamcloud.com/#/c/7933
http://review.whamcloud.com/#/c/7934
http://review.whamcloud.com/#/c/7935
http://review.whamcloud.com/#/c/7936

Comment by Bob Glossman (Inactive) [ 11/Oct/13 ]

Not able to build for servers with the added patch set (above). I'm getting errors related to the use of lprocfs_static_vars, which is only defined if HAVE_ONLY_PROCFS_SEQ isn't:

  CC [M]  /home/bogl/lustre-release/lustre/lod/lod_dev.o
In file included from /home/bogl/lustre-release/lustre/lod/lod_dev.c:49:0:
/home/bogl/lustre-release/lustre/lod/lod_internal.h:320:35: error: ‘struct lprocfs_static_vars’ declared inside parameter list [-Werror]
/home/bogl/lustre-release/lustre/lod/lod_internal.h:320:35: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
/home/bogl/lustre-release/lustre/lod/lod_dev.c: In function ‘lod_process_config’:
/home/bogl/lustre-release/lustre/lod/lod_dev.c:395:10: error: variable ‘v’ has initializer but incomplete type
/home/bogl/lustre-release/lustre/lod/lod_dev.c:395:10: error: excess elements in struct initializer [-Werror]
/home/bogl/lustre-release/lustre/lod/lod_dev.c:395:10: error: (near initialization for ‘v’) [-Werror]
/home/bogl/lustre-release/lustre/lod/lod_dev.c:395:30: error: storage size of ‘v’ isn’t known
   .
   .
   .

Is there maybe a refresh of change #7135 that fixes this which hasn't been pushed to Gerrit?

Comment by James A Simmons [ 14/Oct/13 ]

lod is one of the subsystems missing. Currently I'm working out a few bugs.

Comment by James A Simmons [ 23/Oct/13 ]

I finished up ZFS server support with the Linux 3.11 kernel. You will need the client patches, which are:

http://review.whamcloud.com/#/c/7135
http://review.whamcloud.com/#/c/7139
http://review.whamcloud.com/#/c/7140
http://review.whamcloud.com/#/c/7290
http://review.whamcloud.com/#/c/7291
http://review.whamcloud.com/#/c/7292
http://review.whamcloud.com/#/c/7293
http://review.whamcloud.com/#/c/7294
http://review.whamcloud.com/#/c/7296
http://review.whamcloud.com/#/c/7295
http://review.whamcloud.com/#/c/5380

Server side patches for ZFS support needed are:

http://review.whamcloud.com/#/c/7928
http://review.whamcloud.com/#/c/7929
http://review.whamcloud.com/#/c/7931

http://review.whamcloud.com/#/c/7933
#depends 7933
http://review.whamcloud.com/#/c/7934
http://review.whamcloud.com/#/c/7935
http://review.whamcloud.com/#/c/7936
http://review.whamcloud.com/#/c/8010
http://review.whamcloud.com/#/c/8029
http://review.whamcloud.com/#/c/8036
http://review.whamcloud.com/#/c/8049

Of course you need the patches for kernel 3.11 support, which are listed under LU-3974.

I have started to work on ldiskfs, but that will require a lot more work, which we can cover in LU-3373.

Now that ZFS support is ready, give it a try.

Comment by Bob Glossman (Inactive) [ 05/Nov/13 ]

Recent refreshes of the client patches seem to work fine. We need refreshes of all the server patches to fix up the calls to class_register_type() too.

Comment by James A Simmons [ 06/Nov/13 ]

Working through some new bugs. Oleg's testing of the patch found a problem when freeing the proc entries registered with class_register_type(). If you examine class_unregister_type(), we currently call lprocfs_try_remove_proc_entry() to free the proc tree, but that function walks the proc directory tree structure, which can't be done on newer kernels. I originally updated the code to use lprocfs_remove(), which removes all the subdirectories; it was updated to work on current RHEL6 kernels and the upstream kernels. This worked fine for the client but not for the
server, since on the server we have proc trees that are symlinked to each other (lod to lov, for example). We have to find the proper proc tree to delete by name, not by proc_dir_entry handle, but you can't do that with lprocfs_remove(), so we use remove_proc_subtree() instead. BTW, this is broken upstream as well, but was never encountered because the upstream code is client-only and clients have no top-level symlinks. That could change in the future.

The latest patch using remove_proc_subtree() appears to resolve this issue, but now new bugs show up because the older code handles the locking of the proc tree itself while the newer code depends on newer kernel code to handle the locking. We have to make sure they work side by side. I'm working through those issues now.

Lastly, I changed class_register_type() again. All the instances I encountered only used num_refs, which I removed as Andreas suggested, simplifying the function. But osd-ldiskfs passes in more than just num_refs, so I had to restore it.

One other bug I noticed earlier on the server side was with evict_client. This one is strange in that it only fails a fraction of the time, which makes no sense, and only with the ofd proc evict_client entry.

I hope to have this resolved in the next few days but it will take time to do all the testing for so many targets.

Comment by Andreas Dilger [ 07/Nov/13 ]

Personally, I would prefer if each structure kept a reference to its corresponding entries in /proc, and then only that entry would be deleted at the end. I don't like the idea of scanning /proc for entries that might need to be deleted. That is something we inherited from the original lprocfs contributor and I'd rather remove it if this is the time to do so.

How is this done in other parts of the kernel?

Comment by James A Simmons [ 11/Nov/13 ]

The new code removes the scanning, since you can't traverse the tree the way lprocfs_srch() does on newer kernels. In fact, I do keep a reference in struct obd_device (obd_proc_private). New kernels remove an entire proc subtree with remove_proc_subtree(). You can also still use remove_proc_entry(), which deletes a proc entry by name as long as you know the parent.

Comment by James A Simmons [ 11/Nov/13 ]

I just refreshed all the patches. My testing over the weekend shows promise with this set. Only the ofd needs more work now. I also have some early code for the ldiskfs layer, but I ran into some problems during testing. The patches known to work will be posted very soon.

Comment by James A Simmons [ 11/Nov/13 ]

The patches needed to port both ZFS and ldiskfs proc handling over to seq_file are:

http://review.whamcloud.com/#/c/7135
http://review.whamcloud.com/#/c/7139
http://review.whamcloud.com/#/c/7140
http://review.whamcloud.com/#/c/7290
http://review.whamcloud.com/#/c/7291
http://review.whamcloud.com/#/c/7292
http://review.whamcloud.com/#/c/7293
http://review.whamcloud.com/#/c/7294
http://review.whamcloud.com/#/c/7296
http://review.whamcloud.com/#/c/7295

Server side patches for ZFS support needed are:

http://review.whamcloud.com/#/c/7928
http://review.whamcloud.com/#/c/7929
http://review.whamcloud.com/#/c/7931

http://review.whamcloud.com/#/c/7933
#depends 7933
http://review.whamcloud.com/#/c/7934
http://review.whamcloud.com/#/c/7935
http://review.whamcloud.com/#/c/7936
http://review.whamcloud.com/#/c/8010
http://review.whamcloud.com/#/c/8029
http://review.whamcloud.com/#/c/8036
http://review.whamcloud.com/#/c/8232
http://review.whamcloud.com/#/c/8049

Additional patches from LU-3373, LU-3974, and patch http://review.whamcloud.com/#/c/5380 are needed for kernel 3.11 server support for ldiskfs/zfs. Only the iterator code for ldiskfs needs to be worked out.

Comment by James A Simmons [ 18/Nov/13 ]

The ldiskfs code has been worked out, and I have been testing a file system on our Cray test bed. So far the only issue I ran into is in osp_process_config, which shows up with DEBUG_PAGEALLOC enabled in the kernel. This happened when the file system attempted a recovery. This means

http://review.whamcloud.com/#/c/8029

needs more work.

Comment by James A Simmons [ 20/Nov/13 ]

Patch http://review.whamcloud.com/#/c/7135 passed its Maloo testing. Please inspect it so we can land it soon. Thank you.

Comment by Bob Glossman (Inactive) [ 20/Nov/13 ]

I don't want to +review #7135 until I can run it some more, but I can't run it because other patches need a refresh, in particular #7741.

Comment by James A Simmons [ 20/Nov/13 ]

Several patches need refreshing. The patches are getting stale, but to handle that I would have to refresh 7135. Can I send you the updated patches to test with? I have them updated locally against current master.

Comment by Bob Glossman (Inactive) [ 20/Nov/13 ]

Would prefer to see them pushed to Gerrit so I can see them all consistent, but I will take them however I can get them for test. Thanks, James.

Comment by James A Simmons [ 20/Nov/13 ]

Once 7135 lands I will need to sync up several patches. It's just that right now it's Maloo roulette when you submit a patch.

Comment by Bob Glossman (Inactive) [ 04/Dec/13 ]

I strongly suspect the patch #7135 recently landed to master has broken the build on CentOS/RHEL 6.5; see the comment in LU-4287.

Comment by James A Simmons [ 04/Dec/13 ]

I see the issue. It appears some parts of the upstream procfs changes were backported. I'm looking at a way to fix this and will post a patch under ticket LU-4287.

Comment by James A Simmons [ 09/Dec/13 ]

The following patches are ready for review and possible landing.

http://review.whamcloud.com/#/c/7139
http://review.whamcloud.com/#/c/7291
http://review.whamcloud.com/#/c/7292
http://review.whamcloud.com/#/c/7931

Comment by James A Simmons [ 11/Dec/13 ]

Several patches have landed. The following patches for client support are ready for review for feedback.

http://review.whamcloud.com/#/c/7140
http://review.whamcloud.com/#/c/7290
http://review.whamcloud.com/#/c/7291
http://review.whamcloud.com/#/c/7293
http://review.whamcloud.com/#/c/7296
http://review.whamcloud.com/#/c/7295
http://review.whamcloud.com/#/c/7928

Comment by James A Simmons [ 18/Dec/13 ]

Client support is down to a few patches now. The patches needed are

http://review.whamcloud.com/#/c/7290
http://review.whamcloud.com/#/c/7291
http://review.whamcloud.com/#/c/7293

Please review them. Other patches also ready for inspection are

http://review.whamcloud.com/#/c/7928
http://review.whamcloud.com/#/c/7933

Once 7933 is merged, we can also complete the LU-1330 work.

Comment by James A Simmons [ 02/Jan/14 ]

Time for a status update. For client support we are down to one patch, http://review.whamcloud.com/#/c/7290. Unfortunately a new problem is showing up which I can't explain:
test conf-sanity 28 fails 4% of the time with the following error.

LustreError: 18595:0:(lproc_llite.c:367:ll_max_read_ahead_whole_mb_seq_write()) can't set max_read_ahead_whole_mb more than max_read_ahead_per_file_mb: 0
LustreError: 18595:0:(obd_config.c:1443:class_process_proc_seq_param()) writing proc entry max_read_ahead_whole_mb err -34

Looking at the code, this means the variable ra_max_pages is being set to zero. Looking at llite_lib.c,
we see that

pages = si.totalram - si.totalhigh;
...
sbi->ll_ra_info.ra_max_pages_per_file = min(pages / 32,
SBI_DEFAULT_READAHEAD_MAX);

so for some reason si_meminfo() is reporting no low memory (totalram equal to totalhigh).

How should we handle this case?

As for server support, I have new working patches coming once http://review.whamcloud.com/#/c/7933 finishes Maloo successfully. Only the ofd layer still needs work, as far as I know.

Comment by Bob Glossman (Inactive) [ 02/Jan/14 ]

I think the Jira ticket LU-4425 is related to the failure(s) you mention.

Comment by James A Simmons [ 06/Jan/14 ]

The following server side patches are ready for inspection and possible landing.

http://review.whamcloud.com/#/c/7933
http://review.whamcloud.com/#/c/7934
http://review.whamcloud.com/#/c/7935
http://review.whamcloud.com/#/c/8029

Comment by James A Simmons [ 13/Jan/14 ]

We have two patches ready for inspection and landing that add support on the client side.

http://review.whamcloud.com/#/c/7290
http://review.whamcloud.com/#/c/8816

Also a few server side patches are ready as well.

http://review.whamcloud.com/#/c/7933
http://review.whamcloud.com/#/c/7934
http://review.whamcloud.com/#/c/7935
http://review.whamcloud.com/#/c/8029

Comment by Bob Glossman (Inactive) [ 13/Jan/14 ]

James,
I see you were right in there with http://review.whamcloud.com/#/c/8816 as soon as new lustre/nodemap code landed. Very nice. Didn't experience any gap in support for 3.10 clients.

Comment by James A Simmons [ 21/Jan/14 ]

Good news is that all the client side patches have landed. We have 3.10 kernel support for Lustre clients. I just pushed one client side patch to deal with the issues people pointed out during the review process that were not show stoppers but needed to be addressed. That patch is at

http://review.whamcloud.com/#/c/8943

One patch that was merged had a bug that broke master; it has since been reverted. I'm working on a solution for that. In the meantime I have updated the server patches that don't have symlinks to deal with, since those were what broke things.

Comment by James A Simmons [ 29/Jan/14 ]

While working out the bug that required a revert, I discovered a very ancient bug in the procfs code. It appears that back in the Lustre 1.5 days a patch was merged that created symlinks in the lov/target_obds directory pointing to obds in the osc layer. The target_obds directory was never created, so the symlinks have never existed. I created a patch to fix this issue, and also found a solution to the race condition for the 8029 patch. The patch is at

http://review.whamcloud.com/#/c/9038

I'm also working to update the server code as well. It will be ready in the near future.

Comment by James A Simmons [ 02/Feb/14 ]

Client side cleanup and bug fix patches are ready for review.

http://review.whamcloud.com/#/c/8943
http://review.whamcloud.com/#/c/9038

Server side patches that passed maloo are:

http://review.whamcloud.com/#/c/7935
http://review.whamcloud.com/#/c/7934

The rest of the server side patches are:

http://review.whamcloud.com/#/c/7936
http://review.whamcloud.com/#/c/8010
http://review.whamcloud.com/#/c/8036
http://review.whamcloud.com/#/c/8049
http://review.whamcloud.com/#/c/8232

Comment by Alexey Shvetsov [ 12/Feb/14 ]

Is there a current (up to date) list of patches needed to run on 3.10 - 3.11 kernels?

Comment by James A Simmons [ 13/Feb/14 ]

Depends on what you want to do. Current master works out of the box for 3.10 kernels as long as you don't enable CONFIG_NAMESPACE. There is a patch to support user namespaces under LU-4476. For 3.11 client support you need patches from LU-3974. Server support is more complex. Are you on the hpdd-discuss mailing list? I think it would be better to discuss it there. Also, due to the lfsck work the mdd/ofd layer is currently in constant flux, which makes the patches go stale fast.

Comment by Alexey Shvetsov [ 13/Feb/14 ]

Yep, I'm subscribed to the list. I actually got a server with a ZFS backstore running on 3.11. There were a few conflicts during the rebase.

Comment by Bob Glossman (Inactive) [ 26/Feb/14 ]

There's a new problem with 3.10 support due to recently landed code in master. lfsck code now calls rb_init_node(). This kernel API exists in 2.6 and 3.0 kernels, but not in 3.10 and later. Don't know when exactly it disappeared, but it is certainly gone from recent versions. I plan to push an autoconf mod to adapt to the lack of this routine in recent kernel versions.

Comment by James A Simmons [ 26/Feb/14 ]

Yep it was replaced with RB_CLEAR_NODE. Time to make a wrapper for RB_CLEAR_NODE.

Comment by Bob Glossman (Inactive) [ 26/Feb/14 ]

Yes, after consideration it seems to me better to replace our one use of rb_init_node() with RB_CLEAR_NODE(). As far as I can tell RB_CLEAR_NODE() does exist in all kernel versions we support and may not even need a wrapper at all. Shouldn't have to fool around with autoconf. Simpler is better.

Comment by James A Simmons [ 26/Feb/14 ]

Agree. Since lrn_node is allocated with OBD_ALLOC it will be zeroed out so we don't have to worry about rb_left or rb_right not being NULL. The RHEL6 version of rb_init_node sets those fields to NULL.

Comment by Bob Glossman (Inactive) [ 26/Feb/14 ]

Yes, agree. Using RB_CLEAR_NODE() immediately after OBD_ALLOC is especially safe since OBD_ALLOC zeros everything, as you say. I surveyed a number of rb_init_node() call sites in RHEL6 and what they became in later kernels; the vast majority of uses immediately after allocation did change to RB_CLEAR_NODE().

Comment by Bob Glossman (Inactive) [ 03/Mar/14 ]

I think some of the server patches need refresh. In particular http://review.whamcloud.com/#/c/8010. When I try to do a server build on 3.10 with "#define HAVE_ONLY_PROCFS_SEQ 1" I get errors like:

  CC [M]  /home/bogl/lustre-release/lustre/lod/lod_dev.o
/home/bogl/lustre-release/lustre/lod/lod_dev.c: In function ‘lod_mod_init’:
/home/bogl/lustre-release/lustre/lod/lod_dev.c:916:2: error: implicit declaration of function ‘lprocfs_register’ [-Werror=implicit-function-declaration]
  type->typ_procsym = lprocfs_register("lov", proc_lustre_root,
  ^
/home/bogl/lustre-release/lustre/lod/lod_dev.c:916:20: error: assignment makes pointer from integer without a cast [-Werror]
  type->typ_procsym = lprocfs_register("lov", proc_lustre_root,
                    ^
cc1: all warnings being treated as errors
make[6]: *** [/home/bogl/lustre-release/lustre/lod/lod_dev.o] Error 1
make[5]: *** [/home/bogl/lustre-release/lustre/lod] Error 2
make[4]: *** [/home/bogl/lustre-release/lustre] Error 2
make[3]: *** [_module_/home/bogl/lustre-release] Error 2
make[3]: Leaving directory `/usr/src/kernels/3.10.0-89.el7.x86_64'
make[2]: *** [modules] Error 2
make[2]: Leaving directory `/home/bogl/lustre-release'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/bogl/lustre-release'
make: *** [all] Error 2

I think there are some new lprocfs_register() calls that need to be #ifdef'ed out for the HAVE_ONLY_PROCFS_SEQ case.

Comment by James A Simmons [ 03/Mar/14 ]

I just pushed the base patch 9038, so I can now update the patches again. All the patches now depend on 9038.

Comment by Bob Glossman (Inactive) [ 03/Mar/14 ]

I see a brand new patch just added to the recipe: http://review.whamcloud.com/#/c/9465

Comment by James A Simmons [ 27/Mar/14 ]

Back to finishing up the 3.12 server side support. Currently for seq_file proc support you need:

client side:

http://review.whamcloud.com/#/c/9465
http://review.whamcloud.com/#/c/9038

server side:

http://review.whamcloud.com/#/c/9384 # Note this needs to be reworked. Very broken.
http://review.whamcloud.com/#/c/7934
http://review.whamcloud.com/#/c/7935
http://review.whamcloud.com/#/c/7936
http://review.whamcloud.com/#/c/8010
http://review.whamcloud.com/#/c/8036
http://review.whamcloud.com/#/c/8049
http://review.whamcloud.com/#/c/8232

Lots of bugs have been fixed recently, so it is becoming more usable.

Comment by James A Simmons [ 23/Apr/14 ]

More of this work has landed. No more patching is needed for client support. For server support you need the following patches:

http://review.whamcloud.com/#/c/9384 - Currently broken. Needs to be fixed and tested.
http://review.whamcloud.com/#/c/7936
http://review.whamcloud.com/#/c/8010

http://review.whamcloud.com/#/c/9059 - LU-4563; the following patches depend on it.
http://review.whamcloud.com/#/c/7934
http://review.whamcloud.com/#/c/8036
http://review.whamcloud.com/#/c/8049

The last two report nid stats incorrectly, which causes some Maloo tests to fail. I still need to track that down. Please inspect and test.

Comment by James A Simmons [ 29/Apr/14 ]

Looking at the logs, it is not the nid stats that are wrong; the ofd/mdt port tests fail because lprocfs_recovery_status_seq_show emits an extra newline that confuses the test. The patch http://review.whamcloud.com/#/c/10007 for LU-3227 fixes this issue.

Comment by James A Simmons [ 30/Apr/14 ]

Now that LU-3227 has landed, hopefully the test failures we have been seeing for some of our patches will go away. The recipe has changed a bit: I have included the last patch for LU-1330 in the mix to avoid collisions with the ofd proc changes. LU-1330 is a step toward syncing us with the upstream client. The patches now needed for server support are:

http://review.whamcloud.com/#/c/7934
http://review.whamcloud.com/#/c/7936
http://review.whamcloud.com/#/c/8010
http://review.whamcloud.com/#/c/8036
http://review.whamcloud.com/#/c/2677 - LU-1330
http://review.whamcloud.com/#/c/8049
http://review.whamcloud.com/#/c/9384

Comment by Yang Sheng [ 05/May/14 ]

Hi James, I found a big issue in http://review.whamcloud.com/9038. Unfortunately, we lost the chance to catch it before it landed. The issue is that you made obd_type->typ_procsym a shared entry between osc & osp and between lov & lod, but you also use it in lmv & lov to hold the 'target_obds' entry. That is absolutely wrong: target_obds is per obd, but typ_procsym is per obd_type, so typ_procsym will be reused whenever one obd_type has two or more obd_devices (which is the usual case). Please restore obd_proc_private for target_obds. Of course, you can use another name if you like.

Comment by James A Simmons [ 05/May/14 ]

Yep Hammond reported that bug in LU-4953. We have a patch at http://review.whamcloud.com/#/c/10192.

Comment by James A Simmons [ 20/May/14 ]

Time for a status update. Several patches have been merged, as have many dependency patches. The patches left are:

http://review.whamcloud.com/#/c/7934 // ZFS
http://review.whamcloud.com/#/c/7936 // OSP
http://review.whamcloud.com/#/c/8049 // MDD/OFD currently has a memory leak
http://review.whamcloud.com/#/c/9384 // NRS TBF investigating how it works

The first two patches above are ready for review.

Comment by James A Simmons [ 23/May/14 ]

The issues in NRS TBF have been addressed. Only two patches left to inspect.

http://review.whamcloud.com/#/c/7934 // ZFS
http://review.whamcloud.com/#/c/8049 // MDD/OFD, which should now be fixed

Once we are down to one patch the last cleanup patch will be pushed.

Comment by James A Simmons [ 16/Jun/14 ]

The last two server patches have landed. All that is left is to create a cleanup patch to remove the technical debt.

Comment by Jodi Levi (Inactive) [ 01/Jul/14 ]

Patches landed to master. New ticket LU-5275 created to clean up the technical debt.

Generated at Sat Feb 10 01:32:53 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.