[LU-1346] cleanup libcfs Created: 26/Apr/12  Updated: 30/Sep/13  Resolved: 16/Sep/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.5.0
Fix Version/s: Lustre 2.4.0, Lustre 2.5.0

Type: Improvement Priority: Minor
Reporter: Peng Tao Assignee: Keith Mannthey (Inactive)
Resolution: Fixed Votes: 0
Labels: emc, patch

Issue Links:
Duplicate
is duplicated by LU-3963 cleanup libcfs wrappers Resolved
Epic: client
Rank (Obsolete): 7767

 Description   

Migrate libcfs to emulate Linux kernel APIs, so that when submitting Linux client to upstream kernel, we don't need the abstraction layer which will be rejected by kernel maintainers. The majority of libcfs are cfs_* or ll_* wrappers, and can be cleaned up with scripts doing like "s/cfs_spin_lock/spin_lock". For other functions that cannot be cleaned up this way, we need to look at them in a case-by-case basis.



 Comments   
Comment by Xuezhao Liu [ 17/May/12 ]

Per the discussion with Andreas etc., we can use a script to do libcfs cleanup at compiling time first. We can implement the script incrementally, when the script is good enough we can apply it to lustre tree to change code directly. The code change will be huge and will break most in-flight patches and break different branch rebase, the script can help to resolve the confliction as well.

We ever considered implementing the script based on coccinelle, it is potentially a good tool to resolve collateral evolution issues. But it cannot well process some "#ifdef .... #else ... #endif" cases, or some complicate macro codes. So we implemented a sed script to do libcfs cleanup, together with some direct code changes.

3 patches submitted for review:
http://review.whamcloud.com/2829 cleanup libcfs lock and bit operations
http://review.whamcloud.com/2830 cleanup libcfs file operations
http://review.whamcloud.com/2831 cleanup libcfs memory operations

Comment by Xuezhao Liu [ 09/Dec/12 ]

http://review.whamcloud.com/4775 cleanup libcfs primitive (linux-prim.h)
http://review.whamcloud.com/4776 cleanup macros in kp30.h
http://review.whamcloud.com/4777 tcpip/time/type related cleanup
http://review.whamcloud.com/4778 cleanup macros in portals_compat25.h
http://review.whamcloud.com/4779 cleanup cfs_curproc_xxx macros
http://review.whamcloud.com/4780 cleanup list operations
http://review.whamcloud.com/4781 replace CFS_CAP_XXX with kernel definition

The above patches are only script plus some other code changes that not-easy to be handled by script. I submitted only for review, we can merge them and run the script to generate the final patch at appropriate time.

There are some other cleanup patches have not been submitted for example LWT code and some other obsolete macros cleanup.

Comment by John Hammond [ 15/Mar/13 ]

http://review.whamcloud.com/#change,2829 (cleanup libcfs lock and bit operations) has landed.
http://review.whamcloud.com/#change,2830 (remove cfs_ file wrappers) has landed.

Comment by James A Simmons [ 01/Jul/13 ]

I just tried out the latest master to discover that it no longer builds on SLES11 SP1. The api for mem_cache_create has a extra argument in the base line 2.6.32 kernel which was removed later.

Comment by James A Simmons [ 01/Jul/13 ]

Sorry it was another problem. Your patch doesn't impact SLES11 SP1.

Comment by Peng Tao [ 12/Jul/13 ]

I split
http://review.whamcloud.com/4775 cleanup libcfs primitive (linux-prim.h)
into several patches, to facilitate review and merging

http://review.whamcloud.com/#/c/6954/
http://review.whamcloud.com/#/c/6955/
http://review.whamcloud.com/#/c/6956/
http://review.whamcloud.com/#/c/6957/
http://review.whamcloud.com/#/c/6959/

The procfs primitive cleanup is dropped because it will be soon changed with the procfs change in 3.10 kernel support.

The last one patch (6959) is partial and converts only atomic primitives in libcfs directory. I'll update incremental patches to clean other modules.

Comment by Keith Mannthey (Inactive) [ 13/Jul/13 ]

There seems to be some build problem related to this patch set. A rebuild failed as well so it likely has to do with the base change itself somehow.

Comment by Peng Tao [ 16/Jul/13 ]

I don't have SLES11 environment but I tried to build the patchset against SLES11SP2 kernel on my CentOS box, and no error was found. I couldn't build rpms though because they require different spec file rules. And the error seems to be from rpmbuild.

Could you please help to retrieve below file?
/var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11sp2/ib_stack/inkernel/BUILD/find-requires

The error message says:
error: Failed to find Requires:
Finding Requires: /var/lib/jenkins/workspace/lustre-reviews/arch/x86_64/build_type/client/distro/sles11sp2/ib_stack/inkernel/BUILD/find-requires
error: line 674: Dependency tokens must begin with alpha-numeric, '_' or '/': rm -rf $RPM_BUILD_ROOT

It seems that find-requires scanned some script and found undefined variable $RPM_BUILD_ROOT. But the patchset doesn't change any scripts except for libcfscleanup.sed which clearly doesn't reference $RPM_BUILD_ROOT.

Comment by James A Simmons [ 16/Jul/13 ]

Can you try patch http://review.whamcloud.com/#/c/5491. That might fix it up for you.

Comment by Keith Mannthey (Inactive) [ 16/Jul/13 ]

I am working to find a way to get the malformed file so we can better see what went wrong. We have a separate internal ticket tracking this build issue.

The present thought is the first patch in this series is wrecking havoc is this odd way. I will see about a debug rebase ontop of the SuSE build change.

Comment by Peng Tao [ 17/Jul/13 ]

Thanks James. It seems that http://review.whamcloud.com/#/c/5491 is related. I'll update to include it and see if it fixes the build issue.

Comment by Peng Tao [ 19/Jul/13 ]

Turned out to be a bug in the base patch. The error message was misleading. I've fixed it and now build succeeded.

Comment by Keith Mannthey (Inactive) [ 19/Jul/13 ]

I am glad this got sorted it. I agree the error message was very misleading.

Comment by Peng Tao [ 22/Jul/13 ]

Following up the cfs_atomic primitive change, below seven patches convert the rest pieces of code incrementally. They can each be merged independently.
http://review.whamcloud.com/7070
http://review.whamcloud.com/7072
http://review.whamcloud.com/7073
http://review.whamcloud.com/7074
http://review.whamcloud.com/7075
http://review.whamcloud.com/7076
http://review.whamcloud.com/7077

Comment by Keith Mannthey (Inactive) [ 23/Jul/13 ]

These 7 patches are each dependant on about 10 other patches. We will need to land these changes in order and a lowish level patch needing a rebase is going to cause the whole stack to be rebuilt and tested at least once more.

http://review.whamcloud.com/4779 is low in the stack and out of date.

The lowest patch is http://review.whamcloud.com/4776 and is being testing. 4777 and 4778 look to to land once 4776 finishes testing.

If there is any way to break the dependencies up the speed at which this code will land could improve.

Comment by Peng Tao [ 23/Jul/13 ]

shall I rebase 4779 and all the patches that rely on 4770 now, or wait until 4776/4777/4778 on are landed?

They are dependent mostly because they all modify the libcfs-cleanup.sed file.

Comment by James A Simmons [ 23/Jul/13 ]

Can you hold off rebasing 4779. I'm working on getting Lustre working with a 3.10.2 kernel and that version of the kernel impacts process name space handling. First I need to handle the proc changes in 3.10+

Comment by Peng Tao [ 23/Jul/13 ]

James, do you mean you are working on procfs change in 3.10? It is tracked on LU-3319. I thought that liuying is working on it. You may want to coordinate with each other so don't waste the efforts.

Comment by Cory Spitz [ 28/Aug/13 ]

James posted http://review.whamcloud.com/#/c/7454 for gnilnd

Comment by Jodi Levi (Inactive) [ 16/Sep/13 ]

Follow on work for this ticket is being tracked in LU-3963

Comment by Alexey Lyashkov [ 27/Sep/13 ]

please revert
commit b22fb817507ff52c02de38435fe90d758e852105
Author: James Simmons <uja.ornl@gmail.com>
Date: Fri Sep 13 09:44:00 2013 -0400

LU-1346 libcfs: replace CFS_CAP_XXX with kernel definition

that defines needed because Linux capability now have size more then lustre wire protocol so we don't able to pack all capability into wire. That is main reason to introduce these macroses, please look to commit and ticket where it's adds to lustre tree.

I see you are drop compatibility with different OS'es and kernels, but that is very very bad idea.

Comment by Alexey Lyashkov [ 27/Sep/13 ]

and PLEASE REMEMBER - any on wire constants MUST be platform independent to avoid situation when it's different on different sides of lustre cluster.

as additional objection about that work - 'fls' cleanup landed some time ago. that is broke lustre to build on system already have these functions but with different arguments as you expected.
and forget to make cfs_gettimeofday to gettimeofday changes in some places.

Comment by James A Simmons [ 27/Sep/13 ]

The patch just did a bunch of renaming of variables. cfs_cap_t is still a __u32. Can you point to exactly the break is where you believe it to be.

Please push fixes for the fls and gettimeofday issues. Patches always welcomed

Comment by Alexey Lyashkov [ 27/Sep/13 ]

OS defined flags should be never applied to the custom type, so CAP_* flags may be applied to the kernel_cap type, other is incorrect by definition, but kernel_cap_t can't be put in _u32 in wire.
reason to have CFS_CAP* to be similar CAP_* defines - capability with older (before patch) clients, but we define only few bits from kernel bitmask, so if we will be need some additional on server side we will be able to pack data from kernel bits to unused bits in wire data.
so avoid a wire protocol changes as long as possible. It's my bad - i forget to add CFS_CAP checks into wiretest script to notify - it's wire specific data and should be don't changed without real reasons.

as about breakage.

In file included from /Users/shadow/work/lustre/work/WC-review/mac-os/libcfs/include/libcfs/posix/libcfs.h:108,
                from /Users/shadow/work/lustre/work/WC-review/mac-os/libcfs/include/libcfs/libcfs.h:45,
                from nidstrings.c:43:
/Users/shadow/work/lustre/work/WC-review/mac-os/libcfs/include/libcfs/user-bitops.h:80: error: conflicting types for ‘fls’
/usr/include/strings.h:90: error: previous declaration of ‘fls’ was here

so part for cfs_fls need to be reverted.

one more additional bug

gcc -DHAVE_CONFIG_H -I. -I../..  -D__arch_lib__ -D_LARGEFILE64_SOURCE=1 -include /Users/shadow/work/lustre/work/WC-review/mac-os/config.h -I/Users/shadow/work/lustre/work/WC-review/mac-os/libcfs/include -I/Users/shadow/work/lustre/work/WC-review/mac-os/lnet/include -I/Users/shadow/work/lustre/work/WC-review/mac-os/lustre/include  -g -Wall -fPIC -D_GNU_SOURCE -g -O2 -Werror -MT libcfs_a-posix-debug.o -MD -MP -MF .deps/libcfs_a-posix-debug.Tpo -c -o libcfs_a-posix-debug.o `test -f 'posix/posix-debug.c' || echo './'`posix/posix-debug.c

cc1: warnings being treated as errors

posix/posix-debug.c: In function ‘libcfs_debug_vmsg2’:

posix/posix-debug.c:210: warning: implicit declaration of function ‘cfs_gettimeofday’
Comment by James A Simmons [ 27/Sep/13 ]

Okay we can revert the last patch. The caps patch had less of a impact on the code than the other patches for LU-1346. Proper capabilities support will need to be explored later.

Comment by Alexey Lyashkov [ 28/Sep/13 ]

Thanks! I will send maces/freebsd support patches with fuse client son (i hope in next two weeks).

Comment by Alexey Lyashkov [ 30/Sep/13 ]

I created patch with some cleanups need to compile lustre with clang. most of them is c89 style fixes with correct includes.
please inspect it.

http://review.whamcloud.com/7803

Generated at Sat Feb 10 01:15:50 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.