[LU-11200] Centos 8 arm64 server support Created: 03/Aug/18  Updated: 11/Jun/20  Resolved: 11/Jun/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.14.0

Type: New Feature
Priority: Minor
Reporter: Oleg Drokin
Assignee: James A Simmons
Resolution: Fixed
Votes: 0
Labels: None

Attachments: e2fsprogs-1.45.2.wc1-0.el7.aarch64.rpm, e2fsprogs-debuginfo-1.45.2.wc1-0.el7.aarch64.rpm, e2fsprogs-devel-1.45.2.wc1-0.el7.aarch64.rpm, e2fsprogs-libs-1.45.2.wc1-0.el7.aarch64.rpm, e2fsprogs-static-1.45.2.wc1-0.el7.aarch64.rpm, libcom_err-1.45.2.wc1-0.el7.aarch64.rpm, libcom_err-devel-1.45.2.wc1-0.el7.aarch64.rpm, libss-1.45.2.wc1-0.el7.aarch64.rpm, libss-devel-1.45.2.wc1-0.el7.aarch64.rpm, patch_to_patches_ldiskfsset20.txt, results-ldiskfs-rhel8-2507.yml, vmcore-dmesg.txt
Issue Links:
Blocker
is blocking LU-12323 save_stack_trace_tsk is not exported ... Resolved
Related
is related to LU-11527 sanity test_270a failed with O_DIRECT... Resolved
is related to LU-6387 Add Power8 support to Lustre Resolved
is related to LU-6766 add support for arm64 Resolved
is related to LU-11224 T10PI assume several kernel features ... Resolved
is related to LU-11832 ARM servers crashing on MDS startup Resolved
is related to LU-9793 sanity test 244 fail Resolved
is related to LU-11878 sanity test 103b: OOM because of too ... Resolved
is related to LU-11246 New lustre e2fsprogs 1.44 issues Closed
is related to LU-11440 Make e2fsprogs-1.44.3-wc1 release Resolved
is related to LU-12137 update client to use iterate_shared Resolved
is related to LU-10300 Can the Lustre 2.10.x clients support... Resolved

 Description   

I am trying to bring up arm64 support for CentOS 7, and it turns out that CentOS 7 / RHEL 7 for arm64 uses kernel 4.14, for which we do not have any ldiskfs patches.

I checked, and the closest match is the Ubuntu 18 4.15 kernel: that ldiskfs series has only a single trivial reject, in ext4-data-in-dirent.patch.

We also need a way to select it in configure; that's a bit more complicated, since I don't know how to do it easily yet.
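The selection that configure needs amounts to a lookup from distribution and kernel version to an ldiskfs patch series. The following is a minimal sketch of that idea, written in Python for readability only; the real logic would live in Lustre's configure macros. The distro labels and series file names are hypothetical placeholders, not the names used in the Lustre tree.

```python
import re

# Hypothetical distro labels and series file names, for illustration only.
SERIES = {
    ("rhel7-alt", (4, 14)): "ldiskfs-4.14-rhel7-alt.series",
    ("ubuntu18", (4, 15)): "ldiskfs-4.15-ubuntu18.series",
}


def parse_kernel(release):
    """Return (major, minor) from a release such as '4.14.0-49.6.1.el7a.aarch64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def select_series(distro, release):
    """Return the series file for this distro/kernel pair, or None if unsupported."""
    return SERIES.get((distro, parse_kernel(release)))
```

Keying on both the distribution and the kernel version matters here because the RHEL alt 4.14 kernel and the Ubuntu 4.15 kernel are close but not identical, so each needs its own series even if most patches are shared.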

Then there are compile errors:

/home/green/git/lustre-release/lustre/ptlrpc/../../lustre/ldlm/ldlm_lockd.c:135:74: error: macro "DEFINE_TIMER" requires 4 arguments, but only 2 given
 static CFS_DEFINE_TIMER(waiting_locks_timer, waiting_locks_callback, 0, 0);   
                                                                          ^


 Comments   
Comment by Andreas Dilger [ 03/Aug/18 ]

It probably makes sense to add a small patch to the 4.14 ldiskfs to make it match Ubuntu to resolve the conflict rather than creating a whole new set of patches.

Comment by Oleg Drokin [ 03/Aug/18 ]

I just forked that one patch and reused the Ubuntu patches otherwise; that's what we typically did in the past.

Comment by James A Simmons [ 03/Aug/18 ]

Actually, the latest RHEL alt kernels for ARM/Power8 are 4.15 kernels. I was looking at doing this work since I have been assigned ARM server support. I have been running RHEL ARM clients for some time.

Comment by James A Simmons [ 03/Aug/18 ]

Never mind, I was wrong. We are at 4.11. We need to look at updating our clients' kernels.

Comment by James A Simmons [ 03/Aug/18 ]

I'm going to take over this ticket since I need to go to an ARM conference in November to present ARM Lustre server support.

Comment by Oleg Drokin [ 03/Aug/18 ]

I am fine with it. Hopefully it does not take too long to have the patches to compile current master on RHEL7 arm64. I sent you the ldiskfs series update, but it still needs the configure magic, of course. Please make it a separate patch from all the other possible compile fixes (the DEFINE_TIMER and whatever else might crop up) so we can enable the arm64 builder permanently and make sure it no longer breaks.

Comment by Gerrit Updater [ 05/Aug/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/32939
Subject: LU-11200 libcfs: handle DECLARE_TIMER reduced to two arguments
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fffd922836c7cae6bafd650a41232d845edaf8cf

Comment by Gerrit Updater [ 05/Aug/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/32940
Subject: LU-11200 ldiskfs: add support for RHEL7.5 alt kernel
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f0cf010cadf6a16848d7ec3e02099df6087a0317

Comment by James A Simmons [ 09/Aug/18 ]

I have attached to this ticket the lustre version of the e2fsprogs which is the latest in the lustre-master-test test branch.

Comment by Andreas Dilger [ 09/Aug/18 ]

I assume from the e2fsprogs attachments that you didn't have any problems building the master-lustre-test branch on ARM?

Comment by James A Simmons [ 10/Aug/18 ]

No problem. Same for Power8. Only have trouble with Ubuntu systems. I will track that down.

Comment by Peter Jones [ 10/Aug/18 ]

James

Is anything needed for clients to work - either on master or on b2_10?

Peter

Comment by James A Simmons [ 10/Aug/18 ]

Technically I'm testing on a Power8 which is at RHEL7.5 alt using the 4.14 kernel. The ARM system I have is in the process of moving up to RHEL7.5 alt. Both platforms use the same kernel. For my testing on Power8 I need the two patches from this ticket as well as the patch from LU-11224. I also have LU-11096, which might be needed for ARM; for Power8 it's not required. That is for master. For b2_10, support would be a lot of work: lots of bits are missing from LNet to make 64K page systems work.

Comment by Andreas Dilger [ 10/Aug/18 ]

You probably mean something different than LU-11124 "Add 'lfs getstripe -N' option to print mirror count"...

Comment by James A Simmons [ 10/Aug/18 ]

Yes, a typo. I updated the comment to reflect the proper LU.

Comment by Gerrit Updater [ 18/Aug/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32939/
Subject: LU-11200 libcfs: handle DECLARE_TIMER reduced to two arguments
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ac40000d4bde21f807a68cb2add326ea5d77385c

Comment by James A Simmons [ 19/Aug/18 ]

Client support for ARM and Power8 restored. Now for server support.

Comment by Peter Jones [ 19/Aug/18 ]

James

So what (if anything) would be needed on b2_10 to offer ARM/Power8 client support?

Peter

Comment by James A Simmons [ 20/Aug/18 ]

First we need to port a bunch of patches to support newer kernels. Then figure out what to do with the ko2iblnd stack with 64K and map_on_demand work.

Comment by James A Simmons [ 28/Nov/18 ]

Found an issue with RHEL7 alt kernel for my Power8 system. Now I can attempt to mount ldiskfs but I'm encountering:

[156572.269381] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc

[156794.172421] INFO: task mount.lustre:37664 blocked for more than 120 seconds.

[156794.172515]       Tainted: P        W  OE  ------------   4.14.0-49.6.1.el7a.ppc64le #1

[156794.172524] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

[156794.172556] mount.lustre    D    0 37664  37663 0x00042080

[156794.172605] Call Trace:

[156794.172636] [c000001dd7f16ba0] [c00000000001cde0] __switch_to+0x330/0x660

[156794.172699] [c000001dd7f16c00] [c000000000c5ff64] __schedule+0x354/0xaf0

[156794.172758] [c000001dd7f16cd0] [c000000000c60748] schedule+0x48/0xc0

[156794.172817] [c000001dd7f16d00] [c000000000c65e88] rwsem_down_read_failed+0x148/0x1f0

[156794.172888] [c000001dd7f16d80] [c000000000c65038] down_read+0x78/0x80

[156794.172964] [c000001dd7f16db0] [d00000000dc31dc4] ldiskfs_readdir+0x704/0xa40 [ldiskfs]

[156794.173046] [c000001dd7f16ed0] [d00000000e94f718] osd_ios_general_scan+0x148/0x350 [osd_ldiskfs]

[156794.173136] [c000001dd7f16fb0] [d00000000e95ab28] osd_initial_OI_scrub+0x178/0x1770 [osd_ldiskfs]

[156794.173226] [c000001dd7f17150] [d00000000e95cdfc] osd_scrub_setup+0x8bc/0x1300 [osd_ldiskfs]

[156794.173314] [c000001dd7f172e0] [d00000000e923b18] osd_device_alloc+0x6e8/0xa90 [osd_ldiskfs]

[156794.173424] [c000001dd7f173c0] [d00000002821e478] class_setup+0xaf8/0x10d0 [obdclass]

[156794.173516] [c000001dd7f17510] [d000000028229924] class_process_config+0x1d64/0x3980 [obdclass]

[156794.173620] [c000001dd7f17640] [d000000028232dc8] do_lcfg+0x358/0x890 [obdclass]

[156794.173713] [c000001dd7f17770] [d000000028237af4] lustre_start_simple+0x1d4/0x460 [obdclass]

[156794.173817] [c000001dd7f17840] [d000000028279404] osd_start+0x714/0xd40 [obdclass]

[156794.173911] [c000001dd7f17960] [d000000028287508] server_fill_super+0x268/0x1cc0 [obdclass]

[156794.174015] [c000001dd7f17a60] [d00000002823cb40] lustre_fill_super+0x7c0/0x1050 [obdclass]

[156794.174098] [c000001dd7f17b20] [c0000000004494d0] mount_nodev+0x160/0x390

[156794.174178] [c000001dd7f17b90] [d000000028234ad4] lustre_mount+0x54/0x70 [obdclass]

[156794.174249] [c000001dd7f17be0] [c00000000044b97c] mount_fs+0x8c/0x230

[156794.174309] [c000001dd7f17c80] [c000000000487978] vfs_kern_mount+0x78/0x1b0
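
Read bottom-up, the trace shows mount.lustre reaching osd_initial_OI_scrub, which calls osd_ios_general_scan and then ldiskfs_readdir, where the task sleeps in down_read. A minimal sketch for pulling that function/module chain out of a hung-task report like this one follows; the regular expression assumes the ppc64le frame layout shown above (timestamp, two bracketed addresses, then symbol+offset/size and an optional module), and the helper names are made up for illustration.

```python
import re

# Frame lines end with "symbol+0xOFF/0xSIZE" and an optional "[module]".
FRAME = re.compile(r"\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?\s*$")


def parse_frames(dmesg_text):
    """Return (function, module) pairs, innermost frame first."""
    frames = []
    for line in dmesg_text.splitlines():
        m = FRAME.search(line)
        if m:
            # Frames without a module tag belong to the core kernel.
            frames.append((m.group(1), m.group(2) or "kernel"))
    return frames


def first_module_frame(frames):
    """Return the innermost frame that is not in the core kernel, or None."""
    for function, module in frames:
        if module != "kernel":
            return function, module
    return None
```

Applied to the trace above, the innermost non-core frame is the ldiskfs_readdir call, which is the natural place to start looking for the lock that is never released.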

 

Comment by Shuichi Ihara [ 03/Apr/19 ]

Hi James,
I've tested your latest patch series (patch set 8) and got a crash. Please see the attached vmcore-dmesg.txt (https://jira.whamcloud.com/secure/attachment/32378/vmcore-dmesg.txt); it was kernel-alt on an x86_64 system, though.
I will try the patch on an ARM system too.

Comment by James A Simmons [ 04/Apr/19 ]

I think I resolved the locking issues for ldiskfs. The problem was that try_lookup_one_len() can return NULL. In the original ldiskfs scrub code, a dentry was allocated in that case, so if try_lookup_one_len() doesn't find a dentry we have to create one.

Comment by James A Simmons [ 22/Apr/19 ]

Shuichi Ihara, try both:

https://review.whamcloud.com/#/c/34714

https://review.whamcloud.com/#/c/32940

Comment by Baptiste Gerondeau (Inactive) [ 12/Jun/19 ]

Hi,

Thank you for your patches and efforts !

I've tested the latest series on an ARM64 machine (VM hosted on a ThunderX2 with Infiniband to be precise).
First off, when using the 'vanilla' CentOS distributed 4.14.0-49.10.1.el7a.aarch64 kernel, it seems ldiskfs's kmod complains about missing 'fscrypt' symbols.
Thus I recompiled the kernel enabling/forcing EXT4_ENCRYPTION, with that I can load ldiskfs's kmods fine.
However, when running llmount.sh (naively, no modifications to any config file and using LDISKFS_MKFS_OPTS="^metadata_csum" because of "LDISKFS-fs (loop0): Couldn't mount because of unsupported optional features (400)") I get :

[bgerdeb@lustretest ~]$ sudo FSTYPE=ldiskfs LDISKFS_MKFS_OPTS="^metadata_csum" /usr/lib64/lustre/tests/llmount.sh
Stopping clients: lustretest /mnt/lustre (opts:-f)
Stopping clients: lustretest /mnt/lustre2 (opts:-f)
lustretest: executing set_hostid
Loading modules from /usr/lib64/lustre/tests/..
detected 56 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
Formatting mgs, mds, osts
Format mds1: /tmp/lustre-mdt1
Failed to initialize ZFS library: 256

mkfs.lustre FATAL: Unable to build fs /dev/loop0 (256)

mkfs.lustre FATAL: mkfs failed 256

Furthermore, dmesg shows :

[ 164.229457] libcfs: loading out-of-tree module taints kernel.
[ 164.231426] libcfs: module verification failed: signature and/or required key missing - tainting kernel
[ 164.239068] LNet: HW NUMA nodes: 1, HW CPU cores: 56, npartitions: 28
[ 164.244595] alg: No test for adler32 (adler32-zlib)
[ 165.001287] Lustre: DEBUG MARKER: lustretest: executing set_hostid
[ 165.140092] Lustre: Lustre: Build Version: 2.12.54
[ 165.210256] LNet: Added LNI 10.40.16.14@tcp [8/1792/0/180]
[ 165.211486] LNet: Accept secure, port 988
[ 166.796493] Lustre: 6332:0:(gss_svc_upcall.c:1119:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active
[ 166.799447] Key type lgssc registered
[ 166.949177] Lustre: Echo OBD driver; http://www.lustre.org/
[ 177.165649] loop: module loaded
[...]
[ 4728.518978] Lustre: DEBUG MARKER: lustretest: executing set_hostid
[ 4738.611605] print_req_error: I/O error, dev loop0, sector 0
[ 4738.617781] print_req_error: I/O error, dev loop0, sector 0
[ 4738.623807] print_req_error: I/O error, dev loop0, sector 0
[ 4738.629806] print_req_error: I/O error, dev loop0, sector 0
[ 4738.636358] print_req_error: I/O error, dev loop0, sector 0
[ 4738.642873] print_req_error: I/O error, dev loop0, sector 0
[ 4759.617658] print_req_error: I/O error, dev loop0, sector 0
[ 4759.650171] print_req_error: I/O error, dev loop0, sector 0
[ 4759.655604] print_req_error: I/O error, dev loop0, sector 0
[ 4759.661274] print_req_error: I/O error, dev loop0, sector 0
[ 4759.666946] print_req_error: I/O error, dev loop0, sector 0
[ 4759.672655] print_req_error: I/O error, dev loop0, sector 0

Those last 'print_req_error' seem to be inherent to this precise version of the kernel/EXT4 drivers (Ubuntu's bug tracker is full of those errors).
I also encountered them when running llmount.sh with FSTYPE=zfs on this kernel (4.14.0-49.10.1.el7a.aarch64, both vanilla and my builds), where they crashed llmount.sh.

When compiling Lustre (master + both patches cherry picked) against the latest CentOS 7.6 aarch64 kernel (yesterday's 4.14.0-115.8.1.el7a.aarch64, as well as 4.14.0-115.7.1.el7a.aarch64) with only ZFS enabled, I can llmount.sh just fine.
Note that I haven't run much of auster on this (I need to migrate the VM, not enough space left...), but I'm not getting any print_req_error.
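
To compare kernels on this point, it can help to reduce the dmesg output to a count of block-layer errors per device and sector. A minimal sketch, assuming the exact print_req_error message format shown above (other kernel versions may word it differently); the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "print_req_error: I/O error, dev loop0, sector 0".
REQ_ERROR = re.compile(r"print_req_error: I/O error, dev ([^,\s]+), sector (\d+)")


def count_io_errors(dmesg_text):
    """Count print_req_error lines, keyed by (device, sector)."""
    counts = Counter()
    for m in REQ_ERROR.finditer(dmesg_text):
        counts[(m.group(1), int(m.group(2)))] += 1
    return counts
```

An empty result on the 4.14.0-115 kernels and a non-empty one on 4.14.0-49.10.1 would match what is reported here.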

Comment by James A Simmons [ 12/Jun/19 ]

metadata_csum is new. As for the loop device, you might need the new e2fsprogs being worked on for RHEL8. In fact, give me some time and I will build an updated e2fsprogs for you. I do plan on updating the LU-11200 patch due to RHEL8 changes as well.

Comment by Baptiste Gerondeau (Inactive) [ 13/Jun/19 ]

Thanks a lot ! Looking forward to trying out those patches !

Comment by James A Simmons [ 13/Jun/19 ]

Updated the e2fsprogs. I hope to have the RHEL7.6alt patches ready over the next few days.

Comment by Baptiste Gerondeau (Inactive) [ 13/Jun/19 ]

Thanks ! Downloaded and reinstalled your e2fsprogs over mine, tested and got the same result as above.
I might redo a clean build and install later with your e2fsprogs just to make sure.
Looking forward to the 7.6alt patches !

Comment by James A Simmons [ 17/Jun/19 ]

Just updated the ldiskfs patch. Currently we are running 4.14.43-200, and that kernel is nearly identical to the Ubuntu 18 LTS kernel, so it's a really easy update. That patch plus the LU-12137 patch should get you mounting a file system. The only issue is that LU-12137 isn't complete on some recovery points, but you should be able to mount ldiskfs on ARM.

Comment by Baptiste Gerondeau (Inactive) [ 18/Jun/19 ]

I can't seem to find a 4.14.43-200 kernel (for aarch64). The closest I have is Fedora's 4.14.13-200. Is it that one ?

Comment by James A Simmons [ 18/Jun/19 ]

I saw 4.14.43-200 kernels for Fedora x86 but I can't seem to find one for ARM. The kernel I'm using is directly from the vendor using RedHat. Can you try to see if you have build issues with this version? I do see the latest Fedora is 30 for ARM, and it is a 5.0 kernel, which is being worked on.

Comment by Baptiste Gerondeau (Inactive) [ 19/Jun/19 ]

So I tried with the aforementioned kernel: the good news is that I only had to make minor changes to your rhel7.5alt patch for it to apply to this kernel's EXT4 tree (see: this file; I already had to make them before, but failed to mention it, it seems...).

The bad news is that it fails to compile with GCC 5.4.0 where it succeeded before; here are the errors :

In file included from include/linux/kernel.h:10:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from /home/bgerdeb/lustre_build/lustre/lustrearm/ldiskfs/dir.c:25:
/home/bgerdeb/lustre_build/lustre/lustrearm/ldiskfs/dir.c: In function ‘__ldiskfs_check_dir_entry’:
/home/bgerdeb/lustre_build/lustre/lustrearm/ldiskfs/dir.c:71:22: error: implicit declaration of function ‘__LDISKFS_DIR_REC_LEN’ [-Werror=implicit-function-declaration]
if (unlikely(rlen < __LDISKFS_DIR_REC_LEN(1)))
^
include/linux/compiler.h:77:42: note: in definition of macro ‘unlikely’
# define unlikely(x) __builtin_expect(!!(x), 0)
^
/home/bgerdeb/lustre_build/lustre/lustrearm/ldiskfs/ldiskfs.h:2031:77: error: invalid operands to binary & (have ‘struct ldiskfs_dir_entry_2 *’ and ‘int’)
#define LDISKFS_DIR_REC_LEN(name_len) (((name_len) + 8 + LDISKFS_DIR_ROUND) & \
^
include/linux/compiler.h:77:42: note: in definition of macro ‘unlikely’
# define unlikely(x) __builtin_expect(!!(x), 0)
^
/home/bgerdeb/lustre_build/lustre/lustrearm/ldiskfs/dir.c:75:27: note: in expansion of macro ‘LDISKFS_DIR_REC_LEN’
else if (unlikely(rlen < LDISKFS_DIR_REC_LEN(de)))
^
/home/bgerdeb/lustre_build/lustre/lustrearm/ldiskfs/dir.c: In function ‘ldiskfs_htree_store_dirent’:
/home/bgerdeb/lustre_build/lustre/lustrearm/ldiskfs/dir.c:451:26: error: ‘LDISKFS_DIRENT_LUFID’ undeclared (first use in this function)
if (dirent->file_type & LDISKFS_DIRENT_LUFID)
^
/home/bgerdeb/lustre_build/lustre/lustrearm/ldiskfs/dir.c:451:26: note: each undeclared identifier is reported only once for each function it appears in
/home/bgerdeb/lustre_build/lustre/lustrearm/ldiskfs/dir.c:452:16: error: implicit declaration of function ‘ldiskfs_get_dirent_data_len’ [-Werror=implicit-function-declaration]
extra_data = ldiskfs_get_dirent_data_len(dirent);
^
/home/bgerdeb/lustre_build/lustre/lustrearm/ldiskfs/dir.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-stringop-overflow’
cc1: warning: unrecognized command line option ‘-Wno-stringop-truncation’
cc1: warning: unrecognized command line option ‘-Wno-format-truncation’
    cc1: warning: unrecognized command line option ‘-Wno-format-truncation’

[... make -j12 so it goes on ..]

/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:3695:20: warning: ‘struct ldiskfs_dentry_param’ declared inside parameter list
const struct lu_fid *fid)
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:3695:20: warning: its scope is only this definition or declaration, which is probably not what you want
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c: In function ‘osd_get_ldiskfs_dirent_param’:
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:3699:8: error: dereferencing pointer to incomplete type ‘struct ldiskfs_dentry_param’
param->edp_magic = 0;
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:3703:21: error: ‘LDISKFS_LUFID_MAGIC’ undeclared (first use in this function)
param->edp_magic = LDISKFS_LUFID_MAGIC;
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:3703:21: note: each undeclared identifier is reported only once for each function it appears in
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c: In function ‘osd_add_dot_dotdot_internal’:
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:3746:31: error: passing argument 1 of ‘osd_get_ldiskfs_dirent_param’ from incompatible pointer type [-Werror=incompatible-pointer-types]
osd_get_ldiskfs_dirent_param(dot_dot_ldp, dot_dot_fid);
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:3694:13: note: expected ‘struct ldiskfs_dentry_param *’ but argument is of type ‘struct ldiskfs_dentry_param *’
static void osd_get_ldiskfs_dirent_param(struct ldiskfs_dentry_param *param,
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:3749:9: error: dereferencing pointer to incomplete type ‘struct ldiskfs_dentry_param’
dot_ldp->edp_magic = 0;
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:3751:7: error: implicit declaration of function ‘ldiskfs_add_dot_dotdot’ [-Werror=implicit-function-declaration]
rc = ldiskfs_add_dot_dotdot(oth->ot_handle, parent_dir,
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c: In function ‘osd_get_fid_from_dentry’:
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:5074:22: error: ‘LDISKFS_DIRENT_LUFID’ undeclared (first use in this function)
if (de->file_type & LDISKFS_DIRENT_LUFID) {
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c: In function ‘__osd_ea_add_rec’:
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:5407:6: error: dereferencing pointer to incomplete type ‘struct ldiskfs_dentry_param’
ldp->edp_magic = 0;
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:5409:32: error: passing argument 1 of ‘osd_get_ldiskfs_dirent_param’ from incompatible pointer type [-Werror=incompatible-pointer-types]
osd_get_ldiskfs_dirent_param(ldp, fid);
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:3694:13: note: expected ‘struct ldiskfs_dentry_param *’ but argument is of type ‘struct ldiskfs_dentry_param *’
static void osd_get_ldiskfs_dirent_param(struct ldiskfs_dentry_param *param,
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:5427:22: error: ‘LDISKFS_DIRENT_LUFID’ undeclared (first use in this function)
de->file_type = LDISKFS_DIRENT_LUFID |
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c: In function ‘osd_ldiskfs_filldir’:
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:6550:22: error: ‘LDISKFS_DIRENT_LUFID’ undeclared (first use in this function)
} else if (d_type & LDISKFS_DIRENT_LUFID) {
^
In file included from /home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:56:0:
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c: In function ‘osd_dotdot_has_space’:
/home/bgerdeb/lustre_build/lustre/lustrearm/ldiskfs/ldiskfs.h:2031:77: error: invalid operands to binary & (have ‘struct ldiskfs_dir_entry_2 *’ and ‘int’)
#define LDISKFS_DIR_REC_LEN(name_len) (((name_len) + 8 + LDISKFS_DIR_ROUND) & \
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:6711:6: note: in expansion of macro ‘LDISKFS_DIR_REC_LEN’
if (LDISKFS_DIR_REC_LEN(de) >=
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:6712:6: error: implicit declaration of function ‘__LDISKFS_DIR_REC_LEN’ [-Werror=implicit-function-declaration]
__LDISKFS_DIR_REC_LEN(2 + 1 + sizeof(struct osd_fid_pack)))
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c: In function ‘osd_dirent_reinsert’:
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:6762:20: error: ‘LDISKFS_DIRENT_LUFID’ undeclared (first use in this function)
de->file_type |= LDISKFS_DIRENT_LUFID;
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:6775:31: error: passing argument 1 of ‘osd_get_ldiskfs_dirent_param’ from incompatible pointer type [-Werror=incompatible-pointer-types]
osd_get_ldiskfs_dirent_param(ldp, fid);
^
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c:3694:13: note: expected ‘struct ldiskfs_dentry_param *’ but argument is of type ‘struct ldiskfs_dentry_param *’
static void osd_get_ldiskfs_dirent_param(struct ldiskfs_dentry_param *param,
^
CC [M] /home/bgerdeb/lustre_build/lustre/lustrearm/lustre/obdclass/class_obd.o
/home/bgerdeb/lustre_build/lustre/lustrearm/lustre/osd-ldiskfs/osd_handler.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-stringop-overflow’
cc1: warning: unrecognized command line option ‘-Wno-stringop-truncation’
cc1: warning: unrecognized command line option ‘-Wno-format-truncation’

Note, I always cherry pick your patches on top of master. Will rebase and retry.

EDIT : Tried to put in bold the errors, of course thanks to Atlassian's wonderful text editor, it failed.
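
Most of the errors in the log above name ldiskfs-specific identifiers that the patched ext4 tree is expected to provide, so a short list of the missing names is easier to act on than the full log. A minimal sketch for extracting that list from gcc output follows; it assumes gcc's "file:line:col: error:" layout and typographic quotes as in the log, tolerates a leading '*' left over from wiki markup, and uses a made-up function name.

```python
import re

# gcc diagnostic line: "file:line:col: error: message" (optional leading '*').
GCC_ERROR = re.compile(r"^\s*\*?(\S+?):(\d+):(\d+): error: (.*)$")
# Identifier named by "implicit declaration" and "undeclared" messages.
MISSING = re.compile(r"implicit declaration of function ‘(\w+)’|‘(\w+)’ undeclared")


def missing_symbols(build_log):
    """Map each missing identifier to the sorted source files that report it."""
    found = {}
    for line in build_log.splitlines():
        err = GCC_ERROR.match(line)
        if not err:
            continue
        sym = MISSING.search(err.group(4))
        if sym:
            name = sym.group(1) or sym.group(2)
            source = err.group(1).rsplit("/", 1)[-1]  # keep only the file name
            found.setdefault(name, set()).add(source)
    return {name: sorted(files) for name, files in found.items()}
```

If every missing name turns out to be one that a single patch in the series introduces, that patch is the first one to check for a silent reject.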

Comment by James A Simmons [ 20/Jun/19 ]

It seems Fedora is a different beast than RHEL7.6alt. Would you be willing to try Fedora30, since it's a 5.0 kernel and that support is being worked on? I'm hoping we can just use the same ldiskfs patch set as the 5.0 series.

Comment by Peter Jones [ 20/Jun/19 ]

I talked to Marvell at ISC this week, and they seemed to think that focusing on RHEL8 should be the first priority in this arena.

Comment by James A Simmons [ 20/Jun/19 ]

Are you suggesting we drop RHEL7 alt support? While that seems like a good idea, people might be stuck with what their vendor supports. I think there might be a delay for some time before RHEL8 is picked up by ARM vendors. For ORNL, we will be getting new ARM servers in soon; one set will be RHEL7 alt and the second set will be RHEL8. Thankfully other vendors haven't made the mistake of rolling their own special ARM-only kernels.

Comment by Peter Jones [ 20/Jun/19 ]

I am working on the assumption that it will take a little while to get Arm server support working, so focusing on something with a longer shelf life would be best.

Comment by James A Simmons [ 21/Jun/19 ]

We are much closer than you think. The challenge has been getting people to test the needed changes. Now that I have worked out most of the issues for LU-11893, I can focus on finishing this. I talked to Shaun from Cray and he is willing to work with me to finish this work up.

Comment by James A Simmons [ 24/Jun/19 ]

I have resolved the ldiskfs locking issues that ARM testers have reported. Should be in a good position to finish this work off.

Comment by Baptiste Gerondeau (Inactive) [ 26/Jun/19 ]

I have tested your patches against Fedora 30's 5.1.1 kernel, and the patches do not apply to its ext4 tree.
I have also tested CentOS's 4.14.43-201, and its ialloc.c differs non-trivially from what "rhel7.6alt/ext4-corrupted-inode-block-bitmaps-handling-patches.patch" expects.

I'd advise using the upstream CentOS 7.6alt kernel: here, which is the one you get by default from a netinstall (if you don't run yum update afterwards).
The next to latest one is this one : here
For the latest one, kernel-alt-4.14.0-115.8.1.el7a, I can't seem to find the src rpm but here is the changelog : it doesn't seem to include any changes to EXT4 (only XFS).
I'll try to get a diff of 7.1 and 8.1's ext4 to make sure.
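
A minimal sketch of such a check, assuming both kernel source trees have already been unpacked locally (for example from their src rpms); the function name, the tree paths and the fs/ext4 default are illustrative assumptions. It compares file contents rather than timestamps, and only looks at regular files directly inside the ext4 directory.

```python
import filecmp
from pathlib import Path


def compare_ext4(old_tree, new_tree, subdir="fs/ext4"):
    """Report ext4 files that changed, or exist in only one of two kernel trees."""
    old_dir = Path(old_tree) / subdir
    new_dir = Path(new_tree) / subdir
    old_names = {p.name for p in old_dir.iterdir() if p.is_file()}
    new_names = {p.name for p in new_dir.iterdir() if p.is_file()}
    common = sorted(old_names & new_names)
    # shallow=False forces a byte-for-byte comparison instead of trusting stat().
    _same, changed, errors = filecmp.cmpfiles(old_dir, new_dir, common, shallow=False)
    return {
        "changed": sorted(changed + errors),
        "only_old": sorted(old_names - new_names),
        "only_new": sorted(new_names - old_names),
    }
```

An empty "changed" list between the two kernels would support reusing the same ldiskfs series for both.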

Concerning 7.6 vs RHEL8, I'm happy to test both.
Since the timeline to CentOS 8 is not clear, 7.6alt is still (the most) relevant (and since I haven't tested CentOS 8, there is no assurance it is as stable as 7.6alt yet).

Comment by Baptiste Gerondeau (Inactive) [ 25/Jul/19 ]

I have tested the latest lustre-release master on a RHEL8 ARM64 VM and can confirm that it builds, installs, runs and passes (most of) the sanity test suite with LDISKFS (in an all-on-one-node configuration at the moment)!

Here are the results : results-ldiskfs-rhel8-2507.yml !
Will investigate further tomorrow, try to isolate the FAILS.

Comment by James A Simmons [ 10/Oct/19 ]

Baptiste, I updated https://review.whamcloud.com/#/c/34714. This should make RHEL8 ARM fully functional.

Comment by Baptiste Gerondeau (Inactive) [ 28/Oct/19 ]

Thanks a lot !
Will take a look at it this week and report back (here I guess ? Let me know if I should open an issue/if there is a more "à propos" issue elsewhere)

Comment by Andreas Dilger [ 11/Jun/20 ]

The last patch for this ticket was landed in 2.14, and RHEL8 clients are working for all arches.

Generated at Sat Feb 10 02:41:49 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.