[LU-4199] Lustre client support for ARM platform Created: 01/Nov/13  Updated: 11/Mar/17  Resolved: 11/Mar/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: None

Type: New Feature Priority: Minor
Reporter: Shuichi Ihara (Inactive) Assignee: James A Simmons
Resolution: Fixed Votes: 0
Labels: patch
Environment:

ARM-compatible processor


Issue Links:
Related
is related to LU-4668 compile error on ppc64: asm/stacktrac... Resolved
is related to LU-6766 add support for arm64 Resolved
Rank (Obsolete): 11408

 Description   

We are interested in running Lustre on ARM-based processor platforms. These platforms run a standard Linux kernel plus platform-specific patches today, which encourages us to run Lustre on them. We have therefore ported a small amount of Lustre code to the ARM processor, and the Lustre client now runs on it.



 Comments   
Comment by Peter Jones [ 01/Nov/13 ]

Sounds interesting. A couple of people were asking about ARM support in Lustre at SC last year. Do you have anything ready to contribute yet?

Comment by Shuichi Ihara (Inactive) [ 01/Nov/13 ]

Peter, yes, we have a patch and have tested it on a tiny ARM board. We will push the patch here very soon.

Comment by Li Xi (Inactive) [ 02/Nov/13 ]

Here is the patch.
http://review.whamcloud.com/8144
But there is a problem. The following newly merged patch removes the LIBCFS_FUNC_DUMP_TRACE test, since the dump_trace() function is always defined on the x86 platform. However, there is no dump_trace() function on the ARM platform (e.g. Linux 3.6). We are trying to figure out the best way to fix this problem.
http://git.whamcloud.com/?p=fs/lustre-release.git;a=commit;h=8250f49e3d0431db6a6363f959d2cce65684c74e

Comment by Shuichi Ihara (Inactive) [ 04/Nov/13 ]

We have built the Lustre code (both kernel modules and user-space utilities) with http://review.whamcloud.com/8144 using a Lustre cross-build environment.
The resulting Lustre kernel modules and utilities have been tested on a Raspberry Pi http://www.raspberrypi.org/, which is a board based on an ARMv6-compatible processor. We are still investigating packaging (e.g. RPM, deb) for ARM-based Linux distributions. The following steps are still a little complicated, but they work well.

Host system (x86_64, CentOS6.4)

Recompiling the kernel is NOT required for Lustre, and it is NOT part of the Lustre build. However, the latest kernel source tree has some updates relative to the original pre-installed kernel, so I recompiled and installed it before building Lustre.

Create a "dummy root directory" on the host system to transfer modules and binaries to the Raspberry Pi
# mkdir -p /usr/src/dummyroot/boot

Set up the cross-build environment and recompile the kernel with it
# cd /usr/src/
# git clone --depth 5 https://github.com/raspberrypi/tools.git
# export PATH=/usr/src/tools/arm-bcm2708/arm-bcm2708-linux-gnueabi/bin:$PATH
# git clone -b rpi-3.6.y --depth 5 git://github.com/raspberrypi/linux.git rpi-3.6.y

# cd rpi-3.6.y
# cp arch/arm/configs/bcmrpi_defconfig .config
# yes "" | make ARCH=arm CROSS_COMPILE=arm-bcm2708-linux-gnueabi- oldconfig
# make ARCH=arm CROSS_COMPILE=arm-bcm2708-linux-gnueabi- 

Install kernel images and modules to "dummy root"
# make ARCH=arm CROSS_COMPILE=arm-bcm2708-linux-gnueabi- \
INSTALL_PATH=/usr/src/dummyroot/boot install
# make ARCH=arm CROSS_COMPILE=arm-bcm2708-linux-gnueabi- INSTALL_MOD_PATH=/usr/src/dummyroot modules_install

Lustre build against the rpi-3.6.y Linux source tree

Check out the LU-4199 patches and build Lustre with the cross-compile options
(compiled modules and binaries are installed to the dummy root directory as well)
# sh ./autogen.sh
# ./configure --with-linux=/usr/src/rpi-3.6.y/ --without-o2ib \
--host=arm-bcm2708-linux-gnueabi --prefix=/usr/src/dummyroot
# make ARCH=arm CROSS_COMPILE=arm-bcm2708-linux-gnueabi- CROSS_PATH=/usr/src/dummyroot 
# make ARCH=arm CROSS_COMPILE=arm-bcm2708-linux-gnueabi- CROSS_PATH=/usr/src/dummyroot install

Export "dummy root" directory for Raspberry Pi
# exportfs *:/usr/src/dummyroot

Target system (ARM, Raspberry Pi, Raspbian)

Boot the Raspberry Pi and log in to the system, then mount the host system's dummy root directory via NFS.
# mount <host system's IP address>:/usr/src/dummyroot /mnt

Install the kernel image, modules (including Lustre) and Lustre user utilities
# cp /mnt/boot/vmlinuz-3.6.11+ /boot/kernel.img
# rsync -av --exclude=/boot /mnt/ /
# depmod -a 3.6.11+
# sync; reboot

Now Lustre can be mounted from the Raspberry Pi board

root@raspberrypi:~# mount -t lustre 192.168.1.27@tcp:/lustre /lustre
root@raspberrypi:~# df -t lustre
Filesystem               1K-blocks  Used Available Use% Mounted on
192.168.1.27@tcp:/lustre    984248 38024    896224   5% /lustre

Comment by James A Simmons [ 04/Nov/13 ]

We can add the test for dump_trace back in. The test was broken before anyway, since it only ran on x86 platforms. Looking at the latest kernel code, only x86 and parisc have dump_trace.

Comment by James A Simmons [ 04/Nov/13 ]

I looked at the code more closely, and it appears dump_trace is x86-specific. The parisc platform only uses it internally.

Comment by Shuichi Ihara (Inactive) [ 04/Nov/13 ]

Thanks for checking, so "#ifdef CONFIG_X86" would be fine. We added this in linux-debug.c.
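
A minimal sketch of what such a guard might look like, written as a user-space program so it can be compiled anywhere. The function name sketch_dump_trace() and its messages are hypothetical and are not taken from linux-debug.c. CONFIG_X86 is normally set by the kernel build, so a plain user-space compile takes the fallback branch.

```c
#include <assert.h>
#include <stdio.h>

#ifdef CONFIG_X86
/* x86 kernels: this is where the real dump_trace() call would go. */
static int sketch_dump_trace(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "x86: would call dump_trace()");
    return 1;
}
#else
/* Other architectures (e.g. ARM) have no dump_trace(); report and skip. */
static int sketch_dump_trace(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "no dump_trace() on this architecture");
    return 0;
}
#endif
```

Callers use the same function on every architecture, and only the body differs, which keeps the #ifdef in one place.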

Comment by James A Simmons [ 22/Nov/13 ]

I see in your patches that you have implemented UMP versions of cfs_cpu_ht_nsiblings and cfs_cpt_table_print. What does cat /proc/sys/lnet/cpu_partition_table look like for you with the default function?

Comment by Li Xi (Inactive) [ 23/Nov/13 ]

Hi James,

Thanks for your review! The default cfs_cpu_ht_nsiblings and cfs_cpt_table_print only work for SMP. Since the uniprocessor version of "struct cfs_cpt_table" does not have the necessary fields, I am afraid it is impossible to use the default cfs_cpt_table_print() on uniprocessor systems. We had to add new versions of those functions, otherwise compilation fails.
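
As a rough illustration of why separate uniprocessor versions are needed, here is a sketch with hypothetical names (sketch_cpt_table_up, sketch_cpu_ht_nsiblings, sketch_cpt_table_print). The structure has no CPU or node masks to walk, so the functions can only report a single partition containing CPU 0. The output format is an assumption, not copied from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical uniprocessor table: no cpumask or nodemask fields. */
struct sketch_cpt_table_up {
    int ctb_nparts; /* always 1 on a uniprocessor system */
};

/* A single CPU has no hyper-thread siblings other than itself. */
static int sketch_cpu_ht_nsiblings(int cpu)
{
    (void)cpu;
    return 1;
}

/* Print partition 0 holding CPU 0; return bytes written or -EFBIG. */
static int sketch_cpt_table_print(const struct sketch_cpt_table_up *cptab,
                                  char *buf, int len)
{
    int rc;

    assert(cptab != NULL && buf != NULL);
    if (len <= 0)
        return -EFBIG;
    rc = snprintf(buf, (size_t)len, "%d\t: %d\n", 0, 0);
    if (rc < 0 || rc >= len)
        return -EFBIG; /* output did not fit in the buffer */
    return rc;
}
```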

Thanks!

Comment by James A Simmons [ 25/Nov/13 ]

Oh, I missed that struct cfs_cpt_table is different for the UMP case. I also see the dependency of HAVE_LIBCFS_CPT on CONFIG_SMP. I will change my review of your patch.

Comment by Li Xi (Inactive) [ 25/Nov/13 ]

James, Thanks for checking.

Comment by James A Simmons [ 25/Nov/13 ]

No problem. I also included Alexey on the review to make sure we don't break his Darwin port.

Comment by James A Simmons [ 06/Jan/14 ]

Patch http://review.whamcloud.com/#/c/8144 is ready for inspection and possible landing.

Comment by Li Xi (Inactive) [ 16/Jan/14 ]

I've split the patch into multiple patches.
1. LU-4199 libcfs: remove assertion of spin_is_locked()
http://review.whamcloud.com/#/c/8144
The newly merged nrs_tbf.c is updated.
2. LU-4199 libcfs: add wrapper of dump_trace()
http://review.whamcloud.com/#/c/8872
3. LU-4199 libcfs: add CPU table functions for uniprocessor
http://review.whamcloud.com/#/c/8873
4. LU-4199 libcfs: add wrapper of PAGE_SIZE
http://review.whamcloud.com/#/c/8877
This is a new problem when compiling Lustre on the Raspberry Pi. Compilation of the user-space utilities fails because PAGE_SIZE is missing. It seems we should not use PAGE_SIZE from user space, because page.h will not be exported to user space in the future.
5. LU-4199 build: add ARM support in build system
http://review.whamcloud.com/8878
Oleg has given some advice on simplifying the lustre-build-linux.m4 script. However, I don't have an MPSS or k1om test environment, so it would be easy for me to break the script when building on those architectures. Any ideas about how to improve it?
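
For item 4 above, one common way for user-space code to avoid the kernel's PAGE_SIZE macro is to ask the C library at run time. The sketch below uses the POSIX sysconf() call. The helper name sketch_page_size() is hypothetical, and this is not necessarily how the patch in 8877 implements its wrapper.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: query the page size at run time via sysconf(). */
static long sketch_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);

    assert(sz > 0); /* sysconf() returns -1 on error */
    return sz;
}
```

Querying at run time also avoids baking one page size into binaries that may run on kernels configured with a different one.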

We used to hit the following errors when compiling, but they are gone now.
/home/lixi/lustre/lustre-release.git/lustre/lov/lov_pack.c: In function 'lov_getstripe':
/home/lixi/lustre/lustre-release.git/lustre/lov/lov_pack.c:632:9: error: duplicate case value
/home/lixi/lustre/lustre-release.git/lustre/lov/lov_pack.c:632:9: error: previously used here

The kernel crashed when running "ls" on the Lustre client on the Raspberry Pi. lmv_read_entry() hit "Unable to handle kernel NULL pointer dereference at virtual address 00000003". We did not see that problem with version 2.5.50. We are trying to figure out in which version the problem started.

Comment by Li Xi (Inactive) [ 16/Jan/14 ]

It is confirmed that 4e0c8aeb94 (LU-3531 llite: move dir cache to MDC layer) is the first bad commit that shows the lmv_read_entry() crash.

Comment by Li Xi (Inactive) [ 20/Jan/14 ]

In the latest separate patches, I did not include the change that adds the packed attribute to some structures. The packed attributes fixed the following problem when compiling the Lustre code:
/home/lixi/lustre/lustre-release.git/lustre/lov/lov_pack.c: In function 'lov_getstripe':
/home/lixi/lustre/lustre-release.git/lustre/lov/lov_pack.c:632:9: error: duplicate case value
/home/lixi/lustre/lustre-release.git/lustre/lov/lov_pack.c:632:9: error: previously used here

Please note this problem happens when the Lustre version is 2.5.0. For some reason, it did not show up when I rebased the patches onto later versions (from 2.5.1 to 2.5.54+). I don't know why, so I did not commit that part of the change. I will investigate this further.
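
The ticket does not say which check produced the error. One common source of "duplicate case value" is a compile-time size assertion written as a switch statement, which fails when structure padding differs between ABIs. The sketch below shows that idiom together with the GCC/Clang packed attribute. The macro and structure names are hypothetical and the layouts are illustrative, not Lustre's.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical compile-time check in the switch/case style: when cond is
 * false, "case (cond)" collides with "case 0" and the compiler reports
 * "duplicate case value".
 */
#define SKETCH_CLASSERT(cond) \
    do { switch (42) { case (cond): case 0: break; } } while (0)

/* Trailing padding here is chosen by the compiler and depends on the ABI. */
struct sketch_unpacked {
    uint64_t id;
    uint16_t count;
};

/* Packed (GCC/Clang extension): no padding, so always 8 + 2 = 10 bytes. */
struct sketch_packed {
    uint64_t id;
    uint16_t count;
} __attribute__((packed));

static unsigned long sketch_packed_size(void)
{
    /* Holds on any ABI because the packed size is fixed. */
    SKETCH_CLASSERT(sizeof(struct sketch_packed) == 10);
    assert(sizeof(struct sketch_unpacked) >= sizeof(struct sketch_packed));
    return sizeof(struct sketch_packed);
}
```

Applying the same check to the unpacked structure would compile only on ABIs that happen to produce the expected size, which is the kind of architecture-dependent failure described above.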

Comment by Li Xi (Inactive) [ 29/Jan/14 ]

lmv_read_entry() crashes because the lli_lmv_md field of struct ll_inode_info is not initialized in ll_lli_init().

The following patch fixes this problem.
http://review.whamcloud.com/9042
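
The class of bug can be sketched as follows: a pointer field left with stale contents looks valid to a later NULL check, so the reader dereferences garbage. The structure and function names here are hypothetical stand-ins, and the actual change in 9042 may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the inode info structure. */
struct sketch_inode_info {
    void *lli_lmv_md; /* stands in for the real field's type */
};

/* Initialise every pointer field explicitly; never rely on old contents. */
static void sketch_lli_init(struct sketch_inode_info *lli)
{
    assert(lli != NULL);
    lli->lli_lmv_md = NULL;
}

/* Readers can only trust this check once the field has been initialised. */
static int sketch_has_lmv_md(const struct sketch_inode_info *lli)
{
    return lli->lli_lmv_md != NULL;
}
```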

Comment by James A Simmons [ 06/Feb/14 ]

While testing these patches I found a problem with struct cfs_cpt_table in libcfs/include/libcfs/libcfs_cpu.h. It has a nodemask_t that is not a pointer, but in the struct cfs_cpt_table in libcfs/include/libcfs/linux/libcfs_cpu.h it is a pointer. This can be fixed in one of two ways. We could change it to a pointer and then update the table allocation function. The other is to just remove the nodemask and cpumask from struct cfs_cpt_table, since they are not used. The reason this was caught is that the ost_setup function in ost_handler.c accesses ctb_nodemask directly. It really should use cfs_cpt_nodemask instead. Which would be the preferred fix?
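
To illustrate why going through an accessor helps, here is a sketch with hypothetical types and names (sketch_nodemask_t, sketch_cpt_nodemask, sketch_node_is_set). The real cfs_cpt_nodemask has its own signature, which is not reproduced here. Callers that use the accessor do not care whether the mask is embedded or pointed to, so the structure layout can change without touching them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node mask: one bit per NUMA node. */
typedef struct { unsigned long bits; } sketch_nodemask_t;

/* Pointer layout, as in the Linux-specific header described above. */
struct sketch_cpt_table {
    sketch_nodemask_t *ctb_nodemask;
};

/* Accessor: the only place that knows how the mask is stored. */
static sketch_nodemask_t *sketch_cpt_nodemask(struct sketch_cpt_table *cptab)
{
    assert(cptab != NULL);
    return cptab->ctb_nodemask;
}

/* Caller code uses the accessor instead of touching the field. */
static int sketch_node_is_set(struct sketch_cpt_table *cptab, unsigned int node)
{
    sketch_nodemask_t *mask = sketch_cpt_nodemask(cptab);

    if (mask == NULL || node >= 8 * sizeof(mask->bits))
        return 0;
    return (int)((mask->bits >> node) & 1UL);
}
```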

Comment by James A Simmons [ 28/Mar/14 ]

Only two patches left.

http://review.whamcloud.com/#/c/8144
http://review.whamcloud.com/#/c/8878

Comment by James A Simmons [ 03/Jul/14 ]

All that is left for this ticket is to see if Lustre can be cross-compiled.

Comment by James A Simmons [ 21/Dec/14 ]

Currently the Lustre master branch can be built natively on ARM/PPC. IMO cross-compiling can be handled in a separate ticket, since that functionality touches many types of platforms.

Comment by James A Simmons [ 17/Mar/15 ]

This ticket can be closed now.

Comment by Minh Diep [ 11/Mar/17 ]

close per request
