[LU-17095] modules.order: No such file or directory, No rule to make target 'libcfs.ko' needed by 'all-am'
Created: 07/Sep/23  Updated: 23/Nov/23  Resolved: 28/Sep/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0, Lustre 2.15.4

Type: Bug Priority: Minor
Reporter: Andreas Dilger Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: build

Issue Links:
Related
Severity: 3

 Description   

Occasionally, when building Lustre packages (for both RHEL and Ubuntu), the build reports an error about a missing modules.order file and then fails with a "No rule to make target" error:

00:05:29.164 Type 'make' to build Lustre.
00:05:29.232 + make -j10 -s
00:05:29.244 Making all in .
00:06:22.277 cat: /tmp/rpmbuild-lustre-jenkins-fmslEJ8U/BUILD/lustre-2.14.0_ddn101//modules.order: No such file or directory
00:06:22.390 Making all in lustre-iokit
00:06:22.398 Making all in obdfilter-survey
00:06:22.401 Making all in sgpdd-survey
00:06:22.404 Making all in ost-survey
00:06:22.406 Making all in ior-survey
00:06:22.409 Making all in mds-survey
00:06:22.412 Making all in stats-collect
00:06:22.419 Making all in libcfs
00:06:22.426 Making all in libcfs
00:06:22.434 Making all in linux
00:06:22.438 Making all in util
00:06:22.441 Making all in crypto
00:06:22.446 make[5]: *** No rule to make target 'libcfs.ko', needed by 'all-am'.  Stop.

https://build.whamcloud.com/job/lustre-reviews/97659/arch=x86_64,build_type=client,distro=el9.2,ib_stack=inkernel/console
https://build.whamcloud.com/job/lustre-b_es6_0/514/arch=x86_64,build_type=client,distro=rocky9.2,ib_stack=inkernel/console
https://build.whamcloud.com/job/lustre-b_es6_0/520/arch=x86_64,build_type=client,distro=sles15sp4,ib_stack=mlx/console
https://build.whamcloud.com/job/lustre-b_es6_0/522/arch=x86_64,build_type=client,distro=rocky9.2,ib_stack=inkernel/console
https://build.whamcloud.com/job/lustre-b_es-reviews/12517/arch=x86_64,build_type=client,distro=el9.0,ib_stack=mlx/console
https://build.whamcloud.com/job/lustre-b_es-reviews/12531/arch=x86_64,build_type=client,distro=el9.1,ib_stack=inkernel/console
https://build.whamcloud.com/job/lustre-b_es-reviews/12541/arch=x86_64,build_type=client,distro=el9.1,ib_stack=inkernel/console
https://build.whamcloud.com/job/lustre-b_es-reviews/12574/arch=x86_64,build_type=client,distro=sles15sp4,ib_stack=inkernel/console



 Comments   
Comment by Andreas Dilger [ 07/Sep/23 ]

It isn't clear what is causing the problem, since the failure is intermittent. It appears to happen more often with newer kernels, and possibly with more concurrency (see "make -j10" in the description).

The modules.order file itself is not generated directly by the Lustre build; it appears to be part of the kernel build infrastructure:

[lustre-head]$ git grep modules.order
.gitignore:modules.order
build/.gitignore:/modules.order
ldiskfs/.gitignore:/modules.order
ldiskfs/autoMakefile.am:        rm -rf linux linux-stage ldiskfs*.h trace modules.order
lustre-iokit/.gitignore:/modules.order

[linux-git]$ git grep modules.order
.gitignore:modules.order
Documentation/dontdiff:modules.order
Documentation/kbuild/kbuild.rst:modules.order
Documentation/target/tcm_mod_builder.rst:    -rw-r--r-- 1 root root     49 2010-10-05 03:23 modules.order
Makefile:export MODORDER := $(extmod_prefix)modules.order
Makefile:       @sed 's:^\(.*\)\.o$$:kernel/\1.ko:' modules.order > $(MODLIB)/modules.order
Makefile:               -o -name '*.symtypes' -o -name 'modules.order' \
scripts/Makefile.build:subdir-modorder := $(sort $(filter %/modules.order, $(obj-m)))
scripts/Makefile.build:targets-for-modules += $(obj)/modules.order
scripts/Makefile.build:$(subdir-modorder): $(obj)/%/modules.order: $(obj)/% ;
scripts/Makefile.build:# Rule to create modules.order file
:

It seems possible that modules.order generation is racy at higher concurrency: one make process tries to read the file without a proper make dependency on it, while another make process has not created it yet.
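To make this hypothesis concrete, here is a minimal shell simulation. It assumes two parallel make jobs behave like two background shell jobs; the file layout only mimics Kbuild's per-directory modules.order and nothing in it comes from the Lustre tree. The "racy" half is timing-dependent, so its outcome is likely but not guaranteed; the "ordered" half always succeeds.

```shell
# Hypothetical simulation of the suspected race; names and paths are illustrative.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/child"

# Writer: stands in for the job that links the child module and then
# records it in the child directory's modules.order.
write_child_order() {
    sleep 1
    echo "kernel/child/child.ko" > "$demo/child/modules.order"
}

# Racy: the reader has no dependency on the writer, so it usually runs
# first and cat fails with "No such file or directory".
write_child_order &
writer=$!
if ! cat "$demo/child/modules.order" > "$demo/modules.order" 2>/dev/null; then
    echo "reader ran first: child modules.order did not exist yet" >&2
fi
wait "$writer"

# Ordered: the writer finishes before the reader starts, which is what a
# make prerequisite on the child's modules.order would guarantee.
rm -f "$demo/child/modules.order"
write_child_order
cat "$demo/child/modules.order" > "$demo/modules.order"
```

With a real make dependency the ordered case is what make would enforce regardless of the -j level.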

This is likely a bug in the kernel build system. It might be possible to work around it by building this file serially with something like "make modules.order" before running "make -j10 ...", but that command doesn't actually work.
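Since a dedicated "make modules.order" target is not available, one alternative worth considering is a serial retry: run the parallel build and, only if it fails, rerun make with -j1. This is a hypothetical mitigation sketch, not the LU-17095 patch; the function name and the MAKE_CMD override are illustrative assumptions, not Lustre build settings.

```shell
# Hypothetical mitigation sketch (not the LU-17095 fix): try the parallel
# build first and, only on failure, rerun make serially so each
# modules.order is written before anything reads it.
# MAKE_CMD is an illustrative override (e.g. for testing), not a Lustre variable.
build_with_serial_fallback() {
    njobs=${1:-10}
    make_cmd=${MAKE_CMD:-make}
    if $make_cmd -j"$njobs" -s; then
        return 0
    fi
    echo "parallel build (-j$njobs) failed; retrying with -j1" >&2
    # make is incremental, so the retry should mostly redo only targets
    # that are still missing or out of date.
    $make_cmd -j1 -s
}
```

Usage would be `build_with_serial_fallback 10` in place of `make -j10 -s`. The trade-off is that an intermittent failure costs an extra serial pass instead of a failed build, while genuine build errors still fail (twice).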

Comment by Andreas Dilger [ 07/Sep/23 ]

Digging a bit through the kernel build code, I found a few things that might be related.

I had removed modules.order at the start of a local build, and it appears that it was recreated during the later module linking stage:

# ls -l modules.order
ls: cannot access modules.order: No such file or directory
# fg
make -j10
  CC [M]  /usr/src/lustre-exa/lustre/obdclass/upcall_cache.o
  CC [M]  /usr/src/lustre-exa/lustre/ptlrpc/llog_server.o
  LD [M]  /usr/src/lustre-exa/lustre/quota/lquota.o
:
  CC [M]  /usr/src/lustre-exa/lustre/ptlrpc/gss/gss_keyring.o
  LD [M]  /usr/src/lustre-exa/lustre/ptlrpc/ptlrpc.o
  LD [M]  /usr/src/lustre-exa/lustre/ptlrpc/gss/ptlrpc_gss.o
  Building modules, stage 2.
  MODPOST 34 modules
  CC      /usr/src/lustre-exa/ldiskfs/ldiskfs.mod.o
  CC      /usr/src/lustre-exa/libcfs/libcfs/libcfs.mod.o
  CC      /usr/src/lustre-exa/lnet/klnds/o2iblnd/ko2iblnd.mod.o
:
^Z
[1]+  Stopped                 make -j10
[root@centos7 lustre-exa]# ls -l modules.order
4 -rwxrwx---. 1 root 1686 Sep  6 22:08 modules.order*

While searching Documentation/kbuild/kbuild.rst for MODPOST, I found a few things of interest:

modules.order
-------------
This file records the order in which modules appear in Makefiles. This
is used by modprobe to deterministically resolve aliases that match
multiple modules.

KBUILD_MODPOST_NOFINAL
----------------------
KBUILD_MODPOST_NOFINAL can be set to skip the final link of modules.
This is solely useful to speed up test compiles.

KBUILD_VERBOSE
--------------
Set the kbuild verbosity. Can be assigned same values as "V=...".

Removing modules.order and running "make V=1" produces output like the following for all of the modules:

make -f scripts/Makefile.build obj=/usr/src/lustre-exa
make -f scripts/Makefile.build obj=/usr/src/lustre-exa/ldiskfs
(cat /dev/null;   echo kernel//usr/src/lustre-exa/ldiskfs/ldiskfs.ko;) > /usr/src/lustre-exa/ldiskfs/modules.order
make -f scripts/Makefile.build obj=/usr/src/lustre-exa/libcfs
make -f scripts/Makefile.build obj=/usr/src/lustre-exa/libcfs/libcfs
(cat /dev/null;   echo kernel//usr/src/lustre-exa/libcfs/libcfs/libcfs.ko;) > /usr/src/lustre-exa/libcfs/libcfs/modules.order
(cat /dev/null;   cat /usr/src/lustre-exa/libcfs/libcfs/modules.order;) > /usr/src/lustre-exa/libcfs/modules.order
:
:
(cat /dev/null;   cat /usr/src/lustre-exa/lustre/fid/modules.order;   cat /usr/src/lustre-exa/lustre/obdclass/modules.order;   cat /usr/src/lustre-exa/lustre/ptlrpc/modules.order;   cat /usr/src/lustre-exa/lustre/obdecho/modules.order;   cat /usr/src/lustre-exa/lustre/mgc/modules.order;   cat /usr/src/lustre-exa/lustre/tests/kernel/modules.order;   cat /usr/src/lustre-exa/lustre/ost/modules.order;   cat /usr/src/lustre-exa/lustre/mgs/modules.order;   cat /usr/src/lustre-exa/lustre/mdt/modules.order;   cat /usr/src/lustre-exa/lustre/mdd/modules.order;   cat /usr/src/lustre-exa/lustre/ofd/modules.order;   cat /usr/src/lustre-exa/lustre/quota/modules.order;   cat /usr/src/lustre-exa/lustre/osp/modules.order;   cat /usr/src/lustre-exa/lustre/lod/modules.order;   cat /usr/src/lustre-exa/lustre/lfsck/modules.order;   cat /usr/src/lustre-exa/lustre/lov/modules.order;   cat /usr/src/lustre-exa/lustre/osc/modules.order;   cat /usr/src/lustre-exa/lustre/mdc/modules.order;   cat /usr/src/lustre-exa/lustre/lmv/modules.order;   cat /usr/src/lustre-exa/lustre/llite/modules.order;   cat /usr/src/lustre-exa/lustre/fld/modules.order;   cat /usr/src/lustre-exa/lustre/lz4/modules.order;   cat /usr/src/lustre-exa/lustre/gzip/modules.order;   cat /usr/src/lustre-exa/lustre/osd-ldiskfs/modules.order;) > /usr/src/lustre-exa/lustre/modules.order
(cat /dev/null;   cat /usr/src/lustre-exa/ldiskfs/modules.order;   cat /usr/src/lustre-exa/libcfs/modules.order;   cat /usr/src/lustre-exa/lnet/modules.order;   cat /usr/src/lustre-exa/lustre/modules.order;) > /usr/src/lustre-exa/modules.order
make -f ./scripts/Makefile.modpost
  find /usr/src/lustre-exa/.tmp_versions -name '*.mod' | xargs -r grep -h '\.ko$' | sort -u | sed 's/\.ko$/.o/' | scripts/mod/modpost -m -a -i ./Module.symvers -I /usr/src/lustre-exa/Module.symvers  -o /usr/src/lustre-exa/Module.symvers  -w -s -T -

So this is definitely the part of the build that is causing the problem, but I couldn't see what was going wrong.
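The V=1 log shows a two-level pattern: each leaf directory writes its own modules.order, and each parent concatenates its children's files in Makefile order. The sketch below mimics the logged commands with hypothetical helper names (write_leaf_order, write_parent_order) and a two-directory subset of the tree; it illustrates the pattern, not Kbuild's actual rule. It also shows how a missing child file produces the "No such file or directory" message.

```shell
# Illustrative sketch of the modules.order aggregation seen in the V=1 log.
# Helper names and the directory subset are hypothetical.

# Leaf: mimics "(cat /dev/null; echo kernel/DIR/MOD.ko;) > DIR/modules.order"
write_leaf_order() {
    mkdir -p "$1"
    (cat /dev/null; echo "kernel/$1/$2.ko";) > "$1/modules.order"
}

# Parent: mimics "(cat /dev/null; cat CHILD/modules.order; ...) > PARENT/modules.order".
# A missing child makes cat print "No such file or directory". Only the last
# command sets the subshell's exit status, so a missing child that is not
# last still yields a "successful" but shorter list.
write_parent_order() {
    parent=$1
    shift
    mkdir -p "$parent"
    (cat /dev/null; for child in "$@"; do cat "$child/modules.order"; done) > "$parent/modules.order"
}

# Build a tiny tree bottom-up, in the order the log shows.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
write_leaf_order libcfs/libcfs libcfs
write_parent_order libcfs libcfs/libcfs
write_leaf_order lnet/lnet lnet
write_parent_order lnet lnet/lnet
write_parent_order . libcfs lnet
cat modules.order
```

Running it prints kernel/libcfs/libcfs/libcfs.ko followed by kernel/lnet/lnet/lnet.ko. If a parent's command runs before one of its children has been written, the parent's list silently loses that module, which would be consistent with a later "No rule to make target" error, though this sketch does not prove that is what happens in Kbuild.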

It seems likely that this behavior depends on the kernel the modules are built against (I'm testing with el8.7).

Comment by Gerrit Updater [ 08/Sep/23 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52323
Subject: LU-17095 build: avoid modules.order nonexistence failure
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 61e9a196313f788a1d86f4672be1fdad8e5298c2

Comment by Gerrit Updater [ 28/Sep/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52323/
Subject: LU-17095 build: avoid modules.order nonexistence failure
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: dbe4f860977455a9abe50165645a025bb6c46350

Comment by Peter Jones [ 28/Sep/23 ]

Landed for 2.16

Comment by Gerrit Updater [ 28/Sep/23 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52545
Subject: LU-17095 build: avoid modules.order nonexistence failure
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 3d581b2f5a2516c55b6c93033b6724f941fb5079

Comment by Gerrit Updater [ 23/Nov/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52545/
Subject: LU-17095 build: avoid modules.order nonexistence failure
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: 25b536a8c26c5209b1bd54e0cf9cf3aa0b829bad

Generated at Sat Feb 10 03:32:35 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.