Details
Type: Bug
Resolution: Fixed
Priority: Minor
Affects Version: Lustre 2.15.6
Environment:
kernel-4.18.0-553.34.1.1toss.t4.x86_64
zfs-2.2.7_1llnl-2.t4.x86_64
lustre-2.15.6_4.llnl-2.t4.x86_64
Severity: 3
Description
We have many nodes that mount multiple Lustre file systems. When a user issues an lfs command that targets a particular file system, for example lfs path2fid, the code enters get_root_path_slow() and issues a statx() call against each mounted Lustre file system in turn until it reaches the target file system. If the client is disconnected from MDT0000 of one of the file systems checked earlier, that statx() hangs and the command never reaches the target file system.
[root@rzslic9:~]# strace -ftT lfs fid2path /p/lustre1 [0x24006b523:0x5ecf:0x0]
...
11:37:09 read(3, "latime,vers=3,rsize=65536,wsize="..., 1024) = 1024 <0.000050>
11:37:09 read(3, "ize=65536,wsize=65536,namlen=255"..., 1024) = 1024 <0.000030>
11:37:09 read(3, "e=65536,namlen=255,hard,proto=tc"..., 1024) = 1024 <0.000033>
11:37:09 read(3, "=255,hard,proto=tcp,timeo=600,re"..., 1024) = 1024 <0.000035>
11:37:09 read(3, "255,hard,proto=tcp,timeo=600,ret"..., 1024) = 1024 <0.000044>
11:37:09 statx(AT_FDCWD, "/p/czlustre1", AT_STATX_SYNC_AS_STAT, 0, {stx_mask=0, stx_attributes=0, stx_mode=S_IFDIR|0755, stx_size=1074176, ...}) = 0 <0.000049>
11:37:09 statx(AT_FDCWD, "/p/czlustre2", AT_STATX_SYNC_AS_STAT, 0, {stx_mask=0, stx_attributes=0, stx_mode=S_IFDIR|0755, stx_size=1057280, ...}) = 0 <0.000023>
11:37:09 statx(AT_FDCWD, "/p/czlustre3", AT_STATX_SYNC_AS_STAT, 0, {stx_mask=0, stx_attributes=0, stx_mode=S_IFDIR|0755, stx_size=1073664, ...}) = 0 <0.000018>
11:37:09 statx(AT_FDCWD, "/p/czlustre4", AT_STATX_SYNC_AS_STAT, 0, <<HUNG HERE>>
Another affected command is lfs df <mountpoint>. The stack is:
(gdb) bt
#0 0x00001555539afedf in statx () from /lib64/libc.so.6
#1 0x00001555550ec3ef in get_file_dev (path=<optimized out>, dev=0x7fffffff8b38) at liblustreapi.c:1222
#2 0x00001555550f44c8 in get_root_path_slow (want=want@entry=3, fsname=fsname@entry=0x7fffffffbd60 "", outfd=outfd@entry=0x0, path=path@entry=0x7fffffff9d60 "/p/lustre1", index=index@entry=-1, dev=dev@entry=0x0, nid=0x0) at liblustreapi.c:1357
#3 0x00001555550f4aac in get_root_path (want=3, fsname=0x7fffffffbd60 "", outfd=0x0, path=0x7fffffff9d60 "/p/lustre1", index=-1, dev=0x0, nid=0x0) at liblustreapi.c:1444
#4 0x00001555550f4c69 in llapi_search_mounts (pathname=pathname@entry=0x7fffffffad60 "/p/lustre1", index=index@entry=0, mntdir=mntdir@entry=0x7fffffff9d60 "/p/lustre1", fsname=fsname@entry=0x7fffffffbd60 "") at liblustreapi.c:1487
#5 0x000000000040e152 in lfs_df (argc=<optimized out>, argv=0x7fffffffcec0) at lfs.c:7269
#6 0x0000155555105511 in Parser_execarg (argc=argc@entry=2, argv=argv@entry=0x7fffffffcec0, cmds=cmds@entry=0x62d6e0 <cmdlist>) at util/parser.c:118
#7 0x0000000000404e6c in main (argc=3, argv=0x7fffffffceb8) at lfs.c:12737
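To illustrate the general idea of not touching unrelated mounts, here is a minimal sketch in C that picks the mount point for a path by string comparison against the mount table, so that no statx() is issued on any other file system's root. This is not the code from the patch for this ticket and not part of liblustreapi; path_under_mount() and find_lustre_mount() are hypothetical names, and the sketch assumes the path is already absolute and normalized.

```c
#include <mntent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return true if path is mnt_dir itself or lies
 * beneath it. Pure string comparison, so no file system is contacted. */
bool path_under_mount(const char *path, const char *mnt_dir)
{
	size_t len = strlen(mnt_dir);

	if (len == 0 || strncmp(path, mnt_dir, len) != 0)
		return false;
	/* A mount point ending in '/' (such as "/") needs no boundary check. */
	if (mnt_dir[len - 1] == '/')
		return true;
	/* Reject "/p/lustre10" when mnt_dir is "/p/lustre1". */
	return path[len] == '\0' || path[len] == '/';
}

/* Hypothetical helper: scan the mount table file mtab and copy the longest
 * Lustre mount point containing path into out. Returns 0 on success and
 * -1 if the table cannot be opened or no Lustre mount matches. */
int find_lustre_mount(const char *mtab, const char *path,
		      char *out, size_t outlen)
{
	FILE *fp = setmntent(mtab, "r");
	struct mntent *ent;
	size_t best = 0;

	if (fp == NULL)
		return -1;

	while ((ent = getmntent(fp)) != NULL) {
		size_t len = strlen(ent->mnt_dir);

		if (strcmp(ent->mnt_type, "lustre") != 0)
			continue;
		if (!path_under_mount(path, ent->mnt_dir))
			continue;
		/* Keep the longest match that fits in the output buffer. */
		if (len > best && len < outlen) {
			strcpy(out, ent->mnt_dir);
			best = len;
		}
	}
	endmntent(fp);

	return best > 0 ? 0 : -1;
}
```

A caller could pass "/proc/mounts" as mtab and then issue statx() only on the returned mount point, so a disconnected MDT0000 on some other file system is never contacted.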
"Olaf Faaland <faaland1@llnl.gov>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58321
Subject: LU-18738 utils: avoid statx() of root of mounted FS
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 342d738245897f8e10079b27495ebe91d1b0ba69