Details
Bug
Resolution: Fixed
Major
Description
aborted mdtest job leaves unattached inodes:
Job Script: command started at Mon Mar 2 16:40:31 CST 2020
-- started at 03/02/2020 16:40:32 --
mdtest-1.9.3 was launched with 32 total task(s) on 1 node(s)
Command line used: /cray/css/ostest/binaries/xt/rel.70.aries.cray/xtcnl/ostest/ROOT.latest/tests/gold/ioperf/mdtest/mdtest -f 32 -l 32 -n10000 -i1 -d /lus/snx11281/disk/ostest.vers/alsorun.20200302130005.752.saturn-p4/CL_mdtest_4s_fo.4.tjgw4U.1583188829/CL_mdtest_4s_fo -tuv
V-1: main: Setting create/stat/read/remove_only to True
V-1: Entering valid_tests...
barriers : True
collective_creates : False
create_only : True
dirpath(s): /lus/snx11281/disk/ostest.vers/alsorun.20200302130005.752.saturn-p4/CL_mdtest_4s_fo.4.tjgw4U.1583188829/CL_mdtest_4s_fo
dirs_only : True
read_bytes : 0
read_only : True
first : 32
files_only : True
iterations : 1
items_per_dir : 0
last : 32
leaf_only : False
items : 10000
nstride : 0
pre_delay : 0
remove_only : False
random_seed : 0
stride : 1
shared_file : False
time_unique_dir_overhead: True
stat_only : True
unique_dir_per_task : True
write_bytes : 0
sync_file : False
depth : 0
V-1: Entering display_freespace...
V-1: Entering show_file_system_size...
Path: /lus/snx11281/disk/ostest.vers/alsorun.20200302130005.752.saturn-p4/CL_mdtest_4s_fo.4.tjgw4U.1583188829
FS: 483.4 TiB Used FS: 1.5% Inodes: 487.0 Mi Used Inodes: 0.1%
32 tasks, 320000 files/directories
Operation Duration Rate
--------- -------- ----
V-1: main: * iteration 1 *
V-1: Entering create_remove_directory_tree, currDepth = 0...
V-1: Entering create_remove_directory_tree, currDepth = 1...
V-1: main: Tree creation : 0.011 sec, 88.658 ops/sec
V-1: Entering directory_test...
V-1: Entering unique_dir_access...
V-1: Entering create_remove_items, currDepth = 0...
V-1: Entering create_remove_items_helper...
V-1: Entering unique_dir_access...
V-1: Entering mdtest_stat...
V-1: Entering unique_dir_access...
V-1: Entering unique_dir_access...
V-1: Entering create_remove_items, currDepth = 0...
V-1: Entering create_remove_items_helper...
V-1: Entering unique_dir_access...
V-1: Directory creation: 25.644 sec, 12478.741 ops/sec
V-1: Directory stat : 4.194 sec, 76299.371 ops/sec
V-1: Directory removal : 8.918 sec, 35883.016 ops/sec
V-1: Entering file_test...
V-1: Entering unique_dir_access...
V-1: Entering create_remove_items, currDepth = 0...
V-1: Entering create_remove_items_helper...
aprun: Apid 5441901: Caught signal Terminated, sending to application
_pmiu_daemon(SIGCHLD): [NID 00545] [c0-1c2s8n1] [Mon Mar 2 18:15:52 2020] PE RANK 25 exit signal Terminated
Application 5441901 exit codes: 143
Application 5441901 resources: utime ~0s, stime ~6s, Rss ~11768, inblocks ~0, outblocks ~0
Job Script: command stopped at Mon Mar 2 18:16:38 CST 2020
Job Script: command runtime was 5767 seconds
e2fsck logs:
[root@snx11281n000 ~]# grep Unattached /home/admin/e2fsck.*
/home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056499
/home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056500
/home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056501
/home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056507
/home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056509
/home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056510
[root@snx11281n000 ~]#
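To go from the e2fsck output above to the per-inode `debugfs` queries, the inode numbers can be pulled out mechanically. The following is a minimal sketch, not part of the original report: the helper name `unattached_inodes` and the shortened `sample` string are illustrative, and it only assumes the `Unattached inode <number>` wording seen in the log.

```python
import re

# Matches the message format seen in the e2fsck log above.
UNATTACHED_RE = re.compile(r"Unattached inode (\d+)")

def unattached_inodes(text):
    """Return the sorted, de-duplicated inode numbers reported as unattached."""
    return sorted({int(num) for num in UNATTACHED_RE.findall(text)})

# Two entries copied from the report; a real run would read the e2fsck log file.
sample = (
    "/home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:"
    "Unattached inode 3367056499\n"
    "/home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:"
    "Unattached inode 3367056500\n"
)

# Emit one debugfs "stat" request per inode, in the <inode> form used below.
for ino in unattached_inodes(sample):
    print(f"stat <{ino}>")
```

The regular expression does not depend on line breaks, so it works equally well on a log that has been pasted as a single line.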
unattached inode stat:
debugfs: stat <3367056499>
Inode: 3367056499   Type: regular   Mode: 0644   Flags: 0x0
Generation: 3271840411   Version: 0x000001ee:00bad136
User: 1356   Group: 11121   Project: 0   Size: 0
File ACL: 0
Links: 1   Blockcount: 0
Fragment: Address: 0   Number: 0   Size: 0
ctime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020
atime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020
mtime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020
crtime: 0x5e5da1e3:c1b98cf8 -- Mon Mar 2 18:16:35 2020
Size of extra inode fields: 32
Extended attributes:
  trusted.lma (24) = 00 00 00 00 00 00 00 00 6e 11 10 00 02 00 00 00 84 46 01 00 00 00 00 00
  lma: fid=[0x20010116e:0x14684:0x0] compat=0 incompat=0
  trusted.lov (144)
  trusted.link (60)
BLOCKS:

debugfs: stat <3367056500>
Inode: 3367056500   Type: regular   Mode: 0644   Flags: 0x0
Generation: 3271840414   Version: 0x000001ee:00bad13a
User: 1356   Group: 11121   Project: 0   Size: 0
File ACL: 0
Links: 1   Blockcount: 0
Fragment: Address: 0   Number: 0   Size: 0
ctime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020
atime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020
mtime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020
crtime: 0x5e5da1e3:c1b98cf8 -- Mon Mar 2 18:16:35 2020
Size of extra inode fields: 32
Extended attributes:
  trusted.lma (24) = 00 00 00 00 00 00 00 00 6e 11 10 00 02 00 00 00 85 46 01 00 00 00 00 00
  lma: fid=[0x20010116e:0x14685:0x0] compat=0 incompat=0
  trusted.lov (144)
  trusted.link (61)
BLOCKS:

debugfs: stat <3367056501>
Inode: 3367056501   Type: regular   Mode: 0644   Flags: 0x0
Generation: 3271840413   Version: 0x000001ee:00bad143
User: 1356   Group: 11121   Project: 0   Size: 0
File ACL: 0
Links: 1   Blockcount: 0
Fragment: Address: 0   Number: 0   Size: 0
ctime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020
atime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020
mtime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020
crtime: 0x5e5da1e3:c1b98cf8 -- Mon Mar 2 18:16:35 2020
Size of extra inode fields: 32
Extended attributes:
  trusted.lma (24) = 00 00 00 00 00 00 00 00 6e 11 10 00 02 00 00 00 83 46 01 00 00 00 00 00
  lma: fid=[0x20010116e:0x14683:0x0] compat=0 incompat=0
  trusted.lov (96)
  trusted.link (61)
BLOCKS:

debugfs: stat <3367056507>
Inode: 3367056507   Type: regular   Mode: 0644   Flags: 0x0
Generation: 3271840428   Version: 0x000001ee:00bad146
User: 1356   Group: 11121   Project: 0   Size: 0
File ACL: 0
Links: 1   Blockcount: 0
Fragment: Address: 0   Number: 0   Size: 0
ctime: 0x5e5d9f96:00000000 -- Mon Mar 2 18:06:46 2020
atime: 0x5e5d9f96:00000000 -- Mon Mar 2 18:06:46 2020
mtime: 0x5e5d9f96:00000000 -- Mon Mar 2 18:06:46 2020
crtime: 0x5e5da1e3:c1f69600 -- Mon Mar 2 18:16:35 2020
Size of extra inode fields: 32
Extended attributes:
  trusted.lma (24) = 00 00 00 00 00 00 00 00 6e 11 10 00 02 00 00 00 98 46 01 00 00 00 00 00
  lma: fid=[0x20010116e:0x14698:0x0] compat=0 incompat=0
  trusted.lov (144)
  trusted.link (61)
BLOCKS:
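The raw `trusted.lma` bytes and the decoded `lma: fid=[...]` line in the debugfs output carry the same information. As a minimal sketch, the decoding can be reproduced by hand, assuming the 24-byte value is laid out as little-endian `compat` (32 bits), `incompat` (32 bits), FID sequence (64 bits), FID object ID (32 bits) and FID version (32 bits). That layout is inferred here from the dump matching debugfs's own decoded line; `decode_lma` is an illustrative helper, not a Lustre or e2fsprogs API.

```python
import struct

def decode_lma(hex_dump):
    """Decode a 24-byte trusted.lma value given as space-separated hex bytes.

    Assumed layout (little-endian): u32 compat, u32 incompat,
    u64 FID sequence, u32 FID object id, u32 FID version.
    """
    raw = bytes.fromhex(hex_dump)
    if len(raw) != 24:
        raise ValueError(f"expected 24 bytes, got {len(raw)}")
    compat, incompat, seq, oid, ver = struct.unpack("<IIQII", raw)
    fid = f"[{seq:#x}:{oid:#x}:{ver:#x}]"
    return compat, incompat, fid

# Bytes copied from the stat output of inode 3367056499.
lma_hex = "00 00 00 00 00 00 00 00 6e 11 10 00 02 00 00 00 84 46 01 00 00 00 00 00"
compat, incompat, fid = decode_lma(lma_hex)
print(f"lma: fid={fid} compat={compat} incompat={incompat}")
```

For this inode the result is `[0x20010116e:0x14684:0x0]`, which agrees with the FID printed by debugfs and with one of the objects that LFSCK later re-inserts.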
dmesg contains the following Lustre error message at the moment of inode creation:
Mar 2 18:16:35 snx11281n002 kernel: LustreError: 2640:0:(osd_handler.c:2009:osd_trans_stop()) snx11281-MDT0000: failed in transaction hook: rc = -114
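The `rc = -114` in this message is a negated Linux errno value. On Linux, errno 114 is `EALREADY`. A small sketch for looking such codes up follows; `describe_rc` is an illustrative helper, and because Python's `errno` table reflects the platform it runs on, the lookup only matches the kernel's numbering when run on Linux.

```python
import errno
import os

def describe_rc(rc):
    """Map a negative kernel return code to (errno name, description).

    Uses the errno table of the platform running this script, so the
    result matches the kernel log only on Linux.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

name, text = describe_rc(-114)
print(f"rc = -114 -> {name} ({text})")
```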
LFSCK was able to successfully re-attach the inodes:
00100000:10000000:1.0:1583880053.552740:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14682:0x0] with the name file.mdtest.17.4708 and type 100000 to the parent [0x200101166:0x1731:0x0]: rc = 1
00100000:10000000:1.0:1583880053.626588:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14683:0x0] with the name file.mdtest.11.4748 and type 100000 to the parent [0x200101166:0x1727:0x0]: rc = 1
00100000:10000000:9.0:1583880053.699071:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14684:0x0] with the name file.mdtest.0.4749 and type 100000 to the parent [0x200101166:0x1730:0x0]: rc = 1
00100000:10000000:6.0F:1583880053.764322:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14685:0x0] with the name file.mdtest.25.4668 and type 100000 to the parent [0x200101166:0x1721:0x0]: rc = 1
00100000:10000000:7.0:1583880053.830815:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14686:0x0] with the name file.mdtest.26.4699 and type 100000 to the parent [0x200101166:0x1725:0x0]: rc = 1
00100000:10000000:9.0:1583880053.963978:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14698:0x0] with the name file.mdtest.22.4696 and type 100000 to the parent [0x200101166:0x1716:0x0]: rc = 1
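To cross-check the FIDs that LFSCK re-attached against the FIDs found on the unattached inodes, the insert messages can be parsed into records. This is a minimal sketch that relies only on the message wording shown above; `parse_inserts` is an illustrative helper, and reading the `type` field as an octal file mode (so that `100000` means a regular file) is an assumption.

```python
import re
import stat

FID = r"\[(0x[0-9a-f]+:0x[0-9a-f]+:0x[0-9a-f]+)\]"
INSERT_RE = re.compile(
    r"namespace LFSCK insert object " + FID +
    r" with the name (\S+) and type ([0-7]+)"
    r" to the parent " + FID + r": rc = (-?\d+)"
)

def parse_inserts(log):
    """Extract (fid, name, is_regular, parent, rc) records from LFSCK debug log text."""
    records = []
    for fid, name, ftype, parent, rc in INSERT_RE.findall(log):
        records.append({
            "fid": fid,
            "name": name,
            # Assumption: the type is printed in octal, like a st_mode file type.
            "is_regular": stat.S_ISREG(int(ftype, 8)),
            "parent": parent,
            "rc": int(rc),
        })
    return records

# One message copied from the log above.
sample_log = (
    "00100000:10000000:9.0:1583880053.699071:0:31921:0:"
    "(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) "
    "snx11281-MDT0000-osd: namespace LFSCK insert object "
    "[0x20010116e:0x14684:0x0] with the name file.mdtest.0.4749 and type 100000 "
    "to the parent [0x200101166:0x1730:0x0]: rc = 1"
)

for rec in parse_inserts(sample_log):
    print(rec["fid"], "->", rec["parent"], rec["name"], "rc =", rec["rc"])
```

Here the parsed FID `0x20010116e:0x14684:0x0` is the same FID that debugfs reports in the `lma` attribute of inode 3367056499.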
Landed for 2.14