[LU-13389] aborted file create may leave unattached inodes on MDS. Created: 25/Mar/20 Updated: 07/May/20 Resolved: 07/May/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexander Zarochentsev | Assignee: | Alexander Zarochentsev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
aborted mdtest job leaves unattached inodes: Job Script: command started at Mon Mar 2 16:40:31 CST 2020
-- started at 03/02/2020 16:40:32 --
mdtest-1.9.3 was launched with 32 total task(s) on 1 node(s)
Command line used: /cray/css/ostest/binaries/xt/rel.70.aries.cray/xtcnl/ostest/ROOT.latest/tests/gold/ioperf/mdtest/mdtest -f 32 -l 32 -n10000 -i1 -d /lus/snx11281/disk/ostest.vers/alsorun.20200302130005.752.saturn-p4/CL_mdtest_4s_fo.4.tjgw4U.1583188829/CL_mdtest_4s_fo -tuv
V-1: main: Setting create/stat/read/remove_only to True
V-1: Entering valid_tests...
barriers : True
collective_creates : False
create_only : True
dirpath(s):
/lus/snx11281/disk/ostest.vers/alsorun.20200302130005.752.saturn-p4/CL_mdtest_4s_fo.4.tjgw4U.1583188829/CL_mdtest_4s_fo
dirs_only : True
read_bytes : 0
read_only : True
first : 32
files_only : True
iterations : 1
items_per_dir : 0
last : 32
leaf_only : False
items : 10000
nstride : 0
pre_delay : 0
remove_only : False
random_seed : 0
stride : 1
shared_file : False
time_unique_dir_overhead: True
stat_only : True
unique_dir_per_task : True
write_bytes : 0
sync_file : False
depth : 0
V-1: Entering display_freespace...
V-1: Entering show_file_system_size...
Path: /lus/snx11281/disk/ostest.vers/alsorun.20200302130005.752.saturn-p4/CL_mdtest_4s_fo.4.tjgw4U.1583188829
FS: 483.4 TiB Used FS: 1.5% Inodes: 487.0 Mi Used Inodes: 0.1%
32 tasks, 320000 files/directories
Operation Duration Rate
--------- -------- ----
V-1: main: * iteration 1 *
V-1: Entering create_remove_directory_tree, currDepth = 0...
V-1: Entering create_remove_directory_tree, currDepth = 1...
V-1: main: Tree creation : 0.011 sec, 88.658 ops/sec
V-1: Entering directory_test...
V-1: Entering unique_dir_access...
V-1: Entering create_remove_items, currDepth = 0...
V-1: Entering create_remove_items_helper...
V-1: Entering unique_dir_access...
V-1: Entering mdtest_stat...
V-1: Entering unique_dir_access...
V-1: Entering unique_dir_access...
V-1: Entering create_remove_items, currDepth = 0...
V-1: Entering create_remove_items_helper...
V-1: Entering unique_dir_access...
V-1: Directory creation: 25.644 sec, 12478.741 ops/sec
V-1: Directory stat : 4.194 sec, 76299.371 ops/sec
V-1: Directory removal : 8.918 sec, 35883.016 ops/sec
V-1: Entering file_test...
V-1: Entering unique_dir_access...
V-1: Entering create_remove_items, currDepth = 0...
V-1: Entering create_remove_items_helper...
aprun: Apid 5441901: Caught signal Terminated, sending to application
_pmiu_daemon(SIGCHLD): [NID 00545] [c0-1c2s8n1] [Mon Mar 2 18:15:52 2020] PE RANK 25 exit signal Terminated
Application 5441901 exit codes: 143
Application 5441901 resources: utime ~0s, stime ~6s, Rss ~11768, inblocks ~0, outblocks ~0
Job Script: command stopped at Mon Mar 2 18:16:38 CST 2020
Job Script: command runtime was 5767 seconds
e2fsck logs: [root@snx11281n000 ~]# grep Unattached /home/admin/e2fsck.* /home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056499 /home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056500 /home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056501 /home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056507 /home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056509 /home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056510 [root@snx11281n000 ~]# unattached inode stat: debugfs: stat <3367056499> Inode: 3367056499 Type: regular Mode: 0644 Flags: 0x0 Generation: 3271840411 Version: 0x000001ee:00bad136 User: 1356 Group: 11121 Project: 0 Size: 0 File ACL: 0 Links: 1 Blockcount: 0 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020 atime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020 mtime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020 crtime: 0x5e5da1e3:c1b98cf8 -- Mon Mar 2 18:16:35 2020 Size of extra inode fields: 32 Extended attributes: trusted.lma (24) = 00 00 00 00 00 00 00 00 6e 11 10 00 02 00 00 00 84 46 01 00 00 00 00 00 lma: fid=[0x20010116e:0x14684:0x0] compat=0 incompat=0 trusted.lov (144) trusted.link (60) BLOCKS: debugfs: stat <3367056500> Inode: 3367056500 Type: regular Mode: 0644 Flags: 0x0 Generation: 3271840414 Version: 0x000001ee:00bad13a User: 1356 Group: 11121 Project: 0 Size: 0 File ACL: 0 Links: 1 Blockcount: 0 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020 atime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020 mtime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020 crtime: 0x5e5da1e3:c1b98cf8 -- Mon Mar 2 18:16:35 2020 Size of extra inode fields: 32 Extended attributes: trusted.lma (24) = 00 00 00 00 00 00 00 00 6e 11 10 00 02 00 00 00 85 46 01 00 00 00 00 00 lma: fid=[0x20010116e:0x14685:0x0] compat=0 incompat=0 trusted.lov (144) trusted.link (61) BLOCKS: debugfs: stat <3367056501> Inode: 3367056501 Type: regular Mode: 0644 Flags: 0x0 Generation: 3271840413 Version: 0x000001ee:00bad143 User: 1356 Group: 11121 Project: 0 Size: 0 File ACL: 0 Links: 1 Blockcount: 0 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020 atime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020 mtime: 0x5e5d9f91:00000000 -- Mon Mar 2 18:06:41 2020 crtime: 0x5e5da1e3:c1b98cf8 -- Mon Mar 2 18:16:35 2020 Size of extra inode fields: 32 Extended attributes: trusted.lma (24) = 00 00 00 00 00 00 00 00 6e 11 10 00 02 00 00 00 83 46 01 00 00 00 00 00 lma: fid=[0x20010116e:0x14683:0x0] compat=0 incompat=0 trusted.lov (96) trusted.link (61) BLOCKS: debugfs: stat <3367056507> Inode: 3367056507 Type: regular Mode: 0644 Flags: 0x0 Generation: 3271840428 Version: 0x000001ee:00bad146 User: 1356 Group: 11121 Project: 0 Size: 0 File ACL: 0 Links: 1 Blockcount: 0 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x5e5d9f96:00000000 -- Mon Mar 2 18:06:46 2020 atime: 0x5e5d9f96:00000000 -- Mon Mar 2 18:06:46 2020 mtime: 0x5e5d9f96:00000000 -- Mon Mar 2 18:06:46 2020 crtime: 0x5e5da1e3:c1f69600 -- Mon Mar 2 18:16:35 2020 Size of extra inode fields: 32 Extended attributes: trusted.lma (24) = 00 00 00 00 00 00 00 00 6e 11 10 00 02 00 00 00 98 46 01 00 00 00 00 00 lma: fid=[0x20010116e:0x14698:0x0] compat=0 incompat=0 trusted.lov (144) trusted.link (61) BLOCKS: dmesg contains the following Lustre error message at the moment of inode creation: Mar 2 18:16:35 snx11281n002 kernel: LustreError: 2640:0:(osd_handler.c:2009:osd_trans_stop()) snx11281-MDT0000: failed in transaction hook: rc = -114 LFSCK was able to successfully re-attach the inodes: 00100000:10000000:1.0:1583880053.552740:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14682:0x0] with the name file.mdtest.17.4708 and type 100000 to the parent [0x200101166:0x1731:0x0]: rc = 1 00100000:10000000:1.0:1583880053.626588:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14683:0x0] with the name file.mdtest.11.4748 and type 100000 to the parent [0x200101166:0x1727:0x0]: rc = 1 00100000:10000000:9.0:1583880053.699071:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14684:0x0] with the name file.mdtest.0.4749 and type 100000 to the parent [0x200101166:0x1730:0x0]: rc = 1 00100000:10000000:6.0F:1583880053.764322:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14685:0x0] with the name file.mdtest.25.4668 and type 100000 to the parent [0x200101166:0x1721:0x0]: rc = 1 00100000:10000000:7.0:1583880053.830815:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14686:0x0] with the name file.mdtest.26.4699 and type 100000 to the parent [0x200101166:0x1725:0x0]: rc = 1 00100000:10000000:9.0:1583880053.963978:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14698:0x0] with the name file.mdtest.22.4696 and type 100000 to the parent [0x200101166:0x1716:0x0]: rc = 1 |
| Comments |
| Comment by Alexander Zarochentsev [ 25/Mar/20 ] |
|
The issue is a regression from A check for "obsolete" requests introduced in The "rq_obsolete" flag causes osd_txn_stop() to return error
tgt_add_reply_data():
...
if (req->rq_obsolete) {
mutex_unlock(&ted->ted_lcd_lock);
RETURN(-EALREADY);
}
full call stack looks as
and mdd_create() attempts to rollback the transaction:
out_stop:
rc2 = mdd_trans_stop(env, mdd, rc, handle);
if (rc == 0) {
/* If creation fails, it is most likely due to the remote update
* failure, because local transaction will mostly succeed at
* this stage. There is no easy way to rollback all of previous
* updates, so let's remove the object from namespace, and
* LFSCK should handle the orphan object. */
if (rc2 < 0 && !mdd_object_remote(mdd_pobj))
mdd_index_delete(env, mdd_pobj, attr, lname);
rc = rc2;
}
Unfortunately it is done partially, only name is removed. There are two bad side effects from that: fs inconsistency with unattached inodes and LFSCK is able to re-attach the inodes back to the namespace which is a bit unexpected. I should note those "obsolete" requests may not survive a failover as the client already dropped the request from all import's queues. It means the request cannot participate in recovery and can be lost after a failover. So transaction rollback doesn't look as a wrong approach, it reduces chances of seeing fs object loss after failover. Also nobody depends on successful completion of the file create, the application already aborted. I see several possible solutions to the problem: 1. rollback the transaction fully, that includes inode deletion and changelog record update or deletion. |
| Comment by Gerrit Updater [ 14/Apr/20 ] |
|
Alexander Zarochentsev (alexander.zarochentsev@hpe.com) uploaded a new patch: https://review.whamcloud.com/38221 |
| Comment by Gerrit Updater [ 07/May/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38221/ |
| Comment by Peter Jones [ 07/May/20 ] |
|
Landed for 2.14 |