Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-13389

aborted file create may leave unattached inodes on MDS.

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.14.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      aborted mdtest job leaves unattached inodes:

      Job Script: command started at Mon Mar 2 16:40:31 CST 2020
      
      -- started at 03/02/2020 16:40:32 --
      
       
      
      mdtest-1.9.3 was launched with 32 total task(s) on 1 node(s)
      
      Command line used: /cray/css/ostest/binaries/xt/rel.70.aries.cray/xtcnl/ostest/ROOT.latest/tests/gold/ioperf/mdtest/mdtest -f 32 -l 32 -n10000 -i1 -d /lus/snx11281/disk/ostest.vers/alsorun.20200302130005.752.saturn-p4/CL_mdtest_4s_fo.4.tjgw4U.1583188829/CL_mdtest_4s_fo -tuv
      
      V-1: main: Setting create/stat/read/remove_only to True
      
      V-1: Entering valid_tests...
      
      barriers                : True
      
      collective_creates      : False
      
      create_only             : True
      
      dirpath(s):
      
          /lus/snx11281/disk/ostest.vers/alsorun.20200302130005.752.saturn-p4/CL_mdtest_4s_fo.4.tjgw4U.1583188829/CL_mdtest_4s_fo
      
      dirs_only               : True
      
      read_bytes              : 0
      
      read_only               : True
      
      first                   : 32
      
      files_only              : True
      
      iterations              : 1
      
      items_per_dir           : 0
      
      last                    : 32
      
      leaf_only               : False
      
      items                   : 10000
      
      nstride                 : 0
      
      pre_delay               : 0
      
      remove_only             : False
      
      random_seed             : 0
      
      stride                  : 1
      
      shared_file             : False
      
      time_unique_dir_overhead: True
      
      stat_only               : True
      
      unique_dir_per_task     : True
      
      write_bytes             : 0
      
      sync_file               : False
      
      depth                   : 0
      
      V-1: Entering display_freespace...
      
      V-1: Entering show_file_system_size...
      
      Path: /lus/snx11281/disk/ostest.vers/alsorun.20200302130005.752.saturn-p4/CL_mdtest_4s_fo.4.tjgw4U.1583188829
      
      FS: 483.4 TiB   Used FS: 1.5%   Inodes: 487.0 Mi   Used Inodes: 0.1%
      
       
      
      32 tasks, 320000 files/directories
      
       
      
         Operation               Duration              Rate
      
         ---------               --------              ----
      
      V-1: main: * iteration 1 *
      
      V-1: Entering create_remove_directory_tree, currDepth = 0...
      
      V-1: Entering create_remove_directory_tree, currDepth = 1...
      
      V-1: main:   Tree creation     :          0.011 sec,         88.658 ops/sec
      
      V-1: Entering directory_test...
      
      V-1: Entering unique_dir_access...
      
      V-1: Entering create_remove_items, currDepth = 0...
      
      V-1: Entering create_remove_items_helper...
      
      V-1: Entering unique_dir_access...
      
      V-1: Entering mdtest_stat...
      
      V-1: Entering unique_dir_access...
      
      V-1: Entering unique_dir_access...
      
      V-1: Entering create_remove_items, currDepth = 0...
      
      V-1: Entering create_remove_items_helper...
      
      V-1: Entering unique_dir_access...
      
      V-1:   Directory creation:         25.644 sec,      12478.741 ops/sec
      
      V-1:   Directory stat    :          4.194 sec,      76299.371 ops/sec
      
      V-1:   Directory removal :          8.918 sec,      35883.016 ops/sec
      
      V-1: Entering file_test...
      
      V-1: Entering unique_dir_access...
      
      V-1: Entering create_remove_items, currDepth = 0...
      
      V-1: Entering create_remove_items_helper...
      
      aprun: Apid 5441901: Caught signal Terminated, sending to application
      
      _pmiu_daemon(SIGCHLD): [NID 00545] [c0-1c2s8n1] [Mon Mar  2 18:15:52 2020] PE RANK 25 exit signal Terminated
      
      Application 5441901 exit codes: 143
      
      Application 5441901 resources: utime ~0s, stime ~6s, Rss ~11768, inblocks ~0, outblocks ~0
      
      Job Script: command stopped at Mon Mar 2 18:16:38 CST 2020
      
      Job Script: command runtime was 5767 seconds
      
      

      e2fsck logs:

      [root@snx11281n000 ~]# grep Unattached /home/admin/e2fsck.*
      /home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056499
      /home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056500
      /home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056501
      /home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056507
      /home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056509
      /home/admin/e2fsck.snx11281n002.3.4-010.81.202003091157.out:Unattached inode 3367056510
      [root@snx11281n000 ~]#
      

      unattached inode stat:

      debugfs:  stat <3367056499>
      
      Inode: 3367056499   Type: regular    Mode:  0644   Flags: 0x0
      
      Generation: 3271840411    Version: 0x000001ee:00bad136
      
      User:  1356   Group: 11121   Project:     0   Size: 0
      
      File ACL: 0
      
      Links: 1   Blockcount: 0
      
      Fragment:  Address: 0    Number: 0    Size: 0
      
       ctime: 0x5e5d9f91:00000000 -- Mon Mar  2 18:06:41 2020
      
       atime: 0x5e5d9f91:00000000 -- Mon Mar  2 18:06:41 2020
      
       mtime: 0x5e5d9f91:00000000 -- Mon Mar  2 18:06:41 2020
      
      crtime: 0x5e5da1e3:c1b98cf8 -- Mon Mar  2 18:16:35 2020
      
      Size of extra inode fields: 32
      
      Extended attributes:
      
        trusted.lma (24) = 00 00 00 00 00 00 00 00 6e 11 10 00 02 00 00 00 84 46 01 00 00 00 00 00
      
        lma: fid=[0x20010116e:0x14684:0x0] compat=0 incompat=0
      
        trusted.lov (144)
      
        trusted.link (60)
      
      BLOCKS:
      
      
      
      debugfs:  stat <3367056500>
      
      Inode: 3367056500   Type: regular    Mode:  0644   Flags: 0x0
      
      Generation: 3271840414    Version: 0x000001ee:00bad13a
      
      User:  1356   Group: 11121   Project:     0   Size: 0
      
      File ACL: 0
      
      Links: 1   Blockcount: 0
      
      Fragment:  Address: 0    Number: 0    Size: 0
      
       ctime: 0x5e5d9f91:00000000 -- Mon Mar  2 18:06:41 2020
      
       atime: 0x5e5d9f91:00000000 -- Mon Mar  2 18:06:41 2020
      
       mtime: 0x5e5d9f91:00000000 -- Mon Mar  2 18:06:41 2020
      
      crtime: 0x5e5da1e3:c1b98cf8 -- Mon Mar  2 18:16:35 2020
      
      Size of extra inode fields: 32
      
      Extended attributes:
      
        trusted.lma (24) = 00 00 00 00 00 00 00 00 6e 11 10 00 02 00 00 00 85 46 01 00 00 00 00 00
      
        lma: fid=[0x20010116e:0x14685:0x0] compat=0 incompat=0
      
        trusted.lov (144)
      
        trusted.link (61)
      
      BLOCKS:
      
       
      
      
      debugfs:  stat <3367056501>
      
      Inode: 3367056501   Type: regular    Mode:  0644   Flags: 0x0
      
      Generation: 3271840413    Version: 0x000001ee:00bad143
      
      User:  1356   Group: 11121   Project:     0   Size: 0
      
      File ACL: 0
      
      Links: 1   Blockcount: 0
      
      Fragment:  Address: 0    Number: 0    Size: 0
      
       ctime: 0x5e5d9f91:00000000 -- Mon Mar  2 18:06:41 2020
      
       atime: 0x5e5d9f91:00000000 -- Mon Mar  2 18:06:41 2020
      
       mtime: 0x5e5d9f91:00000000 -- Mon Mar  2 18:06:41 2020
      
      crtime: 0x5e5da1e3:c1b98cf8 -- Mon Mar  2 18:16:35 2020
      
      Size of extra inode fields: 32
      
      Extended attributes:
      
        trusted.lma (24) = 00 00 00 00 00 00 00 00 6e 11 10 00 02 00 00 00 83 46 01 00 00 00 00 00
      
        lma: fid=[0x20010116e:0x14683:0x0] compat=0 incompat=0
      
        trusted.lov (96)
      
        trusted.link (61)
      
      BLOCKS:
      
       
      
      
      debugfs:  stat <3367056507>
      
      Inode: 3367056507   Type: regular    Mode:  0644   Flags: 0x0
      
      Generation: 3271840428    Version: 0x000001ee:00bad146
      
      User:  1356   Group: 11121   Project:     0   Size: 0
      
      File ACL: 0
      
      Links: 1   Blockcount: 0
      
      Fragment:  Address: 0    Number: 0    Size: 0
      
       ctime: 0x5e5d9f96:00000000 -- Mon Mar  2 18:06:46 2020
      
       atime: 0x5e5d9f96:00000000 -- Mon Mar  2 18:06:46 2020
      
       mtime: 0x5e5d9f96:00000000 -- Mon Mar  2 18:06:46 2020
      
      crtime: 0x5e5da1e3:c1f69600 -- Mon Mar  2 18:16:35 2020
      
      Size of extra inode fields: 32
      
      Extended attributes:
      
        trusted.lma (24) = 00 00 00 00 00 00 00 00 6e 11 10 00 02 00 00 00 98 46 01 00 00 00 00 00
      
        lma: fid=[0x20010116e:0x14698:0x0] compat=0 incompat=0
      
        trusted.lov (144)
      
        trusted.link (61)
      
      BLOCKS:
      
       
      

      dmesg contains the following Lustre error message at the moment of inode creation:

      Mar  2 18:16:35 snx11281n002 kernel: LustreError: 2640:0:(osd_handler.c:2009:osd_trans_stop()) snx11281-MDT0000: failed in transaction hook: rc = -114
      

      LFSCK was able to successfully re-attach the inodes:

      00100000:10000000:1.0:1583880053.552740:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14682:0x0] with the name file.mdtest.17.4708 and type 100000 to the parent [0x200101166:0x1731:0x0]: rc = 1
      00100000:10000000:1.0:1583880053.626588:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14683:0x0] with the name file.mdtest.11.4748 and type 100000 to the parent [0x200101166:0x1727:0x0]: rc = 1
      00100000:10000000:9.0:1583880053.699071:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14684:0x0] with the name file.mdtest.0.4749 and type 100000 to the parent [0x200101166:0x1730:0x0]: rc = 1
      00100000:10000000:6.0F:1583880053.764322:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14685:0x0] with the name file.mdtest.25.4668 and type 100000 to the parent [0x200101166:0x1721:0x0]: rc = 1
      00100000:10000000:7.0:1583880053.830815:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14686:0x0] with the name file.mdtest.26.4699 and type 100000 to the parent [0x200101166:0x1725:0x0]: rc = 1
      00100000:10000000:9.0:1583880053.963978:0:31921:0:(lfsck_namespace.c:1243:lfsck_namespace_insert_normal()) snx11281-MDT0000-osd: namespace LFSCK insert object [0x20010116e:0x14698:0x0] with the name file.mdtest.22.4696 and type 100000 to the parent [0x200101166:0x1716:0x0]: rc = 1
      
      

      Attachments

        Activity

          People

            zam Alexander Zarochentsev
            zam Alexander Zarochentsev
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: