
LU-14465: unlink fails when the metadata migration is running behind

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version/s: Lustre 2.14.0
    • Severity: 3

    Description

      Created 6.4M files, and a restripe was automatically triggered for unknown reasons (this is already filed as a separate ticket, LU-14464):

      [root@ec01 ~]# salloc -p 40n -N 40 --ntasks-per-node=16  mpirun -mca btl_openib_if_include mlx5_1:1 -x UCX_NET_DEVICES=mlx5_1:1 --bind-to core:overload-allowed --allow-run-as-root /work/tools/bin/mdtest -n 10000 -F -v -d /ai400x/testdir/ -C
      
      [root@ec01 ~]#  lfs getdirstripe /ai400x/testdir/test-dir.0-0/mdtest_tree.0/
      lmv_stripe_count: 4 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64,migrating
      mdtidx		 FID[seq:oid:ver]
           0		 [0x200000d46:0x1f339:0x0]		
           2		 [0x2c0000c06:0xa6d6:0x0]		
           1		 [0x300000c07:0xa7a1:0x0]		
           3		 [0x340000c07:0xa9d1:0x0]
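
      The "migrating" flag in lmv_hash_type above shows that the directory restripe is still in progress. A simple check before removing the tree (just a sketch scripting around the lfs getdirstripe output shown above, not a dedicated interface) is to wait until that flag disappears:

      # sketch: poll the directory layout until the "migrating" flag is gone
      [root@ec01 ~]# while lfs getdirstripe /ai400x/testdir/test-dir.0-0/mdtest_tree.0/ | grep -q migrating; do sleep 10; done
      [root@ec01 ~]# echo "restripe finished"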
      

      Anyway, when all files are removed while the metadata migration process is still running behind, the unlink operation fails for some files, reporting that they were already removed:

      [root@ec01 ~]# salloc -p 40n -N 40 --ntasks-per-node=16  mpirun -mca btl_openib_if_include mlx5_1:1 -x UCX_NET_DEVICES=mlx5_1:1 --bind-to core:overload-allowed --allow-run-as-root /work/tools/bin/mdtest -n 10000 -F -v -d /ai400x/testdir/ -r
      salloc: Granted job allocation 7262
      V-1: Entering PrintTimestamp...
      -- started at 02/23/2021 10:50:03 --
      
      mdtest-3.3.0+dev was launched with 640 total task(s) on 40 node(s)
      Command line used: /work/tools/bin/mdtest '-n' '10000' '-F' '-v' '-d' '/ai400x/testdir/' '-r'
      V-1: Rank   0 Line   239 Entering parse_dirpath on /ai400x/testdir/...
      V-1: Rank   0 Line  1398 Entering valid_tests...
      V-1: Rank   0 Line  2015 api                     : (null)
      V-1: Rank   0 Line  2016 barriers                : True
      V-1: Rank   0 Line  2017 collective_creates      : False
      V-1: Rank   0 Line  2018 create_only             : False
      V-1: Rank   0 Line  2019 dirpath(s):
      V-1: Rank   0 Line  2021 	/ai400x/testdir/
      V-1: Rank   0 Line  2023 dirs_only               : False
      V-1: Rank   0 Line  2024 read_bytes              : 0
      V-1: Rank   0 Line  2025 read_only               : False
      V-1: Rank   0 Line  2026 first                   : 1
      V-1: Rank   0 Line  2027 files_only              : True
      V-1: Rank   0 Line  2031 iterations              : 1
      V-1: Rank   0 Line  2032 items_per_dir           : 0
      V-1: Rank   0 Line  2033 last                    : 0
      V-1: Rank   0 Line  2034 leaf_only               : False
      V-1: Rank   0 Line  2035 items                   : 10000
      V-1: Rank   0 Line  2036 nstride                 : 0
      V-1: Rank   0 Line  2037 pre_delay               : 0
      V-1: Rank   0 Line  2038 remove_only             : False
      V-1: Rank   0 Line  2039 random_seed             : 0
      V-1: Rank   0 Line  2040 stride                  : 1
      V-1: Rank   0 Line  2041 shared_file             : False
      V-1: Rank   0 Line  2042 time_unique_dir_overhead: False
      V-1: Rank   0 Line  2043 stone_wall_timer_seconds: 0
      V-1: Rank   0 Line  2044 stat_only               : False
      V-1: Rank   0 Line  2045 unique_dir_per_task     : False
      V-1: Rank   0 Line  2046 write_bytes             : 0
      V-1: Rank   0 Line  2047 sync_file               : False
      V-1: Rank   0 Line  2048 call_sync               : False
      V-1: Rank   0 Line  2049 depth                   : 0
      V-1: Rank   0 Line  2050 make_node               : 0
      V-1: Rank   0 Line  1490 Entering show_file_system_size on /ai400x/testdir
      Path: /ai400x/testdir
      FS: 52.4 TiB   Used FS: 0.0%   Inodes: 316.8 Mi   Used Inodes: 1.9%
      
      Nodemap: 1111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
      640 tasks, 6400000 files
      V-1: Rank   0 Line  2238 
      V-1: Rank   0 Line  2239    Operation               Duration              Rate
      V-1: Rank   0 Line  2240    ---------               --------              ----
      V-1: Rank   0 Line  1648 main: * iteration 1 *
      V-1: Rank   0 Line   481 Entering create_remove_items on /ai400x/testdir/test-dir.0-0/mdtest_tree.0, currDepth = 0...
      V-1: Rank   0 Line   412 Entering create_remove_items_helper on /ai400x/testdir/test-dir.0-0/mdtest_tree.0
      ior WARNING: [RANK 550]: unlink() of file "/ai400x/testdir/test-dir.0-0/mdtest_tree.0/file.mdtest.550.4111" failed
      , errno 2, No such file or directory 
      V-1: Rank   0 Line  1223   File creation     :          0.000 sec,          0.000 ops/sec
      V-1: Rank   0 Line  1227   File stat         :          0.000 sec,          0.000 ops/sec
      V-1: Rank   0 Line  1228   File read         :          0.000 sec,          0.000 ops/sec
      V-1: Rank   0 Line  1229   File removal      :       1230.781 sec,       5199.949 ops/sec
      V-1: Rank   0 Line  1573 Entering create_remove_directory_tree on /ai400x/testdir/test-dir.0-0, currDepth = 0...
      V-1: Rank   0 Line  1573 Entering create_remove_directory_tree on /ai400x/testdir/test-dir.0-0/mdtest_tree.0/, currDepth = 1...
      V-1: Entering PrintTimestamp...
      02/23/2021 11:10:33: Process 0: FAILED in create_remove_directory_tree, Unable to remove directory /ai400x/testdir/test-dir.0-0/mdtest_tree.0/: Directory not empty
      --------------------------------------------------------------------------
      MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 CREATE FROM 0
      with errorcode 1.
      
      NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
      You may or may not see output from other processes, depending on
      exactly when Open MPI kills them.
      --------------------------------------------------------------------------
      salloc: Relinquishing job allocation 7262
      

      The file that mdtest claimed had already been removed was in fact not removed yet:

      # ls /ai400x/testdir/test-dir.0-0/mdtest_tree.0/file.mdtest.550.4111 
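
      For reference, a stripped-down way to provoke the same race without mdtest might look like the sketch below; it substitutes a manual "lfs migrate -m" for the auto-triggered restripe, and the directory name, file count and stripe count are only illustrative:

      # sketch: populate a single-MDT directory, then remove the files while a
      # directory migration is still running in the background
      [root@ec01 ~]# lfs mkdir -i 0 /ai400x/racedir
      [root@ec01 ~]# for i in $(seq 1 100000); do touch /ai400x/racedir/file.$i; done
      [root@ec01 ~]# lfs migrate -m 0 -c 4 /ai400x/racedir &
      [root@ec01 ~]# find /ai400x/racedir -name 'file.*' -delete    # concurrent unlinks; expect spurious ENOENT while the directory is migrating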
      

      Attachments

        Issue Links

          Activity

            [LU-14465] unlink fails when the metadata migration is running behind
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-14466
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-7607

            adilger Andreas Dilger added a comment -

            I think if the directory split happens earlier, then there will be fewer files to migrate (a few thousand instead of millions), so the chance of hitting this is reduced.

            Most importantly, leaving "dir_restripe_nsonly=1" should avoid this problem, because the auto-split will then only move the names and not the inodes. I don't think we should set "dir_restripe_nsonly=0" until LU-7607 is finished, so that the FIDs do not change when migrated.
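
            For reference, the tunable mentioned above can be checked and set with lctl on the MDS nodes. This is only a sketch, assuming the parameter is exposed per-MDT under "mdt.*" as it is in 2.14:

            # check the current value (assumption: per-MDT parameter name)
            mds# lctl get_param mdt.*.dir_restripe_nsonly
            # keep namespace-only restriping, so an auto-split moves names but not inodes
            mds# lctl set_param mdt.*.dir_restripe_nsonly=1
            # or persistently, from the MGS:
            mgs# lctl set_param -P mdt.*.dir_restripe_nsonly=1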
            laisiyao Lai Siyao added a comment -

            This was tracked under LU-7607, and I'll update the patch set https://review.whamcloud.com/#/c/38233/ later, once I have thought through how to implement the distributed index file.

            sihara Shuichi Ihara created issue -

            People

              Assignee: wc-triage WC Triage
              Reporter: sihara Shuichi Ihara
              Votes: 0
              Watchers: 3
