LU-14465: unlink fails when the metadata migration is running behind

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version: Lustre 2.14.0

    Description

      Created 6.4M files, and a directory restripe was automatically triggered for unknown reasons (already filed as a separate ticket, LU-14464).

      [root@ec01 ~]# salloc -p 40n -N 40 --ntasks-per-node=16  mpirun -mca btl_openib_if_include mlx5_1:1 -x UCX_NET_DEVICES=mlx5_1:1 --bind-to core:overload-allowed --allow-run-as-root /work/tools/bin/mdtest -n 10000 -F -v -d /ai400x/testdir/ -C
      
      [root@ec01 ~]#  lfs getdirstripe /ai400x/testdir/test-dir.0-0/mdtest_tree.0/
      lmv_stripe_count: 4 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64,migrating
      mdtidx		 FID[seq:oid:ver]
           0		 [0x200000d46:0x1f339:0x0]		
           2		 [0x2c0000c06:0xa6d6:0x0]		
           1		 [0x300000c07:0xa7a1:0x0]		
           3		 [0x340000c07:0xa9d1:0x0]
      

      Anyway, when all files are removed while the metadata migration is still running behind, unlink fails for some files with ENOENT ("No such file or directory") even though those files have not actually been removed yet.

      [root@ec01 ~]# salloc -p 40n -N 40 --ntasks-per-node=16  mpirun -mca btl_openib_if_include mlx5_1:1 -x UCX_NET_DEVICES=mlx5_1:1 --bind-to core:overload-allowed --allow-run-as-root /work/tools/bin/mdtest -n 10000 -F -v -d /ai400x/testdir/ -r
      salloc: Granted job allocation 7262
      V-1: Entering PrintTimestamp...
      -- started at 02/23/2021 10:50:03 --
      
      mdtest-3.3.0+dev was launched with 640 total task(s) on 40 node(s)
      Command line used: /work/tools/bin/mdtest '-n' '10000' '-F' '-v' '-d' '/ai400x/testdir/' '-r'
      V-1: Rank   0 Line   239 Entering parse_dirpath on /ai400x/testdir/...
      V-1: Rank   0 Line  1398 Entering valid_tests...
      V-1: Rank   0 Line  2015 api                     : (null)
      V-1: Rank   0 Line  2016 barriers                : True
      V-1: Rank   0 Line  2017 collective_creates      : False
      V-1: Rank   0 Line  2018 create_only             : False
      V-1: Rank   0 Line  2019 dirpath(s):
      V-1: Rank   0 Line  2021 	/ai400x/testdir/
      V-1: Rank   0 Line  2023 dirs_only               : False
      V-1: Rank   0 Line  2024 read_bytes              : 0
      V-1: Rank   0 Line  2025 read_only               : False
      V-1: Rank   0 Line  2026 first                   : 1
      V-1: Rank   0 Line  2027 files_only              : True
      V-1: Rank   0 Line  2031 iterations              : 1
      V-1: Rank   0 Line  2032 items_per_dir           : 0
      V-1: Rank   0 Line  2033 last                    : 0
      V-1: Rank   0 Line  2034 leaf_only               : False
      V-1: Rank   0 Line  2035 items                   : 10000
      V-1: Rank   0 Line  2036 nstride                 : 0
      V-1: Rank   0 Line  2037 pre_delay               : 0
      V-1: Rank   0 Line  2038 remove_only             : False
      V-1: Rank   0 Line  2039 random_seed             : 0
      V-1: Rank   0 Line  2040 stride                  : 1
      V-1: Rank   0 Line  2041 shared_file             : False
      V-1: Rank   0 Line  2042 time_unique_dir_overhead: False
      V-1: Rank   0 Line  2043 stone_wall_timer_seconds: 0
      V-1: Rank   0 Line  2044 stat_only               : False
      V-1: Rank   0 Line  2045 unique_dir_per_task     : False
      V-1: Rank   0 Line  2046 write_bytes             : 0
      V-1: Rank   0 Line  2047 sync_file               : False
      V-1: Rank   0 Line  2048 call_sync               : False
      V-1: Rank   0 Line  2049 depth                   : 0
      V-1: Rank   0 Line  2050 make_node               : 0
      V-1: Rank   0 Line  1490 Entering show_file_system_size on /ai400x/testdir
      Path: /ai400x/testdir
      FS: 52.4 TiB   Used FS: 0.0%   Inodes: 316.8 Mi   Used Inodes: 1.9%
      
      Nodemap: 1111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
      640 tasks, 6400000 files
      V-1: Rank   0 Line  2238 
      V-1: Rank   0 Line  2239    Operation               Duration              Rate
      V-1: Rank   0 Line  2240    ---------               --------              ----
      V-1: Rank   0 Line  1648 main: * iteration 1 *
      V-1: Rank   0 Line   481 Entering create_remove_items on /ai400x/testdir/test-dir.0-0/mdtest_tree.0, currDepth = 0...
      V-1: Rank   0 Line   412 Entering create_remove_items_helper on /ai400x/testdir/test-dir.0-0/mdtest_tree.0
      ior WARNING: [RANK 550]: unlink() of file "/ai400x/testdir/test-dir.0-0/mdtest_tree.0/file.mdtest.550.4111" failed
      , errno 2, No such file or directory 
      V-1: Rank   0 Line  1223   File creation     :          0.000 sec,          0.000 ops/sec
      V-1: Rank   0 Line  1227   File stat         :          0.000 sec,          0.000 ops/sec
      V-1: Rank   0 Line  1228   File read         :          0.000 sec,          0.000 ops/sec
      V-1: Rank   0 Line  1229   File removal      :       1230.781 sec,       5199.949 ops/sec
      V-1: Rank   0 Line  1573 Entering create_remove_directory_tree on /ai400x/testdir/test-dir.0-0, currDepth = 0...
      V-1: Rank   0 Line  1573 Entering create_remove_directory_tree on /ai400x/testdir/test-dir.0-0/mdtest_tree.0/, currDepth = 1...
      V-1: Entering PrintTimestamp...
      02/23/2021 11:10:33: Process 0: FAILED in create_remove_directory_tree, Unable to remove directory /ai400x/testdir/test-dir.0-0/mdtest_tree.0/: Directory not empty
      --------------------------------------------------------------------------
      MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 CREATE FROM 0
      with errorcode 1.
      
      NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
      You may or may not see output from other processes, depending on
      exactly when Open MPI kills them.
      --------------------------------------------------------------------------
      salloc: Relinquishing job allocation 7262
      

      The file that mdtest claimed was already removed had in fact not been removed yet:

      # ls /ai400x/testdir/test-dir.0-0/mdtest_tree.0/file.mdtest.550.4111 
      
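      A quick way to confirm the stale entry (a hedged sketch; assumes client access to the same mount point as above) is to look up the "removed" file's FID and recheck the parent directory's migration state with standard lfs commands:

      ```shell
      # If the file had really been unlinked, path2fid would fail with ENOENT;
      # a successful lookup shows the entry still exists
      lfs path2fid /ai400x/testdir/test-dir.0-0/mdtest_tree.0/file.mdtest.550.4111

      # lmv_hash_type still includes ",migrating" while the restripe is in flight
      lfs getdirstripe /ai400x/testdir/test-dir.0-0/mdtest_tree.0/
      ```
      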

          Activity

            adilger Andreas Dilger added a comment -

            I think if the directory split happens earlier, there will be fewer files migrated (a few thousand instead of millions), so the chance of hitting this is reduced.

            Most importantly, leaving "dir_restripe_nsonly=1" set should avoid this problem, because the auto-split will then move only the names and not the inodes. I don't think we should set "dir_restripe_nsonly=0" until LU-7607 is finished, so that the FIDs do not change when files are migrated.
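            For reference, the tunable mentioned above can be inspected and set per-MDT with lctl (a sketch, assuming server-side access; the parameter name is as cited in the comment):

            ```shell
            # Inspect the current setting on all MDTs
            lctl get_param mdt.*.dir_restripe_nsonly

            # Keep namespace-only restriping (names move, inodes stay in place),
            # which avoids the unlink race described in this ticket
            lctl set_param mdt.*.dir_restripe_nsonly=1
            ```
            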
            laisiyao Lai Siyao added a comment -

            This was tracked under LU-7607; I'll update the patch set https://review.whamcloud.com/#/c/38233/ later, once I've thought through how to implement the distributed index file.

            People

              Assignee: wc-triage WC Triage
              Reporter: sihara Shuichi Ihara
              Votes: 0
              Watchers: 3