Details
Bug
Resolution: Unresolved
Minor
None
Lustre 2.14.0
None
3
9223372036854775807
Description
After creating 6.4M files, a directory restripe was triggered automatically for unknown reasons (this is already filed separately as LU-14464).
[root@ec01 ~]# salloc -p 40n -N 40 --ntasks-per-node=16 mpirun -mca btl_openib_if_include mlx5_1:1 -x UCX_NET_DEVICES=mlx5_1:1 --bind-to core:overload-allowed --allow-run-as-root /work/tools/bin/mdtest -n 10000 -F -v -d /ai400x/testdir/ -C
[root@ec01 ~]# lfs getdirstripe /ai400x/testdir/test-dir.0-0/mdtest_tree.0/
lmv_stripe_count: 4 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64,migrating
mdtidx FID[seq:oid:ver]
     0 [0x200000d46:0x1f339:0x0]
     2 [0x2c0000c06:0xa6d6:0x0]
     1 [0x300000c07:0xa7a1:0x0]
     3 [0x340000c07:0xa9d1:0x0]
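The `migrating` flag in `lmv_hash_type` above is what shows that the restripe is still in progress. A minimal sketch of a check that could be run before a bulk removal, assuming the output format shown above; the function name `hash_is_migrating` is hypothetical and not part of Lustre:

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre tool): reads `lfs getdirstripe` output
# on stdin and succeeds if the lmv_hash_type field carries the
# "migrating" flag.
hash_is_migrating() {
    awk '
        {
            # Find the token that follows "lmv_hash_type:" on any line.
            for (i = 1; i < NF; i++) {
                if ($i == "lmv_hash_type:") {
                    # The value is a comma-separated list: hash name, flags.
                    n = split($(i + 1), flags, ",")
                    for (j = 1; j <= n; j++)
                        if (flags[j] == "migrating") found = 1
                }
            }
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage (assumes lfs is in PATH and the directory exists):
#   lfs getdirstripe /ai400x/testdir/test-dir.0-0/mdtest_tree.0/ \
#       | hash_is_migrating && echo "restripe still in progress"
```

The function only parses text, so it can be tried against saved output without access to the file system.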
When all files are removed while the metadata migration is still running in the background, unlink fails for some files with "No such file or directory", as if they had already been removed.
[root@ec01 ~]# salloc -p 40n -N 40 --ntasks-per-node=16 mpirun -mca btl_openib_if_include mlx5_1:1 -x UCX_NET_DEVICES=mlx5_1:1 --bind-to core:overload-allowed --allow-run-as-root /work/tools/bin/mdtest -n 10000 -F -v -d /ai400x/testdir/ -r salloc: Granted job allocation 7262 V-1: Entering PrintTimestamp... -- started at 02/23/2021 10:50:03 -- mdtest-3.3.0+dev was launched with 640 total task(s) on 40 node(s) Command line used: /work/tools/bin/mdtest '-n' '10000' '-F' '-v' '-d' '/ai400x/testdir/' '-r' V-1: Rank 0 Line 239 Entering parse_dirpath on /ai400x/testdir/... V-1: Rank 0 Line 1398 Entering valid_tests... V-1: Rank 0 Line 2015 api : (null) V-1: Rank 0 Line 2016 barriers : True V-1: Rank 0 Line 2017 collective_creates : False V-1: Rank 0 Line 2018 create_only : False V-1: Rank 0 Line 2019 dirpath(s): V-1: Rank 0 Line 2021 /ai400x/testdir/ V-1: Rank 0 Line 2023 dirs_only : False V-1: Rank 0 Line 2024 read_bytes : 0 V-1: Rank 0 Line 2025 read_only : False V-1: Rank 0 Line 2026 first : 1 V-1: Rank 0 Line 2027 files_only : True V-1: Rank 0 Line 2031 iterations : 1 V-1: Rank 0 Line 2032 items_per_dir : 0 V-1: Rank 0 Line 2033 last : 0 V-1: Rank 0 Line 2034 leaf_only : False V-1: Rank 0 Line 2035 items : 10000 V-1: Rank 0 Line 2036 nstride : 0 V-1: Rank 0 Line 2037 pre_delay : 0 V-1: Rank 0 Line 2038 remove_only : False V-1: Rank 0 Line 2039 random_seed : 0 V-1: Rank 0 Line 2040 stride : 1 V-1: Rank 0 Line 2041 shared_file : False V-1: Rank 0 Line 2042 time_unique_dir_overhead: False V-1: Rank 0 Line 2043 stone_wall_timer_seconds: 0 V-1: Rank 0 Line 2044 stat_only : False V-1: Rank 0 Line 2045 unique_dir_per_task : False V-1: Rank 0 Line 2046 write_bytes : 0 V-1: Rank 0 Line 2047 sync_file : False V-1: Rank 0 Line 2048 call_sync : False V-1: Rank 0 Line 2049 depth : 0 V-1: Rank 0 Line 2050 make_node : 0 V-1: Rank 0 Line 1490 Entering show_file_system_size on /ai400x/testdir Path: /ai400x/testdir FS: 52.4 TiB Used FS: 0.0% Inodes: 316.8 Mi Used Inodes: 1.9% Nodemap: 
1111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 640 tasks, 6400000 files V-1: Rank 0 Line 2238 V-1: Rank 0 Line 2239 Operation Duration Rate V-1: Rank 0 Line 2240 --------- -------- ---- V-1: Rank 0 Line 1648 main: * iteration 1 * V-1: Rank 0 Line 481 Entering create_remove_items on /ai400x/testdir/test-dir.0-0/mdtest_tree.0, currDepth = 0... V-1: Rank 0 Line 412 Entering create_remove_items_helper on /ai400x/testdir/test-dir.0-0/mdtest_tree.0 ior WARNING: [RANK 550]: unlink() of file "/ai400x/testdir/test-dir.0-0/mdtest_tree.0/file.mdtest.550.4111" failed , errno 2, No such file or directory V-1: Rank 0 Line 1223 File creation : 0.000 sec, 0.000 ops/sec V-1: Rank 0 Line 1227 File stat : 0.000 sec, 0.000 ops/sec V-1: Rank 0 Line 1228 File read : 0.000 sec, 0.000 ops/sec V-1: Rank 0 Line 1229 File removal : 1230.781 sec, 5199.949 ops/sec V-1: Rank 0 Line 1573 Entering create_remove_directory_tree on /ai400x/testdir/test-dir.0-0, currDepth = 0... V-1: Rank 0 Line 1573 Entering create_remove_directory_tree on /ai400x/testdir/test-dir.0-0/mdtest_tree.0/, currDepth = 1... V-1: Entering PrintTimestamp... 
02/23/2021 11:10:33: Process 0: FAILED in create_remove_directory_tree, Unable to remove directory /ai400x/testdir/test-dir.0-0/mdtest_tree.0/: Directory not empty -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 CREATE FROM 0 with errorcode 1. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. -------------------------------------------------------------------------- salloc: Relinquishing job allocation 7262
The file that mdtest reported as already removed had in fact not been removed yet:
# ls /ai400x/testdir/test-dir.0-0/mdtest_tree.0/file.mdtest.550.4111
I think that if the directory split happens earlier, fewer files will need to be migrated (a few thousand instead of millions), so the chance of hitting this is reduced.
Most importantly, leaving "dir_restripe_nsonly=1" should avoid this problem because the auto-split will only move the names and not the inodes. I don't think we should set "dir_restripe_nsonly=0" until LU-7607 is finished, so that the FIDs do not change when migrated.
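For reference, a sketch of how the setting mentioned above might be checked and restored on an MDS. The `mdt.*` parameter path is an assumption and should be confirmed on the running system before relying on it:

```shell
# Run on each MDS node. The "mdt.*" prefix is an assumption; if the name
# does not resolve, `lctl list_param -R mdt` lists what is available.
lctl get_param 'mdt.*.dir_restripe_nsonly'

# Keep auto-split namespace-only, so that names move but inodes (and
# therefore FIDs) stay where they are.
lctl set_param 'mdt.*.dir_restripe_nsonly=1'
```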