Lustre / LU-6984

Failure to delete over a million files in a DNE2 directory.

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version: Lustre 2.8.0
    • Fix Version: Lustre 2.8.0
    • Environment: pre-2.8 clients with DNE2 directories which contain 1 million or more files
    • Severity: 3

    Description

      In my testing of DNE2 I'm seeing problems when creating 1 million+ files per directory. After clearing out the debug logs, I can see the problem is only on the client side. When running an application I see:

      command line used: /lustre/sultan/stf008/scratch/jsimmons/mdtest -I 100000 -i 5 -d /lustre/sultan/stf008/scratch/jsimmons/dne2_4_mds_md_test/shared_1000k_10
      Path: /lustre/sultan/stf008/scratch/jsimmons/dne2_4_mds_md_test
      FS: 21.8 TiB Used FS: 0.2% Inodes: 58.7 Mi Used Inodes: 4.6%

      10 tasks, 1000000 files/directories
      aprun: Apid 3172: Caught signal Window changed, sending to application
      08/03/2015 10:34:45: Process 0(nid00028): FAILED in create_remove_directory_tree, Unable to remove directory: No such file or directory
      Rank 0 [Mon Aug 3 10:34:45 2015] [c0-0c0s1n2] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
      _pmiu_daemon(SIGCHLD): [NID 00028] [c0-0c0s1n2] [Mon Aug 3 10:34:45 2015] PE RANK 0 exit signal Aborted
      aprun: Apid 3172: Caught signal Interrupt, sending to application
      _pmiu_daemon(SIGCHLD): [NID 00012] [c0-0c0s6n0] [Mon Aug 3 10:50:50 2015] PE RANK 7 exit signal Interrupt
      _pmiu_daemon(SIGCHLD): [NID 00018] [c0-0c0s6n2] [Mon Aug 3 10:50:50 2015] PE RANK 9 exit signal Interrupt
      _pmiu_daemon(SIGCHLD): [NID 00013] [c0-0c0s6n1] [Mon Aug 3 10:50:50 2015] PE RANK 8 exit signal Interrupt

      After the test fails, any attempt to remove the files created by the test also fails. When I attempt to remove the files I see the following errors in dmesg.

      LustreError: 5430:0:(llite_lib.c:2286:ll_prep_inode()) new_inode -fatal: rc -2
      LustreError: 5451:0:(llite_lib.c:2286:ll_prep_inode()) new_inode -fatal: rc -2
      LustreError: 5451:0:(llite_lib.c:2286:ll_prep_inode()) Skipped 7 previous similar messages
      LustreError: 5451:0:(llite_lib.c:2286:ll_prep_inode()) new_inode -fatal: rc -2

      Attachments

        1. lctldump.20150813
          0.2 kB
        2. LU-6381.log
          0.2 kB
        3. LU-6984-backtrace.log
          83 kB
        4. lu-6984-Sept-18-2015.tgz
          0.2 kB


          Activity

            di.wang Di Wang added a comment -

            Both would be best. If not, then only the client would be OK. Thanks


            simmonsja James A Simmons added a comment -

            On the MDS or the client?
            di.wang Di Wang added a comment -

            Hmm, during slave revalidation, it seems the striped directory has been locked with both LOOKUP and UPDATE locks. I do not understand why the master stripe's nlink drops to 1 at that time.

            James: could you please collect the debug log when the failure happens? -1 would be best, but if there is a race, just collect the default one please. Thanks!


            simmonsja James A Simmons added a comment -

            It doesn't matter how many client nodes; I use 400 below, but use whatever you want. What matters is the number of files per directory. Remember this is with remote_dir=-1 and remote_dir_gid=-1. Try using 8 MDS servers, though any number greater than 1 will do:

            lfs setdirstripe -c 8 /lustre/whatever/jsimmons/dne2_8_mds_md_test
            lfs setdirstripe -c 8 -D /lustre/whatever/jsimmons/dne2_8_mds_md_test (to make all directories under it the same)
            mkdir /lustre/whatever/jsimmons/dne2_8_mds_md_test/shared_1000k_400
            mpi_run -n 400 mdtest -I 2500 -i 5 -d /lustre/whatever/jsimmons/dne2_8_mds_md_test/shared_1000k_400

            When mdtest goes to delete the files, it fails. At least it does for me.

            di.wang Di Wang added a comment -

            Could you please tell me how to reproduce the problem? Are you still using mdtest with a single thread on one node? Thanks.


            simmonsja James A Simmons added a comment -

            Better yet, I did a dump_stack() where the code fails, so you have complete backtraces. I attached the file to this ticket.
            di.wang Di Wang added a comment -

            James, thanks for the update, very useful information. Do you happen to know who calls ll_prep_inode()? Thanks!


            simmonsja James A Simmons added a comment -

            The ll_prep_inode errors came back, so I backtracked to see what was causing them. From my cheesy debugging I found this:

            [ 251.338486] Lustre: looks like a bad stripe
            [ 251.342779] Lustre: lmv_revalidate_slaves failed
            [ 251.347499] Lustre: md_merge_attr failed to validate the lsm
            [ 251.353263] Lustre: ll_update_lsm_md failed when called from ll_update_inode
            [ 251.360412] Lustre: ll_update_inode return -2
            [ 251.364877] Lustre: ll_read_inode2 failed
            [ 251.369061] Lustre: Skipped 1 previous similar message
            [ 251.374321] LustreError: 18546:0:(llite_lib.c:2305:ll_prep_inode()) new_inode -fatal: rc -2

            So the problem is from lmv_revalidate_slaves(). It is reporting back -ENOENT due to the body->mbo_nlink < 2 condition. The comment there points to a race between close(unlink) and getattr.
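
            For anyone following along, here is a paraphrased, self-contained sketch of the check being described. The stand-in struct and helper names below are mine; only the mbo_nlink field and the -ENOENT result come from the ticket, and this is not verbatim Lustre source:

            #include <errno.h>

            /* Stand-in for the relevant field of struct mdt_body (illustrative only). */
            struct mdt_body_view {
                    unsigned int mbo_nlink;
            };

            /*
             * Paraphrase of the stripe sanity check described above: every stripe
             * of a striped directory is itself a directory, so its nlink should be
             * at least 2 ("." plus its link from the master).  A smaller value is
             * treated as a stripe caught racing with close(unlink), and -ENOENT is
             * returned; that is the -2 which then surfaces as
             * "new_inode -fatal: rc -2" in ll_prep_inode().
             */
            static int check_stripe_nlink(const struct mdt_body_view *body)
            {
                    if (body->mbo_nlink < 2)
                            return -ENOENT;
                    return 0;
            }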


            simmonsja James A Simmons added a comment -

            Good news: I can duplicate this problem on a non-Cray node. I'm debugging it and will let you know what I find.
            di.wang Di Wang added a comment -

            Hmm, it seems there might be two places in mdtest.c that can trigger the problem.

            #else
                  if (rmdir(dir) == -1) {
                    FAIL("Unable to remove directory");
                  }
            #endif
            

            and

                    if (rmdir(temp_path) == -1) {
                      FAIL("Unable to remove directory");
                    }
            

            So could you please add some debug information here to print out dir or temp_path when these rmdir failures happen?
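
            For illustration, one minimal way to do that is to wrap the failing call so the path and errno get printed before mdtest's FAIL() fires. The rmdir_verbose() helper below is hypothetical, not from mdtest.c or from any patch on this ticket:

            #include <errno.h>
            #include <stdio.h>
            #include <string.h>
            #include <unistd.h>

            /* Hypothetical debugging helper: report which directory rmdir()
             * refused to remove, and why, before the caller hits FAIL(). */
            static int rmdir_verbose(const char *path)
            {
                    int rc = rmdir(path);

                    if (rc == -1)
                            fprintf(stderr, "rmdir(\"%s\") failed: %s\n",
                                    path, strerror(errno));
                    return rc;
            }

            Replacing the rmdir(dir) and rmdir(temp_path) calls above with rmdir_verbose(dir) and rmdir_verbose(temp_path) would then show exactly which path is coming back with ENOENT.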

            If rmdir() fails on the kernel side, there are probably two possible causes:

            1. Path resolution failed for "dir" or "temp_path". I am not sure there is a good way to printk the error message without changing the kernel, but let's see what these directories are when the failure happens.

            2. rmdir fails in ll_rmdir(). For this failure, you can simply add a patch like this:

            --- a/lustre/llite/namei.c
            +++ b/lustre/llite/namei.c
            @@ -1198,7 +1198,10 @@ static int ll_rmdir(struct inode *dir, struct dentry *dchild)
                     if (rc == 0) {
                             ll_update_times(request, dir);
                             ll_stats_ops_tally(ll_i2sbi(dir), LPROC_LL_RMDIR, 1);
            -        }
            +        } else {
            +               CERROR("unlink fails :name=%.*s, dir="DFID"(%p) rc: %d\n",
            +                       name->len, name->name, PFID(ll_inode2fid(dir)), dir, rc);
            +       }
             
                     ptlrpc_req_finished(request);
                     RETURN(rc);
            

            TBH, I suspect the failure is caused by (1). Sigh, I cannot reproduce it locally.


            simmonsja James A Simmons added a comment -

            With the latest master and a few patches this problem still exists. The "ll_prep_inode" errors are gone now. I tried to debug it with lctl debug, but turning on debugging hides the race condition. Do you know which areas I should add old-fashioned printks to in order to see what is causing the failure?

            People

              Assignee: di.wang Di Wang
              Reporter: simmonsja James A Simmons
              Votes: 0
              Watchers: 11
