
Failure to delete over a million files in a DNE2 directory.

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.8.0
    • Environment: pre-2.8 clients with DNE2 directories which contain 1 million or more files.
    • Severity: 3

    Description

      In my testing of DNE2 I'm seeing problems when creating 1 million+ files per directory. Clearing out the debug logs, I see the problem is only on the client side. When running an application I see:

      command line used: /lustre/sultan/stf008/scratch/jsimmons/mdtest -I 100000 -i 5 -d /lustre/sultan/stf008/scratch/jsimmons/dne2_4_mds_md_test/shared_1000k_10
      Path: /lustre/sultan/stf008/scratch/jsimmons/dne2_4_mds_md_test
      FS: 21.8 TiB Used FS: 0.2% Inodes: 58.7 Mi Used Inodes: 4.6%

      10 tasks, 1000000 files/directories
      aprun: Apid 3172: Caught signal Window changed, sending to application
      08/03/2015 10:34:45: Process 0(nid00028): FAILED in create_remove_directory_tree, Unable to remove directory: No such file or directory
      Rank 0 [Mon Aug 3 10:34:45 2015] [c0-0c0s1n2] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
      _pmiu_daemon(SIGCHLD): [NID 00028] [c0-0c0s1n2] [Mon Aug 3 10:34:45 2015] PE RANK 0 exit signal Aborted
      aprun: Apid 3172: Caught signal Interrupt, sending to application
      _pmiu_daemon(SIGCHLD): [NID 00012] [c0-0c0s6n0] [Mon Aug 3 10:50:50 2015] PE RANK 7 exit signal Interrupt
      _pmiu_daemon(SIGCHLD): [NID 00018] [c0-0c0s6n2] [Mon Aug 3 10:50:50 2015] PE RANK 9 exit signal Interrupt
      _pmiu_daemon(SIGCHLD): [NID 00013] [c0-0c0s6n1] [Mon Aug 3 10:50:50 2015] PE RANK 8 exit signal Interrupt

      After the test fails, any attempt to remove the files created by the test also fails. When I attempt to remove the files I see the following errors in dmesg.

      LustreError: 5430:0:(llite_lib.c:2286:ll_prep_inode()) new_inode -fatal: rc -2
      LustreError: 5451:0:(llite_lib.c:2286:ll_prep_inode()) new_inode -fatal: rc -2
      LustreError: 5451:0:(llite_lib.c:2286:ll_prep_inode()) Skipped 7 previous similar messages
      LustreError: 5451:0:(llite_lib.c:2286:ll_prep_inode()) new_inode -fatal: rc -2
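
      (For reference: the rc printed by these LustreError lines is a negative kernel
      errno, and -2 is ENOENT, i.e. the same "No such file or directory" that mdtest
      reported above. A quick shell check, purely as an illustration:)

      # Decode errno 2 to its message; prints "errno 2 = No such file or directory".
      perl -e '$! = 2; print "errno 2 = $!\n"'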

      Attachments

        1. lctldump.20150813
          0.2 kB
        2. LU-6381.log
          0.2 kB
        3. LU-6984-backtrace.log
          83 kB
        4. lu-6984-Sept-18-2015.tgz
          0.2 kB

        Issue Links

          Activity

            [LU-6984] Failure to delete over a million files in a DNE2 directory.

            James A Simmons (simmonsja) added a comment:

            The first run of mdtest takes a while before it fails. Once it fails, you can duplicate the failure by running rm -rf on the files left over from mdtest.

            I attached the logs from my latest test, from the client node and all of the MDS servers I have.

            Di Wang (di.wang) added a comment:

            James: thanks. And usually how soon did you meet the failure? After a few minutes? A few hours after starting the test?
            James A Simmons (simmonsja) added a comment:

            I did you one better. Grab my source rpm at http://www.infradead.org/~jsimmons/lustre-2.7.59-1_g703195a.src.rpm

            Di Wang (di.wang) added a comment:

            OK, I tried to reproduce it on the OpenSFS cluster with 8 MDTs (4 MDS), 4 OSTs (2 OSS), and 9 clients. I just started the test; it has been running for an hour and I still cannot see this problem. I will check tomorrow morning to see how it goes.

            James: could you please tell me all of your patches (based on master)? Thanks.
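
            (For gathering that patch list, a minimal sketch, assuming the patches are
            carried as local commits on top of an upstream branch named "master":)

            # Show the local patches on top of master, one line per commit.
            git log --oneline master..HEAD
            # Or export them as individual patch files to attach to the ticket.
            git format-patch master..HEAD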

            Di Wang (di.wang) added a comment:

            Both would be best. If not, then only the client would be OK. Thanks.

            James A Simmons (simmonsja) added a comment:

            On the MDS or the client?

            Di Wang (di.wang) added a comment:

            Hmm, during slave revalidation, it seems the striped directory has been locked with both LOOKUP and UPDATE locks. I do not understand why the master stripe nlink turns to 1 at that time.

            James: could you please collect the debug log when the failure happens? Full debug (-1) would be best, but if there is a race, just collect the default one please. Thanks!
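
            (A minimal sketch of how a full -1 debug log could be captured on the client;
            the directory path and dump file name below are placeholders only:)

            # Enable all debug flags and clear the ring buffer before reproducing.
            lctl set_param debug=-1
            lctl clear
            # Reproduce the failing unlink (placeholder path).
            rm -rf /lustre/whatever/jsimmons/dne2_8_mds_md_test/shared_1000k_400
            # Dump the client-side debug buffer to a file (placeholder name).
            lctl dk /tmp/lu-6984-client-debug.log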

            James A Simmons (simmonsja) added a comment:

            It doesn't matter how many client nodes; I use 400 below, but use whatever you want. What matters is the number of files per directory. Remember this is with remote_dir=-1 and remote_dir_gid=-1. Try using 8 MDS servers, but any number greater than 1 will do:

            lfs setdirstripe -c 8 /lustre/whatever/jsimmons/dne2_8_mds_md_test
            lfs setdirstripe -c 8 -D /lustre/whatever/jsimmons/dne2_8_mds_md_test (to make all directories under it use the same layout)
            mkdir /lustre/whatever/jsimmons/dne2_8_mds_md_test/shared_1000k_400
            mpi_run -n 400 mdtest -I 2500 -i 5 -d /lustre/whatever/jsimmons/dne2_8_mds_md_test/shared_1000k_400

            When mdtest goes to delete the files, it will fail. At least it does for me.
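
            (As a side check before launching the run, the striping can be confirmed from
            any client with lfs getdirstripe; the paths below are the same placeholders
            used above:)

            # Confirm the parent test directory and a newly created subdirectory
            # are both striped across 8 MDTs (placeholder paths).
            lfs getdirstripe /lustre/whatever/jsimmons/dne2_8_mds_md_test
            lfs getdirstripe /lustre/whatever/jsimmons/dne2_8_mds_md_test/shared_1000k_400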

            Di Wang (di.wang) added a comment:

            Could you please tell me how to reproduce the problem? Do you still use mdtest with a single thread on one node? Thanks.

            James A Simmons (simmonsja) added a comment:

            Better yet, I did a dump_stack() at the point where the code fails, so you have complete backtraces. I attached the file to this ticket.

            Di Wang (di.wang) added a comment:

            James, thanks for the update, very useful information. Do you happen to know who calls ll_prep_inode()? Thanks!

            People

              Assignee: Di Wang (di.wang)
              Reporter: James A Simmons (simmonsja)
              Votes: 0
              Watchers: 11

              Dates

                Created:
                Updated:
                Resolved: