[LU-762] Hyperion - mdtest failure Created: 14/Oct/11  Updated: 10/Jan/12  Resolved: 16/Dec/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.7
Fix Version/s: Lustre 2.2.0, Lustre 2.1.1

Type: Bug Priority: Major
Reporter: Cliff White (Inactive) Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: None
Environment:

RHEL5/x86_64


Attachments: File e4-38-74     File fs38-74     File parallel-scale.test_metabench.debug_log.hyperion-mds1.1319834069.log.gz    
Severity: 3
Rank (Obsolete): 4852

 Description   

mdtest in a shared directory fails with 50 clients and MDTEST_NFILES=1024; it also fails with 100 clients and MDTEST_NFILES=256. The failure is silent - no Lustre errors or any server messages. The failure is consistent and repeatable, and only occurs when run in shared-directory mode.
Failure example:

000: Command line used: /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/white215/hyperion.3867/mdtest -i3 -n1024
000: Path: /p/l_wham/white215/hyperion.3867
000: FS: 87.0 TiB Used FS: 0.0% Inodes: 546.9 Mi Used Inodes: 2.1%
000:
000: 400 tasks, 409600 files/directories
000: 10/14/2011 01:17:15: Process 0(hyperion319): FAILED in create_remove_items_helper, unable to remove directory: No such file or directory
srun: mvapich: 2011-10-14T01:17:15: ABORT from MPI rank 0 [on hyperion319]
000: [0] [MPI Abort by user] Aborting Program!
000: [0:hyperion319] Abort: MPI_Abort() code: 1, rank 0, MPI Abort by user Aborting program ! at line 99 in file mpid_init.c
000: slurmd[hyperion319]: *** STEP 1219936.0 KILLED AT 2011-10-14T01:17:15 WITH SIGNAL 9 ***



 Comments   
Comment by Sarah Liu [ 14/Oct/11 ]

I tried to reproduce this bug on Toro. It passes with 3 clients and MDTEST_NFILES=1024, but fails with 3 clients and MDTEST_NFILES=10240:

MDTEST_NFILES=1024:
----------------------
mdtest-1.8.3 was launched with 12 total task(s) on 3 nodes
Command line used: /usr/bin/mdtest -d /mnt/lustre/mdtest -i3 -n1024
Path: /mnt/lustre
FS: 46.9 GiB Used FS: 4.4% Inodes: 3.0 Mi Used Inodes: 0.0%

12 tasks, 12288 files/directories

SUMMARY: (of 3 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
Directory creation: 6095.885 4288.836 5074.051 756.460
Directory stat : 28560.081 26921.815 27471.125 770.018
Directory removal : 4370.147 2651.794 3363.277 731.949
File creation : 7989.778 7864.541 7912.732 55.049
File stat : 23978.178 22347.507 23185.891 666.516
File removal : 8990.494 8197.678 8721.670 370.560
Tree creation : 2770.346 20.672 1548.695 1143.263
Tree removal : 1261.066 16.791 628.840 508.173

– finished at 10/14/2011 16:45:06 –
[root@client-15 ~]#

MDTEST_NFILES=10240:
---------------------------
mdtest-1.8.3 was launched with 12 total task(s) on 3 nodes
Command line used: /usr/bin/mdtest -d /mnt/lustre/mdtest -i3 -n10240
Path: /mnt/lustre
FS: 46.9 GiB Used FS: 4.4% Inodes: 3.0 Mi Used Inodes: 0.0%

12 tasks, 122880 files/directories
10/14/2011 16:49:14: Process 2(client-2.lab.whamcloud.com): FAILED in create_remove_items_helper, unable to remove directory: No such file or directory
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 3690 on
node client-2 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[root@client-15 ~]#

Comment by Peter Jones [ 17/Oct/11 ]

Oleg will look into this one

Comment by Oleg Drokin [ 19/Oct/11 ]

Ok, so I was diving into it and it seems to be related to some ldiskfs problems.

Here is a relevant mds log part:
00000100:00000001:23:1319040202.309449:0:6653:0:(events.c:322:request_in_callback()) Process leaving
00000004:00000001:10:1319040202.309449:0:7050:0:(mds_reint.c:1607:mds_get_parent_child_locked()) Process entered
00000004:00000001:10:1319040202.309450:0:7050:0:(mds_reint.c:92:mds_lookup()) Process entered
00000004:00000001:10:1319040202.309454:0:7050:0:(mds_reint.c:115:mds_lookup()) Process leaving (rc=18446604446544310608 : -139627165241008 : ffff8102846a3150)
00000004:00000001:10:1319040202.309457:0:7050:0:(handler.c:256:mds_fid2dentry()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
00000004:00000001:10:1319040202.309459:0:7050:0:(mds_reint.c:1614:mds_get_parent_child_locked()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)

This tells us that ldiskfs somehow returned an inode from lookup that had zero nlink, which is not right at all, as nobody could have deleted it yet.

So after the job died I located this dir (from the client logs) and did an ls on it - it was still there - then removed it, and it was removed with no fuss.

Another interesting point: in my testing I am only able to reproduce the problem if I run the two mdtests sequentially as specified here - one with 1024 files followed by 10240 files in the same dir.

If I run just the 10240-file run in another dir, or in the same dir after it was deleted, I cannot seem to reproduce it, so I wonder if there is some leftover state in the large dir that ruins the day later on?

Comment by Andreas Dilger [ 19/Oct/11 ]

Has anyone tried running this same test with 1.8.6 on the MDS to see if it passes? It would also be useful to either compile the 1.8.6 Lustre against the 2.6.18-274.3.1.el5 (RHEL 5) kernel from 1.8.7, or 1.8.7 against the 2.6.18-238.12.1.el5 (RHEL 5) kernel from 1.8.6.

It is possible that this is either an upstream ldiskfs bug, or possibly an existing bug that we didn't notice before due to a different test environment.

Comment by Oleg Drokin [ 19/Oct/11 ]

I asked Sarah to try on 1.8.5 and 1.8.6 to see if it happens too.

Comment by Sarah Liu [ 19/Oct/11 ]

I ran the same tests on 1.8.6 four times; they all pass:

[root@client-15 ~]# mpirun -np 12 -H client-15,client-5,client-2 -mca plm_rsh_agent rsh /usr/bin/mdtest -d /mnt/lustre/mdtest -i3 -n1024
connect to address 10.10.4.5 port 544: connect to address 10.10.4.2 port 544: Connection refused
Connection refused
Trying krb4 rsh...
Trying krb4 rsh...
connect to address 10.10.4.2 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
connect to address 10.10.4.5 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
– started at 10/19/2011 13:01:08 –

mdtest-1.8.3 was launched with 12 total task(s) on 3 nodes
Command line used: /usr/bin/mdtest -d /mnt/lustre/mdtest -i3 -n1024
Path: /mnt/lustre
FS: 46.9 GiB Used FS: 4.3% Inodes: 2.0 Mi Used Inodes: 0.0%

12 tasks, 12288 files/directories

SUMMARY: (of 3 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
Directory creation: 6351.096 3853.853 4965.091 1037.843
Directory stat : 24382.547 14461.591 19932.550 4114.050
Directory removal : 3375.762 2219.832 2795.249 471.920
File creation : 7368.815 6689.622 7084.639 288.143
File stat : 23931.050 23541.947 23698.468 167.708
File removal : 8466.017 7895.573 8205.915 235.577
Tree creation : 2369.663 213.028 1297.122 880.481
Tree removal : 1602.715 80.465 631.532 688.807

– finished at 10/19/2011 13:01:42 –
[root@client-15 ~]# mpirun -np 12 -H client-15,client-5,client-2 -mca plm_rsh_agent rsh /usr/bin/mdtest -d /mnt/lustre/mdtest -i3 -n10240
connect to address 10.10.4.2 port 544: Connection refused
Trying krb4 rsh...
connect to address 10.10.4.5 port 544: Connection refused
Trying krb4 rsh...
connect to address 10.10.4.2 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
connect to address 10.10.4.5 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
– started at 10/19/2011 13:02:24 –

mdtest-1.8.3 was launched with 12 total task(s) on 3 nodes
Command line used: /usr/bin/mdtest -d /mnt/lustre/mdtest -i3 -n10240
Path: /mnt/lustre
FS: 46.9 GiB Used FS: 4.3% Inodes: 2.0 Mi Used Inodes: 0.0%

12 tasks, 122880 files/directories

SUMMARY: (of 3 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
Directory creation: 3407.669 2985.174 3263.238 196.670
Directory stat : 19586.197 16856.908 18567.720 1217.032
Directory removal : 2116.226 2032.805 2070.180 34.604
File creation : 6603.196 6245.869 6459.676 154.112
File stat : 22034.710 21846.257 21921.526 81.473
File removal : 7017.384 6581.323 6831.171 183.620
Tree creation : 1501.720 17.186 972.647 676.912
Tree removal : 265.883 14.524 109.538 111.405

– finished at 10/19/2011 13:09:44 –
INIT: version 2.86 reloading
[root@client-15 ~]# rpm -qa|grep lustre
lustre-client-tests-1.8.6-wc1_2.6.18_238.12.1.el5
lustre-client-1.8.6-wc1_2.6.18_238.12.1.el5
lustre-client-modules-1.8.6-wc1_2.6.18_238.12.1.el5

Comment by Sarah Liu [ 20/Oct/11 ]

As Peter and Andreas suggested, besides 1.8.6 I tried two builds between 1.8.6 and 1.8.7-rc1: builds #130 and #131. For #131, the kernel was updated from 2.6.18-238.12.1.el5 to 2.6.18-274.3.1.el5. The result: #130 passes both mdtest runs (1024 and 10240), while #131 hits this problem, and after that failure the test with MDTEST_NFILES=1024 cannot pass either.

#130
---------------------------------------------------------------------------
[root@client-15 ~]# mpirun -np 12 -H client-15,client-5,client-2 -mca plm_rsh_agent rsh /usr/bin/mdtest -d /mnt/lustre/mdtest -i3 -n1024
connect to address 10.10.4.5 port 544: Connection refused
Trying krb4 rsh...
connect to address 10.10.4.2 port 544: Connection refused
Trying krb4 rsh...
connect to address 10.10.4.2 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
connect to address 10.10.4.5 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
– started at 10/19/2011 17:19:32 –

mdtest-1.8.3 was launched with 12 total task(s) on 3 nodes
Command line used: /usr/bin/mdtest -d /mnt/lustre/mdtest -i3 -n1024
Path: /mnt/lustre
FS: 46.9 GiB Used FS: 4.4% Inodes: 3.0 Mi Used Inodes: 0.0%

12 tasks, 12288 files/directories

SUMMARY: (of 3 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
Directory creation: 4853.888 3735.464 4346.004 462.328
Directory stat : 16579.909 13008.459 15348.596 1655.482
Directory removal : 2295.923 2191.987 2229.765 46.938
File creation : 6724.576 6253.199 6528.509 200.431
File stat : 23266.734 22682.011 23025.525 249.433
File removal : 8424.377 7434.221 7924.369 404.290
Tree creation : 2605.158 1328.152 1771.459 589.910
Tree removal : 634.923 46.341 366.383 243.032

– finished at 10/19/2011 17:20:12 –
[root@client-15 ~]# mpirun -np 12 -H client-15,client-5,client-2 -mca plm_rsh_agent rsh /usr/bin/mdtest -d /mnt/lustre/mdtest -i3 -n10240
connect to address 10.10.4.5 port 544: Connection refused
Trying krb4 rsh...
connect to address 10.10.4.2 port 544: Connection refused
Trying krb4 rsh...
connect to address 10.10.4.5 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
connect to address 10.10.4.2 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
– started at 10/19/2011 17:29:59 –

mdtest-1.8.3 was launched with 12 total task(s) on 3 nodes
Command line used: /usr/bin/mdtest -d /mnt/lustre/mdtest -i3 -n10240
Path: /mnt/lustre
FS: 46.9 GiB Used FS: 4.4% Inodes: 3.0 Mi Used Inodes: 0.0%

12 tasks, 122880 files/directories

SUMMARY: (of 3 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
Directory creation: 3336.135 2802.707 3099.313 221.836
Directory stat : 19458.564 16126.399 17976.454 1385.008
Directory removal : 2532.999 2463.258 2490.151 30.625
File creation : 6323.514 6055.286 6199.546 110.440
File stat : 21670.612 21482.912 21568.812 77.449
File removal : 7048.993 6775.570 6876.685 122.451
Tree creation : 1633.932 31.719 1051.878 723.751
Tree removal : 33.018 10.923 19.498 9.675

– finished at 10/19/2011 17:36:58 –
INIT: version 2.86 reloading

[root@client-15 ~]# rpm -qa|grep lustre
lustre-client-tests-1.8.6.80-wc1_2.6.18_238.12.1.el5_geb7f696
lustre-client-1.8.6.80-wc1_2.6.18_238.12.1.el5_geb7f696
lustre-client-modules-1.8.6.80-wc1_2.6.18_238.12.1.el5_geb7f696
[root@client-15 ~]#

#131
-----------------------------------------------------------------------
[root@client-15 ~]# mpirun -np 12 -H client-15,client-5,client-2 -mca plm_rsh_agent rsh /usr/bin/mdtest -d /mnt/lustre/mdtest -i3 -n1024
connect to address 10.10.4.5 port 544: Connection refused
Trying krb4 rsh...
connect to address 10.10.4.2 port 544: Connection refused
Trying krb4 rsh...
connect to address 10.10.4.2 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
connect to address 10.10.4.5 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
– started at 10/19/2011 21:52:55 –

mdtest-1.8.3 was launched with 12 total task(s) on 3 nodes
Command line used: /usr/bin/mdtest -d /mnt/lustre/mdtest -i3 -n1024
Path: /mnt/lustre
FS: 46.9 GiB Used FS: 4.4% Inodes: 3.0 Mi Used Inodes: 0.0%

12 tasks, 12288 files/directories

SUMMARY: (of 3 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
Directory creation: 5487.495 3353.100 4100.389 981.813
Directory stat : 27125.824 25855.645 26577.083 532.733
Directory removal : 7818.470 6933.606 7367.622 361.440
File creation : 7796.005 7057.546 7418.017 301.729
File stat : 24489.842 22745.956 23741.504 733.084
File removal : 8910.799 8573.264 8721.124 140.935
Tree creation : 3460.647 77.585 1758.095 1381.217
Tree removal : 1204.567 750.726 1011.358 191.316

– finished at 10/19/2011 21:53:21 –
[root@client-15 ~]# mpirun -np 12 -H client-15,client-5,client-2 -mca plm_rsh_agent rsh /usr/bin/mdtest -d /mnt/lustre/mdtest -i3 -n10240
connect to address 10.10.4.5 port 544: Connection refused
Trying krb4 rsh...
connect to address 10.10.4.2 port 544: Connection refused
Trying krb4 rsh...
connect to address 10.10.4.5 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
connect to address 10.10.4.2 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
– started at 10/19/2011 21:53:31 –

mdtest-1.8.3 was launched with 12 total task(s) on 3 nodes
Command line used: /usr/bin/mdtest -d /mnt/lustre/mdtest -i3 -n10240
Path: /mnt/lustre
FS: 46.9 GiB Used FS: 4.4% Inodes: 3.0 Mi Used Inodes: 0.0%

12 tasks, 122880 files/directories
10/19/2011 21:54:18: Process 4(client-5.lab.whamcloud.com): FAILED in create_remove_items_helper, unable to remove directory: No such file or directory
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 4 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 4 with PID 3683 on
node client-5 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[root@client-15 ~]# rpm -qa|grep lustre
lustre-client-tests-1.8.6.80-wc1_2.6.18_274.3.1.el5_gbec8184
lustre-client-1.8.6.80-wc1_2.6.18_274.3.1.el5_gbec8184
lustre-client-modules-1.8.6.80-wc1_2.6.18_274.3.1.el5_gbec8184
[root@client-15 ~]#

Comment by Peter Jones [ 20/Oct/11 ]

YangSheng

Could you please look into what it is about the checkin identified as the culprit that has caused this problem?

thanks

Peter

Comment by Yang Sheng [ 20/Oct/11 ]

Ok. Taken.

Comment by Andreas Dilger [ 20/Oct/11 ]

Yang Sheng, can you please attach a diff of fs/*.[ch] and fs/ext4/*.[ch] between those two kernels. It may be that a change in the ext4 code and/or VFS is causing the problem.

Comment by Yang Sheng [ 20/Oct/11 ]

fs diff

Comment by Yang Sheng [ 20/Oct/11 ]

ext4 diff

Comment by Andreas Dilger [ 20/Oct/11 ]

I don't see any obvious changes in those two patches that might be causing this problem.

In the ext4 code there is a patch in ext4_free_inode() that is duplicating our own ext4-claim_inode-free_inode-race.patch, which has been removed from the ldiskfs-2.6-rhel5-ext4.series. The rest of the patches are for AIO with O_DIRECT files, which is completely irrelevant for the MDS, and a small fix to the quota code.

There is one change in the VFS code in d_obtain_alias(), but I don't see that we use this on the MDS. Possibly the problem relates to our iopen patch? I can't see anything obvious.

Hmm, I thought the two builds being tested differed only in the before/after kernel update. I see that there is another patch (bec818434c27bb390b4c8866e73d1afb0dd9e884, http://review.whamcloud.com/1259) that was also included in that build, and it seems very relevant, and a good candidate for the cause of this bug.

Comment by Andreas Dilger [ 20/Oct/11 ]

That patch was from LU-146.

Comment by Oleg Drokin [ 20/Oct/11 ]

Actually I kind of don't think the LU-146 patch is relevant.

We only see the problem in the mkdir/rmdir case.
The mkdir path is completely untouched by the patch.
The rmdir path is touched a little bit - the common get_parent_child_locked function is changed, but there is only an "op" argument added, and all the new functionality comes after the check for open, as I see it.

Besides, the erroneous lookup (rather, invalid nlink value) clearly comes from ldiskfs, and ldiskfs was not touched by LU-146.

Can you please elaborate?

Comment by Oleg Drokin [ 20/Oct/11 ]

BTW, I would agree that running a test with just the kernel changed, and perhaps just LU-146 applied, would be a good idea to clarify who's at fault.

Comment by Oleg Drokin [ 20/Oct/11 ]

BTW, another note.
Note that the failure was in trying to look up the parent - this part of mds_get_parent_child_locked(), at the very beginning, which was not touched:

/* Step 1: Lookup parent */
*dparentp = mds_fid2dentry(mds, fid, NULL);
if (IS_ERR(*dparentp)) {
        rc = PTR_ERR(*dparentp);
        *dparentp = NULL;
        RETURN(rc);
}
Comment by Sarah Liu [ 20/Oct/11 ]

I will test the review build which contain one patch at a time. Will keep you updated.

Comment by Sarah Liu [ 21/Oct/11 ]

I tested review build #2398, which included the patch for LU-146, and #2437 for LU-674, and got the same issue on both builds. I will rerun the tests on b1_8 build #130 (which doesn't contain these two patches) again to confirm that it can pass.

Comment by Johann Lombardi (Inactive) [ 21/Oct/11 ]

From ext4_iget() (called from iopen):

        if (inode->i_nlink == 0) { 
                if (inode->i_mode == 0 || 
                    !(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ORPHAN_FS)) {
                        /* this inode is deleted */
                        ret = -ESTALE;
                        goto bad_inode;
                }    
                /* The only unlinked inodes we let through here have
                 * valid i_mode and are being read by the orphan
                 * recovery code: that's fine, we're about to complete
                 * the process of deleting those. */
        }

So the fs has EXT4_ORPHAN_FS set? How is this possible?
The only option is that someone else dropped nlink after ext4_iget() successfully completed. This would thus be a race, so I would really not disregard LU-146 (as discussed when this bug was initially reported), which touches the locking code, although I have no idea how it could trigger this.

Comment by Oleg Drokin [ 21/Oct/11 ]

Note that nlink=0 is reported on the parent dir.
The parent dir is never deleted, but its nlink count is 1 (because we went past 32k subdirs in it), so if some code does nlink--, we'll have 0 at least briefly.
Also note that the place where we do the lookup is not protected by an ldlm lock yet, so there is some possibility of a parallel rmdir.

Regarding the code you quote, I believe it's just that i_mode is NOT 0 (of course) and EXT4_ORPHAN_FS is NOT set either.

So time to see what might be doing nlink-- on the rmdir path, even if briefly?

Comment by Andreas Dilger [ 21/Oct/11 ]

Yes, I see this now - the directory link count is temporarily set to 0 if a subdirectory is deleted:

int ext4_rmdir(struct inode *dir, struct dentry *dentry)
{
        :
        :
        ext4_dec_count(handle, dir);
        :
        :
}

static void ext4_dec_count(handle_t *handle, struct inode *inode)
{      
        drop_nlink(inode);
        if (S_ISDIR(inode->i_mode) && inode->i_nlink == 0)
                inc_nlink(inode);
}

This is called via ll_vfs_rmdir(). In mds_reint_create() and mds_mfd_close() it calls ll_vfs_rmdir(dir) under LOCK_INODE_MUTEX(dir). Perhaps we are not doing the lookup under i_mutex, or not refreshing the attributes under i_mutex?
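
For illustration, a hypothetical interleaving consistent with that theory (a sketch built from the functions quoted above, not code from the tree):

/*
 * Suspected race (sketch, not code from the tree): ext4_dec_count()
 * lets a directory's in-memory nlink pass through 0 before inc_nlink()
 * restores it, so a reader that checks i_nlink without holding i_mutex
 * can observe the transient zero.
 *
 *   rmdir thread (holds dir i_mutex)      MDS lookup thread (no i_mutex)
 *   --------------------------------      ------------------------------
 *   ext4_rmdir(dir, dentry)
 *     ext4_dec_count(handle, dir);
 *       drop_nlink(dir);                  dir->i_nlink: 1 -> 0
 *                                         result = mds_lookup(...);
 *                                         inode = result->d_inode;
 *                                         inode->i_nlink == 0, so the
 *                                         lookup fails with -ENOENT
 *       inc_nlink(dir);                   dir->i_nlink: 0 -> 1
 *
 * The window only exists for directories already pinned at i_nlink == 1
 * (more than EXT4_LINK_MAX subdirectories), which is exactly the state
 * of the shared mdtest directory in this report.
 */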

Comment by Andreas Dilger [ 21/Oct/11 ]

It looks like the patch that landed in ext4 is not the same as what was in our original patch:

static inline void ext3_dec_count(handle_t *handle, struct inode *inode)
{      
      if (!S_ISDIR(inode->i_mode) || inode->i_nlink > 2)
              drop_nlink(inode);
}

I don't know if the locking changes in LU-146 changed this in some way.

Comment by Oleg Drokin [ 21/Oct/11 ]

LU-146 did not change any locking outside of the open path. It had no ldiskfs changes either.

Comment by Andreas Dilger [ 21/Oct/11 ]

What I meant was whether the changes in the LU-146 locking made it racy w.r.t. getting the i_nlink count for the parent directory without doing LOCK_INODE_MUTEX(), so that if another process is doing ext4_dec_count() on the parent at that same time it can briefly see the parent directory with i_nlink == 0.

It is possible to add another ext4 patch to change ext4_dec_count() back to how it was in our patches, which is probably the most expedient thing to do for 1.8.7, but it would be better to avoid the race in Lustre instead of adding more patches to ldiskfs. That said, I don't think this is needed for 2.x, so maybe that is OK until 1.8.x is gone.
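
A minimal sketch of what that ext4 patch would look like, assuming it simply restores the ext3_dec_count() behaviour quoted above (this is not the literal hunk from the review):

/* Sketch only: avoid the transient zero by never decrementing a
 * directory's link count below 2 (directories pinned at i_nlink == 1
 * because they exceed EXT4_LINK_MAX are simply left alone). */
static void ext4_dec_count(handle_t *handle, struct inode *inode)
{
        if (!S_ISDIR(inode->i_mode) || inode->i_nlink > 2)
                drop_nlink(inode);
}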

Comment by Oleg Drokin [ 21/Oct/11 ]

I don't think LU-146 touches anything related to attrs reading and the inode mutex around it; it's likely a problem we had from the very beginning.

Comment by Oleg Drokin [ 21/Oct/11 ]

BTW, that said, why would it be fine in the kernel to have nlink 0 even temporarily? It's error-prone; why not keep it correct at all times instead?
This is something I think should also be addressed upstream.

Comment by Andreas Dilger [ 21/Oct/11 ]

It is OK for the kernel because this is always accessed under i_mutex in the cases that matter? I haven't checked yet, but my guess is that mds_get_parent_child_locked() (or somewhere else) is accessing i_nlink without a lock?

That said, now I'm a bit confused, because the code in question (ext4_iget()) only hits this code path when reading the inode from disk. That means either we are getting inode aliases in memory (seems unlikely, since it would be getting a valid i_nlink from disk - ext4_dec_count() only works on the in-memory inode) or we are accessing the parent inode before it is fully set up. It may possibly relate to strangeness in the iopen code; I haven't looked at that in a long time.

It may be that I have misdiagnosed the source of the i_nlink=0 condition.

Comment by Oleg Drokin [ 21/Oct/11 ]

Your explanation of the source of the problem seems very plausible.

I am not sure what iget has to do with this, as we are getting this inode via a normal lookup; we don't get any sort of error from ldiskfs itself, it's just our own check of nlink finding that it's suddenly 0.

And yes, we do check it without any locking:

        result = mds_lookup(obd, fid_name, mds->mds_fid_de, strlen(fid_name));
        if (IS_ERR(result))
                RETURN(result);

        inode = result->d_inode;
        if (!inode)
                RETURN(ERR_PTR(-ENOENT));

        if (inode->i_nlink == 0) {
                if (inode->i_mode == 0 &&
                    LTIME_S(inode->i_ctime) == 0) {
                        LCONSOLE_WARN("Found inode with zero nlink, mode and "
                                      "ctime -- this may indicate disk "
                                      "corruption (device %s, inode %lu, link:"
                                      " %lu, count: %d)\n", obd->obd_name,
                                      inode->i_ino,
                                      (unsigned long)inode->i_nlink,
                                      atomic_read(&inode->i_count));
                }
                dput(result);
                RETURN(ERR_PTR(-ENOENT));
        }
Comment by Andreas Dilger [ 21/Oct/11 ]

I've just created http://review.whamcloud.com/1579 that has a patch which may fix this problem. It hasn't been tested at all, but hopefully it is right.

Comment by Sarah Liu [ 21/Oct/11 ]

I will test it when Toro is accessible.

Comment by Sarah Liu [ 23/Oct/11 ]

I've tested patch http://review.whamcloud.com/#change,1579 three times; both 1024 and 10240 pass.

Comment by Build Master (Inactive) [ 24/Oct/11 ]

Integrated in lustre-b1_8 #148 (x86_64,client,el5,inkernel; x86_64,client,el5,ofa; x86_64,client,el6,inkernel; x86_64,client,ubuntu1004,inkernel; x86_64,server,el5,inkernel; x86_64,server,el5,ofa; i686,client,el5,inkernel; i686,client,el5,ofa; i686,client,el6,inkernel; i686,server,el5,inkernel; i686,server,el5,ofa)
LU-762 ldiskfs: don't drop directory nlink to 0

Johann Lombardi : 9500ebf6146dcd50dab094b502d4b50d0123296a
Files :

  • ldiskfs/kernel_patches/patches/ext4-nlink-2.6-rhel5.patch
  • ldiskfs/kernel_patches/series/ldiskfs-2.6-rhel5-ext4.series
Comment by Cliff White (Inactive) [ 28/Oct/11 ]

The test fails on RC2 also:

0000: mdtest-1.7.3 was launched with 1032 total task(s) on 129 nodes
0000: Command line used: /opt/mdtest-1.7.4/bin/mdtest -d /p/l_wham/d0.mdtest -i 3 -n 242 -u
0000: Path: /p/l_wham
0000: FS: 87.0 TiB Used FS: 0.0% Inodes: 546.9 Mi Used Inodes: 0.0%
0000:
0000: 1032 tasks, 249744 files/directories
0746: 10/28/2011 00:24:25: Process 746(hyperion412): FAILED in unique_dir_access, Unable to create unique directory: No such file or directory
0831: 10/28/2011 00:24:25: Process 831(hyperion422): FAILED in unique_dir_access, Unable to create unique directory: No such file or directory

Comment by Oleg Drokin [ 28/Oct/11 ]

This seems to be a somewhat different problem now - it fails in create because the file does not exist? Need some logs I guess, probably from the MDS, e.g. +vfstrace +dlmtrace +...trace (that last one is the one that traces execution, though I forgot what the name is).

Comment by Cliff White (Inactive) [ 28/Oct/11 ]

I re-ran the test with the requested debug options; it failed. Am uploading to Maloo now.

Comment by Cliff White (Inactive) [ 31/Oct/11 ]

Maloo results: https://maloo.whamcloud.com/test_sessions/02a44c1a-01a0-11e1-bb4f-52540025f9af

Comment by Cliff White (Inactive) [ 31/Oct/11 ]

MDS debug log

Comment by Oleg Drokin [ 31/Oct/11 ]

Ok, this log is way too small; it only covers 9 directory creates out of 1038.

I guess to get more consistent results, can you please increase debug_mb to a gig on the clients and to 2G on the MDS, and in addition please use debug_daemon (with a 10240M argument to it) on the MDS.

Once it fails, capture the MDS log from the debug daemon, and also capture a couple of Lustre debug logs from clients that reported unique-create failures.

I also checked the mdtest 1.7.3 source, and it appears the failure to create is a failure to create the unique dirs themselves, not in the middle of the test, surprisingly.

Comment by Cliff White (Inactive) [ 01/Nov/11 ]

Ran a second test for Oleg. Results at: https://maloo.whamcloud.com/test_sessions/c4849490-04af-11e1-aa99-52540025f9af
Debug logs were set to 2 GB, which was too big for Maloo; the logs are on brent/eric under ~cliffw/lu762.
Need instructions for next steps/test runs.

Comment by Peter Jones [ 02/Nov/11 ]

So, my understanding from talking to Oleg and Cliff is that the reported "reoccurrence" has been linked to using an old and unreliable version of mdtest, so this issue is actually fixed in 1.8.7 RC2.

So, can this ticket be marked as resolved, or does an equivalent fix need to land on master?

Comment by Oleg Drokin [ 02/Nov/11 ]

I suspect we need this on master for the case of running on an upgraded fs that was originally formatted as 1.8.x, so no OI is used.

Somebody needs to confirm my understanding of this fact.

Comment by Peter Jones [ 03/Nov/11 ]

Talking to Andreas, it seems safest to port this change to master. YangSheng, can you please take care of that?

Comment by Andreas Dilger [ 03/Nov/11 ]

I'm not sure what you are asking about related to the OI. When a 1.8.x-formatted filesystem is upgraded to 2.x it will get an OI to hold new FIDs, but old IGIF FIDs are not added to the OI (at least currently; it might make life easier in some cases, but that is a separate discussion).

I think it is reasonable to land the ldiskfs patch to master as well, just to keep the ldiskfs patches in sync. The patch was already accepted upstream for ext4, so eventually the patch can go away.

Yang Sheng, can you please add the ext4-nlink-2.6-rhel5.patch to the master ldiskfs series as well.

Comment by Yang Sheng [ 03/Nov/11 ]

Hi Peter, Adilger, patch uploaded to: http://review.whamcloud.com/#change,1644

Comment by Build Master (Inactive) [ 11/Nov/11 ]

Integrated in lustre-master #350 (x86_64,client,el5,inkernel; x86_64,client,el5,ofa; x86_64,client,el6,inkernel; x86_64,client,sles11,inkernel; x86_64,client,ubuntu1004,inkernel; x86_64,server,el5,inkernel; x86_64,server,el5,ofa; x86_64,server,el6,inkernel; i686,client,el5,inkernel; i686,client,el5,ofa; i686,client,el6,inkernel; i686,server,el5,inkernel; i686,server,el5,ofa; i686,server,el6,inkernel)
LU-762 ldiskfs: don't drop directory nlink to 0 (Revision 51f76e02e2a04070ff2125285642430f703962ac)

Result = SUCCESS
Oleg Drokin : 51f76e02e2a04070ff2125285642430f703962ac
Files :

  • ldiskfs/kernel_patches/patches/ext4-nlink-2.6-rhel5.patch
  • ldiskfs/kernel_patches/series/ldiskfs-2.6-rhel5-ext4.series
  • ldiskfs/kernel_patches/series/ldiskfs-2.6-rhel6.series
Comment by Yang Sheng [ 16/Dec/11 ]

Closing for now. Please feel free to reopen.
