[LU-484] LASSERT(inode->i_nlink > 0) failed in osd_handler.c:osd_object_ref_del() Created: 05/Jul/11  Updated: 04/Oct/16  Resolved: 18/Jul/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Patrick Valentin (Inactive)
Assignee: Niu Yawei (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Attachments: File gaia14.crash.trace    
Issue Links:
Related
is related to LU-6808 Interop 2.5.3<->master sanity test_22... Resolved
Severity: 3
Rank (Obsolete): 4951

 Description   

The CEA Bull customer reported two crashes while running I/O benchmarks on a Lustre file system with quota enabled.
After running quotaoff, the crash no longer occurred. Here is the signature:

LustreError: 29856:0:(osd_handler.c:1967:osd_object_ref_del())
ASSERTION(inode->i_nlink > 0) failed
Pid: 29856, comm: mdt_24

As they provided a system dump, I analysed it with crash and attached the resulting trace (stack and contents of the inode structure).



 Comments   
Comment by Johann Lombardi (Inactive) [ 05/Jul/11 ]
 
        if (S_ISDIR(ma->ma_attr.la_mode)) {
                /* Add "." and ".." for newly created dir */
                __mdd_ref_add(env, child, handle);
                rc = __mdd_index_insert_only(env, child, mdo2fid(child),
                                             dot, handle, BYPASS_CAPA);
                if (rc == 0)
                        rc = __mdd_index_insert_only(env, child, pfid,
                                                     dotdot, handle,
                                                     BYPASS_CAPA);
                if (rc != 0)
                        __mdd_ref_del(env, child, handle, 1);
        }

I'm not sure sanity-quota exercises this code path (i.e. EDQUOT on directory creation) ...

Comment by Peter Jones [ 06/Jul/11 ]

Niu

Could you please comment?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 07/Jul/11 ]

Hi, Johann

Even if __mdd_index_insert_only() returns -EDQUOT, I don't see why the following __mdd_ref_del() would hit this assertion.

It looks like a race to me. Patrick, do you know what kind of operations were running on the customer's system? Thanks.

Comment by Johann Lombardi (Inactive) [ 07/Jul/11 ]

Well, __mdd_ref_add() is a no-op if MNLINK_OBJ is set, while __mdd_ref_del(..., 1) definitely calls osd_object_ref_del().
Is it possible to have MNLINK_OBJ set at this stage?

Comment by Niu Yawei (Inactive) [ 07/Jul/11 ]

This is a newly created dir, so MNLINK_OBJ should not be set, although of course it is possible if there is a bug. Even if MNLINK_OBJ were set, i_nlink should be 1, so this really confuses me.
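To make the reasoning in this exchange concrete, here is a minimal user-space model in C. Every name in it (toy_obj, TOY_MNLINK_OBJ, toy_ref_add, toy_ref_del) is hypothetical and only mirrors the behaviour described in the comments above; it is not the MDD or OSD code. It assumes, as stated above, that a newly created directory starts with a link count of 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the MNLINK_OBJ flag discussed above. */
#define TOY_MNLINK_OBJ 0x1u

/* Toy object: just a link count and a flag word. */
struct toy_obj {
        unsigned int nlink;
        unsigned int flags;
};

/* Models __mdd_ref_add(): a no-op when the flag is set. */
static void toy_ref_add(struct toy_obj *o)
{
        assert(o != NULL);
        if (o->flags & TOY_MNLINK_OBJ)
                return;
        o->nlink++;
}

/*
 * Models __mdd_ref_del(..., 1): always reaches the decrement.
 * Returns false at the point where the real code would trip
 * LASSERT(inode->i_nlink > 0), true otherwise.
 */
static bool toy_ref_del(struct toy_obj *o)
{
        assert(o != NULL);
        if (o->nlink == 0)
                return false;
        o->nlink--;
        return true;
}
```

In this model, an object with the flag set keeps its count at 1 across toy_ref_add(), and toy_ref_del() then takes it to 0 without failing the check. That matches the objection above: the flag alone would not explain the assertion.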

Comment by Patrick Valentin (Inactive) [ 08/Jul/11 ]

Hi Niu,
I was only told that an I/O benchmark was running. I have forwarded your question to our on-site Support team.
Patrick

Comment by Liang Zhen (Inactive) [ 09/Jul/11 ]

The call trace looks like this:

namei.c
mdd_object_initialize()->
    __mdd_index_insert_only(...dotdot...)->
        osd_index_ea_insert()->
            osd_add_dot_dotdot()->
                ldiskfs_add_dot_dotdot()

and please see ext4_add_dot_dotdot() in ldiskfs:

namei.c
        inode->i_op = &ext4_dir_inode_operations;
        inode->i_fop = &ext4_dir_operations;
        inode->i_size = EXT4_I(inode)->i_disksize = inode->i_sb->s_blocksize;
        dir_block = ext4_bread(handle, inode, 0, 1, &err);
        if (!dir_block) {
                clear_nlink(inode);
                ext4_mark_inode_dirty(handle, inode);
                iput (inode);
                goto get_out;
        }

As we can see, if ext4_bread() fails to create a block, clear_nlink(inode) sets inode->i_nlink to zero. I think that is why we hit the assertion in __mdd_ref_del(), which is called in the code block below:

namei.c
        if (S_ISDIR(ma->ma_attr.la_mode)) {
                /* Add "." and ".." for newly created dir */
                __mdd_ref_add(env, child, handle);
                rc = __mdd_index_insert_only(env, child, mdo2fid(child),
                                             dot, handle, BYPASS_CAPA);
                if (rc == 0)
                        rc = __mdd_index_insert_only(env, child, pfid,
                                                     dotdot, handle,
                                                     BYPASS_CAPA);
                if (rc != 0)
                        __mdd_ref_del(env, child, handle, 1);
        }
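
The sequence described above can be replayed in a small user-space model. This is only a sketch: toy_inode, toy_add_dot_dotdot, toy_ref_del and toy_mkdir_init are hypothetical names, the bread_fails parameter stands in for ext4_bread() returning NULL, and the initial link count of 1 is an assumption taken from the earlier comments. The two index inserts are collapsed into one call, and the toy returns false at the point where the real code would hit LASSERT(inode->i_nlink > 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_inode {
        unsigned int nlink;
};

/* Models only the error path of ext4_add_dot_dotdot() quoted above. */
static int toy_add_dot_dotdot(struct toy_inode *inode, bool bread_fails)
{
        assert(inode != NULL);
        if (bread_fails) {
                inode->nlink = 0;   /* clear_nlink(inode) */
                return -1;          /* generic error; real errno not modeled */
        }
        return 0;                   /* success path leaves the count alone here */
}

/* Models the check in osd_object_ref_del(). */
static bool toy_ref_del(struct toy_inode *inode)
{
        if (inode->nlink == 0)
                return false;       /* LASSERT would fire here */
        inode->nlink--;
        return true;
}

/*
 * Models the S_ISDIR branch of the MDD block above.
 * Returns true if no assertion would fire.
 */
static bool toy_mkdir_init(struct toy_inode *child, bool bread_fails)
{
        child->nlink++;                             /* __mdd_ref_add() */
        if (toy_add_dot_dotdot(child, bread_fails) != 0)
                return toy_ref_del(child);          /* __mdd_ref_del(..., 1) */
        return true;
}
```

With bread_fails set, the helper has already zeroed the count by the time the caller's cleanup runs, so toy_mkdir_init() returns false. Without the failure the count simply ends at 2.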

Comment by Niu Yawei (Inactive) [ 09/Jul/11 ]

Hi, Liang

Thank you for pointing this out, I think you are right. I will try to come up with a way to handle i_nlink correctly in this case.

Comment by Liang Zhen (Inactive) [ 09/Jul/11 ]

A few things we might want to note:

  • ext4_add_dot_dotdot() will arbitrarily set inode->i_nlink = 2, which means the refcount increased by __mdd_ref_add() will be overwritten, although the result is still correct
  • we must be very careful if we want to fix it in the MDD layer, because other OSDs (e.g. btrfs-osd in the future) might rely on __mdd_ref_add()/__mdd_ref_del()

just FYI

Comment by Niu Yawei (Inactive) [ 11/Jul/11 ]

The patch is at: http://review.whamcloud.com/#change,1079
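
The patch itself is not quoted in this ticket; the build notifications further down only give its title, "Don't do error cleanup in ext4_add_dot_dotdot()". As a rough illustration of that idea, and not a reproduction of the actual change, the toy model below has the helper report the failure and leave the link count alone, so the caller's existing cleanup still sees a positive count. All names are hypothetical, bread_fails stands in for ext4_bread() returning NULL, and the starting count of 1 is an assumption from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_inode {
        unsigned int nlink;
};

/* Helper without error cleanup: on failure it only reports an error. */
static int toy_add_dot_dotdot_no_cleanup(struct toy_inode *inode,
                                         bool bread_fails)
{
        assert(inode != NULL);      /* link count deliberately left untouched */
        return bread_fails ? -1 : 0;
}

/* Returns false where LASSERT(inode->i_nlink > 0) would fire. */
static bool toy_ref_del(struct toy_inode *inode)
{
        if (inode->nlink == 0)
                return false;
        inode->nlink--;
        return true;
}

/* Caller keeps its own add/del pairing, as in the MDD snippet. */
static bool toy_mkdir_init_fixed(struct toy_inode *child, bool bread_fails)
{
        child->nlink++;                             /* __mdd_ref_add() */
        if (toy_add_dot_dotdot_no_cleanup(child, bread_fails) != 0)
                return toy_ref_del(child);          /* __mdd_ref_del(..., 1) */
        return true;
}
```

In this sketch a failed block read leaves the count at 2 when the caller's cleanup runs, so the decrement succeeds and the count returns to 1. Keeping the cleanup in one place also leaves the MDD-level add/del pairing unchanged for other OSDs.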

Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #212
LU-484 Don't do error cleanup in ext4_add_dot_dotdot()

Oleg Drokin : a34dd87e7e9b2b1dc3d838dc59c03f01b3df99c6
Files :

  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent-rhel6.patch
  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent.patch

Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #212
LU-484 Don't do error cleanup in ext4_add_dot_dotdot()

Oleg Drokin : a34dd87e7e9b2b1dc3d838dc59c03f01b3df99c6
Files :

  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent.patch
  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent-rhel6.patch

Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #212
LU-484 Don't do error cleanup in ext4_add_dot_dotdot()

Oleg Drokin : a34dd87e7e9b2b1dc3d838dc59c03f01b3df99c6
Files :

  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent-rhel6.patch
  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent.patch

Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #212
LU-484 Don't do error cleanup in ext4_add_dot_dotdot()

Oleg Drokin : a34dd87e7e9b2b1dc3d838dc59c03f01b3df99c6
Files :

  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent.patch
  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent-rhel6.patch

Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #212
LU-484 Don't do error cleanup in ext4_add_dot_dotdot()

Oleg Drokin : a34dd87e7e9b2b1dc3d838dc59c03f01b3df99c6
Files :

  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent-rhel6.patch
  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent.patch

Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #212
LU-484 Don't do error cleanup in ext4_add_dot_dotdot()

Oleg Drokin : a34dd87e7e9b2b1dc3d838dc59c03f01b3df99c6
Files :

  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent.patch
  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent-rhel6.patch

Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #212
LU-484 Don't do error cleanup in ext4_add_dot_dotdot()

Oleg Drokin : a34dd87e7e9b2b1dc3d838dc59c03f01b3df99c6
Files :

  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent.patch
  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent-rhel6.patch

Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-master » i686,server,el5,ofa #212
LU-484 Don't do error cleanup in ext4_add_dot_dotdot()

Oleg Drokin : a34dd87e7e9b2b1dc3d838dc59c03f01b3df99c6
Files :

  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent.patch
  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent-rhel6.patch

Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #212
LU-484 Don't do error cleanup in ext4_add_dot_dotdot()

Oleg Drokin : a34dd87e7e9b2b1dc3d838dc59c03f01b3df99c6
Files :

  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent.patch
  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent-rhel6.patch

Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #212
LU-484 Don't do error cleanup in ext4_add_dot_dotdot()

Oleg Drokin : a34dd87e7e9b2b1dc3d838dc59c03f01b3df99c6
Files :

  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent-rhel6.patch
  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent.patch

Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #212
LU-484 Don't do error cleanup in ext4_add_dot_dotdot()

Oleg Drokin : a34dd87e7e9b2b1dc3d838dc59c03f01b3df99c6
Files :

  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent.patch
  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent-rhel6.patch

Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #212
LU-484 Don't do error cleanup in ext4_add_dot_dotdot()

Oleg Drokin : a34dd87e7e9b2b1dc3d838dc59c03f01b3df99c6
Files :

  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent.patch
  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent-rhel6.patch

Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-master » i686,client,el5,ofa #212
LU-484 Don't do error cleanup in ext4_add_dot_dotdot()

Oleg Drokin : a34dd87e7e9b2b1dc3d838dc59c03f01b3df99c6
Files :

  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent-rhel6.patch
  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent.patch

Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #212
LU-484 Don't do error cleanup in ext4_add_dot_dotdot()

Oleg Drokin : a34dd87e7e9b2b1dc3d838dc59c03f01b3df99c6
Files :

  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent.patch
  • ldiskfs/kernel_patches/patches/ext4_data_in_dirent-rhel6.patch

Comment by Niu Yawei (Inactive) [ 18/Jul/11 ]

landed for 2.1

Comment by Andreas Dilger [ 09/Feb/12 ]

Niu, can this bug be closed?

Comment by Niu Yawei (Inactive) [ 09/Feb/12 ]

Niu, can this bug be closed?

This is a closed bug, did we see any regression?
