[LU-1055] recovery-mds osd_handler.c:1966:osd_declare_object_destroy()) ASSERTION(!lu_object_is_dying(dt->do_lu.lo_header)) failed Created: 30/Jan/12  Updated: 27/Mar/12  Resolved: 14/Feb/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0
Fix Version/s: Lustre 2.2.0

Type: Bug
Priority: Blocker
Reporter: Cliff White (Inactive)
Assignee: Niu Yawei (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Environment:

Hyperion - Chaos5 clients and servers


Attachments: crash.txt, osd_obj_des.txt.gz
Severity: 3
Rank (Obsolete): 4736

 Description   

Running the recovery-scale MDS flavor on Hyperion, the MDS failed after 3 failovers. The MDT shows as mounted via df, but is absent from /proc/mounts. mount -f appears to have cleared the mount.
Syslog and Lustre log are attached.



 Comments   
Comment by Oleg Drokin [ 30/Jan/12 ]

Alex, I think this is another one of the problems that surfaced after adding object destroy prototyping in the osd api, similar in nature to LU-1017?

Comment by Alex Zhuravlev [ 31/Jan/12 ]

At the moment the theory is the following: once recovery is completed, the MDS starts to scan PENDING/ to clean up orphans; at the same time, some clients unlink/close a file. So orph_index_iterate() races with mdd_close().

Can somebody confirm this race is possible?

Comment by Alex Zhuravlev [ 31/Jan/12 ]

If the theory is right, then the original design relies on the result of the direntry removal from PENDING/.

Then the following patch should be enough:

diff --git a/lustre/osd-ldiskfs/osd_handler.c b/lustre/osd-ldiskfs/osd_handler.
index 72bfd42..feb106f 100644
--- a/lustre/osd-ldiskfs/osd_handler.c
+++ b/lustre/osd-ldiskfs/osd_handler.c
@@ -1963,7 +1963,6 @@ static int osd_declare_object_destroy(const struct lu_env
 oh = container_of0(th, struct osd_thandle, ot_super);
 LASSERT(oh->ot_handle == NULL);
 LASSERT(inode);
-LASSERT(!lu_object_is_dying(dt->do_lu.lo_header));
 
 OSD_DECLARE_OP(oh, destroy);
 OSD_DECLARE_OP(oh, delete);
@@ -1989,6 +1988,7 @@ static int osd_object_destroy(const struct lu_env *env,
 LASSERT(oh->ot_handle);
 LASSERT(inode);
 LASSERT(osd_inode_unlinked(inode));
+LASSERT(!lu_object_is_dying(dt->do_lu.lo_header));
 
 OSD_EXEC_OP(th, destroy);

Comment by Peter Jones [ 02/Feb/12 ]

Niu

Could you please take care of this one?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 03/Feb/12 ]

http://review.whamcloud.com/2083

I agree with Alex: the orphan cleanup after recovery could race with the client close. Moving the LASSERT into the real destroy function, which is protected by a lock, should be fine.
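
To illustrate the reasoning, here is a minimal single-threaded model of the race. It is only a sketch: every name in it (struct model_obj, model_declare_destroy(), model_pending_remove(), model_object_destroy(), model_actor(), model_run_race()) is hypothetical and not part of the Lustre code. It assumes, per the theory above, that whichever path successfully removes the entry from PENDING/ is the one that performs the destroy, and that the real destroy runs under a lock. The model replays one interleaving, where the close path completes before the orphan scan reaches the same object, and counts how often a declare-time "not dying" check would have fired.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an MDT object; not a Lustre structure. */
struct model_obj {
    bool in_pending;    /* entry is still listed in PENDING/ */
    bool dying;         /* set once the object has been destroyed */
    int  destroy_count; /* number of real destroys performed */
    int  declare_trips; /* times a declare-time check would have fired */
};

/* Declare phase: assumed to run without the object lock, so it may
 * observe an object that another path has already destroyed. Instead
 * of asserting, count what the old assertion would have caught. */
static void model_declare_destroy(struct model_obj *o)
{
    if (o->dying)
        o->declare_trips++;
}

/* Remove the PENDING/ entry; only the first caller succeeds. */
static int model_pending_remove(struct model_obj *o)
{
    if (!o->in_pending)
        return -ENOENT;
    o->in_pending = false;
    return 0;
}

/* Execute phase: assumed to run under the lock, and only after the
 * caller won the PENDING/ removal, so the check is safe here. */
static void model_object_destroy(struct model_obj *o)
{
    assert(!o->dying);
    o->dying = true;
    o->destroy_count++;
}

/* One unlink path: declare, then destroy only if we removed the entry. */
static void model_actor(struct model_obj *o)
{
    model_declare_destroy(o);
    if (model_pending_remove(o) == 0)
        model_object_destroy(o);
}

/* Replay the interleaving: the close side finishes first, then the
 * orphan scan processes the same object. */
struct model_obj model_run_race(void)
{
    struct model_obj o = { .in_pending = true };

    model_actor(&o); /* stands in for the mdd_close() side */
    model_actor(&o); /* stands in for the orph_index_iterate() side */
    return o;
}
```

In this model, model_run_race() ends with destroy_count == 1 and declare_trips == 1: the second path sees a dying object during declare, which is where the original assertion was, but it never reaches the execute-phase check because its PENDING/ removal fails.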

Comment by Build Master (Inactive) [ 14/Feb/12 ]

Integrated in lustre-master » x86_64,client,el5,ofa #471
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = SUCCESS
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Peter Jones [ 14/Feb/12 ]

Landed for 2.2

Comment by Build Master (Inactive) [ 14/Feb/12 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #471
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = SUCCESS
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Build Master (Inactive) [ 14/Feb/12 ]

Integrated in lustre-master » x86_64,server,el5,ofa #471
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = SUCCESS
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Build Master (Inactive) [ 14/Feb/12 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #471
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = SUCCESS
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Build Master (Inactive) [ 14/Feb/12 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #471
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = SUCCESS
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Build Master (Inactive) [ 14/Feb/12 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #471
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = SUCCESS
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Build Master (Inactive) [ 14/Feb/12 ]

Integrated in lustre-master » i686,client,el5,inkernel #471
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = SUCCESS
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Build Master (Inactive) [ 14/Feb/12 ]

Integrated in lustre-master » i686,client,el6,inkernel #471
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = SUCCESS
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Build Master (Inactive) [ 14/Feb/12 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #471
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = SUCCESS
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Build Master (Inactive) [ 14/Feb/12 ]

Integrated in lustre-master » i686,server,el6,inkernel #471
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = SUCCESS
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Build Master (Inactive) [ 14/Feb/12 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #471
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = SUCCESS
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Build Master (Inactive) [ 14/Feb/12 ]

Integrated in lustre-master » i686,server,el5,inkernel #471
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = SUCCESS
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Build Master (Inactive) [ 14/Feb/12 ]

Integrated in lustre-master » i686,server,el5,ofa #471
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = SUCCESS
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Build Master (Inactive) [ 14/Feb/12 ]

Integrated in lustre-master » i686,client,el5,ofa #471
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = SUCCESS
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Build Master (Inactive) [ 17/Feb/12 ]

Integrated in lustre-master » x86_64,server,el6,ofa #480
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = FAILURE
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Build Master (Inactive) [ 17/Feb/12 ]

Integrated in lustre-master » x86_64,client,el6,ofa #480
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = FAILURE
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c
Comment by Build Master (Inactive) [ 17/Feb/12 ]

Integrated in lustre-master » i686,client,el6,ofa #480
LU-1055 remove improper assert in osd_declare_object_destroy (Revision 0117cbc7b085404af72978b0361b33966ac379cb)

Result = ABORTED
Oleg Drokin : 0117cbc7b085404af72978b0361b33966ac379cb
Files :

  • lustre/osd-ldiskfs/osd_handler.c