[LU-7579] (osd_handler.c:2683:osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed Created: 17/Dec/15  Updated: 28/Dec/22  Resolved: 13/Feb/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Blocker
Reporter: John Hammond Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File dmesg.txt     File lustre.log    
Issue Links:
Duplicate
Severity: 3

 Description   
# export MDSCOUNT=4
# llmount.sh
...
# cd /mnt/lustre
# mkdir d0
# exec 7<d0
# cd d0
# rmdir ../d0
# lfs mkdir -i1 d1
# exec 7>&-
[  130.331201] LustreError: 4033:0:(osd_handler.c:2683:osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed:
[  130.335818] LustreError: 4033:0:(osd_handler.c:2683:osd_object_destroy()) LBUG
[  130.338084] Pid: 4033, comm: mdt_rdpg00_001
[  130.339427]
[  130.339429] Call Trace:
[  130.340776]  [<ffffffffa09338b5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[  130.342927]  [<ffffffffa0933eb7>] lbug_with_loc+0x47/0xb0 [libcfs]
[  130.344861]  [<ffffffffa12aa9b1>] osd_object_destroy+0x5a1/0x5b0 [osd_ldiskfs]
[  130.346272]  [<ffffffffa144b49d>] lod_sub_object_destroy+0x1fd/0x440 [lod]
[  130.347589]  [<ffffffffa1440080>] lod_object_destroy+0x130/0x770 [lod]
[  130.348877]  [<ffffffffa093f751>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[  130.350151]  [<ffffffffa1336ca8>] mdd_close+0xa48/0xbf0 [mdd]
[  130.351286]  [<ffffffffa139e1c9>] mdt_mfd_close+0x359/0x1980 [mdt]
[  130.352523]  [<ffffffffa0aa4e3d>] ? class_handle_unhash_nolock+0x2d/0x150 [obdclass]
[  130.354034]  [<ffffffffa139fa04>] mdt_close_internal+0x214/0x4f0 [mdt]
[  130.355298]  [<ffffffffa139ff9a>] mdt_close+0x2ba/0x900 [mdt]
[  130.356495]  [<ffffffffa0d538f7>] tgt_request_handle+0x907/0x14a0 [ptlrpc]
[  130.357874]  [<ffffffffa0cfdd7a>] ptlrpc_main+0xdaa/0x18b0 [ptlrpc]
[  130.359089]  [<ffffffff8105e59d>] ? finish_task_switch+0x7d/0x110
[  130.360256]  [<ffffffff8105e568>] ? finish_task_switch+0x48/0x110
[  130.361445]  [<ffffffff81553065>] ? thread_return+0x4e/0x7e9
[  130.362538]  [<ffffffff810b6b6d>] ? lock_release_holdtime+0x3d/0x190
[  130.363791]  [<ffffffffa0cfcfd0>] ? ptlrpc_main+0x0/0x18b0 [ptlrpc]
[  130.365028]  [<ffffffff8109e856>] kthread+0x96/0xa0
[  130.365935]  [<ffffffff8100c30a>] child_rip+0xa/0x20
[  130.366896]  [<ffffffff8100bb10>] ? restore_args+0x0/0x30
[  130.367938]  [<ffffffff8109e7c0>] ? kthread+0x0/0xa0
[  130.368911]  [<ffffffff8100c300>] ? child_rip+0x0/0x20
[  130.369926]
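
The condition in the ASSERTION message can be read as a predicate on the directory's link count. The following is a minimal userspace sketch of that predicate, not the osd_ldiskfs code. `struct toy_inode` and the `toy_*` helpers are hypothetical names, and it assumes that "unlinked" means a link count of zero. The ticket does not state which link count triggered the failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel inode; only the link count is modeled. */
struct toy_inode {
	unsigned int i_nlink;
};

/* Assumption: "unlinked" means the link count has dropped to zero. */
static bool toy_inode_unlinked(const struct toy_inode *inode)
{
	return inode->i_nlink == 0;
}

/* Mirrors the condition printed in the ASSERTION message. */
static bool toy_destroy_allowed(const struct toy_inode *inode)
{
	return toy_inode_unlinked(inode) ||
	       inode->i_nlink == 1 || inode->i_nlink == 2;
}

/* Toy destroy path: assert() plays the role of the failing LASSERT. */
static void toy_object_destroy(struct toy_inode *inode)
{
	assert(toy_destroy_allowed(inode));
	inode->i_nlink = 0; /* the object is gone after destroy */
}
```

In this model, any directory whose link count is above 2 at destroy time, for example because an entry was added after it was removed, would trip the assertion.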


 Comments   
Comment by Di Wang [ 18/Dec/15 ]

Hmm, this is an interesting question. Once the parent (on MDT0) becomes an orphan, we should not be able to create any file or directory under it.
But if the client creates a remote directory, the create RPC is sent to another MDT (MDT1). In mdd_create(), MDT1 retrieves the parent attributes from MDT0 remotely, and since the parent is just an orphan linked into the orphan directory, get_attr() has no way to tell that the parent has become an orphan, so the creation goes ahead and the failure happens.

So we need to permanently record this orphan flag somewhere in the inode, either as another flag inside ldiskfs or as another EA; either way this will impact performance.
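
The gap described in this comment can be sketched in a few lines of C. This is a toy model under stated assumptions, not Lustre code: `toy_object`, `toy_attr`, `TOY_ORPHAN_FL`, and every `toy_*` function are hypothetical names. It assumes the orphan mark lives only in memory on the owning MDT, while a remote MDT sees only the persisted attributes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ORPHAN_FL 0x1u /* hypothetical persisted flag bit */

/* Hypothetical persisted attributes: all that a remote fetch can see. */
struct toy_attr {
	uint32_t flags;
};

/* Hypothetical object on the owning MDT. */
struct toy_object {
	struct toy_attr attr;       /* persisted, visible remotely */
	bool            mem_orphan; /* in-memory only, local to this MDT */
};

/* rmdir of a still-open directory: the orphan mark is kept in memory only. */
static void toy_rmdir_open(struct toy_object *dir)
{
	assert(!dir->mem_orphan); /* orphaned at most once in this model */
	dir->mem_orphan = true;
}

/* Create check on the owning MDT: can consult the in-memory state. */
static bool toy_local_create_allowed(const struct toy_object *parent)
{
	return !parent->mem_orphan;
}

/* Remote attribute fetch: copies the persisted attributes only. */
static struct toy_attr toy_remote_attr_get(const struct toy_object *parent)
{
	return parent->attr;
}

/* Create check on a remote MDT: sees only the fetched attributes. */
static bool toy_remote_create_allowed(const struct toy_attr *attr)
{
	return !(attr->flags & TOY_ORPHAN_FL);
}
```

After `toy_rmdir_open()`, the local check refuses the create but the remote check still allows it, because nothing in the fetched attributes records the orphan state.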

Comment by Di Wang [ 18/Dec/15 ]

How about storing the orphan status in the LMA, then getting/setting it through osd_attr_xxx()? Any other suggestions?

Comment by Andreas Dilger [ 21/Dec/15 ]

Di, regarding your question in http://review.whamcloud.com/10274

Is it easy to add more flags here? I need a flag to indicate that the directory is an orphan, so that no one can create a sub-directory or file under this orphan anymore.

Right now we store this flag in the memory of the mdd object, but then it is not transferred to the remote target (see LU-7579).
So the orphan has to be marked with a permanent flag in the OSD layer; then osp_attr_get() can retrieve this flag and be aware of the orphan status. Please add comments in LU-7579.

These flags are a direct mapping from the EXT4_*_FL inode flags, so it is a bit tricky to add new values here, since most of them are checked by e2fsck.

There is one flag - EXT4_IMAGIC_FL which means that the inode is disconnected and does not have a parent, but it also needs a superblock feature flag to be set to be valid.

It is also possible to prevent new files from being created in a directory by setting the nlink count = 0, I think, and adding it to the orphan inode list. However, that is pretty ldiskfs-specific.

What about a "compat" flag in the LMA? Is that sent to the remote MDT? That would at least be consistent with ZFS and ldiskfs.

Comment by Di Wang [ 21/Dec/15 ]
What about a "compat" flag in the LMA? Is that sent to the remote MDT? That would at least be consistent with ZFS and ldiskfs.

That is what I thought as well, but I am not sure whether we should do this inside osd_attr_get/set(), i.e. get/set the extra flags from the LMA in osd_attr_get/set() and pass them in and out via lu_flags, or explicitly through get_xattr to retrieve these flags, which will definitely need an extra RPC.

Comment by Alex Zhuravlev [ 21/Dec/15 ]

It shouldn't be a problem to store an additional EA - directories usually have quite a lot of free space in the inode (given they don't store LOV). As for RPC RTT - we do have a notion of readahead for attributes/EAs/etc. (already used by LFSCK), which can be employed by MDD to save an RPC.
In general, it's the same issue as opencounter. Moving all these flags from mdd objects to external storage is the right way to go.
BTW, what about LUSTRE_SLAVE_DEAD_FL? We've been doing this for the slaves already?

Comment by Andreas Dilger [ 22/Dec/15 ]

Yes, LUSTRE_SLAVE_DEAD_FL seems like exactly the right thing?
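
The approach discussed in the comments above, persisting a dead/orphan flag and returning it with the ordinary attributes, might look roughly like the following. This is a userspace sketch under stated assumptions, not the landed patch: `toy_lma`, `toy_attr`, `toy_object`, and the `toy_*` functions are hypothetical, and `TOY_DEAD_FL` is an illustrative bit, not the real value of LUSTRE_SLAVE_DEAD_FL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_DEAD_FL 0x1u /* hypothetical bit; not the real flag value */

/* Hypothetical LMA-like record stored with the inode. */
struct toy_lma {
	uint32_t compat;
};

/* Hypothetical attribute block returned by one attr_get call. */
struct toy_attr {
	uint32_t nlink;
	uint32_t flags;
};

struct toy_object {
	uint32_t       nlink;
	struct toy_lma lma;
};

/* attr_set path: persist the flag in the LMA-like record. */
static void toy_attr_set_flags(struct toy_object *o, uint32_t flags)
{
	o->lma.compat |= flags;
}

/* attr_get path: fold the persisted flags into the returned attributes,
 * so the caller needs no separate xattr fetch (and no extra RPC). */
static struct toy_attr toy_attr_get(const struct toy_object *o)
{
	struct toy_attr attr = { .nlink = o->nlink, .flags = o->lma.compat };
	return attr;
}

/* Create check usable by any MDT that holds the fetched attributes. */
static bool toy_create_allowed(const struct toy_attr *parent)
{
	return !(parent->flags & TOY_DEAD_FL);
}

/* rmdir of an open directory: drop the links and mark it dead persistently. */
static void toy_unlink_open_dir(struct toy_object *dir)
{
	assert(dir->nlink > 0); /* must still be linked before removal */
	dir->nlink = 0;
	toy_attr_set_flags(dir, TOY_DEAD_FL);
}
```

Because the flag travels inside the attributes that a remote MDT already fetches for the parent, the create check can refuse the operation without a second round trip.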

Comment by Gerrit Updater [ 22/Dec/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/17715
Subject: LU-7579 mdd: Add ORPHAN/DEAD flag in LMA
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b8472fb5383e5a4ac477cbd2aaa16bb871660ffd

Comment by parinay v kondekar (Inactive) [ 28/Dec/15 ]
  • umount of Lustre on the server resulted in a similar ASSERTION
  • env
    DNE setup, interop 2.7.64 server <-> 2.5.x client
    
  • the dump
    2015-12-24 04:29:23 [592212.084691] LustreError: 76535:0:(osd_handler.c:2683:osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed:
    2015-12-24 04:29:23 [592212.100075] LustreError: 76535:0:(osd_handler.c:2683:osd_object_destroy()) LBUG
    2015-12-24 04:29:23 [592212.107635] Pid: 76535, comm: mdt_rdpg00_001
    2015-12-24 04:29:23 [592212.112078]
    2015-12-24 04:29:23 [592212.112078] Call Trace:
    2015-12-24 04:29:23 [592212.116383]  [<ffffffffa05ab875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
    2015-12-24 04:29:23 [592212.123536]  [<ffffffffa05abe77>] lbug_with_loc+0x47/0xb0 [libcfs]
    2015-12-24 04:29:23 [592212.129909]  [<ffffffffa10d5fc1>] osd_object_destroy+0x5a1/0x5b0 [osd_ldiskfs]
    2015-12-24 04:29:23 [592212.137395]  [<ffffffffa118e05d>] lod_sub_object_destroy+0x1fd/0x440 [lod]
    2015-12-24 04:29:23 [592212.144466]  [<ffffffffa1182c70>] lod_object_destroy+0x130/0x770 [lod]
    2015-12-24 04:29:23 [592212.151180]  [<ffffffffa05b76c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
    2015-12-24 04:29:23 [592212.157999]  [<ffffffffa0f22e68>] mdd_close+0xa48/0xbf0 [mdd]
    2015-12-24 04:29:23 [592212.163938]  [<ffffffffa0f9d319>] mdt_mfd_close+0x359/0x1980 [mdt]
    2015-12-24 04:29:23 [592212.170354]  [<ffffffffa095309c>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
    2015-12-24 04:29:23 [592212.177782]  [<ffffffffa09532db>] ? lustre_pack_reply_v2+0x1eb/0x280 [ptlrpc]
    2015-12-24 04:29:24 [592212.185218]  [<ffffffffa0952085>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
    2015-12-24 04:29:24 [592212.191881]  [<ffffffffa0709295>] ? class_handle2object+0x95/0x190 [obdclass]
    2015-12-24 04:29:24 [592212.199278]  [<ffffffffa0f9eb3a>] mdt_close_internal+0x1fa/0x4e0 [mdt]
    2015-12-24 04:29:24 [592212.206006]  [<ffffffffa0f9f0d7>] mdt_close+0x2b7/0xa40 [mdt]
    2015-12-24 04:29:24 [592212.211985]  [<ffffffffa09bdd4c>] tgt_request_handle+0x8ec/0x1470 [ptlrpc]
    2015-12-24 04:29:24 [592212.219081]  [<ffffffffa0964e71>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
    2015-12-24 04:29:24 [592212.225573]  [<ffffffffa0964030>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
    2015-12-24 04:29:24 [592212.232020]  [<ffffffff8109ac66>] kthread+0x96/0xa0
    2015-12-24 04:29:24 [592212.237124]  [<ffffffff8100c20a>] child_rip+0xa/0x20
    2015-12-24 04:29:24 [592212.242272]  [<ffffffff8109abd0>] ? kthread+0x0/0xa0
    2015-12-24 04:29:24 [592212.247410]  [<ffffffff8100c200>] ? child_rip+0x0/0x20
    2015-12-24 04:29:24 [592212.252722]
    2015-12-24 04:29:24 [592212.254723] Kernel panic - not syncing: LBUG
    
  • I will upload the necessary logs (lctl/dmesg).

Comment by Peter Jones [ 19/Jan/16 ]

http://review.whamcloud.com/#/c/18024/

Comment by Gerrit Updater [ 13/Feb/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18024/
Subject: LU-7579 osd: move ORPHAN/DEAD flag to OSD
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 098fb363c3902f67b29ddfa864b452d0a8460ad2

Comment by Peter Jones [ 13/Feb/16 ]

Landed for 2.8

Comment by Gerrit Updater [ 13/Feb/16 ]

Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/18444
Subject: LU-7579 mdd: do not mark object as an orphan early
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7a0d2710b49eac003f3777975a17a651d0996b5b

Comment by Gerrit Updater [ 15/Feb/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18444/
Subject: LU-7579 mdd: do not mark object as an orphan early
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1fd624857218bd5f7ae1529d3fee8933c9cb8a75

Comment by Gerrit Updater [ 28/Dec/22 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49524
Subject: LU-7579 osd-zfs: set orphan flag on the cached object
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: ce3e893bae93da83863d08e14f85bbdc205499f7
