[LU-5125] lu_object_attr() should return 0 for non-existing objects Created: 30/May/14 Updated: 19/Apr/17 Resolved: 19/Apr/17 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | John Hammond | Assignee: | John Hammond |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | mdt | ||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 14141 | ||||||||||||
| Description |
[09:45:54] John Hammond: Could someone explain the idea behind the lu_object_exists() assertion in lu_object_attr()? [14:57:23] Alex Zhuravlev: I think that nobody should be trying to access attributes on non-existing object [15:43:30] John Hammond: Yes, that part is clear. [17:15:55] John Hammond: Uses of lu_object_attr() are fairly common and so this assertion seems to be hit in fairly safe looking code. Can you suggest how we can better avoid hitting it. [17:16:27] John Hammond: Always check for existence right after object is found? [17:16:57] John Hammond: Add lu_object_find_existing()? [17:17:15] John Hammond: More disciplined LDLM usage? [17:17:37] Alex Zhuravlev: we're discussing this and related stuff with Vitaly from Xx .. [17:18:13] John Hammond: Just on gerrit? Or elsewhere? [17:18:47] Alex Zhuravlev: a bit on skype, a bit in gerrit [17:23:09] John Hammond: I understand what you say about non-existing objects. But from a stability perspective it just sounds like you're saying that people should only write code that makes sense. Can we have this function return 0 for non-existing objetcs? [17:24:58] Alex Zhuravlev: well, this is a part of the discussion as well. there are different arguments here... I tend to think this is not very good if an object disappears when somebody else is using that, silently. [17:25:51] Alex Zhuravlev: LBUG isn't a good thing, but I'd think it's kind of easy to recovery from (relatively) [17:26:25] Alex Zhuravlev: because nothing bad has been propagated to a disk or to another nodes. [17:26:30] Alex Zhuravlev: not that I like LBUGs :) [17:28:46] John Hammond: Respectfully MDT LBUGs are just not OK. [17:29:08] Alex Zhuravlev: of course [17:29:15] John Hammond: We can't keep using them to "learn" what we did wrong. [17:29:24] Alex Zhuravlev: I disagree [17:30:41] Alex Zhuravlev: learn "before" that something went wrong is much better than later, because in general the later, the more chances for bad consequences. especially then it comes to on-disk state, the clients, etc. [17:33:35] John Hammond: Is there some chance that the discussions with Vitaly will yield a systematic way of avoiding this issue? [17:34:04] Alex Zhuravlev: this is exactly what I'm hoping for, with proper documentation [17:35:35] John Hammond: Is it OK by you if I post the above conversation into a Jira issue? [17:35:53] Alex Zhuravlev: sure [17:37:55] John Hammond: OK. I'll give you a month. After that if there isn't something promising then I'll start pushing for returning 0. [17:38:43] Alex Zhuravlev: fine. though frankly I don't think removing LBUG is the right way to go.. |
| Comments |
| Comment by Andreas Dilger [ 04/Dec/14 ] |
|
Alex, Vitaly, did you ever come to a conclusion about what to do in this case? I agree with John that LBUG() is not a good form of error handling for cases that can actually happen (e.g. due to disk or network corruption, malicious clients, etc). It should only be used for states that are impossible to hit except for code defects, and even then only if there is no good way to handle the error. |
| Comment by Vitaly Fertman [ 25/Dec/14 ] |
|
as I understand there are the following problems. 1. the inode is not removed under the same transaction as unlink itself, several transactions means the operation is not atomic and inodes could be lost as the result. also, it may lead to some deadlocks like: thus the left part is to move iput into osd_object_destroy() or around. 2. all the operations works in the order object_find, object_lock (e.g. mdt_object_find_lock) |
| Comment by Vitaly Fertman [ 25/Dec/14 ] |
|
this is what we discussed with Alex |
| Comment by Gerrit Updater [ 07/Oct/15 ] |
|
John L. Hammond (john.hammond@intel.com) uploaded a new patch: http://review.whamcloud.com/16750 |