[LU-7315] In osd_dirent_check_repair(), Pointer 'hlock' can be NULL and can be dereferenced Created: 19/Oct/15 Updated: 04/Dec/15 Resolved: 04/Dec/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Dmitry Eremin (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | kw | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
The list of commits since the previous build 2.7.61-9-g4438262:
|
| Comments |
| Comment by Andreas Dilger [ 19/Oct/15 ] |
|
It looks like this was added by http://review.whamcloud.com/16440 " Fan Yong, could you please take a look. |
| Comment by nasf (Inactive) [ 23/Oct/15 ] |
|
Inside the function osd_dirent_check_repair(), anytime before it "goto again", it will call as following: if (jh == NULL) { brelse(bh); if (hlock != NULL) ldiskfs_htree_unlock(hlock); else up_read(&obj->oo_ext_idx_sem); dev->od_dirent_journal = 1; goto again; } Means the "hlock" or "oo_ext_idx_sem" will be unlock/up before "goto again". Then the logic goes to the label "again": if (dev->od_dirent_journal != 0) { again: jh = osd_journal_start_sb(sb, LDISKFS_HT_MISC, credits); if (IS_ERR(jh)) { rc = PTR_ERR(jh); CDEBUG(D_LFSCK, "%.16s: fail to start trans for dirent " "check_repair, dir = %lu/%u, credits = %d, " "name = %.*s: rc = %d\n", devname, dir->i_ino, dir->i_generation, credits, ent->oied_namelen, ent->oied_name, rc); GOTO(out_inode, rc); } if (obj->oo_hl_head != NULL) { hlock = osd_oti_get(env)->oti_hlock; /* "0" means exclusive lock for the whole directory. * We need to prevent others access such name entry * during the delete + insert. Neither HLOCK_ADD nor * HLOCK_DEL cannot guarantee the atomicity. */ ldiskfs_htree_lock(hlock, obj->oo_hl_head, dir, 0); } else { down_write(&obj->oo_ext_idx_sem); } } else { ... out_inode: iput(inode); if (rc >= 0 && !dirty) dev->od_dirent_journal = 0; return rc; As you can see, it will try to start journal firstly, if it failed, then "GOTO(out_inode, rc);", at the label "out_inode", it will not handle "hlock" or "oo_ext_idx_sem", because either "hlock" or "oo_ext_idx_sem" has been unlocked/up, or it has not been lock/down (for the first time). If journal has been started successfully, then it will try to lock/down the "hlock" or "oo_ext_idx_sem". After that, all elements become valid. The "obj->oo_hl_head" cannot be changed once it has been set, so if "hlock" is not NULL before the "goto again", then it will still be non-NULL after the "goto again". So the whole logic is safe. Anyway, I know what you worry about, I will make patch to reset the "hlock" explicitly to the logic to be easy for the readers. |
| Comment by Gerrit Updater [ 23/Oct/15 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/16924 |
| Comment by Dmitry Eremin (Inactive) [ 17/Nov/15 ] |
|
The tool reports the different path of this issue. osd_handler.c:5404: 'hlock' is checked for NULL. osd_handler.c:5404: hlock!= ( (void* )0) is false osd_handler.c:5333: obj->oo_hl_head!= ( (void* )0) is false osd_handler.c:5353: 'hlock' is dereferenced by passing argument 4 to function '__ldiskfs_find_entry'. namei.c:1215: 'lck' is passed to function '__ldiskfs_find_entry'. namei.c:1236: 'lck' is dereferenced by passing argument 4 to function 'ldiskfs_dx_find_entry'. namei.c:1327: 'lck' is passed to function 'ldiskfs_dx_find_entry'. namei.c:1343: 'lck' is dereferenced by passing argument 5 to function 'dx_probe'. namei.c:594: 'lck' is passed to function 'dx_probe'. namei.c:618: hinfo->hash_version<=2 is true namei.c:690: 0 is false namei.c:727: ldiskfs_htree_node_locked(lck, LDISKFS_LB_DX) is false namei.c:736: 'lck' is dereferenced by passing argument 1 to function 'dx_probe_hash_collision'. namei.c:566: 'lck' is passed to function 'dx_probe_hash_collision'. namei.c:570: 'lck' is explicitly dereferenced. |
| Comment by nasf (Inactive) [ 18/Nov/15 ] |
|
Hm... That is a different story in the original PDO patch, not related with the recent landed patch http://review.whamcloud.com/16440. I have updated the patch to fix that. Please help to analysis the new code with tools. Thanks! |
| Comment by Dmitry Eremin (Inactive) [ 18/Nov/15 ] |
|
Now tool don't detect issue with lck. |
| Comment by Gerrit Updater [ 04/Dec/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16924/ |
| Comment by Joseph Gmitter (Inactive) [ 04/Dec/15 ] |
|
Landed for 2.8 |