[LU-4985] Return value of kthread_run truncated to 32-bit int in ll_sa thread and elsewhere Created: 30/Apr/14 Updated: 05/Jun/14 Resolved: 14/May/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Ryan Haasken | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Severity: | 3 |
| Rank (Obsolete): | 13809 |
| Description |
|
Several places in the Lustre code store the result of kthread_run, which is of 64-bit type task_struct *, in a 32-bit integer. Depending on the value of the pointer, this will be interpreted as an error by the IS_ERR_VALUE macro. This problem exists, for example, with the creation of statahead threads, ll_sa. static int start_statahead_thread(struct inode *dir, struct dentry *dentry)
{
...
int rc;
...
rc = PTR_ERR(kthread_run(ll_statahead_thread, dentry->d_parent,
"ll_sa_%u", lli->lli_opendir_pid));
thread = &sai->sai_thread;
if (IS_ERR_VALUE(rc)) {
CERROR("can't start ll_sa thread, rc: %d\n", rc);
spin_lock(&lli->lli_sa_lock);
thread_set_flags(thread, SVC_STOPPED);
thread_set_flags(&sai->sai_agl_thread, SVC_STOPPED);
spin_unlock(&lli->lli_sa_lock);
ll_sai_put(sai);
GOTO(out, rc = -EAGAIN);
}
...
This can cause a general protection fault if the sai struct is accessed by the ll_sa thread after it has been freed. Here is an example in which the return value from kthread_run is incorrectly interpreted as an error: 2014-04-18T04:37:49.706791-05:00 c0-0c1s2n3 LustreError: 11453:0:(statahead.c:1588:start_statahead_thread()) can't start ll_sa thread, rc: -3968 The -3968 which is reported as the rc of the kthread_run is not a valid error number. It is the result of truncating the returned task_struct pointer to a 32-bit int. There are other places in Lustre which handle the return value of kthread_run incorrectly. We need to go through and make sure the return value of kthread_run is handled correctly in all cases. The correct way to handle it would be like this: struct task_struct *task = kthread_run(...)
if (IS_ERR(task)) {
rc = PTR_ERR(task);
CERROR(..)
}
|
| Comments |
| Comment by Jodi Levi (Inactive) [ 30/Apr/14 ] |
|
John, |
| Comment by John Hammond [ 30/Apr/14 ] |
|
This looks like 2.5.*. Is that correct Ryan? This is fixed in 2.6.0 by http://review.whamcloud.com/#/c/6759/ for I would say major/critical. Most of http://review.whamcloud.com/#/c/6759/ can probably be ported to 2.5 without too much fuss. |
| Comment by Ryan Haasken [ 30/Apr/14 ] |
|
John, that's correct, it's 2.5.*. I didn't see It looks like there are still a few remaining spots that need fixing in master as well: File Function Line 0 lfsck_layout.c lfsck_layout_master_prep 4318 rc = PTR_ERR(kthread_run(lfsck_layout_assistant, lta, "lfsck_layout")); 1 lfsck_lib.c lfsck_start 2238 rc = PTR_ERR(kthread_run(lfsck_master_engine, lta, "lfsck")); 2 lfsck_namespace.c lfsck_namespace_double_scan 1534 rc = PTR_ERR(kthread_run(lfsck_namespace_double_scan_main, lta, 3 ofd_io.c ofd_start_inconsistency_verification_ 230 rc = PTR_ERR(kthread_run(ofd_inconsistency_verification_main, ofd, |
| Comment by Ryan Haasken [ 30/Apr/14 ] |
|
Actually, I take back the comment about the four calls to kthread_run above. In each of those cases, rc is a long int, so it should be okay. |
| Comment by Ryan Haasken [ 30/Apr/14 ] |
|
Here is the port for b2_5: http://review.whamcloud.com/#/c/10178/ |
| Comment by Andreas Dilger [ 01/May/14 ] |
|
It wouldn't be terrible to fix the four remaining cases to assign the return value to a task_struct pointer for consistency with the other callers. |
| Comment by Ryan Haasken [ 08/May/14 ] |
|
Here's a patch against master to change the remaining four cases to match the rest: http://review.whamcloud.com/#/c/10266/ |
| Comment by James A Simmons [ 14/May/14 ] |
|
Patch landed to master. Ticket can be closed. |
| Comment by Peter Jones [ 14/May/14 ] |
|
Landed for 2.6 |
| Comment by Ryan Haasken [ 14/May/14 ] |
|
The b2_5 port of the patch which fixed |