[LU-396] LustreError: 5663:0:(osc_io.c:322:osc_io_commit_write()) ASSERTION(to > 0) failed Created: 06/Jun/11 Updated: 05/Jun/12 Resolved: 05/Jun/12 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0 |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Prakash Surya (Inactive) | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Based on the RHEL6/CHAOS5 distro and 2.0.59 lustre tag |
||
| Severity: | 3 |
| Rank (Obsolete): | 8535 |
| Description |
|
An assertion failed while running a 32-node, 64-client (2 per node), file-per-process IOR job. The relevant console output from the client node that crashed is below:
2011-06-03 15:23:28 LustreError: 5663:0:(osc_io.c:322:osc_io_commit_write()) ASSERTION(to > 0) failed |
| Comments |
| Comment by Oleg Drokin [ 06/Jun/11 ] |
|
Jinshan, can you please take a look at this? |
| Comment by Lai Siyao [ 07/Jun/11 ] |
|
Hi Jinshan, this may be related to the new aops write_begin/end; if so, please check 1.8 as well. P.S. write_begin/end was recently enabled by default for RHEL5 kernels too. |
| Comment by Jinshan Xiong (Inactive) [ 07/Jun/11 ] |
|
Hi Lai, Thank you for helping, I'll look into this. |
| Comment by Jinshan Xiong (Inactive) [ 10/Jun/11 ] |
|
I can't figure out this issue. If you can reproduce the problem, can you please run it with debug enabled and post the Lustre log here? Thanks. |
| Comment by Prakash Surya (Inactive) [ 10/Jun/11 ] |
|
Sure, I will try to reproduce the issue. How do I ensure that debug is enabled? By issuing 'echo -1 > /proc/sys/lnet/debug'? And the Lustre log can be retrieved with 'lctl dk', right? |
| Comment by Jinshan Xiong (Inactive) [ 10/Jun/11 ] |
|
Yes. Run 'lctl dk > log.txt' and send me log.txt. |
| Comment by Christopher Morrone [ 10/Jun/11 ] |
|
He is not going to be able to log in and run lctl after the kernel panics. I would suggest a debugging patch that prints to the console. |
| Comment by Christopher Morrone [ 10/Jun/11 ] |
|
Oh, yeah, we have --enable-panic_dumplog set in our build, so if we make sure the path for the dump file is persistent storage, we may get logs that way. |
| Comment by Prakash Surya (Inactive) [ 10/Jun/11 ] |
|
Where are the logs being dumped to? Can I query lustre or the system to get the path? |
| Comment by Jinshan Xiong (Inactive) [ 10/Jun/11 ] |
|
Run 'echo 0 > /proc/sys/lnet/panic_on_lbug' so the node does not panic on LBUG and you can still dump the log. |
| Comment by Christopher Morrone [ 13/Jun/11 ] |
|
/proc/sys/lnet/debug_path |
| Comment by Peter Jones [ 22/Jun/11 ] |
|
LLNL agreed on the weekly 2.1 call that this should be dropped in priority until it recurs. |
| Comment by Niu Yawei (Inactive) [ 04/Jun/12 ] |
|
Hi Jinshan, is the LASSERT(to > 0) in osc_io_commit_write() valid? From the kernel code, it seems 'copied' could be zero when something goes wrong while copying data into the page cache, which would result in to == 0 in osc_io_commit_write(). Did I miss anything? |
| Comment by Jinshan Xiong (Inactive) [ 05/Jun/12 ] |
|
It won't reach this function if to is zero and everything else works fine. For example, ll_commit_write() checks:
    LASSERT(from <= to);
    if (from != to) /* handle short write case. */
            result = cl_io_commit_write(env, io, page, from, to);
I think a good start would be to revise the assertion to LASSERTF(to > 0, "to is %d\n", to); it would be really helpful to print out as much information as possible. |
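The guard and the proposed assertion from the comment above can be illustrated with a minimal user-space sketch. It is not the Lustre code: SKETCH_ASSERTF, sketch_ll_commit_write(), sketch_osc_commit_write() and commit_calls are hypothetical stand-ins for LASSERTF, ll_commit_write(), osc_io_commit_write() and the real commit path, and the env, io and page arguments are omitted. It only shows that a zero-length write (from == to, including to == 0) is filtered out by the caller, and that the lower-level check prints the offending value before aborting.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for LASSERTF: print a formatted message, then abort. */
#define SKETCH_ASSERTF(cond, fmt, ...)                                    \
	do {                                                              \
		if (!(cond)) {                                            \
			fprintf(stderr, "ASSERTION(" #cond ") failed: "   \
				fmt, __VA_ARGS__);                        \
			abort();                                          \
		}                                                         \
	} while (0)

/* Counts how often the lower-level commit path was actually entered. */
int commit_calls;

/* Stand-in for osc_io_commit_write(): expects a non-empty range ending at 'to'. */
int sketch_osc_commit_write(int from, int to)
{
	(void)from;
	SKETCH_ASSERTF(to > 0, "to is %d\n", to);
	commit_calls++;
	return 0;
}

/* Stand-in for ll_commit_write(): skips the commit when nothing was copied. */
int sketch_ll_commit_write(int from, int to)
{
	int result = 0;

	assert(from <= to);
	if (from != to) /* handle short write case. */
		result = sketch_osc_commit_write(from, to);
	return result;
}
```

In this sketch, calling sketch_ll_commit_write(0, 0) returns 0 without entering the lower-level function, so the assertion can fire only if some other path reaches it with to == 0. That is the case the extra diagnostic output is meant to expose.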
| Comment by Niu Yawei (Inactive) [ 05/Jun/12 ] |
|
This bug should have been fixed in |
| Comment by Jinshan Xiong (Inactive) [ 05/Jun/12 ] |
|
duplicate to |