[LU-753] kernel:LustreError: 4998:0:(filter_io_26.c:791:filter_commitrw_write()) LBUG Created: 11/Oct/11 Updated: 16/Apr/13 Resolved: 10/Jan/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.2.0, Lustre 2.4.0, Lustre 2.1.1 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Jinshan Xiong (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 4873 | ||||||||
| Description |
|
I hit this bug when I do the following thing: On a fresh lustre env: it hit this bug, and the stack trace on the OSS is: LustreError: 4998:0:(filter_io_26.c:791:filter_commitrw_write()) ASSERTION(oti->oti_transno <= obd->obd_last_committed) failed: oti_transno 752 last_committed 750 Call Trace: LustreError: dumping log to /tmp/lustre-log.1318354849.4998 Info required for matching: sanity-quota test_23a 23a |
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 11/Oct/11 ] |
|
I didn't look into this bug yet so I'm not sure if there is a quick reproduce path. |
| Comment by Jinshan Xiong (Inactive) [ 13/Oct/11 ] |
|
I took a look at this issue today because it stops my test frequently. The root cause of this issue is that fsfilt_commit_wait() returns before j_commit_callback is called for that specific commit sequence. Let's say proc1 is journal commit process, proc2 is the waiting process(filter write process), and there are two transaction t1 and t2 pending, proc2 wants to wait for t2 to be finished.
proc1: proc2:
wait for t2
finish t1
call commit callback for t1, update committed_transno
wake up proc2
finish t2
see t2 is finished, return
hit LBUG since transno is actually less then committed_transno
call commit callback for t2, update committed_transno
I have no idea to fix this issue gracefully. Maybe we should patch jbd2 to let it call commit_callback before updating j_commit_sequence; or modify lustre to actually wait for committed_transno. |
| Comment by Jinshan Xiong (Inactive) [ 13/Oct/11 ] |
|
I made a hot fix at: http://review.whamcloud.com/1519 So if anybody else has the same issue, this fix can be applied to move things forward. The real fix should be contributed by ldiskfs expert. |
| Comment by Zhenyu Xu [ 20/Oct/11 ] |
|
another hit at https://maloo.whamcloud.com/test_sets/fa85f7c8-fad2-11e0-bbc0-52540025f9af |
| Comment by Mikhail Pershin [ 21/Oct/11 ] |
|
Another hit: |
| Comment by Peter Jones [ 21/Oct/11 ] |
|
Niu Could you please look into this one? It is causing regular autotest failures Thanks Peter |
| Comment by Jinshan Xiong (Inactive) [ 21/Oct/11 ] |
|
It would be fine to just comment it out. |
| Comment by Andreas Dilger [ 21/Oct/11 ] |
|
I think the root of the problem is that fsfilt_commit_wait() is returning too early. Any workarounds at the higher level are just wrong. Maybe I'm missing something, but jbd2_journal_commit_transaction() on my system looks like: if (journal->j_commit_callback)
journal->j_commit_callback(journal, commit_transaction);
trace_jbd2_end_commit(journal, commit_transaction);
jbd_debug(1, "JBD: commit %d complete, head %d\n",
journal->j_commit_sequence, journal->j_tail_sequence);
if (to_free)
kfree(commit_transaction);
wake_up(&journal->j_wait_done_commit);
so it is calling the commit callback before waking up the threads waiting in jbd2_log_wait_commit(): while (tid_gt(tid, journal->j_commit_sequence)) {
jbd_debug(1, "JBD: want %d, j_commit_sequence=%d\n",
tid, journal->j_commit_sequence);
wake_up(&journal->j_wait_commit);
read_unlock(&journal->j_state_lock);
wait_event(journal->j_wait_done_commit,
!tid_gt(tid, journal->j_commit_sequence));
read_lock(&journal->j_state_lock);
}
However, I guess if this thread is getting spurious wakeups for some reason, it might just wake up after jbd2_journal_commit_transaction() updated j_commit_sequence and dropped j_state_lock and exit before the commit callback is called. It looks like the root cause of this problem is the new jbd2 commit callback patch ( That said, I'm not sure the right way to fix this. Technically, the code is correct in that jbd2_log_wait_commit() is waiting until the transaction is complete, but it is racy if one considers the commit callbacks as part of the transaction. It doesn't seem very correct to move the commit callback earlier, and it may even improve performance now that the transaction can complete before the callbacks are run. However, we lose out on some valuable sanity checking in the obdfilter::filter_commitrw_write() that the data is on disk after we have returned from fsfilt_commit_wait(). Maybe the value of this LASSERT() is declining, since this code will all disappear once we move to OFD/OSD? |
| Comment by Andreas Dilger [ 21/Oct/11 ] |
|
Hmm, going further, if we remove this LASSERT() then it will push the problem back to the client. The client reply will have transno < last_committed, and it would have to keep the bulk RPC for replay, even though it may have asked for a sync IO. It looks like the right solution is to wait for the commit callback to be run before returning to the client: /* In rare cases fsfilt_commit_wait() will wake up and return after
* the transaction has finished its work and updated j_commit_sequence
* but the commit callbacks have not been run yet. Wait here until
* that is finished so that clients requesting sync IO don't see the
* reply transno < last_committed. LU-753 */
if (unlikely(obd->obd_replayable && !rc && wait_handle &&
oti->oti_transno <= obd->obd_last_committed)) {
struct l_wait_info lwi = { 0 };
l_wait_event(&waitq, oti->oti_transno >=obd->obd_last_committed,
&lwi);
}
|
| Comment by Jinshan Xiong (Inactive) [ 21/Oct/11 ] |
|
Does clients check if comitted_transno is not less than RPC transno, I tend to this it will be okay they don't. The only side effect is that clients have to hold those pages for longer time until up-to-date committed transno is broadcasted by next RPC. |
| Comment by Niu Yawei (Inactive) [ 22/Oct/11 ] |
indeed, seems keeping the replay a little bit longer in very rare case isn't harmful.Andreas, any objections? |
| Comment by Andreas Dilger [ 22/Oct/11 ] |
|
If the client requested sync RPCs for some reason (e.g. liblustre, some other client that doesnt handle bulk RPC replay, if the client RAM is short) then the OST should honor this request. I don't think this will delay RPC replies in most cases. Consider how many bulk writes have been sent in every test run since Please add the l_wait_event() as suggested. |
| Comment by Niu Yawei (Inactive) [ 23/Oct/11 ] |
| Comment by Build Master (Inactive) [ 26/Oct/11 ] |
|
Integrated in Oleg Drokin : 0935fc8eae15696820f87951d8955a9c8bc66177
|
| Comment by Build Master (Inactive) [ 26/Oct/11 ] |
|
Integrated in Oleg Drokin : 0935fc8eae15696820f87951d8955a9c8bc66177
|
| Comment by Build Master (Inactive) [ 26/Oct/11 ] |
|
Integrated in Oleg Drokin : 0935fc8eae15696820f87951d8955a9c8bc66177
|
| Comment by Build Master (Inactive) [ 26/Oct/11 ] |
|
Integrated in Oleg Drokin : 0935fc8eae15696820f87951d8955a9c8bc66177
|
| Comment by Build Master (Inactive) [ 26/Oct/11 ] |
|
Integrated in Oleg Drokin : 0935fc8eae15696820f87951d8955a9c8bc66177
|
| Comment by Build Master (Inactive) [ 26/Oct/11 ] |
|
Integrated in Oleg Drokin : 0935fc8eae15696820f87951d8955a9c8bc66177
|
| Comment by Build Master (Inactive) [ 26/Oct/11 ] |
|
Integrated in Oleg Drokin : 0935fc8eae15696820f87951d8955a9c8bc66177
|
| Comment by Build Master (Inactive) [ 26/Oct/11 ] |
|
Integrated in Oleg Drokin : 0935fc8eae15696820f87951d8955a9c8bc66177
|
| Comment by Build Master (Inactive) [ 26/Oct/11 ] |
|
Integrated in Oleg Drokin : 0935fc8eae15696820f87951d8955a9c8bc66177
|
| Comment by Build Master (Inactive) [ 26/Oct/11 ] |
|
Integrated in Oleg Drokin : 0935fc8eae15696820f87951d8955a9c8bc66177
|
| Comment by Build Master (Inactive) [ 26/Oct/11 ] |
|
Integrated in Oleg Drokin : 0935fc8eae15696820f87951d8955a9c8bc66177
|
| Comment by Build Master (Inactive) [ 26/Oct/11 ] |
|
Integrated in Oleg Drokin : 0935fc8eae15696820f87951d8955a9c8bc66177
|
| Comment by Build Master (Inactive) [ 26/Oct/11 ] |
|
Integrated in Oleg Drokin : 0935fc8eae15696820f87951d8955a9c8bc66177
|
| Comment by Build Master (Inactive) [ 26/Oct/11 ] |
|
Integrated in Oleg Drokin : 0935fc8eae15696820f87951d8955a9c8bc66177
|
| Comment by Peter Jones [ 03/Nov/11 ] |
|
Landed for 2.2 |
| Comment by Build Master (Inactive) [ 02/May/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Build Master (Inactive) [ 02/May/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Build Master (Inactive) [ 02/May/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Build Master (Inactive) [ 02/May/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Build Master (Inactive) [ 02/May/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Build Master (Inactive) [ 02/May/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Build Master (Inactive) [ 02/May/12 ] |
|
Integrated in Result = SUCCESS
|