[LU-5056] osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 6 changes, 8 in progress, 0 in flight: -5 Created: 13/May/14  Updated: 09/Jun/16  Resolved: 09/Jun/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0, Lustre 2.6.0
Fix Version/s: None

Type: Bug
Priority: Critical
Reporter: Jinshan Xiong (Inactive)
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-6696 ASSERTION( rc == 0 || rc == LLOG_PROC... Resolved
Severity: 3
Rank (Obsolete): 13967

 Description   
LustreError: 5182:0:(llite_lib.c:1304:ll_md_setattr()) md_setattr fails: rc = -30
LustreError: 5182:0:(file.c:171:ll_close_inode_openhandle()) inode 144115205272540056 mdc close failed: rc = -30
LDISKFS-fs (loop0): 
LustreError: 3149:0:(osd_handler.c:861:osd_trans_commit_cb()) transaction @0xffff8800d3ddd880 commit error: 2
LDISKFS-fs error (device loop0): ldiskfs_journal_start_sb: Detected aborted journal
LDISKFS-fs (loop0): Remounting filesystem read-only
LustreError: 3756:0:(llog.c:159:llog_cancel_rec()) lustre-OST0000-osc-MDT0000: fail to write header for llog #0x3:1#00000000: rc = -30
LustreError: 3756:0:(llog_cat.c:529:llog_cat_cancel_records()) lustre-OST0000-osc-MDT0000: fail to cancel 1 of 1 llog-records: rc = -30
LustreError: 3756:0:(osp_sync.c:702:osp_sync_process_committed()) lustre-OST0000-osc-MDT0000: can't cancel record: -30
loop: Write error at byte offset 148717568, length 4096.
loop: Write error at byte offset 148721664, length 4096.
Remounting filesystem read-only
LustreError: 3758:0:(osd_io.c:946:osd_ldiskfs_read()) lustre-MDT0000: can't read 4096@90112 on ino 112: rc = -5
LustreError: 3763:0:(osd_io.c:946:osd_ldiskfs_read()) lustre-MDT0000: can't read 4096@73728 on ino 116: rc = -5
LustreError: 3763:0:(llog_osd.c:592:llog_osd_next_block()) lustre-MDT0000-osd: can't read llog block from log [0x1:0x9:0x0] offset 73728: rc = -5
LustreError: 3763:0:(osp_sync.c:855:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 6 changes, 8 in progress, 0 in flight: -5
LustreError: 3763:0:(osp_sync.c:855:osp_sync_thread()) LBUG
Pid: 3763, comm: osp-syn-3

Call Trace:
 [<ffffffffa03a3895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa03a3e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0ddd0d3>] osp_sync_thread+0x753/0x7d0 [osp]
 [<ffffffff8150e600>] ? thread_return+0x4e/0x76e
 [<ffffffffa0ddc980>] ? osp_sync_thread+0x0/0x7d0 [osp]
 [<ffffffff81096a36>] kthread+0x96/0xa0
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff810969a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
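
Note on the return codes in the log above: the negative rc values are Linux errno numbers, so rc = -30 is -EROFS (read-only file system) and rc = -5 is -EIO (I/O error). The -5 from the failed llog block read is the value that trips the assertion in osp_sync_thread(). A minimal sketch for decoding them follows; the helper name sketch_rc_name is hypothetical and is not part of Lustre.

```c
#include <errno.h>

/*
 * Hypothetical helper (not a Lustre function): map the negative return
 * codes seen in this log to their errno names. Only the two codes that
 * appear in the trace are handled; anything else is reported as unknown.
 */
static const char *sketch_rc_name(int rc)
{
    switch (-rc) {
    case EIO:
        return "EIO";      /* I/O error, rc = -5 in the log */
    case EROFS:
        return "EROFS";    /* read-only file system, rc = -30 in the log */
    default:
        return "unknown";
    }
}
```

For example, sketch_rc_name(-30) yields "EROFS", matching the "Remounting filesystem read-only" messages that precede the failed llog writes.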


 Comments   
Comment by Jinshan Xiong (Inactive) [ 13/May/14 ]

I can reproduce this issue consistently. Please let me know if somebody is working on this and needs more logs.

Comment by Jodi Levi (Inactive) [ 13/May/14 ]

Jinshan,
Is this happening on Master as well?

Comment by Jinshan Xiong (Inactive) [ 13/May/14 ]

Jodi, I will verify it on master.

Comment by Jinshan Xiong (Inactive) [ 13/May/14 ]

Yes, I can reproduce it on master.

Since I can only reproduce this issue in extremely low-memory cases, it does not need to be a blocker for 2.6.

Comment by Andreas Dilger [ 09/Jun/16 ]

Closing this as a duplicate of LU-6696, which has a patch http://review.whamcloud.com/19856 "LU-6696 llog: improve error handling" to handle the oops, though not the remount-readonly case.
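
The patch itself is not reproduced here. As a rough illustration only of the general idea, handling an unexpected return code instead of asserting on it, here is a minimal userspace sketch. It is not Lustre code and not the LU-6696 change; every name with a sketch_ or SKETCH_ prefix is hypothetical, including the stand-in for LLOG_PROC_BREAK.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a "stopped early, not an error" return code. */
#define SKETCH_PROC_BREAK 1

/* Hypothetical outcome of one pass over the log. */
enum sketch_outcome {
    SKETCH_DONE,     /* processed to the end (rc == 0) */
    SKETCH_STOPPED,  /* processing was stopped on request */
    SKETCH_FAILED    /* any other value, e.g. -EIO or -EROFS */
};

/*
 * Classify the return code of a log-processing pass. Where an assertion
 * would crash on anything other than 0 or the break code, this reports
 * the unexpected value and lets the caller shut down cleanly.
 */
static enum sketch_outcome sketch_check_rc(int rc)
{
    if (rc == 0)
        return SKETCH_DONE;
    if (rc == SKETCH_PROC_BREAK)
        return SKETCH_STOPPED;

    fprintf(stderr, "log processing failed: rc = %d\n", rc);
    return SKETCH_FAILED;
}
```

With this shape, the -5 seen in the trace would be classified as SKETCH_FAILED and logged, and the calling thread could exit its loop rather than hit an LBUG.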
