[LU-7001] osp_sync.c: 1139: osp_sync_thread Created: 13/Aug/15 Updated: 18/Sep/18 Resolved: 13/Sep/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.11.0, Lustre 2.10.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Alex | Assignee: | Alexander Boyko |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Environment: |
https://build.hpdd.intel.com/job/lustre-reviews/34017/arch=x86_64,build_type=server,distro=el6.6,ib_stack=inkernel/ |
||
| Severity: | 3 | ||||||||
| Description |
|
Unfortunately, the problem is still not solved, even with the patch http://review.whamcloud.com/#/c/15841/ |
| Comments |
| Comment by Andreas Dilger [ 13/Aug/15 ] |
|
Please provide the console logs with stack trace from the failing node. What operations are being done to trigger these errors? |
| Comment by Alex [ 14/Aug/15 ] |
|
How do I enable the logs so that I can provide them? The operation being performed is deleting files. |
| Comment by Gerrit Updater [ 09/Sep/15 ] |
|
Li Xi (lixi@ddn.com) uploaded a new patch: http://review.whamcloud.com/16335 |
| Comment by Li Xi (Inactive) [ 09/Sep/15 ] |
|
We are hitting this issue repeatedly. I guess it will never recover unless we skip recovery or do something tricky. Can we just remove the assertion? The assertion does not seem appropriate, since the running thread has no idea when it will be asked to stop. Also, in osp_init0(), if ptlrpc_init_import() returned a failure (it never does, at least currently), it seems the assertion would fail. So this assertion looks dangerous. |
| Comment by Li Xi (Inactive) [ 10/Sep/15 ] |
|
In the end, we worked around this problem by removing the CATALOGS file. I am wondering whether there is any way to check and recover broken llog records... |
| Comment by Andreas Dilger [ 10/Sep/15 ] |
|
Li Xi, there are a couple of patches in flight that will repair or skip corrupted log records, but more types of corruption may still be found in the future. |
| Comment by Li Xi (Inactive) [ 11/Sep/15 ] |
|
Thank you, Andreas, for the information. Do you think it is possible to write a userspace tool to read as well as edit the llog files? I know that llog_reader is being changed, so hopefully we will at least be able to dump the llog file. But since the llog files can be read locally from the MDT/OST ldiskfs, maybe we could also use a tool to remove wrong records manually? |
| Comment by Gerrit Updater [ 23/Mar/17 ] |
|
Alexander Boyko (alexander.boyko@seagate.com) uploaded a new patch: https://review.whamcloud.com/26132 |
| Comment by Gerrit Updater [ 05/Apr/17 ] |
|
Alexander Boyko (alexander.boyko@seagate.com) uploaded a new patch: https://review.whamcloud.com/26359 |
| Comment by Gerrit Updater [ 13/Sep/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26132/ |
| Comment by Peter Jones [ 13/Sep/17 ] |
|
Landed for 2.11 |
| Comment by Gerrit Updater [ 20/Apr/18 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/32097 |
| Comment by Gerrit Updater [ 03/May/18 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/32097/ |