[LU-13645] Various data corruptions possible in lustre. Created: 08/Jun/20 Updated: 03/Mar/23 Resolved: 30/Oct/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0, Lustre 2.12.5 |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Alexey Lyashkov | Assignee: | Alexey Lyashkov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Two groups of data corruption cases are possible in Lustre; both come down to a lock being cancelled without an osc object assigned to it. 1) The first bug is the situation where the check_and_discard function finds a lock without l_ast_data assigned; this prevents the pages from being discarded from the page cache, so they are left as is. For the Lock Ahead case there are no logs or other confirmation, but it looks possible.
ldlm_bl_13-35551 [034] 164201.591130: funcgraph_entry: | ll_dom_lock_cancel() {
ldlm_bl_13-35551 [034] 164201.591132: funcgraph_entry: | cl_env_get() {
ldlm_bl_13-35551 [034] 164201.591132: funcgraph_entry: 0.054 us | _raw_read_lock();
ldlm_bl_13-35551 [034] 164201.591132: funcgraph_entry: 0.039 us | lu_env_refill();
ldlm_bl_13-35551 [034] 164201.591133: funcgraph_entry: 0.046 us | cl_env_init0();
ldlm_bl_13-35551 [034] 164201.591133: funcgraph_entry: 0.035 us | lu_context_enter();
ldlm_bl_13-35551 [034] 164201.591133: funcgraph_entry: 0.034 us | lu_context_enter();
ldlm_bl_13-35551 [034] 164201.591134: funcgraph_exit: 1.811 us | }
ldlm_bl_13-35551 [034] 164201.591134: funcgraph_entry: | cl_object_flush() {
ldlm_bl_13-35551 [034] 164201.591134: funcgraph_entry: | lov_object_flush() {
ldlm_bl_13-35551 [034] 164201.591134: funcgraph_entry: 0.115 us | down_read();
ldlm_bl_13-35551 [034] 164201.591135: funcgraph_entry: | lov_flush_composite() {
ldlm_bl_13-35551 [034] 164201.591135: funcgraph_entry: | cl_object_flush() {
ldlm_bl_13-35551 [034] 164201.591135: funcgraph_entry: | mdc_object_flush() {
ldlm_bl_13-35551 [034] 164201.591136: funcgraph_entry: | mdc_dlm_blocking_ast0() {
ldlm_bl_13-35551 [034] 164201.591136: funcgraph_entry: | lock_res_and_lock() {
ldlm_bl_13-35551 [034] 164201.591136: funcgraph_entry: 0.114 us | _raw_spin_lock();
ldlm_bl_13-35551 [034] 164201.591136: funcgraph_entry: 0.030 us | _raw_spin_lock();
ldlm_bl_13-35551 [034] 164201.591137: funcgraph_exit: 0.677 us | }
ldlm_bl_13-35551 [034] 164201.591137: funcgraph_entry: 0.031 us | unlock_res_and_lock();
ldlm_bl_13-35551 [034] 164201.591137: funcgraph_exit: 1.363 us | }
ldlm_bl_13-35551 [034] 164201.591137: funcgraph_exit: 1.674 us | }
ldlm_bl_13-35551 [034] 164201.591137: funcgraph_exit: 2.207 us | }
ldlm_bl_13-35551 [034] 164201.591138: funcgraph_exit: 2.596 us | }
ldlm_bl_13-35551 [034] 164201.591138: funcgraph_entry: 0.042 us | up_read();
ldlm_bl_13-35551 [034] 164201.591138: funcgraph_exit: 3.714 us | }
ldlm_bl_13-35551 [034] 164201.591138: funcgraph_exit: 4.279 us | }
ldlm_bl_13-35551 [034] 164201.591138: funcgraph_entry: | cl_env_put() {
ldlm_bl_13-35551 [034] 164201.591138: funcgraph_entry: 0.034 us | lu_context_exit();
ldlm_bl_13-35551 [034] 164201.591139: funcgraph_entry: 0.030 us | lu_context_exit();
ldlm_bl_13-35551 [034] 164201.591139: funcgraph_entry: 0.030 us | _raw_read_lock();
ldlm_bl_13-35551 [034] 164201.591139: funcgraph_exit: 0.990 us | }
ldlm_bl_13-35551 [034] 164201.591140: funcgraph_exit: 8.253 us | }
It is easy to see that mdc_dlm_blocking_ast0 returns at the very beginning, which means the lock is not granted or has no l_ast_data (i.e. no osc object) assigned. The data was obtained from the page cache later (a simplified sketch of this cancel path is given at the end of the description):
<...>-40843 [000] 164229.430007: funcgraph_entry: | ll_do_fast_read() {
<...>-40843 [000] 164229.430009: funcgraph_entry: | generic_file_read_iter() {
<...>-40843 [000] 164229.430010: funcgraph_entry: 0.044 us | _cond_resched();
<...>-40843 [000] 164229.430010: funcgraph_entry: | pagecache_get_page() {
<...>-40843 [000] 164229.430010: funcgraph_entry: 0.706 us | find_get_entry();
<...>-40843 [000] 164229.430011: funcgraph_exit: 1.078 us | }
<...>-40843 [000] 164229.430012: funcgraph_entry: | mark_page_accessed() {
<...>-40843 [000] 164229.430012: funcgraph_entry: 0.088 us | activate_page();
<...>-40843 [000] 164229.430012: funcgraph_entry: 0.143 us | workingset_activation();
<...>-40843 [000] 164229.430013: funcgraph_exit: 0.925 us | }
<...>-40843 [000] 164229.430014: funcgraph_entry: 0.032 us | _cond_resched();
<...>-40843 [000] 164229.430014: funcgraph_entry: | pagecache_get_page() {
<...>-40843 [000] 164229.430014: funcgraph_entry: 0.070 us | find_get_entry();
<...>-40843 [000] 164229.430014: funcgraph_exit: 0.401 us | }
<...>-40843 [000] 164229.430015: funcgraph_entry: | mark_page_accessed() {
<...>-40843 [000] 164229.430015: funcgraph_entry: 0.037 us | activate_page();
<...>-40843 [000] 164229.430015: funcgraph_entry: 0.039 us | workingset_activation();
<...>-40843 [000] 164229.430015: funcgraph_exit: 0.649 us | }
....
That is a short description of how it was hit. 2) The second bug is DoM read-on-open corruption. ... |
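For illustration of the first case, here is a heavily simplified sketch of the cancel path suggested by the traces above. It is not the actual Lustre code: the early-return shape follows the description, and discard_pages_under_lock() is a hypothetical stand-in for the real page discard:

/* Illustrative sketch only -- not the real mdc_dlm_blocking_ast0().
 * The point: if the blocking AST bails out before it can resolve an
 * osc object from l_ast_data, nothing walks the page cache, so
 * stale pages survive the cancel and are served to later reads.
 */
static int blocking_ast_sketch(struct ldlm_lock *dlmlock)
{
        struct osc_object *obj;

        lock_res_and_lock(dlmlock);
        obj = dlmlock->l_ast_data;      /* osc object, if one was attached */
        dlmlock->l_ast_data = NULL;
        unlock_res_and_lock(dlmlock);

        if (obj == NULL)
                return 0;       /* BUG: the pages covered by this lock are
                                 * never discarded */

        /* normal path: flush/discard the pages covered by the lock;
         * discard_pages_under_lock() is hypothetical */
        return discard_pages_under_lock(obj, dlmlock);
}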
| Comments |
| Comment by Alexey Lyashkov [ 09/Jun/20 ] |
|
Several other corruption cases are related to the "lock without l_ast_data assigned" situation, inspired by the review discussion of the KMS bug ( 1) Layout change vs. lock cancel: a layout change disconnects the locks from their object and expects them to be picked up again at lock enqueue time. A lock cancel run has no chance to flush pages in this case. 2) Inode destroy: destroying the inode also disconnects the AST data, but after inode recreation the check_and_discard run can find an old lock without l_ast_data assigned, so a page flush is not possible. 3) Layout change vs. DoM lock cancel: an MD lock can be downgraded to lose all bits except DoM, so it goes through lov to flush the data, but a situation where lov cannot find a DoM component is possible, and the pages stay in the page cache. 4) It looks like a tiny write ( |
| Comment by Alexey Lyashkov [ 19/Jun/20 ] |
|
I can drop some cases after research with Vitaly.
--- a/lustre/tests/sanity-pfl.sh
+++ b/lustre/tests/sanity-pfl.sh
@@ -855,8 +855,10 @@ test19_io_base() {
error "Create $comp_file failed"
fi
+ dd if=/dev/zero of=$comp_file bs=100K count=1 conv=notrunc ||
+ error "dd to extend failed"
# write past end of first component, so it is extended
- dd if=/dev/zero of=$comp_file bs=1M count=1 seek=127 conv=notrunc ||
+ dd if=/dev/zero of=$comp_file bs=100K count=1 seek=1270 conv=notrunc ||
error "dd to extend failed"
local ost_idx1=$($LFS getstripe -I1 -i $comp_file)
The result is:
== sanity-pfl test 19a: Simple test of extension behavior ============================================ 18:07:26 (1592492846)
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0327618 s, 32.0 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0535577 s, 19.6 MB/s
Pass!
Layout change vs. DoM cancel is still possible but very hard to reach. I think adding an LASSERT in this place would be good to confirm that data isn't corrupted; once the assert hits, the DoM blocking callback for "complex" ibit locks will need to be reworked.
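As a purely illustrative sketch of that suggestion, assuming lov_flush_composite() looks up the DoM component before flushing it (find_dom_entry() is a hypothetical helper, and the cl_object_flush() arguments are only assumed from context), the check could look roughly like this:

/* Illustrative sketch only -- not the actual lov_flush_composite(). */
static int lov_flush_sketch(const struct lu_env *env, struct cl_object *obj,
                            struct ldlm_lock *lock)
{
        /* find_dom_entry() is a hypothetical helper standing in for the
         * composite-entry walk that locates the DoM (mdc) sub-object */
        struct cl_object *dom = find_dom_entry(obj);

        /* proposed check: a flush triggered by a DoM lock cancel must
         * always find its DoM component, otherwise pages would silently
         * stay in the page cache */
        LASSERT(dom != NULL);

        return cl_object_flush(env, dom, lock);
}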
Vitaly's investigation of the group lock problem says it is hard to reproduce, and it does not follow 100% the same logic as expected for extent locks. An additional problem is group ID generation for layout swap: a random ID is used in that case, but it is not a unique value across a large cluster and should be avoided where possible. So currently we can focus on two confirmed bugs: 1) the mdc check_and_discard function can skip an object discard because of a lock without an osc object assigned; 2) the DoM read-on-open path, which puts up-to-date pages into the page cache while the LDLM lock has no osc object assigned and therefore no way to flush any data (sketched below). A patch is in progress. |
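For the second confirmed bug, a purely illustrative sketch of the state described above; the function and the fill_page_cache_from_reply() helper are hypothetical and only outline the problem, they are not the actual read-on-open implementation:

/* Illustrative sketch only -- not the real DoM read-on-open code. */
static void dom_read_on_open_sketch(struct inode *inode,
                                    struct ldlm_lock *dom_lock,
                                    void *reply_data, size_t len)
{
        /* data carried in the open reply becomes up-to-date cached pages;
         * fill_page_cache_from_reply() is a hypothetical helper */
        fill_page_cache_from_reply(inode, reply_data, len);

        /* in the buggy case no osc object is attached to the lock, so a
         * cancel of dom_lock skips the page discard (compare the
         * blocking-AST sketch in the description) and the stale pages
         * remain for the next read */
        if (dom_lock->l_ast_data == NULL)
                pr_warn("DoM lock has no osc object; cached pages cannot be flushed on cancel\n");
}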
| Comment by Alexey Lyashkov [ 03/Jul/20 ] |
|
It looks like the bugs affect any Lustre version that includes the DoM and Lock Ahead features. Initial testing says most of the bugs can be fixed with two low-risk patches. Some problems with group locks/unprotected layout change will be investigated separately. Patch submission is blocked by a master branch build breakage with the Red Hat debug kernel, caused by James S.'s xarray backports. |
| Comment by Cory Spitz [ 06/Jul/20 ] |
|
From Alexey in Linux Lustre Client slack:
|
| Comment by Gerrit Updater [ 08/Jul/20 ] |
|
Alexey Lyashkov (alexey.lyashkov@hpe.com) uploaded a new patch: https://review.whamcloud.com/39319 |
| Comment by Gerrit Updater [ 16/Jul/20 ] |
|
Vitaly Fertman (vitaly.fertman@hpe.com) uploaded a new patch: https://review.whamcloud.com/39405 |
| Comment by Gerrit Updater [ 16/Jul/20 ] |
|
Vitaly Fertman (vitaly.fertman@hpe.com) uploaded a new patch: https://review.whamcloud.com/39406 |
| Comment by Gerrit Updater [ 13/Aug/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39405/ |
| Comment by Gerrit Updater [ 19/Sep/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39318/ |
| Comment by Gerrit Updater [ 30/Oct/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39406/ |
| Comment by Gerrit Updater [ 30/Oct/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39878/ |
| Comment by Peter Jones [ 30/Oct/20 ] |
|
All patches landed for 2.14 |
| Comment by Gerrit Updater [ 03/Mar/23 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50199 |
| Comment by Gerrit Updater [ 03/Mar/23 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50200 |