[LU-14045] Fix O_DIRECT and encrypted files Created: 19/Oct/20 Updated: 07/Jan/21 Resolved: 07/Nov/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sebastien Buisson | Assignee: | Sebastien Buisson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch, sec |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Because of patch https://review.whamcloud.com/38967, we can end up in a situation where osc_release_bounce_pages() mistakenly considers pages to be fscrypt bounce pages and tries to free them, as shown in the stack below.

2020-10-18 15:26:49 [ 4462.081809][T14012] Lustre: DEBUG MARKER: == sanity test 56w: check lfs_migrate -c stripe_count works ========================================== 15:26:49 (1603049209)
2020-10-18 15:26:52 [ 4464.514691][T30281] BUG: kernel NULL pointer dereference, address: 0000000000000048
2020-10-18 15:26:52 [ 4464.524282][T30281] #PF: supervisor read access in kernel mode
2020-10-18 15:26:52 [ 4464.532011][T30281] #PF: error_code(0x0000) - not-present page
2020-10-18 15:26:52 [ 4464.539709][T30281] PGD 80000007edcce067 P4D 80000007edcce067 PUD 7f1306067 PMD 0
2020-10-18 15:26:52 [ 4464.549144][T30281] Oops: 0000 [#1] PREEMPT SMP PTI
2020-10-18 15:26:52 [ 4464.555851][T30281] CPU: 0 PID: 30281 Comm: ptlrpcd_00_04 Tainted: G W 5.7.0-rc7+ #1
2020-10-18 15:26:52 [ 4464.566720][T30281] Hardware name: Supermicro Super Server/To be filled by O.E.M., BIOS 2.0b 08/12/2016
2020-10-18 15:26:52 [ 4464.577932][T30281] RIP: 0010:mempool_free+0x12/0x80
2020-10-18 15:26:52 [ 4464.584690][T30281] Code: 60 e8 ff cc cc cc cc cc 0f 1f 44 00 00 e9 86 a3 08 00 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 85 ff 48 89 fd 53 74 1a 48 89 f3 <8b> 46 48 39 46 4c 7c 12 48 8b 73 58 48 8b 43 68 48 89 ef 5b 5d ff
2020-10-18 15:26:52 [ 4464.607734][T30281] RSP: 0018:ffffc9002414fcc0 EFLAGS: 00010282
2020-10-18 15:26:52 [ 4464.615423][T30281] RAX: ffff8887d44fb5e0 RBX: 0000000000000000 RCX: 0000000000000000
2020-10-18 15:26:52 [ 4464.625013][T30281] RDX: ffff888845abb780 RSI: 0000000000000000 RDI: ffffea001f553340
2020-10-18 15:26:52 [ 4464.634577][T30281] RBP: ffffea001f553340 R08: 0000000000000000 R09: 0000000000000000
2020-10-18 15:26:52 [ 4464.644109][T30281] R10: 0000000000000000 R11: 000000000000000f R12: 0000000000000000
2020-10-18 15:26:52 [ 4464.653614][T30281] R13: ffff8887d736c9f0 R14: 0000000000000010 R15: ffff888845abb780
2020-10-18 15:26:52 [ 4464.663095][T30281] FS: 0000000000000000(0000) GS:ffff88885e600000(0000) knlGS:0000000000000000
2020-10-18 15:26:52 [ 4464.673521][T30281] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2020-10-18 15:26:52 [ 4464.681579][T30281] CR2: 0000000000000048 CR3: 00000007cf9fa004 CR4: 00000000001606f0
2020-10-18 15:26:52 [ 4464.691015][T30281] Call Trace:
2020-10-18 15:26:52 [ 4464.695751][T30281] brw_interpret+0xac/0xa60 [osc]
2020-10-18 15:26:52 [ 4464.702190][T30281] ? _raw_spin_unlock+0x29/0x50
2020-10-18 15:26:52 [ 4464.708490][T30281] ptlrpc_check_set+0x329/0x1790 [ptlrpc]
2020-10-18 15:26:52 [ 4464.715599][T30281] ptlrpcd_check+0x411/0x460 [ptlrpc]
2020-10-18 15:26:52 [ 4464.722318][T30281] ptlrpcd+0x278/0x300 [ptlrpc]
2020-10-18 15:26:52 [ 4464.728463][T30281] ? remove_wait_queue+0x60/0x60
2020-10-18 15:26:52 [ 4464.734667][T30281] kthread+0x12a/0x170
2020-10-18 15:26:52 [ 4464.739993][T30281] ? ptlrpcd_check+0x460/0x460 [ptlrpc]
2020-10-18 15:26:52 [ 4464.746745][T30281] ? kthread_bind+0x10/0x10
2020-10-18 15:26:52 [ 4464.752431][T30281] ret_from_fork+0x24/0x30 |
| Comments |
| Comment by Gerrit Updater [ 19/Oct/20 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/40295 |
| Comment by James A Simmons [ 19/Oct/20 ] |
|
Here is an occurrence of the crash hit without the fix https://review.whamcloud.com/40295 Test: |
| Comment by Yang Sheng [ 20/Oct/20 ] |
|
https://testing.whamcloud.com/test_sessions/8cfbafe5-ac90-4a05-831f-e9f636a229a5 |
| Comment by Bruno Faccini (Inactive) [ 20/Oct/20 ] |
|
+2 with recent master at https://testing.whamcloud.com/test_sets/24317b7d-ea90-4b01-ae0a-e01b5284c227 and https://testing.whamcloud.com/test_sets/fb2d522c-391e-4979-a709-c6c4d8a967a0 |
| Comment by Andreas Dilger [ 21/Oct/20 ] |
|
I may be conflating two issues, but AFAICS, sanity test_56w has only crashed a couple of times in the past 4 weeks: and those were both on 2020-10-10 when testing patch https://review.whamcloud.com/38883 "LU-11621 utils: optimize migrate_copy_data() with copy_file_range()". The only other crash started on

If this is related to crypto, it appears the source of the funky pages is the splice IO from "splice". The two failed sanity test_56w runs were testing copy_file_range(), which also uses in-kernel data copying, similar to splice. Since the pages are generated in a source filesystem and sent to the target, it isn't clear whether we can play games with the mapping or not, so it might be better to use a page flag (e.g. PageChecked, maybe with a better wrapper like PageCrypto for Lustre)? |
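To make the page-flag suggestion above concrete, here is a minimal userspace sketch in C. It is not the landed fix and not Lustre code: `struct mock_page`, `MOCK_PG_CRYPTO`, and every `mock_*` function are hypothetical stand-ins invented for illustration. The assumption it models is that a page is treated as a bounce page only when the code that allocated it set an explicit flag, so pages arriving from splice or copy_file_range() with an unexpected mapping are never handed to the bounce-page pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct page. */
struct mock_page {
	unsigned long flags;	/* bit flags, loosely modeled on page->flags */
	void *mapping;		/* may be NULL or foreign for spliced pages */
	bool freed;		/* set once the page has been released */
};

/* Hypothetical flag bit playing the role of PageChecked / "PageCrypto". */
#define MOCK_PG_CRYPTO (1UL << 0)

/* Tag a page as a bounce page at the point where it is allocated. */
static void mock_set_page_crypto(struct mock_page *p)
{
	p->flags |= MOCK_PG_CRYPTO;
}

static void mock_clear_page_crypto(struct mock_page *p)
{
	p->flags &= ~MOCK_PG_CRYPTO;
}

static bool mock_page_crypto(const struct mock_page *p)
{
	return (p->flags & MOCK_PG_CRYPTO) != 0;
}

/*
 * Release only the pages that were explicitly tagged as bounce pages.
 * The mapping field is deliberately ignored, so a page with a NULL or
 * foreign mapping is skipped instead of being freed by mistake.
 * Returns the number of pages released.
 */
static size_t mock_release_bounce_pages(struct mock_page **pages, size_t count)
{
	size_t released = 0;

	assert(pages != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		struct mock_page *p = pages[i];

		if (p == NULL || !mock_page_crypto(p))
			continue;	/* not a bounce page: leave it alone */

		mock_clear_page_crypto(p);
		p->freed = true;	/* stands in for returning it to the pool */
		released++;
	}
	return released;
}
```

In this model, calling `mock_release_bounce_pages()` on an array that mixes one tagged page with untagged ones releases only the tagged page, and a second call releases nothing because the flag is cleared on release.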
| Comment by Andreas Dilger [ 21/Oct/20 ] |
|
Stack trace from sanity.sh test_426:

[15000.400779] Lustre: DEBUG MARKER: == sanity test 426: splice test on Lustre ==== 20:58:26 (1603227506)
[15001.080742] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
[15001.102937] user pgtable: 64k pages, 48-bit VAs, pgdp = 000000009f14b2d0
[15001.111120] Internal error: Oops: 96000005 [#1] SMP
[15001.149680] CPU: 1 PID: 11273 Comm: ptlrpcd_01_01 4.18.0-147.8.1.el8_1.aarch64 #1
[15001.164523] pc : mempool_free+0x24/0xe0
[15001.167022] lr : llcrypt_free_bounce_page.part.1+0x38/0x48 [libcfs]
[15001.223444] Process ptlrpcd_01_01 (pid: 11273, stack limit = 0x00000000f9135a93)
[15001.228185] Call trace:
[15001.229806] mempool_free+0x24/0xe0
[15001.232143] llcrypt_free_bounce_page.part.1+0x38/0x48 [libcfs]
[15001.236007] llcrypt_free_bounce_page+0x24/0x30 [libcfs]
[15001.239541] brw_interpret+0x124/0x10c8 [osc]
[15001.242729] ptlrpc_check_set+0x688/0x3318 [ptlrpc]
[15001.246031] ptlrpcd_check+0x470/0x820 [ptlrpc]
[15001.249060] ptlrpcd+0x3d4/0x5c8 [ptlrpc]
[15001.251673] kthread+0x130/0x138 |
| Comment by Andreas Dilger [ 21/Oct/20 ] |
|
I've pushed patch https://review.whamcloud.com/40326 "LU-13745 tests: skip sanity test_426 for 4.18+" to skip this test until the issue is resolved. |
| Comment by Gerrit Updater [ 07/Nov/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40295/ |
| Comment by Peter Jones [ 07/Nov/20 ] |
|
Landed for 2.14 |