[LU-17388] sanity-flr test 38: resync panic. Created: 28/Dec/23  Updated: 11/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Alexey Lyashkov Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

RHEL 8.4 debug kernel.
Does not reproduce with ONLY=38 alone, but is easy to reproduce with a full test-suite run.


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
[ 8554.748991] Lustre: DEBUG MARKER: == sanity-flr test 36d: write/punch FLR file update OST layout version ========================================================== 21:00:04 (1703786404)
[ 8588.313641] Lustre: DEBUG MARKER: == sanity-flr test 37: mirror I/O API verification ======= 21:00:38 (1703786438)
[ 8600.635491] Lustre: Unmounted lustre-client
[ 8600.944134] Lustre: Mounted lustre-client
[ 8630.238315] Lustre: DEBUG MARKER: == sanity-flr test 38: resync ============================ 21:01:20 (1703786480)
[ 8633.346584] page:ffffea00071e1d80 refcount:0 mapcount:1 mapping:dead000000000400 index:0x7f2232076 compound_mapcount: 1
[ 8633.348480] anon flags: 0x17ffffc0000000()
[ 8633.349531] raw: 0017ffffc0000000 ffffea00071e0001 dead000000000200 dead000000000400
[ 8633.351227] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
[ 8633.352187] page dumped because: VM_BUG_ON_PAGE(PageTail(page))
[ 8633.352932] ------------[ cut here ]------------
[ 8633.353521] kernel BUG at include/linux/page-flags.h:505!
[ 8633.354219] invalid opcode: 0000 [#1] SMP KASAN PTI
[ 8633.354797] CPU: 1 PID: 237586 Comm: lt-lfs Tainted: G    B   W  OE    ---------r-  - 4.18.0-305.25.1.el8_4.x86_64+debug #1
[ 8633.356130] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-4.module_el8.9.0+3659+9c8643f3 04/01/2014
[ 8633.357256] RIP: 0010:lov_page_init_empty+0x2a4/0x330 [lov]
[ 8633.357985] Code: c0 59 c2 c7 05 51 76 05 00 01 00 00 00 e8 34 28 d6 fe 5b 31 c0 5d 41 5c 41 5d c3 48 c7 c6 e0 34 56 c2 48 89 df e8 9c c8 ed ca <0f> 0b 48 c7 c7 c0 bf 59 c2 e8 89 55 62 cb 48 89 ef e8 66 ac fd ca
[ 8633.360240] RSP: 0018:ffff8882067c73f0 EFLAGS: 00010282
[ 8633.360853] RAX: dffffc0000000000 RBX: ffffea00071e1d80 RCX: 0000000000000007
[ 8633.361704] RDX: 1ffffd4000e3c3b7 RSI: 0000000000000000 RDI: ffffea00071e1db8
[ 8633.362568] RBP: ffffffffc12fc380 R08: ffffed1044f3bda5 R09: ffffed1044f3bda5
[ 8633.363409] R10: ffff8882279ded23 R11: ffffed1044f3bda4 R12: ffff8881f6e3fc38
[ 8633.364263] R13: ffff8881f6e3fc20 R14: ffff8881f5e3fc90 R15: ffff888224cdb458
[ 8633.365116] FS:  00007f2233d02480(0000) GS:ffff888227800000(0000) knlGS:0000000000000000
[ 8633.366068] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8633.366746] CR2: 00007f2231e75000 CR3: 000000021cf8c006 CR4: 0000000000020ee0
[ 8633.367607] Call Trace:
[ 8633.367922]  lov_page_init_composite+0x95d/0x10a0 [lov]
[ 8633.368588]  ? lov_page_init_empty+0x330/0x330 [lov]
[ 8633.369274]  ? cl_page_alloc+0xac5/0x13f0 [obdclass]
[ 8633.369901]  cl_page_alloc+0x8cf/0x13f0 [obdclass]
[ 8633.370516]  ? __kmalloc_node+0x17a/0x2a0
[ 8633.371069]  ? cl_page_make_ready+0xa80/0xa80 [obdclass]
[ 8633.371707]  ? iov_iter_get_pages_alloc+0x23d/0x10e0
[ 8633.372491]  ? __init_waitqueue_head+0x9c/0x110
[ 8633.373066]  ? memset+0x1f/0x40
[ 8633.373499]  cl_page_find+0x3d3/0x620 [obdclass]
[ 8633.374138]  ll_direct_IO_impl+0x10d5/0x2ab0 [lustre]
[ 8633.374773]  ? ll_write_end+0x12b0/0x12b0 [lustre]
[ 8633.375379]  ? rcu_read_unlock+0x50/0x50
[ 8633.375846]  ? touch_atime+0xca/0x250
[ 8633.376314]  generic_file_read_iter+0x1ed/0x4c0
[ 8633.376853]  ? trace_hardirqs_on+0x20/0x195
[ 8633.377410]  vvp_io_read_start+0x1042/0x18f0 [lustre]
[ 8633.378071]  ? vvp_io_setattr_fini+0x180/0x180 [lustre]
[ 8633.378706]  ? lov_lock_init_composite+0x1b1/0x1f0 [lov]
[ 8633.379408]  ? cl_lock_request+0x148/0x370 [obdclass]
[ 8633.380073]  cl_io_start+0x187/0x3a0 [obdclass]
[ 8633.380667]  cl_io_loop+0x183/0x490 [obdclass]
[ 8633.381265]  ll_file_io_generic+0x937/0x2540 [lustre]
[ 8633.381897]  ? ll_io_init+0x1080/0x1080 [lustre]
[ 8633.382518]  ll_file_read_iter+0x1505/0x2a60 [lustre]
[ 8633.383181]  ? ll_file_write_iter+0x21a0/0x21a0 [lustre]
[ 8633.383809]  ? lock_downgrade+0x710/0x710
[ 8633.384348]  ? ll_getattr_dentry+0xaeb/0x2600 [lustre]
[ 8633.385084]  new_sync_read+0x390/0x550
[ 8633.385529]  ? do_iter_readv_writev+0x6d0/0x6d0
[ 8633.386096]  ? lock_downgrade+0x710/0x710
[ 8633.386569]  ? rcu_read_unlock+0x50/0x50
[ 8633.387067]  ? __ia32_sys_lstat+0x70/0x70
[ 8633.387559]  ? fsnotify_first_mark+0x150/0x150
[ 8633.388116]  vfs_read+0xff/0x300
[ 8633.388511]  ksys_pread64+0x11b/0x140
[ 8633.388949]  ? __audit_syscall_exit+0x796/0xab0
[ 8633.389516]  ? __ia32_sys_write+0xb0/0xb0
[ 8633.389996]  ? trace_hardirqs_on_thunk+0x1a/0x20
[ 8633.390570]  ? trace_hardirqs_on_caller+0x22/0x1a0
[ 8633.391213]  ? do_syscall_64+0x22/0x430
[ 8633.391670]  do_syscall_64+0xa5/0x430
[ 8633.392141]  entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[ 8633.392734] RIP: 0033:0x7f22334491c8
[ 8633.393193] Code: b8 ff ff ff ff eb c5 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 8b 05 86 d2 20 00 49 89 ca 85 c0 75 17 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 60 c3 0f 1f 80 00 00 00 00 41 55 49 89 cd 41
[ 8633.395464] RSP: 002b:00007ffc7847e528 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
[ 8633.396380] RAX: ffffffffffffffda RBX: 0000000000400000 RCX: 00007f22334491c8
[ 8633.397236] RDX: 0000000000400000 RSI: 00007f2231e76000 RDI: 0000000000000003
[ 8633.398098] RBP: 00007f2231e76000 R08: 0000000000000000 R09: 0000000000000000
[ 8633.398956] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 8633.399816] R13: 0000000000000003 R14: 0000000000000fff R15: 0000000000000000
[ 8633.400675] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) jbd2 mbcache rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache iTCO_wdt iTCO_vendor_support joydev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel qxl drm_ttm_helper ttm pcspkr drm_kms_helper syscopyarea sysfillrect i6300esb virtio_balloon sysimgblt fb_sys_fops drm lpc_ich i2c_i801 sunrpc vfat fat ip_tables xfs libcrc32c ahci libahci virtio_console crc32c_intel virtio_scsi e1000 virtio_blk libata serio_raw [last unloaded: libcfs]
[ 8633.408363] ---[ end trace f6b2871834a024d8 ]---
[ 8633.409254] RIP: 0010:lov_page_init_empty+0x2a4/0x330 [lov]
[ 8633.410292] Code: c0 59 c2 c7 05 51 76 05 00 01 00 00 00 e8 34 28 d6 fe 5b 31 c0 5d 41 5c 41 5d c3 48 c7 c6 e0 34 56 c2 48 89 df e8 9c c8 ed ca <0f> 0b 48 c7 c7 c0 bf 59 c2 e8 89 55 62 cb 48 89 ef e8 66 ac fd ca
[ 8633.413491] RSP: 0018:ffff8882067c73f0 EFLAGS: 00010282
[ 8633.414539] RAX: dffffc0000000000 RBX: ffffea00071e1d80 RCX: 0000000000000007
[ 8633.415772] RDX: 1ffffd4000e3c3b7 RSI: 0000000000000000 RDI: ffffea00071e1db8
[ 8633.417060] RBP: ffffffffc12fc380 R08: ffffed1044f3bda5 R09: ffffed1044f3bda5
[ 8633.418332] R10: ffff8882279ded23 R11: ffffed1044f3bda4 R12: ffff8881f6e3fc38
[ 8633.419697] R13: ffff8881f6e3fc20 R14: ffff8881f5e3fc90 R15: ffff888224cdb458
[ 8633.421061] FS:  00007f2233d02480(0000) GS:ffff888227800000(0000) knlGS:0000000000000000
[ 8633.422602] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8633.423714] CR2: 00007f2231e75000 CR3: 000000021cf8c006 CR4: 0000000000020ee0
[ 8633.425090] Kernel panic - not syncing: Fatal exception
include/linux/page-flags.h, lines 503-513 (line 505 is the BUG site reported above):
503 static __always_inline void SetPageUptodate(struct page *page)
504 {
505         VM_BUG_ON_PAGE(PageTail(page), page);
506         /*
507          * Memory barrier must be issued before setting the PG_uptodate bit,
508          * so that all previous stores issued in order to bring the page
509          * uptodate are actually visible before PageUptodate becomes true.
510          */
511         smp_wmb();
512         set_bit(PG_uptodate, &page->flags);
513 }


 Comments   
Comment by Andreas Dilger [ 11/Jan/24 ]

This may be related to some of the other stale-page-in-cache issues that have been seen recently.

Generated at Sat Feb 10 03:35:02 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.