[LU-17110] Slab corruption using fiemap ioctl with fm_extent_count==0 Created: 12/Sep/23 Updated: 28/Sep/23 Resolved: 28/Sep/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Etienne Aujames | Assignee: | Etienne Aujames |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
2.15.3 clients |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We first hit this in a production environment with 2.15 clients running mpifileutils dsync. Reproducer (also reproduced on the master branch):

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(int argc, char **argv)
{
	char *fname;
	off_t fsize;
	int fd;
	int i = 0;
	struct fiemap fiemap = {
		.fm_start = 0,
		.fm_flags = FIEMAP_FLAG_SYNC,	/* sync the file before mapping */
		.fm_extent_count = 0,		/* count-only query: no extent array */
		.fm_mapped_extents = 0,
	};

	if (argc <= 1)
		return 1;

	fname = argv[1];
	fd = open(fname, O_RDONLY);
	if (fd < 0) {
		perror("Failed to open");
		return 1;
	}

	fsize = lseek(fd, 0, SEEK_END);
	if (fsize < 0)
		return 1;
	lseek(fd, 0, SEEK_SET);

	fiemap.fm_length = fsize;

	/* Repeat the count-only FIEMAP query until the ioctl fails. */
	while (1) {
		printf("iter: %i\n", ++i);
		if (ioctl(fd, FS_IOC_FIEMAP, &fiemap) < 0) {
			perror("FS_IOC_FIEMAP ioctl failed");
			return 1;
		}
		usleep(1000);
	}

	return 0;
}
```

Resulting kernel log:

```
[ 116.791028] WARNING: CPU: 1 PID: 13475 at lib/list_debug.c:33 __list_add+0xac/0xc0
[ 116.791035] list_add corruption. prev->next should be next (ffff8848b7c2d390), but was ffff8848b7c2d391. (prev=ffff8848a584fe28).
[ 116.791039] Modules linked in: loop zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) joydev libcfs(OE) dm_flakey rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd cuse grace fuse fscache sunrpc ext4 mbcache jbd2 ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg pcspkr parport_pc snd_timer vboxguest(OE) snd parport video soundcore i2c_piix4 binfmt_misc ip_tables xfs libcrc32c sr_mod
[ 116.791250]  cdrom sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix ahci libahci crct10dif_pclmul crct10dif_common crc32c_intel serio_raw e1000 libata drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
[ 116.791317] CPU: 1 PID: 13475 Comm: fiemap_test Kdump: loaded Tainted: P W OE ------------ 3.10.0-1160.59.1.el7.centos.plus.x86_64 #1
[ 116.791322] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 116.791327] Call Trace:
[ 116.791348]  [<ffffffff89d975b9>] dump_stack+0x19/0x1b
[ 116.791358]  [<ffffffff8969b278>] __warn+0xd8/0x100
[ 116.791364]  [<ffffffff8969b2ff>] warn_slowpath_fmt+0x5f/0x80
[ 116.791374]  [<ffffffff899b745c>] __list_add+0xac/0xc0
[ 116.791417]  [<ffffffffc08964ba>] libcfs_debug_msg+0x2da/0xac0 [libcfs]
[ 116.791431]  [<ffffffff899a351b>] ? string.isra.7+0x3b/0xf0
[ 116.791669]  [<ffffffffc0d55040>] ? lock_matches+0x230/0x230 [ptlrpc]
[ 116.791837]  [<ffffffffc0d51cc7>] _ldlm_lock_debug+0x647/0x830 [ptlrpc]
[ 116.791944]  [<ffffffffc0d5358d>] ? ldlm_lock_remove_from_lru_nolock+0x3d/0xe0 [ptlrpc]
[ 116.792046]  [<ffffffffc0d55040>] ? lock_matches+0x230/0x230 [ptlrpc]
[ 116.792157]  [<ffffffffc0d54d10>] ldlm_lock_addref_internal_nolock+0x80/0x100 [ptlrpc]
[ 116.792282]  [<ffffffffc0d5503b>] lock_matches+0x22b/0x230 [ptlrpc]
[ 116.792391]  [<ffffffffc0d5508e>] itree_overlap_cb+0x4e/0x70 [ptlrpc]
[ 116.792511]  [<ffffffffc0a7ae3b>] interval_search+0x8b/0x220 [obdclass]
[ 116.792735]  [<ffffffffc0d51534>] search_itree+0x94/0xd0 [ptlrpc]
[ 116.792878]  [<ffffffffc0d5612f>] ldlm_lock_match_with_skip+0x29f/0x9a0 [ptlrpc]
[ 116.792892]  [<ffffffff899a4c64>] ? vsnprintf+0x234/0x6a0
[ 116.792908]  [<ffffffff899a4c64>] ? vsnprintf+0x234/0x6a0
[ 116.792944]  [<ffffffffc0fa0fbd>] osc_object_fiemap+0x15d/0x6a0 [osc]
[ 116.793033]  [<ffffffffc0a66313>] cl_object_fiemap+0x73/0x160 [obdclass]
[ 116.793066]  [<ffffffffc10324f0>] lov_object_fiemap+0x1300/0x18f0 [lov]
[ 116.793131]  [<ffffffffc1701c50>] ? vvp_io_fini+0x410/0x710 [lustre]
[ 116.793215]  [<ffffffffc0a66313>] cl_object_fiemap+0x73/0x160 [obdclass]
[ 116.793267]  [<ffffffffc169cb5c>] ll_do_fiemap+0x2bc/0x390 [lustre]
[ 116.793318]  [<ffffffffc169d057>] ll_fiemap+0x427/0x5f0 [lustre]
[ 116.793332]  [<ffffffff89863934>] do_vfs_ioctl+0x204/0x5b0
[ 116.793343]  [<ffffffff89863d81>] SyS_ioctl+0xa1/0xc0
[ 116.793356]  [<ffffffff89daaec9>] ? system_call_after_swapgs+0x96/0x13a
[ 116.793366]  [<ffffffff89daaf92>] system_call_fastpath+0x25/0x2a
[ 116.793379]  [<ffffffff89daaed5>] ? system_call_after_swapgs+0xa2/0x13a
[ 116.793388] ---[ end trace d89bc4ba123f5eec ]---
[ 117.442039] ------------[ cut here ]------------
[ 117.442047] WARNING: CPU: 1 PID: 13475 at lib/list_debug.c:62 __list_del_entry+0x82/0xd0
[ 117.442049] list_del corruption. next->prev should be ffff8848a584f1a8, but was a5ffff8848a584f1
[ 117.442050] Modules linked in: loop zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) joydev libcfs(OE) dm_flakey rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd cuse grace fuse fscache sunrpc ext4 mbcache jbd2 ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg pcspkr parport_pc snd_timer vboxguest(OE) snd parport video soundcore i2c_piix4 binfmt_misc ip_tables xfs libcrc32c sr_mod
[ 117.442116]  cdrom sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix ahci libahci crct10dif_pclmul crct10dif_common crc32c_intel serio_raw e1000 libata drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
[ 117.442139] CPU: 1 PID: 13475 Comm: fiemap_test Kdump: loaded Tainted: P W OE ------------ 3.10.0-1160.59.1.el7.centos.plus.x86_64 #1
[ 117.442141] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 117.442143] Call Trace:
[ 117.442149]  [<ffffffff89d975b9>] dump_stack+0x19/0x1b
[ 117.442152]  [<ffffffff8969b278>] __warn+0xd8/0x100
[ 117.442155]  [<ffffffff8969b2ff>] warn_slowpath_fmt+0x5f/0x80
[ 117.442174]  [<ffffffffc092ffff>] ? lnet_rtrpools_alloc+0x17f/0x320 [lnet]
[ 117.442177]  [<ffffffff899b74f2>] __list_del_entry+0x82/0xd0
[ 117.442187]  [<ffffffffc0896185>] cfs_tage_to_tail+0x25/0x80 [libcfs]
[ 117.442195]  [<ffffffffc0896abd>] libcfs_debug_msg+0x8dd/0xac0 [libcfs]
[ 117.442199]  [<ffffffff899a351b>] ? string.isra.7+0x3b/0xf0
[ 117.442255]  [<ffffffffc0d55040>] ? lock_matches+0x230/0x230 [ptlrpc]
[ 117.442300]  [<ffffffffc0d51cc7>] _ldlm_lock_debug+0x647/0x830 [ptlrpc]
[ 117.442345]  [<ffffffffc0d5358d>] ? ldlm_lock_remove_from_lru_nolock+0x3d/0xe0 [ptlrpc]
[ 117.442388]  [<ffffffffc0d55040>] ? lock_matches+0x230/0x230 [ptlrpc]
[ 117.442432]  [<ffffffffc0d54d10>] ldlm_lock_addref_internal_nolock+0x80/0x100 [ptlrpc]
[ 117.442473]  [<ffffffffc0d5503b>] lock_matches+0x22b/0x230 [ptlrpc]
[ 117.442514]  [<ffffffffc0d5508e>] itree_overlap_cb+0x4e/0x70 [ptlrpc]
[ 117.442568]  [<ffffffffc0a7ae3b>] interval_search+0x8b/0x220 [obdclass]
[ 117.442653]  [<ffffffffc0d51534>] search_itree+0x94/0xd0 [ptlrpc]
[ 117.442698]  [<ffffffffc0d5612f>] ldlm_lock_match_with_skip+0x29f/0x9a0 [ptlrpc]
[ 117.442704]  [<ffffffff899a4c64>] ? vsnprintf+0x234/0x6a0
[ 117.442707]  [<ffffffff899a4c64>] ? vsnprintf+0x234/0x6a0
[ 117.442719]  [<ffffffffc0fa0fbd>] osc_object_fiemap+0x15d/0x6a0 [osc]
[ 117.442746]  [<ffffffffc0a66313>] cl_object_fiemap+0x73/0x160 [obdclass]
[ 117.442758]  [<ffffffffc10324f0>] lov_object_fiemap+0x1300/0x18f0 [lov]
[ 117.442792]  [<ffffffffc1701c50>] ? vvp_io_fini+0x410/0x710 [lustre]
[ 117.442832]  [<ffffffffc0a66313>] cl_object_fiemap+0x73/0x160 [obdclass]
[ 117.442847]  [<ffffffffc169cb5c>] ll_do_fiemap+0x2bc/0x390 [lustre]
[ 117.442862]  [<ffffffffc169d057>] ll_fiemap+0x427/0x5f0 [lustre]
[ 117.442867]  [<ffffffff89863934>] do_vfs_ioctl+0x204/0x5b0
[ 117.442870]  [<ffffffff89863d81>] SyS_ioctl+0xa1/0xc0
[ 117.442875]  [<ffffffff89daaec9>] ? system_call_after_swapgs+0x96/0x13a
[ 117.442881]  [<ffffffff89daaf92>] system_call_fastpath+0x25/0x2a
[ 117.442885]  [<ffffffff89daaed5>] ? system_call_after_swapgs+0xa2/0x13a
[ 117.442887] ---[ end trace d89bc4ba123f5eed ]---
[ 118.280802] general protection fault: 0000 [#1] SMP
[ 118.280822] Modules linked in: loop zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) joydev libcfs(OE) dm_flakey rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd cuse grace fuse fscache sunrpc ext4 mbcache jbd2 ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg pcspkr parport_pc snd_timer vboxguest(OE) snd parport video soundcore i2c_piix4 binfmt_misc ip_tables xfs libcrc32c sr_mod
[ 118.281074]  cdrom sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix ahci libahci crct10dif_pclmul crct10dif_common crc32c_intel serio_raw e1000 libata drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
[ 118.281166] CPU: 1 PID: 13478 Comm: abrt-server Kdump: loaded Tainted: P W OE ------------ 3.10.0-1160.59.1.el7.centos.plus.x86_64 #1
[ 118.281195] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 118.281214] task: ffff8848b767a100 ti: ffff8848b8b60000 task.ti: ffff8848b8b60000
[ 118.281231] RIP: 0010:[<ffffffffc04519e0>]  [<ffffffffc04519e0>] xfs_trans_buf_item_match+0x60/0xa0 [xfs]
[ 118.281271] RSP: 0018:ffff8848b8b63948  EFLAGS: 00010212
[ 118.281284] RAX: 60ffff8848b8b873 RBX: ffff8848ba1181b0 RCX: ffff88487633f5c1
[ 118.281300] RDX: ffff8848b8b639b8 RSI: ffff8848b69e93c0 RDI: ffff8848ba118260
[ 118.281317] RBP: ffff8848b8b63948 R08: 0000000000000008 R09: ffff8848b59f1f28
[ 118.281333] R10: 0000000002c37820 R11: 0000000000000001 R12: ffff8848b6ab9000
[ 118.281349] R13: ffff8848b8b63a00 R14: ffff8848b8b639b8 R15: ffff8848b69e93c0
[ 118.281366] FS:  00007fa5e1243900(0000) GS:ffff8848bfd00000(0000) knlGS:0000000000000000
[ 118.281384] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 118.281399] CR2: 00007fa5e126f000 CR3: 000000003bece000 CR4: 00000000000606e0
[ 118.281438] Call Trace:
[ 118.281460]  [<ffffffffc0451da2>] xfs_trans_read_buf_map+0x52/0x2c0 [xfs]
[ 118.281485]  [<ffffffffc03f6934>] xfs_btree_read_buf_block.constprop.33+0xa4/0xe0 [xfs]
[ 118.281513]  [<ffffffffc03faa45>] xfs_btree_lookup_get_block+0x95/0x1a0 [xfs]
[ 118.281543]  [<ffffffffc03facff>] xfs_btree_lookup+0xdf/0x420 [xfs]
[ 118.282076]  [<ffffffffc03df60b>] xfs_alloc_lookup_eq+0x1b/0x20 [xfs]
[ 118.282591]  [<ffffffffc03e0ed8>] xfs_free_ag_extent+0x278/0x780 [xfs]
[ 118.283094]  [<ffffffffc03e33da>] xfs_free_extent+0xaa/0x140 [xfs]
[ 118.283604]  [<ffffffffc045272a>] xfs_trans_free_extent+0x4a/0x100 [xfs]
[ 118.284103]  [<ffffffffc04527fe>] xfs_extent_free_finish_item+0x1e/0x40 [xfs]
[ 118.284596]  [<ffffffffc0401738>] xfs_defer_finish+0x128/0x3d0 [xfs]
[ 118.285077]  [<ffffffffc0434cf5>] xfs_itruncate_extents+0xf5/0x220 [xfs]
[ 118.285553]  [<ffffffffc0434ed7>] xfs_inactive_truncate+0xb7/0x110 [xfs]
[ 118.286014]  [<ffffffffc0435528>] xfs_inactive+0x108/0x130 [xfs]
[ 118.286463]  [<ffffffffc043cb15>] xfs_fs_destroy_inode+0x95/0x190 [xfs]
[ 118.286899]  [<ffffffff8986c85b>] destroy_inode+0x3b/0x60
[ 118.287317]  [<ffffffff8986c995>] evict+0x115/0x180
[ 118.287727]  [<ffffffff8986cd6c>] iput+0xfc/0x190
[ 118.288120]  [<ffffffff89860b3e>] do_unlinkat+0x1ae/0x2d0
[ 118.288511]  [<ffffffff89daaed5>] ? system_call_after_swapgs+0xa2/0x13a
[ 118.288892]  [<ffffffff89daaec9>] ? system_call_after_swapgs+0x96/0x13a
[ 118.289254]  [<ffffffff89daaed5>] ? system_call_after_swapgs+0xa2/0x13a
[ 118.289607]  [<ffffffff89daaec9>] ? system_call_after_swapgs+0x96/0x13a
[ 118.289944]  [<ffffffff89daaed5>] ? system_call_after_swapgs+0xa2/0x13a
[ 118.290281]  [<ffffffff89daaec9>] ? system_call_after_swapgs+0x96/0x13a
[ 118.290601]  [<ffffffff89861bbb>] SyS_unlinkat+0x1b/0x40
[ 118.290900]  [<ffffffff89daaf92>] system_call_fastpath+0x25/0x2a
[ 118.291189]  [<ffffffff89daaed5>] ? system_call_after_swapgs+0xa2/0x13a
[ 118.291476] Code: 48 8b 87 b0 00 00 00 48 81 c7 b0 00 00 00 48 39 c7 48 8d 48 f8 75 11 eb 42 66 90 48 8b 41 08 48 39 c7 48 8d 48 f8 74 33 48 8b 01 <81> 78 30 3c 12 00 00 75 e7 48 8b 80 88 00 00 00 48 39 b0 98 00
[ 118.292459] RIP  [<ffffffffc04519e0>] xfs_trans_buf_item_match+0x60/0xa0 [xfs]
```

It seems that " |
| Comments |
| Comment by Gerrit Updater [ 12/Sep/23 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52352 |
| Comment by Gerrit Updater [ 26/Sep/23 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52512 |
| Comment by Gerrit Updater [ 28/Sep/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52352/ |
| Comment by Peter Jones [ 28/Sep/23 ] |
|
Landed for 2.16 |