[LU-10598] Ignore IGIF formatted last_id Created: 02/Feb/18 Updated: 08/Aug/18 Resolved: 09/Apr/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.12.0, Lustre 2.10.4 |
| Type: | Bug | Priority: | Minor |
| Reporter: | nasf (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
A customer reported the following problem during OI scrub:
[1328349.256907] ------------[ cut here ]------------
[1328349.256928] WARNING: at /tmp/rpmbuild-lustre-root-lvg8hO2L/BUILD/lustre-2.7.21.3.ddn12.g7410d9d/ldiskfs/ext4_jbd2.c:266 __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]()
[1328349.256930] Modules linked in: mgs(OE) loop osc(OE) nfsv3 nfs fscache osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_ssse3 sha512_generic crypto_null libcfs(OE) mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase dell_rbu rdma_ucm(OE) ib_ucm(OE) bonding rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) ib_sa(OE) ib_mad(OE) mlx4_core(OE) dm_mirror dm_region_hash dm_log intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass dm_round_robin crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr sg ipmi_devintf ipmi_si
[1328349.256966] ipmi_msghandler mei_me lpc_ich mei sb_edac edac_core shpchp acpi_power_meter wmi dm_multipath dm_mod knem(OE) nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 mlx5_ib(OE) ib_core(OE) ib_addr(OE) ib_netlink(OE) sr_mod cdrom sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200 i2c_algo_bit qla2xxx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm scsi_transport_fc mlx5_core(OE) ahci scsi_tgt mlx_compat(OE) drm libahci tg3 libata i2c_core megaraid_sas ptp pps_core fjes
[1328349.256991] CPU: 30 PID: 6079 Comm: OI_scrub Tainted: G OE ------------ 3.10.0-514.26.2.el7_lustre.2.7.21.3.ddn12.g7410d9d.x86_64 #1
[1328349.256993] Hardware name: /086D43, BIOS 2.4.3 01/17/2017
[1328349.256994] 0000000000000000 00000000a836e6cc ffff881de9ab7700 ffffffff81687f63
[1328349.256996] ffff881de9ab7738 ffffffff81085cb0 ffff880d7c2952d8 ffff88017fbbf150
[1328349.256997] ffff880d8368a2a8 ffffffffa116530c 0000000000000346 ffff881de9ab7748
[1328349.256999] Call Trace:
[1328349.257004] [<ffffffff81687f63>] dump_stack+0x19/0x1b
[1328349.257008] [<ffffffff81085cb0>] warn_slowpath_common+0x70/0xb0
[1328349.257009] [<ffffffff81085dfa>] warn_slowpath_null+0x1a/0x20
[1328349.257016] [<ffffffffa114a8a2>] __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
[1328349.257022] [<ffffffffa111b201>] ldiskfs_getblk+0x131/0x200 [ldiskfs]
[1328349.257027] [<ffffffffa111b2fa>] ldiskfs_bread+0x2a/0x1e0 [ldiskfs]
[1328349.257031] [<ffffffffa110cf01>] ldiskfs_append+0x81/0x150 [ldiskfs]
[1328349.257036] [<ffffffffa1113e5f>] ldiskfs_init_new_dir+0xcf/0x230 [ldiskfs]
[1328349.257040] [<ffffffffa11141db>] ldiskfs_mkdir+0x18b/0x290 [ldiskfs]
[1328349.257043] [<ffffffff8120ab27>] vfs_mkdir+0xb7/0x160
[1328349.257059] [<ffffffffa11cec09>] simple_mkdir.isra.17.constprop.25+0x429/0x4c0 [osd_ldiskfs]
[1328349.257066] [<ffffffffa11e28ef>] osd_seq_load_locked.isra.20+0x47a/0x804 [osd_ldiskfs]
[1328349.257069] [<ffffffff810bc454>] ? __wake_up+0x44/0x50
[1328349.257071] [<ffffffff811debf6>] ? kmem_cache_alloc_trace+0x1d6/0x200
[1328349.257078] [<ffffffffa11cefe8>] osd_seq_load+0x348/0x610 [osd_ldiskfs]
[1328349.257085] [<ffffffffa11cf2d5>] osd_object_spec_find+0x25/0x130 [osd_ldiskfs]
[1328349.257091] [<ffffffffa11d29ec>] osd_obj_spec_insert+0x5c/0x150 [osd_ldiskfs]
[1328349.257097] [<ffffffffa11bfd30>] osd_oi_insert+0x2c0/0x450 [osd_ldiskfs]
[1328349.257103] [<ffffffffa11d4276>] ? osd_scrub_refresh_mapping+0x66/0x420 [osd_ldiskfs]
[1328349.257110] [<ffffffffa114a209>] ? __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
[1328349.257117] [<ffffffffa11d442c>] osd_scrub_refresh_mapping+0x21c/0x420 [osd_ldiskfs]
[1328349.257122] [<ffffffffa11b4bb4>] ? osd_ea_fid_set+0xb4/0x340 [osd_ldiskfs]
[1328349.257128] [<ffffffffa11d5153>] osd_scrub_check_update+0x293/0x12e0 [osd_ldiskfs]
[1328349.257134] [<ffffffffa11d3b77>] ? osd_iit_iget+0xd7/0x2f0 [osd_ldiskfs]
[1328349.257139] [<ffffffffa11d7d35>] osd_scrub_exec+0x65/0x5e0 [osd_ldiskfs]
[1328349.257147] [<ffffffffa114df4e>] ? ldiskfs_read_inode_bitmap+0x23e/0x6e0 [ldiskfs]
[1328349.257153] [<ffffffffa11d97c1>] osd_inode_iteration+0x571/0xd80 [osd_ldiskfs]
[1328349.257159] [<ffffffffa11d7cd0>] ? osd_ios_ROOT_scan+0x280/0x280 [osd_ldiskfs]
[1328349.257165] [<ffffffffa11d3e50>] ? osd_preload_next+0xc0/0xc0 [osd_ldiskfs]
[1328349.257171] [<ffffffffa11daa40>] osd_scrub_main+0xa70/0x1080 [osd_ldiskfs]
[1328349.257177] [<ffffffffa11d9fd0>] ? osd_inode_iteration+0xd80/0xd80 [osd_ldiskfs]
[1328349.257180] [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[1328349.257181] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[1328349.257183] [<ffffffff81698598>] ret_from_fork+0x58/0x90
[1328349.257185] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[1328349.257186] ---[ end trace 002aea32270ef73f ]---
[1328349.257188] LDISKFS-fs: ldiskfs_getblk:838: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
[1328349.269824] LDISKFS-fs error (device dm-0): ldiskfs_getblk:838: inode #1559272469: block 779684908: comm OI_scrub: journal_dirty_metadata failed: handle type 0 started at line 112, credits 16/0, errcode -28 |
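The final LDISKFS-fs error line can be decoded with a short script. This is a minimal sketch; the helper name `parse_journal_error` is hypothetical, and reading the two credit numbers as "requested/remaining" is an assumption about the message format, not something stated in this ticket. Under that reading, errno 28 (ENOSPC) most likely means the journal handle ran out of credits, not that the disk was full.

```python
import errno
import re

# Tail of the last log line from the description above.
LOG_TAIL = ("journal_dirty_metadata failed: handle type 0 started at line 112, "
            "credits 16/0, errcode -28")


def parse_journal_error(line):
    """Extract credit counters and the symbolic errno from an ldiskfs error line.

    Assumption: the two numbers after "credits" are requested/remaining.
    """
    match = re.search(r"credits (\d+)/(\d+), errcode (-?\d+)", line)
    if match is None:
        raise ValueError("line does not look like a journal_dirty_metadata error")
    requested, remaining, err = (int(group) for group in match.groups())
    return {
        "requested": requested,
        "remaining": remaining,
        # Kernel error codes are negative; errno.errorcode is keyed by the positive value.
        "errno_name": errno.errorcode[abs(err)],
    }


info = parse_journal_error(LOG_TAIL)
print(info)
```

With zero credits remaining, the handle cannot dirty another metadata block, which is consistent with the failure inside `ldiskfs_mkdir` shown in the call trace.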
| Comments |
| Comment by nasf (Inactive) [ 02/Feb/18 ] |
|
The root cause is that the OI scrub found an object with the FID [0x89:0x0:0x0]. That is a valid IGIF, but the old logic treated it as the last_id for sequence 0x89 and then tried to create directories under /O. The OI scrub had already started its transaction without accounting for this corner case, so the reserved credits were not enough and the operation failed. In fact, such a special IGIF should not be handled as last_id. The issue is very rare (a 1/2^32 probability), but it exists in almost all releases. I will make a patch to fix it. |
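The distinction described in this comment can be illustrated as follows. This is only a sketch, not the code from the patch. The function names are hypothetical, and the IGIF sequence range (12 through 0xFFFFFFFF) is an assumption based on Lustre's usual FID layout, in which an IGIF stores the inode number in the sequence field and the inode generation in the object ID field.

```python
from typing import NamedTuple

# Assumed IGIF sequence range; not taken from this ticket.
FID_SEQ_IGIF = 12
FID_SEQ_IGIF_MAX = 0xFFFFFFFF


class Fid(NamedTuple):
    seq: int  # 64-bit sequence (inode number for an IGIF)
    oid: int  # 32-bit object ID (inode generation for an IGIF)
    ver: int  # 32-bit version


def fid_is_igif(fid):
    """True if the sequence falls in the assumed IGIF range."""
    return FID_SEQ_IGIF <= fid.seq <= FID_SEQ_IGIF_MAX


def is_last_id_old(fid):
    """Old behaviour as described above: an object ID of 0 alone meant last_id."""
    return fid.oid == 0


def is_last_id_fixed(fid):
    """Sketch of the intended behaviour: an IGIF is never treated as last_id."""
    return fid.oid == 0 and not fid_is_igif(fid)


problem_fid = Fid(seq=0x89, oid=0x0, ver=0x0)
print(is_last_id_old(problem_fid), is_last_id_fixed(problem_fid))
```

An IGIF only reaches this path when its 32-bit generation happens to be 0, which matches the 1/2^32 estimate in the comment.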
| Comment by Gerrit Updater [ 02/Feb/18 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/31140 |
| Comment by Gerrit Updater [ 09/Apr/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31140/ |
| Comment by Peter Jones [ 09/Apr/18 ] |
|
Landed for 2.12 |
| Comment by Gerrit Updater [ 11/Apr/18 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31952 |
| Comment by Gerrit Updater [ 12/Apr/18 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31952/ |