[LU-10598] Ignore IGIF formatted last_id Created: 02/Feb/18  Updated: 08/Aug/18  Resolved: 09/Apr/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.12.0, Lustre 2.10.4

Type: Bug
Priority: Minor
Reporter: nasf (Inactive)
Assignee: nasf (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11156 scrub treat project quota inode as IG... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A customer reported the following failure during OI scrub:

[1328349.256907] ------------[ cut here ]------------
[1328349.256928] WARNING: at /tmp/rpmbuild-lustre-root-lvg8hO2L/BUILD/lustre-2.7.21.3.ddn12.g7410d9d/ldiskfs/ext4_jbd2.c:266 __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]()
[1328349.256930] Modules linked in: mgs(OE) loop osc(OE) nfsv3 nfs fscache osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_ssse3 sha512_generic crypto_null libcfs(OE) mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase dell_rbu rdma_ucm(OE) ib_ucm(OE) bonding rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) ib_sa(OE) ib_mad(OE) mlx4_core(OE) dm_mirror dm_region_hash dm_log intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass dm_round_robin crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr sg ipmi_devintf ipmi_si
[1328349.256966]  ipmi_msghandler mei_me lpc_ich mei sb_edac edac_core shpchp acpi_power_meter wmi dm_multipath dm_mod knem(OE) nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 mlx5_ib(OE) ib_core(OE) ib_addr(OE) ib_netlink(OE) sr_mod cdrom sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200 i2c_algo_bit qla2xxx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm scsi_transport_fc mlx5_core(OE) ahci scsi_tgt mlx_compat(OE) drm libahci tg3 libata i2c_core megaraid_sas ptp pps_core fjes
[1328349.256991] CPU: 30 PID: 6079 Comm: OI_scrub Tainted: G           OE  ------------   3.10.0-514.26.2.el7_lustre.2.7.21.3.ddn12.g7410d9d.x86_64 #1
[1328349.256993] Hardware name:    /086D43, BIOS 2.4.3 01/17/2017
[1328349.256994]  0000000000000000 00000000a836e6cc ffff881de9ab7700 ffffffff81687f63
[1328349.256996]  ffff881de9ab7738 ffffffff81085cb0 ffff880d7c2952d8 ffff88017fbbf150
[1328349.256997]  ffff880d8368a2a8 ffffffffa116530c 0000000000000346 ffff881de9ab7748
[1328349.256999] Call Trace:
[1328349.257004]  [<ffffffff81687f63>] dump_stack+0x19/0x1b
[1328349.257008]  [<ffffffff81085cb0>] warn_slowpath_common+0x70/0xb0
[1328349.257009]  [<ffffffff81085dfa>] warn_slowpath_null+0x1a/0x20
[1328349.257016]  [<ffffffffa114a8a2>] __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
[1328349.257022]  [<ffffffffa111b201>] ldiskfs_getblk+0x131/0x200 [ldiskfs]
[1328349.257027]  [<ffffffffa111b2fa>] ldiskfs_bread+0x2a/0x1e0 [ldiskfs]
[1328349.257031]  [<ffffffffa110cf01>] ldiskfs_append+0x81/0x150 [ldiskfs]
[1328349.257036]  [<ffffffffa1113e5f>] ldiskfs_init_new_dir+0xcf/0x230 [ldiskfs]
[1328349.257040]  [<ffffffffa11141db>] ldiskfs_mkdir+0x18b/0x290 [ldiskfs]
[1328349.257043]  [<ffffffff8120ab27>] vfs_mkdir+0xb7/0x160
[1328349.257059]  [<ffffffffa11cec09>] simple_mkdir.isra.17.constprop.25+0x429/0x4c0 [osd_ldiskfs]
[1328349.257066]  [<ffffffffa11e28ef>] osd_seq_load_locked.isra.20+0x47a/0x804 [osd_ldiskfs]
[1328349.257069]  [<ffffffff810bc454>] ? __wake_up+0x44/0x50
[1328349.257071]  [<ffffffff811debf6>] ? kmem_cache_alloc_trace+0x1d6/0x200
[1328349.257078]  [<ffffffffa11cefe8>] osd_seq_load+0x348/0x610 [osd_ldiskfs]
[1328349.257085]  [<ffffffffa11cf2d5>] osd_object_spec_find+0x25/0x130 [osd_ldiskfs]
[1328349.257091]  [<ffffffffa11d29ec>] osd_obj_spec_insert+0x5c/0x150 [osd_ldiskfs]
[1328349.257097]  [<ffffffffa11bfd30>] osd_oi_insert+0x2c0/0x450 [osd_ldiskfs]
[1328349.257103]  [<ffffffffa11d4276>] ? osd_scrub_refresh_mapping+0x66/0x420 [osd_ldiskfs]
[1328349.257110]  [<ffffffffa114a209>] ? __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
[1328349.257117]  [<ffffffffa11d442c>] osd_scrub_refresh_mapping+0x21c/0x420 [osd_ldiskfs]
[1328349.257122]  [<ffffffffa11b4bb4>] ? osd_ea_fid_set+0xb4/0x340 [osd_ldiskfs]
[1328349.257128]  [<ffffffffa11d5153>] osd_scrub_check_update+0x293/0x12e0 [osd_ldiskfs]
[1328349.257134]  [<ffffffffa11d3b77>] ? osd_iit_iget+0xd7/0x2f0 [osd_ldiskfs]
[1328349.257139]  [<ffffffffa11d7d35>] osd_scrub_exec+0x65/0x5e0 [osd_ldiskfs]
[1328349.257147]  [<ffffffffa114df4e>] ? ldiskfs_read_inode_bitmap+0x23e/0x6e0 [ldiskfs]
[1328349.257153]  [<ffffffffa11d97c1>] osd_inode_iteration+0x571/0xd80 [osd_ldiskfs]
[1328349.257159]  [<ffffffffa11d7cd0>] ? osd_ios_ROOT_scan+0x280/0x280 [osd_ldiskfs]
[1328349.257165]  [<ffffffffa11d3e50>] ? osd_preload_next+0xc0/0xc0 [osd_ldiskfs]
[1328349.257171]  [<ffffffffa11daa40>] osd_scrub_main+0xa70/0x1080 [osd_ldiskfs]
[1328349.257177]  [<ffffffffa11d9fd0>] ? osd_inode_iteration+0xd80/0xd80 [osd_ldiskfs]
[1328349.257180]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[1328349.257181]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[1328349.257183]  [<ffffffff81698598>] ret_from_fork+0x58/0x90
[1328349.257185]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[1328349.257186] ---[ end trace 002aea32270ef73f ]---
[1328349.257188] LDISKFS-fs: ldiskfs_getblk:838: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
[1328349.269824] LDISKFS-fs error (device dm-0): ldiskfs_getblk:838: inode #1559272469: block 779684908: comm OI_scrub: journal_dirty_metadata failed: handle type 0 started at line 112, credits 16/0, errcode -28


 Comments   
Comment by nasf (Inactive) [ 02/Feb/18 ]

The root cause is that the OI scrub found an object with the FID [0x89:0x0:0x0]. That is a valid IGIF, but the old logic treated it as the last_id for sequence 0x89 and then tried to create directories under /O. The OI scrub had already started its transaction without accounting for this corner case, so the journal credits were insufficient and the operation failed (errcode -28 with credits 16/0 in the log above).

In fact, such a special IGIF should not be handled as last_id. The issue is very rare (a 1/2^32 probability), but it exists in almost all releases. I will make a patch to fix it.
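
For illustration only, the distinction described in this comment can be sketched as below. This is not the landed patch: struct fid, SEQ_IGIF, SEQ_IGIF_MAX, is_last_id_old(), is_last_id_fixed() and demo_checks() are hypothetical stand-ins, and the IGIF range bounds (0xc to 0xffffffff) are assumed to match the Lustre headers. The sketch assumes the old check looked only at whether the object ID was zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for Lustre's FID triple [seq:oid:ver]. */
struct fid {
	uint64_t seq;	/* for an IGIF this holds the inode number */
	uint32_t oid;	/* for an IGIF this holds the inode generation */
	uint32_t ver;
};

/* Assumed IGIF sequence range; check against the real headers before use. */
#define SEQ_IGIF	0x0000000cULL
#define SEQ_IGIF_MAX	0xffffffffULL

static bool fid_in_igif_range(const struct fid *f)
{
	return f->seq >= SEQ_IGIF && f->seq <= SEQ_IGIF_MAX;
}

/* Assumed old behaviour: any FID with oid == 0 is taken as a last_id. */
static bool is_last_id_old(const struct fid *f)
{
	return f->oid == 0;
}

/* Sketch of the fix: an IGIF with generation 0 is not a last_id. */
static bool is_last_id_fixed(const struct fid *f)
{
	return f->oid == 0 && !fid_in_igif_range(f);
}

/* Exercises the FID from this ticket, [0x89:0x0:0x0]. */
static void demo_checks(void)
{
	struct fid igif = { .seq = 0x89, .oid = 0, .ver = 0 };

	assert(is_last_id_old(&igif));		/* misclassified as last_id */
	assert(!is_last_id_fixed(&igif));	/* ignored after the fix */
}
```

With a check of this shape, the scrub would skip the last_id handling for [0x89:0x0:0x0] and so would not attempt the directory creation under /O inside a transaction that has no credits reserved for it.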

Comment by Gerrit Updater [ 02/Feb/18 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/31140
Subject: LU-10598 obdclass: ignore IGIF formatted last_id
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6cb58fedda9bc1e128c0cb93493f34504795038d

Comment by Gerrit Updater [ 09/Apr/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31140/
Subject: LU-10598 obdclass: ignore IGIF formatted last_id
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0fa1cc6fbfd7ec777139a8ead4efce83fde4e702

Comment by Peter Jones [ 09/Apr/18 ]

Landed for 2.12

Comment by Gerrit Updater [ 11/Apr/18 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31952
Subject: LU-10598 obdclass: ignore IGIF formatted last_id
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 01335c53cffad802e50d1ed96ecc75c512e05806

Comment by Gerrit Updater [ 12/Apr/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31952/
Subject: LU-10598 obdclass: ignore IGIF formatted last_id
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 51854afa37b4520f0c0cc0bab8d0056b8ac9ae6f
