[LU-10738] mdd: LBUG() from changelog_store_data_by_fid Created: 28/Feb/18 Updated: 05/Aug/20 Resolved: 09/Mar/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | CEA | Assignee: | Sebastien Buisson (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
When running sanity-hsm test_26d on Maloo with this patch applied, the MDS hits an LBUG(). This looks a lot like … Here is a test instance that triggers the bug (MDS log file). Here is the important part:

[ 2604.113084] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -n mdt.lustre-MDT0000.evict_client 16a29cf5-5198-cefe-15d1-1c432b2ce629
[ 2604.264461] Lustre: 3825:0:(genops.c:1759:obd_export_evict_by_uuid()) lustre-MDT0000: evicting 16a29cf5-5198-cefe-15d1-1c432b2ce629 at adminstrative request
[ 2604.269447] Lustre: 3825:0:(osd_internal.h:1139:osd_trans_exec_op()) lustre-MDT0000: opcode 7: credits = 0, rollback = 7
[ 2604.272240] Lustre: 3825:0:(osd_handler.c:1723:osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0
[ 2604.274895] Lustre: 3825:0:(osd_handler.c:1730:osd_trans_dump_creds()) attr_set: 0/0/0, xattr_set: 1/89/0
[ 2604.277555] Lustre: 3825:0:(osd_handler.c:1740:osd_trans_dump_creds()) write: 0/0/0, punch: 0/0/0, quota 0/0/0
[ 2604.280293] Lustre: 3825:0:(osd_handler.c:1747:osd_trans_dump_creds()) insert: 0/0/0, delete: 0/0/0
[ 2604.282859] Lustre: 3825:0:(osd_handler.c:1754:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 0/0/0
[ 2604.285402] LustreError: 3825:0:(osd_internal.h:1141:osd_trans_exec_op()) ASSERTION( !ldiskfs_track_declares_assert ) failed:
[ 2604.289787] LustreError: 3825:0:(osd_internal.h:1141:osd_trans_exec_op()) LBUG
[ 2604.292129] Pid: 3825, comm: lctl
[ 2604.294136]
[ 2604.294136] Call Trace:
[ 2604.297725] [<ffffffffc068d7ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
[ 2604.299849] [<ffffffffc068d83c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[ 2604.301974] [<ffffffffc0d54df1>] osd_write+0x5a1/0x5b0 [osd_ldiskfs]
[ 2604.304053] [<ffffffffc08c74e9>] dt_record_write+0x39/0x120 [obdclass]
[ 2604.306141] [<ffffffffc0888697>] llog_osd_write_rec+0xbf7/0x1460 [obdclass]
[ 2604.308201] [<ffffffffc087b3d9>] llog_write_rec+0xc9/0x520 [obdclass]
[ 2604.310265] [<ffffffffc0880370>] llog_cat_add_rec+0x220/0x8b0 [obdclass]
[ 2604.312289] [<ffffffffc08784fa>] llog_add+0x7a/0x1a0 [obdclass]
[ 2604.314286] [<ffffffff810ec7ba>] ? __getnstimeofday64+0x3a/0xd0
[ 2604.316248] [<ffffffffc1055b22>] mdd_changelog_store+0x1a2/0x5f0 [mdd]
[ 2604.318314] [<ffffffffc1063f8e>] mdd_changelog_data_store_by_fid+0x1ae/0x320 [mdd]
[ 2604.320430] [<ffffffffc1064564>] mdd_changelog_data_store_xattr+0x104/0x230 [mdd]
[ 2604.322589] [<ffffffffc106c17e>] mdd_xattr_set+0x95e/0x17f0 [mdd]
[ 2604.324613] [<ffffffffc0f12532>] mdt_hsm_attr_set+0xa2/0x230 [mdt]
[ 2604.326673] [<ffffffffc0efc5b1>] mdt_add_dirty_flag+0x1d1/0x250 [mdt]
[ 2604.328694] [<ffffffffc0ed0e3d>] mdt_ctxt_add_dirty_flag.isra.70+0xdd/0x1a0 [mdt]
[ 2604.330870] [<ffffffffc0ed3618>] mdt_obd_disconnect+0x3c8/0x670 [mdt]
[ 2604.332913] [<ffffffffc08986a9>] class_fail_export+0x279/0x580 [obdclass]
[ 2604.334988] [<ffffffffc089b72f>] obd_export_evict_by_uuid+0x12f/0x220 [obdclass]
[ 2604.337030] [<ffffffff8118483b>] ? unlock_page+0x2b/0x30
[ 2604.338946] [<ffffffffc08de96f>] lprocfs_evict_client_seq_write+0x1cf/0x290 [obdclass]
[ 2604.340974] [<ffffffffc0f0f086>] mdt_mds_evict_client_write+0x416/0x6a0 [mdt]
[ 2604.342948] [<ffffffff8127267d>] proc_reg_write+0x3d/0x80
[ 2604.344734] [<ffffffff81202ced>] vfs_write+0xbd/0x1e0
[ 2604.346467] [<ffffffff81203aff>] SyS_write+0x7f/0xe0
[ 2604.348143] [<ffffffff816b8929>] ? system_call_after_swapgs+0x156/0x214
[ 2604.349917] [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
[ 2604.351609] [<ffffffff816b889d>] ? system_call_after_swapgs+0xca/0x214 |
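The assertion and the credit dump can be read against a simple declare/execute model of a transaction. The following is a minimal C sketch, not osd_ldiskfs code: `struct txn_creds`, `txn_declare()`, `txn_exec()` and the `OP_*` kinds are hypothetical, and the rule "executing more operations of a kind than were declared is an error" is a simplification of whatever check osd_trans_exec_op() really performs. Where the real code reaches `ASSERTION( !ldiskfs_track_declares_assert )`, which can only fail when declare tracking is enforced, the sketch simply returns -1.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical operation kinds, loosely named after the counters printed by
 * osd_trans_dump_creds(); this is not the real osd_ldiskfs enum. */
enum op_kind { OP_XATTR_SET, OP_WRITE, OP_KIND_MAX };

/* Per-transaction bookkeeping: operations declared vs. operations executed. */
struct txn_creds {
	int declared[OP_KIND_MAX];
	int executed[OP_KIND_MAX];
};

/* Declare phase: reserve credits for one operation of the given kind. */
static void txn_declare(struct txn_creds *tc, enum op_kind op)
{
	tc->declared[op]++;
}

/* Execute phase: 0 if the operation is covered by a declaration, -1 if not.
 * In this model, this is the point where the real code dumps the credits and
 * asserts when declare tracking is enforced; here we only report it. */
static int txn_exec(struct txn_creds *tc, enum op_kind op)
{
	if (tc->executed[op] >= tc->declared[op]) {
		fprintf(stderr, "op %d: executed without declared credits\n", (int)op);
		return -1;
	}
	tc->executed[op]++;
	return 0;
}
```

In the dump above only the xattr_set counters are non-zero and write is 0/0/0, while the stack passes through osd_write() from llog_osd_write_rec(). That is consistent with the changelog llog record being written inside a transaction that declared only the xattr change, i.e. the `txn_exec(&tc, OP_WRITE)` failure case in this model after declaring only `OP_XATTR_SET`.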
| Comments |
| Comment by Peter Jones [ 28/Feb/18 ] |
|
IIUC Sebastien is investigating whether this is due to one of his patches from …
| Comment by John Hammond [ 28/Feb/18 ] |
|
We check recording_changelog(env, mdd) in mdd_declare_changelog_store() but not in mdd_changelog_data_store_xattr() or any of its callees. |
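The mismatch described here can be modelled as two checks that disagree. A minimal C sketch follows; it assumes, without confirming it against the Lustre source, that recording_changelog() combines the device-wide CLM_ON flag with some per-request condition, and that the execute path tests only the flag (as Sebastien's comment about direct `(mdd->mdd_cl.mc_flags & CLM_ON)` checks suggests). The structs, the `per_request_ok` field and the `MODEL_CLM_ON` value are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CLM_ON; the real flag value is not assumed. */
#define MODEL_CLM_ON 0x1u

/* Hypothetical stand-ins for the mdd device and the request environment. */
struct mdd_model { unsigned int mc_flags; };
struct env_model { bool per_request_ok; }; /* assumed per-request condition */

/* ASSUMPTION: recording_changelog() = device flag AND a per-request check. */
static bool recording_changelog_model(const struct env_model *env,
				      const struct mdd_model *mdd)
{
	return (mdd->mc_flags & MODEL_CLM_ON) != 0 && env->per_request_ok;
}

/* Declare side (cf. mdd_declare_changelog_store()): reserves credits only
 * when the full predicate holds. */
static bool declares_record(const struct env_model *env,
			    const struct mdd_model *mdd)
{
	return recording_changelog_model(env, mdd);
}

/* Execute side, assumed to test only the device-wide flag. */
static bool writes_record(const struct mdd_model *mdd)
{
	return (mdd->mc_flags & MODEL_CLM_ON) != 0;
}

/* True when a changelog record would be written without declared credits. */
static bool undeclared_write(const struct env_model *env,
			     const struct mdd_model *mdd)
{
	return writes_record(mdd) && !declares_record(env, mdd);
}
```

Under these assumptions the only bad combination is "changelog on, but the per-request condition false": the declare step skips the llog credits while the store step still writes the record.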
| Comment by John Hammond [ 28/Feb/18 ] |
|
Should we call hsm_init_ucred() in mdt_export_cleanup() as well? |
| Comment by Sebastien Buisson (Inactive) [ 28/Feb/18 ] |
|
Hi, it looks like recording_changelog(env, mdd) should be called in every function instead of directly checking (mdd->mdd_cl.mc_flags & CLM_ON). I have pushed patch https://review.whamcloud.com/31456 to address this. I am not sure whether it helps with this bug, as I have not been able to reproduce the issue on my test system so far (although I am using a setup with 2 MDTs and 3 clients).
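The refactor described here amounts to making the declare and store phases consult one predicate. A hedged C sketch of that pattern follows. None of these names are Lustre's: `struct mdd_model`, `struct env_model`, `per_request_ok`, `MODEL_CLM_ON`, `declare_changelog()` and `store_changelog()` are hypothetical, the predicate's shape (device flag AND a per-request condition) is an assumption, and the actual change in https://review.whamcloud.com/31456 may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_CLM_ON 0x1u /* hypothetical stand-in for CLM_ON */

struct mdd_model { unsigned int mc_flags; };
struct env_model { bool per_request_ok; }; /* assumed per-request condition */

/* The single predicate both phases consult (modelled on recording_changelog()). */
static bool recording_changelog_model(const struct env_model *env,
				      const struct mdd_model *mdd)
{
	return (mdd->mc_flags & MODEL_CLM_ON) != 0 && env->per_request_ok;
}

/* Declare phase: reserve one credit for the changelog record if recording. */
static void declare_changelog(const struct env_model *env,
			      const struct mdd_model *mdd, int *credits)
{
	if (recording_changelog_model(env, mdd))
		(*credits)++;
}

/* Store phase: consult the same predicate; -1 would mean an undeclared write. */
static int store_changelog(const struct env_model *env,
			   const struct mdd_model *mdd, int *credits, int *records)
{
	if (!recording_changelog_model(env, mdd))
		return 0;		/* nothing declared, nothing written */
	if (*credits == 0)
		return -1;		/* cannot happen when both phases agree */
	(*credits)--;
	(*records)++;
	return 0;
}
```

Because both phases branch on the same helper, every combination of flag and per-request state either reserves a credit and consumes it, or does neither, so the store step never writes without a matching declaration.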
| Comment by John Hammond [ 01/Mar/18 ] |
|
Note that Sebastien's patch uses …
| Comment by Quentin Bouget [ 02/Mar/18 ] |
|
I rebased my patch on top of Sebastien's fix and test_26d now passes on Maloo (twice in a row). |
| Comment by Peter Jones [ 09/Mar/18 ] |
|
So, if I understand correctly, this ticket is essentially a duplicate of …