[LU-601] kernel BUG at fs/jbd2/transaction.c:1030 Created: 17/Aug/11  Updated: 26/Feb/12  Resolved: 26/Feb/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.2.0, Lustre 2.1.1

Type: Bug Priority: Minor
Reporter: Patrick Valentin (Inactive) Assignee: Zhenyu Xu
Resolution: Fixed Votes: 1
Labels: None

Attachments: File trace_chown_1_to_10    
Issue Links:
Duplicate
is duplicated by LU-1045 kernel BUG at fs/jbd2/transaction.c:1... Resolved
Severity: 3
Rank (Obsolete): 4726

 Description   

Hi,

The following assertion failed with Lustre 2.0.0 was reported by the on site support at CEA customer site:

fs/jbd2/transaction.c:jbd2_journal_dirty_metadata() ,line 1030

J_ASSERT_JH(jh, handle->h_buffer_credits > 0);

This issue has been hit several times on restart of an MDS. On this particular one, the problem is not extremely critical
since after dump+restart, the service continue



------------[ cut here ]------------
kernel BUG at fs/jbd2/transaction.c:1030!
invalid opcode: 0000 [#1] SMP 

PID: 24472  TASK: ffff8808556011c0  CPU: 22  COMMAND: "tgt_recov"
 #0 [ffff88083370a9d0] machine_kexec at ffffffff8102e77b
 #1 [ffff88083370aa30] crash_kexec at ffffffff810a6cd8
 #2 [ffff88083370ab00] oops_end at ffffffff8146aad0
 #3 [ffff88083370ab30] die at ffffffff8101021b
 #4 [ffff88083370ab60] do_trap at ffffffff8146a3a4
 #5 [ffff88083370abc0] do_invalid_op at ffffffff8100dda5
 #6 [ffff88083370ac60] invalid_op at ffffffff8100cf3b
    [exception RIP: jbd2_journal_dirty_metadata+269]
    RIP: ffffffffa00518ed  RSP: ffff88083370ad10  RFLAGS: 00010246
    RAX: ffff881831c8b8c0  RBX: ffff881834107468  RCX: ffff8808512adc90
    RDX: 0000000000000000  RSI: ffff8808512adc90  RDI: 0000000000000000
    RBP: ffff88083370ad30   R8: 2010000000000000   R9: f790d737baaf2402
    R10: 0000000000000001  R11: 0000000000000040  R12: ffff8818343606d8
    R13: ffff8808512adc90  R14: ffff880859b81800  R15: 0000000000002000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88083370ad38] __ldiskfs_handle_dirty_metadata at ffffffffa04bb3fb [ldiskfs]
 #8 [ffff88083370ad78] fsfilt_ldiskfs_write_handle at ffffffffa09bede7 [fsfilt_ldiskfs]
 #9 [ffff88083370ae28] fsfilt_ldiskfs_write_record at ffffffffa09bf0fe [fsfilt_ldiskfs]
#10 [ffff88083370aea8] llog_lvfs_write_blob at ffffffffa05a018c [obdclass]
#11 [ffff88083370af58] llog_lvfs_write_rec at ffffffffa05a1732 [obdclass]
#12 [ffff88083370b038] llog_cat_current_log.clone.0 at ffffffffa059e14f [obdclass]
#13 [ffff88083370b118] llog_cat_add_rec at ffffffffa059e86a [obdclass]
#14 [ffff88083370b198] llog_obd_origin_add at ffffffffa05a51a6 [obdclass]
#15 [ffff88083370b1f8] llog_add at ffffffffa05a5381 [obdclass]
#16 [ffff88083370b268] lov_llog_origin_add at ffffffffa089a0cc [lov]
#17 [ffff88083370b318] llog_add at ffffffffa05a5381 [obdclass]
#18 [ffff88083370b388] mds_llog_origin_add at ffffffffa09d46f9 [mds]
#19 [ffff88083370b408] llog_add at ffffffffa05a5381 [obdclass]
#20 [ffff88083370b478] mds_llog_add_unlink at ffffffffa09d4de4 [mds]
#21 [ffff88083370b4f8] mds_log_op_orphan at ffffffffa09d5229 [mds]
#22 [ffff88083370b578] mds_lov_update_objids at ffffffffa09de7ef [mds]
#23 [ffff88083370b638] mdd_lov_objid_update at ffffffffa09f5cb2 [mdd]
#24 [ffff88083370b648] mdd_create_data at ffffffffa0a02c91 [mdd]
#25 [ffff88083370b6e8] cml_create_data at ffffffffa0acf036 [cmm]
#26 [ffff88083370b768] mdt_finish_open at ffffffffa0a6c885 [mdt]
#27 [ffff88083370b838] mdt_reint_open at ffffffffa0a6d119 [mdt]
#28 [ffff88083370b958] mdt_reint_rec at ffffffffa0a5764f [mdt]
#29 [ffff88083370b9a8] mdt_reint_internal at ffffffffa0a4ea04 [mdt]
#30 [ffff88083370ba38] mdt_intent_reint at ffffffffa0a4f085 [mdt]
#31 [ffff88083370bab8] mdt_intent_policy at ffffffffa0a48270 [mdt]
#32 [ffff88083370bb28] ldlm_lock_enqueue at ffffffffa068ea9d [ptlrpc]
#33 [ffff88083370bbc8] ldlm_handle_enqueue0 at ffffffffa06b64d1 [ptlrpc]
#34 [ffff88083370bc68] mdt_enqueue at ffffffffa0a47dea [mdt]
#35 [ffff88083370bc98] mdt_handle_common at ffffffffa0a439f5 [mdt]
#36 [ffff88083370bd18] mdt_recovery_handle at ffffffffa0a44a68 [mdt]
#37 [ffff88083370bd68] handle_recovery_req at ffffffffa0699512 [ptlrpc]
#38 [ffff88083370bde8] target_recovery_thread at ffffffffa0699b36 [ptlrpc]
#39 [ffff88083370bf48] kernel_thread at ffffffff8100d1aa


Something similar is sometime hit just after the MDS end the recovery, during orphan cleanup. In such case the MDS fall
repetitively after each lustre restart and, as a workaround, we had to mount the volume in ldiskfs mode and remove the
PENDING subdirectory.

Is block reservation done in fsfilt_ldiskfs_write_record for the jbd2 transaction is too small ?



 Comments   
Comment by Peter Jones [ 18/Aug/11 ]

Bobijam

Could you please look into this one?

Thanks

Peter

Comment by Zhenyu Xu [ 19/Aug/11 ]
fsfilt_ldiskfs_write_record()
        /* Determine how many transaction credits are needed */
        blocksize = 1 << inode->i_blkbits;
        block_count = (*offs & (blocksize - 1)) + bufsize;
        block_count = (block_count + blocksize - 1) >> inode->i_blkbits;

        handle = ldiskfs_journal_start(inode,
                               block_count * FSFILT_DATA_TRANS_BLOCKS(inode->i_sb) + 2);

bzzz, is it occasionally possible that the transaction credit calculation not enough for unlink record writing?

please ignore the question

Comment by Zhenyu Xu [ 22/Aug/11 ]

This happens in mdd_create_data() and it only preserve transaction credit for object create

mdd_txn_param_build(env, mdd, MDD_TXN_CREATE_DATA_OP);

which does not count in the possible unlinking lost precreated object logs credits.

Comment by Zhenyu Xu [ 24/Aug/11 ]

patch tracking at http://review.whamcloud.com/1276

Comment by Patrick Valentin (Inactive) [ 15/Sep/11 ]

I attached the trace file (traces.tar.gz) but it seems it doesn't work. So I upload a copy on your ftp site, under uploads/LU-L687.

Comment by Patrick Valentin (Inactive) [ 15/Sep/11 ]

Ooops, ignore my previous comment! Sorry, it was for LU-687.

Comment by Patrick Valentin (Inactive) [ 16/Sep/11 ]

The following patch, at http://review.whamcloud.com/1276, was installed at CEA site on tuesday september the 6th.
This morning, they experienced a MDS pseudo-hang situation as described in LU-685, and they had to reboot the MDS.
During the restart, they had again a linux kernel crash, due to the following ASSERT:

fs/jbd2/transaction.c:1030
    jbd2_journal_dirty_metadata()
        J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
Comment by Zhenyu Xu [ 16/Sep/11 ]

Is the crash dump stack the same as that of the original dump stack posted in the description?

Comment by Patrick Valentin (Inactive) [ 16/Sep/11 ]

Yes, the stack is the same. Here under is a copy.
If you are interrested to have the dump, I can ask to have an upload.
It's on another cluster at CEA site, and on this one we are allowed
to get a copy of the dumps.

PID: 9687   TASK: ffff88185b8216c0  CPU: 9   COMMAND: "tgt_recov"
 #0 [ffff881831588f40] machine_kexec at ffffffff8102e77b
 #1 [ffff881831588fa0] crash_kexec at ffffffff810a6cd8
 #2 [ffff881831589070] oops_end at ffffffff8146aad0
 #3 [ffff8818315890a0] die at ffffffff8101021b
 #4 [ffff8818315890d0] do_trap at ffffffff8146a3a4
 #5 [ffff881831589130] do_invalid_op at ffffffff8100dda5
 #6 [ffff8818315891d0] invalid_op at ffffffff8100cf3b
    [exception RIP: jbd2_journal_dirty_metadata+269]
    RIP: ffffffffa08b08ed  RSP: ffff881831589280  RFLAGS: 00010246
    RAX: ffff8820452b7680  RBX: ffff88104db7c1c8  RCX: ffff8808127194d8
    RDX: 0000000000000000  RSI: ffff8808127194d8  RDI: 0000000000000000
    RBP: ffff8818315892a0   R8: b010000000000000   R9: f7cf906fc45d3602
    R10: 0000000000000001  R11: 00000000000000c0  R12: ffff8808127ed908
    R13: ffff8808127194d8  R14: ffff881055678800  R15: 0000000000002000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff8818315892a8] __ldiskfs_handle_dirty_metadata at ffffffffa08cb3fb
 #8 [ffff8818315892e8] fsfilt_ldiskfs_write_handle at ffffffffa055cde7
 #9 [ffff881831589398] fsfilt_ldiskfs_write_record at ffffffffa055d0fe
#10 [ffff881831589418] llog_lvfs_write_blob at ffffffffa04a218c
#11 [ffff8818315894c8] llog_lvfs_write_rec at ffffffffa04a39fa
#12 [ffff8818315895a8] llog_cat_add_rec at ffffffffa04a0905
#13 [ffff881831589628] llog_obd_origin_add at ffffffffa04a71a6
#14 [ffff881831589688] llog_add at ffffffffa04a7381
#15 [ffff8818315896f8] lov_llog_origin_add at ffffffffa079c0cc
#16 [ffff8818315897a8] llog_add at ffffffffa04a7381
#17 [ffff881831589818] mds_llog_origin_add at ffffffffa0572d29
#18 [ffff881831589898] llog_add at ffffffffa04a7381
#19 [ffff881831589908] mds_llog_add_unlink at ffffffffa0573414
#20 [ffff881831589988] mds_log_op_unlink at ffffffffa0573b26
#21 [ffff881831589a18] mdd_unlink_log at ffffffffa094bc1e
#22 [ffff881831589a78] mdd_lov_destroy at ffffffffa094e5d0
#23 [ffff881831589b38] __mdd_orphan_cleanup at ffffffffa0955c8c
#24 [ffff881831589c18] mdd_recovery_complete at ffffffffa0967be8
#25 [ffff881831589c98] cmm_recovery_complete at ffffffffa09dc17a
#26 [ffff881831589ce8] mdt_postrecov at ffffffffa098278e
#27 [ffff881831589d48] mdt_obd_postrecov at ffffffffa0985ca7
#28 [ffff881831589de8] target_recovery_thread at ffffffffa059c0e8
#29 [ffff881831589f48] kernel_thread at ffffffff8100d1aa
Comment by Zhenyu Xu [ 16/Sep/11 ]

orphan_object_destroy() also does not preserve enough journal credit, it calls orphan_object_kill()->mdd_lov_destroy() which need another llog write credit.

Comment by Peter Jones [ 22/Sep/11 ]

http://review.whamcloud.com/#change,1276

Comment by Patrick Valentin (Inactive) [ 13/Oct/11 ]

Hi,

We have delivered an EFIX containing LU-601 patch set 7 to our CEA customer last tuestay (october 4), and yesterday they experienced twice the same same crash, during Lustre recovery. Here is the stack and an extract of the syslog recorded by on line support. This stack is quite close to the initial one posted in this Jira ticket, and is exactly the same than other traces recorded by support team before this efix was installed.

BTW, we had to add the following lines to the original patch set 7, because we have already integrated LU-81, which adds a call to mdd_txn_param_build():

--- a/lustre/mdd/mdd_device.c
+++ b/lustre/mdd/mdd_device.c
@@ -1393,7 +1393,7 @@ static int mdd_changelog_user_purge_cb(s
 
                 /* XXX This is a workaround for the deadlock of changelog
                  * adding vs. changelog cancelling. LU-81. */
-                mdd_txn_param_build(mcud->mcud_env, mdd, MDD_TXN_UNLINK_OP);
+                mdd_txn_param_build(mcud->mcud_env, mdd, MDD_TXN_UNLINK_OP, 1);
                 trans_h = mdd_trans_start(mcud->mcud_env, mdd);
                 if (IS_ERR(trans_h)) {
                         CERROR("fsfilt_start_log failed: %ld\n",

traces provided by support team:

crash&gt; bt
PID: 13511  TASK: ffff88085b462440  CPU: 30  COMMAND: &quot;tgt_recov&quot;
 #0 [ffff88082ec22e50] machine_kexec at ffffffff8102e77b
 #1 [ffff88082ec22eb0] crash_kexec at ffffffff810a6cd8
 #2 [ffff88082ec22f80] oops_end at ffffffff8146aad0
 #3 [ffff88082ec22fb0] die at ffffffff8101021b
 #4 [ffff88082ec22fe0] do_trap at ffffffff8146a3a4
 #5 [ffff88082ec23040] do_invalid_op at ffffffff8100dda5
 #6 [ffff88082ec230e0] invalid_op at ffffffff8100cf3b
    [exception RIP: jbd2_journal_dirty_metadata+269]
    RIP: ffffffffa00518ed  RSP: ffff88082ec23190  RFLAGS: 00010246
    RAX: ffff881830d86ac0  RBX: ffff881841975438  RCX: ffff8818419dfe98
    RDX: 0000000000000000  RSI: ffff8818419dfe98  RDI: 0000000000000000
    RBP: ffff88082ec231b0   R8: 3010000000000000   R9: e7a06811cf77a602
    R10: 0000000000000000  R11: 00000000000001ad  R12: ffff8818419dd438
    R13: ffff8818419dfe98  R14: ffff88185a75c800  R15: ffff8818419de558
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88082ec231b8] __ldiskfs_handle_dirty_metadata at ffffffffa09433fb
 #8 [ffff88082ec231f8] ldiskfs_getblk at ffffffffa095150d
 #9 [ffff88082ec232b8] ldiskfs_bread at ffffffffa09515b8
#10 [ffff88082ec232e8] fsfilt_ldiskfs_write_handle at ffffffffa09f0e47
#11 [ffff88082ec23398] fsfilt_ldiskfs_write_record at ffffffffa09f10fe
#12 [ffff88082ec23418] llog_lvfs_write_blob at ffffffffa053318c
#13 [ffff88082ec234c8] llog_lvfs_write_rec at ffffffffa05349fa
#14 [ffff88082ec235a8] llog_cat_add_rec at ffffffffa0531905
#15 [ffff88082ec23628] llog_obd_origin_add at ffffffffa05381a6
#16 [ffff88082ec23688] llog_add at ffffffffa0538381
#17 [ffff88082ec236f8] lov_llog_origin_add at ffffffffa082d0cc
#18 [ffff88082ec237a8] llog_add at ffffffffa0538381
#19 [ffff88082ec23818] mds_llog_origin_add at ffffffffa05f1d29
#20 [ffff88082ec23898] llog_add at ffffffffa0538381
#21 [ffff88082ec23908] mds_llog_add_unlink at ffffffffa05f2414
#22 [ffff88082ec23988] mds_log_op_unlink at ffffffffa05f2b26
#23 [ffff88082ec23a18] mdd_unlink_log at ffffffffa0a0cc1e
#24 [ffff88082ec23a78] mdd_lov_destroy at ffffffffa0a0f5d0
#25 [ffff88082ec23b38] __mdd_orphan_cleanup at ffffffffa0a16c8c
#26 [ffff88082ec23c18] mdd_recovery_complete at ffffffffa0a28c08
#27 [ffff88082ec23c98] cmm_recovery_complete at ffffffffa073a17a
#28 [ffff88082ec23ce8] mdt_postrecov at ffffffffa0a4378e
#29 [ffff88082ec23d48] mdt_obd_postrecov at ffffffffa0a46ca7
#30 [ffff88082ec23de8] target_recovery_thread at ffffffffa062d0e8
#31 [ffff88082ec23f48] kernel_thread at ffffffff8100d1aa
crash&gt; log | tail -100
LustreError: 13540:0:(mdt_handler.c:2796:mdt_recovery()) operation 400 on unconnected MDS from 12345-10.100.62.53@o2ib2
LustreError: 13551:0:(mdt_handler.c:2796:mdt_recovery()) operation 400 on unconnected MDS from 12345-10.100.62.52@o2ib2
LustreError: 13568:0:(ldlm_lib.c:2123:target_send_reply_msg()) @@@ processing error (-19)  req@ffff88083f8c1400
x1382292200733537/t0(0) o38-&gt;&lt;?&gt;@&lt;?&gt;:0/0 lens 368/0 e 0 to 0 dl 1318420955 ref 1
fl Interpret:/ffffffff/ffffffff rc -19/-1
LustreError: 13568:0:(ldlm_lib.c:2123:target_send_reply_msg()) Skipped 63 previous similar messages
LustreError: 137-5: UUID 'work1-MDT0000_UUID' is not available  for connect (no target)
LustreError: Skipped 215 previous similar messages
Lustre: 13009:0:(ldlm_lib.c:866:target_handle_connect()) MGS: connection from
8226fbc1-0138-e464-46e9-4106014c8d32@10.100.60.97@o2ib2 t0 exp (null) cur 1318420879 last 0
Lustre: 13009:0:(ldlm_lib.c:866:target_handle_connect()) Skipped 583 previous similar messages
Lustre: 13009:0:(sec.c:1428:sptlrpc_import_sec_adapt()) import MGS-&gt; netid 50000: select flavor null
Lustre: 13009:0:(sec.c:1428:sptlrpc_import_sec_adapt()) Skipped 647 previous similar messages
LustreError: 12579:0:(lib-move.c:2633:LNetGet()) error sending GET to 12345-172.24.110.198@o2ib: -113
LustreError: 12579:0:(lib-move.c:2633:LNetGet()) Skipped 1 previous similar message
LustreError: 13512:0:(ldlm_lib.c:2123:target_send_reply_msg()) @@@ processing error (-19)  req@ffff88081fe70400
x1382292245433402/t0(0) o38-&gt;&lt;?&gt;@&lt;?&gt;:0/0 lens 368/0 e 0 to 0 dl 1318420988 ref 1
fl Interpret:/ffffffff/ffffffff rc -19/-1
LustreError: 13512:0:(ldlm_lib.c:2123:target_send_reply_msg()) Skipped 341 previous similar messages
LustreError: 137-5: UUID 'work1-MDT0000_UUID' is not available  for connect (no target)
LustreError: Skipped 326 previous similar messages
Lustre: 13528:0:(ldlm_lib.c:2023:target_queue_recovery_request()) Next recovery transno: 42949908740, current: 42949908740,
replaying: 0
Lustre: 13532:0:(ldlm_lib.c:2023:target_queue_recovery_request()) Next recovery transno: 42949908373, current: 42949908373,
replaying: 0
LustreError: 12579:0:(lib-move.c:2633:LNetGet()) error sending GET to 12345-172.24.110.198@o2ib: -113
LustreError: 12579:0:(lib-move.c:2633:LNetGet()) Skipped 1 previous similar message
Lustre: 13536:0:(ldlm_lib.c:2023:target_queue_recovery_request()) Next recovery transno: 42949908374, current: 42949908374,
replaying: 1
Lustre: store1-MDT0000: sending delayed replies to recovered clients
Lustre: 13511:0:(mds_lov.c:1014:mds_notify()) MDS mdd_obd-store1-MDT0000-0: in recovery, not resetting orphans
on store1-OST0000_UUID
Lustre: 13511:0:(mds_lov.c:1014:mds_notify()) Skipped 62 previous similar messages
Lustre: 13511:0:(mdd_orphans.c:360:orph_key_test_and_del()) Found orphan! Delete it
------------[ cut here ]------------
kernel BUG at fs/jbd2/transaction.c:1030!
Lustre: MDS mdd_obd-store1-MDT0000-0: store1-OST001e_UUID now active, resetting orphans
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:80/0000:80:07.0/0000:86:00.1/host14/rport-14:0-0/target14:0:0/14:0:0:5/timeout
CPU 30 
Modules linked in: cmm(U) osd_ldiskfs(U) mdt(U) mdd(U) mds(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) ldiskfs(U) lustre(U) lov(U)
osc(U) mdc(U) lquota(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) ipmi_devintf
Lustre: MDS mdd_obd-store1-MDT0000-0: store1-OST002e_UUID now active, resetting orphans
Lustre: Skipped 40 previous similar messages
 ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl auth_rpcgss sunrpc acpi_cpufreq freq_table rdma_ucm(U) ib_sdp(U)
rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6 ib_uverbs(U) ib_umad(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U)
ib_mad(U) ib_core(U) dm_round_robin dm_multipath usbhid hid i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ehci_hcd
uhci_hcd ioatdma lpfc scsi_transport_fc scsi_tgt sg igb dca ext4 jbd2 sd_mod crc_t10dif ahci megaraid_sas dm_mod [last
unloaded: microcode]

Modules linked in: cmm(U) osd_ldiskfs(U) mdt(U) mdd(U) mds(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) ldiskfs(U) lustre(U) lov(U)
osc(U) mdc(U) lquota(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) ipmi_devintf ipmi_si
ipmi_msghandler nfs lockd fscache nfs_acl auth_rpcgss sunrpc acpi_cpufreq freq_table rdma_ucm(U) ib_sdp(U) rdma_cm(U)
iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6 ib_uverbs(U) ib_umad(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U) ib_mad(U)
ib_core(U) dm_round_robin dm_multipath usbhid hid i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ehci_hcd uhci_hcd ioatdma
lpfc scsi_transport_fc scsi_tgt sg igb dca ext4 jbd2 sd_mod crc_t10dif ahci megaraid_sas dm_mod [last unloaded:
microcode]
Pid: 13511, comm: tgt_recov Not tainted 2.6.32-71.24.1.el6.Bull.23.x86_64 #1 bullx super-node
RIP: 0010:[&lt;ffffffffa00518ed&gt;]  [&lt;ffffffffa00518ed&gt;] jbd2_journal_dirty_metadata+0x10d/0x150
[jbd2]
RSP: 0018:ffff88082ec23190  EFLAGS: 00010246
RAX: ffff881830d86ac0 RBX: ffff881841975438 RCX: ffff8818419dfe98
RDX: 0000000000000000 RSI: ffff8818419dfe98 RDI: 0000000000000000
RBP: ffff88082ec231b0 R08: 3010000000000000 R09: e7a06811cf77a602
R10: 0000000000000000 R11: 00000000000001ad R12: ffff8818419dd438
R13: ffff8818419dfe98 R14: ffff88185a75c800 R15: ffff8818419de558
FS:  00007fc535a01700(0000) GS:ffff88109c7c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f9cfa3a6000 CR3: 0000000001001000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process tgt_recov (pid: 13511, threadinfo ffff88082ec22000, task ffff88085b462440)
Stack:
 ffff881841975438 ffffffffa097e37f ffff8818419dfe98 ffff8818419de558
&lt;0&gt; ffff88082ec231f0 ffffffffa09433fb ffff88082ec231e0 ffffffff8107f36f
&lt;0&gt; ffff881841975438 0000000000000000 ffff88082ec2335c ffff8818419dfe98
Call Trace:
 [&lt;ffffffffa09433fb&gt;] __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
 [&lt;ffffffff8107f36f&gt;] ? wake_up_bit+0x2f/0x40
 [&lt;ffffffffa095150d&gt;] ldiskfs_getblk+0x15d/0x1f0 [ldiskfs]
 [&lt;ffffffffa09515b8&gt;] ldiskfs_bread+0x18/0x80 [ldiskfs]
 [&lt;ffffffffa09f0e47&gt;] fsfilt_ldiskfs_write_handle+0x147/0x340 [fsfilt_ldiskfs]
 [&lt;ffffffffa09f10fe&gt;] fsfilt_ldiskfs_write_record+0xbe/0x1a0 [fsfilt_ldiskfs]
 [&lt;ffffffffa04b9a03&gt;] ? cfs_alloc+0x63/0x90 [libcfs]
 [&lt;ffffffffa053318c&gt;] llog_lvfs_write_blob+0x2bc/0x460 [obdclass]
 [&lt;ffffffffa052e758&gt;] ? llog_init_handle+0xa18/0xa70 [obdclass]
 [&lt;ffffffff8106666b&gt;] ? round_jiffies_up+0x1b/0x20
 [&lt;ffffffffa05349fa&gt;] llog_lvfs_write_rec+0x40a/0xf00 [obdclass]
 [&lt;ffffffffa0531905&gt;] llog_cat_add_rec+0xf5/0x840 [obdclass]
 [&lt;ffffffffa05381a6&gt;] llog_obd_origin_add+0x56/0x190 [obdclass]
 [&lt;ffffffff8110c431&gt;] ? __alloc_pages_nodemask+0x111/0x850
 [&lt;ffffffffa0538381&gt;] llog_add+0xa1/0x3c0 [obdclass]
 [&lt;ffffffffa0538e64&gt;] ? __llog_ctxt_put+0x34/0x220 [obdclass]
 [&lt;ffffffff8114159a&gt;] ? kmem_getpages+0xba/0x170
 [&lt;ffffffffa082d0cc&gt;] lov_llog_origin_add+0xcc/0x640 [lov]
 [&lt;ffffffffa0538381&gt;] llog_add+0xa1/0x3c0 [obdclass]
 [&lt;ffffffffa05f1d29&gt;] mds_llog_origin_add+0xb9/0x320 [mds]
 [&lt;ffffffffa0538381&gt;] llog_add+0xa1/0x3c0 [obdclass]
 [&lt;ffffffffa05f2414&gt;] mds_llog_add_unlink+0x164/0x540 [mds]
 [&lt;ffffffffa05f2b26&gt;] mds_log_op_unlink+0x196/0x9a0 [mds]
 [&lt;ffffffffa072503b&gt;] ? osd_xattr_get+0xfb/0x170 [osd_ldiskfs]
 [&lt;ffffffffa0a0cc1e&gt;] mdd_unlink_log+0x4e/0x100 [mdd]
 [&lt;ffffffffa0a0f5d0&gt;] mdd_lov_destroy+0x130/0xf70 [mdd]
 [&lt;ffffffffa0725142&gt;] ? osd_attr_get+0x92/0x140 [osd_ldiskfs]
 [&lt;ffffffffa0a0180d&gt;] ? mdd_la_get+0xad/0xb0 [mdd]
 [&lt;ffffffffa0a16c8c&gt;] __mdd_orphan_cleanup+0x95c/0xae0 [mdd]
 [&lt;ffffffffa0841db1&gt;] ? lov_notify+0xde1/0x15e0 [lov]
 [&lt;ffffffffa0a28c08&gt;] mdd_recovery_complete+0x168/0x570 [mdd]
 [&lt;ffffffffa073cd19&gt;] ? cmm_key_init+0x59/0x190 [cmm]
 [&lt;ffffffffa073a17a&gt;] cmm_recovery_complete+0x3a/0x100 [cmm]
 [&lt;ffffffffa05773ab&gt;] ? lu_context_init+0xab/0x240 [obdclass]
 [&lt;ffffffffa0a4378e&gt;] mdt_postrecov+0x4e/0x130 [mdt]
 [&lt;ffffffffa0a46ca7&gt;] mdt_obd_postrecov+0xd7/0x100 [mdt]
 [&lt;ffffffffa0a50a10&gt;] ? mdt_recovery_handle+0x0/0x160 [mdt]
 [&lt;ffffffffa062d0e8&gt;] target_recovery_thread+0x938/0xf10 [ptlrpc]
 [&lt;ffffffff8100d1aa&gt;] child_rip+0xa/0x20
 [&lt;ffffffffa062c7b0&gt;] ? target_recovery_thread+0x0/0xf10 [ptlrpc]
 [&lt;ffffffff8100d1a0&gt;] ? child_rip+0x0/0x20
Code: c6 9c 03 00 00 4c 89 f7 e8 c1 80 41 e1 48 8b 33 ba 01 00 00 00 4c 89 e7 e8 91 eb ff ff 4c 89 f0 66 ff 00 66 66
90 e9 73 ff ff ff &lt;0f&gt; 0b eb fe 0f 0b eb fe 0f 0b 66 0f 1f 84 00 00 00 00 00 eb f5
RIP  [&lt;ffffffffa00518ed&gt;] jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
 RSP &lt;ffff88082ec23190&gt;
crash&gt; 

Comment by Zhenyu Xu [ 13/Oct/11 ]

It looks I underestimate the llog number mdd_lov_destroy() needs, it requires "stripe_count" of llog records instead of 1.

update patch set 8.

Comment by Patrick Valentin (Inactive) [ 20/Oct/11 ]

By the beginning of september, we delevered to our CEA customer an Efix containing the patch proposed for LU-81 ticket, which is till in open state.
This patch contains the following line in "mdd_changelog_user_purge_cb()" routine in "lustre/mdd/mdd_device.c file":

mdd_txn_param_build(mcud->mcud_env, mdd, MDD_TXN_UNLINK_OP);

In the patch proposed for LU-601 ticket, "mdd_txn_param_build()" routine has now an additional "int changelog_cnt" parameter.
We can also see in "orphan_object_destroy()" routine in "lustre/mdd/mdd_orphans.c" file that this patch adds a call to "mdd_log_txn_param_build()" with MDD_TXN_UNLINK_OP operation.

As we have to produce a new EFIX for CEA containing both LU-81 and LU-601, we have the following question about the call to "mdd_txn_param_build()" in "mdd_changelog_user_purge_cb()":
1) Do we have to keep this call to "mdd_txn_param_build()", and which value should we pass for "int changelog_cnt" parameter ?
2) Or do we have to replace it by a call to "mdd_log_txn_param_build()", and in this case which value do we have to pass for "md_object", "md_attr" and "changelog_cnt" parameters?

TIA
Patrick

Comment by Zhenyu Xu [ 20/Oct/11 ]

I've consult with LU-81 patch author, it does not involve writing log record for LOV, so it can be kept as mdd_txn_param_build() with 0 as its changelog_cnt.

According to the review of patch set 10, a new patch set will be posted soon.

Comment by Patrick Valentin (Inactive) [ 20/Oct/11 ]

Hi,
Thanks a lot for the information.
I'm waiting for the next patch set to be reviewed before integrating it. The CEA EFIX must be available on november the 7th.

Thanks and regards,
Patrick

Comment by Bruno Faccini (Inactive) [ 27/Oct/11 ]

Very bad news from TGCC site, we now suffer (twice today until now !) the same problem no longer during orphan recovery but under normal operations with the following stack trace :
===================================================================================
PID: 8018 TASK: ffff881826c2af60 CPU: 1 COMMAND: "mdt_40"
#0 [ffff881826c30d80] machine_kexec at ffffffff8102e77b
#1 [ffff881826c30de0] crash_kexec at ffffffff810a6cd8
#2 [ffff881826c30eb0] oops_end at ffffffff8146aad0
#3 [ffff881826c30ee0] die at ffffffff8101021b
#4 [ffff881826c30f10] do_trap at ffffffff8146a3a4
#5 [ffff881826c30f70] do_invalid_op at ffffffff8100dda5
#6 [ffff881826c31010] invalid_op at ffffffff8100cf3b
[exception RIP: jbd2_journal_dirty_metadata+269]
RIP: ffffffffa04228ed RSP: ffff881826c310c0 RFLAGS: 00010246
RAX: ffff8810418ac280 RBX: ffff88103e6af7f8 RCX: ffff8817f06c0610
RDX: 0000000000000000 RSI: ffff8817f06c0610 RDI: 0000000000000000
RBP: ffff881826c310e0 R8: 2010000000000000 R9: e7f199f60af98402
R10: 0000000000000001 R11: 0000000000001970 R12: ffff880ffcdcad68
R13: ffff8817f06c0610 R14: ffff881058b77800 R15: 0000000000002000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff881826c310e8] __ldiskfs_handle_dirty_metadata at ffffffffa043d3fb
#8 [ffff881826c31128] fsfilt_ldiskfs_write_handle at ffffffffa095fde7
#9 [ffff881826c311d8] fsfilt_ldiskfs_write_record at ffffffffa09600fe
#10 [ffff881826c31258] llog_lvfs_write_blob at ffffffffa054118c
#11 [ffff881826c31308] llog_lvfs_write_rec at ffffffffa05429fa
#12 [ffff881826c313e8] llog_cat_add_rec at ffffffffa053f905
#13 [ffff881826c31468] llog_obd_origin_add at ffffffffa05461a6
#14 [ffff881826c314c8] llog_add at ffffffffa0546381
#15 [ffff881826c31538] lov_llog_origin_add at ffffffffa083b0cc
#16 [ffff881826c315e8] llog_add at ffffffffa0546381
#17 [ffff881826c31658] mds_llog_origin_add at ffffffffa0975d29
#18 [ffff881826c316d8] llog_add at ffffffffa0546381
#19 [ffff881826c31748] mdd_log_op_setattr at ffffffffa099c80a
#20 [ffff881826c317e8] mdd_setattr_log at ffffffffa099d480
#21 [ffff881826c31858] mdd_attr_set at ffffffffa0993e07
#22 [ffff881826c31968] cml_attr_set at ffffffffa0a6f7af
#23 [ffff881826c319c8] mdt_attr_set at ffffffffa09fc6c4
#24 [ffff881826c31a58] mdt_reint_setattr at ffffffffa09fce1e
#25 [ffff881826c31ae8] mdt_reint_rec at ffffffffa09f864f
#26 [ffff881826c31b38] mdt_reint_internal at ffffffffa09efa04
#27 [ffff881826c31bc8] mdt_reint at ffffffffa09efd6c
#28 [ffff881826c31c18] mdt_handle_common at ffffffffa09e49f5
#29 [ffff881826c31c98] mdt_regular_handle at ffffffffa09e5a05
#30 [ffff881826c31ca8] ptlrpc_server_handle_request at ffffffffa06825f1
#31 [ffff881826c31de8] ptlrpc_main at ffffffffa0683992
#32 [ffff881826c31f48] kernel_thread at ffffffff8100d1aa
===================================================================================

seems that this new occurence/scenario occurs during a "chmod -R" command.

So we may trigger a new case of wrong block computation for chmod/setattr related Changelog/llog transaction ??...

Comment by Zhenyu Xu [ 27/Oct/11 ]

yes, another transaction credit miscalculation case. will update patch.

Comment by jacques-charles lafoucriere [ 02/Nov/11 ]

It seems very difficult to compute a safe size for all cases (recovery, replay, .....)
Is it possible to oversize this size or to have a dynamic way to set it so in case of bad number we can keep the system up and running ?

Comment by Build Master (Inactive) [ 06/Nov/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #342
LU-601 mdd: Fix transaction credits (Revision dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed)

Result = SUCCESS
Oleg Drokin : dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed
Files :

  • lustre/mdd/mdd_trans.c
  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_internal.h
  • lustre/mdd/mdd_dir.c
Comment by Build Master (Inactive) [ 06/Nov/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #342
LU-601 mdd: Fix transaction credits (Revision dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed)

Result = SUCCESS
Oleg Drokin : dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed
Files :

  • lustre/mdd/mdd_trans.c
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_internal.h
  • lustre/mdd/mdd_dir.c
Comment by Build Master (Inactive) [ 06/Nov/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #342
LU-601 mdd: Fix transaction credits (Revision dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed)

Result = SUCCESS
Oleg Drokin : dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed
Files :

  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_dir.c
  • lustre/mdd/mdd_internal.h
  • lustre/mdd/mdd_trans.c
Comment by Build Master (Inactive) [ 06/Nov/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #342
LU-601 mdd: Fix transaction credits (Revision dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed)

Result = SUCCESS
Oleg Drokin : dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed
Files :

  • lustre/mdd/mdd_internal.h
  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_trans.c
  • lustre/mdd/mdd_dir.c
Comment by Build Master (Inactive) [ 06/Nov/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #342
LU-601 mdd: Fix transaction credits (Revision dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed)

Result = SUCCESS
Oleg Drokin : dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed
Files :

  • lustre/mdd/mdd_internal.h
  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_trans.c
  • lustre/mdd/mdd_dir.c
Comment by Build Master (Inactive) [ 06/Nov/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #342
LU-601 mdd: Fix transaction credits (Revision dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed)

Result = SUCCESS
Oleg Drokin : dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed
Files :

  • lustre/mdd/mdd_internal.h
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_dir.c
  • lustre/mdd/mdd_trans.c
  • lustre/mdd/mdd_orphans.c
Comment by Build Master (Inactive) [ 06/Nov/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #342
LU-601 mdd: Fix transaction credits (Revision dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed)

Result = SUCCESS
Oleg Drokin : dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed
Files :

  • lustre/mdd/mdd_dir.c
  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_internal.h
  • lustre/mdd/mdd_trans.c
Comment by Build Master (Inactive) [ 06/Nov/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #342
LU-601 mdd: Fix transaction credits (Revision dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed)

Result = SUCCESS
Oleg Drokin : dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed
Files :

  • lustre/mdd/mdd_internal.h
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_dir.c
  • lustre/mdd/mdd_trans.c
Comment by Build Master (Inactive) [ 06/Nov/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #342
LU-601 mdd: Fix transaction credits (Revision dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed)

Result = SUCCESS
Oleg Drokin : dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed
Files :

  • lustre/mdd/mdd_internal.h
  • lustre/mdd/mdd_trans.c
  • lustre/mdd/mdd_dir.c
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_orphans.c
Comment by Build Master (Inactive) [ 06/Nov/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #342
LU-601 mdd: Fix transaction credits (Revision dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed)

Result = SUCCESS
Oleg Drokin : dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed
Files :

  • lustre/mdd/mdd_trans.c
  • lustre/mdd/mdd_dir.c
  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_internal.h
Comment by Build Master (Inactive) [ 06/Nov/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #342
LU-601 mdd: Fix transaction credits (Revision dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed)

Result = SUCCESS
Oleg Drokin : dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed
Files :

  • lustre/mdd/mdd_dir.c
  • lustre/mdd/mdd_trans.c
  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_internal.h
Comment by Build Master (Inactive) [ 06/Nov/11 ]

Integrated in lustre-master » i686,server,el5,ofa #342
LU-601 mdd: Fix transaction credits (Revision dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed)

Result = SUCCESS
Oleg Drokin : dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed
Files :

  • lustre/mdd/mdd_trans.c
  • lustre/mdd/mdd_dir.c
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_internal.h
Comment by Build Master (Inactive) [ 06/Nov/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #342
LU-601 mdd: Fix transaction credits (Revision dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed)

Result = SUCCESS
Oleg Drokin : dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed
Files :

  • lustre/mdd/mdd_dir.c
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_trans.c
  • lustre/mdd/mdd_internal.h
Comment by Build Master (Inactive) [ 06/Nov/11 ]

Integrated in lustre-master » i686,client,el5,ofa #342
LU-601 mdd: Fix transaction credits (Revision dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed)

Result = SUCCESS
Oleg Drokin : dc9b1f76d1d0ea7326dfd77b0361b0baf1e67aed
Files :

  • lustre/mdd/mdd_dir.c
  • lustre/mdd/mdd_trans.c
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_internal.h
Comment by Alex Zhuravlev [ 07/Nov/11 ]

the patch landed on the master will have problems with backends like ZFS where we have to declare changes against specific objects (not just a number of abstract llog objects). following patch's logic we'd have to declare changes to all possible llogs.

Comment by Zhenyu Xu [ 07/Nov/11 ]

Then how can we solve this issue? Since the changelogs ops (such as mdd_changelog_data_store) assocated with mdd object operations also uses the same transaction handle, and hence consume transaction credit. The object changes declaration do not cover this part of extra changelog requirement.

Comment by Alex Zhuravlev [ 08/Nov/11 ]

don't know yet .. something to solve soon, I hope. btw, calling mds_lov_create() (i.e. assign objid) outside of transaction is equally bad as this can result in orphans on OST. from this point of view it's OK do no llog object removal in this case at all.

Comment by Alex Zhuravlev [ 08/Nov/11 ]

even more interesting ... what's the point to declare llog records if mdd_create() don't try to destroy objects in failure case? or I missed this in the code?

Comment by Zhenyu Xu [ 08/Nov/11 ]

we need trans start to create object, and w/o the return of the create op, we don't know whether failure occurs during the op, and hence we can't know in advance whether we should preserve objects destroy credit, so the basic idea is to preserve credits to cover the biggest possible trans credit requirement.

Comment by Alex Zhuravlev [ 08/Nov/11 ]

I understand that. now tell me where mdd_create() destroys objects if creation failed?

Comment by Zhenyu Xu [ 08/Nov/11 ]

In mdd_crete(), it could possibly need to delete lost precreated objects.

lustre/mdd/mdd_dir.c
1860         /* update lov_objid data, must be before transaction stop! */
1861         if (rc == 0)
1862                 mdd_lov_objid_update(mdd, lmm);
Comment by Alex Zhuravlev [ 08/Nov/11 ]

if (rc == 0) – meaning original change to declare llog records is useless in mdd_create() ?

Comment by Zhenyu Xu [ 08/Nov/11 ]

I got your meaning now. Sorry, my mistake, I missed that part, mdd_create object credit already considered this create_object_credit.

Comment by Alex Zhuravlev [ 08/Nov/11 ]

my point was that mdd_create() never cleanup just created objects, thus llog declaration is useless.

Comment by Zhenyu Xu [ 08/Nov/11 ]

yes, mdd_create() does not cleanup just created objects.

And 
  mdd_create_txn_param_build(env, mdd, lmm, MDD_TXN_MKDIR_OP, 1)->mdd_txn_param_build(env, mdd, op, stripes + 1);
where stripes covers credit needed by mdd_lov_objid_update()
while   1     covers credit needed by 
        if (rc == 0)
                rc = mdd_changelog_ns_store(env, mdd,
                            S_ISDIR(attr->la_mode) ? CL_MKDIR :
                            S_ISREG(attr->la_mode) ? CL_CREATE :
                            S_ISLNK(attr->la_mode) ? CL_SOFTLINK : CL_MKNOD,
                            0, son, mdd_pobj, NULL, lname, handle);
Comment by Andreas Dilger [ 09/Nov/11 ]

JC,
while it would be possible to increase the transaction credits for arbitrary operations to provide a safety margin, this has two potential negative side effects:

  • unless the root cause of the transaction credit shortage are understood, a small safety margin will not necessarily handle all cases of the problem, but just make the frequency of the error less common.
  • using a large safety margin can negatively impact the MDS performance, due to too-large transaction handles causing the transaction to close too early, and reducing the number of shared blocks in each transaction. If this happens too often then the MDS may be blocked waiting for the old transactions to commit, each one writing more blocks to the filesystem than needed.

The transaction credits are already (or should be) maximally pessimistic for each operations, and just adding a handful of credits to each operation won't necessarily handle the case where there is some file being modified that is entirely unaccounted for. I don't object to CEA installing a patch that reserves some extra credits for each transaction (in order to mitigate production interruptions), with the understanding that either there is a larger-than-normal journal file/device and RAM for the MDS, or potentially that the metadata performance may be slowed down. However, this isn't a long-term solution to the problem being hit.

It may well be that CEA is hitting this problem due to the use of ChangeLogs (not typically in use at other sites), and in some cases when there is a need to create a second llog file for the changelog it needs to reserve more credits? I would suggest that Bobijam try to run some tests with ChangeLogs enabled (e.g. "untar kernel tarball; chmod -R; rm -r" in a loop) to see if this triggers the problem being seen here.

That said, the current mechanism for transaction credit reservation for ldiskfs/JBD2 is very poor. In essence, the caller (Lustre in this case) needs to know every detail of how the internal credits are accounted for every change being made. The infrastructure that Alex is working on in the orion branch, and which is planned to land into master for the 2.2 release, will ensure that every operation being done to the filesystem has sufficient transaction credits, and can track and validate the modification of objects at the OSD level. While I don't suggest this be deployed in production immediately (it is intended to go through the 2.2 release validation testing first), it has been in use on the orion branch for more than a year already, and works with ZFS (which is very stringent on transaction accounting and does not even have a mechanism to allow "safety margin" to be added, unlike ldiskfs).

Comment by Aurelien Degremont (Inactive) [ 15/Nov/11 ]

Hello

At CEA we hit again this bug, with the patch applied, again in mdd_attr_set() context. So I look closely to this patch and I've got several question/comments.

  • The new option chglog_cnt counts not the number of Changelog records that will be added, but the number of LLOG records that will be added, which could be Changelog or object llog use with unlink, chown, and so on? It could be a good idea to name this llog_cnt and not chglog_cnt.
  • By the way, I'm not an expert with llog files, but is it ok to compute credit for changelog and other kind of llogs with the same value?
  • The number of llog records needed, based on stripecount, is computed in mdd_log_txn_param_build() and mdd_create_txn_param_build() functions. There is now a third case where this computation is needed, for chown cases. This is done in mdd_attr_set(), which call mdd_setattr_txn_param_build(). It could clarify the code if this computation is also moved in mdd_setattr_txn_param_build(). In this case, all this computations will be grouped in the same kind of functions, in the same C file. It will be much simpler to maintain and will help to avoid mistakes.
  • It is possible to fill the file used for LLOG_MDS_OST and we need to create another file the computed credit do not take this case in account?
Comment by Zhenyu Xu [ 15/Nov/11 ]

The changelog leverages llog records to record MD change, so they occupy the same credit value.

The mdd_attr_set() chown case has considered the case to add journal chown/chgrp in llog, and I made a mistake to heed the endianness of lov_mds_md, I apologize for that, fix is posted at http://review.whamcloud.com/1702.

While I assume that you use little-endian machines, this glitch should not cause credit insufficience.

Comment by Zhenyu Xu [ 16/Nov/11 ]

I am running the test Andreas suggested on my local virtual machine.

test script
# lfs setstripe -c -1 /mnt/lustre
# lctl
> device lustre-MDT0000
> changelog_register
> quit
# while (true) do  rm -rf linux*; tar xzf /mnt/kernel/linux-2.6.18-194.17.1.el5.tgz; chmod 777 -R ./; chown nobody -R ./; done

I can observe "lfs changelog lustr-MDT0000" output keep refreshing, and by checking debug logs, it shows that chown reserve ost_count number+1 of llog credit.

The test is still running without error happens.

Comment by Aurelien Degremont (Inactive) [ 17/Nov/11 ]

We could reproduce the issue simply with a big stripe count.

# mkdir foo54
# lfs setstripe -c 54 foo54
# cd foo54
# touch file1
# chown nobody file1

If you have a stripe count below 20 OSTS and you could not see the problem.

Comment by Zhenyu Xu [ 18/Nov/11 ]

Aurelien,

Does the ldiskfs used by MDT formatted with "extents" option?

Comment by Aurelien Degremont (Inactive) [ 18/Nov/11 ]

No, it doesn't.
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery FEATURE_I8 flex_bg FEATURE_I12 sparse_super large_file uninit_bg

Comment by Zhenyu Xu [ 18/Nov/11 ]

from your description, i suspect we had insufficient basic credit blocks calculation, with bigger stripe count, the single insufficience gets multiplied and results in the LBUG. I'm checking the whole credit calculation formula.

Comment by Zhenyu Xu [ 21/Nov/11 ]

Hi Anrelien,

Does the chmod test case over big stripe count issue reproducible when you do not use ChangeLog feature?

Comment by Aurelien Degremont (Inactive) [ 21/Nov/11 ]

I prepare the reproducer on filesystems which do not use Changelog.
Did you succeed in reproducing it on your systems?

Comment by Zhenyu Xu [ 21/Nov/11 ]

no, I can not reproduce it. I'm using this build image http://build.whamcloud.com/job/lustre-reviews/3283/ , patch set 1 of http://review.whamcloud.com/1702

[root@client-7 foo27]# uname -a
Linux client-7.lab.whamcloud.com 2.6.18-238.19.1.el5_lustre #1 SMP Fri Nov 11 21:40:30 PST 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@client-7 foo27]# lfs getstripe file2
file2
lmm_stripe_count:   27
lmm_stripe_size:    1048576
lmm_stripe_offset:  2
        obdidx           objid          objid            group
             2              10            0xa                0
            11               6            0x6                0
            25               5            0x5                0
...
[root@client-7 foo27]# chown nobody file2
[root@client-7 foo27]# lfs changelog lu601-MDT0000
13 00MARK  02:36:42.159352273 2011.11.22 0x0 t=[0x10001:0x0:0x0] p=[0:0x0:0x0] mdd_obd-lu601-MDT0000
14 01CREAT 02:37:06.692662827 2011.11.22 0x0 t=[0x200000400:0x16:0x0] p=[0x200000400:0x10:0x0] file2
15 14SATTR 02:37:18.943318820 2011.11.22 0x24 t=[0x200000400:0x16:0x0]
16 01CREAT 02:37:39.931729863 2011.11.22 0x0 t=[0x200000400:0x17:0x0] p=[0x200000400:0x13:0x0] file2
17 14SATTR 02:38:10.248880048 2011.11.22 0x24 t=[0x200000400:0x17:0x0]
18 14SATTR 02:39:18.837961386 2011.11.22 0x24 t=[0x200000400:0x17:0x0]

system is still sound.

Comment by Aurelien Degremont (Inactive) [ 22/Nov/11 ]

As I put in my reproducer above, you need to go further than stripecount=27. I reproduce it with stripecount=54. The smaller the value is, the harder is it to reproduce it.
Clearly here 27 is too small.

Comment by Zhenyu Xu [ 22/Nov/11 ]

tried with 71 stripe count, still ok for me.

[root@client-7 foo]# lfs changelog lu601-MDT0000
1 00MARK  17:05:12.327394996 2011.11.22 0x0 t=[0x10001:0x0:0x0] p=[0:0x0:0x0] mdd_obd-lu601-MDT0000
[root@client-7 foo]# lfs getstripe file1
file1
lmm_stripe_count:   71
lmm_stripe_size:    1048576
lmm_stripe_offset:  31
        obdidx           objid          objid            group
            31               3            0x3                0
             7               5            0x5                0
            69               4            0x4                0
...
[root@client-7 foo]# chown nobody file1
[root@client-7 foo]# lfs changelog lu601-MDT0000
1 00MARK  17:05:12.327394996 2011.11.22 0x0 t=[0x10001:0x0:0x0] p=[0:0x0:0x0] mdd_obd-lu601-MDT0000
2 01CREAT 17:07:45.800675043 2011.11.22 0x0 t=[0x200000400:0x7:0x0] p=[0x200000400:0x1:0x0] file1
3 14SATTR 17:07:50.227566703 2011.11.22 0x24 t=[0x200000400:0x7:0x0]
Comment by Patrick Valentin (Inactive) [ 16/Dec/11 ]

I made new tests with both lustre 2.0 and 2.1, with the patches corresponding to LU601 (change 1276 patch set 13 + change 1702) and LU81.
The lustre file system ("fs64") uses 66 loop devices on the same server: 1 MGT + 1 MDT + 64 OSTs.
I use the following script to create 64 empty files, having 1 to 64 stripes:

# for i in `seq 1 64`; do
        mkdir -p /fs64/dir$i
        lfs setstripe -c $i /fs64/dir$i
        touch /fs64/dir$i/file
done

On lustre 2.1 I have errors with "chown" and "rm" commands on some files.
On Luster 2.0 I have crashes (ASSERT in fs/jbd2/transaction.c) with "chown" command on some files, and errors with "rm" command on some files.

I saw in the lustre trace buffer the following message each time an error occurs: "Invalid transaction parameters".
As this message is printed in "lustre/osd-ldiskfs/osd_handler.c:osd_trans_start()", I have added the following line at the beginning of this routine to print "tp_credits " and "j_max_transaction_buffers":
monospaced CERROR("Transaction parameters: tp_credits %d, j_max_transaction_buffers %d\n", p->tp_credits, osd_journal(dev)->j_max_transaction_buffers);

Errors with luster 2.1

With "rm" command, I have an error with file having at least 52 stripes (when "tp_credits" is higher than 1024).

# for i in `seq 1 64`; do
        rm -f /fs64/dir$i/file
done
rm: cannot remove `/fs64/dir52/file': Invalid argument
rm: cannot remove `/fs64/dir53/file': Invalid argument
rm: cannot remove `/fs64/dir54/file': Invalid argument
rm: cannot remove `/fs64/dir55/file': Invalid argument
rm: cannot remove `/fs64/dir56/file': Invalid argument
rm: cannot remove `/fs64/dir57/file': Invalid argument
rm: cannot remove `/fs64/dir58/file': Invalid argument
rm: cannot remove `/fs64/dir59/file': Invalid argument
rm: cannot remove `/fs64/dir60/file': Invalid argument
rm: cannot remove `/fs64/dir61/file': Invalid argument
rm: cannot remove `/fs64/dir62/file': Invalid argument
rm: cannot remove `/fs64/dir63/file': Invalid argument
rm: cannot remove `/fs64/dir64/file': Invalid argument

# lctl dk
00000004:00020000:2.0F:1323953356.271819:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 211, j_max_transaction_buffers 1024
00000004:00020000:2.0:1323953356.292189:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 40, j_max_transaction_buffers 1024
00000004:00020000:2.0:1323953356.293661:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 227, j_max_transaction_buffers 1024
00000004:00020000:2.0:1323953356.293725:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 40, j_max_transaction_buffers 1024
00000004:00020000:2.0:1323953356.294987:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 243, j_max_transaction_buffers 1024
00000004:00020000:2.0:1323953356.295054:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 40, j_max_transaction_buffers 1024
00000004:00020000:2.0:1323953356.296300:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 259, j_max_transaction_buffers 1024
00000004:00020000:2.0:1323953356.296363:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 40, j_max_transaction_buffers 1024
00000004:00020000:2.0:1323953356.297550:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 275, j_max_transaction_buffers 1024
00000004:00020000:2.0:1323953356.297640:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 40, j_max_transaction_buffers 1024
...
stripe count  6: tp_credits = 195 + (6 * 16)  = 291
...
stripe count 50: tp_credits = 195 + (50 * 16) = 995
...
00000004:00020000:0.0:1323953356.675685:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1011, j_max_transaction_buffers 1024
00000004:00020000:4.0F:1323953356.699569:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 40, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953356.701323:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1027, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953356.701326:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:4.0:1323953356.711133:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1043, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953356.711136:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:4.0:1323953356.712715:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1059, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953356.712717:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:4.0:1323953356.714211:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1075, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953356.714214:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:4.0:1323953356.715707:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1091, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953356.715709:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:4.0:1323953356.717242:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1107, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953356.717244:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:4.0:1323953356.718728:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1123, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953356.718730:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:4.0:1323953356.720273:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1139, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953356.720275:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:4.0:1323953356.721743:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1155, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953356.721745:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:4.0:1323953356.723242:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1171, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953356.723244:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:4.0:1323953356.724765:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1187, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953356.724767:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:4.0:1323953356.726266:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1203, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953356.726268:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:4.0:1323953356.727795:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1219, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953356.727797:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
Debug log: 128 lines, 128 kept, 0 dropped, 0 bad.

With "chown" command, I have an error with file having at least 55 stripes (when "tp_credits" is higher than 1024)

# for i in `seq 1 64`; do
        chown nobody /fs64/dir$i/file
done
chown: changing ownership of `/fs64/dir55/file': Invalid argument
chown: changing ownership of `/fs64/dir56/file': Invalid argument
chown: changing ownership of `/fs64/dir57/file': Invalid argument
chown: changing ownership of `/fs64/dir58/file': Invalid argument
chown: changing ownership of `/fs64/dir59/file': Invalid argument
chown: changing ownership of `/fs64/dir60/file': Invalid argument
chown: changing ownership of `/fs64/dir61/file': Invalid argument
chown: changing ownership of `/fs64/dir62/file': Invalid argument
chown: changing ownership of `/fs64/dir63/file': Invalid argument
chown: changing ownership of `/fs64/dir64/file': Invalid argument



# lctl dk
00000004:00020000:4.0F:1323953479.952016:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 170, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953480.009927:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 186, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953480.027103:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 202, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953480.044507:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 218, j_max_transaction_buffers 1024
00000004:00020000:4.0:1323953480.062514:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 234, j_max_transaction_buffers 1024
...
stripe count  6: tp_credits = 154 + (6 * 16)  = 250
...
stripe count 50: tp_credits = 154 + (50 * 16) = 954
...
00000004:00020000:3.0:1323953481.404824:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 970, j_max_transaction_buffers 1024
00000004:00020000:3.0:1323953481.446785:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 986, j_max_transaction_buffers 1024
00000004:00020000:3.0:1323953481.507227:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1002, j_max_transaction_buffers 1024
00000004:00020000:3.0:1323953481.568827:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1018, j_max_transaction_buffers 1024
00000004:00020000:3.0:1323953481.596317:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1034, j_max_transaction_buffers 1024
00000004:00020000:3.0:1323953481.596318:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:3.0:1323953481.622089:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1050, j_max_transaction_buffers 1024
00000004:00020000:3.0:1323953481.622091:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:3.0:1323953481.623936:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1066, j_max_transaction_buffers 1024
00000004:00020000:3.0:1323953481.623938:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:3.0:1323953481.625735:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1082, j_max_transaction_buffers 1024
00000004:00020000:3.0:1323953481.625737:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:3.0:1323953481.627526:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1098, j_max_transaction_buffers 1024
00000004:00020000:3.0:1323953481.627528:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:3.0:1323953481.629263:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1114, j_max_transaction_buffers 1024
00000004:00020000:3.0:1323953481.629266:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:3.0:1323953481.631049:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1130, j_max_transaction_buffers 1024
00000004:00020000:3.0:1323953481.631051:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:3.0:1323953481.632806:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1146, j_max_transaction_buffers 1024
00000004:00020000:3.0:1323953481.632808:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:3.0:1323953481.634558:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1162, j_max_transaction_buffers 1024
00000004:00020000:3.0:1323953481.634560:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
00000004:00020000:3.0:1323953481.636336:0:16899:0:(osd_handler.c:671:osd_trans_start()) Transaction parameters: tp_credits 1178, j_max_transaction_buffers 1024
00000004:00020000:3.0:1323953481.636339:0:16899:0:(osd_handler.c:712:osd_trans_start()) Invalid transaction parameters
Debug log: 74 lines, 74 kept, 0 dropped, 0 bad.

Errors and crashes with luster 2.0

With "rm" command, I have the same errors as with lustre 2.1: error with file having at least 52 stripes (when "tp_credits" is higher than 1024)

With "chown" command, I used the following loop:

for i in `seq 1 64`; do
        chown nobody /fs64/dir$i/file
done

The crash (ASSERT in fs/jbd2/transaction.c:1030) does not occur systematically with a specific stripe count, but with a stripe count between 34 and 40.
I have also observed in lustre traces that "tp_credits" is generally "154 + (16 * stripe_count)", but is sometimes "154". I have run 5 times a loop with stripe from 1 to 10 and the trace below shows that this behaviour does not always occurs on the same files.
I have run a new loop (1 to 10) with lnet.debug=-1 and attached the "lctl dk" output.

for i in `seq 1 10`; do
        chown nobody /fs64/dir$i/file
done

# lctl dk
00000004:00020000:1.0F:1324042078.877570:0:28394:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042078.917511:0:28399:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 186, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042078.951194:0:27892:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:0.0F:1324042078.985516:0:27897:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 218, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042079.019968:0:27902:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042079.055553:0:27907:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042079.090576:0:27912:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 266, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042079.125160:0:27917:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 282, j_max_transaction_buffers 1024
00000004:00020000:0.0:1324042079.160667:0:27922:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042079.195866:0:27927:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
Debug log: 10 lines, 10 kept, 0 dropped, 0 bad.



# lctl dk
00000004:00020000:0.0F:1324042101.166918:0:27932:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 170, j_max_transaction_buffers 1024
00000004:00020000:1.0F:1324042101.225960:0:27936:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 186, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042101.259700:0:27940:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:0.0:1324042101.293826:0:27944:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042101.328416:0:27948:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 234, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042101.362943:0:27952:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042101.397899:0:27956:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 266, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042101.433863:0:27960:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:0.0:1324042101.468951:0:27964:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 298, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042101.504277:0:27968:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
Debug log: 10 lines, 10 kept, 0 dropped, 0 bad.



# lctl dk
00000004:00020000:1.0F:1324042164.759485:0:27977:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042164.798217:0:27982:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 186, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042164.833287:0:27986:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042164.867438:0:27990:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 218, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042164.901913:0:27994:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:0.0F:1324042164.936663:0:27998:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 250, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042164.971295:0:28002:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 266, j_max_transaction_buffers 1024
00000004:00020000:2.0F:1324042165.006438:0:28006:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042165.043848:0:28010:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 298, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042165.079304:0:28014:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
Debug log: 10 lines, 10 kept, 0 dropped, 0 bad.



# lctl dk
00000004:00020000:1.0F:1324042445.862850:0:28039:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042445.924627:0:28043:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042445.958677:0:28047:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042445.992663:0:28051:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 218, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042446.027256:0:28055:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 234, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042446.071215:0:28059:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042446.106079:0:28063:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042446.141006:0:28067:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 282, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042446.175980:0:28071:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042446.211506:0:28075:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 314, j_max_transaction_buffers 1024
Debug log: 10 lines, 10 kept, 0 dropped, 0 bad.



# lctl dk
00000004:00020000:3.0F:1324042453.048705:0:28080:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 170, j_max_transaction_buffers 1024
00000004:00020000:3.0:1324042453.085640:0:28084:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 186, j_max_transaction_buffers 1024
00000004:00020000:1.0F:1324042453.119569:0:28088:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042453.153856:0:28092:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 218, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042453.188227:0:28096:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042453.223103:0:28100:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 250, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042453.258033:0:28105:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:0.0F:1324042453.293552:0:28108:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042453.328871:0:28112:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 298, j_max_transaction_buffers 1024
00000004:00020000:1.0:1324042453.364038:0:28116:0:(osd_handler.c:620:osd_trans_start()) Transaction parameters: tp_credits 154, j_max_transaction_buffers 1024
Debug log: 10 lines, 10 kept, 0 dropped, 0 bad.
Comment by Zhenyu Xu [ 16/Dec/11 ]

2.1+patches journal credit calculation is what we expected, and the error is due to the small size of journal device space, it cannot hold single transaction journal size requirement.

2.0+patches journal credit calculation is odd. I've checked the log, the stripe count increased with every mdd_attr_set operation, and mdd_log_op_setattr() indeed returns correct number of llog writing which equals to the stripe count.

Comment by Andreas Dilger [ 16/Dec/11 ]

The 1024 block transaction limit is only because you must be testing with a very tiny loopback filesystem for the MDT. The default for a real MDT is 1/4 of 400MB, so 25000 blocks.

When you write that 2.0 crashes in this testing - does that mean it hits the BUG reported for this ticket, while 2.1 only prints the error message?

Comment by Patrick Valentin (Inactive) [ 19/Dec/11 ]

In the previous tests, the sizes of the loop devices were the folowing:
{{ MGT=100Mb, MDT=200Mb, OST=10Mb}}

I modified these sizes as follow:
{{ MGT=200Mb, MDT=1Gb, OST=50Mb}}

With Lustre 2.1, I no longer have the previous errors, neither with "rm", nor with "chown".

With Lustre 2.0, I no longer have errors with "rm" command, But I still have the crash with "chown" command, and the stack is the one reported in this ticket, except the runing command, which is "mdt_xxx" instead of "tgt_recov".

crash> bt
PID: 29332  TASK: ffff8804fe88ad60  CPU: 1   COMMAND: "mdt_179"
 #0 [ffff8804fe88ed80] machine_kexec at ffffffff8102e77b
 #1 [ffff8804fe88ede0] crash_kexec at ffffffff810a6cd8
 #2 [ffff8804fe88eeb0] oops_end at ffffffff8146aad0
 #3 [ffff8804fe88eee0] die at ffffffff8101021b
 #4 [ffff8804fe88ef10] do_trap at ffffffff8146a3a4
 #5 [ffff8804fe88ef70] do_invalid_op at ffffffff8100dda5
 #6 [ffff8804fe88f010] invalid_op at ffffffff8100cf3b
    [exception RIP: jbd2_journal_dirty_metadata+269]
    RIP: ffffffffa01478ed  RSP: ffff8804fe88f0c0  RFLAGS: 00010246
    RAX: ffff880278d6d980  RBX: ffff88028bf1c6d8  RCX: ffff880283424dc8
    RDX: 0000000000000000  RSI: ffff880283424dc8  RDI: 0000000000000000
    RBP: ffff8804fe88f0e0   R8: 9010000000000000   R9: fd5ebe53088b7202
    R10: 0000000000000001  R11: 00000000000016b0  R12: ffff88028bf93ac8
    R13: ffff880283424dc8  R14: ffff88032bfc6800  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff8804fe88f0e8] __ldiskfs_handle_dirty_metadata at ffffffffa043d3fb
 #8 [ffff8804fe88f128] fsfilt_ldiskfs_write_handle at ffffffffa052dde7
 #9 [ffff8804fe88f1d8] fsfilt_ldiskfs_write_record at ffffffffa052e0fe
#10 [ffff8804fe88f258] llog_lvfs_write_blob at ffffffffa096618c
#11 [ffff8804fe88f308] llog_lvfs_write_rec at ffffffffa09679fa
#12 [ffff8804fe88f3e8] llog_cat_add_rec at ffffffffa0964905
#13 [ffff8804fe88f468] llog_obd_origin_add at ffffffffa096b1a6
#14 [ffff8804fe88f4c8] llog_add at ffffffffa096b381
#15 [ffff8804fe88f538] lov_llog_origin_add at ffffffffa0dc40cc
#16 [ffff8804fe88f5e8] llog_add at ffffffffa096b381
#17 [ffff8804fe88f658] mds_llog_origin_add at ffffffffa0543d29
#18 [ffff8804fe88f6d8] llog_add at ffffffffa096b381
#19 [ffff8804fe88f748] mdd_log_op_setattr at ffffffffa056a9da
#20 [ffff8804fe88f7e8] mdd_setattr_log at ffffffffa056b650
#21 [ffff8804fe88f858] mdd_attr_set at ffffffffa0561e32
#22 [ffff8804fe88f968] cml_attr_set at ffffffffa06817af
#23 [ffff8804fe88f9c8] mdt_attr_set at ffffffffa06456f4
#24 [ffff8804fe88fa58] mdt_reint_setattr at ffffffffa0645e4e
#25 [ffff8804fe88fae8] mdt_reint_rec at ffffffffa064167f
#26 [ffff8804fe88fb38] mdt_reint_internal at ffffffffa0638a34
#27 [ffff8804fe88fbc8] mdt_reint at ffffffffa0638d9c
#28 [ffff8804fe88fc18] mdt_handle_common at ffffffffa062d9f5
#29 [ffff8804fe88fc98] mdt_regular_handle at ffffffffa062ea05
#30 [ffff8804fe88fca8] ptlrpc_server_handle_request at ffffffffa0c3f641
#31 [ffff8804fe88fde8] ptlrpc_main at ffffffffa0c409e2
#32 [ffff8804fe88ff48] kernel_thread at ffffffff8100d1aa

I tried several times with the following loop, and it crashes 2 times at 50 stripes, and 3 times at 48 stripes.

# for i in `seq 1 64`; do
        echo "chown on file with $i stripes"
        chown nobody /fs64/dir$i/file
done
Comment by Peter Jones [ 19/Dec/11 ]

Patrick

When are you planning to upgrade to 2.1? There are a lot of changes between 2.0 and 2.1 so it seems that it would be more productive to focus efforts on working from that baseline.

Peter

Comment by Patrick Valentin (Inactive) [ 20/Dec/11 ]

Peter,

CEA plans tests on lustre 2.1 on a restricted number of nodes on beginning of january, for a probable upgrade of the whole cluster mid-february or beginning of march (the next large mantenance window).
The support team reported that since the end of past week, another Bull customer, AWE, is also hitting this issue during acceptance phase.

Regards,
Patrick

Comment by Zhenyu Xu [ 08/Jan/12 ]

OSD API changes from LU-909 should have fixed this issue.

Comment by Build Master (Inactive) [ 16/Feb/12 ]

Integrated in lustre-master » x86_64,server,el5,ofa #476
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = SUCCESS
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/mdd/mdd_orphans.c
  • lustre/tests/replay-single.sh
  • lustre/mdd/mdd_internal.h
Comment by Build Master (Inactive) [ 16/Feb/12 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #476
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = SUCCESS
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_internal.h
  • lustre/tests/replay-single.sh
Comment by Build Master (Inactive) [ 16/Feb/12 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #476
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = SUCCESS
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/mdd/mdd_orphans.c
  • lustre/tests/replay-single.sh
  • lustre/mdd/mdd_internal.h
Comment by Build Master (Inactive) [ 16/Feb/12 ]

Integrated in lustre-master » x86_64,client,el5,ofa #476
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = SUCCESS
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_internal.h
  • lustre/tests/replay-single.sh
Comment by Build Master (Inactive) [ 16/Feb/12 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #476
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = SUCCESS
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/mdd/mdd_orphans.c
  • lustre/tests/replay-single.sh
  • lustre/mdd/mdd_internal.h
Comment by Build Master (Inactive) [ 16/Feb/12 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #476
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = SUCCESS
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/tests/replay-single.sh
  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_internal.h
Comment by Build Master (Inactive) [ 16/Feb/12 ]

Integrated in lustre-master » i686,client,el6,inkernel #476
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = SUCCESS
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/tests/replay-single.sh
  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_internal.h
Comment by Build Master (Inactive) [ 16/Feb/12 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #476
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = SUCCESS
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/mdd/mdd_orphans.c
  • lustre/tests/replay-single.sh
  • lustre/mdd/mdd_internal.h
Comment by Build Master (Inactive) [ 16/Feb/12 ]

Integrated in lustre-master » i686,server,el5,ofa #476
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = SUCCESS
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/tests/replay-single.sh
  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_internal.h
Comment by Build Master (Inactive) [ 16/Feb/12 ]

Integrated in lustre-master » i686,client,el5,ofa #476
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = SUCCESS
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/tests/replay-single.sh
  • lustre/mdd/mdd_internal.h
  • lustre/mdd/mdd_orphans.c
Comment by Build Master (Inactive) [ 16/Feb/12 ]

Integrated in lustre-master » i686,client,el5,inkernel #476
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = SUCCESS
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/tests/replay-single.sh
  • lustre/mdd/mdd_internal.h
  • lustre/mdd/mdd_orphans.c
Comment by Build Master (Inactive) [ 16/Feb/12 ]

Integrated in lustre-master » i686,server,el5,inkernel #476
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = SUCCESS
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/mdd/mdd_orphans.c
  • lustre/tests/replay-single.sh
  • lustre/mdd/mdd_internal.h
Comment by Build Master (Inactive) [ 16/Feb/12 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #476
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = SUCCESS
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_internal.h
  • lustre/tests/replay-single.sh
Comment by Build Master (Inactive) [ 16/Feb/12 ]

Integrated in lustre-master » i686,server,el6,inkernel #476
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = SUCCESS
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/mdd/mdd_orphans.c
  • lustre/tests/replay-single.sh
  • lustre/mdd/mdd_internal.h
Comment by Build Master (Inactive) [ 17/Feb/12 ]

Integrated in lustre-master » x86_64,server,el6,ofa #480
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = FAILURE
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/mdd/mdd_internal.h
  • lustre/mdd/mdd_orphans.c
  • lustre/tests/replay-single.sh
Comment by Build Master (Inactive) [ 17/Feb/12 ]

Integrated in lustre-master » x86_64,client,el6,ofa #480
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = FAILURE
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/tests/replay-single.sh
  • lustre/mdd/mdd_internal.h
  • lustre/mdd/mdd_orphans.c
Comment by Build Master (Inactive) [ 17/Feb/12 ]

Integrated in lustre-master » i686,client,el6,ofa #480
LU-601 mdd: cleanup error messages and code style (Revision adc30ea680674fc8aec6377dff60748bb566912a)

Result = ABORTED
Oleg Drokin : adc30ea680674fc8aec6377dff60748bb566912a
Files :

  • lustre/mdd/mdd_orphans.c
  • lustre/mdd/mdd_internal.h
  • lustre/tests/replay-single.sh
Comment by Peter Jones [ 26/Feb/12 ]

Landed for 2.1.1 and 2.2

Generated at Sat Feb 10 01:08:38 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.