[LU-1235] timeout in sanity subtest 103, unable to handle kernel paging request Created: 19/Mar/12  Updated: 29/May/17  Resolved: 29/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: Zhenyu Xu
Resolution: Duplicate Votes: 0
Labels: None
Environment:

server: 2.2-RC1-RHEL6
client: 2.1.1-RHEL6


Issue Links:
Duplicate
duplicates LU-1823 sanity/103: slab corruption Resolved
is duplicated by LU-1296 Test failure on test suite sanity, su... Resolved
Severity: 3
Rank (Obsolete): 4536

 Description   

Hit this issue again while doing interop testing between the 2.2-RC1 server and the 2.1.1 RHEL6 client:
https://maloo.whamcloud.com/test_sets/6b0e0a92-714a-11e1-a89e-5254004bbbd3



 Comments   
Comment by Peter Jones [ 19/Mar/12 ]

Bobi

Could you please comment on this one?

Thanks

Peter

Comment by Zhenyu Xu [ 19/Mar/12 ]

The MDS panicked on a bad IP in osd_trans_commit_cb(). I think a bad journal callback function address caused the panic.

20:01:48:BUG: unable to handle kernel paging request at 0000000400000002
20:01:48:IP: [<0000000400000002>] 0x400000002
20:01:48:PGD 72aa2067 PUD 0
20:01:48:Oops: 0010 1 SMP
20:01:48:last sysfs file: /sys/module/obdclass/initstate
20:01:48:CPU 0
20:01:48:Modules linked in: nfs fscache cmm(U) osd_ldiskfs(U) mdt(U) mdd(U) mds(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) lustre(U) lquota(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) ldiskfs(U) jbd2 nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: llog_test]
20:01:48:
20:01:48:Pid: 4547, comm: jbd2/dm-0-8 Not tainted 2.6.32-220.4.2.el6_lustre.x86_64 #1 Red Hat KVM
20:01:48:RIP: 0010:[<0000000400000002>] [<0000000400000002>] 0x400000002 ===============> BAD RIP value
20:01:48:RSP: 0018:ffff88003f6bfca8 EFLAGS: 00010246
20:01:49:RAX: ffff880040263dc0 RBX: ffff8800553d03c0 RCX: 0000000000000000
20:01:49:RDX: ffff880040263dc0 RSI: ffff8800553d03c0 RDI: 0000000000000000
20:01:49:RBP: ffff88003f6bfce0 R08: 00000000ffffff0a R09: 0000000000000000
20:01:49:R10: 000000000000000f R11: 0000000000000000 R12: 0000000000000000
20:01:49:R13: ffff880037c26200 R14: 0006000100000002 R15: ffff8800553d0430
20:01:49:FS: 0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
20:01:49:CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
20:01:49:CR2: 0000000400000002 CR3: 0000000072ab9000 CR4: 00000000000006f0
20:01:49:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
20:01:49:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
20:01:49:Process jbd2/dm-0-8 (pid: 4547, threadinfo ffff88003f6be000, task ffff880037f1a040)
20:01:49:Stack:
20:01:49: ffffffffa0b8f1c9 0000000000000d88 ffff880068e414d8 ffff88003ef5368c
20:01:49:<0> ffff88003ef78c00 ffff88004db21c90 0000000000000000 ffff88003f6bfd20
20:01:49:<0> ffffffffa03ee36a ffff88003f6bfd20 ffff880068ec4b9c ffff88004db21bc0
20:01:49:Call Trace:
20:01:50: [<ffffffffa0b8f1c9>] ? osd_trans_commit_cb+0x79/0x1e0 [osd_ldiskfs]
20:01:50: [<ffffffffa03ee36a>] ldiskfs_journal_commit_callback+0x8a/0xc0 [ldiskfs]
20:01:50: [<ffffffffa039a89f>] jbd2_journal_commit_transaction+0x110f/0x1530 [jbd2]
20:01:50: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
20:01:50: [<ffffffff8107ca1b>] ? try_to_del_timer_sync+0x7b/0xe0
20:01:50: [<ffffffffa039faf8>] kjournald2+0xb8/0x220 [jbd2]
20:01:50: [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
20:01:50: [<ffffffffa039fa40>] ? kjournald2+0x0/0x220 [jbd2]
20:01:50: [<ffffffff81090726>] kthread+0x96/0xa0
20:01:50: [<ffffffff8100c14a>] child_rip+0xa/0x20
20:01:50: [<ffffffff81090690>] ? kthread+0x0/0xa0
20:01:50: [<ffffffff8100c140>] ? child_rip+0x0/0x20
20:01:50:Code: Bad RIP value.
20:01:50:RIP [<0000000400000002>] 0x400000002
20:01:50: RSP <ffff88003f6bfca8>
20:01:50:CR2: 0000000400000002
20:01:50:--[ end trace e31e50250e65373c ]--
20:01:50:Kernel panic - not syncing: Fatal exception
20:01:50:Pid: 4547, comm: jbd2/dm-0-8 Tainted: G D ---------------- 2.6.32-220.4.2.el6_lustre.x86_64 #1
20:01:50:Call Trace:
20:01:50: [<ffffffff814ec61a>] ? panic+0x78/0x143
20:01:50: [<ffffffff814f07a4>] ? oops_end+0xe4/0x100
20:01:51: [<ffffffff8104234b>] ? no_context+0xfb/0x260
20:01:51: [<ffffffff810425d5>] ? __bad_area_nosemaphore+0x125/0x1e0
20:01:51: [<ffffffff812755b6>] ? vsnprintf+0x2b6/0x5f0
20:01:51: [<ffffffff810426a3>] ? bad_area_nosemaphore+0x13/0x20
20:01:51: [<ffffffff81042d5d>] ? __do_page_fault+0x31d/0x480
20:01:51: [<ffffffffa043a19b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
20:01:51: [<ffffffffa0443e31>] ? libcfs_debug_vmsg2+0x4e1/0xb60 [libcfs]
20:01:51: [<ffffffff814f275e>] ? do_page_fault+0x3e/0xa0
20:01:51: [<ffffffff814efb15>] ? page_fault+0x25/0x30
20:01:51: [<ffffffffa0b8f1c9>] ? osd_trans_commit_cb+0x79/0x1e0 [osd_ldiskfs]
20:01:51: [<ffffffffa03ee36a>] ? ldiskfs_journal_commit_callback+0x8a/0xc0 [ldiskfs]
20:01:51: [<ffffffffa039a89f>] ? jbd2_journal_commit_transaction+0x110f/0x1530 [jbd2]
20:01:51: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
20:01:51: [<ffffffff8107ca1b>] ? try_to_del_timer_sync+0x7b/0xe0
20:01:51: [<ffffffffa039faf8>] ? kjournald2+0xb8/0x220 [jbd2]
20:01:51: [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
20:01:51: [<ffffffffa039fa40>] ? kjournald2+0x0/0x220 [jbd2]
20:01:51: [<ffffffff81090726>] ? kthread+0x96/0xa0
20:01:52: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
20:01:52: [<ffffffff81090690>] ? kthread+0x0/0xa0
20:01:52: [<ffffffff8100c140>] ? child_rip+0x0/0x20

osd_trans_commit_cb()
static void osd_trans_commit_cb(struct journal_callback *jcb, int error)
{
        struct osd_thandle *oh = container_of0(jcb, struct osd_thandle, ot_jcb);
        struct thandle     *th  = &oh->ot_super;
        struct lu_device   *lud = &th->th_dev->dd_lu_dev;
        struct dt_txn_commit_cb *dcb, *tmp;

        LASSERT(oh->ot_handle == NULL);

        if (error)
                CERROR("transaction @0x%p commit error: %d\n", th, error);

        dt_txn_hook_commit(th);

        /* call per-transaction callbacks if any */
        cfs_list_for_each_entry_safe(dcb, tmp, &oh->ot_dcb_list, dcb_linkage)
                dcb->dcb_func(NULL, th, dcb, error);    // ===========> BAD RIP <==============

        lu_ref_del_at(&lud->ld_reference, oh->ot_dev_link, "osd-tx", th);
        lu_device_put(lud);
        th->th_dev = NULL;

        lu_context_exit(&th->th_ctx);
        lu_context_fini(&th->th_ctx);
        OBD_FREE_PTR(oh);
}
Comment by Sarah Liu [ 26/Mar/12 ]

Got this error again in RC2 testing (server/client: RHEL6-ofed): https://maloo.whamcloud.com/test_sets/175f9a26-770f-11e1-a169-5254004bbbd3

Comment by Zhenyu Xu [ 28/Mar/12 ]

Sarah,

Would you mind loading this patch http://review.whamcloud.com/2394 and trying to hit the issue again?

Comment by Sarah Liu [ 28/Mar/12 ]

Sure, will keep you updated.

Comment by Zhenyu Xu [ 10/Apr/12 ]

crash> dis osd_trans_commit_cb+0x79
0xffffffffa0b1dcd9 <osd_trans_commit_cb+121>: cmp %r15,%r14

crash> dis osd_trans_commit_cb
0xffffffffa0b1dc60 <osd_trans_commit_cb>: push %rbp
0xffffffffa0b1dc61 <osd_trans_commit_cb+1>: mov %rsp,%rbp
0xffffffffa0b1dc64 <osd_trans_commit_cb+4>: push %r15
0xffffffffa0b1dc66 <osd_trans_commit_cb+6>: push %r14
0xffffffffa0b1dc68 <osd_trans_commit_cb+8>: push %r13
0xffffffffa0b1dc6a <osd_trans_commit_cb+10>: push %r12
0xffffffffa0b1dc6c <osd_trans_commit_cb+12>: push %rbx
0xffffffffa0b1dc6d <osd_trans_commit_cb+13>: sub $0x8,%rsp
0xffffffffa0b1dc71 <osd_trans_commit_cb+17>: nopl 0x0(%rax,%rax,1)
0xffffffffa0b1dc76 <osd_trans_commit_cb+22>: cmp $0xfffffffffffff000,%rsi
0xffffffffa0b1dc7d <osd_trans_commit_cb+29>: mov %edx,%r12d
0xffffffffa0b1dc80 <osd_trans_commit_cb+32>: ja 0xffffffffa0b1de32
0xffffffffa0b1dc86 <osd_trans_commit_cb+38>: test %rsi,%rsi
0xffffffffa0b1dc89 <osd_trans_commit_cb+41>: je 0xffffffffa0b1de32
0xffffffffa0b1dc8f <osd_trans_commit_cb+47>: lea -0x58(%rsi),%rbx
0xffffffffa0b1dc93 <osd_trans_commit_cb+51>: cmpq $0x0,0x50(%rbx)
0xffffffffa0b1dc98 <osd_trans_commit_cb+56>: mov (%rbx),%r13
0xffffffffa0b1dc9b <osd_trans_commit_cb+59>: jne 0xffffffffa0b1ddf6
0xffffffffa0b1dca1 <osd_trans_commit_cb+65>: test %r12d,%r12d
0xffffffffa0b1dca4 <osd_trans_commit_cb+68>: jne 0xffffffffa0b1dd90
0xffffffffa0b1dcaa <osd_trans_commit_cb+74>: mov %rbx,%rdi
0xffffffffa0b1dcad <osd_trans_commit_cb+77>: lea 0x70(%rbx),%r15
0xffffffffa0b1dcb1 <osd_trans_commit_cb+81>: callq 0xffffffffa050bfa0 <dt_txn_hook_commit>
0xffffffffa0b1dcb6 <osd_trans_commit_cb+86>: mov 0x70(%rbx),%rax
0xffffffffa0b1dcba <osd_trans_commit_cb+90>: cmp %r15,%rax
0xffffffffa0b1dcbd <osd_trans_commit_cb+93>: mov (%rax),%r14
0xffffffffa0b1dcc0 <osd_trans_commit_cb+96>: jne 0xffffffffa0b1dccb
0xffffffffa0b1dcc2 <osd_trans_commit_cb+98>: jmp 0xffffffffa0b1dce4
0xffffffffa0b1dcc4 <osd_trans_commit_cb+100>: nopl 0x0(%rax)
0xffffffffa0b1dcc8 <osd_trans_commit_cb+104>: mov %rdx,%r14
0xffffffffa0b1dccb <osd_trans_commit_cb+107>: mov %rax,%rdx
0xffffffffa0b1dcce <osd_trans_commit_cb+110>: xor %edi,%edi
0xffffffffa0b1dcd0 <osd_trans_commit_cb+112>: mov %r12d,%ecx
0xffffffffa0b1dcd3 <osd_trans_commit_cb+115>: mov %rbx,%rsi

0xffffffffa0b1dcd6 <osd_trans_commit_cb+118>: callq *0x10(%rax)
0xffffffffa0b1dcd9 <osd_trans_commit_cb+121>: cmp %r15,%r14 ===========> compare the next entry with the cb list head
0xffffffffa0b1dcdc <osd_trans_commit_cb+124>: mov (%r14),%rdx
0xffffffffa0b1dcdf <osd_trans_commit_cb+127>: mov %r14,%rax
0xffffffffa0b1dce2 <osd_trans_commit_cb+130>: jne 0xffffffffa0b1dcc8
0xffffffffa0b1dce4 <osd_trans_commit_cb+132>: lea 0x10(%rbx),%r12
0xffffffffa0b1dce8 <osd_trans_commit_cb+136>: mov %r13,%rdi
0xffffffffa0b1dceb <osd_trans_commit_cb+139>: callq 0xffffffffa0508530 <lu_device_put>

It looks like the callback list was corrupted.

Comment by Zhenyu Xu [ 11/Apr/12 ]
0000000000000c60 <osd_trans_commit_cb>:
osd_trans_commit_cb():
BUILD/BUILD/lustre-2.2.50/lustre/osd-ldiskfs/osd_handler.c:546
     c60:       55                      push   %rbp
     c61:       48 89 e5                mov    %rsp,%rbp
     c64:       41 57                   push   %r15
     c66:       41 56                   push   %r14
     c68:       41 55                   push   %r13
     c6a:       41 54                   push   %r12
     c6c:       53                      push   %rbx
     c6d:       48 83 ec 08             sub    $0x8,%rsp
     c71:       e8 00 00 00 00          callq  c76 <osd_trans_commit_cb+0x16>
__container_of():
BUILD/BUILD/lustre-2.2.50/libcfs/include/libcfs/libcfs.h:321
     c76:       48 81 fe 00 f0 ff ff    cmp    $0xfffffffffffff000,%rsi
osd_trans_commit_cb():
BUILD/BUILD/lustre-2.2.50/lustre/osd-ldiskfs/osd_handler.c:546
     c7d:       41 89 d4                mov    %edx,%r12d
__container_of():
BUILD/BUILD/lustre-2.2.50/libcfs/include/libcfs/libcfs.h:321
     c80:       0f 87 ac 01 00 00       ja     e32 <osd_trans_commit_cb+0x1d2>
     c86:       48 85 f6                test   %rsi,%rsi
     c89:       0f 84 a3 01 00 00       je     e32 <osd_trans_commit_cb+0x1d2>
BUILD/BUILD/lustre-2.2.50/libcfs/include/libcfs/libcfs.h:324
     c8f:       48 8d 5e a8             lea    -0x58(%rsi),%rbx
osd_trans_commit_cb():
BUILD/BUILD/lustre-2.2.50/lustre/osd-ldiskfs/osd_handler.c:552
     c93:       48 83 7b 50 00          cmpq   $0x0,0x50(%rbx)
BUILD/BUILD/lustre-2.2.50/lustre/osd-ldiskfs/osd_handler.c:549
     c98:       4c 8b 2b                mov    (%rbx),%r13
BUILD/BUILD/lustre-2.2.50/lustre/osd-ldiskfs/osd_handler.c:552
     c9b:       0f 85 55 01 00 00       jne    df6 <osd_trans_commit_cb+0x196>
BUILD/BUILD/lustre-2.2.50/lustre/osd-ldiskfs/osd_handler.c:554
     ca1:       45 85 e4                test   %r12d,%r12d
     ca4:       0f 85 e6 00 00 00       jne    d90 <osd_trans_commit_cb+0x130>
BUILD/BUILD/lustre-2.2.50/lustre/osd-ldiskfs/osd_handler.c:557
     caa:       48 89 df                mov    %rbx,%rdi
BUILD/BUILD/lustre-2.2.50/lustre/osd-ldiskfs/osd_handler.c:560
     cad:       4c 8d 7b 70             lea    0x70(%rbx),%r15
BUILD/BUILD/lustre-2.2.50/lustre/osd-ldiskfs/osd_handler.c:557
     cb1:       e8 00 00 00 00          callq  cb6 <osd_trans_commit_cb+0x56>
BUILD/BUILD/lustre-2.2.50/lustre/osd-ldiskfs/osd_handler.c:560
     cb6:       48 8b 43 70             mov    0x70(%rbx),%rax
     cba:       4c 39 f8                cmp    %r15,%rax
     cbd:       4c 8b 30                mov    (%rax),%r14
     cc0:       75 09                   jne    ccb <osd_trans_commit_cb+0x6b>
     cc2:       eb 20                   jmp    ce4 <osd_trans_commit_cb+0x84>
     cc4:       0f 1f 40 00             nopl   0x0(%rax)
     cc8:       49 89 d6                mov    %rdx,%r14
BUILD/BUILD/lustre-2.2.50/lustre/osd-ldiskfs/osd_handler.c:561
     ccb:       48 89 c2                mov    %rax,%rdx
     cce:       31 ff                   xor    %edi,%edi
     cd0:       44 89 e1                mov    %r12d,%ecx
     cd3:       48 89 de                mov    %rbx,%rsi
     cd6:       ff 50 10                callq  *0x10(%rax)
BUILD/BUILD/lustre-2.2.50/lustre/osd-ldiskfs/osd_handler.c:560
     cd9:       4d 39 fe                cmp    %r15,%r14                =======> <osd_trans_commit_cb+0x79>
     cdc:       49 8b 16                mov    (%r14),%rdx
     cdf:       4c 89 f0                mov    %r14,%rax
     ce2:       75 e4                   jne    cc8 <osd_trans_commit_cb+0x68>
BUILD/BUILD/lustre-2.2.50/lustre/osd-ldiskfs/osd_handler.c:567
     ce4:       4c 8d 63 10             lea    0x10(%rbx),%r12
BUILD/BUILD/lustre-2.2.50/lustre/osd-ldiskfs/osd_handler.c:564
     ce8:       4c 89 ef                mov    %r13,%rdi
     ceb:       e8 00 00 00 00          callq  cf0 <osd_trans_commit_cb+0x90>
...

Comment by Zhenyu Xu [ 11/Apr/12 ]

Tappro,

Does it relate to LU-795 (http://review.whamcloud.com/1621)?

Comment by Mikhail Pershin [ 11/Apr/12 ]

Yes, this code was added with LU-795. But the reason for the failure is not yet clear to me, and I don't see any logs available. Do we have any?

Comment by Zhenyu Xu [ 11/Apr/12 ]

Unfortunately, the MDS panicked due to the bad memory access, so no logs were collected.

Comment by Niu Yawei (Inactive) [ 13/Apr/12 ]

Hi, tappro

In the patch for LU-795, a sync transaction is incorrectly changed to async in mdt_txn_stop_cb():

        /* if can't add callback, do sync write */
        txn->th_sync = !!lut_last_commit_cb_add(txn, &mdt->mdt_lut,
                                                mti->mti_exp,
                                                mti->mti_transno);

I think we need to open a new ticket for this defect.

Comment by Mikhail Pershin [ 13/Apr/12 ]

Yes, '|=' should be used there, so that an existing sync request is not dropped and all possible sync cases are accumulated into the flag.

Comment by Zhenyu Xu [ 10/May/12 ]

The 'txn->th_sync |= !!lut_last_commit_cb_add' patch (http://review.whamcloud.com/2530) has been landed to master.

Comment by Peter Jones [ 10/May/12 ]

OK, then let's mark this as resolved and reopen it if it is seen with code from after that April 29th landing under LU-911.

Comment by Peter Jones [ 11/May/12 ]

Hmm. I just realized that sanity is still failing for the 2.2.52 tag which contains the fix you mentioned. Are we now experiencing a different failure?

Comment by Mikhail Pershin [ 11/May/12 ]

Peter, the fix you mentioned is not for the root cause but for a side issue. LU-1235 is not yet fixed.

Comment by Peter Jones [ 11/May/12 ]

OK, so what are the next steps for the central issue?

Comment by Andreas Dilger [ 31/May/12 ]

This still failed 3 times in the last 2 weeks (about 7% of runs according to Maloo):

https://maloo.whamcloud.com/sub_tests/bd350a56-a1fa-11e1-abdc-52540035b04c
https://maloo.whamcloud.com/sub_tests/715c8542-a6d6-11e1-90f2-52540035b04c
https://maloo.whamcloud.com/sub_tests/a2ffdcb2-a716-11e1-acdf-52540035b04c

I've resubmitted the build of the original debugging patch submitted in March.

Comment by Sarah Liu [ 11/Jun/12 ]

another failure: https://maloo.whamcloud.com/test_sets/353c939e-b1db-11e1-bb61-52540035b04c

Comment by Andreas Dilger [ 22/Jun/12 ]

This is being hit in a reported 27% of test runs:

https://maloo.whamcloud.com/test_sets/19a4c974-bbf1-11e1-95bf-52540035b04c

Comment by Andreas Dilger [ 28/Jun/12 ]

Bobijam, any progress on this bug?

Comment by Zhenyu Xu [ 28/Jun/12 ]

Not yet; another debugging patch is in the review phase (http://review.whamcloud.com/#change,2394).

Comment by Peter Jones [ 30/Jul/12 ]

The latest diagnostic patch has landed for the next tag.

Comment by Sarah Liu [ 07/Aug/12 ]

In the latest tag, 2.2.92, subtest 103 passed on both RHEL5 and RHEL6 clients.

https://maloo.whamcloud.com/test_sets/64843e64-e0d3-11e1-a388-52540035b04c
https://maloo.whamcloud.com/test_sets/3f8f3644-dbc0-11e1-81e3-52540035b04c

Comment by Peter Jones [ 07/Aug/12 ]

OK, then let's drop this as a blocker unless it recurs and we are able to gather diagnostic information from the logs.

Comment by Jian Yu [ 13/Aug/12 ]

Lustre Clients: v2_1_3_RC1
Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/113/
Distro/Arch: RHEL6.3/x86_64 (kernel version: 2.6.32-279.2.1.el6)

Lustre Servers: 2.2.0
Lustre Build: http://build.whamcloud.com/job/lustre-b2_2/17/
Distro/Arch: RHEL6.3/x86_64 (kernel version: 2.6.32-220.4.2.el6)

The same issue occurred: https://maloo.whamcloud.com/test_sets/bc40a18e-e384-11e1-b6d3-52540035b04c

Comment by Zhenyu Xu [ 13/Aug/12 ]

I'll port the debugging patch to b2_2

Comment by Zhenyu Xu [ 13/Aug/12 ]

The b2_2 port of the patch is tracked at http://review.whamcloud.com/3615

Comment by Sarah Liu [ 27/Sep/12 ]

server: 2.2.0 RHEL6
client: 2.3-RC1 RHEL6

https://maloo.whamcloud.com/test_sets/4bddaaee-0806-11e2-b8a8-52540035b04c

Comment by Jian Yu [ 08/Oct/12 ]

Lustre Client Build: http://build.whamcloud.com/job/lustre-b2_3/28
Lustre Server Build: http://build.whamcloud.com/job/lustre-b2_2/17
Distro/Arch: RHEL6.3/x86_64

The same issue occurred: https://maloo.whamcloud.com/test_sets/e151ca0a-0e2e-11e2-91a3-52540035b04c

As per Peter, we don't have any plans to land anything to b2_2 at this time. We can add Lustre version check code to the b2_3 and master test suites to skip the test, as we did in LU-1912.

Comment by Ann Koehler (Inactive) [ 24/Jul/14 ]

Just in case this helps anyone else: we hit the MDS panic in jbd2/dm-0-8 reported above with b2_2. We traced the root cause to LU-1823 (slab corruption).

Generated at Sat Feb 10 01:14:48 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.