Details

    • Bug
    • Resolution: Cannot Reproduce
    • Critical
    • None
    • Lustre 2.5.2
    • None
    • 3
    • 15792

    Description

      We had an MDS crash on our Lustre cluster after we updated from 2.4 to 2.5.
      We have built a new Lustre client for the new kernel on the client node.
      This is quite similar to LU-5040.

          Activity

            [LU-5640] mds crash after update

            niu Niu Yawei (Inactive) added a comment -

            Looks like the transaction handle ran out of credits in close trying to get quota for a setattr (likely updating atime on close). That is a bit strange, since we shouldn't need to get extra credits for quota when modifying just the timestamp.

            We currently call dquot_initialize() for every kind of setattr (regardless of whether the uid/gid will be changed); this will be fixed by LU-5040.
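To illustrate the idea in the comment above, here is a minimal user-space sketch of gating quota initialization on an ownership change. It is not the LU-5040 patch and not Lustre code: the flag names and values, `needs_quota_init`, and `attr_set` are all hypothetical and exist only for this example.

```python
# Hypothetical attribute-mask bits for illustration only;
# these are not the real Lustre or kernel flag values.
ATTR_UID = 1 << 0
ATTR_GID = 1 << 1
ATTR_ATIME = 1 << 2
ATTR_MTIME = 1 << 3


def needs_quota_init(valid_mask: int) -> bool:
    """Return True only when the setattr changes ownership (uid or gid)."""
    return bool(valid_mask & (ATTR_UID | ATTR_GID))


def attr_set(valid_mask: int, quota_init) -> None:
    """Toy setattr: call quota_init() only for ownership changes."""
    if needs_quota_init(valid_mask):
        quota_init()
    # The attribute update itself would happen here.
```

With a guard like this, a timestamp-only setattr (such as an atime update on close) would never reach the quota code path at all.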

            adilger Andreas Dilger added a comment -

            Looks like the transaction handle ran out of credits in close trying to get quota for a setattr (likely updating atime on close). That is a bit strange, since we shouldn't need to get extra credits for quota when modifying just the timestamp.

            Sep 16 23:13:09 fimm-mds1 kernel: kernel BUG at fs/jbd2/transaction.c:1033!
            Sep 16 23:13:09 fimm-mds1 kernel: invalid opcode: 0000 [#1] SMP
            Sep 16 23:13:09 fimm-mds1 kernel: Pid: 5418, comm: mdt_rdpg03_002 Not tainted 2.6.32-431.17.1.el6_lustre.x86_64 #1 Dell Inc. RIP: 0010:[<ffffffffa00688ad>]  [<ffffffffa00688ad>] jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
            __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
            ldiskfs_quota_write+0x165/0x210 [ldiskfs]
            v2_write_file_info+0xa1/0xe0
            dquot_acquire+0x138/0x140
            ldiskfs_acquire_dquot+0x66/0xb0 [ldiskfs]
            dqget+0x2ac/0x390
            dquot_initialize+0x98/0x240
            ldiskfs_dquot_initialize+0x83/0xd0 [ldiskfs]
            osd_attr_set+0x12f/0x540 [osd_ldiskfs]
            lod_attr_set+0x12b/0x450 [lod]
            mdd_attr_set_internal+0x151/0x230 [mdd]
            mdd_attr_set+0x117a/0x1470 [mdd]
            mdt_mfd_close+0x7ac/0x1bc0 [mdt]
            mdt_close+0x642/0xa80 [mdt]
            mdt_handle_common+0x52a/0x1470 [mdt]
            mds_readpage_handle+0x15/0x20 [mdt]
            ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
            ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
            ptlrpc_main+0xaed/0x1740 [ptlrpc]

            This might be fixed with the newer quota patches.
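As a rough illustration of the "ran out of credits" diagnosis, the following toy model treats a journal handle as a fixed budget of metadata-block credits. It is a simplified sketch, not jbd2 or ldiskfs code: `Handle`, `OutOfCredits`, and `run_transaction` are hypothetical names, and it assumes, as the comment suggests, that the BUG fires when a block is dirtied with no credits left.

```python
class OutOfCredits(Exception):
    """Stands in for the kernel BUG hit when a handle has no credits left."""


class Handle:
    """Toy journal handle: a fixed budget of metadata-block credits."""

    def __init__(self, credits: int) -> None:
        self.credits = credits

    def dirty_metadata(self, what: str) -> None:
        # Each dirtied metadata block consumes one reserved credit.
        if self.credits <= 0:
            raise OutOfCredits(f"no credits left while dirtying {what}")
        self.credits -= 1


def run_transaction(credits: int, blocks: list) -> int:
    """Dirty each block in turn and return the credits left over."""
    handle = Handle(credits)
    for block in blocks:
        handle.dirty_metadata(block)
    return handle.credits
```

In this model, a transaction that reserved one credit for the inode update succeeds on its own, but fails as soon as an unbudgeted quota-file write is added to the same handle. The real ordering of writes in the close path may differ; the sketch only shows the budget overflow.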

            People

              wc-triage WC Triage
              sardar saerda (Inactive)
              Votes: 0
              Watchers: 5
