Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.5.4
    • Lustre 2.5.3
    • None
    • 3
    • 15702

    Description

      This was introduced when porting commit 3902ff4c54925b2f1fcb732a32ed7ee5428e9f77.

      Some bits in osd_declare_write() were lost during porting.

      Attachments

        Activity

          [LU-5612] typo in osd_declare_write()
          haasken Ryan Haasken added a comment -

          Now that I look more closely at our own stack traces, it turns out we got a stack trace including dqget and do_insert_tree when we attempted to restart the file system after the crash. Niu, can you confirm that the fix which landed for LU-5040 fixes the bug shown in this stack trace:

          https://jira.hpdd.intel.com/browse/LU-5040?focusedCommentId=90730&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-90730

          That stack trace is not for the bug which was caused by this typo, is it?

          haasken Ryan Haasken added a comment -

          Thanks for pointing those other tickets out. Our stack trace is slightly different than those listed in LU-5040. Here it is, copied from LU-5250:

          [exception RIP: jbd2_journal_dirty_metadata+268]
          RIP: ffffffffa02cc86c RSP: ffff88087be375e0 RFLAGS: 00010246
          RAX: ffff8806485b3bc0 RBX: ffff8806f520d588 RCX: ffff88084223bcf8
          RDX: 0000000000000000 RSI: ffff88084223bcf8 RDI: 0000000000000000
          RBP: ffff88087be37600 R8: f010000000000000 R9: f79fde5390e73e02
          R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801eb760748
          R13: ffff88084223bcf8 R14: ffff88086b22d800 R15: 0000000000000c00
          ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
          #4 [ffff88087be37608] __ldiskfs_handle_dirty_metadata at ffffffffa02ee0bb [ldiskfs]
          #5 [ffff88087be37648] ldiskfs_quota_write at ffffffffa0324b95 [ldiskfs]
          #6 [ffff88087be376b8] write_blk at ffffffff811e44ae
          #7 [ffff88087be376c8] remove_tree at ffffffff811e4da1
          #8 [ffff88087be37738] remove_tree at ffffffff811e4bf8
          #9 [ffff88087be377a8] remove_tree at ffffffff811e4bf8
          #10 [ffff88087be37818] qtree_delete_dquot at ffffffff811e4fe3
          #11 [ffff88087be37838] qtree_release_dquot at ffffffff811e501f
          #12 [ffff88087be37848] v2_release_dquot at ffffffff811e3cc0
          #13 [ffff88087be37858] dquot_release at ffffffff811df8e5
          #14 [ffff88087be37898] ldiskfs_release_dquot at ffffffffa03235be [ldiskfs]
          #15 [ffff88087be378b8] dqput at ffffffff811e0489
          #16 [ffff88087be378e8] dquot_transfer at ffffffff811e3253
          #17 [ffff88087be379c8] vfs_dq_transfer at ffffffff811dfc0c
          #18 [ffff88087be379e8] osd_quota_transfer at ffffffffa0ba98a5 [osd_ldiskfs]
          #19 [ffff88087be37a58] osd_attr_set at ffffffffa0bbcb8a [osd_ldiskfs]
          #20 [ffff88087be37ab8] dt_attr_set.clone.2 at ffffffffa083a969 [ofd]
          #21 [ffff88087be37ac8] ofd_attr_set at ffffffffa083e472 [ofd]
          #22 [ffff88087be37b28] ofd_setattr at ffffffffa082fe68 [ofd]
          #23 [ffff88087be37bb8] ost_setattr at ffffffffa06461fb [ost]
          #24 [ffff88087be37c18] ost_handle at ffffffffa06491fd [ost]
          #25 [ffff88087be37d68] ptlrpc_server_handle_request at ffffffffa06df4d5 [ptlrpc]
          #26 [ffff88087be37e48] ptlrpc_main at ffffffffa06e083d [ptlrpc]
          #27 [ffff88087be37ee8] kthread at ffffffff81096136
          #28 [ffff88087be37f48] kernel_thread at ffffffff8100c0ca
          #0 [ffff88087be37400] die at ffffffff8100f18b
          

          This is very similar to the stack traces posted by Mahmoud on August 4th in LU-5040, but those stack traces are in dqget rather than dqput.

          ...
          [<ffffffff811e029c>] dqget+0x2ac/0x390
          [<ffffffff811e1b86>] dquot_transfer+0x116/0x620
          [<ffffffff811e09ab>] ? dquot_initialize+0x1fb/0x240
          [<ffffffffa0be0558>] ? __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
          [<ffffffff811de4bc>] vfs_dq_transfer+0x6c/0xd0
          ...
          

          Is this still the same bug? Why are we hitting the assertion in dqput rather than dqget?

          niu Niu Yawei (Inactive) added a comment -

          "Niu, could this typo/omission be the cause for LU-5250? Did you see some symptoms that caused you to open this bug?"

          LU-5250 is probably a duplicate of LU-5040. This defect (LU-5612) was found while testing the patch for LU-5040.

          haasken Ryan Haasken added a comment -

          Niu, could this typo/omission be the cause for LU-5250? Did you see some symptoms that caused you to open this bug?

          pjones Peter Jones added a comment -

          Landed for 2.5.4. Not needed on master.

          niu Niu Yawei (Inactive) added a comment -

          Patch for b2_5: http://review.whamcloud.com/11889

          People

            niu Niu Yawei (Inactive)
            niu Niu Yawei (Inactive)
            Votes: 0
            Watchers: 5
