[LU-5612] typo in osd_declare_write() Created: 12/Sep/14 Updated: 28/Sep/14 Resolved: 18/Sep/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.3 |
| Fix Version/s: | Lustre 2.5.4 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Niu Yawei (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 15702 | ||||
| Description |
|
This was introduced when porting commit 3902ff4c54925b2f1fcb732a32ed7ee5428e9f77. Some bits in osd_declare_write() were lost during porting. |
| Comments |
| Comment by Niu Yawei (Inactive) [ 12/Sep/14 ] |
|
patch for b2_5: http://review.whamcloud.com/11889 |
| Comment by Peter Jones [ 18/Sep/14 ] |
|
Landed for 2.5.4. Not needed on master. |
| Comment by Ryan Haasken [ 25/Sep/14 ] |
|
Niu, could this typo/omission be the cause for LU-5250? Did you see some symptoms that caused you to open this bug? |
| Comment by Niu Yawei (Inactive) [ 26/Sep/14 ] |
|
| Comment by Ryan Haasken [ 26/Sep/14 ] |
|
Thanks for pointing those other tickets out. Our stack trace is slightly different than those listed in

    [exception RIP: jbd2_journal_dirty_metadata+268]
    RIP: ffffffffa02cc86c  RSP: ffff88087be375e0  RFLAGS: 00010246
    RAX: ffff8806485b3bc0  RBX: ffff8806f520d588  RCX: ffff88084223bcf8
    RDX: 0000000000000000  RSI: ffff88084223bcf8  RDI: 0000000000000000
    RBP: ffff88087be37600   R8: f010000000000000   R9: f79fde5390e73e02
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff8801eb760748
    R13: ffff88084223bcf8  R14: ffff88086b22d800  R15: 0000000000000c00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #4 [ffff88087be37608] __ldiskfs_handle_dirty_metadata at ffffffffa02ee0bb [ldiskfs]
 #5 [ffff88087be37648] ldiskfs_quota_write at ffffffffa0324b95 [ldiskfs]
 #6 [ffff88087be376b8] write_blk at ffffffff811e44ae
 #7 [ffff88087be376c8] remove_tree at ffffffff811e4da1
 #8 [ffff88087be37738] remove_tree at ffffffff811e4bf8
 #9 [ffff88087be377a8] remove_tree at ffffffff811e4bf8
#10 [ffff88087be37818] qtree_delete_dquot at ffffffff811e4fe3
#11 [ffff88087be37838] qtree_release_dquot at ffffffff811e501f
#12 [ffff88087be37848] v2_release_dquot at ffffffff811e3cc0
#13 [ffff88087be37858] dquot_release at ffffffff811df8e5
#14 [ffff88087be37898] ldiskfs_release_dquot at ffffffffa03235be [ldiskfs]
#15 [ffff88087be378b8] dqput at ffffffff811e0489
#16 [ffff88087be378e8] dquot_transfer at ffffffff811e3253
#17 [ffff88087be379c8] vfs_dq_transfer at ffffffff811dfc0c
#18 [ffff88087be379e8] osd_quota_transfer at ffffffffa0ba98a5 [osd_ldiskfs]
#19 [ffff88087be37a58] osd_attr_set at ffffffffa0bbcb8a [osd_ldiskfs]
#20 [ffff88087be37ab8] dt_attr_set.clone.2 at ffffffffa083a969 [ofd]
#21 [ffff88087be37ac8] ofd_attr_set at ffffffffa083e472 [ofd]
#22 [ffff88087be37b28] ofd_setattr at ffffffffa082fe68 [ofd]
#23 [ffff88087be37bb8] ost_setattr at ffffffffa06461fb [ost]
#24 [ffff88087be37c18] ost_handle at ffffffffa06491fd [ost]
#25 [ffff88087be37d68] ptlrpc_server_handle_request at ffffffffa06df4d5 [ptlrpc]
#26 [ffff88087be37e48] ptlrpc_main at ffffffffa06e083d [ptlrpc]
#27 [ffff88087be37ee8] kthread at ffffffff81096136
#28 [ffff88087be37f48] kernel_thread at ffffffff8100c0ca
 #0 [ffff88087be37400] die at ffffffff8100f18b

This is very similar to the stack traces posted by Mahmoud on August 4th in

...
 [<ffffffff811e029c>] dqget+0x2ac/0x390
 [<ffffffff811e1b86>] dquot_transfer+0x116/0x620
 [<ffffffff811e09ab>] ? dquot_initialize+0x1fb/0x240
 [<ffffffffa0be0558>] ? __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
 [<ffffffff811de4bc>] vfs_dq_transfer+0x6c/0xd0
...

Is this still the same bug? Why are we hitting the assertion in dqput rather than dqget? |
| Comment by Ryan Haasken [ 26/Sep/14 ] |
|
Now that I look more closely at our own stack traces, it turns out we got a stack trace including dqget and do_insert_tree when we attempted to restart the file system after the crash. Niu, can you confirm that the fix which landed for That stack trace is not for the bug which was caused by this typo, is it? |
| Comment by Niu Yawei (Inactive) [ 28/Sep/14 ] |
|
Hi Ryan, I think it's not caused by this typo. However, this typo can definitely cause insufficient credits in certain cases, so you'd better have this fix as well. The stack trace (the crash in the dqput path) is indeed different from