[LU-5640] mds crash after update Created: 18/Sep/14  Updated: 09/Oct/21  Resolved: 09/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.2
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: saerda Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Client: CentOS 6.5 with 2.6.32-431.29.2.el6.x86_64 kernel, lustre-client-modules-2.5.3-2.6.32_431.29.2.el6.x86_64.x86_64, lustre-client-2.5.3-2.6.32_431.29.2.el6.x86_64.x86_64
Server: kernel-2.6.32-431.17.1.el6_lustre.x86_64, lustre-modules-2.5.2-2.6.32_431.17.1.el6_lustre.x86_64.x86_64


Attachments: Text File fimm-mds_crash20140917.txt    
Issue Links:
Related
is related to LU-5040 kernel BUG at fs/jbd2/transaction.c:1033 Resolved
Severity: 3
Rank (Obsolete): 15792

 Description   

We had an MDS crash on our Lustre cluster after we updated from 2.4 to 2.5.
We have built a new Lustre client for the new kernel on the client nodes.
This is quite similar to LU-5040.



 Comments   
Comment by Andreas Dilger [ 19/Sep/14 ]

Looks like the transaction handle ran out of credits in close trying to get quota for a setattr (likely updating atime on close). That is a bit strange, since we shouldn't need to get extra credits for quota when modifying just the timestamp.

Sep 16 23:13:09 fimm-mds1 kernel: kernel BUG at fs/jbd2/transaction.c:1033!
Sep 16 23:13:09 fimm-mds1 kernel: invalid opcode: 0000 [#1] SMP 
Sep 16 23:13:09 fimm-mds1 kernel: Pid: 5418, comm: mdt_rdpg03_002 Not tainted 2.6.32-431.17.1.el6_lustre.x86_64 #1 Dell Inc. RIP: 0010:[<ffffffffa00688ad>]  [<ffffffffa00688ad>] jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
__ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
ldiskfs_quota_write+0x165/0x210 [ldiskfs]
v2_write_file_info+0xa1/0xe0
dquot_acquire+0x138/0x140
ldiskfs_acquire_dquot+0x66/0xb0 [ldiskfs]
dqget+0x2ac/0x390
dquot_initialize+0x98/0x240
ldiskfs_dquot_initialize+0x83/0xd0 [ldiskfs]
osd_attr_set+0x12f/0x540 [osd_ldiskfs]
lod_attr_set+0x12b/0x450 [lod]
mdd_attr_set_internal+0x151/0x230 [mdd]
mdd_attr_set+0x117a/0x1470 [mdd]
mdt_mfd_close+0x7ac/0x1bc0 [mdt]
mdt_close+0x642/0xa80 [mdt]
mdt_handle_common+0x52a/0x1470 [mdt]
mds_readpage_handle+0x15/0x20 [mdt]
ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
ptlrpc_main+0xaed/0x1740 [ptlrpc]

This might be fixed with the newer quota patches.
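The failure described above can be illustrated with a toy model of journal credit accounting. This is a hypothetical simplification, not the jbd2 or ldiskfs API: a handle reserves a fixed number of credits when it starts, each dirtied metadata block consumes one, and running out is the condition under which the real kernel hit the BUG at fs/jbd2/transaction.c:1033. The credit counts used here (one for the inode, two for the quota-file writes) are illustrative assumptions, not measured values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified journal handle. NOT the jbd2 API; it only
 * models how reserved credits are consumed by dirtied metadata. */
struct toy_handle {
    int credits; /* metadata blocks reserved when the handle started */
    int used;    /* metadata blocks dirtied so far */
};

static struct toy_handle toy_start(int credits)
{
    assert(credits > 0);
    struct toy_handle h = { credits, 0 };
    return h;
}

/* Dirtying one metadata block consumes one credit. Returns false where
 * real jbd2 would hit a BUG because the handle has no credits left. */
static bool toy_dirty_metadata(struct toy_handle *h)
{
    if (h->used >= h->credits)
        return false;
    h->used++;
    return true;
}

/* Hypothetical close path: credits are reserved for an inode-only
 * (timestamp) update. If quota setup also runs, its quota-file writes
 * (the dquot_acquire -> v2_write_file_info part of the trace) consume
 * credits that were never reserved. Block counts are illustrative. */
static bool toy_close(bool init_quota)
{
    struct toy_handle h = toy_start(1); /* enough for the inode only */

    if (init_quota) {
        if (!toy_dirty_metadata(&h) || !toy_dirty_metadata(&h))
            return false; /* out of credits: the crash case */
    }
    return toy_dirty_metadata(&h); /* the inode timestamp itself */
}
```

With quota setup skipped, `toy_close(false)` fits in its reservation; with it, `toy_close(true)` runs out of credits on the second quota-file write.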

Comment by Niu Yawei (Inactive) [ 22/Sep/14 ]

We currently call dquot_initialize() for every kind of setattr, regardless of whether the uid/gid will change; this will be fixed by LU-5040.

Generated at Sat Feb 10 01:53:14 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.