[LU-4807] repeating DQACQ failed with -37 Created: 24/Mar/14  Updated: 25/Mar/14  Resolved: 25/Mar/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.1
Fix Version/s: None

Type: Bug
Priority: Critical
Reporter: Shuichi Ihara (Inactive)
Assignee: Niu Yawei (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None
Environment: Lustre-2.4.1


Issue Links:
Duplicate
duplicates LU-4249 exception RIP: lqe64_hash_keycmp+12 Resolved
Severity: 3
Rank (Obsolete): 13222

 Description   
Mar 25 00:29:51 ddnoss4 kernel: LustreError: 4503:0:(qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -37, flags:0x2 qsd:home2-OST0026 qtype:grp id:3303 enforced:1 granted:7364460 pending:0 waiting:0 req:1 usage:7165364 qunit:4194304 qtune:524288 edquot:0
Mar 25 00:39:51 ddnoss4 kernel: LustreError: 4503:0:(qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -37, flags:0x2 qsd:home2-OST0026 qtype:grp id:3303 enforced:1 granted:7364460 pending:0 waiting:0 req:1 usage:7165364 qunit:4194304 qtune:524288 edquot:0
Mar 25 00:49:51 ddnoss4 kernel: LustreError: 4503:0:(qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -37, flags:0x2 qsd:home2-OST0026 qtype:grp id:3303 enforced:1 granted:7364460 pending:0 waiting:0 req:1 usage:7165364 qunit:4194304 qtune:524288 edquot:0
Mar 25 00:59:51 ddnoss4 kernel: LustreError: 4503:0:(qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -37, flags:0x2 qsd:home2-OST0026 qtype:grp id:3303 enforced:1 granted:7364460 pending:0 waiting:0 req:1 usage:7165364 qunit:4194304 qtune:524288 edquot:0
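For triage, the repeated message can be broken into its fields. The following is a minimal sketch, not a Lustre tool: the helper name `parse_dqacq` and the small errno table are made up for illustration, and the table assumes Linux errno numbering, where 37 is ENOLCK.

```python
import re

# Assumption: Linux errno numbering, where 37 is ENOLCK ("No locks available").
LINUX_ERRNO = {37: "ENOLCK"}

# Matches the return code and flags, then captures the trailing key:value pairs.
LINE_RE = re.compile(
    r"DQACQ failed with (?P<rc>-?\d+), flags:(?P<flags>0x[0-9a-fA-F]+) (?P<rest>.*)$"
)


def parse_dqacq(line):
    """Return the fields of one 'DQACQ failed' line as a dict, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    fields = {"rc": int(m.group("rc")), "flags": int(m.group("flags"), 16)}
    for token in m.group("rest").split():
        key, _, value = token.partition(":")
        # Numeric values become ints; names such as the qsd target stay strings.
        fields[key] = int(value) if value.isdigit() else value
    fields["errno_name"] = LINUX_ERRNO.get(-fields["rc"], "unknown")
    return fields


SAMPLE = (
    "Mar 25 00:29:51 ddnoss4 kernel: LustreError: "
    "4503:0:(qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -37, "
    "flags:0x2 qsd:home2-OST0026 qtype:grp id:3303 enforced:1 granted:7364460 "
    "pending:0 waiting:0 req:1 usage:7165364 qunit:4194304 qtune:524288 edquot:0"
)

if __name__ == "__main__":
    info = parse_dqacq(SAMPLE)
    print(info["qsd"], info["qtype"], info["id"], info["errno_name"])
    print("granted - usage =", info["granted"] - info["usage"])
```

All four messages carry the same group ID, target, and counters; only the timestamp changes, at ten-minute intervals.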

Eventually, the OSS crashed. Here are the console messages from the crash.

<4>Pid: 179, comm: kswapd1 Not tainted 2.6.32-358.18.1.el6_lustre.x86_64 #1 Dell Inc. PowerEdge R620/0PXXHP
<4>RIP: 0010:[<ffffffffa0d45f2c>]  [<ffffffffa0d45f2c>] lqe64_hash_keycmp+0xc/0x20 [lquota]
<4>RSP: 0018:ffff880821b9d940  EFLAGS: 00010206
<4>RAX: 0000000000000c7b RBX: ffff88048dc20c80 RCX: 0000000000000000
<4>RDX: 0000000000000000 RSI: 5a5a5a5a5a5a5a5a RDI: ffff880473a14f28
<4>RBP: ffff880821b9d940 R08: 0000000000000003 R09: 0000000000000001
<4>R10: 0000000000000000 R11: 0000000000000000 R12: ffff880821b9d9d0
<4>R13: ffff880473a14f28 R14: 0000000000000000 R15: 5a5a5a5a5a5a5a5a
<4>FS:  0000000000000000(0000) GS:ffff88084c400000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 00000034bb673e10 CR3: 000000100a428000 CR4: 00000000000407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process kswapd1 (pid: 179, threadinfo ffff880821b9c000, task ffff880821422040)
<4>Stack:
<4> ffff880821b9d990 ffffffffa0577945 0000000300000000 0000000000000000
<4><d> ffff880821b9d9b0 ffff880821b9d9d0 ffff880473a14f28 ffff88048dc20c80
<4><d> 0000000000000001 ffffffffa0714e00 ffff880821b9d9c0 ffffffffa0577ac7
<4>Call Trace:
<4> [<ffffffffa0577945>] cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
<4> [<ffffffffa0577ac7>] cfs_hash_dual_bd_lookup_locked+0x37/0x70 [libcfs]
<4> [<ffffffffa0578cf4>] cfs_hash_lookup+0x54/0xa0 [libcfs]
<4> [<ffffffffa0d464f7>] lqe_locate+0x47/0x850 [lquota]
<4> [<ffffffffa0d5804b>] qsd_op_adjust+0x2cb/0x580 [lquota]
<4> [<ffffffffa0db77f1>] osd_object_delete+0x231/0x2f0 [osd_ldiskfs]
<4> [<ffffffffa0687829>] lu_object_free+0x89/0x1a0 [obdclass]
<4> [<ffffffffa05774d2>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
<4> [<ffffffffa0688b3f>] lu_site_purge+0x2af/0x4a0 [obdclass]
<4> [<ffffffffa0688e06>] lu_cache_shrink+0xd6/0x280 [obdclass]
<4> [<ffffffff81131fca>] shrink_slab+0x12a/0x1a0
<4> [<ffffffff811351ba>] balance_pgdat+0x59a/0x820
<4> [<ffffffff81135574>] kswapd+0x134/0x3c0
<4> [<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
<4> [<ffffffff81135440>] ? kswapd+0x0/0x3c0
<4> [<ffffffff81096a36>] kthread+0x96/0xa0
<4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
<4> [<ffffffff810969a0>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
<4>Code: 29 c8 21 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 c9 48 8d 47 10 c3 90 55 48 89 e5 0f 1f 44 00 00 48 8b 07 <48> 39 46 10 c9 0f 94 c0 0f b6 c0 c3 0f 1f 84 00 00 00 00 00 55 
<1>RIP  [<ffffffffa0d45f2c>] lqe64_hash_keycmp+0xc/0x20 [lquota]
<4> RSP <ffff880821b9d940>
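In the dump above, RSI and R15 both hold 5a5a5a5a5a5a5a5a. The sketch below scans a register dump for that value. It assumes, without confirmation from this ticket, that 0x5a is the byte written over freed memory, in which case the hash lookup may have been handed a pointer read from an already-freed entry. The function name `poisoned_registers` is hypothetical.

```python
import re

# Assumption: freed memory is filled with the byte 0x5a, so a 64-bit value read
# from a freed object shows up as 5a5a5a5a5a5a5a5a.
POISON = "5a" * 8

# Register name (RAX, RSI, R08, R15, ...) followed by a 16-digit hex value.
REG_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def poisoned_registers(oops_text):
    """Return the names of registers whose value equals the poison pattern."""
    return [name for name, value in REG_RE.findall(oops_text) if value == POISON]


# General-purpose register lines copied from the console output in this ticket.
DUMP = """\
<4>RAX: 0000000000000c7b RBX: ffff88048dc20c80 RCX: 0000000000000000
<4>RDX: 0000000000000000 RSI: 5a5a5a5a5a5a5a5a RDI: ffff880473a14f28
<4>RBP: ffff880821b9d940 R08: 0000000000000003 R09: 0000000000000001
<4>R10: 0000000000000000 R11: 0000000000000000 R12: ffff880821b9d9d0
<4>R13: ffff880473a14f28 R14: 0000000000000000 R15: 5a5a5a5a5a5a5a5a
"""

if __name__ == "__main__":
    print(poisoned_registers(DUMP))
```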

This looks very similar to LU-4249.



 Comments   
Comment by Niu Yawei (Inactive) [ 25/Mar/14 ]

Duplicate of LU-4249.
