[LU-2812] parallel-scale test_compilebench hung: task ldlm_poold:16196 blocked for more than 120 seconds Created: 14/Feb/13 Updated: 14/Aug/16 Resolved: 14/Aug/16 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Jian Yu | Assignee: | WC Triage |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre Tag: v1_8_9_WC1_RC1 The async journal commit feature and cancel lock before replay feature are enabled: filter->fo_syncjournal = 0; |
||
| Severity: | 3 |
| Rank (Obsolete): | 6814 |
| Description |
|
The async journal commit feature and cancel lock before replay feature are disabled by default on the Lustre b1_8 branch. After enabling them, the parallel-scale compilebench test hung as follows:

== parallel-scale test compilebench: compilebench == 09:24:53 (1360776293)
OPTIONS:
cbench_DIR=/usr/bin
cbench_IDIRS=2
cbench_RUNS=2
client-24vm1
client-24vm2.lab.whamcloud.com
./compilebench -D /mnt/lustre/d0.compilebench -i 2 -r 2 --makej
using working directory /mnt/lustre/d0.compilebench, 2 intial dirs 2 runs
native unpatched native-0 222MB in 522.93 seconds (0.43 MB/s)
native patched native-0 109MB in 424.16 seconds (0.26 MB/s)
native patched compiled native-0 691MB in 69.30 seconds (9.98 MB/s)
create dir kernel-0 222MB in 1110.55 seconds (0.20 MB/s)
create dir kernel-1 222MB in 3047.05 seconds (0.07 MB/s)

Console log on the client node showed that:

09:24:59:Lustre: DEBUG MARKER: == parallel-scale test compilebench: compilebench == 09:24:53 (1360776293)
09:24:59:Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.compilebench -i 2 -r 2 --makej
09:24:59:Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.compilebench -i 2 -r 2 --makej
09:33:01:INFO: task ldlm_poold:16196 blocked for more than 120 seconds.
09:33:01:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
09:33:01:ldlm_poold D 0000000000000000 0 16196 2 0x00000080
09:33:01: ffff88000e4bd9b0 0000000000000046 ffff88000e4bd960 ffffffff810097cc
09:33:01: ffff8800117460b8 0000000000000000 00000000004bd970 ffff880002214200
09:33:01: ffff880037871058 ffff88000e4bdfd8 000000000000fb88 ffff880037871058
09:33:01:Call Trace:
09:33:01: [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
09:33:01: [<ffffffff814e9c50>] ? thread_return+0x4e/0x76e
09:33:01: [<ffffffff814eaac5>] schedule_timeout+0x215/0x2e0
09:33:01: [<ffffffff8105f8ac>] ? try_to_wake_up+0x24c/0x3e0
09:33:01: [<ffffffff814ea743>] wait_for_common+0x123/0x180
09:33:01: [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
09:33:01: [<ffffffff814ea85d>] wait_for_completion+0x1d/0x20
09:33:01: [<ffffffffa04dddcd>] __ldlm_bl_to_thread+0x19d/0x1b0 [ptlrpc]
09:33:01: [<ffffffffa04d672b>] ? ldlm_cli_cancel_local+0xab/0x350 [ptlrpc]
09:33:01: [<ffffffffa04e35b9>] ldlm_bl_to_thread+0x379/0x5f0 [ptlrpc]
09:33:01: [<ffffffffa04d88e1>] ? ldlm_cancel_list+0xf1/0x240 [ptlrpc]
09:33:01: [<ffffffffa04e384e>] ldlm_bl_to_thread_list+0x1e/0xa0 [ptlrpc]
09:33:01: [<ffffffffa04d999a>] ldlm_cancel_lru+0x7a/0x1f0 [ptlrpc]
09:33:01: [<ffffffff814e9c50>] ? thread_return+0x4e/0x76e
09:33:01: [<ffffffffa04ea36c>] ldlm_cli_pool_recalc+0x1fc/0x2a0 [ptlrpc]
09:33:01: [<ffffffff8107d4eb>] ? try_to_del_timer_sync+0x7b/0xe0
09:33:02: [<ffffffffa04ea508>] ldlm_pool_recalc+0xf8/0x130 [ptlrpc]
09:33:02: [<ffffffffa04eb0ec>] ldlm_pools_recalc+0x9c/0x2d0 [ptlrpc]
09:33:02: [<ffffffffa04ec714>] ldlm_pools_thread_main+0xb4/0x2f0 [ptlrpc]
09:33:02: [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
09:33:02: [<ffffffff8100c0ca>] child_rip+0xa/0x20
09:33:02: [<ffffffffa04ec660>] ? ldlm_pools_thread_main+0x0/0x2f0 [ptlrpc]
09:33:02: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

Maloo report: https://maloo.whamcloud.com/test_sets/83638de2-7667-11e2-bc2f-52540035b04c |
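For readers triaging the call trace above, a small script can help separate the verified call chain from stale stack entries. In Linux kernel stack dumps, frames prefixed with `?` are addresses found on the stack that the unwinder could not confirm as part of the active call chain. The following is a minimal sketch, assuming the frame layout seen in this console log; the names `FRAME_RE`, `reliable_frames`, and `sample` are hypothetical helpers and not part of Lustre or its test framework.

```python
import re

# Matches one frame as it appears in this console log, for example:
#   09:33:01: [<ffffffffa04dddcd>] __ldlm_bl_to_thread+0x19d/0x1b0 [ptlrpc]
# Assumption: frames always look like "[<addr>] [? ]func+0xOFF/0xSIZE [module]".
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(log_text):
    """Return (function, module) pairs for frames not marked with '?'.

    The module is None for frames in the core kernel image.
    """
    frames = []
    for match in FRAME_RE.finditer(log_text):
        if match.group("unreliable"):
            continue  # skip entries the unwinder could not verify
        frames.append((match.group("func"), match.group("module")))
    return frames


# A few frames copied from the trace in this ticket.
sample = """\
09:33:01: [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
09:33:01: [<ffffffff814eaac5>] schedule_timeout+0x215/0x2e0
09:33:01: [<ffffffffa04dddcd>] __ldlm_bl_to_thread+0x19d/0x1b0 [ptlrpc]
09:33:01: [<ffffffffa04d672b>] ? ldlm_cli_cancel_local+0xab/0x350 [ptlrpc]
09:33:01: [<ffffffffa04e35b9>] ldlm_bl_to_thread+0x379/0x5f0 [ptlrpc]
"""

if __name__ == "__main__":
    for func, module in reliable_frames(sample):
        print(func, f"[{module}]" if module else "")
```

Applied to the full trace, the verified chain runs from `ldlm_pools_thread_main` through `ldlm_cancel_lru` and `ldlm_bl_to_thread` down to `wait_for_completion`, which is where `ldlm_poold` is blocked.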
| Comments |
| Comment by Jian Yu [ 14/Feb/13 ] |
|
It seems this is related to LU-1376. |
| Comment by Oleg Drokin [ 14/Feb/13 ] |
|
I suspect this is the same problem as |
| Comment by James A Simmons [ 14/Aug/16 ] |
|
Old blocker for unsupported version |