Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.13.0
-
3
Description
During an mdtest run from 10 clients against a directory configured with DNE2/DoM, a few clients crashed at the same time with the LBUG shown below.
[403683.467928] LustreError: 11-0: cache1-MDT0000-mdc-ffff89de6e235000: operation mds_connect to node 10.0.10.175@o2ib10 failed: rc = -16
[403709.580276] Lustre: Mounted cache1-client
[403807.677757] LustreError: 192778:0:(osc_lock.c:687:osc_ldlm_weigh_ast()) ASSERTION( dlmlock->l_resource->lr_type == LDLM_EXTENT || ldlm_has_dom(dlmlock) ) failed:
[403807.681557] LustreError: 192778:0:(osc_lock.c:687:osc_ldlm_weigh_ast()) LBUG
[403807.683405] Pid: 192778, comm: mdtest 3.10.0-957.10.1.el7.x86_64 #1 SMP Mon Mar 18 15:06:45 UTC 2019
[403807.683407] Call Trace:
[403807.683419] [<ffffffffc08867cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[403807.683446] [<ffffffffc088687c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[403807.683452] [<ffffffffc0d4b947>] osc_ldlm_weigh_ast+0x377/0x3a0 [osc]
[403807.683474] [<ffffffffc0d9aa91>] mdc_cancel_weight+0xe1/0x130 [mdc]
[403807.683484] [<ffffffffc0bb10d1>] ldlm_cancel_no_wait_policy+0x51/0x80 [ptlrpc]
[403807.683527] [<ffffffffc0bb1118>] ldlm_cancel_aged_no_wait_policy+0x18/0x70 [ptlrpc]
[403807.683546] [<ffffffffc0bb25ba>] ldlm_prepare_lru_list+0x1fa/0x4c0 [ptlrpc]
[403807.683565] [<ffffffffc0bb5f1a>] ldlm_cancel_lru_local+0x1a/0x30 [ptlrpc]
[403807.683584] [<ffffffffc0bb614e>] ldlm_prep_elc_req+0x21e/0x490 [ptlrpc]
[403807.683605] [<ffffffffc0bb63e8>] ldlm_prep_enqueue_req+0x28/0x30 [ptlrpc]
[403807.683624] [<ffffffffc0da9f55>] mdc_intent_getattr_pack.isra.15+0xa5/0x3a0 [mdc]
[403807.683632] [<ffffffffc0dace12>] mdc_enqueue_base+0x532/0x1500 [mdc]
[403807.683640] [<ffffffffc0dae545>] mdc_intent_lock+0x135/0x560 [mdc]
[403807.683649] [<ffffffffc0deb962>] lmv_intent_lock+0x472/0xaf0 [lmv]
[403807.683659] [<ffffffffc0e4760a>] ll_lookup_it+0x3aa/0x1910 [lustre]
[403807.683685] [<ffffffffc0e49fbb>] ll_lookup_nd+0xbb/0x190 [lustre]
[403807.683696] [<ffffffffa9a4c5b3>] lookup_real+0x23/0x60
[403807.683702] [<ffffffffa9a4cfd2>] __lookup_hash+0x42/0x60
[403807.683705] [<ffffffffa9a538bc>] do_unlinkat+0x14c/0x2d0
[403807.683710] [<ffffffffa9a549d6>] SyS_unlink+0x16/0x20
[403807.683715] [<ffffffffa9f75ddb>] system_call_fastpath+0x22/0x27
[403807.683722] [<ffffffffffffffff>] 0xffffffffffffffff
[403807.683756] Kernel panic - not syncing: LBUG
[403807.685577] CPU: 7 PID: 192778 Comm: mdtest Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.10.1.el7.x86_64 #1
[403807.689195] Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS SE5C610.86B.01.01.0027.071020182329 07/10/2018
[403807.691049] Call Trace:
[403807.692858] [<ffffffffa9f62e41>] dump_stack+0x19/0x1b
[403807.694647] [<ffffffffa9f5c550>] panic+0xe8/0x21f
[403807.696419] [<ffffffffc08868cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[403807.698183] [<ffffffffc0d4b947>] osc_ldlm_weigh_ast+0x377/0x3a0 [osc]
[403807.699954] [<ffffffffc0d9aa91>] mdc_cancel_weight+0xe1/0x130 [mdc]
[403807.701740] [<ffffffffc0bb10d1>] ldlm_cancel_no_wait_policy+0x51/0x80 [ptlrpc]
[403807.703481] [<ffffffffc0bb1118>] ldlm_cancel_aged_no_wait_policy+0x18/0x70 [ptlrpc]
[403807.706888] [<ffffffffc0bb25ba>] ldlm_prepare_lru_list+0x1fa/0x4c0 [ptlrpc]
[403807.710249] [<ffffffffc0bb5f1a>] ldlm_cancel_lru_local+0x1a/0x30 [ptlrpc]
[403807.711919] [<ffffffffc0bb614e>] ldlm_prep_elc_req+0x21e/0x490 [ptlrpc]
[403807.713536] [<ffffffffc0bb63e8>] ldlm_prep_enqueue_req+0x28/0x30 [ptlrpc]
[403807.715110] [<ffffffffc0da9f55>] mdc_intent_getattr_pack.isra.15+0xa5/0x3a0 [mdc]
[403807.716683] [<ffffffffc0dace12>] mdc_enqueue_base+0x532/0x1500 [mdc]
[403807.721334] [<ffffffffc0dae545>] mdc_intent_lock+0x135/0x560 [mdc]
[403807.727177] [<ffffffffc0deb962>] lmv_intent_lock+0x472/0xaf0 [lmv]
[403807.735265] [<ffffffffc0e4760a>] ll_lookup_it+0x3aa/0x1910 [lustre]
[403807.744991] [<ffffffffc0e49fbb>] ll_lookup_nd+0xbb/0x190 [lustre]
[403807.746080] [<ffffffffa9a4c5b3>] lookup_real+0x23/0x60
[403807.747136] [<ffffffffa9a4cfd2>] __lookup_hash+0x42/0x60
[403807.748161] [<ffffffffa9a538bc>] do_unlinkat+0x14c/0x2d0
[403807.750112] [<ffffffffa9a549d6>] SyS_unlink+0x16/0x20
[403807.751057] [<ffffffffa9f75ddb>] system_call_fastpath+0x22/0x27
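When triaging a console log like the one above, it can help to split it on the `[seconds.microseconds]` timestamps and list only the frames the kernel did not mark with `?` (frames it considers unreliable). The following is a minimal sketch, not part of the original report: the helper names `split_entries` and `reliable_frames` are hypothetical, and the sample string is a short excerpt of the trace rather than the full log.

```python
import re

# A dmesg-style timestamp such as "[403807.677757]".
TIMESTAMP = re.compile(r"\[\s*\d+\.\d+\]")
# A stack frame such as "[<ffffffffc0d4b947>] osc_ldlm_weigh_ast+0x377/0x3a0 [osc]".
# Group 1 is the optional "?" marker, group 2 the symbol, group 3 the module.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def split_entries(log):
    """Split a console log into one whitespace-normalized string per timestamp."""
    starts = [m.start() for m in TIMESTAMP.finditer(log)]
    ends = starts[1:] + [len(log)]
    return [" ".join(log[s:e].split()) for s, e in zip(starts, ends)]


def reliable_frames(entries):
    """Return (symbol, module) for every frame that is not marked with '?'."""
    frames = []
    for entry in entries:
        m = FRAME.search(entry)
        if m and not m.group(1):
            # Frames without a "[module]" suffix belong to the core kernel.
            frames.append((m.group(2), m.group(3) or "kernel"))
    return frames


# Short excerpt of the trace, joined onto one line as in a flattened paste.
sample = (
    "[403807.677757] LustreError: 192778:0:(osc_lock.c:687:osc_ldlm_weigh_ast()) "
    "ASSERTION( dlmlock->l_resource->lr_type == LDLM_EXTENT || ldlm_has_dom(dlmlock) ) failed: "
    "[403807.683452] [<ffffffffc0d4b947>] osc_ldlm_weigh_ast+0x377/0x3a0 [osc] "
    "[403807.683474] [<ffffffffc0d9aa91>] mdc_cancel_weight+0xe1/0x130 [mdc] "
    "[403807.705194] [<ffffffffc0ba0f18>] ? ldlm_lock_remove_from_lru_check+0x158/0x1a0 [ptlrpc] "
    "[403807.683696] [<ffffffffa9a4c5b3>] lookup_real+0x23/0x60"
)

entries = split_entries(sample)
frames = reliable_frames(entries)
for symbol, module in frames:
    print(f"{symbol} [{module}]")
```

On this excerpt the `?` frame is skipped, which leaves the chain `osc_ldlm_weigh_ast` called from `mdc_cancel_weight`, the same pair that sits directly under the LBUG in the full trace.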
Attachments
Activity
Fix Version/s | New: Lustre 2.12.3 [ 14418 ] |
Labels | Original: LTS12 ORNL | New: ORNL |
Link | Original: This issue is related to JFC-17 [ JFC-17 ] |
Link | New: This issue is related to JFC-20 [ JFC-20 ] |
Link | New: This issue is related to JFC-17 [ JFC-17 ] |
Labels | Original: ORNL | New: LTS12 ORNL |
Resolution | New: Fixed [ 1 ] | |
Status | Original: Open [ 1 ] | New: Resolved [ 5 ] |
Description |
Original:
during mdtest from 10 clients to a directory which configured with DNE2/DoM, few clients crashed at same time below.
{noformat} [403683.467928] LustreError: 11-0: cache1-MDT0000-mdc-ffff89de6e235000: operation mds_connect to node 10.0.10.175@o2ib10 failed: rc = -16 [403709.580276] Lustre: Mounted cache1-client [403807.677757] LustreError: 192778:0:(osc_lock.c:687:osc_ldlm_weigh_ast()) ASSERTION( dlmlock->l_resource->lr_type == LDLM_EXTENT || ldlm_has_dom(dlmlock) ) failed: [403807.681557] LustreError: 192778:0:(osc_lock.c:687:osc_ldlm_weigh_ast()) LBUG [403807.683405] Pid: 192778, comm: mdtest 3.10.0-957.10.1.el7.x86_64 #1 SMP Mon Mar 18 15:06:45 UTC 2019 [403807.683407] Call Trace: [403807.683419] [<ffffffffc08867cc>] libcfs_call_trace+0x8c/0xc0 [libcfs] [403807.683446] [<ffffffffc088687c>] lbug_with_loc+0x4c/0xa0 [libcfs] [403807.683452] [<ffffffffc0d4b947>] osc_ldlm_weigh_ast+0x377/0x3a0 [osc] [403807.683474] [<ffffffffc0d9aa91>] mdc_cancel_weight+0xe1/0x130 [mdc] [403807.683484] [<ffffffffc0bb10d1>] ldlm_cancel_no_wait_policy+0x51/0x80 [ptlrpc] [403807.683527] [<ffffffffc0bb1118>] ldlm_cancel_aged_no_wait_policy+0x18/0x70 [ptlrpc] [403807.683546] [<ffffffffc0bb25ba>] ldlm_prepare_lru_list+0x1fa/0x4c0 [ptlrpc] [403807.683565] [<ffffffffc0bb5f1a>] ldlm_cancel_lru_local+0x1a/0x30 [ptlrpc] [403807.683584] [<ffffffffc0bb614e>] ldlm_prep_elc_req+0x21e/0x490 [ptlrpc] [403807.683605] [<ffffffffc0bb63e8>] ldlm_prep_enqueue_req+0x28/0x30 [ptlrpc] [403807.683624] [<ffffffffc0da9f55>] mdc_intent_getattr_pack.isra.15+0xa5/0x3a0 [mdc] [403807.683632] [<ffffffffc0dace12>] mdc_enqueue_base+0x532/0x1500 [mdc] [403807.683640] [<ffffffffc0dae545>] mdc_intent_lock+0x135/0x560 [mdc] [403807.683649] [<ffffffffc0deb962>] lmv_intent_lock+0x472/0xaf0 [lmv] [403807.683659] [<ffffffffc0e4760a>] ll_lookup_it+0x3aa/0x1910 [lustre] [403807.683685] [<ffffffffc0e49fbb>] ll_lookup_nd+0xbb/0x190 [lustre] [403807.683696] [<ffffffffa9a4c5b3>] lookup_real+0x23/0x60 [403807.683702] [<ffffffffa9a4cfd2>] __lookup_hash+0x42/0x60 [403807.683705] [<ffffffffa9a538bc>] do_unlinkat+0x14c/0x2d0 [403807.683710] 
[<ffffffffa9a549d6>] SyS_unlink+0x16/0x20 [403807.683715] [<ffffffffa9f75ddb>] system_call_fastpath+0x22/0x27 [403807.683722] [<ffffffffffffffff>] 0xffffffffffffffff [403807.683756] Kernel panic - not syncing: LBUG [403807.685577] CPU: 7 PID: 192778 Comm: mdtest Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.10.1.el7.x86_64 #1 [403807.689195] Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS SE5C610.86B.01.01.0027.071020182329 07/10/2018 [403807.691049] Call Trace: [403807.692858] [<ffffffffa9f62e41>] dump_stack+0x19/0x1b [403807.694647] [<ffffffffa9f5c550>] panic+0xe8/0x21f [403807.696419] [<ffffffffc08868cb>] lbug_with_loc+0x9b/0xa0 [libcfs] [403807.698183] [<ffffffffc0d4b947>] osc_ldlm_weigh_ast+0x377/0x3a0 [osc] [403807.699954] [<ffffffffc0d9aa91>] mdc_cancel_weight+0xe1/0x130 [mdc] [403807.701740] [<ffffffffc0bb10d1>] ldlm_cancel_no_wait_policy+0x51/0x80 [ptlrpc] [403807.703481] [<ffffffffc0bb1118>] ldlm_cancel_aged_no_wait_policy+0x18/0x70 [ptlrpc] [403807.705194] [<ffffffffc0ba0f18>] ? ldlm_lock_remove_from_lru_check+0x158/0x1a0 [ptlrpc] [403807.706888] [<ffffffffc0bb25ba>] ldlm_prepare_lru_list+0x1fa/0x4c0 [ptlrpc] [403807.708576] [<ffffffffc0bb1100>] ? ldlm_cancel_no_wait_policy+0x80/0x80 [ptlrpc] [403807.710249] [<ffffffffc0bb5f1a>] ldlm_cancel_lru_local+0x1a/0x30 [ptlrpc] [403807.711919] [<ffffffffc0bb614e>] ldlm_prep_elc_req+0x21e/0x490 [ptlrpc] [403807.713536] [<ffffffffc0bb63e8>] ldlm_prep_enqueue_req+0x28/0x30 [ptlrpc] [403807.715110] [<ffffffffc0da9f55>] mdc_intent_getattr_pack.isra.15+0xa5/0x3a0 [mdc] [403807.716683] [<ffffffffc0dace12>] mdc_enqueue_base+0x532/0x1500 [mdc] [403807.718271] [<ffffffffc0a22781>] ? lprocfs_counter_sub+0xc1/0x130 [obdclass] [403807.719812] [<ffffffffc0e47c23>] ? ll_lookup_it+0x9c3/0x1910 [lustre] [403807.721334] [<ffffffffc0dae545>] mdc_intent_lock+0x135/0x560 [mdc] [403807.722814] [<ffffffffc0e46ba0>] ? ll_md_need_convert+0x1b0/0x1b0 [lustre] [403807.724291] [<ffffffffc0bb1ae0>] ? 
ldlm_expired_completion_wait+0x220/0x220 [ptlrpc] [403807.725744] [<ffffffffc0db18e0>] ? mdc_changelog_cdev_finish+0x1c0/0x1c0 [mdc] [403807.727177] [<ffffffffc0deb962>] lmv_intent_lock+0x472/0xaf0 [lmv] [403807.728578] [<ffffffffa992e262>] ? from_kgid+0x12/0x20 [403807.729954] [<ffffffffc0e46e87>] ? ll_i2suppgid+0x37/0x40 [lustre] [403807.731323] [<ffffffffc0e46eb4>] ? ll_i2gids+0x24/0xb0 [lustre] [403807.732655] [<ffffffffa992e262>] ? from_kgid+0x12/0x20 [403807.733968] [<ffffffffc0e46ba0>] ? ll_md_need_convert+0x1b0/0x1b0 [lustre] [403807.735265] [<ffffffffc0e4760a>] ll_lookup_it+0x3aa/0x1910 [lustre] [403807.736552] [<ffffffffc0e16893>] ? ll_inode_permission+0x93/0x3c0 [lustre] [403807.737817] [<ffffffffa9af9252>] ? security_inode_permission+0x22/0x30 [403807.739054] [<ffffffffa9a4d472>] ? __inode_permission+0x52/0xd0 [403807.740259] [<ffffffffa9a4d508>] ? inode_permission+0x18/0x50 [403807.741452] [<ffffffffa9a5134e>] ? link_path_walk+0x27e/0x8b0 [403807.742653] [<ffffffffc0a48550>] ? cl_env_put+0x140/0x1d0 [obdclass] [403807.743865] [<ffffffffc0e5e4b0>] ? cl_inode_fini+0x90/0x1c0 [lustre] [403807.744991] [<ffffffffc0e49fbb>] ll_lookup_nd+0xbb/0x190 [lustre] [403807.746080] [<ffffffffa9a4c5b3>] lookup_real+0x23/0x60 [403807.747136] [<ffffffffa9a4cfd2>] __lookup_hash+0x42/0x60 [403807.748161] [<ffffffffa9a538bc>] do_unlinkat+0x14c/0x2d0 [403807.749153] [<ffffffffa9a5312d>] ? putname+0x3d/0x60 [403807.750112] [<ffffffffa9a549d6>] SyS_unlink+0x16/0x20 [403807.751057] [<ffffffffa9f75ddb>] system_call_fastpath+0x22/0x27 {noformat} |
Labels | New: ORNL |
Fix Version/s | New: Lustre 2.13.0 [ 14290 ] |