[LU-10630] recovery-random-scale test_fail_client_mds: client cannot connect to MDS Created: 07/Feb/18 Updated: 24/Nov/21 Resolved: 24/Nov/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
recovery-random-scale test_fail_client_mds - Timeout occurred after 1444 mins, last suite running was recovery-random-scale, restarting cluster to continue tests

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run:

test_fail_client_mds failed with the following error:
Timeout occurred after 1444 mins, last suite running was recovery-random-scale, restarting cluster to continue tests

[ 18.684976] LNet: Accept all, port 7988
[ 113.727780] LustreError: 11-0: lustre-MDT0000-mdc-ffff88007b744000: operation mds_connect to node 10.2.8.168@tcp failed: rc = -11
[ 128.332376] random: crng init done
[ 263.727780] LustreError: 11-0: lustre-MDT0000-mdc-ffff88007b744000: operation mds_connect to node 10.2.8.168@tcp failed: rc = -11
[ 413.727721] LustreError: 11-0: lustre-MDT0000-mdc-ffff88007b744000: operation mds_connect to node 10.2.8.168@tcp failed: rc = -11
[ 563.727711] LustreError: 11-0: lustre-MDT0000-mdc-ffff88007b744000: operation mds_connect to node 10.2.8.168@tcp failed: rc = -11 |
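The client log in the description shows the same mds_connect failure repeating at a regular cadence. The following is a minimal sketch, not part of the original report, of how those lines could be pulled apart to confirm the return code and the spacing between attempts. The helper names (`parse_failures`, `retry_intervals`) and the regular expression are illustrative assumptions; the log text is copied from the description. On Linux, errno 11 is EAGAIN, so rc = -11 suggests the client was asked to retry rather than being rejected outright.

```python
import re

# Client dmesg excerpt copied from the issue description.
LOG = """\
[ 18.684976] LNet: Accept all, port 7988
[ 113.727780] LustreError: 11-0: lustre-MDT0000-mdc-ffff88007b744000: operation mds_connect to node 10.2.8.168@tcp failed: rc = -11
[ 128.332376] random: crng init done
[ 263.727780] LustreError: 11-0: lustre-MDT0000-mdc-ffff88007b744000: operation mds_connect to node 10.2.8.168@tcp failed: rc = -11
[ 413.727721] LustreError: 11-0: lustre-MDT0000-mdc-ffff88007b744000: operation mds_connect to node 10.2.8.168@tcp failed: rc = -11
[ 563.727711] LustreError: 11-0: lustre-MDT0000-mdc-ffff88007b744000: operation mds_connect to node 10.2.8.168@tcp failed: rc = -11
"""

# Hypothetical pattern for "operation <op> to node <nid> failed: rc = <n>" lines.
FAILURE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+LustreError:.*?operation (?P<op>\w+) "
    r"to node (?P<node>\S+) failed: rc = (?P<rc>-?\d+)"
)


def parse_failures(text):
    """Return (timestamp, operation, node, rc) for each failed-operation line."""
    return [
        (float(m["ts"]), m["op"], m["node"], int(m["rc"]))
        for m in FAILURE.finditer(text)
    ]


def retry_intervals(failures):
    """Return the gaps in seconds between consecutive failures, rounded to 0.1 s."""
    stamps = [f[0] for f in failures]
    return [round(later - earlier, 1) for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    failures = parse_failures(LOG)
    for ts, op, node, rc in failures:
        print(f"{ts:10.3f}s  {op} -> {node}  rc={rc}")
    print("intervals:", retry_intervals(failures))
```

Run against the excerpt, this reports four mds_connect failures to 10.2.8.168@tcp, all with rc = -11, spaced roughly 150 seconds apart.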
| Comments |
| Comment by James Nunez (Inactive) [ 08/Feb/18 ] |
|
There's not much to look at in the dmesg logs, but in the MDS1 (vm12) console log, we see the following stack trace:

[ 618.123147] LNet: Service thread pid 13443 was inactive for 60.04s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 618.126713] Pid: 13443, comm: mdt00_003
[ 618.127520]
[ 618.127520] Call Trace:
[ 618.128418] [<ffffffff816ab6b9>] schedule+0x29/0x70
[ 618.129587] [<ffffffff816a9004>] schedule_timeout+0x174/0x2c0
[ 618.130846] [<ffffffffc0abdd47>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 618.132274] [<ffffffff8109a6c0>] ? process_timeout+0x0/0x10
[ 618.133532] [<ffffffffc0ab2eb1>] ? cfs_block_sigsinv+0x71/0xa0 [libcfs]
[ 618.134850] [<ffffffffc13838d0>] osp_precreate_reserve+0x2e0/0x810 [osp]
[ 618.136193] [<ffffffff810c6440>] ? default_wake_function+0x0/0x20
[ 618.137320] [<ffffffffc1378c53>] osp_declare_create+0x193/0x590 [osp]
[ 618.138609] [<ffffffffc0bea619>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[ 618.139946] [<ffffffffc12ca1dc>] lod_sub_declare_create+0xdc/0x210 [lod]
[ 618.141282] [<ffffffffc12c353e>] lod_qos_declare_object_on+0xbe/0x3a0 [lod]
[ 618.142569] [<ffffffffc12c44ba>] lod_alloc_rr.constprop.18+0x70a/0x1000 [lod]
[ 618.143974] [<ffffffffc12c8a8f>] lod_qos_prep_create+0xc0f/0x1830 [lod]
[ 618.145265] [<ffffffffc12c9c0d>] lod_prepare_create+0x25d/0x360 [lod]
[ 618.146636] [<ffffffffc12bbdce>] lod_declare_striped_create+0x1ee/0x970 [lod]
[ 618.148097] [<ffffffffc12ca1dc>] ? lod_sub_declare_create+0xdc/0x210 [lod]
[ 618.149544] [<ffffffffc12c00e4>] lod_declare_create+0x204/0x590 [lod]
[ 618.150866] [<ffffffffc0bea619>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[ 618.152333] [<ffffffffc133139f>] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd]
[ 618.153810] [<ffffffffc1321b53>] mdd_declare_create+0x53/0xe20 [mdd]
[ 618.155154] [<ffffffffc1325e69>] mdd_create+0x879/0x1410 [mdd]
[ 618.156268] [<ffffffffc11db305>] mdt_reint_open+0x1a45/0x2890 [mdt]
[ 618.157529] [<ffffffffc0c1e087>] ? upcall_cache_get_entry+0x3f7/0x8f0 [obdclass]
[ 618.158905] [<ffffffffc11beb53>] ? ucred_set_jobid+0x53/0x70 [mdt]
[ 618.160143] [<ffffffffc11cf410>] mdt_reint_rec+0x80/0x210 [mdt]
[ 618.161253] [<ffffffffc11aef8b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
[ 618.162529] [<ffffffffc11bb457>] mdt_intent_reint+0x157/0x420 [mdt]
[ 618.163720] [<ffffffffc11b20b2>] mdt_intent_opc+0x442/0xad0 [mdt]
[ 618.165008] [<ffffffffc0e3bb90>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc]
[ 618.166314] [<ffffffffc11b9c73>] mdt_intent_policy+0x1a3/0x360 [mdt]
[ 618.167583] [<ffffffffc0dea2fa>] ldlm_lock_enqueue+0x38a/0x970 [ptlrpc]
[ 618.168942] [<ffffffffc0e13a33>] ldlm_handle_enqueue0+0x8f3/0x1400 [ptlrpc]
[ 618.170447] [<ffffffffc0e3bc10>] ? lustre_swab_ldlm_request+0x0/0x30 [ptlrpc]
[ 618.171912] [<ffffffffc0e99752>] tgt_enqueue+0x62/0x210 [ptlrpc]
[ 618.173114] [<ffffffffc0ea1965>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
[ 618.173927] [<ffffffffc0e45c7e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[ 618.174816] [<ffffffff810bc0f8>] ? __wake_up_common+0x58/0x90
[ 618.175639] [<ffffffffc0e49422>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[ 618.176374] [<ffffffff810c0d30>] ? finish_task_switch+0x50/0x160
[ 618.177202] [<ffffffffc0e48990>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
[ 618.177932] [<ffffffff810b252f>] kthread+0xcf/0xe0
[ 618.178689] [<ffffffff810b2460>] ? kthread+0x0/0xe0
[ 618.179434] [<ffffffff816b8798>] ret_from_fork+0x58/0x90
[ 618.180131] [<ffffffff810b2460>] ? kthread+0x0/0xe0
[ 618.180732]
[ 618.180936] LustreError: dumping log to /tmp/lustre-log.1516576126.13443
[ 618.379102] Pid: 12418, comm: mdt00_002
[ 618.379607]
[ 618.379607] Call Trace:
[ 618.380078] [<ffffffff816ab6b9>] schedule+0x29/0x70
[ 618.380662] [<ffffffff816a9004>] schedule_timeout+0x174/0x2c0
[ 618.381362] [<ffffffffc0abdd47>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 618.382286] [<ffffffff8109a6c0>] ? process_timeout+0x0/0x10
[ 618.382976] [<ffffffffc0ab2eb1>] ? cfs_block_sigsinv+0x71/0xa0 [libcfs]
[ 618.383754] [<ffffffffc13838d0>] osp_precreate_reserve+0x2e0/0x810 [osp]
[ 618.384636] [<ffffffff810c6440>] ? default_wake_function+0x0/0x20
[ 618.385399] [<ffffffffc1378c53>] osp_declare_create+0x193/0x590 [osp]
[ 618.386245] [<ffffffffc0bea619>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[ 618.387066] [<ffffffffc12ca1dc>] lod_sub_declare_create+0xdc/0x210 [lod]
[ 618.387854] [<ffffffffc12c353e>] lod_qos_declare_object_on+0xbe/0x3a0 [lod]
[ 618.388780] [<ffffffffc12c44ba>] lod_alloc_rr.constprop.18+0x70a/0x1000 [lod]
[ 618.389774] [<ffffffffc07009d5>] ? dbuf_find+0x1d5/0x1e0 [zfs]
[ 618.390494] [<ffffffffc0604487>] ? tsd_get+0x37/0x60 [spl]
[ 618.391235] [<ffffffffc12c8a8f>] lod_qos_prep_create+0xc0f/0x1830 [lod]
[ 618.392049] [<ffffffffc12c989a>] ? lod_prepare_inuse+0x1ea/0x300 [lod]
[ 618.392812] [<ffffffffc12c9c0d>] lod_prepare_create+0x25d/0x360 [lod]
[ 618.393661] [<ffffffffc12bbdce>] lod_declare_striped_create+0x1ee/0x970 [lod]
[ 618.394541] [<ffffffffc12ca1dc>] ? lod_sub_declare_create+0xdc/0x210 [lod]
[ 618.395386] [<ffffffffc12c00e4>] lod_declare_create+0x204/0x590 [lod]
[ 618.396203] [<ffffffffc133139f>] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd]
[ 618.397112] [<ffffffffc1321b53>] mdd_declare_create+0x53/0xe20 [mdd]
[ 618.397925] [<ffffffffc1325e69>] mdd_create+0x879/0x1410 [mdd]
[ 618.398641] [<ffffffffc11db305>] mdt_reint_open+0x1a45/0x2890 [mdt]
[ 618.399495] [<ffffffffc0c1e087>] ? upcall_cache_get_entry+0x3f7/0x8f0 [obdclass]
[ 618.400369] [<ffffffffc11beb53>] ? ucred_set_jobid+0x53/0x70 [mdt]
[ 618.401176] [<ffffffffc11cf410>] mdt_reint_rec+0x80/0x210 [mdt]
[ 618.401861] [<ffffffffc11aef8b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
[ 618.402628] [<ffffffffc11bb457>] mdt_intent_reint+0x157/0x420 [mdt]
[ 618.403504] [<ffffffffc11b20b2>] mdt_intent_opc+0x442/0xad0 [mdt]
[ 618.404332] [<ffffffffc0e3bb90>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc]
[ 618.405167] [<ffffffffc11b9c73>] mdt_intent_policy+0x1a3/0x360 [mdt]
[ 618.405990] [<ffffffffc0dea2fa>] ldlm_lock_enqueue+0x38a/0x970 [ptlrpc]
[ 618.406786] [<ffffffffc0e13a33>] ldlm_handle_enqueue0+0x8f3/0x1400 [ptlrpc]
[ 618.407723] [<ffffffffc0e3bc10>] ? lustre_swab_ldlm_request+0x0/0x30 [ptlrpc]
[ 618.408607] [<ffffffffc0e99752>] tgt_enqueue+0x62/0x210 [ptlrpc]
[ 618.409414] [<ffffffffc0ea1965>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
[ 618.410241] [<ffffffffc0e45c7e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[ 618.411197] [<ffffffff810bc0f8>] ? __wake_up_common+0x58/0x90
[ 618.411936] [<ffffffffc0e49422>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[ 618.412664] [<ffffffff810c0d30>] ? finish_task_switch+0x50/0x160
[ 618.413477] [<ffffffffc0e48990>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
[ 618.414215] [<ffffffff810b252f>] kthread+0xcf/0xe0
[ 618.414856] [<ffffffff810b2460>] ? kthread+0x0/0xe0
[ 618.415446] [<ffffffff816b8798>] ret_from_fork+0x58/0x90
[ 618.416169] [<ffffffff810b2460>] ? kthread+0x0/0xe0
[ 618.416780]
[ 618.416974] LustreError: dumping log to /tmp/lustre-log.1516576127.12418 |
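Both dumped service threads (mdt00_003 and mdt00_002) show the same top of stack. As a rough sketch, not something from the original comment, the frames can be parsed to find the innermost reliable frame that belongs to a kernel module, which is where each thread was sleeping inside Lustre code. The function names (`parse_frames`, `first_module_frame`) and the regular expression are illustrative assumptions; the sample frames are the first eight of the mdt00_003 trace above. Frames prefixed with `?` are conventionally treated as unreliable by the kernel's stack unwinder, so the sketch skips them.

```python
import re

# First eight frames of the mdt00_003 call trace, copied from the comment above.
TRACE = """\
[ 618.128418] [<ffffffff816ab6b9>] schedule+0x29/0x70
[ 618.129587] [<ffffffff816a9004>] schedule_timeout+0x174/0x2c0
[ 618.130846] [<ffffffffc0abdd47>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 618.132274] [<ffffffff8109a6c0>] ? process_timeout+0x0/0x10
[ 618.133532] [<ffffffffc0ab2eb1>] ? cfs_block_sigsinv+0x71/0xa0 [libcfs]
[ 618.134850] [<ffffffffc13838d0>] osp_precreate_reserve+0x2e0/0x810 [osp]
[ 618.136193] [<ffffffff810c6440>] ? default_wake_function+0x0/0x20
[ 618.137320] [<ffffffffc1378c53>] osp_declare_create+0x193/0x590 [osp]
"""

# Hypothetical pattern: "[<addr>] [? ]symbol+0xoff/0xsize [module]".
FRAME = re.compile(
    r"\[<[0-9a-f]+>\][ \t]+(?P<unreliable>\?[ \t]+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Return (symbol, module_or_None, reliable) for each stack frame, in order."""
    return [
        (m["symbol"], m["module"], m["unreliable"] is None)
        for m in FRAME.finditer(text)
    ]


def first_module_frame(text):
    """Return (symbol, module) of the innermost reliable frame inside a module."""
    for symbol, module, reliable in parse_frames(text):
        if reliable and module is not None:
            return symbol, module
    return None


if __name__ == "__main__":
    print(first_module_frame(TRACE))
```

For the excerpt, the innermost reliable module frame is osp_precreate_reserve in the osp module, reached through osp_declare_create; the mdt00_002 trace has the same leading frames.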