Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Affects Version/s: Lustre 2.13.0, Lustre 2.12.6
Fix Version/s: None
Labels: None
Severity: 3
Description
This issue was created by maloo for bfaccini <bruno.faccini@intel.com>
This issue relates to the following test suite runs:
https://testing.whamcloud.com/test_sets/ac9d40dc-62a0-11e9-92d8-52540065bddc
https://testing.whamcloud.com/sub_tests/6ef1565c-6619-11e9-a6f9-52540065bddc
https://testing.whamcloud.com/sub_tests/5ce256f8-66a2-11e9-a6f2-52540065bddc
test_130a failed with the following error:
Timeout occurred after 214 mins, last suite running was replay-single, restarting cluster to continue tests
MDS Console:
[ 9269.958840] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-single test 130a: DoM file create \(setstripe\) replay ======================================= 09:01:59 \(1555664519\)
[ 9270.176313] Lustre: DEBUG MARKER: == replay-single test 130a: DoM file create (setstripe) replay ======================================= 09:01:59 (1555664519)
[ 9270.379677] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null || /usr/sbin/lctl lustre_build_version 2>/dev/null || /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
[ 9270.695930] Lustre: DEBUG MARKER: sync; sync; sync
[ 9271.703775] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
[ 9272.010818] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
[ 9272.361465] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
[ 9272.515462] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
[ 9272.688112] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
[ 9272.992505] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1
[ 9274.621764] LustreError: 18079:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b478eaa1f80 x1631222875733264/t0(0) o41->lustre-MDT0001-osp-MDT0000@10.9.25.5@tcp:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[ 9274.625498] LustreError: 18079:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message
[ 9274.686206] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
[ 9274.687681] Lustre: Skipped 24 previous similar messages
[ 9287.251726] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 4. Is it stuck?
[ 9287.253710] Lustre: lustre-MDT0000: UNLINKED ffff8b479f68c800 2a0a0e03-7e4d-227a-03c2-bc66c4ed138c 10.9.25.1@tcp 1 (1 0 0) 1 0 1 0: (null) 554050781187 stale:0
[ 9301.677791] Lustre: mdt00_001: service thread pid 11852 was inactive for 40.031 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[ 9303.255723] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 4. Is it stuck?
[ 9303.257733] Lustre: lustre-MDT0000: UNLINKED ffff8b479f68c800 2a0a0e03-7e4d-227a-03c2-bc66c4ed138c 10.9.25.1@tcp 1 (1 0 0) 1 0 1 0: (null) 554050781187 stale:0
[ 9335.259742] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 4. Is it stuck?
[ 9335.261875] Lustre: lustre-MDT0000: UNLINKED ffff8b479f68c800 2a0a0e03-7e4d-227a-03c2-bc66c4ed138c 10.9.25.1@tcp 1 (1 0 0) 1 0 1 0: (null) 554050781187 stale:0
[ 9399.263733] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 4. Is it stuck?
[ 9399.265718] Lustre: lustre-MDT0000: UNLINKED ffff8b479f68c800 2a0a0e03-7e4d-227a-03c2-bc66c4ed138c 10.9.25.1@tcp 1 (1 0 0) 1 0 1 0: (null) 554050781187 stale:0
[ 9527.267724] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 4. Is it stuck?
[ 9527.269735] Lustre: lustre-MDT0000: UNLINKED ffff8b479f68c800 2a0a0e03-7e4d-227a-03c2-bc66c4ed138c 10.9.25.1@tcp 1 (1 0 0) 1 0 1 0: (null) 554050781187 stale:0
[ 9720.312753] INFO: task umount:29172 blocked for more than 120 seconds.
[ 9720.314046] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9720.315363] umount D ffff8b47bac3f6c0 0 29172 29171 0x00000080
[ 9720.316759] Call Trace:
[ 9720.317225] [<ffffffffc0ccfe94>] ? print_export_data.isra.18+0x224/0x240 [obdclass]
[ 9720.318535] [<ffffffff9c367c49>] schedule+0x29/0x70
[ 9720.319550] [<ffffffff9c365668>] schedule_timeout+0x168/0x2d0
[ 9720.320532] [<ffffffff9bca98e0>] ? __internal_add_timer+0x130/0x130
[ 9720.321705] [<ffffffffc0cd77f9>] obd_exports_barrier+0xa9/0x1a0 [obdclass]
[ 9720.322942] [<ffffffffc1300e22>] mdt_device_fini+0xe2/0x930 [mdt]
[ 9720.324052] [<ffffffffc0d01583>] ? lu_context_init+0xd3/0x1f0 [obdclass]
[ 9720.325301] [<ffffffffc0ceead2>] class_cleanup+0x862/0xbd0 [obdclass]
[ 9720.326435] [<ffffffffc0cefacc>] class_process_config+0x65c/0x2830 [obdclass]
[ 9720.327791] [<ffffffffc0ba3f17>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 9720.328989] [<ffffffffc0cf1e66>] class_manual_cleanup+0x1c6/0x720 [obdclass]
[ 9720.330295] [<ffffffffc0d2259e>] server_put_super+0x8de/0xcd0 [obdclass]
[ 9720.331480] [<ffffffff9be43dbd>] generic_shutdown_super+0x6d/0x100
[ 9720.332660] [<ffffffff9be441b2>] kill_anon_super+0x12/0x20
[ 9720.333624] [<ffffffffc0cf4a82>] lustre_kill_super+0x32/0x50 [obdclass]
[ 9720.334830] [<ffffffff9be4456e>] deactivate_locked_super+0x4e/0x70
[ 9720.335912] [<ffffffff9be44cf6>] deactivate_super+0x46/0x60
[ 9720.336962] [<ffffffff9be6327f>] cleanup_mnt+0x3f/0x80
[ 9720.337861] [<ffffffff9be63312>] __cleanup_mnt+0x12/0x20
[ 9720.338903] [<ffffffff9bcbe79b>] task_work_run+0xbb/0xe0
[ 9720.339830] [<ffffffff9bc2bc65>] do_notify_resume+0xa5/0xc0
[ 9720.340869] [<ffffffff9c375124>] int_signal+0x12/0x17
[ 9783.271763] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 256 seconds. The obd refcount = 4. Is it stuck?
[ 9783.273856] Lustre: lustre-MDT0000: UNLINKED ffff8b479f68c800 2a0a0e03-7e4d-227a-03c2-bc66c4ed138c 10.9.25.1@tcp 1 (1 0 0) 1 0 1 0: (null) 554050781187 stale:0
[ 9875.645997] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
[ 9875.647925] Lustre: Skipped 1919 previous similar messages
[ 9960.341752] INFO: task umount:29172 blocked for more than 120 seconds.
[ 9960.342930] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9960.344222] umount D ffff8b4794725140 0 29172 29171 0x00000080
[ 9960.345637] Call Trace:
[ 9960.346100] [<ffffffffc0ccfe94>] ? print_export_data.isra.18+0x224/0x240 [obdclass]
[ 9960.347457] [<ffffffff9c367c49>] schedule+0x29/0x70
[ 9960.348293] [<ffffffff9c365668>] schedule_timeout+0x168/0x2d0
[ 9960.349327] [<ffffffff9bca98e0>] ? __internal_add_timer+0x130/0x130
[ 9960.350416] [<ffffffffc0cd77f9>] obd_exports_barrier+0xa9/0x1a0 [obdclass]
[ 9960.351657] [<ffffffffc1300e22>] mdt_device_fini+0xe2/0x930 [mdt]
[ 9960.352701] [<ffffffffc0d01583>] ? lu_context_init+0xd3/0x1f0 [obdclass]
[ 9960.353977] [<ffffffffc0ceead2>] class_cleanup+0x862/0xbd0 [obdclass]
[ 9960.355070] [<ffffffffc0cefacc>] class_process_config+0x65c/0x2830 [obdclass]
[ 9960.356351] [<ffffffffc0ba3f17>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 9960.357456] [<ffffffffc0cf1e66>] class_manual_cleanup+0x1c6/0x720 [obdclass]
[ 9960.358726] [<ffffffffc0d2259e>] server_put_super+0x8de/0xcd0 [obdclass]
[ 9960.359859] [<ffffffff9be43dbd>] generic_shutdown_super+0x6d/0x100
[ 9960.360972] [<ffffffff9be441b2>] kill_anon_super+0x12/0x20
[ 9960.361932] [<ffffffffc0cf4a82>] lustre_kill_super+0x32/0x50 [obdclass]
[ 9960.363113] [<ffffffff9be4456e>] deactivate_locked_super+0x4e/0x70
[ 9960.364150] [<ffffffff9be44cf6>] deactivate_super+0x46/0x60
[ 9960.365255] [<ffffffff9be6327f>] cleanup_mnt+0x3f/0x80
[ 9960.366130] [<ffffffff9be63312>] __cleanup_mnt+0x12/0x20
[ 9960.367114] [<ffffffff9bcbe79b>] task_work_run+0xbb/0xe0
[ 9960.368029] [<ffffffff9bc2bc65>] do_notify_resume+0xa5/0xc0
[ 9960.369027] [<ffffffff9c375124>] int_signal+0x12/0x17
...........................
[11040.569776] INFO: task umount:29172 blocked for more than 120 seconds.
[11040.571000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11040.572289] umount D ffff8b4794725140 0 29172 29171 0x00000080
[11040.573636] Call Trace:
[11040.574114] [<ffffffffc0ccfe94>] ? print_export_data.isra.18+0x224/0x240 [obdclass]
[11040.575532] [<ffffffff9c367c49>] schedule+0x29/0x70
[11040.576402] [<ffffffff9c365668>] schedule_timeout+0x168/0x2d0
[11040.577443] [<ffffffff9bca98e0>] ? __internal_add_timer+0x130/0x130
[11040.578514] [<ffffffffc0cd77f9>] obd_exports_barrier+0xa9/0x1a0 [obdclass]
[11040.579776] [<ffffffffc1300e22>] mdt_device_fini+0xe2/0x930 [mdt]
[11040.580827] [<ffffffffc0d01583>] ? lu_context_init+0xd3/0x1f0 [obdclass]
[11040.582065] [<ffffffffc0ceead2>] class_cleanup+0x862/0xbd0 [obdclass]
[11040.583183] [<ffffffffc0cefacc>] class_process_config+0x65c/0x2830 [obdclass]
[11040.584501] [<ffffffffc0ba3f17>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[11040.585630] [<ffffffffc0cf1e66>] class_manual_cleanup+0x1c6/0x720 [obdclass]
[11040.586971] [<ffffffffc0d2259e>] server_put_super+0x8de/0xcd0 [obdclass]
[11040.588287] [<ffffffff9be43dbd>] generic_shutdown_super+0x6d/0x100
[11040.589406] [<ffffffff9be441b2>] kill_anon_super+0x12/0x20
[11040.590422] [<ffffffffc0cf4a82>] lustre_kill_super+0x32/0x50 [obdclass]
[11040.591706] [<ffffffff9be4456e>] deactivate_locked_super+0x4e/0x70
[11040.592915] [<ffffffff9be44cf6>] deactivate_super+0x46/0x60
[11040.593923] [<ffffffff9be6327f>] cleanup_mnt+0x3f/0x80
[11040.594901] [<ffffffff9be63312>] __cleanup_mnt+0x12/0x20
[11040.595850] [<ffffffff9bcbe79b>] task_work_run+0xbb/0xe0
[11040.596956] [<ffffffff9bc2bc65>] do_notify_resume+0xa5/0xc0
[11040.597930] [<ffffffff9c375124>] int_signal+0x12/0x17
[11077.566009] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
[11077.567489] Lustre: Skipped 1916 previous similar messages
[11319.279765] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 1024 seconds. The obd refcount = 4. Is it stuck?
[11319.281778] Lustre: lustre-MDT0000: UNLINKED ffff8b479f68c800 2a0a0e03-7e4d-227a-03c2-bc66c4ed138c 10.9.25.1@tcp 1 (1 0 0) 1 0 1 0: (null) 554050781187 stale:0
[11678.526106] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
[11678.528152] Lustre: Skipped 1918 previous similar messages
[12279.485876] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
[12279.487604] Lustre: Skipped 1916 previous similar messages
[12880.445916] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
[12880.448085] Lustre: Skipped 1916 previous similar messages
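For reference, the sequence that leads into the hang can be read off the DEBUG MARKER lines above: replay-single sets a replay barrier on mds1 and then unmounts the MDT, and it is that umount which never returns, stuck in obd_exports_barrier() waiting for obd_unlinked_exports to drain. A minimal sketch of that sequence, assuming the device and mount point names from this particular run (lustre-MDT0000, /mnt/lustre-mds1):

  # Flush dirty data so the barrier point is well defined.
  sync; sync; sync
  # Stop the MDT from assigning new transaction numbers and make it read-only,
  # so everything after this point must come back via replay.
  lctl --device lustre-MDT0000 notransno
  lctl --device lustre-MDT0000 readonly
  lctl mark "mds1 REPLAY BARRIER on lustre-MDT0000"
  # Unmount the MDT to simulate the failure; in this run the umount blocks
  # in obd_exports_barrier() until the 214-minute session timeout fires.
  umount -d /mnt/lustre-mds1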