Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-12220

replay-single test_130a: timeout, MDT umount stuck in obd_exports_barrier()

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.13.0, Lustre 2.12.6
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for bfaccini <bruno.faccini@intel.com>

      This issue relates to the following tests suites run:
      https://testing.whamcloud.com/test_sets/ac9d40dc-62a0-11e9-92d8-52540065bddc
      https://testing.whamcloud.com/sub_tests/6ef1565c-6619-11e9-a6f9-52540065bddc
      https://testing.whamcloud.com/sub_tests/5ce256f8-66a2-11e9-a6f2-52540065bddc

      test_130a failed with the following error:

      Timeout occurred after 214 mins, last suite running was replay-single, restarting cluster to continue tests
      

      MMDS Console :

      [ 9269.958840] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-single test 130a: DoM file create \(setstripe\) replay ======================================= 09:01:59 \(1555664519\)
      [ 9270.176313] Lustre: DEBUG MARKER: == replay-single test 130a: DoM file create (setstripe) replay ======================================= 09:01:59 (1555664519)
      [ 9270.379677] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null ||
      				/usr/sbin/lctl lustre_build_version 2>/dev/null ||
      				/usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
      [ 9270.695930] Lustre: DEBUG MARKER: sync; sync; sync
      [ 9271.703775] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
      [ 9272.010818] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
      [ 9272.361465] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
      [ 9272.515462] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
      [ 9272.688112] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
      [ 9272.992505] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1
      [ 9274.621764] LustreError: 18079:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff8b478eaa1f80 x1631222875733264/t0(0) o41->lustre-MDT0001-osp-MDT0000@10.9.25.5@tcp:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      [ 9274.625498] LustreError: 18079:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message
      [ 9274.686206] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
      [ 9274.687681] Lustre: Skipped 24 previous similar messages
      [ 9287.251726] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 4. Is it stuck?
      [ 9287.253710] Lustre: lustre-MDT0000: UNLINKED ffff8b479f68c800 2a0a0e03-7e4d-227a-03c2-bc66c4ed138c 10.9.25.1@tcp 1 (1 0 0) 1 0 1 0:           (null)  554050781187 stale:0
      [ 9301.677791] Lustre: mdt00_001: service thread pid 11852 was inactive for 40.031 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      [ 9303.255723] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 4. Is it stuck?
      [ 9303.257733] Lustre: lustre-MDT0000: UNLINKED ffff8b479f68c800 2a0a0e03-7e4d-227a-03c2-bc66c4ed138c 10.9.25.1@tcp 1 (1 0 0) 1 0 1 0:           (null)  554050781187 stale:0
      [ 9335.259742] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 4. Is it stuck?
      [ 9335.261875] Lustre: lustre-MDT0000: UNLINKED ffff8b479f68c800 2a0a0e03-7e4d-227a-03c2-bc66c4ed138c 10.9.25.1@tcp 1 (1 0 0) 1 0 1 0:           (null)  554050781187 stale:0
      [ 9399.263733] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 4. Is it stuck?
      [ 9399.265718] Lustre: lustre-MDT0000: UNLINKED ffff8b479f68c800 2a0a0e03-7e4d-227a-03c2-bc66c4ed138c 10.9.25.1@tcp 1 (1 0 0) 1 0 1 0:           (null)  554050781187 stale:0
      [ 9527.267724] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 4. Is it stuck?
      [ 9527.269735] Lustre: lustre-MDT0000: UNLINKED ffff8b479f68c800 2a0a0e03-7e4d-227a-03c2-bc66c4ed138c 10.9.25.1@tcp 1 (1 0 0) 1 0 1 0:           (null)  554050781187 stale:0
      [ 9720.312753] INFO: task umount:29172 blocked for more than 120 seconds.
      [ 9720.314046] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 9720.315363] umount          D ffff8b47bac3f6c0     0 29172  29171 0x00000080
      [ 9720.316759] Call Trace:
      [ 9720.317225]  [<ffffffffc0ccfe94>] ? print_export_data.isra.18+0x224/0x240 [obdclass]
      [ 9720.318535]  [<ffffffff9c367c49>] schedule+0x29/0x70
      [ 9720.319550]  [<ffffffff9c365668>] schedule_timeout+0x168/0x2d0
      [ 9720.320532]  [<ffffffff9bca98e0>] ? __internal_add_timer+0x130/0x130
      [ 9720.321705]  [<ffffffffc0cd77f9>] obd_exports_barrier+0xa9/0x1a0 [obdclass]
      [ 9720.322942]  [<ffffffffc1300e22>] mdt_device_fini+0xe2/0x930 [mdt]
      [ 9720.324052]  [<ffffffffc0d01583>] ? lu_context_init+0xd3/0x1f0 [obdclass]
      [ 9720.325301]  [<ffffffffc0ceead2>] class_cleanup+0x862/0xbd0 [obdclass]
      [ 9720.326435]  [<ffffffffc0cefacc>] class_process_config+0x65c/0x2830 [obdclass]
      [ 9720.327791]  [<ffffffffc0ba3f17>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [ 9720.328989]  [<ffffffffc0cf1e66>] class_manual_cleanup+0x1c6/0x720 [obdclass]
      [ 9720.330295]  [<ffffffffc0d2259e>] server_put_super+0x8de/0xcd0 [obdclass]
      [ 9720.331480]  [<ffffffff9be43dbd>] generic_shutdown_super+0x6d/0x100
      [ 9720.332660]  [<ffffffff9be441b2>] kill_anon_super+0x12/0x20
      [ 9720.333624]  [<ffffffffc0cf4a82>] lustre_kill_super+0x32/0x50 [obdclass]
      [ 9720.334830]  [<ffffffff9be4456e>] deactivate_locked_super+0x4e/0x70
      [ 9720.335912]  [<ffffffff9be44cf6>] deactivate_super+0x46/0x60
      [ 9720.336962]  [<ffffffff9be6327f>] cleanup_mnt+0x3f/0x80
      [ 9720.337861]  [<ffffffff9be63312>] __cleanup_mnt+0x12/0x20
      [ 9720.338903]  [<ffffffff9bcbe79b>] task_work_run+0xbb/0xe0
      [ 9720.339830]  [<ffffffff9bc2bc65>] do_notify_resume+0xa5/0xc0
      [ 9720.340869]  [<ffffffff9c375124>] int_signal+0x12/0x17
      [ 9783.271763] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 256 seconds. The obd refcount = 4. Is it stuck?
      [ 9783.273856] Lustre: lustre-MDT0000: UNLINKED ffff8b479f68c800 2a0a0e03-7e4d-227a-03c2-bc66c4ed138c 10.9.25.1@tcp 1 (1 0 0) 1 0 1 0:           (null)  554050781187 stale:0
      [ 9875.645997] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
      [ 9875.647925] Lustre: Skipped 1919 previous similar messages
      [ 9960.341752] INFO: task umount:29172 blocked for more than 120 seconds.
      [ 9960.342930] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 9960.344222] umount          D ffff8b4794725140     0 29172  29171 0x00000080
      [ 9960.345637] Call Trace:
      [ 9960.346100]  [<ffffffffc0ccfe94>] ? print_export_data.isra.18+0x224/0x240 [obdclass]
      [ 9960.347457]  [<ffffffff9c367c49>] schedule+0x29/0x70
      [ 9960.348293]  [<ffffffff9c365668>] schedule_timeout+0x168/0x2d0
      [ 9960.349327]  [<ffffffff9bca98e0>] ? __internal_add_timer+0x130/0x130
      [ 9960.350416]  [<ffffffffc0cd77f9>] obd_exports_barrier+0xa9/0x1a0 [obdclass]
      [ 9960.351657]  [<ffffffffc1300e22>] mdt_device_fini+0xe2/0x930 [mdt]
      [ 9960.352701]  [<ffffffffc0d01583>] ? lu_context_init+0xd3/0x1f0 [obdclass]
      [ 9960.353977]  [<ffffffffc0ceead2>] class_cleanup+0x862/0xbd0 [obdclass]
      [ 9960.355070]  [<ffffffffc0cefacc>] class_process_config+0x65c/0x2830 [obdclass]
      [ 9960.356351]  [<ffffffffc0ba3f17>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [ 9960.357456]  [<ffffffffc0cf1e66>] class_manual_cleanup+0x1c6/0x720 [obdclass]
      [ 9960.358726]  [<ffffffffc0d2259e>] server_put_super+0x8de/0xcd0 [obdclass]
      [ 9960.359859]  [<ffffffff9be43dbd>] generic_shutdown_super+0x6d/0x100
      [ 9960.360972]  [<ffffffff9be441b2>] kill_anon_super+0x12/0x20
      [ 9960.361932]  [<ffffffffc0cf4a82>] lustre_kill_super+0x32/0x50 [obdclass]
      [ 9960.363113]  [<ffffffff9be4456e>] deactivate_locked_super+0x4e/0x70
      [ 9960.364150]  [<ffffffff9be44cf6>] deactivate_super+0x46/0x60
      [ 9960.365255]  [<ffffffff9be6327f>] cleanup_mnt+0x3f/0x80
      [ 9960.366130]  [<ffffffff9be63312>] __cleanup_mnt+0x12/0x20
      [ 9960.367114]  [<ffffffff9bcbe79b>] task_work_run+0xbb/0xe0
      [ 9960.368029]  [<ffffffff9bc2bc65>] do_notify_resume+0xa5/0xc0
      [ 9960.369027]  [<ffffffff9c375124>] int_signal+0x12/0x17
      
      ...........................
      
      [11040.569776] INFO: task umount:29172 blocked for more than 120 seconds.
      [11040.571000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [11040.572289] umount          D ffff8b4794725140     0 29172  29171 0x00000080
      [11040.573636] Call Trace:
      [11040.574114]  [<ffffffffc0ccfe94>] ? print_export_data.isra.18+0x224/0x240 [obdclass]
      [11040.575532]  [<ffffffff9c367c49>] schedule+0x29/0x70
      [11040.576402]  [<ffffffff9c365668>] schedule_timeout+0x168/0x2d0
      [11040.577443]  [<ffffffff9bca98e0>] ? __internal_add_timer+0x130/0x130
      [11040.578514]  [<ffffffffc0cd77f9>] obd_exports_barrier+0xa9/0x1a0 [obdclass]
      [11040.579776]  [<ffffffffc1300e22>] mdt_device_fini+0xe2/0x930 [mdt]
      [11040.580827]  [<ffffffffc0d01583>] ? lu_context_init+0xd3/0x1f0 [obdclass]
      [11040.582065]  [<ffffffffc0ceead2>] class_cleanup+0x862/0xbd0 [obdclass]
      [11040.583183]  [<ffffffffc0cefacc>] class_process_config+0x65c/0x2830 [obdclass]
      [11040.584501]  [<ffffffffc0ba3f17>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [11040.585630]  [<ffffffffc0cf1e66>] class_manual_cleanup+0x1c6/0x720 [obdclass]
      [11040.586971]  [<ffffffffc0d2259e>] server_put_super+0x8de/0xcd0 [obdclass]
      [11040.588287]  [<ffffffff9be43dbd>] generic_shutdown_super+0x6d/0x100
      [11040.589406]  [<ffffffff9be441b2>] kill_anon_super+0x12/0x20
      [11040.590422]  [<ffffffffc0cf4a82>] lustre_kill_super+0x32/0x50 [obdclass]
      [11040.591706]  [<ffffffff9be4456e>] deactivate_locked_super+0x4e/0x70
      [11040.592915]  [<ffffffff9be44cf6>] deactivate_super+0x46/0x60
      [11040.593923]  [<ffffffff9be6327f>] cleanup_mnt+0x3f/0x80
      [11040.594901]  [<ffffffff9be63312>] __cleanup_mnt+0x12/0x20
      [11040.595850]  [<ffffffff9bcbe79b>] task_work_run+0xbb/0xe0
      [11040.596956]  [<ffffffff9bc2bc65>] do_notify_resume+0xa5/0xc0
      [11040.597930]  [<ffffffff9c375124>] int_signal+0x12/0x17
      [11077.566009] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
      [11077.567489] Lustre: Skipped 1916 previous similar messages
      [11319.279765] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 1024 seconds. The obd refcount = 4. Is it stuck?
      [11319.281778] Lustre: lustre-MDT0000: UNLINKED ffff8b479f68c800 2a0a0e03-7e4d-227a-03c2-bc66c4ed138c 10.9.25.1@tcp 1 (1 0 0) 1 0 1 0:           (null)  554050781187 stale:0
      [11678.526106] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
      [11678.528152] Lustre: Skipped 1918 previous similar messages
      [12279.485876] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
      [12279.487604] Lustre: Skipped 1916 previous similar messages
      [12880.445916] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
      [12880.448085] Lustre: Skipped 1916 previous similar messages
      

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              bruno Bruno Faccini (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated: