Lustre / LU-12319

sanity-scrub test 5 hangs with OST waiting for obd_unlinked_exports

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.2
    • Environment: ZFS/DNE
    • Severity: 3

    Description

      sanity-scrub test_5 hangs during OST unmount in ZFS/DNE testing, with the OST waiting for obd_unlinked_exports. We’ve seen this test hang twice. The first failure was on 2019-04-17, with logs at https://testing.whamcloud.com/test_sets/b416bc06-60e0-11e9-92fe-52540065bddc.

      The OSS console log shows:

      [11834.952415] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost2
      [11849.146323] Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 4. Is it stuck?
      [11849.148320] Lustre: lustre-OST0001: UNLINKED ffff96655cb3f800 028216d1-49cf-def1-f2c5-8e9a09f0ba40 10.9.5.250@tcp 1 (0 0 0) 1 0 1 0:           (null)  0 stale:0
      [11865.150330] Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 4. Is it stuck?
      [11865.152291] Lustre: lustre-OST0001: UNLINKED ffff96655cb3f800 028216d1-49cf-def1-f2c5-8e9a09f0ba40 10.9.5.250@tcp 1 (0 0 0) 1 0 1 0:           (null)  0 stale:0
      [11897.154323] Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 4. Is it stuck?
      [11897.156322] Lustre: lustre-OST0001: UNLINKED ffff96655cb3f800 028216d1-49cf-def1-f2c5-8e9a09f0ba40 10.9.5.250@tcp 1 (0 0 0) 1 0 1 0:           (null)  0 stale:0
      [11961.158333] Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 4. Is it stuck?
      [11961.160328] Lustre: lustre-OST0001: UNLINKED ffff96655cb3f800 028216d1-49cf-def1-f2c5-8e9a09f0ba40 10.9.5.250@tcp 1 (0 0 0) 1 0 1 0:           (null)  0 stale:0
      [12089.162328] Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 4. Is it stuck?
      [12089.164308] Lustre: lustre-OST0001: UNLINKED ffff96655cb3f800 028216d1-49cf-def1-f2c5-8e9a09f0ba40 10.9.5.250@tcp 1 (0 0 0) 1 0 1 0:           (null)  0 stale:0
      [12240.336376] INFO: task umount:4154 blocked for more than 120 seconds.
      [12240.337520] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [12240.338841] umount          D ffff966553984100     0  4154   4153 0x00000080
      [12240.340202] Call Trace:
      [12240.340846]  [<ffffffffc0e04234>] ? print_export_data.isra.18+0x224/0x240 [obdclass]
      [12240.342282]  [<ffffffffbb968c49>] schedule+0x29/0x70
      [12240.343148]  [<ffffffffbb966668>] schedule_timeout+0x168/0x2d0
      [12240.344251]  [<ffffffffbb2a9920>] ? __internal_add_timer+0x130/0x130
      [12240.345354]  [<ffffffffc0e0bddc>] ? dump_exports+0xec/0x100 [obdclass]
      [12240.346560]  [<ffffffffc0e0be99>] obd_exports_barrier+0xa9/0x1a0 [obdclass]
      [12240.347791]  [<ffffffffc13d2873>] ofd_device_fini+0xa3/0x2d0 [ofd]
      [12240.349010]  [<ffffffffc0e231c2>] class_cleanup+0x862/0xbd0 [obdclass]
      [12240.350134]  [<ffffffffc0e241bc>] class_process_config+0x65c/0x2830 [obdclass]
      [12240.351480]  [<ffffffffc0cd1f37>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [12240.352618]  [<ffffffffc0e26556>] class_manual_cleanup+0x1c6/0x710 [obdclass]
      [12240.353967]  [<ffffffffc0e5697e>] server_put_super+0x8de/0xcd0 [obdclass]
      [12240.355140]  [<ffffffffbb4441bd>] generic_shutdown_super+0x6d/0x100
      [12240.356291]  [<ffffffffbb4445b2>] kill_anon_super+0x12/0x20
      [12240.357262]  [<ffffffffc0e290c2>] lustre_kill_super+0x32/0x50 [obdclass]
      [12240.358476]  [<ffffffffbb44496e>] deactivate_locked_super+0x4e/0x70
      [12240.359543]  [<ffffffffbb4450f6>] deactivate_super+0x46/0x60
      [12240.360648]  [<ffffffffbb46367f>] cleanup_mnt+0x3f/0x80
      [12240.361549]  [<ffffffffbb463712>] __cleanup_mnt+0x12/0x20
      [12240.362568]  [<ffffffffbb2be7db>] task_work_run+0xbb/0xe0
      [12240.363517]  [<ffffffffbb22bc65>] do_notify_resume+0xa5/0xc0
      [12240.364572]  [<ffffffffbb976124>] int_signal+0x12/0x17
      [12345.166337] Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 256 seconds. The obd refcount = 4. Is it stuck?
      [12345.168318] Lustre: lustre-OST0001: UNLINKED ffff96655cb3f800 028216d1-49cf-def1-f2c5-8e9a09f0ba40 10.9.5.250@tcp 1 (0 0 0) 1 0 1 0:           (null)  0 stale:0
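
      When triaging a hang like this, it can help to pull the repeated warnings out of the console log and check whether the obd refcount ever changes between them. The following is a minimal sketch, not part of the Lustre test framework: the regular expression is written against the message format shown in the log above, and the names WAIT_RE, parse_waits and SAMPLE are hypothetical helpers introduced only for illustration.

```python
import re

# Pattern written against the warning text seen in the console log above.
# WAIT_RE and parse_waits are illustrative names, not Lustre tooling.
WAIT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] Lustre: (?P<target>\S+) is waiting for "
    r"obd_unlinked_exports more than (?P<secs>\d+) seconds\. "
    r"The obd refcount = (?P<ref>\d+)\."
)


def parse_waits(log_text):
    """Return (timestamp, target, waited_seconds, refcount) per warning."""
    return [
        (float(m["ts"]), m["target"], int(m["secs"]), int(m["ref"]))
        for m in WAIT_RE.finditer(log_text)
    ]


# First three warnings copied from the console log in this ticket.
SAMPLE = """\
[11849.146323] Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 4. Is it stuck?
[11865.150330] Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 4. Is it stuck?
[11897.154323] Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 4. Is it stuck?
"""

waits = parse_waits(SAMPLE)
for ts, target, secs, ref in waits:
    print(f"{ts:.0f}s {target}: waited >{secs}s, refcount={ref}")

# Seconds elapsed between consecutive warnings.
gaps = [later[0] - earlier[0] for earlier, later in zip(waits, waits[1:])]
print("gaps between warnings (s):", [round(g) for g in gaps])

# A refcount that never moves across several warnings suggests that
# nothing is releasing the remaining references on the unlinked export.
refcounts = {ref for _, _, _, ref in waits}
stuck = len(waits) >= 2 and len(refcounts) == 1
print("refcount unchanged across warnings:", stuck)
```

      Run against the full log above, the same check would cover all six warnings, each of which reports a refcount of 4.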
      

      This failure resembles several other tickets in which the MGS waits for obd_unlinked_exports (LU-8500, for example), but those tickets are closed.

      The second hang occurred on 2019-05-17, with logs at https://testing.whamcloud.com/test_sets/c0369af6-7a17-11e9-be83-52540065bddc.

          People

            Assignee: WC Triage (wc-triage)
            Reporter: James Nunez (jamesanunez, Inactive)