Lustre / LU-2939

Lustre: MGS is waiting for obd_unlinked_exports more than 256 seconds. The obd refcount = 5. Is it stuck?


Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.4.0
    • None
    • 3
    • 7057

    Description

      While running various tests in a loop, I see this message fairly regularly.
      The latest occurrence was in replay-single test 52:

      == replay-single test 52: time out lock replay (3764) == 01:37:31 (1362638251)
      Filesystem           1K-blocks      Used Available Use% Mounted on
      192.168.10.216@tcp:/lustre
                              374928     50772    303616  15% /mnt/lustre
      mcreate: cannot create `/mnt/lustre2/fsa-centos6-6.localnet' with mode 0100644: Read-only file system
      rm: cannot remove `/mnt/lustre2/fsa-centos6-6.localnet': No such file or directory
      fail_loc=0x8000030c
      Failing mds1 on centos6-6.localnet
      Stopping /mnt/mds1 (opts:) on centos6-6.localnet
      

      [37981.683825] Lustre: DEBUG MARKER: == replay-single test 52: time out lock replay (3764) == 01:37:31 (1362638251)
      [37981.704645] Lustre: DEBUG MARKER: cancel_lru_locks mdc start
      [37981.735425] Lustre: DEBUG MARKER: cancel_lru_locks mdc stop
      [37981.917515] Turning device loop0 (0x700000) read-only
      [37981.942649] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
      [37981.950096] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
      [37984.526745] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
      [37984.527548] LustreError: Skipped 1 previous similar message
      [37989.523057] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
      [37989.523952] LustreError: Skipped 2 previous similar messages
      [37996.136083] Lustre: MGS is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 5. Is it stuck?
      [37999.132513] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
      [38012.137563] Lustre: MGS is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 5. Is it stuck?
      [38019.133096] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
      [38019.133919] LustreError: Skipped 11 previous similar messages
      [38044.137577] Lustre: MGS is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 5. Is it stuck?
      [38050.132095] Lustre: 11212:0:(client.c:1866:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1362638298/real 1362638298] req@ffff8800b451a7f0 x1428827873041828/t0(0) o250->MGC192.168.10.216@tcp@0@lo:26/25 lens 400/544 e 0 to 1 dl 1362638319 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [38050.134028] Lustre: 11212:0:(client.c:1866:ptlrpc_expire_one_request()) Skipped 25 previous similar messages
      [38054.136029] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
      [38054.136894] LustreError: Skipped 20 previous similar messages
      [38108.137578] Lustre: MGS is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 5. Is it stuck?
      [38119.136041] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
      [38119.137021] LustreError: Skipped 38 previous similar messages
      [38236.137579] Lustre: MGS is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 5. Is it stuck?
      [38249.135021] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
      [38249.135879] LustreError: Skipped 77 previous similar messages
      [38400.624138] INFO: task umount:3429 blocked for more than 120 seconds.
      [38400.624649] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [38400.625436] umount D 0000000000000003 2608 3429 3428 0x00000000
      [38400.626043] ffff88006ed25a98 0000000000000086 0000000000000000 ffff88006ed25a48
      [38400.626850] ffff88006ed25a08 ffff88006a542bf0 ffffffffa0d9320f 0000000000000000
      [38400.627634] ffff88007a544ab8 ffff88006ed25fd8 000000000000fba8 ffff88007a544ab8
      [38400.628438] Call Trace:
      [38400.628792] [<ffffffff814f8ad1>] schedule_timeout+0x191/0x2e0
      [38400.629232] [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10
      [38400.629751] [<ffffffffa0a7f75d>] cfs_schedule_timeout_and_set_state+0x1d/0x20 [libcfs]
      [38400.630619] [<ffffffffa0d19670>] obd_exports_barrier+0xb0/0x190 [obdclass]
      [38400.631133] [<ffffffffa05d2936>] mgs_device_fini+0xf6/0x5c0 [mgs]
      [38400.631615] [<ffffffffa0d45cc7>] class_cleanup+0x577/0xda0 [obdclass]
      [38400.633249] [<ffffffffa0d1be9c>] ? class_name2dev+0x7c/0xe0 [obdclass]
      [38400.633799] [<ffffffffa0d475ac>] class_process_config+0x10bc/0x1c80 [obdclass]
      [38400.634531] [<ffffffffa0d40f93>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
      [38400.635035] [<ffffffffa0d482e9>] class_manual_cleanup+0x179/0x6e0 [obdclass]
      [38400.635513] [<ffffffff814faebe>] ? _read_unlock+0xe/0x10
      [38400.635993] [<ffffffffa0d1be9c>] ? class_name2dev+0x7c/0xe0 [obdclass]
      [38400.636501] [<ffffffffa0d5415d>] server_put_super+0x43d/0xe60 [obdclass]
      [38400.637008] [<ffffffff8117d6ab>] generic_shutdown_super+0x5b/0xe0
      [38400.637480] [<ffffffff8117d796>] kill_anon_super+0x16/0x60
      [38400.638004] [<ffffffffa0d4a0e6>] lustre_kill_super+0x36/0x60 [obdclass]
      [38400.638468] [<ffffffff8117e825>] deactivate_super+0x85/0xa0
      [38400.638904] [<ffffffff8119a89f>] mntput_no_expire+0xbf/0x110
      [38400.639338] [<ffffffff8119b34b>] sys_umount+0x7b/0x3a0
      [38400.639764] [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      [38492.136071] Lustre: MGS is waiting for obd_unlinked_exports more than 256 seconds. The obd refcount = 5. Is it stuck?
      [38509.132648] LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (no target)
      [38509.133504] LustreError: Skipped 154 previous similar messages
      ...

      
      

      I saved a crash dump so that all interested parties can take a look: /exports/crashdumps/t2/hung-obd_unlinked_exports.dmp (the modules are included too).


            People

              WC Triage (wc-triage)
              Oleg Drokin (green)
              Votes: 0
              Watchers: 3
