Lustre / LU-4734

umounts of OST stuck


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.6
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 13009

    Description

      Hi,

      We seem to be hitting the same issue as LU-3230.

      The umount process was stuck in:

      crash> bt 3632
      PID: 3632 TASK: ffff880455aec080 CPU: 7 COMMAND: "umount"
       #0 [ffff880455c53958] schedule at ffffffff81485765
       #1 [ffff880455c53a20] schedule_timeout at ffffffff81486590
       #2 [ffff880455c53ad0] cfs_schedule_timeout_and_set_state at ffffffffa051469d [libcfs]
       #3 [ffff880455c53ae0] obd_exports_barrier at ffffffffa05eb89d [obdclass]
       #4 [ffff880455c53b30] filter_precleanup at ffffffffa0bcc0a2 [obdfilter]
       #5 [ffff880455c53b90] class_cleanup at ffffffffa0609f97 [obdclass]
       #6 [ffff880455c53c10] class_process_config at ffffffffa060c323 [obdclass]
       #7 [ffff880455c53cb0] class_manual_cleanup at ffffffffa060d069 [obdclass]
       #8 [ffff880455c53d70] server_put_super at ffffffffa0618f4c [obdclass]
       #9 [ffff880455c53e40] generic_shutdown_super at ffffffff81165f3b
      #10 [ffff880455c53e60] kill_anon_super at ffffffff81166056
      #11 [ffff880455c53e80] lustre_kill_super at ffffffffa060eca6 [obdclass]
      #12 [ffff880455c53ea0] deactivate_super at ffffffff81166ff0
      #13 [ffff880455c53ec0] mntput_no_expire at ffffffff811831cf
      #14 [ffff880455c53ef0] sys_umount at ffffffff81183c98
      #15 [ffff880455c53f80] system_call_fastpath at ffffffff810030f2
      

      and the console shows the following messages:

      Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 6. Is it stuck?
      Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 6. Is it stuck?
      Lustre: DEBUG MARKER: Tue Jan 14 11:45:01 2014
      
      Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 6. Is it stuck?
      Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 6. Is it stuck?
      

      In the server crash dump, we found the struct obd_device *obd pointer
      (0xffff880864076038) and the connection that the server is waiting for:

      crash> struct ptlrpc_connection 0xffff88083b878bc0
      struct ptlrpc_connection {
        c_hash = {
          next = 0x0,
          pprev = 0xffff881035375440
        },
        c_self = 1407418007560379,
        c_peer = {
          nid = 1407418007565894,
          pid = 12345
        },
        c_remote_uuid = {
          uuid = "NET_0x5000a0a643646_UUID\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
        },
        c_refcount = {
          counter = 3
        }
      }

      JO.BOO.PI.FO -> lascaux4091
      JO.BOO.WL.BZF -> lascaux226
      

      Thanks,
      Sebastien.


            People

              Assignee: Nathaniel Clark (utopiabound)
              Reporter: Sebastien Buisson (sebastien.buisson, Inactive)
              Votes: 0
              Watchers: 4
