Lustre / LU-2683

Client deadlock in cl_lock_mutex_get


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • Environment: Sequoia ppc64 clients running 2.3.58-5chaos; servers are x86_64 running 2.3.58-6chaos
    • Severity: 3
    • Rank (Obsolete): 6270

    Description

      With 2.3.58-5chaos ppc64 clients, we are seeing Lustre hang with many threads waiting on a mutex under cl_lock_mutex_get(). The attached file "RA0-ID-J03.log.txt" contains the sysrq-t and sysrq-l output. The node itself is responsive, but Lustre makes no progress because all ptlrpcd threads are stuck in the same path:

      2013-01-25 11:51:10.788870 {DefaultControlEventListener} [mmcs]{131}.0.0: ptlrpcd_0     D 0000000000000000     0  3268      2 0x00000000
      2013-01-25 11:51:10.788922 {DefaultControlEventListener} [mmcs]{131}.0.0: Call Trace:
      2013-01-25 11:51:10.788974 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621ee70] [c00000000068f010] svc_rdma_ops+0xda18/0x1a900 (unreliable)
      2013-01-25 11:51:10.789026 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621f040] [c000000000009b2c] .__switch_to+0xc4/0x100
      2013-01-25 11:51:10.789078 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621f0d0] [c00000000042a418] .schedule+0x7d4/0x944
      2013-01-25 11:51:10.789129 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621f380] [c00000000042b4a8] .__mutex_lock_slowpath+0x208/0x390
      2013-01-25 11:51:10.789181 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621f480] [c00000000042bf44] .mutex_lock+0x38/0x58
      2013-01-25 11:51:10.789234 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621f500] [8000000002501cf8] .cl_lock_mutex_get+0xc8/0x110 [obdclass]
      2013-01-25 11:51:10.789285 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621f590] [800000000514d124] .lovsub_parent_lock+0x94/0x260 [lov]
      2013-01-25 11:51:10.789337 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621f640] [800000000514de70] .lovsub_lock_state+0xd0/0x300 [lov]
      2013-01-25 11:51:10.789389 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621f710] [80000000024fe344] .cl_lock_state_signal+0xd4/0x2c0 [obdclass]
      2013-01-25 11:51:10.789441 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621f7d0] [80000000024ff448] .cl_lock_signal+0xa8/0x260 [obdclass]
      2013-01-25 11:51:10.789492 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621f890] [80000000046adb24] .osc_lock_upcall+0x194/0x810 [osc]
      2013-01-25 11:51:10.789544 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621f970] [8000000004688a7c] .osc_enqueue_fini+0xfc/0x3f0 [osc]
      2013-01-25 11:51:10.789596 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621fa60] [8000000004697364] .osc_enqueue_interpret+0x104/0x240 [osc]
      2013-01-25 11:51:10.789647 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621fb40] [8000000003b7a308] .ptlrpc_check_set+0x3c8/0x4e50 [ptlrpc]
      2013-01-25 11:51:10.789700 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621fd20] [8000000003bcffec] .ptlrpcd_check+0x66c/0x870 [ptlrpc]
      2013-01-25 11:51:10.789751 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621fe40] [8000000003bd054c] .ptlrpcd+0x35c/0x510 [ptlrpc]
      2013-01-25 11:51:10.789802 {DefaultControlEventListener} [mmcs]{131}.0.0: [c0000003c621ff90] [c00000000001b9a0] .kernel_thread+0x54/0x70
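      To check that every ptlrpcd thread in the full sysrq-t dump is parked in the same place, the dump can be filtered by call-trace symbol. The script below is a minimal sketch and not Lustre tooling. The console prefix pattern and the task-header layout are assumptions inferred from the excerpt above, and `tasks_in()` is a hypothetical helper.

```python
import re
import sys

# Assumed console relay prefix, inferred from the excerpt:
# "<date> <time> {listener} [mmcs]{node}.x.y: <payload>"
PREFIX = re.compile(r"^\S+ \S+ \{[^}]*\} \[[^\]]*\]\{[^}]*\}[^:]*: ?(.*)$")
# sysrq-t task header: name, state, pc, free stack, pid, ppid, flags
TASK = re.compile(r"^(\S+)\s+([A-Z])\s+[0-9a-f]+\s+\d+\s+(\d+)\s+\d+\s+0x[0-9a-f]+$")
# Stack frame: "[sp] [pc] .symbol+off/size [module]"; ppc64 kernels
# print a leading '.' on function symbols, so it is optional here.
FRAME = re.compile(r"^\[[0-9a-f]+\]\s+\[[0-9a-f]+\]\s+\.?(\w+)\+0x")


def tasks_in(lines, symbol="cl_lock_mutex_get"):
    """Return sorted (pid, task name) pairs whose call trace contains `symbol`."""
    found = set()
    current = None  # (pid, name) of the task whose trace is being read
    for raw in lines:
        m = PREFIX.match(raw)
        # Lines without the relay prefix (e.g. plain dmesg) are used as-is.
        payload = (m.group(1) if m else raw).strip()
        task = TASK.match(payload)
        if task:
            current = (int(task.group(3)), task.group(1))
            continue
        frame = FRAME.match(payload)
        if current and frame and frame.group(1) == symbol:
            found.add(current)
    return sorted(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    wanted = sys.argv[2] if len(sys.argv) > 2 else "cl_lock_mutex_get"
    with open(sys.argv[1], errors="replace") as log:
        for pid, name in tasks_in(log, wanted):
            print(pid, name)
```

      Running it as `python3 find_blocked.py RA0-ID-J03.log.txt` (the script name is hypothetical) prints the pid and name of each task whose trace includes cl_lock_mutex_get; a second argument such as `lovsub_parent_lock` searches for a different frame. Task names containing spaces are not matched by the header pattern, which is acceptable for kernel threads like ptlrpcd_N.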
      


People

    • Assignee: Jinshan Xiong (Inactive)
    • Reporter: Christopher Morrone (Inactive)
    • Votes: 0
    • Watchers: 11
