[LU-4734] umounts of OST stuck Created: 07/Mar/14  Updated: 11/Mar/14  Resolved: 11/Mar/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sebastien Buisson (Inactive) Assignee: Nathaniel Clark
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-3230 conf-sanity fails to start run: umoun... Resolved
Related
is related to LU-3230 conf-sanity fails to start run: umoun... Resolved
Severity: 3
Rank (Obsolete): 13009

 Description   

Hi,

We seem to be hitting the same issue as LU-3230.

The umount process was stuck in the following stack:

crash> bt 3632
PID: 3632 TASK: ffff880455aec080 CPU: 7 COMMAND: "umount"
 #0 [ffff880455c53958] schedule at ffffffff81485765
 #1 [ffff880455c53a20] schedule_timeout at ffffffff81486590
 #2 [ffff880455c53ad0] cfs_schedule_timeout_and_set_state at ffffffffa051469d [libcfs]
 #3 [ffff880455c53ae0] obd_exports_barrier at ffffffffa05eb89d [obdclass]
 #4 [ffff880455c53b30] filter_precleanup at ffffffffa0bcc0a2 [obdfilter]
 #5 [ffff880455c53b90] class_cleanup at ffffffffa0609f97 [obdclass]
 #6 [ffff880455c53c10] class_process_config at ffffffffa060c323 [obdclass]
 #7 [ffff880455c53cb0] class_manual_cleanup at ffffffffa060d069 [obdclass]
 #8 [ffff880455c53d70] server_put_super at ffffffffa0618f4c [obdclass]
 #9 [ffff880455c53e40] generic_shutdown_super at ffffffff81165f3b
#10 [ffff880455c53e60] kill_anon_super at ffffffff81166056
#11 [ffff880455c53e80] lustre_kill_super at ffffffffa060eca6 [obdclass]
#12 [ffff880455c53ea0] deactivate_super at ffffffff81166ff0
#13 [ffff880455c53ec0] mntput_no_expire at ffffffff811831cf
#14 [ffff880455c53ef0] sys_umount at ffffffff81183c98
#15 [ffff880455c53f80] system_call_fastpath at ffffffff810030f2

The console shows the following messages:

Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 6. Is it stuck?
Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 6. Is it stuck?
Lustre: DEBUG MARKER: Tue Jan 14 11:45:01 2014

Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 6. Is it stuck?
Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 6. Is it stuck?
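The warnings above can be checked mechanically. The following is a minimal Python sketch, not part of any Lustre tooling, that extracts the target name, wait time and obd refcount from such console lines and flags the case where the refcount never changes between warnings. The helper names `parse_barrier_warnings` and `refcount_is_stuck` are hypothetical, and the regular expression assumes the exact message wording seen in this ticket.

```python
import re

# Matches the console warning wording quoted in this ticket (an assumption:
# other Lustre versions may phrase the message differently).
PATTERN = re.compile(
    r"Lustre: (?P<target>\S+) is waiting for obd_unlinked_exports "
    r"more than (?P<seconds>\d+) seconds\. "
    r"The obd refcount = (?P<refcount>\d+)\."
)


def parse_barrier_warnings(text):
    """Return a list of (target, seconds_waited, refcount) tuples."""
    return [
        (m["target"], int(m["seconds"]), int(m["refcount"]))
        for m in PATTERN.finditer(text)
    ]


def refcount_is_stuck(warnings):
    """True if there are at least two warnings and the refcount never moved."""
    return len(warnings) >= 2 and len({w[2] for w in warnings}) == 1


SAMPLE = """\
Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 6. Is it stuck?
Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 6. Is it stuck?
Lustre: DEBUG MARKER: Tue Jan 14 11:45:01 2014
Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 6. Is it stuck?
Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 6. Is it stuck?
"""

if __name__ == "__main__":
    warnings = parse_barrier_warnings(SAMPLE)
    for target, seconds, refcount in warnings:
        print(f"{target}: waited > {seconds}s, refcount = {refcount}")
    print("refcount unchanged across warnings:", refcount_is_stuck(warnings))
```

A refcount that stays constant while the wait time keeps doubling suggests that the remaining exports are not being released, which is consistent with the umount blocked in obd_exports_barrier.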

In the server crash dump, the struct obd_device *obd pointer is:
0xffff880864076038
From it we can find the connection that the server is waiting for:

crash> struct ptlrpc_connection 0xffff88083b878bc0
struct ptlrpc_connection {
  c_hash = {
    next = 0x0,
    pprev = 0xffff881035375440
  },
  c_self = 1407418007560379,
  c_peer = {
    nid = 1407418007565894,
    pid = 12345
  },
  c_remote_uuid = {
    uuid = "NET_0x5000a0a643646_UUID\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
                       JO.BOO.PI.FO -> lascaux4091
                       JO.BOO.WL.BZF -> lascaux226

  },
  c_refcount = {
    counter = 3
  }
}
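The c_self and c_peer.nid fields above are printed by crash as plain decimal integers. Below is a minimal Python sketch for turning such a value back into the usual address@network form. It assumes the classic 64-bit LNet NID layout (network in the upper 32 bits, split into a 16-bit LND type and a 16-bit network number, and an IPv4 address in the lower 32 bits) and a small LND type table written from memory. The function name `decode_nid` and the table `LND_NAMES` are hypothetical helpers, not Lustre APIs.

```python
import ipaddress

# Assumed LND type numbers (from memory of the LNet headers; verify against
# the headers of the Lustre version in use before relying on them).
LND_NAMES = {2: "tcp", 5: "o2ib", 9: "lo"}


def decode_nid(nid: int) -> str:
    """Decode a 64-bit LNet NID integer into 'a.b.c.d@<net><num>' form."""
    addr = nid & 0xFFFFFFFF          # lower 32 bits: IPv4 address
    net = (nid >> 32) & 0xFFFFFFFF   # upper 32 bits: network identifier
    lnd_type = (net >> 16) & 0xFFFF  # upper half of net: LND type
    net_num = net & 0xFFFF           # lower half of net: network number
    name = LND_NAMES.get(lnd_type, f"lnd{lnd_type}")
    suffix = str(net_num) if net_num else ""
    return f"{ipaddress.IPv4Address(addr)}@{name}{suffix}"


if __name__ == "__main__":
    for label, value in (("c_self", 1407418007560379),
                         ("c_peer.nid", 1407418007565894)):
        print(f"{label}: {hex(value)} -> {decode_nid(value)}")
```

As a sanity check, the hexadecimal form of c_peer.nid is 0x5000a0a643646, the same value embedded in the c_remote_uuid string.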

Thanks,
Sebastien.



 Comments   
Comment by Peter Jones [ 07/Mar/14 ]

Nathaniel

Could you please take care of this one?

Thanks

Peter

Comment by Nathaniel Clark [ 07/Mar/14 ]

This does look like a duplicate of LU-3230, which is fixed on b2_5 (will be in 2.5.1) and on master (2.5.52, and will be in 2.6). There is also a patch for b2_4 that hasn't been merged yet (http://review.whamcloud.com/8591), so the earliest release that would include it is 2.4.3 (if there is one).
