Details
- Bug
- Resolution: Duplicate
- Minor
- None
- Lustre 2.1.6
- None
- 3
- 13009
Description
Hi,
We seem to be hitting the same issue as LU-3230.
The umount process was stuck in:
crash> bt 3632
PID: 3632  TASK: ffff880455aec080  CPU: 7  COMMAND: "umount"
 #0 [ffff880455c53958] schedule at ffffffff81485765
 #1 [ffff880455c53a20] schedule_timeout at ffffffff81486590
 #2 [ffff880455c53ad0] cfs_schedule_timeout_and_set_state at ffffffffa051469d [libcfs]
 #3 [ffff880455c53ae0] obd_exports_barrier at ffffffffa05eb89d [obdclass]
 #4 [ffff880455c53b30] filter_precleanup at ffffffffa0bcc0a2 [obdfilter]
 #5 [ffff880455c53b90] class_cleanup at ffffffffa0609f97 [obdclass]
 #6 [ffff880455c53c10] class_process_config at ffffffffa060c323 [obdclass]
 #7 [ffff880455c53cb0] class_manual_cleanup at ffffffffa060d069 [obdclass]
 #8 [ffff880455c53d70] server_put_super at ffffffffa0618f4c [obdclass]
 #9 [ffff880455c53e40] generic_shutdown_super at ffffffff81165f3b
#10 [ffff880455c53e60] kill_anon_super at ffffffff81166056
#11 [ffff880455c53e80] lustre_kill_super at ffffffffa060eca6 [obdclass]
#12 [ffff880455c53ea0] deactivate_super at ffffffff81166ff0
#13 [ffff880455c53ec0] mntput_no_expire at ffffffff811831cf
#14 [ffff880455c53ef0] sys_umount at ffffffff81183c98
#15 [ffff880455c53f80] system_call_fastpath at ffffffff810030f2
and we can see the following messages on the console:
Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 6. Is it stuck?
Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 6. Is it stuck?
Lustre: DEBUG MARKER: Tue Jan 14 11:45:01 2014
Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 6. Is it stuck?
Lustre: ptmp2-OST008d is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 6. Is it stuck?
In the server crash dump, we can find the struct obd_device *obd pointer:
0xffff880864076038
and find the connection that the server is waiting for:
crash> struct ptlrpc_connection 0xffff88083b878bc0
struct ptlrpc_connection {
  c_hash = {
    next = 0x0,
    pprev = 0xffff881035375440
  },
  c_self = 1407418007560379,
  c_peer = {
    nid = 1407418007565894,
    pid = 12345
  },
  c_remote_uuid = {
    uuid = "NET_0x5000a0a643646_UUID\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
  },
  c_refcount = {
    counter = 3
  }
}
JO.BOO.PI.FO -> lascaux4091
JO.BOO.WL.BZF -> lascaux226
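For anyone reading the dump, the c_self and c_peer.nid values are LNet NIDs printed as plain 64-bit integers. The sketch below shows one way to turn them into the usual address@network form. It assumes the classic layout (low 32 bits are the IPv4 address, high 32 bits are the network, split into a 16-bit LND type and a 16-bit network number) and that type 5 is o2ib and type 2 is tcp. The helper name decode_nid and the LND_NAMES table are illustrative only and are not part of Lustre or crash.

```python
import ipaddress

# Assumed mapping of LND type numbers to names (only the two common ones).
LND_NAMES = {2: "tcp", 5: "o2ib"}


def decode_nid(nid: int) -> str:
    """Decode a 64-bit LNet NID integer into 'a.b.c.d@netN' form.

    Assumes the classic layout: high 32 bits = network
    (type << 16 | number), low 32 bits = IPv4 address.
    """
    addr = nid & 0xFFFFFFFF            # low 32 bits: address within the network
    net = (nid >> 32) & 0xFFFFFFFF     # high 32 bits: network identifier
    lnd_type = (net >> 16) & 0xFFFF    # upper half of the network: LND type
    net_num = net & 0xFFFF             # lower half of the network: network number
    name = LND_NAMES.get(lnd_type, f"lnd{lnd_type}")
    suffix = str(net_num) if net_num else ""  # network 0 is printed without a number
    return f"{ipaddress.IPv4Address(addr)}@{name}{suffix}"


if __name__ == "__main__":
    # Values taken from the ptlrpc_connection dump above.
    for label, nid in (("c_self", 1407418007560379),
                       ("c_peer.nid", 1407418007565894)):
        print(f"{label}: {nid:#x} -> {decode_nid(nid)}")
```

The hexadecimal form of c_peer.nid is the same value that appears inside c_remote_uuid (NET_0x5000a0a643646_UUID), which is a quick way to confirm that the connection and the UUID refer to the same peer.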
Thanks,
Sebastien.