[LU-9867] MDT crashes on failback, attempting to umount
Created: 11/Aug/17  Updated: 18/Dec/17

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Cliff White (Inactive)
Assignee: Lai Siyao
Resolution: Unresolved
Votes: 0
Labels: soak
Environment:

Soak performance cluster


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The soak-11 MDT (soaked-MDT0003) was failed over to soak-10.
The failover completed. On failback, soak-10 attempts to unmount soaked-MDT0003:

2017-08-11 07:04:42,819:fsmgmt.fsmgmt:INFO     Failing back soaked-MDT0003 ...
2017-08-11 07:04:42,819:fsmgmt.fsmgmt:INFO     Unmounting soaked-MDT0003 on soak-10 ...

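soak-10 hangs on the unmount and then crashes. The last message before the crash is logged at 07:06:26, and the node is booting again at 07:09:59: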

Aug 11 07:04:42 soak-10 kernel: Lustre: Failing over soaked-MDT0003
Aug 11 07:04:48 soak-10 kernel: LustreError: 22116:0:(osp_precreate.c:619:osp_precreate_send()) soaked-OST0000-osc-MDT0003: can't precreate: rc = -5
Aug 11 07:04:48 soak-10 kernel: LustreError: 22116:0:(osp_precreate.c:1259:osp_precreate_thread()) soaked-OST0000-osc-MDT0003: cannot precreate objects: rc = -5
Aug 11 07:04:48 soak-10 kernel: LustreError: 22118:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) soaked-OST0001-osc-MDT0003: cannot cleanup orphans: rc = -5
Aug 11 07:04:48 soak-10 kernel: LustreError: 3751:0:(osp_precreate.c:1311:osp_precreate_ready_condition()) soaked-OST000e-osc-MDT0003: precreate failed opd_pre_status -108
Aug 11 07:04:48 soak-10 kernel: LustreError: 3457:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff88039b436f00 x1575403953520176/t0(0) o13->soaked-OST0011-osc-MDT0003@192.168.1.107@o2ib:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Aug 11 07:04:48 soak-10 kernel: Lustre: soaked-MDT0003: Not available for connect from 192.168.1.144@o2ib (stopping)
Aug 11 07:04:48 soak-10 kernel: LustreError: 3460:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff8803bf7e4200 x1575403953576704/t0(0) o13->soaked-OST0009-osc-MDT0003@192.168.1.105@o2ib:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Aug 11 07:04:48 soak-10 kernel: LustreError: 3460:0:(client.c:1166:ptlrpc_import_delay_req()) Skipped 1 previous similar message
Aug 11 07:04:48 soak-10 kernel: LustreError: 22162:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) soaked-OST0017-osc-MDT0003: cannot cleanup orphans: rc = -5
Aug 11 07:04:48 soak-10 kernel: LustreError: 22162:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) Skipped 4 previous similar messages
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1094:ldlm_resource_complain()) mdt-soaked-MDT0003_UUID: namespace resource [0x2c000040c:0x8907:0x0].0x0 (ffff880765a7d2c0) refcount nonzero (2) after lock cleanup; forcing cleanup.
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x2c000040c:0x8907:0x0].0x0 (ffff880765a7d2c0) refcount = 3
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1679:ldlm_resource_dump()) Granted locks (in reverse order):
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1682:ldlm_resource_dump()) ### ### ns: mdt-soaked-MDT0003_UUID lock: ffff88067bc04200/0xe70ae3776920e59b lrc: 2/0,1 mode: CW/CW res: [0x2c000040c:0x8907:0x0].0x0 bits 0x2/0x0 rrc: 4 type: IBT flags: 0x40316400000000 nid: local remote: 0x0 expref: -99 pid: 3751 timeout: 0 lvb_type: 0
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x2c0000bea:0xb74a:0x0].0x0 (ffff8806abf93140) refcount = 3
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1679:ldlm_resource_dump()) Granted locks (in reverse order):
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x2c0000bea:0xd06a:0x0].0x54cb7170 (ffff8803e4f5f5c0) refcount = 17
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1679:ldlm_resource_dump()) Granted locks (in reverse order):
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1697:ldlm_resource_dump()) Waiting locks:
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1699:ldlm_resource_dump()) ### ### ns: mdt-soaked-MDT0003_UUID lock: ffff88038cd75400/0xe70ae377690c01a9 lrc: 2/0,1 mode: --/PW res: [0x2c0000bea:0xd06a:0x0].0x54cb7170 bits 0x2/0x0 rrc: 18 type: IBT flags: 0x40316400000020 nid: local remote: 0x0 expref: -99 pid: 11146 timeout: 0 lvb_type: 0
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x2c000040c:0x8907:0x0].0x31 (ffff880765a7d500) refcount = 2
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1679:ldlm_resource_dump()) Granted locks (in reverse order):
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x2c0000bea:0xd06a:0x0].0x0 (ffff88069e2c6900) refcount = 33
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1679:ldlm_resource_dump()) Granted locks (in reverse order):
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x2c0000bea:0xb74a:0x0].0x7d67b49 (ffff8806abf92900) refcount = 2
Aug 11 07:04:48 soak-10 kernel: LustreError: 22313:0:(ldlm_resource.c:1679:ldlm_resource_dump()) Granted locks (in reverse order):
Aug 11 07:04:48 soak-10 kernel: Lustre: soaked-MDT0003: Not available for connect from 192.168.1.109@o2ib (stopping)
Aug 11 07:04:48 soak-10 kernel: LustreError: 3459:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff88035e6e4500 x1575403953642112/t0(0) o13->soaked-OST0013-osc-MDT0003@192.168.1.103@o2ib:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Aug 11 07:04:48 soak-10 kernel: LustreError: 3459:0:(client.c:1166:ptlrpc_import_delay_req()) Skipped 16 previous similar messages
Aug 11 07:04:48 soak-10 kernel: Lustre: soaked-MDT0003: Not available for connect from 192.168.1.122@o2ib (stopping)
Aug 11 07:04:48 soak-10 kernel: Lustre: soaked-MDT0003: Not available for connect from 192.168.1.104@o2ib (stopping)
Aug 11 07:04:48 soak-10 kernel: Lustre: Skipped 14 previous similar messages
Aug 11 07:04:52 soak-10 kernel: Lustre: soaked-MDT0003: Not available for connect from 192.168.1.108@o2ib (stopping)
Aug 11 07:04:52 soak-10 kernel: Lustre: Skipped 8 previous similar messages
Aug 11 07:04:56 soak-10 kernel: LustreError: 11-0: soaked-MDT0003-osp-MDT0002: operation obd_ping to node 0@lo failed: rc = -107
Aug 11 07:04:56 soak-10 kernel: LustreError: Skipped 2 previous similar messages
Aug 11 07:04:56 soak-10 kernel: Lustre: soaked-MDT0003-osp-MDT0002: Connection to soaked-MDT0003 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Aug 11 07:04:56 soak-10 kernel: Lustre: Skipped 3 previous similar messages
Aug 11 07:05:01 soak-10 kernel: Lustre: soaked-MDT0003: Not available for connect from 192.168.1.102@o2ib (stopping)
Aug 11 07:05:01 soak-10 kernel: Lustre: Skipped 15 previous similar messages
Aug 11 07:05:09 soak-10 kernel: LustreError: 0-0: Forced cleanup waiting for mdt-soaked-MDT0003_UUID namespace with 5 resources in use, (rc=-110)
Aug 11 07:05:18 soak-10 kernel: Lustre: soaked-MDT0003: Not available for connect from 172.16.1.45@o2ib1 (stopping)
...
Aug 11 07:05:59 soak-10 kernel: LustreError: 0-0: Forced cleanup waiting for mdt-soaked-MDT0003_UUID namespace with 5 resources in use, (rc=-110)
Aug 11 07:06:00 soak-10 kernel: Lustre: soaked-MDT0003: Not available for connect from 192.168.1.117@o2ib (stopping)
Aug 11 07:06:00 soak-10 kernel: Lustre: Skipped 4 previous similar messages
Aug 11 07:06:12 soak-10 kernel: LustreError: 3706:0:(lod_qos.c:208:lod_statfs_and_check()) soaked-MDT0003-mdtlov: statfs: rc = -108
Aug 11 07:06:12 soak-10 kernel: LustreError: 3706:0:(ldlm_lockd.c:1415:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8803806e9400 ns: mdt-soaked-MDT0003_UUID lock: ffff88037df50400/0xe70ae3776a0f6960 lrc: 3/0,0 mode: CR/CR res: [0x2c0000bda:0x338a:0x0].0x0 bits 0x9/0x9 rrc: 2 type: IBT flags: 0x50200000000000 nid: 192.168.1.131@o2ib remote: 0x26342f5db48b4138 expref: 3 pid: 3706 timeout: 0 lvb_type: 0
Aug 11 07:06:15 soak-10 kernel: LustreError: 3762:0:(lod_qos.c:208:lod_statfs_and_check()) soaked-MDT0003-mdtlov: statfs: rc = -108
Aug 11 07:06:15 soak-10 kernel: LustreError: 3762:0:(lod_qos.c:208:lod_statfs_and_check()) Skipped 44 previous similar messages
Aug 11 07:06:23 soak-10 kernel: LustreError: 3751:0:(lod_qos.c:208:lod_statfs_and_check()) soaked-MDT0003-mdtlov: statfs: rc = -108
Aug 11 07:06:23 soak-10 kernel: LustreError: 3751:0:(lod_qos.c:208:lod_statfs_and_check()) Skipped 46 previous similar messages
Aug 11 07:06:25 soak-10 kernel: LustreError: 3408:0:(osp_dev.c:1276:osp_device_free()) header@ffff8803ddfc5bd8[0x0, 1, [0x200000007:0x1:0x0] hash exist]{
Aug 11 07:06:25 soak-10 kernel: LustreError: 3408:0:(osp_dev.c:1276:osp_device_free()) ....mdt@ffff8803ddfc5c28mdt-object@ffff8803ddfc5bd8( , writecount=0)
Aug 11 07:06:25 soak-10 kernel: LustreError: 3408:0:(osp_dev.c:1276:osp_device_free()) ....mdd@ffff8803a16d3870mdd-object@ffff8803a16d3870(open_count=0, valid=0, cltime=0, flags=0)
Aug 11 07:06:25 soak-10 kernel: LustreError: 3408:0:(osp_dev.c:1276:osp_device_free()) ....lod@ffff8806a13e4208lod-object@ffff8806a13e4208
Aug 11 07:06:25 soak-10 kernel: LustreError: 3408:0:(osp_dev.c:1276:osp_device_free()) ....osp@ffff88038baf4910osp-object@ffff88038baf48c0
Aug 11 07:06:26 soak-10 kernel: LustreError: 3408:0:(osp_dev.c:1276:osp_device_free()) } header@ffff8803ddfc5bd8
Aug 11 07:10:33 soak-10 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="1691" x-info="http://www.rsyslog.com"] start
Aug 11 07:09:59 soak-10 kernel: microcode: microcode updated early to revision 0x710, date = 2013-06-17
Aug 11 07:09:59 soak-10 kernel: Initializing cgroup subsys cpuset
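
For triage, the negative return codes in the log above can be decoded into errno names. The following is a minimal sketch, assuming standard Linux errno numbering (-5 EIO, -107 ENOTCONN, -108 ESHUTDOWN, -110 ETIMEDOUT). The function names and the regular expression are hypothetical helpers written for this ticket; they are not part of Lustre or the soak framework.

```python
import re

# Linux errno values for the return codes seen in this log
# (assumption: standard Linux numbering, hardcoded so the result
# does not depend on the platform running the script).
LINUX_ERRNO = {
    5: "EIO",
    107: "ENOTCONN",
    108: "ESHUTDOWN",
    110: "ETIMEDOUT",
}

# Matches "rc = -5", "(rc=-110)" and "opd_pre_status -108".
# It deliberately does not match "rc 0/-1" or the "lrc:"/"rrc:" fields.
RC_PATTERN = re.compile(r"(?:\brc\s*=\s*|opd_pre_status\s+)-(\d+)")


def decode_return_codes(line):
    """Return a list of (code, errno_name) for each negative rc in one log line."""
    decoded = []
    for match in RC_PATTERN.finditer(line):
        code = int(match.group(1))
        decoded.append((-code, LINUX_ERRNO.get(code, "UNKNOWN")))
    return decoded


def summarize(lines):
    """Count how often each errno name appears across the given log lines."""
    counts = {}
    for line in lines:
        for _code, name in decode_return_codes(line):
            counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    # Message fragments copied from the log excerpt in this ticket.
    sample = [
        "soaked-OST0000-osc-MDT0003: can't precreate: rc = -5",
        "soaked-OST000e-osc-MDT0003: precreate failed opd_pre_status -108",
        "soaked-MDT0003-osp-MDT0002: operation obd_ping to node 0@lo failed: rc = -107",
        "Forced cleanup waiting for mdt-soaked-MDT0003_UUID namespace with 5 resources in use, (rc=-110)",
    ]
    for name, count in sorted(summarize(sample).items()):
        print(name, count)
```

Read this way, the sequence is I/O errors on precreate (EIO) once the OSP imports close, ESHUTDOWN and ENOTCONN as the target stops, and repeated ETIMEDOUT while the forced namespace cleanup waits on the 5 resources still in use.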


 Comments   
Comment by Peter Jones [ 11/Aug/17 ]

Lai

Could you please advise on this one?

Thanks

Peter
