Details
- Bug
- Resolution: Fixed
- Major
- Lustre 2.15.0
- 3
Description
In connection with the LU-8367 deadlock between osp_precreate_reserve() and osp_precreate_cleanup_orphans(), I found a problem with MDT failover:
00000020:02000400:31.0:1644539398.776433:0:454249:0:(obd_config.c:854:class_cleanup()) Failing over kjcf05-MDT0001
...
00010000:02020000:20.0:1644539461.204784:0:454249:0:(ldlm_resource.c:1188:__ldlm_namespace_free()) 0-0: Forced cleanup waiting for mdt-kjcf05-MDT0001_UUID namespace with 46 resources in use, (rc=-110)
00010000:02020000:8.0:1644539699.332763:0:454249:0:(ldlm_resource.c:1188:__ldlm_namespace_free()) 0-0: Forced cleanup waiting for mdt-kjcf05-MDT0001_UUID namespace with 46 resources in use, (rc=-110)
The situation is as follows. MDT failover does not produce a disconnect event, so osp_precreate_cleanup_orphans() is never woken up. Failover also does not clear opd_pre_recovering, so the wait in osp_precreate_reserve() misses the wakeup signal. The hang only ends after roughly obd_timeout.
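The wakeup problem can be modelled in user space with a condition variable. This is a hypothetical pthreads analogue, not Lustre code and not the actual fix: `struct precreate_state`, `reserve_wait()`, `failover()` and `run_scenario()` are invented names, `pre_recovering` only stands in for opd_pre_recovering, the timeout stands in for obd_timeout, and returning `-ENODEV` on shutdown is an arbitrary illustrative choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model; NOT Lustre's struct osp_device. */
struct precreate_state {
	pthread_mutex_t lock;
	pthread_cond_t  waitq;
	bool pre_recovering; /* stands in for opd_pre_recovering */
	bool stopping;       /* device is being failed over */
};

/* Models the wait in osp_precreate_reserve(): sleep until recovery ends,
 * the device is stopping, or timeout_sec (standing in for obd_timeout)
 * expires. Returns 0, -ENODEV or -ETIMEDOUT. */
static int reserve_wait(struct precreate_state *s, int timeout_sec)
{
	struct timespec deadline;
	int rc = 0;

	/* pthread_cond_timedwait() uses CLOCK_REALTIME by default. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_sec;

	pthread_mutex_lock(&s->lock);
	/* Re-check the condition after every wakeup (spurious or real). */
	while (s->pre_recovering && !s->stopping && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&s->waitq, &s->lock, &deadline);
	if (s->stopping)
		rc = -ENODEV;
	else if (!s->pre_recovering)
		rc = 0;
	else
		rc = -ETIMEDOUT;
	pthread_mutex_unlock(&s->lock);
	return rc;
}

/* wake_waiters == false models the reported bug: no disconnect event,
 * the recovering flag stays set and nobody signals the waiter.
 * wake_waiters == true clears the flag and wakes every waiter. */
static void failover(struct precreate_state *s, bool wake_waiters)
{
	pthread_mutex_lock(&s->lock);
	if (wake_waiters) {
		s->stopping = true;
		s->pre_recovering = false;
		pthread_cond_broadcast(&s->waitq);
	}
	pthread_mutex_unlock(&s->lock);
}

struct waiter_arg {
	struct precreate_state *s;
	int timeout_sec;
	int rc;
};

static void *waiter_main(void *p)
{
	struct waiter_arg *a = p;

	a->rc = reserve_wait(a->s, a->timeout_sec);
	return NULL;
}

/* Start a waiter, fail over after ~100 ms, and report through *elapsed
 * how many seconds the waiter stayed blocked. */
static int run_scenario(bool wake_waiters, int timeout_sec, double *elapsed)
{
	struct precreate_state s = { .pre_recovering = true, .stopping = false };
	struct waiter_arg a = { .s = &s, .timeout_sec = timeout_sec, .rc = 0 };
	struct timespec t0, t1, pause = { 0, 100000000L };
	pthread_t tid;

	pthread_mutex_init(&s.lock, NULL);
	pthread_cond_init(&s.waitq, NULL);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	pthread_create(&tid, NULL, waiter_main, &a);
	nanosleep(&pause, NULL);
	failover(&s, wake_waiters);
	pthread_join(tid, NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	*elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	pthread_cond_destroy(&s.waitq);
	pthread_mutex_destroy(&s.lock);
	return a.rc;
}
```

With `wake_waiters` false, `run_scenario(false, 1, &t)` blocks for the whole timeout and returns `-ETIMEDOUT`, mirroring the hang that lasts roughly obd_timeout; with it true, the waiter returns shortly after failover. Changing the flags and broadcasting under the same mutex the waiter holds while it checks them is what prevents the wakeup from being lost.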
Issue Links
- is related to
  LU-16425 Interop recovery-small test_144a: MDT failover took 252 seconds (Resolved)