Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.7.0
-
3
-
9223372036854775807
Description
Pulling in file migration patch into my tree (pretty much master) and running racer in a loop makes racer not being able to stop lustre on second or so iteration.
What I see in the logs:
[ 8420.766878] Lustre: DEBUG MARKER: == racer test complete, duration 2754 sec == 00:07:26 (1440994046) [ 8420.967636] Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request [ 8421.140263] Lustre: 27652:0:(llite_lib.c:2617:ll_dirty_page_discard_warn()) lustre: dirty page discard: 192.168.10.210@tcp:/lustre/fid: [0x200000401:0x1434e:0x0]/ may get corrupted (rc -108) [ 8421.141516] Lustre: 27650:0:(llite_lib.c:2617:ll_dirty_page_discard_warn()) lustre: dirty page discard: 192.168.10.210@tcp:/lustre/fid: [0x200000402:0x1418e:0x0]/ may get corrupted (rc -108) [ 8421.141559] Lustre: 27651:0:(llite_lib.c:2617:ll_dirty_page_discard_warn()) lustre: dirty page discard: 192.168.10.210@tcp:/lustre/fid: [0x200000401:0x1433e:0x0]/ may get corrupted (rc -108) [ 8421.441181] Lustre: Unmounted lustre-client [ 8421.587773] Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request [ 8421.589214] Lustre: Skipped 2 previous similar messages [ 8421.770504] Lustre: 27654:0:(llite_lib.c:2617:ll_dirty_page_discard_warn()) lustre: dirty page discard: 192.168.10.210@tcp:/lustre/fid: [0x200000402:0x1414f:0x0]/ may get corrupted (rc -108) [ 8422.089900] Lustre: Unmounted lustre-client [ 8422.692354] LustreError: 11-0: lustre-MDT0000-lwp-OST0000: operation obd_ping to node 0@lo failed: rc = -107 [ 8422.692393] Lustre: lustre-MDT0000-lwp-OST0001: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete [ 8422.692573] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping) [ 8422.696661] LustreError: Skipped 1 previous similar message [ 8422.728159] LustreError: 27652:0:(client.c:1138:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff880017a6dce8 x1510988910285696/t0(0) o13->lustre-OST0000-osc-MDT0000@0@lo:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 [ 8425.805649] LustreError: 27651:0:(client.c:1138:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff880036aa3ce8 x1510988910285704/t0(0) o6->lustre-OST0001-osc-MDT0000@0@lo:28/4 lens 664/432 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 [ 8425.807388] LustreError: 27651:0:(client.c:1138:ptlrpc_import_delay_req()) Skipped 1 previous similar message [ 8427.692280] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping) [ 8427.693352] Lustre: Skipped 2 previous similar messages [ 8428.654000] Lustre: server umount lustre-MDT0000 complete [ 8429.084825] LustreError: 7702:0:(ldlm_resource.c:835:ldlm_resource_complain()) filter-lustre-OST0000_UUID: namespace resource [0x29b0:0x0:0x0].0 (ffff880093089ef8) refcount nonzero (1) after lock cleanup; forcing cleanup. [ 8429.086152] LustreError: 7702:0:(ldlm_resource.c:1448:ldlm_resource_dump()) --- Resource: [0x29b0:0x0:0x0].0 (ffff880093089ef8) refcount = 2 [ 8434.084126] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110) [ 8439.084079] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110) [ 8439.692145] Lustre: 27655:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1440994058/real 1440994058] req@ffff88009fac0ce8 x1510988910286992/t0(0) o400->MGC192.168.10.210@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1440994065 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 8439.694855] LustreError: 166-1: MGC192.168.10.210@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail [ 8444.084112] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110) [ 8445.696141] Lustre: 27649:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1440994065/real 1440994065] req@ffff880033d4cce8 x1510988910287004/t0(0) o250->MGC192.168.10.210@tcp@0@lo:26/25 lens 520/544 e 0 to 1 dl 1440994071 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 8448.696152] Lustre: 27649:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1440994058/real 1440994058] req@ffff88000a215ce8 x1510988910286996/t0(0) o38->lustre-MDT0000-lwp-OST0001@0@lo:12/10 lens 520/544 e 0 to 1 dl 1440994074 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 8449.084099] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110) [ 8454.084088] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110) [ 8460.696064] Lustre: 27649:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1440994075/real 1440994075] req@ffff88004cc54ce8 x1510988910287008/t0(0) o250->MGC192.168.10.210@tcp@0@lo:26/25 lens 520/544 e 0 to 1 dl 1440994086 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 8464.084075] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110) [ 8464.085092] LustreError: Skipped 1 previous similar message [ 8470.696064] Lustre: 27649:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1440994075/real 1440994075] req@ffff8800a10a5ce8 x1510988910287012/t0(0) o38->lustre-MDT0000-lwp-OST0001@0@lo:12/10 lens 520/544 e 0 to 1 dl 1440994096 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 8480.696143] Lustre: 27649:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1440994090/real 1440994090] req@ffff88000e31bce8 x1510988910287016/t0(0) o250->MGC192.168.10.210@tcp@0@lo:26/25 lens 520/544 e 0 to 1 dl 1440994106 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 8484.084065] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110) [ 8484.084866] LustreError: Skipped 3 previous similar messages [ 8499.696134] Lustre: 27649:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1440994100/real 1440994100] req@ffff88000a789ce8 x1510988910287020/t0(0) o38->lustre-MDT0000-lwp-OST0001@0@lo:12/10 lens 520/544 e 0 to 1 dl 1440994125 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 8519.084087] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110) [ 8519.084900] LustreError: Skipped 6 previous similar messages [ 8534.696118] Lustre: 27649:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1440994135/real 1440994135] req@ffff8800186e6ce8 x1510988910287032/t0(0) o250->MGC192.168.10.210@tcp@0@lo:26/25 lens 520/544 e 0 to 1 dl 1440994160 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 8534.697994] Lustre: 27649:0:(client.c:2039:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [ 8584.084096] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110) [ 8584.084930] LustreError: Skipped 12 previous similar messages [ 8619.696078] Lustre: 27649:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1440994220/real 1440994220] req@ffff88000a789ce8 x1510988910287052/t0(0) o38->lustre-MDT0000-lwp-OST0001@0@lo:12/10 lens 520/544 e 0 to 1 dl 1440994245 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 8619.697726] Lustre: 27649:0:(client.c:2039:ptlrpc_expire_one_request()) Skipped 4 previous similar messages [ 8714.084086] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110) [ 8714.085350] LustreError: Skipped 25 previous similar messages [ 8769.696090] Lustre: 27649:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1440994370/real 1440994370] req@ffff8800aebeece8 x1510988910287092/t0(0) o38->lustre-MDT0000-lwp-OST0001@0@lo:12/10 lens 520/544 e 0 to 1 dl 1440994395 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 8769.698845] Lustre: 27649:0:(client.c:2039:ptlrpc_expire_one_request()) Skipped 9 previous similar messages [ 8974.084085] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110)
So it looks like there was a leaked lock somewhere along the way.
Comparing a succesful run vs unsuccesful, this message pops out in the unsuccesful run:
[ 7616.286047] LustreError: 32525:0:(ldlm_request.c:452:ldlm_cli_enqueue_local()) ### delayed lvb init failed (rc -2) ns: filter-lustre-OST0000_UUID lock: ffff880094d41db8/0x96ec5ed20a281423 lrc: 2/0,0 mode: --/PR res: [0x29b0:0x0:0x0].0 rrc: 1 type: EXT [0->0] (req 0->0) flags: 0x40000000000000 nid: local remote: 0x0 expref: -99 pid: 32525 timeout: 0 lvb_type: 0
Resource seems to be matching stuck one, so I bet it's related.
Attachments
Issue Links
- is related to
-
LU-10543 racer: /mnt/lustre2 is still busy, wait one second
- Open
-
LU-14389 crash in lov_delete_composite() with racer+migrate
- Resolved
-
LU-13730 Check need to mirror Extend on a WRITE_PENDING (wp) FLR file
- Resolved
- is related to
-
LU-2766 lov_object.c:635:lov_layout_change()) ASSERTION( atomic_read(&lov->lo_active_ios) == 0 ) failed
- Resolved
-
LU-4840 Deadlock when truncating file during lfs migrate
- Resolved
-
LU-6903 racer file migration crash ASSERTION( lov->lo_type == LLT_RAID0 )
- Resolved
-
LU-8174 Taking a lease for the 2nd time fails with -EBUSY
- Resolved