Lustre / LU-14046

lov tgt 0 not cleaned! deathrow=0, lovrc=1

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.5
    • Environment: 2.12.5 servers + 2.13 clients, CentOS 7.6

    Description

      Today we upgraded the Oak servers from 2.10.8 to 2.12.5, and now ~50 clients (2.13) out of ~1,500 cannot mount Oak at all after reboot. Example with client 10.50.0.63@o2ib2:

      Oct 19 13:31:26 sh02-ln03.stanford.edu kernel: LustreError: 94181:0:(lov_obd.c:828:lov_cleanup()) oak-clilov-ffffa0d562f8a800: lov tgt 0 not cleaned! deathrow=0, lovrc=1
      Oct 19 13:31:26 sh02-ln03.stanford.edu kernel: LustreError: 94181:0:(lov_obd.c:828:lov_cleanup()) Skipped 291 previous similar messages
      Oct 19 13:31:27 sh02-ln03.stanford.edu kernel: Lustre: Unmounted oak-client
      Oct 19 13:31:27 sh02-ln03.stanford.edu kernel: LustreError: 94181:0:(obd_mount.c:1669:lustre_fill_super()) Unable to mount  (-5) 

       

      On the MGS side, I can see this:

      /tmp/dk:00010000:02000400:7.0:1603137393.190601:0:7903:0:(ldlm_lib.c:1151:target_handle_connect()) MGS: Received new LWP connection from 10.50.0.63@o2ib2, removing former export from same NID
      /tmp/dk:00010000:00080000:7.0:1603137393.190602:0:7903:0:(ldlm_lib.c:1227:target_handle_connect()) MGS: connection from f3832037-ce6f-4@10.50.0.63@o2ib2 t0 exp ffff88f2a4e59c00 cur 12765 last 1603137393 
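      The `/tmp/dk` lines above use the colon-delimited prefix of a Lustre debug log dump (subsystem mask, debug mask, CPU, seconds.microseconds, pid, then the source location). As an illustration only, here is a small parser for that prefix; the field layout is inferred from the two sample lines above, not checked against the Lustre source tree:

```python
import re

def parse_dk_line(line):
    """Split a dumped Lustre debug-log line into prefix fields and message.

    Field meanings (subsystem mask, debug mask, CPU, timestamp, pid,
    source location) are inferred from the sample lines in this ticket.
    """
    m = re.match(
        r"(?:[^:]*:)?"                                  # optional "/tmp/dk:" file tag
        r"([0-9a-f]+):([0-9a-f]+):([\d.]+):([\d.]+):"   # subsys:mask:cpu:seconds.usec
        r"\d+:(\d+):\d+:"                               # ?:pid:?
        r"\((\w+\.c):(\d+):(\w+)\(\)\)\s*(.*)",         # (file.c:line:func()) message
        line,
    )
    if not m:
        return None
    subsys, mask, cpu, ts, pid, src, lineno, func, msg = m.groups()
    return {
        "subsystem": subsys,
        "mask": mask,
        "cpu": cpu,
        "timestamp": float(ts),
        "pid": int(pid),
        "source": f"{src}:{lineno}:{func}",
        "message": msg,
    }

# First sample line from the MGS debug log quoted above.
sample = ("/tmp/dk:00010000:02000400:7.0:1603137393.190601:0:7903:0:"
          "(ldlm_lib.c:1151:target_handle_connect()) MGS: Received new LWP "
          "connection from 10.50.0.63@o2ib2, removing former export from same NID")
rec = parse_dk_line(sample)
print(rec["source"], rec["pid"])
```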

      2.10 servers with 2.13 clients worked fine. This is 2.12 servers with 2.13 clients.

      Please advise. Is it the same as in LU-13719?

      Thanks!

      Stephane

      Attachments

        Issue Links

          Activity

            sthiell Stephane Thiell added a comment (edited)

            We restarted the MGS, which had come under heavy load, and it crashed when we tried to stop it. This already happened with 2.10, and the bug is still present in 2.12.5. We have applied Hongchao's patch from LU-13667 ("ptlrpc: fix endless loop issue") and restarted the MGS/MDS. After that, our 2.13 clients could mount the filesystem again, and we haven't seen lock timeout issues on the MGS even after failing over more OSTs.

            pjones Peter Jones added a comment -

            Hongchao

            Does this seem to be a duplicate of LU-13367?

            Peter


            sthiell Stephane Thiell added a comment -

            After reviewing the attached client logs (sh02-ln03.client.dk.log) again, it looks like this could be due to something else.

            On the MGS/MDS, we can see endless disconnections:

            Oct 19 13:06:17 oak-md1-s1 kernel: LustreError: 166-1: MGC10.0.2.51@o2ib5: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
            Oct 19 13:06:17 oak-md1-s1 kernel: LustreError: Skipped 1 previous similar message
            Oct 19 13:06:17 oak-md1-s1 kernel: LustreError: 7862:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1603137677, 300s ago), entering recovery for MGS@10.0.2.51@o2ib5 ns: MGC10.0.2.51@o2ib5 lock: ffff88f240925c40/0xe88ce0ce9dbc849f lrc: 4/1,0 mode: 
            Oct 19 13:06:17 oak-md1-s1 kernel: LustreError: 7862:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) Skipped 1 previous similar message
            Oct 19 13:09:26 oak-md1-s1 kernel: Lustre: MGS: haven't heard from client 38b0ac60-6c23-4 (at 10.49.27.12@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff88f2a8b2b400, cur 1603138166 expire 1603138016 last 1603137939
            Oct 19 13:11:17 oak-md1-s1 kernel: LustreError: 29618:0:(ldlm_resource.c:1147:ldlm_resource_complain()) MGC10.0.2.51@o2ib5: namespace resource [0x736d61726170:0x3:0x0].0x0 (ffff88f1ef5e66c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
            Oct 19 13:11:17 oak-md1-s1 kernel: LustreError: 7862:0:(mgc_request.c:599:do_requeue()) failed processing log: -5
            Oct 19 13:12:47 oak-md1-s1 kernel: Lustre: MGS: Connection restored to 0a2acb4f-5e35-e84f-7137-12310b3b17d8 (at 10.12.4.25@o2ib)
            Oct 19 13:12:47 oak-md1-s1 kernel: Lustre: Skipped 3213 previous similar messages
            Oct 19 13:14:26 oak-md1-s1 kernel: Lustre: MGS: haven't heard from client 38b0ac60-6c23-4 (at 10.49.27.12@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff88e2baae9400, cur 1603138466 expire 1603138316 last 1603138239
            Oct 19 13:15:22 oak-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.0.2.102@o2ib5, removing former export from same NID
            Oct 19 13:15:22 oak-md1-s1 kernel: Lustre: Skipped 3237 previous similar messages
            Oct 19 13:16:17 oak-md1-s1 kernel: LustreError: 166-1: MGC10.0.2.51@o2ib5: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
            Oct 19 13:16:17 oak-md1-s1 kernel: LustreError: Skipped 1 previous similar message
            Oct 19 13:16:17 oak-md1-s1 kernel: LustreError: 7862:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1603138277, 300s ago), entering recovery for MGS@10.0.2.51@o2ib5 ns: MGC10.0.2.51@o2ib5 lock: ffff88e04ef26e40/0xe88ce0ce9f2b45ae lrc: 4/1,0 mode: 
            Oct 19 13:16:17 oak-md1-s1 kernel: LustreError: 7862:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) Skipped 1 previous similar message
            Oct 19 13:19:35 oak-md1-s1 kernel: Lustre: MGS: haven't heard from client 38b0ac60-6c23-4 (at 10.49.27.12@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff88f2a7f22c00, cur 1603138775 expire 1603138625 last 1603138548
            Oct 19 13:21:18 oak-md1-s1 kernel: LustreError: 29746:0:(ldlm_resource.c:1147:ldlm_resource_complain()) MGC10.0.2.51@o2ib5: namespace resource [0x736d61726170:0x3:0x0].0x0 (ffff88e02528f680) refcount nonzero (1) after lock cleanup; forcing cleanup.
            Oct 19 13:21:18 oak-md1-s1 kernel: LustreError: 7862:0:(mgc_request.c:599:do_requeue()) failed processing log: -5
            Oct 19 13:21:18 oak-md1-s1 kernel: LustreError: 29746:0:(ldlm_resource.c:1147:ldlm_resource_complain()) Skipped 1 previous similar message
            Oct 19 13:22:47 oak-md1-s1 kernel: Lustre: MGS: Connection restored to bf3e962b-e521-22c4-b7d4-b2c82f971648 (at 10.12.4.86@o2ib)
            Oct 19 13:22:47 oak-md1-s1 kernel: Lustre: Skipped 3211 previous similar messages
            Oct 19 13:24:35 oak-md1-s1 kernel: Lustre: MGS: haven't heard from client 38b0ac60-6c23-4 (at 10.49.27.12@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff88f2aaa09c00, cur 1603139075 expire 1603138925 last 1603138848
            Oct 19 13:25:22 oak-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.0.2.102@o2ib5, removing former export from same NID
            Oct 19 13:25:22 oak-md1-s1 kernel: Lustre: Skipped 3253 previous similar messages
            Oct 19 13:26:26 oak-md1-s1 kernel: LustreError: 166-1: MGC10.0.2.51@o2ib5: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
            Oct 19 13:26:26 oak-md1-s1 kernel: LustreError: Skipped 1 previous similar message
            Oct 19 13:26:26 oak-md1-s1 kernel: LustreError: 7862:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1603138886, 300s ago), entering recovery for MGS@10.0.2.51@o2ib5 ns: MGC10.0.2.51@o2ib5 lock: ffff88e1ad4618c0/0xe88ce0cea0bd8bfc lrc: 4/1,0 mode: 
            Oct 19 13:26:27 oak-md1-s1 kernel: LustreError: 7862:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) Skipped 1 previous similar message
            Oct 19 13:29:41 oak-md1-s1 kernel: Lustre: MGS: haven't heard from client 38b0ac60-6c23-4 (at 10.49.27.12@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff88f2a6e94400, cur 1603139381 expire 1603139231 last 1603139154
            Oct 19 13:30:21 oak-md1-s1 kernel: Lustre: oak-MDT0001: Client c50f4c48-a8d0-2d5f-ff90-7efef0b098e9 (at 10.210.9.195@tcp1) reconnecting
            Oct 19 13:30:21 oak-md1-s1 kernel: Lustre: Skipped 21 previous similar messages
            Oct 19 13:31:27 oak-md1-s1 kernel: LustreError: 29879:0:(ldlm_resource.c:1147:ldlm_resource_complain()) MGC10.0.2.51@o2ib5: namespace resource [0x736d61726170:0x3:0x0].0x0 (ffff88dfc0b3fc80) refcount nonzero (1) after lock cleanup; forcing cleanup.
            Oct 19 13:31:27 oak-md1-s1 kernel: LustreError: 7862:0:(mgc_request.c:599:do_requeue()) failed processing log: -5
            Oct 19 13:32:47 oak-md1-s1 kernel: Lustre: MGS: Connection restored to e3ecfc0a-db4b-4 (at 10.50.10.3@o2ib2)
            Oct 19 13:32:47 oak-md1-s1 kernel: Lustre: Skipped 3241 previous similar messages
            Oct 19 13:34:41 oak-md1-s1 kernel: Lustre: MGS: haven't heard from client 38b0ac60-6c23-4 (at 10.49.27.12@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff88f2aaadf400, cur 1603139681 expire 1603139531 last 1603139454
            Oct 19 13:35:22 oak-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.0.2.102@o2ib5, removing former export from same NID
            Oct 19 13:35:22 oak-md1-s1 kernel: Lustre: Skipped 3238 previous similar messages
            Oct 19 13:36:27 oak-md1-s1 kernel: LustreError: 166-1: MGC10.0.2.51@o2ib5: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
            Oct 19 13:36:27 oak-md1-s1 kernel: LustreError: Skipped 1 previous similar message
            Oct 19 13:36:27 oak-md1-s1 kernel: LustreError: 7862:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1603139487, 300s ago), entering recovery for MGS@10.0.2.51@o2ib5 ns: MGC10.0.2.51@o2ib5 lock: ffff88e15230e540/0xe88ce0cea2ae56e9 lrc: 4/1,0 mode: 
            Oct 19 13:36:27 oak-md1-s1 kernel: LustreError: 7862:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) Skipped 1 previous similar message
            Oct 19 13:39:46 oak-md1-s1 kernel: Lustre: MGS: haven't heard from client 38b0ac60-6c23-4 (at 10.49.27.12@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff88f2a4c9b800, cur 1603139986 expire 1603139836 last 1603139759
            Oct 19 13:41:27 oak-md1-s1 kernel: LustreError: 29993:0:(ldlm_resource.c:1147:ldlm_resource_complain()) MGC10.0.2.51@o2ib5: namespace resource [0x736d61726170:0x3:0x0].0x0 (ffff88e130e895c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
            Oct 19 13:41:27 oak-md1-s1 kernel: LustreError: 7862:0:(mgc_request.c:599:do_requeue()) failed processing log: -5
            Oct 19 13:41:27 oak-md1-s1 kernel: LustreError: 29993:0:(ldlm_resource.c:1147:ldlm_resource_complain()) Skipped 1 previous similar message
            Oct 19 13:42:48 oak-md1-s1 kernel: Lustre: MGS: Connection restored to 4151aadc-4857-e0bc-f1e5-8c97714919e5 (at 10.210.12.107@tcp1)
            Oct 19 13:42:48 oak-md1-s1 kernel: Lustre: Skipped 3233 previous similar messages
            Oct 19 13:45:22 oak-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.0.2.102@o2ib5, removing former export from same NID
            Oct 19 13:45:22 oak-md1-s1 kernel: Lustre: Skipped 3203 previous similar messages 
            

            Do you think this issue is related to LU-13667 for which a patch has landed in b2_12?
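            As a quick sanity check of that cadence: the "lock timed out (enqueued at T, 300s ago)" lines above recur roughly every ten minutes, i.e. the 300 s lock wait plus the requeue cycle. Computing the gaps from the enqueue timestamps copied out of the excerpt:

```python
# Enqueue timestamps taken from the ldlm_expired_completion_wait() lines above.
enqueued = [1603137677, 1603138277, 1603138886, 1603139487]

# Gap between successive lock-timeout events, in seconds.
intervals = [b - a for a, b in zip(enqueued, enqueued[1:])]
print(intervals)  # [600, 609, 601] -- a roughly 10-minute cycle
```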


            sthiell Stephane Thiell added a comment -

            Sorry, I didn't mean to open this as an improvement; it's a bug we would like to report, with clients unable to mount the filesystem. Please let me know what you think. Thanks!


            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: sthiell Stephane Thiell
