Project: Lustre, issue LU-11826

Cannot send after transport endpoint shutdown

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects version: Lustre 2.10.6
    • Environment: CentOS 7.4
    • Severity: 3

    Description

      When running several rm commands in parallel to delete files, we get the following error in the shell:

      /bin/rm: cannot remove '</some/file/path>': Cannot send after transport endpoint shutdown

      These coincide with the following errors in /var/log/messages:

       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 11-0: foxtrot-MDT0000-mdc-ffff883ff6b12800: operation mds_close to node 10.21.22.10@tcp failed: rc = -107
       Dec 24 11:13:09 foxtrot2 kernel: Lustre: foxtrot-MDT0000-mdc-ffff883ff6b12800: Connection to foxtrot-MDT0000 (at 10.21.22.10@tcp) was lost; in progress operations using this service will wait for recovery to complete
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 167-0: foxtrot-MDT0000-mdc-ffff883ff6b12800: This client was evicted by foxtrot-MDT0000; in progress operations using this service will fail.
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 3598:0:(mdc_locks.c:1211:mdc_intent_getattr_async_interpret()) ldlm_cli_enqueue_fini: -5
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 3598:0:(mdc_locks.c:1211:mdc_intent_getattr_async_interpret()) Skipped 37 previous similar messages
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: Skipped 50 previous similar messages
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 39322:0:(llite_lib.c:1512:ll_md_setattr()) md_setattr fails: rc = -5
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 38248:0:(file.c:172:ll_close_inode_openhandle()) foxtrot-clilmv-ffff883ff6b12800: inode [0x200030875:0x5d11:0x0] mdc close failed: rc = -107
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 38248:0:(file.c:172:ll_close_inode_openhandle()) Skipped 743 previous similar messages
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 41760:0:(vvp_io.c:1474:vvp_io_init()) foxtrot: refresh file layout [0x2000302ba:0x103db:0x0] error -108.
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 41760:0:(vvp_io.c:1474:vvp_io_init()) Skipped 310070 previous similar messages
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 44300:0:(mdc_request.c:1329:mdc_read_page()) foxtrot-MDT0000-mdc-ffff883ff6b12800: [0x20002cfcf:0x5a20:0x0] lock enqueue fails: rc = -108
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 39322:0:(llite_lib.c:1512:ll_md_setattr()) Skipped 5 previous similar messages
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 12816:0:(vvp_io.c:1474:vvp_io_init()) foxtrot: refresh file layout [0x200030766:0x18539:0x0] error -108.
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 39252:0:(vvp_io.c:1474:vvp_io_init()) foxtrot: refresh file layout [0x2000302ba:0x10403:0x0] error -108.
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 39252:0:(vvp_io.c:1474:vvp_io_init()) Skipped 143616 previous similar messages
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 44302:0:(file.c:172:ll_close_inode_openhandle()) foxtrot-clilmv-ffff883ff6b12800: inode [0x20000070c:0x2ea9:0x0] mdc close failed: rc = -108
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 44302:0:(file.c:172:ll_close_inode_openhandle()) Skipped 815 previous similar messages
       Dec 24 11:13:09 foxtrot2 kernel: LustreError: 12816:0:(vvp_io.c:1474:vvp_io_init()) Skipped 2986 previous similar messages
       Dec 24 11:13:10 foxtrot2 kernel: Lustre: foxtrot-MDT0000-mdc-ffff883ff6b12800: Connection restored to 10.21.22.10@tcp (at 10.21.22.10@tcp)
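      The rc values in these messages are negated Linux errno codes: -107 is ENOTCONN ("Transport endpoint is not connected"), -108 is ESHUTDOWN, whose message text is exactly what rm printed, and -5 is EIO. A small helper along the following lines can decode them from a log line. It is a sketch assuming Python 3.9+ on Linux (errno numbering differs on other platforms); decode_rc, decode_line and the regular expression are hypothetical names for illustration, not part of any Lustre tool.

```python
import errno
import os
import re

# Matches the two return-code forms used in the log: "rc = -107" and "error -108".
# (It deliberately ignores other forms such as "ldlm_cli_enqueue_fini: -5".)
RC_PATTERN = re.compile(r"(?:rc = |error )(-\d+)")


def decode_rc(rc: int) -> str:
    """Map a negative kernel return code to its errno name and message."""
    code = -rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc} = -{name}: {os.strerror(code)}"


def decode_line(line: str) -> list[str]:
    """Decode every matching return code found in one console log line."""
    return [decode_rc(int(value)) for value in RC_PATTERN.findall(line)]


if __name__ == "__main__":
    sample = ("LustreError: 11-0: foxtrot-MDT0000-mdc-ffff883ff6b12800: "
              "operation mds_close to node 10.21.22.10@tcp failed: rc = -107")
    for decoded in decode_line(sample):
        print(decoded)
```

      Seen this way, the log shows the client first losing its MDT connection (ENOTCONN), then being evicted, after which pending operations fail with ESHUTDOWN until the connection is restored.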
      

          Activity

            pjones Peter Jones added a comment -

            Ah good - thanks for confirming! We have included this fix in 2.10.7, but I am loath to include fixes that we don't know serve a purpose.


            cmcl Campbell Mcleay (Inactive) added a comment -

            Hi Peter,

            My apologies, I missed your last message. We did extensive testing with parallel deletes and saw no 'transport endpoint shutdown' messages. We're still getting evictions from OSTs, but that is a separate issue. So I think we can consider the patch a success in fixing the original issue.

            Kind regards,

            Campbell
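            Campbell's confirmation rests on repeated parallel deletes. A hypothetical stress test along those lines might look like the sketch below, assuming Python 3 on a Linux client with the Lustre filesystem mounted. The function names, worker count and target directory are illustrative, and a clean run does not prove the eviction cannot recur. Threads stand in for separate rm processes because os.unlink releases the GIL, so the unlink calls still reach the client concurrently.

```python
import errno
import os
import sys
from concurrent.futures import ThreadPoolExecutor


def remove_files(paths):
    """Unlink each path; return (removed, shutdown_errors, other_errors)."""
    removed = shutdown = other = 0
    for path in paths:
        try:
            os.unlink(path)
            removed += 1
        except OSError as exc:
            # ESHUTDOWN is the errno behind rm's
            # "Cannot send after transport endpoint shutdown" message.
            if exc.errno == errno.ESHUTDOWN:
                shutdown += 1
            else:
                other += 1
    return removed, shutdown, other


def parallel_delete(directory, workers=8):
    """Split the regular files in `directory` across `workers` threads and unlink them."""
    with os.scandir(directory) as entries:
        paths = [e.path for e in entries if e.is_file(follow_symlinks=False)]
    # Round-robin split so each worker gets a similar share of the files.
    chunks = [paths[i::workers] for i in range(workers)]
    totals = (0, 0, 0)
    # os.unlink releases the GIL, so the threads issue unlink() calls concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(remove_files, chunks):
            totals = tuple(t + r for t, r in zip(totals, result))
    return totals


if __name__ == "__main__":
    removed, shutdown, other = parallel_delete(sys.argv[1])
    print(f"removed={removed} ESHUTDOWN={shutdown} other_errors={other}")
```

            Run it as python3 parallel_delete.py /path/to/lustre/dir against a scratch directory of disposable files (both names hypothetical); a non-zero ESHUTDOWN count corresponds to the failures reported in the description.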
            pjones Peter Jones added a comment -

            Disappointing to not hear explicit feedback on the effectiveness of the patch but I suppose no news is good news...

            pjones Peter Jones added a comment -

            How are things shaping up with the patch, cmcl? OK to consider this ticket closed?

            green Oleg Drokin added a comment -

            Yes, llite_lloop.ko is not really used nowadays so you should not worry too much about it. Please let me know how it goes, also if the problems persist, please collect the logs like before.


            cmcl Campbell Mcleay (Inactive) added a comment - edited

            Hi Oleg,

            I have built the server packages from the source tree you linked. Upon installation, I got a lot of warnings about llite_lloop.ko needing various unknown symbols, but I read somewhere that this module is obsolete. Do I need to worry about this?

            I will start some deletes and see how it goes.

            Regards,

            Campbell
            green Oleg Drokin added a comment -

            The ported patch is here: https://review.whamcloud.com/#/c/34131/

            I had hoped it would be done testing by now, but apparently our test systems are running slowly, so results are taking a while to become available.


            cmcl Campbell Mcleay (Inactive) added a comment -

            Hi Oleg,

            Thanks, we'll try the patch, so let's go ahead with that.

            Kind regards,

            Campbell

            People

              Assignee: green Oleg Drokin
              Reporter: cmcl Campbell Mcleay (Inactive)
              Votes: 0
              Watchers: 4
