Lustre / LU-2380

Hang and eviction scenario when multiple tasks/nodes do ftruncate() on the same file in parallel


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.3
    • Labels: None
    • Severity: 3
    • Rank: 5649

    Description

      We have a scenario where some of the Lustre clients hang and are subsequently evicted, causing frequent job failures.

      This issue arises when ftruncate() is called on the same file by several tasks/nodes in parallel. Even though this is clearly not good practice, the pattern is heavily used by a middle-tier layer at the customer
      site. As a consequence, the customer suffers frequent job failures.

      I think Lustre should be more resilient to this kind of issue.

      This bug has been reproduced on Lustre 2.1.3 with 36 nodes running 576 MPI tasks. In this configuration the test failed 50% of the time, but the problem might also be reproducible with fewer nodes/tasks.

      This is the reproducer:

      % cat truncate.c

      #include <mpi.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <fcntl.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <unistd.h>
      
      
      int main(int argc, char **argv)
      {
              int npes, myrank, i;
              int *buf;
              struct stat st;
              int fd;

              MPI_Init(&argc, &argv);
              MPI_Comm_size(MPI_COMM_WORLD, &npes);
              MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

              printf(" My rank is %d of %d \n", myrank, npes);

              /* Scratch buffer; only the first 10000 ints are initialized,
               * the remainder is don't-care data for the reproducer. */
              buf = (int *) malloc(sizeof(int) * 10000 * npes);
              for (i = 0; i < 10000; i++)
                      buf[i] = myrank * i;

              /* All ranks repeatedly truncate and rewrite the same file. */
              for (i = 0; i < 10; i++) {
                      if ((fd = open("/ptmp/user/mytrunk", O_CREAT|O_RDWR, 0666)) < 0) {
                              perror("open()");
                              exit(-1);
                      }
                      if (fstat(fd, &st) < 0) {
                              perror("fstat()");
                              exit(-1);
                      }
                      if (lseek(fd, 0, SEEK_SET) < 0) {
                              perror("lseek(SEEK_SET)");
                              exit(-1);
                      }
                      if (lseek(fd, 0, SEEK_CUR) < 0) {
                              perror("lseek(SEEK_CUR)");
                              exit(-1);
                      }
                      if (ftruncate(fd, 0) < 0) {
                              perror("ftruncate()");
                              exit(-1);
                      }
                      if (write(fd, buf, 10000 * npes) < 0) {
                              perror("write()");
                              exit(-1);
                      }
                      close(fd);
              }

              free(buf);
              MPI_Finalize();
              return 0;
      }
      

      And this is what happened:

      % mpicc truncate.c -o truncate
      % srun -n 576 -N 36 --exclusive --resv-ports ./truncate
      srun: job 269744 queued and waiting for resources
      srun: job 269744 has been allocated resources
      <<<<<< Job/run stuck for minutes !!... >>>>>>>
      fstat(): Interrupted system call
      fstat(): Interrupted system call
      fstat(): Interrupted system call
      fstat(): Interrupted system call
      ftruncate(): Interrupted system call
       My rank is 10 of 16
       My rank is 11 of 16
       My rank is 13 of 16
       My rank is 14 of 16
       My rank is 15 of 16
      fstat(): Interrupted system call
       My rank is 8 of 16
      ftruncate(): Cannot send after transport endpoint shutdown
       My rank is 9 of 16
      srun: error: lascaux2825: tasks 8-11,13-15: Exited with exit code 255
      srun: Terminating job step 269744.0
      slurmd[lascaux2825]: *** STEP 269744.0 KILLED AT 2012-06-12T16:09:25 WITH SIGNAL 9 ***
      slurmd[lascaux2824]: *** STEP 269744.0 KILLED AT 2012-06-12T16:09:25 WITH SIGNAL 9 ***
      srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
      slurmd[lascaux2825]: *** STEP 269744.0 KILLED AT 2012-06-12T16:09:25 WITH SIGNAL 9 ***
      slurmd[lascaux2824]: *** STEP 269744.0 KILLED AT 2012-06-12T16:09:25 WITH SIGNAL 9 ***
      

      One of the two client nodes reported the following messages:
      =============================================================

      1336119249 2012 May  4 10:14:09 cartan1174 kern warning kernel Lustre: ptmp-OST001a-osc-ffff880464a98000: Connection
      to service ptmp-OST001a via nid JO.BOO.ZO.LF@o2ib3 was lost; in progress operations using this service will wait for
      recovery to complete.
      1336119249 2012 May  4 10:14:09 cartan1174 kern err kernel LustreError: 45831:0:(ldlm_request.c:1152:ldlm_cli_cancel_req())
      Got rc -107 from cancel RPC: canceling anyway
      1336119249 2012 May  4 10:14:09 cartan1174 kern err kernel LustreError: 45831:0:(ldlm_request.c:1714:ldlm_cli_cancel_list())
      ldlm_cli_cancel_list: -107
      1336119249 2012 May  4 10:14:09 cartan1174 kern err kernel LustreError: 167-0: This client was evicted by ptmp-OST001a;
      in progress operations using this service will fail.
      1336119249 2012 May  4 10:14:09 cartan1174 kern err kernel LustreError: 16559:0:(client.c:1057:ptlrpc_import_delay_req())
      @@@ IMP_INVALID  req@ffff880375166000 x1398228708669187/t0(0) o101->ptmp-OST001a_UUID@JO.BOO.ZO.LF@o2ib3:28/4 lens
      296/352 e 0 to 0 dl 0 ref 1 fl Rpc:/ffffffff/ffffffff rc 0/-1
      1336119249 2012 May  4 10:14:09 cartan1174 kern err kernel LustreError: 16559:0:(client.c:1057:ptlrpc_import_delay_req())
      Skipped 31 previous similar messages
      1336119249 2012 May  4 10:14:09 cartan1174 kern err kernel LustreError: 16559:0:(client.c:1057:ptlrpc_import_delay_req())
      @@@ IMP_INVALID  req@ffff880d6719c800 x1398228708669192/t0(0) o101->ptmp-OST001a_UUID@JO.BOO.ZO.LF@o2ib3:28/4 lens
      296/352 e 0 to 0 dl 0 ref 1 fl Rpc:/ffffffff/ffffffff rc 0/-1
      1336119249 2012 May  4 10:14:09 cartan1174 kern info kernel Lustre: ptmp-OST001a-osc-ffff880464a98000: Connection restored
      to service ptmp-OST001a using nid JO.BOO.ZO.LF@o2ib3.
      

      =============================================================

      The OSS serving one of the OSTs hosting the file also reported the following messages/errors on its side:

      ====================================================

      1336119248 2012 May  4 10:14:08 cartan204 kern err kernel LustreError: 0:0:(ldlm_lockd.c:345:waiting_locks_callback())
      ### lock callback timer expired after 100s: evicting client at JO.BOO.ZL.BFP@o2ib3  ns: filter-ptmp-OST001a_UUID lock:
      ffff880d1d021b40/0x2f7b35259ae3e490 lrc: 3/0,0 mode: PW/PW res: 20974735/0 rrc: 2 type: EXT [0->18446744073709551615]
      (req 0->18446744073709551615) flags: 0x80010020 remote: 0x307f20f32f0be585 expref: 700 pid: 28661 timeout 4562700589
      1336119249 2012 May  4 10:14:09 cartan204 kern err kernel LustreError: 4578:0:(ldlm_lockd.c:1970:ldlm_cancel_handler())
      operation 103 from 12345-JO.BOO.ZL.BFP@o2ib3 with bad export cookie 3421386777158859540
      1336119249 2012 May  4 10:14:09 cartan204 kern warning kernel Lustre: 28674:0:(ldlm_lib.c:866:target_handle_connect())
      ptmp-OST001a: connection from 700d60eb-8a82-cf9e-f029-a5868449ec78@JO.BOO.ZL.BFP@o2ib3 t0 exp (null) cur 1336119249
      last 0
      1336119249 2012 May  4 10:14:09 cartan204 kern warning kernel Lustre: 28674:0:(filter.c:2813:filter_connect()) ptmp-OST001a:
      Received MDS connection (0x2f7b35259ae430da); group 0
      1336119249 2012 May  4 10:14:09 cartan204 kern warning kernel Lustre: 28674:0:(filter.c:2813:filter_connect()) Skipped
      5 previous similar messages
      

      ====================================================
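
      For quick experiments without an MPI launcher, the same open()/fstat()/ftruncate()/write() sequence can be driven from a single node with fork(). This is a hypothetical sketch, not part of the original reproducer: the path /tmp/mytrunk and the NPROCS/ITERS counts are placeholders, and the file must live on a Lustre mount (ideally hammered from several clients at once) to recreate the lock contention described above.

      ```c
      #include <stdio.h>
      #include <stdlib.h>
      #include <fcntl.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <sys/wait.h>
      #include <unistd.h>

      #define NPROCS 8   /* placeholder: number of concurrent workers */
      #define ITERS  10  /* placeholder: truncate/write rounds per worker */

      int main(void)
      {
              const char *path = "/tmp/mytrunk"; /* point at a Lustre file to reproduce */
              int buf[10000];
              int p, i, status, failed = 0;

              for (i = 0; i < 10000; i++)
                      buf[i] = i;

              /* Fork NPROCS workers that all truncate and rewrite the same file,
               * mirroring the per-rank loop of the MPI reproducer. */
              for (p = 0; p < NPROCS; p++) {
                      pid_t pid = fork();
                      if (pid < 0) {
                              perror("fork()");
                              exit(1);
                      }
                      if (pid == 0) {
                              for (i = 0; i < ITERS; i++) {
                                      int fd = open(path, O_CREAT | O_RDWR, 0666);
                                      struct stat st;
                                      if (fd < 0) { perror("open()"); _exit(1); }
                                      if (fstat(fd, &st) < 0) { perror("fstat()"); _exit(1); }
                                      if (ftruncate(fd, 0) < 0) { perror("ftruncate()"); _exit(1); }
                                      if (write(fd, buf, sizeof(buf)) < 0) { perror("write()"); _exit(1); }
                                      close(fd);
                              }
                              _exit(0);
                      }
              }

              /* Collect workers and count failures. */
              for (p = 0; p < NPROCS; p++) {
                      wait(&status);
                      if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                              failed++;
              }
              printf("%d of %d workers failed\n", failed, NPROCS);
              return failed ? 1 : 0;
      }
      ```

      On a local filesystem this simply runs to completion; the point is only to exercise the concurrent open/fstat/ftruncate/write pattern on one shared file.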


            People

              Assignee: Lai Siyao (laisiyao)
              Reporter: Diego Moreno (Inactive) (dmoreno)
              Votes: 0
              Watchers: 9
