LU-394: LND failure caused by discontiguous KIOV pages

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.1.0
    • Lustre 2.1.0
    • None
    • 2
    • 4943

    Description

      Cray's gnilnd is running into a hole in the kiov list in Lustre 2.1:

      LustreError: 17837:0:(gnilnd_cb.c:594:kgnilnd_setup_phys_buffer()) Can't make payload
      contiguous in I/O VM:page 17, offset 0, nob 6350, kiov_offset 0 kiov_len 2254
      LustreError: 17837:0:(gnilnd_cb.c:1751:kgnilnd_send()) unable to setup buffer: -22

      It used to be that only the first and last pages in an IOV were allowed
      to have offset + length < PAGE_SIZE.
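
      The rule above can be sketched as a small check. This is only an illustration, not the gnilnd code: the `Kiov` type, the `first_hole` helper, and the 4096-byte page size are assumptions made for the example. The second layout is modelled on the logged values (kiov_offset 0, kiov_len 2254, nob 6350), reading nob as the bytes still to be sent.

```python
from typing import NamedTuple, Optional, Sequence

PAGE_SIZE = 4096  # assumption: 4 KiB pages


class Kiov(NamedTuple):
    """Simplified stand-in for one kiov fragment (hypothetical type)."""
    offset: int  # byte offset of the fragment within its page
    length: int  # number of payload bytes in the fragment


def first_hole(kiov: Sequence[Kiov]) -> Optional[int]:
    """Return the index of the first fragment that breaks contiguity, or None."""
    last = len(kiov) - 1
    for i, frag in enumerate(kiov):
        # Only the first fragment may start part-way into its page.
        if i > 0 and frag.offset != 0:
            return i
        # Only the last fragment may stop before the end of its page.
        if i < last and frag.offset + frag.length != PAGE_SIZE:
            return i
    return None


# Legal layout: partial first page, full middle page, partial last page.
ok = [Kiov(1024, 3072), Kiov(0, 4096), Kiov(0, 100)]
# Layout resembling the logged failure: a 2254-byte fragment at offset 0
# followed by more payload (6350 - 2254 = 4096 bytes, one further page).
bad = [Kiov(0, 2254), Kiov(0, 4096)]

print(first_hole(ok))   # None
print(first_hole(bad))  # 0
```

      Under these assumptions, a short fragment that is not the last one leaves a gap, so the payload cannot be mapped as one contiguous range.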

      The problem does not occur with a 1.8 client and a 2.1 server.

      It can be reproduced with "fsx-linux -WR -dn -N 10000 junkfile".

      In 2.1, osc_brw() is never called and the unfragmented-pages logic is not exercised.

      Attachments

        1. c38.dk.lu394
          6.02 MB
        2. nid0037.dk
          722 kB

        Activity

          hudson Build Master (Inactive) added a comment -

          Integrated in lustre-master » x86_64,server,el5,inkernel #228
          LU-394: LND failure casued by discontiguous KIOV

          Oleg Drokin : 419016ac3e53e798453106ec04412a4843620916
          Files :

          • lustre/obdclass/cl_lock.c
          • lustre/include/cl_object.h
          • lustre/tests/sanity.sh
          • lustre/osc/osc_io.c
          • lustre/obdclass/cl_page.c
          • lustre/osc/osc_lock.c
          • lustre/osc/osc_page.c
          • lustre/osc/osc_request.c
          • lustre/include/obd_support.h
          • lustre/osc/osc_object.c
          • lustre/llite/vvp_io.c

          hudson Build Master (Inactive) added a comment -

          Integrated in lustre-master » i686,client,el6,inkernel #228
          LU-394: LND failure casued by discontiguous KIOV

          Oleg Drokin : 419016ac3e53e798453106ec04412a4843620916
          Files :

          • lustre/llite/vvp_io.c
          • lustre/tests/sanity.sh
          • lustre/obdclass/cl_page.c
          • lustre/osc/osc_page.c
          • lustre/include/cl_object.h
          • lustre/osc/osc_object.c
          • lustre/obdclass/cl_lock.c
          • lustre/include/obd_support.h
          • lustre/osc/osc_request.c
          • lustre/osc/osc_lock.c
          • lustre/osc/osc_io.c

          hudson Build Master (Inactive) added a comment -

          Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #228
          LU-394: LND failure casued by discontiguous KIOV

          Oleg Drokin : 419016ac3e53e798453106ec04412a4843620916
          Files :

          • lustre/osc/osc_request.c
          • lustre/osc/osc_lock.c
          • lustre/osc/osc_page.c
          • lustre/llite/vvp_io.c
          • lustre/tests/sanity.sh
          • lustre/osc/osc_io.c
          • lustre/obdclass/cl_page.c
          • lustre/osc/osc_object.c
          • lustre/obdclass/cl_lock.c
          • lustre/include/cl_object.h
          • lustre/include/obd_support.h

          hudson Build Master (Inactive) added a comment -

          Integrated in lustre-master » x86_64,client,el6,inkernel #228
          LU-394: LND failure casued by discontiguous KIOV

          Oleg Drokin : 419016ac3e53e798453106ec04412a4843620916
          Files :

          • lustre/llite/vvp_io.c
          • lustre/osc/osc_lock.c
          • lustre/tests/sanity.sh
          • lustre/osc/osc_io.c
          • lustre/obdclass/cl_page.c
          • lustre/obdclass/cl_lock.c
          • lustre/osc/osc_page.c
          • lustre/include/obd_support.h
          • lustre/osc/osc_object.c
          • lustre/osc/osc_request.c
          • lustre/include/cl_object.h

          hudson Build Master (Inactive) added a comment -

          Integrated in lustre-master » x86_64,client,sles11,inkernel #228
          LU-394: LND failure casued by discontiguous KIOV

          Oleg Drokin : 419016ac3e53e798453106ec04412a4843620916
          Files :

          • lustre/osc/osc_lock.c
          • lustre/include/obd_support.h
          • lustre/obdclass/cl_page.c
          • lustre/osc/osc_page.c
          • lustre/osc/osc_request.c
          • lustre/llite/vvp_io.c
          • lustre/osc/osc_io.c
          • lustre/obdclass/cl_lock.c
          • lustre/include/cl_object.h
          • lustre/osc/osc_object.c
          • lustre/tests/sanity.sh

          hudson Build Master (Inactive) added a comment -

          Integrated in lustre-master » x86_64,client,el5,inkernel #228
          LU-394: LND failure casued by discontiguous KIOV

          Oleg Drokin : 419016ac3e53e798453106ec04412a4843620916
          Files :

          • lustre/obdclass/cl_lock.c
          • lustre/tests/sanity.sh
          • lustre/osc/osc_request.c
          • lustre/osc/osc_page.c
          • lustre/llite/vvp_io.c
          • lustre/include/cl_object.h
          • lustre/osc/osc_io.c
          • lustre/obdclass/cl_page.c
          • lustre/osc/osc_object.c
          • lustre/osc/osc_lock.c
          • lustre/include/obd_support.h
          pjones Peter Jones added a comment -

          Landed for 2.1

          hudson Build Master (Inactive) added a comment -

          Integrated in lustre-master » x86_64,client,el5,ofa #228
          LU-394: LND failure casued by discontiguous KIOV

          Oleg Drokin : 419016ac3e53e798453106ec04412a4843620916
          Files :

          • lustre/osc/osc_object.c
          • lustre/include/obd_support.h
          • lustre/osc/osc_lock.c
          • lustre/obdclass/cl_lock.c
          • lustre/osc/osc_request.c
          • lustre/tests/sanity.sh
          • lustre/llite/vvp_io.c
          • lustre/osc/osc_io.c
          • lustre/obdclass/cl_page.c
          • lustre/include/cl_object.h
          • lustre/osc/osc_page.c

          wang Wally Wang (Inactive) added a comment -

          Sorry, it actually works.

          I must have mis-labeled the image. After a total rebuild/install, it all works fine.

          jay Jinshan Xiong (Inactive) added a comment -

          From the log, it looks like the last page (index 49) was still added to the request. Are you sure you're using the correct patch? Can you please run the test case sanity:219 to check if it works?

          I'm going to provide you a new patch with more debug info.

          wang Wally Wang (Inactive) added a comment -

          The debug log is attached; the OST console shows:

          2011-07-18T15:15:28.004617-05:00 c0-0c1s5n2 LNetError: 15810:0:(gnilnd_cb.c:594:kgnilnd_setup_phys_buffer()) Can't make payload contiguous in I/O VM:page 17, offset 0, nob 6350, kiov_offset 0 kiov_len 2254
          2011-07-18T15:15:28.004650-05:00 c0-0c1s5n2 LNetError: 15810:0:(gnilnd_cb.c:1751:kgnilnd_send()) unable to setup buffer: -22

          People

            Assignee: jay Jinshan Xiong (Inactive)
            Reporter: wang Wally Wang (Inactive)
            Votes: 0
            Watchers: 8
