[LU-17597] interop: master/2.15/2.14/2.12 sanity test_56x: migrate failed rc = 22

Project: Lustre

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.16.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run:
      https://testing.whamcloud.com/test_sets/a8aa2796-5d0c-4650-a096-0e09a2c7a98e

      test_56x started failing on 2023-11-18 with the following error:

      migrate failed rc = 22
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master/4489 - 5.14.0-284.30.1.el9_2.x86_64
      servers: https://build.whamcloud.com/job/lustre-b2_12/164 - 3.10.0-1160.49.1.el7_lustre.x86_64

      Patch https://review.whamcloud.com/51126 "LU-13805 llite: Implement unaligned DIO connect flag" was landed on 2023-11-18 without any actual interop testing.

      I now see fairly regular sanity test_56x failures during interop testing, with -22 (-EINVAL) returned from "lfs migrate". The debug log suggests that this patch is the culprit, even though the servers are running ldiskfs:

       rw26.c:517:ll_direct_IO_impl()) VFS Op:inode=[0x2000013a3:0x31a:0x0](ffff9db7850f5a90), size=2868 (max 603979776), offset=0=0, pages 1 (max 147456), locked, parallel, unaligned
       rw26.c:547:ll_direct_IO_impl()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
       cl_io_start()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
       cl_io_loop()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
       ll_file_io_generic()) file1: 1 io complete with rc: -22, result: 0, restart: 0
       ll_file_io_generic()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
       ll_file_read_iter()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
      
       543         /* this means we encountered an old server which can't safely support
       544          * unaligned DIO, so we have to disable it
       545          */
       546         if (unaligned && !cl_io_top(io)->ci_allow_unaligned_dio)
       547                 RETURN(-EINVAL);
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_56x - migrate failed rc = 22

          Activity

            pjones Peter Jones added a comment -

            Believed to be a duplicate of LU-17525.

            adilger Andreas Dilger added a comment -

            It looks like this interop issue will be fixed via Shaun's patch https://review.whamcloud.com/53997 "LU-17525 llite: unaligned DIO iterop page alignment".

            adilger Andreas Dilger added a comment -

            Hi Patrick, could you please take a look at this?

            It appears the "LU-13805 llite: Implement unaligned DIO connect flag" patch has broken interop with ldiskfs servers rather than fixed it. I'm not entirely sure why, but it may be that "data->ocd_connect_flags2 |= OBD_CONNECT2_UNALIGNED_DIO" is set too late to have any effect on the client (e.g. it isn't saved in the OSC import that is checked later?).

            The testing pretty clearly shows that the issue started on 2023-11-18, when that patch landed.

            https://testing.whamcloud.com/search?status%5B%5D=FAIL&test_set_script_id=f9516376-32bc-11e0-aaee-52540025f9ae&sub_test_script_id=e1b4c5d2-90fc-11e2-8311-52540035b04c&start_date=2023-11-01&end_date=2024-01-01&source=sub_tests#redirect

            People

              Assignee: Shaun Tancheff (stancheff)
              Reporter: Maloo (maloo)
              Votes: 0
              Watchers: 6