Lustre / LU-65

Interop testing results for 1.8.5.54 clients with Lustre 2.0.59 servers

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.1.0, Lustre 1.8.6
    • Labels: None
    • Environment: 4 OSS, each with 7 OSTs; one MDS with a disk devoted to the MDT and another disk devoted to the MGS; all running Lustre 2.0.59. I have 4 clients running Lustre 1.8.5.54.

    Description

      These are the results of running the b1_8 acc_sm test. I also listed the results at https://bugzilla.lustre.org/show_bug.cgi?id=21367.

      sanity 64b - bug 22703
      sanity 72b - bug 24226
      replay-single 65a - bug 19960
      sanity-quota - totally broken. Does not work at all. Locks up clients

      obdfilter-survey - locks the client up. No debug output
      config-sanity 55,56,57 - no bug report yet. Seeing the following error

      LustreError: 13996:0:(mdt_handler.c:4521:mdt_init0()) CMD Operation not allowed in IOP mode
      LustreError: 13996:0:(obd_config.c:495:class_setup()) setup lustre-MDT0001 failed (-22)
      LustreError: 13996:0:(obd_config.c:1338:class_config_llog_handler()) Err -22 on cfg command:
      Lustre: cmd=cf003 0:lustre-MDT0001 1:lustre-MDT0001_UUID 2:1 3:lustre-MDT0001-mdtlov 4:f
      LustreError: 15b-f: MGC10.36.230.2@o2ib: The configuration from log 'lustre-MDT0001' failed from the
      MGS (-22). Make sure this client and the MGS are running compatible versions of Lustre.
      LustreError: 15c-8: MGC10.36.230.2@o2ib: The configuration from log 'lustre-MDT0001' failed (-22).
      This may be the result of communication errors between this node and the MGS, a bad configuration,
      or other errors. See the syslog for more information.

      I will provide more info and logs very soon.

      Attachments

        Activity

          pjones Peter Jones added a comment -

          Believed resolved. ORNL will reopen or open a new ticket if their reproducer still has issues


          brian Brian Murrell (Inactive) added a comment -

          FWIW, your ubuntu review builds are failing due to LU-92. I've just submitted http://review.whamcloud.com/#change,356 to test a patch to fix this. If it passes its review testing, you could try to import that patch into your branch, placing it before your patch, to see if it resolves your ubuntu build issue.

          hudson Build Master (Inactive) added a comment -

          Integrated in reviews-centos5 #541
          LU-65 ORNL Lustre 2.X testing

          James Simmons : 43d727e089f1a1cf237da4251dc2aa661de05a0b
          Files:

          • lustre/tests/conf-sanity.sh
          • libcfs/libcfs/darwin/darwin-proc.c
          • lustre/obdclass/class_obd.c
          • lustre/include/obd_support.h
          • libcfs/include/libcfs/Makefile.am
          • lustre/obdclass/darwin/darwin-sysctl.c
          • lustre/lvfs/lvfs_lib.c
          • lustre/obdfilter/filter.c
          • lustre/obdclass/linux/linux-sysctl.c
          • libcfs/libcfs/module.c
          • lustre/mdt/mdt_internal.h
          • libcfs/libcfs/Makefile.in
          • lustre/liblustre/tests/recovery_small.c
          • libcfs/libcfs/linux/linux-proc.c
          • libcfs/include/libcfs/libcfs.h
          • lustre/include/darwin/obd_support.h
          • lustre/include/linux/obd_support.h
          • lustre/tests/sanity-gss.sh
          • libcfs/libcfs/autoMakefile.am
          • libcfs/libcfs/fail.c
          • libcfs/include/libcfs/libcfs_fail.h

          yong.fan nasf (Inactive) added a comment -

          The patch at "http://review.whamcloud.com/#change,286" has been landed; I think you can test with the latest code.

          On the other hand, would you like to update your patch at "http://review.whamcloud.com/#change,284" to make it more compatible?

          Thanks

          simmonsja James A Simmons added a comment -

          Sorry, I haven't been able to test. The build system is broken:

          /data/buildsystem/jsimmons-head/rpmbuild/BUILD/lustre-2.0.59/lustre/lvfs/fsfilt-ldiskfs.c: In function 'fsfilt_ldiskfs_fid2dentry':
          /data/buildsystem/jsimmons-head/rpmbuild/BUILD/lustre-2.0.59/lustre/lvfs/fsfilt-ldiskfs.c:2352: error: implicit declaration of function 'exportfs_decode_fh'
          /data/buildsystem/jsimmons-head/rpmbuild/BUILD/lustre-2.0.59/lustre/lvfs/fsfilt-ldiskfs.c:2352: error: 'FILEID_INO32_GEN' undeclared (first use in this function)
          /data/buildsystem/jsimmons-head/rpmbuild/BUILD/lustre-2.0.59/lustre/lvfs/fsfilt-ldiskfs.c:2352: error: (Each undeclared identifier is reported only once
          /data/buildsystem/jsimmons-head/rpmbuild/BUILD/lustre-2.0.59/lustre/lvfs/fsfilt-ldiskfs.c:2352: error: for each function it appears in.)
          cc1: warnings being treated as errors
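          Editorial note: the two undeclared symbols in the build failure above both come from the kernel's exportfs interface, so a plausible fix (an assumption, not a resolution confirmed anywhere in this thread) would be adding the missing kernel header to fsfilt-ldiskfs.c:

          ```c
          /* In lustre/lvfs/fsfilt-ldiskfs.c -- a sketch, not a confirmed patch.
           * exportfs_decode_fh() and FILEID_INO32_GEN are declared in the
           * kernel's exportfs header; including it should resolve the
           * "implicit declaration" and "undeclared identifier" errors above.
           */
          #include <linux/exportfs.h>
          ```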

          yong.fan nasf (Inactive) added a comment -

          > For http://review.whamcloud.com/#change,286 the two sets of patches conflict. Is the second patch the only valid one?

          Yes, set 2 is the right one.

          simmonsja James A Simmons added a comment -

          For http://review.whamcloud.com/#change,286 the two sets of patches conflict. Is the second patch the only valid one?
          yong.fan nasf (Inactive) added a comment - - edited

          > config-sanity 55,56,57 - no bug report yet.

          According to the MDS-side log (lustre_conf-sanity_test_55.1297196706.gz), the system was not yet ready to accept the client (10.36.230.36@o2ib) connection.

          ==========
          00000004:00000001:2.0:1297196688.391043:0:19245:0:(mdt_handler.c:2528:mdt_req_handle()) Process entered
          00000004:00000001:2.0:1297196688.391045:0:19245:0:(mdt_handler.c:2482:mdt_unpack_req_pack_rep()) Process entered
          00000004:00000001:2.0:1297196688.391046:0:19245:0:(mdt_handler.c:2503:mdt_unpack_req_pack_rep()) Process leaving (rc=0 : 0 : 0)
          00010000:00000001:2.0:1297196688.391049:0:19245:0:(ldlm_lib.c:667:target_handle_connect()) Process entered
          00010000:02000400:2.0:1297196688.391055:0:19245:0:(ldlm_lib.c:694:target_handle_connect()) lustre-MDT0000: temporarily refusing client connection from 10.36.230.36@o2ib
          00010000:00000001:2.0:1297196688.406588:0:19245:0:(ldlm_lib.c:695:target_handle_connect()) Process leaving via out (rc=18446744073709551605 : -11 : 0xfffffffffffffff5)
          00010000:00000001:2.0:1297196688.406591:0:19245:0:(ldlm_lib.c:1082:target_handle_connect()) Process leaving (rc=18446744073709551605 : -11 : fffffffffffffff5)
          ==========

          That means MDS returned "EAGAIN" to client to tell it retry later, which is normal case. But from the log, I can not find any other communication between client and MDS after that, until MDS reported test_55 failure.

          00000001:00000001:5.0:1297196705.949788:0:19367:0:(debug.c:439:libcfs_debug_mark_buffer()) ***************************************************
          00000001:02000400:5.0:1297196705.949789:0:19367:0:(debug.c:440:libcfs_debug_mark_buffer()) DEBUG MARKER: conf-sanity test_55: @@@@@@ FAIL: client start failed
          00000001:00000001:5.0:1297196705.963380:0:19367:0:(debug.c:441:libcfs_debug_mark_buffer()) ***************************************************
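          As an aside, the odd-looking return code in the connect trace above (rc=18446744073709551605 : -11 : 0xfffffffffffffff5) is just -EAGAIN printed as an unsigned 64-bit value. A minimal C check (assuming an LP64 Linux host, where EAGAIN is 11):

          ```c
          #include <assert.h>
          #include <errno.h>   /* EAGAIN == 11 on Linux */
          #include <stdio.h>

          int main(void)
          {
              long rc = -EAGAIN;  /* -11, as returned by target_handle_connect() */

              /* -11 reinterpreted as unsigned 64-bit wraps to
               * 18446744073709551605 (0xfffffffffffffff5), which is
               * exactly the triple the debug trace prints. */
              printf("rc=%lu : %ld : %#lx\n",
                     (unsigned long)rc, rc, (unsigned long)rc);
              assert((unsigned long)rc == 18446744073709551605UL);
              return 0;
          }
          ```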

          I need the client-side log to investigate what happened on the client after the MDS returned "EAGAIN".
          James, are there any logs for that? It seems the conf-sanity test_55 failure is not easy to reproduce.

          Thanks!

          yong.fan nasf (Inactive) added a comment - - edited

          > sanity-quota - totally broken. Does not work at all. Locks up clients

          Are there any logs related to the sanity-quota interoperability test that caused the client lockup? The sanity-quota interoperability test passed in my local environment. I have also checked bugzilla and found a recent test result:

          https://bugzilla.lustre.org/show_bug.cgi?id=24207#c4

          That means sanity-quota interoperability works in a TCP environment but fails over IB because of bug 24055, and the related patch for bug 24055 has been landed.

          So would you please check whether that patch was applied in your test. On the other hand, I think bug 24055's patch is not enough; you need the above patch for bug 22703 as well.

          Thanks!


          yong.fan nasf (Inactive) added a comment -

          > sanity 64b - bug 22703

          I have made a patch for it:
          http://review.whamcloud.com/#change,286 (for master)
          http://review.whamcloud.com/#change,287 (for b1_8)

          People

            Assignee: yong.fan nasf (Inactive)
            Reporter: simmonsja James A Simmons
            Votes: 0
            Watchers: 2
