
[LU-65] Interop testing results for 1.8.5.54 clients with Lustre 2.0.59 servers

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.1.0, Lustre 1.8.6
    • Labels: None
    • Environment: 4 OSSs, each with 7 OSTs; one MDS with one disk devoted to the MDT and another disk devoted to the MGS; all running Lustre 2.0.59. There are 4 clients running Lustre 1.8.5.54.

    Description

      These are the results of running the b1_8 acc_sm test. I also listed the results at https://bugzilla.lustre.org/show_bug.cgi?id=21367.

      sanity 64b - bug 22703
      sanity 72b - bug 24226
      replay-single 65a - bug 19960
      sanity-quota - totally broken. Does not work at all. Locks up clients

      obdfilter-survey - locks the client up. No debug output
      conf-sanity 55, 56, 57 - no bug report yet. Seeing the following errors:

      LustreError: 13996:0:(mdt_handler.c:4521:mdt_init0()) CMD Operation not allowed in IOP mode
      LustreError: 13996:0:(obd_config.c:495:class_setup()) setup lustre-MDT0001 failed (-22)
      LustreError: 13996:0:(obd_config.c:1338:class_config_llog_handler()) Err -22 on cfg command:
      Lustre: cmd=cf003 0:lustre-MDT0001 1:lustre-MDT0001_UUID 2:1 3:lustre-MDT0001-mdtlov 4:f
      LustreError: 15b-f: MGC10.36.230.2@o2ib: The configuration from log 'lustre-MDT0001' failed from the
      MGS (-22). Make sure this client and the MGS are running compatible versions of Lustre.
      LustreError: 15c-8: MGC10.36.230.2@o2ib: The configuration from log 'lustre-MDT0001' failed (-22).
      This may be the result of communication errors between this node and the MGS, a bad configuration,
      or other errors. See the syslog for more information.

      I will provide more info and logs very soon.

      Attachments

        Activity


          simmonsja James A Simmons added a comment -

          Sorry, I haven't been able to test. The build system is broken:

          /data/buildsystem/jsimmons-head/rpmbuild/BUILD/lustre-2.0.59/lustre/lvfs/fsfilt-ldiskfs.c: In function 'fsfilt_ldiskfs_fid2dentry':
          /data/buildsystem/jsimmons-head/rpmbuild/BUILD/lustre-2.0.59/lustre/lvfs/fsfilt-ldiskfs.c:2352: error: implicit declaration of function 'exportfs_decode_fh'
          /data/buildsystem/jsimmons-head/rpmbuild/BUILD/lustre-2.0.59/lustre/lvfs/fsfilt-ldiskfs.c:2352: error: 'FILEID_INO32_GEN' undeclared (first use in this function)
          /data/buildsystem/jsimmons-head/rpmbuild/BUILD/lustre-2.0.59/lustre/lvfs/fsfilt-ldiskfs.c:2352: error: (Each undeclared identifier is reported only once
          /data/buildsystem/jsimmons-head/rpmbuild/BUILD/lustre-2.0.59/lustre/lvfs/fsfilt-ldiskfs.c:2352: error: for each function it appears in.)
          cc1: warnings being treated as errors
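
          Both undeclared identifiers are declared in <linux/exportfs.h>, so one plausible workaround, sketched below under the assumption that the target kernel headers actually ship that header (the fix that eventually landed may instead be an autoconf probe), is to include it in fsfilt-ldiskfs.c:

          /* Sketch only, not the landed patch: exportfs_decode_fh() and
           * FILEID_INO32_GEN are both declared in <linux/exportfs.h> on kernels
           * that provide it. HAVE_LINUX_EXPORTFS_H is a hypothetical configure
           * guard, not an existing Lustre build macro. */
          #ifdef HAVE_LINUX_EXPORTFS_H
          # include <linux/exportfs.h>
          #endif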


          yong.fan nasf (Inactive) added a comment -

          > For http://review.whamcloud.com/#change,286 the two sets of patches conflict. Is the second patch the only valid one?

          Yes, set 2 is the right one.


          simmonsja James A Simmons added a comment -

          For http://review.whamcloud.com/#change,286 the two sets of patches conflict. Is the second patch the only valid one?

          yong.fan nasf (Inactive) added a comment - edited

          > conf-sanity 55, 56, 57 - no bug report yet.

          According to the MDS-side log (lustre_conf-sanity_test_55.1297196706.gz), the system was not yet ready to accept the connection from the client (10.36.230.36@o2ib).

          ==========
          00000004:00000001:2.0:1297196688.391043:0:19245:0:(mdt_handler.c:2528:mdt_req_handle()) Process entered
          00000004:00000001:2.0:1297196688.391045:0:19245:0:(mdt_handler.c:2482:mdt_unpack_req_pack_rep()) Process entered
          00000004:00000001:2.0:1297196688.391046:0:19245:0:(mdt_handler.c:2503:mdt_unpack_req_pack_rep()) Process leaving (rc=0 : 0 : 0)
          00010000:00000001:2.0:1297196688.391049:0:19245:0:(ldlm_lib.c:667:target_handle_connect()) Process entered
          00010000:02000400:2.0:1297196688.391055:0:19245:0:(ldlm_lib.c:694:target_handle_connect()) lustre-MDT0000: temporarily refusing client connection from 10.36.230.36@o2ib
          00010000:00000001:2.0:1297196688.406588:0:19245:0:(ldlm_lib.c:695:target_handle_connect()) Process leaving via out (rc=18446744073709551605 : -11 : 0xfffffffffffffff5)
          00010000:00000001:2.0:1297196688.406591:0:19245:0:(ldlm_lib.c:1082:target_handle_connect()) Process leaving (rc=18446744073709551605 : -11 : fffffffffffffff5)
          ==========
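
          The large rc printed in those two "Process leaving" lines is just the unsigned view of a negative errno; a minimal standalone check of that arithmetic:

          #include <errno.h>
          #include <stdio.h>

          int main(void)
          {
                  /* The trace prints rc as an unsigned 64-bit value; reinterpreting
                   * it as signed recovers the errno: 18446744073709551605 == 2^64 - 11. */
                  unsigned long long raw = 18446744073709551605ULL;  /* 0xfffffffffffffff5 */
                  long long rc = (long long)raw;
                  printf("rc = %lld, -EAGAIN = %d\n", rc, -EAGAIN);  /* both print -11 */
                  return 0;
          }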

          That means the MDS returned "EAGAIN" to the client to tell it to retry later, which is the normal case. But in the log I cannot find any further communication between the client and the MDS after that point, up until the MDS reported the test_55 failure.

          00000001:00000001:5.0:1297196705.949788:0:19367:0:(debug.c:439:libcfs_debug_mark_buffer()) ***************************************************
          00000001:02000400:5.0:1297196705.949789:0:19367:0:(debug.c:440:libcfs_debug_mark_buffer()) DEBUG MARKER: conf-sanity test_55: @@@@@@ FAIL: client start failed
          00000001:00000001:5.0:1297196705.963380:0:19367:0:(debug.c:441:libcfs_debug_mark_buffer()) ***************************************************

          I need the client-side log to investigate what happened on the client after the MDS returned "EAGAIN".
          James, are there any logs for that? It does not seem easy to reproduce the conf-sanity test_55 failure.

          Thanks!

          yong.fan nasf (Inactive) added a comment - edited

          > sanity-quota - totally broken. Does not work at all. Locks up clients

          Are there any logs from the sanity-quota interoperability run that locked up the client? The sanity-quota interoperability test passed in my local environment. I have also checked Bugzilla and found a recent test result:

          https://bugzilla.lustre.org/show_bug.cgi?id=24207#c4

          That means sanity-quota interoperability works in a TCP environment but fails in the IB case because of bug 24055, and the related patch for bug 24055 has been landed.

          So please check whether that patch was applied in your test. On the other hand, I think the bug 24055 patch alone is not enough; you also need the patch for bug 22703 above.

          Thanks!


          yong.fan nasf (Inactive) added a comment -

          > sanity 64b - bug 22703

          I have made a patch for it:
          http://review.whamcloud.com/#change,286 (for master)
          http://review.whamcloud.com/#change,287 (for b1_8)


          simmonsja James A Simmons added a comment -

          I believe I found the problem for replay-single 65a. Please look at patch http://review.whamcloud.com/#change,284


          simmonsja James A Simmons added a comment -

          Just tried it. The test also fails with 2.X clients against 2.X servers.


          yong.fan nasf (Inactive) added a comment -

          > replay-single 65a - bug 19960

          Sorry, I cannot reproduce this failure. Can you show me an easy way to reproduce it? I think it is a duplicate of bug 22560, which has been fixed on master and in lustre-1.8.5. Would you verify it again?

          pjones Peter Jones added a comment -

          James,

          Personally, I think it is easier to track issues when there is a 1:1 relationship between tickets and issues/fixes.

          Peter


          yong.fan nasf (Inactive) added a comment -

          I think you can create some sub-tasks under this one; then they will be easier to track.


          People

            Assignee: yong.fan nasf (Inactive)
            Reporter: simmonsja James A Simmons
            Votes: 0
            Watchers: 2
