Lustre / LU-639

replay-dual test_0b: @@@@@@ FAIL: mount1 fails


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.2.0
    • Affects Version/s: None
    • Components: None
    • Severity: 3
    • Rank: 4779

    Description

      == replay-dual test 0b: lost client during waiting for next transno ================================== 11:11:19 (1314285079)
      Filesystem 1K-blocks Used Available Use% Mounted on
      10.37.248.61@o2ib1:/lustre
      22047088 922544 20002752 5% /lustre/barry
      Failing mds1 on node barry-mds1
      Stopping /tmp/mds1 (opts:) on barry-mds1
      affected facets: mds1
      Failover mds1 to barry-mds1
      11:11:34 (1314285094) waiting for barry-mds1 network 900 secs ...
      11:11:34 (1314285094) network interface is UP
      Starting mds1: -o user_xattr,acl /dev/md5 /tmp/mds1
      Started lustre-MDT0000
      Starting client: spoon01: -o user_xattr,acl,flock 10.37.248.61@o2ib1:/lustre /lustre/barry
      mount.lustre: mount 10.37.248.61@o2ib1:/lustre at /lustre/barry failed: File exists
      replay-dual test_0b: @@@@@@ FAIL: mount1 fais

      Client dmesg
      Lustre: DEBUG MARKER: == replay-dual test 0b: lost client during waiting for next transno ================================== 11:11:19 (1314285079)
      Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
      LustreError: 31491:0:(ldlm_request.c:1172:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
      LustreError: 31491:0:(ldlm_request.c:1799:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
      Lustre: client ffff810173cb2400 umount complete
      Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
      Lustre: setting import lustre-OST0000_UUID INACTIVE by administrator request
      LustreError: 31613:0:(ldlm_request.c:1172:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
      LustreError: 31613:0:(ldlm_request.c:1799:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
      Lustre: client ffff81017c9d6800 umount complete
      LustreError: 31623:0:(genops.c:304:class_newdev()) Device MGC10.37.248.61@o2ib1 already exists, won't add
      LustreError: 31623:0:(obd_config.c:327:class_attach()) Cannot create device MGC10.37.248.61@o2ib1 of type mgc : -17
      LustreError: 31623:0:(obd_mount.c:512:lustre_start_simple()) MGC10.37.248.61@o2ib1 attach error -17
      LustreError: 31623:0:(obd_mount.c:2160:lustre_fill_super()) Unable to mount (-17)
      Lustre: DEBUG MARKER: replay-dual test_0b: @@@@@@ FAIL: mount1 fais
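
      The -17 in the client dmesg above is -EEXIST: class_newdev() reports that an MGC10.37.248.61@o2ib1 device is already registered, so class_attach() cannot create a new one and lustre_fill_super() aborts the remount. A quick way to check for such a leftover MGC device (a sketch only, assuming lctl is available on the client):

      # List the OBD devices still registered on the client; a lingering
      # "mgc MGC10.37.248.61@o2ib1" entry is what makes the next
      # class_attach() return -17 (-EEXIST).
      lctl dl | grep -i mgc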

      MDS dmesg

      Lustre: DEBUG MARKER: == replay-dual test 0b: lost client during waiting for next transno ================================== 11:11:19 (1314285079)
      LustreError: 10361:0:(osd_handler.c:938:osd_ro()) *** setting device osd-ldiskfs read-only ***
      Turning device md5 (0x900005) read-only
      Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
      Lustre: Failing over lustre-MDT0000
      Lustre: 10460:0:(quota_master.c:793:close_quota_files()) quota[0] is off already
      Lustre: 10460:0:(quota_master.c:793:close_quota_files()) Skipped 1 previous similar message
      Lustre: Failing over mdd_obd-lustre-MDT0000
      Lustre: mdd_obd-lustre-MDT0000: shutting down for failover; client state will be preserved.
      Removing read-only on unknown block (0x900005)
      Lustre: server umount lustre-MDT0000 complete
      LDISKFS-fs (md5): recovery complete
      LDISKFS-fs (md5): mounted filesystem with ordered data mode
      JBD: barrier-based sync failed on md5-8 - disabling barriers
      LDISKFS-fs (md5): mounted filesystem with ordered data mode
      Lustre: Enabling ACL
      Lustre: Enabling user_xattr
      Lustre: lustre-MDT0000: used disk, loading
      Lustre: 10592:0:(ldlm_lib.c:1903:target_recovery_init()) RECOVERY: service lustre-MDT0000, 66 recoverable clients, last_transno 4294967297
      LustreError: 10599:0:(ldlm_lib.c:1740:target_recovery_thread()) lustre-MDT0000: started recovery thread pid 10599
      LustreError: 10601:0:(mdt_handler.c:2785:mdt_recovery()) operation 400 on unconnected MDS from 12345-10.37.248.45@o2ib1
      LustreError: 10601:0:(ldlm_lib.c:2128:target_send_reply_msg()) @@@ processing error (107) req@ffff81040ca1c400 x1378037410570255/t0(0) o-1><?>@<?>:0/0 lens 192/0 e 0 to 0 dl 1314285137 ref 1 fl Interpret:H/ffffffff/ffffffff rc -107/-1
      LustreError: 10601:0:(ldlm_lib.c:2128:target_send_reply_msg()) Skipped 1 previous similar message
      LustreError: 137-5: UUID 'lustre-MDT0000_UUID' is not available for connect (not set up)
      Lustre: 10592:0:(mdt_lproc.c:257:lprocfs_wr_identity_upcall()) lustre-MDT0000: identity upcall set to /usr/sbin/l_getidentity
      Lustre: 10592:0:(mds_lov.c:1004:mds_notify()) MDS mdd_obd-lustre-MDT0000: add target lustre-OST0000_UUID
      Lustre: 10592:0:(mds_lov.c:1004:mds_notify()) Skipped 4 previous similar messages
      JBD: barrier-based sync failed on md5-8 - disabling barriers
      Lustre: 5799:0:(mds_lov.c:1024:mds_notify()) MDS mdd_obd-lustre-MDT0000: in recovery, not resetting orphans on lustre-OST0000_UUID
      Lustre: 5799:0:(mds_lov.c:1024:mds_notify()) MDS mdd_obd-lustre-MDT0000: in recovery, not resetting orphans on lustre-OST0001_UUID
      LustreError: 10601:0:(mdt_handler.c:2785:mdt_recovery()) operation 400 on unconnected MDS from 12345-10.37.248.44@o2ib1
      Lustre: lustre-MDT0000: temporarily refusing client connection from 10.37.248.44@o2ib1
      Lustre: Skipped 1 previous similar message
      LNet: 10801:0:(debug.c:326:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
      Lustre: DEBUG MARKER: replay-dual test_0b: @@@@@@ FAIL: mount1 fais
      LustreError: 10601:0:(mdt_handler.c:2785:mdt_recovery()) operation 400 on unconnected MDS from 12345-10.37.248.4@o2ib1
      Lustre: 10601:0:(ldlm_lib.c:2029:target_queue_recovery_request()) Next recovery transno: 4294967298, current: 4294967306, replaying
      Lustre: 10601:0:(ldlm_lib.c:2029:target_queue_recovery_request()) Next recovery transno: 4294967298, current: 4294967303, replaying
      LustreError: 10606:0:(mdt_handler.c:2785:mdt_recovery()) operation 400 on unconnected MDS from 12345-10.37.248.16@o2ib1
      LustreError: 10606:0:(mdt_handler.c:2785:mdt_recovery()) Skipped 58 previous similar messages


      Info required for matching: replay-dual test_0b 0b
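
      For reference, the failing case can be rerun on its own with the standard lustre/tests selector (a sketch, assuming the usual lustre-tests install location; adjust the path for the local setup):

      # Run only test 0b of replay-dual.sh; ONLY= is the standard
      # test selector used by the Lustre test framework.
      cd /usr/lib64/lustre/tests        # install path may differ
      ONLY=0b sh replay-dual.sh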

    People

      Assignee: Zhenyu Xu (bobijam)
      Reporter: James A Simmons (simmonsja)
      Votes: 0
      Watchers: 2
