  Lustre / LU-7675

replay-single test_101 times out after aborting recovery on mount of the mds1

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.8.0
    • Labels: None
    • Environment: autotest review-dne-part-2
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      replay-single test_101 times out while mounting mds1 with the abort_recovery option. The last lines in the test_log are:

      01:57:12 (1452765432) waiting for onyx-34vm7 network 900 secs ...
      01:57:12 (1452765432) network interface is UP
      CMD: onyx-34vm7 hostname
      CMD: onyx-34vm7 test -b /dev/lvm-Role_MDS/P1
      Starting mds1:  -o abort_recovery /dev/lvm-Role_MDS/P1 /mnt/mds1
      CMD: onyx-34vm7 mkdir -p /mnt/mds1; mount -t lustre  -o abort_recovery 		                   /dev/lvm-Role_MDS/P1 /mnt/mds1
      

      From the MDS1 console, we see:

      01:57:22:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts: 
      01:57:22:LustreError: 14301:0:(mdt_handler.c:5605:mdt_iocontrol()) lustre-MDT0000: Aborting recovery for device
      01:57:44:LustreError: 14301:0:(ldlm_lib.c:2479:target_stop_recovery_thread()) lustre-MDT0000: Aborting recovery
      01:57:44:Lustre: 14377:0:(ldlm_lib.c:1945:target_recovery_overseer()) recovery is aborted, evict exports in recovery
      01:57:44:Lustre: 14377:0:(ldlm_lib.c:1945:target_recovery_overseer()) Skipped 2 previous similar messages
      01:57:44:Lustre: lustre-MDT0000: disconnecting 5 stale clients
      01:57:44:LustreError: 14377:0:(update_records.c:72:update_records_dump()) master transno = 382252089401 batchid = 373662154835 flags = 0 ops = 19 params = 9
      01:57:44:LustreError: 14377:0:(update_records.c:72:update_records_dump()) master transno = 382252089401 batchid = 373662154836 flags = 0 ops = 28 params = 24
      01:57:44:LustreError: 14377:0:(update_records.c:72:update_records_dump()) master transno = 382252089401 batchid = 377957122268 flags = 0 ops = 19 params = 9
      01:57:44:
      Press any key to continue.
      01:57:44:
      Press any key to continue.
      01:57:44:
      Press any key to continue.
      01:57:44:
      Press any key to continue.
      01:57:44:
      Press any key to continue.
      01:57:44: [H [J
      01:57:44:    GNU GRUB  version 0.97  (631K lower / 2096116K upper memory)
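
The console excerpt above can be checked mechanically. The following is a minimal sketch, assuming the line formats shown in the excerpt; the function name `summarize_console` and the regular expressions are illustrative and not part of Lustre or the test framework. It collects the `update_records_dump()` entries and reports whether a GRUB banner (which indicates the node went back through the boot loader) appears after them.

```python
import re

# Assumed formats, copied from the console excerpt in this report.
RECORD_RE = re.compile(
    r"update_records_dump\(\)\) master transno = (\d+) batchid = (\d+) "
    r"flags = (\d+) ops = (\d+) params = (\d+)"
)
GRUB_RE = re.compile(r"GNU GRUB\s+version")


def summarize_console(lines):
    """Return the dumped update records and whether a GRUB banner follows them."""
    records = []
    rebooted_after_dump = False
    for line in lines:
        match = RECORD_RE.search(line)
        if match:
            transno, batchid, flags, ops, params = map(int, match.groups())
            records.append(
                {
                    "transno": transno,
                    "batchid": batchid,
                    "flags": flags,
                    "ops": ops,
                    "params": params,
                }
            )
        elif records and GRUB_RE.search(line):
            # A boot loader banner after the dump suggests the node restarted.
            rebooted_after_dump = True
    return records, rebooted_after_dump


# Lines taken verbatim from the MDS1 console excerpt.
console = [
    "01:57:44:Lustre: lustre-MDT0000: disconnecting 5 stale clients",
    "01:57:44:LustreError: 14377:0:(update_records.c:72:update_records_dump()) master transno = 382252089401 batchid = 373662154835 flags = 0 ops = 19 params = 9",
    "01:57:44:LustreError: 14377:0:(update_records.c:72:update_records_dump()) master transno = 382252089401 batchid = 373662154836 flags = 0 ops = 28 params = 24",
    "01:57:44:LustreError: 14377:0:(update_records.c:72:update_records_dump()) master transno = 382252089401 batchid = 377957122268 flags = 0 ops = 19 params = 9",
    "01:57:44:    GNU GRUB  version 0.97  (631K lower / 2096116K upper memory)",
]

records, rebooted = summarize_console(console)
print(len(records), "update records dumped; GRUB banner afterwards:", rebooted)
```

Run against a full console log, the same check would help confirm whether other occurrences show the same pattern of dumped update records immediately followed by a restart.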
      

      We have seen this failure four times in the past two months during review-dne-part-2 testing. The logs are at:
      2015-11-27 03:10:27 - https://testing.hpdd.intel.com/test_sets/874faa9a-9503-11e5-bdeb-5254006e85c2
      2015-12-12 02:31:59 - https://testing.hpdd.intel.com/test_sets/77362cfc-a0e2-11e5-9d88-5254006e85c2
      2016-01-02 08:22:17 - https://testing.hpdd.intel.com/test_sets/102b7ef4-b177-11e5-bf32-5254006e85c2
      2016-01-14 08:30:36 - https://testing.hpdd.intel.com/test_sets/4723f9d4-bae8-11e5-87b4-5254006e85c2
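
As a quick check of the failure frequency, the timestamps listed above can be turned into a total span and per-occurrence gaps. This is only a sketch; the helper name `failure_gaps` is hypothetical, and the timestamps are copied from the list above.

```python
from datetime import datetime

# Timestamps of the four failures listed in this report.
FAILURES = [
    "2015-11-27 03:10:27",
    "2015-12-12 02:31:59",
    "2016-01-02 08:22:17",
    "2016-01-14 08:30:36",
]


def failure_gaps(stamps):
    """Return (whole days from first to last failure, whole days between consecutive failures)."""
    times = sorted(datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps)
    gaps = [(later - earlier).days for earlier, later in zip(times, times[1:])]
    span = (times[-1] - times[0]).days
    return span, gaps


span_days, gaps = failure_gaps(FAILURES)
print("span in days:", span_days, "gaps in days:", gaps)
```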

            People

              Assignee: wc-triage WC Triage
              Reporter: jamesanunez James Nunez (Inactive)