  1. Lustre
  2. LU-13179

conf-sanity test_93 fails with "mds2: import is not in FULL state after 40"


Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.12.4
    • None
    • DNE/ZFS
    • 3
    • 9223372036854775807

    Description

      conf-sanity test_93 fails with "mds2: import is not in FULL state after 40". Looking at the console log for MDS 2/4 (vm5) for the failure at https://testing.whamcloud.com/test_sets/1e5610e6-43ab-11ea-8072-52540065bddc, we see

      trevis-42vm5: trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
      CMD: trevis-42vm4 zfs get -H -o value 						lustre:svname lustre-mdt3/mdt3
      Starting mds3:   lustre-mdt3/mdt3 /mnt/lustre-mds3
      CMD: trevis-42vm4 mkdir -p /mnt/lustre-mds3; mount -t lustre   lustre-mdt3/mdt3 /mnt/lustre-mds3
      CMD: trevis-42vm5 zfs get -H -o value 						lustre:svname lustre-mdt4/mdt4
      Starting mds4:   lustre-mdt4/mdt4 /mnt/lustre-mds4
      CMD: trevis-42vm5 mkdir -p /mnt/lustre-mds4; mount -t lustre   lustre-mdt4/mdt4 /mnt/lustre-mds4
      trevis-42vm5:  rpc : @@@@@@ FAIL: can't put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have  
      trevis-42vm5:   Trace dump:
      trevis-42vm5:   = /usr/lib64/lustre/tests/test-framework.sh:5900:error()
      trevis-42vm5:   = /usr/lib64/lustre/tests/test-framework.sh:7027:_wait_import_state()
      trevis-42vm5:   = /usr/lib64/lustre/tests/test-framework.sh:7049:wait_import_state()
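
For readers unfamiliar with this kind of check, the sketch below shows a generic poll-until-state-or-timeout loop of the sort the trace suggests. It is not the test-framework.sh implementation: `wait_for_state`, `read_state`, and the polling interval are hypothetical names and values chosen for illustration. If the real helper reports the last state it read, as this sketch does, then the blank text after "have" in the failure message would correspond to an empty state string, but that is an assumption.

```python
import time


def wait_for_state(read_state, expected="FULL", timeout_s=40,
                   interval_s=1.0, sleep=time.sleep):
    """Poll read_state() until it returns `expected` or the timeout is used up.

    read_state is a hypothetical callable returning the current import
    state as a string (possibly empty if nothing could be read).
    Returns a tuple (ok, last_state, waited_seconds).
    """
    waited = 0.0
    state = read_state()
    while state != expected and waited < timeout_s:
        sleep(interval_s)
        waited += interval_s
        state = read_state()
    return state == expected, state, waited


if __name__ == "__main__":
    # Simulated reader that never reports a state, with sleeping disabled
    # so the example finishes immediately.
    ok, last, waited = wait_for_state(lambda: "", sleep=lambda s: None)
    print(f"ok={ok} last_state={last!r} waited={waited:.0f}s")
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a real check would read the state from the node under test instead of a lambda.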
      

      The console logs only confirm that the lustre-OST0000-osc-MDT0001 import did not reach the FULL state within the allotted 40 seconds.

      Looking at the console log for the MDS1/3 (vm4), we see

      [83050.530881] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
      [83050.717460] Lustre: DEBUG MARKER: zfs get -H -o value 						lustre:svname lustre-mdt3/mdt3
      [83050.753501] Lustre: DEBUG MARKER: trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
      [83051.092507] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds3; mount -t lustre   lustre-mdt3/mdt3 /mnt/lustre-mds3
      [83064.412617] LustreError: 3861:0:(fail.c:129:__cfs_fail_timeout_set()) cfs_fail_timeout id 90e sleeping for 10000ms
      [83064.414535] LustreError: 3861:0:(fail.c:129:__cfs_fail_timeout_set()) Skipped 76 previous similar messages
      [83074.420718] LustreError: 3861:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 90e awake
      [83074.422537] LustreError: 3861:0:(fail.c:133:__cfs_fail_timeout_set()) Skipped 76 previous similar messages
      [83089.482356] Lustre: lustre-MDT0002: Imperative Recovery not enabled, recovery window 60-180
      [83089.484385] Lustre: Skipped 6 previous similar messages
      [83090.262090] Lustre: cli-ctl-lustre-MDT0002: Allocated super-sequence [0x0000000280000400-0x00000002c0000400]:2:mdt]
      [83091.385126] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  rpc : @@@@@@ FAIL: can\'t put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have  
      [83091.610883] Lustre: DEBUG MARKER: rpc : @@@@@@ FAIL: can't put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have
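
The timestamps in this log can be checked against the 40-second limit with a little arithmetic. The following is a minimal sketch, assuming the usual `[seconds.microseconds]` console prefix; `first_timestamp` and `elapsed` are hypothetical helper names, and the sample lines are copied from the log above but cut off after the text being matched.

```python
import re

# Matches the kernel timestamp prefix, e.g. "[83050.530881] ".
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s")


def first_timestamp(lines, needle):
    """Return the timestamp in seconds of the first line containing needle."""
    for line in lines:
        match = TS_RE.match(line)
        if match and needle in line:
            return float(match.group(1))
    raise ValueError(f"no timestamped line contains {needle!r}")


def elapsed(lines, start_needle, end_needle):
    """Seconds between the first start_needle line and the first end_needle line."""
    return first_timestamp(lines, end_needle) - first_timestamp(lines, start_needle)


# Lines from the MDS1/3 (vm4) console log, truncated after the matched text.
MDS_LOG = [
    "[83050.530881] Lustre: DEBUG MARKER: /usr/sbin/lctl mark "
    "trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL",
    "[83064.412617] LustreError: 3861:0:(fail.c:129:__cfs_fail_timeout_set()) "
    "cfs_fail_timeout id 90e sleeping for 10000ms",
    "[83074.420718] LustreError: 3861:0:(fail.c:133:__cfs_fail_timeout_set()) "
    "cfs_fail_timeout id 90e awake",
    "[83091.385126] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  rpc : @@@@@@ FAIL:",
]

if __name__ == "__main__":
    wait = elapsed(MDS_LOG, "executing wait_import_state", "FAIL:")
    nap = elapsed(MDS_LOG, "sleeping for 10000ms", "awake")
    print(f"wait_import_state marker to FAIL marker: {wait:.3f} s")  # 40.854 s
    print(f"cfs_fail_timeout sleep to awake: {nap:.3f} s")           # 10.008 s
```

The roughly 40.85 s between the two markers matches the 40-second limit passed to wait_import_state, and the roughly 10 s gap matches the logged 10000ms sleep. The numbers alone do not show whether the sleep contributed to the failure.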
      

      Looking at the console log for the OSS (vm3), we see

      [83053.780999] Lustre: DEBUG MARKER: == rpc test complete, duration -o sec ================================================================ 00:08:41 (1580342921)
      [83054.152945] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
      [83054.372472] Lustre: DEBUG MARKER: trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
      [83092.153168] Lustre: cli-lustre-OST0000-super: Allocated super-sequence [0x0000000240000400-0x0000000280000400]:0:ost]
      [83092.155157] Lustre: Skipped 2 previous similar messages
      [83094.995539] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  rpc : @@@@@@ FAIL: can\'t put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have  
      [83095.232157] Lustre: DEBUG MARKER: rpc : @@@@@@ FAIL: can't put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have
      

      In the client1 (vm1) console log, we see

      [83271.701328] Lustre: 17215:0:(lmv_obd.c:269:lmv_init_ea_size()) lustre-clilmv-ffff97f8f9748800: NULL export for 1
      [83292.919916] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_93: @@@@@@ FAIL: mds2: import is not in FULL state after 40 
      [83293.136689] Lustre: DEBUG MARKER: conf-sanity test_93: @@@@@@ FAIL: mds2: import is not in FULL state after 40
      

      This is the first time we’ve seen this issue in branch testing: 29 JAN 2020, for 2.12.3.109 DNE/ZFS.

      In the past year, we’ve seen a similar issue twice when testing patches, but in those cases the patch being tested may have caused the failure.

            People

              wc-triage WC Triage
              jamesanunez James Nunez (Inactive)