
sanity test_17n: destroy remote dir error 0

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • None
    • Lustre 2.7.0
    • None
    • 3
    • 15857

    Description

      This issue was created by maloo for Amir Shehata <amir.shehata@intel.com>

      Seeing the following errors:

      LustreError: 11-0: MGC10.1.4.222@tcp: Communicating with 10.1.4.222@tcp, operation obd_ping failed with -107.
      LustreError: 166-1: MGC10.1.4.222@tcp: Connection to MGS (at 10.1.4.222@tcp) was lost; in progress operations using this service will fail
      LustreError: 8827:0:(mgc_request.c:517:do_requeue()) failed processing log: -5
      
      LustreError: 11-0: lustre-MDT0001-mdc-ffff88007981bc00: Communicating with 10.1.4.218@tcp, operation mds_statfs failed with -107.
      LustreError: Skipped 1 previous similar message
      Lustre: lustre-MDT0001-mdc-ffff88007981bc00: Connection to lustre-MDT0001 (at 10.1.4.218@tcp) was lost; in progress operations using this service will wait for recovery to complete
      LustreError: 4138:0:(client.c:2802:ptlrpc_replay_interpret()) @@@ status 301, old was 0  req@ffff880079cf9000 x1479964115341320/t4294967394(4294967394) o101->lustre-MDT0001-mdc-ffff88007981bc00@10.1.4.218@tcp:12/10 lens 592/544 e 0 to 0 dl 1411404116 ref 2 fl Interpret:RP/4/0 rc 301/301
      
      

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/c65731d8-42b4-11e4-b87c-5254006e85c2.
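
      For reference, the negative return codes in the log excerpt above are standard Linux errno values (-107 is ENOTCONN, -5 is EIO). A minimal standalone C snippet, not Lustre code, that prints their descriptions:

      /* Standalone sketch, not Lustre code: print the errno values seen in the
       * log excerpt above (-107 = ENOTCONN, -5 = EIO). */
      #include <errno.h>
      #include <stdio.h>
      #include <string.h>

      int main(void)
      {
              int codes[] = { ENOTCONN /* 107: obd_ping/mds_statfs failed with -107 */,
                              EIO      /* 5: "failed processing log: -5" */ };
              int n = sizeof(codes) / sizeof(codes[0]);

              for (int i = 0; i < n; i++)
                      printf("-%d: %s\n", codes[i], strerror(codes[i]));
              return 0;
      }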


        Activity


          adilger Andreas Dilger added a comment -

          Di, can you please look into why this patch is failing conf-sanity? That is preventing it from landing:
          https://maloo.whamcloud.com/test_sessions/d2739d04-50c8-11e4-aa89-5254006e85c2
          https://maloo.whamcloud.com/test_sessions/18ef3062-50de-11e4-ac0f-5254006e85c2
          di.wang Di Wang added a comment -

          Duplicate of LU-5420.
          di.wang Di Wang added a comment -

          Hmm, we should probably retry for the MGC instead of failing it in step 2, so the fix for LU-5420 (http://review.whamcloud.com/#/c/11258/) should fix the issue.
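
          A minimal sketch of that retry-instead-of-fail idea, assuming a hypothetical helper (try_get_config_lock() is a stand-in for the config-lock enqueue); this is illustrative only and is not the code from the LU-5420 patch:

          /* Illustrative sketch only; the helper below is hypothetical, not a Lustre API.
           * It simulates the config-lock enqueue failing twice with -ESHUTDOWN (the -108
           * seen in the log) before the restarted MGS becomes reachable again. */
          #include <errno.h>
          #include <stdio.h>
          #include <unistd.h>

          static int try_get_config_lock(void)
          {
                  static int calls;
                  return (++calls < 3) ? -ESHUTDOWN : 0;
          }

          /* Retry the config lock a few times before giving up, rather than falling
           * back to the (possibly stale) local config log on the first failure. */
          static int get_config_with_retry(int max_retries)
          {
                  for (int i = 0; i < max_retries; i++) {
                          int rc = try_get_config_lock();

                          if (rc != -ESHUTDOWN)
                                  return rc;      /* success, or a non-retryable error */
                          sleep(1);               /* give the restarted MGS time to come back */
                  }
                  return -EAGAIN;                 /* caller may then use the local copy */
          }

          int main(void)
          {
                  printf("config fetch rc = %d\n", get_config_with_retry(5));
                  return 0;
          }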
          di.wang Di Wang added a comment -

          Just checked the debug log; the sequence appears to be:

          1. MDT0/MGS restarts.

          2. MDT1 restarts and tries to reuse the MGC (note: the MGC is shared by MDT1/MDT2/MDT3), which was evicted by the MGS because of step 1. MDT1 therefore cannot fetch the config log through the MGC, so it falls back to the local config log:

          00000100:02000400:0.0:1411404071.644549:0:3397:0:(import.c:950:ptlrpc_connect_interpret()) Evicted from MGS (at 10.1.4.222@tcp) after server handle changed from 0x82e1444051c34df2 to 0x82e1444051c3568f
          ......
          10000000:01000000:1.0:1411404071.645599:0:11810:0:(mgc_request.c:1866:mgc_process_log()) Can't get cfg lock: -108
          10000000:01000000:1.0:1411404071.645900:0:11810:0:(mgc_request.c:1777:mgc_process_cfg_log()) Failed to get MGS log lustre-MDT0001, using local copy for now, will try to update later.

          3. Unfortunately, the local config log is stale and does not include the OSTs yet, which causes the problem (see the sketch below).
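
          A compact sketch of the failure path described above, with hypothetical names and structure; it only illustrates the sequence and is not the actual mgc_process_cfg_log() code:

          /* Hypothetical sketch of the sequence above, not actual Lustre code. */
          #include <errno.h>
          #include <stdbool.h>
          #include <stdio.h>

          struct cfg_log {
                  const char *source;     /* "MGS" or "local copy" */
                  int         ost_count;  /* OSTs recorded in this copy of the log */
          };

          /* Step 2: the shared MGC was evicted when the MGS restarted, so the
           * config-lock enqueue fails and the server falls back to the local copy. */
          static void fetch_config(bool mgc_evicted, struct cfg_log *log)
          {
                  if (mgc_evicted) {
                          fprintf(stderr, "Can't get cfg lock: %d, using local copy\n",
                                  -ESHUTDOWN);
                          log->source = "local copy";
                          log->ost_count = 0;     /* step 3: stale log, OSTs not added yet */
                  } else {
                          log->source = "MGS";
                          log->ost_count = 4;     /* an up-to-date log would list the OSTs */
                  }
          }

          int main(void)
          {
                  struct cfg_log log;

                  fetch_config(true, &log);       /* MDT1 restarting right after the MGS restart */
                  printf("config from %s, %d OSTs known\n", log.source, log.ost_count);
                  if (log.ost_count == 0)
                          printf("stale config: operations that need the OSTs will fail\n");
                  return 0;
          }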
          di.wang Di Wang added a comment - http://review.whamcloud.com/#/c/12202
          di.wang Di Wang added a comment -

          Hmm, I see this in the MDS console messages:

          Lustre: Skipped 3 previous similar messages
          Lustre: lustre-MDT0001: Recovery over after 0:04, of 5 clients 5 recovered and 0 were evicted.
          LustreError: 4554:0:(lod_lov.c:698:validate_lod_and_idx()) lustre-MDT0001-mdtlov: bad idx: 4 of 32
          Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_17n: @@@@@@ FAIL: destroy remote dir error 0
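
          The "bad idx: 4 of 32" line is consistent with an OST index that is missing from the MDT's stale target table. A hypothetical sketch of that kind of validity check, not the actual lod_lov.c code:

          /* Hypothetical sketch of an index validity check that would print
           * "bad idx: 4 of 32"; this is not the actual validate_lod_and_idx() code. */
          #include <stdbool.h>
          #include <stdio.h>

          #define MAX_TGT 32u

          struct lod_device {
                  const char *name;
                  bool        tgt_present[MAX_TGT];  /* targets known from the config log */
          };

          static int validate_idx(const struct lod_device *lod, unsigned int idx)
          {
                  if (idx >= MAX_TGT || !lod->tgt_present[idx]) {
                          fprintf(stderr, "%s: bad idx: %u of %u\n",
                                  lod->name, idx, MAX_TGT);
                          return -1;
                  }
                  return 0;
          }

          int main(void)
          {
                  /* Stale config log: OST index 4 was never added, so the check fails. */
                  struct lod_device lod = { .name = "lustre-MDT0001-mdtlov" };

                  return validate_idx(&lod, 4) ? 1 : 0;
          }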

          adilger Andreas Dilger added a comment -

          It looks like this is a recent regression.

          jlevi Jodi Levi (Inactive) added a comment -

          Di,
          Can you please have a look at this one and comment?
          Thank you!

          People

            Assignee: di.wang Di Wang
            Reporter: maloo Maloo
            Votes: 0
            Watchers: 6
