[LU-7862] clients evicted during failover Created: 10/Mar/16  Updated: 08/Sep/16

Status: In Progress
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Frank Heckes (Inactive) Assignee: Lai Siyao
Resolution: Unresolved Votes: 0
Labels: soak
Environment:

lola
build: https://build.hpdd.intel.com/job/lustre-b2_8/12/


Attachments: HTML File recovery-times-20160317    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The error occurred during soak testing of build '20160309' (b2_8 RC5) (see also: https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160309). DNE is enabled. The MDTs were formatted with ldiskfs, the OSTs with zfs. The MDS nodes are configured in an active-active HA failover configuration. (For the test set-up see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-Configuration)

The issue occurs during failover of the MDS nodes. A typical error sequence is:
Server event history:

  • mds_failover : 2016-03-09 23:54:41,330 - 2016-03-10 00:03:51,040 lola-9
  • Secondary node lola-8 evicted all clients:
    lola-8.log:Mar 10 00:03:38 lola-8 kernel: Lustre: soaked-MDT0003: Recovery over after 1:11, of 16 clients 0 recovered and 16 were evicted.
    lola-8.log:Mar 10 00:03:49 lola-8 kernel: Lustre: soaked-MDT0002: Recovery over after 0:32, of 16 clients 0 recovered and 16 were evicted.
    
  • Primary node lola-9 evicted some of the clients:
    lola-9.log:Mar 10 00:07:03 lola-9 kernel: Lustre: soaked-MDT0002: Recovery over after 3:04, of 16 clients 11 recovered and 5 were evicted.
    lola-9.log:Mar 10 00:10:51 lola-9 kernel: Lustre: soaked-MDT0003: Recovery over after 6:55, of 16 clients 14 recovered and 2 were evicted.
    

Client events:

  • Job crashed with the following error (see the statfs() sketch after this list):
    03/10/2016 00:14:49: Process 0(): FAILED in show_file_system_size, unable to statfs() file system: Input/output error
    --------------------------------------------------------------------------
    MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
    with errorcode 1.
    
    NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
    You may or may not see output from other processes, depending on
    exactly when Open MPI kills them.
    --------------------------------------------------------------------------
    In: PMI_Abort(1, N/A)
    srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
    slurmd[lola-27]: *** STEP 451370.0 KILLED AT 2016-03-10T00:14:49 WITH SIGNAL 9 ***
    slurmd[lola-27]: *** STEP 451370.0 KILLED AT 2016-03-10T00:14:49 WITH SIGNAL 9 ***
    srun: error: lola-27: task 0: Exited with exit code 1
    
  • Lustre errors on lola-27 read as:
    lola-27.log:Mar 10 00:14:04 lola-27 kernel: Lustre: 3779:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1457596992/real 1457596992]  req@ffff8807278556c0 x1528367917668784/t0(0) o400->soaked-MDT0003-mdc-ffff88081f7c1800@192.168.1.108@o2ib10:12/10 lens 224/224 e 0 to 1 dl 1457597644 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1
    lola-27.log:Mar 10 00:14:04 lola-27 kernel: Lustre: 3779:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
    lola-27.log:Mar 10 00:14:04 lola-27 kernel: LustreError: 167-0: soaked-MDT0003-mdc-ffff88081f7c1800: This client was evicted by soaked-MDT0003; in progress operations using this service will fail.
    lola-27.log:Mar 10 00:14:04 lola-27 kernel: Lustre: soaked-MDT0003-mdc-ffff88081f7c1800: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
    lola-27.log:Mar 10 00:14:49 lola-27 kernel: LustreError: 167-0: soaked-MDT0002-mdc-ffff88081f7c1800: This client was evicted by soaked-MDT0002; in progress operations using this service will fail.
    lola-27.log:Mar 10 00:14:49 lola-27 kernel: LustreError: 36067:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5
    lola-27.log:Mar 10 00:14:49 lola-27 kernel: LustreError: 59437:0:(lmv_obd.c:1467:lmv_statfs()) can't stat MDS #2 (soaked-MDT0002-mdc-ffff88081f7c1800), error -5
    lola-27.log:Mar 10 00:14:49 lola-27 kernel: LustreError: 59437:0:(llite_lib.c:1752:ll_statfs_internal()) md_statfs fails: rc = -5
    lola-27.log:Mar 10 00:14:49 lola-27 kernel: Lustre: soaked-MDT0002-mdc-ffff88081f7c1800: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
    lola-27.log:Mar 10 00:14:49 lola-27 kernel: LustreError: 36067:0:(llite_lib.c:2309:ll_prep_inode()) Skipped 2 previous similar messages
    lola-30.log:Mar 10 00:14:46 lola-30 kernel: LustreError: 167-0: soaked-MDT0002-mdc-ffff88086534ec00: This client was evicted by soaked-MDT0002; in progress operations using this service will fail.
    lola-30.log:Mar 10 00:14:46 lola-30 kernel: LustreError: 42688:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5
    lola-30.log:Mar 10 00:14:46 lola-30 kernel: Lustre: soaked-MDT0002-mdc-ffff88086534ec00: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
    lola-30.log:Mar 10 00:14:46 lola-30 kernel: LustreError: 42688:0:(llite_lib.c:2309:ll_prep_inode()) Skipped 1 previous similar message
    
  • Other jobs crashed and left orphaned files behind:
    451145:
    ls: cannot access 451145/pct-createunlink-0-412: No such file or directory
    total 9856
    d????????? ? ?        ?            ?            ? pct-createunlink-0-412
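The statfs() failure above is consistent with the eviction messages: once the MDC is evicted, ll_statfs_internal() fails with rc = -5 and the application sees "Input/output error". Below is a minimal sketch (Python, not the actual job code; the /mnt/soaked mount point is an assumption) of the statfs pattern that fails this way:

    #!/usr/bin/env python
    # Minimal sketch, not the actual soak job: mimic the show_file_system_size()
    # statfs call. After the client is evicted by an MDT, md_statfs() fails with
    # rc = -5, which surfaces to userspace as errno EIO.
    import errno
    import os
    import sys

    def show_file_system_size(path):
        try:
            st = os.statvfs(path)
        except OSError as e:
            if e.errno == errno.EIO:
                sys.exit("unable to statfs() file system: Input/output error")
            raise
        gib = 2.0 ** 30
        print("FS size: %.1f GiB total, %.1f GiB free"
              % (st.f_blocks * st.f_frsize / gib, st.f_bfree * st.f_frsize / gib))

    if __name__ == "__main__":
        # Mount point is an assumption; pass the real client mount point instead.
        show_file_system_size(sys.argv[1] if len(sys.argv) > 1 else "/mnt/soaked")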
    

We currently don't have debug logs for these events. I'll prepare the client and server nodes to capture debug logs (see the sketch below).
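One possible way to prepare the nodes, shown only as a sketch (the debug mask, buffer size and output path below are assumptions; the lctl parameters themselves are standard Lustre utilities): enlarge the kernel debug buffer, enable the RPC/DLM/HA debug flags before the fault injection, and dump the buffer right after recovery completes.

    #!/usr/bin/env python
    # Sketch only, not the soak framework's tooling: wrap the standard lctl
    # commands used to collect Lustre kernel debug logs around a failover window.
    import subprocess
    import time

    def lctl(*args):
        cmd = ["lctl"] + list(args)
        print("+ " + " ".join(cmd))
        subprocess.check_call(cmd)

    def prepare_debug():
        lctl("set_param", "debug_mb=512")                   # enlarge debug buffer (assumed size)
        lctl("set_param", "debug=+rpctrace +dlmtrace +ha")  # add useful debug flags
        lctl("clear")                                       # start from an empty buffer

    def dump_debug():
        # Write the buffer to a timestamped file (path is an assumption).
        lctl("dk", "/tmp/lustre-debug-%d.log" % int(time.time()))

    if __name__ == "__main__":
        prepare_debug()
        # ... fault injection and recovery happen here ...
        dump_debug()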



 Comments   
Comment by Frank Heckes (Inactive) [ 17/Mar/16 ]

Soak testing has continued with the b2_8 RC5 build on a reformatted Lustre FS.
There is now only 1 MDT per MDS and 5 OSTs per OSS (unchanged). The MDTs were
formatted with ldiskfs and the OSTs with zfs.

  1. The recovery process always completes successfully for 'mds_restart' (no client is evicted).
  2. For 'mds_failover' (recovery on the secondary node interrupted by immediately unmounting again),
    recovery is only partially successful, i.e. a subset of the clients is always evicted.
  3. For 'mds_failover' (waiting for recovery to complete on the secondary before failback),
    the recovery process is always successful on both the secondary and the primary node.

The attached file recovery-times-20160317 shows the events for the three 'categories' above (a log-scanning sketch follows the list below):

  1. All events between 2016-03-15 07:20 and 2016-03-16 08:45 were executed with 'mds_restart' as the only injected fault.
  2. All events between 2016-03-16 08:45 and 2016-03-17 04:38 were executed with 'mds_restart' and 'mds_failover + immediate failback'.
  3. All events between 2016-03-17 04:38 and 2016-03-17 07:25 were executed with 'mds_restart' and 'mds_failover + wait for recovery'.
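For reference, the recovery outcomes summarised in recovery-times-20160317 can be cross-checked directly against the MDS syslogs by scanning the "Recovery over" lines. The script below is a sketch only (it is not the tooling that produced the attachment; the log file names are the ones quoted above):

    #!/usr/bin/env python
    # Sketch only: tally the per-MDT "Recovery over" lines from MDS syslogs to
    # see how many clients recovered versus were evicted in each recovery window.
    import re
    import sys

    # Matches e.g. "Lustre: soaked-MDT0003: Recovery over after 1:11, of 16
    # clients 0 recovered and 16 were evicted."
    PATTERN = re.compile(
        r"Lustre: (?P<target>\S+): Recovery over after (?P<duration>[\d:]+), "
        r"of (?P<total>\d+) clients (?P<recovered>\d+) recovered and "
        r"(?P<evicted>\d+) were evicted")

    def scan(paths):
        for path in paths:
            with open(path) as log:
                for line in log:
                    m = PATTERN.search(line)
                    if m:
                        print("%s: %s recovery took %s, %s/%s recovered, %s evicted"
                              % (path, m.group("target"), m.group("duration"),
                                 m.group("recovered"), m.group("total"),
                                 m.group("evicted")))

    if __name__ == "__main__":
        scan(sys.argv[1:])   # e.g. ./scan-recovery.py lola-8.log lola-9.log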