Lustre / LU-4325

Failover configured between 2 Lustre servers: after simulating a crash of one server, the other server crashed unexpectedly while taking over the crashed server's services

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.0
    • Labels: None
    • Environment: 2 Lustre servers + 1 client
    • Severity: 3
    • Rank (Obsolete): 11824

    Description

      1. Mount 1 MDT and 4 OSTs on Lustre Server1.
      2. Mount 4 OSTs on Lustre Server2.
      3. Configure Lustre failover between the two Lustre servers.
      4. Mount the Lustre file system on the Lustre client.
      5. Write and read data on the Lustre client.
      6. Simulate a crash of Lustre Server1.
      7. Lustre Server2 crashed unexpectedly while taking over the services of Lustre Server1. The call trace was as follows:
      LustreError: 137-5: lustre-OST0000_UUID: not available for connect from 192.168.22.202@tcp (no target)
      LustreError: Skipped 3 previous similar messages
      LDISKFS-fs (sde): recovery complete
      LDISKFS-fs (sde): mounted filesystem with ordered data mode. quota=on. Opts:
      LustreError: 10026:0:(genops.c:320:class_newdev()) Device MGC192.168.22.50@tcp already exists at 2, won't add
      LustreError: 10026:0:(obd_config.c:374:class_attach()) Cannot create device MGC192.168.22.50@tcp of type mgc : -17
      LustreError: 10026:0:(obd_mount.c:196:lustre_start_simple()) MGC192.168.22.50@tcp attach error -17
      LustreError: 10026:0:(obd_mount_server.c:844:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
      LustreError: 10026:0:(obd_mount_server.c:1426:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
      LustreError: 10026:0:(obd_mount_server.c:1456:server_put_super()) no obd lustre-MDT0000
      LustreError: 10026:0:(obd_mount_server.c:135:server_deregister_mount()) lustre-MDT0000 not registered
      LustreError: 10026:0:(genops.c:1570:obd_exports_barrier()) ASSERTION( list_empty(&obd->obd_exports) ) failed:
      LustreError: 10026:0:(genops.c:1570:obd_exports_barrier()) LBUG
      Pid: 10026, comm: mount.lustre

      Call Trace:
      [<ffffffffa070f8a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [<ffffffffa070feb7>] lbug_with_loc+0x47/0xb0 [libcfs]
      [<ffffffffa0818d91>] obd_exports_barrier+0x181/0x190 [obdclass]
      [<ffffffffa0f23886>] mgs_device_fini+0xf6/0x5c0 [mgs]
      [<ffffffffa0843837>] class_cleanup+0x817/0xe00 [obdclass]
      [<ffffffffa081ce2c>] ? class_name2dev+0x7c/0xe0 [obdclass]
      [<ffffffffa0847e9b>] class_process_config+0x1b6b/0x2f60 [obdclass]
      [<ffffffffa0710b90>] ? cfs_alloc+0x30/0x60 [libcfs]
      [<ffffffffa0849723>] class_manual_cleanup+0x493/0xe80 [obdclass]
      [<ffffffff8147a1fe>] ? _read_unlock+0xe/0x10
      [<ffffffffa081ce2c>] ? class_name2dev+0x7c/0xe0 [obdclass]
      [<ffffffffa0884b9d>] server_put_super+0x42d/0x2580 [obdclass]
      [<ffffffffa0887440>] server_fill_super+0x750/0x1580 [obdclass]
      [<ffffffffa0854c98>] lustre_fill_super+0x1d8/0x530 [obdclass]
      [<ffffffffa0854ac0>] ? lustre_fill_super+0x0/0x530 [obdclass]
      [<ffffffff8114d21f>] get_sb_nodev+0x5f/0xa0
      [<ffffffffa084c3f5>] lustre_get_sb+0x25/0x30 [obdclass]
      [<ffffffff8114c74b>] vfs_kern_mount+0x7b/0x1b0
      [<ffffffff8114c8f2>] do_kern_mount+0x52/0x130
      [<ffffffff81168912>] do_mount+0x2d2/0x8c0
      [<ffffffff81168f90>] sys_mount+0x90/0xe0
      [<ffffffff81002f5b>] system_call_fastpath+0x16/0x1b

      Message from syslogd@50:B3:42:00:01:01 at Sep 22 12:53:12 ..
      Kernel panic - not syncing: LBUG
      Pid: 10026, comm: mount.lustre Tainted: PF --------------- 2.6.32-358.6.2.l2.08 #2
      Call Trace:
      [<ffffffff81476fa7>] ? panic+0xa1/0x163
      [<ffffffffa070ff0b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      [<ffffffffa0818d91>] ? obd_exports_barrier+0x181/0x190 [obdclass]
      [<ffffffffa0f23886>] ? mgs_device_fini+0xf6/0x5c0 [mgs]
      [<ffffffffa0843837>] ? class_cleanup+0x817/0xe00 [obdclass]
      [<ffffffffa081ce2c>] ? class_name2dev+0x7c/0xe0 [obdclass]
      [<ffffffffa0847e9b>] ? class_process_config+0x1b6b/0x2f60 [obdclass]
      [<ffffffffa0710b90>] ? cfs_alloc+0x30/0x60 [libcfs]
      [<ffffffffa0849723>] ? class_manual_cleanup+0x493/0xe80 [obdclass]
      [<ffffffff8147a1fe>] ? _read_unlock+0xe/0x10
      [<ffffffffa081ce2c>] ? class_name2dev+0x7c/0xe0 [obdclass]
      [<ffffffffa0884b9d>] ? server_put_super+0x42d/0x2580 [obdclass]
      [<ffffffffa0887440>] ? server_fill_super+0x750/0x1580 [obdclass]
      [<ffffffffa0854c98>] ? lustre_fill_super+0x1d8/0x530 [obdclass]
      [<ffffffffa0854ac0>] ? lustre_fill_super+0x0/0x530 [obdclass]
      [<ffffffff8114d21f>] ? get_sb_nodev+0x5f/0xa0
      [<ffffffffa084c3f5>] ? lustre_get_sb+0x25/0x30 [obdclass]
      [<ffffffff8114c74b>] ? vfs_kern_mount+0x7b/0x1b0
      [<ffffffff8114c8f2>] ? do_kern_mount+0x52/0x130
      [<ffffffff81168912>] ? do_mount+0x2d2/0x8c0
      [<ffffffff81168f90>] ? sys_mount+0x90/0xe0
      [<ffffffff81002f5b>] ? system_call_fastpath+0x16/0x1b
      *******show para for nt_memcpy16*******
      src: ffff880285fc4f00, dst: ffffc90112030e70, len: 56
      *******show para for panic done*******
      ODSP:MSG:BUGON: This stack is bug.
      ODSP:MSG:BUGON: Local was taken over by peer. Suspend CPU.
      ODSP:MSG:BUGON: Local was taken over by peer. Suspend CPU.
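The negative numbers in the messages above are kernel return codes of the form -errno. The minimal Python sketch below, assuming a Linux host and only the standard `errno`, `os` and `re` modules, pulls the return codes out of three of the logged lines and decodes them: -17 is EEXIST (matching "Device MGC192.168.22.50@tcp already exists ..., won't add") and -2 is ENOENT. The pattern `RC_RE` and the helpers `return_codes()` and `decode_rc()` are illustrative names only, not part of any Lustre tool.

```python
import errno
import os
import re

# Matches negative return codes as they appear in the log lines above:
# "type mgc : -17", "attach error -17", "(rc=-2)".
RC_RE = re.compile(r"(?:: |error |rc=)(-\d+)")


def return_codes(line):
    """Return every negative return code found in one log line."""
    return [int(m) for m in RC_RE.findall(line)]


def decode_rc(rc):
    """Map a -errno return code to its symbolic name and message."""
    code = -rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc} -> -{name}: {os.strerror(code)}"


# Three lines copied verbatim from the log in this report.
LOG = [
    "LustreError: 10026:0:(obd_config.c:374:class_attach()) Cannot create device MGC192.168.22.50@tcp of type mgc : -17",
    "LustreError: 10026:0:(obd_mount.c:196:lustre_start_simple()) MGC192.168.22.50@tcp attach error -17",
    "LustreError: 10026:0:(obd_mount_server.c:1426:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)",
]

if __name__ == "__main__":
    for line in LOG:
        for rc in return_codes(line):
            print(decode_rc(rc))
```

On Linux this prints `-17 -> -EEXIST: File exists` twice, then `-2 -> -ENOENT: No such file or directory`.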
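The panic itself is the failed assertion `list_empty(&obd->obd_exports)` in `obd_exports_barrier()`, reached from `mgs_device_fini()` on the mount.lustre cleanup path (`server_fill_super()` → `server_put_super()` → `class_manual_cleanup()`). In other words, the device being torn down still had exports attached. The toy Python model below illustrates only the two conditions the log reports: a name-keyed device table that refuses a duplicate name with -EEXIST, and a cleanup step that treats a non-empty export list as fatal. `Device`, `DeviceTable`, `attach()`, `cleanup()`, the device name "MGS" and the export name are all hypothetical; this is not Lustre's implementation and it does not reproduce or explain the root cause of this issue.

```python
import errno


class Device:
    """Hypothetical stand-in for an obd device: a name plus its exports."""

    def __init__(self, name):
        self.name = name
        self.exports = set()  # plays the role of obd->obd_exports


class DeviceTable:
    """Hypothetical name-keyed device table, loosely modelled on class_newdev()."""

    def __init__(self):
        self.devices = {}

    def attach(self, name):
        # A second device with an existing name is refused with -EEXIST,
        # as in "Device ... already exists at 2, won't add" / "error -17".
        if name in self.devices:
            return -errno.EEXIST, None
        dev = Device(name)
        self.devices[name] = dev
        return 0, dev

    def cleanup(self, name):
        dev = self.devices[name]
        # Models ASSERTION( list_empty(&obd->obd_exports) ): a failed
        # assertion is an LBUG, which on this node led to a kernel panic.
        if dev.exports:
            raise RuntimeError(f"LBUG: {name} still has exports {sorted(dev.exports)}")
        del self.devices[name]
        return 0


if __name__ == "__main__":
    table = DeviceTable()
    print(table.attach("MGC192.168.22.50@tcp")[0])  # 0: first attach succeeds
    print(table.attach("MGC192.168.22.50@tcp")[0])  # -17: duplicate name refused
    _, mgs = table.attach("MGS")
    mgs.exports.add("hypothetical-client-export")
    try:
        table.cleanup("MGS")
    except RuntimeError as err:
        print(err)
```

The model raises an exception where the kernel calls LBUG; in the report above, the LBUG escalated to "Kernel panic - not syncing: LBUG".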

        Activity

          Andreas Dilger (adilger) made changes:
            Resolution: Cannot Reproduce
            Status: changed from Open to Resolved
          yueyuling created the issue.

          People

            Assignee: WC Triage (wc-triage)
            Reporter: yueyuling
            Votes: 0
            Watchers: 3
