Lustre / LU-2824

recovery-mds-scale test_failover_ost: tar: etc/localtime: Cannot open: Input/output error

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.1.5, Lustre 1.8.9
    • Affects Version/s: Lustre 1.8.9
    • None
    • 3
    • 6837

    Description

      While running recovery-mds-scale test_failover_ost, it failed as follows after running for 15 hours:

      ==== Checking the clients loads AFTER failover -- failure NOT OK
      Client load failed on node client-31vm6, rc=1
      Client load failed during failover. Exiting...
      Found the END_RUN_FILE file: /home/autotest/.autotest/shared_dir/2013-02-12/172229-70152412386500/end_run_file
      client-31vm6.lab.whamcloud.com
      Client load  failed on node client-31vm6.lab.whamcloud.com:
      /logdir/test_logs/2013-02-12/lustre-b1_8-el5-x86_64-vs-lustre-b1_8-el6-x86_64--review--1_1_1__13121__-70152412386500-172228/recovery-mds-scale.test_failover_ost.run__stdout.client-31vm6.lab.whamcloud.com.log
      /logdir/test_logs/2013-02-12/lustre-b1_8-el5-x86_64-vs-lustre-b1_8-el6-x86_64--review--1_1_1__13121__-70152412386500-172228/recovery-mds-scale.test_failover_ost.run__debug.client-31vm6.lab.whamcloud.com.log
      2013-02-13 10:50:55 Terminating clients loads ...
      Duration:               86400
      Server failover period: 900 seconds
      Exited after:           56345 seconds
      Number of failovers before exit:
      mds: 0 times
      ost1: 8 times
      ost2: 3 times
      ost3: 13 times
      ost4: 7 times
      ost5: 10 times
      ost6: 10 times
      ost7: 12 times
      Status: FAIL: rc=1
      

      The output of "tar" operation on client-31vm6 showed that:

      tar: etc/chef/solo.rb: Cannot open: Input/output error
      tar: etc/chef/client.rb: Cannot open: Input/output error
      tar: etc/prelink.cache: Cannot open: Input/output error
      tar: etc/readahead.conf: Cannot open: Input/output error
      tar: etc/localtime: Cannot open: Input/output error
      tar: Exiting with failure status due to previous errors
      

      Dmesg on the MDS node (client-31vm3) showed:

      Lustre: DEBUG MARKER: Starting failover on ost2
      Lustre: 7396:0:(client.c:1529:ptlrpc_expire_one_request()) @@@ Request x1426822752375573 sent from lustre-OST0000-osc to NID 10.10.4.195@tcp 7s ago has timed out (7s prior to deadline).
        req@ffff810048893000 x1426822752375573/t0 o13->lustre-OST0000_UUID@10.10.4.195@tcp:7/4 lens 192/528 e 0 to 1 dl 1360781077 ref 1 fl Rpc:N/0/0 rc 0/0
      Lustre: 7396:0:(client.c:1529:ptlrpc_expire_one_request()) Skipped 117 previous similar messages
      Lustre: lustre-OST0000-osc: Connection to service lustre-OST0000 via nid 10.10.4.195@tcp was lost; in progress operations using this service will wait for recovery to complete.
      Lustre: lustre-OST0005-osc: Connection to service lustre-OST0005 via nid 10.10.4.195@tcp was lost; in progress operations using this service will wait for recovery to complete.
      Lustre: Skipped 4 previous similar messages
      Lustre: lustre-OST0006-osc: Connection to service lustre-OST0006 via nid 10.10.4.195@tcp was lost; in progress operations using this service will wait for recovery to complete.
      Lustre: 7398:0:(import.c:517:import_select_connection()) lustre-OST0000-osc: tried all connections, increasing latency to 2s
      Lustre: 7398:0:(import.c:517:import_select_connection()) Skipped 59 previous similar messages
      LustreError: 7396:0:(lov_request.c:694:lov_update_create_set()) error creating fid 0x80013 sub-object on OST idx 0/1: rc = -11
      LustreError: 7725:0:(lov_request.c:694:lov_update_create_set()) error creating fid 0x80013 sub-object on OST idx 0/1: rc = -5
      LustreError: 7725:0:(mds_open.c:442:mds_create_objects()) error creating objects for inode 524307: rc = -5
      LustreError: 7725:0:(mds_open.c:827:mds_finish_open()) mds_create_objects: rc = -5
      LustreError: 7396:0:(lov_request.c:694:lov_update_create_set()) error creating fid 0x80013 sub-object on OST idx 1/1: rc = -11
      LustreError: 7750:0:(mds_open.c:442:mds_create_objects()) error creating objects for inode 524307: rc = -5
      LustreError: 7750:0:(mds_open.c:827:mds_finish_open()) mds_create_objects: rc = -5
      Lustre: MGS: haven't heard from client 71727980-7899-77af-8af0-a42b0349985a (at 10.10.4.195@tcp) in 49 seconds. I think it's dead, and I am evicting it.
      LustreError: 7396:0:(lov_request.c:694:lov_update_create_set()) error creating fid 0x80013 sub-object on OST idx 2/1: rc = -11
      LustreError: 7396:0:(lov_request.c:694:lov_update_create_set()) Skipped 1 previous similar message
      LustreError: 7745:0:(mds_open.c:442:mds_create_objects()) error creating objects for inode 524307: rc = -5
      LustreError: 7745:0:(mds_open.c:827:mds_finish_open()) mds_create_objects: rc = -5
      LustreError: 7396:0:(lov_request.c:694:lov_update_create_set()) error creating fid 0x80013 sub-object on OST idx 3/1: rc = -11
      LustreError: 7396:0:(lov_request.c:694:lov_update_create_set()) Skipped 1 previous similar message
      LustreError: 7739:0:(mds_open.c:442:mds_create_objects()) error creating objects for inode 524307: rc = -5
      LustreError: 7739:0:(mds_open.c:827:mds_finish_open()) mds_create_objects: rc = -5
      LustreError: 7396:0:(lov_request.c:694:lov_update_create_set()) error creating fid 0x80013 sub-object on OST idx 4/1: rc = -11
      LustreError: 7396:0:(lov_request.c:694:lov_update_create_set()) Skipped 1 previous similar message
      LustreError: 7731:0:(mds_open.c:442:mds_create_objects()) error creating objects for inode 524307: rc = -5
      LustreError: 7731:0:(mds_open.c:827:mds_finish_open()) mds_create_objects: rc = -5
      Lustre: 7397:0:(quota_master.c:1724:mds_quota_recovery()) Only 6/7 OSTs are active, abort quota recovery
      Lustre: 7397:0:(quota_master.c:1724:mds_quota_recovery()) Skipped 6 previous similar messages
      Lustre: lustre-OST0000-osc: Connection restored to service lustre-OST0000 using nid 10.10.4.191@tcp.
      Lustre: Skipped 6 previous similar messages
      Lustre: MDS lustre-MDT0000: lustre-OST0000_UUID now active, resetting orphans
      Lustre: Skipped 6 previous similar messages
      LustreError: 7396:0:(lov_request.c:694:lov_update_create_set()) error creating fid 0x80013 sub-object on OST idx 5/1: rc = -11
      LustreError: 7396:0:(lov_request.c:694:lov_update_create_set()) Skipped 1 previous similar message
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark ==== Checking the clients loads AFTER failover -- failure NOT OK
      Lustre: DEBUG MARKER: ==== Checking the clients loads AFTER failover -- failure NOT OK
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Client load failed on node client-31vm6, rc=1
      

      Maloo report: https://maloo.whamcloud.com/test_sets/40f45b4c-760f-11e2-b5e2-52540035b04c

      Attachments

        Activity

          [LU-2824] recovery-mds-scale test_failover_ost: tar: etc/localtime: Cannot open: Input/output error
          nozaki Hiroya Nozaki (Inactive) added a comment (edited)

          Hi all, sorry to jump in suddenly, but I'd like to add a few comments.

          1) In b2_1, IMHO, we should handle the -ENODEV case in the same way as the -EBUSY case, because when obd_stopping has already been set, mdtlov might already have been released (a sketch of this idea follows the two listings below).

          "__mds_lov_synchronize()"
          static int __mds_lov_synchronize(void *data)
          {
                  struct mds_lov_sync_info *mlsi = data;
                  struct obd_device *obd = mlsi->mlsi_obd;
                  struct obd_device *watched = mlsi->mlsi_watched;
                  struct mds_obd *mds = &obd->u.mds;
                  struct obd_uuid *uuid;
                  __u32  idx = mlsi->mlsi_index;
                  struct mds_group_info mgi;
                  struct llog_ctxt *ctxt;
                  int rc = 0;
                  ENTRY;
          
                  OBD_FREE_PTR(mlsi);
          
                  LASSERT(obd);
                  LASSERT(watched);
                  uuid = &watched->u.cli.cl_target_uuid;
                  LASSERT(uuid);
          
                  cfs_down_read(&mds->mds_notify_lock);          ///////////////////
                  if (obd->obd_stopping || obd->obd_fail) <----- // when the case //
                          GOTO(out, rc = -ENODEV);               ///////////////////
          
                  ....
          
                  EXIT;
          out:
                  cfs_up_read(&mds->mds_notify_lock);
                  if (rc) {
                          /* Deactivate it for safety */
                          CERROR("%s sync failed %d, deactivating\n", obd_uuid2str(uuid),
                                 rc);                                          ///////////////////////////////////////////////////////////////////////////////
                  if (!obd->obd_stopping && mds->mds_lov_obd && <----- // there must be an address, but we don't know whether it's been freed or not //
                              !mds->mds_lov_obd->obd_stopping && !watched->obd_stopping)
                                  obd_notify(mds->mds_lov_obd, watched,
                                             OBD_NOTIFY_INACTIVE, NULL);
                  }
          
                  class_decref(obd, "mds_lov_synchronize", obd);
                  return rc;
          }
          
          "mds_precleanup"
          static int mds_precleanup(struct obd_device *obd, enum obd_cleanup_stage stage)
          {
                  struct mds_obd *mds = &obd->u.mds;
                  struct llog_ctxt *ctxt;
                  int rc = 0;
                  ENTRY;
          
                  switch (stage) {
                  case OBD_CLEANUP_EARLY:
                          break;
                  case OBD_CLEANUP_EXPORTS:
                          mds_lov_early_clean(obd);
                          cfs_down_write(&mds->mds_notify_lock);
                          mds_lov_disconnect(obd);
                          mds_lov_clean(obd);
                          ctxt = llog_get_context(obd, LLOG_CONFIG_ORIG_CTXT);
                          if (ctxt)
                                  llog_cleanup(ctxt);
                          ctxt = llog_get_context(obd, LLOG_LOVEA_ORIG_CTXT);
                          if (ctxt)
                                  llog_cleanup(ctxt);
                          rc = obd_llog_finish(obd, 0);
                          mds->mds_lov_exp = NULL;
                          cfs_up_write(&mds->mds_notify_lock);
                          break;
                  }
                  RETURN(rc);
          }
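
          A minimal, hypothetical sketch of the idea in 1): treat -ENODEV from the obd_stopping/obd_fail check the same way as -EBUSY, i.e. skip the "deactivate for safety" notification when the device is already being torn down. This is only an illustration based on the listing above, not the actual b2_1 patch:

          if (rc) {
                  if (rc == -EBUSY || rc == -ENODEV) {
                          /* Shutdown, or a sync already in progress: mdtlov may
                           * already be released, so do not touch mds->mds_lov_obd
                           * and do not deactivate the OSC. */
                          CDEBUG(D_HA, "%s sync skipped: rc = %d\n",
                                 obd_uuid2str(uuid), rc);
                  } else if (!obd->obd_stopping && mds->mds_lov_obd &&
                             !mds->mds_lov_obd->obd_stopping &&
                             !watched->obd_stopping) {
                          /* Deactivate it for safety */
                          CERROR("%s sync failed %d, deactivating\n",
                                 obd_uuid2str(uuid), rc);
                          obd_notify(mds->mds_lov_obd, watched,
                                     OBD_NOTIFY_INACTIVE, NULL);
                  }
          }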
          

          2) I'm not entirely sure, but the backported patch of LU-1291 (http://review.whamcloud.com/#change,2708) looks necessary to me.
          3) Don't we need the mds_notify_lock logic for b1_8 too?

          Anyway, I'll be glad if you find my comments helpful.


          niu Niu Yawei (Inactive) added a comment -

          Peter, we need to port it to b2_1. Master has a different orphan cleanup mechanism, so it doesn't need such a fix.

          patch for b2_1: http://review.whamcloud.com/5462
          pjones Peter Jones added a comment -

          Niu

          Do we need this same change on master and/or b2_1?

          Peter

          yujian Jian Yu added a comment -

          Hi Oleg,

          Sorry for the confusion, the original issue in this ticket occurred after failing over an OST, and the MDS dmesg log showed:

          LustreError: 7396:0:(lov_request.c:694:lov_update_create_set()) error creating fid 0x80013 sub-object on OST idx 0/1: rc = -11
          LustreError: 7725:0:(lov_request.c:694:lov_update_create_set()) error creating fid 0x80013 sub-object on OST idx 0/1: rc = -5
          LustreError: 7725:0:(mds_open.c:442:mds_create_objects()) error creating objects for inode 524307: rc = -5
          LustreError: 7725:0:(mds_open.c:827:mds_finish_open()) mds_create_objects: rc = -5
          

          Maloo report: https://maloo.whamcloud.com/test_sets/40f45b4c-760f-11e2-b5e2-52540035b04c

          So, it seems the original issue is a duplicate of LU-1475.

          The second issue in this ticket occurred after failing over the MDS, and the MDS dmesg log showed:

          LustreError: 3182:0:(lov_obd.c:1153:lov_clear_orphans()) error in orphan recovery on OST idx 1/7: rc = -16
          LustreError: 2768:0:(llog_server.c:466:llog_origin_handle_cancel()) Cancel 1 of 2 llog-records failed: -2
          LustreError: 3182:0:(mds_lov.c:1057:__mds_lov_synchronize()) lustre-OST0001_UUID failed at mds_lov_clear_orphans: -16
          LustreError: 3182:0:(mds_lov.c:1066:__mds_lov_synchronize()) lustre-OST0001_UUID sync failed -16, deactivating
          Lustre: Service thread pid 2786 completed after 102.31s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
          LustreError: 2793:0:(mds_open.c:442:mds_create_objects()) error creating objects for inode 2622037: rc = -5
          LustreError: 2793:0:(mds_open.c:827:mds_finish_open()) mds_create_objects: rc = -5
          

          Maloo report: https://maloo.whamcloud.com/test_sets/61cf03e0-76c7-11e2-bc2f-52540035b04c

          For the second issue, as you suggested, we need to figure out why the OST failed to clear orphans. After looking into the console and debug logs on the OSS node, I did not see any failure messages. However, on the MDS, I found:

          Lustre: Service thread pid 2786 was inactive for 60.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
          Pid: 2786, comm: ll_mdt_01
          
          Call Trace:
           [<ffffffff8b03f220>] lustre_pack_request+0x630/0x6f0 [ptlrpc]
           [<ffffffff8006389f>] schedule_timeout+0x8a/0xad
           [<ffffffff8009a950>] process_timeout+0x0/0x5
           [<ffffffff8b105bd5>] osc_create+0xc75/0x13d0 [osc]
           [<ffffffff8005c3be>] cache_alloc_refill+0x108/0x188
           [<ffffffff8008f372>] default_wake_function+0x0/0xe
           [<ffffffff8b1b4edb>] qos_remedy_create+0x45b/0x570 [lov]
           [<ffffffff8002e4db>] __wake_up+0x38/0x4f
           [<ffffffff8008eb6b>] dequeue_task+0x18/0x37
           [<ffffffff8b1aedf3>] lov_fini_create_set+0x243/0x11e0 [lov]
           [<ffffffff8b1a2b72>] lov_create+0x1552/0x1860 [lov]
           [<ffffffff8b2952e5>] ldiskfs_mark_iloc_dirty+0x4a5/0x540 [ldiskfs]
           [<ffffffff8008f372>] default_wake_function+0x0/0xe
           [<ffffffff8b38eb8a>] mds_finish_open+0x1fea/0x43e0 [mds]
           [<ffffffff80019d5a>] __getblk+0x25/0x22c
           [<ffffffff8b28c16b>] __ldiskfs_handle_dirty_metadata+0xdb/0x110 [ldiskfs]
           [<ffffffff8b2952e5>] ldiskfs_mark_iloc_dirty+0x4a5/0x540 [ldiskfs]
           [<ffffffff8b295b07>] ldiskfs_mark_inode_dirty+0x187/0x1e0 [ldiskfs]
           [<ffffffff80063af9>] mutex_lock+0xd/0x1d
           [<ffffffff8b395e11>] mds_open+0x2f01/0x386b [mds]
           [<ffffffff8b03d5f1>] lustre_swab_buf+0x81/0x170 [ptlrpc]
           [<ffffffff8000d585>] dput+0x2c/0x114
           [<ffffffff8b36c0c5>] mds_reint_rec+0x365/0x550 [mds]
           [<ffffffff8b396d3e>] mds_update_unpack+0x1fe/0x280 [mds]
           [<ffffffff8b35eeda>] mds_reint+0x35a/0x420 [mds]
           [<ffffffff8b35ddea>] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
           [<ffffffff8b368c0e>] mds_intent_policy+0x49e/0xc10 [mds]
           [<ffffffff8affe270>] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
           [<ffffffff8affbeb6>] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
           [<ffffffff8aff87fd>] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
           [<ffffffff8b020870>] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
           [<ffffffff8b01db29>] ldlm_handle_enqueue+0xbf9/0x1210 [ptlrpc]
           [<ffffffff8b367b40>] mds_handle+0x40e0/0x4d10 [mds]
           [<ffffffff8aed5868>] libcfs_ip_addr2str+0x38/0x40 [libcfs]
           [<ffffffff8aed5c7e>] libcfs_nid2str+0xbe/0x110 [libcfs]
           [<ffffffff8b048af5>] ptlrpc_server_log_handling_request+0x105/0x130 [ptlrpc]
           [<ffffffff8b04b874>] ptlrpc_server_handle_request+0x984/0xe00 [ptlrpc]
           [<ffffffff8b04bfd5>] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
           [<ffffffff8008d7a6>] __wake_up_common+0x3e/0x68
           [<ffffffff8b04cf16>] ptlrpc_main+0xf16/0x10e0 [ptlrpc]
           [<ffffffff8005dfc1>] child_rip+0xa/0x11
          

          And the debug log on the MDS showed:

          00000004:02000000:0:1360747003.206214:0:3136:0:(mds_lov.c:1049:__mds_lov_synchronize()) MDS lustre-MDT0000: lustre-OST0000_UUID now active, resetting orphans
          00020000:00080000:0:1360747003.206224:0:3136:0:(lov_obd.c:1116:lov_clear_orphans()) clearing orphans only for lustre-OST0000_UUID
          00020000:01000000:0:1360747003.206225:0:3136:0:(lov_obd.c:1141:lov_clear_orphans()) Clear orphans for 0:lustre-OST0000_UUID
          00000008:00080000:0:1360747003.206229:0:3136:0:(osc_create.c:556:osc_create()) lustre-OST0000-osc: oscc recovery started - delete to 23015
          00000008:00080000:0:1360747003.206238:0:3136:0:(osc_request.c:406:osc_real_create()) @@@ delorphan from OST integration  req@ffff810068000c00 x1426846474323651/t0 o5->lustre-OST0000_UUID@10.10.4.199@tcp:28/4 lens 400/592 e 0 to 1 dl 0 ref 1 fl New:/0/0 rc 0/0
          ......
          00000004:02000000:0:1360747003.206849:0:3181:0:(mds_lov.c:1049:__mds_lov_synchronize()) MDS lustre-MDT0000: lustre-OST0000_UUID now active, resetting orphans
          00020000:00080000:0:1360747003.206852:0:3181:0:(lov_obd.c:1116:lov_clear_orphans()) clearing orphans only for lustre-OST0000_UUID
          00020000:01000000:0:1360747003.206853:0:3181:0:(lov_obd.c:1141:lov_clear_orphans()) Clear orphans for 0:lustre-OST0000_UUID
          00020000:00020000:0:1360747003.206855:0:3181:0:(lov_obd.c:1153:lov_clear_orphans()) error in orphan recovery on OST idx 0/7: rc = -16
          00000004:00020000:0:1360747003.207975:0:3181:0:(mds_lov.c:1057:__mds_lov_synchronize()) lustre-OST0000_UUID failed at mds_lov_clear_orphans: -16
          00000004:00020000:0:1360747003.209032:0:3181:0:(mds_lov.c:1066:__mds_lov_synchronize()) lustre-OST0000_UUID sync failed -16, deactivating
          00020000:01000000:0:1360747003.209906:0:3181:0:(lov_obd.c:589:lov_set_osc_active()) Marking OSC lustre-OST0000_UUID inactive
          

          The clear-orphans operation was performed twice for the same OST, and in the second run osc_create() returned -EBUSY because the OSCC_FLAG_SYNC_IN_PROGRESS flag was still set by the slow ll_mdt service thread. __mds_lov_synchronize() then deactivated the OSC. Other OSCs hit the same issue and were also deactivated.
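
          For reference, a simplified, hypothetical sketch of the kind of guard in osc_create() that produces the -EBUSY above (assuming oscc_flags is protected by an oscc_lock spinlock, as in the osc create-recovery code; the exact b1_8 code may differ):

          spin_lock(&oscc->oscc_lock);
          if (oscc->oscc_flags & OSCC_FLAG_SYNC_IN_PROGRESS) {
                  /* Another thread (here, the slow ll_mdt one) is still clearing
                   * orphans for this OST: bail out instead of issuing a second
                   * delorphan request.  The -EBUSY propagates up through
                   * lov_clear_orphans() to __mds_lov_synchronize(). */
                  spin_unlock(&oscc->oscc_lock);
                  return -EBUSY;
          }
          oscc->oscc_flags |= OSCC_FLAG_SYNC_IN_PROGRESS;
          spin_unlock(&oscc->oscc_lock);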

          Niu proposed a patch to fix this issue in http://review.whamcloud.com/5450. Could you please inspect it? Thanks.

          green Oleg Drokin added a comment -

          This seems to be a very similar problem, though unlike LU-1475, here the MDS gets an error from the OST (in clear orphans) that prompts it to deactivate the export, which leads to the object creation failures, whereas in LU-1475 the failure is sporadic, without any real error visible other than the OST restarting.

          So for this bug, I imagine somebody needs to figure out why the OST failed to clear orphans as requested.

          yujian Jian Yu added a comment -

          Hi Oleg,

          Is the original issue in this ticket a duplicate of LU-1475?

          yujian Jian Yu added a comment -

          Lustre Tag: v1_8_9_WC1_RC1
          Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/256
          Distro/Arch: RHEL5.9/x86_64
          Network: TCP (1GigE)
          ENABLE_QUOTA=yes
          FAILURE_MODE=HARD

          While running recovery-mds-scale test_failover_mds, it also failed with a similar issue after running for 6 hours (the MDS failed over 26 times).

          The output of "tar" operation on client node (client-32vm6) showed that:

          <~snip~>
          tar: etc/default/nss: Cannot open: Input/output error
          tar: etc/default/useradd: Cannot open: Input/output error
          tar: Error exit delayed from previous errors
          

          Dmesg on the MDS node (client-32vm3) showed:

          Lustre: lustre-OST0000-osc: Connection restored to service lustre-OST0000 using nid 10.10.4.199@tcp.
          Lustre: lustre-OST0002-osc: Connection restored to service lustre-OST0002 using nid 10.10.4.199@tcp.
          Lustre: Skipped 1 previous similar message
          Lustre: MDS lustre-MDT0000: lustre-OST0000_UUID now active, resetting orphans
          LustreError: 3181:0:(lov_obd.c:1153:lov_clear_orphans()) error in orphan recovery on OST idx 0/7: rc = -16
          LustreError: 3181:0:(mds_lov.c:1057:__mds_lov_synchronize()) lustre-OST0000_UUID failed at mds_lov_clear_orphans: -16
          LustreError: 3181:0:(mds_lov.c:1066:__mds_lov_synchronize()) lustre-OST0000_UUID sync failed -16, deactivating
          Lustre: MDS lustre-MDT0000: lustre-OST0001_UUID now active, resetting orphans
          Lustre: Skipped 2 previous similar messages
          LustreError: 3182:0:(lov_obd.c:1153:lov_clear_orphans()) error in orphan recovery on OST idx 1/7: rc = -16
          LustreError: 2768:0:(llog_server.c:466:llog_origin_handle_cancel()) Cancel 1 of 2 llog-records failed: -2
          LustreError: 3182:0:(mds_lov.c:1057:__mds_lov_synchronize()) lustre-OST0001_UUID failed at mds_lov_clear_orphans: -16
          LustreError: 3182:0:(mds_lov.c:1066:__mds_lov_synchronize()) lustre-OST0001_UUID sync failed -16, deactivating
          Lustre: Service thread pid 2786 completed after 102.31s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
          LustreError: 2793:0:(mds_open.c:442:mds_create_objects()) error creating objects for inode 2622037: rc = -5
          LustreError: 2793:0:(mds_open.c:827:mds_finish_open()) mds_create_objects: rc = -5
          LustreError: 2794:0:(mds_open.c:442:mds_create_objects()) error creating objects for inode 2622037: rc = -5
          LustreError: 2794:0:(mds_open.c:827:mds_finish_open()) mds_create_objects: rc = -5
          LustreError: 2792:0:(mds_open.c:442:mds_create_objects()) error creating objects for inode 2622052: rc = -5
          LustreError: 2792:0:(mds_open.c:442:mds_create_objects()) Skipped 581 previous similar messages
          LustreError: 2792:0:(mds_open.c:827:mds_finish_open()) mds_create_objects: rc = -5
          LustreError: 2792:0:(mds_open.c:827:mds_finish_open()) Skipped 581 previous similar messages
          LustreError: 2769:0:(llog_server.c:466:llog_origin_handle_cancel()) Cancel 2 of 2 llog-records failed: -2
          LustreError: 2769:0:(llog_server.c:466:llog_origin_handle_cancel()) Cancel 59 of 122 llog-records failed: -2
          LustreError: 2769:0:(llog_server.c:466:llog_origin_handle_cancel()) Skipped 6 previous similar messages
          LustreError: 3189:0:(llog_server.c:466:llog_origin_handle_cancel()) Cancel 51 of 122 llog-records failed: -2
          

          Maloo report: https://maloo.whamcloud.com/test_sets/61cf03e0-76c7-11e2-bc2f-52540035b04c

          yujian Jian Yu added a comment -

          We used to hit LU-463 while running recovery-mds-scale test_failover_ost on the Lustre b1_8 build.


          People

            Assignee: niu Niu Yawei (Inactive)
            Reporter: yujian Jian Yu
            Votes: 0
            Watchers: 9

            Dates

              Created:
              Updated:
              Resolved: