
[LU-328] OSS pseudo-hang due to (struct filter_obd *)->fo_llog_list_lock deadlock during OST warm restart/recovery

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.1.0
    • Lustre 2.0.0, Lustre 2.1.0
    • None
    • Bull PetaFlopV1.0AE1, kernel 2.6.32-71.14.1.el6, lustre 2.0.0.1
    • 3
    • 5005

    Description

      During a "warm" restart (shine stop/start) of several OSTs, the OSS became pseudo-hung (it still answered ping, but no login was possible).

      Analysis of a forced crash dump shows that all 32 CPUs/cores are busy spinning in 32 ll_ost_<xx> threads, all with the following stack trace:
      =======================================================
      _spin_lock()
      filter_sync_llogs()
      filter_set_info_async()
      target_handle_connect()
      ost_handle()
      ptlrpc_server_handle_request()
      ptlrpc_main()
      kernel_thread()
      =======================================================

      while the thread that owns the (struct filter_obd *)->fo_llog_list_lock spin lock in question is sleeping, waiting to be rescheduled, with the following stack trace:
      ==================================================================================
      schedule()
      __cond_resched()
      _cond_resched()
      __kmalloc()
      cfs_alloc()
      filter_find_create_olg()
      filter_set_info_async()
      ost_set_info()
      ost_handle()
      ptlrpc_server_handle_request()
      ptlrpc_main()
      kernel_thread()
      ==================================================================================

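      This definitely points to a bug in filter_find_create_olg(): when filter_find_olg_internal() does not find a matching struct obd_llog_group, a new one is allocated with kmalloc while (struct filter_obd *)->fo_llog_list_lock is still held. __kmalloc() may reschedule (via _cond_resched()), and while the lock holder is off-CPU, the ll_ost threads spinning on the same lock in filter_sync_llogs() occupy all 32 CPUs, so the holder never runs again to release it. The allocation can, and must, be done with fo_llog_list_lock released to avoid this race and deadlock.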

      This means that the following extract from the filter_find_create_olg() routine:
      ==================================================================================
      OBD_ALLOC_PTR(olg);
      if (olg == NULL)
              GOTO(out_unlock, olg = ERR_PTR(-ENOMEM));

      llog_group_init(olg, group);
      ==================================================================================

      should be replaced with:
      =========================
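      /* OBD_ALLOC_PTR() may sleep, so drop the spin lock around it */
      spin_unlock(&filter->fo_llog_list_lock);
      OBD_ALLOC_PTR(olg);
      if (olg == NULL)
              /* lock is not held here, so jump past out_unlock */
              GOTO(out, olg = ERR_PTR(-ENOMEM));

      llog_group_init(olg, group);
      spin_lock(&filter->fo_llog_list_lock);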
      =========================
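The extract above releases fo_llog_list_lock only around the allocation. As far as the extract shows, one point to watch with this change is that, while the lock is released, another ll_ost thread can create the same group, so the lookup is usually repeated once the lock has been re-taken. The following is a minimal user-space sketch of that find-or-create pattern, not the patch that was landed: pthread_spinlock_t stands in for fo_llog_list_lock, malloc() stands in for OBD_ALLOC_PTR(), and every type and function name (struct group_list, find_create_group(), run_concurrent_lookups(), and so on) is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L    /* for pthread_spin_* */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct llog_group {                /* hypothetical, cf. struct obd_llog_group */
        int group;
        struct llog_group *next;
};

struct group_list {                /* hypothetical, cf. struct filter_obd */
        pthread_spinlock_t lock;   /* plays the role of fo_llog_list_lock */
        struct llog_group *head;
        int count;                 /* groups inserted so far */
};

void group_list_init(struct group_list *gl)
{
        pthread_spin_init(&gl->lock, PTHREAD_PROCESS_PRIVATE);
        gl->head = NULL;
        gl->count = 0;
}

void group_list_destroy(struct group_list *gl)
{
        while (gl->head != NULL) {
                struct llog_group *next = gl->head->next;
                free(gl->head);
                gl->head = next;
        }
        pthread_spin_destroy(&gl->lock);
}

/* Caller must hold gl->lock (cf. filter_find_olg_internal()). */
static struct llog_group *find_locked(struct group_list *gl, int group)
{
        for (struct llog_group *g = gl->head; g != NULL; g = g->next)
                if (g->group == group)
                        return g;
        return NULL;
}

/* Returns the existing or new group, or NULL if allocation fails. */
struct llog_group *find_create_group(struct group_list *gl, int group)
{
        struct llog_group *g, *new_g;
        assert(gl != NULL);
        pthread_spin_lock(&gl->lock);
        g = find_locked(gl, group);
        pthread_spin_unlock(&gl->lock);
        if (g != NULL)
                return g;

        /* Allocate with the lock released (in the kernel, this may sleep). */
        new_g = malloc(sizeof(*new_g));
        if (new_g == NULL)
                return NULL;
        new_g->group = group;

        pthread_spin_lock(&gl->lock);
        g = find_locked(gl, group);    /* re-check: another thread may have won */
        if (g == NULL) {
                new_g->next = gl->head;
                gl->head = new_g;
                gl->count++;
                g = new_g;
                new_g = NULL;          /* ownership moved to the list */
        }
        pthread_spin_unlock(&gl->lock);
        free(new_g);                   /* no-op unless we lost the race */
        return g;
}

struct worker_arg { struct group_list *gl; int group; struct llog_group *res; };
static void *worker(void *p)
{
        struct worker_arg *a = p;
        a->res = find_create_group(a->gl, a->group);
        return NULL;
}

/* Up to 64 threads look up one group; returns 0 if all got the same object. */
int run_concurrent_lookups(struct group_list *gl, int group, int nthreads)
{
        pthread_t tids[64];
        struct worker_arg args[64];
        int i, created = 0;
        for (i = 0; i < nthreads && i < 64; i++) {
                args[i] = (struct worker_arg){ gl, group, NULL };
                if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0)
                        break;
                created++;
        }
        for (i = 0; i < created; i++)
                pthread_join(tids[i], NULL);
        for (i = 0; i < created; i++)
                if (args[i].res == NULL || args[i].res != args[0].res)
                        return -1;
        return created > 0 ? 0 : -1;
}
```

A caller initializes a struct group_list with group_list_init() and may then call find_create_group() from any number of threads; run_concurrent_lookups() loosely mimics the 32 ll_ost threads in the report racing on one group and checks that they all receive the same object. Discarding the losing allocation costs one extra malloc()/free() pair in the rare race, which is cheap compared with holding a spin lock across an allocation that can sleep.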

        Activity

          Peter Jones added a comment:

          Landed for 2.1

          hudson Build Master (Inactive) added one comment per build, reporting the change as integrated in lustre-master #149 for each of the following configurations:

          • x86_64,server,el5,ofa
          • x86_64,client,el5,ofa
          • i686,server,el6,inkernel
          • x86_64,server,el5,inkernel
          • x86_64,client,el5,inkernel
          • i686,client,el6,inkernel
          • i686,server,el5,ofa
          • i686,server,el5,inkernel
          • x86_64,client,ubuntu1004,ofa

          LU-328 unlock fo_llog_list_lock before allocating memory

          Oleg Drokin : 799ed89ac8633b17f631cc13a3fb6fd90d8e958c
          Files :

          • lustre/obdfilter/filter.c

          People

            Assignee: Hongchao Zhang
            Reporter: Alexandre Louvet (Inactive)
            Votes: 0
            Watchers: 1
