Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.2.0, Lustre 2.1.2
    • Affects Version/s: Lustre 2.2.0
    • None
    • 3
    • 4739

    Description

      Each time the racer test is run against the newest Lustre code from master, the MDS eventually oopses.

      Attachments

        1. barry-all.sh
          5 kB
          James A Simmons

        Issue Links

          Activity

            [LU-1017] MDS oops when running racer test
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.1.2 [ 10111 ]
            bogl Bob Glossman (Inactive) added a comment - http://review.whamcloud.com/#change,2629 back port to b2_1

            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-master » i686,client,el6,ofa #480
            LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

            Result = ABORTED
            Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
            Files :

            • lustre/obdclass/lu_object.c

            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-master » x86_64,client,el6,ofa #480
            LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

            Result = FAILURE
            Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
            Files :

            • lustre/obdclass/lu_object.c

            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-master » x86_64,server,el6,ofa #480
            LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

            Result = FAILURE
            Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
            Files :

            • lustre/obdclass/lu_object.c
            tappro Mikhail Pershin added a comment - - edited

            This patch overlaps with the previous commit b9ccecd1453c5c76fe135048c39f149c241650c6 (LU-1013 obdclass: lu_object_find miss to unlink object from LRU).

            It seems Fan Yong was trying to solve the same issue, but his change covers only a single case and does not handle the -EAGAIN case. This is his change:

            @@ -627,12 +627,14 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env,
                             bkt->lsb_busy++;
                             cfs_hash_bd_unlock(hs, &bd, 1);
                             return o;
            +        } else {
            +                if (!cfs_list_empty(&shadow->lo_header->loh_lru))
            +                        cfs_list_del_init(&shadow->lo_header->loh_lru);
            +                lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_RACE);
            +                cfs_hash_bd_unlock(hs, &bd, 1);
            +                lu_object_free(env, o);
            +                return shadow;
                     }
            -
            -        lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_RACE);
            -        cfs_hash_bd_unlock(hs, &bd, 1);
            -        lu_object_free(env, o);
            -        return shadow;
             }
            

            Now we have other code using the result of htable_lookup() without checking for -EAGAIN. Meanwhile, commit b9ccecd1453c5c76fe135048c39f149c241650c6 is not needed at all after the LU-1017 fix, because the latter does list_del_init inside htable_lookup(), so we can just revert it to solve this issue.
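
            To illustrate the point about checking the lookup result, here is a minimal user-space sketch, not the Lustre code. It assumes the lookup reports a dying object as an error-encoded pointer carrying -EAGAIN (in the style of the kernel's ERR_PTR helpers) and unlinks a live object from the LRU inside the lookup itself. All names (toy_object, toy_lookup, toy_reap, toy_find, err_ptr, ptr_err, is_err) are hypothetical stand-ins.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for kernel-style ERR_PTR helpers. */
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int is_err(const void *p)
{
        /* The top 4095 address values are reserved for error codes. */
        return (uintptr_t)p >= (uintptr_t)-4095;
}

/* Toy cached object; not a Lustre structure. */
struct toy_object {
        int id;      /* lookup key; -1 marks a free slot */
        int dying;   /* object is being torn down */
        int on_lru;  /* object currently sits on the LRU list */
};

/*
 * Look up an object by id. Returns NULL when absent, an error pointer
 * carrying -EAGAIN when the match is dying, and otherwise the object,
 * unlinked from the LRU before it is handed back.
 */
static struct toy_object *toy_lookup(struct toy_object *tbl, size_t n, int id)
{
        for (size_t i = 0; i < n; i++) {
                if (tbl[i].id != id)
                        continue;
                if (tbl[i].dying)
                        return err_ptr(-EAGAIN);
                tbl[i].on_lru = 0;
                return &tbl[i];
        }
        return NULL;
}

/* Simulate teardown of a dying object completing: free its slot. */
static void toy_reap(struct toy_object *tbl, size_t n, int id)
{
        for (size_t i = 0; i < n; i++)
                if (tbl[i].id == id && tbl[i].dying)
                        tbl[i].id = -1;
}

/*
 * Caller that checks for -EAGAIN instead of using the raw lookup result:
 * on -EAGAIN it lets the dying object go away and retries the lookup.
 */
static struct toy_object *toy_find(struct toy_object *tbl, size_t n,
                                   int id, int max_tries)
{
        for (int attempt = 0; attempt < max_tries; attempt++) {
                struct toy_object *o = toy_lookup(tbl, n, id);

                if (!is_err(o) || ptr_err(o) != -EAGAIN)
                        return o;       /* found, absent, or a hard error */
                toy_reap(tbl, n, id);   /* stand-in for waiting on teardown */
        }
        return err_ptr(-EAGAIN);
}
```

            In this sketch a caller that dereferenced the result of toy_lookup() directly would follow an error-encoded pointer whenever it raced with a dying object, which is the kind of unchecked use described above. Doing the LRU unlink inside the lookup means no caller needs its own list_del_init-style step.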


            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-master » i686,client,el5,ofa #468
            LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

            Result = SUCCESS
            Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
            Files :

            • lustre/obdclass/lu_object.c

            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-master » i686,server,el5,ofa #468
            LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

            Result = SUCCESS
            Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
            Files :

            • lustre/obdclass/lu_object.c

            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-master » i686,server,el5,inkernel #468
            LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

            Result = SUCCESS
            Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
            Files :

            • lustre/obdclass/lu_object.c

            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-master » i686,client,el5,inkernel #468
            LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

            Result = SUCCESS
            Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
            Files :

            • lustre/obdclass/lu_object.c

            People

              niu Niu Yawei (Inactive)
              simmonsja James A Simmons