
ASSERTION(atomic_read(&client_stat->nid_exp_ref_count) == 0) failed: count 1

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.1.0, Lustre 1.8.6
    • Fix Version/s: Lustre 1.8.6
    • Labels: None
    • Environment: lustre 1.8.3.0-6chaos
    • Severity: 3
    • Bugzilla ID: 23499
    • 5093

    Description

      A sysadmin was shutting down an MDS node cleanly in preparation for scheduled upgrades. During the umount of the MGS device, we hit the following assertion:

      LustreError ... (lprocfs_status.c:1060:lprocfs_free_client_stats())
      ASSERTION(atomic_read(&client_stat->nid_exp_ref_count) == 0) failed: count 1

      And the stack trace was:

      :obdclass:lprocfs_free_client_stats
      :obdclass:lprocfs_free_per_client_stats
      :mgs:lproc_mgs_cleanup
      :mgs:mgs_cleanup
      :obdclass:class_decref
      :obdclass:class_export_destroy
      :obdclass:obd_zombie_impexp_cull
      :obdclass:obd_zombie_impexp_thread

      We have seen this same assertion from OSTs as well. Some investigation was done in bug 23499, but there is not yet a solution.
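      The failing check can be sketched as a small standalone model (illustrative names only, not actual Lustre source): each per-NID stats entry carries a count of the exports still referencing it, and the cleanup path asserts that the count has reached zero before freeing the entry.

      ```c
      /* Minimal model of the failing check, not Lustre source: a per-NID
       * stats entry holds a reference count of exports still pointing at
       * it; cleanup asserts the count has dropped to zero. */
      #include <assert.h>
      #include <stdio.h>

      struct nid_stat {
          const char *nid;
          int nid_exp_ref_count;   /* stands in for the kernel atomic_t */
      };

      static void free_client_stat(struct nid_stat *stat)
      {
          /* Mirrors ASSERTION(atomic_read(&client_stat->nid_exp_ref_count) == 0) */
          assert(stat->nid_exp_ref_count == 0 && "count must be 0 at cleanup");
          printf("freed stats for %s\n", stat->nid);
      }

      int main(void)
      {
          struct nid_stat stat = { "192.168.1.5@tcp", 1 };
          stat.nid_exp_ref_count--;    /* export disconnect drops its reference */
          free_client_stat(&stat);     /* ok: count is 0, entry can be freed */
          return 0;
      }
      ```

      In the crash above, one export's reference was never dropped, so the count was still 1 when the umount path reached this check.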

      Attachments

        Activity


          hudson Build Master (Inactive) added a comment -

          Integrated in lustre-master-centos5 #159
          LU-39 ASSERTION(atomic_read(&client_stat->nid_exp_ref_count) == 0)

          Oleg Drokin : 2a6045403fbd46bb6501df907f0321f5401924ba
          Files :

          • lustre/mdt/mdt_fs.c
          • lustre/mgs/mgs_internal.h
          • lustre/mdt/mdt_internal.h
          • lustre/include/lprocfs_status.h
          • lustre/mgs/mgs_fs.c
          • lustre/mgs/mgs_handler.c
          • lustre/obdfilter/filter.c
          • lustre/obdclass/lprocfs_status.c
          • lustre/mdt/mdt_handler.c

          hudson Build Master (Inactive) added a comment -

          Integrated in reviews-centos5 #529
          LU-39 ASSERTION(atomic_read(&client_stat->nid_exp_ref_count) == 0)

          Bobi Jam : 8efdff9aeb1933f9b1e7320ff48ad84983e4daa3
          Files :

          • lustre/mgs/mgs_fs.c
          • lustre/mdt/mdt_fs.c
          • lustre/mdt/mdt_internal.h
          • lustre/include/lprocfs_status.h
          • lustre/mgs/mgs_handler.c
          • lustre/obdfilter/filter.c
          • lustre/obdclass/lprocfs_status.c
          • lustre/mdt/mdt_handler.c
          • lustre/mgs/mgs_internal.h
          bobijam Zhenyu Xu added a comment -

          HEAD version patch.


          liang Liang Zhen (Inactive) added a comment -

          Bobi, I think we want this patch for 2.x as well; I took a quick look and found the issue is also present in 2.x.

          Thanks
          bobijam Zhenyu Xu added a comment -

          yes, that could be the case.


          marc@llnl.gov D. Marc Stearman (Inactive) added a comment -

          The OSS nodes and the MDS have multiple NIDs, while the clients have only a single NID that they use to talk to the servers. Could this be caused by the MDS losing its connection to the OSS nodes and reconnecting using a different NID? The MDS would behave like a client in this case.
          bobijam Zhenyu Xu added a comment - edited

          Christopher,

          Do some of the clients use multiple NIDs to connect to the servers?

          If a client is configured with multiple NIDs and the connected NID encounters a problem, the client reconnects to the server with a new NID, but lprocfs_exp_setup() misses releasing the old NID's stats refcount.
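          The leak described above can be sketched in a few lines of standalone C (hypothetical names, not the real lprocfs_exp_setup()): on reconnect the export takes a reference on the new NID's stats entry, and unless the old entry's reference is also dropped, cleanup later finds a nonzero count.

          ```c
          /* Hypothetical sketch of the described leak, not actual Lustre
           * code: an export that reconnects from a new NID takes a new
           * stats reference, but the buggy path never drops the old one. */
          #include <stdio.h>

          struct nid_stat { const char *nid; int ref; };
          struct export   { struct nid_stat *stat; };

          static void exp_setup(struct export *exp, struct nid_stat *ns, int fixed)
          {
              if (fixed && exp->stat)
                  exp->stat->ref--;   /* the fix: release the old NID's stats ref */
              ns->ref++;              /* take a reference for this export */
              exp->stat = ns;
          }

          int main(void)
          {
              struct nid_stat old = { "10.0.0.1@o2ib", 0 };
              struct nid_stat cur = { "10.0.0.1@tcp",  0 };
              struct export exp = { 0 };

              exp_setup(&exp, &old, 0);
              exp_setup(&exp, &cur, 0);   /* buggy reconnect: old.ref stays 1 */
              printf("buggy: old ref = %d\n", old.ref);

              old.ref = 0; cur.ref = 0; exp.stat = 0;
              exp_setup(&exp, &old, 1);
              exp_setup(&exp, &cur, 1);   /* fixed reconnect: old.ref drops to 0 */
              printf("fixed: old ref = %d\n", old.ref);
              return 0;
          }
          ```

          With the leak, the stale entry's count is still 1 when cleanup runs, which matches the "count 1" in the assertion message.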

          People

            Assignee: bobijam Zhenyu Xu
            Reporter: morrone Christopher Morrone (Inactive)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: