Lustre / LU-13476

vvp_set_pagevec_dirty / vvp_page_completion_write lock ordering appears to trigger RCU stalls


Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.14.0

    Description

      vvp_set_pagevec_dirty

      locks mapping->page_tree (the tree_lock) and iterates over pages; for each page:
      lock_page_memcg
      <work>
      unlock_page_memcg

      vvp_page_completion_write

      calls lock_page_memcg
      and then locks mapping->page_tree

      The two paths take the same pair of locks in opposite order. This can cause extended lock hold times or a deadlock against vvp_set_pagevec_dirty.
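
The ordering problem described above can be illustrated with a small userspace sketch. This is not Lustre or kernel code: the `order_tracker` type, the `LOCK_TREE`/`LOCK_MEMCG` names, and the `model_*` functions are hypothetical stand-ins that only record the acquisition order stated in this description, in the spirit of a very reduced lock-order checker.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes standing in for the two kernel locks. */
enum lock_class { LOCK_TREE = 0, LOCK_MEMCG = 1, LOCK_NCLASSES };

struct order_tracker {
    bool held[LOCK_NCLASSES];
    /* seen[a][b] is true once b has been acquired while a was held. */
    bool seen[LOCK_NCLASSES][LOCK_NCLASSES];
};

/* Record an acquisition. Returns true if it inverts an order seen earlier. */
static bool tracker_acquire(struct order_tracker *t, enum lock_class c)
{
    bool inverted = false;

    for (int h = 0; h < LOCK_NCLASSES; h++) {
        if (!t->held[h] || h == (int)c)
            continue;
        if (t->seen[c][h])      /* earlier path took c first, then h */
            inverted = true;
        t->seen[h][c] = true;   /* remember: h held, then c taken */
    }
    t->held[c] = true;
    return inverted;
}

static void tracker_release(struct order_tracker *t, enum lock_class c)
{
    assert(t->held[c]);         /* releasing a lock that is not held is a bug */
    t->held[c] = false;
}

/* Models vvp_set_pagevec_dirty as described: tree lock, then memcg per page. */
static bool model_set_pagevec_dirty(struct order_tracker *t, int npages)
{
    bool inverted = tracker_acquire(t, LOCK_TREE);

    for (int i = 0; i < npages; i++) {
        inverted |= tracker_acquire(t, LOCK_MEMCG);
        /* <work> */
        tracker_release(t, LOCK_MEMCG);
    }
    tracker_release(t, LOCK_TREE);
    return inverted;
}

/* Models vvp_page_completion_write as described: memcg, then tree lock. */
static bool model_page_completion_write(struct order_tracker *t)
{
    bool inverted = tracker_acquire(t, LOCK_MEMCG);

    inverted |= tracker_acquire(t, LOCK_TREE);
    tracker_release(t, LOCK_TREE);
    tracker_release(t, LOCK_MEMCG);
    return inverted;
}
```

Running `model_set_pagevec_dirty` and then `model_page_completion_write` against the same zero-initialized tracker makes the second call report an inversion, since it takes the two lock classes in the opposite order from the first. Either path run alone reports none.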

      One core is spinning here:

      #0 [ffffc9000b97fa78] _raw_spin_lock_irqsave at ffffffff815b65e9
          /home/abuild/rpmbuild/BUILD/kernel-cray_ari_c-4.12.14/linux-4.12.14/linux-obj/../kernel/locking/spinlock.c: 160
      #1 [ffffc9000b97fa98] lock_page_memcg at ffffffff811d7a89
          /home/abuild/rpmbuild/BUILD/kernel-cray_ari_c-4.12.14/linux-4.12.14/linux-obj/../mm/memcontrol.c: 1695
      #2 [ffffc9000b97fac0] test_clear_page_writeback at ffffffff8117c479
          /home/abuild/rpmbuild/BUILD/kernel-cray_ari_c-4.12.14/linux-4.12.14/linux-obj/../mm/page-writeback.c: 2780
      #3 [ffffc9000b97fb10] end_page_writeback at ffffffff8116a657
          /home/abuild/rpmbuild/BUILD/kernel-cray_ari_c-4.12.14/linux-4.12.14/linux-obj/../mm/filemap.c: 1273
      #4 [ffffc9000b97fb28] vvp_page_completion_write at ffffffffa0816341 [lustre]
          /home/abuild/rpmbuild/BUILD/cray-lustre-2.12.0.5_cray_290_gdd6781b/lustre/llite/vvp_page.c: 316
      #5 [ffffc9000b97fb58] cl_page_completion at ffffffffa0504663 [obdclass]
          /home/abuild/rpmbuild/BUILD/cray-lustre-2.12.0.5_cray_290_gdd6781b/lustre/obdclass/cl_page.c: 931
      

      With many cores spinning here:

        _raw_spin_lock_irqsave+0x39/0x50
        vvp_set_pagevec_dirty+0x97/0x3a0 [lustre]
        write_commit_callback+0x64/0x1a0 [lustre]
        osc_queue_async_io+0x910/0x18e0 [osc]
        ? vvp_set_pagevec_dirty+0x3a0/0x3a0 [lustre]
        ? vvp_set_pagevec_dirty+0x3a0/0x3a0 [lustre]
        osc_page_cache_add+0x5f/0x180 [osc]
        osc_io_commit_async+0x2a0/0x500 [osc]
        ? vvp_set_pagevec_dirty+0x3a0/0x3a0 [lustre]
        ? vvp_set_pagevec_dirty+0x3a0/0x3a0 [lustre]
        cl_io_commit_async+0xa9/0x150 [obdclass]
        ? vvp_set_pagevec_dirty+0x3a0/0x3a0 [lustre]
        lov_io_commit_async+0x106/0x580 [lov]
        ? vvp_set_pagevec_dirty+0x3a0/0x3a0 [lustre]
        ? vvp_set_pagevec_dirty+0x3a0/0x3a0 [lustre]
        cl_io_commit_async+0xa9/0x150 [obdclass]
        vvp_io_write_commit+0x157/0x5e0 [lustre]
        vvp_io_write_start+0x6ac/0x8b0 [lustre]
        cl_io_start+0x6e/0x120 [obdclass]
        cl_io_loop+0xca/0x1c0 [obdclass]
        ll_file_io_generic+0x3c9/0xdd0 [lustre]
        ll_file_write_iter+0x124/0x630 [lustre]
      

      This would explain the RCU stall: the two paths take the memcg lock and the tree_lock/i_pages lock in opposite order.

          People

            stancheff Shaun Tancheff