LU-2702: client read RPCs do not generate read requests in brw_stats
Parent task: LU-1431 Support for larger than 1MB sequential I/O RPCs


Details

    • Type: Technical task
    • Resolution: Not a Bug
    • Priority: Minor
    • Fix Version/s: Lustre 2.4.0

    Description

      It appears that there is some kind of bug with the client page cache and DLM locking, where pages are left in cache when the DLM lock is cleared. This can be easily demonstrated with a simple test:

      [root@sookie-gig tests]# dd if=/dev/zero of=/mnt/testfs/f2 bs=1M count=10
      10+0 records in
      10+0 records out
      10485760 bytes (10 MB) copied, 0.643432 s, 16.3 MB/s
      [root@sookie-gig tests]# lctl set_param ldlm.namespaces.*OST*.lru_size=0
      ldlm.namespaces.testfs-OST0000-osc-ffff88009ee56800.lru_size=0
      ldlm.namespaces.testfs-OST0001-osc-ffff88009ee56800.lru_size=0
      ldlm.namespaces.testfs-OST0002-osc-ffff88009ee56800.lru_size=0
      [root@sookie-gig tests]# wc -l /proc/fs/lustre/llite/*/dump_page_cache
      0 /proc/fs/lustre/llite/testfs-ffff88009ee56800/dump_page_cache
      [root@sookie-gig tests]# lctl set_param osd-ldiskfs.*.brw_stats=0
      osd-ldiskfs.testfs-OST0000.brw_stats=0
      osd-ldiskfs.testfs-OST0001.brw_stats=0
      osd-ldiskfs.testfs-OST0002.brw_stats=0
      [root@sookie-gig tests]# wc -l /proc/fs/lustre/llite/*/dump_page_cache
      0 /proc/fs/lustre/llite/testfs-ffff88009ee56800/dump_page_cache
      [root@sookie-gig tests]# dd of=/dev/null if=/mnt/testfs/f0 bs=1M count=10
      10+0 records in
      10+0 records out
      10485760 bytes (10 MB) copied, 0.55208 s, 19.0 MB/s
      [root@sookie-gig tests]# ../utils/lctl get_param osd-ldiskfs.*OST0002*.brw_stats | less
      osd-ldiskfs.testfs-OST0002.brw_stats=
      snapshot_time:         1359442889.629725 (secs.usecs)
                                 read      |     write
      pages per bulk r/w     rpcs  % cum % |  rpcs        % cum %
      

      This looks like no read RPC is being sent to the OST, even though all of the DLM locks have been dropped on the client and no pages are being tracked by Lustre. However, after dropping the VM cache the reads do show up in brw_stats on the OST:

      [root@sookie-gig tests]# echo 3 > /proc/sys/vm/drop_caches
      [root@sookie-gig tests]# dd of=/dev/null if=/mnt/testfs/f0 bs=1M count=10
      10+0 records in
      10+0 records out
      10485760 bytes (10 MB) copied, 0.698268 s, 15.0 MB/s
      [root@sookie-gig tests]# ../utils/lctl get_param osd-ldiskfs.*OST0002*.brw_stats | less
      osd-ldiskfs.testfs-OST0002.brw_stats=
      snapshot_time:         1359442909.926812 (secs.usecs)
                                 read      |     write
      pages per bulk r/w     rpcs  % cum % |  rpcs        % cum %
      256:                    10 100 100   |    0   0   0
      

      However, it appears that this is an artifact of having a single-node test system with client and OST on the same node:

      [root@sookie-gig tests]# lctl get_param osc.*OST0002*.stats
      osc.testfs-OST0002-osc-MDT0000.stats=
      snapshot_time             1359443359.105016 secs.usecs
      req_waittime              7 samples [usec] 823 1171 7129 7374769
      req_active                7 samples [reqs] 1 1 7 7
      ost_statfs                7 samples [usec] 823 1171 7129 7374769
      osc.testfs-OST0002-osc-ffff88009ee56800.stats=
      snapshot_time             1359443359.105890 secs.usecs
      req_waittime              18 samples [usec] 1870 26512 171590 2515558672
      req_active                18 samples [reqs] 1 2 22 30
      read_bytes                10 samples [bytes] 1048576 1048576 10485760 10995116277760
      ost_read                  10 samples [usec] 11644 26512 149316 2452393930
      obd_ping                  7 samples [usec] 2640 3084 20404 59667842
      [root@sookie-gig tests]# lctl get_param ost.OSS.ost_io.stats 
      ost.OSS.ost_io.stats=
      snapshot_time             1359443418.884993 secs.usecs
      req_waittime              10 samples [usec] 372 547 4714 2246654
      req_qdepth                10 samples [reqs] 0 0 0 0
      req_active                10 samples [reqs] 1 1 10 10
      req_timeout               10 samples [sec] 1 1 10 10
      reqbuf_avail              22 samples [bufs] 63 64 1404 89604
      ost_read                  10 samples [usec] 10818 17798 131256 1806684314
      

      This shows that the client did send read RPCs to the OST. The culprit is that the OST has a read cache that satisfies the read RPCs without submitting an I/O request to the OSD, so nothing is recorded in brw_stats. I think the easiest way to handle this is to check the ost_read and ost_write RPC counts in the client osc.*.stats instead of getting them from the OST. That also avoids the need for drop_caches, which would not fix the test when the OST is on a remote node, as is normal during our testing.
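      A minimal sketch of that check, run on the client (the helper name is just illustrative; it parses the osc.*.stats format shown above, where the ost_read line reports its RPC count in the second column, and ost_write could be counted the same way):

      # illustrative test helper: sum the read RPCs this client has sent
      count_client_read_rpcs() {
              lctl get_param -n "osc.*OST*.stats" 2>/dev/null |
                      awk '$1 == "ost_read" { total += $2 } END { print total + 0 }'
      }

      before=$(count_client_read_rpcs)
      dd of=/dev/null if=/mnt/testfs/f0 bs=1M count=10
      after=$(count_client_read_rpcs)
      echo "read RPCs sent by the client: $((after - before))"

      Because the counters are taken on the client side, the OSS read cache does not affect them, so the check behaves the same whether the OST is local or on a remote node.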

          People

            Assignee: Andreas Dilger (adilger)
            Reporter: Andreas Dilger (adilger)