  1. Lustre
  2. LU-5263

ll_read_ahead_pages() ASSERTION( page_idx > ria->ria_stoff )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.7.0, Lustre 2.5.3
    • Affects Version/s: Lustre 2.1.6
    • Labels: None
    • Environment: RHEL 6 w/ kernel 2.6.32_220.23.1
    • Severity: 3
    • Rank: 14689

    Description

      One Lustre client frequently crashes with an LBUG on ASSERTION( page_idx > ria->ria_stoff ) (13 crashes in the past 3 months).

      This Lustre client acts as an NFS server and exports Lustre to a web server through nfs-ganesha.

      ----8< ----
      [24937.600920] Lustre: DEBUG MARKER: Thu Jun 5 20:00:01 2014
      [24937.600921]
      [24950.667750] LustreError: 4126:0:(rw.c:698:ll_read_ahead_pages()) ASSERTION( page_idx > ria->ria_stoff ) failed: Invalid page_idx 234497rs 234497 re 300287 ro 234751 rl 256 rp 1
      [24950.683642] LustreError: 4126:0:(rw.c:698:ll_read_ahead_pages()) LBUG
      [24950.690154] Pid: 4126, comm: ganesha.nfsd
      [24950.695572]
      [24950.695573] Call Trace:
      [24950.702337] [<ffffffffa05697f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [24950.710703] [<ffffffffa0569e07>] lbug_with_loc+0x47/0xb0 [libcfs]
      [24950.718317] [<ffffffffa0bb511f>] ll_readahead+0x10cf/0x1100 [lustre]
      [24950.726131] [<ffffffffa0bdc805>] vvp_io_read_page+0x305/0x360 [lustre]
      [24950.734159] [<ffffffffa068eb4d>] cl_io_read_page+0x8d/0x170 [obdclass]
      [24950.742108] [<ffffffffa0682c19>] ? cl_page_assume+0xf9/0x2d0 [obdclass]
      [24950.750166] [<ffffffffa0bb5746>] ll_readpage+0x96/0x200 [lustre]
      [24950.757661] [<ffffffff810ff88c>] generic_file_aio_read+0x1fc/0x700
      [24950.765360] [<ffffffff810816ff>] ? up+0x2f/0x50
      [24950.771433] [<ffffffffa0bdce1b>] vvp_io_read_start+0x13b/0x3e0 [lustre]
      [24950.779575] [<ffffffffa068cb4a>] cl_io_start+0x6a/0x140 [obdclass]
      [24950.787244] [<ffffffffa0690e2c>] cl_io_loop+0xcc/0x190 [obdclass]
      [24950.794848] [<ffffffffa0b8d097>] ll_file_io_generic+0x3a7/0x560 [lustre]
      [24950.803059] [<ffffffffa0b8d389>] ll_file_aio_read+0x139/0x2c0 [lustre]
      [24950.811092] [<ffffffffa0b8d849>] ll_file_read+0x169/0x2a0 [lustre]
      [24950.818784] [<ffffffff81164525>] vfs_read+0xb5/0x1a0
      [24950.825275] [<ffffffff81164852>] sys_pread64+0x82/0xa0
      [24950.831917] [<ffffffff810030f2>] system_call_fastpath+0x16/0x1b
      [24950.839353]
      [24950.842659] Kernel panic - not syncing: LBUG
      [24950.847986] Pid: 4126, comm: ganesha.nfsd Not tainted 2.6.32-220.23.1.bl6.Bull.28.10.x86_64 #1
      [24950.858073] Call Trace:
      [24950.861907] [<ffffffff814851a0>] ? panic+0x78/0x143
      [24950.868308] [<ffffffffa0569e5b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      [24950.876096] [<ffffffffa0bb511f>] ? ll_readahead+0x10cf/0x1100 [lustre]
      [24950.884141] [<ffffffffa0bdc805>] ? vvp_io_read_page+0x305/0x360 [lustre]
      [24950.892371] [<ffffffffa068eb4d>] ? cl_io_read_page+0x8d/0x170 [obdclass]
      [24950.900562] [<ffffffffa0682c19>] ? cl_page_assume+0xf9/0x2d0 [obdclass]
      [24950.908676] [<ffffffffa0bb5746>] ? ll_readpage+0x96/0x200 [lustre]
      [24950.916357] [<ffffffff810ff88c>] ? generic_file_aio_read+0x1fc/0x700
      [24950.924221] [<ffffffff810816ff>] ? up+0x2f/0x50
      [24950.930283] [<ffffffffa0bdce1b>] ? vvp_io_read_start+0x13b/0x3e0 [lustre]
      [24950.938586] [<ffffffffa068cb4a>] ? cl_io_start+0x6a/0x140 [obdclass]
      [24950.946448] [<ffffffffa0690e2c>] ? cl_io_loop+0xcc/0x190 [obdclass]
      [24950.954218] [<ffffffffa0b8d097>] ? ll_file_io_generic+0x3a7/0x560 [lustre]
      [24950.962601] [<ffffffffa0b8d389>] ? ll_file_aio_read+0x139/0x2c0 [lustre]
      [24950.970813] [<ffffffffa0b8d849>] ? ll_file_read+0x169/0x2a0 [lustre]
      [24950.978649] [<ffffffff81164525>] ? vfs_read+0xb5/0x1a0
      [24950.985288] [<ffffffff81164852>] ? sys_pread64+0x82/0xa0
      [24950.992092] [<ffffffff810030f2>] ? system_call_fastpath+0x16/0x1b
      ----8< ----
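To make the failed check easier to read, here is a minimal Python sketch that pulls the numbers out of the assertion message in the log above and re-evaluates the condition. It assumes that `ro` is the printed value of `ria->ria_stoff` (the field named in the assertion) and that `rs` is the start of the read-ahead window; that mapping is an assumption based on the abbreviations, not something stated in this ticket. The helper name `parse_assertion` is hypothetical.

```python
import re

# Assertion message copied from the console log above.
MESSAGE = ("Invalid page_idx 234497rs 234497 re 300287 "
           "ro 234751 rl 256 rp 1")


def parse_assertion(message):
    """Return the numeric fields of the assertion message as a dict."""
    # page_idx is printed with no space before "rs", so "234497rs"
    # holds the end of one field and the label of the next.
    fields = {"page_idx": int(re.search(r"page_idx (\d+)", message).group(1))}
    for key, value in re.findall(r"(rs|re|ro|rl|rp) (\d+)", message):
        fields[key] = int(value)
    return fields


fields = parse_assertion(MESSAGE)

# Assumption: "ro" is ria->ria_stoff, the right-hand side of the assertion.
assertion_holds = fields["page_idx"] > fields["ro"]
# How far page_idx sits below the assumed ria_stoff value.
gap = fields["ro"] - fields["page_idx"]

print(fields)
print("page_idx > ro:", assertion_holds)
print("ro - page_idx:", gap)
print("page_idx == rs:", fields["page_idx"] == fields["rs"])
```

Under that assumption, page_idx (234497) equals rs but lies below ro (234751), which is why the check fails in this crash.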

      We asked the customer to add the read-ahead flag to the debug mask (lctl set_param debug=+reada).

      The debug log is available in the attached support bundle (from crash 2014-06-05-20:01:29).

      Read-ahead settings:
      ----8< ----
      # lctl get_param llite.*.max_read_ahead*
      llite.store1-ffff88120ee93c00.max_read_ahead_mb=40
      llite.store1-ffff88120ee93c00.max_read_ahead_per_file_mb=40
      llite.store1-ffff88120ee93c00.max_read_ahead_whole_mb=2
      ----8< ----
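For comparing these limits with the page indices in the assertion message, the settings can be converted from megabytes to pages. The following is a rough sketch, assuming 4 KiB pages (the usual page size on x86_64, the architecture shown in the panic message); the helper names `parse_params` and `mb_to_pages` are hypothetical and not part of any Lustre tool.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on this x86_64 client

# Output of the lctl get_param command, copied from the ticket.
GET_PARAM_OUTPUT = """\
llite.store1-ffff88120ee93c00.max_read_ahead_mb=40
llite.store1-ffff88120ee93c00.max_read_ahead_per_file_mb=40
llite.store1-ffff88120ee93c00.max_read_ahead_whole_mb=2
"""


def parse_params(text):
    """Map the last component of each parameter name to its integer value."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip blank lines and anything that is not name=value
        name, value = line.split("=", 1)
        params[name.rsplit(".", 1)[-1]] = int(value)
    return params


def mb_to_pages(megabytes, page_size=PAGE_SIZE):
    """Convert a size in MiB to a number of pages."""
    return megabytes * 1024 * 1024 // page_size


params = parse_params(GET_PARAM_OUTPUT)
for name, megabytes in params.items():
    print(f"{name}: {megabytes} MB = {mb_to_pages(megabytes)} pages")
```

With these assumptions, the 40 MB limits correspond to 10240 pages and the 2 MB whole-file limit to 512 pages.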

      We asked the customer to check the web server logs to see which files were being accessed at the time of each crash. It is never the same file.

      This looks similar to LU-4192.

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: bruno.travouillon Bruno Travouillon (Inactive)