  Lustre / LU-5471

Assertion at cl_page_assume for HSM


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • None
    • None
    • None
    • 3
    • 15242

    Description

      The stack trace is as follows:

      LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) page@ffff88062850a7e0[2 ffff880bf2500e18 0 1834972009 1 ffff880633900065 0000000000000003 0x0]
      LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) vvp-page@ffff88062850a880(0:0:0) vm@ffffea00154cfbd8 40000000000801 4:0 ffff88062850a7e0 0 lru
      LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) lov-page@ffff88062850a8d8, raid0
      LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) osc-page@ffff88062851a700 0: 1< 0x845fed 0 1 - - > 2< 0 0 0 0x10000 0x100 | 0001000000000000 ffff880c090f3620 ffff88063267cea8 > 3< - (null) 0 18446462603027842665 0 > 4< 0 0 8 10481664 - | - - - - > 5< - - - - | 0 - | 0 - ->
      LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) end page@ffff88062850a7e0
      LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) pg->cp_owner == NULL
      LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) ASSERTION( 0 ) failed:
      Aug 5 13:23:35 LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) LBUG
      Pid: 9876, comm: cat
      Call Trace:
      [<ffffffffa0839895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [<ffffffffa0839e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      [<ffffffffa0999002>] cl_page_assume+0x212/0x220 [obdclass]
      [<ffffffffa0f03f73>] ll_readpage+0x83/0x1a0 [lustre]
      [<ffffffff81120f6c>] generic_file_aio_read+0x1fc/0x700
      [<ffffffffa0f2a48f>] ? cl_glimpse_lock+0x1df/0x490 [lustre]
      [<ffffffffa0f34f99>] vvp_io_read_start+0x259/0x470 [lustre]
      [<ffffffffa09a199a>] cl_io_start+0x6a/0x140 [obdclass]
      [<ffffffffa09a5b24>] cl_io_loop+0xb4/0x1b0 [obdclass]
      [<ffffffffa0ed6f06>] ll_file_io_generic+0x2b6/0x710 [lustre]
      [<ffffffffa0ed7faf>] ll_file_aio_read+0x13f/0x2c0 [lustre]
      [<ffffffffa0ed829c>] ll_file_read+0x16c/0x2a0 [lustre]
      [<ffffffff81189365>] vfs_read+0xb5/0x1a0
      [<ffffffff811894a1>] sys_read+0x51/0x90
      [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      

      The root cause is that the page size must be adjusted after a file loses its layout; otherwise it keeps increasing and eventually overflows the u16 integer that records the current page size.

      This issue can be easily reproduced by releasing and restoring an HSM-managed file in a loop.


          People

            Assignee: jay Jinshan Xiong (Inactive)
            Reporter: jay Jinshan Xiong (Inactive)