Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5010

LBUG at osc_io.c:834:osc_req_attr_set())

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.6.0
    • None
    • None
    • 3
    • 13869

    Description

      The client node crashed with the following stack trace:

      May  5 06:13:59 iwc321 kernel: LustreError: 32047:0:(osc_io.c:834:osc_req_attr_set()) LBUG
      2014-05-05 06:17:19 INFO: task ior:32047 blocked for more than 120 seconds.
      2014-05-05 06:17:19       Not tainted 2.6.32-431.5.1.el6.x86_64 #1
      2014-05-05 06:17:19 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      2014-05-05 06:17:19 ior           D 0000000000000006     0 32047  32038 0x00000080
      2014-05-05 06:17:19  ffff8801b06a95c8 0000000000000082 0000000000000000 ffff8801bd6c5540
      2014-05-05 06:17:19  ffff8801bd6c5540 ffffffffa0d4d160 ffff8801b06a9588 ffff8801b0791068
      2014-05-05 06:17:19  ffff8801bd6c5af8 ffff8801b06a9fd8 000000000000fbc8 ffff8801bd6c5af8
      2014-05-05 06:17:19 Call Trace:
      2014-05-05 06:17:19  [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
      2014-05-05 06:17:19  [<ffffffffa03acecd>] lbug_with_loc+0x7d/0xb0 [libcfs] 
      2014-05-05 06:17:19  [<ffffffffa0d241bc>] osc_req_attr_set+0x57c/0x5b0 [osc]
      2014-05-05 06:17:19  [<ffffffffa03b7328>] ? libcfs_log_return+0x28/0x40 [libcfs]
      2014-05-05 06:17:19  [<ffffffffa0524ef1>] cl_req_attr_set+0xd1/0x230 [obdclass]
      2014-05-05 06:17:19  [<ffffffffa0d14ec7>] osc_build_rpc+0x487/0x1850 [osc]
      2014-05-05 06:17:19  [<ffffffffa03bc911>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      2014-05-05 06:17:19  [<ffffffffa0d304c7>] osc_io_unplug0+0x1257/0x1f00 [osc]
      2014-05-05 06:17:19  [<ffffffffa0d32ef1>] osc_io_unplug+0x11/0x20 [osc]
      2014-05-05 06:17:19  [<ffffffffa0d330cd>] osc_queue_sync_pages+0x1cd/0x350 [osc]
      2014-05-05 06:17:19  [<ffffffffa0d2477f>] osc_io_submit+0x33f/0x4b0 [osc]
      2014-05-05 06:17:20  [<ffffffffa052472e>] cl_io_submit_rw+0x6e/0x160 [obdclass]
      2014-05-05 06:17:20  [<ffffffffa0db80f9>] lov_io_submit+0x189/0x4b0 [lov]
      2014-05-05 06:17:20  [<ffffffffa052472e>] cl_io_submit_rw+0x6e/0x160 [obdclass]
      2014-05-05 06:17:20  [<ffffffffa0526dc0>] cl_io_read_page+0x180/0x190 [obdclass]
      2014-05-05 06:17:20  [<ffffffffa0e72f81>] ll_readpage+0x91/0x1a0 [lustre] 
      2014-05-05 06:17:20  [<ffffffff811212fc>] generic_file_aio_read+0x1fc/0x700
      2014-05-05 06:17:20  [<ffffffffa0ea4079>] vvp_io_read_start+0x259/0x470 [lustre]
      2014-05-05 06:17:20  [<ffffffffa05249ba>] cl_io_start+0x6a/0x140 [obdclass]
      2014-05-05 06:17:20  [<ffffffffa0528b44>] cl_io_loop+0xb4/0x1b0 [obdclass]
      2014-05-05 06:17:20  [<ffffffffa0e45f06>] ll_file_io_generic+0x2b6/0x710 [lustre]
      2014-05-05 06:17:20  [<ffffffffa05188f9>] ? cl_env_get+0x29/0x350 [obdclass]
      2014-05-05 06:17:20  [<ffffffffa0e46faf>] ll_file_aio_read+0x13f/0x2c0 [lustre]
      2014-05-05 06:17:20  [<ffffffffa0e4729c>] ll_file_read+0x16c/0x2a0 [lustre]
      2014-05-05 06:17:20  [<ffffffff81189695>] vfs_read+0xb5/0x1a0
      2014-05-05 06:17:20  [<ffffffff811897d1>] sys_read+0x51/0x90
      2014-05-05 06:17:20  [<ffffffff810e1e4e>] ? __audit_syscall_exit+0x25e/0x290
      2014-05-05 06:17:20  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      

      This bug is introduced by the commit: http://review.whamcloud.com/8523 where read ahead lock matching process was optimized. However, it turns out there exists a race with lock cancellation.

      As a workaround fix, I'm going to disable the optimization and will come up with a fix later, after 2.6 is released.

      Attachments

        Activity

          People

            jay Jinshan Xiong (Inactive)
            jay Jinshan Xiong (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: