  1. Lustre
  2. LU-8936

Client LBUGs with cl_object.c:735:cl_env_attach() ASSERTION( rc == 0 ) in process ldlm_bl_02


Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • lustre-2.8.0_5.chaos-2.ch6.x86_64
      kernel 3.10.0-514.0.0.1chaos.ch6.x86_64
    • 3

    Description

      The console first reports this:

      LustreError: 7526:0:(cl_object.c:735:cl_env_attach()) ASSERTION( rc == 0 ) failed:
      LustreError: 7526:0:(cl_object.c:735:cl_env_attach()) LBUG
      Pid: 7526, comm: ldlm_bl_02
      
      Call Trace:
       libcfs_debug_dumpstack+0x53/0x80 [libcfs]
       lbug_with_loc+0x45/0xc0 [libcfs]
       cl_env_percpu_get+0xc2/0xd0 [obdclass]
       ll_invalidatepage+0x41/0x170 [lustre]
       vvp_page_discard+0xbd/0x160 [lustre]
       cl_page_invoid+0x68/0x170 [obdclass]
       cl_page_discard+0x13/0x20 [obdclass]
       discard_cb+0x67/0x190 [osc]
       osc_page_gang_lookup+0x1e0/0x320 [osc]
       ? discard_cb+0x0/0x190 [osc]
       osc_lock_discard_pages+0x119/0x22d [osc]
       ? discard_cb+0x0/0x190 [osc]
       osc_lock_flush+0x89/0x280 [osc]
       osc_ldlm_blocking_ast+0x2e3/0x3a0 [osc]
       ldlm_cancel_callback+0x8a/0x2e0 [ptlrpc]
       ? dequeue_entity+0x11c/0x5d0
       ldlm_cli_cancel_local+0xa0/0x420 [ptlrpc]
       ldlm_cli_cancel+0xab/0x3d0 [ptlrpc]
       osc_ldlm_blocking_ast+0x17a/0x3a0 [osc]
       ? __schedule+0x3b8/0x9c0
       ldlm_handle_bl_callback+0xcf/0x410 [ptlrpc]
       ldlm_bl_thread_main+0x531/0x700 [ptlrpc]
       ? default_wake_function+0x0/0x20
       ? ldlm_bl_thread_main+0x0/0x700 [ptlrpc]
       kthread+0xcf/0xe0
       ? kthread+0x0/0xe0
       ret_from_fork+0x58/0x90
       ? kthread+0x0/0xe0
      

      This is followed by:

      BUG: sleeping function called from invalid context at mm/slub.c:941
      in_atomic(): 1, irqs_disabled(): 0, pid: 7526, name: ldlm_bl_02
      CPU: 23 PID: 7526 Comm: ldlm_bl_02 Tainted: G           OE  ------------   3.10.0-514.0.0.1chaos.ch6.x86_64 #1
      Call Trace:
       dump_stack+0x19/0x1b
       __might_sleep+0xd9/0x100
       kmem_cache_alloc_trace+0x4a/0x250
       ? call_usermodehelper_setup+0x3f/0xa0
       call_usermodehelper_setup+0x3f/0xa0
       call_usermodehelper+0x31/0x60
       libcfs_run_upcall+0x9e/0x3b0 [libcfs]
       ? snprintf+0x49/0x70
       libcfs_run_lbug_upcall+0x7d/0x100 [libcfs]
       lbug_with_loc+0x57/0xc0 [libcfs]
       cl_env_percpu_get+0xc2/0xd0 [obdclass]
       ll_invalidatepage+0x41/0x170 [lustre]
       vvp_page_discard+0xbd/0x160 [lustre]
       cl_page_invoid+0x68/0x170 [obdclass]
       cl_page_discard+0x13/0x20 [obdclass]
       discard_cb+0x67/0x190 [osc]
       osc_page_gang_lookup+0x1e0/0x320 [osc]
       ? check_and_discard_cb+0x150/0x150 [osc]
       osc_lock_discard_pages+0x119/0x22d [osc]
       ? check_and_discard_cb+0x150/0x150 [osc]
       osc_lock_flush+0x89/0x280 [osc]
       osc_ldlm_blocking_ast+0x2e3/0x3a0 [osc]
       ldlm_cancel_callback+0x8a/0x2e0 [ptlrpc]
       ? dequeue_entity+0x11c/0x5d0
       ldlm_cli_cancel_local+0xa0/0x420 [ptlrpc]
       ldlm_cli_cancel+0xab/0x3d0 [ptlrpc]
       osc_ldlm_blocking_ast+0x17a/0x3a0 [osc]
       ? __schedule+0x3b8/0x9c0
       ldlm_handle_bl_callback+0xcf/0x410 [ptlrpc]
       ldlm_bl_thread_main+0x531/0x700 [ptlrpc]
       ? wake_up_state+0x20/0x20
       ? ldlm_handle_bl_callback+0x410/0x410 [ptlrpc]
       kthread+0xcf/0xe0
       ? kthread_create_on_node+0x140/0x140
       ret_from_fork+0x58/0x90
       ? kthread_create_on_node+0x140/0x140
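
      When comparing console traces like the two above across many clients, it can help to reduce each one to its reliable frames (the lines not prefixed with `?`). The following is only a minimal sketch for that; the helper name `reliable_frames` and the regular expression are assumptions made for illustration, not part of Lustre or any kernel tooling, and the sample input is an excerpt of the first trace.

```python
import re

# One frame looks like "lbug_with_loc+0x45/0xc0 [libcfs]".
# A leading "?" marks a frame the kernel unwinder is not sure about.
# The trailing "[module]" is absent for functions built into the kernel.
FRAME_RE = re.compile(
    r"^\s*(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?\s*$"
)


def reliable_frames(trace_text):
    """Return (function, module) pairs for frames not marked with '?'.

    module is None when the frame has no "[module]" suffix.
    """
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line)
        if match and not match.group(1):
            frames.append((match.group(2), match.group(3)))
    return frames


# Excerpt of the first trace in this report.
sample = """
Call Trace:
 libcfs_debug_dumpstack+0x53/0x80 [libcfs]
 lbug_with_loc+0x45/0xc0 [libcfs]
 cl_env_percpu_get+0xc2/0xd0 [obdclass]
 ll_invalidatepage+0x41/0x170 [lustre]
 ? discard_cb+0x0/0x190 [osc]
 kthread+0xcf/0xe0
"""

for function, module in reliable_frames(sample):
    print(function, module or "(kernel)")
```

      Applied to both traces, this makes it easier to see that they share the same path from `cl_env_percpu_get` down to `kthread`, and differ only in what `lbug_with_loc` calls.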
      

      I'm not certain whether there is a particular workload that triggers this. We've been running concurrent mdtest and ior jobs, using remote directories but not striped directories.

      The failure rate is high: running on 300 clients for about 2 hours triggered this bug on one third of the nodes.


            People

              jay Jinshan Xiong (Inactive)
              ofaaland Olaf Faaland
