Lustre / LU-2365

ASSERTION( cfs_list_empty(&request->rq_list)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • None
    • Affects Version/s: Lustre 2.4.0
    • Environment: Lustre 2.3.56-2chaos (github.com/chaos/lustre), includes new unstable-page limiting patches
    • 3
    • 5628

    Description

      Hit the following while running ior at large scale on Sequoia.

      98304 tasks, command line:

      ior -F -e -g -C -t 1m -b 512m -o /p/lsfull/morrone/f

      2012-11-20 13:17:16.901289 {DefaultControlEventListener} [mmcs]{623}.13.1: Lustre: lsfull-MDT0000-mdc-c0000003ea31d400: Connection to lsfull-MDT0000 (at 172.20.5.1@o2ib500) was lost; in progress operations using this service will wait for recovery to complete
      2012-11-20 13:17:16.948873 {DefaultControlEventListener} [mmcs]{623}.1.0: Lustre: lsfull-MDT0000-mdc-c0000003ea31d400: Connection restored to lsfull-MDT0000 (at 172.20.5.1@o2ib500)
      2012-11-20 13:17:17.841116 {DefaultControlEventListener} [mmcs]{623}.4.1: LustreError: 3722:0:(client.c:2250:__ptlrpc_free_req()) ASSERTION( cfs_list_empty(&request->rq_list) ) failed: req c0000003edfad400
      2012-11-20 13:17:17.882162 {DefaultControlEventListener} [mmcs]{623}.4.1: LustreError: 3722:0:(client.c:2250:__ptlrpc_free_req()) LBUG
      2012-11-20 13:17:17.920585 {DefaultControlEventListener} [mmcs]{623}.4.1: Call Trace:
      2012-11-20 13:17:17.959831 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee0338d0] [c000000000008160] .show_stack+0x7c/0x184 (unreliable)
      2012-11-20 13:17:17.998895 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee033980] [8000000000a70cb8] .libcfs_debug_dumpstack+0xd8/0x150 [libcfs]
      2012-11-20 13:17:18.039425 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee033a30] [8000000000a71480] .lbug_with_loc+0x50/0xc0 [libcfs]
      2012-11-20 13:17:18.042457 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee033ac0] [8000000003a056c0] .__ptlrpc_req_finished+0x980/0xb60 [ptlrpc]
      2012-11-20 13:17:18.082353 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee033b80] [8000000006a13ac4] .ll_fsync+0x4e4/0xc50 [lustre]
      2012-11-20 13:17:18.128703 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee033c80] [c0000000000fc094] .vfs_fsync_range+0xb0/0x104
      2012-11-20 13:17:18.133202 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee033d30] [c0000000000fc18c] .do_fsync+0x3c/0x6c
      2012-11-20 13:17:18.135223 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee033dc0] [c0000000000fc1fc] .SyS_fsync+0x18/0x28
      2012-11-20 13:17:18.173757 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee033e30] [c000000000000580] syscall_exit+0x0/0x2c
      2012-11-20 13:17:18.211524 {DefaultControlEventListener} [mmcs]{623}.4.1: Kernel panic - not syncing: LBUG
      2012-11-20 13:17:18.258280 {DefaultControlEventListener} [mmcs]{623}.4.1: Call Trace:
      2012-11-20 13:17:18.262113 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee0338f0] [c000000000008160] .show_stack+0x7c/0x184 (unreliable)
      2012-11-20 13:17:18.263117 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee0339a0] [c000000000432c0c] .panic+0x80/0x1a8
      2012-11-20 13:17:18.284729 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee033a30] [8000000000a714e0] .lbug_with_loc+0xb0/0xc0 [libcfs]
      2012-11-20 13:17:18.285323 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee033ac0] [8000000003a056c0] .__ptlrpc_req_finished+0x980/0xb60 [ptlrpc]
      2012-11-20 13:17:18.287243 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee033b80] [8000000006a13ac4] .ll_fsync+0x4e4/0xc50 [lustre]
      2012-11-20 13:17:18.320182 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee033c80] [c0000000000fc094] .vfs_fsync_range+0xb0/0x104
      2012-11-20 13:17:18.359030 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee033d30] [c0000000000fc18c] .do_fsync+0x3c/0x6c
      2012-11-20 13:17:18.403154 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee033dc0] [c0000000000fc1fc] .SyS_fsync+0x18/0x28
      2012-11-20 13:17:18.439477 {DefaultControlEventListener} [mmcs]{623}.4.1: [c0000003ee033e30] [c000000000000580] syscall_exit+0x0/0x2c
      


        Activity

          Jinshan Xiong (Inactive) made changes -
          Comment [ I think this problem has been fixed by patch 9fb46705. The relevant lines (git blame: commit 9fb46705, Andreas Dilger, 2012-12-04, lines 1805-1814) are:

          {code}
          spin_lock(&imp->imp_lock);
          /* Request already may be not on sending or delaying list. This
           * may happen in the case of marking it erroneous for the case
           * ptlrpc_import_delay_req(req, status) find it impossible to
           * allow sending this rpc and returns *status != 0. */
          if (!cfs_list_empty(&req->rq_list)) {
                  cfs_list_del_init(&req->rq_list);
                  cfs_atomic_dec(&imp->imp_inflight);
          }
          spin_unlock(&imp->imp_lock);
          {code} ]
          Jinshan Xiong (Inactive) made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]
          Jodi Levi (Inactive) made changes -
          Affects Version/s New: Lustre 2.4.0 [ 10154 ]
          Peter Jones made changes -
          Labels Original: topsequoia New: sequoia
          Alex Zhuravlev made changes -
          Assignee Original: Alex Zhuravlev [ bzzz ] New: Jinshan Xiong [ jay ]
          Peter Jones made changes -
          Assignee Original: WC Triage [ wc-triage ] New: Alex Zhuravlev [ bzzz ]
          Christopher Morrone (Inactive) made changes -
          Labels New: topsequoia
          Christopher Morrone (Inactive) made changes -
          Environment Original: Lustre 2.5.56-2chaos (github.com/chaos/lustre), includes new unstable-page limiting patches New: Lustre 2.3.56-2chaos (github.com/chaos/lustre), includes new unstable-page limiting patches
          Christopher Morrone (Inactive) created issue -

          People

            Assignee: Jinshan Xiong (Inactive)
            Reporter: Christopher Morrone (Inactive)
            Votes: 0
            Watchers: 6
