[LU-17585] OSS LBUG: LustreError: 46818:0:(ofd_io.c:1027:ofd_commitrw_read()) ASSERTION( ofd_object_exists(fo) ) failed

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Affects Version/s: Lustre 2.15.4
    • Environment: 2.15.4 clients + servers, ldiskfs, CentOS 7.9 kernel 3.10.0-1160.108.1.el7_lustre.pl1.x86_64
    • Severity: 3

    Description

      Hello, we hit our first LBUG with 2.15.4 in production. Here is the trace:

      [1928909.418709] Lustre: fir-OST0099: Bulk IO read error with 13b41e84-d040-4a81-8804-171135a5ca68 (at 10.51.14.12@o2ib3), client will retry: rc -110
      [1928909.431824] Lustre: Skipped 46 previous similar messages
      [1928922.657005] LustreError: 44052:0:(ldlm_lockd.c:261:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.51.10.38@o2ib3  ns: filter-fir-OST009f_UUID lock: ffff9e292b11e540/0xa96a836f66e96b33 lrc: 3/0,0 mode: PW/PW res: [0x2ac0000400:0x1e13a4b:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 4194304->4259839) gid 0 flags: 0x60000400020020 nid: 10.51.10.38@o2ib3 remote: 0xc0aaf00bc9c529b4 expref: 44 pid: 47188 timeout: 1929024 lvb_type: 0
      [1928922.699614] LustreError: 33995:0:(client.c:1256:ptlrpc_import_delay_req()) @@@ IMP_CLOSED  req@ffff9e215aab2400 x1789761432555520/t0(0) o105->fir-OST009f@10.51.10.38@o2ib3:15/16 lens 392/224 e 0 to 0 dl 0 ref 1 fl Rpc:QU/0/ffffffff rc 0/-1 job:''
      [1928923.056984] LustreError: 46818:0:(ldlm_lib.c:3538:target_bulk_io()) @@@ Eviction on bulk READ  req@ffff9e205b151f80 x1790434282424320/t0(0) o3->de1c7312-0728-4b26-8a91-192b15c87ece@10.51.10.38@o2ib3:9/0 lens 488/440 e 0 to 0 dl 1708737149 ref 1 fl Interpret:/0/0 rc 0/0 job:'41750242'
      [1928923.082396] LustreError: 46818:0:(ofd_io.c:1027:ofd_commitrw_read()) ASSERTION( ofd_object_exists(fo) ) failed: 
      [1928923.092895] LustreError: 46818:0:(ofd_io.c:1027:ofd_commitrw_read()) LBUG
      [1928923.099873] Pid: 46818, comm: ll_ost_io01_031 3.10.0-1160.108.1.el7_lustre.pl1.x86_64 #1 SMP Fri Jan 26 11:26:38 PST 2024
      [1928923.111132] Call Trace:
      [1928923.113781] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
      [1928923.119107] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [1928923.124079] [<0>] ofd_commitrw+0xd2a/0xd80 [ofd]
      [1928923.128901] [<0>] tgt_brw_read+0xa88/0x2030 [ptlrpc]
      [1928923.134058] [<0>] tgt_request_handle+0x93f/0x19d0 [ptlrpc]
      [1928923.139729] [<0>] ptlrpc_server_handle_request+0x253/0xc30 [ptlrpc]
      [1928923.146185] [<0>] ptlrpc_main+0xbf4/0x15e0 [ptlrpc]
      [1928923.151246] [<0>] kthread+0xd1/0xe0
      [1928923.154921] [<0>] ret_from_fork_nospec_begin+0x7/0x21
      [1928923.160178] [<0>] 0xfffffffffffffffe
      [1928923.163930] Kernel panic - not syncing: LBUG
      [1928923.168372] CPU: 26 PID: 46818 Comm: ll_ost_io01_031 Kdump: loaded Tainted: G           OE  ------------   3.10.0-1160.108.1.el7_lustre.pl1.x86_64 #1
      [1928923.181917] Hardware name: Dell Inc. PowerEdge R6525/07Y51T, BIOS 2.13.3 09/12/2023
      [1928923.189741] Call Trace:
      [1928923.192369]  [<ffffffff90db1bec>] dump_stack+0x19/0x1f
      [1928923.197681]  [<ffffffff90dab708>] panic+0xe8/0x21f
      [1928923.202836]  [<ffffffffc055a5eb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [1928923.209310]  [<ffffffffc185730a>] ofd_commitrw+0xd2a/0xd80 [ofd]
      [1928923.215493]  [<ffffffff906cc790>] ? wake_up_atomic_t+0x40/0x40
      [1928923.221520]  [<ffffffffc13c9818>] tgt_brw_read+0xa88/0x2030 [ptlrpc]
      [1928923.228053]  [<ffffffffc1358586>] ? ptl_send_buf+0x136/0x540 [ptlrpc]
      [1928923.234674]  [<ffffffffc13625a7>] ? lustre_msg_add_version+0x27/0xb0 [ptlrpc]
      [1928923.241987]  [<ffffffffc13628e2>] ? lustre_pack_reply_v2+0x142/0x2c0 [ptlrpc]
      [1928923.249302]  [<ffffffffc1362ad2>] ? lustre_pack_reply_flags+0x72/0x1f0 [ptlrpc]
      [1928923.256787]  [<ffffffffc1362c61>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
      [1928923.263672]  [<ffffffffc13c711f>] tgt_request_handle+0x93f/0x19d0 [ptlrpc]
      [1928923.270728]  [<ffffffffc13a87a5>] ? ptlrpc_nrs_req_get_nolock0+0xd5/0x170 [ptlrpc]
      [1928923.278463]  [<ffffffffc055703e>] ? ktime_get_real_seconds+0xe/0x20 [libcfs]
      [1928923.285694]  [<ffffffffc1372dc3>] ptlrpc_server_handle_request+0x253/0xc30 [ptlrpc]
      [1928923.293513]  [<ffffffff906d8990>] ? task_rq_unlock+0x20/0x20
      [1928923.299346]  [<ffffffff906d8f73>] ? __wake_up+0x13/0x20
      [1928923.304758]  [<ffffffffc1374a54>] ptlrpc_main+0xbf4/0x15e0 [ptlrpc]
      [1928923.311351]  [<ffffffffc1373e60>] ? ptlrpc_wait_event+0x5d0/0x5d0 [ptlrpc]
      [1928923.318550]  [<ffffffff906cb621>] kthread+0xd1/0xe0
      [1928923.323602]  [<ffffffff906cb550>] ? insert_kthread_work+0x40/0x40
      [1928923.329868]  [<ffffffff90dc51dd>] ret_from_fork_nospec_begin+0x7/0x21
      [1928923.336482]  [<ffffffff906cb550>] ? insert_kthread_work+0x40/0x40
      
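      For reference, the failing assertion is the LASSERT(ofd_object_exists(fo)) at the end of the bulk read commit path in lustre/ofd/ofd_io.c. Below is a sketch of that function, paraphrased from the stock 2.15 tree (not a verbatim copy of our build, and with error handling trimmed), to show where the check sits relative to the eviction logged just above by the same service thread (pid 46818):

      /* Paraphrased from lustre/ofd/ofd_io.c (2.15-era tree); not verbatim.
       * ofd_commitrw_read() releases the pages and references taken by
       * ofd_preprw_read() once the bulk READ completes or is aborted. */
      static int ofd_commitrw_read(const struct lu_env *env,
                                   struct ofd_device *ofd,
                                   const struct lu_fid *fid, int objcount,
                                   int niocount, struct niobuf_local *lnb)
      {
              struct ofd_object *fo;

              LASSERT(niocount > 0);

              /* re-find the object by FID to release what the prep
               * phase pinned for the bulk transfer */
              fo = ofd_object_find(env, ofd, fid);
              if (IS_ERR(fo))
                      RETURN(PTR_ERR(fo));

              /* ofd_io.c:1027 in our build: the object must still exist.
               * In our crash it apparently no longer does, presumably
               * destroyed while the bulk READ was stuck and the client
               * was being evicted ("Eviction on bulk READ" above). */
              LASSERT(ofd_object_exists(fo));

              dt_bufs_put(env, ofd_object_child(fo), lnb, niocount);

              ofd_read_unlock(env, fo);
              ofd_object_put(env, fo);
              /* second put pairs with the get in ofd_preprw_read() */
              ofd_object_put(env, fo);

              RETURN(0);
      }

      So my guess from the trace, and it is only a guess, is that the commit side of a bulk READ can race with object destruction after an eviction, and the LASSERT turns that race into a panic instead of an error return.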

      Attaching the vmcore dmesg as fir-io8-s2_2024-02-23_17-11-22_vmcore-dmesg.txt and the output of crash's "foreach bt" as fir-io8-s2_2024-02-23_17-11-22_foreach_bt.txt.

      The vmcore can be made available upon request.

      Any ideas? Thanks!


            People

              bzzz Alex Zhuravlev
              sthiell Stephane Thiell
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue
