Lustre / LU-16263

crash in sanity 273b osc_page_delete LBUG

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.16.0
    • Lustre 2.16.0
    • None
    • 3
    • 9223372036854775807

    Description

      In the master-next build of Oct 10 this crash suddenly appeared, and it has recurred regularly, though not very often, since then.

      [338005.158491] Lustre: DEBUG MARKER: == sanity test 273b: DoM: race writeback and object destroy ===================== 18:58:27 (1666652307)
      [338005.392174] LustreError: 17734:0:(osc_cache.c:2484:osc_teardown_async_page()) extent ffff8800400af8e8@{[0 -> 255/1023], [2|0|-|cache|wi|ffff8800badddb28], [1073152|256|+|-|ffff8802276bfc00|1024|          (null)]} trunc at 0.
      [338005.394627] LustreError: 17734:0:(osc_cache.c:2484:osc_teardown_async_page()) ### extent: ffff8800400af8e8 ns: lustre-OST0003-osc-ffff8803228cca88 lock: ffff8802276bfc00/0xd5e49483fedfcc95 lrc: 2/0,0 mode: PW/PW res: [0x51a7a:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->1048575) gid 0 flags: 0x800020000000000 nid: local remote: 0xd5e49483fedfcc9c expref: -99 pid: 17734 timeout: 0 lvb_type: 1
      [338005.397689] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) page@ffff8802b0a34e18[2 ffff880309e4ec80 5 1           (null)]
      [338005.399302] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) vmpage @ffffea0005eb7f80 2fffff0000083d 3:0 ffff8802b0a34e18 256 lru
      [338005.400969] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) osc-page@ffff8802b0a34e78 0: 1< 2 + - > 2< 0 0 4096 0x0 0x40420 |           (null) ffff88028f868710 ffff8800badddb28 > 3< 0 0 > 4< 0 0 8 97349632 - | - - + - > 5< - - + - | 0 - | 256 - ->
      [338005.403820] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) end page@ffff8802b0a34e18
      [338005.405381] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) Trying to teardown failed: -16
      [338005.406554] LustreError: 17734:0:(osc_page.c:175:osc_page_delete()) ASSERTION( 0 ) failed: 
      [338005.407645] LustreError: 17734:0:(osc_page.c:175:osc_page_delete()) LBUG
      [338005.408234] Pid: 17734, comm: multiop 3.10.0-7.9-debug #2 SMP Tue Feb 1 18:17:58 EST 2022
      [338005.409142] Call Trace:
      [338005.409630] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
      [338005.410130] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [338005.410607] [<0>] osc_page_delete+0x48d/0x500 [osc]
      [338005.411107] [<0>] __cl_page_delete+0x82/0x320 [obdclass]
      [338005.411593] [<0>] cl_page_delete+0x33/0x110 [obdclass]
      [338005.412078] [<0>] ll_invalidatepage+0x7f/0x170 [lustre]
      [338005.412568] [<0>] do_invalidatepage_range+0x71/0x80
      [338005.413050] [<0>] truncate_inode_page+0x77/0x80
      [338005.413652] [<0>] truncate_inode_pages_range+0x1ea/0x7d0
      [338005.414215] [<0>] truncate_inode_pages_final+0x4c/0x60
      [338005.414791] [<0>] ll_truncate_inode_pages_final+0x21/0xe0 [lustre]
      [338005.415385] [<0>] ll_delete_inode+0x38/0x150 [lustre]
      [338005.415929] [<0>] evict+0xaf/0x180
      [338005.416481] [<0>] iput+0xf5/0x180
      [338005.416977] [<0>] __dentry_kill+0x148/0x1b0
      [338005.417523] [<0>] dput+0xca/0x1c0
      [338005.418054] [<0>] __fput+0x1a0/0x240
      [338005.418624] [<0>] ____fput+0xe/0x10
      [338005.419155] [<0>] task_work_run+0xb5/0xf0
      [338005.419584] [<0>] do_notify_resume+0x92/0xb0
      [338005.420091] [<0>] int_signal+0x12/0x17
      [338005.420650] [<0>] 0xfffffffffffffffe
      [338005.421218] Kernel panic - not syncing: LBUG
      [338005.421744] CPU: 1 PID: 17734 Comm: multiop Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-7.9-debug #2
      [338005.422211] Hardware name: Red Hat KVM, BIOS 1.15.0-1.module_el8.6.0+1087+b42c8331 04/01/2014
      [338005.422211] Call Trace:
      [338005.422211]  [<ffffffff817d93f8>] dump_stack+0x19/0x1b
      [338005.422211]  [<ffffffff817d24d5>] panic+0xe8/0x20d
      [338005.422211]  [<ffffffff817e324e>] ? _raw_spin_unlock+0xe/0x20
      [338005.422211]  [<ffffffffa022c56b>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [338005.422211]  [<ffffffffa08d22dd>] osc_page_delete+0x48d/0x500 [osc]
      [338005.422211]  [<ffffffffa03e2612>] __cl_page_delete+0x82/0x320 [obdclass]
      [338005.422211]  [<ffffffffa03e28e3>] cl_page_delete+0x33/0x110 [obdclass]
      [338005.422211]  [<ffffffffa171b30f>] ll_invalidatepage+0x7f/0x170 [lustre]
      [338005.422211]  [<ffffffff811c5b41>] do_invalidatepage_range+0x71/0x80
      [338005.422211]  [<ffffffff811c5be7>] truncate_inode_page+0x77/0x80
      [338005.422211]  [<ffffffff811c5e1a>] truncate_inode_pages_range+0x1ea/0x7d0
      [338005.422211]  [<ffffffff811c646c>] truncate_inode_pages_final+0x4c/0x60
      [338005.422211]  [<ffffffffa16f5121>] ll_truncate_inode_pages_final+0x21/0xe0 [lustre]
      [338005.422211]  [<ffffffffa16f5648>] ll_delete_inode+0x38/0x150 [lustre]
      [338005.422211]  [<ffffffff81264a2f>] evict+0xaf/0x180
      [338005.422211]  [<ffffffff81264e65>] iput+0xf5/0x180
      [338005.422211]  [<ffffffff8125ff68>] __dentry_kill+0x148/0x1b0
      [338005.422211]  [<ffffffff8126072a>] dput+0xca/0x1c0
      [338005.422211]  [<ffffffff81248000>] __fput+0x1a0/0x240
      [338005.422211]  [<ffffffff8124817e>] ____fput+0xe/0x10
      [338005.422211]  [<ffffffff810b69b5>] task_work_run+0xb5/0xf0
      [338005.422211]  [<ffffffff8102ccb2>] do_notify_resume+0x92/0xb0
      [338005.422211]  [<ffffffff817ee363>] int_signal+0x12/0x17 
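
      For anyone triaging similar reports, here is a minimal sketch of pulling the source location out of a LustreError console line and decoding the return code. The regular expression is an assumption based only on the lines quoted above, not an official description of the log format, and the helper names parse_lustre_error and errno_name are made up for illustration. On Linux, -16 corresponds to EBUSY.

```python
import errno
import re

# Pattern for the "LustreError:" console lines quoted above. The second
# number after the PID is skipped because its meaning is not stated here.
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+LustreError:\s+(?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<msg>.*)"
)


def parse_lustre_error(text):
    """Return the fields of one LustreError line as a dict, or None if no match."""
    m = LINE_RE.search(text)
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["line"] = int(fields["line"])
    return fields


def errno_name(code):
    """Map a negative kernel-style return code (e.g. -16) to its symbolic name."""
    return errno.errorcode.get(abs(code), "UNKNOWN")


sample = (
    "[338005.405381] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) "
    "Trying to teardown failed: -16"
)
rec = parse_lustre_error(sample)
# The return code is the number after the last colon in the message text.
code = int(rec["msg"].rsplit(":", 1)[1])
print(rec["file"], rec["line"], rec["func"], code, errno_name(code))
```

      Run against the "Trying to teardown failed" line, this reports the failing call site in osc_page_delete() together with the symbolic name of the error that led to the LBUG.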
      

      The crash always occurs in sanity test 273b.

      https://knox.linuxhacker.ru/crashdb_ui_external.py.cgi?newid=67744 tracks all such crashes

      The list of patches in that master-next release:

      a99fcef712 LU-15721 llite: only statfs for projid if PROJINHERIT set
      4f3e709b69 LU-16219 tests: syntax error fix
      c90e7da475 LU-16198 tests: increase margin for sanity/33hh
      fe12cf7af6 LU-16200 tests: test_32[f,g]: specify blocksize explicitly
      e0a8f3f60d LU-16180 ptlrpc: reduce lock contention in ptlrpc_free_committed
      873276be1c LU-16076 utils: enhance 'lfs check' command
      9f0b2c2e23 LU-16044 osd: discard pagecache in truncate's declaration
      90307c2721 LU-15451 sec: retry ro mount if read-only flag set
      b871286a4e LU-15619 osc: Remove oap lock
      b918031786 LU-15014 osc: Fix possible null pointer
      98c70df982 LU-13364 utils: fix bad output for lnetctl import --show
      1c095e3a80 LU-14165 utils: llog_reader: display changleog_user records
      b89c2797eb LU-16139 statahead: avoid to block ptlrpcd interpret context
      ff59bd3a4a LU-6142 obdclass: change some foo0() to __foo()
      0572fc241a LU-10391 lnet: support IPv6 in lnet_inet_enumerate()
      5f5523beb1 LU-16002 ptlrpc: reduce pinger eviction time
      a1dfff48db LU-16211 o2iblnd: Avoid NULL md deref
      3bb1366fe8 LU-16046 ldlm: group lock fix
      f79107cf9a LU-16046 revert: "LU-9964 llite: prevent mulitple group locks" 

      I tried excluding LU-15619, LU-15014 and LU-16139, but that did not help, and the rest do not appear to be related.

            People

              bobijam Zhenyu Xu
              green Oleg Drokin