  1. Lustre
  2. LU-20276

sanityn test_70b: OOM crash for osd-zfs with ZFS 2.4.0

Details

    • Bug
    • Resolution: Unresolved
    • Medium

    Description

      This issue was created by maloo for Andreas Dilger <adilger@dilger.ca>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/8224a976-9a59-4610-9aca-194e28a25c72

      test_70b failed with the following error on the OSS:

      trevis-156vm93 crashed during sanityn test_70b
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/125152 - 4.18.0-553.117.1.el8_10.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/125152 - 4.18.0-553.117.1.el8_lustre.x86_64

      It looks like the update to ZFS 2.4.0 has brought with it some OOM crashes during testing.

      This has been hit 8 times since 2026-05-11, when patch 65577 landed (roughly 1 in 41 ZFS test runs):
      https://testing.whamcloud.com/search?horizon=2332800&status%5B%5D=CRASH&test_set_script_id=570ba67a-4a46-11e0-a7f6-52540025f9af&sub_test_script_id=52c5a802-7bc4-11e2-8242-52540035b04c&source=sub_tests#redirect
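For anyone adjusting the search above, here is a minimal sketch that decodes the query string with the Python standard library. The helper name `decode_search` is hypothetical, and reading `horizon` as a number of seconds is an assumption, not something the Maloo page states; under that reading the search window is 27 days.

```python
from urllib.parse import urlsplit, parse_qs

# Search URL copied verbatim from the description.
SEARCH_URL = (
    "https://testing.whamcloud.com/search?horizon=2332800"
    "&status%5B%5D=CRASH"
    "&test_set_script_id=570ba67a-4a46-11e0-a7f6-52540025f9af"
    "&sub_test_script_id=52c5a802-7bc4-11e2-8242-52540035b04c"
    "&source=sub_tests#redirect"
)


def decode_search(url):
    """Return the query parameters of a search URL as a flat dict."""
    query = urlsplit(url).query
    # parse_qs percent-decodes keys, so "status%5B%5D" becomes "status[]".
    params = {key: values[0] for key, values in parse_qs(query).items()}
    # ASSUMPTION: "horizon" is a look-back window expressed in seconds.
    params["horizon_days"] = int(params["horizon"]) / 86400
    return params


params = decode_search(SEARCH_URL)
for key, value in params.items():
    print(f"{key} = {value}")
```

Changing `horizon` or the `status[]` value and re-encoding the query is one way to widen the window or include non-crash failures, assuming the search page accepts those values.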

      Lustre: DEBUG MARKER: == sanityn test 70b: remove files after calling rm_entry === 20:53:09 (1778964789)
      obd_memory max: 166597799, obd_memory current: 145190199
      ll_ost00_030 invoked oom-killer: gfp_mask=0x6042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP), order=0, oom_score_adj=0
      CPU: 1 PID: 37734 Comm: ll_ost00_030  4.18.0-553.117.1.el8_lustre.x86_64 #1
      Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
      Call Trace:
        dump_stack+0x41/0x60
        dump_header+0x4a/0x1df
        out_of_memory.cold.36+0xa/0x7e
        __alloc_pages_slowpath+0xbf0/0xcd0
        __alloc_pages_nodemask+0x2e2/0x330
        new_slab+0x3f4/0x4f0
        ___slab_alloc+0x3a3/0x950
        kmem_cache_alloc+0x252/0x280
        spl_kmem_cache_alloc+0x6f/0x630 [spl]
        dbuf_dirty+0x107/0x930 [zfs]
        dnode_dirty_l1+0x31/0x50 [zfs]
        dnode_free_range+0x45a/0x600 [zfs]
        dmu_free_long_range+0x39d/0x4e0 [zfs]
        osd_unlinked_object_free+0x44/0x3a0 [osd_zfs]
        osd_unlinked_list_emptify+0xaa/0xb0 [osd_zfs]
        osd_trans_stop+0x3c3/0x530 [osd_zfs]
        ofd_destroy+0x56b/0xd40 [ofd]
        ofd_destroy_by_fid+0x3c2/0x570 [ofd]
        ofd_destroy_hdl+0x235/0x8d0 [ofd]
        tgt_request_handle+0x403/0x1d60 [ptlrpc]
        ptlrpc_server_handle_request+0x2ca/0xd70 [ptlrpc]
        ptlrpc_main+0xb90/0x1450 [ptlrpc]
        kthread+0x134/0x150
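
As a triage aid, the sketch below parses the console output above with the Python standard library. It lists the modules on the failing allocation path from innermost to outermost, converts the DEBUG MARKER epoch to UTC, and expresses the obd_memory counters in MiB. The helper names are hypothetical, frames without a module tag are labelled "kernel" for convenience, and treating obd_memory as a byte count is an assumption.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Console output copied from the description (header lines plus call trace).
LOG = """\
Lustre: DEBUG MARKER: == sanityn test 70b: remove files after calling rm_entry === 20:53:09 (1778964789)
obd_memory max: 166597799, obd_memory current: 145190199
  dump_stack+0x41/0x60
  dump_header+0x4a/0x1df
  out_of_memory.cold.36+0xa/0x7e
  __alloc_pages_slowpath+0xbf0/0xcd0
  __alloc_pages_nodemask+0x2e2/0x330
  new_slab+0x3f4/0x4f0
  ___slab_alloc+0x3a3/0x950
  kmem_cache_alloc+0x252/0x280
  spl_kmem_cache_alloc+0x6f/0x630 [spl]
  dbuf_dirty+0x107/0x930 [zfs]
  dnode_dirty_l1+0x31/0x50 [zfs]
  dnode_free_range+0x45a/0x600 [zfs]
  dmu_free_long_range+0x39d/0x4e0 [zfs]
  osd_unlinked_object_free+0x44/0x3a0 [osd_zfs]
  osd_unlinked_list_emptify+0xaa/0xb0 [osd_zfs]
  osd_trans_stop+0x3c3/0x530 [osd_zfs]
  ofd_destroy+0x56b/0xd40 [ofd]
  ofd_destroy_by_fid+0x3c2/0x570 [ofd]
  ofd_destroy_hdl+0x235/0x8d0 [ofd]
  tgt_request_handle+0x403/0x1d60 [ptlrpc]
  ptlrpc_server_handle_request+0x2ca/0xd70 [ptlrpc]
  ptlrpc_main+0xb90/0x1450 [ptlrpc]
  kthread+0x134/0x150
"""

# One frame: symbol+0xoffset/0xsize, optionally followed by [module].
FRAME_RE = re.compile(r"^\s*([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?\s*$")


def parse_frames(text):
    """Return (function, module) pairs, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((match.group(1), match.group(2) or "kernel"))
    return frames


def modules_innermost_first(frames):
    """Return each module once, in order of first appearance in the trace."""
    ordered = []
    for _, module in frames:
        if module not in ordered:
            ordered.append(module)
    return ordered


def marker_time(text):
    """Convert the epoch in the DEBUG MARKER line to an aware UTC datetime."""
    epoch = int(re.search(r"DEBUG MARKER: .*\((\d+)\)", text).group(1))
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


def obd_memory_mib(text):
    """Return (max, current) obd_memory in MiB, ASSUMING the counters are bytes."""
    match = re.search(r"obd_memory max: (\d+), obd_memory current: (\d+)", text)
    return tuple(int(value) / 2**20 for value in match.groups())


frames = parse_frames(LOG)
frames_per_module = Counter(module for _, module in frames)
print("allocation path:", " <- ".join(modules_innermost_first(frames)))
print("frames per module:", dict(frames_per_module))
print("marker time (UTC):", marker_time(LOG).isoformat())
print("obd_memory max/current (MiB): %.1f / %.1f" % obd_memory_mib(LOG))
```

If the byte assumption holds, Lustre's own tracked allocations are on the order of 150 MiB, and the allocation that triggered the OOM killer is an SPL kmem cache allocation made from dbuf_dirty() while freeing an unlinked object during OST destroy.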
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanityn test_70b - trevis-156vm93 crashed during sanityn test_70b

            People

              wc-triage WC Triage
              maloo Maloo