Lustre / LU-19092

Crash in ll_release_user_pages in sanity-pcc test 40

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Fix Version/s: Lustre 2.17.0
    • Affects Version/s: Lustre 2.17.0
    • Labels: None
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      About two master landings ago, when the first bits of the recent clio/dio batch started to come in, a prominent crash in ll_release_user_pages appeared in sanity-pcc test 40. It looks like this:

      First crash: https://testing.whamcloud.com/test_sets/dd2f3162-c2dc-40c7-a02a-d1bad78c95e1

      Most recent crash as of the time of filing this ticket: https://testing.whamcloud.com/test_sets/98d8457e-5aeb-41d7-976a-a713ecb4ddf1

       

      [33457.292181] Lustre: DEBUG MARKER: dd if=/mnt/lustre/d40.sanity-pcc/f40.sanity-pcc of=/dev/null bs=1M count=1 iflag=direct
      [33457.400022] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [33457.402182] #PF: supervisor read access in kernel mode
      [33457.403315] #PF: error_code(0x0000) - not-present page
      [33457.404463] PGD 0 P4D 0
      [33457.405240] Oops: 0000 [#1] PREEMPT SMP PTI
      [33457.417216] CPU: 1 PID: 1698 Comm: dd Kdump: loaded Tainted: G           OE      n 6.4.0-150600.23.50-default #1 SLE15-SP6 32013eadc71d652cb07a599d8a722b9604994156
      [33457.435656] Hardware name: Red Hat KVM, BIOS 1.16.0-4.module+el8.8.0+1454+0b2cbfb8 04/01/2014
      [33457.437334] RIP: 0010:ll_release_user_pages+0x15/0x100 [obdclass]
      [33457.438973] Code: 6d e5 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 83 fe 00 41 55 49 89 fd 41 54 55 53 74 5b 7e 4b <48> 8b 07 48 85 c0 74 43 48 8d 6f 08 8d 56 ff 4c 8d 64 d5 00 eb 12
      [33457.442598] RSP: 0018:ffffb6ea42cdf510 EFLAGS: 00010202
      [33457.443518] RAX: 0000000000000000 RBX: ffff89e303f33000 RCX: ffff0a00ffffff04
      [33457.444607] RDX: 0000000000000001 RSI: 000000006ea42ce0 RDI: 0000000000000000
      [33457.445870] RBP: ffff89e303f33000 R08: 0000000000000000 R09: 0000000000000151
      [33457.447596] R10: ffffb6ea42cdf560 R11: 0a2e676e696e6961 R12: ffff89e30307aab8
      [33457.449301] R13: 0000000000000000 R14: ffff89e303f33000 R15: ffff89e314aa8090
      [33457.451116] FS:  00007ff372aa3740(0000) GS:ffff89e3bcd00000(0000) knlGS:0000000000000000
      [33457.452927] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [33457.454459] CR2: 0000000000000000 CR3: 0000000008572002 CR4: 0000000000060ee0
      [33457.456278] Call Trace:
      [33457.457114]  <TASK>
      [33457.457825]  cl_sub_dio_end+0x226/0x490 [obdclass 236ee5bdfa9d6196309bc0286afb34df863838c8]
      [33457.460100]  ? __pfx_cl_sub_dio_end+0x10/0x10 [obdclass 236ee5bdfa9d6196309bc0286afb34df863838c8]
      [33457.462521]  __cl_sync_io_note+0x224/0x330 [obdclass 236ee5bdfa9d6196309bc0286afb34df863838c8]
      [33457.464029]  ll_direct_IO+0xa3a/0xdd0 [lustre e0f2add258d3842e2f4f396fc167c80b2708be3b]
      [33457.465457]  ? atime_needs_update+0xa3/0x110
      [33457.466166]  ? touch_atime+0x34/0x150
      [33457.466813]  generic_file_read_iter+0x87/0x120
      [33457.467613]  vvp_io_read_start+0x6c2/0x8a0 [lustre e0f2add258d3842e2f4f396fc167c80b2708be3b]
      [33457.468944]  cl_io_start+0x70/0x140 [obdclass 236ee5bdfa9d6196309bc0286afb34df863838c8]
      [33457.470261]  cl_io_loop+0x9e/0x230 [obdclass 236ee5bdfa9d6196309bc0286afb34df863838c8]
      [33457.471523]  ? ll_cl_add+0x95/0x100 [lustre e0f2add258d3842e2f4f396fc167c80b2708be3b]
      [33457.472749]  ll_file_io_generic+0xa20/0x10a0 [lustre e0f2add258d3842e2f4f396fc167c80b2708be3b]
      [33457.474068]  do_file_read_iter+0xd2c/0x1050 [lustre e0f2add258d3842e2f4f396fc167c80b2708be3b]
      [33457.475356]  __kernel_read+0xf0/0x280
      [33457.475982]  pcc_attach_data_archive+0x432/0xb70 [lustre e0f2add258d3842e2f4f396fc167c80b2708be3b]
      [33457.477314]  pcc_readonly_attach+0x4c0/0xd90 [lustre e0f2add258d3842e2f4f396fc167c80b2708be3b]
      [33457.478601]  ? pcc_readonly_attach_sync+0x1d3/0x2c0 [lustre e0f2add258d3842e2f4f396fc167c80b2708be3b]
      [33457.479956]  pcc_readonly_attach_sync+0x1d3/0x2c0 [lustre e0f2add258d3842e2f4f396fc167c80b2708be3b]
      [33457.481281]  pcc_file_open+0x9c4/0x1040 [lustre e0f2add258d3842e2f4f396fc167c80b2708be3b]
      [33457.482503]  ll_atomic_open+0x985/0x9e0 [lustre e0f2add258d3842e2f4f396fc167c80b2708be3b]
      [33457.483726]  ? __d_lookup+0x72/0xb0
      [33457.484295]  path_openat+0x644/0x1050
      [33457.484909]  do_filp_open+0xc5/0x140
      [33457.485531]  ? kmem_cache_alloc+0x18a/0x340
      [33457.486587]  ? getname_flags+0x46/0x1e0
      [33457.487635]  ? do_sys_openat2+0x248/0x320
      [33457.488522]  do_sys_openat2+0x248/0x320
      [33457.489562]  do_sys_open+0x57/0x80
      [33457.490500]  do_syscall_64+0x5b/0x80
      [33457.491287]  ? __count_memcg_events+0x46/0x90
      [33457.492327]  ? count_memcg_event_mm+0x3d/0x60
      [33457.493494]  ? handle_mm_fault+0x196/0x2f0
      [33457.494150]  ? do_user_addr_fault+0x267/0x890
      [33457.495103]  ? exc_page_fault+0x69/0x150
      [33457.496143]  entry_SYSCALL_64_after_hwframe+0x7c/0xe6
      [33457.497461] RIP: 0033:0x7ff37292017e
      [33457.498090] Code: 83 e2 40 75 4f 89 f0 f7 d0 a9 00 00 41 00 74 44 80 3d b5 d8 0e 00 00 74 68 89 da 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 8e 00 00 00 48 8b 54 24 28 64 48 2b 14 25

      I guess all the hits I saw come from SLES15 SP6 btw, which would be why it's not showing up on regular reviews?
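
For anyone reading the register dump above, here is a minimal sketch of how the oops can be cross-checked. It assumes the x86-64 System V calling convention (first argument in RDI, second in RSI). Treating those two arguments as a page array and a page count is an assumption about the signature of ll_release_user_pages, not something this ticket confirms. The helper name parse_registers is hypothetical.

```python
import re

# Register lines copied verbatim from the oops in this ticket.
OOPS = """\
RAX: 0000000000000000 RBX: ffff89e303f33000 RCX: ffff0a00ffffff04
RDX: 0000000000000001 RSI: 000000006ea42ce0 RDI: 0000000000000000
CR2: 0000000000000000 CR3: 0000000008572002 CR4: 0000000000060ee0
"""

def parse_registers(text):
    """Return {register name: integer value} for every 'NAME: hex64' pair."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]+): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}

regs = parse_registers(OOPS)

# Assumption: SysV ABI, so RDI is the first argument and RSI the second.
first_arg = regs["RDI"]
# The prologue in the Code: line compares %esi (83 fe 00 = cmp $0x0,%esi),
# so only the low 32 bits of RSI are looked at.
second_arg = regs["RSI"] & 0xFFFFFFFF

# The marked bytes <48> 8b 07 decode to mov (%rdi),%rax, a read through RDI.
# CR2 holds the faulting address, so it should equal RDI if that read faulted.
fault_matches_first_arg = regs["CR2"] == first_arg

print(f"first arg (RDI)  = {first_arg:#x}")
print(f"second arg (ESI) = {second_arg:#x}")
print(f"CR2 == RDI       = {fault_matches_first_arg}")
```

If the assumptions hold, this reads as the function being entered with a NULL first argument and a nonzero second argument, which is consistent with the je/jle checks on %esi not being taken before the faulting load.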

          People

            Assignee: qian_wc Qian Yingjin
            Reporter: green Oleg Drokin