  1. Lustre
  2. LU-16638

LustreError: 18531:0:(osc_object.c:410:osc_req_attr_set()) LBUG

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.15.2
    • None
    • RHEL 9.0 client running 2.15.2 with tcp networking.
    • 3

    Description

      We're seeing a regular crash on one of our clients that re-exports a Lustre volume via NFS to other clients. For a while I thought it was related to atime updates, since we were seeing those in the stack trace, but it's still crashing in the same spot in osc_object.c after disabling atime.

      [238496.543455] LustreError: 18531:0:(osc_object.c:396:osc_req_attr_set()) page@00000000124db7f5[4 000000001cc24e6a 4 1 0000000000000000]
      [238496.543488] LustreError: 18531:0:(osc_object.c:396:osc_req_attr_set()) vvp-page@00000000644ae261(0:0) vm@0000000013d555b5 17ffffc0002001 4:0 ffff8b33354a0500 540 lru
      [238496.543514] LustreError: 18531:0:(osc_object.c:396:osc_req_attr_set()) lov-page@00000000fac88b5b
      [238496.543532] LustreError: 18531:0:(osc_object.c:396:osc_req_attr_set()) osc-page@00000000c5423838 540: 1< 0x845fed 1 + + > 2< 2211840 0 4096 0x7 0x9 | 0000000000000000 0000000032e9a87e 00000000b31fd886 > 3< 1 0 0 > 4< 0 0 8 156499967 - | - - - + > 5< - - - + | 0 - | 0 - ->
      [238496.543569] LustreError: 18531:0:(osc_object.c:396:osc_req_attr_set()) end page@00000000124db7f5
      [238496.543585] LustreError: 18531:0:(osc_object.c:396:osc_req_attr_set()) uncovered page!
      [238496.543598] LustreError: 18531:0:(ldlm_resource.c:1783:ldlm_resource_dump()) --- Resource: [0xd3409f:0x0:0x0].0x0 (000000004660d5d9) refcount = 3
      [238496.543618] LustreError: 18531:0:(ldlm_resource.c:1787:ldlm_resource_dump()) Granted locks (in reverse order):
      [238496.543635] LustreError: 18531:0:(ldlm_resource.c:1790:ldlm_resource_dump()) ### ### ns: work-OST0003-osc-ffff8b3d1067e800 lock: 00000000552d990c/0x2904edfb430539b2 lrc: 3/1,0 mode: PR/PR res: [0xd3409f:0x0:0x0].0x0 rrc: 4 type: EXT [0->2211839] (req 2146304->2211839) gid 0 flags: 0x800420400020000 nid: local remote: 0x27d356efda730f51 expref: -99 pid: 18893 timeout: 0 lvb_type: 1
      [238496.543687] LustreError: 18531:0:(ldlm_resource.c:1802:ldlm_resource_dump()) Waiting locks:
      [238496.543701] LustreError: 18531:0:(ldlm_resource.c:1804:ldlm_resource_dump()) ### ### ns: work-OST0003-osc-ffff8b3d1067e800 lock: 0000000049878f3e/0x2904edfb430539b9 lrc: 4/1,0 mode: --/PR res: [0xd3409f:0x0:0x0].0x0 rrc: 4 type: EXT [2211840->2277375] (req 2211840->2277375) gid 0 flags: 0x20000 nid: local remote: 0x27d356efda730f5f expref: -99 pid: 18894 timeout: 0 lvb_type: 1
      [238496.543746] Pid: 18531, comm: ptlrpcd_03_06 5.14.0-70.36.1.el9_0.x86_64 #1 SMP PREEMPT Thu Nov 24 11:28:21 EST 2022
      [238496.543762] Call Trace TBD:
      [238496.543767] LustreError: 18531:0:(osc_object.c:410:osc_req_attr_set()) LBUG
      [238496.543779] Pid: 18531, comm: ptlrpcd_03_06 5.14.0-70.36.1.el9_0.x86_64 #1 SMP PREEMPT Thu Nov 24 11:28:21 EST 2022
      [238496.543794] Call Trace TBD:
      [238496.543799] Kernel panic - not syncing: LBUG
      [238496.543807] CPU: 46 PID: 18531 Comm: ptlrpcd_03_06 Kdump: loaded Tainted: P           OE    --------- ---  5.14.0-70.36.1.el9_0.x86_64 #1
      [238496.543827] Hardware name: Supermicro AS -1114S-WN10RT/H12SSW-NTR, BIOS 2.3 12/03/2021
      [238496.543840] Call Trace:
      [238496.543848]  dump_stack_lvl+0x34/0x48
      [238496.543860]  panic+0x102/0x2d4
      [238496.543869]  lbug_with_loc.cold+0x18/0x18 [libcfs]
      [238496.543887]  osc_req_attr_set+0x32a/0x540 [osc]
      [238496.543905]  cl_req_attr_set+0x5e/0x160 [obdclass]
      [238496.543939]  osc_build_rpc+0x4a7/0x11f0 [osc]
      [238496.544421]  osc_send_read_rpc+0x6de/0x810 [osc]
      [238496.545787]  osc_check_rpcs+0x335/0x3c0 [osc]
      [238496.546230]  osc_io_unplug0+0x75/0x90 [osc]
      [238496.546662]  brw_queue_work+0x2f/0xd0 [osc]
      [238496.547086]  work_interpreter+0x32/0x170 [ptlrpc]
      [238496.547527]  ptlrpc_check_set+0x415/0x1ea0 [ptlrpc]
      [238496.547966]  ptlrpcd_check+0x3d0/0x5c0 [ptlrpc]
      [238496.548787]  ptlrpcd+0x20d/0x4a0 [ptlrpc]
      [238496.550000]  kthread+0x149/0x170
      [238496.550732]  ret_from_fork+0x22/0x30
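
      The "uncovered page!" message can be cross-checked against the lock dump in the log above. The following is a minimal sketch, not part of the original report. It assumes that the first three numbers after `2<` on the osc-page line are the object byte offset, the offset within the page, and the byte count, and that the `EXT [a->b]` ranges are inclusive byte extents. The helper names (`lock_extent`, `page_range`, `covers`) are hypothetical.

```python
import re

# Excerpts copied verbatim from the console log in this report.
GRANTED_LOCK = "mode: PR/PR res: [0xd3409f:0x0:0x0].0x0 rrc: 4 type: EXT [0->2211839] (req 2146304->2211839)"
WAITING_LOCK = "mode: --/PR res: [0xd3409f:0x0:0x0].0x0 rrc: 4 type: EXT [2211840->2277375] (req 2211840->2277375)"
OSC_PAGE = "osc-page@00000000c5423838 540: 1< 0x845fed 1 + + > 2< 2211840 0 4096 0x7 0x9 |"

EXTENT_RE = re.compile(r"type: EXT \[(\d+)->(\d+)\]")
# Assumption: the fields after "2<" are object offset, in-page offset, byte count.
PAGE_RE = re.compile(r"2<\s+(\d+)\s+(\d+)\s+(\d+)")


def lock_extent(line):
    """Return the (start, end) inclusive byte extent of an LDLM extent lock line."""
    m = EXTENT_RE.search(line)
    if m is None:
        raise ValueError("no extent found in lock line")
    return int(m.group(1)), int(m.group(2))


def page_range(line):
    """Return the (first, last) byte covered by the page on an osc-page line."""
    m = PAGE_RE.search(line)
    if m is None:
        raise ValueError("no page descriptor found in osc-page line")
    obj_off, page_off, count = (int(g) for g in m.groups())
    start = obj_off + page_off
    return start, start + count - 1


def covers(extent, byte_range):
    """True if the lock extent fully contains the byte range."""
    return extent[0] <= byte_range[0] and byte_range[1] <= extent[1]


page = page_range(OSC_PAGE)
granted = lock_extent(GRANTED_LOCK)
waiting = lock_extent(WAITING_LOCK)

print("page bytes:", page)
print("covered by granted lock:", covers(granted, page))
print("covered by waiting lock:", covers(waiting, page))
```

      Under those assumptions, the page starts at byte 2211840 (page index 540 with 4096-byte pages), one byte past the end of the granted PR lock, and falls inside the extent of the lock that is still waiting.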
      

      This crash is relatively new for us. We started to notice it after we switched from o2ib to tcp to address stability issues in our environment that we believe (we're still investigating) are related to RDMA on RHEL 9 with Omni-Path.

      We have a couple of vmcores from the crash kernel available if desired; however, I'd rather not attach them here.

            People

              wc-triage WC Triage
              snehring Shane Nehring