  1. Lustre
  2. LU-14389

crash in lov_delete_composite() with racer+migrate

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.14.0
    • None
    • 3

    Description

      Running racer with patch https://review.whamcloud.com/13669 "LU-7073 tests: Add file migration to racer" causes repeated client crashes:
      https://testing-archive.whamcloud.com/gerrit-janitor/13887/testresults/racer-special4-ldiskfs-centos7_x86_64-centos7_x86_64/
      https://testing-archive.whamcloud.com/gerrit-janitor/13887/testresults/racer-special6-ldiskfs-DNE-centos7_x86_64-centos7_x86_64/
      https://testing-archive.whamcloud.com/gerrit-janitor/13887/testresults/racer-special10-ldiskfs-centos7_x86_64-centos7_x86_64/
      https://testing-archive.whamcloud.com/gerrit-janitor/13887/testresults/racer-special8-ldiskfs-centos7_x86_64-centos7_x86_64/

      It looks like lov_init_composite() hits an error while parsing the layout ("DOM entries with different sizes", reported as rc = -22, i.e. -EINVAL). The caller lov_layout_change() then tries to clean up after the error, calls lov_delete_composite(), and crashes on a NULL pointer dereference:

      [  385.587338] LustreError: 20227:0:(lov_object.c:680:lov_init_composite()) lustre-clilov-ffff8800d65a3000: DOM entries with different sizes
      [  385.590193] LustreError: 20227:0:(lov_ea.c:617:dump_lsm()) lsm ffff8800c2482280, objid 0x0:0, maxbytes 0x400000fe000, magic 0x0BD60BD0, refc: 2, entry: 4, layout_gen 4
      [  385.592938] LustreError: 20227:0:(lov_ea.c:639:dump_lsm()) [0x0, 0x80000): id: 65537, flags: 10, magic 0x0BD10BD0, layout_gen 0, stripe count 0, sstripe size 524288, pool: []
      [  385.596090] LustreError: 20227:0:(lov_ea.c:639:dump_lsm()) [0x80000, 0xffffffffffffffff): id: 65538, flags: 10, magic 0x0BD10BD0, layout_gen 0, stripe count 1, sstripe size 1048576, pool: []
      [  385.602686] LustreError: 20227:0:(lov_ea.c:649:dump_lsm())    oinfo:ffff8800c2482180: ostid: 0x0:1893 ost idx: 1 gen: 0
      [  385.607168] LustreError: 20227:0:(lov_ea.c:639:dump_lsm()) [0x0, 0x100000): id: 131073, flags: 11, magic 0x0BD10BD0, layout_gen 0, stripe count 0, sstripe size 1048576, pool: []
      [  385.610768] LustreError: 20227:0:(lov_ea.c:639:dump_lsm()) [0x100000, 0xffffffffffffffff): id: 131074, flags: 0, magic 0x0BD10BD0, layout_gen 65535, stripe count 1, sstripe size 1048576, pool: []
      [  385.615671] LustreError: 20227:0:(lov_object.c:1298:lov_layout_change()) lustre-clilov-ffff8800d65a3000: cannot apply new layout on [0x200000402:0x3e6a:0x0] : rc = -22
      [  385.620214] BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
      [  385.622675] IP: [<ffffffffa08baef4>] lov_delete_composite+0x104/0x540 [lov]
      [  385.627878] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [  385.642379] CPU: 1 PID: 20227 Comm: ln Kdump: loaded 
      [  385.649327] RIP: 0010:[<ffffffffa08baef4>] lov_delete_composite+0x104/0x540 [lov]
      [  385.669870] Call Trace:
      [  385.670394]  [<ffffffffa08bf96a>] lov_conf_set+0x8ca/0xaa0 [lov]
      [  385.672880]  [<ffffffffa0333950>] cl_conf_set+0x60/0x120 [obdclass]
      [  385.675008]  [<ffffffffa0de6a9b>] cl_file_inode_init+0x12b/0x390 [lustre]
      [  385.677377]  [<ffffffffa0dbaae5>] ll_update_inode+0x365/0x670 [lustre]
      [  385.688379]  [<ffffffffa0dcdef3>] ll_iget+0x253/0x350 [lustre]
      [  385.689648]  [<ffffffffa0dbf90d>] ll_prep_inode+0x20d/0x9b0 [lustre]
      [  385.697886]  [<ffffffffa0dce90c>] ll_lookup_it_finish.isra.24+0xbc/0xe60 [lustre]
      [  385.702800]  [<ffffffffa0dd001b>] ll_lookup_it.constprop.26+0x96b/0x1400 [lustre]
      [  385.705598]  [<ffffffffa0dd0b97>] ll_lookup_nd+0xe7/0x1c0 [lustre]
      [  385.706979]  [<ffffffff8124f2dd>] lookup_real+0x1d/0x50
      [  385.708098]  [<ffffffff8124fdc2>] __lookup_hash+0x42/0x60
      [  385.709437]  [<ffffffff817d5ff3>] lookup_slow+0x42/0xa7
      [  385.710981]  [<ffffffff8125565e>] path_lookupat+0x89e/0x8d0
      [  385.714547]  [<ffffffff812556bb>] filename_lookup+0x2b/0xc0
      [  385.715670]  [<ffffffff812575b7>] user_path_at_empty+0x67/0xc0
      [  385.717213]  [<ffffffff81257621>] user_path_at+0x11/0x20
      [  385.718617]  [<ffffffff81249df3>] vfs_fstatat+0x63/0xc0
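
      The failure pattern in the log above (an init routine that bails out partway through, followed by a cleanup routine that walks every entry) can be illustrated with a small standalone C sketch. Everything here is hypothetical: the demo_* names and structures are stand-ins invented for illustration, not the real Lustre lov code, and this is not the fix that landed for this ticket. It also assumes that the "stripe count 0" components in the dump are the DoM ones. The point is only that cleanup has to tolerate entries that init never reached.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration, NOT the real Lustre structures. */
struct demo_entry {
	uint64_t start, end;	/* component extent [start, end) */
	int      is_dom;	/* nonzero for a Data-on-MDT component */
	int     *sub;		/* per-entry state, NULL until initialized */
};

struct demo_composite {
	struct demo_entry *entries;
	size_t             count;
};

/* Release whatever was set up; safe on a partially initialized object. */
void demo_delete_composite(struct demo_composite *obj)
{
	assert(obj != NULL);
	if (obj->entries == NULL)
		return;
	for (size_t i = 0; i < obj->count; i++) {
		/* free(NULL) is a no-op, so entries init never reached are fine */
		free(obj->entries[i].sub);
		obj->entries[i].sub = NULL;
	}
	free(obj->entries);
	obj->entries = NULL;
	obj->count = 0;
}

/* Returns 0 on success, -EINVAL if DoM components differ in size. */
int demo_init_composite(struct demo_composite *obj,
			const struct demo_entry *layout, size_t count)
{
	uint64_t dom_size = 0;

	obj->entries = NULL;
	obj->count = 0;
	if (count == 0)
		return -EINVAL;

	/* calloc zeroes the array, so every sub pointer starts out NULL */
	obj->entries = calloc(count, sizeof(*obj->entries));
	if (obj->entries == NULL)
		return -ENOMEM;
	obj->count = count;

	for (size_t i = 0; i < count; i++) {
		struct demo_entry *e = &obj->entries[i];

		*e = layout[i];
		e->sub = NULL;
		if (e->is_dom) {
			uint64_t size = e->end - e->start;

			/* early exit: this and all later entries keep sub == NULL */
			if (dom_size != 0 && size != dom_size)
				return -EINVAL;
			dom_size = size;
		}
		e->sub = malloc(sizeof(*e->sub));
		if (e->sub == NULL)
			return -ENOMEM;
		*e->sub = (int)i;
	}
	return 0;
}
```

      Fed the four extents from the dump above (DoM components of 0x80000 and 0x100000 bytes in the two mirrors), demo_init_composite() returns -EINVAL at the third entry, and demo_delete_composite() can still be called on the half-built object without dereferencing a NULL pointer.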
      

            People

              adilger Andreas Dilger
              adilger Andreas Dilger