Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-8460

(osc_cache.c:658:osc_extent_find()) ASSERTION( (max_pages & ~chunk_mask) == 0 ) failed: LBUG

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.9.0
    • None
    • 3
    • 9223372036854775807

    Description

      As part of CORAL testing I was trying to use ior to to write/read some 16MB RPCs. As soon as the test started it appeared all the Lustre clients took an LBUG, crashed, and rebooted.

      [263032.826900] LustreError: 54616:0:(mgc_request.c:1536:mgc_apply_recover_logs()) mgc: cannot find uuid by nid 192.168.1.4@o2ib
      [263032.839625] Lustre: 54616:0:(mgc_request.c:1756:mgc_process_recover_nodemap_log()) MGC192.168.1.5@o2ib: error processing recovery log lsdraid-cliir: rc = -2
      [263032.940768] Lustre: Mounted lsdraid-client
      [270650.452658] LustreError: 64198:0:(osc_cache.c:658:osc_extent_find()) ASSERTION( (max_pages & ~chunk_mask) == 0 ) failed: 
      [270650.465098] LustreError: 64198:0:(osc_cache.c:658:osc_extent_find()) LBUG
      [270650.472830] Pid: 64198, comm: ior
      [270650.476666] 
      Call Trace:
      [270650.482816]  [<ffffffffa08267d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
      [270650.492278]  [<ffffffffa0826d75>] lbug_with_loc+0x45/0xc0 [libcfs]
      [270650.502128]  [<ffffffffa0dc066d>] osc_extent_find+0x2ec/0x1a98 [osc]
      [270650.510720]  [<ffffffff81189b79>] ? zone_statistics+0x89/0xa0
      [270650.518827]  [<ffffffffa0dc2354>] ? osc_enter_cache+0x53b/0xe7a [osc]
      [270650.527638]  [<ffffffff810b5f8d>] ? ttwu_do_activate.constprop.84+0x5d/0x70
      [270650.537084]  [<ffffffff81173317>] ? __alloc_pages_nodemask+0x197/0xba0
      [270650.545977]  [<ffffffff81285e38>] ? security_capable+0x18/0x20
      [270650.554180]  [<ffffffffa0db8705>] osc_queue_async_io+0x6a5/0x1870 [osc]
      [270650.563226]  [<ffffffffa0d2aa70>] ? write_commit_callback+0x0/0x50 [lustre]
      [270650.572635]  [<ffffffffa0da55e3>] osc_page_cache_add+0x43/0x140 [osc]
      [270650.581481]  [<ffffffffa0dad903>] osc_io_commit_async+0x173/0x440 [osc]
      [270650.590561]  [<ffffffffa0d2aa70>] ? write_commit_callback+0x0/0x50 [lustre]
      [270650.600035]  [<ffffffffa0959317>] cl_io_commit_async+0x77/0x140 [obdclass]
      [270650.609317]  [<ffffffffa0c8654c>] lov_io_commit_async+0x2dc/0x4a0 [lov]
      [270650.618397]  [<ffffffffa0d2aa70>] ? write_commit_callback+0x0/0x50 [lustre]
      [270650.627864]  [<ffffffffa0d2aa70>] ? write_commit_callback+0x0/0x50 [lustre]
      [270650.637324]  [<ffffffffa0959317>] cl_io_commit_async+0x77/0x140 [obdclass]
      [270650.646690]  [<ffffffffa0d2d97a>] vvp_io_write_commit+0x17a/0x8d0 [lustre]
      [270650.655374]  [<ffffffffa0d1b807>] ll_write_end+0xc7/0x400 [lustre]
      [270650.663281]  [<ffffffff8116a214>] generic_file_buffered_write+0x184/0x290
      [270650.671879]  [<ffffffff8116b6d5>] __generic_file_aio_write+0x1d5/0x3e0
      [270650.680178]  [<ffffffffa0d2e355>] vvp_io_write_start+0x285/0x650 [lustre]
      [270650.688744]  [<ffffffffa0958925>] cl_io_start+0x65/0x130 [obdclass]
      [270650.696904]  [<ffffffffa095acc5>] cl_io_loop+0xa5/0x190 [obdclass]
      [270650.704945]  [<ffffffffa0cdbbef>] ll_file_io_generic+0x67f/0xaa0 [lustre]
      [270650.713656]  [<ffffffff811d4256>] ? mem_cgroup_update_page_stat+0x16/0x50
      [270650.722373]  [<ffffffffa0cdc2dd>] ll_file_aio_write+0x12d/0x1f0 [lustre]
      [270650.731000]  [<ffffffffa0cdc46e>] ll_file_write+0xce/0x1e0 [lustre]
      [270650.739126]  [<ffffffff811dec1d>] vfs_write+0xbd/0x1e0
      [270650.745994]  [<ffffffff811df6bf>] SyS_write+0x7f/0xe0
      [270650.752766]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
      [270650.760571] 
      [270650.763508] Kernel panic - not syncing: LBUG
      [270650.769875] CPU: 17 PID: 64198 Comm: ior Tainted: G          IOE  ------------   3.10.0-327.22.2.el7.x86_64 #1
      [270650.782689] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
      [270650.796038]  ffffffffa0843def 0000000067048a74 ffff88103fd9f560 ffffffff816360fc
      [270650.806102]  ffff88103fd9f5e0 ffffffff8162f977 ffffffff00000008 ffff88103fd9f5f0
      [270650.816170]  ffff88103fd9f590 0000000067048a74 ffffffffa0dc63b0 0000000000000000
      [270650.826189] Call Trace:
      [270650.830566]  [<ffffffff816360fc>] dump_stack+0x19/0x1b
      [270650.837897]  [<ffffffff8162f977>] panic+0xd8/0x1e7
      [270650.844870]  [<ffffffffa0826ddb>] lbug_with_loc+0xab/0xc0 [libcfs]
      [270650.853345]  [<ffffffffa0dc066d>] osc_extent_find+0x2ec/0x1a98 [osc]
      [270650.862000]  [<ffffffff81189b79>] ? zone_statistics+0x89/0xa0
      [270650.869976]  [<ffffffffa0dc2354>] ? osc_enter_cache+0x53b/0xe7a [osc]
      [270650.878736]  [<ffffffff810b5f8d>] ? ttwu_do_activate.constprop.84+0x5d/0x70
      [270650.888101]  [<ffffffff81173317>] ? __alloc_pages_nodemask+0x197/0xba0
      [270650.896965]  [<ffffffff81285e38>] ? security_capable+0x18/0x20
      [270650.905069]  [<ffffffffa0db8705>] osc_queue_async_io+0x6a5/0x1870 [osc]
      [270650.914081]  [<ffffffffa0d2aa70>] ? mkwrite_commit_callback+0x20/0x20 [lustre]
      [270650.923740]  [<ffffffffa0da55e3>] osc_page_cache_add+0x43/0x140 [osc]
      [270650.932528]  [<ffffffffa0dad903>] osc_io_commit_async+0x173/0x440 [osc]
      [270650.941515]  [<ffffffffa0d2aa70>] ? mkwrite_commit_callback+0x20/0x20 [lustre]
      [270650.951198]  [<ffffffffa0959317>] cl_io_commit_async+0x77/0x140 [obdclass]
      [270650.960478]  [<ffffffffa0c8654c>] lov_io_commit_async+0x2dc/0x4a0 [lov]
      [270650.969481]  [<ffffffffa0d2aa70>] ? mkwrite_commit_callback+0x20/0x20 [lustre]
      [270650.979160]  [<ffffffffa0d2aa70>] ? mkwrite_commit_callback+0x20/0x20 [lustre]
      [270650.989192]  [<ffffffffa0959317>] cl_io_commit_async+0x77/0x140 [obdclass]
      [270650.998797]  [<ffffffffa0d2d97a>] vvp_io_write_commit+0x17a/0x8d0 [lustre]
      [270651.008434]  [<ffffffffa0d1b807>] ll_write_end+0xc7/0x400 [lustre]
      [270651.017239]  [<ffffffff8116a214>] generic_file_buffered_write+0x184/0x290
      [270651.026725]  [<ffffffff8116b6d5>] __generic_file_aio_write+0x1d5/0x3e0
      [270651.035938]  [<ffffffffa0d2e355>] vvp_io_write_start+0x285/0x650 [lustre]
      [270651.045484]  [<ffffffffa0958925>] cl_io_start+0x65/0x130 [obdclass]
      [270651.054443]  [<ffffffffa095acc5>] cl_io_loop+0xa5/0x190 [obdclass]
      [270651.063300]  [<ffffffffa0cdbbef>] ll_file_io_generic+0x67f/0xaa0 [lustre]
      [270651.072803]  [<ffffffff811d4256>] ? mem_cgroup_update_page_stat+0x16/0x50
      [270651.082339]  [<ffffffffa0cdc2dd>] ll_file_aio_write+0x12d/0x1f0 [lustre]
      [270651.091766]  [<ffffffffa0cdc46e>] ll_file_write+0xce/0x1e0 [lustre]
      [270651.100727]  [<ffffffff811dec1d>] vfs_write+0xbd/0x1e0
      [270651.108378]  [<ffffffff811df6bf>] SyS_write+0x7f/0xe0
      [270651.115918]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b 
      

      Here is what IOR saw:

      IOR-3.0.1: MPI Coordinated Test of Parallel I/O
      
      Began: Mon Aug  1 18:28:46 2016
      Command line used: /home/johnsali/wolf-3/ior/src/ior -i 5 -v -b 4096M -t 16m
      Machine: Linux wolf-6.wolf.hpdd.intel.com
      Start time skew across all tasks: 0.00 sec
      
      Test 0 started: Mon Aug  1 18:28:46 2016
      Path: /mnt/lustre
      FS: 13.9 TiB   Used FS: 0.0%   Inodes: 4.2 Mi   Used Inodes: 0.0%
      Participating tasks: 3
      Summary:
      	api                = POSIX
      	test filename      = testFile
      	access             = single-shared-file
      	pattern            = segmented (1 segment)
      	ordering in a file = sequential offsets
      	ordering inter file= no tasks offsets
      	clients            = 3 (1 per node)
      	repetitions        = 5
      	xfersize           = 16 MiB
      	blocksize          = 4 GiB
      	aggregate filesize = 12 GiB
      
      access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
      ------    ---------  ---------- ---------  --------   --------   --------   --------   ----
      Commencing write performance test: Mon Aug  1 18:28:46 2016
      
      Message from syslogd@wolf-6 at Aug  1 18:28:48 ...
       kernel:LustreError: 64198:0:(osc_cache.c:658:osc_extent_find()) ASSERTION( (max_pages & ~chunk_mask) == 0 ) failed: 
      
      Message from syslogd@wolf-6 at Aug  1 18:28:48 ...
       kernel:LustreError: 64198:0:(osc_cache.c:658:osc_extent_find()) LBUG
      

      Attempts to change the max_pages_per_rpc have not been successful:

      [root@wolf-8 ~]#  lctl set_param osc.*.max_pages_per_rpc=1024
      error: set_param: setting /proc/fs/lustre/osc/lsdraid-OST0000-osc-ffff88104f0c1000/max_pages_per_rpc=1024: Numerical result out of range 
      

      System crash files can be found at (all clients hit the LBUG/crash/reboot at the same time):
      /scratch/dumps/wolf-6.wolf.hpdd.intel.com/10.8.1.6-2016-08-01-18:29:12
      /scratch/dumps/wolf-7.wolf.hpdd.intel.com/10.8.1.7-2016-08-01-18:29:10
      /scratch/dumps/wolf-8.wolf.hpdd.intel.com/10.8.1.8-2016-08-01-18:29:11

      Attachments

        Activity

          [LU-8460] (osc_cache.c:658:osc_extent_find()) ASSERTION( (max_pages & ~chunk_mask) == 0 ) failed: LBUG
          pjones Peter Jones added a comment -

          Landed for 2.9

          pjones Peter Jones added a comment - Landed for 2.9

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21825/
          Subject: LU-8460 osc: max_pages_per_rpc should be chunk size aligned
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 13834f5aeef42d3c358574ac59475c0758dce300

          gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21825/ Subject: LU-8460 osc: max_pages_per_rpc should be chunk size aligned Project: fs/lustre-release Branch: master Current Patch Set: Commit: 13834f5aeef42d3c358574ac59475c0758dce300

          It seems that it would be possible to test this with a ZFS OST with recordsize=1024k and then manually setting brw_size=512K on an unpatched OST, and verifying that the client no longer crashes. Similarly, testing that it isn't possible to set brw_size smaller than recordsize on a patched OST would be useful, including changing recordsize after mount.

          adilger Andreas Dilger added a comment - It seems that it would be possible to test this with a ZFS OST with recordsize=1024k and then manually setting brw_size=512K on an unpatched OST, and verifying that the client no longer crashes. Similarly, testing that it isn't possible to set brw_size smaller than recordsize on a patched OST would be useful, including changing recordsize after mount.

          Currently this bug is blocking our progress: https://jira.hpdd.intel.com/browse/LU-8498

          jsalians_intel John Salinas (Inactive) added a comment - Currently this bug is blocking our progress: https://jira.hpdd.intel.com/browse/LU-8498

          jsalians_intel can you please verify that patch 21825 actually fixes the problem?

          jay Jinshan Xiong (Inactive) added a comment - jsalians_intel can you please verify that patch 21825 actually fixes the problem?

          Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/21825
          Subject: LU-8460 osc: chunk size should not bigger than RPC size
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: bde1f813f387f8e15492a864e8c3a78eb0eb3ff4

          gerrit Gerrit Updater added a comment - Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/21825 Subject: LU-8460 osc: chunk size should not bigger than RPC size Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: bde1f813f387f8e15492a864e8c3a78eb0eb3ff4

          chunk_bits is 24 means 16MB block size on the OST side and max_pages_per_rpc was set to 256 unfortunately.

          jay Jinshan Xiong (Inactive) added a comment - chunk_bits is 24 means 16MB block size on the OST side and max_pages_per_rpc was set to 256 unfortunately.
          jsalians_intel John Salinas (Inactive) added a comment - - edited

          With the patch I see:

          kernel:LustreError: 137570:0:(osc_cache.c:661:osc_extent_find()) ASSERTION( (max_pages & ~chunk_mask) == 0 ) failed: max_pages: 0x100 chunk_bits: 24 chunk_mask: 0xfffffffffffff000
          kernel:LustreError: 137570:0:(osc_cache.c:661:osc_extent_find()) LBUG
          

          This is with default settings (no 16MB RPCs set):

          client [root@wolf-6 ~]# cat /proc/fs/lustre/osc/lsdraid-OST0000-osc-ffff88084eccc800/max_pages_per_rpc
          256 
          

          OSS:

          [root@wolf-3 kernel]# lctl get_param obdfilter.lsdraid-OST0000.brw_size
          obdfilter.lsdraid-OST0000.brw_size=1
           [root@wolf-4 ~]# lctl get_param obdfilter.lsdraid-OST0000.brw_size
          obdfilter.lsdraid-OST0000.brw_size=1
          
          [413569.867436] Lustre: Mounted lsdraid-client
          [414679.351734] LustreError: 137570:0:(osc_cache.c:661:osc_extent_find()) ASSERTION( (max_pages & ~chunk_mask) == 0 ) failed: max_pages: 0x100 chunk_bits: 24 chunk_mask: 0xfffffffffffff000
          [414679.373266] LustreError: 137570:0:(osc_cache.c:661:osc_extent_find()) LBUG
          [414679.382571] Pid: 137570, comm: mdtest
          [414679.388269] 
          Call Trace:
          [414679.395587]  [<ffffffffa071d7d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
          [414679.404872]  [<ffffffffa071dd75>] lbug_with_loc+0x45/0xc0 [libcfs]
          [414679.413187]  [<ffffffffa0e31677>] osc_extent_find+0x2f6/0x1aa2 [osc]
          [414679.421671]  [<ffffffffa0e3335e>] ? osc_enter_cache+0x53b/0xe7a [osc]
          [414679.430159]  [<ffffffff81173317>] ? __alloc_pages_nodemask+0x197/0xba0
          [414679.438756]  [<ffffffffa0e1780b>] ? osc_lru_alloc+0x3b/0x390 [osc]
          [414679.446884]  [<ffffffff81285e38>] ? security_capable+0x18/0x20
          [414679.454653]  [<ffffffffa0e29705>] osc_queue_async_io+0x6a5/0x1870 [osc]
          [414679.463245]  [<ffffffffa0cf9c8c>] ? lov_merge_lvb_kms+0x12c/0x450 [lov]
          [414679.471818]  [<ffffffffa0e17b60>] ? osc_page_init+0x0/0x210 [osc]
          [414679.479775]  [<ffffffffa0d9ba70>] ? write_commit_callback+0x0/0x50 [lustre]
          [414679.488686]  [<ffffffffa0e165e3>] osc_page_cache_add+0x43/0x140 [osc]
          [414679.496969]  [<ffffffffa0e1e903>] osc_io_commit_async+0x173/0x440 [osc]
          [414679.505470]  [<ffffffffa0d9ba70>] ? write_commit_callback+0x0/0x50 [lustre]
          [414679.514355]  [<ffffffffa0a65557>] cl_io_commit_async+0x77/0x140 [obdclass]
          [414679.523097]  [<ffffffffa0cf754c>] lov_io_commit_async+0x2dc/0x4a0 [lov]
          [414679.531506]  [<ffffffffa0d9ba70>] ? write_commit_callback+0x0/0x50 [lustre]
          [414679.540302]  [<ffffffffa0d9ba70>] ? write_commit_callback+0x0/0x50 [lustre]
          [414679.549061]  [<ffffffffa0a65557>] cl_io_commit_async+0x77/0x140 [obdclass]
          [414679.557714]  [<ffffffffa0d9e97a>] vvp_io_write_commit+0x17a/0x8d0 [lustre]
          [414679.566324]  [<ffffffffa0d9f39f>] vvp_io_write_start+0x2cf/0x650 [lustre]
          [414679.574846]  [<ffffffffa0a64b65>] cl_io_start+0x65/0x130 [obdclass]
          [414679.582796]  [<ffffffffa0a66f05>] cl_io_loop+0xa5/0x190 [obdclass]
          [414679.590648]  [<ffffffffa0d4cbef>] ll_file_io_generic+0x67f/0xaa0 [lustre]
          [414679.599179]  [<ffffffffa0d4d2dd>] ll_file_aio_write+0x12d/0x1f0 [lustre]
          [414679.607620]  [<ffffffffa0d4d46e>] ll_file_write+0xce/0x1e0 [lustre]
          [414679.615561]  [<ffffffff811dec1d>] vfs_write+0xbd/0x1e0
          [414679.622248]  [<ffffffff811df6bf>] SyS_write+0x7f/0xe0
          [414679.628843]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
          [414679.636498] 
          [414679.639249] Kernel panic - not syncing: LBUG
          [414679.644949] CPU: 15 PID: 137570 Comm: mdtest Tainted: G          IOE  ------------   3.10.0-327.22.2.el7.x86_64 #1
          [414679.657442] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
          [414679.670035]  ffffffffa073adef 00000000a1cd2686 ffff88103ee6f6f8 ffffffff816360fc
          [414679.679336]  ffff88103ee6f778 ffffffff8162f977 ffffffff00000008 ffff88103ee6f788
          [414679.688629]  ffff88103ee6f728 00000000a1cd2686 ffffffffa0e373b0 0000000000000000
          [414679.697931] Call Trace:
          [414679.701624]  [<ffffffff816360fc>] dump_stack+0x19/0x1b
          [414679.708306]  [<ffffffff8162f977>] panic+0xd8/0x1e7
          [414679.714580]  [<ffffffffa071dddb>] lbug_with_loc+0xab/0xc0 [libcfs]
          [414679.722387]  [<ffffffffa0e31677>] osc_extent_find+0x2f6/0x1aa2 [osc]
          [414679.730387]  [<ffffffffa0e3335e>] ? osc_enter_cache+0x53b/0xe7a [osc]
          [414679.738450]  [<ffffffff81173317>] ? __alloc_pages_nodemask+0x197/0xba0
          [414679.746598]  [<ffffffffa0e1780b>] ? osc_lru_alloc+0x3b/0x390 [osc]
          [414679.754346]  [<ffffffff81285e38>] ? security_capable+0x18/0x20
          [414679.761704]  [<ffffffffa0e29705>] osc_queue_async_io+0x6a5/0x1870 [osc]
          [414679.769940]  [<ffffffffa0cf9c8c>] ? lov_merge_lvb_kms+0x12c/0x450 [lov]
          [414679.778188]  [<ffffffffa0e17b60>] ? osc_lru_alloc+0x390/0x390 [osc]
          [414679.786036]  [<ffffffffa0d9ba70>] ? mkwrite_commit_callback+0x20/0x20 [lustre]
          [414679.794960]  [<ffffffffa0e165e3>] osc_page_cache_add+0x43/0x140 [osc]
          [414679.803203]  [<ffffffffa0e1e903>] osc_io_commit_async+0x173/0x440 [osc]
          [414679.811621]  [<ffffffffa0d9ba70>] ? mkwrite_commit_callback+0x20/0x20 [lustre]
          [414679.820732]  [<ffffffffa0a65557>] cl_io_commit_async+0x77/0x140 [obdclass]
          [414679.829440]  [<ffffffffa0cf754c>] lov_io_commit_async+0x2dc/0x4a0 [lov]
          [414679.837867]  [<ffffffffa0d9ba70>] ? mkwrite_commit_callback+0x20/0x20 [lustre]
          [414679.846976]  [<ffffffffa0d9ba70>] ? mkwrite_commit_callback+0x20/0x20 [lustre]
          [414679.856076]  [<ffffffffa0a65557>] cl_io_commit_async+0x77/0x140 [obdclass]
          [414679.864791]  [<ffffffffa0d9e97a>] vvp_io_write_commit+0x17a/0x8d0 [lustre]
          [414679.873498]  [<ffffffffa0d9f39f>] vvp_io_write_start+0x2cf/0x650 [lustre]
          [414679.882106]  [<ffffffffa0a64b65>] cl_io_start+0x65/0x130 [obdclass]
          [414679.890136]  [<ffffffffa0a66f05>] cl_io_loop+0xa5/0x190 [obdclass]
          [414679.898055]  [<ffffffffa0d4cbef>] ll_file_io_generic+0x67f/0xaa0 [lustre]
          [414679.906650]  [<ffffffffa0d4d2dd>] ll_file_aio_write+0x12d/0x1f0 [lustre]
          [414679.915158]  [<ffffffffa0d4d46e>] ll_file_write+0xce/0x1e0 [lustre]
          [414679.923158]  [<ffffffff811dec1d>] vfs_write+0xbd/0x1e0
          [414679.929946]  [<ffffffff811df6bf>] SyS_write+0x7f/0xe0
          [414679.936584]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
          
          /scratch/dumps/wolf-6.wolf.hpdd.intel.com/10.8.1.6-2016-08-08-17:12:27/
          
          jsalians_intel John Salinas (Inactive) added a comment - - edited With the patch I see: kernel:LustreError: 137570:0:(osc_cache.c:661:osc_extent_find()) ASSERTION( (max_pages & ~chunk_mask) == 0 ) failed: max_pages: 0x100 chunk_bits: 24 chunk_mask: 0xfffffffffffff000 kernel:LustreError: 137570:0:(osc_cache.c:661:osc_extent_find()) LBUG This is with default settings (no 16MB RPCs set): client [root@wolf-6 ~]# cat /proc/fs/lustre/osc/lsdraid-OST0000-osc-ffff88084eccc800/max_pages_per_rpc 256 OSS: [root@wolf-3 kernel]# lctl get_param obdfilter.lsdraid-OST0000.brw_size obdfilter.lsdraid-OST0000.brw_size=1 [root@wolf-4 ~]# lctl get_param obdfilter.lsdraid-OST0000.brw_size obdfilter.lsdraid-OST0000.brw_size=1 [413569.867436] Lustre: Mounted lsdraid-client [414679.351734] LustreError: 137570:0:(osc_cache.c:661:osc_extent_find()) ASSERTION( (max_pages & ~chunk_mask) == 0 ) failed: max_pages: 0x100 chunk_bits: 24 chunk_mask: 0xfffffffffffff000 [414679.373266] LustreError: 137570:0:(osc_cache.c:661:osc_extent_find()) LBUG [414679.382571] Pid: 137570, comm: mdtest [414679.388269] Call Trace: [414679.395587] [<ffffffffa071d7d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs] [414679.404872] [<ffffffffa071dd75>] lbug_with_loc+0x45/0xc0 [libcfs] [414679.413187] [<ffffffffa0e31677>] osc_extent_find+0x2f6/0x1aa2 [osc] [414679.421671] [<ffffffffa0e3335e>] ? osc_enter_cache+0x53b/0xe7a [osc] [414679.430159] [<ffffffff81173317>] ? __alloc_pages_nodemask+0x197/0xba0 [414679.438756] [<ffffffffa0e1780b>] ? osc_lru_alloc+0x3b/0x390 [osc] [414679.446884] [<ffffffff81285e38>] ? security_capable+0x18/0x20 [414679.454653] [<ffffffffa0e29705>] osc_queue_async_io+0x6a5/0x1870 [osc] [414679.463245] [<ffffffffa0cf9c8c>] ? lov_merge_lvb_kms+0x12c/0x450 [lov] [414679.471818] [<ffffffffa0e17b60>] ? osc_page_init+0x0/0x210 [osc] [414679.479775] [<ffffffffa0d9ba70>] ? write_commit_callback+0x0/0x50 [lustre] [414679.488686] [<ffffffffa0e165e3>] osc_page_cache_add+0x43/0x140 [osc] [414679.496969] [<ffffffffa0e1e903>] osc_io_commit_async+0x173/0x440 [osc] [414679.505470] [<ffffffffa0d9ba70>] ? write_commit_callback+0x0/0x50 [lustre] [414679.514355] [<ffffffffa0a65557>] cl_io_commit_async+0x77/0x140 [obdclass] [414679.523097] [<ffffffffa0cf754c>] lov_io_commit_async+0x2dc/0x4a0 [lov] [414679.531506] [<ffffffffa0d9ba70>] ? write_commit_callback+0x0/0x50 [lustre] [414679.540302] [<ffffffffa0d9ba70>] ? write_commit_callback+0x0/0x50 [lustre] [414679.549061] [<ffffffffa0a65557>] cl_io_commit_async+0x77/0x140 [obdclass] [414679.557714] [<ffffffffa0d9e97a>] vvp_io_write_commit+0x17a/0x8d0 [lustre] [414679.566324] [<ffffffffa0d9f39f>] vvp_io_write_start+0x2cf/0x650 [lustre] [414679.574846] [<ffffffffa0a64b65>] cl_io_start+0x65/0x130 [obdclass] [414679.582796] [<ffffffffa0a66f05>] cl_io_loop+0xa5/0x190 [obdclass] [414679.590648] [<ffffffffa0d4cbef>] ll_file_io_generic+0x67f/0xaa0 [lustre] [414679.599179] [<ffffffffa0d4d2dd>] ll_file_aio_write+0x12d/0x1f0 [lustre] [414679.607620] [<ffffffffa0d4d46e>] ll_file_write+0xce/0x1e0 [lustre] [414679.615561] [<ffffffff811dec1d>] vfs_write+0xbd/0x1e0 [414679.622248] [<ffffffff811df6bf>] SyS_write+0x7f/0xe0 [414679.628843] [<ffffffff81646889>] system_call_fastpath+0x16/0x1b [414679.636498] [414679.639249] Kernel panic - not syncing: LBUG [414679.644949] CPU: 15 PID: 137570 Comm: mdtest Tainted: G IOE ------------ 3.10.0-327.22.2.el7.x86_64 #1 [414679.657442] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015 [414679.670035] ffffffffa073adef 00000000a1cd2686 ffff88103ee6f6f8 ffffffff816360fc [414679.679336] ffff88103ee6f778 ffffffff8162f977 ffffffff00000008 ffff88103ee6f788 [414679.688629] ffff88103ee6f728 00000000a1cd2686 ffffffffa0e373b0 0000000000000000 [414679.697931] Call Trace: [414679.701624] [<ffffffff816360fc>] dump_stack+0x19/0x1b [414679.708306] [<ffffffff8162f977>] panic+0xd8/0x1e7 [414679.714580] [<ffffffffa071dddb>] lbug_with_loc+0xab/0xc0 [libcfs] [414679.722387] [<ffffffffa0e31677>] osc_extent_find+0x2f6/0x1aa2 [osc] [414679.730387] [<ffffffffa0e3335e>] ? osc_enter_cache+0x53b/0xe7a [osc] [414679.738450] [<ffffffff81173317>] ? __alloc_pages_nodemask+0x197/0xba0 [414679.746598] [<ffffffffa0e1780b>] ? osc_lru_alloc+0x3b/0x390 [osc] [414679.754346] [<ffffffff81285e38>] ? security_capable+0x18/0x20 [414679.761704] [<ffffffffa0e29705>] osc_queue_async_io+0x6a5/0x1870 [osc] [414679.769940] [<ffffffffa0cf9c8c>] ? lov_merge_lvb_kms+0x12c/0x450 [lov] [414679.778188] [<ffffffffa0e17b60>] ? osc_lru_alloc+0x390/0x390 [osc] [414679.786036] [<ffffffffa0d9ba70>] ? mkwrite_commit_callback+0x20/0x20 [lustre] [414679.794960] [<ffffffffa0e165e3>] osc_page_cache_add+0x43/0x140 [osc] [414679.803203] [<ffffffffa0e1e903>] osc_io_commit_async+0x173/0x440 [osc] [414679.811621] [<ffffffffa0d9ba70>] ? mkwrite_commit_callback+0x20/0x20 [lustre] [414679.820732] [<ffffffffa0a65557>] cl_io_commit_async+0x77/0x140 [obdclass] [414679.829440] [<ffffffffa0cf754c>] lov_io_commit_async+0x2dc/0x4a0 [lov] [414679.837867] [<ffffffffa0d9ba70>] ? mkwrite_commit_callback+0x20/0x20 [lustre] [414679.846976] [<ffffffffa0d9ba70>] ? mkwrite_commit_callback+0x20/0x20 [lustre] [414679.856076] [<ffffffffa0a65557>] cl_io_commit_async+0x77/0x140 [obdclass] [414679.864791] [<ffffffffa0d9e97a>] vvp_io_write_commit+0x17a/0x8d0 [lustre] [414679.873498] [<ffffffffa0d9f39f>] vvp_io_write_start+0x2cf/0x650 [lustre] [414679.882106] [<ffffffffa0a64b65>] cl_io_start+0x65/0x130 [obdclass] [414679.890136] [<ffffffffa0a66f05>] cl_io_loop+0xa5/0x190 [obdclass] [414679.898055] [<ffffffffa0d4cbef>] ll_file_io_generic+0x67f/0xaa0 [lustre] [414679.906650] [<ffffffffa0d4d2dd>] ll_file_aio_write+0x12d/0x1f0 [lustre] [414679.915158] [<ffffffffa0d4d46e>] ll_file_write+0xce/0x1e0 [lustre] [414679.923158] [<ffffffff811dec1d>] vfs_write+0xbd/0x1e0 [414679.929946] [<ffffffff811df6bf>] SyS_write+0x7f/0xe0 [414679.936584] [<ffffffff81646889>] system_call_fastpath+0x16/0x1b /scratch/dumps/wolf-6.wolf.hpdd.intel.com/10.8.1.6-2016-08-08-17:12:27/

          the issue is that the brw_size is 16M on the server side but the max_pages_per_rpc is set to 1024, which means 4M, on the client side

          there is a defect to handle chunk size on the client side, which should be set to the maximum of “size reported from OST” and “max_pages_per_rpc”

          jay Jinshan Xiong (Inactive) added a comment - the issue is that the brw_size is 16M on the server side but the max_pages_per_rpc is set to 1024, which means 4M, on the client side there is a defect to handle chunk size on the client side, which should be set to the maximum of “size reported from OST” and “max_pages_per_rpc”
          pjones Peter Jones added a comment -

          Bobijam

          Could you please assist with this issue?

          John

          Were you using Andreas's diagnostic patch when you were running?

          Peter

          pjones Peter Jones added a comment - Bobijam Could you please assist with this issue? John Were you using Andreas's diagnostic patch when you were running? Peter

          People

            bobijam Zhenyu Xu
            jsalians_intel John Salinas (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: