Lustre / LU-19192

Attempt to migrate an overstriped file crashes Lustre

Details

    • Bug
    • Resolution: Unresolved
    • Medium

    Description

      Here is an attempt to reproduce the issue:

      [root@rocky94 tests]# mkdir /mnt/lustre/foodir
      [root@rocky94 tests]# ../utils/lfs setstripe  --overstripe-count 2000 /mnt/lustre/foodir/
      [root@rocky94 tests]# touch /mnt/lustre/foodir/file1
      [root@rocky94 tests]# ../utils/lfs migrate -m 1 /mnt/lustre/foodir
      lt-lfs migrate: /mnt/lustre/foodir/file1 migrate failed: Read-only file system (30)
      [root@rocky94 tests]# dmesg | tail
      [22550.004927] LustreError: 99642:0:(osd_io.c:1976:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30
      [22550.005730] LustreError: 99642:0:(tgt_lastrcvd.c:1293:tgt_add_reply_data()) lustre-MDT0000: can't update reply_data file: rc = -30
      [22550.006587] LustreError: 99642:0:(osd_handler.c:2238:osd_trans_stop()) lustre-MDT0000: failed in transaction hook: rc = -30
      [22550.007708] LDISKFS-fs error (device dm-0) in osd_trans_stop:2246: error 28
      [22550.007903] LustreError: 98821:0:(osd_handler.c:1905:osd_trans_commit_cb()) transaction @0xffff96b319f4e400 commit error: 2
      [22550.012283] LustreError: 99642:0:(osd_handler.c:2248:osd_trans_stop()) lustre-MDT0000: failed to stop transaction: rc = -28
      [22550.013511] LustreError: 99642:0:(update_trans.c:1010:top_trans_stop()) lustre-MDT0000-osd: stop trans failed: rc = -30
      [22550.073556] LustreError: 99642:0:(mdt_reint.c:2469:mdt_reint_migrate()) lustre-MDT0000: migrate [0x240000402:0x1:0x0]/file1 failed: rc = -30
      [22553.739958] LustreError: 99134:0:(llog_cat.c:779:llog_cat_cancel_arr_rec()) lustre-MDT0000-osp-MDT0001: fail to cancel 1 llog-records: rc = -116
      [22553.742417] LustreError: 99134:0:(llog_cat.c:815:llog_cat_cancel_records()) lustre-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog-records: rc = -116
      [root@rocky94 tests]# 
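
The return codes in the dmesg output above are negated Linux errno values. As a reading aid, here is a minimal sketch that maps them to their symbolic names. The table is hard-coded with the Linux values for the three codes that occur in this log, and the helper name `decode_rc` and the regular expression are illustrative assumptions, not part of any Lustre tooling.

```python
import re

# Linux errno values for the codes that appear in the log above.
# Hard-coded so the result does not depend on the host platform.
LINUX_ERRNO = {
    28: "ENOSPC",   # No space left on device
    30: "EROFS",    # Read-only file system
    116: "ESTALE",  # Stale file handle
}

# Matches the three spellings used in the log: "rc = -N",
# "returned error -N" and "errcode -N".
RC_PATTERN = re.compile(r"(?:rc = |returned error |errcode )-(\d+)")


def decode_rc(line):
    """Return (code, name) for the first negative return code in a log line.

    Returns None when the line carries no recognisable return code.
    """
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, LINUX_ERRNO.get(code, "UNKNOWN")


# Message fragments copied from the log above.
samples = [
    "journal_get_write_access() returned error -30",
    "lustre-MDT0000: failed to stop transaction: rc = -28",
    "fail to cancel 1 llog-records: rc = -116",
]
decoded = [decode_rc(s) for s in samples]
for sample, result in zip(samples, decoded):
    print(result, "<-", sample)
```

Read this way, the log shows the MDT hitting ENOSPC in `osd_trans_stop()` and every later journal access failing with EROFS, which matches the "Read-only file system (30)" error that `lfs migrate` reports to the user.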
      

      The kernel messages are:

      [22549.953537] ------------[ cut here ]------------
      [22549.953819] WARNING: CPU: 1 PID: 99642 at /mnt/nfs/git/lustre-release/ldiskfs/ext4_jbd2.c:357 __ldiskfs_handle_dirty_metadata+0xf1/0x180 [ldiskfs]
      [22549.954759] Modules linked in: lustre(OE) osp(OE) ofd(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey tls loop jbd2 mbcache rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill intel_rapl_msr intel_rapl_common kvm_amd ccp kvm virtio_gpu virtio_dma_buf drm_shmem_helper drm_kms_helper iTCO_wdt iTCO_vendor_support syscopyarea sysfillrect irqbypass vfat i2c_i801 sysimgblt pcspkr lpc_ich i2c_smbus virtio_balloon fat fb_sys_fops joydev drm fuse xfs libcrc32c sr_mod cdrom sg crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel libata virtio_console virtio_blk virtio_net net_failover failover serio_raw sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
      [22549.959408] CPU: 1 PID: 99642 Comm: mdt00_004 Kdump: loaded Tainted: G        W  OE     -------  ---  5.14.0-427.42.1.x7.2.000.103.x86_64 #1
      [22549.960143] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
      [22549.960612] RIP: 0010:__ldiskfs_handle_dirty_metadata+0xf1/0x180 [ldiskfs]
      [22549.961050] Code: 44 89 fa 4c 89 f6 49 c7 c1 3a 43 9d c1 4c 89 e7 41 bd fb ff ff ff e8 7e 69 06 00 eb 95 48 89 ef 45 31 ed e8 f1 30 d3 f1 eb 88 <0f> 0b 48 c7 c2 c0 87 9c c1 45 89 e8 48 89 d9 44 89 fe 4c 89 f7 e8
      [22549.962129] RSP: 0018:ffffabb0871cb6f0 EFLAGS: 00010286
      [22549.962438] RAX: ffff96b30783c800 RBX: ffff96b42fe5faf0 RCX: 00000000000003fc
      [22549.962855] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff96b2f79a4440
      [22549.963280] RBP: ffff96b2f79a5f08 R08: ffff96b2f79a5f08 R09: ffff96b2c9b06000
      [22549.963697] R10: ffff96b43bcb9c90 R11: 0000000000000028 R12: ffff96b3071636e0
      [22549.964133] R13: 00000000ffffffe4 R14: ffffffffc19cb990 R15: 000000000000038c
      [22549.964551] FS:  0000000000000000(0000) GS:ffff96b43bc80000(0000) knlGS:0000000000000000
      [22549.965033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [22549.965372] CR2: 00007f07e11a2584 CR3: 0000000104cac000 CR4: 0000000000750ee0
      [22549.965789] PKRU: 55555554
      [22549.965963] Call Trace:
      [22549.966128]  <TASK>
      [22549.966261]  ? srso_alias_return_thunk+0x5/0xfbef5
      [22549.966569]  ? show_trace_log_lvl+0x26e/0x2df
      [22549.966844]  ? show_trace_log_lvl+0x26e/0x2df
      [22549.967117]  ? ldiskfs_getblk+0x16b/0x210 [ldiskfs]
      [22549.967431]  ? __ldiskfs_handle_dirty_metadata+0xf1/0x180 [ldiskfs]
      [22549.967824]  ? __warn+0x81/0x110
      [22549.968054]  ? __ldiskfs_handle_dirty_metadata+0xf1/0x180 [ldiskfs]
      [22549.968440]  ? report_bug+0x10a/0x140
      [22549.968674]  ? handle_bug+0x3c/0x70
      [22549.968901]  ? exc_invalid_op+0x14/0x70
      [22549.969136]  ? asm_exc_invalid_op+0x16/0x20
      [22549.969388]  ? __ldiskfs_handle_dirty_metadata+0xf1/0x180 [ldiskfs]
      [22549.969774]  ldiskfs_getblk+0x16b/0x210 [ldiskfs]
      [22549.970082]  ldiskfs_bread+0xc/0x70 [ldiskfs]
      [22549.970360]  osd_ldiskfs_write_record+0x4c9/0x6a0 [osd_ldiskfs]
      [22549.970773]  osd_write+0xf0/0x410 [osd_ldiskfs]
      [22549.971063]  ? srso_alias_return_thunk+0x5/0xfbef5
      [22549.971351]  dt_record_write+0x32/0x110 [obdclass]
      [22549.971865]  llog_osd_write_rec+0xd63/0x1c90 [obdclass]
      [22549.972224]  llog_write_rec+0x43f/0x590 [obdclass]
      [22549.972545]  llog_cat_add_rec+0xab/0x560 [obdclass]
      [22549.972863]  llog_add+0x1a1/0x200 [obdclass]
      [22549.973164]  sub_updates_write+0x97c/0x1340 [ptlrpc]
      [22549.973830]  ? top_trans_stop+0x2f3/0xd30 [ptlrpc]
      [22549.974177]  top_trans_stop+0x2f3/0xd30 [ptlrpc]
      [22549.974505]  mdd_trans_stop+0x27/0x16c [mdd]
      [22549.974812]  mdd_migrate_object+0x5ca/0xfa0 [mdd]
      [22549.975124]  ? mdd_migrate+0x14/0x20 [mdd]
      [22549.975379]  mdd_migrate+0x14/0x20 [mdd]
      [22549.975623]  mdt_reint_migrate+0x1207/0x1ba0 [mdt]
      [22549.976035]  mdt_reint_rec+0x11c/0x270 [mdt]
      [22549.976309]  mdt_reint_internal+0x501/0x980 [mdt]
      [22549.976606]  mdt_reint+0x59/0x110 [mdt]
      [22549.976853]  tgt_handle_request0+0x2ee/0x720 [ptlrpc]
      [22549.977231]  tgt_request_handle+0x255/0xb70 [ptlrpc]
      [22549.977578]  ptlrpc_server_handle_request.isra.0+0x21c/0xbc0 [ptlrpc]
      [22549.978028]  ? srso_alias_return_thunk+0x5/0xfbef5
      [22549.978315]  ptlrpc_main+0xa77/0xfa0 [ptlrpc]
      [22549.978624]  ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
      [22549.978970]  kthread+0xe0/0x100
      [22549.979185]  ? __pfx_kthread+0x10/0x10
      [22549.979410]  ret_from_fork+0x2c/0x50
      [22549.979641]  </TASK>
      [22549.979779] ---[ end trace e10715f26dba9e05 ]---
      [22549.980066] LDISKFS-fs: ldiskfs_getblk:908: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
      [22549.980641] LDISKFS-fs error (device dm-0): ldiskfs_getblk:908: inode #166: block 34083: comm mdt00_004: journal_dirty_metadata failed: handle type 0 started at line 2133, credits 516/0, errcode -28
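
The final LDISKFS-fs error line packs several fields into one message. Below is a minimal sketch for pulling them apart. It assumes the `credits A/B` pair means credits requested versus credits remaining on the journal handle, which is an interpretation based on the upstream ext4 message format and is not stated in this report. The function name and dictionary keys are hypothetical.

```python
import re

# Last line of the kernel trace quoted above, copied verbatim
# (timestamp omitted).
LINE = (
    "LDISKFS-fs error (device dm-0): ldiskfs_getblk:908: inode #166: "
    "block 34083: comm mdt00_004: journal_dirty_metadata failed: "
    "handle type 0 started at line 2133, credits 516/0, errcode -28"
)

FIELDS = re.compile(
    r"inode #(?P<inode>\d+): block (?P<block>\d+): comm (?P<comm>\S+): "
    r".*started at line (?P<start_line>\d+), "
    r"credits (?P<requested>\d+)/(?P<remaining>\d+), "
    r"errcode (?P<errcode>-?\d+)"
)


def parse_dirty_metadata_error(line):
    """Extract the numeric fields from a journal_dirty_metadata failure line.

    Returns a dict, or None if the line does not have the expected shape.
    The "requested"/"remaining" naming is an assumption (see lead-in).
    """
    match = FIELDS.search(line)
    if match is None:
        return None
    return {
        key: (value if key == "comm" else int(value))
        for key, value in match.groupdict().items()
    }


info = parse_dirty_metadata_error(LINE)
print(info)
```

Under that reading, `remaining == 0` together with `errcode -28` would point at an exhausted transaction handle rather than a full device. This should be confirmed against the ldiskfs source before being relied on.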
      

      Depending on the ldiskfs mount options, the failure either causes the filesystem to be remounted read-only or crashes the system.

            People

              wc-triage WC Triage
              zam Alexander Zarochentsev
              Votes: 0
              Watchers: 8
