Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-7970

intermittent ENOSPC on osd_write_commit

    XMLWordPrintable

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.8.0
    • None
    • Centos7.2 3.10.0-327.3.1.el7_lustre.x86_64
      lustre 2.8.0
      zfs 0.6.5.5
      OPA 10.0.0.0.697

      zfs mirrored mgt/mdt
      zfs raidz2 9+2 ost 16 ost/oss 2 OSS's total
    • 3
    • 9223372036854775807

    Description

      Intermittent enospc error. Reproduced it today and grabbed debug info.
      All OSTs have ~28TB free
      It was happening over and over on 1 client with rsync of many small files, but not seen on a second client doing the same rsync to a seperate dir. Shortly after I grabbed the debug info, it no longer happened.
      No errors found in system logs.

      server debug, full debug attached.

      00080000:00000020:5.0:1459450158.935617:0:6537:0:(osd_handler.c:601:osd_sync()) synced OSD osd-zfs
      00002000:00000020:5.0:1459450158.935624:0:6537:0:(ofd_grant.c:280:ofd_grant_statfs()) pylon1-OST0017: cli 278722d8-f96f-0b0a-2cb5-fc84038e8f72/ffff880b43f8e000 free: 31335004504064 avail: 31335002406912
      00002000:00000020:5.0:1459450158.935627:0:6537:0:(ofd_grant.c:348:ofd_grant_space_left()) pylon1-OST0017: cli 278722d8-f96f-0b0a-2cb5-fc84038e8f72/ffff880b43f8e000 avail 31335002406912 left 0 unstable 0 tot_grant 25092482826240 pending 0
      00002000:00000020:5.0:1459450158.935631:0:6537:0:(ofd_grant.c:392:ofd_grant_incoming()) pylon1-OST0017: cli 278722d8-f96f-0b0a-2cb5-fc84038e8f72/ffff880b43f8e000 reports grant 49152 dropped 0, local 12582912
      00002000:00000020:5.0:1459450158.935636:0:6537:0:(ofd_grant.c:634:ofd_grant_check()) pylon1-OST0017: cli 278722d8-f96f-0b0a-2cb5-fc84038e8f72/ffff880b43f8e000 idx 0 no space for 1048576
      00002000:00000020:5.0:1459450158.935638:0:6537:0:(ofd_grant.c:655:ofd_grant_check()) pylon1-OST0017: cli 278722d8-f96f-0b0a-2cb5-fc84038e8f72/ffff880b43f8e000 granted: 0 ungranted: 0 grant: 12582912 dirty: 0
      00000100:00000010:5.0:1459450158.935658:0:6537:0:(client.c:134:ptlrpc_new_bulk()) kmalloced 'desc': 240 at ffff88101e6d4f00.
      00000020:00000040:5.0:1459450158.935659:0:6537:0:(genops.c:807:class_export_get()) GETting export ffff880b43f8e000 : new refcount 1260
      00000400:00000010:5.0:1459450158.935665:0:6537:0:(lib-lnet.h:218:lnet_md_alloc()) alloc '(md)': 176 at ffff880f583c58c0 (tot 256070564).
      00000400:00000010:5.0:1459450158.935668:0:6537:0:(lib-lnet.h:263:lnet_msg_alloc()) alloc '(msg)': 352 at ffff880b36f19800 (tot 256070916).
      00000400:00000200:5.0:1459450158.935672:0:6537:0:(lib-move.c:2412:LNetGet()) LNetGet -> 12345-10.4.112.74@o2ib
      00000800:00000200:5.0:1459450158.935676:0:6537:0:(o2iblnd_cb.c:1501:kiblnd_send()) sending 0 bytes in 0 frags to 12345-10.4.112.74@o2ib
      00000800:00000200:5.0:1459450158.935679:0:6537:0:(o2iblnd_cb.c:730:kiblnd_setup_rd_kiov()) niov 4 offset 0 nob 16384
      00000400:00000010:5.0:1459450158.935682:0:6537:0:(lib-lnet.h:263:lnet_msg_alloc()) alloc '(msg)': 352 at ffff880b36f18200 (tot 256071268).
      00000400:00000200:5.0:1459450158.935685:0:6537:0:(lib-move.c:2293:lnet_create_reply_msg()) 10.4.108.51@o2ib: Reply from 12345-10.4.112.74@o2ib md ffff880f583c58c0
      00000800:00000200:5.0:1459450158.935689:0:6537:0:(o2iblnd.c:402:kiblnd_find_peer_locked()) got peer [ffff88068f575580] -> 10.4.112.74@o2ib (2) version: 12
      00000800:00000200:5.0:1459450158.935691:0:6537:0:(o2iblnd_cb.c:1384:kiblnd_launch_tx()) conn[ffff880e476dc000] (260)++
      00000800:00000200:5.0:1459450158.935693:0:6537:0:(o2iblnd_cb.c:1155:kiblnd_queue_tx_locked()) conn[ffff880e476dc000] (261)++
      00000800:00000200:5.0:1459450158.935700:0:6537:0:(o2iblnd_cb.c:1390:kiblnd_launch_tx()) conn[ffff880e476dc000] (262)--
      00000100:00000200:5.0:1459450158.935703:0:6537:0:(niobuf.c:251:ptlrpc_start_bulk_transfer()) Transferring 4 pages 16384 bytes via portal 8 id 12345-10.4.112.74@o2ib mbits 0x56eb8dfb7df38-0x56eb8dfb7df38
      00000800:00000200:12.1:1459450158.935746:0:748:0:(o2iblnd_cb.c:3448:kiblnd_cq_completion()) conn[ffff880e476dc000] (261)++
      00000800:00000200:5.0:1459450158.935754:0:43037:0:(o2iblnd_cb.c:3570:kiblnd_scheduler()) conn[ffff880e476dc000] (262)++
      00000800:00000200:5.0:1459450158.935757:0:43037:0:(o2iblnd_cb.c:1011:kiblnd_tx_complete()) conn[ffff880e476dc000] (263)++
      00000800:00000200:5.0:1459450158.935759:0:43037:0:(o2iblnd_cb.c:1020:kiblnd_tx_complete()) conn[ffff880e476dc000] (264)--
      00000800:00000200:5.0:1459450158.935760:0:43037:0:(o2iblnd_cb.c:3586:kiblnd_scheduler()) conn[ffff880e476dc000] (263)--
      00000800:00000200:5.0:1459450158.935761:0:43037:0:(o2iblnd_cb.c:3586:kiblnd_scheduler()) conn[ffff880e476dc000] (262)--
      00000800:00000200:12.1:1459450158.935811:0:748:0:(o2iblnd_cb.c:3448:kiblnd_cq_completion()) conn[ffff880e476dc000] (261)++
      00000800:00000200:28.0:1459450158.935846:0:43035:0:(o2iblnd_cb.c:3570:kiblnd_scheduler()) conn[ffff880e476dc000] (262)++
      00000800:00000200:28.0:1459450158.935855:0:43035:0:(o2iblnd_cb.c:325:kiblnd_handle_rx()) Received d7[0] from 10.4.112.74@o2ib
      00000800:00000200:5.0:1459450158.935856:0:43037:0:(o2iblnd_cb.c:3586:kiblnd_scheduler()) conn[ffff880e476dc000] (263)--
      00000800:00000200:28.0:1459450158.935858:0:43035:0:(o2iblnd_cb.c:79:kiblnd_tx_done()) conn[ffff880e476dc000] (262)--
      00000100:00000200:28.0:1459450158.935863:0:43035:0:(events.c:447:server_bulk_callback()) event type 5, status 0, desc ffff88101e6d4f00
      00000400:00000010:28.0:1459450158.935867:0:43035:0:(lib-lnet.h:273:lnet_msg_free()) kfreed 'msg': 352 at ffff880b36f19800 (tot 256070916).
      00000100:00000200:28.0:1459450158.935870:0:43035:0:(events.c:447:server_bulk_callback()) event type 3, status 0, desc ffff88101e6d4f00
      00000400:00000200:28.0:1459450158.935874:0:43035:0:(lib-md.c:73:lnet_md_unlink()) Unlinking md ffff880f583c58c0
      00000400:00000010:28.0:1459450158.935876:0:43035:0:(lib-lnet.h:240:lnet_md_free()) kfreed 'md': 176 at ffff880f583c58c0 (tot 256070740).
      00000400:00000010:28.0:1459450158.935878:0:43035:0:(lib-lnet.h:273:lnet_msg_free()) kfreed 'msg': 352 at ffff880b36f18200 (tot 256070388).
      00000800:00000200:28.0:1459450158.935879:0:43035:0:(o2iblnd_cb.c:193:kiblnd_post_rx()) conn[ffff880e476dc000] (261)++
      00000800:00000200:28.0:1459450158.935882:0:43035:0:(o2iblnd_cb.c:222:kiblnd_post_rx()) conn[ffff880e476dc000] (262)--
      00000800:00000200:28.0:1459450158.935883:0:43035:0:(o2iblnd_cb.c:3586:kiblnd_scheduler()) conn[ffff880e476dc000] (261)--
      00080000:00000010:5.0:1459450158.935896:0:6537:0:(osd_handler.c:348:osd_trans_create()) kmalloced 'oh': 384 at ffff880b36f18800.
      00080000:04000000:5.0:1459450158.935908:0:6537:0:(osd_io.c:657:osd_declare_write_commit()) writing 4 pages, reserving 0K of quota space
      00000020:00000010:5.0:1459450158.936001:0:6537:0:(lu_object.c:1714:keys_init()) kmalloced 'ctx->lc_value': 320 at ffff880b36f19600.
      00000020:00000010:5.0:1459450158.936003:0:6537:0:(update_records.c:1217:update_key_init()) kmalloced 'value': 408 at ffff880b36f19000.
      00000004:00000010:5.0:1459450158.936006:0:6537:0:(osp_dev.c:1850:osp_txn_key_init()) kmalloced 'value': 4 at ffff8807b5337b98.
      00080000:00000002:5.0:1459450158.936010:0:6537:0:(osd_io.c:706:osd_write_commit()) write 4096 bytes at 65536
      00080000:00000002:5.0:1459450158.936011:0:6537:0:(osd_io.c:714:osd_write_commit()) obj [0x100170000:0x2035b:0x0]: skipping lnb[0]: rc=-28
      00080000:00000002:5.0:1459450158.936013:0:6537:0:(osd_io.c:706:osd_write_commit()) write 4096 bytes at 69632
      00080000:00000002:5.0:1459450158.936014:0:6537:0:(osd_io.c:714:osd_write_commit()) obj [0x100170000:0x2035b:0x0]: skipping lnb[1]: rc=-28
      00080000:00000002:5.0:1459450158.936015:0:6537:0:(osd_io.c:706:osd_write_commit()) write 4096 bytes at 73728
      00080000:00000002:5.0:1459450158.936016:0:6537:0:(osd_io.c:714:osd_write_commit()) obj [0x100170000:0x2035b:0x0]: skipping lnb[2]: rc=-28
      00080000:00000002:5.0:1459450158.936017:0:6537:0:(osd_io.c:706:osd_write_commit()) write 4096 bytes at 77824
      00080000:00000002:5.0:1459450158.936018:0:6537:0:(osd_io.c:714:osd_write_commit()) obj [0x100170000:0x2035b:0x0]: skipping lnb[3]: rc=-28
      00080000:00000010:5.0:1459450158.936022:0:6537:0:(osd_object.c:993:osd_attr_set()) kmalloced 'bulk': 400 at ffff880b36f19200.
      00080000:00000010:5.0:1459450158.936062:0:6537:0:(osd_object.c:1092:osd_attr_set()) kfreed 'bulk': 400 at ffff880b36f19200.

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              bjohanso Brian Johanson
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: