Details
Type: Bug
Resolution: Fixed
Priority: Critical
Affects Version/s: Lustre 2.7.0
Environment:
Lustre 2.7.54
SPL/ZFS 0.6.4.1
single MDS with one MDT and MGS
two OSSs with one OST each
Severity: 3
Description
Running mdtest, I get failures like this:
06/18/2015 16:22:12: Process 26(zwicky29): FAILED in create_remove_items_helper, unable to create file file.mdtest.26.0 (cwd=/p/lburn/faaland1/zfs-crada/mdtest/2/#test-dir.0/mdtest_tree.26.0): No space left on device
06/18/2015 16:22:12: Process 19(zwicky22): FAILED in create_remove_items_helper, unable to create file file.mdtest.19.0 (cwd=/p/lburn/faaland1/zfs-crada/mdtest/2/#test-dir.0/mdtest_tree.19.0): No space left on device
The three servers involved use the ZFS backend, and their pools have plenty of free space; all are less than 1% full.
In the Lustre debug log on the MDS, I see:
lod_qos.c:238:lod_statfs_and_check() return -30 (-LUSTRE_EROFS)
lod_qos.c:1016:lod_alloc_rr() return -28 (-ENOSPC)
Many other functions report exiting with -28 as well:
lod_object.c:2104:lod_declare_xattr_set()
lod_object.c:3352:lod_declare_striped_object()
lod_object.c:3384:lod_declare_striped_object()
lod_object.c:3463:lod_declare_object_create()
lod_qos.c:1913:lod_qos_prep_create()
mdd_dir.c:1786:mdd_create_data()
mdd_dir.c:1807:mdd_create_data()
mdd_dir.c:1983:mdd_declare_object_create()
mdd_dir.c:2054:mdd_declare_create()
mdd_dir.c:2354:mdd_create()
mdd_object.c:352:mdd_declare_object_create_internal()
mdt_open.c:1105:mdt_open_by_fid_lock()
mdt_open.c:1255:mdt_reint_open()
mdt_open.c:1374:mdt_reint_open()
mdt_open.c:138:mdt_create_data()
mdt_open.c:347:mdt_mfd_open()
mdt_open.c:607:mdt_finish_open()
mdt_reint.c:1997:mdt_reint_rec()
I've attached a few thousand lines of debug output from the MDS, captured with both debug and debug_subsys set to -1. I can reproduce this easily, so I can collect debug output with specific subsystems turned on or off.
Issue Links
is related to: LU-6767 Capture READONLY status in osd-zfs osd_statfs() (Closed)