[LU-7352] conf-sanity test_78: no space left on device Created: 28/Oct/15 Updated: 10/Apr/17 Resolved: 06/Jun/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/fe434a72-7d5b-11e5-bca9-5254006e85c2.
The sub-test test_78 failed with the following error:
1048576 bytes (1.0 MB) copied, 0.140863 s, 7.4 MB/s
dd: opening `/mnt/lustre/d78.conf-sanity/f78.conf-sanity-99': No space left on device
conf-sanity test_78: @@@@@@ FAIL: (4) create /mnt/lustre/d78.conf-sanity/f78.conf-sanity-99 failed
Please provide additional information about the failure here.
Info required for matching: conf-sanity 78 |
| Comments |
| Comment by Andreas Dilger [ 28/Oct/15 ] |
|
This may relate to |
| Comment by James Nunez (Inactive) [ 29/Oct/15 ] |
|
More failures on master: |
| Comment by Jian Yu [ 08/Apr/16 ] |
|
The same failure still occurred on master branch: |
| Comment by Bob Glossman (Inactive) [ 23/Apr/16 ] |
|
another on master: |
| Comment by Emoly Liu [ 25/Apr/16 ] |
|
Another failure on master: |
| Comment by Gerrit Updater [ 13/May/16 ] |
|
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/20166 |
| Comment by Andreas Dilger [ 13/May/16 ] |
|
Looking at the client debug log from a recent failure it appears that the ENOSPC is coming from the MDS, possibly if there are no precreated objects on the OSTs yet since the filesystem was just formatted?
(mdc_locks.c:536:mdc_finish_enqueue()) Process entered
(mdc_locks.c:590:mdc_finish_enqueue()) @@@ op: 3 disposition: 17, status: -28 req@ffff88004e7ca400 x1534075851521040/t0(0) o101->lustre-MDT0000-mdc-ffff88005ac84800@10.9.4.123@tcp:12/10 lens 888/544 e 0 to 0 dl 1463009294 ref 1 fl Complete:R/0/0 rc 301/301
(mdc_locks.c:719:mdc_finish_enqueue()) Process leaving (rc=0 : 0 : 0)
(mdc_locks.c:980:mdc_finish_intent_lock()) D_IT dentry f78.conf-sanity-97 intent: open|creat status -28 disp 17 rc 0
The MDS log seems to agree with that to some extent, but it is getting 0 available blocks back from the OST, so it is generating the ENOSPC locally:
(osp_precreate.c:129:osp_statfs_interpret()) Process entered
(osp_precreate.c:969:osp_pre_update_status()) lustre-OST0000-osc-MDT0000: status: 36034 blocks, 9261 free, 26 used, 0 avail -> -28: rc = 0
(osp_precreate.c:972:osp_pre_update_status()) non-committed changes: 0, in progress: 0
(osp_precreate.c:151:osp_statfs_interpret()) updated statfs ffff88004f038800
(osp_precreate.c:153:osp_statfs_interpret()) Process leaving (rc=0 : 0 : 0)
The OST was formatted with 45000 blocks = 175MB for the test:
Permanent disk data:
Target: lustre:OST0000
Index: 0
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x62
(OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.9.4.123@tcp sys.timeout=20
device size = 12472MB
formatting backing filesystem ldiskfs on /dev/lvm-Role_OSS/P1
target name lustre:OST0000
4k blocks 45000
options -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:OST0000 -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F /dev/lvm-Role_OSS/P1 45000
but there are 7 OSTs ~= 1225MB so it is strange that there isn't enough space available on this one OST. Hopefully the extra debugging patch will show what is happening. |
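The sizing figures and the status line in the comment above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the statfs counts are in 4 KiB units like the mkfs block count, and the `precreate_status` rule (threshold = used blocks shifted right by 10) is an assumption that is not stated anywhere in this ticket, although it does reproduce the "26 used" figure from the log. All function names are hypothetical.

```python
# Sizing arithmetic for the figures quoted in the comment above.
BLOCK_SIZE = 4096        # bytes, from "mke2fs -j -b 4096"
FORMAT_BLOCKS = 45000    # from "4k blocks 45000"
OST_COUNT = 7            # OST count mentioned in the comment
ENOSPC = 28


def blocks_to_mib(blocks: int, block_size: int = BLOCK_SIZE) -> float:
    """Convert a block count to MiB."""
    return blocks * block_size / (1 << 20)


def precreate_status(blocks: int, bfree: int, bavail: int) -> int:
    """Hypothetical model of the status decision seen in the MDS log.

    ASSUMPTION: the reserve threshold is (blocks - bfree) >> 10 and the
    target is marked -ENOSPC when bavail drops below it.
    """
    used = (blocks - bfree) >> 10
    return -ENOSPC if bavail < used else 0


ost_mib = blocks_to_mib(FORMAT_BLOCKS)      # size of one OST as formatted
total_mib = OST_COUNT * int(ost_mib)        # combined size of 7 such OSTs
free_mib = blocks_to_mib(9261)              # "9261 free" from the statfs line
status = precreate_status(36034, 9261, 0)   # values from the statfs line

print(f"one OST: {ost_mib:.2f} MiB, {OST_COUNT} OSTs: {total_mib} MiB")
print(f"free on OST0000: {free_mib:.2f} MiB, status: {status}")
```

Under these assumptions the OST still has roughly a quarter of its blocks free, yet is reported as full because none of them are available to new allocations.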
| Comment by Andreas Dilger [ 19/May/16 ] |
|
The actual problem here is that there is only a single OST in the filesystem, and it is very close to running out of space. When the MDT tries to precreate objects it sees bavail = 0 and skips that OST, which is the only one. It does not take into account that the client may already hold reserved space (grant) on the OST that it wants to use for writing a new file. That said, if the MDT checked bfree instead, it might continue to create files on that OST even though the client does not have any grant there.
I thought many years ago that the client should be able to send a bitmap of "preferred" OSTs and/or "forbidden" OSTs during create RPCs to the MDS. This would allow the client to select OSTs that are closer to itself by network topology, and to skip OSTs that it does not have a connection to or any grant for. |
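The trade-off described in this comment can be illustrated with a small model. This is a sketch only and is not Lustre code: `OstStatfs`, `candidate_osts`, `create_file`, and the two mask parameters are hypothetical names, and the bitmap handling is just one possible reading of the "preferred"/"forbidden" idea.

```python
from dataclasses import dataclass
from typing import List

ENOSPC = 28


@dataclass
class OstStatfs:
    index: int    # OST index, also used as the bit position in the masks
    bfree: int    # free blocks, including space reserved or granted
    bavail: int   # blocks available for new allocations


def candidate_osts(osts: List[OstStatfs], use_bfree: bool = False,
                   preferred_mask: int = 0,
                   forbidden_mask: int = 0) -> List[int]:
    """Return usable OST indices, preferred ones first (hypothetical policy)."""
    usable = []
    for ost in osts:
        bit = 1 << ost.index
        if forbidden_mask & bit:
            continue                      # client asked to avoid this OST
        space = ost.bfree if use_bfree else ost.bavail
        if space == 0:
            continue                      # treated as full, so skipped
        usable.append(ost)
    # Stable sort: OSTs named in preferred_mask move to the front.
    usable.sort(key=lambda o: 0 if preferred_mask & (1 << o.index) else 1)
    return [o.index for o in usable]


def create_file(osts: List[OstStatfs], **policy) -> int:
    """Return the chosen OST index, or -ENOSPC if every OST was skipped."""
    picks = candidate_osts(osts, **policy)
    return picks[0] if picks else -ENOSPC


# The single OST from the MDS log: 9261 free blocks, 0 available.
single = [OstStatfs(index=0, bfree=9261, bavail=0)]
print(create_file(single))                  # bavail check skips the only OST
print(create_file(single, use_bfree=True))  # bfree check still selects it
```

With the bavail check the only OST is skipped and the create fails; with a bfree check the create succeeds, which is exactly the case where a client without grant could later be unable to write.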
| Comment by Bob Glossman (Inactive) [ 24/May/16 ] |
|
another on master: |
| Comment by Bob Glossman (Inactive) [ 28/May/16 ] |
|
another on master: |
| Comment by Wang Shilong (Inactive) [ 28/May/16 ] |
|
Let's think about this problem some more.
Now lod_qos_dev_is_full() always checks using bavail, which will except
So if we skip granted space here and just use bfree blocks to check if we can
Another thing that I noticed is that it now has the following comments: |
| Comment by Wang Shilong (Inactive) [ 28/May/16 ] |
|
Hmm, one point is that the MDS is trying to balance OST object allocation between different
Whichever way is chosen, users will still fail to create empty files when all OST |
| Comment by Gerrit Updater [ 30/May/16 ] |
|
Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/20166/ |
| Comment by Gerrit Updater [ 01/Jun/16 ] |
|
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/20558 |
| Comment by Gerrit Updater [ 04/Jun/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20558/ |