[LU-6538] Client s getting "operation ost_write failed with -3." Created: 29/Apr/15 Updated: 22/Jun/16 Resolved: 22/Jun/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Haisong Cai (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sdsc | ||
| Environment: |
Linux meerkat-mds-10-1.local 2.6.32-431.29.2.el6_lustre.gb8d9077.x86_64 #1 SMP Mon Apr 27 12:55:47 PDT 2015 x86_64 x86_64 x86_64 GNU/Linux |
||
| Attachments: |
|
| Severity: | 4 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Some clients can't write to some OSTs. user@gordon-ln3 LTEST]$ cd /oasis/projects/nsf/use300/mahidhar/LTEST Client dmesg got: LustreError: 11-0: meerkat-OST0015-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx244@tcp, operation ost_write failed with -3. Server log see attachment |
| Comments |
| Comment by Rick Wagner (Inactive) [ 29/Apr/15 ] |
|
Minh, can you add an sdsc label to this ticket? |
| Comment by Zhenyu Xu [ 29/Apr/15 ] |
|
From the log, it possibly relates to osd quota operation osd_acct_index_lookup() which calls vfs_get_dqblk() int vfs_get_dqblk(struct super_block *sb, int type, qid_t id, struct if_dqblk *di) { struct dquot *dquot; dquot = dqget(sb, id, type); if (!dquot) return -ESRCH; do_get_dqblk(dquot, di); dqput(dquot); return 0; } and it cannot get reference to dquot. |
| Comment by Haisong Cai (Inactive) [ 29/Apr/15 ] |
|
Hi Zhenyu, Is this a bug or setting problem? Haisong |
| Comment by Niu Yawei (Inactive) [ 29/Apr/15 ] |
|
Hi, Haisong This looks like a setting problem. Do all the OST/MDT devices have quota feature enabled? You can run following command to verify it: "debugfs -c -R features $dev" |
| Comment by Haisong Cai (Inactive) [ 29/Apr/15 ] |
|
Hi Niu, proc and debugfs checked out in OSS and MDS. thanks, |
| Comment by Niu Yawei (Inactive) [ 29/Apr/15 ] |
|
All devices looks have quota feature enabled, then I suspect that quota wasn't enabled on mount for some OSTs. (OST0034 for instance) When did you observe such write failures (-3)? Any error happened on these OSTs before the -3 failures? Do you happen to have earlier log messages so that we can check if quota wasn't enabled successfully on mount time? Could you pick on problematic OST (OST0034 for instance), and perform umount -> mount to see if that will resolve the problem? And please also attach the log messages when umount/mount the OST. Thank you. |
| Comment by Haisong Cai (Inactive) [ 29/Apr/15 ] |
|
Niu, 1) We are mounting OST simply like this: Are there options we should specify when mounting? 2) (-3) error is happening one client side when someone write to problematic OSTs. It doesn't happen for user and either every time. The filesystem was upgraded from 1.8.7 to 2.4.3 a year ago and didn't had problem. Last Monday we upgraded from 2.4.3 to 2.5.3. Almost right after that we started getting write errors. 3) I will gather the logs when we mount OSTs last Monday and upload here 4) I will also try to umount and mount the problematic and attach log here. Haisong |
| Comment by Haisong Cai (Inactive) [ 29/Apr/15 ] |
|
Tried unmount and mount an bad OST: [root@meerkat-oss-11-4 ~]# mount -t lustre /dev/ost_sdi /meerkat/ost_sdi [root@meerkat-oss-11-4 ~]# |
| Comment by Haisong Cai (Inactive) [ 29/Apr/15 ] |
|
Last few lines from dmesg on 11-4: VFS: Error -5 occurred while creating quota. |
| Comment by Niu Yawei (Inactive) [ 29/Apr/15 ] |
|
Looks the quota accounting files are corrupted somehow, could you make sure that latest e2fsprogs are installed on the problematic OSTs? (1.42.12-wc1)
|
| Comment by Haisong Cai (Inactive) [ 29/Apr/15 ] |
|
Aggregated logs from all MDS/OSS, before and after 2.4.3 -> 2.5.3 upgrade. |
| Comment by Haisong Cai (Inactive) [ 29/Apr/15 ] |
|
On all OSTs or just the ones reporting problem? thanks, |
| Comment by Haisong Cai (Inactive) [ 29/Apr/15 ] |
|
By the way.... [root@meerkat-oss-11-4 log]# rpm -aq | grep e2fsprog |
| Comment by Haisong Cai (Inactive) [ 29/Apr/15 ] |
|
[root@meerkat-oss-11-4 log]# umount /meerkat/ost_sdi [root@meerkat-oss-11-4 log]# Testing and will let you know if it fix the problem or not. |
| Comment by Niu Yawei (Inactive) [ 29/Apr/15 ] |
|
I'm not sure if the latest e2fsprogs has been installed when you enable quota feature for the OST/MDT devices (when upgrading from 2.1 to 2.4). If it wasn't the latest version, I suggest you to run it for all MDT/OSTs. Of course, you'd choose the problematic one first to see if it can fix the problem. |
| Comment by Haisong Cai (Inactive) [ 29/Apr/15 ] |
|
Niu, A quick test on one the problematic OSTs show that fix the write problem. Haisong |
| Comment by Niu Yawei (Inactive) [ 23/Jul/15 ] |
|
Haisong, anything else need we do on this ticket? Can we close this one? |