summary refs log tree commit diff
path: root/fs
AgeCommit message (Collapse)Author
2014-05-06new methods: ->read_iter() and ->write_iter()Al Viro
Beginning to introduce those. Just the callers for now, and it's clumsier than it'll eventually become; once we finish converting aio_read and aio_write instances, the things will get nicer. For now, these guys are in parallel to ->aio_read() and ->aio_write(); they take iocb and iov_iter, with everything in iov_iter already validated. File offset is passed in iocb->ki_pos, iov/nr_segs - in iov_iter. Main concerns in that series are stack footprint and ability to split the damn thing cleanly. [fix from Peter Ujfalusi <peter.ujfalusi@ti.com> folded] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06replace checking for ->read/->aio_read presence with check in ->f_modeAl Viro
Since we are about to introduce new methods (read_iter/write_iter), the tests in a bunch of places would have to grow inconveniently. Check once (at open() time) and store results in ->f_mode as FMODE_CAN_READ and FMODE_CAN_WRITE resp. It might end up being a temporary measure - once everything switches from ->aio_{read,write} to ->{read,write}_iter it might make sense to return to open-coded checks. We'll see... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06xfs: trim the argument lists of xfs_file_{dio,buffered}_aio_write()Al Viro
pos is redundant (it's iocb->ki_pos), and iov/nr_segs/count are taken care of by lifting iov_iter into the caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06blkdev_aio_read(): switch to generic_file_read_iter(), get rid of iov_shorten()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06iov_iter_truncate()Al Viro
Now It Can Be Done(tm) - we don't need to do iov_shorten() in generic_file_direct_write() anymore, now that all ->direct_IO() instances are converted to proper iov_iter methods and honour iter->count and iter->iov_offset properly. Get rid of count/ocount arguments of generic_file_direct_write(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06btrfs: switch check_direct_IO() to iov_iterAl Viro
... and don't open-code iov_iter_alignment() there Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06new helper: iov_iter_get_pages_alloc()Al Viro
same as iov_iter_get_pages(), except that pages array is allocated (kmalloc if possible, vmalloc if that fails) and left for caller to free. Lustre and NFS ->direct_IO() switched to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06new helper: iov_iter_npages()Al Viro
counts the pages covered by iov_iter, up to given limit. do_block_direct_io() and fuse_iter_npages() switched to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06f2fs: switch to iov_iter_alignment()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06fuse: switch to iov_iter_get_pages()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06fuse: pull iov_iter initializations upAl Viro
... to fuse_direct_{read,write}(). ->direct_IO() path uses the iov_iter passed by the caller instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06new helper: iov_iter_get_pages()Al Viro
iov_iter_get_pages(iter, pages, maxsize, &start) grabs references pinning the pages of up to maxsize of (contiguous) data from iter. Returns the amount of memory grabbed or -error. In case of success, the requested area begins at offset start in pages[0] and runs through pages[1], etc. Less than requested amount might be returned - either because the contiguous area in the beginning of iterator is smaller than requested, or because the kernel failed to pin that many pages. direct-io.c switched to using iov_iter_get_pages() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06dio: take updating ->result into do_direct_IO()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06start adding the tag to iov_iterAl Viro
For now, just use the same thing we pass to ->direct_IO() - it's all iovec-based at the moment. Pass it explicitly to iov_iter_init() and account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO() uses. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06new helper: generic_file_read_iter()Al Viro
iov_iter-using variant of generic_file_aio_read(). Some callers converted. Note that it's still not quite there for use as ->read_iter() - we depend on having zero iter->iov_offset in O_DIRECT case. Fortunately, that's true for all converted callers (and for generic_file_aio_read() itself). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06fuse_file_aio_write(): merge initializations of iov_iterAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06ceph_aio_read(): keep iov_iter across retriesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06new primitive: iov_iter_alignment()Al Viro
returns the value aligned as badly as the worst remaining segment in iov_iter is. Use instead of open-coded equivalents. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06switch {__,}blockdev_direct_IO() to iov_iterAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06get rid of pointless iov_length() in ->direct_IO()Al Viro
all callers have iov_length(iter->iov, iter->nr_segs) == iov_iter_count(iter) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06ext4: switch the guts of ->direct_IO() to iov_iterAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06convert the guts of nfs_direct_IO() to iov_iterAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06pass iov_iter to ->direct_IO()Al Viro
unmodified, for now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06kill generic_segment_checks()Al Viro
all callers of ->aio_read() and ->aio_write() have iov/nr_segs already checked - generic_segment_checks() done after that is just an odd way to spell iov_length(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06__btrfs_direct_write(): switch to iov_iterAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06generic_file_direct_write(): switch to iov_iterAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06kill iov_iter_copy_from_user()Al Viro
all callers can use copy_page_from_iter() and it actually simplifies them. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06fs/file.c: don't open-code kvfree()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "13 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: agp: info leak in agpioc_info_wrap() fs/affs/super.c: bugfix / double free fanotify: fix -EOVERFLOW with large files on 64-bit slub: use sysfs'es release mechanism for kmem_cache revert "mm: vmscan: do not swap anon pages just because free+file is low" autofs: fix lockref lookup mm: filemap: update find_get_pages_tag() to deal with shadow entries mm/compaction: make isolate_freepages start at pageblock boundary MAINTAINERS: zswap/zbud: change maintainer email address mm/page-writeback.c: fix divide by zero in pos_ratio_polynom hugetlb: ensure hugepage access is denied if hugepages are not supported slub: fix memcg_propagate_slab_attrs drivers/rtc/rtc-pcf8523.c: fix month definition
2014-05-06fs/affs/super.c: bugfix / double freeFabian Frederick
Commit 842a859db26b ("affs: use ->kill_sb() to simplify ->put_super() and failure exits of ->mount()") adds .kill_sb which frees sbi but doesn't remove sbi free in case of parse_options error causing double free+random crash. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> [3.14.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-06fanotify: fix -EOVERFLOW with large files on 64-bitWill Woods
On 64-bit systems, O_LARGEFILE is automatically added to flags inside the open() syscall (also openat(), blkdev_open(), etc). Userspace therefore defines O_LARGEFILE to be 0 - you can use it, but it's a no-op. Everything should be O_LARGEFILE by default. But: when fanotify does create_fd() it uses dentry_open(), which skips all that. And userspace can't set O_LARGEFILE in fanotify_init() because it's defined to 0. So if fanotify gets an event regarding a large file, the read() will just fail with -EOVERFLOW. This patch adds O_LARGEFILE to fanotify_init()'s event_f_flags on 64-bit systems, using the same test as open()/openat()/etc. Addresses https://bugzilla.redhat.com/show_bug.cgi?id=696821 Signed-off-by: Will Woods <wwoods@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-06autofs: fix lockref lookupIan Kent
autofs needs to be able to see private data dentry flags for its dentrys that are being created but not yet hashed and for its dentrys that have been rmdir()ed but not yet freed. It needs to do this so it can block processes in these states until a status has been returned to indicate the given operation is complete. It does this by keeping two lists, active and expring, of dentrys in this state and uses ->d_release() to keep them stable while it checks the reference count to determine if they should be used. But with the recent lockref changes dentrys being freed sometimes don't transition to a reference count of 0 before being freed so autofs can occassionally use a dentry that is invalid which can lead to a panic. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-06hugetlb: ensure hugepage access is denied if hugepages are not supportedNishanth Aravamudan
Currently, I am seeing the following when I `mount -t hugetlbfs /none /dev/hugetlbfs`, and then simply do a `ls /dev/hugetlbfs`. I think it's related to the fact that hugetlbfs is properly not correctly setting itself up in this state?: Unable to handle kernel paging request for data at address 0x00000031 Faulting instruction address: 0xc000000000245710 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 NUMA pSeries .... In KVM guests on Power, in a guest not backed by hugepages, we see the following: AnonHugePages: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 64 kB HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages are not supported at boot-time, but this is only checked in hugetlb_init(). Extract the check to a helper function, and use it in a few relevant places. This does make hugetlbfs not supported (not registered at all) in this environment. I believe this is fine, as there are no valid hugepages and that won't change at runtime. [akpm@linux-foundation.org: use pr_info(), per Mel] [akpm@linux-foundation.org: fix build when HPAGE_SHIFT is undefined] Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "dcache fixes + kvfree() (uninlined, exported by mm/util.c) + posix_acl bugfix from hch" The dcache fixes are for a subtle LRU list corruption bug reported by Miklos Szeredi, where people inside IBM saw list corruptions with the LTP/host01 test. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nick kvfree() from apparmor posix_acl: handle NULL ACL in posix_acl_equiv_mode dcache: don't need rcu in shrink_dentry_list() more graceful recovery in umount_collect() don't remove from shrink list in select_collect() dentry_kill(): don't try to remove from shrink list expand the call of dentry_lru_del() in dentry_kill() new helper: dentry_free() fold try_prune_one_dentry() fold d_kill() and d_free() fix races between __d_instantiate() and checks of dentry flags
2014-05-06posix_acl: handle NULL ACL in posix_acl_equiv_modeChristoph Hellwig
Various filesystems don't bother checking for a NULL ACL in posix_acl_equiv_mode, and thus can dereference a NULL pointer when it gets passed one. This usually happens from the NFS server, as the ACL tools never pass a NULL ACL, but instead of one representing the mode bits. Instead of adding boilerplat to all filesystems put this check into one place, which will allow us to remove the check from other filesystems as well later on. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Ben Greear <greearb@candelatech.com> Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de>, Cc: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "This adds ctime update in the new cached writeback mode and also fixes/simplifies the mtime update handling. Support for rename flags (aka renameat2) is also added to the userspace API" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: add renameat2 support fuse: clear MS_I_VERSION fuse: clear FUSE_I_CTIME_DIRTY flag on setattr fuse: trust kernel i_ctime only fuse: remove .update_time fuse: allow ctime flushing to userspace fuse: fuse: add time_gran to INIT_OUT fuse: add .write_inode fuse: clean up fsync fuse: fuse: fallocate: use file_update_time() fuse: update mtime on open(O_TRUNC) in atomic_o_trunc mode fuse: update mtime on truncate(2) fuse: do not use uninitialized i_mode fuse: fix mtime update error in fsync fuse: check fallocate mode fuse: add __exit to fuse_ctl_cleanup
2014-05-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "First, there is a critical fix for the new primary-affinity function that went into -rc1. The second batch of patches from Zheng fix a range of problems with directory fragmentation, readdir, and a few odds and ends for cephfs" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: reserve caps for file layout/lock MDS requests ceph: avoid releasing caps that are being used ceph: clear directory's completeness when creating file libceph: fix non-default values check in apply_primary_affinity() ceph: use fpos_cmp() to compare dentry positions ceph: check directory's completeness before emitting directory entry
2014-05-05UBIFS: fix remount error pathArtem Bityutskiy
Dan's "smatch" checker found out that there was a bug in the error path of the 'ubifs_remount_rw()' function. Instead of jumping to the "out" label which cleans-things up, we just returned. This patch fixes the problem. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-05-03dcache: don't need rcu in shrink_dentry_list()Miklos Szeredi
Since now the shrink list is private and nobody can free the dentry while it is on the shrink list, we can remove RCU protection from this. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-03more graceful recovery in umount_collect()Al Viro
Start with shrink_dcache_parent(), then scan what remains. First of all, BUG() is very much an overkill here; we are holding ->s_umount, and hitting BUG() means that a lot of interesting stuff will be hanging after that point (sync(2), for example). Moreover, in cases when there had been more than one leak, we'll be better off reporting all of them. And more than just the last component of pathname - %pd is there for just such uses... That was the last user of dentry_lru_del(), so kill it off... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-03don't remove from shrink list in select_collect()Al Viro
If we find something already on a shrink list, just increment data->found and do nothing else. Loops in shrink_dcache_parent() and check_submounts_and_drop() will do the right thing - everything we did put into our list will be evicted and if there had been nothing, but data->found got non-zero, well, we have somebody else shrinking those guys; just try again. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-01Merge git://git.kvack.org/~bcrl/aio-fixesLinus Torvalds
Pull aio fixes from Ben LaHaise: "The first change from Anatol fixes a regression where io_destroy() no longer waits for outstanding aios to complete. The second corrects a memory leak in an error path for vectored aio operations. Both of these bug fixes should be queued up for stable as well" * git://git.kvack.org/~bcrl/aio-fixes: aio: fix potential leak in aio_run_iocb(). aio: block io_destroy() until all context requests are completed
2014-05-01dentry_kill(): don't try to remove from shrink listAl Viro
If the victim in on the shrink list, don't remove it from there. If shrink_dentry_list() manages to remove it from the list before we are done - fine, we'll just free it as usual. If not - mark it with new flag (DCACHE_MAY_FREE) and leave it there. Eventually, shrink_dentry_list() will get to it, remove the sucker from shrink list and call dentry_kill(dentry, 0). Which is where we'll deal with freeing. Since now dentry_kill(dentry, 0) may happen after or during dentry_kill(dentry, 1), we need to recognize that (by seeing DCACHE_DENTRY_KILLED already set), unlock everything and either free the sucker (in case DCACHE_MAY_FREE has been set) or leave it for ongoing dentry_kill(dentry, 1) to deal with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-01aio: fix potential leak in aio_run_iocb().Leon Yu
iovec should be reclaimed whenever caller of rw_copy_check_uvector() returns, but it doesn't hold when failure happens right after aio_setup_vectored_rw(). Fix that in a such way to avoid hairy goto. Signed-off-by: Leon Yu <chianglungyu@gmail.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: stable@vger.kernel.org
2014-04-30expand the call of dentry_lru_del() in dentry_kill()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-30new helper: dentry_free()Al Viro
The part of old d_free() that dealt with actual freeing of dentry. Taken out of dentry_kill() into a separate function. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-30fold try_prune_one_dentry()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-30fold d_kill() and d_free()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-28ceph: reserve caps for file layout/lock MDS requestsYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-28ceph: avoid releasing caps that are being usedYan, Zheng
To avoid releasing caps that are being used, encode_inode_release() should send implemented caps to MDS. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>