Pileus Git - ~andy/linux/log
10 years ago locks: break delegations on any attribute modification
J. Bruce Fields [Tue, 20 Sep 2011 21:19:26 +0000 (17:19 -0400)]
locks: break delegations on any attribute modification

NFSv4 uses leases to guarantee that clients can cache metadata as well
as data.

Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago locks: break delegations on link
J. Bruce Fields [Tue, 20 Sep 2011 21:14:31 +0000 (17:14 -0400)]
locks: break delegations on link

Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago locks: break delegations on rename
J. Bruce Fields [Tue, 20 Sep 2011 20:59:58 +0000 (16:59 -0400)]
locks: break delegations on rename

Cc: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago locks: helper functions for delegation breaking
J. Bruce Fields [Tue, 28 Aug 2012 14:50:40 +0000 (07:50 -0700)]
locks: helper functions for delegation breaking

We'll need the same logic for rename and link.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago locks: break delegations on unlink
J. Bruce Fields [Tue, 20 Sep 2011 13:14:34 +0000 (09:14 -0400)]
locks: break delegations on unlink

We need to break delegations on any operation that changes the set of
links pointing to an inode.  Start with unlink.

Such operations also hold the i_mutex on a parent directory.  Breaking a
delegation may require waiting for a timeout (by default 90 seconds) in
the case of an unresponsive NFS client.  To avoid blocking all directory
operations, we therefore drop locks before waiting for the delegation.
The logic then looks like:

    acquire locks
    ...
    test for delegation; if found:
        take reference on inode
        release locks
        wait for delegation break
        drop reference on inode
        retry

It is possible this could never terminate.  (Even if we take precautions
to prevent another delegation being acquired on the same inode, we could
get a different inode on each retry.)  But this seems very unlikely.

The initial test for a delegation happens after the lock on the target
inode is acquired, but the directory inode may have been acquired
further up the call stack.  We therefore add a "struct inode **"
argument to any intervening functions, which we use to pass the inode
back up to the caller in the case it needs a delegation synchronously
broken.
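
The drop-locks-and-retry pattern described above can be modelled in user space. What follows is a toy Python sketch, not kernel code: `Inode`, `try_unlink`, `break_delegation` and `unlink_with_retry` are hypothetical names, a `threading.Lock` stands in for the parent's i_mutex, and the delegation break is simulated by clearing a flag rather than waiting on an NFS client.

```python
import threading


class Inode:
    """Toy stand-in for a kernel inode (hypothetical model)."""

    def __init__(self, delegated=False):
        self.delegated = delegated  # True while a client holds a delegation
        self.refcount = 1
        self.nlink = 1


def try_unlink(dir_lock, inode, delegated_out):
    """One unlink attempt, made with the directory lock held.

    Returns True on success.  If a delegation is found, the inode is
    handed back through delegated_out (the analogue of the
    "struct inode **" argument) with an extra reference taken.
    """
    with dir_lock:
        if inode.delegated:
            inode.refcount += 1
            delegated_out.append(inode)
            return False
        inode.nlink -= 1
        return True


def break_delegation(inode):
    """Simulated break; the real one may block until a client timeout."""
    inode.delegated = False


def unlink_with_retry(dir_lock, inode):
    """Retry until the unlink succeeds; returns the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        delegated = []
        if try_unlink(dir_lock, inode, delegated):
            return attempts
        victim = delegated.pop()
        break_delegation(victim)  # no locks are held while waiting
        victim.refcount -= 1      # drop the reference taken above
```

In this model the loop always terminates after two attempts, because nothing re-acquires the delegation or swaps the inode between retries; the kernel has no such guarantee.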

Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago namei: minor vfs_unlink cleanup
J. Bruce Fields [Tue, 28 Aug 2012 11:03:24 +0000 (07:03 -0400)]
namei: minor vfs_unlink cleanup

We'll be using dentry->d_inode in one more place.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago locks: implement delegations
J. Bruce Fields [Mon, 5 Mar 2012 18:18:59 +0000 (13:18 -0500)]
locks: implement delegations

Implement NFSv4 delegations at the vfs level using the new FL_DELEG lock
type.

Note nfsd is the only delegation user and is only using read
delegations.  Warn on any attempt to set a write delegation for now.
We'll come back to that case later.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago locks: introduce new FL_DELEG lock flag
J. Bruce Fields [Fri, 1 Jul 2011 19:18:34 +0000 (15:18 -0400)]
locks: introduce new FL_DELEG lock flag

For now FL_DELEG is just a synonym for FL_LEASE.  So this patch doesn't
change behavior.

Next we'll modify break_lease to treat FL_DELEG leases differently, to
account for the fact that NFSv4 delegations should be broken in more
situations than Windows oplocks.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago vfs: take i_mutex on renamed file
J. Bruce Fields [Mon, 5 Mar 2012 16:40:41 +0000 (11:40 -0500)]
vfs: take i_mutex on renamed file

A read delegation is used by NFSv4 as a guarantee that a client can
perform local read opens without informing the server.

The open operation takes the last component of the pathname as an
argument, thus is also a lookup operation, and giving the client the
above guarantee means informing the client before we allow anything that
would change the set of names pointing to the inode.

Therefore, we need to break delegations on rename, link, and unlink.

We also need to prevent new delegations from being acquired while one of
these operations is in progress.

We could add some completely new locking for that purpose, but it's
simpler to use the i_mutex, since that's already taken by all the
operations we care about.

The single exception is rename.  So, modify rename to take the i_mutex
on the file that is being renamed.

Also fix up lockdep and Documentation/filesystems/directory-locking to
reflect the change.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
J. Bruce Fields [Wed, 18 Apr 2012 19:21:34 +0000 (15:21 -0400)]
vfs: rename I_MUTEX_QUOTA now that it's not used for quotas

I_MUTEX_QUOTA is now just being used whenever we want to lock two
non-directories.  So the name isn't right.  I_MUTEX_NONDIR2 isn't
especially elegant but it's the best I could think of.

Also fix some outdated documentation.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago vfs: don't use PARENT/CHILD lock classes for non-directories
J. Bruce Fields [Wed, 25 Apr 2012 11:19:52 +0000 (07:19 -0400)]
vfs: don't use PARENT/CHILD lock classes for non-directories

Reserve I_MUTEX_PARENT and I_MUTEX_CHILD for locking of actual
directories.

(Also I_MUTEX_QUOTA isn't really a meaningful name for this locking
class any more; fixed in a later patch.)

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago vfs: pull ext4's double-i_mutex-locking into common code
J. Bruce Fields [Wed, 18 Apr 2012 19:16:33 +0000 (15:16 -0400)]
vfs: pull ext4's double-i_mutex-locking into common code

We want to do this elsewhere as well.

Also catch any attempts to use it for directories (where this ordering
would conflict with ancestor-first directory ordering in lock_rename).
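
Taking two mutexes safely requires every caller to acquire them in the same order. The following is a toy Python sketch of that idea, not the kernel helper: `Inode`, `lock_pair` and `unlock_pair` are hypothetical names, and ordering by `id()` is an assumption standing in for whatever fixed ordering the common code uses.

```python
import threading


class Inode:
    """Toy non-directory inode with its own mutex (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.mutex = threading.Lock()


def lock_pair(a, b):
    """Lock both inodes in one fixed global order (assumed: by id()).

    Returns the inodes in the order they were locked.
    """
    first, second = sorted((a, b), key=id)
    first.mutex.acquire()
    if second is not first:  # the same inode passed twice is locked once
        second.mutex.acquire()
    return first, second


def unlock_pair(a, b):
    """Release the locks taken by lock_pair()."""
    a.mutex.release()
    if b is not a:
        b.mutex.release()
```

Because the order depends only on the two objects and not on argument position, two threads calling `lock_pair(x, y)` and `lock_pair(y, x)` cannot deadlock against each other. As the message notes, such an ordering is unrelated to directory ancestry, which is why it must not be mixed with directory locking.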

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago exportfs: fix quadratic behavior in filehandle lookup
J. Bruce Fields [Fri, 18 Oct 2013 01:34:21 +0000 (21:34 -0400)]
exportfs: fix quadratic behavior in filehandle lookup

Suppose we're given the filehandle for a directory whose closest
ancestor in the dcache is its Nth ancestor.

The main loop in reconnect_path searches for an IS_ROOT ancestor of
target_dir, reconnects that ancestor to its parent, then recommences the
search for an IS_ROOT ancestor from target_dir.

This behavior is quadratic in N.  And there's really no need to restart
the search from target_dir each time: once a directory has been looked
up, it won't become IS_ROOT again.  So instead of starting from
target_dir each time, we can continue where we left off.

This simplifies the code and improves performance on very deep directory
hierarchies.  (I can't think of any reason anyone should need hierarchies
a hundred or more deep, but the performance improvement may be valuable
if only to limit damage in case of abuse.)
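
The difference in cost can be seen in a small model. This is a toy Python sketch under simplifying assumptions, not the reconnect_path code: directories are numbered 0 (the target) up to `depth` (the closest ancestor already in the dcache), a set records which dentries have been reconnected to their parent, and `reconnect_steps` is a hypothetical name that only counts parent-pointer walks.

```python
def reconnect_steps(depth, restart):
    """Count parent-walk steps needed to reconnect a chain of `depth` dirs.

    restart=True models restarting the search from the target after each
    reconnection; restart=False models continuing from where we left off.
    """
    linked = set()  # dentries already reconnected to their parent
    steps = 0
    cur = 0         # start at the target directory
    while True:
        # Walk up through dentries that are no longer IS_ROOT.
        while cur in linked:
            cur += 1
            steps += 1
        if cur == depth:  # reached the ancestor that was already cached
            return steps
        linked.add(cur)   # "reconnect" this IS_ROOT dentry to its parent
        if restart:
            cur = 0       # old behavior: search again from the target
```

For a chain of depth N the restarting variant walks 1 + 2 + ... + N steps, while the continuing variant walks N.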

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago exportfs: better variable name
J. Bruce Fields [Fri, 18 Oct 2013 01:42:35 +0000 (21:42 -0400)]
exportfs: better variable name

Replace another unhelpful acronym.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago exportfs: move most of reconnect_path to helper function
J. Bruce Fields [Thu, 17 Oct 2013 15:13:00 +0000 (11:13 -0400)]
exportfs: move most of reconnect_path to helper function

Also replace 3 easily-confused three-letter acronyms by more helpful
variable names.

Just cleanup, no change in functionality, with one exception: the
dentry_connected() check in the "out_reconnected" case will now only
check the ancestors of the current dentry instead of checking all the
way from target_dir.  Since we've already verified connectivity up to
this dentry, that should be sufficient.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago exportfs: eliminate unused "noprogress" counter
J. Bruce Fields [Thu, 17 Oct 2013 01:20:19 +0000 (21:20 -0400)]
exportfs: eliminate unused "noprogress" counter

Note this counter is now being set to 0 on every pass through the loop,
so it no longer serves any useful purpose.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago exportfs: stop retrying once we race with rename/remove
J. Bruce Fields [Thu, 17 Oct 2013 01:09:30 +0000 (21:09 -0400)]
exportfs: stop retrying once we race with rename/remove

There are two places here where we could race with a rename or remove:

- We could find the parent, but then be removed or renamed away
  from that parent directory before finding our name in that
  directory.
- We could find the parent, and find our name in that parent,
  but then be renamed or removed before we look ourselves up by
  that name in that parent.

In both cases the concurrent rename or remove will take care of
reconnecting the directory that we're currently examining.  Our target
directory should then also be connected.  Check this and clear
DISCONNECTED in these cases instead of looping around again.

Note: we *do* need to check that this actually happened if we want to be
robust in the face of corrupted filesystems: a corrupted filesystem
could just return a completely wrong parent, and we want to fail with an
error in that case before starting to clear DISCONNECTED on
non-DISCONNECTED filesystems.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago exportfs: clear DISCONNECTED on all parents sooner
J. Bruce Fields [Mon, 9 Sep 2013 20:15:13 +0000 (16:15 -0400)]
exportfs: clear DISCONNECTED on all parents sooner

Once we've found any connected parent, we know all our parents are
connected--that's true even if there's a concurrent rename.  May as well
clear them all at once and be done with it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago exportfs: more detailed comment for path_reconnect
J. Bruce Fields [Wed, 23 Oct 2013 00:59:19 +0000 (20:59 -0400)]
exportfs: more detailed comment for path_reconnect

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago exportfs: BUG_ON in crazy corner case
J. Bruce Fields [Wed, 16 Oct 2013 19:48:53 +0000 (15:48 -0400)]
exportfs: BUG_ON in crazy corner case

This would indicate a nasty bug in the dcache and has never triggered in
the past 10 years as far as I know.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago dcache: fix outdated DCACHE_NEED_LOOKUP comment
J. Bruce Fields [Wed, 23 Oct 2013 20:09:16 +0000 (16:09 -0400)]
dcache: fix outdated DCACHE_NEED_LOOKUP comment

The DCACHE_NEED_LOOKUP case referred to here was removed with
39e3c9553f34381a1b664c27b0c696a266a5735e "vfs: remove
DCACHE_NEED_LOOKUP".

There are only four real_lookup() callers and all of them pass in an
unhashed dentry just returned from d_alloc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago dcache: don't clear DCACHE_DISCONNECTED too early
J. Bruce Fields [Wed, 18 Jul 2012 22:27:37 +0000 (16:27 -0600)]
dcache: don't clear DCACHE_DISCONNECTED too early

DCACHE_DISCONNECTED should not be cleared until we're sure the dentry is
connected all the way up to the root of the filesystem.  It *shouldn't*
be cleared as soon as the dentry is connected to a parent.  That will
cause bugs at least on exportable filesystems.

Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago dcache: Don't set DISCONNECTED on "pseudo filesystem" dentries
J. Bruce Fields [Fri, 29 Jun 2012 20:20:47 +0000 (16:20 -0400)]
dcache: Don't set DISCONNECTED on "pseudo filesystem" dentries

I can't for the life of me see any reason why anyone should care whether
a dentry that is never hooked into the dentry cache would need
DCACHE_DISCONNECTED set.

This originates from 4b936885ab04dc6e0bb0ef35e0e23c1a7364d9e5 "fs:
improve scalability of pseudo filesystems", which probably just made the
false assumption that DCACHE_DISCONNECTED was meant to be set on anything
not connected to a parent somehow.

So this is just confusing.  Ideally the only uses of DCACHE_DISCONNECTED
would be in the filehandle-lookup code, which needs it to ensure
dentries are connected into the dentry tree before use.

I left d_alloc_pseudo there even though it's now equivalent to
__d_alloc(), just on the theory the name is better documentation of its
intended use outside dcache.c.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago dcache: use IS_ROOT to decide where dentry is hashed
J. Bruce Fields [Thu, 28 Jun 2012 16:10:55 +0000 (12:10 -0400)]
dcache: use IS_ROOT to decide where dentry is hashed

Every hashed dentry is either hashed in the dentry_hashtable, or a
superblock's s_anon list.

__d_drop() assumes it can determine which is the case by checking
DCACHE_DISCONNECTED; this is not true.

It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
only hashed on dentry_hashtable, but is fully connected to its parents
back to the root.

But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
attempts to connect a directory (found by filehandle lookup) back to
root by ascending to parents and performing lookups one at a time.  It
does not clear DCACHE_DISCONNECTED until it's done, and that is not at
all an atomic process.

In particular, it is possible for DCACHE_DISCONNECTED to be set on a
dentry which is hashed on the dentry_hashtable.

Instead, use IS_ROOT() to check which hash chain a dentry is on.  This
*does* work:

Dentries are hashed only by:

- d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon.

- __d_rehash, called by _d_rehash: hashes to the dentry's
  parent, and all callers of _d_rehash appear to have d_parent
  set to a "real" parent.
- __d_rehash, called by __d_move: rehashes the moved dentry to
  hash chain determined by target, and assigns target's d_parent
  to its d_parent, before dropping the dentry's d_lock.

Therefore I believe it's safe for a holder of a dentry's d_lock to
assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is
true.

I believe the incorrect assumption about DCACHE_DISCONNECTED was
originally introduced by ceb5bdc2d246 "fs: dcache per-bucket dcache hash
locking".

Also add a comment while we're here.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago ocfs2: get rid of impossible checks
Al Viro [Mon, 4 Nov 2013 00:49:19 +0000 (19:49 -0500)]
ocfs2: get rid of impossible checks

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago qnx4: i_sb is never NULL
Al Viro [Mon, 4 Nov 2013 00:46:35 +0000 (19:46 -0500)]
qnx4: i_sb is never NULL

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago exportfs: fix 32-bit nfsd handling of 64-bit inode numbers
J. Bruce Fields [Tue, 10 Sep 2013 15:41:12 +0000 (11:41 -0400)]
exportfs: fix 32-bit nfsd handling of 64-bit inode numbers

Symptoms were spurious -ENOENTs on stat of an NFS filesystem from a
32-bit NFS server exporting a very large XFS filesystem, when the
server's cache is cold (so the inodes in question are not in cache).
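
The message does not spell out the mechanism. One plausible illustration, assumed here rather than taken from the patch, is a directory scan that compares full 64-bit inode numbers against a value that was truncated to 32 bits on the way in. In this toy Python sketch, `truncate_to_32` and `find_name` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def truncate_to_32(ino):
    """Model storing a 64-bit inode number in a 32-bit variable."""
    return ino & MASK32


def find_name(entries, wanted_ino):
    """Scan (name, inode number) pairs for a matching inode number.

    Returns the name, or None (the analogue of -ENOENT) if nothing matches.
    """
    for name, ino in entries:
        if ino == wanted_ino:
            return name
    return None
```

Small inode numbers survive the truncation, so a failure of this kind would only show up on filesystems large enough to hand out inode numbers above 2^32.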

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Trevor Cordes <trevor@tecnopolis.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago vfs: split out vfs_getattr_nosec
J. Bruce Fields [Wed, 2 Oct 2013 21:01:18 +0000 (17:01 -0400)]
vfs: split out vfs_getattr_nosec

The filehandle lookup code wants this version of getattr.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago iget/iget5: don't bother with ->i_lock until we find a match
Al Viro [Wed, 6 Nov 2013 14:54:52 +0000 (09:54 -0500)]
iget/iget5: don't bother with ->i_lock until we find a match

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago VFS: Put a small type field into struct dentry::d_flags
David Howells [Thu, 12 Sep 2013 18:22:53 +0000 (19:22 +0100)]
VFS: Put a small type field into struct dentry::d_flags

Put a type field into struct dentry::d_flags to indicate if the dentry is one
of the following types that relate particularly to pathwalk:

    Miss (negative dentry)
    Directory
    "Automount" directory (defective - no i_op->lookup())
    Symlink
    Other (regular, socket, fifo, device)

The type field is set to one of the first five types on a dentry by calls to
__d_instantiate() and d_obtain_alias() from information in the inode (if one is
given).

The type is cleared by dentry_unlink_inode() when it reconstitutes an existing
dentry as a negative dentry.

Accessors provided are:

    d_set_type(dentry, type)
    d_is_directory(dentry)
    d_is_autodir(dentry)
    d_is_symlink(dentry)
    d_is_file(dentry)
    d_is_negative(dentry)
    d_is_positive(dentry)

A bunch of checks in pathname resolution switched to those.
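
Packing a small type field into a flags word is a mask-and-compare exercise. The sketch below is a toy Python model: the accessor names follow the list above, but the bit position, the numeric type values, and the use of a plain integer in place of a dentry are all assumptions made for illustration.

```python
# Assumed layout: a 3-bit type field at an arbitrary position in the word.
TYPE_SHIFT = 24
TYPE_MASK = 0x7 << TYPE_SHIFT

# Illustrative values only; the kernel's actual constants may differ.
MISS = 0 << TYPE_SHIFT       # negative dentry
DIRECTORY = 1 << TYPE_SHIFT
AUTODIR = 2 << TYPE_SHIFT
SYMLINK = 3 << TYPE_SHIFT
FILE = 4 << TYPE_SHIFT       # regular, socket, fifo, device


def d_set_type(flags, type_):
    """Return flags with the type field replaced and other bits untouched."""
    return (flags & ~TYPE_MASK) | type_


def d_entry_type(flags):
    """Extract just the type field."""
    return flags & TYPE_MASK


def d_is_directory(flags):
    return d_entry_type(flags) == DIRECTORY


def d_is_autodir(flags):
    return d_entry_type(flags) == AUTODIR


def d_is_symlink(flags):
    return d_entry_type(flags) == SYMLINK


def d_is_file(flags):
    return d_entry_type(flags) == FILE


def d_is_negative(flags):
    return d_entry_type(flags) == MISS


def d_is_positive(flags):
    return not d_is_negative(flags)
```

Making the "miss" type zero means that clearing the field is enough to turn a dentry negative, which matches the description of dentry_unlink_inode() above.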

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago elf{,_fdpic} coredump: get rid of pointless if (siginfo->si_signo)
Al Viro [Mon, 14 Oct 2013 11:39:56 +0000 (07:39 -0400)]
elf{,_fdpic} coredump: get rid of pointless if (siginfo->si_signo)

we can't get to do_coredump() if that condition isn't satisfied...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago constify do_coredump() argument
Al Viro [Sun, 13 Oct 2013 21:57:29 +0000 (17:57 -0400)]
constify do_coredump() argument

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago constify copy_siginfo_to_user{,32}()
Al Viro [Sun, 13 Oct 2013 21:23:53 +0000 (17:23 -0400)]
constify copy_siginfo_to_user{,32}()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago ... and kill anon_inode_getfile_private()
Al Viro [Wed, 9 Oct 2013 14:26:28 +0000 (10:26 -0400)]
... and kill anon_inode_getfile_private()

it's a seriously misguided API, now fortunately without users.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago rework aio migrate pages to use aio fs
Benjamin LaHaise [Tue, 17 Sep 2013 14:18:25 +0000 (10:18 -0400)]
rework aio migrate pages to use aio fs

Don't abuse anon_inodes.c to host private files needed by aio;
we can bloody well declare a mini-fs of our own instead of
patching up what anon_inodes can create for us.

Tested-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago take anon inode allocation to libfs.c
Al Viro [Thu, 3 Oct 2013 02:35:11 +0000 (22:35 -0400)]
take anon inode allocation to libfs.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago new helper: dump_align()
Al Viro [Tue, 8 Oct 2013 15:05:01 +0000 (11:05 -0400)]
new helper: dump_align()

dump_skip to given alignment...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago spufs: get rid of dump_emit() wrappers
Al Viro [Tue, 8 Oct 2013 13:44:29 +0000 (09:44 -0400)]
spufs: get rid of dump_emit() wrappers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago dump_skip(): dump_seek() replacement taking coredump_params
Al Viro [Tue, 8 Oct 2013 13:26:08 +0000 (09:26 -0400)]
dump_skip(): dump_seek() replacement taking coredump_params

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago make dump_emit() use vfs_write() instead of banging at ->f_op->write directly
Al Viro [Tue, 8 Oct 2013 13:11:48 +0000 (09:11 -0400)]
make dump_emit() use vfs_write() instead of banging at ->f_op->write directly

... and deal with short writes properly - the output might be to pipe, after
all; as it is, e.g. no-MMU case of elf_fdpic coredump can write a whole lot
more than a page worth of data at one call.
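
Handling short writes means looping until the whole buffer has been accepted. The following toy Python sketch shows the general pattern only; `write_all` and `PipeLike` are hypothetical names, and the real code calls vfs_write() rather than a Python callable.

```python
class PipeLike:
    """Hypothetical sink that accepts at most `chunk` bytes per call."""

    def __init__(self, chunk):
        self.chunk = chunk
        self.data = bytearray()

    def write(self, buf):
        piece = bytes(buf[: self.chunk])
        self.data.extend(piece)
        return len(piece)  # may be fewer bytes than were offered


def write_all(write, data):
    """Call write() repeatedly until all of data has been consumed.

    Returns False if write() reports no progress, True otherwise.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        n = write(view[total:])
        if n <= 0:
            return False
        total += n
    return True
```

A single call to `write` would silently lose everything past the first chunk, which is the failure mode a pipe-backed dump is exposed to.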

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago binfmt_elf: count notes towards coredump limit
Al Viro [Mon, 7 Oct 2013 11:23:45 +0000 (07:23 -0400)]
binfmt_elf: count notes towards coredump limit

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago aout: switch to dump_emit
Al Viro [Mon, 7 Oct 2013 11:22:01 +0000 (07:22 -0400)]
aout: switch to dump_emit

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago switch elf_coredump_extra_notes_write() to dump_emit()
Al Viro [Sun, 6 Oct 2013 02:24:29 +0000 (22:24 -0400)]
switch elf_coredump_extra_notes_write() to dump_emit()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago convert the rest of binfmt_elf_fdpic to dump_emit()
Al Viro [Sat, 5 Oct 2013 22:58:47 +0000 (18:58 -0400)]
convert the rest of binfmt_elf_fdpic to dump_emit()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago binfmt_elf: convert writing actual dump pages to dump_emit()
Al Viro [Sat, 5 Oct 2013 22:08:47 +0000 (18:08 -0400)]
binfmt_elf: convert writing actual dump pages to dump_emit()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago switch elf_core_write_extra_data() to dump_emit()
Al Viro [Sat, 5 Oct 2013 21:50:15 +0000 (17:50 -0400)]
switch elf_core_write_extra_data() to dump_emit()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago switch elf_core_write_extra_phdrs() to dump_emit()
Al Viro [Sat, 5 Oct 2013 21:22:57 +0000 (17:22 -0400)]
switch elf_core_write_extra_phdrs() to dump_emit()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago new helper: dump_emit()
Al Viro [Sat, 5 Oct 2013 19:32:35 +0000 (15:32 -0400)]
new helper: dump_emit()

dump_write() analog, takes coredump_params instead of file,
keeps track of the amount written in cprm->written and checks for
cprm->limit.  Start using it in binfmt_elf.c...
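
The bookkeeping described here can be modelled in a few lines. This is a toy Python sketch, not the kernel helper: the `CoredumpParams` class, its `out` buffer, and the choice to refuse a write that would cross the limit are assumptions made for illustration; only the `written` and `limit` field names come from the message.

```python
class CoredumpParams:
    """Toy stand-in for the dump parameters (hypothetical model)."""

    def __init__(self, limit):
        self.limit = limit      # maximum number of bytes the dump may use
        self.written = 0        # running total of bytes emitted so far
        self.out = bytearray()  # stands in for the output file


def dump_emit(cprm, data):
    """Append data unless doing so would exceed cprm.limit.

    Returns True on success and False if the write was refused.
    """
    if cprm.written + len(data) > cprm.limit:
        return False
    cprm.out.extend(data)
    cprm.written += len(data)
    return True
```

Keeping the counter inside the parameter object means callers no longer have to thread a running size through every helper that writes part of the dump.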

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago restore 32bit aout coredump
Al Viro [Sun, 6 Oct 2013 15:10:08 +0000 (11:10 -0400)]
restore 32bit aout coredump

just getting rid of bitrot

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago no need to keep brlock macros anymore...
Al Viro [Sat, 5 Oct 2013 18:19:39 +0000 (14:19 -0400)]
no need to keep brlock macros anymore...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago coda_revalidate_inode(): switch to passing inode...
Al Viro [Fri, 4 Oct 2013 22:17:02 +0000 (18:17 -0400)]
coda_revalidate_inode(): switch to passing inode...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago fold __d_shrink() into its only remaining caller
Al Viro [Fri, 4 Oct 2013 15:09:01 +0000 (11:09 -0400)]
fold __d_shrink() into its only remaining caller

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago get rid of s_files and files_lock
Al Viro [Fri, 4 Oct 2013 15:06:42 +0000 (11:06 -0400)]
get rid of s_files and files_lock

The only thing we need it for is alt-sysrq-r (emergency remount r/o)
and these days we can do just as well without going through the
list of files.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago get rid of {lock,unlock}_rcu_walk()
Al Viro [Fri, 8 Nov 2013 17:45:01 +0000 (12:45 -0500)]
get rid of {lock,unlock}_rcu_walk()

those have become aliases for rcu_read_{lock,unlock}()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago RCU'd vfsmounts
Al Viro [Mon, 30 Sep 2013 02:06:07 +0000 (22:06 -0400)]
RCU'd vfsmounts

* RCU-delayed freeing of vfsmounts
* vfsmount_lock replaced with a seqlock (mount_lock)
* sequence number from mount_lock is stored in nameidata->m_seq and
used when we exit RCU mode
* new vfsmount flag - MNT_SYNC_UMOUNT.  Set by umount_tree() when its
caller knows that vfsmount will have no surviving references.
* synchronize_rcu() done between unlocking namespace_sem in namespace_unlock()
and doing pending mntput().
* new helper: legitimize_mnt(mnt, seq).  Checks the mount_lock sequence
number against seq, then grabs reference to mnt.  Then it rechecks mount_lock
again to close the race and either returns success or drops the reference it
has acquired.  The subtle point is that in case of MNT_SYNC_UMOUNT we can
simply decrement the refcount and sod off - aforementioned synchronize_rcu()
makes sure that final mntput() won't come until we leave RCU mode.  We need
that, since we don't want to end up with some lazy pathwalk racing with
umount() and stealing the final mntput() from it - caller of umount() may
expect it to return only once the fs is shut down and we don't want to break
that.  In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do
full-blown mntput() in case of mount_lock sequence number mismatch happening
just as we'd grabbed the reference, but in those cases we won't be stealing
the final mntput() from anything that would care.
* mntput_no_expire() doesn't lock anything on the fast path now.  Incidentally,
SMP and UP cases are handled the same way - no ifdefs there.
* normal pathname resolution does *not* do any writes to mount_lock.  It does,
of course, bump the refcounts of vfsmount and dentry in the very end, but that's
it.
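
The check, grab, recheck sequence of legitimize_mnt() can be sketched with a plain counter. This is a toy, single-threaded Python model, not the kernel implementation: `SeqLock`, `Mount`, the simplified `mntput`, and the `racing_writer` hook (present only so a race can be simulated deterministically) are hypothetical.

```python
class SeqLock:
    """Toy sequence counter: writers bump it, readers compare snapshots."""

    def __init__(self):
        self.seq = 0

    def read_begin(self):
        return self.seq

    def read_retry(self, snapshot):
        return self.seq != snapshot

    def write(self):
        self.seq += 2  # begin and end of a write section, collapsed


class Mount:
    def __init__(self, sync_umount=False):
        self.count = 1
        self.sync_umount = sync_umount  # models MNT_SYNC_UMOUNT


def mntput(mnt):
    """Simplified: the full version would also handle the final put."""
    mnt.count -= 1


def legitimize_mnt(lock, mnt, seq, racing_writer=None):
    """Try to turn an RCU-mode mount pointer into a counted reference."""
    if lock.read_retry(seq):
        return False             # mount table changed since we sampled seq
    mnt.count += 1               # grab the reference
    if racing_writer is not None:
        racing_writer()          # test hook: simulate a concurrent writer
    if not lock.read_retry(seq):
        return True              # nothing changed; the reference is ours
    if mnt.sync_umount:
        mnt.count -= 1           # plain decrement; umount does the final put
    else:
        mntput(mnt)              # full-blown put
    return False
```

In the model both failure branches end with the same count; the distinction that matters in the kernel is who is allowed to perform the final mntput(), which a counter alone cannot show.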

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago switch shrink_dcache_for_umount() to use of d_walk()
Al Viro [Fri, 8 Nov 2013 17:31:16 +0000 (12:31 -0500)]
switch shrink_dcache_for_umount() to use of d_walk()

we have too many iterators in fs/dcache.c...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago fuse: rcu-delay freeing fuse_conn
Al Viro [Fri, 4 Oct 2013 01:21:39 +0000 (21:21 -0400)]
fuse: rcu-delay freeing fuse_conn

makes ->permission() and ->d_revalidate() safety in RCU mode independent
from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago pid_namespace: make freeing struct pid_namespace rcu-delayed
Al Viro [Thu, 3 Oct 2013 17:28:06 +0000 (13:28 -0400)]
pid_namespace: make freeing struct pid_namespace rcu-delayed

makes procfs ->permission() instances safety in RCU mode independent
from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago ncpfs: rcu-delay unload_nls() and freeing ncp_server
Al Viro [Thu, 3 Oct 2013 17:22:44 +0000 (13:22 -0400)]
ncpfs: rcu-delay unload_nls() and freeing ncp_server

makes ->d_hash() and ->d_compare() safety in RCU mode independent
from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago fat: rcu-delay unloading nls and freeing sbi
Al Viro [Thu, 3 Oct 2013 17:16:50 +0000 (13:16 -0400)]
fat: rcu-delay unloading nls and freeing sbi

makes ->d_hash() and ->d_compare() safety in RCU mode independent
from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago cifs: rcu-delay unload_nls() and freeing sbi
Al Viro [Thu, 3 Oct 2013 16:53:37 +0000 (12:53 -0400)]
cifs: rcu-delay unload_nls() and freeing sbi

makes ->d_hash(), ->d_compare() and ->permission() safety in RCU mode
independent from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago autofs4: make freeing sbi rcu-delayed
Al Viro [Thu, 3 Oct 2013 16:46:44 +0000 (12:46 -0400)]
autofs4: make freeing sbi rcu-delayed

makes ->d_managed() safety in RCU mode independent from vfsmount_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago adfs: delayed freeing of sbi
Al Viro [Thu, 3 Oct 2013 16:37:18 +0000 (12:37 -0400)]
adfs: delayed freeing of sbi

makes ->d_hash() and ->d_compare() safety in RCU mode independent
from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago hpfs: make freeing sbi and codetables rcu-delayed
Al Viro [Thu, 3 Oct 2013 16:25:10 +0000 (12:25 -0400)]
hpfs: make freeing sbi and codetables rcu-delayed

makes ->d_hash() and ->d_compare() safety in RCU mode independent
from vfsmount_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago make freeing super_block rcu-delayed
Al Viro [Fri, 4 Oct 2013 21:06:56 +0000 (17:06 -0400)]
make freeing super_block rcu-delayed

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago vfs: introduce d_instantiate_no_diralias()
Miklos Szeredi [Tue, 1 Oct 2013 14:44:54 +0000 (16:44 +0200)]
vfs: introduce d_instantiate_no_diralias()

...which just returns -EBUSY if a directory alias would be created.

This is to be used by fuse mkdir to make sure that a buggy or malicious
userspace filesystem doesn't do anything nasty.  Previously fuse used a
private mutex for this purpose, which can now go away.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
10 years ago move taking vfsmount_lock down into prepend_path()
Al Viro [Tue, 1 Oct 2013 20:18:06 +0000 (16:18 -0400)]
move taking vfsmount_lock down into prepend_path()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago split __lookup_mnt() in two functions
Al Viro [Tue, 1 Oct 2013 20:11:26 +0000 (16:11 -0400)]
split __lookup_mnt() in two functions

Instead of passing the direction as argument (and checking it on every
step through the hash chain), just have separate __lookup_mnt() and
__lookup_mnt_last().  And use the standard iterators...
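
The refactoring pattern, two single-purpose scans in place of one scan with a direction flag, can be sketched as follows. This is a toy Python model: a list of (key, mount) pairs stands in for a hash chain, and the function names merely echo the ones in the message without reproducing their kernel signatures.

```python
def lookup_mnt(chain, key):
    """Return the first mount on the chain matching key, or None."""
    for entry_key, mnt in chain:
        if entry_key == key:
            return mnt
    return None


def lookup_mnt_last(chain, key):
    """Return the last mount on the chain matching key, or None."""
    for entry_key, mnt in reversed(chain):
        if entry_key == key:
            return mnt
    return None
```

Each function now has a fixed iteration direction, so nothing needs to be tested per element beyond the key comparison itself.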

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago uninline destroy_super(), consolidate alloc_super()
Al Viro [Tue, 1 Oct 2013 19:09:58 +0000 (15:09 -0400)]
uninline destroy_super(), consolidate alloc_super()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago isofs: don't pass dentry to isofs_hash{i,}_common()
Al Viro [Sun, 29 Sep 2013 22:09:05 +0000 (18:09 -0400)]
isofs: don't pass dentry to isofs_hash{i,}_common()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago new helpers: lock_mount_hash/unlock_mount_hash
Al Viro [Sun, 29 Sep 2013 15:24:49 +0000 (11:24 -0400)]
new helpers: lock_mount_hash/unlock_mount_hash

aka br_write_{lock,unlock} of vfsmount_lock.  Inlines in fs/mount.h,
vfsmount_lock extern moved over there as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago don't bother with vfsmount_lock in mounts_poll()
Al Viro [Sun, 29 Sep 2013 14:59:59 +0000 (10:59 -0400)]
don't bother with vfsmount_lock in mounts_poll()

wake_up_interruptible/poll_wait provide sufficient barriers;
just use ACCESS_ONCE() to fetch ns->event and that's it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago namespace.c: get rid of mnt_ghosts
Al Viro [Sun, 29 Sep 2013 03:10:55 +0000 (23:10 -0400)]
namespace.c: get rid of mnt_ghosts

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago fold dup_mnt_ns() into its only surviving caller
Al Viro [Sun, 29 Sep 2013 00:47:57 +0000 (20:47 -0400)]
fold dup_mnt_ns() into its only surviving caller

should've been done 6 years ago...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago mnt_set_expiry() doesn't need vfsmount_lock
Al Viro [Sun, 29 Sep 2013 00:30:00 +0000 (20:30 -0400)]
mnt_set_expiry() doesn't need vfsmount_lock

->mnt_expire is protected by namespace_sem

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago finish_automount() doesn't need vfsmount_lock for removal from expiry list
Al Viro [Sun, 29 Sep 2013 00:29:00 +0000 (20:29 -0400)]
finish_automount() doesn't need vfsmount_lock for removal from expiry list

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago fs/namespace.c: bury long-dead define
Al Viro [Sat, 28 Sep 2013 16:54:06 +0000 (12:54 -0400)]
fs/namespace.c: bury long-dead define

MNT_WRITER_UNDERFLOW_LIMIT was missed 4 years ago when it became unused.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago fold mntfree() into mntput_no_expire()
Al Viro [Sat, 28 Sep 2013 16:41:25 +0000 (12:41 -0400)]
fold mntfree() into mntput_no_expire()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago do_remount(): pull touch_mnt_namespace() up
Al Viro [Tue, 17 Sep 2013 02:41:01 +0000 (22:41 -0400)]
do_remount(): pull touch_mnt_namespace() up

... and don't bother with dropping and regaining vfsmount_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago dup_mnt_ns(): get rid of pointless grabbing of vfsmount_lock
Al Viro [Tue, 17 Sep 2013 02:22:16 +0000 (22:22 -0400)]
dup_mnt_ns(): get rid of pointless grabbing of vfsmount_lock

mnt_list is protected by namespace_sem, not vfsmount_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago fs_is_visible only needs namespace_sem held shared
Al Viro [Tue, 17 Sep 2013 01:37:36 +0000 (21:37 -0400)]
fs_is_visible only needs namespace_sem held shared

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago initialize namespace_sem statically
Al Viro [Tue, 17 Sep 2013 01:34:53 +0000 (21:34 -0400)]
initialize namespace_sem statically

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago file->f_op is never NULL...
Al Viro [Sun, 22 Sep 2013 20:27:52 +0000 (16:27 -0400)]
file->f_op is never NULL...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago rtl8188eu: remove dead code
Al Viro [Sun, 22 Sep 2013 18:42:05 +0000 (14:42 -0400)]
rtl8188eu: remove dead code

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago dmxdev: get rid of pointless clearing ->f_op
Al Viro [Sun, 22 Sep 2013 18:33:32 +0000 (14:33 -0400)]
dmxdev: get rid of pointless clearing ->f_op

nobody else will see that struct file after return from ->release()
anyway; just leave ->f_op as is and let __fput() do that fops_put().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago consolidate the reassignments of ->f_op in ->open() instances
Al Viro [Sun, 22 Sep 2013 18:17:15 +0000 (14:17 -0400)]
consolidate the reassignments of ->f_op in ->open() instances

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago put_mnt_ns(): use drop_collected_mounts()
Al Viro [Tue, 17 Sep 2013 01:19:20 +0000 (21:19 -0400)]
put_mnt_ns(): use drop_collected_mounts()

... rather than open-coding it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago ncpfs: switch to %p[dD]
Al Viro [Mon, 16 Sep 2013 14:59:55 +0000 (10:59 -0400)]
ncpfs: switch to %p[dD]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago ubifs: switch to %pd
Al Viro [Mon, 16 Sep 2013 14:58:53 +0000 (10:58 -0400)]
ubifs: switch to %pd

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago sunrpc: switch to %pd
Al Viro [Mon, 16 Sep 2013 14:57:41 +0000 (10:57 -0400)]
sunrpc: switch to %pd

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago nfsd: switch to %p[dD]
Al Viro [Mon, 16 Sep 2013 14:57:01 +0000 (10:57 -0400)]
nfsd: switch to %p[dD]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago nfs: use %p[dD] instead of open-coded (and often racy) equivalents
Al Viro [Mon, 16 Sep 2013 14:53:17 +0000 (10:53 -0400)]
nfs: use %p[dD] instead of open-coded (and often racy) equivalents

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago befs: split symlink iops in two - for short and long symlinks resp.
Al Viro [Mon, 16 Sep 2013 14:35:31 +0000 (10:35 -0400)]
befs: split symlink iops in two - for short and long symlinks resp.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago new helper: kfree_put_link()
Al Viro [Mon, 16 Sep 2013 14:30:04 +0000 (10:30 -0400)]
new helper: kfree_put_link()

duplicated to hell and back...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago libfs: get exports to definitions of objects being exported...
Al Viro [Mon, 16 Sep 2013 01:20:49 +0000 (21:20 -0400)]
libfs: get exports to definitions of objects being exported...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago ecryptfs: ->lower_path.dentry is never NULL
Al Viro [Mon, 16 Sep 2013 00:54:18 +0000 (20:54 -0400)]
ecryptfs: ->lower_path.dentry is never NULL

... on anything found via ->d_fsdata

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago ecryptfs: get rid of ecryptfs_set_dentry_lower{,_mnt}
Al Viro [Mon, 16 Sep 2013 00:50:13 +0000 (20:50 -0400)]
ecryptfs: get rid of ecryptfs_set_dentry_lower{,_mnt}

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago ecryptfs: don't leave RCU pathwalk immediately
Al Viro [Mon, 16 Sep 2013 00:45:11 +0000 (20:45 -0400)]
ecryptfs: don't leave RCU pathwalk immediately

If the underlying dentry doesn't have ->d_revalidate(), there's no need to
force dropping out of RCU mode.  All we need for that is to make freeing
ecryptfs_dentry_info RCU-delayed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago ecryptfs: check DCACHE_OP_REVALIDATE instead of ->d_op
Al Viro [Sun, 15 Sep 2013 23:41:16 +0000 (19:41 -0400)]
ecryptfs: check DCACHE_OP_REVALIDATE instead of ->d_op

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years ago 9p: make v9fs_cache_inode_{get,put,set}_cookie empty inlines for !9P_CACHEFS
Al Viro [Tue, 17 Sep 2013 12:07:11 +0000 (08:07 -0400)]
9p: make v9fs_cache_inode_{get,put,set}_cookie empty inlines for !9P_CACHEFS

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>