Return -EINVAL if the size isn't OK instead of -EPERM.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently connection tracking handles ICMP error like normal packets
if it failed to get related connection. But it fails that after all.
This makes connection tracking stop tracking ICMP error at early point.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this patch, any user can cause nfnetlink subsystems to be
autoloaded. Those subsystems however could add significant processing
overhead to packet processing, and would refuse any configuration messages
from non-CAP_NET_ADMIN processes anyway.
This patch follows a suggestion from Patrick McHardy.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The reply tuple of the PNS->PAC expectation was using the wrong call id.
So we had the following situation:
- PNS behind NAT firewall
- PNS call id requires NATing
- PNS->PAC gre packet arrives first
then the PNS->PAC expectation is matched, and the other expectation
is deleted, but the PAC->PNS gre packets do not match the gre conntrack
because the call id is wrong.
We also cannot use ip_nat_follow_master().
Signed-off-by: Philip Craig <philipc@snapgear.com>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ctnetlink_get_conntrack is always called from user context, so GFP_KERNEL
is enough.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kill some useless headers included in ctnetlink. They aren't used in any
way.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing module alias. This is a must to load ctnetlink on demand. For
example, the conntrack tool will fail if the module isn't loaded.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for conntrack marking from user space.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes an oops triggered from userspace. If we don't pass information
about the private protocol info, the reference to attr will be NULL. This is
likely to happen in update messages.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
nfattr_parse (and thus nfattr_parse_nested) always returns success. So we
can make them 'void' and remove all the checking at the caller side.
Based on original patch by Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The packet counter variable of conntrack was changed to 32bits from 64bits.
This follows that change.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use ARRAY_SIZE macro instead of sizeof(x)/sizeof(x[0]) and remove a
duplicate of ARRAY_SIZE which is never used anyways.
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use ARRAY_SIZE macro instead of sizeof(x)/sizeof(x[0]) and remove a
duplicate of ARRAY_SIZE which is never used anyways.
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before we did CLONE_THREAD, the way to check whether we were attaching
to ourselves was to just check "current == task", but with CLONE_THREAD
we should check that the thread group ID matches instead.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes a needlessly global function static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes the following previously global and EXPORT_SYMBOL'ed
code static:
- struct mpt_proc_root_dir
- int mpt_stm_index
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes two needlessly global functions static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes the needlessly global function path_lookup_create()
static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Document in Documentation/md.txt the files that now appear in sysfs, and make
a couple of small refinements to exactly when 'level' and 'raid_disks' are
empty, to make it match the documentation.
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current sync_action for an array can be one of
idle - nothing happening
resync - reduncancy being recalcualted
recover - missing device being recoverred to spare
check - user initiated check of redundancy
repair - like resync but user-initiated and ignores
bitmap optimisation.
Each of these strings can also be written to the 'sync_action' file to cause
that action to happen (if appropriate).
While 'sync' is not technically correct, as a recovery is *not* a 'sync', I
think it is the most servicable word here. Also 'action' is a strong word
than 'mode'.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are a few loose ends following the conversion of md to use kthreads:
- Some fields in mdk_thread_t that aren't needed (kthreads does it's own
completion and manages it's own name).
- thread->run is now never NULL, so no need to check
- Some tests for signal_pending that aren't needed (As we don't use signals
to stop threads any more)
- Some flush_signals are not needed
- Some waits are interruptible and don't need to be.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The 'auto-readonly' flag (which suppresses resync and superblock updates until
the first write) is not meaningful for personalities that don't support resync
or superblock writes (raid0, linear, etc).
So clear the setting early to avoid it confusing anything - e.g. appearing in
/proc/mdstat
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The introduction of 'resync=PENDING' (for read-only devices) caused that
message to appear for non-syncable arrays like raid0 and linear. Simplest
thing is to not try to print any resync info unless the personality clearly
supports it.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some, but not all, md array support data redundancy and hence support checking
and restoring that redundancy (resync, rebuild).
Some attributes apply specifically to functions involving this redundancy, and
so should only appear for md arrays for which they are meaningful. i.e. they
should not appear for raid0, linear, multpath, faulty.
This patch separates these into a distinct group and creates the group only if
the personality supports sync_request.
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1/ I really should be using the __ATTR macros for defining attributes, so
that the .owner field get set properly, otherwise modules can be removed
while sysfs files are open. This also involves some name changes of _show
routines.
2/ Always lock the mddev (against reconfiguration) for all sysfs attribute
access. This easily avoid certain races and is completely consistant with
other interfaces (ioctl and /proc/mdstat both always lock against
reconfiguration).
3/ raid5 attributes must check that the 'conf' structure actually exists
(the array could have been stopped while an attribute file was open).
4/ A missing 'kfree' from when the raid5_conf_t was converted to have a
kobject embedded, and then converted back again.
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A sync of raid5 usually ignore blocks which the bitmap says are in-sync. But
a user-request check or repair should not ignore these.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Raid1 currently optimises resync using the intent bitmap etc. This
optimisation is not wanted when we explicitly request a repair through sysfs,
so add appropriate checks.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If a block_device is a partition, then it's kobject is
bdev->bd_part->kobj
otherwise (if it is a full device), the kobject is
bdev->bd_disk->kobj
As md wants back-links to the correct object (whether partition or not), we
need to respect this difference... (Thus current code shows a link to the
whole device, whether we are using a partition or not, which is wrong).
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When an md array is started, the superblock will be written, and resync may
commense. This is not good if you want to be completely read-only as, for
example, when preparing to resume from a suspend-to-disk image.
So introduce a module parameter "start_ro" which can be set
to '1' at boot, at module load, or via
/sys/module/md_mod/parameters/start_ro
When this is set, new arrays get an 'auto-ro' mode, which disables all
internal io (superblock updates, resync, recovery) and is automatically
switched to 'rw' when the first write request arrives.
The array can be set to true 'ro' mode using 'mdadm -r' before the first
write request, or resync can be started without a write using 'mdadm -w'.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With version-0.90 superblock, component devices on an md device to not have
any stable name related to the array -(version-1 assigns a fixed index when
a device is added to an array, and this remains despit any hot-swap).
The intial code for making these devices appear in sysfs used dynamic
names, which would change whenever a hot-spare was swapped for a failed or
missing device. This turns out not to be practical in sysfs for a number
of reasons.
This patch changes then naming of component devices to be based on the
result of 'bdevname'. This is stable and should be unique.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We can only accept BARRIER requests if all slaves handle
barriers, and that can, of course, change with time....
So we keep track of whether the whole array seems safe for barriers,
and also whether each individual rdev handles barriers.
We initially assumes barriers are OK.
When writing the superblock we try a barrier, and if that fails, we flag
things for no-barriers. This will usually clear the flags fairly quickly.
If writing the superblock finds that BIO_RW_BARRIER is -ENOTSUPP, we need to
resubmit, so introduce function "md_super_wait" which waits for requests to
finish, and retries ENOTSUPP requests without the barrier flag.
When writing the real raid1, write requests which were BIO_RW_BARRIER but
which aresn't supported need to be retried. So raid1d is enhanced to do this,
and when any bio write completes (i.e. no retry needed) we remove it from the
r1bio, so that devices needing retry are easy to find.
We should hardly ever get -ENOTSUPP errors when writing data to the raid.
It should only happen if:
1/ the device used to support BARRIER, but now doesn't. Few devices
change like this, though raid1 can!
or
2/ the array has no persistent superblock, so there was no opportunity to
pre-test for barriers when writing the superblock.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current bitmaps use set_bit et.al and so are host-endian, which means
not-portable. Oops.
Define a new version number (4) for which bitmaps are little-endian.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This has the advantage of removing the confusion caused by 'rdev_t' and
'mddev_t' both having 'in_sync' fields.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Two refinements to the 'attempt-overwrite-on-read-error' mechanism.
1/ If the array is read-only, don't attempt an over-write.
2/ If there are more than max_nr_stripes read errors on a device with
no success, fail the drive. This will make sure a dead
drive will be eventually kicked even when we aren't trying
to rewrite (which would normally kick a dead drive more quickly.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There isn't really a need for raid5 attributes to be an a subdirectory,
so this patch moves them from
/sys/block/mdX/md/raid5/attribute
to
/sys/block/mdX/md/attribute
This suggests that all md personalities should co-operate about
namespace usage, but that shouldn't be a problem.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1/ Use reduce stack usage, because 'gcc' apparently doesn't overlay
different variables that are in separate scopes...
2/ Use test_bit instead of ( .. & 1<< ..) which in this case is buggy.
Thanks to Andrew Morton
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With this, raid5 can be asked to check parity without repairing it. It also
keeps a count of the number of incorrect parity blocks found (mismatches) and
reports them through sysfs.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
You can trigger a 'check' with
echo check > /sys/block/mdX/md/scan_mode
or a check-and-repair errors with
echo repair > /sys/block/mdX/md/scan_mode
and read the current state from the same file.
Note: personalities need to know the different between 'check' and 'repair',
but don't yet. Until they do, 'check' will be the same as 'repair' and will
just do a normal resync pass.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
/sys/block/mdX/md/raid5/
contains raid5-related attributes.
Currently
stripe_cache_size
is number of entries in stripe cache, and is settable.
stripe_cache_active
is number of active entries, and in only readable.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Each device in an md array how has a corresponding
/sys/block/mdX/md/devNN/
directory which can contain attributes. Currently there is only 'state' which
summarises the state, nd 'super' which has a copy of the superblock, and
'block' which is a symlink to the block device.
Also, /sys/block/mdX/md/rdNN represents slot 'NN' in the array, and is a
symlink to the relevant 'devNN'. Obviously spare devices do not have a slot
in the array, and so don't have such a symlink.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>