On systems that create and delete lots of dynamic devices the
31bit linux ifindex fails to fit in the 16bit macvtap minor,
resulting in unusable macvtap devices. I have systems running
automated tests that that hit this condition in just a few days.
Use a linux idr allocator to track which mavtap minor numbers
are available and and to track the association between macvtap
minor numbers and macvtap network devices.
Remove the unnecessary unneccessary check to see if the network
device we have found is indeed a macvtap device. With macvtap
specific data structures it is impossible to find any other
kind of networking device.
Increase the macvtap minor range from 65536 to the full 20 bits
that is supported by linux device numbers. It doesn't solve the
original problem but there is no penalty for a larger minor
device range.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Place macvlan_common_newlink at the end of macvtap_newlink because
failing in newlink after registering your network device is not
supported.
Move device_create into a netdevice creation notifier. The network device
notifier is the only hook that is called after the network device has been
registered with the device layer and before register_network_device returns
success.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid leaking packets in the receive queue. Add a socket destructor
that will run whenever destroy a macvtap socket.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To see if it is appropriate to enable the macvtap zero copy feature
don't test the lowerdev network device flags. Instead test the
macvtap network device flags which are a direct copy of the lowerdev
flags. This is important because nothing holds a reference to lowerdev
and on a very bad day we lowerdev could be a pointer to stale memory.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a small window in macvtap_open between looking up a
networking device and calling macvtap_set_queue in which
macvtap_del_queues called from macvtap_dellink. After
calling macvtap_del_queues it is totally incorrect to
allow macvtap_set_queue to proceed so prevent success by
reporting that all of the available queues are in use.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We must account in skb->truesize, the size of the fragments, not the
used part of them.
Doing this work is important to avoid unexpected OOM situations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: virtualization@lists.linux-foundation.org
CC: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnx2x allocates a full page per fragment.
We must account in skb->truesize, the size of the fragment, not the used
part of it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I've split this bit out of the skb frag destructor patch since it helps enforce
the use of the fragment API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@suse.de>
Cc: Karen Xie <kxie@chelsio.com>
Cc: linux-scsi@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Casey Leedom <leedom@chelsio.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Dimitris Michailidis <dm@chelsio.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
As soon as the nfs_client gets created, its cl_rpcclient is set to
ERR_PTR(-EINVAL). The rpc client structure is allocated later. Check
if the client is ready before using the cl_rpcclient pointer.
Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Allow query access as a normal user removing the need
for CAP_MAC_ADMIN. Give RW access to /smack/access
for UGO. Do not import smack labels in access check.
Signed-off-by: Jarkko Sakkinen <jarkko.j.sakkinen@gmail.com>
Signed-off-by: Casey Schaufler <cschaufler@cschaufler-intel.(none)>
Support the following models: Super Joy Box 3 Pro, Super Dual Box Pro
and Super Joy Box 5 Pro. These models have support for pressure
sensitive buttons and they can force the controller to either digital
or analog mode, both of which are not supported yet.
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Up till now the IP{,V6}_TRANSPARENT socket options (which actually set
the same bit in the socket struct) have required CAP_NET_ADMIN
privileges to set or clear the option.
- we make clearing the bit not require any privileges.
- we allow CAP_NET_ADMIN to set the bit (as before this change)
- we allow CAP_NET_RAW to set this bit, because raw
sockets already pretty much effectively allow you
to emulate socket transparency.
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_fin() only needs socket pointer, we can remove skb and th params.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables the ethtool interface. The implementation is done
using the libphy helper functions.
Signed-off-by: David S. Miller <davem@davemloft.net>
control these three function declarations and
definitions with same macro CONFIG_PCI_IOV
drivers/net/ethernet/intel/igb/igb_main.c:165:
warning: ‘igb_vf_configure’ declared ‘static’ but never defined
drivers/net/ethernet/intel/igb/igb_main.c:166:
warning: ‘igb_find_enabled_vfs’ declared ‘static’ but never defined
drivers/net/ethernet/intel/igb/igb_main.c:167:
warning: ‘igb_check_vf_assignment’ declared ‘static’ but never defined
Signed-off-by: RongQing Li <roy.qing.li@gmail.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than clipping the number of CPUs using the compile-time NR_CPUS
constant, use the runtime nr_cpu_ids value instead. This allows the
nr_cpus command line option to work as expected.
Cc: <stable@kernel.org>
Reported-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
skb->truesize must account for allocated memory, not the used part of
it. Doing this work is important to avoid unexpected OOM situations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jon Mason <mason@myri.com>
Acked-by: Jon Mason <mason@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
igbvf allocates half a page per skb fragment. We must account
PAGE_SIZE/2 increments on skb->truesize, not the actual frag length.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upon adding new board LL debug support, if the resultant code
addition would not cause PC relative offset of "hexbuf" from
"adr r2, hexbuf" (+2) instruction to be representable in a
shifted 8-bit value (hence indirectly putting higher aligment
requirement on larger offsets), following error occurs,
arch/arm/kernel/debug.S: Assembler messages:
arch/arm/kernel/debug.S:138: Error: invalid constant (428) after fixup
Fix it by bringing "hexbuf" closer so that "adr"
can have the offset.
Signed-off-by: Afzal Mohammed <afzal@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Daniel Turull reported inaccuracies in pktgen when using low packet
rates, because we call ndelay(val) with values bigger than 20000.
Instead of calling ndelay() for delays < 100us, we can instead loop
calling ktime_now() only.
Reported-by: Daniel Turull <daniel.turull@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 356f039822 (TCP: increase default initial receive
window.), we allow sender to send 10 (TCP_DEFAULT_INIT_RCVWND) segments.
Change tcp_fixup_rcvbuf() to reflect this change, even if no real change
is expected, since sysctl_tcp_rmem[1] = 87380 and this value
is bigger than tcp_fixup_rcvbuf() computed rcvmem (~23720)
Note: Since commit 356f039822 limited default window to maximum of
10*1460 and 2*MSS, we use same heuristic in this patch.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems ip_gre is able to change dev->needed_headroom on the fly.
Its is not legal unfortunately and triggers a BUG in raw_sendmsg()
skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev)
< another cpu change dev->needed_headromm (making it bigger)
...
skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev));
We end with LL_RESERVED_SPACE() being bigger than LL_ALLOCATED_SPACE()
-> we crash later because skb head is exhausted.
Bug introduced in commit 243aad83 in 2.6.34 (ip_gre: include route
header_len in max_headroom calculation)
Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Timo Teräs <timo.teras@iki.fi>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
fib_rules: fix unresolved_rules counting
r8169: fix wrong eee setting for rlt8111evl
r8169: fix driver shutdown WoL regression.
ehea: Change maintainer to me
pptp: pptp_rcv_core() misses pskb_may_pull() call
tproxy: copy transparent flag when creating a time wait
pptp: fix skb leak in pptp_xmit()
bonding: use local function pointer of bond->recv_probe in bond_handle_frame
smsc911x: Add support for SMSC LAN89218
tg3: negate USE_PHYLIB flag check
netconsole: enable netconsole can make net_device refcnt incorrent
bluetooth: Properly clone LSM attributes to newly created child connections
l2tp: fix a potential skb leak in l2tp_xmit_skb()
bridge: fix hang on removal of bridge via netlink
x25: Prevent skb overreads when checking call user data
x25: Handle undersized/fragmented skbs
x25: Validate incoming call user data lengths
udplite: fast-path computation of checksum coverage
IPVS netns shutdown/startup dead-lock
netfilter: nf_conntrack: fix event flooding in GRE protocol tracker
Just another step in stopping the use of libnewt in perf.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-uy6s534uqxq8tenh6s3k8ocj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixing the way the tracing information is stored within record command.
The current implementation is causing issues for pipe output.
Following commands fail currently:
perf script syscall-counts ls
perf record -e syscalls:sys_exit_read ls | ./perf report -i -
The tracing information is part of the perf data file. It contains
several files from within the tracing debugfs and procs directories.
Beside some static header files, for each tracing event the format
file is added. The /proc/kallsyms file is also added.
The tracing data are stored with preceeding size. This is causing some
dificulties for pipe output, since there's no way to tell debugfs/proc
file size before reading it. So, for pipe output, all the debugfs files
were read twice. Once to get the overall size and once to store the
content itself. This can cause problem in case any of these file
changed, within the storage time.
To fix this behaviour and ensure the integrity of the tracing data, we:
- read debugfs/proc file into the temp file
- get temp file size and dump it to the pipe
- dump the temp file contents to the pipe
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20111020135943.GD2092@jolsa.brq.redhat.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since 8-bit temperature values are now handled in 16-bit struct
members, values have to be cast to s8 for negative temperatures to be
properly handled. This is broken since kernel version 2.6.39
(commit bce26c58df86599c9570cee83eac58bdaae760e4.)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: stable@kernel.org # 2.6.39+
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
And also no leed to show the [.] (level: k, . for userspace) when
showing just one DSO.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-4h3f6ro5o7ebepjbssxf0dd3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Instead, store a pointer to the currently assigned function.
This allows us to delete the mux_requested variable from pin_desc; a pin
is requested if its currently assigned function is non-NULL.
When a pin is requested as a GPIO rather than a regular function, the
assigned function name is dynamically constructed. In this case, we have
to kstrdup() the dynamically constructed name, so that mux_function doesn't
pointed at stack data. This requires pin_free to be told whether to free
the mux_function pointer or not.
This removes the hard-coded maximum function name length.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
A pin controller's names array is no longer marked __refdata. Hence, we
can avoid copying a pin's name into the descriptor when registering it.
Instead, just point at the string supplied in the pin array.
This both simplifies and speeds up pin controller initialization, but
also removes the hard-coded maximum pin name length.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
A pin controller's pin definitions are used both during pinctrl_register()
and pinctrl_unregister(). The latter happens outside of __init/__devinit
time, and hence it is unsafe to mark the pin array as __refdata.
Acked-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
get_group_pins() "returns" a pointer to an array of const objects, through
a pointer parameter. Fix the prototype so what's pointed at by the returned
pointer is const, rather than the function parameter being const.
This also allows the removal of a cast in each of the two current pinmux
drivers.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
A few network drivers currently use skb_frag_struct for this purpose but I have
patches which add additional fields and semantics there which these other uses
do not want.
A structure for reference sub-page regions seems like a generally useful thing
so do so instead of adding a network subsystem specific structure.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb->truesize must account for allocated memory, not the used part of
it. Doing this work is important to avoid unexpected OOM situations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove manual initialization in set_skb_frag, and instead
use __skb_fill_page_desc() to do the same. Patch tested
on net-next.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following the 'perf report' model we don't zap hist_entry instances from
the rb tree, we just keep them with he->filtered set to a mask of the
filters applied to it (thread, parent, DSO so far).
In top we need to decay even filtered entries, but we better not touch
total_period for them...
Now everything seems to work when filters are applied on top as they
worked in 'report', i.e. both dynamic and static hist entry browsing
works with filters.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-yt4xsbq20u9x9ypuwwyw2kao@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I don't usually pay much attention to the stale "? " addresses in
stack backtraces, but this lucky report from Pawel Sikora hints that
mremap's move_ptes() has inadequate locking against page migration.
3.0 BUG_ON(!PageLocked(p)) in migration_entry_to_page():
kernel BUG at include/linux/swapops.h:105!
RIP: 0010:[<ffffffff81127b76>] [<ffffffff81127b76>]
migration_entry_wait+0x156/0x160
[<ffffffff811016a1>] handle_pte_fault+0xae1/0xaf0
[<ffffffff810feee2>] ? __pte_alloc+0x42/0x120
[<ffffffff8112c26b>] ? do_huge_pmd_anonymous_page+0xab/0x310
[<ffffffff81102a31>] handle_mm_fault+0x181/0x310
[<ffffffff81106097>] ? vma_adjust+0x537/0x570
[<ffffffff81424bed>] do_page_fault+0x11d/0x4e0
[<ffffffff81109a05>] ? do_mremap+0x2d5/0x570
[<ffffffff81421d5f>] page_fault+0x1f/0x30
mremap's down_write of mmap_sem, together with i_mmap_mutex or lock,
and pagetable locks, were good enough before page migration (with its
requirement that every migration entry be found) came in, and enough
while migration always held mmap_sem; but not enough nowadays, when
there's memory hotremove and compaction.
The danger is that move_ptes() lets a migration entry dodge around
behind remove_migration_pte()'s back, so it's in the old location when
looking at the new, then in the new location when looking at the old.
Either mremap's move_ptes() must additionally take anon_vma lock(), or
migration's remove_migration_pte() must stop peeking for is_swap_entry()
before it takes pagetable lock.
Consensus chooses the latter: we prefer to add overhead to migration
than to mremapping, which gets used by JVMs and by exec stack setup.
Reported-and-tested-by: Paweł Sikora <pluto@agmk.net>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I audited all of the callers in the tree and only one of them (pktgen) expects
it to do so. Taking this reference is pretty obviously confusing and error
prone.
In particular I looked at the following commits which switched callers of
(__)skb_frag_set_page to the skb paged fragment api:
6a930b9f16 cxgb3: convert to SKB paged frag API.
5dc3e196ea myri10ge: convert to SKB paged frag API.
0e0634d20d vmxnet3: convert to SKB paged frag API.
86ee8130a4 virtionet: convert to SKB paged frag API.
4a22c4c919 sfc: convert to SKB paged frag API.
18324d690d cassini: convert to SKB paged frag API.
b061b39e3a benet: convert to SKB paged frag API.
b7b6a688d2 bnx2: convert to SKB paged frag API.
804cf14ea5 net: xfrm: convert to SKB frag APIs
ea2ab69379 net: convert core to skb paged frag APIs
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is just a cleanup.
My testing version of Smatch warns about this:
net/core/filter.c +380 check_load_and_stores(6)
warn: check 'flen' for negative values
flen comes from the user. We try to clamp the values here between 1
and BPF_MAXINSNS but the clamp doesn't work because it could be
negative. This is a bug, but it's not exploitable.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"ethtool -e ethX" dumps EEPROM data. Patch sets EEPROM length for device.
Ethtool works alot better when the kernel believes the length is > 0.
From: Allan Chou <allan@asix.com.tw>
Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>