Now that the header always goes first, it is possible to
simplify rds_ib_xmit. Instead of having a path to handle 0-byte
dgrams and another path to handle >0, these can both be handled
in one path. This lets us eliminate xmit_populate_wr().
Rename sent to bytes_sent, to better distinguish it from the other
variable named "send".
Signed-off-by: Andy Grover <andy.grover@oracle.com>
These functions were to cope with differently ordered
sg entries depending on RDS 3.0 or 3.1+. Now that
we've dropped 3.0 compatibility we no longer need them.
Also, modify usage sites for these to refer to sge[0] or [1]
directly. Reorder code to initialize header sgs first.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
RDS 3.0 connections (in OFED 1.3 and earlier) put the
header at the end. 3.1 connections put it at the head.
The code has significant added complexity in order to
handle both configurations. In OFED 1.6 we can
drop this and simplify the code by only supporting
"header-first" configuration.
This patch checks the protocol version, and if prior
to 3.1, does not complete the connection.
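A minimal sketch of such a version gate, for illustration only. The
EX_* macros and ex_version_supported() are hypothetical names, and the
encoding (major version in the high byte, minor in the low byte) is an
assumption made for the example, not taken from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: major version in the high byte, minor in the low byte. */
#define EX_PROTOCOL(maj, min)	((uint16_t)(((maj) << 8) | (min)))
#define EX_PROTOCOL_3_0		EX_PROTOCOL(3, 0)
#define EX_PROTOCOL_3_1		EX_PROTOCOL(3, 1)

/*
 * Returns true only for peers that speak 3.1 or later, i.e. peers that
 * use the "header-first" layout. Older peers are refused.
 */
bool ex_version_supported(uint16_t peer_version)
{
	return peer_version >= EX_PROTOCOL_3_1;
}
```

A connection handler built on this would simply refuse to complete the
handshake when ex_version_supported() returns false.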
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Both atomics and rdmas need to convert ib-specific completion codes
into RDS status codes. Rename rds_ib_rdma_send_complete to
rds_ib_send_complete, and have it take a pointer to the function to
call with the new error code.
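The shape of this change can be sketched roughly as below. All names
(ex_wc_status, ex_notify_status, ex_send_complete and so on) are
hypothetical stand-ins, and the particular code mapping is only an
illustrative assumption, not the one used by the patch.

```c
/* Hypothetical transport-specific completion codes. */
enum ex_wc_status {
	EX_WC_SUCCESS,
	EX_WC_REM_ACCESS_ERR,
	EX_WC_WR_FLUSH_ERR,
	EX_WC_OTHER_ERR,
};

/* Hypothetical transport-independent status codes. */
enum ex_notify_status {
	EX_NOTIFY_SUCCESS,
	EX_NOTIFY_REMOTE_ERROR,
	EX_NOTIFY_CANCELED,
	EX_NOTIFY_OTHER_ERROR,
};

struct ex_message {
	int last_status;
};

typedef void (*ex_complete_fn)(struct ex_message *msg, int status);

/*
 * Translate the transport code once, then hand the result to whichever
 * completion routine (rdma or atomic) the caller passed in.
 */
void ex_send_complete(struct ex_message *msg, enum ex_wc_status wc,
		      ex_complete_fn complete)
{
	int status;

	switch (wc) {
	case EX_WC_SUCCESS:
		status = EX_NOTIFY_SUCCESS;
		break;
	case EX_WC_REM_ACCESS_ERR:
		status = EX_NOTIFY_REMOTE_ERROR;
		break;
	case EX_WC_WR_FLUSH_ERR:
		status = EX_NOTIFY_CANCELED;
		break;
	default:
		status = EX_NOTIFY_OTHER_ERROR;
		break;
	}
	complete(msg, status);
}

/* Example completion routine: just records the translated status. */
void ex_rdma_complete(struct ex_message *msg, int status)
{
	msg->last_status = status;
}
```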
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Instead of using a constant for initiator_depth and
responder_resources, read the per-QP values when the
device is enumerated, and then use these values when creating
the connection.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Implement a CMSG-based interface to do FADD and CSWP ops.
Alter send routines to handle atomic ops.
Add atomic counters to stats.
Add xmit_atomic() to struct rds_transport
Inline rds_ib_send_unmap_rdma into unmap_rm
Signed-off-by: Andy Grover <andy.grover@oracle.com>
The previous code was correct, but made the assumption that
if r_notifier was non-NULL then either r_recverr or r_notify
was true. Valid, but fragile. Changed to explicitly check
r_recverr (shows up in greps for recverr now, too.)
Signed-off-by: Andy Grover <andy.grover@oracle.com>
rds_message_alloc_sgs() now returns correctly-initialized
sg lists, so callers need not do this themselves.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
This eliminates a separate memory alloc, although
it is now necessary to add an "r_active" flag, since
it is no longer possible to use the m_rdma_op pointer as an
indicator of whether an rdma op is present.
rdma SGs allocated from rm sg pool.
rds_rm_size also gets bigger. It's a little inefficient to
run through CMSGs twice, but it makes later steps a lot smoother.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
r_m_copy_from_user used to allocate the rm as well as kernel
buffers for the data, and then copy the data in. Now, sendmsg()
allocates the rm, although the data buffer alloc still happens
in r_m_copy_from_user.
SGs are still allocated with rm, but now r_m_alloc_sgs() is
used to reserve them. This allows multiple SG lists to be
allocated from the one rm -- this is important once we also
want to alloc our rdma sgl from this pool.
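The reservation idea can be sketched in userspace as below. This is
only an illustration under assumed names (ex_rm, ex_sg, ex_rm_alloc,
ex_rm_alloc_sgs); the real structures and signatures are not shown in
this message.

```c
#include <stdlib.h>

struct ex_sg {
	void *page;
	unsigned int len;
};

/* One allocation holds the message and a pool of sg entries. */
struct ex_rm {
	unsigned int total_sgs;
	unsigned int used_sgs;
	struct ex_sg sgs[];
};

/* Allocate the message together with room for total_sgs entries. */
struct ex_rm *ex_rm_alloc(unsigned int total_sgs)
{
	struct ex_rm *rm;

	rm = calloc(1, sizeof(*rm) + total_sgs * sizeof(struct ex_sg));
	if (rm)
		rm->total_sgs = total_sgs;
	return rm;
}

/*
 * Reserve nents consecutive entries from the pool. Each call hands out
 * the next unused slice, so several lists can share one message.
 * Returns NULL when the pool does not have enough entries left.
 */
struct ex_sg *ex_rm_alloc_sgs(struct ex_rm *rm, unsigned int nents)
{
	struct ex_sg *first;

	if (nents > rm->total_sgs - rm->used_sgs)
		return NULL;
	first = &rm->sgs[rm->used_sgs];
	rm->used_sgs += nents;
	return first;
}
```

With a pool of four entries, a data list of three and an rdma list of
one can both be reserved from the same message.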
Signed-off-by: Andy Grover <andy.grover@oracle.com>
First, it looks to me like the atomic_inc is wrong.
We should be decrementing refcount only once here, no? It's
already being done by the mr_put() at the end.
Second, simplify the logic a bit by bailing early (with a warning)
if !mr.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Clearly separate rdma-related variables in rm from data-related ones.
This is in anticipation of adding atomic support.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
This function has been the source of numerous bugs; it's just
too complicated. Simplified to nest spinlocks cleanly within
the second loop body, and kick out early if there are no
rms to drop.
This will be a little slower because conn lock is grabbed for
each entry instead of "caching" the lock across rms, but this
should be entirely irrelevant to fastpath performance.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
On second look at this bug (OFED #2002), it seems that the
collision is not with the retransmission queue (packet acked
by the peer), but with the local send completion. A theoretical
sequence of events (from time t0 to t3) is thought to be as
follows,
Thread #1
  t0:
    sock_release
      rds_release
        rds_send_drop_to /* wait on send completion */
  t2:
    rds_rdma_drop_keys() /* destroy & free all mrs */
Thread #2
  t1:
    rds_ib_send_cq_comp_handler
      rds_ib_send_unmap_rm
        rds_message_unmapped /* wake up #1 @ t0 */
  t3:
    rds_message_put
      rds_message_purge
        rds_mr_put /* memory corruption detected */
The problem with rds_rdma_drop_keys() is that it can drop
an mr's refcount more times than it is due (i.e. repeatedly,
for as long as the mr remains in the tree (mr->r_refcount > 0)).
Theoretically it should remove only one reference: the reference
held by the tree.
	/* Release any MRs associated with this socket */
	while ((node = rb_first(&rs->rs_rdma_keys))) {
		mr = container_of(node, struct rds_mr, r_rb_node);
		if (mr->r_trans == rs->rs_transport)
			mr->r_invalidate = 0;
		rds_mr_put(mr);
	}
I think the correct way of doing it is to remove the mr from
the tree and rds_destroy_mr it first, then call rds_mr_put()
to decrement its reference count by one. Whichever thread
holds the last reference will free the mr via rds_mr_put().
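The "unlink first, then put exactly once" pattern can be sketched in
userspace as follows. This is a simplified, single-threaded
illustration: a singly linked list stands in for the rb-tree, a plain
int stands in for the atomic refcount, and all ex_* names are
hypothetical.

```c
#include <stdlib.h>

struct ex_mr {
	int refcount;		/* one reference is owned by the key list */
	struct ex_mr *next;
};

struct ex_sock {
	struct ex_mr *keys;
};

/* Drop one reference; free on the last one. Returns 1 if freed. */
int ex_mr_put(struct ex_mr *mr)
{
	if (--mr->refcount == 0) {
		free(mr);
		return 1;
	}
	return 0;
}

/*
 * Unlink each mr before dropping the list's reference, so the loop can
 * never see the same mr twice and never drops a reference that belongs
 * to someone else. Returns how many mrs were freed here.
 */
int ex_drop_keys(struct ex_sock *s)
{
	struct ex_mr *mr;
	int freed = 0;

	while ((mr = s->keys) != NULL) {
		s->keys = mr->next;
		mr->next = NULL;
		freed += ex_mr_put(mr);
	}
	return freed;
}
```

An mr that is still referenced elsewhere (for example by an in-flight
message) survives ex_drop_keys() and is freed by whoever calls
ex_mr_put() last.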
Signed-off-by: Tina Yang <tina.yang@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
in_interrupt() is true in softirqs. The BUG_ONs are supposed
to check whether irqs are disabled, so we should use
BUG_ON(irqs_disabled()) instead, duh.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
The VF has no flash and can only do memory mapped I/O.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce indentation in a couple of places
Add static function ixgbe_psum
Add temporary for adapter->stats
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Did not add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
because no printk in this module used message prefixing.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whitespace cleanups.
Move inline keyword after function type declarations.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ordering of operations was messed up in the init and as a result when
VMDQ was enabled we were trying to enable TX rings before setting the VFTE
bits. This resulted in a ring that appeared to fail to enable when in fact
it was blocked because the VFTE bits were cleared after the reset.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Blackhole routes are used when xfrm_lookup() returns -EREMOTE (error
triggered by IKE for example), hence this kind of route is always
temporary and so we should check if a better route exists for next
packets.
The bug was introduced by commit d11a4dc18b.
Signed-off-by: Jianzhao Wang <jianzhao.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds NAPI support to the qeth layer 2 and layer 3
disciplines. It is important to understand that we cannot enable/disable
IRQs as usual; we have to use the corresponding new QDIO interface.
Also, to avoid exceeding the budget, we have to be able to stop and restart
buffer processing at any point while processing a batch of QDIO buffers.
With the driver NAPI-enabled, it is possible to turn on GRO for the
layer 3 discipline.
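The stop-and-resume requirement can be sketched as a budget-limited
poll loop. This is a simplified userspace illustration, not the qeth
code; ex_queue and ex_poll() are hypothetical names.

```c
struct ex_queue {
	int pending;	/* buffers still waiting to be processed */
	int next;	/* index of the next buffer, kept across polls */
};

/*
 * Process at most `budget` buffers and return how many were handled.
 * The position is stored in the queue, so a later call resumes exactly
 * where this one stopped. A return value below `budget` means the
 * queue was drained; a real driver would then finish polling and
 * re-enable interrupts through the QDIO interface.
 */
int ex_poll(struct ex_queue *q, int budget)
{
	int done = 0;

	while (done < budget && q->pending > 0) {
		q->next++;	/* "process" one buffer */
		q->pending--;
		done++;
	}
	return done;
}
```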
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the claw code calls ccwgroup_remove_ccwdev(), we need to make sure
CCWGROUP is enabled when CLAW is enabled. Otherwise we hit fun undefined
references at build time:
ERROR: "ccwgroup_remove_ccwdev" [drivers/s390/net/claw.ko] undefined!
ERROR: "ccwgroup_probe_ccwdev" [drivers/s390/net/claw.ko] undefined!
ERROR: "ccwgroup_driver_register" [drivers/s390/net/claw.ko] undefined!
ERROR: "ccwgroup_driver_unregister" [drivers/s390/net/claw.ko] undefined!
ERROR: "ccwgroup_create_from_string" [drivers/s390/net/claw.ko] undefined!
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Format an ipv6 address using vsprintf extensions.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the qdio API to allow polling in the upper-layer driver. This
is needed by qeth to use NAPI.
To use the new interface the upper-layer driver must specify the
queue_start_poll() callback. This callback is used to signal the upper-layer
driver that it has the initiative and must process the inbound queue by
calling qdio_get_next_buffers(). If the upper-layer driver wants to
stop polling it calls qdio_start_irq().
Since adapter interrupts are not completely stoppable, qdio implements
a software bit QDIO_QUEUE_IRQS_DISABLED to safely disable interrupts for an
input queue.
The old interface is preserved and will be used as is by zfcp.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several users report issues with 32-bit adapters when plugged
into PCI slots in machines with >= 4GB ram. In particular AMD
systems with HyperTransport to PCI bridges seem to trigger the
issue, but it isn't limited to only them.
This issue is not easily reproducible here, yet still continues
to occur in the field. For e1000 on PCI devices, just disable DMA
addresses over the 4GB boundary when in PCI (not PCI-X) mode, to
prevent the issue from continuing to pop up. The performance
impact for this is negligible.
The code was refactored to move the init of the hw struct to its
own function. This allows the init to be called very early in
probe, which then allows using hw-> members for this fix.
A slight refactor to the DMA mask code was done for minor
correctness based on the instructions in DMA-API-HOWTO.
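The mask selection described above amounts to something like the
sketch below. It is a userspace illustration only: ex_bus_type and
ex_pick_dma_mask() are hypothetical names, and EX_DMA_BIT_MASK merely
mirrors the usual "n low bits set" mask idiom.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low n bits set; n == 64 is special-cased to avoid UB. */
#define EX_DMA_BIT_MASK(n) \
	(((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

enum ex_bus_type {
	EX_BUS_PCI,	/* plain PCI: stay below 4GB */
	EX_BUS_PCIX,
};

/*
 * Pick the DMA mask: 32 bits on plain PCI (or when 64-bit DMA is not
 * available at all), 64 bits otherwise.
 */
uint64_t ex_pick_dma_mask(enum ex_bus_type bus, bool supports_64bit)
{
	if (bus == EX_BUS_PCI || !supports_64bit)
		return EX_DMA_BIT_MASK(32);
	return EX_DMA_BIT_MASK(64);
}
```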
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We use rcu_dereference_check(p, rcu_read_lock_held() ||
lockdep_rtnl_is_held()) several times in network stack.
More usages to come too, so it's time to create a helper.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
niu_get_ethtool_tcam_all() assumes that its output buffer is the right
size, and warns before returning if it is not. However, the output
buffer size is under user control and ETHTOOL_GRXCLSRLALL is an
unprivileged ethtool command. Therefore this is at least a local
denial-of-service vulnerability.
Change it to check before writing each entry and to return an error if
the buffer is already full.
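The check-before-write approach looks roughly like the sketch below.
It is a userspace illustration, not the niu code: the function name,
its parameters and the choice of -EMSGSIZE as the error are
assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy the indices of all valid entries into out[]. The capacity is
 * checked before every single write, so a too-small caller-supplied
 * buffer produces an error instead of an overrun.
 */
int ex_get_rule_locs(const uint8_t *valid, size_t n_entries,
		     uint32_t *out, size_t out_cap, size_t *n_out)
{
	size_t cnt = 0;
	size_t i;

	for (i = 0; i < n_entries; i++) {
		if (!valid[i])
			continue;
		if (cnt == out_cap)
			return -EMSGSIZE;	/* buffer already full */
		out[cnt++] = (uint32_t)i;
	}
	*n_out = cnt;
	return 0;
}
```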
Compile-tested only.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Casting a __kernel pointer to a __user pointer requires __force markup, so
add it. Also, sock_get/setsockopt() take their @optval and/or @optlen arguments
as user pointers but were being passed kernel pointers; use new variables
'uoptval' and/or 'uoptlen' to fix this. These changes remove the following
warnings from sparse:
net/socket.c:1922:46: warning: cast adds address space to expression (<asn:1>)
net/socket.c:3061:61: warning: incorrect type in argument 4 (different address spaces)
net/socket.c:3061:61: expected char [noderef] <asn:1>*optval
net/socket.c:3061:61: got char *optval
net/socket.c:3061:69: warning: incorrect type in argument 5 (different address spaces)
net/socket.c:3061:69: expected int [noderef] <asn:1>*optlen
net/socket.c:3061:69: got int *optlen
net/socket.c:3063:67: warning: incorrect type in argument 4 (different address spaces)
net/socket.c:3063:67: expected char [noderef] <asn:1>*optval
net/socket.c:3063:67: got char *optval
net/socket.c:3064:45: warning: incorrect type in argument 5 (different address spaces)
net/socket.c:3064:45: expected int [noderef] <asn:1>*optlen
net/socket.c:3064:45: got int *optlen
net/socket.c:3078:61: warning: incorrect type in argument 4 (different address spaces)
net/socket.c:3078:61: expected char [noderef] <asn:1>*optval
net/socket.c:3078:61: got char *optval
net/socket.c:3080:67: warning: incorrect type in argument 4 (different address spaces)
net/socket.c:3080:67: expected char [noderef] <asn:1>*optval
net/socket.c:3080:67: got char *optval
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces the cx82310_eth driver, a driver for the USB ethernet
port of ADSL routers based on Conexant CX82310 chips. Such routers usually also
have ethernet port(s), which are bridged together with the USB ethernet port,
allowing the USB-connected machine to communicate with the network (and also
the internet through the ADSL, of course).
This is my first driver, so please check thoroughly. As there's no protocol
documentation, it was done with usbsnoop dumps from Windows driver, some
parts (the commands) inspired by cxacru driver and also other usbnet drivers.
The driver passed my testing - some real work and also pings sized from 0 to
65507 B.
The only problem I found is the ifconfig error counter. When I return 0 (or 1
but empty skb) from rx_fixup(), usbnet increases the error counter although
it's not an error condition (because packets can cross URB boundaries). Maybe
the usbnet should be fixed to allow rx_fixup() to return empty skbs (or some
other value, e.g. 2)?
The USB ID of my device is 0x0572:0xcb01 which conflicts with some ADSL modems
using cxacru driver (they probably use the same chipset but simpler
firmware). The modems seem to use bDeviceClass 0 and iProduct "ADSL USB
MODEM", my router uses bDeviceClass 255 and iProduct "USB NET CARD". The
driver matches only devices with class 255 and checks for the iProduct string
during init. I already posted a patch for the cxacru driver to ignore these
devices.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When rps_cpus contains only one CPU, the skb_get_rxhash() call can be skipped.
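A rough userspace sketch of the shortcut, under assumed names
(ex_rps_map, ex_pick_cpu, ex_demo_hash are all hypothetical). The
scaling of a 32-bit hash into the map index is shown only as one
plausible way to spread packets.

```c
#include <stddef.h>
#include <stdint.h>

struct ex_rps_map {
	unsigned int len;
	uint16_t cpus[8];
};

/* Counts how often the (potentially expensive) hash is computed. */
int ex_hash_calls;

/* Demo hash: fixed value, just so the call can be observed. */
uint32_t ex_demo_hash(const void *pkt)
{
	(void)pkt;
	ex_hash_calls++;
	return 0x80000000u;
}

/*
 * Pick a target CPU. With a single configured CPU the answer does not
 * depend on the packet, so the hash is never computed.
 */
int ex_pick_cpu(const struct ex_rps_map *map,
		uint32_t (*hash_fn)(const void *pkt), const void *pkt)
{
	if (map->len == 0)
		return -1;
	if (map->len == 1)
		return map->cpus[0];
	/* Scale the 32-bit hash into the range [0, len). */
	return map->cpus[((uint64_t)hash_fn(pkt) * map->len) >> 32];
}
```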
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This simple patch copies the current approach for SIOCINQ ioctl() from DCCP
into SCTP so that the userland code working with SCTP can use a similar
interface across different protocols to know how much space to allocate for
a buffer.
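From userland, the query looks roughly like the sketch below.
ex_pending_bytes() is a hypothetical helper; it works on any socket
that implements the ioctl. The fallback define relies on SIOCINQ being
the same request as FIONREAD on Linux, which is stated here as an
assumption for builds that do not include <linux/sockios.h>.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SIOCINQ
#define SIOCINQ FIONREAD	/* assumed equivalent, see note above */
#endif

/*
 * Ask the kernel how many bytes are waiting on fd, so the caller can
 * size its receive buffer. Returns -1 if the ioctl fails.
 */
int ex_pending_bytes(int fd)
{
	int n = 0;

	if (ioctl(fd, SIOCINQ, &n) < 0)
		return -1;
	return n;
}
```

A caller would typically allocate a buffer of the returned size and
then call recv() on the same descriptor.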
Signed-off-by: Diego Elio Pettenò <flameeyes@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Do not create expectation when forwarding the PORT
command to avoid blocking the connection. The problem is that
nf_conntrack_ftp.c:help() tries to create the same expectation later in
POST_ROUTING and drops the packet with "dropping packet" message after
failure in nf_ct_expect_related.
- Change ip_vs_update_conntrack to alter the conntrack
for related connections from the real server. If we do not alter the reply in
this direction, the next packet from the client sent to vport 20 comes in as a
NEW connection. We alter it, but maybe some collision happens between both
conntracks and the second conntrack gets destroyed immediately. The
connection gets stuck too.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch: "gro: fix different skb headrooms" in its part:
"2) allocate a minimal skb for head of frag_list" is buggy. The copied
skb has p->data set at the ip header at the moment, and skb_gro_offset
is the length of ip + tcp headers. So, after the change the length of
mac header is skipped. Later skb_set_mac_header() sets it into the
NET_SKB_PAD area (if it's long enough) and ip header is misaligned at
NET_SKB_PAD + NET_IP_ALIGN offset. There is no reason to assume the
original skb was wrongly allocated, so let's copy it as it was.
bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626
fixes commit: 3d3be4333f
Reported-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: bus speed strings should be const
PCI hotplug: Fix build with CONFIG_ACPI unset
PCI: PCIe: Remove the port driver module exit routine
PCI: PCIe: Move PCIe PME code to the pcie directory
PCI: PCIe: Disable PCIe port services during port initialization
PCI: PCIe: Ask BIOS for control of all native services at once
ACPI/PCI: Negotiate _OSC control bits before requesting them
ACPI/PCI: Do not preserve _OSC control bits returned by a query
ACPI/PCI: Make acpi_pci_query_osc() return control bits
ACPI/PCI: Reorder checks in acpi_pci_osc_control_set()
PCI: PCIe: Introduce commad line switch for disabling port services
PCI: PCIe AER: Introduce pci_aer_available()
x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are set
PCI: provide stub pci_domain_nr function for !CONFIG_PCI configs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: Make fiemap work with sparse files
xfs: prevent 32bit overflow in space reservation
xfs: Disallow 32bit project quota id
xfs: improve buffer cache hash scalability