It is not a good idea to blindly unmask the RX and TX_END interrupts
for all eight queues on all mv643xx_eth hardware, since some variations
of the hardware have less than eight transmit/receive queues, and the
RX/TX_END interrupts for the queues they don't have can be in use by
other interrupt sources.
Signed-off-by: Saeed Bishara <saeed@marvell.com>
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the platforms that mv643xx_eth is used on, the manual skb->data
alignment logic in mv643xx_eth can be simplified, as the only case we
need to handle is where NET_SKB_PAD is not a multiple of the cache
line size. If this is the case, the extra padding we need can be
computed at compile time, while if NET_SKB_PAD _is_ a multiple of
the cache line size, the code can be optimised out entirely.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the definitions for the SDMA and port serial configuration
register values to where all the other register definitions live,
and expand the shifts to 32 bit constants.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On several mv643xx_eth hardware versions, the two 64bit mib counters
for 'good octets received' and 'good octets sent' are actually 32bit
counters, and reading from the upper half of the register has the same
effect as reading from the lower half of the register: an atomic
read-and-clear of the entire 32bit counter value. This can under heavy
traffic occasionally lead to small numbers being added to the upper
half of the 64bit mib counter even though no 32bit wrap has occured.
Since we poll the mib counters at least every 30 seconds anyway, we
might as well just skip the reads of the upper halves of the hardware
counters without breaking the stats, which this patch does.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when OOM occurs during rx ring refill, mv643xx_eth will get
into an infinite loop, due to the refill function setting the OOM bit
but not clearing the 'rx refill needed' bit for this queue, while the
calling function (the NAPI poll handler) will call the refill function
in a loop until the 'rx refill needed' bit goes off, without checking
the OOM bit.
This patch fixes this by checking the OOM bit in the NAPI poll handler
before attempting to do rx refill. This means that once OOM occurs,
we won't try to do any memory allocations again until the next invocation
of the poll handler.
While we're at it, change the OOM flag to be a single bit instead of
one bit per receive queue since OOM is a system state rather than a
per-queue state, and cancel the OOM timer on entry to the NAPI poll
handler if it's running to prevent it from firing when we've already
come out of OOM.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Move SDMA configuration from interface up to port probe, to prevent
overwriting the receive coalescing timer value on interface up.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When mv643xx_eth_open() is called to up an interface, port_start()
will first re-program the unicast address filter, and then
re-initialise the PORT_CONFIG register, but that will disable unicast
promiscuous mode if it was enabled by the unicast address filter setup.
This isn't a problem on ifconfig up, as ->set_rx_mode() will be called
shortly afterwards which will program the filters again, but it does
trigger when changing the MTU, which calls mv643xx_eth_stop() and then
mv643xx_eth_open() by hand to repopulate the receive rings with skbuffs
of the new size.
Swap the initialisation of the PORT_START register and the call to
the unicast filter setup function to fix this.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A receive coalescing timeout of 250 usec appears to strike a good
balance between allowing enough received frames to be aggregated for
LRO to do its job and not allowing the connection to stall due to
delaying ACKs to the remote end for too long.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the netif_carrier_off() call in ->open() to port probe, so that
ethtool doesn't report the link as being up before we have up'd the
interface.
Move initialisation of the rx/tx coalescing timers from ->open() to
port probe, so that we don't reset the coalescing timers every time
the interface is up'd.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mib_counters_update() also restarts the timer.
So the timer is dequeued, the stats are read and then the timer is
enqueued again. This is "okay" unless someone unloads the module.
The locking here is also broken:
mib_counters_update() grabs just a simple spinlock. The only thing the
lock is good for is to protect the timer func against other callers
namely mv643xx_eth_stop() && mv643xx_eth_get_ethtool_stats(). That means
if the spinlock is taken via the ethtool path and than the timer kicks
in then the box will lock up.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Acked-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Controlled by a compile-time (Kconfig) option for now, since it
isn't a win in all cases.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the mp->default_[rt]x_ring_size variables to ->[rt]x_ring_size,
allow them to be read via the standard ethtool ->get_ringparam() op,
and add a ->set_ringparam() op to allow resizing them at run time.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch:
- increases the precision of the receive/transmit interrupt
coalescing register value computations by using 64bit temporaries;
- adds functions to read the current hardware coalescing register
values and convert them back to usecs;
- exports the {get,set} {rx,tx} coal methods via the standard
ethtool coalescing interface.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's a waste having two different versions of this structure around
when the differences between ethtool ops for phy'd and phy-less
interfaces are so minor.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Contrary to what the docs say, the 'extended interrupt cause' bit in
the interrupt cause register (bit 1) appears to not be maskable on at
least some of the mv643xx_eth platforms, making writing zeroes to the
interrupt mask register but not the extended interrupt mask register
insufficient to stop interrupts from occuring. Therefore, also write
zeroes to the extended interrupt mask register when shutting down the
port.
This fixes the interrupt storm seen on the Pegasos board when shutting
down the interface.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 66e63ffbc0 ("mv643xx_eth:
implement ->set_rx_mode()") cleaned up mv643xx_eth's multicast filter
programming, but broke it as well.
The non-special multicast filter table (for multicast addresses that
are not of the form 01:00:5e:00:00:xx) consists of 256 hash table
buckets organised as 64 32-bit words, where the 'accept' bits are
in the LSB of each byte, so in bits 24 16 8 0 of each 32-bit word.
The old code got this right, but the referenced commit broke this by
using bits 3 2 1 0 instead. This commit fixes this up.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit cd4ccf76bf.
On the Pegasos board, we can't do DMA burst that are longer than
one cache line. For now, go back to using 32 byte DMA bursts for
all mv643xx_eth platforms -- we can switch the ARM-based platforms
back to doing long 128 byte bursts in the next development cycle.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Reported-by: Alan Curry <pacman@kosh.dhis.org>
Reported-by: Gabriel Paubert <paubert@iram.es>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, if multiple unicast addresses are programmed into a
mv643xx_eth interface, the core networking will resort to enabling
promiscuous mode on the interface, as mv643xx_eth does not implement
->set_rx_mode().
This patch switches mv643xx_eth over from ->set_multicast_list()
to ->set_rx_mode(), and implements support for secondary unicast
addresses. The hardware can handle multiple unicast addresses as
long as their first 11 nibbles are the same (i.e. are of the form
xx:xx:xx:xx:xx:xy where the x part is the same for all addresses), so
if that is the case, we use that mode. If it's not the case, we enable
unicast promiscuous mode in the hardware, which is slightly better than
enabling promiscuous mode for multicasts as well, which is what would
happen before.
While we are at it, change the programming sequence so that we
don't clear all filter bits first, so we don't lose all incoming
packets while the filter is being reprogrammed.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since txq_alloc_desc_index() is a very simple function, and since
descriptor ring index handling for transmit reclaim, receive
processing and receive refill is already handled inline as well,
inline txq_alloc_desc_index() into its two call sites.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mv643xx_eth driver uses the rdl()/wrl() macros to read and
write hardware registers. Per-port registers are accessed in the
following way:
#define PORT_STATUS(p) (0x0444 + ((p) << 10))
[...]
static inline u32 rdl(struct mv643xx_eth_private *mp, int offset)
{
return readl(mp->shared->base + offset);
}
[...]
port_status = rdl(mp, PORT_STATUS(mp->port_num));
By giving the per-port 'struct mv643xx_eth_private' its own
'void __iomem *base' pointer that points to the per-port register
area, we can get rid of both the double indirection and the << 10
that is done for every per-port register access -- this patch does
that.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix up a couple of coding style issues caught by checkpatch.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When mv643xx_eth allocates skbuffs, it adds
'dma_get_cache_alignment() - 1' to the length it needs, so that it can
align the skb's ->data pointer to a cache boundary. When checking
whether a transmitted skbuff can be reused as a receive buffer, these
bytes needs to be included into the minimum bound for the recycle check.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The generic packet receive code takes care of setting
netdev->last_rx when necessary, for the sake of the
bonding ARP monitor.
Drivers need not do it any more.
Some cases had to be skipped over because the drivers
were making use of the ->last_rx value themselves.
Signed-off-by: David S. Miller <davem@davemloft.net>
The mv643xx_eth mii bus implementation uses wait_event_timeout() to
wait for SMI completion interrupts.
If wait_event_timeout() would return zero, mv643xx_eth would conclude
that the SMI access timed out, but this is not necessarily true --
wait_event_timeout() can also return zero in the case where the SMI
completion interrupt did happen in time but where it took longer than
the requested timeout for the process performing the SMI access to be
scheduled again. This would lead to occasional SMI access timeouts
when the system would be under heavy load.
The fix is to ignore the return value of wait_event_timeout(), and
to re-check the SMI done bit after wait_event_timeout() returns to
determine whether or not the SMI access timed out.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This converts pretty much everything to print_mac. There were
a few things that had conflicts which I have just dropped for
now, no harm done.
I've built an allyesconfig with this and looked at the files
that weren't built very carefully, but it's a huge patch.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
mv643xx_eth uses ip_hdr() (defined in linux/ip.h), but relied on
another header file to include the needed header file indirectly.
In latest net-next this indirect include chain is gone, so the
driver fails to build. Include linux/ip.h explicitly to fix this.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces mdiobus_alloc() and mdiobus_free(), and
makes all mdio bus drivers use these functions to allocate their
struct mii_bus'es dynamically.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Andy Fleming <afleming@freescale.com>
In preparation of giving mii_bus objects a device tree presence of
their own, rename struct mii_bus's ->dev argument to ->parent, since
having a 'struct device *dev' that points to our parent device
conflicts with introducing a 'struct device dev' representing our own
device.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Andy Fleming <afleming@freescale.com>
This gives a nice increase in the maximum loss-free packet forwarding
rate in routing workloads.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch mv643xx_eth from using drivers/net/mii.c to using phylib.
Since the mv643xx_eth hardware does all the link state handling and
PHY polling, the driver will use phylib in the "Doing it all yourself"
mode described in the phylib documentation.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Andy Fleming <afleming@freescale.com>
If we don't poll the hardware statistics counters at least once every
~34 seconds, overflow might occur without us noticing. So, set up a
timer to poll the statistics counters at least once every 30 seconds.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
When the IP header doesn't start 14, 18, 22 or 26 bytes into the packet
(which are the only four cases that the hardware can deal with if asked
to do IP checksumming on transmit), invoke the software checksum helper
instead of letting the packet go out with a corrupt checksum inserted
into the packet in the wrong place.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
We have to explicitly tell the hardware to include the pseudo-header
when doing receive checksumming, otherwise hardware checksumming will
fail for every received packet and we'll end up setting CHECKSUM_NONE
on every received packet.
While we're at it, when skb->ip_summed is set to CHECKSUM_UNNECESSARY
on received packets, skb->csum is supposed to be undefined, and thus
there is no need to set it.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Add support for mv643xx_eth versions that have no transmit bandwidth
control registers at all, such as the ethernet block found in the
Marvell 88F6183 ARM SoC.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>