I noticed that the ring variable was initialized before allocation, and
that memory node management was a bit ugly. We also leaked memory when a
ring allocation failed.
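A minimal userspace sketch of the unwind-on-failure pattern described
above. The struct ring layout, alloc_rings(), free_rings() and the
fail_at test hook are all hypothetical names chosen for illustration;
this is not the driver's actual code.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the driver's ring structure. */
struct ring {
	int queue_index;
};

/*
 * Allocate `count` rings into `rings`. If any allocation fails, free the
 * rings allocated so far so that nothing leaks. `fail_at` is a test hook
 * (an assumption of this sketch): the allocation with that index is
 * treated as failed; pass -1 to disable it.
 * Returns 0 on success and -1 on failure.
 */
int alloc_rings(struct ring **rings, int count, int fail_at)
{
	int i;

	for (i = 0; i < count; i++) {
		struct ring *r = (i == fail_at) ? NULL : calloc(1, sizeof(*r));

		if (!r)
			goto err_unwind;
		/* Initialize the ring only after its allocation succeeded. */
		r->queue_index = i;
		rings[i] = r;
	}
	return 0;

err_unwind:
	/* Release every ring allocated before the failing one. */
	while (i--) {
		free(rings[i]);
		rings[i] = NULL;
	}
	return -1;
}

/* Free all rings and clear the pointers. */
void free_rings(struct ring **rings, int count)
{
	int i;

	for (i = 0; i < count; i++) {
		free(rings[i]);
		rings[i] = NULL;
	}
}
```

The point of the sketch is only the ordering: initialize after a
successful allocation, and unwind everything on the error path.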
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support for the x540 MAC which is the next MAC in the
82598/82599 line.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Adds the new x540.c file and Aquantia 1202 PHY for X540 support.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The Tx hang logic has been known to detect false hangs when
the device is receiving pause frames or has delayed processing
for some other reason.
This patch makes the logic more robust and resolves these
known issues. The old logic checked to see if the device
was paused by querying the HW, then the hang logic was
aborted if the device was currently paused. This check was
racy because the device could have been in the pause state
any time up to this check. The other operation of the
hang logic is to verify the Tx ring is still advancing;
the old logic checked the EOP timestamp. This is not
sufficient to determine the ring is not advancing but
only infers that it may be moving slowly.
Here we add logic to track the number of completed Tx
descriptors and use the adapter stats to check if any
pause frames have been received since the previous Tx
hang check. This way we avoid racing with the HW
register and do not detect false hangs if the ring is
advancing slowly.
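The check described above could be sketched roughly as follows. This is
a simplified userspace illustration, not the driver code: struct
tx_ring_check, its fields and tx_ring_hung() are hypothetical names, and
the pause frame count is assumed to come from the adapter stats.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring bookkeeping; field names are illustrative only. */
struct tx_ring_check {
	uint64_t completed;     /* Tx descriptors completed so far */
	uint64_t completed_old; /* value seen at the previous hang check */
	uint64_t rx_pause_old;  /* pause frame count at the previous check */
	uint32_t pending;       /* descriptors still outstanding */
};

/*
 * Report a hang only if work is pending, no descriptor completed since
 * the previous check, and no pause frames were received in between.
 * `rx_pause_now` is the current pause frame counter from the stats.
 */
bool tx_ring_hung(struct tx_ring_check *ring, uint64_t rx_pause_now)
{
	bool advanced = ring->completed != ring->completed_old;
	bool paused = rx_pause_now != ring->rx_pause_old;

	/* Remember the current counters for the next check. */
	ring->completed_old = ring->completed;
	ring->rx_pause_old = rx_pause_now;

	return ring->pending != 0 && !advanced && !paused;
}
```

Because both inputs are counters compared against the previous check, a
slowly advancing ring or a pause received at any point in the interval
prevents a false hang report.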
This patch is primarily the work of Jesse Brandeburg. I
cleaned it up some and fixed the PFC checking.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change resolves some null function pointer accesses on 82598 when a
multi-speed fiber module is inserted into the adapter.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The q_vector back pointer was not being set in the rings so it would not
have been possible to determine the parent q_vector of the ring.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change cleans up some of the items in ixgbe_map_rings_to_vectors.
Specifically it merges the two for loops and drops the unnecessary vectors
parameter.
It also moves the vector names into the q_vectors themselves.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is meant to improve the stack utilization and simplify the math
used in ixgbe_set_itr_msix.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There are a number of places where we use the variable j to contain the
register index of the ring. Instead of using such a non-descriptive
variable name it is better that we name it reg_idx so that it is clear what
the variable contains.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is just to clean up some confusing logic in ixgbe_cache_ring_rss
which can be simplified by adding a conditional with return to the start of
the call.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change adds support for certain 82599 based Mezzanine adapters.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change replaces a number of if/elseif/else statements with switch
statements to support the addition of future devices to the ixgbe driver.
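As an illustration of why a switch scales better here, consider the
sketch below. The enum values and mac_family_name() are hypothetical and
exist only for this example; supporting a new device means adding one
case label rather than extending an if/else chain.

```c
/* Hypothetical MAC identifiers; the real driver defines its own enum. */
enum mac_type {
	MAC_UNKNOWN,
	MAC_82598,
	MAC_82599,
};

/* Map a MAC type to a printable family name. */
const char *mac_family_name(enum mac_type mac)
{
	switch (mac) {
	case MAC_82598:
		return "82598";
	case MAC_82599:
		return "82599";
	default:
		/* Future devices get their own case label above. */
		return "unknown";
	}
}
```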
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change cleans up the use of rsc_count and changes it to a boolean since
the actual numerical value is used nowhere in the Rx cleanup path. I am
also moving the skb count into the RSC_CB path since it is much easier to
track it there than when it is passed as a parameter to various function
calls.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change cleans up the ixgbe_atr filter setup function so that it uses
fewer items from the stack. Since the code is only applicable to IPv4 w/
TCP it makes sense to just use the pointers based on the headers themselves
instead of copying them to temp variables and then writing those to the
filters.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The code for ixgbe_clean_rx_irq was much more tangled up than it needed to
be in terms of logic statements and unused variables. This change
untangles much of that and drops several unused variables such as cleaned
which was being returned but never checked.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This changes the numbering scheme slightly. Previously the ordering was
coming out like this:
Rx-2
Rx-1
Rx-0
TxRx-0
Which would drop two queues on CPU 0. This change makes it so that the
ordering is like this:
Rx-3
Rx-2
Rx-1
TxRx-0
This means that each CPU will have its own Rx queue, and only CPU 0 will
have the Tx queue.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The code as it existed could re-arm the queues when it was requesting a HW
reset due to a TX hang. Instead of doing that this change makes it so that
we will just exit if the hardware is believed to be hung.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that we perform link setup with interrupts
disabled. If the SFP has not been detected previously we will schedule the
SFP detection task to run in order to detect link. By doing this we avoid
the possibility of interrupts firing in the middle of our link setup during
ixgbe_up_complete.
In addition this change makes it so that the multi-speed fiber setup and SFP
setup are not mutually exclusive. This addresses issues seen in which a
link would only come up at 1G on some multi-speed fiber modules.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change adds a set of state flags to the rings that allow them to
independently function allowing for features like RSC, packet split, and
TX hang detection to be done per ring instead of for the entire device.
This is accomplished by re-purposing the flow director reinit_state member
and making it a global state instead since a long for a single bit flag is
a bit wasteful.
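A rough sketch of per-ring state flags, assuming a plain bitmask in
each ring. The bit names and helper functions below are hypothetical.
The kernel's set_bit()/clear_bit() helpers are atomic, whereas this
userspace version is not, so it only illustrates the data layout.

```c
#include <stdbool.h>

/* Hypothetical per-ring state bits (names are illustrative). */
enum ring_state_bit {
	RING_RSC_ENABLED,
	RING_PS_ENABLED,
	RING_CHECK_TX_HANG,
};

/* Simplified ring: one word of state flags owned by the ring. */
struct ring_flags {
	unsigned long state;
};

void ring_set_flag(struct ring_flags *r, enum ring_state_bit bit)
{
	r->state |= 1UL << bit;
}

void ring_clear_flag(struct ring_flags *r, enum ring_state_bit bit)
{
	r->state &= ~(1UL << bit);
}

bool ring_test_flag(const struct ring_flags *r, enum ring_state_bit bit)
{
	return (r->state >> bit) & 1UL;
}
```

With the flags stored in the ring, a feature such as RSC can be enabled
on one ring and left disabled on another without consulting any
device-wide setting.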
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This is the start of work to sort out what belongs in the rings and what
belongs in the q_vector. Items like the CPU variable make much more
sense in the q_vector since the CPU is a per-interrupt thing rather than a
per ring thing.
I also added a back-pointer from the ring to the q_vector.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change moves an adapter pointer into the private portion of the
pci_dev instead of a pointer to the netdev. The reason for this change is
because in most cases we just want the adapter anyway. In addition as we
start moving toward multiple netdevs per port we may want to move the
adapter pointer out of the netdevs entirely.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Missed some code that was left floating around in the DCB configuration
for the TXDCTL register. As a result the register was being messed with in
two different spots when we only needed to do the change once.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The main reason for this change is to keep the suspend/resume logic matched
up. The clear_interrupt_scheme function will disable MSI-X which will
affect the PCIe configuration space. Therefore we will want to do it before
we save state to avoid having the interrupt state restored by
pci_restore_state, and then trying to re-enable MSI/MSI-X interrupts via
ixgbe_setup_interrupt_scheme.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change places a netdev pointer directly into the ring structure. This
way we can avoid having to determine which netdev we are supposed to be
using and can just access the one on the ring directly.
As a result of this change further collapse of the code is possible by
dropping the adapter from ixgbe_alloc_rx_buffers, and the netdev pointer
from ixgbe_xmit_frame_ring_adv and ixgbe_maybe_stop_tx.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change moved some of the RX and TX stats into separate structures and
then placed those structures in a union in order to help reduce the size of
the ring structure.
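The size saving can be illustrated with a small sketch. All struct and
field names here are hypothetical; the only assumption carried over from
the change is that a ring is either an RX ring or a TX ring, so it never
needs both sets of stats at once.

```c
#include <stdint.h>

/* Hypothetical TX-only and RX-only stats (field names are illustrative). */
struct tx_only_stats {
	uint64_t restart_queue;
	uint64_t tx_busy;
};

struct rx_only_stats {
	uint64_t non_eop_descs;
	uint64_t alloc_page_failed;
	uint64_t alloc_buff_failed;
};

/* Layout with both sets of stats stored side by side. */
struct ring_separate {
	uint16_t count;
	struct tx_only_stats tx_stats;
	struct rx_only_stats rx_stats;
};

/* Layout with the stats overlapping in an anonymous union (C11). */
struct ring_shared {
	uint16_t count;
	union {
		struct tx_only_stats tx_stats;
		struct rx_only_stats rx_stats;
	};
};
```

The union occupies only as much space as its largest member, so
struct ring_shared is smaller than struct ring_separate.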
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is meant to simplify DMA map/unmap by providing a device
pointer. As a result the adapter pointer can be dropped from many of
the calls.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change drops ring->head since it is not used in any hot-path and can
easily be determined using IXGBE_[RT]DH(ring->reg_idx).
It also changes ring->tail into a true pointer so we can avoid unnecessary
pointer math to find the location of the tail.
In addition I also dropped the setting of head and tail in
ixgbe_clean_[rx|tx]_ring. The only location that should be setting the head
and tail values is ixgbe_configure_[rx|tx]_ring and that is only while the
queue is disabled.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change re-orders alloc_rx_buffers to make better use of the packet
split enabled flag. The new setup should require less branching in the
code, since we are now down to fewer if statements: we are either
handling packet split or we aren't.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change simplifies the work being done by the TX interrupt handler and
pushes it into the tx_map call. This allows for fewer cache misses since
the TX cleanup now accesses almost none of the skb members.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The maximum credits per traffic class only needs to be greater
than the TSO size for 82598 devices. The 82599 devices do not
have this requirement so only do this test for 82598 devices.
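A sketch of the device-specific check, under the assumption that both
values are expressed in the same credit units. The enum, the function
name and its parameters are hypothetical and only illustrate restricting
the test to 82598.

```c
#include <stdbool.h>

/* Hypothetical MAC identifiers used only for this sketch. */
enum dcb_mac {
	DCB_MAC_82598,
	DCB_MAC_82599,
};

/*
 * Return true if the per traffic class maximum credits are acceptable.
 * Only 82598 requires the maximum credits to exceed the TSO size.
 */
bool tc_max_credits_ok(enum dcb_mac mac, unsigned int max_credits,
		       unsigned int tso_credits)
{
	if (mac == DCB_MAC_82598 && max_credits <= tso_credits)
		return false;
	return true;
}
```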
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently the high and low water marks for PFC are being set
conservatively for jumbo frames. This means the RX buffers
are being underutilized in the default 1500 MTU. This patch
fixes this so that the water marks are set as described in
the data sheet considering the MTU size.
The equation used is,
RTT * 1.44 + MTU * 1.44 + MTU
Where RTT is the round trip time and MTU is the max frame size
in KB. To avoid floating point arithmetic, FC_HIGH_WATER is
defined as
((((RTT + MTU) * 144) + 99) / 100) + MTU
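The integer form can be checked with a small function. This is only a
sketch of the arithmetic above; fc_high_water_kb() and its parameter
names are made up for this example. Multiplying by 144 and dividing by
100 stands in for the factor 1.44, and adding 99 before the division
rounds the result up.

```c
/*
 * Integer version of RTT * 1.44 + MTU * 1.44 + MTU, with both inputs
 * and the result in KB. The "+ 99" rounds the scaled term up.
 */
unsigned int fc_high_water_kb(unsigned int rtt_kb, unsigned int mtu_kb)
{
	return ((((rtt_kb + mtu_kb) * 144) + 99) / 100) + mtu_kb;
}
```

For example, with the arbitrary inputs RTT = 2 and MTU = 2 the scaled
term is 4 * 1.44 = 5.76, which rounds up to 6, giving 6 + 2 = 8.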
This changes how the hardware field fc.low_water and
fc.high_water are used. With this change they are no longer
storing the actual low water and high water markers but are
storing the required head room in the buffer. This simplifies
the logic and we do not need to account for the size of the
buffer when setting the thresholds.
Testing with iperf and 16 threads showed a slight uptick in
throughput over a single traffic class (0.1-0.2 Gbps) and a reduction
in pause frames. Without the patch a 30 second run would show
~10-15 pause frames being transmitted; with the patch ~2-5 are
seen. Tests were run back to back with 82599.
Note RXPBSIZE is in KB and the low and high water mark fields are
also in KB. However the FCRT* registers have 32B granularity and
the value is left shifted by 5 into the register, so
(((rx_pbsize - water_mark) * 1024) / 32) << 5
is the most explicit conversion. Here we simplify:
(rx_pbsize - water_mark) * 32 << 5 = (rx_pbsize - water_mark) << 10
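The equivalence of the two forms can be sketched as below. The function
names are hypothetical, and the sketch assumes water_mark is not larger
than rx_pbsize and that the result fits in 32 bits.

```c
#include <stdint.h>

/* Explicit form: KB to bytes, bytes to 32B units, then shift into place. */
uint32_t fcrt_explicit(uint32_t rx_pbsize_kb, uint32_t water_mark_kb)
{
	return (((rx_pbsize_kb - water_mark_kb) * 1024u) / 32u) << 5;
}

/* Simplified form: 1024 / 32 = 32 = 1 << 5, and 5 + 5 = 10. */
uint32_t fcrt_simplified(uint32_t rx_pbsize_kb, uint32_t water_mark_kb)
{
	return (rx_pbsize_kb - water_mark_kb) << 10;
}
```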
This patch updates the PFC thresholds and legacy FC thresholds.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
"cat /proc/net/dev" uses RCU protection only.
It's quite possible we call a driver get_stats() method while the device is
dismantling and freeing its data structures.
So get_stats() methods must be very careful not to access driver private
data without appropriate locking.
In the ixgbe case, we access rx_ring pointers. These pointers are freed in
ixgbe_clear_interrupt_scheme() and set to NULL, which can trigger a NULL
dereference in ixgbe_get_stats64().
A possible fix is to use RCU locking in ixgbe_get_stats64() and defer
rx_ring freeing until after a grace period in ixgbe_clear_interrupt_scheme().
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Tantilov, Emil S <emil.s.tantilov@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently the skb->protocol field is used to set up various
offloading parameters on transmit for the correct protocol.
However, if vlan offloading is disabled or otherwise not used,
the protocol field will be ETH_P_8021Q, not the actual protocol.
This will cause the offloading to not be performed correctly,
even though the hardware is capable of looking inside vlan tags.
Instead, look inside the header if necessary to determine the
correct protocol type.
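A userspace sketch of "looking inside the header", assuming a raw
Ethernet frame with at most one 802.1Q tag. frame_protocol() and the
constants below are local to this example and are not the helper used
by the patch; 0x8100 is the 802.1Q tag type and 0x0800 is IPv4.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_ETH_HLEN    14      /* dst MAC + src MAC + EtherType */
#define SK_VLAN_HLEN   4       /* 802.1Q tag: TPID + TCI */
#define SK_P_8021Q     0x8100  /* EtherType of an 802.1Q tag */
#define SK_P_IP        0x0800  /* EtherType of IPv4 */

/*
 * Return the protocol carried by the frame in host byte order, looking
 * past a single 802.1Q tag if present. Returns 0 if the frame is too
 * short to contain the needed header fields.
 */
uint16_t frame_protocol(const uint8_t *frame, size_t len)
{
	uint16_t proto;

	if (len < SK_ETH_HLEN)
		return 0;
	proto = (uint16_t)((frame[12] << 8) | frame[13]);

	if (proto == SK_P_8021Q) {
		if (len < SK_ETH_HLEN + SK_VLAN_HLEN)
			return 0;
		/* The encapsulated EtherType follows the 2-byte TCI. */
		proto = (uint16_t)((frame[16] << 8) | frame[17]);
	}
	return proto;
}
```

For a tagged IPv4 frame the outer type is 0x8100, but the function
returns 0x0800, which is the value the offload setup needs.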
To some extent this fixes a regression from 2.6.36 because it
was previously not possible to disable vlan offloading and this
error case was not exposed.
Signed-off-by: Hao Zheng <hzheng@nicira.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Alex Duyck <alexander.h.duyck@intel.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DCB credits refill quantum _must_ be greater than half the max
packet size. This is needed to guarantee that TX DMA operations
are not attempted during a pause state. Additionally, the min IFG
must be set correctly for DCB mode. If a DMA operation is
requested unexpectedly during the pause state the HW data
store may be corrupted leading to a DMA hang. The DMA hang
requires a reset to correct. This fixes the HW configuration
to avoid this condition.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current ixgbe stats have the following problems:
- Not 64 bit safe (on 32bit arches)
- Not safe in ixgbe_clean_rx_irq():
All cpus dirty a common location (netdev->stats.rx_bytes &
netdev->stats.rx_packets) without proper synchronization.
This slows down multiqueue operations a bit, and can miss some
updates.
Fixes:
Implement ndo_get_stats64() method to provide accurate 64bit rx|tx
bytes/packets counters, using 64bit safe infrastructure.
ixgbe_get_ethtool_stats() also uses this infrastructure to provide 64bit
safe counters.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the ixgbe driver use the new vlan acceleration model.
Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Peter Waskiewicz <peter.p.waskiewicz.jr@intel.com>
CC: Emil Tantilov <emil.s.tantilov@intel.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many (but not all) drivers check to see whether there is a vlan
group configured before using a tag stored in the skb. There's
not much point in this check since it just throws away data that
should only be present in the expected circumstances. However,
it will soon be legal and expected to get a vlan tag when no
vlan group is configured, so remove this check from all drivers
to avoid dropping the tags.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VLAN_GROUP_ARRAY_LEN is simply the number of possible vlan VIDs.
Since vlan groups will soon be more of an implementation detail
for vlan devices, rename the constant to be descriptive of its
actual purpose.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove a DCB check config from DCB configuration. We
continue to configure DCB even if it fails, so don't
even bother to check. Plus user space (lldpad) checks
this before programming the hw anyway.
Worst case is we program some values into the hw that
don't make total sense, resulting in incorrect bandwidth
allocation.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new infrastructure to balance interrupts for flow
alignment when ATR or Flow Director are enabled.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix possible panic/hang with shared Legacy interrupts by not enabling
interrupts when interface is down.
Also fixes an intermittent link by enabling LSC upon exit from ixgbe_intr()
This patch adds flags to ixgbe_irq_enable() to allow for some flexibility
when enabling interrupts.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change "return (EXPR);" to "return EXPR;"
return is not a function, parentheses are not required.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the netdev->features is set with NETIF_F_HIGHDMA, we should set the
corresponding netdev->vlan_features as well to allow VLAN netdev created
on top of the real netdev to be able to also benefit from HIGHDMA on 32bit
system, reducing the performance hit that is caused by __skb_linearize(),
particularly for large send. This is fixed in this patch for all Intel e1000,
e1000e, igb, ixgb, and ixgbe drivers since this should be beneficial
to all devices supported by these drivers.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce indentation in a couple of places
Add static function ixgbe_psum
Add temporary for adapter->stats
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Did not add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
because no printk in this module used message prefixing.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whitespace cleanups.
Move inline keyword after function type declarations.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ordering of operations was messed up in the init and as a result when
VMDQ was enabled we were trying to enable TX rings before setting the VFTE
bits. This resulted in a ring that appeared to fail to enable when in fact
it was blocked because the VFTE bits were cleared after the reset.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fresh skbs have ip_summed set to CHECKSUM_NONE (0).
We can avoid setting skb->ip_summed to CHECKSUM_NONE again in drivers.
Introduce skb_checksum_none_assert() helper so that we keep this
assertion documented in driver sources.
Change most occurrences of :
skb->ip_summed = CHECKSUM_NONE;
by :
skb_checksum_none_assert(skb);
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"foo = &function" is more commonly written "foo = function"
Done with coccinelle script:
// <smpl>
@r@
identifier f;
@@
f(...) { ... }
@@
identifier r.f;
@@
- &f
+ f
// </smpl>
drivers/net/tehuti.c used a function and struct with the
same name, the function was renamed.
Compile tested x86 only.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>