Just maintain the list properly by returning the head of the remaining
SKB list from dev_hard_start_xmit().
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_hard_start_xmit() does two things, it first validates and
canonicalizes the SKB, then it actually sends it.
Make a set of helper functions for doing the first part.
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a slight policy change happening here as well.
The previous code would drop the entire rest of the GSO skb if any of
them got, for example, a congestion notification.
That makes no sense, anything NET_XMIT_MASK and below is something
like congestion or policing. And in the congestion case it doesn't
even mean the packet was actually dropped.
Just continue until dev_xmit_complete() evaluates to false.
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_NETFILTER_XT_TARGET_LOG is not selected anymore when jumping
from 3.16 to 3.17-rc1 if you don't set on the new NF_LOG_IPV4 and
NF_LOG_IPV6 switches.
Change this to select the three new symbols NF_LOG_COMMON, NF_LOG_IPV4
and NF_LOG_IPV6 instead, so NETFILTER_XT_TARGET_LOG remains enabled
when moving from old to new kernels.
Reported-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
CHECKSUM_UNNECESSARY may be applied to the SCTP CRC so we need to
appropriate account for this by decrementing csum_level. This is
done by calling __skb_dec_checksum_unnecessary.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow GRO path to "consume" checksums provided in CHECKSUM_UNNECESSARY
and to report new checksums verfied for use in fallback to normal
path.
Change GRO checksum path to track csum_level using a csum_cnt field
in NAPI_GRO_CB. On GRO initialization, if ip_summed is
CHECKSUM_UNNECESSARY set NAPI_GRO_CB(skb)->csum_cnt to
skb->csum_level + 1. For each checksum verified, decrement
NAPI_GRO_CB(skb)->csum_cnt while its greater than zero. If a checksum
is verfied and NAPI_GRO_CB(skb)->csum_cnt == 0, we have verified a
deeper checksum than originally indicated in skbuf so increment
csum_level (or initialize to CHECKSUM_UNNECESSARY if ip_summed is
CHECKSUM_NONE or CHECKSUM_COMPLETE).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch:
- Clarifies the specific requirements of devices returning
CHECKSUM_UNNECESSARY (comments in skbuff.h).
- Adds csum_level field to skbuff. This is used to express how
many checksums are covered by CHECKSUM_UNNECESSARY (stores n - 1).
This replaces the overloading of skb->encapsulation, that field is
is now only used to indicate inner headers are valid.
- Change __skb_checksum_validate_needed to "consume" each checksum
as indicated by csum_level as layers of the the packet are parsed.
- Remove skb_pop_rcv_encapsulation, no longer needed in the new
csum_level model.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since SCTP day 1, that is, 19b55a2af145 ("Initial commit") from lksctp
tree, the official <netinet/sctp.h> header carries a copy of enum
sctp_sstat_state that looks like (compared to the current in-kernel
enumeration):
User definition: Kernel definition:
enum sctp_sstat_state { typedef enum {
SCTP_EMPTY = 0, <removed>
SCTP_CLOSED = 1, SCTP_STATE_CLOSED = 0,
SCTP_COOKIE_WAIT = 2, SCTP_STATE_COOKIE_WAIT = 1,
SCTP_COOKIE_ECHOED = 3, SCTP_STATE_COOKIE_ECHOED = 2,
SCTP_ESTABLISHED = 4, SCTP_STATE_ESTABLISHED = 3,
SCTP_SHUTDOWN_PENDING = 5, SCTP_STATE_SHUTDOWN_PENDING = 4,
SCTP_SHUTDOWN_SENT = 6, SCTP_STATE_SHUTDOWN_SENT = 5,
SCTP_SHUTDOWN_RECEIVED = 7, SCTP_STATE_SHUTDOWN_RECEIVED = 6,
SCTP_SHUTDOWN_ACK_SENT = 8, SCTP_STATE_SHUTDOWN_ACK_SENT = 7,
}; } sctp_state_t;
This header was later on also placed into the uapi, so that user space
programs can compile without having <netinet/sctp.h>, but the shipped
with <linux/sctp.h> instead.
While RFC6458 under 8.2.1.Association Status (SCTP_STATUS) says that
sstat_state can range from SCTP_CLOSED to SCTP_SHUTDOWN_ACK_SENT, we
nevertheless have a what it appears to be dummy SCTP_EMPTY state from
the very early days.
While it seems to do just nothing, commit 0b8f9e25b0 ("sctp: remove
completely unsed EMPTY state") did the right thing and removed this dead
code. That however, causes an off-by-one when the user asks the SCTP
stack via SCTP_STATUS API and checks for the current socket state thus
yielding possibly undefined behaviour in applications as they expect
the kernel to tell the right thing.
The enumeration had to be changed however as based on the current socket
state, we access a function pointer lookup-table through this. Therefore,
I think the best way to deal with this is just to add a helper function
sctp_assoc_to_state() to encapsulate the off-by-one quirk.
Reported-by: Tristan Su <sooqing@gmail.com>
Fixes: 0b8f9e25b0 ("sctp: remove completely unsed EMPTY state")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit ed98df3361 ("net: use __GFP_NORETRY for high order
allocations") we tried to address one issue caused by order-3
allocations.
We still observe high latencies and system overhead in situations where
compaction is not successful.
Instead of trying order-3, order-2, and order-1, do a single order-3
best effort and immediately fallback to plain order-0.
This mimics slub strategy to fallback to slab min order if the high
order allocation used for performance failed.
Order-3 allocations give a performance boost only if they can be done
without recurring and expensive memory scan.
Quoting David :
The page allocator relies on synchronous (sync light) memory compaction
after direct reclaim for allocations that don't retry and deferred
compaction doesn't work with this strategy because the allocation order
is always decreasing from the previous failed attempt.
This means sync light compaction will always be encountered if memory
cannot be defragmented or reclaimed several times during the
skb_page_frag_refill() iteration.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6c9808ce09 ("tipc: remove port_lock") accidentally involves
a potential bug: when tipc socket instance(tsk) is not got with given
reference number in tipc_sk_get(), tsk is set to NULL. Subsequently
we jump to exit label where to decrease socket reference counter
pointed by tsk pointer in tipc_sk_put(). However, As now tsk is NULL,
oops may happen because of touching a NULL pointer.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace occurences of skb_get_queue_mapping() and follow-up
netdev_get_tx_queue() with an actual helper function.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds one more ACPI ID of a Broadcom bluetooth chip.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When a device driver is unloaded local->interfaces
list is cleared. If there was more than 1
interface running and connected (bound to a
chanctx) then chantype recalc was called and it
ended up with compat being NULL causing a call
trace warning.
Warn if compat becomes NULL as a result of
incompatible bss_conf.chandef of interfaces bound
to a given channel context only.
The call trace looked like this:
WARNING: CPU: 2 PID: 2594 at /devel/src/linux/net/mac80211/chan.c:557 ieee80211_recalc_chanctx_chantype+0x2cd/0x2e0()
Modules linked in: ath10k_pci(-) ath10k_core ath
CPU: 2 PID: 2594 Comm: rmmod Tainted: G W 3.16.0-rc1+ #150
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
0000000000000009 ffff88001ea279c0 ffffffff818dfa93 0000000000000000
ffff88001ea279f8 ffffffff810514a8 ffff88001ce09cd0 ffff88001e03cc58
0000000000000000 ffff88001ce08840 ffff88001ce09cd0 ffff88001ea27a08
Call Trace:
[<ffffffff818dfa93>] dump_stack+0x4d/0x66
[<ffffffff810514a8>] warn_slowpath_common+0x78/0xa0
[<ffffffff81051585>] warn_slowpath_null+0x15/0x20
[<ffffffff818a407d>] ieee80211_recalc_chanctx_chantype+0x2cd/0x2e0
[<ffffffff818a3dda>] ? ieee80211_recalc_chanctx_chantype+0x2a/0x2e0
[<ffffffff818a4919>] ieee80211_assign_vif_chanctx+0x1a9/0x770
[<ffffffff818a6220>] __ieee80211_vif_release_channel+0x70/0x130
[<ffffffff818a6dd3>] ieee80211_vif_release_channel+0x43/0xb0
[<ffffffff81885f4e>] ieee80211_stop_ap+0x21e/0x5a0
[<ffffffff8184b9b5>] __cfg80211_stop_ap+0x85/0x520
[<ffffffff8181c188>] __cfg80211_leave+0x68/0x120
[<ffffffff8181c268>] cfg80211_leave+0x28/0x40
[<ffffffff8181c5f3>] cfg80211_netdev_notifier_call+0x373/0x6b0
[<ffffffff8107f965>] notifier_call_chain+0x55/0x110
[<ffffffff8107fa41>] raw_notifier_call_chain+0x11/0x20
[<ffffffff816a8dc0>] call_netdevice_notifiers_info+0x30/0x60
[<ffffffff816a8eb9>] __dev_close_many+0x59/0xf0
[<ffffffff816a9021>] dev_close_many+0x81/0x120
[<ffffffff816aa1c5>] rollback_registered_many+0x115/0x2a0
[<ffffffff816aa3a6>] unregister_netdevice_many+0x16/0xa0
[<ffffffff8187d841>] ieee80211_remove_interfaces+0x121/0x1b0
[<ffffffff8185e0e6>] ieee80211_unregister_hw+0x56/0x110
[<ffffffffa0011ac4>] ath10k_mac_unregister+0x14/0x60 [ath10k_core]
[<ffffffffa0014fe7>] ath10k_core_unregister+0x27/0x40 [ath10k_core]
[<ffffffffa003b1f4>] ath10k_pci_remove+0x44/0xa0 [ath10k_pci]
[<ffffffff81373138>] pci_device_remove+0x28/0x60
[<ffffffff814cb534>] __device_release_driver+0x64/0xd0
[<ffffffff814cbcc8>] driver_detach+0xb8/0xc0
[<ffffffff814cb23a>] bus_remove_driver+0x4a/0xb0
[<ffffffff814cc697>] driver_unregister+0x27/0x50
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add support for the Broadcom Starfigther 2 switch chip using a DSA
driver. This switch driver supports the following features:
- configuration of the external switch port interface: MII, RevMII,
RGMII and RGMII_NO_ID are supported
- support for the per-port MIB counters
- support for link interrupts for special ports (e.g: MoCA)
- powering up/down of switch memories to conserve power when ports are
unused
Finally, update the compatible property for the DSA core code to match
our switch top-level compatible node.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the 4-bytes Broadcom tag that built-in switches such as
the Starfighter 2 might insert when receiving packets, or that we need
to insert while targetting specific switch ports. We use a fake local
EtherType value for this 4-bytes switch tag: ETH_P_BRCMTAG to make sure
we can assign DSA-specific network operations within the DSA drivers.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow switch drivers to hook a PHY link update callback to perform
port-specific link work.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever libphy determines that the link status of a given PHY/port has
changed, allow to call into the switch driver link adjustment callback
so proper actions can be taken care of by the switch driver upon link
notification.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case switch port tagging is disabled (voluntarily, or the switch just
does not support it), allow us to continue using the defined set of
dsa_device_ops in net/dsa/slave.c.
We introduce dsa_protocol_is_tagged() to check whether we need to
override skb->protocol and go through the DSA-specifif packet_type
function, or if we just go on and receive the SKB through the normal
path.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modify the DSA slave interface to be bound to an arbitray PHY, not just
the ones that are available as child PHY devices of the switch MDIO bus.
This allows us for instance to have external PHYs connected to a
separate MDIO bus, but yet also connected to a given switch port.
Under certain configurations, the physical port mask might not be a 1:1
mapping to the MII PHYs mask. This is the case, if e.g: Port 1 of the
switch is used and connects to a PHY at a MDIO address different than 1.
Introduce a phys_mii_mask variable which allows driver to implement and
divert their own MDIO read/writes operations for a subset of the MDIO
PHY addresses.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We will later use the per-port device_node pointer to fetch a bunch of
port-specific properties.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We might need to fetch additional resources from the device tree node
pointer, such as register ranges or other properties. Keep a device_node
pointer around for this.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DSA is currently registering one packet_type function per EtherType it
needs to intercept in the receive path of a DSA-enabled Ethernet device.
Right now we have three of them: trailer, DSA and eDSA, and there might
be more in the future, this will not scale to the addition of new
protocols.
This patch proceeds with adding a new layer of abstraction and two new
functions:
dsa_switch_rcv() which will dispatch into the tag-protocol specific
receive function implemented by net/dsa/tag_*.c
dsa_slave_xmit() which will dispatch into the tag-protocol specific
transmit function implemented by net/dsa/tag_*.c
When we do create the per-port slave network devices, we iterate over
the switch protocol to assign the DSA-specific receive and transmit
operations.
A new fake ethertype value is used: ETH_P_XDSA to illustrate the fact
that this is no longer going to look like ETH_P_DSA or ETH_P_TRAILER
like it used to be.
This allows us to greatly simplify the check in eth_type_trans() and
always override the skb->protocol with ETH_P_XDSA for Ethernet switches
tagged protocol, while also reducing the number repetitive slave
netdevice_ops assignments.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit fc60476761
("ipvs: changes for local real server") from 2.6.37
introduced DNAT support to local real server but the
IPv6 LOCAL_OUT handler ip_vs_local_reply6() is
registered incorrectly as IPv4 hook causing any outgoing
IPv4 traffic to be dropped depending on the IP header values.
Chris tracked down the problem to CONFIG_IP_VS_IPV6=y
Bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349768
Reported-by: Chris J Arges <chris.j.arges@canonical.com>
Tested-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The "rcu_dereference()" calls are used directly in conditions.
Since their return values are never dereferenced it is recommended to
use "rcu_access_pointer()" instead of "rcu_dereference()".
Therefore, this patch makes the replacements.
The following Coccinelle semantic patch was used:
@@
@@
(
if(
(<+...
- rcu_dereference
+ rcu_access_pointer
(...)
...+>)) {...}
|
while(
(<+...
- rcu_dereference
+ rcu_access_pointer
(...)
...+>)) {...}
)
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The tunneling method should properly use tunnel encapsulation.
Fixes problem with CHECKSUM_PARTIAL packets when TCP/UDP csum
offload is supported.
Thanks to Alex Gartrell for reporting the problem, providing
solution and for all suggestions.
Reported-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
You can use this to skip accounting objects when listing/resetting
via NFNL_MSG_ACCT_GET/NFNL_MSG_ACCT_GET_CTRZERO messages with the
NLM_F_DUMP netlink flag. The filtering covers the following cases:
1. No filter specified. In this case, the client will get old behaviour,
2. List/reset counter object only: In this case, you have to use
NFACCT_F_QUOTA as mask and value 0.
3. List/reset quota objects only: You have to use NFACCT_F_QUOTA_PKTS
as mask and value - the same, for byte based quota mask should be
NFACCT_F_QUOTA_BYTES and value - the same.
If you want to obtain the object with any quota type
(ie. NFACCT_F_QUOTA_PKTS|NFACCT_F_QUOTA_BYTES), you need to perform
two dump requests, one to obtain NFACCT_F_QUOTA_PKTS objects and
another for NFACCT_F_QUOTA_BYTES.
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The use of "rcu_assign_pointer()" is NULLing out the pointer.
According to RCU_INIT_POINTER()'s block comment:
"1. This use of RCU_INIT_POINTER() is NULLing out the pointer"
it is better to use it instead of rcu_assign_pointer() because it has a
smaller overhead.
The following Coccinelle semantic patch was used:
@@
@@
- rcu_assign_pointer
+ RCU_INIT_POINTER
(..., NULL)
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When using the cfg80211_inform_bss[_width]() functions drivers
cannot currently indicate whether the data was received in a
beacon or probe response. Fix that by passing a new enum that
indicates such (or unknown).
For good measure, use it in ath6kl.
Acked-by: Kalle Valo <kvalo@qca.qualcomm.com> [ath6kl]
Acked-by: Arend van Spriel <arend@broadcom.com> [brcmfmac]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are a few possible cases of where BSS data came from:
1) only a beacon has been received
2) only a probe response has been received
3) the driver didn't report what it received (this happens when
using cfg80211_inform_bss[_width]())
4) both probe response and beacon data has been received
Unfortunately, in the userspace API, a few things weren't there:
a) there was no way to differentiate cases 1) and 4) above
without comparing the data of the IEs
b) the TSF was always from the last frame, instead of being
exposed for beacon/probe response separately like IEs
Fix this by
i) exporting a new flag attribute that indicates whether or
not probe response data has been received - this addresses (a)
ii) exporting a BEACON_TSF attribute that holds the beacon's TSF
if a beacon has been received
iii) not exporting the beacon attributes in case (3) above as that
would just lead userspace into thinking the data actually came
from a beacon when that isn't clear
To implement this, track inside the IEs struct whether or not it
(definitely) came from a beacon.
Reported-by: William Seto
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This reverts commit dda444d524.
Channel switching code has been reworked and
improved significantly since the time original
locking issues were found.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Header-less cloned skbs with sufficient headroom need not be cloned
unless the tailroom is going to be modified.
Fix ieee80211_skb_resize so it would only resize cloned skbs if either
the header isn't released or the tailroom is going to be modified.
Some drivers might have assumed that skbs are never cloned, so add a HW
flag that explicitly permits cloned TX skbs. Drivers which do not modify
TX skbs should set this flag to avoid copying skbs.
Signed-off-by: Ido Yariv <idox.yariv@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When hw acceleration is enabled, the GENERATE_IV or PUT_IV_SPACE flags
will only require headroom space. Consequently, the tailroom-needed
counter can safely be decremented.
Signed-off-by: Ido Yariv <idox.yariv@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In the cfg80211_rx_mgmt(), parameter @gfp was used for the memory allocation.
But, memory get allocated under spin_lock_bh(), this implies atomic context.
So, one can't use GFP_KERNEL, only variants with no __GFP_WAIT. Actually, in all
occurrences GFP_ATOMIC is used (wil6210 use GFP_KERNEL by mistake),
and it should be this way or warning triggered in the memory allocation code.
Remove @gfp parameter as no actual choice exist, and use hard coded
GFP_ATOMIC for memory allocation.
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The "RX active" string is too long, so the columns get
shifted. Change it to just "RX" to avoid this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
sta->last_seq_ctrl is the seq_ctrl field from the last header
seen, need to shift it 4 bits to extract the sequence number.
Otherwise the ieee80211_sn_less() check at the top of
ieee80211_sta_manage_reorder_buf drops frames until the sequence
number catches up.
Cc: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Denton Gentry <denton.gentry@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The 802.11 standard says when processing a plink confirm
frame:
"If the peerLinkID in the mesh peering instance has not been
set, the Local Link ID field of the Mesh Peering Confirm
request shall be copied into the peerLinkID in the mesh
peering instance."
We were only doing this when receiving an open peering frame,
but it could happen that the open frame gets lost and so we
should handle this case rather than rejecting the confirm and
failing the whole peering process.
Reported-by: Yu Niiro <yu.niiro@gmail.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In ieee80211_sta_ps_deliver_wakeup, sdata->smps_mode is checked. This is
initialized only for the base AP interface, not the individual VLANs.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When bringing down the AP, a WARN_ON is hit because the bss config chandef
is empty here.
Since AP_VLAN channel settings do not matter for anything chanctx related
(always inherits the settings from the AP interface), let's just ignore
it here.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This reverts commit 24aa11ab8a.
That commit was wrong since it uses data that hasn't even been set
up yet, but might be a hold-over from a previous connection.
Additionally, it seems like a driver-specific workaround that
shouldn't have been in mac80211 to start with.
Cc: stable@vger.kernel.org
Fixes: 24aa11ab8a ("mac80211: disable uAPSD if all ACs are under ACM")
Reviewed-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This is follow-up to
da08143b85 ("vlan: more careful checksum features handling")
which introduced more careful feature intersection in vlan code,
taking into account that HW_CSUM should be considered superset
of IP_CSUM/IPV6_CSUM. The same is needed in netif_skb_features()
in order to avoid offloading mismatch warning when vlan is
created on top of a bond consisting of slaves supporting IP/IPv6
checksumming but not vlan Tx offloading.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: commit 690e36e726 (net: Allow raw buffers to be passed into the flow dissector)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: commit 690e36e726 (net: Allow raw buffers to be passed into the flow dissector)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Code manipulating sysfs symlinks on adjacent net_devices(s)
currently doesn't take into account that devices potentially
belong to different namespaces.
This patch trying to fix an issue as follows:
- check for net_ns before creating / deleting symlink.
for now only netdev_adjacent_rename_links and
__netdev_adjacent_dev_remove are affected, afaics
__netdev_adjacent_dev_insert implies both net_devs
belong to the same namespace.
- Drop all existing symlinks to / from all adj_devs before
switching namespace and recreate them just after.
Signed-off-by: Alexander Y. Fomichev <git.user@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently it can send regulatory domain change notification before any
NEW_WIPHY notification. Moreover, if rfill_register() fails, calling
wiphy_unregister() will send a DEL_WIPHY though no NEW_WIPHY had been
sent previously.
Thus reordering so it properly notifies NEW_WIPHY before any other.
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds one more ACPI ID of a Broadcom bluetooth chip.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Use HAVE_JUMP_LABEL as elsewhere in the kernel to ensure
that the toolchain has the required support in addition to
CONFIG_JUMP_LABEL being set.
Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.
Both objdump and diff -w show no differences.
This patch removes some blank lines between the end of a function
definition and the EXPORT_SYMBOL_GPL macro in order to prevent
checkpatch warning that EXPORT_SYMBOL must immediately follow
a function.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.
Both objdump and diff -w show no differences.
This patch addresses structure definitions, specifically it cleanses the brace
placement and replaces spaces with tabs in a few places.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.
Both objdump and diff -w show no differences.
A number of items are addressed in this patch:
* Multiple spaces converted to tabs
* Spaces before tabs removed.
* Spaces in pointer typing cleansed (char *)foo etc.
* Remove space after sizeof
* Ensure spacing around comparators such as if statements.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
In GRE demux if the GRE checksum pop rcv encapsulation so that any
encapsulated checksums are treated as tunnel checksums.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement GRO for UDPv6. Add UDP checksum verification in gro_receive
for both UDP4 and UDP6 calling skb_gro_checksum_validate_zero_check.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In tcp[64]_gro_receive call skb_gro_checksum_validate to validate TCP
checksum in the gro context.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add skb_gro_checksum_validate, skb_gro_checksum_validate_zero_check,
and skb_gro_checksum_simple_validate, and __skb_gro_checksum_complete.
These are the cognates of the normal checksum functions but are used
in the gro_receive path and operate on GRO related fields in sk_buffs.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter reported that the static checker emits the warning
net/netfilter/ipset/ip_set_list_set.c:600 init_list_set()
warn: integer overflows 'sizeof(*map) + size * set->dsize'
Limit the maximal number of elements in list type of sets.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Resolve missing-field-initializer warnings by providing a
directed initializer.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Ranges of values are broken with hash:net,net and hash:net,port,net.
hash:net,net
============
# ipset create test-nn hash:net,net
# ipset add test-nn 10.0.10.1-10.0.10.127,10.0.0.0/8
# ipset list test-nn
Name: test-nn
Type: hash:net,net
Revision: 0
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 16960
References: 0
Members:
10.0.10.1,10.0.0.0/8
# ipset test test-nn 10.0.10.65,10.0.0.1
10.0.10.65,10.0.0.1 is NOT in set test-nn.
# ipset test test-nn 10.0.10.1,10.0.0.1
10.0.10.1,10.0.0.1 is in set test-nn.
hash:net,port,net
=================
# ipset create test-npn hash:net,port,net
# ipset add test-npn 10.0.10.1-10.0.10.127,tcp:80,10.0.0.0/8
# ipset list test-npn
Name: test-npn
Type: hash:net,port,net
Revision: 0
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 17344
References: 0
Members:
10.0.10.8/29,tcp:80,10.0.0.0
10.0.10.16/28,tcp:80,10.0.0.0
10.0.10.2/31,tcp:80,10.0.0.0
10.0.10.64/26,tcp:80,10.0.0.0
10.0.10.32/27,tcp:80,10.0.0.0
10.0.10.4/30,tcp:80,10.0.0.0
10.0.10.1,tcp:80,10.0.0.0
# ipset list test-npn
# ipset test test-npn 10.0.10.126,tcp:80,10.0.0.2
10.0.10.126,tcp:80,10.0.0.2 is NOT in set test-npn.
# ipset test test-npn 10.0.10.126,tcp:80,10.0.0.0
10.0.10.126,tcp:80,10.0.0.0 is in set test-npn.
# ipset create test-npn hash:net,port,net
# ipset add test-npn 10.0.10.0/24,tcp:80-81,10.0.0.0/8
# ipset list test-npn
Name: test-npn
Type: hash:net,port,net
Revision: 0
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 17024
References: 0
Members:
10.0.10.0,tcp:80,10.0.0.0
10.0.10.0,tcp:81,10.0.0.0
# ipset test test-npn 10.0.10.126,tcp:80,10.0.0.0
10.0.10.126,tcp:80,10.0.0.0 is NOT in set test-npn.
# ipset test test-npn 10.0.10.0,tcp:80,10.0.0.0
10.0.10.0,tcp:80,10.0.0.0 is in set test-npn.
Correctly setup from..to variables where no IPSET_ATTR_IP_TO{,2}
attribute is given, so in range processing loop we construct proper
cidr value. Check whenever we have no ranges and can short cut in
hash:net,net properly. Use unlikely() where appropriate, to comply
with other modules.
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Markmask is an u32, hence it can't be greater then 4294967295 ( i.e.
0xffffffff ). This was causing smatch warning:
net/netfilter/ipset/ip_set_hash_gen.h:1084 hash_ipmark_create() warn:
impossible condition '(markmask > 4294967295) => (0-u32max > u32max)'
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Add cpu support to meta expresion.
This allows you to match packets with cpu number.
Signed-off-by: Ana Rey <anarey@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add pkttype support for ip, ipv6 and inet families of tables.
This allows you to fetch the meta packet type based on the link layer
information. The loopback traffic is a special case, the packet type
is guessed from the network layer header.
No special handling for bridge and arp since we're not going to see
such traffic in the loopback interface.
Joint work with Alvaro Neira Ayuso <alvaroneay@gmail.com>
Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com>
Signed-off-by: Ana Rey <anarey@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Replace open codings of (((u64) <x> * <y>) >> 32) with reciprocal_scale().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers, and perhaps other entities we have not yet considered,
sometimes want to know how deep the protocol headers go before
deciding how large of an SKB to allocate and how much of the packet to
place into the linear SKB area.
For example, consider a driver which has a device which DMAs into
pools of pages and then tells the driver where the data went in the
DMA descriptor(s). The driver can then build an SKB and reference
most of the data via SKB fragments (which are page/offset/length
triplets).
However at least some of the front of the packet should be placed into
the linear SKB area, which comes before the fragments, so that packet
processing can get at the headers efficiently. The first thing each
protocol layer is going to do is a "pskb_may_pull()" so we might as
well aggregate as much of this as possible while we're building the
SKB in the driver.
Part of supporting this is that we don't have an SKB yet, so we want
to be able to let the flow dissector operate on a raw buffer in order
to compute the offset of the end of the headers.
So now we have a __skb_flow_dissect() which takes an explicit data
pointer and length.
Signed-off-by: David S. Miller <davem@davemloft.net>
We complete the merging of the port and socket layer by aggregating
the fields of struct tipc_port directly into struct tipc_sock, and
moving the combined structure into socket.c.
We also move all functions and macros that are not any longer
exposed to the rest of the stack into socket.c, and rename them
accordingly.
Despite the size of this commit, there are no functional changes.
We have only made such changes that are necessary due of the removal
of struct tipc_port.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The reference table is now 'socket aware' instead of being generic,
and has in reality become a socket internal table. In order to be
able to minimize the API exposed by the socket layer towards the rest
of the stack, we now move the reference table definitions and functions
into the file socket.c, and rename the functions accordingly.
There are no functional changes in this commit.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We move the inline functions in the file port.h to socket.c, and modify
their names accordingly.
We move struct tipc_port and some macros to socket.h.
Finally, we remove the file port.h.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In this commit, we move the remaining functions in port.c to
socket.c, and give them new names that correspond to their new
location. We then remove the file port.c.
There are only cosmetic changes to the moved functions.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In previous commits we have reduced usage of port_lock to a minimum,
and complemented it with usage of bh_lock_sock() at the remaining
locations. The purpose has been to remove this lock altogether, since
it largely duplicates the role of bh_lock_sock. We are now ready to do
this.
However, we still need to protect the BH callers from inadvertent
release of the socket while they hold a reference to it. We do this by
replacing port_lock by a combination of a rw-lock protecting the
reference table as such, and updating the socket reference counter while
the socket is referenced from BH. This technique is more standard and
comprehensible than the previous approach, and turns out to have a
positive effect on overall performance.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to make tipc_sock the only entity referencable from other
parts of the stack, we add a tipc_sock pointer instead of a tipc_port
pointer to the registry. As a consequence, we also let the function
tipc_port_lock() return a pointer to a tipc_sock instead of a tipc_port.
We keep the function's name for now, since the lock still is owned by
the port.
This is another step in the direction of eliminating port_lock, replacing
its usage with lock_sock() and bh_lock_sock().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions tipc_port_get_ports() and tipc_port_reinit() scan over
all sockets/ports to access each of them. This is done by using a
dedicated linked list, 'tipc_socks' where all sockets are members. The
list is in turn protected by a spinlock, 'port_list_lock', while each
socket is locked by using port_lock at the moment of access.
In order to reduce complexity and risk of deadlock, we want to get
rid of the linked list and the accompanying spinlock.
This is what we do in this commit. Instead of the linked list, we use
the port registry to scan across the sockets. We also add usage of
bh_lock_sock() inside the scope of port_lock in both functions, as a
preparation for the complete removal of port_lock.
Finally, we move the functions from port.c to socket.c, and rename them
to tipc_sk_sock_show() and tipc_sk_reinit() repectively.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the latest changes to the socket/port layer the existence of
the functions tipc_port_init() and tipc_port_destroy() cannot be
justified. They are both called only once, from tipc_sk_create() and
tipc_sk_delete() respectively, and their functionality can better be
merged into the latter two functions.
This also entails that all remaining references to port_lock now are
made from inside socket.c, something that will make it easier to remove
this lock.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function tipc_acknowledge() is a remnant from the obsolete native
API. Currently, it grabs port_lock, before building an acknowledge
message and sending it to the peer.
Since all access to socket members now is protected by the socket lock,
it has become unnecessary to grab port_lock here.
In this commit, we remove the usage of port_lock, simplify the
function, and move it to socket.c, renaming it to tipc_sk_send_ack().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tipc_port_connect()/tipc_port_disconnect() are remnants of the obsolete
native API. Their only task is to grab port_lock and call the functions
__tipc_port_connect()/__tipc_port_disconnect() respectively, which will
perform the actual state change.
Since socket/port exection now is single-threaded the use of port_lock
is not needed any more, so we can safely replace the two functions with
their lock-free counterparts.
In this commit, we remove the two functions. Furthermore, the contents
of __tipc_port_disconnect() is so trivial that we choose to eliminate
that function too, expanding its functionality into tipc_shutdown().
__tipc_port_connect() is simplified, moved to socket.c, and given the
more correct name tipc_sk_finish_conn(). Finally, we eliminate the
function auto_connect(), and expand its contents into filter_connect().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tipc_port_shutdown() is a remnant from the now obsolete native
interface. As such it grabs port_lock in order to protect itself
from concurrent BH processing.
However, after the recent changes to the port/socket upcalls, sockets
are now basically single-threaded, and all execution, except the read-only
tipc_sk_timer(), is executing within the protection of lock_sock(). So
the use of port_lock is not needed here.
In this commit we eliminate the whole function, and merge it into its
only caller, tipc_shutdown().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The last remaining BH upcall to the socket, apart for the message
reception function tipc_sk_rcv(), is the timer function.
We prefer to let this function continue executing in BH, since it only
does read-acces to semi-permanent data, but we make three changes to it:
1) We introduce a bh_lock_sock()/bh_unlock_sock() inside the scope
of port_lock. This is a preparation for replacing port_lock with
bh_lock_sock() at the locations where it is still used.
2) We move the function from port.c to socket.c, as a further step
of eliminating the port code level altogether.
3) We let it make use of the newly introduced tipc_msg_create()
function. This enables us to get rid of three context specific
functions (port_create_self_abort_msg() etc.) in port.c
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the current implementation, each 'struct tipc_node' instance keeps
a linked list of those ports/sockets that are connected to the node
represented by that struct. The purpose of this is to let the node
object know which sockets to alert when it loses contact with its peer
node, i.e., which sockets need to have their connections aborted.
This entails an unwanted direct reference from the node structure
back to the port/socket structure, and a need to grab port_lock
when we have to make an upcall to the port. We want to get rid of
this unecessary BH entry point into the socket, and also eliminate
its use of port_lock.
In this commit, we instead let the node struct keep list of "connected
socket" structs, which each represents a connected socket, but is
allocated independently by the node at the moment of connection. If
the node loses contact with its peer node, the list is traversed, and
a "connection abort" message is created for each entry in the list. The
message is sent to it respective connected socket using the ordinary
data path, and the receiving socket aborts its connections upon reception
of the message.
This enables us to get rid of the direct reference from 'struct node' to
´struct port', and another unwanted BH access point to the latter.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current link implementation keeps a linked list of blocked ports/
sockets that is populated when there is link congestion. The purpose
of this is to let the link know which users to wake up when the
congestion abates.
This adds unnecessary complexity to the data structure and the code,
since it forces us to involve the link each time we want to delete
a socket. It also forces us to grab the spinlock port_lock within
the scope of node_lock. We want to get rid of this direct dependence,
as well as the deadlock hazard resulting from the usage of port_lock.
In this commit, we instead let the link keep list of a "wakeup" pseudo
messages for use in such situations. Those messages are sent to the
pending sockets via the ordinary message reception path, and wake up
the socket's owner when they are received.
This enables us to get rid of the 'waiting_ports' linked lists in struct
tipc_port that manifest this direct reference. As a consequence, we can
eliminate another BH entry into the socket, and hence the need to grab
port_lock. This is a further step in our effort to remove port_lock
altogether.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function tipc_msg_init() has turned out to be of limited value
in many cases. It take too few parameters to be usable for creating
a complete message, it makes too many assumptions about what the
message should be used for, and it does not allocate any buffer to
be returned to the caller.
Therefore, we now introduce the new function tipc_msg_create(), which
takes all the parameters needed to create a full message, and returns
a buffer of the requested size. The new function will be very useful
for the changes we will be doing in later commits in this series.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upon timeout, undo (via both timestamps/Eifel and DSACKs) was
disabled if any retransmits were still in flight. The concern was
perhaps that spurious retransmission sent in a previous recovery
episode may trigger DSACKs to falsely undo the current recovery.
However, this inadvertently misses undo opportunities (using either
TCP timestamps or DSACKs) when timeout occurs during a loss episode,
i.e. recurring timeouts or timeout during fast recovery. In these
cases some retransmissions will be in flight but we should allow
undo. Furthermore, we should only reset undo_marker and undo_retrans
upon timeout if we are starting a new recovery episode. Finally,
when we do reset our undo state, we now do so in a manner similar
to tcp_enter_recovery(), so that we require a DSACK for each of
the outstsanding retransmissions. This will achieve the original
goal by requiring that we receive the same number of DSACKs as
retransmissions.
This patch increases the undo events by 50% on Google servers.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a followup to commit 676d23690f ("net: Fix use after free by
removing length arg from sk_data_ready callbacks"), we can remove
some useless code in sock_queue_rcv_skb() and rxrpc_queue_rcv_skb()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ktime_get_ns() replaces ktime_to_ns(ktime_get())
ktime_get_real_ns() replaces ktime_to_ns(ktime_get_real())
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new_ctx pointer is set only for non-chanctx drivers. This yielded a
crash for chanctx-based drivers during channel switch finalization:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]
Use an adequate chanctx pointer to fix this.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.
A simplified version of the Coccinelle semantic patch making this change
is as follows:
@change@
expression E1,E2,E3;
@@
- jiffies - E1 >= (E2*E3)
+ time_after_eq(jiffies, E1+E2*E3)
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.
A simplified version of the Coccinelle semantic patch making this change
is as follows:
@change@
expression E1,E2;
@@
- (jiffies - E1) >= E2
+ time_after_eq(jiffies, E1+E2)
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.
A simplified version of the Coccinelle semantic patch making this change
is as follows:
@change@
expression E1,E2;
@@
- jiffies - E1 < E2
+ time_before(jiffies, E1+E2)
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.
A simplified version of the Coccinelle semantic patch making this change
is as follows:
@change@
expression E1,E2;
@@
(
- (jiffies - E1) < E2
+ time_before(jiffies, E1+E2)
)
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of "rcu_assign_pointer()" is NULLing out the pointer.
According to RCU_INIT_POINTER()'s block comment:
"1. This use of RCU_INIT_POINTER() is NULLing out the pointer"
it is better to use it instead of rcu_assign_pointer() because it has a
smaller overhead.
The following Coccinelle semantic patch was used:
@@
@@
- rcu_assign_pointer
+ RCU_INIT_POINTER
(..., NULL)
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "rcu_dereference()" call is used directly in a condition.
Since its return value is never dereferenced it is recommended to use
"rcu_access_pointer()" instead of "rcu_dereference()".
Therefore, this patch makes the replacement.
The following Coccinelle semantic patch was used:
@@
@@
(
if(
(<+...
- rcu_dereference
+ rcu_access_pointer
(...)
...+>)) {...}
|
while(
(<+...
- rcu_dereference
+ rcu_access_pointer
(...)
...+>)) {...}
)
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "rcu_dereference()" call is used directly in a condition.
Since its return value is never dereferenced it is recommended to use
"rcu_access_pointer()" instead of "rcu_dereference()".
Therefore, this patch makes the replacement.
The following Coccinelle semantic patch was used:
@@
@@
(
if(
(<+...
- rcu_dereference
+ rcu_access_pointer
(...)
...+>)) {...}
|
while(
(<+...
- rcu_dereference
+ rcu_access_pointer
(...)
...+>)) {...}
)
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 7a9bc9b81a ("ipv4: Elide fib_validate_source() completely when possible.")
introduced a short-circuit to avoid calling fib_validate_source when not
needed. That change took rp_filter into account, but not accept_local.
This resulted in a change of behaviour: with rp_filter and accept_local
off, incoming packets with a local address in the source field should be
dropped.
Here is how to reproduce the change pre/post 7a9bc9b81a commit:
-configure the same IPv4 address on hosts A and B.
-try to send an ARP request from B to A.
-The ARP request will be dropped before that commit, but accepted and answered
after that commit.
This adds a check for ACCEPT_LOCAL, to maintain full
fib validation in case it is 0. We also leave __fib_validate_source() earlier
when possible, based on the same check as fib_validate_source(), once the
accept_local stuff is verified.
Cc: Gregory Detal <gregory.detal@uclouvain.be>
Cc: Christoph Paasch <christoph.paasch@uclouvain.be>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
In SCTP, selection of active (T.ACT) and retransmission (T.RET)
transports is being done whenever transport control operations
(UP, DOWN, PF, ...) are engaged through sctp_assoc_control_transport().
Commits 4c47af4d5e ("net: sctp: rework multihoming retransmission
path selection to rfc4960") and a7288c4dd5 ("net: sctp: improve
sctp_select_active_and_retran_path selection") have both improved
it towards a more fine-grained and optimal path selection.
Currently, the selection algorithm for T.ACT and T.RET is as follows:
1) Elect the two most recently used ACTIVE transports T1, T2 for
T.ACT, T.RET, where T.ACT<-T1 and T1 is most recently used
2) In case primary path T.PRI not in {T1, T2} but ACTIVE, set
T.ACT<-T.PRI and T.RET<-T1
3) If only T1 is ACTIVE from the set, set T.ACT<-T1 and T.RET<-T1
4) If none is ACTIVE, set T.ACT<-best(T.PRI, T.RET, T3) where
T3 is the most recently used (if avail) in PF, set T.RET<-T.PRI
Prior to above commits, 4) was simply a camp on T.ACT<-T.PRI and
T.RET<-T.PRI, ignoring possible paths in PF. Camping on T.PRI is
still slightly suboptimal as it can lead to the following scenario:
Setup:
<A> <B>
T1: p1p1 (10.0.10.10) <==> .'`) <==> p1p1 (10.0.10.12) <= T.PRI
T2: p1p2 (10.0.10.20) <==> (_ . ) <==> p1p2 (10.0.10.22)
net.sctp.rto_min = 1000
net.sctp.path_max_retrans = 2
net.sctp.pf_retrans = 0
net.sctp.hb_interval = 1000
T.PRI is permanently down, T2 is put briefly into PF state (e.g. due to
link flapping). Here, the first time transmission is sent over PF path
T2 as it's the only non-INACTIVE path, but the retransmitted data-chunks
are sent over the INACTIVE path T1 (T.PRI), which is not good.
After the patch, it's choosing better transports in both cases by
modifying step 4):
4) If none is ACTIVE, set T.ACT_new<-best(T.ACT_old, T3) where T3 is
the most recently used (if avail) in PF, set T.RET<-T.ACT_new
This will still select a best possible path in PF if available (which
can also include T.PRI/T.RET), and set both T.ACT/T.RET to it.
In case sctp_assoc_control_transport() *just* put T.ACT_old into INACTIVE
as it transitioned from ACTIVE->PF->INACTIVE and stays in INACTIVE just
for a very short while before going back ACTIVE, it will guarantee that
this path will be reselected for T.ACT/T.RET since T3 (PF) is not
available.
Previously, this was not possible, as we would only select between T.PRI
and T.RET, and a possible T3 would be NULL due to the fact that we have
just transitioned T3 in sctp_assoc_control_transport() from PF->INACTIVE
and would select a suboptimal path when T.PRI/T.RET have worse properties.
In the case that T.ACT_old permanently went to INACTIVE during this
transition and there's no PF path available, plus T.PRI and T.RET are
INACTIVE as well, we would now camp on T.ACT_old, but if everything is
being INACTIVE there's really not much we can do except hoping for a
successful HB to bring one of the transports back up again and, thus
cause a new selection through sctp_assoc_control_transport().
Now both tests work fine:
Case 1:
1. T1 S(ACTIVE) T.ACT
T2 S(ACTIVE) T.RET
2. T1 S(ACTIVE) T.ACT, T.RET
T2 S(PF)
3. T1 S(ACTIVE) T.ACT, T.RET
T2 S(INACTIVE)
5. T1 S(PF) T.ACT, T.RET
T2 S(INACTIVE)
[ 5.1 T1 S(INACTIVE) T.ACT, T.RET
T2 S(INACTIVE) ]
6. T1 S(ACTIVE) T.ACT, T.RET
T2 S(INACTIVE)
7. T1 S(ACTIVE) T.ACT
T2 S(ACTIVE) T.RET
Case 2:
1. T1 S(ACTIVE) T.ACT
T2 S(ACTIVE) T.RET
2. T1 S(PF)
T2 S(ACTIVE) T.ACT, T.RET
3. T1 S(INACTIVE)
T2 S(ACTIVE) T.ACT, T.RET
5. T1 S(INACTIVE)
T2 S(PF) T.ACT, T.RET
[ 5.1 T1 S(INACTIVE)
T2 S(INACTIVE) T.ACT, T.RET ]
6. T1 S(INACTIVE)
T2 S(ACTIVE) T.ACT, T.RET
7. T1 S(ACTIVE) T.ACT
T2 S(ACTIVE) T.RET
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When both transports are the same, we don't have to go down that
road only to realize that we will return the very same transport.
We are guaranteed that curr is always non-NULL. Therefore, just
short-circuit this special case.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When there are multiple vlan headers present in a received frame, the first
one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the
skb beyond the VLAN TPID may be still non-linear, including the inner TCI
and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it
does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is
executed, __pop_vlan_tci pulls the next vlan header into vlan_tci.
This leads to two things:
1. Part of the resulting ethernet header is in the non-linear part of the
skb. When eth_type_trans is called later as the result of
OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci
is in fact accessing random data when it reads past the TPID.
2. network_header points into the ethernet header instead of behind it.
mac_len is set to a wrong value (10), too.
Reported-by: Yulong Pei <ypei@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the transport has always been in state SCTP_UNCONFIRMED, it
therefore wasn't active before and hasn't been used before, and it
always has been, so it is unnecessary to bug the user with a
notification.
Reported-by: Deepak Khandelwal <khandelwal.deepak.1987@gmail.com>
Suggested-by: Vlad Yasevich <vyasevich@gmail.com>
Suggested-by: Michael Tuexen <tuexen@fh-muenster.de>
Suggested-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
af_packet can currently overwrite kernel memory by out of bound
accesses, because it assumed a [new] block can always hold one frame.
This is not generally the case, even if most existing tools do it right.
This patch clamps too long frames as API permits, and issue a one time
error on syslog.
[ 394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82
In this example, packet header tp_snaplen was set to 3966,
and tp_len was set to 5042 (skb->len)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: f6fb8f100b ("af-packet: TPACKET_V3 flexible buffer implementation.")
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The LECS response contains the MTU that should be used. Correctly
synchronize with other layers when updating.
Signed-off-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently the LE passive scanning and auto-connections feature was
introduced. It uses the hci_connect_le() API which returns a hci_conn
along with a reference count to that object. All previous users would
tie this returned reference to some existing object, such as an L2CAP
channel, and there'd be no leaked references this way. For
auto-connections however the reference was returned but not stored
anywhere, leaving established connections with one higher reference
count than they should have.
Instead of playing special tricks with hci_conn_hold/drop this patch
associates the returned reference from hci_connect_le() with the object
that in practice does own this reference, i.e. the hci_conn_params
struct that caused us to initiate a connection in the first place. Once
the connection is established or fails to establish this reference is
removed appropriately.
One extra thing needed is to call hci_pend_le_actions_clear() before
calling hci_conn_hash_flush() so that the reference is cleared before
the hci_conn objects are fully removed.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This enables the netfilter NAT engine in first place, otherwise
you cannot ever select the nf_tables nat expression if iptables
is not selected.
Reported-by: Matteo Croce <technoboy85@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
There's actually no good reason why we cannot use cgroup id 0,
so lets just remove this artificial barrier.
Reported-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Tested-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Now q->now_rt is identical to q->now and is not required anymore.
Signed-off-by: Vasily Averin <vvs@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mainstream commit f0f6ee1f70 ("cbq: incorrect processing of high limits")
have side effect: if cbq bandwidth setting is less than real interface
throughput non-limited traffic can delay limited traffic for a very long time.
This happen because of q->now changes incorrectly in cbq_dequeue():
in described scenario L2T is much greater than real time delay,
and q->now gets an extra boost for each transmitted packet.
Accumulated boost prevents update q->now, and blocked class can wait
very long time until (q->now >= cl->undertime) will be true again.
To fix the problem the patch updates q->now on each cbq_update() call.
L2T-related pre-modification q->now was moved to cbq_update().
My testing confirmed that it fixes the problem and did not discover
any side-effects
Fixes: f0f6ee1f70 ("cbq: incorrect processing of high limits")
Signed-off-by: Vasily Averin <vvs@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch drops the userspace accessable sysfs entry for the maximum
datagram size of a 6LoWPAN fragment packet.
A fragment should not have a datagram size value greater than 1280 byte.
Instead of make this value configurable, we accept 1280 datagram size
fragment packets only.
Signed-off-by: Martin Townsend <martin.townsend@xsilon.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch changes the 1281 MTU to 1280. Others stack have only a 1280
byte array for uncompressed 6LoWPAN packets, this avoid that these
stacks have an overflow. Sending 1281 uncompressed 6LoWPAN packets isn't
also rfc complaint.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If received frame contains the reserved destination address mode. The
frame should be dropped and free the skb.
Signed-off-by: Martin Townsend <martin.townsend@xsilon.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch correct the return value of lowpan_alloc_frag if an error
occur. Errno numbers should always be negative.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fix a memory leak if received frame was not able to parse.
Signed-off-by: Martin Townsend <martin.townsend@xsilon.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Currently, the NAT configs depend on iptables and ip6tables. However,
users should be capable of enabling NAT for nft without having to
switch on iptables.
Fix this by adding new specific IP_NF_NAT and IP6_NF_NAT config
switches for iptables and ip6tables NAT support. I have also moved
the original NF_NAT_IPV4 and NF_NAT_IPV6 configs out of the scope
of iptables to make them independent of it.
This patch also adds NETFILTER_XT_NAT which selects the xt_nat
combo that provides snat/dnat for iptables. We cannot use NF_NAT
anymore since nf_tables can select this.
Reported-by: Matteo Croce <technoboy85@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 3b4f302d85 ("tipc: eliminate
redundant locking") introduced a bug by removing the sanity check
for message importance, allowing programs to assign any value to
the msg_user field. This will mess up the packet reception logic
and may cause random link resets.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1d023284c3 ("list: fix order of arguments for
hlist_add_after(_rcu)") was incorrectly rebased on top of
d9124268d8 ("batman-adv: Fix out-of-order
fragmentation support"). The parameter order change of the rebased patch was
not re-applied as expected. This causes a memory leak and can cause crashes
when out-of-order packets are received. hlist_add_behind will try to access the
uninitalized list pointers of frag_entry_new to find the previous/next entry
and may modify/read random memory locations.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the AP only advertises support for 20MHz (in the
ht operation ie), disable 40MHz and VHT.
This can improve interoperability with APs that
don't like stations exceeding their own
advertised capabilities.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We currently track the QoS capability twice: for all peer stations
in the WLAN_STA_WME flag, and for any clients associated to an AP
interface separately for drivers in the sta->sta.wme field.
Remove the WLAN_STA_WME flag and track the capability only in the
driver-visible field, getting rid of the limitation that the field
is only valid in AP mode.
Reviewed-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Silences the following sparse warnings:
net/netlink/af_netlink.c:2926:21: warning: context imbalance in 'netlink_seq_start' - wrong count at exit
net/netlink/af_netlink.c:2972:13: warning: context imbalance in 'netlink_seq_stop' - unexpected unlock
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix TCP FRTO logic so that it always notices when snd_una advances,
indicating that any RTO after that point will be a new and distinct
loss episode.
Previously there was a very specific sequence that could cause FRTO to
fail to notice a new loss episode had started:
(1) RTO timer fires, enter FRTO and retransmit packet 1 in write queue
(2) receiver ACKs packet 1
(3) FRTO sends 2 more packets
(4) RTO timer fires again (should start a new loss episode)
The problem was in step (3) above, where tcp_process_loss() returned
early (in the spot marked "Step 2.b"), so that it never got to the
logic to clear icsk_retransmits. Thus icsk_retransmits stayed
non-zero. Thus in step (4) tcp_enter_loss() would see the non-zero
icsk_retransmits, decide that this RTO is not a new episode, and
decide not to cut ssthresh and remember the current cwnd and ssthresh
for undo.
There were two main consequences to the bug that we have
observed. First, ssthresh was not decreased in step (4). Second, when
there was a series of such FRTO (1-4) sequences that happened to be
followed by an FRTO undo, we would restore the cwnd and ssthresh from
before the entire series started (instead of the cwnd and ssthresh
from before the most recent RTO). This could result in cwnd and
ssthresh being restored to values much bigger than the proper values.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Fixes: e33099f96d ("tcp: implement RFC5682 F-RTO")
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_tw_recycle heavily relies on tcp timestamps to build a per-host
ordering of incoming connections and teardowns without the need to
hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for
the last measured RTO. To do so, we keep the last seen timestamp in a
per-host indexed data structure and verify if the incoming timestamp
in a connection request is strictly greater than the saved one during
last connection teardown. Thus we can verify later on that no old data
packets will be accepted by the new connection.
During moving a socket to time-wait state we already verify if timestamps
where seen on a connection. Only if that was the case we let the
time-wait socket expire after the RTO, otherwise normal TCP_TIMEWAIT_LEN
will be used. But we don't verify this on incoming SYN packets. If a
connection teardown was less than TCP_PAWS_MSL seconds in the past we
cannot guarantee to not accept data packets from an old connection if
no timestamps are present. We should drop this SYN packet. This patch
closes this loophole.
Please note, this patch does not make tcp_tw_recycle in any way more
usable but only adds another safety check:
Sporadic drops of SYN packets because of reordering in the network or
in the socket backlog queues can happen. Users behing NAT trying to
connect to a tcp_tw_recycle enabled server can get caught in blackholes
and their connection requests may regullary get dropped because hosts
behind an address translator don't have synchronized tcp timestamp clocks.
tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled.
In general, use of tcp_tw_recycle is disadvised.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure we use the correct address-family-specific function for
handling MTU reductions from within tcp_release_cb().
Previously AF_INET6 sockets were incorrectly always using the IPv6
code path when sometimes they were handling IPv4 traffic and thus had
an IPv4 dst.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Diagnosed-by: Willem de Bruijn <willemb@google.com>
Fixes: 563d34d057 ("tcp: dont drop MTU reduction indications")
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As of 4fddbf5d78 ("sit: strictly restrict incoming traffic to tunnel link device"),
when looking up a tunnel, tunnel's underlying interface (t->parms.link)
is verified to match incoming traffic's ingress device.
However the comparison was incorrectly based on skb->dev->iflink.
Instead, dev->ifindex should be used, which correctly represents the
interface from which the IP stack hands the ipip6 packets.
This allows setting up sit tunnels bound to vlan interfaces (otherwise
incoming ipip6 traffic on the vlan interface was dropped due to
ipip6_tunnel_lookup match failure).
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I-Frame which is going to be resend already has FCS field added and set
(if it was required). Adding additional FCS field calculated from data +
old FCS in resend function is incorrect. This patch fix that.
Issue has been found during PTS testing.
Signed-off-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There is no need to decrease pdu size with L2CAP SDU lenght in Start
L2CAP SDU frame. Start packtet is just 2 bytes longer as specified and
we can keep payload as long as possible.
When testing SAR L2CAP against PTS, L2CAP channel is usually configured
in that way, that SDU = MPS * 3. PTS expets then 3 I-Frames from IUT: Start,
Continuation and End frame.
Without this fix, we sent 4 I-Frames. We could pass a test by using -b
option in l2test and send just two bytes less than SDU length. With this
patch no need to use -b option.
Signed-off-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduce the common error path on failure of Tx by
inserting the label 'err_tx'.
Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
By introducing label fail, making the common error path for
mac802154_llsec_decrypt() and packet type default case.
Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch replace the sizeof(struct rx_work) with sizeof(*work)
and directly passing the skb in mac802154_subif_rx()
Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There are no external users of smp_chan_destroy() so make it private to
smp.c. The patch also moves the function higher up in the c-file in
order to avoid forward declarations.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The smp_distribute_keys() function calls smp_notify_keys() which in turn
calls l2cap_conn_update_id_addr(). The l2cap_conn_update_id_addr()
function will iterate through all L2CAP channels for the respective
connection: lock the channel, update the address information and unlock
the channel.
Since SMP is now using l2cap_chan callbacks each callback is called with
the channel lock held. Therefore, calling l2cap_conn_update_id_addr()
would cause a deadlock calling l2cap_chan_lock() on the SMP channel.
This patch moves calling smp_distribute_keys() through a workqueue so
that it is never called from an L2CAP channel callback.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
All places needing to cancel the security timer also call
smp_chan_destroy() in the same go. To eliminate the need to do these two
calls in multiple places simply move the timer cancellation into
smp_chan_destroy().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that there are no-longer any users for l2cap_conn->security_timer we
can go ahead and simply remove it. The patch makes initialization of the
conn->info_timer unconditional since it's better not to leave any
l2cap_conn data structures uninitialized no matter what the underlying
transport.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds an SMP-internal timeout callback to remove the depenency
on (the soon to be removed) l2cap_conn->security_timer. The behavior is
the same as with l2cap_conn->security_timer except that the new
l2cap_conn_shutdown() public function is used instead of the L2CAP core
internal l2cap_conn_del().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In the case that the SMP recv callback returns error the calling code in
l2cap_core.c expects that it still owns the skb and will try to free it.
The SMP code should therefore not try to free the skb if it return an
error. This patch fixes such behavior in the SMP command handler
function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To restore pre-l2cap_chan functionality we should be trying to
disconnect the connection when receviving garbage SMP data (i.e. when
the SMP command handler fails). This patch renames the command handler
back to smp_sig_channel() and adds a smp_recv_cb() wrapper function for
calling it. If smp_sig_channel() fails the code calls
l2cap_conn_shutdown().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since we no-longer do special handling of SMP within l2cap_core.c we
don't have any code for calling l2cap_conn_del() when smp.c doesn't like
the data it gets. At the same time we cannot simply export
l2cap_conn_del() since it will try to lock the channels it calls into
whereas we already hold the lock in the smp.c l2cap_chan callbacks (i.e.
it'd lead to a deadlock).
This patch adds a new l2cap_conn_shutdown() API which is very similar to
l2cap_conn_del() except that it defers the call to l2cap_conn_del()
through a workqueue, thereby making it safe to use it from an L2CAP
channel callback.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There's no need to export the smp_distribute_keys() function since the
resume callback is called in the same scenario. This patch makes the
smp_notify_keys function private (at the same time moving it higher up
in smp.c to avoid forward declarations) and adds a resume callback for
SMP to call it from there instead.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that we have all the necessary pieces in place we can fully convert
SMP to use the L2CAP channel infrastructure. This patch adds the
necessary callbacks and removes the now unneeded conn->smp_chan pointer.
One notable behavioral change in this patch comes from the following
code snippet:
- case L2CAP_CID_SMP:
- if (smp_sig_channel(conn, skb))
- l2cap_conn_del(conn->hcon, EACCES);
This piece of code was essentially forcing a disconnection if garbage
SMP data was received. The l2cap_conn_del() function is private to
l2cap_conn.c so we don't have access to it anymore when using the L2CAP
channel callbacks. Therefore, the behavior of the new code is simply to
return errors in the recv() callback (which is simply the old
smp_sig_channel()), but no disconnection will occur.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that we have per-adapter SMP data thanks to the root SMP L2CAP
channel we can take advantage of it and attach the AES crypto context
(only used for SMP) to it. This means that the smp_irk_matches() and
smp_generate_rpa() function can be converted to internally handle the
AES context.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch creates the initial SMP L2CAP channels and a skeleton for
their callbacks. There is one per-adapter channel created upon adapter
registration, and then one channel per-connection created through the
new_connection callback. The channels are registered with the reserved
CID 0x1f for now in order to not conflict with existing SMP
functionality. Once everything is in place the value can be changed to
what it should be, i.e. L2CAP_CID_SMP.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
As preparation for moving SMP to use l2cap_chan infrastructure we need
to move the (de)initialization functions to smp.c (where they'll
eventually need access to the local L2CAP channel callbacks).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
First of all, it's wasteful to initialize SMP if it's never going to be
used (e.g. on non-LE controllers). Second of all, when we move to use
l2cap_chan we need to know the real local address, meaning we must have
completed at least part of the HCI init. This patch moves the SMP
initialization to after the HCI init procedure and makes it depend on
whether the controller actually supports LE.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
As preparation for converting SMP to use the l2cap_chan infrastructure
refactor the (de)initialization into separate functions.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If the AES crypto has not been initialized properly we should cleanly
return from the hci_find_irk_by_rpa() function. Right now this will not
happen in practice, but once (in subsequent patches) SMP init is moved
to after the HCI init procedure it is possible that the pointer is NULL.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If the AES crypto context is not available we cannot generate new RPAs.
We should therefore cleanly return an error from the function
responsible for updating the random address.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The code is consistently using the HCI_CONN_LE_SMP_PEND flag check for
the existence of the SMP context, with the exception of this one place
in smp_sig_channel(). This patch converts the place to use the flag just
like all other instances.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For most cases it makes no difference whether l2cap_le_conn_ready() is
called before or after calling the channel ready() callbacks, however
for upcoming SMP code we need this as the ready() callback initializes
certain structures that a call to smp_conn_security() from
l2cap_le_conn_ready() depends on. Therefore, move the call to
l2cap_le_conn_ready() after iterating through and notifying channels.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
L2CAP channel implementations may want to still access the chan->conn
pointer. This will particularly be the case for SMP that will want to
clear a reference to the SMP channel in the l2cap_conn structure. The
only user of the teardown callback so far is l2cap_sock.c and for the
code there it makes no difference whether the callback is called before
or after clearing the chan->conn pointer.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The l2cap_add_scid function is used for registering a fixed L2CAP
channel. Instead of having separate initialization of the channel type
and outgoing MTU in l2cap_sock.c it's more intuitive to do these things
in the l2cap_add_scid function itself (and thereby make the
functionality available to other users besides l2cap_sock.c).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that we've got the fixed channel infrastructure cleaned up in a
generic way there's no longer a need to have a dedicated function for
handling data on the ATT channel. Instead the generic
l2cap_data_channel() handler will be able to do the exact same thing.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When notifying global fixed channels of new connections it doesn't make
sense to consider channels meant for a different link type than the one
available. This patch adds an extra parameter to the
l2cap_global_fixed_chan() lookup function and ensures that only channels
matching the current hci_conn type are looked up.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In order to remove special handling of fixed L2CAP channels we need to
start creating them in a single place instead of having per-channel
exceptions. The most natural place is the l2cap_conn_cfm() function
which is called whenever there is a new baseband link.
The only really special case so far has been the ATT socket, so in order
not to break the code in between this patch removes the ATT special
handling at the same time as it adds the generic fixed channel handling
from l2cap_le_conn_ready() into the hci_conn_cfm() function. As a
related change the channel locking in l2cap_conn_ready() becomes simpler
and we can thereby move the smp_conn_security() call into the
l2cap_le_conn_ready() function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch is a simple refactoring of l2cap_connect_cfm to allow easier
extension of the function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
With the update to sk->resume() and __l2cap_no_conn_pending() we
no-longer need to have special handling of ATT channels in the
l2cap_security_cfm() function. The chan->sec_level update when
encryption has been enabled is safe to do for any kind of channel, and
the loop takes later care of calling chan->ready() or chan->resume() if
necessary.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The LE ATT socket uses a special trick where it temporarily sets
BT_CONFIG state for the duration of a security level elevation. In order
to not require special hacks for going back to BT_CONNECTED state in the
l2cap_core.c code the most reasonable place to resume the state is the
resume callback. This patch adds a new flag to track the pending
security level change and ensures that the state is set back to
BT_CONNECTED in the resume callback in case the flag is set.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The __l2cap_no_conn_pending() function would previously only return a
meaningful value for connection oriented channels and was therefore not
useful for anything else. As preparation of making the L2CAP code more
generic allow the function to be called for other channel types as well
by returning a meaningful value for them.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When looking up entries from the global L2CAP channel list there needs
to be a guarantee that other code doesn't go and remove the entry after
a channel has been returned by the lookup function. This patch makes
sure that the channel reference is incremented before the read lock is
released in the global channel lookup functions. The patch also adds the
corresponding l2cap_chan_put() calls once the channels pointers are
no-longer needed.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The new_connection L2CAP channel callback creates a new channel based on
the provided parent channel. The 6lowpan code was confusingly naming the
child channel "pchan" and the parent channel "chan". This patch swaps
the names.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In the smp_cmd_sign_info() function the SMP_DIST_SIGN bit is explicitly
cleared early on in the function. This means that there's no need to
check for it again before calling smp_distribute_keys().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When we're not connectable and all whitelisted (BR/EDR) devices are
connected it doesn't make sense to keep page scan enabled. This patch
adds code to check for any disconnected whitelist devices and if there
are none take the appropriate action in the hci_update_page_scan()
function to disable page scan.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Similar to our hci_update_background_scan() function we can simplify a
lot of code by creating a unified helper function for doing page scan
updates. This patch adds such a function to hci_core.c and updates all
the relevant places to use it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There are several situations where we're interested in knowing whether
we're currently in the process of powering off an adapter. This patch
adds a convenience function for the purpose and makes it public since
we'll soon need to access it from hci_event.c as well.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Bytestream timestamps are correlated with a single byte in the skbuff,
recorded in skb_shinfo(skb)->tskey. When fragmenting skbuffs, ensure
that the tskey is set for the fragment in which the tskey falls
(seqno <= tskey < end_seqno).
The original implementation did not address fragmentation in
tcp_fragment or tso_fragment. Add code to inspect the sequence numbers
and move both tskey and the relevant tx_flags if necessary.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ACK timestamps are generated in tcp_clean_rtx_queue. The TSO datapath
can break out early, causing the timestamp code to be skipped. Move
the code up before the break.
Reported-by: David S. Miller <davem@davemloft.net>
Also fix a boundary condition: tp->snd_una is the next unacknowledged
byte and between tests inclusive (a <= b <= c), so generate a an ACK
timestamp if (prior_snd_una <= tskey <= tp->snd_una - 1).
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b67bfe0d42 (hlist: drop the node
parameter from iterators) dropped the node parameter from
iterators which lec_tbl_walk() was using to iterate the list.
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
One should not call blocking primitives inside a wait loop, since both
require task_struct::state to sleep, so the inner will destroy the
outer state.
sigd_enq() will possibly sleep for alloc_skb(). Move sigd_enq() before
prepare_to_wait() to avoid sleeping while waiting interruptibly. You do
not actually need to call sigd_enq() after the initial prepare_to_wait()
because we test the termination condition before calling schedule().
Based on suggestions from Peter Zijlstra.
Signed-off-by: Chas Williams <chas@cmf.n4rl.navy.mil>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ovs_vport_alloc() bails out without freeing the memory 'vport' points to.
Picked up by Coverity - CID 1230503.
Fixes: 5cd667b0a4 ("openvswitch: Allow each vport to have an array of 'port_id's.")
Signed-off-by: Christoph Jaeger <cj@linux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"Several networking final fixes and tidies for the merge window:
1) Changes during the merge window unintentionally took away the
ability to build bluetooth modular, fix from Geert Uytterhoeven.
2) Several phy_node reference count bug fixes from Uwe Kleine-König.
3) Fix ucc_geth build failures, also from Uwe Kleine-König.
4) Fix klog false positivies when netlink messages go to network
taps, by properly resetting the network header. Fix from Daniel
Borkmann.
5) Sizing estimate of VF netlink messages is too small, from Jiri
Benc.
6) New APM X-Gene SoC ethernet driver, from Iyappan Subramanian.
7) VLAN untagging is erroneously dependent upon whether the VLAN
module is loaded or not, but there are generic dependencies that
matter wrt what can be expected as the SKB enters the stack.
Make the basic untagging generic code, and do it unconditionally.
From Vlad Yasevich.
8) xen-netfront only has so many slots in it's transmit queue so
linearize packets that have too many frags. From Zoltan Kiss.
9) Fix suspend/resume PHY handling in bcmgenet driver, from Florian
Fainelli"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (55 commits)
net: bcmgenet: correctly resume adapter from Wake-on-LAN
net: bcmgenet: update UMAC_CMD only when link is detected
net: bcmgenet: correctly suspend and resume PHY device
net: bcmgenet: request and enable main clock earlier
net: ethernet: myricom: myri10ge: myri10ge.c: Cleaning up missing null-terminate after strncpy call
xen-netfront: Fix handling packets on compound pages with skb_linearize
net: fec: Support phys probed from devicetree and fixed-link
smsc: replace WARN_ON() with WARN_ON_SMP()
xen-netback: Don't deschedule NAPI when carrier off
net: ethernet: qlogic: qlcnic: Remove duplicate object file from Makefile
wan: wanxl: Remove typedefs from struct names
m68k/atari: EtherNEC - ethernet support (ne)
net: ethernet: ti: cpmac.c: Cleaning up missing null-terminate after strncpy call
hdlc: Remove typedefs from struct names
airo_cs: Remove typedef local_info_t
atmel: Remove typedef atmel_priv_ioctl
com20020_cs: Remove typedef com20020_dev_t
ethernet: amd: Remove typedef local_info_t
net: Always untag vlan-tagged traffic on input.
drivers: net: Add APM X-Gene SoC ethernet driver support.
...
Highlights include:
- Stable fix for a bug in nfs3_list_one_acl()
- Speed up NFS path walks by supporting LOOKUP_RCU
- More read/write code cleanups
- pNFS fixes for layout return on close
- Fixes for the RCU handling in the rpcsec_gss code
- More NFS/RDMA fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJT65zoAAoJEGcL54qWCgDyvq8QAJ+OKuC5dpngrZ13i4ZJIcK1
TJSkWCr44FhYPlrmkLCntsGX6C0376oFEtJ5uqloqK0+/QtvwRNVSQMKaJopKIVY
mR4En0WwpigxVQdW2lgto6bfOhzMVO+llVdmicEVrU8eeSThATxGNv7rxRzWorvL
RX3TwBkWSc0kLtPi66VRFQ1z+gg5I0kngyyhsKnLOaHHtpTYP2JDZlRPRkokXPUg
nmNedmC3JrFFkarroFIfYr54Qit2GW/eI2zVhOwHGCb45j4b2wntZ6wr7LpUdv3A
OGDBzw59cTpcx3Hij9CFvLYVV9IJJHBNd2MJqdQRtgWFfs+aTkZdk4uilUJCIzZh
f4BujQAlm/4X1HbPxsSvkCRKga7mesGM7e0sBDPHC1vu0mSaY1cakcj2kQLTpbQ7
gqa1cR3pZ+4shCq37cLwWU0w1yElYe1c4otjSCttPCrAjXbXJZSFzYnHm8DwKROR
t+yEDRL5BIXPu1nEtSnD2+xTQ3vUIYXooZWEmqLKgRtBTtPmgSn9Vd8P1OQXmMNo
VJyFXyjNx5WH06Wbc/jLzQ1/cyhuPmJWWyWMJlVROyv+FXk9DJUFBZuTkpMrIPcF
NlBXLV1GnA7PzMD9Xt9bwqteERZl6fOUDJLWS9P74kTk5c2kD+m+GaqC/rBTKKXc
ivr2s7aIDV48jhnwBSVL
=KE07
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
- stable fix for a bug in nfs3_list_one_acl()
- speed up NFS path walks by supporting LOOKUP_RCU
- more read/write code cleanups
- pNFS fixes for layout return on close
- fixes for the RCU handling in the rpcsec_gss code
- more NFS/RDMA fixes"
* tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
nfs: reject changes to resvport and sharecache during remount
NFS: Avoid infinite loop when RELEASE_LOCKOWNER getting expired error
SUNRPC: remove all refcounting of groupinfo from rpcauth_lookupcred
NFS: fix two problems in lookup_revalidate in RCU-walk
NFS: allow lockless access to access_cache
NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU
NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU
NFS: support RCU_WALK in nfs_permission()
sunrpc/auth: allow lockless (rcu) lookup of credential cache.
NFS: prepare for RCU-walk support but pushing tests later in code.
NFS: nfs4_lookup_revalidate: only evaluate parent if it will be used.
NFS: add checks for returned value of try_module_get()
nfs: clear_request_commit while holding i_lock
pnfs: add pnfs_put_lseg_async
pnfs: find swapped pages on pnfs commit lists too
nfs: fix comment and add warn_on for PG_INODE_REF
nfs: check wait_on_bit_lock err in page_group_lock
sunrpc: remove "ec" argument from encrypt_v2 operation
sunrpc: clean up sparse endianness warnings in gss_krb5_wrap.c
sunrpc: clean up sparse endianness warnings in gss_krb5_seal.c
...
Pull Ceph updates from Sage Weil:
"There is a lot of refactoring and hardening of the libceph and rbd
code here from Ilya that fix various smaller bugs, and a few more
important fixes with clone overlap. The main fix is a critical change
to the request_fn handling to not sleep that was exposed by the recent
mutex changes (which will also go to the 3.16 stable series).
Yan Zheng has several fixes in here for CephFS fixing ACL handling,
time stamps, and request resends when the MDS restarts.
Finally, there are a few cleanups from Himangi Saraogi based on
Coccinelle"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (39 commits)
libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly
rbd: remove extra newlines from rbd_warn() messages
rbd: allocate img_request with GFP_NOIO instead GFP_ATOMIC
rbd: rework rbd_request_fn()
ceph: fix kick_requests()
ceph: fix append mode write
ceph: fix sizeof(struct tYpO *) typo
ceph: remove redundant memset(0)
rbd: take snap_id into account when reading in parent info
rbd: do not read in parent info before snap context
rbd: update mapping size only on refresh
rbd: harden rbd_dev_refresh() and callers a bit
rbd: split rbd_dev_spec_update() into two functions
rbd: remove unnecessary asserts in rbd_dev_image_probe()
rbd: introduce rbd_dev_header_info()
rbd: show the entire chain of parent images
ceph: replace comma with a semicolon
rbd: use rbd_segment_name_free() instead of kfree()
ceph: check zero length in ceph_sync_read()
ceph: reset r_resend_mds after receiving -ESTALE
...
Currently the functionality to untag traffic on input resides
as part of the vlan module and is build only when VLAN support
is enabled in the kernel. When VLAN is disabled, the function
vlan_untag() turns into a stub and doesn't really untag the
packets. This seems to create an interesting interaction
between VMs supporting checksum offloading and some network drivers.
There are some drivers that do not allow the user to change
tx-vlan-offload feature of the driver. These drivers also seem
to assume that any VLAN-tagged traffic they transmit will
have the vlan information in the vlan_tci and not in the vlan
header already in the skb. When transmitting skbs that already
have tagged data with partial checksum set, the checksum doesn't
appear to be updated correctly by the card thus resulting in a
failure to establish TCP connections.
The following is a packet trace taken on the receiver where a
sender is a VM with a VLAN configued. The host VM is running on
doest not have VLAN support and the outging interface on the
host is tg3:
10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
(0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243,
offset 0, flags [DF], proto TCP (6), length 60)
10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
-> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
4294837885 ecr 0,nop,wscale 7], length 0
10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
(0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244,
offset 0, flags [DF], proto TCP (6), length 60)
10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
-> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
4294838888 ecr 0,nop,wscale 7], length 0
This connection finally times out.
I've only access to the TG3 hardware in this configuration thus have
only tested this with TG3 driver. There are a lot of other drivers
that do not permit user changes to vlan acceleration features, and
I don't know if they all suffere from a similar issue.
The patch attempt to fix this another way. It moves the vlan header
stipping code out of the vlan module and always builds it into the
kernel network core. This way, even if vlan is not supported on
a virtualizatoin host, the virtual machines running on top of such
host will still work with VLANs enabled.
CC: Patrick McHardy <kaber@trash.net>
CC: Nithin Nayak Sujir <nsujir@broadcom.com>
CC: Michael Chan <mchan@broadcom.com>
CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains fixes for your net tree, they are:
1) Unitialize the set element key and data from the commit path,
otherwise this leaks chain refcount if the transaction is aborted,
reported by Thomas Graf.
2) Fix crash when updating chains without no counters in nf_tables,
this slipped through in the new transaction infrastructure, reported
by Matteo Croce.
3) Replace all mutex_lock_interruptible() by mutex_lock() in the Netfilter
tree, suggested by Patrick McHardy. This implicitly fixes the problem
that Eric Dumazet reported in: http://patchwork.ozlabs.org/patch/373076/
4) Fix error return code in nf_tables when deleting set element in
nf_tables if the transaction cannot be allocated, from Julia Lawall.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull namespace updates from Eric Biederman:
"This is a bunch of small changes built against 3.16-rc6. The most
significant change for users is the first patch which makes setns
drmatically faster by removing unneded rcu handling.
The next chunk of changes are so that "mount -o remount,.." will not
allow the user namespace root to drop flags on a mount set by the
system wide root. Aks this forces read-only mounts to stay read-only,
no-dev mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec
mounts to stay no exec and it prevents unprivileged users from messing
with a mounts atime settings. I have included my test case as the
last patch in this series so people performing backports can verify
this change works correctly.
The next change fixes a bug in NFS that was discovered while auditing
nsproxy users for the first optimization. Today you can oops the
kernel by reading /proc/fs/nfsfs/{servers,volumes} if you are clever
with pid namespaces. I rebased and fixed the build of the
!CONFIG_NFS_FS case yesterday when a build bot caught my typo. Given
that no one to my knowledge bases anything on my tree fixing the typo
in place seems more responsible that requiring a typo-fix to be
backported as well.
The last change is a small semantic cleanup introducing
/proc/thread-self and pointing /proc/mounts and /proc/net at it. This
prevents several kinds of problemantic corner cases. It is a
user-visible change so it has a minute chance of causing regressions
so the change to /proc/mounts and /proc/net are individual one line
commits that can be trivially reverted. Unfortunately I lost and
could not find the email of the original reporter so he is not
credited. From at least one perspective this change to /proc/net is a
refgression fix to allow pthread /proc/net uses that were broken by
the introduction of the network namespace"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts
proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net
proc: Implement /proc/thread-self to point at the directory of the current thread
proc: Have net show up under /proc/<tgid>/task/<tid>
NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes
mnt: Add tests for unprivileged remount cases that have found to be faulty
mnt: Change the default remount atime from relatime to the existing value
mnt: Correct permission checks in do_remount
mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount
mnt: Only change user settable mount flags in remount
namespaces: Use task_lock and not rcu to protect nsproxy
Pull nfsd updates from Bruce Fields:
"This includes a major rewrite of the NFSv4 state code, which has
always depended on a single mutex. As an example, open creates are no
longer serialized, fixing a performance regression on NFSv3->NFSv4
upgrades. Thanks to Jeff, Trond, and Benny, and to Christoph for
review.
Also some RDMA fixes from Chuck Lever and Steve Wise, and
miscellaneous fixes from Kinglong Mee and others"
* 'for-3.17' of git://linux-nfs.org/~bfields/linux: (167 commits)
svcrdma: remove rdma_create_qp() failure recovery logic
nfsd: add some comments to the nfsd4 object definitions
nfsd: remove the client_mutex and the nfs4_lock/unlock_state wrappers
nfsd: remove nfs4_lock_state: nfs4_state_shutdown_net
nfsd: remove nfs4_lock_state: nfs4_laundromat
nfsd: Remove nfs4_lock_state(): reclaim_complete()
nfsd: Remove nfs4_lock_state(): setclientid, setclientid_confirm, renew
nfsd: Remove nfs4_lock_state(): exchange_id, create/destroy_session()
nfsd: Remove nfs4_lock_state(): nfsd4_open and nfsd4_open_confirm
nfsd: Remove nfs4_lock_state(): nfsd4_delegreturn()
nfsd: Remove nfs4_lock_state(): nfsd4_open_downgrade + nfsd4_close
nfsd: Remove nfs4_lock_state(): nfsd4_lock/locku/lockt()
nfsd: Remove nfs4_lock_state(): nfsd4_release_lockowner
nfsd: Remove nfs4_lock_state(): nfsd4_test_stateid/nfsd4_free_stateid
nfsd: Remove nfs4_lock_state(): nfs4_preprocess_stateid_op()
nfsd: remove old fault injection infrastructure
nfsd: add more granular locking to *_delegations fault injectors
nfsd: add more granular locking to forget_openowners fault injector
nfsd: add more granular locking to forget_locks fault injector
nfsd: add a list_head arg to nfsd_foreach_client_lock
...
Determining ->last_piece based on the value of ->page_offset + length
is incorrect because length here is the length of the entire message.
->last_piece set to false even if page array data item length is <=
PAGE_SIZE, which results in invalid length passed to
ceph_tcp_{send,recv}page() and causes various asserts to fire.
# cat pages-cursor-init.sh
#!/bin/bash
rbd create --size 10 --image-format 2 foo
FOO_DEV=$(rbd map foo)
dd if=/dev/urandom of=$FOO_DEV bs=1M &>/dev/null
rbd snap create foo@snap
rbd snap protect foo@snap
rbd clone foo@snap bar
# rbd_resize calls librbd rbd_resize(), size is in bytes
./rbd_resize bar $(((4 << 20) + 512))
rbd resize --size 10 bar
BAR_DEV=$(rbd map bar)
# trigger a 512-byte copyup -- 512-byte page array data item
dd if=/dev/urandom of=$BAR_DEV bs=1M count=1 seek=5
The problem exists only in ceph_msg_data_pages_cursor_init(),
ceph_msg_data_pages_advance() does the right thing. The size_t cast is
unnecessary.
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Commit 1d8faf48c7 ("net/core: Add VF link state control") added new
attribute to IFLA_VF_INFO group in rtnl_fill_ifinfo but did not adjust size
of the allocated memory in if_nlmsg_size/rtnl_vfinfo_size. As the result, we
may trigger warnings in rtnl_getlink and similar functions when many VF
links are enabled, as the information does not fit into the allocated skb.
Fixes: 1d8faf48c7 ("net/core: Add VF link state control")
Reported-by: Yulong Pei <ypei@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since fib_lookup cannot return ESRCH no longer,
checking for this error code is no longer neccesary.
Signed-off-by: Niv Yehezkel <executerx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert a zero return value on error to a negative one, as returned
elsewhere in the function.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
identifier ret; expression e1,e2;
@@
(
if (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret = 0
)
... when != ret = e1
when != &ret
*if(...)
{
... when != ret = e2
when forall
return ret;
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric Dumazet reports that getsockopt() or setsockopt() sometimes
returns -EINTR instead of -ENOPROTOOPT, causing headaches to
application developers.
This patch replaces all the mutex_lock_interruptible() by mutex_lock()
in the netfilter tree, as there is no reason we should sleep for a
long time there.
Reported-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Fix possible replacement of the per-cpu chain counters by null
pointer when updating an existing chain in the commit path.
Reported-by: Matteo Croce <technoboy85@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This should happen once the element has been effectively released in
the commit path, not before. This fixes a possible chain refcount leak
if the transaction is aborted.
Reported-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netlink doesn't set any network header offset thus when the skb is
being passed to tap devices via dev_queue_xmit_nit(), it emits klog
false positives due to it being unset like:
...
[ 124.990397] protocol 0000 is buggy, dev nlmon0
[ 124.990411] protocol 0000 is buggy, dev nlmon0
...
So just reset the network header before passing to the device; for
packet sockets that just means nothing will change - mac and net
offset hold the same value just as before.
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The header multicast.h was included twice, so delete one of them.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <antonio@meshcoding.com>
Cc: b.a.t.m.a.n@lists.open-mesh.org
Signed-off-by: David S. Miller <davem@davemloft.net>
The #include headers net/genetlink.h and linux/genetlink.h both were
included twice, so delete each of the duplicate.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: dev@openvswitch.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Change config symbol 6LOWPAN from type bool to type tristate, so
6LoWPAN can be built modular, just like IPV6
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge incoming from Andrew Morton:
- Various misc things.
- arch/sh updates.
- Part of ocfs2. Review is slow.
- Slab updates.
- Most of -mm.
- printk updates.
- lib/ updates.
- checkpatch updates.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (226 commits)
checkpatch: update $declaration_macros, add uninitialized_var
checkpatch: warn on missing spaces in broken up quoted
checkpatch: fix false positives for --strict "space after cast" test
checkpatch: fix false positive MISSING_BREAK warnings with --file
checkpatch: add test for native c90 types in unusual order
checkpatch: add signed generic types
checkpatch: add short int to c variable types
checkpatch: add for_each tests to indentation and brace tests
checkpatch: fix brace style misuses of else and while
checkpatch: add --fix option for a couple OPEN_BRACE misuses
checkpatch: use the correct indentation for which()
checkpatch: add fix_insert_line and fix_delete_line helpers
checkpatch: add ability to insert and delete lines to patch/file
checkpatch: add an index variable for fixed lines
checkpatch: warn on break after goto or return with same tab indentation
checkpatch: emit a warning on file add/move/delete
checkpatch: add test for commit id formatting style in commit log
checkpatch: emit fewer kmalloc_array/kcalloc conversion warnings
checkpatch: improve "no space after cast" test
checkpatch: allow multiple const * types
...
Although RCU protection would be possible during diag dump, doing
so allows for concurrent table mutations which can render the
in-table offset between individual Netlink messages invalid and
thus cause legitimate sockets to be skipped in the dump.
Since the diag dump is relatively low volume and consistency is
more important than performance, the table mutex is held during
dump.
Reported-by: Andrey Wagin <avagin@gmail.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Fixes: e341694e3e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
Signed-off-by: David S. Miller <davem@davemloft.net>
All other add functions for lists have the new item as first argument
and the position where it is added as second argument. This was changed
for no good reason in this function and makes using it unnecessary
confusing.
The name was changed to hlist_add_behind() to cause unconverted code to
generate a compile error instead of using the wrong parameter order.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Ken Helias <kenhelias@firemail.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [intel driver bits]
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since a8afca032 (tcp: md5: protects md5sig_info with RCU) tcp_md5_do_lookup
doesn't require socket lock, rcu_read_lock is enough. Therefore socket lock is
no longer required for tcp_v{4,6}_inbound_md5_hash too, so we can move these
calls (wrapped with rcu_read_{,un}lock) before bh_lock_sock:
from tcp_v{4,6}_do_rcv to tcp_v{4,6}_rcv.
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
A set of small fixes pointed out just after the merge:
- make tcp_tx_timestamp static
- make tcp_gso_tstamp static
- use before() to compare TCP seqno, instead of cast to u64
- add tstamp to tx_flags in GSO, instead of overwrite tx_flags
- record skb_shinfo(skb)->tskey for all timestamps, also HW.
- optimization in tcp_tx_timestamp:
call sock_tx_timestamp only if a tstamp option is set.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Fixes: 4ed2d765df ("net-timestamp: TCP timestamping")
Signed-off-by: David S. Miller <davem@davemloft.net>
sock_tx_timestamp() should not ignore initial *tx_flags value, as TCP
stack can store SKBTX_SHARED_FRAG in it.
Also first argument (struct sock *) can be const.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 4ed2d765df ("net-timestamp: TCP timestamping")
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
"Highlights:
1) Steady transitioning of the BPF instructure to a generic spot so
all kernel subsystems can make use of it, from Alexei Starovoitov.
2) SFC driver supports busy polling, from Alexandre Rames.
3) Take advantage of hash table in UDP multicast delivery, from David
Held.
4) Lighten locking, in particular by getting rid of the LRU lists, in
inet frag handling. From Florian Westphal.
5) Add support for various RFC6458 control messages in SCTP, from
Geir Ola Vaagland.
6) Allow to filter bridge forwarding database dumps by device, from
Jamal Hadi Salim.
7) virtio-net also now supports busy polling, from Jason Wang.
8) Some low level optimization tweaks in pktgen from Jesper Dangaard
Brouer.
9) Add support for ipv6 address generation modes, so that userland
can have some input into the process. From Jiri Pirko.
10) Consolidate common TCP connection request code in ipv4 and ipv6,
from Octavian Purdila.
11) New ARP packet logger in netfilter, from Pablo Neira Ayuso.
12) Generic resizable RCU hash table, with intial users in netlink and
nftables. From Thomas Graf.
13) Maintain a name assignment type so that userspace can see where a
network device name came from (enumerated by kernel, assigned
explicitly by userspace, etc.) From Tom Gundersen.
14) Automatic flow label generation on transmit in ipv6, from Tom
Herbert.
15) New packet timestamping facilities from Willem de Bruijn, meant to
assist in measuring latencies going into/out-of the packet
scheduler, latency from TCP data transmission to ACK, etc"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1536 commits)
cxgb4 : Disable recursive mailbox commands when enabling vi
net: reduce USB network driver config options.
tg3: Modify tg3_tso_bug() to handle multiple TX rings
amd-xgbe: Perform phy connect/disconnect at dev open/stop
amd-xgbe: Use dma_set_mask_and_coherent to set DMA mask
net: sun4i-emac: fix memory leak on bad packet
sctp: fix possible seqlock seadlock in sctp_packet_transmit()
Revert "net: phy: Set the driver when registering an MDIO bus device"
cxgb4vf: Turn off SGE RX/TX Callback Timers and interrupts in PCI shutdown routine
team: Simplify return path of team_newlink
bridge: Update outdated comment on promiscuous mode
net-timestamp: ACK timestamp for bytestreams
net-timestamp: TCP timestamping
net-timestamp: SCHED timestamp on entering packet scheduler
net-timestamp: add key to disambiguate concurrent datagrams
net-timestamp: move timestamp flags out of sk_flags
net-timestamp: extend SCM_TIMESTAMPING ancillary data struct
cxgb4i : Move stray CPL definitions to cxgb4 driver
tcp: reduce spurious retransmits due to transient SACK reneging
qlcnic: Initialize dcbnl_ops before register_netdev
...
Pull security subsystem updates from James Morris:
"In this release:
- PKCS#7 parser for the key management subsystem from David Howells
- appoint Kees Cook as seccomp maintainer
- bugfixes and general maintenance across the subsystem"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (94 commits)
X.509: Need to export x509_request_asymmetric_key()
netlabel: shorter names for the NetLabel catmap funcs/structs
netlabel: fix the catmap walking functions
netlabel: fix the horribly broken catmap functions
netlabel: fix a problem when setting bits below the previously lowest bit
PKCS#7: X.509 certificate issuer and subject are mandatory fields in the ASN.1
tpm: simplify code by using %*phN specifier
tpm: Provide a generic means to override the chip returned timeouts
tpm: missing tpm_chip_put in tpm_get_random()
tpm: Properly clean sysfs entries in error path
tpm: Add missing tpm_do_selftest to ST33 I2C driver
PKCS#7: Use x509_request_asymmetric_key()
Revert "selinux: fix the default socket labeling in sock_graft()"
X.509: x509_request_asymmetric_keys() doesn't need string length arguments
PKCS#7: fix sparse non static symbol warning
KEYS: revert encrypted key change
ima: add support for measuring and appraising firmware
firmware_class: perform new LSM checks
security: introduce kernel_fw_from_file hook
PKCS#7: Missing inclusion of linux/err.h
...
Conflicts:
drivers/net/Makefile
net/ipv6/sysctl_net_ipv6.c
Two ipv6_table_template[] additions overlap, so the index
of the ipv6_table[x] assignments needed to be adjusted.
In the drivers/net/Makefile case, we've gotten rid of the
garbage whereby we had to list every single USB networking
driver in the top-level Makefile, there is just one
"USB_NETWORKING" that guards everything.
Signed-off-by: David S. Miller <davem@davemloft.net>
- kmalloc_array instead of kmalloc when possible
- avoid log spam due to useless net_ratelimit() invocations
- increase default metric hop penalty from 15 to 30
- update internal version number
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJT4Ir0AAoJEJgn97Bh2u9eYG8P/2Bx9uCbajNQSCBvQ20OXW4o
YzubqMETLel6GLoVjL2XFsmJATaQGGWYWGSytDF2sRa1C7/dXXfUrwgfFNzw2L9Z
nZSd8WbhiBW0LK/ECzTQ12M3VnTdjgj3IP9A58N9zqQf5uM1onQR6gR7IQGARn9S
D5onFdoRz5bWcVffyf4YN9irMkvAFjOB7OzNtBaHRbH5oObITS10LcQ7rVLAic/0
lyMCX1ioBnFpbH1YfII0dcSFjY22m9QTjJDj4dx6LbjDXaxv9BMiJ7bfjCt5wn1o
0ju5X898fhTX0L4Z67pGGHzawByyXtrf1r9INot8K8oqftq5vkHpIJdn9n0GrYGa
9FVoQ0hzDVSp7mxnBKnbXyaf47HX8RYRBZegHEDW8zrpO0zj7cyruRKifDL3q9R9
cqGIRYSFPaXEGF6HfmxWXCjst1VpddcTBgD7OWlKuTi/Juk2ZoWI4O+h3WlotOv8
niTiG0DiNoGlrLXjFlya6TdfL43zg3sE8O4oukcVO4uejo+wdL+Rc7ZGuO7CbO9k
V+ORQ/8Efg3XD8xRSmzi0pgyBfU/C6HTTWCIv8DIMfl0B5zoeIx75quDP9/CKIfz
UTF0R1+mA0o6UmSiIngqRTFEEB9cWrMvCjM1drMb1V99NzI+/fg1skO7pcnJRddC
rbPgeTHDAYgMsPj74rhs
=EIGl
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
pull request: batman-adv 2014-08-05
this is a pull request intended for net-next/linux-3.17 (yeah..it's really
late).
Patches 1, 2 and 4 are really minor changes:
- kmalloc_array is substituted to kmalloc when possible (as suggested by
checkpatch);
- net_ratelimited() is now used properly and the "suppressed" message is not
printed anymore if not needed;
- the internal version number has been increased to reflect our current version.
Patch 3 instead is introducing a change in the metric computation function
by changing the penalty applied at each mesh hop from 15/255 (~6%) to
30/255 (~11%). This change is introduced by Simon Wunderlich after having
observed a performance improvement in several networks when using the new value.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now bridge ports can be non-promiscuous, vlan_vid_add() is no longer an
unnecessary operation.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add SOF_TIMESTAMPING_TX_ACK, a request for a tstamp when the last byte
in the send() call is acknowledged. It implements the feature for TCP.
The timestamp is generated when the TCP socket cumulative ACK is moved
beyond the tracked seqno for the first time. The feature ignores SACK
and FACK, because those acknowledge the specific byte, but not
necessarily the entire contents of the buffer up to that byte.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP timestamping extends SO_TIMESTAMPING to bytestreams.
Bytestreams do not have a 1:1 relationship between send() buffers and
network packets. The feature interprets a send call on a bytestream as
a request for a timestamp for the last byte in that send() buffer.
The choice corresponds to a request for a timestamp when all bytes in
the buffer have been sent. That assumption depends on in-order kernel
transmission. This is the common case. That said, it is possible to
construct a traffic shaping tree that would result in reordering.
The guarantee is strong, then, but not ironclad.
This implementation supports send and sendpages (splice). GSO replaces
one large packet with multiple smaller packets. This patch also copies
the option into the correct smaller packet.
This patch does not yet support timestamping on data in an initial TCP
Fast Open SYN, because that takes a very different data path.
If ID generation in ee_data is enabled, bytestream timestamps return a
byte offset, instead of the packet counter for datagrams.
The implementation supports a single timestamp per packet. It silenty
replaces requests for previous timestamps. To avoid missing tstamps,
flush the tcp queue by disabling Nagle, cork and autocork. Missing
tstamps can be detected by offset when the ee_data ID is enabled.
Implementation details:
- On GSO, the timestamping code can be included in the main loop. I
moved it into its own loop to reduce the impact on the common case
to a single branch.
- To avoid leaking the absolute seqno to userspace, the offset
returned in ee_data must always be relative. It is an offset between
an skb and sk field. The first is always set (also for GSO & ACK).
The second must also never be uninitialized. Only allow the ID
option on sockets in the ESTABLISHED state, for which the seqno
is available. Never reset it to zero (instead, move it to the
current seqno when reenabling the option).
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kernel transmit latency is often incurred in the packet scheduler.
Introduce a new timestamp on transmission just before entering the
scheduler. When data travels through multiple devices (bonding,
tunneling, ...) each device will export an individual timestamp.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Datagrams timestamped on transmission can coexist in the kernel stack
and be reordered in packet scheduling. When reading looped datagrams
from the socket error queue it is not always possible to unique
correlate looped data with original send() call (for application
level retransmits). Even if possible, it may be expensive and complex,
requiring packet inspection.
Introduce a data-independent ID mechanism to associate timestamps with
send calls. Pass an ID alongside the timestamp in field ee_data of
sock_extended_err.
The ID is a simple 32 bit unsigned int that is associated with the
socket and incremented on each send() call for which software tx
timestamp generation is enabled.
The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to
avoid changing ee_data for existing applications that expect it 0.
The counter is reset each time the flag is reenabled. Reenabling
does not change the ID of already submitted data. It is possible
to receive out of order IDs if the timestamp stream is not quiesced
first.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_flags is reaching its limit. New timestamping options will not fit.
Move all of them into a new field sk->sk_tsflags.
Added benefit is that this removes boilerplate code to convert between
SOF_TIMESTAMPING_.. and SOCK_TIMESTAMPING_.. in getsockopt/setsockopt.
SOCK_TIMESTAMPING_RX_SOFTWARE is also used to toggle the receive
timestamp logic (netstamp_needed). That can be simplified and this
last key removed, but will leave that for a separate patch.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
The u16 in sock can be moved into a 16-bit hole below sk_gso_max_segs,
though that scatters tstamp fields throughout the struct.
Signed-off-by: David S. Miller <davem@davemloft.net>
Applications that request kernel tx timestamps with SO_TIMESTAMPING
read timestamps as recvmsg() ancillary data. The response is defined
implicitly as timespec[3].
1) define struct scm_timestamping explicitly and
2) add support for new tstamp types. On tx, scm_timestamping always
accompanies a sock_extended_err. Define previously unused field
ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND to
define the existing behavior.
The reception path is not modified. On rx, no struct similar to
sock_extended_err is passed along with SCM_TIMESTAMPING.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit reduces spurious retransmits due to apparent SACK reneging
by only reacting to SACK reneging that persists for a short delay.
When a sequence space hole at snd_una is filled, some TCP receivers
send a series of ACKs as they apparently scan their out-of-order queue
and cumulatively ACK all the packets that have now been consecutiveyly
received. This is essentially misbehavior B in "Misbehaviors in TCP
SACK generation" ACM SIGCOMM Computer Communication Review, April
2011, so we suspect that this is from several common OSes (Windows
2000, Windows Server 2003, Windows XP). However, this issue has also
been seen in other cases, e.g. the netdev thread "TCP being hoodwinked
into spurious retransmissions by lack of timestamps?" from March 2014,
where the receiver was thought to be a BSD box.
Since snd_una would temporarily be adjacent to a previously SACKed
range in these scenarios, this receiver behavior triggered the Linux
SACK reneging code path in the sender. This led the sender to clear
the SACK scoreboard, enter CA_Loss, and spuriously retransmit
(potentially) every packet from the entire write queue at line rate
just a few milliseconds before the ACK for each packet arrives at the
sender.
To avoid such situations, now when a sender sees apparent reneging it
does not yet retransmit, but rather adjusts the RTO timer to give the
receiver a little time (max(RTT/2, 10ms)) to send us some more ACKs
that will restore sanity to the SACK scoreboard. If the reneging
persists until this RTO then, as before, we clear the SACK scoreboard
and enter CA_Loss.
A 10ms delay tolerates a receiver sending such a stream of ACKs at
56Kbit/sec. And to allow for receivers with slower or more congested
paths, we wait for at least RTT/2.
We validated the resulting max(RTT/2, 10ms) delay formula with a mix
of North American and South American Google web server traffic, and
found that for ACKs displaying transient reneging:
(1) 90% of inter-ACK delays were less than 10ms
(2) 99% of inter-ACK delays were less than RTT/2
In tests on Google web servers this commit reduced reneging events by
75%-90% (as measured by the TcpExtTCPSACKReneging counter), without
any measurable impact on latency for user HTTP and SPDY requests.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/6lowpan/iphc.c
Minor conflicts in iphc.c were changes overlapping with some
style cleanups.
John W. Linville says:
====================
Please pull this last(?) batch of wireless change intended for the
3.17 stream...
For the NFC bits, Samuel says:
"This is a rather quiet one, we have:
- A new driver from ST Microelectronics for their NCI ST21NFCB,
including device tree support.
- p2p support for the ST21NFCA driver
- A few fixes an enhancements for the NFC digital laye"
For the Atheros bits, Kalle says:
"Michal and Janusz did some important RX aggregation fixes, basically we
were missing RX reordering altogether. The 10.1 firmware doesn't support
Ad-Hoc mode and Michal fixed ath10k so that it doesn't advertise Ad-Hoc
support with that firmware. Also he implemented a workaround for a KVM
issue."
For the Bluetooth bits, Gustavo and Johan say:
"To quote Gustavo from his previous request:
'Some last minute fixes for -next. We have a fix for a use after free in
RFCOMM, another fix to an issue with ADV_DIRECT_IND and one for ADV_IND with
auto-connection handling. Last, we added support for reading the codec and
MWS setting for controllers that support these features.'
Additionally there are fixes to LE scanning, an update to conform to the 4.1
core specification as well as fixes for tracking the page scan state. All
of these fixes are important for 3.17."
And,
"We've got:
- 6lowpan fixes/cleanups
- A couple crash fixes, one for the Marvell HCI driver and another in LE SMP.
- Fix for an incorrect connected state check
- Fix for the bondable requirement during pairing (an issue which had
crept in because of using "pairable" when in fact the actual meaning
was "bondable" (these have different meanings in Bluetooth)"
Along with those are some late-breaking hardware support patches in
brcmfmac and b43 as well as a stray ath9k patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In svc_rdma_accept(), if rdma_create_qp() fails, there is useless
logic to try and call rdma_create_qp() again with reduced sge depths.
The assumption, I guess, was that perhaps the initial sge depths
chosen were too big. However they initial depths are selected based
on the rdma device attribute max_sge returned from ib_query_device().
If rdma_create_qp() fails, it would not be because the max_send_sge and
max_recv_sge values passed in exceed the device's max. So just remove
this code.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The default hop penalty is currently set to 15, which is applied like
that for multi interface devices (e.g. dual band APs). Single band
devices will still use an effective penalty of 30 (hop penalty + wifi
penalty).
After receiving reports of too long paths in mesh networks with dual
band APs which were fixed by increasing the hop penalty, we'd like to
suggest to increase that default value in the default setting as well.
We've evaluated that increase in a handful of medium sized mesh
networks (5-20 nodes) with single and dual band devices, with changes
for the better (shorter routes, higher throughput) or no change at all.
This patch changes the hop penalty to 30, which will give an effective
penalty of 60 on single band devices (hop penalty + wifi penalty).
Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
This patch removes unnecessary logspam which resulted from superfluous
calls to net_ratelimit(). With the supplied patch, net_ratelimit() is
called after the loglevel has been checked.
Signed-off-by: André Gaul <gaul@web-yard.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
batadv_frag_insert_packet was unable to handle out-of-order packets because it
dropped them directly. This is caused by the way the fragmentation lists is
checked for the correct place to insert a fragmentation entry.
The fragmentation code keeps the fragments in lists. The fragmentation entries
are kept in descending order of sequence number. The list is traversed and each
entry is compared with the new fragment. If the current entry has a smaller
sequence number than the new fragment then the new one has to be inserted
before the current entry. This ensures that the list is still in descending
order.
An out-of-order packet with a smaller sequence number than all entries in the
list still has to be added to the end of the list. The used hlist has no
information about the last entry in the list inside hlist_head and thus the
last entry has to be calculated differently. Currently the code assumes that
the iterator variable of hlist_for_each_entry can be used for this purpose
after the hlist_for_each_entry finished. This is obviously wrong because the
iterator variable is always NULL when the list was completely traversed.
Instead the information about the last entry has to be stored in a different
variable.
This problem was introduced in 610bfc6bc9
("batman-adv: Receive fragmented packets and merge").
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
With netlink_lookup() conversion to RCU, we need to use appropriate
rcu dereference in netlink_seq_socket_idx() & netlink_seq_next()
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: e341694e3e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
Signed-off-by: David S. Miller <davem@davemloft.net>
Here's the big tty / serial driver update for 3.17-rc1.
Nothing major, just a number of fixes and new features for different
serial drivers, and some more tty core fixes and documentation of the
tty locks.
All of these have been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlPf2C4ACgkQMUfUDdst+yllVgCgtZl/Mcr/LlxPgjsg2C1AE7nX
YJ4An3o4N112bkdGqhZ7RjAE6K/8YILx
=rPhE
-----END PGP SIGNATURE-----
Merge tag 'tty-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial driver update from Greg KH:
"Here's the big tty / serial driver update for 3.17-rc1.
Nothing major, just a number of fixes and new features for different
serial drivers, and some more tty core fixes and documentation of the
tty locks.
All of these have been in linux-next for a while"
* tag 'tty-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (82 commits)
tty/n_gsm.c: fix a memory leak in gsmld_open
pch_uart: don't hardcode PCI slot to get DMA device
tty: n_gsm, use setup_timer
Revert "ARC: [arcfpga] stdout-path now suffices for earlycon/console"
serial: sc16is7xx: Correct initialization of s->clk
serial: 8250_dw: Add support for deferred probing
serial: 8250_dw: Add optional reset control support
serial: st-asc: Fix overflow in baudrate calculation
serial: st-asc: Don't call BUG in asc_console_setup()
tty: serial: msm: Make of_device_id array const
tty/n_gsm.c: get gsm->num after gsm_activate_mux
serial/core: Fix too big allocation for attribute member
drivers/tty/serial: use correct type for dma_map/unmap
serial: altera_jtaguart: Fix putchar function passed to uart_console_write()
serial/uart/8250: Add tunable RX interrupt trigger I/F of FIFO buffers
Serial: allow port drivers to have a default attribute group
tty: kgdb_nmi: Automatically manage tty enable
serial: altera_jtaguart: Adpot uart_console_write()
serial: samsung: improve code clarity by defining a variable
serial: samsung: correct the case and default order in switch
...
Pull scheduler updates from Ingo Molnar:
- Move the nohz kick code out of the scheduler tick to a dedicated IPI,
from Frederic Weisbecker.
This necessiated quite some background infrastructure rework,
including:
* Clean up some irq-work internals
* Implement remote irq-work
* Implement nohz kick on top of remote irq-work
* Move full dynticks timer enqueue notification to new kick
* Move multi-task notification to new kick
* Remove unecessary barriers on multi-task notification
- Remove proliferation of wait_on_bit() action functions and allow
wait_on_bit_action() functions to support a timeout. (Neil Brown)
- Another round of sched/numa improvements, cleanups and fixes. (Rik
van Riel)
- Implement fast idling of CPUs when the system is partially loaded,
for better scalability. (Tim Chen)
- Restructure and fix the CPU hotplug handling code that may leave
cfs_rq and rt_rq's throttled when tasks are migrated away from a dead
cpu. (Kirill Tkhai)
- Robustify the sched topology setup code. (Peterz Zijlstra)
- Improve sched_feat() handling wrt. static_keys (Jason Baron)
- Misc fixes.
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
sched/fair: Fix 'make xmldocs' warning caused by missing description
sched: Use macro for magic number of -1 for setparam
sched: Robustify topology setup
sched: Fix sched_setparam() policy == -1 logic
sched: Allow wait_on_bit_action() functions to support a timeout
sched: Remove proliferation of wait_on_bit() action functions
sched/numa: Revert "Use effective_load() to balance NUMA loads"
sched: Fix static_key race with sched_feat()
sched: Remove extra static_key*() function indirection
sched/rt: Fix replenish_dl_entity() comments to match the current upstream code
sched: Transform resched_task() into resched_curr()
sched/deadline: Kill task_struct->pi_top_task
sched: Rework check_for_tasks()
sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime()
sched/fair: Disable runtime_enabled on dying rq
sched/numa: Change scan period code to match intent
sched/numa: Rework best node setting in task_numa_migrate()
sched/numa: Examine a task move when examining a task swap
sched/numa: Simplify task_numa_compare()
sched/numa: Use effective_load() to balance NUMA loads
...
tcpm_key is an array inside struct tcp_md5sig, there is no need to check it
against NULL.
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 6cbdceeb1c
bridge: Dump vlan information from a bridge port
introduced a comment in an attempt to explain the
code logic. The comment is unfinished so it confuses more
than it explains, remove it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull cgroup changes from Tejun Heo:
"Mostly changes to get the v2 interface ready. The core features are
mostly ready now and I think it's reasonable to expect to drop the
devel mask in one or two devel cycles at least for a subset of
controllers.
- cgroup added a controller dependency mechanism so that block cgroup
can depend on memory cgroup. This will be used to finally support
IO provisioning on the writeback traffic, which is currently being
implemented.
- The v2 interface now uses a separate table so that the interface
files for the new interface are explicitly declared in one place.
Each controller will explicitly review and add the files for the
new interface.
- cpuset is getting ready for the hierarchical behavior which is in
the similar style with other controllers so that an ancestor's
configuration change doesn't change the descendants' configurations
irreversibly and processes aren't silently migrated when a CPU or
node goes down.
All the changes are to the new interface and no behavior changed for
the multiple hierarchies"
* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (29 commits)
cpuset: fix the WARN_ON() in update_nodemasks_hier()
cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test
cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core
cgroup: distinguish the default and legacy hierarchies when handling cftypes
cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes()
cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes
cgroup: split cgroup_base_files[] into cgroup_{dfl|legacy}_base_files[]
cpuset: export effective masks to userspace
cpuset: allow writing offlined masks to cpuset.cpus/mems
cpuset: enable onlined cpu/node in effective masks
cpuset: refactor cpuset_hotplug_update_tasks()
cpuset: make cs->{cpus, mems}_allowed as user-configured masks
cpuset: apply cs->effective_{cpus,mems}
cpuset: initialize top_cpuset's configured masks at mount
cpuset: use effective cpumask to build sched domains
cpuset: inherit ancestor's masks if effective_{cpus, mems} becomes empty
cpuset: update cs->effective_{cpus, mems} when config changes
cpuset: update cpuset->effective_{cpus,mems} at hotplug
cpuset: add cs->effective_cpus and cs->effective_mems
cgroup: clean up sane_behavior handling
...
Reported by checkpatch with the following warning:
WARNING: Prefer kmalloc_array over kmalloc with multiply
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
current_cred() can only be changed by 'current', and
cred->group_info is never changed. If a new group_info is
needed, a new 'cred' is created.
Consequently it is always safe to access
current_cred()->group_info
without taking any further references.
So drop the refcounting and the incorrect rcu_dereference().
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The new flag RPCAUTH_LOOKUP_RCU to credential lookup avoids locking,
does not take a reference on the returned credential, and returns
-ECHILD if a simple lookup was not possible.
The returned value can only be used within an rcu_read_lock protected
region.
The main user of this is the new rpc_lookup_cred_nonblock() which
returns a pointer to the current credential which is only rcu-safe (no
ref-count held), and might return -ECHILD if allocation was required.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fix the endianness handling in gss_wrap_kerberos_v1 and drop the memset
call there in favor of setting the filler bytes directly.
In gss_wrap_kerberos_v2, get rid of the "ec" variable which is always
zero, and drop the endianness conversion of 0. Sparse handles 0 as a
special case, so it's not necessary.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Use u16 pointer in setup_token and setup_token_v2. None of the fields
are actually handled as __be16, so this simplifies the code a bit. Also
get rid of some unneeded pointer increments.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The handling of the gc_ctx pointer only seems to be partially RCU-safe.
The assignment and freeing are done using RCU, but many places in the
code seem to dereference that pointer without proper RCU safeguards.
Fix them to use rcu_dereference and to rcu_read_lock/unlock, and to
properly handle the case where the pointer is NULL.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* 'nfs-rdma' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (916 commits)
xprtrdma: Handle additional connection events
xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro
xprtrdma: Make rpcrdma_ep_disconnect() return void
xprtrdma: Schedule reply tasklet once per upcall
xprtrdma: Allocate each struct rpcrdma_mw separately
xprtrdma: Rename frmr_wr
xprtrdma: Disable completions for LOCAL_INV Work Requests
xprtrdma: Disable completions for FAST_REG_MR Work Requests
xprtrdma: Don't post a LOCAL_INV in rpcrdma_register_frmr_external()
xprtrdma: Reset FRMRs after a flushed LOCAL_INV Work Request
xprtrdma: Reset FRMRs when FAST_REG_MR is flushed by a disconnect
xprtrdma: Properly handle exhaustion of the rb_mws list
xprtrdma: Chain together all MWs in same buffer pool
xprtrdma: Back off rkey when FAST_REG_MR fails
xprtrdma: Unclutter struct rpcrdma_mr_seg
xprtrdma: Don't invalidate FRMRs if registration fails
xprtrdma: On disconnect, don't ignore pending CQEs
xprtrdma: Update rkeys after transport reconnect
xprtrdma: Limit data payload size for ALLPHYSICAL
xprtrdma: Protect ia->ri_id when unmapping/invalidating MRs
...
In some cases where the credentials are not often reused, we may want
to limit their total number just in order to make the negative lookups
in the hash table more manageable.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The sizing of the hash table and the practice of requiring a lookup
to retrieve the pprev to be stored in the element cookie before the
deletion of an entry is left intact.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Patrick McHardy <kaber@trash.net>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heavy Netlink users such as Open vSwitch spend a considerable amount of
time in netlink_lookup() due to the read-lock on nl_table_lock. Use of
RCU relieves the lock contention.
Makes use of the new resizable hash table to avoid locking on the
lookup.
The hash table will grow if entries exceeds 75% of table size up to a
total table size of 64K. It will automatically shrink if usage falls
below 30%.
Also splits nl_table_lock into a separate mutex to protect hash table
mutations and allow synchronize_rcu() to sleep while waiting for readers
during expansion and shrinking.
Before:
9.16% kpktgend_0 [openvswitch] [k] masked_flow_lookup
6.42% kpktgend_0 [pktgen] [k] mod_cur_headers
6.26% kpktgend_0 [pktgen] [k] pktgen_thread_worker
6.23% kpktgend_0 [kernel.kallsyms] [k] memset
4.79% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup
4.37% kpktgend_0 [kernel.kallsyms] [k] memcpy
3.60% kpktgend_0 [openvswitch] [k] ovs_flow_extract
2.69% kpktgend_0 [kernel.kallsyms] [k] jhash2
After:
15.26% kpktgend_0 [openvswitch] [k] masked_flow_lookup
8.12% kpktgend_0 [pktgen] [k] pktgen_thread_worker
7.92% kpktgend_0 [pktgen] [k] mod_cur_headers
5.11% kpktgend_0 [kernel.kallsyms] [k] memset
4.11% kpktgend_0 [openvswitch] [k] ovs_flow_extract
4.06% kpktgend_0 [kernel.kallsyms] [k] _raw_spin_lock
3.90% kpktgend_0 [kernel.kallsyms] [k] jhash2
[...]
0.67% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:
1) Maintain all DSCP and ECN bits for IPv6 tun forwarding. This
resolves an inconsistency between IPv4 and IPv6 behaviour.
Patch from Alex Gartrell via Simon Horman.
2) Fix unnoticeable blink in xt_LED when the led-always-blink option is
used, from Jiri Prchal.
3) Add missing return in nft_del_setelem(), otherwise this results in a
double call of nft_data_uninit() in the nf_tables code, from Thomas Graf.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: e110861f86 ("net: add a sysctl to reflect the fwmark on replies")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use kmem_cache to allocate/free inet_frag_queue objects since they're
all the same size per inet_frags user and are alloced/freed in high volumes
thus making it a perfect case for kmem_cache.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have INET_FRAG_EVICTED we might as well use it to stop
sending icmp messages in the "frag_expire" functions instead of
stripping INET_FRAG_FIRST_IN from their flags when evicting.
Also fix the comment style in ip6_expire_frag_queue().
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a couple of functions' declaration alignments.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The last_in field has been used to store various flags different from
first/last frag in so give it a more descriptive name: flags.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Softirqs are already disabled so no need to do it again, thus let's be
consistent and use the IP6_INC_STATS_BH variant.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_local_deliver_finish() already have a rcu_read_lock/unlock, so
the rcu_read_lock/unlock is unnecessary.
See the stack below:
ip_local_deliver_finish
|
|
->icmp_rcv
|
|
->icmp_socket_deliver
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
clean up names related to socket filtering and bpf in the following way:
- everything that deals with sockets keeps 'sk_*' prefix
- everything that is pure BPF is changed to 'bpf_*' prefix
split 'struct sk_filter' into
struct sk_filter {
atomic_t refcnt;
struct rcu_head rcu;
struct bpf_prog *prog;
};
and
struct bpf_prog {
u32 jited:1,
len:31;
struct sock_fprog_kern *orig_prog;
unsigned int (*bpf_func)(const struct sk_buff *skb,
const struct bpf_insn *filter);
union {
struct sock_filter insns[0];
struct bpf_insn insnsi[0];
struct work_struct work;
};
};
so that 'struct bpf_prog' can be used independent of sockets and cleans up
'unattached' bpf use cases
split SK_RUN_FILTER macro into:
SK_RUN_FILTER to be used with 'struct sk_filter *' and
BPF_PROG_RUN to be used with 'struct bpf_prog *'
__sk_filter_release(struct sk_filter *) gains
__bpf_prog_release(struct bpf_prog *) helper function
also perform related renames for the functions that work
with 'struct bpf_prog *', since they're on the same lines:
sk_filter_size -> bpf_prog_size
sk_filter_select_runtime -> bpf_prog_select_runtime
sk_filter_free -> bpf_prog_free
sk_unattached_filter_create -> bpf_prog_create
sk_unattached_filter_destroy -> bpf_prog_destroy
sk_store_orig_filter -> bpf_prog_store_orig_filter
sk_release_orig_filter -> bpf_release_orig_filter
__sk_migrate_filter -> bpf_migrate_filter
__sk_prepare_filter -> bpf_prepare_filter
API for attaching classic BPF to a socket stays the same:
sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *)
and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program
which is used by sockets, tun, af_packet
API for 'unattached' BPF programs becomes:
bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *)
and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program
which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
to indicate that this function is converting classic BPF into eBPF
and not related to sockets
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
trivial rename to indicate that this functions performs classic BPF checking
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
trivial rename to better match semantics of macro
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
attaching bpf program to a socket involves multiple socket memory arithmetic,
since size of 'sk_filter' is changing when classic BPF is converted to eBPF.
Also common path of program creation has to deal with two ways of freeing
the memory.
Simplify the code by delaying socket charging until program is ready and
its size is known
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>