Commit graph

842 commits

Author SHA1 Message Date
Daniel Borkmann
7d1d65cb84 net: sched: cls_bpf: add BPF-based classifier
This work contains a lightweight BPF-based traffic classifier that can
serve as a flexible alternative to ematch-based tree classification, i.e.
now that BPF filter engine can also be JITed in the kernel. Naturally, tc
actions and policies are supported as well with cls_bpf. Multiple BPF
programs/filter can be attached for a class, or they can just as well be
written within a single BPF program, that's really up to the user how he
wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
The notion of a BPF program's return/exit codes is being kept as follows:

     0: No match
    -1: Select classid given in "tc filter ..." command
  else: flowid, overwrite the default one

As a minimal usage example with iproute2, we use a 3 band prio root qdisc
on a router with sfq each as leave, and assign ssh and icmp bpf-based
filters to band 1, http traffic to band 2 and the rest to band 3. For the
first two bands we load the bytecode from a file, in the 2nd we load it
inline as an example:

echo 1 > /proc/sys/net/core/bpf_jit_enable

tc qdisc del dev em1 root
tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

tc qdisc add dev em1 parent 1:1 sfq perturb 16
tc qdisc add dev em1 parent 1:2 sfq perturb 16
tc qdisc add dev em1 parent 1:3 sfq perturb 16

tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3

BPF programs can be easily created and passed to tc, either as inline
'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
compile opcodes, for example:

1) People familiar with tcpdump-like filters:

   tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf

2) People that want to low-level program their filters or use BPF
   extensions that lack support by libpcap's compiler:

   bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf

   ssh.ops example code:
   ldh [12]
   jne #0x800, drop
   ldb [23]
   jneq #6, drop
   ldh [20]
   jset #0x1fff, drop
   ldxb 4 * ([14] & 0xf)
   ldh [%x + 14]
   jeq #0x16, pass
   ldh [%x + 16]
   jne #0x16, drop
   pass: ret #-1
   drop: ret #0

It was chosen to load bytecode into tc, since the reverse operation,
tc filter list dev em1, is then able to show the exact commands again.
Possible follow-up work could also include a small expression compiler
for iproute2. Tested with the help of bmon. This idea came up during
the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from
Eric Dumazet!

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 17:33:17 -04:00
David S. Miller
c3fa32b976 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/qmi_wwan.c
	include/net/dst.h

Trivial merge conflicts, both were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 16:49:34 -04:00
Linus Torvalds
0d645a8b82 Disable not-quite-ready userspace ABI for IB flow steering
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABCAAGBQJSZrLkAAoJEENa44ZhAt0hHrkQAIzkHBgLTcOzA4I4YZx8sXLG
 kL6cdK5RgBABq8OcboKup/QkSIcIacpXdiqaQWia5GnaLooBKPx2a8Wx6uW+P319
 YMe6faEkIi8RAwbYaF+Vm/wM9dP959N1vqwgQ/Hyx1Jslfx8+ychoD2C46CQGkz7
 MamApzh5arqox+Nql8Z5QB91SIGzybyBaTG0YNTgweDBR22mBS5+aKOywA4ndlKE
 aOfouI0hSgc7NzdwPHzwZcvA4mnHP+IvTvnt4EMCLMpgHtyYWpj7J4kPu7H58WwV
 XqSbtBsAnLDbNpNPWi08S9j8n0fIyUBttdEMrm7blgpB6gv8RffDoTQySE+J0ob2
 eDHgX7anliu+1/Iah8M4qTWtKWj2emA89zUqaFzj04PG12iLA3U27hnTe0eXFW8g
 GW1fE3Dkf3KytUS/pktRIGCeoNV1oePxmLsj9OK8Ii228vZIU04U/1th29k6zOdx
 8uauc7Mu3wV9UPcFLQy9AfNv/74tlNJHIjjyf7g8LPHCcyaSDJxabhIw+yvZISWH
 3MNbL2bRHqAUq9SqrknxK1pvddGxtupl585KksXOlWOHbpnC6iF6duB7OA+4x+RD
 9EfmmHwcWQ6WxDZyYHRvfZlRUYodbmRNVcoOwh921UxH7OjQzIU08FtoGrzFPWoR
 JBHFzGHJ47MVix5loqkp
 =4ySc
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband bugfix from Roland Dreier:
 "Disable not-quite-ready userspace ABI for IB flow steering"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/core: Temporarily disable create_flow/destroy_flow uverbs
2013-10-23 07:51:25 +01:00
Linus Torvalds
db10accfd2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Sorry I let so much accumulate, I was in Buffalo and wanted a few
  things to cook in my tree for a while before sending to you.  Anyways,
  it's a lot of little things as usual at this stage in the game"

 1) Make bonding MAINTAINERS entry reflect reality, from Andy
    Gospodarek.

 2) Fix accidental sock_put() on timewait mini sockets, from Eric
    Dumazet.

 3) Fix crashes in l2tp due to mis-handling of ipv4 mapped ipv6
    addresses, from François CACHEREUL.

 4) Fix heap overflow in __audit_sockaddr(), from the eagle eyed Dan
    Carpenter.

 5) tcp_shifted_skb() doesn't take handle FINs properly, from Eric
    Dumazet.

 6) SFC driver bug fixes from Ben Hutchings.

 7) Fix TX packet scheduling wedge after channel change in ath9k driver,
    from Felix Fietkau.

 8) Fix user after free in BPF JIT code, from Alexei Starovoitov.

 9) Source address selection test is reversed in
    __ip_route_output_key(), fix from Jiri Benc.

10) VLAN and CAN layer mis-size netlink attributes, from Marc
    Kleine-Budde.

11) Fix permission checks in sysctls to use current_euid() instead of
    current_uid().  From Eric W Biederman.

12) IPSEC policies can go away while a timer is still pending for them,
    add appropriate ref-counting to fix, from Steffen Klassert.

13) Fix mis-programming of FDR and RMCR registers on R8A7740 sh_eth
    chips, from Nguyen Hong Ky and Simon Horman.

14) MLX4 forgets to DMA unmap pages on RX, fix from Amir Vadai.

15) IPV6 GRE tunnel MTU upper limit is miscalculated, from Oussama
    Ghorbel.

16) Fix typo in fq_change(), we were assigning "initial quantum" to
    "quantum".  From Eric Dumazet.

17) Set a more appropriate sk_pacing_rate for non-TCP sockets, otherwise
    FQ packet scheduler does not pace those flows properly.  Also from
    Eric Dumazet.

18) rtlwifi miscalculates packet pointers, from Mark Cave-Ayland.

19) l2tp_xmit_skb() can be called from process context, not just softirq
    context, so we must always make sure to BH disable around it.  From
    Eric Dumazet.

20) On qdisc reset, we forget to purge the RB tree of SKBs in netem
    packet scheduler.  From Stephen Hemminger.

21) Fix info leak in farsync WAN driver ioctl() handler, from Dan
    Carpenter and Salva Peiró.

22) Fix PHY reset and other issues in dm9000 driver, from Nikita
    Kiryanov and Michael Abbott.

23) When hardware can do SCTP crc32 checksums, we accidently don't
    disable the csum offload when IPSEC transformations have been
    applied.  From Fan Du and Vlad Yasevich.

24) Tail loss probing in TCP leaves the socket in the wrong congestion
    avoidance state.  From Yuchung Cheng.

25) In CPSW driver, enable NAPI before interrupts are turned on, from
    Markus Pargmann.

26) Integer underflow and dual-assignment in YAM hamradio driver, from
    Dan Carpenter.

27) If we are going to mangle a packet in tcp_set_skb_tso_segs() we must
    unclone it.  This fixes various hard to track down crashes in
    drivers where the SKBs ->gso_segs was changing right from underneath
    the driver during TX queueing.  From Eric Dumazet.

28) Fix the handling of VLAN IDs, and in particular the special IDs 0
    and 4095, in the bridging layer.  From Toshiaki Makita.

29) Another info leak, this time in wanxl WAN driver, from Salva Peiró.

30) Fix race in socket credential passing, from Daniel Borkmann.

31) WHen NETLABEL is disabled, we don't validate CIPSO packets properly,
    from Seif Mazareeb.

32) Fix identification of fragmented frames in ipv4/ipv6 UDP
    Fragmentation Offload output paths, from Jiri Pirko.

33) Virtual Function fixes in bnx2x driver from Yuval Mintz and Ariel
    Elior.

34) When we removed the explicit neighbour pointer from ipv6 routes a
    slight regression was introduced for users such as IPVS, xt_TEE, and
    raw sockets.  We mix up the users requested destination address with
    the routes assigned nexthop/gateway.  From Julian Anastasov and
    Simon Horman.

35) Fix stack overruns in rt6_probe(), the issue is that can end up
    doing two full packet xmit paths at the same time when emitting
    neighbour discovery messages.  From Hannes Frederic Sowa.

36) davinci_emac driver doesn't handle IFF_ALLMULTI correctly, from
    Mariusz Ceier.

37) Make sure to set TCP sk_pacing_rate after the first legitimate RTT
    sample, from Neal Cardwell.

38) Wrong netlink attribute passed to xfrm_replay_verify_len(), from
    Steffen Klassert.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (152 commits)
  ax88179_178a: Add VID:DID for Samsung USB Ethernet Adapter
  ax88179_178a: Correct the RX error definition in RX header
  Revert "bridge: only expire the mdb entry when query is received"
  tcp: initialize passive-side sk_pacing_rate after 3WHS
  davinci_emac.c: Fix IFF_ALLMULTI setup
  mac802154: correct a typo in ieee802154_alloc_device() prototype
  ipv6: probe routes asynchronous in rt6_probe
  netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper
  ipv6: fill rt6i_gateway with nexthop address
  ipv6: always prefer rt6i_gateway if present
  bnx2x: Set NETIF_F_HIGHDMA unconditionally
  bnx2x: Don't pretend during register dump
  bnx2x: Lock DMAE when used by statistic flow
  bnx2x: Prevent null pointer dereference on error flow
  bnx2x: Fix config when SR-IOV and iSCSI are enabled
  bnx2x: Fix Coalescing configuration
  bnx2x: Unlock VF-PF channel on MAC/VLAN config error
  bnx2x: Prevent an illegal pointer dereference during panic
  bnx2x: Fix Maximum CoS estimation for VFs
  drivers: net: cpsw: fix kernel warn during iperf test with interrupt pacing
  ...
2013-10-23 07:47:42 +01:00
Yann Droneaud
7afbddfae9 IB/core: Temporarily disable create_flow/destroy_flow uverbs
The create_flow/destroy_flow uverbs and the associated extensions to
the user-kernel verbs ABI are under review and are too experimental to
freeze at this point.

So userspace is not exposed to experimental features and an uinstable
ABI, temporarily disable this for v3.12 (with a Kconfig option behind
staging to reenable it if desired).

The feature will be enabled after proper cleanup for v3.13.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1381351016.git.ydroneaud@opteya.com
Link: http://marc.info/?i=cover.1381177342.git.ydroneaud@opteya.com

[ Add a Kconfig option to reenable these verbs.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-21 09:44:17 -07:00
Jiri Pirko
ec76aa4985 bonding: add Netlink support active_slave option
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 18:58:46 -04:00
Jiri Pirko
90af231106 bonding: add Netlink support mode option
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 18:58:46 -04:00
Chris Wilson
bc5bd37ce4 drm: Pad drm_mode_get_connector to 64-bit boundary
Pavel Roskin reported that DRM_IOCTL_MODE_GETCONNECTOR was overwritting
the 4 bytes beyond the end of its structure with a 32-bit userspace
running on a 64-bit kernel. This is due to the padding gcc inserts as
the drm_mode_get_connector struct includes a u64 and its size is not a
natural multiple of u64s.

64-bit kernel:

sizeof(drm_mode_get_connector)=80, alignof=8
sizeof(drm_mode_get_encoder)=20, alignof=4
sizeof(drm_mode_modeinfo)=68, alignof=4

32-bit userspace:

sizeof(drm_mode_get_connector)=76, alignof=4
sizeof(drm_mode_get_encoder)=20, alignof=4
sizeof(drm_mode_modeinfo)=68, alignof=4

Fortuituously we can insert explicit padding to the tail of our
structures without breaking ABI.

Reported-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-10-18 07:42:23 +01:00
Pablo Neira Ayuso
0628b123c9 netfilter: nfnetlink: add batch support and use it from nf_tables
This patch adds a batch support to nfnetlink. Basically, it adds
two new control messages:

* NFNL_MSG_BATCH_BEGIN, that indicates the beginning of a batch,
  the nfgenmsg->res_id indicates the nfnetlink subsystem ID.

* NFNL_MSG_BATCH_END, that results in the invocation of the
  ss->commit callback function. If not specified or an error
  ocurred in the batch, the ss->abort function is invoked
  instead.

The end message represents the commit operation in nftables, the
lack of end message results in an abort. This patch also adds the
.call_batch function that is only called from the batch receival
path.

This patch adds atomic rule updates and dumps based on
bitmask generations. This allows to atomically commit a set of
rule-set updates incrementally without altering the internal
state of existing nf_tables expressions/matches/targets.

The idea consists of using a generation cursor of 1 bit and
a bitmask of 2 bits per rule. Assuming the gencursor is 0,
then the genmask (expressed as a bitmask) can be interpreted
as:

00 active in the present, will be active in the next generation.
01 inactive in the present, will be active in the next generation.
10 active in the present, will be deleted in the next generation.
 ^
 gencursor

Once you invoke the transition to the next generation, the global
gencursor is updated:

00 active in the present, will be active in the next generation.
01 active in the present, needs to zero its future, it becomes 00.
10 inactive in the present, delete now.
^
gencursor

If a dump is in progress and nf_tables enters a new generation,
the dump will stop and return -EBUSY to let userspace know that
it has to retry again. In order to invalidate dumps, a global
genctr counter is increased everytime nf_tables enters a new
generation.

This new operation can be used from the user-space utility
that controls the firewall, eg.

nft -f restore

The rule updates contained in `file' will be applied atomically.

cat file
-----
add filter INPUT ip saddr 1.1.1.1 counter accept #1
del filter INPUT ip daddr 2.2.2.2 counter drop   #2
-EOF-

Note that the rule 1 will be inactive until the transition to the
next generation, the rule 2 will be evicted in the next generation.

There is a penalty during the rule update due to the branch
misprediction in the packet matching framework. But that should be
quickly resolved once the iteration over the commit list that
contain rules that require updates is finished.

Event notification happens once the rule-set update has been
committed. So we skip notifications is case the rule-set update
is aborted, which can happen in case that the rule-set is tested
to apply correctly.

This patch squashed the following patches from Pablo:

* nf_tables: atomic rule updates and dumps
* nf_tables: get rid of per rule list_head for commits
* nf_tables: use per netns commit list
* nfnetlink: add batch support and use it from nf_tables
* nf_tables: all rule updates are transactional
* nf_tables: attach replacement rule after stale one
* nf_tables: do not allow deletion/replacement of stale rules
* nf_tables: remove unused NFTA_RULE_FLAGS

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 18:01:01 +02:00
Eric Leblond
5e94846686 netfilter: nf_tables: add insert operation
This patch adds a new rule attribute NFTA_RULE_POSITION which is
used to store the position of a rule relatively to the others.
By providing the create command and specifying the position, the
rule is inserted after the rule with the handle equal to the
provided position.

Regarding notification, the position attribute specifies the
handle of the previous rule to make sure we don't point to any
stale rule in notifications coming from the commit path.

This patch includes the following fix from Pablo:

* nf_tables: fix rule deletion event reporting

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 18:01:00 +02:00
Tomasz Bursztyka
eb31628e37 netfilter: nf_tables: Add support for IPv6 NAT
This patch generalizes the NAT expression to support both IPv4 and IPv6
using the existing IPv4/IPv6 NAT infrastructure. This also adds the
NAT chain type for IPv6.

This patch collapses the following patches that were posted to the
netfilter-devel mailing list, from Tomasz:

* nf_tables: Change NFTA_NAT_ attributes to better semantic significance
* nf_tables: Split IPv4 NAT into NAT expression and IPv4 NAT chain
* nf_tables: Add support for IPv6 NAT expression
* nf_tables: Add support for IPv6 NAT chain
* nf_tables: Fix up build issue on IPv6 NAT support

And, from Pablo Neira Ayuso:

* fix missing dependencies in nft_chain_nat

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 18:00:58 +02:00
Pablo Neira Ayuso
9ddf632357 netfilter: nf_tables: add support for dormant tables
This patch allows you to temporarily disable an entire table.
You can change the state of a dormant table via NFT_MSG_NEWTABLE
messages. Using this operation you can wake up a table, so their
chains are registered.

This provides atomicity at chain level. Thus, the rule-set of one
chain is applied at once, avoiding any possible intermediate state
in every chain. Still, the chains that belongs to a table are
registered consecutively. This also allows you to have inactive
tables in the kernel.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 18:00:57 +02:00
Pablo Neira Ayuso
0ca743a559 netfilter: nf_tables: add compatibility layer for x_tables
This patch adds the x_tables compatibility layer. This allows you
to use existing x_tables matches and targets from nf_tables.

This compatibility later allows us to use existing matches/targets
for features that are still missing in nf_tables. We can progressively
replace them with native nf_tables extensions. It also provides the
userspace compatibility software that allows you to express the
rule-set using the iptables syntax but using the nf_tables kernel
components.

In order to get this compatibility layer working, I've done the
following things:

* add NFNL_SUBSYS_NFT_COMPAT: this new nfnetlink subsystem is used
to query the x_tables match/target revision, so we don't need to
use the native x_table getsockopt interface.

* emulate xt structures: this required extending the struct nft_pktinfo
to include the fragment offset, which is already obtained from
ip[6]_tables and that is used by some matches/targets.

* add support for default policy to base chains, required to emulate
  x_tables.

* add NFTA_CHAIN_USE attribute to obtain the number of references to
  chains, required by x_tables emulation.

* add chain packet/byte counters using per-cpu.

* support 32-64 bits compat.

For historical reasons, this patch includes the following patches
that were posted in the netfilter-devel mailing list.

From Pablo Neira Ayuso:
* nf_tables: add default policy to base chains
* netfilter: nf_tables: add NFTA_CHAIN_USE attribute
* nf_tables: nft_compat: private data of target and matches in contiguous area
* nf_tables: validate hooks for compat match/target
* nf_tables: nft_compat: release cached matches/targets
* nf_tables: x_tables support as a compile time option
* nf_tables: fix alias for xtables over nftables module
* nf_tables: add packet and byte counters per chain
* nf_tables: fix per-chain counter stats if no counters are passed
* nf_tables: don't bump chain stats
* nf_tables: add protocol and flags for xtables over nf_tables
* nf_tables: add ip[6]t_entry emulation
* nf_tables: move specific layer 3 compat code to nf_tables_ipv[4|6]
* nf_tables: support 32bits-64bits x_tables compat
* nf_tables: fix compilation if CONFIG_COMPAT is disabled

From Patrick McHardy:
* nf_tables: move policy to struct nft_base_chain
* nf_tables: send notifications for base chain policy changes

From Alexander Primak:
* nf_tables: remove the duplicate NF_INET_LOCAL_OUT

From Nicolas Dichtel:
* nf_tables: fix compilation when nf-netlink is a module

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 18:00:04 +02:00
Pablo Neira Ayuso
9370761c56 netfilter: nf_tables: convert built-in tables/chains to chain types
This patch converts built-in tables/chains to chain types that
allows you to deploy customized table and chain configurations from
userspace.

After this patch, you have to specify the chain type when
creating a new chain:

 add chain ip filter output { type filter hook input priority 0; }
                              ^^^^ ------

The existing chain types after this patch are: filter, route and
nat. Note that tables are just containers of chains with no specific
semantics, which is a significant change with regards to iptables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 17:16:11 +02:00
Patrick McHardy
20a69341f2 netfilter: nf_tables: add netlink set API
This patch adds the new netlink API for maintaining nf_tables sets
independently of the ruleset. The API supports the following operations:

- creation of sets
- deletion of sets
- querying of specific sets
- dumping of all sets

- addition of set elements
- removal of set elements
- dumping of all set elements

Sets are identified by name, each table defines an individual namespace.
The name of a set may be allocated automatically, this is mostly useful
in combination with the NFT_SET_ANONYMOUS flag, which destroys a set
automatically once the last reference has been released.

Sets can be marked constant, meaning they're not allowed to change while
linked to a rule. This allows to perform lockless operation for set
types that would otherwise require locking.

Additionally, if the implementation supports it, sets can (as before) be
used as maps, associating a data value with each key (or range), by
specifying the NFT_SET_MAP flag and can be used for interval queries by
specifying the NFT_SET_INTERVAL flag.

Set elements are added and removed incrementally. All element operations
support batching, reducing netlink message and set lookup overhead.

The old "set" and "hash" expressions are replaced by a generic "lookup"
expression, which binds to the specified set. Userspace is not aware
of the actual set implementation used by the kernel anymore, all
configuration options are generic.

Currently the implementation selection logic is largely missing and the
kernel will simply use the first registered implementation supporting the
requested operation. Eventually, the plan is to have userspace supply a
description of the data characteristics and select the implementation
based on expected performance and memory use.

This patch includes the new 'lookup' expression to look up for element
matching in the set.

This patch includes kernel-doc descriptions for this set API and it
also includes the following fixes.

From Patrick McHardy:
* netfilter: nf_tables: fix set element data type in dumps
* netfilter: nf_tables: fix indentation of struct nft_set_elem comments
* netfilter: nf_tables: fix oops in nft_validate_data_load()
* netfilter: nf_tables: fix oops while listing sets of built-in tables
* netfilter: nf_tables: destroy anonymous sets immediately if binding fails
* netfilter: nf_tables: propagate context to set iter callback
* netfilter: nf_tables: add loop detection

From Pablo Neira Ayuso:
* netfilter: nf_tables: allow to dump all existing sets
* netfilter: nf_tables: fix wrong type for flags variable in newelem

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 17:16:07 +02:00
Patrick McHardy
96518518cc netfilter: add nftables
This patch adds nftables which is the intended successor of iptables.
This packet filtering framework reuses the existing netfilter hooks,
the connection tracking system, the NAT subsystem, the transparent
proxying engine, the logging infrastructure and the userspace packet
queueing facilities.

In a nutshell, nftables provides a pseudo-state machine with 4 general
purpose registers of 128 bits and 1 specific purpose register to store
verdicts. This pseudo-machine comes with an extensible instruction set,
a.k.a. "expressions" in the nftables jargon. The expressions included
in this patch provide the basic functionality, they are:

* bitwise: to perform bitwise operations.
* byteorder: to change from host/network endianess.
* cmp: to compare data with the content of the registers.
* counter: to enable counters on rules.
* ct: to store conntrack keys into register.
* exthdr: to match IPv6 extension headers.
* immediate: to load data into registers.
* limit: to limit matching based on packet rate.
* log: to log packets.
* meta: to match metainformation that usually comes with the skbuff.
* nat: to perform Network Address Translation.
* payload: to fetch data from the packet payload and store it into
  registers.
* reject (IPv4 only): to explicitly close connection, eg. TCP RST.

Using this instruction-set, the userspace utility 'nft' can transform
the rules expressed in human-readable text representation (using a
new syntax, inspired by tcpdump) to nftables bytecode.

nftables also inherits the table, chain and rule objects from
iptables, but in a more configurable way, and it also includes the
original datatype-agnostic set infrastructure with mapping support.
This set infrastructure is enhanced in the follow up patch (netfilter:
nf_tables: add netlink set API).

This patch includes the following components:

* the netlink API: net/netfilter/nf_tables_api.c and
  include/uapi/netfilter/nf_tables.h
* the packet filter core: net/netfilter/nf_tables_core.c
* the expressions (described above): net/netfilter/nft_*.c
* the filter tables: arp, IPv4, IPv6 and bridge:
  net/ipv4/netfilter/nf_tables_ipv4.c
  net/ipv6/netfilter/nf_tables_ipv6.c
  net/ipv4/netfilter/nf_tables_arp.c
  net/bridge/netfilter/nf_tables_bridge.c
* the NAT table (IPv4 only):
  net/ipv4/netfilter/nf_table_nat_ipv4.c
* the route table (similar to mangle):
  net/ipv4/netfilter/nf_table_route_ipv4.c
  net/ipv6/netfilter/nf_table_route_ipv6.c
* internal definitions under:
  include/net/netfilter/nf_tables.h
  include/net/netfilter/nf_tables_core.h
* It also includes an skeleton expression:
  net/netfilter/nft_expr_template.c
  and the preliminary implementation of the meta target
  net/netfilter/nft_meta_target.c

It also includes a change in struct nf_hook_ops to add a new
pointer to store private data to the hook, that is used to store
the rule list per chain.

This patch is based on the patch from Patrick McHardy, plus merged
accumulated cleanups, fixes and small enhancements to the nftables
code that has been done since 2009, which are:

From Patrick McHardy:
* nf_tables: adjust netlink handler function signatures
* nf_tables: only retry table lookup after successful table module load
* nf_tables: fix event notification echo and avoid unnecessary messages
* nft_ct: add l3proto support
* nf_tables: pass expression context to nft_validate_data_load()
* nf_tables: remove redundant definition
* nft_ct: fix maxattr initialization
* nf_tables: fix invalid event type in nf_tables_getrule()
* nf_tables: simplify nft_data_init() usage
* nf_tables: build in more core modules
* nf_tables: fix double lookup expression unregistation
* nf_tables: move expression initialization to nf_tables_core.c
* nf_tables: build in payload module
* nf_tables: use NFPROTO constants
* nf_tables: rename pid variables to portid
* nf_tables: save 48 bits per rule
* nf_tables: introduce chain rename
* nf_tables: check for duplicate names on chain rename
* nf_tables: remove ability to specify handles for new rules
* nf_tables: return error for rule change request
* nf_tables: return error for NLM_F_REPLACE without rule handle
* nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
* nf_tables: fix NLM_F_MULTI usage in netlink notifications
* nf_tables: include NLM_F_APPEND in rule dumps

From Pablo Neira Ayuso:
* nf_tables: fix stack overflow in nf_tables_newrule
* nf_tables: nft_ct: fix compilation warning
* nf_tables: nft_ct: fix crash with invalid packets
* nft_log: group and qthreshold are 2^16
* nf_tables: nft_meta: fix socket uid,gid handling
* nft_counter: allow to restore counters
* nf_tables: fix module autoload
* nf_tables: allow to remove all rules placed in one chain
* nf_tables: use 64-bits rule handle instead of 16-bits
* nf_tables: fix chain after rule deletion
* nf_tables: improve deletion performance
* nf_tables: add missing code in route chain type
* nf_tables: rise maximum number of expressions from 12 to 128
* nf_tables: don't delete table if in use
* nf_tables: fix basechain release

From Tomasz Bursztyka:
* nf_tables: Add support for changing users chain's name
* nf_tables: Change chain's name to be fixed sized
* nf_tables: Add support for replacing a rule by another one
* nf_tables: Update uapi nftables netlink header documentation

From Florian Westphal:
* nft_log: group is u16, snaplen u32

From Phil Oester:
* nf_tables: operational limit match

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 17:15:48 +02:00
David S. Miller
53af53ae83 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	include/linux/netdevice.h
	net/core/sock.c

Trivial merge issues.

Removal of "extern" for functions declaration in netdevice.h
at the same time "const" was added to an argument.

Two parallel line additions in net/core/sock.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08 23:07:53 -04:00
David S. Miller
d639feaaf3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter updates for your net-next tree,
mostly ipset improvements and enhancements features, they are:

* Don't call ip_nest_end needlessly in the error path from me, suggested
  by Pablo Neira Ayuso, from Jozsef Kadlecsik.

* Fixed sparse warnings about shadowed variable and missing rcu annotation
  and fix of "may be used uninitialized" warnings, also from Jozsef.

* Renamed simple macro names to avoid namespace issues, reported by David
  Laight, again from Jozsef.

* Use fix sized type for timeout in the extension part, and cosmetic
  ordering of matches and targets separatedly in xt_set.c, from Jozsef.

* Support package fragments for IPv4 protos without ports from Anders K.
  Pedersen. For example this allows a hash:ip,port ipset containing the
  entry 192.168.0.1,gre:0 to match all package fragments for PPTP VPN
  tunnels to/from the host. Without this patch only the first package
  fragment (with fragment offset 0) was matched.

* Introduced a new operation to get both setname and family, from Jozsef.
  ip[6]tables set match and SET target need to know the family of the set
  in order to reject adding rules which refer to a set with a non-mathcing
  family. Currently such rules are silently accepted and then ignored
  instead of generating an error message to the user.

* Reworked extensions support in ipset types from Jozsef. The approach of
  defining structures with all variations is not manageable as the
  number of extensions grows. Therefore a blob for the extensions is
  introduced, somewhat similar to conntrack. The support of extensions
  which need a per data destroy function is added as well.

* When an element timed out in a list:set type of set, the garbage
  collector skipped the checking of the next element. So the purging
  was delayed to the next run of the gc, fixed by Jozsef.

* A small Kconfig fix: NETFILTER_NETLINK cannot be selected and
  ipset requires it.

* hash:net,net type from Oliver Smith. The type provides the ability to
  store pairs of subnets in a set.

* Comment for ipset entries from Oliver Smith. This makes possible to
  annotate entries in a set with comments, for example:

  ipset n foo hash:net,net comment
  ipset a foo 10.0.0.0/21,192.168.1.0/24 comment "office nets A and B"

* Fix of hash types resizing with comment extension from Jozsef.

* Fix of new extensions for list:set type when an element is added
  into a slot from where another element was pushed away from Jozsef.

* Introduction of a common function for the listing of the element
  extensions from Jozsef.

* Net namespace support for ipset from Vitaly Lavrov.

* hash:net,port,net type from Oliver Smith, which makes possible
  to store the triples of two subnets and a protocol, port pair in
  a set.

* Get xt_TCPMSS working with net namespace, by Gao feng.

* Use the proper net netnamespace to allocate skbs, also by Gao feng.

* A couple of cleanups for the conntrack SIP helper, by Holger
  Eitzenberger.

* Extend cttimeout to allow setting default conntrack timeouts via
  nfnetlink, so we can get rid of all our sysctl/proc interfaces in
  the future for timeout tuning, from me.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-04 13:26:38 -04:00
Nikolay Aleksandrov
32819dc183 bonding: modify the old and add new xmit hash policies
This patch adds two new hash policy modes which use skb_flow_dissect:
3 - Encapsulated layer 2+3
4 - Encapsulated layer 3+4
There should be a good improvement for tunnel users in those modes.
It also changes the old hash functions to:
hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
hash ^= (hash >> 16);
hash ^= (hash >> 8);

Where hash will be initialized either to L2 hash, that is
SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted
from the upper layer. Flow's dst and src are also extracted based on the
xmit policy either directly from the buffer or by using skb_flow_dissect,
but in both cases if the protocol is IPv6 then dst and src are obtained by
ipv6_addr_hash() on the real addresses. In case of a non-dissectable
packet, the algorithms fall back to L2 hashing.
The bond_set_mode_ops() function is now obsolete and thus deleted
because it was used only to set the proper hash policy. Also we trim a
pointer from struct bonding because we no longer need to keep the hash
function, now there's only a single hash function - bond_xmit_hash that
works based on bond->params.xmit_policy.

The hash function and skb_flow_dissect were suggested by Eric Dumazet.
The layer names were suggested by Andy Gospodarek, because I suck at
semantics.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03 15:36:38 -04:00
stephen hemminger
5bc3db5c9c tc: export tc_defact.h to userspace
Jamal sent patch to add tc user simple actions to iproute2
but required header was not being exported.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-02 16:39:11 -04:00
David S. Miller
4fbef95af4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be.h
	drivers/net/usb/qmi_wwan.c
	drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h
	include/net/netfilter/nf_conntrack_synproxy.h
	include/net/secure_seq.h

The conflicts are of two varieties:

1) Conflicts with Joe Perches's 'extern' removal from header file
   function declarations.  Usually it's an argument signature change
   or a function being added/removed.  The resolutions are trivial.

2) Some overlapping changes in qmi_wwan.c and be.h, one commit adds
   a new value, another changes an existing value.  That sort of
   thing.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 17:06:14 -04:00
Pablo Neira Ayuso
91cb498e6a netfilter: cttimeout: allow to set/get default protocol timeouts
Default timeouts are currently set via proc/sysctl interface, the
typical pattern is a file name like:

/proc/sys/net/netfilter/nf_conntrack_PROTOCOL_timeout_STATE

This results in one entry per default protocol state timeout.
This patch simplifies this by allowing to set default protocol
timeouts via cttimeout netlink interface.

This should allow us to get rid of the existing proc/sysctl code
in the midterm.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-01 13:17:39 +02:00
David S. Miller
06b0a9a4b2 Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 19:14:20 -04:00
Oliver Smith
68b63f08d2 netfilter: ipset: Support comments for ipset entries in the core.
This adds the core support for having comments on ipset entries.

The comments are stored as standard null-terminated strings in
dynamically allocated memory after being passed to the kernel. As a
result of this, code has been added to the generic destroy function to
iterate all extensions and call that extension's destroy task if the set
has that extension activated, and if such a task is defined.

Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:33:28 +02:00
Jozsef Kadlecsik
5e04c0c38c netfilter: ipset: Introduce new operation to get both setname and family
ip[6]tables set match and SET target need to know the family of the set
in order to reject adding rules which refer to a set with a non-mathcing
family. Currently such rules are silently accepted and then ignored
instead of generating a clear error message to the user, which is not
helpful.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:33:26 +02:00
Linus Torvalds
b97b869a83 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Nothing too major, radeon still has some dpm changes for off by
  default.

  Radeon, intel, msm:
   - radeon: a few more dpm fixes (still off by default), uvd fixes
   - i915: runtime warn backtrace and regression fix
   - msm: iommu changes fallout"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (27 commits)
  drm/msm: use drm_gem_dumb_destroy helper
  drm/msm: deal with mach/iommu.h removal
  drm/msm: Remove iommu include from mdp4_kms.c
  drm/msm: Odd PTR_ERR usage
  drm/i915: Fix up usage of SHRINK_STOP
  drm/radeon: fix hdmi audio on DCE3.0/3.1 asics
  drm/i915: preserve pipe A quirk in i9xx_set_pipeconf
  drm/i915/tv: clear adjusted_mode.flags
  drm/i915/dp: increase i2c-over-aux retry interval on AUX DEFER
  drm/radeon/cik: fix overflow in vram fetch
  drm/radeon: add missing hdmi callbacks for rv6xx
  drm/i915: Use a temporary va_list for two-pass string handling
  drm/radeon/uvd: lower msg&fb buffer requirements on UVD3
  drm/radeon: disable tests/benchmarks if accel is disabled
  drm/radeon: don't set default clocks for SI when DPM is disabled
  drm/radeon/dpm/ci: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm/si: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm/ni: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm/btc: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm: fetch the max clk from voltage dep tables helper
  ...
2013-09-29 10:02:40 -07:00
Eric Dumazet
62748f32d5 net: introduce SO_MAX_PACING_RATE
As mentioned in commit afe4fd0624 ("pkt_sched: fq: Fair Queue packet
scheduler"), this patch adds a new socket option.

SO_MAX_PACING_RATE offers the application the ability to cap the
rate computed by transport layer. Value is in bytes per second.

u32 val = 1000000;
setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));

To be effectively paced, a flow must use FQ packet scheduler.

Note that a packet scheduler takes into account the headers for its
computations. The effective payload rate depends on MSS and retransmits
if any.

I chose to make this pacing rate a SOL_SOCKET option instead of a
TCP one because this can be used by other protocols.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-28 15:35:41 -07:00
Dave Airlie
41ed7fe92f Merge branch 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
More radeon fixes for 3.12.  Kind of all over the place: UVD, DPM,
tiling, etc.

* 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon: fix hdmi audio on DCE3.0/3.1 asics
  drm/radeon/cik: fix overflow in vram fetch
  drm/radeon: add missing hdmi callbacks for rv6xx
  drm/radeon/uvd: lower msg&fb buffer requirements on UVD3
  drm/radeon: disable tests/benchmarks if accel is disabled
  drm/radeon: don't set default clocks for SI when DPM is disabled
  drm/radeon/dpm/ci: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm/si: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm/ni: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm/btc: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm: fetch the max clk from voltage dep tables helper
  drm/radeon: fix missed variable sized access
  drm/radeon: Make r100_cp_ring_info() and radeon_ring_gfx() safe (v2)
  drm/radeon/cik: Add tiling mode index for 1D tiled depth/stencil surfaces
  drm/radeon/cik: Fix encoding of number of banks in tiling configuration info
  drm/radeon/cik: Fix printing of client name on VM protection fault
  drm/radeon: additional gcc fixes for radeon_atombios.c
  drm/radeon: avoid UVD corruption on AGP cards using GPU gart
2013-09-28 14:45:30 +10:00
Uwe Kleine-König
1c2da13c21 can: add explicit copyrights to can's netlink header
This file is copied to the source code of user space applications (in
this case can-utils) and so it makes sense to mention explicitly their
copyright.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2013-09-21 15:43:12 +02:00
Uwe Kleine-König
2485602f1a can: add explicit copyrights to can headers
These files are copied to the source code of user space applications (in
this case can-utils) and so it makes sense to mention explicitly their
copyright. I added the terms of C code that was introduced in the same
commit as these headers.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2013-09-21 15:43:12 +02:00
Michel Dänzer
42baf21d91 drm/radeon/cik: Add tiling mode index for 1D tiled depth/stencil surfaces
CIK uses a different index for 1D DST surfaces compared to SI.  Expose
the new index so libdrm_radeon can use it properly for userspace
drivers.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-09-20 17:33:40 -04:00
Eric Dumazet
df62cdf348 net_sched: htb: support of 64bit rates
HTB already can deal with 64bit rates, we only have to add two new
attributes so that tc can use them to break the current 32bit ABI
barrier.

TCA_HTB_RATE64 : class rate  (in bytes per second)
TCA_HTB_CEIL64 : class ceil  (in bytes per second)

This allows us to setup HTB on 40Gbps links, as 32bit limit is
actually ~34Gbps

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-20 14:41:03 -04:00
Peter Zijlstra
fa73158710 perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
Solve the problems around the broken definition of perf_event_mmap_page::
cap_usr_time and cap_usr_rdpmc fields which used to overlap, partially
fixed by:

  860f085b74 ("perf: Fix broken union in 'struct perf_event_mmap_page'")

The problem with the fix (merged in v3.12-rc1 and not yet released
officially), noticed by Vince Weaver is that the new behavior is
not detectable by new user-space, and that due to the reuse of the
field names it's easy to mis-compile a binary if old headers are used
on a new kernel or new headers are used on an old kernel.

To solve all that make this change explicit, detectable and self-contained,
by iterating the ABI the following way:

 - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not
   confuse old user-space binaries. RDPMC will be marked as unavailable
   to old binaries but that's within the ABI, this is a capability bit.

 - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
   libraries can reliably detect that bit 0 is deprecated and perma-zero
   without having to check the kernel version.

 - Use bits 2, 3, 4 for the newly defined, correct functionality:

	cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
	cap_user_time		: 1, /* The time_* fields are used */
	cap_user_time_zero	: 1, /* The time_zero field is used */

 - Rename all the bitfield names in perf_event.h to be different from the
   old names, to make sure it's not possible to mis-compile it
   accidentally with old assumptions.

The 'size' field can then be used in the future to add new fields and it
will act as a natural ABI version indicator as well.

Also adjust tools/perf/ userspace for the new definitions, noticed by
Adrian Hunter.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Also-Fixed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/n/tip-zr03yxjrpXesOzzupszqglbv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 09:45:11 +02:00
Peter Zijlstra
c5ecceefdb perf: Update ABI comment
For some mysterious reason the sample_id field of PERF_RECORD_MMAP went AWOL.

Reported-by: Vince Weaver <vince@deater.net>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 06:54:34 +02:00
Linus Torvalds
186844b292 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two small fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix UAPI export of PERF_EVENT_IOC_ID
  perf/x86/intel: Fix Silvermont offcore masks
2013-09-18 11:22:53 -05:00
Vince Weaver
a8e0108cac perf: Fix UAPI export of PERF_EVENT_IOC_ID
Without the following patch I have problems compiling code using
the new PERF_EVENT_IOC_ID ioctl().  It looks like u64 was used
instead of __u64

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1309171450380.11444@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-18 11:29:07 +02:00
Linus Torvalds
8bf5e36d04 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input update from Dmitry Torokhov:
 "The only change is David Hermann's new EVIOCREVOKE evdev ioctl that
  allows safely passing file descriptors to input devices to session
  processes and later being able to stop delivery of events through
  these fds so that inactive sessions will no longer receive user input
  that does not belong to them"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: evdev - add EVIOCREVOKE ioctl
2013-09-15 07:13:39 -04:00
Linus Torvalds
26935fb06e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 4 from Al Viro:
 "list_lru pile, mostly"

This came out of Andrew's pile, Al ended up doing the merge work so that
Andrew didn't have to.

Additionally, a few fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
  super: fix for destroy lrus
  list_lru: dynamically adjust node arrays
  shrinker: Kill old ->shrink API.
  shrinker: convert remaining shrinkers to count/scan API
  staging/lustre/libcfs: cleanup linux-mem.h
  staging/lustre/ptlrpc: convert to new shrinker API
  staging/lustre/obdclass: convert lu_object shrinker to count/scan API
  staging/lustre/ldlm: convert to shrinkers to count/scan API
  hugepage: convert huge zero page shrinker to new shrinker API
  i915: bail out earlier when shrinker cannot acquire mutex
  drivers: convert shrinkers to new count/scan API
  fs: convert fs shrinkers to new scan/count API
  xfs: fix dquot isolation hang
  xfs-convert-dquot-cache-lru-to-list_lru-fix
  xfs: convert dquot cache lru to list_lru
  xfs: rework buffer dispose list tracking
  xfs-convert-buftarg-lru-to-generic-code-fix
  xfs: convert buftarg LRU to generic code
  fs: convert inode and dentry shrinking to be node aware
  vmscan: per-node deferred work
  ...
2013-09-12 15:01:38 -07:00
Linus Torvalds
b7c09ad401 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This is against 3.11-rc7, but was pulled and tested against your tree
  as of yesterday.  We do have two small incrementals queued up, but I
  wanted to get this bunch out the door before I hop on an airplane.

  This is a fairly large batch of fixes, performance improvements, and
  cleanups from the usual Btrfs suspects.

  We've included Stefan Behren's work to index subvolume UUIDs, which is
  targeted at speeding up send/receive with many subvolumes or snapshots
  in place.  It closes a long standing performance issue that was built
  in to the disk format.

  Mark Fasheh's offline dedup work is also here.  In this case offline
  means the FS is mounted and active, but the dedup work is not done
  inline during file IO.  This is a building block where utilities are
  able to ask the FS to dedup a series of extents.  The kernel takes
  care of verifying the data involved really is the same.  Today this
  involves reading both extents, but we'll continue to evolve the
  patches"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (118 commits)
  Btrfs: optimize key searches in btrfs_search_slot
  Btrfs: don't use an async starter for most of our workers
  Btrfs: only update disk_i_size as we remove extents
  Btrfs: fix deadlock in uuid scan kthread
  Btrfs: stop refusing the relocation of chunk 0
  Btrfs: fix memory leak of uuid_root in free_fs_info
  btrfs: reuse kbasename helper
  btrfs: return btrfs error code for dev excl ops err
  Btrfs: allow partial ordered extent completion
  Btrfs: convert all bug_ons in free-space-cache.c
  Btrfs: add support for asserts
  Btrfs: adjust the fs_devices->missing count on unmount
  Btrf: cleanup: don't check for root_refs == 0 twice
  Btrfs: fix for patch "cleanup: don't check the same thing twice"
  Btrfs: get rid of one BUG() in write_all_supers()
  Btrfs: allocate prelim_ref with a slab allocater
  Btrfs: pass gfp_t to __add_prelim_ref() to avoid always using GFP_ATOMIC
  Btrfs: fix race conditions in BTRFS_IOC_FS_INFO ioctl
  Btrfs: fix race between removing a dev and writing sbs
  Btrfs: remove ourselves from the cluster list under lock
  ...
2013-09-12 09:58:51 -07:00
Linus Torvalds
7b7a2f0a31 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "CIFS update including case insensitive file name matching improvements
  for UTF-8 to Unicode, various small cifs fixes, SMB2/SMB3 leasing
  improvements, support for following SMB2 symlinks, SMB3 packet signing
  improvements"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (25 commits)
  CIFS: Respect epoch value from create lease context v2
  CIFS: Add create lease v2 context for SMB3
  CIFS: Move parsing lease buffer to ops struct
  CIFS: Move creating lease buffer to ops struct
  CIFS: Store lease state itself rather than a mapped oplock value
  CIFS: Replace clientCanCache* bools with an integer
  [CIFS] quiet sparse compile warning
  cifs: Start using per session key for smb2/3 for signature generation
  cifs: Add a variable specific to NTLMSSP for key exchange.
  cifs: Process post session setup code in respective dialect functions.
  CIFS: convert to use le32_add_cpu()
  CIFS: Fix missing lease break
  CIFS: Fix a memory leak when a lease break comes
  cifs: add winucase_convert.pl to Documentation/ directory
  cifs: convert case-insensitive dentry ops to use new case conversion routines
  cifs: add new case-insensitive conversion routines that are based on wchar_t's
  [CIFS] Add Scott to list of cifs contributors
  cifs: Move and expand MAX_SERVER_SIZE definition
  cifs: Expand max share name length to 256
  cifs: Move string length definitions to uapi
  ...
2013-09-12 07:41:12 -07:00
Glauber Costa
3942c07ccf fs: bump inode and dentry counters to long
This series reworks our current object cache shrinking infrastructure in
two main ways:

 * Noticing that a lot of users copy and paste their own version of LRU
   lists for objects, we put some effort in providing a generic version.
   It is modeled after the filesystem users: dentries, inodes, and xfs
   (for various tasks), but we expect that other users could benefit in
   the near future with little or no modification.  Let us know if you
   have any issues.

 * The underlying list_lru being proposed automatically and
   transparently keeps the elements in per-node lists, and is able to
   manipulate the node lists individually.  Given this infrastructure, we
   are able to modify the up-to-now hammer called shrink_slab to proceed
   with node-reclaim instead of always searching memory from all over like
   it has been doing.

Per-node lru lists are also expected to lead to less contention in the lru
locks on multi-node scans, since we are now no longer fighting for a
global lock.  The locks usually disappear from the profilers with this
change.

Although we have no official benchmarks for this version - be our guest to
independently evaluate this - earlier versions of this series were
performance tested (details at
http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no
visible performance regressions while yielding a better qualitative
behavior in NUMA machines.

With this infrastructure in place, we can use the list_lru entry point to
provide memcg isolation and per-memcg targeted reclaim.  Historically,
those two pieces of work have been posted together.  This version presents
only the infrastructure work, deferring the memcg work for a later time,
so we can focus on getting this part tested.  You can see more about the
history of such work at http://lwn.net/Articles/552769/

Dave Chinner (18):
  dcache: convert dentry_stat.nr_unused to per-cpu counters
  dentry: move to per-sb LRU locks
  dcache: remove dentries from LRU before putting on dispose list
  mm: new shrinker API
  shrinker: convert superblock shrinkers to new API
  list: add a new LRU list type
  inode: convert inode lru list to generic lru list code.
  dcache: convert to use new lru list infrastructure
  list_lru: per-node list infrastructure
  shrinker: add node awareness
  fs: convert inode and dentry shrinking to be node aware
  xfs: convert buftarg LRU to generic code
  xfs: rework buffer dispose list tracking
  xfs: convert dquot cache lru to list_lru
  fs: convert fs shrinkers to new scan/count API
  drivers: convert shrinkers to new count/scan API
  shrinker: convert remaining shrinkers to count/scan API
  shrinker: Kill old ->shrink API.

Glauber Costa (7):
  fs: bump inode and dentry counters to long
  super: fix calculation of shrinkable objects for small numbers
  list_lru: per-node API
  vmscan: per-node deferred work
  i915: bail out earlier when shrinker cannot acquire mutex
  hugepage: convert huge zero page shrinker to new shrinker API
  list_lru: dynamically adjust node arrays

This patch:

There are situations in very large machines in which we can have a large
quantity of dirty inodes, unused dentries, etc.  This is particularly true
when umounting a filesystem, where eventually since every live object will
eventually be discarded.

Dave Chinner reported a problem with this while experimenting with the
shrinker revamp patchset.  So we believe it is time for a change.  This
patch just moves int to longs.  Machines where it matters should have a
big long anyway.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:29 -04:00
Linus Torvalds
7426d62871 Add the ability to collect I/O statistics on user-defined regions of a
device-mapper device.  This dm-stats code required the reintroduction of
 a div64_u64_rem() helper, but as a separate method that doesn't slow
 down div64_u64() -- especially on 32-bit systems.
 
 Allow the error target to replace request-based DM devices
 (e.g. multipath) in addition to bio-based DM devices.
 
 Various other small code fixes and improvements to thin-provisioning, DM
 cache and the DM ioctl interface.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJSLyNnAAoJEMUj8QotnQNaXVEIAKA1l43enaGiROBZEZXgAGUY
 1JUsnHES4ujyn/jtT39jPTQf9AW/rS4FUCrZiXG2aaNHXo7+7cdVoBHAiWc7mXad
 budBSqn47W7WDyFlQarKwsuYFcdLnqdnieRDMXQ1cN5dl4Rx61LclnsylQd4SSS0
 lznXkfOTquetDSuEPOuUHJDZufdacw3PpxWbTKGJld40fd7YZfGWQoG0ek1OeqqL
 fA30DTlYnkFyhheLCjFcDY6H55Rt7QpBWOUAa2XXLR6GLfk5iFK99autjWk2xTPT
 nppRwQrw9VH+HdW0jGLU+LRs1Y3nxwT9OBLWt9wav87Smdg/7jQAjwde9eKbO2k=
 =3ooH
 -----END PGP SIGNATURE-----

Merge tag 'dm-3.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device-mapper updates from Mike Snitzer:
 "Add the ability to collect I/O statistics on user-defined regions of a
  device-mapper device.  This dm-stats code required the reintroduction
  of a div64_u64_rem() helper, but as a separate method that doesn't
  slow down div64_u64() -- especially on 32-bit systems.

  Allow the error target to replace request-based DM devices (e.g.
  multipath) in addition to bio-based DM devices.

  Various other small code fixes and improvements to thin-provisioning,
  DM cache and the DM ioctl interface"

* tag 'dm-3.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm stripe: silence a couple sparse warnings
  dm: add statistics support
  dm thin: always return -ENOSPC if no_free_space is set
  dm ioctl: cleanup error handling in table_load
  dm ioctl: increase granularity of type_lock when loading table
  dm ioctl: prevent rename to empty name or uuid
  dm thin: set pool read-only if breaking_sharing fails block allocation
  dm thin: prefix pool error messages with pool device name
  dm: allow error target to replace bio-based and request-based targets
  math64: New separate div64_u64_rem helper
  dm space map: optimise sm_ll_dec and sm_ll_inc
  dm btree: prefetch child nodes when walking tree for a dm_btree_del
  dm btree: use pop_frame in dm_btree_del to cleanup code
  dm cache: eliminate holes in cache structure
  dm cache: fix stacking of geometry limits
  dm thin: fix stacking of geometry limits
  dm thin: add data block size limits to Documentation
  dm cache: add data block size limits to code and Documentation
  dm cache: document metadata device is exclussive to a cache
  dm: stop using WQ_NON_REENTRANT
2013-09-10 13:06:15 -07:00
Linus Torvalds
300893b08f xfs: update for v3.12-rc1
For 3.12-rc1 there are a number of bugfixes in addition to work to ease usage
 of shared code between libxfs and the kernel, the rest of the work to enable
 project and group quotas to be used simultaneously, performance optimisations
 in the log and the CIL, directory entry file type support, fixes for log space
 reservations, some spelling/grammar cleanups, and the addition of user
 namespace support.
 
 - introduce readahead to log recovery
 - add directory entry file type support
 - fix a number of spelling errors in comments
 - introduce new Q_XGETQSTATV quotactl for project quotas
 - add USER_NS support
 - log space reservation rework
 - CIL optimisations
 - kernel/userspace libxfs rework
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJSLeikAAoJENaLyazVq6ZOciEP/3tc850sQsPlNwP9aqd1l2Wk
 S1RJ8i+MUQ2W/PlbswCXvdUCT8DIwXWxL31tGvi8vtaLhh6t8ICSZwqNil+/GCIJ
 BErVvY4oXhEMHhlbIRRvpxblTfJGiYy3puUEz9VI0yDdUVnC33+DuEeLTQ/0mibo
 /UUqKFmM3KYpOc8vIQvH5K5i8PkjtMt9yge0k4l9COD30gtY2okkaD4b1voOsKc+
 5YFqulq7zcXBUYti+EFCQeV8aUBTGEPN4PJRdcS12/ylzsTzZivAOO+QREu7qBW8
 x+Gj8fOC+yYWCttmJlfa1n8taxge3ndEuzKN97nvvfQgjvvunMvwJ499skryYVdB
 EcPnBnpDUQuz/y7exKBT9uROK817vZBtfHzSova29ayQSWC+qDpNE4xXeDIqeCtT
 CPxdHuWMOvIdZg41E4x7je0elaZl8EAZ8hycc2WuRhtukEkIdE1O8aD7IVrMYee8
 kg+aVHG5nmYRInO1WuMinbtiCzwvVoBJToWM3y4cbfgW0dILASRyL53HDd+eCr1j
 kOpPIVgXlBZgiPMmdYahWxyVVWcE7zyex0w4frzWVlJMZ4lP5brppD6qfQg1JwOB
 z21Y95F5C2GxSyN/Lwps0G6jujHrpe6GVeYK7uKCtnqTD83nSShv5Naln7pQ3AUs
 qUMsqmJob4+bwt94Xgbx
 =V4s4
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs updates from Ben Myers:
 "For 3.12-rc1 there are a number of bugfixes in addition to work to
  ease usage of shared code between libxfs and the kernel, the rest of
  the work to enable project and group quotas to be used simultaneously,
  performance optimisations in the log and the CIL, directory entry file
  type support, fixes for log space reservations, some spelling/grammar
  cleanups, and the addition of user namespace support.

   - introduce readahead to log recovery
   - add directory entry file type support
   - fix a number of spelling errors in comments
   - introduce new Q_XGETQSTATV quotactl for project quotas
   - add USER_NS support
   - log space reservation rework
   - CIL optimisations
  - kernel/userspace libxfs rework"

* tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs: (112 commits)
  xfs: XFS_MOUNT_QUOTA_ALL needed by userspace
  xfs: dtype changed xfs_dir2_sfe_put_ino to xfs_dir3_sfe_put_ino
  Fix wrong flag ASSERT in xfs_attr_shortform_getvalue
  xfs: finish removing IOP_* macros.
  xfs: inode log reservations are too small
  xfs: check correct status variable for xfs_inobt_get_rec() call
  xfs: inode buffers may not be valid during recovery readahead
  xfs: check LSN ordering for v5 superblocks during recovery
  xfs: btree block LSN escaping to disk uninitialised
  XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568
  xfs: fix bad dquot buffer size in log recovery readahead
  xfs: don't account buffer cancellation during log recovery readahead
  xfs: check for underflow in xfs_iformat_fork()
  xfs: xfs_dir3_sfe_put_ino can be static
  xfs: introduce object readahead to log recovery
  xfs: Simplify xfs_ail_min() with list_first_entry_or_null()
  xfs: Register hotcpu notifier after initialization
  xfs: add xfs sb v4 support for dirent filetype field
  xfs: Add write support for dirent filetype field
  xfs: Add read-only support for dirent filetype field
  ...
2013-09-09 11:19:09 -07:00
Linus Torvalds
d75671e36e VFIO updates include safer default file flags for VFIO device fds,
an external user interface exported to allow other modules to hold
 references to VFIO groups, a fix to test for extended config space
 on PCIe and PCI-x, and new hot reset interfaces for PCI devices
 which allows the user to do PCI bus/slot resets when all of the
 devices affected by the reset are owned by the user.
 
 For this last feature, the PCI bus reset interface, I depend on
 changes already merged from Bjorn's PCI pull request.  I therefore
 merged my tree up to commit cb3e433, which I think was the correct
 action, but as Stephen Rothwell noted, I failed to provide a commit
 message indicating why the merge was required.  Sorry for that.
 Thanks,
 Alex
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJSLKr0AAoJECObm247sIsiouMP/1rym/JDjSsjIONvjnQEH5hk
 aZy3gGjqvSBBSN2aWqYkEgsSmgh4lSdDAbSBpY6vfz4Q0h7OXhSZcZ5oG2eGECia
 rN6GvfI/g2mYBCHJbS30bEkpljx9ssMZi7n9dnWOWEkzXSTC/uLHx9DCUiHdczPF
 SEeEHbxAnCTL5S5SQgUwhjPI813eVPjm94KDZK6pHcd14GGtM/piGjJNkyfzymRZ
 0l7lu95YaY70qL1Bk1EZ2TlsqWtYeKfDXsp465U3uBj5eXslMKpxXBn/XndRWu25
 ZnjRitF/izMZdcBgIM4uVK/yDhdAI2qav1rtkVirqp9Qrs9BgJXxUpO3p0e/GhFl
 KZgq3DnE1aUfJlyu+YGmjIKWDT+CBrkkxidRwYoaBCs3if/1E19zUH5Gfuuimy/D
 rpET3HPuJlPhxyE2vH9u6EVTybBx4wlvDP43FyhetriQjwv+SuPP2oNWD96L/qBY
 BHcSMHfkyq90Lp6tjCeTISkwOb66qqkRUlzycPiB67BgrIea8yn14lKpE7GLcgmO
 XSLLRvv6tFD2VcqEWQZNLoLo8XqAngvgStbOXxiTFpSQD4h2Z0TbpSc9qdVZlgIt
 JBMA0BuDj+gzCDePUlsuM5L1EVOaFuKSHiZA6MUewDBqiiKoWPPsDKsFpKt2jFv4
 Mp47JcWkDBg1DCC8nQ7f
 =FAuJ
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v3.12-rc0' of git://github.com/awilliam/linux-vfio

Pull VFIO update from Alex Williamson:
 "VFIO updates include safer default file flags for VFIO device fds, an
  external user interface exported to allow other modules to hold
  references to VFIO groups, a fix to test for extended config space on
  PCIe and PCI-x, and new hot reset interfaces for PCI devices which
  allows the user to do PCI bus/slot resets when all of the devices
  affected by the reset are owned by the user.

  For this last feature, the PCI bus reset interface, I depend on
  changes already merged from Bjorn's PCI pull request.  I therefore
  merged my tree up to commit cb3e433, which I think was the correct
  action, but as Stephen Rothwell noted, I failed to provide a commit
  message indicating why the merge was required.  Sorry for that.
  Thanks, Alex"

* tag 'vfio-v3.12-rc0' of git://github.com/awilliam/linux-vfio:
  vfio: fix documentation
  vfio-pci: PCI hot reset interface
  vfio-pci: Test for extended config space
  vfio-pci: Use fdget() rather than eventfd_fget()
  vfio: Add O_CLOEXEC flag to vfio device fd
  vfio: use get_unused_fd_flags(0) instead of get_unused_fd()
  vfio: add external user support
2013-09-09 10:19:36 -07:00
Scott Lovenberg
cdf1246ffb cifs: Move and expand MAX_SERVER_SIZE definition
MAX_SERVER_SIZE has been moved to cifs_mount.h and renamed
CIFS_NI_MAXHOST for clarity.  It has been expanded to 1024 as the
previous value of 16 was very short.

Signed-off-by: Scott Lovenberg <scott.lovenberg@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-09-08 14:34:22 -05:00
Scott Lovenberg
54fcf270de cifs: Expand max share name length to 256
The old max share name length limit was 80 due to Windows NET SHARE
command not allowing more than that.  However, share names can be much
longer.  This is a more reasonable maximum share name length.

Signed-off-by: Scott Lovenberg <scott.lovenberg@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-09-08 14:34:17 -05:00
Scott Lovenberg
8c3a2b4c42 cifs: Move string length definitions to uapi
The max string length definitions for user name, domain name, password,
and share name have been moved into their own header file in uapi so the
mount helper can use autoconf to define them instead of keeping the
kernel side and userland side definitions in sync manually.  The names
have also been standardized with a "CIFS" prefix and "LEN" suffix.

Signed-off-by: Scott Lovenberg <scott.lovenberg@gmail.com>
Reviewed-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-09-08 14:34:11 -05:00
Linus Torvalds
b409624ad5 Merge git://git.infradead.org/users/willy/linux-nvme
Pull NVM Express driver update from Matthew Wilcox.

* git://git.infradead.org/users/willy/linux-nvme:
  NVMe: Merge issue on character device bring-up
  NVMe: Handle ioremap failure
  NVMe: Add pci suspend/resume driver callbacks
  NVMe: Use normal shutdown
  NVMe: Separate controller init from disk discovery
  NVMe: Separate queue alloc/free from create/delete
  NVMe: Group pci related actions in functions
  NVMe: Disk stats for read/write commands only
  NVMe: Bring up cdev on set feature failure
  NVMe: Fix checkpatch issues
  NVMe: Namespace IDs are unsigned
  NVMe: Update nvme_id_power_state with latest spec
  NVMe: Split header file into user-visible and kernel-visible pieces
  NVMe: Call nvme_process_cq from submission path
  NVMe: Remove "process_cq did something" message
  NVMe: Return correct value from interrupt handler
  NVMe: Disk IO statistics
  NVMe: Restructure MSI / MSI-X setup
  NVMe: Use kzalloc instead of kmalloc+memset
2013-09-07 20:19:02 -07:00
Linus Torvalds
11c7b03d42 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Nothing major for this kernel, just maintenance updates"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (21 commits)
  apparmor: add the ability to report a sha1 hash of loaded policy
  apparmor: export set of capabilities supported by the apparmor module
  apparmor: add the profile introspection file to interface
  apparmor: add an optional profile attachment string for profiles
  apparmor: add interface files for profiles and namespaces
  apparmor: allow setting any profile into the unconfined state
  apparmor: make free_profile available outside of policy.c
  apparmor: rework namespace free path
  apparmor: update how unconfined is handled
  apparmor: change how profile replacement update is done
  apparmor: convert profile lists to RCU based locking
  apparmor: provide base for multiple profiles to be replaced at once
  apparmor: add a features/policy dir to interface
  apparmor: enable users to query whether apparmor is enabled
  apparmor: remove minimum size check for vmalloc()
  Smack: parse multiple rules per write to load2, up to PAGE_SIZE-1 bytes
  Smack: network label match fix
  security: smack: add a hash table to quicken smk_find_entry()
  security: smack: fix memleak in smk_write_rules_list()
  xattr: Constify ->name member of "struct xattr".
  ...
2013-09-07 14:34:07 -07:00
David Herrmann
c7dc65737c Input: evdev - add EVIOCREVOKE ioctl
If we have multiple sessions on a system, we normally don't want
background sessions to read input events. Otherwise, it could capture
passwords and more entered by the user on the foreground session. This is
a real world problem as the recent XMir development showed:
  http://mjg59.dreamwidth.org/27327.html

We currently rely on sessions to release input devices when being
deactivated. This relies on trust across sessions. But that's not given on
usual systems. We therefore need a way to control which processes have
access to input devices.

With VTs the kernel simply routed them through the active /dev/ttyX. This
is not possible with evdev devices, though. Moreover, we want to avoid
routing input-devices through some dispatcher-daemon in userspace (which
would add some latency).

This patch introduces EVIOCREVOKE. If called on an evdev fd, this revokes
device-access irrecoverably for that *single* open-file. Hence, once you
call EVIOCREVOKE on any dup()ed fd, all fds for that open-file will be
rather useless now (but still valid compared to close()!). This allows us
to pass fds directly to session-processes from a trusted source. The
source keeps a dup()ed fd and revokes access once the session-process is
no longer active.
Compared to the EVIOCMUTE proposal, we can avoid the CAP_SYS_ADMIN
restriction now as there is no way to revive the fd again. Hence, a user
is free to call EVIOCREVOKE themself to kill the fd.

Additionally, this ioctl allows multi-layer access-control (again compared
to EVIOCMUTE which was limited to one layer via CAP_SYS_ADMIN). A middle
layer can simply request a new open-file from the layer above and pass it
to the layer below. Now each layer can call EVIOCREVOKE on the fds to
revoke access for all layers below, at the expense of one fd per layer.

There's already ongoing experimental user-space work which demonstrates
how it can be used:
  http://lists.freedesktop.org/archives/systemd-devel/2013-August/012897.html

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2013-09-07 12:53:20 -07:00