Bug report from Steven Jan Springl:
Issuing the following command causes a kernel oops:
tc qdisc add dev eth0 handle ffff: ingress
The problem mostly stems from all of the special case handling of
ingress qdiscs.
So, to fix this, do the grafting operation the same way we do for TX
qdiscs. Which means that dev_activate() and dev_deactivate() now do
the "qdisc_sleeping <--> qdisc" transitions on dev->rx_queue too.
Future simplifications are possible now, mainly because it is
impossible for dev_queue->{qdisc,qdisc_sleeping} to be NULL. There
are NULL checks all over to handle the ingress qdisc special case
that used to exist before this commit.
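The "qdisc_sleeping <--> qdisc" transition described above can be sketched in userspace C. This is only a toy model: every toy_* name is hypothetical, and the real dev_activate()/dev_deactivate() also deal with locking and timers that are left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; all names are hypothetical. */
struct toy_qdisc { const char *id; };

struct toy_queue {
    struct toy_qdisc *qdisc;          /* what the data path sees right now */
    struct toy_qdisc *qdisc_sleeping; /* what is configured for this queue */
};

struct toy_dev {
    struct toy_queue tx_queue;
    struct toy_queue rx_queue;
};

struct toy_qdisc toy_noop = { "noop" };

/* Neither pointer is ever NULL: an idle queue points at the no-op qdisc. */
void toy_queue_init(struct toy_queue *q)
{
    q->qdisc = &toy_noop;
    q->qdisc_sleeping = &toy_noop;
}

void toy_queue_activate(struct toy_queue *q)
{
    assert(q->qdisc_sleeping != NULL);
    q->qdisc = q->qdisc_sleeping;
}

void toy_queue_deactivate(struct toy_queue *q)
{
    q->qdisc = &toy_noop;
}

void toy_dev_init(struct toy_dev *dev)
{
    toy_queue_init(&dev->tx_queue);
    toy_queue_init(&dev->rx_queue);
}

/* TX and RX are handled the same way, with no ingress special case. */
void toy_dev_activate(struct toy_dev *dev)
{
    toy_queue_activate(&dev->tx_queue);
    toy_queue_activate(&dev->rx_queue);
}

void toy_dev_deactivate(struct toy_dev *dev)
{
    toy_queue_deactivate(&dev->tx_queue);
    toy_queue_deactivate(&dev->rx_queue);
}
```

In this model, grafting an ingress qdisc only sets rx_queue.qdisc_sleeping; the next toy_dev_activate() makes it visible, exactly as for TX.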
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (39 commits)
[PATCH] fix RLIM_NOFILE handling
[PATCH] get rid of corner case in dup3() entirely
[PATCH] remove remaining namei_{32,64}.h crap
[PATCH] get rid of indirect users of namei.h
[PATCH] get rid of __user_path_lookup_open
[PATCH] f_count may wrap around
[PATCH] dup3 fix
[PATCH] don't pass nameidata to __ncp_lookup_validate()
[PATCH] don't pass nameidata to gfs2_lookupi()
[PATCH] new (local) helper: user_path_parent()
[PATCH] sanitize __user_walk_fd() et.al.
[PATCH] preparation to __user_walk_fd cleanup
[PATCH] kill nameidata passing to permission(), rename to inode_permission()
[PATCH] take noexec checks to very few callers that care
Re: [PATCH 3/6] vfs: open_exec cleanup
[patch 4/4] vfs: immutable inode checking cleanup
[patch 3/4] fat: dont call notify_change
[patch 2/4] vfs: utimes cleanup
[patch 1/4] vfs: utimes: move owner check into inode_change_ok()
[PATCH] vfs: use kstrdup() and check failing allocation
...
make it atomic_long_t; while we are at it, get rid of useless checks in affs,
hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This reverts commit f867e6af94.
Based upon discussions between Jarek and Patrick McHardy,
the field being set is more a config parameter than a
statistic. And we should add a true statistic to provide
this information if we really want it.
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuitively, BUG_TRAP() is actually equivalent to
WARN_ON() rather than BUG_ON(). Some uses might deserve to be
promoted to BUG_ON(), but I left that for the future.
I could make at least one BUILD_BUG_ON conversion.
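The difference between the three kinds of check can be illustrated with userspace analogues. The TOY_* macros below are hypothetical stand-ins, not the kernel definitions; the one detail taken from the kernel is that BUG_TRAP(x) complains when x is false, so it maps to WARN_ON(!(x)).

```c
#include <assert.h>
#include <stdio.h>

/* Number of warnings emitted so far, kept only to make the sketch observable. */
int toy_warn_count;

/* Report a failed check and keep running; returns the truth value of cond. */
int toy_warn(int cond, const char *expr, const char *file, int line)
{
    if (cond) {
        toy_warn_count++;
        fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
    }
    return cond;
}

/* Warn and continue when cond is true. */
#define TOY_WARN_ON(cond) toy_warn(!!(cond), #cond, __FILE__, __LINE__)

/* Stop the program when cond is true (userspace analogue: assert). */
#define TOY_BUG_ON(cond) assert(!(cond))

/* The legacy form states what must hold, so the condition is inverted. */
#define TOY_BUG_TRAP(x) ((void)TOY_WARN_ON(!(x)))

/* Fail the build, not the run, when cond is true. */
#define TOY_BUILD_BUG_ON(cond) _Static_assert(!(cond), "build-time check failed")

/* Example of a check that can be decided at compile time. */
TOY_BUILD_BUG_ON(sizeof(long) < sizeof(int));
```

A converted call site therefore changes from TOY_BUG_TRAP(len >= 0) to TOY_WARN_ON(len < 0); execution continues in both cases.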
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dump the "flows" number according to the number of active flows
instead of repeating the "limit".
Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the needlessly global qdisc_class_hash_alloc() static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by Dave:
This patch adds a function to get the driver name from a struct net_device,
and consequently uses this in the watchdog timeout handler to print as
part of the message.
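A minimal userspace sketch of the idea, assuming a net device that may or may not have a parent device with a bound driver. The toy_* types and the message text are made up for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified device model. */
struct toy_driver { const char *name; };
struct toy_device { const struct toy_driver *driver; };
struct toy_net_device {
    const char *name;
    const struct toy_device *parent;   /* may be NULL, e.g. virtual devices */
};

/* Copy the driver name into buf; buf is left empty when no driver is bound. */
char *toy_drivername(const struct toy_net_device *dev, char *buf, size_t len)
{
    const char *name = "";

    assert(buf != NULL && len > 0);
    if (dev->parent && dev->parent->driver && dev->parent->driver->name)
        name = dev->parent->driver->name;
    strncpy(buf, name, len - 1);
    buf[len - 1] = '\0';               /* always NUL-terminate */
    return buf;
}

/* Format the kind of message a watchdog timeout handler could print. */
int toy_watchdog_message(const struct toy_net_device *dev,
                         char *out, size_t outlen)
{
    char drv[64];

    return snprintf(out, outlen,
                    "NETDEV WATCHDOG: %s (%s): transmit timed out",
                    dev->name, toy_drivername(dev, drv, sizeof(drv)));
}
```

Returning an empty string instead of NULL keeps the caller's format string simple when no driver can be found.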
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit a0c80b80e0.
After discussions with Jamal and Herbert on netdev, we should
provide at least minimal prioritization at the qdisc level
even in multiqueue situations.
Signed-off-by: David S. Miller <davem@davemloft.net>
Removed unused variable 'skb' in the dev_deactivate_queue function
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The stab bits can't be referenced unless the full
packet scheduler layer is enabled.
Reported by Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add size table functions for qdiscs and calculate packet size in
qdisc_enqueue().
Based on patch by Patrick McHardy
http://marc.info/?l=linux-netdev&m=115201979221729&w=2
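A rough sketch of a size table lookup done at enqueue time. The struct layout, the field names, and the clamping of out-of-range slots are simplifying assumptions made for this example and do not reproduce the actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size table: maps a raw length to an accounted length. */
struct toy_stab {
    int overhead;          /* bytes added to every packet, may be negative */
    unsigned cell_log;     /* log2 of the width of one table cell */
    size_t tsize;          /* number of table entries, must be > 0 */
    const unsigned *data;  /* data[i]: accounted size for cell i */
};

struct toy_skb {
    unsigned len;          /* real packet length */
    unsigned pkt_len;      /* length the scheduler accounts for */
};

unsigned toy_pkt_len(const struct toy_stab *stab, unsigned skb_len)
{
    long len = (long)skb_len + stab->overhead;
    size_t slot;

    assert(stab->tsize > 0);
    if (len < 0)
        len = 0;
    slot = (size_t)len >> stab->cell_log;
    if (slot >= stab->tsize)
        slot = stab->tsize - 1;        /* assumption: clamp to the last cell */
    return stab->data[slot];
}

/* Compute the accounted size once, when the packet enters the qdisc. */
void toy_enqueue(const struct toy_stab *stab, struct toy_skb *skb)
{
    skb->pkt_len = stab ? toy_pkt_len(stab, skb->len) : skb->len;
}
```

With cells of 64 bytes and an overhead of 4, a 60 byte packet lands in cell 1 and is accounted with whatever data[1] holds.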
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like noop_qdisc, it needs a dummy backpointer and
explicit qdisc->q.lock initialization.
Based upon a report by Stephen Hemminger.
Signed-off-by: David S. Miller <davem@davemloft.net>
Idea is from Patrick McHardy.
Instead of managing the list of qdiscs on the device level, manage it
in the root qdisc of a netdev_queue. This solves all kinds of
visibility issues during qdisc destruction.
The way to iterate over all qdiscs of a netdev_queue is to visit
the netdev_queue->qdisc, and then traverse its list.
The only special case is to ignore builtin qdiscs at the root when
dumping or doing a qdisc_lookup(). That was not needed previously
because builtin qdiscs were not added to the device's qdisc_list.
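The iteration rule can be sketched as follows. This is a toy model with a singly linked list and hypothetical toy_* names; the kernel uses its own list primitives and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical qdisc: the root carries the list of all other qdiscs. */
struct toy_qdisc {
    unsigned handle;
    int builtin;               /* nonzero for builtin qdiscs such as noop */
    struct toy_qdisc *next;    /* next entry of the list kept in the root */
};

struct toy_queue {
    struct toy_qdisc *qdisc;   /* root qdisc of this queue, never NULL */
};

/* Visit the root first, then walk the list hanging off it. */
struct toy_qdisc *toy_qdisc_lookup(const struct toy_queue *q, unsigned handle)
{
    struct toy_qdisc *root = q->qdisc;
    struct toy_qdisc *cur;

    assert(root != NULL);
    if (root->builtin)         /* the one special case: skip builtin roots */
        return NULL;
    if (root->handle == handle)
        return root;
    for (cur = root->next; cur != NULL; cur = cur->next)
        if (cur->handle == handle)
            return cur;
    return NULL;
}
```

A dump routine would use the same walk and the same builtin check, emitting each entry instead of comparing handles.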
Signed-off-by: David S. Miller <davem@davemloft.net>
The u32_list is just an indirect way of maintaining a reference
to a U32 node on a per-qdisc basis.
Just add an explicit node pointer for u32 to struct Qdisc and do
away with this global list.
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of 'pfifo_fast' we have just plain 'fifo_fast'.
No priority queues, just a straight FIFO.
This is necessary in order to legally have a separate
qdisc per queue in multi-TX-queue setups, and thus get
full parallelization.
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the destruction of the old queue into qdisc_graft().
When operating on a root qdisc (ie. "parent == NULL"), apply
the operation to all queues. The caller has grabbed a single
implicit reference for this graft, therefore when we apply the
change to more than one queue we must grab additional qdisc
references.
Otherwise, we are operating on a class of a specific parent qdisc, and
therefore no multiqueue handling is necessary.
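The reference counting for the root case can be sketched like this. It is a toy model with hypothetical names and plain integer counters, and it reports rather than performs the destruction of old qdiscs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted qdisc and the queue that points at it. */
struct toy_qdisc { int refcnt; };
struct toy_queue { struct toy_qdisc *qdisc; };

/* Drop one reference; returns 1 when this was the last one. */
int toy_qdisc_put(struct toy_qdisc *q)
{
    if (q == NULL)
        return 0;
    assert(q->refcnt > 0);
    return --q->refcnt == 0;
}

/*
 * Graft new_q as the root of all n queues. The caller holds exactly one
 * reference on new_q, which covers the first queue; every further queue
 * needs a reference of its own. Returns how many old qdiscs lost their
 * last reference and would be destroyed.
 */
int toy_graft_root(struct toy_queue *queues, size_t n, struct toy_qdisc *new_q)
{
    int destroyed = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        struct toy_qdisc *old = queues[i].qdisc;

        if (new_q != NULL && i > 0)
            new_q->refcnt++;
        queues[i].qdisc = new_q;
        destroyed += toy_qdisc_put(old);
    }
    return destroyed;
}
```

Grafting onto a class of a specific parent would touch a single slot and needs no extra references.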
Signed-off-by: David S. Miller <davem@davemloft.net>
Lock the root of the qdisc being operated upon.
All explicit references to qdisc_tree_lock() are now gone.
The only remaining uses are via the sch_tree_{lock,unlock}()
and tcf_tree_{lock,unlock}() macros.
Signed-off-by: David S. Miller <davem@davemloft.net>
It just wants the qdisc tree for the filter to be synchronized.
So just BH lock qdisc_root_lock(q) instead.
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows less strict control of access to the qdisc attached to a
netdev_queue. It is even allowed to enqueue into a qdisc which is
in the process of being destroyed. The RCU handler will toss out
those packets.
We will need this to handle sharing of a qdisc amongst multiple
TX queues. In such a setup the lock has to be shared, so will
be inside of the qdisc itself. At which point the netdev_queue
lock cannot be used to hard synchronize access to the ->qdisc
pointer.
One operation we have to keep inside of qdisc_destroy() is the list
deletion. It is the only piece of state visible after the RCU quiesce
period, so we have to undo it early and under the appropriate locking.
The operations in the RCU handler do not need any locking because the
qdisc tree is no longer visible to anything at that point.
Signed-off-by: David S. Miller <davem@davemloft.net>
We are registering the device, there is no way anyone can get
at this object's qdiscs yet in any meaningful way.
Signed-off-by: David S. Miller <davem@davemloft.net>
When we have shared qdiscs, packets come out of the qdiscs
for multiple transmit queues.
Therefore it doesn't make any sense to schedule the transmit
queue when logically we cannot know ahead of time the TX
queue of the SKB that the qdisc->dequeue() will give us.
Just for sanity I added a BUG check to make sure we never
get into a state where the noop_qdisc is scheduled.
Signed-off-by: David S. Miller <davem@davemloft.net>
When code wants to lock the qdisc tree state, the logical
operation it's doing is locking the top-level qdisc that
sits at the root of the netdev_queue.
Add qdisc_root_lock() to represent this and convert the
easiest cases.
In order for this to work out in all cases, we have to
hook up the noop_qdisc to a dummy netdev_queue.
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently it is associated with a netdev_queue, but when we have
qdisc sharing that no longer makes any sense.
Signed-off-by: David S. Miller <davem@davemloft.net>
We liberate any dangling gso_skb during qdisc destruction.
It really only matters for the root qdisc. But when qdiscs
can be shared by multiple netdev_queue objects, we can't
have the gso_skb in the netdev_queue any more.
Signed-off-by: David S. Miller <davem@davemloft.net>
This effectively "flips the switch" by making the core networking
and multiqueue-aware drivers use the new TX multiqueue structures.
Non-multiqueue drivers need no changes. The interfaces they use such
as netif_stop_queue() degenerate into an operation on TX queue zero.
So everything "just works" for them.
Code that really wants to do "X" to all TX queues now invokes a
routine that does so, such as netif_tx_wake_all_queues(),
netif_tx_stop_all_queues(), etc.
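How the legacy single-queue call degenerates into an operation on TX queue zero can be sketched with a toy device. The toy_* names and the fixed-size array are assumptions made for this example; they only mirror the shape of the real interfaces.

```c
#include <assert.h>

#define TOY_MAX_TXQ 8

struct toy_txq { int stopped; };

struct toy_dev {
    unsigned num_tx_queues;            /* 1 for non-multiqueue drivers */
    struct toy_txq tx[TOY_MAX_TXQ];
};

struct toy_txq *toy_get_tx_queue(struct toy_dev *dev, unsigned index)
{
    assert(index < dev->num_tx_queues);
    return &dev->tx[index];
}

/* Per-queue primitives. */
void toy_tx_stop_queue(struct toy_txq *q) { q->stopped = 1; }
void toy_tx_wake_queue(struct toy_txq *q) { q->stopped = 0; }

/* Legacy interface: unchanged for callers, acts on TX queue zero only. */
void toy_stop_queue(struct toy_dev *dev)
{
    toy_tx_stop_queue(toy_get_tx_queue(dev, 0));
}

/* Code that really means "all TX queues" says so explicitly. */
void toy_tx_stop_all_queues(struct toy_dev *dev)
{
    unsigned i;

    for (i = 0; i < dev->num_tx_queues; i++)
        toy_tx_stop_queue(toy_get_tx_queue(dev, i));
}

void toy_tx_wake_all_queues(struct toy_dev *dev)
{
    unsigned i;

    for (i = 0; i < dev->num_tx_queues; i++)
        toy_tx_wake_queue(toy_get_tx_queue(dev, i));
}
```

For a device with one TX queue the legacy call and the "all queues" call have the same effect, which is why such drivers need no changes.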
pktgen and netpoll required a little bit more surgery than the others.
In particular the pktgen changes, whilst functional, could be largely
improved. The initial check in pktgen_xmit() will sometimes check the
wrong queue, which is mostly harmless. The thing to do is probably to
invoke fill_packet() earlier.
The bulk of the netpoll changes is to make the code operate solely on
the TX queue indicated by the SKB queue mapping.
Setting of the SKB queue mapping is entirely confined inside of
net/core/dev.c:dev_pick_tx(). If we end up needing any kind of
special semantics (drops, for example) it will be implemented here.
Finally, we now have a "real_num_tx_queues" which is where the driver
indicates how many TX queues are actually active.
With IGB changes from Jeff Kirsher.
Signed-off-by: David S. Miller <davem@davemloft.net>
This actually fixes a bug added by the RR scheduler changes. The
->bands and ->prio2band parameters were being set outside of the
sch_tree_lock() and thus could result in strange behavior and
inconsistencies.
It might be possible, in the new design (where there will be one qdisc
per device TX queue) to allow similar functionality via a TX hash
algorithm for RR but I really see no reason to export this aspect of
how these multiqueue cards actually implement the scheduling of
the individual DMA TX rings and the single physical MAC/PHY port.
Signed-off-by: David S. Miller <davem@davemloft.net>
alloc_netdev_mq() now allocates an array of netdev_queue
structures for TX, based upon the queue_count argument.
Furthermore, all accesses to the TX queues are now vectored
through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
interfaces. This makes it easy to grep the tree for all
things that want to get to a TX queue of a net device.
Problem spots which are not really multiqueue aware yet, and
only work with one queue, can easily be spotted by grepping
for all netdev_get_tx_queue() calls that pass in a zero index.
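The accessor pattern can be sketched in userspace as below. The toy_* functions only imitate the shape of alloc_netdev_mq(), netdev_get_tx_queue() and netdev_for_each_tx_queue(); the struct contents and error handling are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_dev;

struct toy_txq {
    struct toy_dev *dev;       /* backpointer to the owning device */
};

struct toy_dev {
    unsigned num_tx_queues;
    struct toy_txq *tx;        /* array of num_tx_queues entries */
};

/* Allocate a device together with queue_count TX queues. */
struct toy_dev *toy_alloc_netdev_mq(unsigned queue_count)
{
    struct toy_dev *dev;
    unsigned i;

    if (queue_count == 0)
        return NULL;
    dev = calloc(1, sizeof(*dev));
    if (dev == NULL)
        return NULL;
    dev->tx = calloc(queue_count, sizeof(*dev->tx));
    if (dev->tx == NULL) {
        free(dev);
        return NULL;
    }
    dev->num_tx_queues = queue_count;
    for (i = 0; i < queue_count; i++)
        dev->tx[i].dev = dev;
    return dev;
}

void toy_free_netdev(struct toy_dev *dev)
{
    if (dev != NULL) {
        free(dev->tx);
        free(dev);
    }
}

/* The only way to reach a single TX queue. */
struct toy_txq *toy_get_tx_queue(struct toy_dev *dev, unsigned index)
{
    assert(index < dev->num_tx_queues);
    return &dev->tx[index];
}

/* The only way to visit every TX queue. */
void toy_for_each_tx_queue(struct toy_dev *dev,
                           void (*f)(struct toy_dev *, struct toy_txq *, void *),
                           void *arg)
{
    unsigned i;

    for (i = 0; i < dev->num_tx_queues; i++)
        f(dev, &dev->tx[i], arg);
}

/* Example callback: count the queues that were visited. */
void toy_count_queue(struct toy_dev *dev, struct toy_txq *q, void *arg)
{
    (void)dev;
    (void)q;
    ++*(unsigned *)arg;
}
```

Because every access goes through one of the two helpers, a call such as toy_get_tx_queue(dev, 0) is easy to find with grep and stands out as a spot that is not yet multiqueue aware.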
Signed-off-by: David S. Miller <davem@davemloft.net>
Accesses are mostly structured such that when there are multiple TX
queues the code transformations will be a little bit simpler.
Signed-off-by: David S. Miller <davem@davemloft.net>
Only plain netif_schedule() remains taking a net_device, mostly as a
compatibility item while we transition the rest of these interfaces.
Everything else calls netif_schedule_queue() or __netif_schedule(),
both of which take a netdev_queue pointer.
Signed-off-by: David S. Miller <davem@davemloft.net>
It just wants the root qdisc given an arbitrary qdisc,
and that is simply qdisc->dev_queue->qdisc
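As a toy illustration of that pointer chase, with hypothetical struct and function names:

```c
#include <assert.h>
#include <stddef.h>

struct toy_queue;

/* Every qdisc knows the queue it is attached to. */
struct toy_qdisc {
    struct toy_queue *dev_queue;
};

/* Every queue knows the qdisc at its root. */
struct toy_queue {
    struct toy_qdisc *qdisc;
};

/* Root of the tree that q belongs to: q->dev_queue->qdisc. */
struct toy_qdisc *toy_qdisc_root(const struct toy_qdisc *q)
{
    assert(q != NULL && q->dev_queue != NULL);
    assert(q->dev_queue->qdisc != NULL);
    return q->dev_queue->qdisc;
}
```

The lookup costs two dereferences regardless of how deep q sits in the tree, and the root maps to itself.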
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Now that our qdisc management is bi-directional, per-queue, and fully
orthogonal, there is no reason to have a special ingress qdisc pointer
in struct net_device.
Signed-off-by: David S. Miller <davem@davemloft.net>
Every qdisc is associated with a queue, and in the case of ingress
qdiscs that will now be netdev->rx_queue so using that queue's lock is
the thing to do.
Signed-off-by: David S. Miller <davem@davemloft.net>
The lock is now an attribute of the device queue.
One thing to notice is that "suspicious" places
emerge which will need specific training about
multiple queue handling. They are so marked with
explicit "netdev->rx_queue" and "netdev->tx_queue"
references.
Signed-off-by: David S. Miller <davem@davemloft.net>
It can be obtained via the netdev_queue. So create a helper routine,
qdisc_dev(), to make the transformations nicer looking.
Now qdisc_alloc() no longer needs a net_device pointer argument.
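A small sketch of why the device pointer becomes redundant. The toy_* names are hypothetical and the allocation is reduced to the one field that matters here.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_net_device { const char *name; };

/* The queue carries the backpointer to its device. */
struct toy_queue { struct toy_net_device *dev; };

/* The qdisc only remembers its queue. */
struct toy_qdisc { struct toy_queue *dev_queue; };

/* Helper in the spirit of qdisc_dev(): device reached through the queue. */
struct toy_net_device *toy_qdisc_dev(const struct toy_qdisc *q)
{
    assert(q != NULL && q->dev_queue != NULL);
    return q->dev_queue->dev;
}

/* Allocation needs the queue only; no separate device argument. */
struct toy_qdisc *toy_qdisc_alloc(struct toy_queue *dev_queue)
{
    struct toy_qdisc *q = calloc(1, sizeof(*q));

    if (q != NULL)
        q->dev_queue = dev_queue;
    return q;
}
```

Call sites that used to read the device straight from the qdisc switch to the helper, so the qdisc no longer has to store the pointer itself.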
Signed-off-by: David S. Miller <davem@davemloft.net>
A netdev_queue is an entity managed by a qdisc.
Currently there is one RX and one TX queue, and a netdev_queue merely
contains a backpointer to the net_device.
The Qdisc struct is augmented with a netdev_queue pointer as well.
Eventually the 'dev' Qdisc member will go away and we will have the
resulting hierarchy:
net_device --> netdev_queue --> Qdisc
Also, qdisc_alloc() and qdisc_create_dflt() now take a netdev_queue
pointer argument.
Signed-off-by: David S. Miller <davem@davemloft.net>