With ib_qib options:
options ib_qib krcvqs=1 pcie_caps=0x51 rcvhdrcnt=4096 singleport=1 ibmtu=4
a run of ib_write_bw -a yields the following:
------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec]
1048576 5000 2910.64 229.80
------------------------------------------------------------------
The top cpu use in a profile is:
CPU: Intel Architectural Perfmon, speed 2400.15 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
of 0x00 (No unit mask) count 1002300
Counted LLC_MISSES events (Last level cache demand requests from this core that
missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
samples % samples % app name symbol name
15237 29.2642 964 17.1195 ib_qib.ko qib_7322intr
12320 23.6618 1040 18.4692 ib_qib.ko handle_7322_errors
4106 7.8860 0 0 vmlinux vsnprintf
Analysis of the stats, profile, the code, and the annotated profile indicate:
- All of the overflow interrupts (one per packet overflow) are
serviced on CPU0 with no mitigation on the frequency.
- All of the receive interrupts are being serviced by CPU0. (That is
the way truescale.cmds statically allocates the kctx IRQs to CPU)
- The code is spending all of its time servicing QIB_I_C_ERROR
RcvEgrFullErr interrupts on CPU0, starving the packet receive
processing.
- The decode_err routine is very inefficient, using a printf variant
to format a "%s" and continues to loop when the errs mask has been
cleared.
- Both qib_7322intr and handle_7322_errors read pci registers, which
is very inefficient.
The fix does the following:
- Adds a tasklet to service QIB_I_C_ERROR
- Replaces the very inefficient scnprintf() with a memcpy(). A field
is added to qib_hwerror_msgs to save the sizeof("string") at
compile time so that a strlen is not needed during err_decode().
- The most frequent errors (Overflows) are serviced first to exit the
loop as early as possible.
- The loop now exits as soon as the errs mask is clear rather than
fruitlessly looping through the msp array.
With this fix the performance changes to:
------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec]
1048576 5000 2990.64 2941.35
------------------------------------------------------------------
During testing of the error handling overflow patch, it was determined
that some CPU's were slower when servicing both overflow and receive
interrupts on CPU0 with different MSI interrupt vectors.
This patch adds an option (krcvq01_no_msi) to not use a dedicated MSI
interrupt for kctx's < 2 and to service them on the default interrupt.
For some CPUs, the cost of the interrupt enter/exit is more costly
than then the additional PCI read in the default handler.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Adds README file, TODO list, and a couple of other pieces that didn't seem
to fit into any other patch.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
The OpenRISC Linux kernel conforms to the "generic" syscall interface which
contains only the reduced set of syscalls deemed necessary for new
architectures. Unfortunately, the uClibc port for OpenRISC does not fully
support this reduced set; as such, an additional patch available out-of-tree
needs to be applied to the kernel in order to use the current uClibc. This
is just a temporary measure until the libc port can be straightened out; it
is likely that OpenRISC will make the transition to glibc shortly where the
generic syscall interface is better supported.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
This patch adds support for the OpenRISC PIC.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Cc: tglx@linutronix.de
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Implements support for the OpenRISC timer which is a 28 bit cycle counter
that can be read out of a special purpose register. This counter is
used as a both a clock event and clocksource device.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Cc: tglx@linutronix.de
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
This patch implements minimal PTrace support. The pt_regs structure is
not exported to userspace for OpenRISC; rather, the GETREGSET mechanism
is intended to be used and the registers, as such, exported in the core
dump format which is ABI stable. This is in line with what is intended
for new architectures as of 2.6.34 and has the advantage of permitting
the layout of the registers on the kernel stack (as per pt_regs) to be
freely modified.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
The OpenRISC architecture uses the device tree infrastructure for the
platform description. This is currently limited to having a device tree
built into the kernel, but work is underway within the OpenRISC project
to define how this device tree blob should be passed into the kernel from
an external resource.
Patch contains a single example DTS file to go with the defconfig for
or1ksim.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Cc: devicetree-discuss@lists.ozlabs.org
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Architecture code and early setup routines for booting Linux.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Use the CONFIG_HAS_IOPORT and CONFIG_PCI options to decide whether or
not functions for mapping these areas are provided.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Some of the implementations, in particular the ioremap variants, in
asm-generic/io.h are for systems without an MMU. In order to be able to
use the generic header file for systems with an MMU, this patch wraps
these implementations in checks for CONFIG_MMU.
Tested on OpenRISC.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Cc: liqin.chen@sunplusct.com
Cc: gxt@mprc.pku.edu.cn
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
This patch moves the in-tree architectures that were using the 'generic'
delay.h over to using the header file in asm-generic.
This is not done using the generic-y mechanism as none of these arch's
have started using that mechanism yet. This is a trivial change to make
later when the arch begins using generic-y.
Note the subtle change to the avr32 and SH architectures where the argument
to __const_udelay was previously using the rounded down constant value
instead of the rounded up value.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
With a non-constant 8-bit argument, a call to udelay() generates a warning:
drivers/gpu/drm/radeon/atom.c: In function 'atom_op_delay':
drivers/gpu/drm/radeon/atom.c:654: warning: comparison is always false due to limited range of data type
The code looks like it works OK with an 8-bit arg, and the calling code is
doing nothing wrong, so udelay() needs fixing.
Fixing it was rather tricky. Simply typecasting `n' in the comparison with
20000 didn't change anything. Hence the divide-by-20000 trick.
Using a do{}while loop didn't work because udelay() is used in ?: statements,
hence the ({...}) construct.
While I was there I replaced the brain-bending ?:?:?: mess with nice if/else
code.
Probably other architectures are generating the same warning and can use a
similar change.
[Taken from the x86 tree and moved to asm-generic by Jonas Bonn]
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Building kernel 3.0 for an n2100 (plat-iop) results in:
In file included from arch/arm/plat-iop/cp6.c:20:
/tmp/linux-3.0/arch/arm/include/asm/traps.h:12: warning: 'struct pt_regs' declared inside parameter list
/tmp/linux-3.0/arch/arm/include/asm/traps.h:12: warning: its scope is only this definition or declaration, which is probably not what you want
/tmp/linux-3.0/arch/arm/include/asm/traps.h:48: warning: 'struct pt_regs' declared inside parameter list
/tmp/linux-3.0/arch/arm/include/asm/traps.h:48: warning: 'struct task_struct' declared inside parameter list
arch/arm/plat-iop/cp6.c:45: warning: initialization from incompatible pointer type
Nothing here depends on the layout of pt_regs or task_struct, so this
can be fixed by adding forward struct declarations to asm/traps.h.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Currently all bio requests are 512 bytes, which may fail for media
whose physical sector size is larger than this. Ensure these
requests are not smaller than the block device logical block size.
BugLink: http://bugs.launchpad.net/bugs/734883
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
'recoff' is read from disk and used for an argument to memcpy, so if
the value read from disk is larger than the page size, it result to
"general protection fault". This patch add additional range check for
the value, so that disk fuzz won't cause such fault.
Signed-off-by: Naohiro Aota <naota@elisp.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
icmp_route_lookup() uses the wrong flow parameters if the reverse
session route lookup isn't used.
So do not commit to the re-decoded flow until we actually make a
final decision to use a real route saved in 'rt2'.
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test-case:
void *tfunc(void *arg)
{
execvp("true", NULL);
return NULL;
}
int main(void)
{
int pid;
if (fork()) {
pthread_t t;
kill(getpid(), SIGSTOP);
pthread_create(&t, NULL, tfunc, NULL);
for (;;)
pause();
}
pid = getppid();
assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
while (wait(NULL) > 0)
ptrace(PTRACE_CONT, pid, 0,0);
return 0;
}
It is racy, exit_notify() does __wake_up_parent() too. But in the
likely case it triggers the problem: de_thread() does release_task()
and the old leader goes away without the notification, the tracer
sleeps in do_wait() without children/tracees.
Change de_thread() to do __wake_up_parent(traced_leader->parent).
Since it is already EXIT_DEAD we can do this without ptrace_unlink(),
EXIT_DEAD threads do not exist from do_wait's pov.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
No need to define a new "cfs_rq" variable in the "for" block.
Just use the one at the top of the function.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1311297271.3938.1352.camel@minggr.sh.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch bumps the target core version to v4.1.0-rc1 now that we are
in sync with upstream lio-core-2.6.git/master
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch drops transport_asciihex_to_binaryhex() in favor of proper
hex2bin usage from include/linux/kernel.h:hex2bin()
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Additionally this patch brings proper apply of the designator type.
However, the original code luckily has no bug, because the association
equals to 0.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: James Bottomley <jbottomley@parallels.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch adds the default 'Unrestricted reordering allowed' for SCSI
control mode page QUEUE ALGORITHM MODIFIER on a per se_device basis in
target_modesense_control() following spc4r23. This includes a new
emuluate_rest_reord configfs attribute that currently (only) accepts
zero to signal 'Unrestricted reordering allowed' in control mode page
usage by the backend target device.
Reported-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@risingtidesystems.com>
This patch breaks up the ->map_task_SG() backend call into two seperate
->map_control_SG() and ->map_data_SG() in order to better address
IBLOCK and pSCSI. IBLOCK only allocates bios for ->map_data_SG(), and
pSCSI will allocate a struct request for both cases.
This patch fixes incorrect usage of ->map_task_SG() for all se_cmd descriptors
in transport_generic_new_cmd() by moving the call into it's proper location
directly inside of transport_allocate_data_tasks()
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch contains the squashed version of forth round series cleanups
from Andy and Christoph following the post heavy lifting in the preceeding:
'Eliminate usage of struct se_mem' and 'Make all control CDBs scatter-gather'
changes. This also includes a conversion of target core and the v3.0
mainline fabric modules (loopback and tcm_fc) to use pr_debug and the
CONFIG_DYNAMIC_DEBUG infrastructure!
These have been squashed into this third and final round for v3.1.
target: Remove ifdeffed code in t_g_process_write
target: Remove direct ramdisk code
target: Rename task_sg_num to task_sg_nents
target: Remove custom debug macros for pr_debug. Use pr_err().
target: Remove custom debug macros in mainline fabrics
target: Set WSNZ=1 in block limits VPD. Abort if WRITE_SAME sectors = 0
target: Remove transport do_se_mem_map callback
target: Further simplify transport_free_pages
target: Redo task allocation return value handling
target: Remove extra parentheses
target: change alloc_task call to take *cdb, not *cmd
(nab: Fix bogus struct file assignments in fd_do_readv and fd_do_writev)
Signed-off-by: Andy Grover <agrover@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Both backstores and fabrics use arrays of struct scatterlist to describe
data buffers. However TCM used struct se_mems, basically a linked list
of scatterlist entries. We are able to simplify the code by eliminating
this intermediate data structure and just using struct scatterlist[]
throughout.
Also, moved attachment of task to cmd out of transport_generic_get_task
and into allocate_control_task and allocate_data_tasks. The reasoning
is that it's nonintuitive that get_task should automatically add it to
the cmd's task list -- it should just return an allocated, initialized
task. That's all it should do, based on the function's name, so either the
function shouldn't do it, or the name should change to encapsulate the
entire essence of what it does.
(nab: Fix compile warnings in tcm_fc, and make transport_kmap_first_data_page
honor sg->offset for SGLs from contigious memory with TCM_Loop, and
fix control se_cmd descriptor memory leak)
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Since sectors is not modified, it's more straightforward to do this.
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Due to all cdbs' data buffers being referenced by scatterlists, buffers
of more than a page are not contiguous. Instead of handling this in all
control command handlers, we may be able to get away with just limiting
control cdb data buffers to one page. The only control CDBs we handle that
have potentially large data buffers are REPORT LUNS and UNMAP, so if we
didn't want to live with this limitation, they would need to be modified
to walk the pages in the data buffer's sgl.
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Previously, some control CDBs did not allocate memory in pages for their
data buffer, but just did a kmalloc. This patch makes all cdbs allocate
pages.
This has the benefit of streamlining some paths that had to behave
differently when we used two allocation methods. The downside is that
all accesses to the data buffer need to kmap it before use, and need to
handle data in page-sized chunks if more than a page is needed for a given
command's data buffer.
Finally, note that cdbs with no data buffers are handled a little
differently. Before, SCSI_NON_DATA_CDBs would not call get_mem at all
(they'd be in the final else in transport_allocate_resources) but now
these will make it into generic_get_mem, but just not allocate any
buffers.
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Implement page B1h, Block Device Characteristics, so that we can report
a medium rotation rate of 1 (non-rotating / solid state) if the
is_nonrot device attribute is set; we update the iblock backend to set
this attribute if the underlying Linux block device has its nonrot
flag set.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
The current handling of VPD page 00h (Supported VPD Pages) for INQUIRY
commands has a couple of problems:
- The page length field is incorrectly set to 3, so the entry for 86h
(Extended INQUIRY Data) is ignored since it is in the fourth slot.
- Even though the code handles pages B0h and B2h, those pages aren't
mentioned in the Supported VPD Pages list, so eg the Linux SCSI stack
won't actually try to use them.
Fix these problems and make things more robust to avoid future problems
by moving to a table of supported VPD pages, which means that any added
VPD page support will automatically get reported on page 0.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
In target_fabric_configfs_init(), we should allow fabric_mod to be NULL,
since THIS_MODULE is NULL for built-in modules. The main method of
using the target code may be as modules, but having everything built-in
is useful eg to be able to do quick testing with "qemu -kernel".
In any case, we shouldn't bomb out fabric registration for a perfectly
valid configuration, so simply drop the check of fabric_mod.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch converts ft_queue_cmd() to use wake_up_process() and
ft_thread() to use schedule_timeout_interruptible(MAX_SCHEDULE_TIMEOUT)
instead of wait_event_interruptible(). This fixes a potential race with
the wait_event_interruptible() conditional with qobj->queue_cnt in
ft_thread().
This patch also drops the unnecessary set_user_nice(current, -20) in
ft_thread(), and drops extra () around two if (!(acl)) conditionals in
tfc_conf.c.
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
This patch removes the unnecessary EXTRA_CFLAGS includes, and drops the
unused -DTCM_FC_DEBUG define.
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
There is a memory leak in tcm_loop_make_scsi_hba().
If all the strstr() calls return NULL and we end up at return ERR_PTR(-EINVAL);
then we'll be leaking the memory previously allocated to tl_hba as
that variable goes out of scope.
This patch should fix the leak.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>