Commit graph

174473 commits

Author SHA1 Message Date
Alan Cox
d74e828688 tty: tty_port: Add IO_ERROR bit handling
To propogate tty_port_open/close to a few other devices we need to start
handling the IO_ERROR flag on the tty. We can do this pretty trivially.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:06 -08:00
Alan Cox
8f1e672344 tty: riscom8: switch to the tty_port_open API
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:06 -08:00
Alan Cox
44e4909e45 tty: tty_port: Change the buffer allocator locking
We want to be able to do this without regard for the activate/own open
method being used which causes a problem using port->mutex. Add another
mutex for now. Once everything uses port_open to do buffer allocs we can
kill it back off

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:06 -08:00
Alan Cox
1f100b323d tty: sdio_uart: Fix the locking on "func" for new code
The new dtr_rts function didn't take the port->func lock as it should
so add use of the lock there.


Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:06 -08:00
Alan Cox
4b3b49bb77 tty: sdio_uart: add modem functionality
Add the POSIX block for carrier

Linux TIOCMIWAIT functionality is still lacking from the driver.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:06 -08:00
Alan Cox
c271cf37ba tty: sdio_uart: Style fixes
Running the current code through checkpatch shows a few bits of noise
mostly but not entirely from before the changes.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:06 -08:00
Alan Cox
6238e712af tty: sdio_uart: Fix termios handling
Switching between two non standard baud rates fails because of the cflag
test. Do as we did elsewhere and just kill the "optimisation".

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:06 -08:00
Alan Cox
584abc3775 tty: sdio_uart: Switch to the open/close helpers
Gets us proper tty semantics, removes some code and fixes up a few corner
case races (hangup during open etc)

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:05 -08:00
Alan Cox
0a68f64feb sdio_uart: Move the open lock
When we move to the tty_port logic the port mutex will protect open v close
v hangup. Move to this first in the existing open code so we have a bisection
point.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:05 -08:00
Alan Cox
530646f469 sdio_uart: refcount the tty objects
The tty can go away underneath us, so we must refcount it. Do the naïve
implementation initially. We will worry about startup shortly.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:05 -08:00
Nicolas Pitre
0395b48c78 sdio_uart: Fix oops caused by the previous changeset
Now... testing reveals that the very first patch "sdio_uart: use
tty_port" causes a segmentation fault in sdio_uart_open():

Unable to handle kernel NULL pointer dereference at virtual address 00000084
pgd = dfb44000 [00000084] *pgd=1fb99031, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT
last sysfs file:
/sys/devices/platform/mvsdio/mmc_host/mmc0/mmc0:f111/uevent
Modules linked in:
CPU: 0    Not tainted  (2.6.32-rc5-next-20091102-00001-gb36eae9 #10)
PC is at sdio_uart_open+0x204/0x2cc
[...]

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:05 -08:00
Alan Cox
b5849b1a82 sdio_uart: use tty_port
Add a tty_port object to the sdio uart. For the moment just begin using the
tty field of the port, as this is the critical one to clean up.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:05 -08:00
Alan Cox
e707c35cbb tty_port: Move hupcl handling
Move the HUCPL handling from the end of close_port_start to the beginning
of close_port_end. What this actually does is change the ordering from

	port shutdown
	port->dtr_rts

to

	port->dtr_rts
	port shutdown

Some hardware drops the physical connection on shutdown so we must perform
the port operations before the shutdown.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:05 -08:00
Sukadev Bhattiprolu
edfacdd6f8 devpts_get_tty() should validate inode
devpts_get_tty() assumes that the inode passed in is associated with a valid
pty.  But if the only reference to the pty is via a bind-mount, the inode
passed to devpts_get_tty() while valid, would refer to a pty that no longer
exists.

With a lot of debug effort, Grzegorz Nosek developed a small program (see
below) to reproduce a crash on recent kernels. This crash is a regression
introduced by the commit:

	commit 527b3e4773
	Author: Sukadev Bhattiprolu <sukadev@us.ibm.com>
	Date:   Mon Oct 13 10:43:08 2008 +0100

To fix, ensure that the dentry associated with the inode has not yet been
deleted/unhashed by devpts_pty_kill().

See also:
https://lists.linux-foundation.org/pipermail/containers/2009-July/019273.html 

tty-bug.c:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/signal.h>
#include <unistd.h>
#include <stdio.h>

#include <linux/fs.h>

void dummy(int sig)
{
}

static int child(void *unused)
{
	int fd;

	signal(SIGINT, dummy); signal(SIGHUP, dummy);
	pause(); /* cheesy synchronisation to wait for /dev/pts/0 to appear */

	mount("/dev/pts/0", "/dev/console", NULL, MS_BIND, NULL);
	sleep(2);

	fd = open("/dev/console", O_RDWR);
	dup(0); dup(0);
	write(1, "Hello world!\n", sizeof("Hello world!\n")-1);
	return 0;
}

int main(void)
{
	pid_t pid;
	char *stack;

	stack = malloc(16384);
	pid = clone(child, stack+16384, CLONE_NEWNS|SIGCHLD, NULL);

	open("/dev/ptmx", O_RDWR|O_NOCTTY|O_NONBLOCK);

	unlockpt(fd); grantpt(fd);

	sleep(2);
	kill(pid, SIGHUP);
	sleep(1);
	return 0; /* exit before child opens /dev/console */
}

Reported-by: Grzegorz Nosek <root@localdomain.pl>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:05 -08:00
Ian Jackson
68cb4f8e24 Serial: Do not read IIR in serial8250_start_tx when UART_BUG_TXEN
Do not read IIR in serial8250_start_tx when UART_BUG_TXEN

Reading the IIR clears some oustanding interrupts so it is not safe.
Instead, simply transmit immediately if the buffer is empty without
regard to IIR.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: Alan Cox <alan@linux.intel.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:05 -08:00
Tilman Schmidt
7e11a0fb3b tty: docs: serial/tty, add to ldisc methods
A small addition to the ldisc method descriptions.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:05 -08:00
Uwe Kleine-König
c934878cc0 Serial: pxa: work around Errata #75
Intel(R) PXA27x Processor Family Specification Update (Nov 2005)
says:

  E75. UART: Baud rate may not be programmed correctly on
       back-to-back writes.

  Problem:
  When programming the Divisor Latch registers, Low and High (DLL and
  DLH), with back-to-back writes, the second register write may not
  take effect. The result is an incorrect baud rate.

  Workaround:
  After programming the first Divisor Latch register, read and verify
  it before programming the second Divisor Latch register.

This was hit when changing the baud rate from 115200 to 9600 while
receiving characters at 9600 Bd.

And fixed indention of some comments nearby.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Eric Miao <eric.y.miao@gmail.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Mike Rapoport <mike@compulab.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:04 -08:00
André Goddard Rosa
4c0ebb8057 serial, 8250: calculate irqflags bitmask before loop
Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:04 -08:00
André Goddard Rosa
82cb7ba10d serial: cascade needless conditionals
Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:04 -08:00
André Goddard Rosa
9e845abfc8 serial: fix NULL pointer dereference
If kzalloc() or alloc_tty_driver() fails, we call:
    put_tty_driver(normal = NULL).

Then:
    put_tty_driver -> tty_driver_kref_put -> kref_put(&NULL->kref, ...)

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:04 -08:00
Alan Cox
2a0785ea37 opticon: Fix resume logic
Opticon now takes the right mutex to check the port status but the status
check is done wrongly for the modern serial code, so fix it.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:04 -08:00
Alan Cox
82fc594343 usb_serial: Kill port mutex
The tty port has a port mutex used for all the port related locking so we
don't need the one in the USB serial layer any more.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:04 -08:00
Alan Cox
e1108a63e1 usb_serial: Use the shutdown() operation
As Alan Stern pointed out - now we have tty_port_open the shutdown method
and locking allow us to whack the other bits into the full helper methods
and provide a shutdown op which the tty port code will synchronize with 
setup for us.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:04 -08:00
Alan Cox
d774a56d23 tty_port: coding style cleaning pass
Mind the hoover wire...

Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:04 -08:00
Alan Cox
64bc397914 tty_port: add "tty_port_open" helper
For the moment this just moves the USB logic over and fixes the 'what if
we open and hangup at the same time' race noticed by Oliver Neukum.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:04 -08:00
Alan Cox
894cb91770 tty: stallion: kill BKL ioctl
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:03 -08:00
Alan Cox
626875381e tty: istallion: Kill off the BKL ioctl
Fairly trivial as the BKL push down into the methods has already been done.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:03 -08:00
Alan Cox
f53a2ade0b tty: esp: remove broken driver
The ESP driver has been marked broken for years. It's an old ISA device
that clearly nobody cares about any more. Remove it

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:03 -08:00
Alexey Dobriyan
1cceefd3a2 tty: const: constify remaining tty_operations
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:03 -08:00
Breno Leitao
e6bdf24cf2 jsm: adding EEH handlers
Adding EEH handlers for the serial jsm driver. This patch adds
the PCI error handlers and also register them to be called when
a error is detected.

Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Acked-by: Scott Kilau <scottk@digi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:03 -08:00
Breno Leitão
f9d1dff276 jsm: removing the field jsm_board->intr_count
Currently there is a field in the jsm_board structure to cont
the number of interrupt that the card recevived, but it's not
working properly when the IRQ line is shared, and also nowhere
else this field is used. So, This patch is removing it.

Signed-off-by: Breno Leitão <leitao@linux.vnet.ibm.com>
Cc: Scott Kilau <Scott.Kilau@digi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:03 -08:00
Breno Leitão
a44829dd8b jsm: Removing unused jsm_channel->ch_wopen field
Currently the jsm_channel->ch_wopen field is defined and never
used. So, this patch removes it.

Signed-off-by: Breno Leitão <leitao@linux.vnet.ibm.com>
Cc: Scott Kilau <Scott.Kilau@digi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:03 -08:00
Breno Leitão
2fd107c015 jsm: Remove ch_cpstime field
Currently the field jsm_channel->ch_cpstime is defined but never
used, so this patch removes it.

Signed-off-by: Breno Leitão <leitao@linux.vnet.ibm.com>
Cc: Scott Kilau <Scott.Kilau@digi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:03 -08:00
Breno Leitão
cead486f40 jsm: removing ch_old_baud field
Currently the field jsm_channel->ch_old_baud is not used, just
assigned in a lot of places but never used. This patches removes
this field.

Signed-off-by: Breno Leitão <leitao@linux.vnet.ibm.com>
Cc: Scott Kilau <scottk@digi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:02 -08:00
Breno Leitão
a53568a22a jsm: remove the ch_custom_speed field
Currently the ch_custom_speed field exists but is never used,
so, this patch removes it.

Signed-off-by: Breno Leitão <leitao@linux.vnet.ibm.com>
Cc: Scott Kilau <scottk@digi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:02 -08:00
Breno Leitão
354aaf964e jsm: Rewriting a bad log message
Actually jsm displays "Device Added" 8 times (for a 8 port device).
This silly patch just makes things more informative, showing
the port (instead of the device) that was added.

Signed-off-by: Breno Leitão <leitao@linux.vnet.ibm.com>
Cc: Scott Kilau <scottk@digi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:02 -08:00
Breno Leitão
a88a477f1d jsm: IRQ handlers doesn't need to have IRQ_DISABLED enabled
Currently jsm is showing the following message when loaded:

IRQ 432/JSM: IRQF_DISABLED is not guaranteed on shared IRQs

It's because the request_irq() is called using IRQF_DISABLED
and IRQF_SHARED.
Actually there is no need to use IRQF_DISABLED in this driver.

Signed-off-by: Breno Leitão <leitao@linux.vnet.ibm.com>
Cc: Scott Kilau <scottk@digi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:02 -08:00
Linus Torvalds
d71cb81af3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Add debugobjects support
2009-12-10 09:35:44 -08:00
Linus Torvalds
ab1831b0b8 Merge branch 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen: try harder to balloon up under memory pressure.
  Xen balloon: fix totalram_pages counting.
  xen: explicitly create/destroy stop_machine workqueues outside suspend/resume region.
  xen: improve error handling in do_suspend.
  xen: don't leak IRQs over suspend/resume.
  xen: call clock resume notifier on all CPUs
  xen: use iret for return from 64b kernel to 32b usermode
  xen: don't call dpm_resume_noirq() with interrupts disabled.
  xen: register runstate info for boot CPU early
  xen: register runstate on secondary CPUs
  xen: register timer interrupt with IRQF_TIMER
  xen: correctly restore pfn_to_mfn_list_list after resume
  xen: restore runstate_info even if !have_vcpu_info_placement
  xen: re-register runstate area earlier on resume.
  xen: wait up to 5 minutes for device connetion
  xen: improvement to wait_for_devices()
  xen: fix is_disconnected_device/exists_disconnected_device
  xen/xenbus: make DEVICE_ATTR()s static
2009-12-10 09:35:02 -08:00
Linus Torvalds
eae6fa9b0c Merge branch 'xen/fbdev' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'xen/fbdev' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen pvfb: Inhibit VM_IO flag to be set on vmalloc-ed framebuffers.
  fb-defio: Inhibit VM_IO flag to be set on vmalloc-ed framebuffers.
  fb-defio: If FBINFO_VIRTFB is defined, do not set VM_IO flag.
  Fix toogle whether xenbus driver should be built as module or part of kernel.
2009-12-10 09:34:40 -08:00
Linus Torvalds
02412f49f6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
  dlm: always use GFP_NOFS
2009-12-10 09:33:59 -08:00
Linus Torvalds
4515c3069d Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits)
  ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem)
  ext4: Do not override ext2 or ext3 if built they are built as modules
  jbd2: Export jbd2_log_start_commit to fix ext4 build
  ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT
  ext4: Wait for proper transaction commit on fsync
  ext4: fix incorrect block reservation on quota transfer.
  ext4: quota macros cleanup
  ext4: ext4_get_reserved_space() must return bytes instead of blocks
  ext4: remove blocks from inode prealloc list on failure
  ext4: wait for log to commit when umounting
  ext4: Avoid data / filesystem corruption when write fails to copy data
  ext4: Use ext4 file system driver for ext2/ext3 file system mounts
  ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks()
  jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer()
  ext4: remove unused parameter wbc from __ext4_journalled_writepage()
  ext4: remove encountered_congestion trace
  ext4: move_extent_per_page() cleanup
  ext4: initialize moved_len before calling ext4_move_extents()
  ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT
  ext4: use ext4_data_block_valid() in ext4_free_blocks()
  ...
2009-12-10 09:33:29 -08:00
Linus Torvalds
a5eba3f66f Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  exofs: Multi-device mirror support
  exofs: Move all operations to an io_engine
  exofs: move osd.c to ios.c
  exofs: statfs blocks is sectors not FS blocks
  exofs: Prints on mount and unmout
  exofs: refactor exofs_i_info initialization into common helper
  exofs: dbg-print less
  exofs: More sane debug print
  trivial: some small fixes in exofs documentation
2009-12-10 09:32:24 -08:00
Linus Torvalds
fc1495bf99 Merge git://git.infradead.org/ubifs-2.6
* git://git.infradead.org/ubifs-2.6:
  UBIFS: fix return code in check_leaf
  UBI: flush wl before clearing update marker
  MAINTAINERS: change e-mail of Artem Bityutskiy
  UBIFS: remove manual O_SYNC handling
  UBIFS: support mounting of UBI volume character devices
  UBI: Add ubi_open_volume_path
2009-12-10 09:31:45 -08:00
David Wong
5476ffd2b7 V4L/DVB (13592): max2165: 32bit build patch
This patch drops usage of floating point variable for 32bit build

Signed-off-by: David T. L. Wong <davidtlwong@gmail.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-12-10 12:45:18 -02:00
Boaz Harrosh
04dc1e88ad exofs: Multi-device mirror support
This patch changes on-disk format, it is accompanied with a parallel
patch to mkfs.exofs that enables multi-device capabilities.

After this patch, old exofs will refuse to mount a new formatted FS and
new exofs will refuse an old format. This is done by moving the magic
field offset inside the FSCB. A new FSCB *version* field was added. In
the future, exofs will refuse to mount unmatched FSCB version. To
up-grade or down-grade an exofs one must use mkfs.exofs --upgrade option
before mounting.

Introduced, a new object that contains a *device-table*. This object
contains the default *data-map* and a linear array of devices
information, which identifies the devices used in the filesystem. This
object is only written to offline by mkfs.exofs. This is why it is kept
separate from the FSCB, since the later is written to while mounted.

Same partition number, same object number is used on all devices only
the device varies.

* define the new format, then load the device table on mount time make
  sure every thing is supported.

* Change I/O engine to now support Mirror IO, .i.e write same data
  to multiple devices, read from a random device to spread the
  read-load from multiple clients (TODO: stripe read)

Implementation notes:
 A few points introduced in previous patch should be mentioned here:

* Special care was made so absolutlly all operation that have any chance
  of failing are done before any osd-request is executed. This is to
  minimize the need for a data consistency recovery, to only real IO
  errors.

* Each IO state has a kref. It starts at 1, any osd-request executed
  will increment the kref, finally when all are executed the first ref
  is dropped. At IO-done, each request completion decrements the kref,
  the last one to return executes the internal _last_io() routine.
  _last_io() will call the registered io_state_done. On sync mode a
  caller does not supply a done method, indicating a synchronous
  request, the caller is put to sleep and a special io_state_done is
  registered that will awaken the caller. Though also in sync mode all
  operations are executed in parallel.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10 09:59:23 +02:00
Boaz Harrosh
06886a5a3d exofs: Move all operations to an io_engine
In anticipation for multi-device operations, we separate osd operations
into an abstract I/O API. Currently only one device is used but later
when adding more devices, we will drive all devices in parallel according
to a "data_map" that describes how data is arranged on multiple devices.
The file system level operates, like before, as if there is one object
(inode-number) and an i_size. The io engine will split this to the same
object-number but on multiple device.

At first we introduce Mirror (raid 1) layout. But at the final outcome
we intend to fully implement the pNFS-Objects data-map, including
raid 0,4,5,6 over mirrored devices, over multiple device-groups. And
more. See: http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-obj-12

* Define an io_state based API for accessing osd storage devices
  in an abstract way.
  Usage:
	First a caller allocates an io state with:
		exofs_get_io_state(struct exofs_sb_info *sbi,
				   struct exofs_io_state** ios);

	Then calles one of:
		exofs_sbi_create(struct exofs_io_state *ios);
		exofs_sbi_remove(struct exofs_io_state *ios);
		exofs_sbi_write(struct exofs_io_state *ios);
		exofs_sbi_read(struct exofs_io_state *ios);
		exofs_oi_truncate(struct exofs_i_info *oi, u64 new_len);

	And when done
		exofs_put_io_state(struct exofs_io_state *ios);

* Convert all source files to use this new API
* Convert from bio_alloc to bio_kmalloc
* In io engine we make use of the now fixed osd_req_decode_sense

There are no functional changes or on disk additions after this patch.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10 09:59:22 +02:00
Boaz Harrosh
8ce9bdd1fb exofs: move osd.c to ios.c
If I do a "git mv" together with a massive code change
and commit in one patch, git looses the rename and
records a delete/new instead. This is bad because I want
a rename recorded so later rebased/cherry-picked patches
to the old name will work. Also the --follow is lost.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10 09:59:21 +02:00
Boaz Harrosh
cae012d853 exofs: statfs blocks is sectors not FS blocks
Even though exofs has a 4k block size, statfs blocks
is in sectors (512 bytes).

Also if target returns 0 for capacity then make it
ULLONG_MAX. df does not like zero-size filesystems

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10 09:59:21 +02:00
Boaz Harrosh
19fe294f2e exofs: Prints on mount and unmout
It is important to print in the logs when a filesystem was
mounted and eventually unmounted.

Print the osd-device's osd_name and pid the FS was
mounted/unmounted on.

TODO: How to also print the namespace path the filesystem was
      mounted on?

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10 09:59:20 +02:00