The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
which often go to the same mount point.
The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. To do that requires communication with all other CPUs
that may have taken a reference count.
We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.
- check the global sum once every interval (this will delay zero detection
for some interval, so it's probably a showstopper for vfsmounts).
- keep a local count and only taking the global sum when local reaches 0 (this
is difficult for vfsmounts, because we can't hold preempt off for the life of
a reference, so a counter would need to be per-thread or tied strongly to a
particular CPU which requires more locking).
- keep a local difference of increments and decrements, which allows us to sum
the total difference and hence find the refcount when summing all CPUs. Then,
keep a single integer "long" refcount for slow and long lasting references,
and only take the global sum of local counters when the long refcount is 0.
This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.
This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.
This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However code is otherwise bigger
and heavier, so single threaded performance is basically a wash.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Commit 2c01946c6b (omap3 nand: cleanup
virtual address usages) wrongly enabled CONFIG_MTD_NAND_OMAP_HWECC
which breaks boards like beagle and pandora that use software ECC
for write.
Boards like beagle and pandora uses sw ecc for write (e.g. binary flushed
from u-boot) and read from kernel.
Signed-off-by: Sukumar Ghorai <s-ghorai@ti.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
[tony@atomide.com: updated comments]
Signed-off-by: Tony Lindgren <tony@atomide.com>
My new shiny code for corrupted PEB detection has NOR specific bug.
We tread PEB as corrupted and preserve it, if
1. EC header is OK.
2. VID header is corrupted.
3. data area is not "all 0xFFs"
In case of NOR we have 'nor_erase_prepare()' quirk, which invalidates
the headers before erasing the PEB. And we invalidate first the VID
header, and then the EC header. So if a power cut happens after we have
invalidated the VID header, but before we have invalidated the EC
header, we end up with a PEB which satisfies the above 3 conditions,
and the scanning code will treat it as corrupted, and will print
scary warnings, wrongly.
This patch fixes the issue by firt invalidating the EC header, then
invalidating the VID header. In case of power cut inbetween, we still
just lose the EC header, and UBI can deal with this situation gracefully.
Thanks to Anatolij Gustschin <agust@denx.de> for tracking this down.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reported-by: Anatolij Gustschin <agust@denx.de>
Tested-by: Anatolij Gustschin <agust@denx.de>
Commit 45aafd3299 "UBI: tighten the corrupted PEB criteria"
introduced some return paths that didn't release the ubi->buf_mutex
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
* git://git.infradead.org/mtd-2.6: (82 commits)
mtd: fix build error in m25p80.c
mtd: Remove redundant mutex from mtd_blkdevs.c
MTD: Fix wrong check register_blkdev return value
Revert "mtd: cleanup Kconfig dependencies"
mtd: cfi_cmdset_0002: make sector erase command variable
mtd: cfi_cmdset_0002: add CFI detection for SST 38VF640x chips
mtd: cfi_util: add support for switching SST 39VF640xB chips into QRY mode
mtd: cfi_cmdset_0001: use defined value of P_ID_INTEL_PERFORMANCE instead of hardcoded one
block2mtd: dubious assignment
P4080/mtd: Fix the freescale lbc issue with 36bit mode
P4080/eLBC: Make Freescale elbc interrupt common to elbc devices
mtd: phram: use KBUILD_MODNAME
mtd: OneNAND: S5PC110: Fix double call suspend & resume function
mtd: nand: fix MTD_MODE_RAW writes
jffs2: use kmemdup
mtd: sm_ftl: cosmetic, use bool when possible
mtd: r852: remove useless pci powerup/down from suspend/resume routines
mtd: blktrans: fix a race vs kthread_stop
mtd: blktrans: kill BKL
mtd: allow to unload the mtdtrans module if its block devices aren't open
...
Fix up trivial whitespace-introduced conflict in drivers/mtd/mtdchar.c
While building an x86 distro kernel, I hit the following:
Kernel: arch/x86/boot/bzImage is ready (#7)
ERROR: "of_mtd_parse_partitions" [drivers/mtd/devices/m25p80.ko]
undefined!
of_mtd_parse_partitions is defined with MTD_OF_PARTS, and that's only
built on PPC and microblaze. The code in question should be wrapped w/
a stricter #ifdef.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In commit 2a48fc0ab2 ('block: autoconvert
trivial BKL users to private mutex'), Arnd replaced the BKL usage with a
mutex. However, Maxim has already provided a better fix in commit
480792b7bf ('mtd: blktrans: kill BKL'),
which was simply to remove the BKL without replacing it — since he'd
already made it do all necessary locking for itself.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Conflicts:
drivers/mtd/mtd_blkdevs.c
Merge Grant's device-tree bits so that we can apply the subsequent fixes.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
register_blkdev return 1..255 when major = 0.
if (ret ) {
printk(KERN_WARNING "Unable to register %s block device on major %d: %d\n",
tr->name, tr->major, ret);
mutex_unlock(&mtd_table_mutex);
return ret;
}
Above code will return fail when register_blkdev return allocated major number.
Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This reverts commit 432dc821c9.
The individual CFI geometry options were carefully set up to get sane
default values if the CFI_ADV_OPTIONS wasn't set, and it wasn't
appropriate to move them into an if/endif block.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Some old SST chips use 0x50 as sector erase command, instead
of 0x30. Make this value variable to handle such chips.
Signed-off-by: Guillaume LECERF <glecerf@gmail.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Add support for SST38VF640x chips in CFI mode.
Signed-off-by: Guillaume LECERF <glecerf@gmail.com>
Signed-off-by: yidong zhang <zhangyd6@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
When block2mtd_erase fails, a duplicated assignment instantly
changes instr->state from MTD_ERASE_FAILED to MTD_ERASE_DONE.
It looks to me like this might not be intended, or is it?
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Acked-By: Joern Engel <joern@logfs.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* 'davinci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-davinci: (50 commits)
davinci: fix remaining board support after io_pgoffst removal
davinci: mityomapl138: make file local data static
arm/davinci: remove duplicated include
davinci: Initial support for Omapl138-Hawkboard
davinci: MityDSP-L138/MityARM-1808 read MAC address from I2C Prom
davinci: add tnetv107x touchscreen platform device
input: add driver for tnetv107x touchscreen controller
davinci: add keypad config for tnetv107x evm board
davinci: add tnetv107x keypad platform device
input: add driver for tnetv107x on-chip keypad controller
net: davinci_emac: cleanup unused cpdma code
net: davinci_emac: switch to new cpdma layer
net: davinci_emac: separate out cpdma code
net: davinci_emac: cleanup unused mdio emac code
omap: cleanup unused davinci mdio arch code
davinci: cleanup mdio arch code and switch to phy_id
net: davinci_emac: switch to new mdio
omap: add mdio platform devices
davinci: add mdio platform devices
net: davinci_emac: separate out davinci mdio
...
Fix up trivial conflict in drivers/input/keyboard/Kconfig (two entries
added next to each other - one from the davinci merge, one from the
input merge)
* 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6:
mtd/m25p80: add support to parse the partitions by OF node
of/irq: of_irq.c needs to include linux/irq.h
of/mips: Cleanup some include directives/files.
of/mips: Add device tree support to MIPS
of/flattree: Eliminate need to provide early_init_dt_scan_chosen_arch
of/device: Rework to use common platform_device_alloc() for allocating devices
of/xsysace: Fix OF probing on little-endian systems
of: use __be32 types for big-endian device tree data
of/irq: remove references to NO_IRQ in drivers/of/platform.c
of/promtree: add package-to-path support to pdt
of/promtree: add of_pdt namespace to pdt code
of/promtree: no longer call prom_ functions directly; use an ops structure
of/promtree: make drivers/of/pdt.c no longer sparc-only
sparc: break out some PROM device-tree building code out into drivers/of
of/sparc: convert various prom_* functions to use phandle
sparc: stop exporting openprom.h header
powerpc, of_serial: Endianness issues setting up the serial ports
of: MTD: Fix OF probing on little-endian systems
of: GPIO: Fix OF probing on little-endian systems
When system uses 36bit physical address, res.start is 36bit
physical address. But the function of in_be32 returns 32bit
physical address. Then both of them compared each other is
wrong. So by converting the address of res.start into
the right format fixes this issue.
Signed-off-by: Lan Chunhe-B25806 <b25806@freescale.com>
Signed-off-by: Roy Zang <tie-fei.zang@freescale.com>
Reviewed-by: Anton Vorontsov <cbouatmailru@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Move Freescale elbc interrupt from nand driver to elbc driver.
Then all elbc devices can use the interrupt instead of ONLY nand.
For former nand driver, it had the two functions:
1. detecting nand flash partitions;
2. registering elbc interrupt.
Now, second function is removed to fsl_lbc.c.
Signed-off-by: Lan Chunhe-B25806 <b25806@freescale.com>
Signed-off-by: Roy Zang <tie-fei.zang@freescale.com>
Reviewed-by: Anton Vorontsov <cbouatmailru@gmail.com>
Cc: Wood Scott-B07421 <B07421@freescale.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Use the more standard #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
No change in output strings.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The suspend & resume called from mtd core. So no need to call at driver.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
RAW writes were broken by 782ce79a45
which introduced a check of ops->ooboffs in nand_do_write_ops().
When writing in RAW mode this is called with an ops struct on the stack
of mtdchar.c:mtd_write() which does not initialise ops->ooboffs, so it
is garbage and fails this test.
This test does not make sense if ops->oobbuf is NULL, which it is in the
RAW write path, so include that in the test.
Signed-off-by: Jon Povey <jon.povey@racelogic.co.uk>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
I didn't know that kernel allows use of that typedef.
Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
It turns out that pci core now handles these, so this code is redundant
and can even cause bugs
Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
There is small race window that could make kthread_stop hang forever.
I found that while hacking the IR subsystem.
Signed-off-by: Maxim Levitsky <maximlevisky@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
It not needed, because I already added locking for all fops
methods.
Signed-off-by: Maxim Levitsky <maximlevisky@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Now it once again possible to remove mtdtrans module.
You still need to ensure that block devices of that module aren't mounted.
This is due to the fact that as long as a block device is open, it still exists,
therefore if we were to allow module removal, this block device might became used again.
This time in addition to code review, I also made the code
pass some torture tests like module reload in a loop + read in a loop +
card insert/removal all at same time.
The blktrans_open/blktrans_release don't take the mtd table lock because:
While device is added (that includes execution of add_mtd_blktrans_dev)
the lock is already taken
Now suppose the device will never be removed. In this case even if we have changes
in mtd table, the entry that we need will stay exactly the same. (Note that we don't
look at table at all, just following private pointer of block device).
Now suppose that someone tries to remove the mtd device.
This will be propagated to trans driver which _ought_ to call del_mtd_blktrans_dev
which will take the per device lock, release the mtd device and set trans->mtd = NULL.
>From this point on, following opens won't even be able to know anything about that mtd device
(which at that point is likely not to exist)
Also the same care is taken not to trip over NULL mtd pointer in blktrans_dev_release.
Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
As reported on lkml, building this module for HIMEM systems spews warnings
about mismatch in pointer types. Further, we need to use ioremap() in order
to properly access the flash memory on most systems rather than just doing
it directly.
Reported-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The bbt structure isn't actually used, just the badblockpos. This lets
the driver correctly handle badblocks with the different OOB layout with
certain sized flashes. Previously, the blocks would all be reported as
bad and be completely unusable.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
- remove disabled code (hasn't been touched since the beginning of git
and should be reimplemented if really needed)
- convert remaining c++-comments to plain c-style
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Guillaume LECERF <glecerf@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
For 4Gib non-DDP chip it does not follow that it is always 4KiB page chip.
The number of data buffers is checked and if it is equal to 1
we suppose that it is 4KiB page onenand chip.
Signed-off-by: Roman Tereshonkov <roman.tereshonkov@nokia.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch changes the loop over the "reg" tuples to not exit
directly upon of_address_to_resource() failure but to continue
with the next "reg" tuple instead. This failure could be due to
size = 0, which might be passed via the device-tree.
This is needed for boards, where a "reg" tuple might have size 0
(of_address_to_resource() returns with EINVAL when size = 0).
Example:
Fully equipped board:
reg = <0 0x00000000 0x00400000
0 0x00400000 0x00400000>;
Partially equipped board:
reg = <0 0x00000000 0x00400000
0 0x00400000 0x00000000>;
This could be the case on boards with runtime detection of
multiple NOR flash configurations where the detected flash size
is inserted into the dtb in U-Boot.
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch fixes sparse warning for static declaration of variable "use_dma"
drivers/mtd/nand/omap2.c:114:11: warning: symbol 'use_dma' was not declared. Should it be static?
Signed-off-by: G, Manjunath Kondaiah <manjugk@ti.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch adds the appropriate conversions to correct the endianness
issues in the MTD driver whenever it accesses the device tree (which is
always big endian).
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
I used this to check the BBT on flash together with a hack in mtdchar in
order to read bad blocks.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
it will create an empty BBT table without considering vendor's BBT
information. Vendor's information may be unavailable if the NAND
controller has a different DATA & OOB layout or this information may be
allready purged.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The first (sixt) byte in the OOB area contains vendor's bad block
information. During identification of the NAND chip this information is
collected by scanning the complete chip.
The option NAND_USE_FLASH_BBT is used to store this information in a sector so
we don't have to scan the complete flash. Unfortunately the code stores
a marker in order to recognize the BBT in the OOB area. This will fail
if the OOB area is completely used for ECC.
This patch introduces the option NAND_USE_FLASH_BBT_NO_OOB which has to be
used with NAND_USE_FLASH_BBT. It will then store BBT on flash without
touching the OOB area. The BBT format on flash remains same except the
first page starts with the recognition pattern followed by the version byte.
This change was tested in nandsim and it looks good so far :)
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
No code change.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
S5PC210 has the same OneNAND controller as S5PC110
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Implement DMA interrupt method. previous time it polls the DMA status.
It can reduce the CPU power but decrease the performance a little.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
When use HIGHMEM, dma_map_single doesn't get the proper DMA address.
So use the dma_map_page in this case.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>