[PATCH] page migration: Update documentation

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
parent 04e62a29bf
commit 8d3c138b77
1 changed files with 33 additions and 58 deletions
@@ -62,15 +62,15 @@ A. In kernel use of migrate_pages()
 It also prevents the swapper or other scans to encounter
 the page.
 
-2. Generate a list of newly allocates page. These pages will contain the
+2. Generate a list of newly allocates pages. These pages will contain the
 contents of the pages from the first list after page migration is
 complete.
 
 3. The migrate_pages() function is called which attempts
 to do the migration. It returns the moved pages in the
 list specified as the third parameter and the failed
-migrations in the fourth parameter. The first parameter
-will contain the pages that could still be retried.
+migrations in the fourth parameter. When the function
+returns the first list will contain the pages that could still be retried.
 
 4. The leftover pages of various types are returned
 to the LRU using putback_to_lru_pages() or otherwise
@@ -93,83 +93,58 @@ Steps:
 
 2. Insure that writeback is complete.
 
-3. Make sure that the page has assigned swap cache entry if
-it is an anonyous page. The swap cache reference is necessary
-to preserve the information contain in the page table maps while
-page migration occurs.
-
-4. Prep the new page that we want to move to. It is locked
+3. Prep the new page that we want to move to. It is locked
 and set to not being uptodate so that all accesses to the new
 page immediately lock while the move is in progress.
 
-5. All the page table references to the page are either dropped (file
-backed pages) or converted to swap references (anonymous pages).
-This should decrease the reference count.
+4. The new page is prepped with some settings from the old page so that
+accesses to the new page will discover a page with the correct settings.
+
+5. All the page table references to the page are converted
+to migration entries or dropped (nonlinear vmas).
+This decrease the mapcount of a page. If the resulting
+mapcount is not zero then we do not migrate the page.
+All user space processes that attempt to access the page
+will now wait on the page lock.
 
 6. The radix tree lock is taken. This will cause all processes trying
-to reestablish a pte to block on the radix tree spinlock.
+to access the page via the mapping to block on the radix tree spinlock.
 
 7. The refcount of the page is examined and we back out if references remain
 otherwise we know that we are the only one referencing this page.
 
 8. The radix tree is checked and if it does not contain the pointer to this
-page then we back out because someone else modified the mapping first.
+page then we back out because someone else modified the radix tree.
 
-9. The mapping is checked. If the mapping is gone then a truncate action may
-be in progress and we back out.
+9. The radix tree is changed to point to the new page.
 
-10. The new page is prepped with some settings from the old page so that
-accesses to the new page will be discovered to have the correct settings.
+10. The reference count of the old page is dropped because the radix tree
+reference is gone. A reference to the new page is established because
+the new page is referenced to by the radix tree.
 
-11. The radix tree is changed to point to the new page.
+11. The radix tree lock is dropped. With that lookups in the mapping
+become possible again. Processes will move from spinning on the tree_lock
+to sleeping on the locked new page.
 
-12. The reference count of the old page is dropped because the radix tree
-reference is gone.
+12. The page contents are copied to the new page.
 
-13. The radix tree lock is dropped. With that lookups become possible again
-and other processes will move from spinning on the tree lock to sleeping on
-the locked new page.
+13. The remaining page flags are copied to the new page.
 
-14. The page contents are copied to the new page.
+14. The old page flags are cleared to indicate that the page does
+not provide any information anymore.
 
-15. The remaining page flags are copied to the new page.
+15. Queued up writeback on the new page is triggered.
 
-16. The old page flags are cleared to indicate that the page does
-not use any information anymore.
-
-17. Queued up writeback on the new page is triggered.
-
-18. If swap pte's were generated for the page then replace them with real
-ptes. This will reenable access for processes not blocked by the page lock.
+16. If migration entries were page then replace them with real ptes. Doing
+so will enable access for user space processes not already waiting for
+the page lock.
 
 19. The page locks are dropped from the old and new page.
-Processes waiting on the page lock can continue.
+Processes waiting on the page lock will redo their page faults
+and will reach the new page.
 
 20. The new page is moved to the LRU and can be scanned by the swapper
 etc again.
 
-TODO list
----------
-
-- Page migration requires the use of swap handles to preserve the
-information of the anonymous page table entries. This means that swap
-space is reserved but never used. The maximum number of swap handles used
-is determined by CHUNK_SIZE (see mm/mempolicy.c) per ongoing migration.
-Reservation of pages could be avoided by having a special type of swap
-handle that does not require swap space and that would only track the page
-references. Something like that was proposed by Marcelo Tosatti in the
-past (search for migration cache on lkml or linux-mm@kvack.org).
-
-- Page migration unmaps ptes for file backed pages and requires page
-faults to reestablish these ptes. This could be optimized by somehow
-recording the references before migration and then reestablish them later.
-However, there are several locking challenges that have to be overcome
-before this is possible.
-
-- Page migration generates read ptes for anonymous pages. Dirty page
-faults are required to make the pages writable again. It may be possible
-to generate a pte marked dirty if it is known that the page is dirty and
-that this process has the only reference to that page.
-
-Christoph Lameter, March 8, 2006.
+Christoph Lameter, May 8, 2006.
 
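
The ordering of new steps 5 through 11 (mapcount check, taking the radix tree lock, refcount check, slot check, slot replacement, reference transfer, unlock) may be easier to follow as code. What follows is a rough single-threaded userspace sketch of that ordering only, not the kernel implementation. Every name in it (toy_page, toy_mapping, toy_replace_in_mapping and the lock helpers) is hypothetical, the radix tree is reduced to a single slot, the spinlock is modeled as a plain flag, and the expected reference count of 2 (one reference from the tree, one from the migrating caller) is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are not the kernel's types. */
struct toy_page {
	int refcount;	/* references held on the page */
	int mapcount;	/* page table mappings still pointing at the page */
};

struct toy_mapping {
	bool tree_locked;	/* stands in for the radix tree lock */
	struct toy_page *slot;	/* the whole "radix tree" is one slot here */
};

static void toy_tree_lock(struct toy_mapping *m)
{
	assert(!m->tree_locked);	/* the toy lock is not recursive */
	m->tree_locked = true;
}

static void toy_tree_unlock(struct toy_mapping *m)
{
	assert(m->tree_locked);
	m->tree_locked = false;
}

/*
 * Returns 0 if the slot now points at newp, -1 if we backed out.
 * Assumption: an otherwise unused page holds exactly two references,
 * one from the radix tree and one from the caller doing the migration.
 */
int toy_replace_in_mapping(struct toy_mapping *m,
			   struct toy_page *oldp, struct toy_page *newp)
{
	int ret = -1;

	/* Step 5: a nonzero mapcount means we do not migrate the page. */
	if (oldp->mapcount != 0)
		return ret;

	toy_tree_lock(m);		/* step 6 */

	if (oldp->refcount != 2)	/* step 7: other references remain */
		goto out;
	if (m->slot != oldp)		/* step 8: the tree was modified */
		goto out;

	m->slot = newp;			/* step 9 */
	oldp->refcount--;		/* step 10: tree reference is gone */
	newp->refcount++;		/* step 10: tree now references newp */
	ret = 0;
out:
	toy_tree_unlock(m);		/* step 11 */
	return ret;
}
```

A caller would set up an old page with two references, a new page with one, point the slot at the old page and call toy_replace_in_mapping(). On success the slot points at the new page and one reference has moved from the old page to the new one. With an extra reference on the old page the function backs out and leaves the slot untouched.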