x86, ACPI, mm: Revert movablemem_map support
Tim found:

  WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
  Hardware name: S2600CP
  sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
  smpboot: Booting Node 1, Processors #1
  Modules linked in:
  Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
  Call Trace:
    set_cpu_sibling_map+0x279/0x449
    start_secondary+0x11d/0x1e5

Don Morris reproduced on a HP z620 workstation, and bisected it to
commit e8d1955258 ("acpi, memory-hotplug: parse SRAT before memblock
is ready")

It turns out movable_map has some problems, and it breaks several things

1. numa_init is called several times, NOT just for srat. so those
	nodes_clear(numa_nodes_parsed)
	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
   can not be just removed. Need to consider sequence is: numaq, srat, amd, dummy.
   and make fall back path working.

2. simply split acpi_numa_init to early_parse_srat.
   a. that early_parse_srat is NOT called for ia64, so you break ia64.
   b. for (i = 0; i < MAX_LOCAL_APIC; i++)
	     set_apicid_to_node(i, NUMA_NO_NODE)
      still left in numa_init. So it will just clear result from early_parse_srat.
      it should be moved before that....
   c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
      early before override from INITRD is settled.

3. that patch TITLE is total misleading, there is NO x86 in the title,
   but it changes critical x86 code. It caused x86 guys did not
   pay attention to find the problem early. Those patches really should
   be routed via tip/x86/mm.

4. after that commit, following range can not use movable ram:
   a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
   b. initrd... it will be freed after booting, so it could be on movable...
   c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
      anymore.
   d. init_mem_mapping: can not put page table high anymore.
   e. initmem_init: vmemmap can not be high local node anymore. That is
      not good.

If node is hotplugable, the mem related range like page table and
vmemmap could be on the that node without problem and should be on that
node.

We have workaround patch that could fix some problems, but some can not
be fixed.

So just remove that offending commit and related ones including:

 f7210e6c4a ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
    protect movablecore_map in memblock_overlaps_region().")

 01a178a94e ("acpi, memory-hotplug: support getting hotplug info from
    SRAT")

 27168d38fa ("acpi, memory-hotplug: extend movablemem_map ranges to
    the end of node")

 e8d1955258 ("acpi, memory-hotplug: parse SRAT before memblock is
    ready")

 fb06bc8e5f ("page_alloc: bootmem limit with movablecore_map")

 42f47e27e7 ("page_alloc: make movablemem_map have higher priority")

 6981ec3114 ("page_alloc: introduce zone_movable_limit[] to keep
    movable limit for nodes")

 34b71f1e04 ("page_alloc: add movable_memmap kernel parameter")

 4d59a75125 ("x86: get pg_data_t's memory from other node")

Later we should have patches that will make sure kernel put page table
and vmemmap on local node ram instead of push them down to node0.

Also need to find way to put other kernel used ram to local node ram.

Reported-by: Tim Gardner <tim.gardner@canonical.com>
Reported-by: Don Morris <don.morris@hp.com>
Bisected-by: Don Morris <don.morris@hp.com>
Tested-by: Don Morris <don.morris@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent 14cc0b55b7
commit 20e6926dcb
10 changed files with 27 additions and 544 deletions
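Point 1 of the commit message (numa_init is called once per detection method, so shared state must be cleared before every attempt) can be modelled in userspace. The sketch below is not kernel code: `struct numa_state`, `broken_srat_init`, `dummy_init`, `numa_init_sketch` and `try_all` are hypothetical names, and the real sequence (numaq, srat, amd, dummy) is reduced to two methods.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for numa_nodes_parsed and numa_meminfo. */
struct numa_state {
	unsigned long nodes_parsed;	/* bitmask of parsed node ids */
	int nr_blks;			/* number of memory blocks recorded */
};

static struct numa_state state;

/* A detection method that records partial data and then fails. */
static int broken_srat_init(void)
{
	state.nodes_parsed |= 1UL << 1;
	state.nr_blks++;
	return -1;
}

/* The last-resort method: a single node 0 covering everything. */
static int dummy_init(void)
{
	state.nodes_parsed |= 1UL << 0;
	state.nr_blks++;
	return 0;
}

/* Clear the shared state before every attempt, then run one method. */
static int numa_init_sketch(int (*init_func)(void))
{
	assert(init_func);
	memset(&state, 0, sizeof(state));
	return init_func();
}

/* Try each method in order; return 0 as soon as one succeeds. */
static int try_all(void)
{
	int (*methods[])(void) = { broken_srat_init, dummy_init };
	size_t i;

	for (i = 0; i < sizeof(methods) / sizeof(methods[0]); i++)
		if (!numa_init_sketch(methods[i]))
			return 0;
	return -1;
}
```

If the `memset()` in `numa_init_sketch()` is dropped, the node recorded by the failed method leaks into the fallback result, which is the kind of stale state the message warns about.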
@@ -1645,42 +1645,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			that the amount of memory usable for all allocations
 			is not too small.
 
-	movablemem_map=acpi
-			[KNL,X86,IA-64,PPC] This parameter is similar to
-			memmap except it specifies the memory map of
-			ZONE_MOVABLE.
-			This option inform the kernel to use Hot Pluggable bit
-			in flags from SRAT from ACPI BIOS to determine which
-			memory devices could be hotplugged. The corresponding
-			memory ranges will be set as ZONE_MOVABLE.
-			NOTE: Whatever node the kernel resides in will always
-			      be un-hotpluggable.
-
-	movablemem_map=nn[KMG]@ss[KMG]
-			[KNL,X86,IA-64,PPC] This parameter is similar to
-			memmap except it specifies the memory map of
-			ZONE_MOVABLE.
-			If user specifies memory ranges, the info in SRAT will
-			be ingored. And it works like the following:
-			- If more ranges are all within one node, then from
-			  lowest ss to the end of the node will be ZONE_MOVABLE.
-			- If a range is within a node, then from ss to the end
-			  of the node will be ZONE_MOVABLE.
-			- If a range covers two or more nodes, then from ss to
-			  the end of the 1st node will be ZONE_MOVABLE, and all
-			  the rest nodes will only have ZONE_MOVABLE.
-			If memmap is specified at the same time, the
-			movablemem_map will be limited within the memmap
-			areas. If kernelcore or movablecore is also specified,
-			movablemem_map will have higher priority to be
-			satisfied. So the administrator should be careful that
-			the amount of movablemem_map areas are not too large.
-			Otherwise kernel won't have enough memory to start.
-			NOTE: We don't stop users specifying the node the
-			      kernel resides in as hotpluggable so that this
-			      option can be used as a workaround of firmware
-			      bugs.
-
 	MTD_Partition=	[MTD]
 			Format: <name>,<region-number>,<size>,<offset>
 
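The removed `movablemem_map=nn[KMG]@ss[KMG]` syntax is a size, an `@`, and a start address, each with an optional K/M/G suffix. A minimal userspace sketch of that parsing follows; `parse_size` is an assumed stand-in for the kernel's `memparse()` built on `strtoull()`, and `parse_range` is a hypothetical helper, not the removed `cmdline_parse_movablemem_map()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-in for memparse(): a number plus an optional K/M/G suffix. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Parse "nn[KMG]@ss[KMG]" into size and start; 0 on success, -1 on error. */
static int parse_range(const char *p, unsigned long long *size,
		       unsigned long long *start)
{
	char *end;

	assert(p && size && start);

	*size = parse_size(p, &end);
	if (end == p || *end != '@')
		return -1;

	p = end + 1;
	*start = parse_size(p, &end);
	if (end == p || *end != '\0')
		return -1;

	return 0;
}
```

For example, `4G@8G` yields a size of 4 GiB starting at 8 GiB, while a string with no `@` part is rejected. The `acpi` keyword is handled separately in the removed kernel code and is simply an error here.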
@@ -1056,15 +1056,6 @@ void __init setup_arch(char **cmdline_p)
 	setup_bios_corruption_check();
 #endif
 
-	/*
-	 * In the memory hotplug case, the kernel needs info from SRAT to
-	 * determine which memory is hotpluggable before allocating memory
-	 * using memblock.
-	 */
-	acpi_boot_table_init();
-	early_acpi_boot_init();
-	early_parse_srat();
-
 #ifdef CONFIG_X86_32
 	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
 			(max_pfn_mapped<<PAGE_SHIFT) - 1);
@@ -1110,6 +1101,10 @@ void __init setup_arch(char **cmdline_p)
 	/*
 	 * Parse the ACPI tables for possible boot-time SMP configuration.
 	 */
+	acpi_boot_table_init();
+
+	early_acpi_boot_init();
+
 	initmem_init();
 	memblock_find_dma_reserve();
 
@@ -212,9 +212,10 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
 	 * Allocate node data.  Try node-local memory and then any node.
 	 * Never allocate in DMA zone.
 	 */
-	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
+	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
 	if (!nd_pa) {
-		pr_err("Cannot find %zu bytes in any node\n", nd_size);
+		pr_err("Cannot find %zu bytes in node %d\n",
+		       nd_size, nid);
 		return;
 	}
 	nd = __va(nd_pa);
@@ -559,12 +560,10 @@ static int __init numa_init(int (*init_func)(void))
 	for (i = 0; i < MAX_LOCAL_APIC; i++)
 		set_apicid_to_node(i, NUMA_NO_NODE);
 
-	/*
-	 * Do not clear numa_nodes_parsed or zero numa_meminfo here, because
-	 * SRAT was parsed earlier in early_parse_srat().
-	 */
+	nodes_clear(numa_nodes_parsed);
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
+	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
 	WARN_ON(memblock_set_node(0, ULLONG_MAX, MAX_NUMNODES));
 	numa_reset_distance();
 
@@ -141,126 +141,11 @@ static inline int save_add_info(void) {return 1;}
 static inline int save_add_info(void) {return 0;}
 #endif
 
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
-static void __init
-handle_movablemem(int node, u64 start, u64 end, u32 hotpluggable)
-{
-	int overlap, i;
-	unsigned long start_pfn, end_pfn;
-
-	start_pfn = PFN_DOWN(start);
-	end_pfn = PFN_UP(end);
-
-	/*
-	 * For movablemem_map=acpi:
-	 *
-	 * SRAT:           |_____| |_____| |_________| |_________| ......
-	 * node id:           0       1         1           2
-	 * hotpluggable:      n       y         y           n
-	 * movablemem_map:         |_____| |_________|
-	 *
-	 * Using movablemem_map, we can prevent memblock from allocating memory
-	 * on ZONE_MOVABLE at boot time.
-	 *
-	 * Before parsing SRAT, memblock has already reserve some memory ranges
-	 * for other purposes, such as for kernel image. We cannot prevent
-	 * kernel from using these memory, so we need to exclude these memory
-	 * even if it is hotpluggable.
-	 * Furthermore, to ensure the kernel has enough memory to boot, we make
-	 * all the memory on the node which the kernel resides in
-	 * un-hotpluggable.
-	 */
-	if (hotpluggable && movablemem_map.acpi) {
-		/* Exclude ranges reserved by memblock. */
-		struct memblock_type *rgn = &memblock.reserved;
-
-		for (i = 0; i < rgn->cnt; i++) {
-			if (end <= rgn->regions[i].base ||
-			    start >= rgn->regions[i].base +
-			    rgn->regions[i].size)
-				continue;
-
-			/*
-			 * If the memory range overlaps the memory reserved by
-			 * memblock, then the kernel resides in this node.
-			 */
-			node_set(node, movablemem_map.numa_nodes_kernel);
-
-			goto out;
-		}
-
-		/*
-		 * If the kernel resides in this node, then the whole node
-		 * should not be hotpluggable.
-		 */
-		if (node_isset(node, movablemem_map.numa_nodes_kernel))
-			goto out;
-
-		insert_movablemem_map(start_pfn, end_pfn);
-
-		/*
-		 * numa_nodes_hotplug nodemask represents which nodes are put
-		 * into movablemem_map.map[].
-		 */
-		node_set(node, movablemem_map.numa_nodes_hotplug);
-		goto out;
-	}
-
-	/*
-	 * For movablemem_map=nn[KMG]@ss[KMG]:
-	 *
-	 * SRAT:           |_____| |_____| |_________| |_________| ......
-	 * node id:           0       1         1           2
-	 * user specified:           |__|                 |___|
-	 * movablemem_map:           |___| |_________|    |______| ......
-	 *
-	 * Using movablemem_map, we can prevent memblock from allocating memory
-	 * on ZONE_MOVABLE at boot time.
-	 *
-	 * NOTE: In this case, SRAT info will be ingored.
-	 */
-	overlap = movablemem_map_overlap(start_pfn, end_pfn);
-	if (overlap >= 0) {
-		/*
-		 * If part of this range is in movablemem_map, we need to
-		 * add the range after it to extend the range to the end
-		 * of the node, because from the min address specified to
-		 * the end of the node will be ZONE_MOVABLE.
-		 */
-		start_pfn = max(start_pfn,
-			    movablemem_map.map[overlap].start_pfn);
-		insert_movablemem_map(start_pfn, end_pfn);
-
-		/*
-		 * Set the nodemask, so that if the address range on one node
-		 * is not continuse, we can add the subsequent ranges on the
-		 * same node into movablemem_map.
-		 */
-		node_set(node, movablemem_map.numa_nodes_hotplug);
-	} else {
-		if (node_isset(node, movablemem_map.numa_nodes_hotplug))
-			/*
-			 * Insert the range if we already have movable ranges
-			 * on the same node.
-			 */
-			insert_movablemem_map(start_pfn, end_pfn);
-	}
-out:
-	return;
-}
-#else		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
-static inline void
-handle_movablemem(int node, u64 start, u64 end, u32 hotpluggable)
-{
-}
-#endif		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
-
 /* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
 int __init
 acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 {
 	u64 start, end;
-	u32 hotpluggable;
 	int node, pxm;
 
 	if (srat_disabled())
@@ -269,8 +154,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 		goto out_err_bad_srat;
 	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
 		goto out_err;
-	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
-	if (hotpluggable && !save_add_info())
+	if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
 		goto out_err;
 
 	start = ma->base_address;
@@ -290,12 +174,9 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 
 	node_set(node, numa_nodes_parsed);
 
-	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
+	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
 	       node, pxm,
-	       (unsigned long long) start, (unsigned long long) end - 1,
-	       hotpluggable ? "Hot Pluggable": "");
-
-	handle_movablemem(node, start, end, hotpluggable);
+	       (unsigned long long) start, (unsigned long long) end - 1);
 
 	return 0;
 out_err_bad_srat:
@@ -282,10 +282,10 @@ acpi_table_parse_srat(enum acpi_srat_type id,
 					    handler, max_entries);
 }
 
-static int srat_mem_cnt;
-
-void __init early_parse_srat(void)
+int __init acpi_numa_init(void)
 {
+	int cnt = 0;
+
 	/*
 	 * Should not limit number with cpu num that is from NR_CPUS or nr_cpus=
 	 * SRAT cpu entries could have different order with that in MADT.
@@ -295,24 +295,21 @@ void __init early_parse_srat(void)
 	/* SRAT: Static Resource Affinity Table */
 	if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
-				      acpi_parse_x2apic_affinity, 0);
+				     acpi_parse_x2apic_affinity, 0);
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
-				      acpi_parse_processor_affinity, 0);
-		srat_mem_cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
-						     acpi_parse_memory_affinity,
-						     NR_NODE_MEMBLKS);
+				     acpi_parse_processor_affinity, 0);
+		cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
+					    acpi_parse_memory_affinity,
+					    NR_NODE_MEMBLKS);
 	}
-}
 
-int __init acpi_numa_init(void)
-{
 	/* SLIT: System Locality Information Table */
 	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
 
 	acpi_numa_arch_fixup();
 
-	if (srat_mem_cnt < 0)
-		return srat_mem_cnt;
+	if (cnt < 0)
+		return cnt;
 	else if (!parsed_numa_memblks)
 		return -ENOENT;
 	return 0;
@@ -485,14 +485,6 @@ static inline bool acpi_driver_match_device(struct device *dev,
 
 #endif	/* !CONFIG_ACPI */
 
-#ifdef CONFIG_ACPI_NUMA
-void __init early_parse_srat(void);
-#else
-static inline void early_parse_srat(void)
-{
-}
-#endif
-
 #ifdef CONFIG_ACPI
 void acpi_os_set_prepare_sleep(int (*func)(u8 sleep_state,
 			       u32 pm1a_ctrl, u32 pm1b_ctrl));
@@ -42,7 +42,6 @@ struct memblock {
 
 extern struct memblock memblock;
 extern int memblock_debug;
-extern struct movablemem_map movablemem_map;
 
 #define memblock_dbg(fmt, ...) \
 	if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
@@ -61,7 +60,6 @@ int memblock_reserve(phys_addr_t base, phys_addr_t size);
 void memblock_trim_memory(phys_addr_t align);
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
-
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 			  unsigned long *out_end_pfn, int *out_nid);
 
@@ -1333,24 +1333,6 @@ extern void free_bootmem_with_active_regions(int nid,
 						unsigned long max_low_pfn);
 extern void sparse_memory_present_with_active_regions(int nid);
 
-#define MOVABLEMEM_MAP_MAX MAX_NUMNODES
-struct movablemem_entry {
-	unsigned long start_pfn;    /* start pfn of memory segment */
-	unsigned long end_pfn;      /* end pfn of memory segment (exclusive) */
-};
-
-struct movablemem_map {
-	bool acpi;	/* true if using SRAT info */
-	int nr_map;
-	struct movablemem_entry map[MOVABLEMEM_MAP_MAX];
-	nodemask_t numa_nodes_hotplug;	/* on which nodes we specify memory */
-	nodemask_t numa_nodes_kernel;	/* on which nodes kernel resides in */
-};
-
-extern void __init insert_movablemem_map(unsigned long start_pfn,
-					 unsigned long end_pfn);
-extern int __init movablemem_map_overlap(unsigned long start_pfn,
-					 unsigned long end_pfn);
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 #if !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && \
@@ -92,58 +92,9 @@ static long __init_memblock memblock_overlaps_region(struct memblock_type *type,
  *
  * Find @size free area aligned to @align in the specified range and node.
  *
- * If we have CONFIG_HAVE_MEMBLOCK_NODE_MAP defined, we need to check if the
- * memory we found if not in hotpluggable ranges.
- *
  * RETURNS:
  * Found address on success, %0 on failure.
  */
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
-phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
-					phys_addr_t end, phys_addr_t size,
-					phys_addr_t align, int nid)
-{
-	phys_addr_t this_start, this_end, cand;
-	u64 i;
-	int curr = movablemem_map.nr_map - 1;
-
-	/* pump up @end */
-	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
-		end = memblock.current_limit;
-
-	/* avoid allocating the first page */
-	start = max_t(phys_addr_t, start, PAGE_SIZE);
-	end = max(start, end);
-
-	for_each_free_mem_range_reverse(i, nid, &this_start, &this_end, NULL) {
-		this_start = clamp(this_start, start, end);
-		this_end = clamp(this_end, start, end);
-
-restart:
-		if (this_end <= this_start || this_end < size)
-			continue;
-
-		for (; curr >= 0; curr--) {
-			if ((movablemem_map.map[curr].start_pfn << PAGE_SHIFT)
-			    < this_end)
-				break;
-		}
-
-		cand = round_down(this_end - size, align);
-		if (curr >= 0 &&
-		    cand < movablemem_map.map[curr].end_pfn << PAGE_SHIFT) {
-			this_end = movablemem_map.map[curr].start_pfn
-				   << PAGE_SHIFT;
-			goto restart;
-		}
-
-		if (cand >= this_start)
-			return cand;
-	}
-
-	return 0;
-}
-#else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 					phys_addr_t end, phys_addr_t size,
 					phys_addr_t align, int nid)
@@ -172,7 +123,6 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 	}
 	return 0;
 }
-#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 /**
  * memblock_find_in_range - find free area in given range
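The removed variant of memblock_find_in_range_node() searches from the top of a free range downwards and, whenever the candidate would land inside a movable range, lowers the top of the window to the start of that range and retries. A simplified userspace sketch of that loop over a single free window is shown here; `struct range` and `find_top_down` are hypothetical names, addresses are plain bytes instead of PFNs, and the excluded list is assumed to be sorted and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };	/* half-open: [start, end) */

/*
 * Return the highest align-aligned address in [lo, hi) at which size bytes
 * fit without touching any range in excl[] (sorted by start, no overlaps).
 * Returns 0 if nothing fits. align must be a power of two.
 */
static uint64_t find_top_down(uint64_t lo, uint64_t hi, uint64_t size,
			      uint64_t align, const struct range *excl, int n)
{
	int curr = n - 1;

	assert(align && (align & (align - 1)) == 0);

	while (hi > lo && hi >= size) {
		uint64_t cand = (hi - size) & ~(align - 1);

		/* Skip excluded ranges that start at or above the window top. */
		while (curr >= 0 && excl[curr].start >= hi)
			curr--;

		/* Candidate collides: move the window top below that range. */
		if (curr >= 0 && cand < excl[curr].end) {
			hi = excl[curr].start;
			continue;
		}

		return cand >= lo ? cand : 0;
	}
	return 0;
}
```

With a free window of [0x1000, 0x10000) and [0x8000, 0x10000) excluded, a one-page request is pushed down to 0x7000. This is the behaviour the commit message objects to in point 4: early allocations such as page tables can no longer be placed high on a hotpluggable node.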
285
mm/page_alloc.c
@@ -202,18 +202,11 @@ static unsigned long __meminitdata nr_all_pages;
 static unsigned long __meminitdata dma_reserve;
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
-/* Movable memory ranges, will also be used by memblock subsystem. */
-struct movablemem_map movablemem_map = {
-	.acpi = false,
-	.nr_map = 0,
-};
-
 static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __initdata required_kernelcore;
 static unsigned long __initdata required_movablecore;
 static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
-static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
 
 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
@@ -4412,77 +4405,6 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
 	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
 }
 
-/**
- * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
- *
- * zone_movable_limit is initialized as 0. This function will try to get
- * the first ZONE_MOVABLE pfn of each node from movablemem_map, and
- * assigne them to zone_movable_limit.
- * zone_movable_limit[nid] == 0 means no limit for the node.
- *
- * Note: Each range is represented as [start_pfn, end_pfn)
- */
-static void __meminit sanitize_zone_movable_limit(void)
-{
-	int map_pos = 0, i, nid;
-	unsigned long start_pfn, end_pfn;
-
-	if (!movablemem_map.nr_map)
-		return;
-
-	/* Iterate all ranges from minimum to maximum */
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
-		/*
-		 * If we have found lowest pfn of ZONE_MOVABLE of the node
-		 * specified by user, just go on to check next range.
-		 */
-		if (zone_movable_limit[nid])
-			continue;
-
-#ifdef CONFIG_ZONE_DMA
-		/* Skip DMA memory. */
-		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
-			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
-#endif
-
-#ifdef CONFIG_ZONE_DMA32
-		/* Skip DMA32 memory. */
-		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
-			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
-#endif
-
-#ifdef CONFIG_HIGHMEM
-		/* Skip lowmem if ZONE_MOVABLE is highmem. */
-		if (zone_movable_is_highmem() &&
-		    start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
-			start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
-#endif
-
-		if (start_pfn >= end_pfn)
-			continue;
-
-		while (map_pos < movablemem_map.nr_map) {
-			if (end_pfn <= movablemem_map.map[map_pos].start_pfn)
-				break;
-
-			if (start_pfn >= movablemem_map.map[map_pos].end_pfn) {
-				map_pos++;
-				continue;
-			}
-
-			/*
-			 * The start_pfn of ZONE_MOVABLE is either the minimum
-			 * pfn specified by movablemem_map, or 0, which means
-			 * the node has no ZONE_MOVABLE.
-			 */
-			zone_movable_limit[nid] = max(start_pfn,
-					movablemem_map.map[map_pos].start_pfn);
-
-			break;
-		}
-	}
-}
-
 #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
 					unsigned long zone_type,
@@ -4500,6 +4422,7 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
 
 	return zholes_size[zone_type];
 }
+
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
@@ -4941,19 +4864,12 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		required_kernelcore = max(required_kernelcore, corepages);
 	}
 
-	/*
-	 * If neither kernelcore/movablecore nor movablemem_map is specified,
-	 * there is no ZONE_MOVABLE. But if movablemem_map is specified, the
-	 * start pfn of ZONE_MOVABLE has been stored in zone_movable_limit[].
-	 */
-	if (!required_kernelcore) {
-		if (movablemem_map.nr_map)
-			memcpy(zone_movable_pfn, zone_movable_limit,
-				sizeof(zone_movable_pfn));
+	/* If kernelcore was not specified, there is no ZONE_MOVABLE */
+	if (!required_kernelcore)
 		goto out;
-	}
 
 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
+	find_usable_zone_for_movable();
 	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
 restart:
@@ -4981,24 +4897,10 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
 			unsigned long size_pages;
 
-			/*
-			 * Find more memory for kernelcore in
-			 * [zone_movable_pfn[nid], zone_movable_limit[nid]).
-			 */
 			start_pfn = max(start_pfn, zone_movable_pfn[nid]);
 			if (start_pfn >= end_pfn)
 				continue;
 
-			if (zone_movable_limit[nid]) {
-				end_pfn = min(end_pfn, zone_movable_limit[nid]);
-				/* No range left for kernelcore in this node */
-				if (start_pfn >= end_pfn) {
-					zone_movable_pfn[nid] =
-							zone_movable_limit[nid];
-					break;
-				}
-			}
-
 			/* Account for what is only usable for kernelcore */
 			if (start_pfn < usable_startpfn) {
 				unsigned long kernel_pages;
@@ -5058,12 +4960,12 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	if (usable_nodes && required_kernelcore > usable_nodes)
 		goto restart;
 
-out:
 	/* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
 	for (nid = 0; nid < MAX_NUMNODES; nid++)
 		zone_movable_pfn[nid] =
 			roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
 
+out:
 	/* restore the node_state */
 	node_states[N_MEMORY] = saved_node_state;
 }
@@ -5126,8 +5028,6 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 
 	/* Find the PFNs that ZONE_MOVABLE begins at in each node */
 	memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
-	find_usable_zone_for_movable();
-	sanitize_zone_movable_limit();
 	find_zone_movable_pfns_for_nodes();
 
 	/* Print out the zone ranges */
@@ -5211,181 +5111,6 @@ static int __init cmdline_parse_movablecore(char *p)
 early_param("kernelcore", cmdline_parse_kernelcore);
 early_param("movablecore", cmdline_parse_movablecore);
 
-/**
- * movablemem_map_overlap() - Check if a range overlaps movablemem_map.map[].
- * @start_pfn:	start pfn of the range to be checked
- * @end_pfn:	end pfn of the range to be checked (exclusive)
- *
- * This function checks if a given memory range [start_pfn, end_pfn) overlaps
- * the movablemem_map.map[] array.
- *
- * Return: index of the first overlapped element in movablemem_map.map[]
- *         or -1 if they don't overlap each other.
- */
-int __init movablemem_map_overlap(unsigned long start_pfn,
-				   unsigned long end_pfn)
-{
-	int overlap;
-
-	if (!movablemem_map.nr_map)
-		return -1;
-
-	for (overlap = 0; overlap < movablemem_map.nr_map; overlap++)
-		if (start_pfn < movablemem_map.map[overlap].end_pfn)
-			break;
-
-	if (overlap == movablemem_map.nr_map ||
-	    end_pfn <= movablemem_map.map[overlap].start_pfn)
-		return -1;
-
-	return overlap;
-}
-
-/**
- * insert_movablemem_map - Insert a memory range in to movablemem_map.map.
- * @start_pfn:	start pfn of the range
- * @end_pfn:	end pfn of the range
- *
- * This function will also merge the overlapped ranges, and sort the array
- * by start_pfn in monotonic increasing order.
- */
-void __init insert_movablemem_map(unsigned long start_pfn,
-				  unsigned long end_pfn)
-{
-	int pos, overlap;
-
-	/*
-	 * pos will be at the 1st overlapped range, or the position
-	 * where the element should be inserted.
-	 */
-	for (pos = 0; pos < movablemem_map.nr_map; pos++)
-		if (start_pfn <= movablemem_map.map[pos].end_pfn)
-			break;
-
-	/* If there is no overlapped range, just insert the element. */
-	if (pos == movablemem_map.nr_map ||
-	    end_pfn < movablemem_map.map[pos].start_pfn) {
-		/*
-		 * If pos is not the end of array, we need to move all
-		 * the rest elements backward.
-		 */
-		if (pos < movablemem_map.nr_map)
-			memmove(&movablemem_map.map[pos+1],
-				&movablemem_map.map[pos],
-				sizeof(struct movablemem_entry) *
-				(movablemem_map.nr_map - pos));
-		movablemem_map.map[pos].start_pfn = start_pfn;
-		movablemem_map.map[pos].end_pfn = end_pfn;
-		movablemem_map.nr_map++;
-		return;
-	}
-
-	/* overlap will be at the last overlapped range */
-	for (overlap = pos + 1; overlap < movablemem_map.nr_map; overlap++)
-		if (end_pfn < movablemem_map.map[overlap].start_pfn)
-			break;
-
-	/*
-	 * If there are more ranges overlapped, we need to merge them,
-	 * and move the rest elements forward.
-	 */
-	overlap--;
-	movablemem_map.map[pos].start_pfn = min(start_pfn,
-					movablemem_map.map[pos].start_pfn);
-	movablemem_map.map[pos].end_pfn = max(end_pfn,
-					movablemem_map.map[overlap].end_pfn);
-
-	if (pos != overlap && overlap + 1 != movablemem_map.nr_map)
-		memmove(&movablemem_map.map[pos+1],
-			&movablemem_map.map[overlap+1],
-			sizeof(struct movablemem_entry) *
-			(movablemem_map.nr_map - overlap - 1));
-
-	movablemem_map.nr_map -= overlap - pos;
-}
-
-/**
- * movablemem_map_add_region - Add a memory range into movablemem_map.
- * @start:	physical start address of range
- * @end:	physical end address of range
- *
- * This function transform the physical address into pfn, and then add the
- * range into movablemem_map by calling insert_movablemem_map().
- */
-static void __init movablemem_map_add_region(u64 start, u64 size)
-{
-	unsigned long start_pfn, end_pfn;
-
-	/* In case size == 0 or start + size overflows */
-	if (start + size <= start)
-		return;
-
-	if (movablemem_map.nr_map >= ARRAY_SIZE(movablemem_map.map)) {
-		pr_err("movablemem_map: too many entries;"
-			" ignoring [mem %#010llx-%#010llx]\n",
-			(unsigned long long) start,
-			(unsigned long long) (start + size - 1));
-		return;
-	}
-
-	start_pfn = PFN_DOWN(start);
-	end_pfn = PFN_UP(start + size);
-	insert_movablemem_map(start_pfn, end_pfn);
-}
-
-/*
- * cmdline_parse_movablemem_map - Parse boot option movablemem_map.
- * @p:	The boot option of the following format:
- *	movablemem_map=nn[KMG]@ss[KMG]
- *
- * This option sets the memory range [ss, ss+nn) to be used as movable memory.
- *
- * Return: 0 on success or -EINVAL on failure.
- */
-static int __init cmdline_parse_movablemem_map(char *p)
-{
-	char *oldp;
-	u64 start_at, mem_size;
-
-	if (!p)
-		goto err;
-
-	if (!strcmp(p, "acpi"))
-		movablemem_map.acpi = true;
-
-	/*
-	 * If user decide to use info from BIOS, all the other user specified
-	 * ranges will be ingored.
-	 */
-	if (movablemem_map.acpi) {
-		if (movablemem_map.nr_map) {
-			memset(movablemem_map.map, 0,
-				sizeof(struct movablemem_entry)
-				* movablemem_map.nr_map);
-			movablemem_map.nr_map = 0;
-		}
-		return 0;
-	}
-
-	oldp = p;
-	mem_size = memparse(p, &p);
-	if (p == oldp)
-		goto err;
-
-	if (*p == '@') {
-		oldp = ++p;
-		start_at = memparse(p, &p);
-		if (p == oldp || *p != '\0')
-			goto err;
-
-		movablemem_map_add_region(start_at, mem_size);
-		return 0;
-	}
-err:
-	return -EINVAL;
-}
-early_param("movablemem_map", cmdline_parse_movablemem_map);
-
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 /**
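The removed insert_movablemem_map() keeps a sorted array of half-open ranges and merges any range that overlaps or touches the one being inserted. The following is a standalone userspace sketch of that technique, not the removed kernel function: `MAP_MAX`, `struct entry`, `map`, `nr_map` and `insert_range` are hypothetical names, and the fixed capacity is an assumption made only for the example.

```c
#include <assert.h>
#include <string.h>

#define MAP_MAX 8	/* hypothetical capacity for this sketch */

struct entry { unsigned long start, end; };	/* half-open: [start, end) */

static struct entry map[MAP_MAX];
static int nr_map;

/* Insert [start, end), merging with any ranges it overlaps or touches. */
static void insert_range(unsigned long start, unsigned long end)
{
	int pos, last;

	assert(start < end);

	/* First entry that ends at or after the new start. */
	for (pos = 0; pos < nr_map; pos++)
		if (start <= map[pos].end)
			break;

	/* No overlap: shift the tail up by one slot and insert. */
	if (pos == nr_map || end < map[pos].start) {
		assert(nr_map < MAP_MAX);
		memmove(&map[pos + 1], &map[pos],
			sizeof(map[0]) * (nr_map - pos));
		map[pos].start = start;
		map[pos].end = end;
		nr_map++;
		return;
	}

	/* Last entry that the new range overlaps or touches. */
	for (last = pos; last + 1 < nr_map; last++)
		if (end < map[last + 1].start)
			break;

	if (start < map[pos].start)
		map[pos].start = start;
	map[pos].end = end > map[last].end ? end : map[last].end;

	/* Close the gap left by the entries that were merged away. */
	memmove(&map[pos + 1], &map[last + 1],
		sizeof(map[0]) * (nr_map - last - 1));
	nr_map -= last - pos;
}
```

Inserting [10,20), [30,40) and [50,60) and then [15,35) collapses the first two entries into [10,40) and leaves [50,60) in place. Keeping the array sorted is what lets the other removed helpers scan it in a single pass.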