Write a Memory Allocator in C

Students who have learned C language must have used malloc, calloc, realloc, and free. Learning to manage memory on your own can improve program performance. However, using them without understanding their internal principles can easily lead to memory overflow, dangling pointers, and other issues.

Through this article, you will learn about concepts such as memory, virtual memory, etc., and you will write your own memory manager (Memory Allocator).

Memory (RAM) and Hard Disk (DRIVER)#

Some people confuse the concepts of memory and hard disk. Generally, memory is hardware used to store the binary instructions of programs and related variables, while the hard disk is hardware used to store large amounts of data.

In fact, memory and hard disk are essentially the same; they are both computer storage (memory) used to store certain data. Memory has a fast access speed but is expensive per unit, so it is used in places that require frequent read and write, such as storing program context. The hard disk has a slower read and write speed compared to memory but has a large capacity, making it suitable for long-term storage of files that do not require frequent read and write.

Memory can be seen as a cache for the hard disk. When a segment of data is needed, data from the hard disk can be temporarily copied to memory. When the computer needs this data, it can be directly retrieved from the fast-access memory.

Such caching mechanisms are ubiquitous in computers. In the CPU, in addition to the ALU (Arithmetic Logic Unit), there is another important part called registers, which are also a type of computer storage and can be seen as a cache for memory. This statement is somewhat one-sided; the CPU also has a mechanism called L3 cache, which will not be elaborated on here. Interested readers can look it up themselves.

Virtual Memory#

Virtual memory is an abstraction of a block of memory by the computer system (OS). From the perspective of an application, it occupies all the memory of the computer (a contiguous available address space). However, in reality, this memory is fragmented memory exposed to the application by the computer system. It may not only contain a portion of physical memory but also include hard disk space.

VirtualMem01

Without virtual memory, processes would compete for memory resources, and even access out of bounds, which could easily lead to memory leaks. Additionally, since the memory of a process is stored in blocks, small portions of unallocated memory cannot be requested by larger processes, leading to some resource waste.

By using virtual memory to manage physical memory, we can artificially divide physical memory into chunks, with multiple chunks forming a page, and then write the occupancy status of the pages into a page table. This allows us to store the memory required by processes in a chunked manner, and we do not need to keep this memory concentrated; instead, we can place it anywhere. We only need to record the mapping relationship (index) between virtual memory and physical memory, store the mapping relationship in the page table, and we can find the physical memory address based on this information. Moreover, these operations are completed by hardware, which is extremely fast, greatly increasing the availability and efficiency of memory.

The CPU first receives the operation instruction and sends the virtual address to the MMU (Memory Management Unit). The MMU accesses the physical memory (actually accessing the cache of physical memory, TLB) to obtain the page table, then finds its mapping relationship and calculates the physical address, which is then handed over to the CPU, allowing the CPU to access the physical memory to retrieve the data.

Structure of Memory#

Before we start operating on memory, let's first understand the structure of memory.

Text section
Data section
BSS
Heap
Stack

memlayout

The memory structure we operate on when using malloc is the Heap. At its end, there is a pointer called brk, which points to the top of the heap.

This is a memory structure that can be dynamically allocated, allowing for a certain amount of memory to be requested dynamically based on demand, while also supporting deletion operations. We generally perform data operations on the requested address using pointers.

Operating on memory does not require us to directly manipulate hardware; we can directly use the system's encapsulated API sbrk().
sbrk(0) returns the memory address of the current program execution position.
sbrk(x) causes brk to grow x bytes in the positive direction.
sbrk(-x) causes brk to grow x bytes in the negative direction.
If sbrk() fails to complete the operation for some reason, it will return (void*) -1.

malloc()#

Alright, so far we can finally write a simple malloc():

void* malloc(size_t size)
{
	void* block;
	block = sbrk(size);
	if (block == (void*) -1)
		return NULL;
	return block;
}

It looks good, but it's not perfect. When writing the free() function, we accept a pointer to release the memory associated with it, but how do we know the size of the memory to be released?

We can define a Header at the beginning of each address:

struct header_t {
	size_t size;
	unsigned is_free;
};

size is used to store the size of the memory pointed to by Header.
is_free is used to mark whether this block of memory is occupied.
Now our block of memory looks like this:

屏幕截图 2023-01-17 051214

We also need a method to traverse this memory because we need to traverse to find a block of unoccupied memory. A linked list is a good solution:

struct header_t {
	size_t size;
	unsigned is_free;
	struct header_t* next;
};

Now it looks like this:

屏幕截图 2023-01-17 051214 - 副本

Since we manage memory in blocks, we need to perform alignment operations:

typedef char ALIGN[16];

union header {
	struct {
		size_t size;
		unsigned is_free;
		union header *next;
	} s;
	ALIGN stub;
};
typedef union header header_t;

We also need to define head and tail nodes:

header_t *head, *tail;

To avoid multiple threads calling memory simultaneously, we need to define a global lock, releasing it after requesting an address:

pthread_mutex_t global_malloc_lock;

Now our version looks like this:

void* malloc(size_t size) {
    size_t total_size;
    header_t* header;
    void* block;

    if (size <= 0) {
        return NULL;
    };

    pthread_mutex_lock(&global_malloc_lock);
    header = get_header_t(size);

    if (header) {
        header->s.isFree = 0;
        pthread_mutex_unlock(&global_malloc_lock);
        return (void*)(header + 1);
    }

    total_size = size + sizeof(header_t);
    block = sbrk(total_size);

    if (block == (void*) - 1) {
        pthread_mutex_unlock(&global_malloc_lock);
        return NULL;
    } else {
        header = block;
    };
    
    header->s.isFree = 0;
    header->s.size = size;
    header->s.next = NULL;

    if (head == NULL)
        head = header;
    if (tail) 
        tail->s.next = header;

    tail = header;
    pthread_mutex_unlock(&global_malloc_lock);
    return (void*)(header + 1);
};

header_t* get_header_t(size_t size) {
    header_t* curr = head;
    while (curr) {
        if (curr->s.isFree && curr->s.size >= size) {
            pthread_mutex_unlock(&global_malloc_lock);
            return (void*)(head);
        };
        curr = curr->s.next;
    }
    return NULL;
}

First, we check whether the requested memory is valid, then look for the pointer to the node with the lowest address (get_header_t()). If found, we mark it as isFree, then unlock and return the pointer to the start of the virtual address. If not found, we call sbrk to request memory and initialize the header, setting it as the tail node, then unlock and return the address.

The total_size includes both the header and the actual required memory size.

free()#

When designing the free() function, one detail to note is to first check whether the address to be released is at the end of the heap. If it is at the end, it is directly handed over to the operating system for release; otherwise, it is marked as isFree, and we will reuse it later.

void free(void* block) {
    if (!block) {
        return NULL;
    }
    
    header_t* header = (header_t*)block - 1;
    void* programbreak = sbrk(0);
    pthread_mutex_lock(&global_malloc_lock);
    
    if ((char*)block + header->s.size == programbreak) {
        size_t total_size = header->s.size + sizeof(header_t);
        //release
        if (head == tail) {
            head = tail = NULL;
        } else {
            header_t* curr = head;
            //change tail
            while (curr->s.next == tail) {
                curr->s.next = NULL;
                tail = curr;
            }
        }
        
        sbrk(-total_size);
        pthread_mutex_unlock(&global_malloc_lock);
        
        return;
    };
    
    header->s.isFree = 1;
    pthread_mutex_unlock(&global_malloc_lock);
}

First, find the position of brk and assign it to programbreak, then check whether the object to be released is at the top of the heap.
char* and size_t, void* have the same byte size:

//test.c
#include <stddef.h>

int main(int argc, char const *argv[]) {
    printf("char* is:%d, char is:%d, void* is:%d, size_t is:%d", sizeof(char*), sizeof(char), sizeof(void*), sizeof(size_t));
    return 0;
}

//char* is:8, char is:1, void* is:8, size_t is:8

If at the top of the heap, then (char*)block + header->s.size == programbreak. Then traverse to find the last block of memory and release it, setting the second-to-last block of memory as tail. If not at the top of the heap, mark it temporarily as isFree, waiting for later reuse.

calloc()#

void* calloc(size_t nsize, size_t num) {
    size_t size;
    void* block;

    if (!num || !nsize)
        return NULL;
        
    size = nsize * num;
    //overflow check
    if (size != nsize * num)
        return NULL;

    block = malloc(size);
    if (!block)
        return NULL;
        
    memset(block, 0, size);
    return block;
}

calloc() requires that the memory block pointed to by the returned address is all Null, which we can set to 0 using memset.

realloc()#

void* realloc(void* block, size_t size) {
    header_t* header;
    void* ret;
    
    if (!block || !size)
        return NULL;
    
    header = (header_t*)block - 1;
    
    if (header->s.size >= size)
        return block;
    
    ret = malloc(size);
    if (ret) {
        memcpy(ret, block, header->s.size);
        free(block);
    }
    return ret;
}

realloc() is somewhat special; when using it, one should note that the address pointed to by the returned pointer may not be the original address. This is because during realloc(), two situations may occur:

The heap memory grows upwards, and the free memory above can accommodate the growth size. In this case, it returns the original address and modifies the header's size.
The free memory above the heap is insufficient, calling malloc to request a new address, and copying the content of the original address to the newly requested address.

Compile#

$ gcc -o memalloc.so -fPIC -shared memalloc.c
$ export LD_PRELOAD=$PWD/memalloc.so

$ ls
memalloc.c		memalloc.so

Extensions#

C/C++ has always been known for its superior performance, but this has also been a point of criticism. If users lack a deep understanding of computer systems, it is easy to make mistakes. Other high-level languages automate memory management and refer to their mechanism as garbage collection (OC), which can be summarized as: if an object is unreachable, it will be reclaimed by the garbage collector.

References#

Memory Allocators 101