Hi Ryan, really nice talk. One thing that keeps me doubting whether I could use this is the multi-threading aspect. What if you have a whole bunch of worker threads producing things (think future/promise paradigm)? By the time you want to clean up what the worker threads produced (assume they sit in a pool with a free list), there is no easy way to do so, because you don't know which thread allocated them. I've always found shared resources puzzling to reason about. I see how allocating things within a thread for in-thread use is trivial (and you explained that well), but as soon as you start sharing dynamically allocated resources across threads, the whole arena idea becomes tough to wrap my head around.
Allocating everything up front from the main thread and handing out memory isn't possible for several reasons: the sizes aren't known up front (which also makes the memory-pool idea harder), the allocations would be large either way, and preallocating them all would require too much memory. Think of a fairly large graph of tasks being executed by some worker threads, where every task has some dependencies and some products; the products are allocated while the task runs, and a task can start once all of its dependencies are ready. Even with a simple serial chain of tasks (instead of a more complex graph), task N needs the products of task N-1 but no longer those of task N-2, hence the chance to cut peak memory usage a lot: you can free the products of earlier tasks that no other task still needs (see the sketch below).
Basically, you could make a mutex-protected global arena with a pool on top, which... is pretty much malloc/free, but probably with worse behavior (higher memory usage, more fragmentation, etc.).
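To make that concrete, here's a rough sketch of the per-task alternative, assuming a hypothetical minimal arena in the spirit of the talk (arena_alloc/arena_push/arena_release and the Task fields are made-up stand-ins, not Ryan's actual code, and error/alignment handling is omitted): each task's products go into that task's own arena, and whichever worker thread finishes reading a dependency last releases the producer's arena, so nobody needs to know which thread allocated it.

```c
#include <windows.h>
#include <stddef.h>

// Hypothetical minimal arena: reserve a big range, commit pages as pushes need them.
typedef struct {
    unsigned char *base;
    size_t pos, committed, reserved;
} Arena;

static Arena *arena_alloc(size_t reserve)
{
    unsigned char *base = VirtualAlloc(0, reserve, MEM_RESERVE, PAGE_NOACCESS);
    VirtualAlloc(base, 4096, MEM_COMMIT, PAGE_READWRITE);   // first page holds the header
    Arena *a = (Arena *)base;
    a->base = base; a->pos = sizeof(Arena); a->committed = 4096; a->reserved = reserve;
    return a;
}

static void *arena_push(Arena *a, size_t size)
{
    if (a->pos + size > a->committed) {
        size_t need = (a->pos + size - a->committed + 4095) & ~(size_t)4095;
        VirtualAlloc(a->base + a->committed, need, MEM_COMMIT, PAGE_READWRITE);
        a->committed += need;
    }
    void *result = a->base + a->pos;
    a->pos += size;
    return result;
}

static void arena_release(Arena *a) { VirtualFree(a->base, 0, MEM_RELEASE); }

// One arena per *task* (not per thread): the task's products live in it, and the
// last dependent to finish reading releases it -- regardless of which worker
// thread originally did the allocating.
typedef struct {
    Arena *products;
    volatile LONG readers_left;   // dependent tasks still reading `products`
} Task;

static void task_finished_reading(Task *producer)
{
    if (InterlockedDecrement(&producer->readers_left) == 0) {
        arena_release(producer->products);   // task N-2's products go away here
        producer->products = 0;
    }
}
```

The point being: ownership follows the data's lifetime (the task's products), not the thread, so the free side needs nothing beyond the atomic counter.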
When is sharing resources across threads ever a good thing?
When you're trying to do async stuff, like with IOCP on Windows.
You want any available thread to be able to service any (e.g. HTTP) request.
Servicing requests doesn't inherently require sharing resources across threads. And sharing resources across threads decreases efficiency in all cases that I recall. Care to shed some light on exceptions you've encountered? Or are you just talking about single-threaded vs. multi-threading?
Ideally you would be able to scale to multiple threads (though in my use case it's just a basic server for testing locally, so it can just be single-threaded).
You wanna be able to accept a new connection on any thread, and allocate at least a struct with some data and a big buffer for it (which Windows will read from / write to sometime later).
If you had a more generic server you might even want to have multiple reads/writes in flight at the same time, which would require more allocations.
You could do what Ryan said and make a new arena for each new connection (which would get its own VirtualAlloc()?). But I don't really see the point when you could just use a simple general-purpose half-fit allocator or something.
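For what it's worth, the per-connection version is pretty small. Here's a hedged sketch with made-up names and sizes (Connection, CONN_RESERVE, IO_BUF_SIZE), leaving out the actual IOCP/AcceptEx/WSARecv plumbing and error handling: each accepted socket gets its own reserved range, the connection struct and its big I/O buffer are carved out of the front of it, and closing the connection is one VirtualFree.

```c
#include <winsock2.h>
#include <windows.h>
#include <stddef.h>

enum { CONN_RESERVE = 1 << 20, IO_BUF_SIZE = 64 * 1024 };   // made-up sizes

typedef struct {
    SOCKET     socket;
    OVERLAPPED overlapped;     // Windows reads/writes via this sometime later
    char      *io_buf;         // big buffer, lives in this connection's region too
    // ...whatever other per-connection state you need...
} Connection;

static Connection *connection_create(SOCKET s)
{
    // Reserve a per-connection range; commit only what we actually touch.
    unsigned char *base = VirtualAlloc(0, CONN_RESERVE, MEM_RESERVE, PAGE_NOACCESS);
    VirtualAlloc(base, sizeof(Connection) + IO_BUF_SIZE, MEM_COMMIT, PAGE_READWRITE);

    Connection *conn = (Connection *)base;
    conn->socket = s;
    conn->io_buf  = (char *)(base + sizeof(Connection));
    return conn;   // any worker thread can create or destroy these
}

static void connection_destroy(Connection *conn)
{
    closesocket(conn->socket);
    VirtualFree(conn, 0, MEM_RELEASE);   // struct, buffer, everything at once
}
```

Whether that beats a general-purpose allocator is a fair question; the appeal is just that teardown is a single release per connection, and no worker thread ever has to hand memory back to whichever allocator state owns it.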
Nice talk. Really concise and well explained.
The echo is a bit annoying in this one. You could run the audio through something like this to clean it up: https://podcast.adobe.com/enhance It's going to make a huge difference!
Edit: Was curious how well it did, made a small sample: https://tmpfiles.org/dl/5874335/transcodedtrimmedenhancedtrimmed.mkv
Thanks for the feedback! Replaced the audio in the video by just making the right channel mono (since it seems like that was from the actual microphone), and it sounds much better now.
I think the "echo" you're referring to is just the left channel, which seems to be recorded from the built-in camera mic. Isolating the right channel (which seems to be recorded from the lav mic Ryan is wearing) will probably suffice.
Yeah, you're right. It's not really an echo. Also, for people not wearing a headset it's probably not that bad. But the left and right channels are very different, which makes it uncomfortable to listen to on a headset. For me anyway.
In case it helps, I tried a quick fix using a Firefox extension that can mono the right channel (sound fixer), and I could actually watch the video with just the good audio.
Uuuh, so much better. Great fix - thanks!
This talk changed my perspective and made memory management very easy to do. Thank you 🙏😊
Hi! Is the example code available somewhere?
I've been thinking, could you use the virtual memory strategy to create growable queues as well? An arena is basically a stack with one pointer used for both push and pop, so we could imagine 'pop' becoming 'dequeue' and using two pointers, one for enqueues and one for dequeues. If we then dequeue a lot, we could decommit the pages at the lower addresses. That should give all the benefits of a growable stack, but for queues, right?
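I think so, with one caveat: both pointers only ever move forward, so the reserved address range gets consumed monotonically and you eventually have to reset or wrap. Here's a minimal sketch of the idea, assuming Windows virtual-memory calls and made-up names (VMQueue, vmq_*), single producer/consumer, no error handling:

```c
#include <windows.h>
#include <string.h>
#include <stddef.h>

#define VMQ_PAGE 4096

// Growable virtual-memory queue: commit pages as the write end advances,
// decommit pages the read end has fully moved past.
typedef struct {
    unsigned char *base;
    size_t reserved;
    size_t read_pos, write_pos;         // both only ever move forward
    size_t committed_lo, committed_hi;  // page-aligned committed window
} VMQueue;

static VMQueue vmq_create(size_t reserve)
{
    VMQueue q = {0};
    q.base = VirtualAlloc(0, reserve, MEM_RESERVE, PAGE_NOACCESS);
    q.reserved = reserve;
    return q;
}

static void vmq_enqueue(VMQueue *q, const void *data, size_t size)
{
    size_t end = q->write_pos + size;
    if (end > q->committed_hi) {
        // Commit enough whole pages to cover the new write end.
        size_t new_hi = (end + VMQ_PAGE - 1) & ~(size_t)(VMQ_PAGE - 1);
        VirtualAlloc(q->base + q->committed_hi, new_hi - q->committed_hi,
                     MEM_COMMIT, PAGE_READWRITE);
        q->committed_hi = new_hi;
    }
    memcpy(q->base + q->write_pos, data, size);
    q->write_pos = end;
}

static void vmq_dequeue(VMQueue *q, void *out, size_t size)
{
    memcpy(out, q->base + q->read_pos, size);
    q->read_pos += size;

    // Pages entirely behind the read pointer will never be touched again,
    // so hand them back to the OS (they stay reserved, just not committed).
    size_t new_lo = q->read_pos & ~(size_t)(VMQ_PAGE - 1);
    if (new_lo > q->committed_lo) {
        VirtualFree(q->base + q->committed_lo, new_lo - q->committed_lo, MEM_DECOMMIT);
        q->committed_lo = new_lo;
    }
}
```

The committed memory (the part that actually costs anything) stays proportional to whatever sits between the read and write pointers, which is the same nice property the growable stack has.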