Isn't the case with packing multiple elements in a node essentially a different data structure? Yes, nominally it is a linked list in that you link nodes and content can be whatever, but packing elements makes it a hybrid, if you consider the element the basic data unit of the whole structure. Semantically it is to linked lists what a B-tree is to balanced trees. (And surely not a coincidence that in practice B-trees often exhibit massive performance benefits compared to "classic" balanced trees (e.g. BST, AVL, red-black) in that domain.)
It certainly is in tune with the composable data structures theme, but also likely not what people readily associate with the "linked list" moniker.
Casey in his refterm lecture said the "array is faster than linked list" statement is "fake optimization". It's at best neutral or worse counter-productive https://youtu.be/pgoetgxecw8?t=722
What do you think about having two separate arrays: one for the nodes (which just hold a pointer to the next node and a pointer to the value) and one for the actual values?
I totally agree with Casey's statement. Statements like that are nonsensical.
Regarding splitting memory for nodes and values, whether or not that's a good idea depends totally on what you're doing. I don't think it's worth optimizing past the nodes-and-values-on-an-arena case until you're in more concrete circumstances, when you know about your access patterns, what operations you're doing most frequently, what state you require (and what state you don't), and so on. Basically, this isn't the layer at which it's worth making decisions about that kind of thing.
Thanks for the great post. It made me realize that I am also dogmatically leaning on contiguous arrays far too much "because cache", creating unnecessarily complex allocation code just to manage them.
I'm particularly interested in the idea of storing a number of entities per linked list node, to be able to handle a large number of total entities and still iterate over them with relative speed. However, I'm wondering how one would handle updating entity references when the block is updated with additions/deletions/insertions with such an arrangement?
My guess is each entity would require a pointer to the block, and an index to the entity within that block. To delete an entity, we might need to do something like marking it, and then freeing the block when all entities are marked? Maybe it's not the best approach for my particular situation: I also need to sort entities, which would ruin any references if using anything other than a single entity per node linked list.
> However, I'm wondering how one would handle updating entity references
Personally I always opt for pointer stability. I don't believe in shifting things around to ensure contiguity of "active" things in an array. I used to have a blog post on this on my old website - I should probably boil it down and transfer its contents to my Substack at this point.
As for the allocation scheme, yeah basically the idea is that allocations should always prefer open slots in existing blocks. There is of course a worst case - allocate 1,000,000 entities, then free all-but-one in each block, leaving you with memory for 1,000,000 and no way to free them, but this is an unrealistically pathological case.
As for sorting, I don't think that needs to happen at the storage layer, and instead I'd opt for layering it on top (sorting e.g. a list of references) - like I said, I always prefer pointer stability if it's feasible, and with stuff like game entities, it seems to always be feasible.
> As for sorting, I don't think that needs to happen at the storage layer, and instead I'd opt for layering it on top (sorting e.g. a list of references)
I've been trying to implement this over the last couple of days, and it may be a small point, but I was wondering if by this, you meant that the sorted would be completely separate (i.e, a separate linked list containing references to individual block elements)?
Or were you meaning having next/prev pointers directly in the block elements themselves, and use those for sorting/defining order?
Either way seems it will work for me, but the latter is simpler, and still maintains pointer stability. However, I'm wondering if there's a good reason to choose the former solution? In terms of "separation of concerns" it keeps the blocks purely for memory storage, and letting the "order" list be more of an app construct.
Thanks for your reply. So the idea is the linked list of array blocks is purely for memory allocation, and then a separate linked list of references can be used for ordered traversal etc? That makes much more sense to me, and can easily see how it work for my particular situation. I was thinking you could possibly store a pointer directly into the array and then de-reference the block if the base address was at some known alignment, to keep the reference sizes smaller.
Pointer stability was something I never thought about until it became a problem. I was able to find your post on this so I'll give it a thorough read!
I had a really interesting scenario recently where I had a tree data structure implemented in left-child right-sibling (LCRS) fashion, where I stored the rank of each node in a separate, corresponding array to the array that held the Linked nodes. I realized I could actually keep the buffer containing the Linked nodes sorted by their rank (and the corresponding list of ranks would be sorted too, of course). This made sense in context because during an update I had to update the relevant pointers (offset into buffer in this case) anyway, so as long as I was expecting ranks to be incremented rather than jump around randomly I could perform the important query without actually having to look at the ranks, just by comparing the position in memory. It was a major win for memory locality! And because of the Linked nodes implementation you can do updates on existing data without any allocations at all. Of course using a non-LCRS, array-based version also has good cache locality in the typical way we expect, however updates require new arrays of different sizes to be allocated, data to be copied, and this pessimizes updates, which in the LCRS version do not need any heap allocations at all. It's a trade-off of course and not applicable in every scenario but I just wanted to throw it out there that there are cases where a Linked list implementation IMPROVES cache locality AND improves update speed.
Another technique that can be used is building a Linked list version of your structure first and then copying the data over to an array-based version, if the array-based version is indeed the better implementation for your structure but you don't want a million allocations and deallocations and copying each data element a dozen times each.
I think that I will try to utilize linked lists more after reading this article. I feel a little embarrassed because I was definitely one of the people talking about the cache locality, but I would like to ask about one use case of linked lists that I have seen. If you had an isolated function that needed to return a list (like a string split, for example), could you use the way that arenas allocate memory to dynamically grow a traditional list, or are linked lists still better in the case? I have not tried to implement something like this, but I have a gut feeling that it might fall apart at some level. Nevertheless, this was an interesting read and I will look for places where linked lists would be applicable in my code in the future.
> If you had an isolated function that needed to return a list (like a string split, for example), could you use the way that arenas allocate memory to dynamically grow a traditional list, or are linked lists still better in the case?
You could grow a dynamic array of strings as you describe, assuming you employ a contiguous arena growth strategy. In the case of a string-split (with length-based strings), it would work, because the only thing to do *is* allocate a single collection of strings (each string in the split array would just point into the original string contents - no need to allocate memory for string contents). If the arena's growth can cause discontiguity, then you have no choice but to use either the traditional dynamic array (discounting arenas, and massively complicating your allocator requirements), or use linked lists.
The more natural primitive here (and for other codepaths) is the linked list. This is because it is not restricted to contiguous growth strategies, as I've mentioned, but also because of one aspect I wrote about in the article. Each string is itself a "list" of characters - a linked list allows the allocations for the string linked list nodes to be interleaved in memory with the string content memory.
Isn't the case with packing multiple elements in a node essentially a different data structure? Yes, nominally it is a linked list in that you link nodes and content can be whatever, but packing elements makes it a hybrid, if you consider the element the basic data unit of the whole structure. Semantically it is to linked lists what a B-tree is to balanced trees. (And surely not a coincidence that in practice B-trees often exhibit massive performance benefits compared to "classic" balanced trees (e.g. BST, AVL, red-black) in that domain.)
It certainly is in tune with the composable data structures theme, but also likely not what people readily associate with the "linked list" moniker.
Casey in his refterm lecture said the "array is faster than linked list" statement is "fake optimization". It's at best neutral or worse counter-productive https://youtu.be/pgoetgxecw8?t=722
What do you think about having two separate arrays: one for the nodes (which just hold a pointer to the next node and a pointer to the value) and one for the actual values?
I totally agree with Casey's statement. Statements like that are nonsensical.
Regarding splitting memory for nodes and values, whether or not that's a good idea depends totally on what you're doing. I don't think it's worth optimizing past the nodes-and-values-on-an-arena case until you're in more concrete circumstances, when you know about your access patterns, what operations you're doing most frequently, what state you require (and what state you don't), and so on. Basically, this isn't the layer at which it's worth making decisions about that kind of thing.
Thanks for the great post. It made me realize that I am also dogmatically leaning on contiguous arrays far too much "because cache", creating unnecessarily complex allocation code just to manage them.
I'm particularly interested in the idea of storing a number of entities per linked list node, to be able to handle a large number of total entities and still iterate over them with relative speed. However, I'm wondering how one would handle updating entity references when the block is updated with additions/deletions/insertions with such an arrangement?
My guess is each entity would require a pointer to the block, and an index to the entity within that block. To delete an entity, we might need to do something like marking it, and then freeing the block when all entities are marked? Maybe it's not the best approach for my particular situation: I also need to sort entities, which would ruin any references if using anything other than a single entity per node linked list.
> However, I'm wondering how one would handle updating entity references
Personally I always opt for pointer stability. I don't believe in shifting things around to ensure contiguity of "active" things in an array. I used to have a blog post on this on my old website - I should probably boil it down and transfer its contents to my Substack at this point.
As for the allocation scheme, yeah basically the idea is that allocations should always prefer open slots in existing blocks. There is of course a worst case - allocate 1,000,000 entities, then free all-but-one in each block, leaving you with memory for 1,000,000 and no way to free them, but this is an unrealistically pathological case.
As for sorting, I don't think that needs to happen at the storage layer, and instead I'd opt for layering it on top (sorting e.g. a list of references) - like I said, I always prefer pointer stability if it's feasible, and with stuff like game entities, it seems to always be feasible.
> As for sorting, I don't think that needs to happen at the storage layer, and instead I'd opt for layering it on top (sorting e.g. a list of references)
I've been trying to implement this over the last couple of days, and it may be a small point, but I was wondering if by this, you meant that the sorted would be completely separate (i.e, a separate linked list containing references to individual block elements)?
Or were you meaning having next/prev pointers directly in the block elements themselves, and use those for sorting/defining order?
Either way seems it will work for me, but the latter is simpler, and still maintains pointer stability. However, I'm wondering if there's a good reason to choose the former solution? In terms of "separation of concerns" it keeps the blocks purely for memory storage, and letting the "order" list be more of an app construct.
Thanks for your reply. So the idea is the linked list of array blocks is purely for memory allocation, and then a separate linked list of references can be used for ordered traversal etc? That makes much more sense to me, and can easily see how it work for my particular situation. I was thinking you could possibly store a pointer directly into the array and then de-reference the block if the base address was at some known alignment, to keep the reference sizes smaller.
Pointer stability was something I never thought about until it became a problem. I was able to find your post on this so I'll give it a thorough read!
I had a really interesting scenario recently where I had a tree data structure implemented in left-child right-sibling (LCRS) fashion, where I stored the rank of each node in a separate, corresponding array to the array that held the Linked nodes. I realized I could actually keep the buffer containing the Linked nodes sorted by their rank (and the corresponding list of ranks would be sorted too, of course). This made sense in context because during an update I had to update the relevant pointers (offset into buffer in this case) anyway, so as long as I was expecting ranks to be incremented rather than jump around randomly I could perform the important query without actually having to look at the ranks, just by comparing the position in memory. It was a major win for memory locality! And because of the Linked nodes implementation you can do updates on existing data without any allocations at all. Of course using a non-LCRS, array-based version also has good cache locality in the typical way we expect, however updates require new arrays of different sizes to be allocated, data to be copied, and this pessimizes updates, which in the LCRS version do not need any heap allocations at all. It's a trade-off of course and not applicable in every scenario but I just wanted to throw it out there that there are cases where a Linked list implementation IMPROVES cache locality AND improves update speed.
Another technique that can be used is building a Linked list version of your structure first and then copying the data over to an array-based version, if the array-based version is indeed the better implementation for your structure but you don't want a million allocations and deallocations and copying each data element a dozen times each.
I think that I will try to utilize linked lists more after reading this article. I feel a little embarrassed because I was definitely one of the people talking about the cache locality, but I would like to ask about one use case of linked lists that I have seen. If you had an isolated function that needed to return a list (like a string split, for example), could you use the way that arenas allocate memory to dynamically grow a traditional list, or are linked lists still better in the case? I have not tried to implement something like this, but I have a gut feeling that it might fall apart at some level. Nevertheless, this was an interesting read and I will look for places where linked lists would be applicable in my code in the future.
> If you had an isolated function that needed to return a list (like a string split, for example), could you use the way that arenas allocate memory to dynamically grow a traditional list, or are linked lists still better in the case?
You could grow a dynamic array of strings as you describe, assuming you employ a contiguous arena growth strategy. In the case of a string-split (with length-based strings), it would work, because the only thing to do *is* allocate a single collection of strings (each string in the split array would just point into the original string contents - no need to allocate memory for string contents). If the arena's growth can cause discontiguity, then you have no choice but to use either the traditional dynamic array (discounting arenas, and massively complicating your allocator requirements), or use linked lists.
The more natural primitive here (and for other codepaths) is the linked list. This is because it is not restricted to contiguous growth strategies, as I've mentioned, but also because of one aspect I wrote about in the article. Each string is itself a "list" of characters - a linked list allows the allocations for the string linked list nodes to be interleaved in memory with the string content memory.