In Defense Of Linked Lists
Despite memes suggesting otherwise, linked lists are one of the most powerful data structures for simplifying and improving code.
Learning the basics of data structures is a rite of passage for new programmers. On their first day doing so, chances are they’ll encounter the idea of a “node”, and the idea of links between nodes. Both of these ideas can be used to organize data in innumerable ways.
One of the first kinds of data structures they’ll be exposed to is the linked list. This is because the linked list in particular is such a simple idea. A singly-linked list—the simplest case—begins with a single node (the “head”), which can point to another node, which can point to another, and another, until the final node (the “tail”), which points to nothing. It is, of course, only a list if nodes do not point to any node that occurred earlier in the list—every link points to an “unseen” node.
You might expect such a simple idea to be uncontroversial and banal. Well, at least, you might expect that if you’d never been on the Internet. Linked lists are, surprisingly, a topic of debate. Some claim that they’re impractical, and nonsensical to even consider as an option:
Sometimes, it’s as though programmers (people?) on the Internet are not discussing anything at all, and are instead only performing symbolic manipulation and substitution with certain phrases. The statement “linked lists = bad, silly data structure with bad cache effects” has been entered into the collective programmer consciousness, and so the occurrence of “linked list” in a sentence—by substitution—is replaced with “bad, silly data structure with bad cache effects”. Perhaps this is evidence of the Dead Internet Theory being correct? More likely is that Twitter is an awful platform for careful discussion about programming. I don’t know, but at some point, it all seems a bit robotic.
I digress. I find comments like those above overly simplistic and misleading. They’re more likely to lead new programmers astray than to help them. My experience is that linked lists are extraordinarily useful. They’ve been important for me in simplifying problems without sacrificing computational efficiency or reliability. I’m writing this post to provide a detailed account that fits my experience.
In this post, I’ll address some common objections to linked lists, and I’ll cover my understanding of why they’ve been useful for me. In doing so, I hope to add another tool to your programming toolbox, so that you don’t naively toss linked lists aside, and preemptively throw away the potential benefits.
Linked List Criticisms On Node Discontiguity
A lack of memory contiguity is often cited as one of the linked list’s defining negative characteristics—nodes are not guaranteed to be, say, sequential in memory. Linked list detractors claim that this has four negative consequences: Random Access Algorithmic Pessimization, Serial Dependence, Non-Locality, and Cache Pessimization From Bloat.
Random Access Algorithmic Pessimization
With linked lists, the cost (in time) of performing random access—finding the nth element, where n is random—goes up with the number of elements, as opposed to remaining constant for any size (e.g. in an array). The reason should be clear—given only a linked list, if I ask you to find the nth element, the only route you can take is to scan the linked list until you find the nth element. With an array, it’s trivial to calculate the nth element’s address given only the base address of the array (and the size of each element).
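To make the difference concrete, here is a minimal sketch of both access patterns, using a hypothetical ListNode type (not from any particular codebase):
struct ListNode
{
  ListNode *next;
  int value;
};

// O(n): the only option is to walk the list, skipping n nodes.
int NthValueFromList(ListNode *first, int n)
{
  ListNode *node = first;
  for(int i = 0; node != 0 && i < n; i += 1)
  {
    node = node->next;
  }
  return node ? node->value : 0;
}

// O(1): the nth element's address is computed directly from the base.
int NthValueFromArray(int *base, int n)
{
  return base[n];
}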
This difference in how an algorithm behaves with respect to the number of elements is often described mathematically with “big O”. I won’t cover this topic fully in this post, but long story short, with big O notation, the “time complexity” for the “linked list random access algorithm” is expressed as O(n), where n is the number of elements in the linked list.
What this means—in an intuitive but mathematically loose sense—is that the function describing the time cost of random access in a linked list scales in a similar fashion to f(n) = n. If you add an element to a linked list, you’re adding some constant (meaning independent of n) “time cost” to the random access algorithm, on average. The time cost of the random access algorithm does not, for instance, scale like f(n) = n^2, where every new element in the linked list would add roughly a constant multiple of n to the algorithm’s time cost.
The time cost of the array random access “algorithm” is O(1), because the only work to be done is to take the array’s base address and add an offset to it—there is no work that is computationally dependent on n.
Note: It’s often overlooked that the mathematical definition behind big O—by definition—disregards constant factors. It does this intentionally, because big O is aimed at understanding the asymptotic behavior of a function: specifically, what happens to the time or memory cost of an algorithm as the variable it depends on (like n) grows without bound. It is not about literally measuring runtime speed or memory usage.
When I first began programming, I would use random access everywhere, and so it made conceptual sense to prefer data structures that offered trivial random access. So, linked lists fell by the wayside. You can imagine how this arises—with certain programming styles, it’s quite common to store indices into arrays as a way to “key” a collection of data (stored within an array), and to subsequently use random access to produce an element from an index.
O(1) random access can be important. It can also be unimportant. It is sometimes preferable, and it sometimes makes no difference. With that in mind, it’s clear that losing other more important benefits for the sake of retaining O(1) random access is nonsensical. So, it’s important to understand when and why O(1) random access would be important.
Consider the problem’s data. O(1) random access can firstly be unimportant simply because of the size of the actual computational tasks. If n = 10, the n-dependent scaling factor of an O(n) algorithm may mean—practically—nothing. And while it may be conceptually friendly to prefer algorithms that “scale”, the idea that it’s helpful to do so in all cases without regard for the practical bounds of one’s problem is delusional. When the hard data requirements of a problem change, almost everything about the problem also changes. So, I posit that it’s best to take advantage of cases where you’re only concerned with n = 10 elements. And furthermore, I posit that cases like this are actually the common case (with the usual caveats about this not necessarily being true for all programmers in all fields—although I’d like to avoid any misconception that I’m strictly referring to game development).
Consider the entire problem. O(1) random access can also be unimportant because it may have no impact on overarching asymptotic complexity. That sounds a bit egg-headed, so allow me to explain.
Note that indices used for O(1) random access with an array must be produced somehow. In many cases, they are produced from batch processing of elements within the array.
For instance, let’s suppose that I have an array of Entity for my game world’s entities. I am iterating all Entitys every frame for the purposes of simulation, rendering, and so on. I may want to allow the player to select one entity, and I’d like to store the player’s selection as an index.
In this case, linked lists do not actually pessimize the overarching asymptotic complexity. The overall cost of producing the index (O(n)), and then using it to retrieve an element (O(1)), is altogether O(n). So, when strictly considering asymptotic complexity (which is again not necessarily indicative of any actual runtime performance characteristics), O(1) random access does not make a difference—O(n) random access would be identical (in asymptotic complexity terms).
But furthermore, because I am already doing batch processing of my Entity array, if I were to instead choose a linked list, then rather than producing indices as my “key” for subsequent random access, I can simply produce pointers, and retrieving the selected element in the linked list remains an O(1) operation.
Note: Storing pointers into a linked list (specifically one that is mutated) may, of course, result in instability—what if one of the elements is removed? But note that the same problem exists in the array case (although the symptoms of the problem may be different)—one can hold an index (for random access), and the element referred to by that index may be removed. In such cases, it’s common to use techniques like generational references, which are useful tools regardless of the underlying data structure.
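As a rough sketch of that idea (the names here are hypothetical, and the usual U64-style integer typedefs are assumed), a selection can be stored as a pointer paired with a generation counter, which is bumped whenever an entity slot is freed and reused:
struct Entity
{
  Entity *next;     // next entity in the linked list
  U64 generation;   // bumped each time this entity's slot is reused
  // ...
};

struct EntityHandle
{
  Entity *ptr;
  U64 generation;   // generation observed when the handle was created
};

// During the per-frame batch pass, record the selection as a handle.
EntityHandle HandleFromEntity(Entity *entity)
{
  EntityHandle handle = {entity, entity ? entity->generation : 0};
  return handle;
}

// On later use, the handle only resolves if the generation still matches.
Entity *EntityFromHandle(EntityHandle handle)
{
  Entity *result = 0;
  if(handle.ptr != 0 && handle.ptr->generation == handle.generation)
  {
    result = handle.ptr;
  }
  return result;
}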
Using pointers in place of indices does not come without drawbacks:
Pointer sizes. Pointers index much more than a single array in your program—they index your entire address space. Thus, they have different size constraints than indices. You may be able to get away with 16-bit indices for your array, whereas pointers (on virtually all modern consumer machines) must be 64 bits wide. This can be mitigated in special-case ways—for example, you may be able to guarantee that the upper 32 or 48 bits of a pointer are unimportant noise, and so you may be able to treat a linked list node pointer as a 32- or 16-bit offset from a base address that you always allocate nodes from (a sketch of this idea follows this list).
Serialization. In general, you may not assume that your allocations from the operating system on a modern consumer machine will necessarily occupy the same portions of virtual address space. So, you cannot serialize a pointer to disk, deserialize it on a subsequent run of your program, and expect it to work. This is unlike indices—indices may be serialized to disk and later reused, as long as the array to which they refer also stays the same across runs of your program. This can also be mitigated—a serialization codepath must apply some transform to pointers that preserves the part of their encoded information that can be stable across runs.
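Here’s a minimal sketch of that first mitigation. It’s purely illustrative—it assumes all nodes come from one contiguous allocation whose node count fits in 16 bits, and it reserves offset 0 to mean “no link”:
// Links are stored as 16-bit offsets (in node-sized units) from a base
// address, rather than as full 64-bit pointers.
struct SmallNode
{
  U16 next_offset;  // (next - base), in units of SmallNode; 0 = no next
  U16 value;
};

SmallNode *NextFromNode(SmallNode *base, SmallNode *node)
{
  SmallNode *result = 0;
  if(node->next_offset != 0)
  {
    result = base + node->next_offset;
  }
  return result;
}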
Serial Dependence
Linked lists, by their definition, not only lack element memory contiguity, but also make no guarantees about any patterns or regularities with which elements are stored in memory. Therefore, any system—like a hardware-level prefetcher—may not be able to adequately predict which memory it needs in advance. Every codepath must first look at the pointer to a subsequent node, which requires memory access for the node in which the pointer resides.
Modern CPU cores don’t perform each machine code instruction in the sequence it’s encoded. While they will produce effects identical to a CPU that executed each instruction in sequence, they contain several hardware units that operate simultaneously when possible. For instance, some arithmetic operations may be occurring while the CPU is simultaneously fetching memory from an address it needs for a subsequent operation. When iterating an entire linked list, finding the next node requires reading the current node, meaning there is a serial dependence between nodes. This means that the parts of a CPU core responsible for fetching soon-to-be-required memory may stall, because the core may be waiting on memory access in order to make progress. This effectively eliminates any performance benefit that a hardware-level prefetcher would’ve offered. This becomes especially likely if a computational task is memory-access-bound, rather than compute-bound—in other words, if a given codepath is more likely to be waiting on memory than on completing some operation with memory it already has.
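The dependence is visible in the shape of the standard traversal loop (reusing the hypothetical ListNode from the earlier sketch)—each iteration’s load depends on the result of the previous one:
int SumList(ListNode *first)
{
  int sum = 0;
  // Each iteration must load node->next before the following node's memory
  // can even be requested, so the loads form a serial chain that a
  // hardware prefetcher cannot easily run ahead of.
  for(ListNode *node = first; node != 0; node = node->next)
  {
    sum += node->value;
  }
  return sum;
}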
But there is no inherent connection between performance degradation due to serial dependence and linked lists, and so this is not a well-formed criticism of linked lists. The linked list definition doesn’t define the content stored at each node, and thus the number of elements in my collection need not be the number of nodes in my linked list—I can store 1, 8, 64, or 1,024, or more elements per node. This, importantly, provides a knob which allows the mitigation or total elimination of any performance degradation due to serial dependence.
This same technique can be used to mitigate the cost of an O(n) random access scan, which I discussed earlier. It can also mitigate the cost of storing pointers for links between nodes, but I will cover more on that later.
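A minimal sketch of this “chunked” flavor of linked list—the chunk capacity of 64 is arbitrary, and Entity/UpdateEntity are assumed placeholder types and functions:
struct EntityChunk
{
  EntityChunk *next;    // link to the next chunk
  U64 count;            // number of entities used in this chunk
  Entity entities[64];  // up to 64 entities, stored contiguously
};

// Iteration now chases one pointer per 64 elements, rather than one per
// element—the serial dependence is paid 1/64th as often.
void UpdateAllEntities(EntityChunk *first)
{
  for(EntityChunk *chunk = first; chunk != 0; chunk = chunk->next)
  {
    for(U64 i = 0; i < chunk->count; i += 1)
    {
      UpdateEntity(&chunk->entities[i]);
    }
  }
}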
Non-Locality
Linked list detractors will also claim that, beyond being serially-dependent, linked list nodes have no guarantee of any locality in memory. So, not only are nodes serially-dependent, but each new node access is highly likely to be a data cache miss. This, they claim, necessarily stresses data caches, because each linked list node may be widely scattered throughout memory.
Just as with serial dependence, this is not a well-formed nor meaningful criticism of linked lists, because it is assuming concrete information (namely, how linked list nodes are allocated) that is not inherent in the definition of linked lists.
This concern is born out of an outdated programming model, which assumes all dynamic allocations—including those for linked list nodes—are calls to malloc (or similar) at some point, and so each node is scattered throughout the heap (which is simultaneously being used for many other allocations), resulting in extremely varied and unpredictable access patterns at all levels.
As I’ve already discussed in prior posts, this model of dynamic allocation is not the only option. With arena allocation—as one example—nodes may be arranged sensibly, in local regions of memory, and even often in a sensible order.
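As a rough sketch of that (ArenaPush here is a hypothetical helper standing in for whatever an arena implementation provides), consecutively-pushed nodes simply land next to each other in the arena, in the order they were pushed:
// Hypothetical arena helper: returns `size` zeroed bytes from the arena.
void *ArenaPush(Arena *arena, U64 size);

struct NumberNode
{
  NumberNode *next;
  U64 value;
};

struct NumberList
{
  NumberNode *first;
  NumberNode *last;
  U64 count;
};

void NumberListPush(Arena *arena, NumberList *list, U64 value)
{
  // No general-purpose heap involved—nodes pushed back-to-back on the
  // same arena end up adjacent in memory.
  NumberNode *node = (NumberNode *)ArenaPush(arena, sizeof(NumberNode));
  node->value = value;
  if(list->last) { list->last->next = node; }
  else           { list->first = node; }
  list->last = node;
  list->count += 1;
}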
Cache Pessimization From Bloat
One cost associated with linked lists—indirectly because of node discontiguity—is memory bloat. Because the order of elements is not implicitly encoded (as it is with arrays), it must be explicitly encoded with links. These links are—normally—just pointers. And, of course, pointers require space. On a modern 64-bit machine, with a larger-than-32-bit address space, each pointer costs 8 bytes, so pointers are not a trivial cost.
The argument goes, the bloat from these links will necessarily pessimize data caches. You’re spending memory encoding the chain through nodes, so you can fit less useful data into cache—“useful data” being that which is required for actual computation work.
This argument, again, disregards one’s ability to effectively divide the cost of this bloat by storing many elements per node. At some point, with some number of elements per node, the cost will be indistinguishable from an array—the problem will no longer be memory-access-bound, and will instead be compute-bound on computation over tightly-packed elements within a single linked list node.
The Power of Discontiguity
While I hope to have put your mind at ease about these supposed drawbacks of linked lists, you may still be thinking: “That’s great, but you have not presented an argument for using linked lists! I can still avoid these drawbacks in other ways. You’ve written how to avoid all of these drawbacks, but for what? To store elements discontiguously? Why?”
Storing linked list nodes discontiguously introduces some concerns, as I’ve covered already. But there’s a flip side—because nodes may be discontiguous and the order explicitly encoded, it’s necessarily also true that the builder of the linked list now has the freedom to arrange the list’s nodes in a manner of its choosing. That could include an arrangement that pessimizes node locality in memory or worsens the effects of serial dependence, but that is not the only option.
Freedom In Allocation
I’ve already mentioned that linked list nodes can be allocated with an arena allocator, which can alleviate concerns regarding node memory locality. This is also a useful way to bucket all linked list allocations, and thus avoid all additional work related to releasing those allocations that might arise with a different approach to memory allocation.
But that merely scratches the surface. Because of the lack of a requirement for node contiguity, it’s trivial to notice that linked lists may grow with an arena—expanding the linked list is just a matter of pushing a new node onto the arena. And furthermore, arenas may be used not for one growable linked list, but several. Contrast this to using an arena to back a growable array (which is a valid and useful case of arena usage). The arena in question is “locked” to only supporting a single data structure, with a single type (the dynamic array). With linked lists, the arena can be used to support several dynamic and growing data structures, binding them all to the same lifetime.
This provides even greater leverage over the arena allocator’s capabilities for allocation lifetime organization. But more than that, the combination of linked lists and arenas is uniquely apt for a large class of problems. Specifically, those in which a codepath builds one or several heterogeneously typed complex data structures, in whatever order makes most sense for the given codepath.
For example, consider a function that parses a list of tokens to produce a tree structure (imagine a JSON tree data structure as an example). Even if the output tree is homogeneously typed, a parser can also gather errors at any point within the input stream.
struct TreeNode
{
  TreeNode *first;  // first child in hierarchy
  TreeNode *last;   // last child in hierarchy
  TreeNode *next;   // next sibling in hierarchy
  String8 string;
  // etc...
};

struct Error
{
  Error *next;      // next error in the error list
  TokenNode *token_location;
  TreeNode *tree_location;
  String8 text;
};

struct ErrorList
{
  Error *first;
  Error *last;
  U64 count;
};

struct TreeParseResult
{
  TreeNode *root;
  ErrorList errors;
};

// list(Token) -> (list(Error) * Tree)
TreeParseResult TreeFromTokens(Arena *arena, TokenList tokens);
In the above example, because neither TreeNodes nor Errors are required to be contiguous in memory, the parser codepath can allocate memory for both, immediately, as it sees fit. That can include dynamically allocating new strings. All of the involved heterogeneous types—tree nodes, errors, string content memory—can be allocated on the same arena, in any order. This means the most “natural” order—that which matches the order in which error conditions are encountered, or in which new tree nodes are parsed, or in which strings should be allocated—is trivially supported. And, of course, the usual bells and whistles of arenas apply—none of the allocations must be individually freed, and allocations are lightning fast. Furthermore, because each data structure is allocated on the same arena, and because linked list nodes do not require shifting in memory due to growth, the lifetimes of any two allocations can be assumed to be equal, and pointers are guaranteed to remain stable, so building links between multiple data structures is trivial and safe.
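As a sketch of what this looks like from inside the parser (the arena and string helpers below are hypothetical stand-ins, not from any particular codebase), recording an error just pushes an Error node—and a copy of its message—onto the same arena everything else lives on:
// Hypothetical helpers (illustrative names):
void   *ArenaPush(Arena *arena, U64 size);       // returns zeroed bytes
String8 Str8Copy(Arena *arena, String8 string);  // copies string data onto the arena

void ErrorListPush(Arena *arena, ErrorList *list, TreeNode *tree_loc, String8 text)
{
  // The Error node, and the memory backing its message, share the arena
  // with the tree nodes—allocated the moment they're needed, in any order.
  Error *error = (Error *)ArenaPush(arena, sizeof(Error));
  error->tree_location = tree_loc;
  error->text = Str8Copy(arena, text);
  if(list->last) { list->last->next = error; }
  else           { list->first = error; }
  list->last = error;
  list->count += 1;
}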
Contrast this to a case in which memory contiguity is required. You might imagine that errors, tree nodes, and each string can be stored within a dynamic array. One usual C++ approach is exactly this: each dynamic collection is implemented with an std::vector, and each string with an std::string. Because of the requirement that each collection’s memory is contiguous, and because none of the required allocation sizes are known upfront, and because for sufficiently complex codepaths doing an upfront calculation of the required allocation sizes is impractical and unmaintainable, the usual dynamic array growth strategies are employed. In some cases, that may take advantage of virtual address space tricks (similar to those I’ve discussed for arena implementations), but in many cases, it instead requires reallocation and relocation. The initial allocation of memory for a collection will be identified as too small, and so a new, larger allocation will be made, the array will be copied, and the previous allocation freed.
This, of course, disqualifies arenas as a usable allocator, and requires a more sophisticated generic heap allocator that makes no assumptions about lifetimes nor sizes. This dispenses with all related benefits of the arena—allocations may no longer be lightning fast, and individual allocations must be freed. The programmer may not be forced to manually write code to free individual allocations—the code to do so may be automatically generated through complex compiler features, like destructors—but the computational cost of releasing memory is required nonetheless. It also disqualifies pointers as the method for linking various nodes or data structures together (because those pointers may be invalidated upon the growth of a data structure).
Data Structure Composition
Another initially subtle—but immensely powerful—benefit of node discontiguity is the ability to build, in effect, many data structures from one set of nodes.
Node discontiguity can be a useful building block for all sorts of structures, not just linked lists. And those structures can be composed, by noticing that one node need not carry links for only one data structure.
A single node type can simultaneously provide the effects of a tree, a hash table, and a linked list:
struct Node
{
  Node *hash_next;     // slot for next node in hash chain
  Node *first_child;   // slot for first child in hierarchy
  Node *last_child;    // slot for last child in hierarchy
  Node *next_sibling;  // slot for next sibling in hierarchy
  Node *next_linear;   // slot for linked list through all nodes
  // ...
};
I demonstrated a practical example of this technique in my from-scratch UI programming series, in which a user interface node hierarchy functioned as both a hash table and a tree.
Modern programming languages and modern programming education teach that one’s job is to choose a data structure for the problem. The best choice might be an array, or a tree, or a hash table. I disagree with this framing—by dropping down a level, one’s job can instead be to design the perfect data structure for the problem, using more granular data structure principles. Through data structure composition, one can build an array, and a tree, and a hash table—all in one—and leverage the effects of each.
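As a sketch of how such a composed structure gets built (the NodeTable bookkeeping below is hypothetical), inserting a single Node can thread it into the hash table, the hierarchy, and the linear list all at once:
// Hypothetical containers for the composed structure:
struct NodeTable
{
  Node **slots;        // hash table slots (chains linked via hash_next)
  U64 slot_count;
  Node *linear_first;  // linked list through all nodes (via next_linear)
  Node *linear_last;
};

void InsertNode(NodeTable *table, Node *parent, Node *node, U64 hash)
{
  // Thread into the hash table's chain for this slot.
  U64 slot = hash % table->slot_count;
  node->hash_next = table->slots[slot];
  table->slots[slot] = node;

  // Thread into the parent's child list in the hierarchy.
  if(parent)
  {
    if(parent->last_child) { parent->last_child->next_sibling = node; }
    else                   { parent->first_child = node; }
    parent->last_child = node;
  }

  // Thread into the linear list over all nodes.
  if(table->linear_last) { table->linear_last->next_linear = node; }
  else                   { table->linear_first = node; }
  table->linear_last = node;
}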
Implementation Notes
Linked lists are often implemented as generic data structures in, say, a language’s standard library. I don’t recommend using such data structures. I have no doubt that the pessimization some folks implicitly associate with linked lists is because of the common characteristics of such implementations. These implementations not only fail to avoid many—if not all—of the aforementioned possible problems with linked lists, they also dispense with one of the most powerful benefits I’ve mentioned in this post: data structure composition.
Allen Webster was the person who taught me that this “generic data structure” approach is all wrong. Now, you may object:
Ryan, are you just meant to implement the data structure multiple times? Managing pointers in, say, a doubly-linked list isn’t the hardest thing in the world, but it’s also not trivial—it’s very easy to imagine screwing it up a few times on every implementation.
And this is perfectly legitimate. But Allen showed me a better way to deduplicate linked list pointer management than through generics: wrapping the complex, finicky work related to linked lists into several reusable macros, which can then be used in implementing custom data structures. He covers this in more detail in one of his Mr. 4th Programming videos. I also have an implementation of these macros in my application template repository over in my Code Depot.
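To give a flavor of the idea—this is my own rough sketch, not the macros from Allen’s videos or my Code Depot—the finicky pointer fixup for pushing onto a singly-linked queue can live in a macro that works for any node type with the named link field:
// Push `node` onto the back of a queue described by `first`/`last`
// pointers, linking through the field named `next`.
#define QueuePush_N(first, last, node, next) \
  ((first) == 0 ? \
   ((first) = (last) = (node), (node)->next = 0) : \
   ((last)->next = (node), (last) = (node), (node)->next = 0))

#define QueuePush(first, last, node) QueuePush_N(first, last, node, next)

// Usage: any struct with a `next` pointer works—no generic container type.
struct ErrorNode { ErrorNode *next; };

void PushErrorExample(ErrorNode **first, ErrorNode **last, ErrorNode *new_error)
{
  QueuePush(*first, *last, new_error);
}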
Closing Thoughts
I overlooked linked lists for too long in my journey to become a better programmer. As a result, I was repeatedly forcing everything to be in contiguous blocks of memory. This disqualified me from meaningfully taking advantage of data structure composition, pointer stability, and using arenas as a basic allocator building block for even the most sophisticated of data structures.
Learning about their subtleties, and how to use them, has proven to be immensely useful, and has allowed me to rely on even simpler languages and tools, despite writing more sophisticated and more performance-oriented code.
I hope this post provided some clarity on why exactly that is, and that the concerns about linked lists you’ll see around the Internet are overblown.
If you enjoyed this post, please consider subscribing. Thanks for reading.
-Ryan
Isn't the case with packing multiple elements in a node essentially a different data structure? Yes, nominally it is a linked list in that you link nodes and content can be whatever, but packing elements makes it a hybrid, if you consider the element the basic data unit of the whole structure. Semantically it is to linked lists what a B-tree is to balanced trees. (And surely not a coincidence that in practice B-trees often exhibit massive performance benefits compared to "classic" balanced trees (e.g. BST, AVL, red-black) in that domain.)
It certainly is in tune with the composable data structures theme, but also likely not what people readily associate with the "linked list" moniker.
Casey, in his refterm lecture, said the "array is faster than linked list" statement is "fake optimization". It's at best neutral, or worse, counter-productive: https://youtu.be/pgoetgxecw8?t=722
What do you think about having two separate arrays: one for the nodes (which just hold a pointer to the next node and a pointer to the value) and one for the actual values?