Introduces the concept and structure of an "immediate mode user interface" and how we can use it to accomplish our goals, including strategies for autolayout, animation, and keying.
On the subject of deriving the key for caching immediate mode widgets to their retained state, I feel like I should mention how React solved the problem with its hooks model.
React caches retained values per invocation (what they call a hook) rather than per widget, and requires that each widget call its hooks in a deterministic order (hooks cannot be called conditionally in a widget's body). The key for each retained value is then simply its call position in the widget hierarchy.
This allows primitives to be composed together into higher-level utilities surprisingly well, as the calling order of retained values remains constant even if extracted into utility functions.
React keeps track of which hooks are called whenever a widget is mounted, and simply removes them from the calling order when the widget is unmounted, effectively updating the key for retained values for widgets later in the traversal, and thereby enabling conditional rendering.
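The call-order keying described above can be sketched in a few lines of C. This is not React's actual implementation, just a minimal illustration under its rules: a per-widget slot counter is reset at the start of each build, and every hook call site claims the next slot, so the key for a retained value is its call index. All names here (`begin_widget`, `use_float`) are hypothetical.

```c
#include <assert.h>

// Retained storage keyed by call order, as in React's hooks model.
// Hypothetical names; a real system would scope slots per widget key.
#define MAX_SLOTS 16

typedef struct RetainedState { int initialized; float value; } RetainedState;
static RetainedState slots[MAX_SLOTS];
static int slot_index; // reset at the start of every widget build

static void begin_widget(void) { slot_index = 0; }

// Each call site gets the next slot; because the calling order must be
// deterministic, this must never run conditionally.
static float *use_float(float initial)
{
    RetainedState *s = &slots[slot_index++];
    if (!s->initialized) { s->initialized = 1; s->value = initial; }
    return &s->value;
}
```

Because the key is just "n-th hook call in this widget," extracting a pair of `use_float` calls into a helper function changes nothing about the keys, which is why composition into higher-level utilities works so cleanly.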
I was following this article, up until the "Immediate Mode Build, + Cache" section. I feel like there are a few things that I'm missing to be able to understand.
To start off, why would a purely immediate-mode UI_Widget data structure that is rebuilt every frame need to encode a binary tree instead of an array list? Is it so that we can put all UI_Widget instances into a single allocation without having to worry about dynamic sizing? We just... don't need random access to a specific child, since we're rendering every frame? But what about scroll views then, surely we won't want to iterate over thousands of list elements before getting to the place where it actually draws what's on the screen...
Then finally, I think I just got completely lost on the part where we introduce hashing. Why do we have hash_next and hash_prev that are pointers to other UI_Widgets instead of being... hashes? Forgive me for stupid questions because it's the first time I see something like this, I'm very confused.
Edit: turns out UI_Widgets is being used as a node of a doubly linked list that is part of the hashmap, one that's being made from scratch. I'm too used to generic hashmaps in other languages lol
Damn, that hash_next, hash_prev stuff had me too - thanks for that edit. Conclusion: it's just a hash map, implemented the way you want.
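For anyone else who got stuck there, here's a rough sketch of that kind of from-scratch cache: the widgets themselves double as nodes of per-bucket doubly linked lists, which is why `hash_next`/`hash_prev` are widget pointers rather than hashes. The names mirror the article's but the code below is illustrative, not the article's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct UI_Widget UI_Widget;
struct UI_Widget
{
    UI_Widget *hash_next; // intrusive links within one bucket's chain
    UI_Widget *hash_prev;
    uint64_t key;
    // ...per-frame and cross-frame widget data would live here...
};

#define BUCKET_COUNT 256
static UI_Widget *buckets[BUCKET_COUNT];

// Push a widget onto the front of its bucket's chain.
static void cache_insert(UI_Widget *w)
{
    uint64_t idx = w->key % BUCKET_COUNT;
    w->hash_prev = NULL;
    w->hash_next = buckets[idx];
    if (buckets[idx]) { buckets[idx]->hash_prev = w; }
    buckets[idx] = w;
}

// Walk the chain for the key's bucket; collisions are resolved by
// comparing full keys.
static UI_Widget *cache_lookup(uint64_t key)
{
    for (UI_Widget *w = buckets[key % BUCKET_COUNT]; w; w = w->hash_next)
    {
        if (w->key == key) { return w; }
    }
    return NULL;
}
```

The intrusive layout means the cache needs no separate node allocations: looking up last frame's widget by key hands you the whole struct, retained state and all.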
It may be just my lack of background knowledge, but I failed to understand some things.
So, you store the widgets of the previous frame in a hashmap, which you can access from the current frame's widget. You want to do this because the previous widgets are already calculated so you can know if the user clicked on something or not. Did I get this correct?
For some reason I still don't think I understood how to implement this.
Instead of caching all the widgets of the previous frame, can't you just pass a lambda with the code that would get executed if the user clicked? By the end of the frame, all the computations would be done so the GUI knows if the user clicked, so the lambda gets executed at the end of the frame.
This would be less clean user code but I still don't quite get how to implement the no-lambda version.
An easy way of implementing it is by adding a 1-frame delay for your input. You check mouse interaction when drawing the stuff and save the hash of the thing that got clicked.
Next frame, when specifying the widget again, you check if the hash is the same as the hash of the thing that got clicked in the last frame.
Alternatively (here, you don't have the frame delay), when rendering (or, more likely, during layout), you store the rect of the widget that you are hovering; then, when specifying a widget, you can directly check if the mouse is pressed and the widget is the one that was hovered.
The big advantage of the non-lambda version is that you can naturally debug your code, because everything happens just how you write it, instead of having a lambda that gets executed somewhere out of place.
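The 1-frame-delay approach might look something like the sketch below: while building, record which widget key the mouse is over (using last frame's layout rect), and report a click when that same key is built again with the mouse down. All names (`ui_button`, `ui_end_frame`) are illustrative, not from the article.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t hot_key_last_frame; // widget under the mouse, from last frame
static uint64_t hot_key_this_frame;
static int mouse_pressed;

typedef struct Rect { float x0, y0, x1, y1; } Rect;

static int point_in_rect(float x, float y, Rect r)
{
    return x >= r.x0 && x < r.x1 && y >= r.y0 && y < r.y1;
}

// Called once per widget per frame; the rect comes from *last* frame's
// layout, which is what introduces the 1-frame delay. Returns nonzero
// if the widget was clicked.
static int ui_button(uint64_t key, Rect last_frame_rect,
                     float mouse_x, float mouse_y)
{
    if (point_in_rect(mouse_x, mouse_y, last_frame_rect))
    {
        hot_key_this_frame = key;
    }
    // Clicked if this widget was hot last frame and the mouse is down now.
    return mouse_pressed && key == hot_key_last_frame;
}

static void ui_end_frame(void)
{
    hot_key_last_frame = hot_key_this_frame;
    hot_key_this_frame = 0;
}
```

Because interaction is returned from the same call that specifies the widget, the user code reads straight through, which is exactly the debuggability advantage mentioned above.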
In the layout section, you say that the various steps should iterate through the widget tree in a preorder fashion (top-down). But downward-dependent sizes like `UI_SizeKind_ChildrenSum` seem like they should be computed in a postorder fashion (bottom-up). After all, if any children use downward-dependent sizes as well, I'd want to resolve those first before computing sizes on the parent. Is that an exception to the preorder rule, or am I misunderstanding something?
Ah, you're right - that was a mistake in my wording. That part of the algorithm is indeed post-order; you want to recurse to children before solving for the layout for one node.
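A minimal sketch of that post-order pass, assuming a simplified node type (the field and enum names below are illustrative, not the article's): children are recursed into first, so by the time a `ChildrenSum` parent computes its size, any downward-dependent children have already resolved theirs.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;
struct Node
{
    Node *first_child;
    Node *next_sibling;
    int kind;
    float size;
};

enum { SizeKind_Fixed, SizeKind_ChildrenSum };

// Post-order: solve all children before solving this node, so nested
// ChildrenSum nodes resolve from the leaves upward.
static void solve_downward_dependent(Node *n)
{
    for (Node *c = n->first_child; c; c = c->next_sibling)
    {
        solve_downward_dependent(c);
    }
    if (n->kind == SizeKind_ChildrenSum)
    {
        float sum = 0;
        for (Node *c = n->first_child; c; c = c->next_sibling)
        {
            sum += c->size;
        }
        n->size = sum;
    }
}
```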
Hi Ryan, enjoyed the article. Any chance you could provide a quick sketch of what you mean by "self-correcting exponential animation curves"?
Yup. I just mean that this kind of thing runs every frame:
current = current + (target - current) * rate;
If you work it out mathematically, I *believe* you can use whatever your refresh rate's delta-time value to calculate a rate that produces an identical curve across various displays. But, truthfully, I just never do that - I just multiply by delta_time and tweak a constant rate value for each animation. It's not exactly identical across, say, 144 Hz and 60 Hz, but it's close enough, and being "exact" is just normally not that important when you're talking about simple UI animations (whereas being more precise is much more important in e.g. a game or cutscene).
This produces an exponential curve of motion across frames, which means that the fastest motion is on the first frame of the animation, and the slowest is on the last frame. This fits very naturally with the characteristics you more-or-less always want in a UI, which is as little time as possible between user interaction and perceived effect of user interaction. It's also robust to changes in the *target value* over time. So, for example, if you have a scrolling offset, you want the user dragging the scroll bar *while the scroll offset is animating* to gracefully adjust the animation to the new target. This simple way of animating things achieves that, so it's a very low-friction way to get high-quality results, for the purposes of UI.
Gotcha. Much appreciated :)
Here's the closed form curve, when you have a constant rate and target value: https://www.desmos.com/calculator/8f1cpfqlmw
On the graph, t is the target, r is the rate, m is some epsilon that encodes a difference between the target and the current value at which the animation is considered complete, f is the "target refresh rate" in Hz, and z is the x value at which the animation is complete.
If the x axis is the number of frames it takes to complete the animation, then you can notice the slight differences between refresh rates with the z / f expression I put at the bottom. If f = 60, then it takes 411 frames, or 6.85 seconds, to complete. If f = 144, then it takes 911 frames, or 6.88 seconds, to complete. You can see that these are a bit off. I've never done the actual work to figure out the adjustment to the rate that is required to correct for this difference, but like I said, it's just not that big of an issue. I should probably figure it out nevertheless. :)
EDIT: I figured it out. r should be specified as 1 - 2^(-rate/f). This will ensure that the amount of time it takes to complete the animation remains identical across all refresh rates. It will be a bit slower to compute the rate. Updated graph here: https://www.desmos.com/calculator/xwerlmi4cs