0
Fixed

Debugging Data Assignment is slow

GhostAegis 2 years ago updated by Lazlo Bonin (Lead Developer) 2 years ago 16

Per the request from the other thread, I'll post what I can about this issue here.

I'm on Unity 2018.1.9f1 and currently using Bolt 1.3. When updating to 1.4.0f7 I see a significant increase in CPU time for Bolt processes, up from about ~3.5 ms with the profiler open on 1.3 to >13 ms on 1.4.0f7. This happens at all times when playing.

I've included deep profiling screenshots of both versions below. I capped it to a single screen because most of the time-intensive calls are contained here, but if necessary I can stitch together a screenshot of all the Bolt processes. There's a lot of them though, so it might take me a while.

Bolt 1.3

Bolt 1.4.0f7

Bolt Version:
Unity Version:
Platform(s):
Scripting Backend:
.NET Version (API Compatibility Level):
GOOD, I'M SATISFIED
Satisfaction mark by GhostAegis 2 years ago
Pending Review

Hey GhostAegis! Thanks for the report.

I know this is going to sound ridiculous, but those screenshots don't go deep enough ;)

Can you test with a much smaller graph (e.g. a smaller invoke / value tree) to compare, so we can reach the end of the value tree?

There's also a consistency problem between the two screenshots. In the 1.4 one, there are 52 update calls, while in the 1.3 one, there are only 42, which makes it impossible to compare them properly. Make sure your test scene has the same number of objects and the exact same behaviour for the frame you're comparing.

One other thing I would like you to test is the out-of-editor performance by linking the profiler. My educated guess is that the performance drop you're seeing could be related to the new way 1.4 handles reporting debug data to the editor. The tradeoff was that it would be slower in the editor, but leaner in builds, which is ultimately a good thing. There are instructions to profile builds on this page.

Okay, I'll try and make a stripped-down version that replicates it and see if that condenses the profiler enough to make it reasonably screenshotable. If I tried to stitch that profiler shot together to the end of the Bolt calls it would take hours. And I'll do that with a build as well to compare them all. I should be able to do that tonight.

I noticed the thing with the calls, but the weird thing is those are identical scenes. I don't think there was any difference in behaviors or content between the two shots. Just four multiplayer characters standing idly in a map for both, and both were taken roughly the same amount of time after entering play mode. I'll make sure I'm paying careful attention to that for the next test though, just to be sure.

Okay I've been at this for several hours this morning, and I'm stuck.

To start off, I can't get the profiler small enough to make it screenshotable. Even just stripping down a scene to a lone player ability macro graph and NOTHING ELSE results in a profiler that looks to be something like 50+ pages of screenshots to get to the end of the Bolt calls.

But even worse, now that I'm really digging around in the project with 1.4 installed, I'm finding that any and every macro on an instantiated object is missing. So if I load up a level and the players spawn in, ALL of their macros and object variables are empty. The rest of the macros/variables in the scene are fine. But if I open a level manually and drop the players into the scene before hitting play, and then hit play directly on that level, it works fine.

However, this only happens in the editor, and only when spawning objects. But that makes using or otherwise testing in the editor via full game flow impossible on 1.4. So just to be clear, with 1.4.0f7 and Bolt 2018.1.9f1, I cannot spawn an object without its Bolt components being completely reset to empty while in the editor. This does not happen in a build.

That said, I was at least able to take some measurements by building a level with all components manually and hitting play directly. I'll paste them below. But based on what I'm seeing, it looks like 1.4 does offer a small boost to player efficiency... but at a major, major expense to editor efficiency. 

Oh, and I did figure out the discrepancy in calls. In 1.3 FlowMachine.Update() and StateMachine.Update() are two separate categories in the profiler. In 1.4 all of the calls to those are rolled together under EventMachine. So that'd be why there's a difference between the screenshots, since that first one only shows FlowMachine calls.

I'm sorry I can't be of more assistance, but I'm way, way out of my element here. If there's anything else simple I can do to help, I'm happy to do it, but as it stands there's just no way I can use 1.4.

1.3

Profile in Editor

Average cpu time per frame 4.3 ms 4.6 KB  of GC alloc

Profile in Player

2.3 ms 4.6 KB of GC alloc

1.4.0f7

Profile in Editor

11.9 ms 4.3 KB of GC alloc

Profile in Player

1.7 ms 4.3 KB of GC alloc

+1

Hi GhostAegis! Thanks for looking into this some more. To keep this on focus, I'd ask you to open another thread for the new bug you seem to be experiencing with the machine resetting.

The results seem to point to what I was expecting though: in player, Bolt 1.4 is actually significantly faster than 1.3. I'll create my own small test scene to try to optimize debug data assignment speed so that we reach similar speeds.

Okay, I've posted a separate topic for that new bug I found. As far as this, it sounds like its less a bug and more just a side-effect of the new version? While it is nice how much faster it is in the player (-27% is great!), the corresponding editor cost is really problematic (+176%). On top of that, the cost in editor per macro seems to scale a lot faster than it did before.

As an example, I had previously been running everything off of Bolt. Shortly before discovering this, I had rewritten my player controller which is shared across all four players in a scene in order to get my GC alloc down. This pulled 4 copies of 1 macro from my scene and replaced it with a C# script.

As a result, my CPU time per frame in editor went from ~6 ms to ~4 ms and my GC alloc went from ~9.8 K to ~4.6 K. Great!

However, it also has illustrated that the scaling time cost of macros is kind of unsustainable with the new version. Before I replaced the player controller macros I tested it on 1.4 as well.

I got 22+ ms per frame in editor. Pulling the controller macro from each of my 4 players dropped it to ~11.9 ms. 

That's a really shocking scale rate and makes anything remotely scripting-heavy that depends on Bolt basically untestable in the editor. While it's really nice how fast it is in a build, on a large project it's critical to be able to test with reasonable performance in editor. With the new version that's just not doable, and that really limits its usefulness over the old version.

That would be consistent with the behaviour you were describing before. 

Basically, there is a an operation happening when a graph element tries to fetch its debug data. This happens very often: before traversing any connection in the graph. At runtime, this is a no op, because there's no editor data, which explains why it's so much faster.

The operation is roughly:

  1. Check where you are in the Graph Stack (e.g. the path of nested graphs)
  2. Convert that to an immutable Graph Reference
  3. Pass this Graph Reference to the editor side of things
  4. Validate the graph reference
  5. Look in the loaded Graph Contexts for this reference
  6. Return the associated context, or create it

The problem is step 2 allocates memory, and step 4 takes time. I'll try rethinking how that happens to make it faster and leaner.

Ah, well that makes sense then. That's unfortunate that it results in such a large performance cost increase, but it is understandable. I suppose since it's editor-only it isn't a huge issue, more an inconvenience than anything. 

Anything you can do to improve the editor performance scaling to bring it back in line with 1.3 in a future version will definitely be appreciated, but builds certainly matter more. Not a bug then, I guess. Thanks for looking into it and taking the time to explain it!

Hello (Im the guy of the memory allocations), I made a simple flow that rotates a cube via input. I made the exact flow in latest Bolt vs Flow Canvas. Flow Canvas had zero allocations, Bolt made allocations. Bolt "Time ms" was 10x  more than Flow Canvas. I suggest that improving performance should be considered over features. I can take some screenshots of the flows in case you are interested (New thread?). 

Hi Carlos! Actually, improving performance is pretty much our main priority, but it's efforts spread across many axes: startup times, allocations, unit database generation, and yes, runtime. I would bet that without the editor overhead discussed above, you would not even nearly see a 10x increase over FC. 

The allocations issue is trickier. It's because Bolt has boxing, whereas FC avoid that with lots of generics. This makes for a much simpler fuzzy finder, but it comes with minor allocations.  But there are two good news in sight:

  • C# generation in Bolt 2 will make that script entirely allocation-free, and much faster than Bolt 1, FC, or any other non-generative visual scripting tool
  • We are rethinking typing in Bolt to enforce strong typing whenever possible, which will minimize the cases where boxing is needed. Our end goal is a flat-zero alloc, but the minimal allocs from boxing has not prevented any game using Bolt from running or caused GC spikes from what I've ever heard.

Just confirming my hypothesis. Debugging data assignment (DDA) is really the culprit here. 

I tested a simple rotating cube scene:

With DDAWithout DDA
1.7 ms 0.38 ms

In that scene, DDA makes things 4.5x slower. Yikes. 

A simple solution would be to disable data on any graph that isn't currently in focus. But that would make debugging something that happened while you weren't looking much harder, so I don't want to do that. I'll have to find a creative way!

I've also found a minor cause of non-boxing allocation in v.1.4.0f7 that I'll fix for v.1.4.0f8, so allocations should go down substantially.

Here is what I mentioned.

Bolt

Flow Canvas

Profiler (Windows standalone) when hitting A key.


Maybe those allocations will be fixed with what you mentioned.In last post.

I also see some IMGUI type of calls when running stand alone. Are those needed?

As I said, there are two parts to this:

  • Boxing, which FlowCanvas managed to avoid with generics (60 B)
    This will be "fixed" in Bolt 2 with C# script generation
  • A bug in 1.4.0f7, which allocated memory for no good reason (52 B)

Not sure why'd you see IMGUI calls... maybe they're from the profiler itself?

Fixed in Next Version

I managed to refactor and fix the DDA's slowness. It's basically almost free now, so you can all expect a major runtime speed boost in the editor and memory alloc reduction for v.1.4.0f8! :)

Thanks for the report everyone!

Wow, that's awesome! Thanks for figuring it out, looking forward to the hotfix!

Cool, can wait to try the new hot fix. About the IMGUI I might create a new thread after trying hot fix. Those calls happen when using bolt, if uninstall Bolt and use plain c#, the IMGUI calls wont appear. 

Thanks for the hard work man. 

Fixed

The v.1.4.0f8 hotfix has been published!