Fixed in Beta

FindInstances causes bottleneck in instantiation

Elapse 2 months ago • updated by Lazlo Bonin (Lead Developer) 9 hours ago 18 1 duplicate

*edits*

Hello,

I get a huge lag spike when I instantiate a rigidbody object at runtime with a flow machine attached (a macro does better than an embed, as recommended, but still). Without a flow machine it's fine.
The scene itself still behaves correctly and runs smoothly after the instantiation.

Thanks, Marten


Duplicates 1


Hey, I was about to post the same issue! We are having the same problem.

I ran a test with a prefab that instantiates 90 GameObjects, each with a flow macro on it. It cost 1300 ms!

I ran another test, replacing the flow macro with a regular C# script, and now it costs only 13 ms!

Please fix that!

Under Review

Hi Elapse, hi Poinball!

It's hard to tell exactly what is happening without profiling, but here are two hypotheses:

  • Slow deserialization, because we use FullSerializer
  • Bolt's initialization for live runtime and debugging

The bad news is that neither of these is easy to fix. The good news is that we're fixing both of them in Bolt 2: we're replacing FullSerializer with Odin Serializer and making your graphs run from C# scripts directly. 

Note that v.1.4 has much better instantiation performance than v.1.3, especially on macros, so please update if you are still on an older version.

Performance is one of our top concerns, so we're taking this very seriously!

Oh man... xD I just stopped it here. The saved profiler data is 1.6 GB. I will upload it, but maybe you can already find what you need in the pic. *edit* When uploading the pic here, it loses quality and becomes unreadable. Have a look at this link instead:
https://drive.google.com/file/d/1gXcIxkoV-SkrVqBosCFuL0nHchgd8gMM/view?usp=sharing

I saw a comparison of the serializers. Looking forward to the Odin implementation, but even that 4x speed-up would leave a serious, unplayable lag spike.

I'm already on Bolt 1.4. The number of steps this instantiation takes is insane, from what the profiler shows.

Just to clarify: having an empty flow machine on the object doesn't cause it to lag. It only lags with a macro assigned (or an embed graph).

Hi Lazlo btw, and thanks for joining.

lagspike profiler pic.png

Working on Fix

Thanks for the profiling! I rarely get that detailed of a performance report, it really helps. 

So there are a few silver linings here:

  • The lag comes from a reflected-runtime initialization (FindChildInstances), so it would be completely eliminated with Bolt 2's generated C# scripts. If you don't get much or any lag with an empty flow machine, then Odin won't change much there, but it's good to know that it's not the bottleneck.
  • There might even be a way for us to optimize FindChildInstances, to eliminate or at least greatly reduce that lag in Bolt 1's reflected runtime.

I'll give it a look today, I was already working on another bug that involved this method. It's a risky optimization though, so it might not land this week!

If you'd still like to see the full profiler data, the upload just finished:

https://drive.google.com/file/d/1kNDnH7zCBBXRtCc2EeXJWI4Qj2UMNkVt/view?usp=sharing

Nice, looking forward to Bolt 2! But it seems that will take a while longer, so improvements to Bolt 1 would be great!

I'm looking into object pooling. Maybe it will provide a useful, better-performing way to handle my stuff.

May the thoughts be with you!

By the way, the first Instantiate of an object with Bolt takes much longer than the second.

Starting from this function, each time an object with Bolt is instantiated, the number of calls increases linearly: 3, 6, 9, ..., 24, 27.

It looks like instantiating one object requires serializing all previously instantiated objects.

This is very important and has seriously affected my game.

BoltInstantiateBugScene.unitypackage

1. Make a new project with Unity 2017.4 (any version).

2. Import Bolt 1.4.f11.

3. Import this package, open the simple scene, and press Play.

4. Click the macro button, then click it again and again.

I used dnSpy to read Ludiq.Core.Runtime and found that GraphInstantiation.FindChildInstances searches the child instances of all machines in the scene whenever a new Bolt machine awakes.

So the cost is cumulative: each Bolt Awake takes more time than the one before.
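
The growth described above can be illustrated with a toy model. This is a language-neutral sketch in Python, not Bolt's actual code: `Machine`, `awake_with_full_scan`, and `simulate` are hypothetical names, and the only assumption is that each new machine's Awake visits every machine already registered in the scene. The factor of 3 visits per machine is chosen purely so the output matches the 3, 6, 9, ..., 27 call counts reported above.

```python
# Toy model of "scan every machine on each Awake" (hypothetical names, not Bolt's API).

class Machine:
    def __init__(self, name):
        self.name = name


def awake_with_full_scan(machines, new_machine, visits_per_machine=3):
    """Register new_machine, then visit every registered machine.

    Returns the number of visits this single Awake performed.
    """
    machines.append(new_machine)
    visits = 0
    for _ in machines:  # full scan: cost grows with the scene size
        visits += visits_per_machine
    return visits


def simulate(count, visits_per_machine=3):
    """Instantiate `count` machines one by one and record the cost of each Awake."""
    machines = []
    return [
        awake_with_full_scan(machines, Machine(f"m{i}"), visits_per_machine)
        for i in range(count)
    ]


per_awake = simulate(9)
total_visits = sum(per_awake)
print(per_awake)
print(total_visits)
```

Under this model the k-th instantiation costs work proportional to k, so the total over n instantiations grows roughly with n squared, which is why the lag gets worse the more instances are already in the scene.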

Under Review

Hi Persuono,

This sounds like this other reported bottleneck:

https://support.ludiq.io/communities/5/topics/2319-findinstances-causes-bottleneck-in-instantiation

Can you enable deep profile and check if FindInstances is the cause?

Yes, they are the same problem, but I describe it in more detail.

Look at this code: why does it search the child instances of all machines in the scene when a new Bolt machine awakes? This does not seem necessary.

It means the time needed to instantiate an object with Bolt grows with the sum of all the machines already in GraphInstantiation.machines.

Can you simply fix this bug on the basis of 1.4.f11 and send it to me separately? If you can't solve this bug, then my game can't use Bolt.

Yes, I'm actually working on a performance fix for the 1.4 cycle; it should land in 1.4.0f12 if everything goes well. It's not simple though: this code is required, and making it keep track of all live changes to the instances is complicated. I'm hoping to go from O(n) to O(1) performance.
By 'simply' I mean something like this: 'if (Application.isEditor && allowRuntimeEditor)'.
Users could set 'allowRuntimeEditor' in Preferences.
Disabling live editing is not a direction I want to go towards, as it's a core feature that is intertwined everywhere in the runtime and editor code. Making sure this option works seamlessly is actually more complex than fixing the performance issue.

I started implementing a fix for this bottleneck. It's a very complex, core part of Bolt, so it will likely require some testing. The next version will probably have a beta phase for that reason.

Fixed in Beta

Good news! The fix seems to work and has held up under some basic testing.

Instantiation was generally slow, but especially when you already had multiple instances of the graph in the scene. The code was refactored to make the number of existing graphs irrelevant.
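
The thread does not show how the refactor works internally. One common way to make the number of existing instances irrelevant is to keep an index that is updated whenever an instance is created or destroyed, instead of rescanning the scene. The following is a minimal sketch of that general idea in Python, not Bolt's implementation: `InstanceRegistry`, `register`, `unregister`, and `instances_of` are all hypothetical names.

```python
from collections import defaultdict


class InstanceRegistry:
    """Hypothetical index from a graph id to the live instances of that graph."""

    def __init__(self):
        self._by_graph = defaultdict(set)

    def register(self, graph_id, instance):
        # Called when an instance is created: average constant-time insert.
        self._by_graph[graph_id].add(instance)

    def unregister(self, graph_id, instance):
        # Called when an instance is destroyed; empty buckets are dropped.
        instances = self._by_graph.get(graph_id)
        if instances is None:
            return
        instances.discard(instance)
        if not instances:
            del self._by_graph[graph_id]

    def instances_of(self, graph_id):
        # Looks up one bucket directly; unrelated machines are never visited.
        return frozenset(self._by_graph.get(graph_id, ()))


registry = InstanceRegistry()
registry.register("macroA", "object1")
registry.register("macroA", "object2")
registry.register("macroB", "object3")
print(sorted(registry.instances_of("macroA")))
```

The trade-off is the one mentioned earlier in the thread: the index has to be kept in sync with every live change to the instances, which is where the complexity moves to.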

The test I ran was instantiating 10 objects with 4 descendant graphs at a time. The peak I captured was after 100 of these objects were already instantiated, and I instantiated up to the 110th.

Before the fix, instantiation took 123ms and allocated a whopping 6.8 MB of memory:



After the fix, instantiation took 9ms and allocated only 207 KB of memory:



In that specific test instance, that's a >12x speed boost and >32x memory allocation reduction!
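
For reference, the two ratios can be checked from the figures quoted above. This is just the arithmetic, written as a small Python snippet; the variable names are illustrative, and the 6.8 MB figure is converted using 1 MB = 1000 KB (using 1024 gives a slightly higher ratio).

```python
# Figures quoted in the post above.
before_ms = 123
after_ms = 9
before_kb = 6.8 * 1000  # 6.8 MB expressed in KB, assuming 1 MB = 1000 KB
after_kb = 207

speedup = before_ms / after_ms           # about 13.7
alloc_reduction = before_kb / after_kb   # about 32.9

print(f"speed-up: {speedup:.1f}x")
print(f"allocation reduction: {alloc_reduction:.1f}x")
```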

Profiler peaks for fun (left is after the fix; right is before the fix):



The remaining instantiation duration and allocation is due to FullSerializer. This is something we're fixing in Bolt 2 by getting rid of it altogether and replacing it with Odin Serializer, which is much faster and much leaner.

This optimization will land in the next version, which is likely to be 1.4.1b1 because so many core systems were modified.

glad to hear this!

Now that the beta is out (https://ludiq.io/bolt/download/1.4.1b1), can you tell me if you met the expected performance gains?