Numerous optimizations and dynamic batching

michalis · October 13, 2019, 12:23am

Thanks to many sessions dedicated to optimizing CGE rendering and animation, we have a number of new optimizations (some enabled by default, some experimental — you need to enable them yourself) and new ways to profile your applications.

Various “bottlenecks” (things that noticeably affect the speed) are now drastically optimized. This includes iterating over shapes (it now uses a cache and is instant), transforming frustum (uses a different algorithm that is > 2x faster), avoiding unnecessary preparing of resources (when everything is prepared), avoiding useless passes when shadow volumes are not used.

You’re most encouraged to test your code with the new engine version, and report the results 🙂
A new optimization, enabled by default: frustum culling of the whole scene (TCastleScene.SceneFrustumCulling). This works hand-in-hand with the existing per-shape frustum culling, which can be improved when using an octree (if ssRendering is in TCastleScene.Spatial).
A new experimental powerful optimization called “dynamic batching” is implemented. Multiple X3D shapes (with the same appearance) can be merged into one, and rendered using one “draw call” to OpenGL/OpenGLES. In some applications, this offers incredible speedup — if you have a lot of simple shapes with the same appearance.

You need to activate this optimization explicitly by setting DynamicBatching (global Boolean variable) to true. See also the the relevant section in the manual.

Please treat this optimization as “experimental” for now. There are many corner cases, and I’m not yet sure whether I covered them all. IOW, it is (temporarily) possible that this optimization will break rendering in some cases. It is also certain that we don’t use all merging possibilities, yet. This will be extended, and hardened, in the future.

Also be aware that this optimization is not guaranteed to be beneficial. We will spend some time, each frame, analyzing which shapes are “good candidates for merging”. While I tried to make this analysis very fast, but there are definitely cases when we will waste more time than we gain. If you have thousands of shapes, all using a completely different appearance, then “dynamic batching” will not merge anything, and will only waste time trying.
Another new experimental powerful optimization is implemented for animations that heavily transform the shapes (e.g. typical Spine animations). You can activate it by assigning the global variable InternalFastTransformUpdate to true.

See the documentation of InternalFastTransformUpdate for the risks. This optimization assumes that your animations only transform shapes. If you use X3D animations to transform e.g. lights, then they will fail. I will fix it at some point, and then enable this optimization by default, always.
A new simple way to observe what is rendered: just display somewhere (e.g. using TCastleLabel) SceneManager.Statistics.ToString.
A new FrameProfiler, to observe what is eating time per-frame (e.g. whether the problem is in OnRender or OnUpdate).
On Nintendo Switch, we are integrated with a cool profiler from Nintendo.
To activate risky optimization on Aarch64, you can define CASTLE_ENGINE_ENABLE_AARCH64_OPTIMIZER symbol to true. We’re working on making them non-risky (we need to reproduce and submit some cases to FPC), at which point we will activate them automatically.

The manual page about optimizations was updated to describe various features mentioned here.