CGE 2D render too slow

When I render a tiled map with ZenGL I get 700+ FPS, but with CGE I get only 60+ FPS.

How can I make CGE faster?

Note: Comparing FPS values above the monitor’s refresh rate is not very meaningful. Most drivers are capped at 60 FPS (like Mesa) and everything above that is “fake”, and actually “harmful”: it means the CPU is (over)loaded with work that doesn’t result in any useful outcome.

I.e. for a more proper test you would most likely want a very heavy Tiled map that renders at 30 FPS in ZenGL.

Now specifically about Tiled maps - I myself haven’t used those but looking at castle-engine/castletiledmap_control.inc at 599eaf9b6d6738ee474bf98eca95c1fa17cb162c · castle-engine/castle-engine · GitHub I see that it doesn’t use “batched render” feature which can speed up the rendering significantly. It’s not absolutely trivial to implement it but I don’t think it’s too hard (unless I’m missing some crucial part of Tiled implementation).

I think that simply no one has hit a performance issue with TiledMapControl so far, so nobody has looked into it. So, if you need to improve the performance of the component (not just for comparison, but for a real use case), you could have a look at it yourself. As I’ve mentioned, at first glance it’s rather easy, but it will potentially need a decent amount of work and require understanding the internal Tiled structure and the way it’s handled in CGE. I’ll explain how if you’re interested in taking this task, and I’m sure such a pull request will be more than welcome. Alternatively, create a ticket at GitHub and someone will look at it (but obviously this might take much longer :)).

With TiledMapControl the FPS is too low: between 30 and 200+.

When I render one map using TDrawableImage, I get 110+ FPS.
If I draw it with the ZenGL API, I get 1000+ FPS.

Because CGE is so huge, I don’t know what to adjust to speed up 2D rendering. Could you point me in the right direction?

What you need is to use “batched” rendering.

E.g. you can have a look at how I implemented it here: code/batchedmap.pas · master · EugeneLoza / Kryftolike · GitLab to render absolutely enormous maps.

So, what you do is instead of calling a regular TDrawableImage.Draw(coordinates, size) you call an overloaded TDrawableImage.Draw(array of screen rectangles) (see Castle Game Engine: CastleGLImages: Class TDrawableImage). I.e. instead of

for X := 0 to 99 do
  for Y := 0 to 99 do
    Image.Draw(X * 10, Y * 10); // this will create 10000 draw-calls

you do

Count := 0;
for X := 0 to 99 do
  for Y := 0 to 99 do
  begin
     ScreenRects[Count] := FloatRectangle(X * 10, Y * 10, 10, 10);
     ImageRects[Count] := FloatRectangle(0, 0, 10, 10);
     Inc(Count);
  end;
Image.Draw(@ScreenRects[0], @ImageRects[0], Count); // this will be rendered in a single draw call

Note that the problem is that in order for that to work it needs to have “all map tiles” in a single texture which could be impossible for a Tiled map in a straightforward way and will require additional workarounds (sorting tiles by the texture they use, which in some complex cases may slow down performance instead of speeding it up).
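To make the "sorting tiles by the texture they use" idea more concrete, here is a minimal sketch in plain Free Pascal, without any CGE calls. It only groups tile rectangles per texture and counts how many batched draw calls would result. Everything in it (TTileRect, TBatch, TextureOfTile, the map and tile sizes) is a hypothetical name or value made up for illustration; in a real renderer each non-empty batch would be passed to one batched TDrawableImage.Draw call on the matching image.

```pascal
program BatchByTexture;
{$mode objfpc}{$H+}

const
  MapWidth = 4;       // hypothetical map size in tiles
  MapHeight = 3;
  TextureCount = 2;   // hypothetical number of tileset textures
  TileSize = 10;

type
  TTileRect = record
    Left, Bottom, Width, Height: Single;
  end;
  TBatch = record
    Rects: array of TTileRect; // screen rectangles queued for one texture
    Count: Integer;
  end;
  TBatchArray = array [0..TextureCount - 1] of TBatch;

{ Hypothetical lookup: which tileset texture does tile (X, Y) use. }
function TextureOfTile(const X, Y: Integer): Integer;
begin
  Result := (X + Y) mod TextureCount;
end;

{ Group all tiles by texture. Returns the number of non-empty batches,
  which equals the number of draw calls needed. }
function BuildBatches(out Batches: TBatchArray): Integer;
var
  X, Y, T: Integer;
begin
  for T := 0 to TextureCount - 1 do
  begin
    SetLength(Batches[T].Rects, MapWidth * MapHeight);
    Batches[T].Count := 0;
  end;
  for X := 0 to MapWidth - 1 do
    for Y := 0 to MapHeight - 1 do
    begin
      T := TextureOfTile(X, Y);
      Batches[T].Rects[Batches[T].Count].Left := X * TileSize;
      Batches[T].Rects[Batches[T].Count].Bottom := Y * TileSize;
      Batches[T].Rects[Batches[T].Count].Width := TileSize;
      Batches[T].Rects[Batches[T].Count].Height := TileSize;
      Inc(Batches[T].Count);
    end;
  Result := 0;
  for T := 0 to TextureCount - 1 do
    if Batches[T].Count > 0 then
      Inc(Result);
end;

var
  Batches: TBatchArray;
  DrawCalls, T: Integer;
begin
  DrawCalls := BuildBatches(Batches);
  Writeln('Tiles: ', MapWidth * MapHeight, ', draw calls: ', DrawCalls);
  for T := 0 to TextureCount - 1 do
    Writeln('  texture ', T, ': ', Batches[T].Count, ' tiles');
end.
```

The number of draw calls then depends on the number of textures rather than the number of tiles. Note that grouping changes the drawing order, so this only works as-is when tiles from different textures do not need to overlap in a specific order.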

There is a way to speed up the renderer even further, however it’ll require manually implementing the map as a set of TShapes, which is not trivial. Roughly it means doing this: https://gitlab.com/EugeneLoza/CastleCraft/-/blob/master/castlecraft.lpr#L114 just in a 2D case. The problem is that the batched version of TDrawableImage.Draw reallocates some memory every frame. That’s negligible for “small maps” but can quickly become a bottleneck if the number of tiles goes over 10-20 thousand.

------------------ CGE code----------------------
for j := y1 to y2 do
  for i := x1 to x2 do
  begin
    if FloorArray[i, j] <> nil then
    begin
      // odd rows are shifted right by half a tile (staggered layout)
      if (j mod 2) = 0 then
        tx := i * TileWidth
      else
        tx := i * TileWidth + TileWidth div 2;
      ty := MapHeight * TileHeight div 2 - j * TileHeight div 2;
      N := FloorArray[i, j].GlobalIndex;
      tx := tx + XOffset;
      ty := ty + YOffset;
      FloorArray[i, j].TileImage.Draw(tx, ty, N); // use DrawableImage.Draw
    end;
  end;

------------------ zengl code----------------------
for j := y1 to y2 do
  for i := x1 to x2 do
  begin
    if MapArray[i, j].FloorImageID >= 0 then
    begin
      if (j mod 2) = 0 then // even row
        tx := i * g_iMapTileWidth
      else
        tx := i * g_iMapTileWidth + g_iMapTileWidth div 2;
      ty := j * g_iMapTileHeight div 2;
      N := MapArray[i, j].FloorImageID;
      img_base.DrawTile(tx, ty, N); // use csprite2d_Draw
    end;
  end;

CGE’s TDrawableImage is about 10 times slower than ZenGL’s csprite2d_Draw.

I’m not familiar with ZenGL or the implementation of csprite2d. Maybe it uses dynamic batching for sprites, like CGE does for sprites. TDrawableImage does not: it’s a rather low-level feature that you should use if you know what you’re doing. Not that it can’t be improved, but if you go low-level, you go there to optimize things yourself. So if you try my solution above, you should get the same or similar speed. But again, don’t compare FPS above 60.

  1. See Optimization and profiling | Manual | Castle Game Engine for notes how to compare FPS.

  2. If you really want to compare full FPS, disregarding the fact that some frames drawn were never actually displayed, be sure to set ApplicationProperties.LimitFPS to 0, otherwise CGE limits FPS to 100 by default.

  3. That being said, our Tiled drawing performance indeed can suck for larger maps. This is a larger TODO. The plan is to:

    • Advise loading Tiled maps to TCastleScene

    • Make DynamicBatching include more cases, such that Tiled maps in TCastleScene get more optimized.

    But that’s a larger work inside CGE.

    There is no perfect solution right now that is easy. Some solutions:

    A. You can “bake” static map layers into big images, and render them as big images underneath TCastleTiledMapControl.

    B. Or you can implement your own map rendering using TDrawableImage.Draw with batching.

    Both A and B of course are far from optimal, as they mean you’re not really “just loading Tiled map into TCastleTiledMapControl and that’s it”.

    I’m sure that with B you can achieve the same speed as with ZenGL, as underneath these are really just simple OpenGL calls. As Eugene notes, ZenGL likely does batching at a lower level, while our TDrawableImage.Draw does not do automatic batching: you need to pass an array of rectangles yourself to a single TDrawableImage.Draw call. In CGE, automatic batching is done for TCastleScene shapes.

Optimizations! :slight_smile:

Long story short: our Tiled rendering is now more than 20x faster (from 2.7 FPS to 61 FPS on the test map described below).

Details:

  1. Added examples/tiled/map_viewer/data/maps/desert_big.tmx which is a 400x400 map and clearly showed our (past) performance problems.

    Before optimizations below, it had 2.7 FPS on my system. (yeah, less than 3 FPS…)

    Other maps in examples/tiled/map_viewer/data/ give me easily FPS > 100 with ApplicationProperties.LimitFps = 0, so they are not useful for performance testing.

    In this post, all FPS measurements are done with examples/tiled/map_viewer, compiled in release mode (castle-engine compile --mode=release), without FPS limit (ApplicationProperties.LimitFps := 0 in gameinitialize.pas).

  2. Implemented automatic batching in our TDrawableImage which can be activated by just TDrawableImage.BatchingBegin / TDrawableImage.BatchingEnd. This should be useful for various cases of TDrawableImage usage.

  3. Used automatic batching in TCastleTiledMapControl.

    This increases FPS to 24 on my system for desert_big.tmx :slight_smile: And it stays like this even at significant zoom-out. We’re back in business :slight_smile:

  4. Removed drawing off-screen images.

    This increases FPS to 61 on my system. And FPS stays reasonable even at significant zoom-out.
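As an illustration of why skipping off-screen tiles helps so much on a 400x400 map, here is a rough, self-contained sketch of computing the visible tile range for a simple orthogonal map. This is not the actual TCastleTiledMapControl code; the function names, the 32-pixel tile size and the viewport values are assumptions chosen for the example.

```pascal
program VisibleTileRange;
{$mode objfpc}{$H+}

uses Math;

const
  MapWidth = 400;  // map size in tiles, like desert_big.tmx
  MapHeight = 400;
  TileSize = 32;   // assumed tile size in pixels

{ Index of the first tile that overlaps a view starting at ViewStart (pixels). }
function FirstVisibleTile(const ViewStart: Single): Integer;
begin
  Result := Max(0, Floor(ViewStart / TileSize));
end;

{ Index of the last tile that overlaps the view [ViewStart, ViewStart + ViewSize). }
function LastVisibleTile(const ViewStart, ViewSize: Single;
  const TileCount: Integer): Integer;
begin
  Result := Min(TileCount - 1, Ceil((ViewStart + ViewSize) / TileSize) - 1);
end;

var
  X1, X2, Y1, Y2: Integer;
begin
  // Assumed viewport: 1024x768 pixels, scrolled to (100, 50)
  X1 := FirstVisibleTile(100);
  X2 := LastVisibleTile(100, 1024, MapWidth);
  Y1 := FirstVisibleTile(50);
  Y2 := LastVisibleTile(50, 768, MapHeight);
  Writeln('Visible tiles: X ', X1, '..', X2, ', Y ', Y1, '..', Y2);
  Writeln('Tiles to draw: ', (X2 - X1 + 1) * (Y2 - Y1 + 1),
    ' of ', MapWidth * MapHeight);
end.
```

With these assumed numbers only a few hundred of the 160000 tiles need to be submitted per layer. When zooming out, the visible range grows, which is where batching matters most.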


Thanks for the answer, michalis. I tried point 1 but the FPS did not improve. TDrawableImage.BatchingBegin and TDrawableImage.BatchingEnd look effective. Where should I add them? Please give a demo. When I open desert_big.tmx, I get 3 FPS on my system.

I have tried several frameworks, including CGE, ZenGL, Afterwarp Framework, SDL2, DelphiX etc.

Most frameworks have stopped updating, or are updated very slowly. Only CGE is actively developed.

Comparing their rendering performance, I think Afterwarp Framework > ZenGL > SDL2 > CGE > DelphiX.

What I want to do is a game like Flare (Diablo like)

There are many map layers (floor, wall, objects etc.), and each layer consists of many tiles.

When I draw the floor layer with CGE, I get about 100 FPS. With Afterwarp I get about 1500+ FPS, and with ZenGL about 1000+ FPS.

The problem is that I have only painted one map layer (the floor layer). There are more map layers to draw, plus character animations. If everything is rendered, the FPS will drop a lot.

So I think CGE’s 2D rendering is too slow.

As for how to test it:

  • Just wait a few hours for Jenkins to build latest CGE with batching being used.

    ( You can watch it at Comparing snapshot...master · castle-engine/castle-engine · GitHub. Right now it shows that the commit “Implement almost automatic TDrawableImage batching, just use…” is not yet built into the snapshot, but in a few hours it should disappear from that page → which means that snapshots contain it. I’ll make a note here when it’s actually ready. )

    Then just get latest CGE from Download | Castle Game Engine .

  • Then open examples/tiled/map_viewer in CGE editor, switch to “Release” mode using menu “Run → Release Mode” and then just “Compile And Run” from CGE editor.

There is nothing you need to do in code to use it with TCastleTiledMapControl. The rendering of TCastleTiledMapControl uses it automatically, and you should not get 3 FPS anymore :slight_smile:

As for using TDrawableImage batching in your own applications, if you use TDrawableImage.Draw explicitly: see API docs on Castle Game Engine: CastleGLImages: Class TDrawableImage . There’s not much to do, you just surround your rendering with TDrawableImage.BatchingBegin and TDrawableImage.BatchingEnd and it should work like magic.
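A minimal sketch of what "surrounding your rendering" could look like, assuming a single TDrawableImage used for every tile and the same 100x100 grid as in Eugene's earlier example. The unit name and the DrawTileGrid procedure are hypothetical; only TDrawableImage.Draw, TDrawableImage.BatchingBegin and TDrawableImage.BatchingEnd come from CGE. It needs the engine and an active rendering context, so call it from your own rendering code (for example an overridden Render method) rather than running it standalone.

```pascal
unit GameBatchedTiles; // hypothetical unit name

{$mode objfpc}{$H+}

interface

uses CastleGLImages;

{ Draw a 100x100 grid of tiles, 10 pixels apart, using one image.
  Call this from your rendering code, while rendering is possible. }
procedure DrawTileGrid(const TileImage: TDrawableImage);

implementation

procedure DrawTileGrid(const TileImage: TDrawableImage);
var
  X, Y: Integer;
begin
  // Draw calls between BatchingBegin and BatchingEnd can be
  // collected and submitted together instead of one by one.
  TDrawableImage.BatchingBegin;
  try
    for X := 0 to 99 do
      for Y := 0 to 99 do
        TileImage.Draw(X * 10, Y * 10);
  finally
    // Always close the batch, even if drawing raised an exception
    TDrawableImage.BatchingEnd;
  end;
end;

end.
```

Batching is most effective when consecutive Draw calls use the same image, so drawing a map layer by layer (or tileset by tileset) is likely to give the best results.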

As for the rest of your measurements, we need to have a good testcase so that we’re all talking about the same thing :slight_smile: If you have any particular map where you’re testing CGE / ZenGL / Afterwarp and can share it, it will be much appreciated. And remember:

  • FPS above your monitor refresh rate are easy to get and don’t mean that much. And for users, it doesn’t really matter if you have 100, 1000 or 1500 FPS, users probably see at most 60 of them anyway :slight_smile: See Optimization and profiling | Manual | Castle Game Engine for various notes how GPUs work.

  • CGE has by default ApplicationProperties.LimitFPS set to 100, to not eat CPU (and laptop battery) rendering useless frames. So if you compare CGE, be sure to disable it using ApplicationProperties.LimitFPS := 0

  • … but, as Eugene also mentioned, it is in general best to have a testcase that doesn’t have high FPS from start. You should not compare 100 and 1000+ vs 1500+ :slight_smile: Instead it’s better to have a testcase that forces FPS to drop below 60 (that’s why I made that desert_big.tmx) and work on optimizing it.
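To see why differences between very high FPS values matter less than they seem, it helps to convert FPS into time per frame. The small program below is just this arithmetic; the sample values are the ones mentioned in this thread plus 30 and 60 for comparison, and FrameTimeMs is a name made up for the example.

```pascal
program FrameTimes;
{$mode objfpc}{$H+}

{ Time spent on one frame, in milliseconds, for a given FPS. }
function FrameTimeMs(const Fps: Double): Double;
begin
  Result := 1000.0 / Fps;
end;

const
  Samples: array [0..4] of Double = (30, 60, 100, 1000, 1500);
var
  I: Integer;
begin
  for I := Low(Samples) to High(Samples) do
    Writeln(Samples[I]:6:0, ' FPS = ', FrameTimeMs(Samples[I]):7:3,
      ' ms per frame');
end.
```

Going from 1000 to 1500 FPS saves about 0.33 ms per frame, while going from 30 to 60 FPS saves about 16.7 ms per frame. That is why a testcase that starts below 60 FPS tells much more about real rendering cost.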

I would also like to see the report on how we detect your GPU, in case CGE decided to use some “safe but suboptimal” route because of something. To get this, you can use “Help → System Information” from CGE editor (open any project to see the menu) and then “Save To File” to get the report.

So, for further optimization, it would be best if you can show us how exactly you are testing. If you have any project and/or map you can share, this would be best, to make sure we talk about the same thing, so that we can test it too and say where the issue is (and possibly improve it in CGE). If you cannot share it publicly but can share it privately, that is also an option, you can send it e.g. by email to me ([email protected]).

I look forward to the performance improvements in the new version. CGE’s biggest advantage is that the author is working hard on updates.

ZenGL uses “batch2d_Begin … batch2d_End” when drawing. The speed improvement may be the same as with TDrawableImage.BatchingBegin … TDrawableImage.BatchingEnd.

I want to wait for the new version and then rewrite the performance example to send it to you. It will take some time.

I hope CGE’s rendering speed can match ZenGL’s.

I found two demos that can illustrate this rendering performance problem.

1. isometric_game
It had 120 FPS on my system.

I found BatchingBegin in the executable, but I don’t know how to add it to this demo.

2. deprecated_to_upgrade\isometric_game
This demo draws more tiles.
It had 600 FPS on my system.

After I added TDrawableImage.BatchingBegin and TDrawableImage.BatchingEnd, it had 1800 FPS.

Please find the problem that affects 2D performance, so that isometric_game can rise from 120 FPS to 2000+ FPS.

isometric_game is just a very simple example, so I think 2000+ FPS should be normal for it. Then, after I add map complexity, various animations and UI, it could still maintain more than 60 FPS. If it has only 120 FPS now, it will fall to about 10 FPS after I add more.

examples/isometric_game

and

examples/deprecated_to_upgrade/isometric_game

use very different rendering methods.

  1. The examples/deprecated_to_upgrade/isometric_game uses just direct image rendering with TDrawableImage. This can indeed be trivially sped up now using the new TDrawableImage.BatchingBegin and TDrawableImage.BatchingEnd.

  2. The examples/isometric_game uses a TCastleViewport, with lots of images inside (TCastleImageTransform). It can easily include more scenes, also sprite sheets, loaded from TCastleScene. Scenes have a lot of features, like easily running sprite sheet animations, collisions and physics, 3D, possible to be designed in CGE editor etc. See e.g. new CGE project from template “2D game”.

    This is in general the approach we want to pursue in the future. It is much more feature-packed. And extending it helps a lot of use-cases (various types of 3D and 2D games can be constructed as a composition of TCastleScene inside a viewport). See Viewport with scenes, camera, navigation | Manual | Castle Game Engine about viewports and scenes.

    However, for now, it is not as efficient. While TCastleScene is great at rendering heavy 3D stuff, it is not really that efficient when using a lot of trivial TCastleScene or trivial TCastleImageTransform instances, one for each map tile. So this case is just not optimal yet.

    I know that examples/isometric_game will not be efficient if you try to build a huge map there, or even load huge Tiled map to one TCastleScene. The good news is I know exactly why :), this is what I described in my previous post in this thread CGE 2D render too slow - #7 by michalis

So in the meantime, specifically for map rendering, using 2D TCastleTiledMapControl is a more efficient solution (albeit limited only to the simple case of a 2D map). We also have a manual page, How to render 2D games with images and sprites | Manual | Castle Game Engine, that describes the 2 approaches for 2D rendering we feature now. It is even possible to combine them (e.g. render a transparent TCastleViewport over TCastleTiledMapControl).

So to be clear, I know exactly what to do to speedup examples/isometric_game, and it will actually help a lot of other use-cases (not only 2D games with map). But it will take time, which is why in the meantime we advise some alternative approaches.

As for your time measurements, as we mentioned, comparing FPS above 60 is not very useful, and it is better to talk about a precise testcase than to speculate. You do tests and measurements, but we should make sure that we test the same thing and that we know your use case. Then we can best advise how to achieve it efficiently using current CGE, before we implement the optimizations planned for point 2 above.

So, please send us your testcases :slight_smile: I would happily see what exactly you’re testing. And if you have any plan, I would happily see exactly what you plan, to advise best.

Note that CGE downloads on Download | Castle Game Engine now contain the latest optimizations. It took Jenkins a while, due to unrelated troubles :slight_smile:

Thank you very much for the guidance, michalis. I generally understand. I wrote the demo about half a year ago, and it doesn’t compile anymore after I updated CGE. I will rewrite the demo and add various complex situations. In fact, half a year ago I gave up on CGE after I found that the efficiency was too low. Now I know how to improve it. It will take some time, but I will finish this demo.


Note that I have been working on many renderer improvements and cleanups in a CGE branch Commits · castle-engine/castle-engine · GitHub for ~2 weeks. I just added to the features included there 1 code refactor that gets us closer to implementing “batching across many TCastleScene / TCastleImageTransform instances”, Big rearrangement of RenderBegin/RenderEnd code, to do it only once · castle-engine/[email protected] · GitHub . So I am really working on making TCastleScene / TCastleImageTransform usage (and thus examples/isometric_game) more optimal :slight_smile:

I now draw the map entirely with TDrawableImage. After adding TDrawableImage.BatchingBegin and TDrawableImage.BatchingEnd, the demo reached the highest performance, as high as ZenGL (1800+ FPS). I think this was the bug I was looking for.

Adding BatchingBegin … BatchingEnd is the key. It gives CGE a huge performance improvement.

I look forward to michalis’s development progress.
