I use ZenGL to render a tiled map at 700+ FPS, but CGE only reaches 60+ FPS.
How can CGE be made faster?
Note: Comparing FPS values above the monitor’s refresh rate is not very meaningful. Most drivers (like Mesa) are capped at 60 FPS, and everything above that is “fake” and actually “harmful”: it means the CPU is (over)loaded with work that doesn’t result in any useful outcome.
I.e. for a proper test you would most likely want a very heavy Tiled map that renders at around 30 FPS in ZenGL.
Now specifically about Tiled maps: I haven’t used them myself, but looking at castle-engine/castletiledmap_control.inc at 599eaf9b6d6738ee474bf98eca95c1fa17cb162c · castle-engine/castle-engine · GitHub I see that it doesn’t use the “batched render” feature, which can speed up rendering significantly. It’s not absolutely trivial to implement, but I don’t think it’s too hard (unless I’m missing some crucial part of the Tiled implementation).
I think that simply no one has hit a performance issue with TiledMapControl so far, so nobody has looked into it. So, if you need to improve the performance of the component (not just for comparison, but for a real use case), you could have a look at it yourself. As I’ve mentioned, at first glance it’s rather easy, but it will potentially need a decent amount of work and require understanding the internal Tiled structure and the way it’s handled in CGE; I’ll explain how if you’re interested in taking this task. I’m sure such a pull request will be more than welcome. Alternatively, create a ticket at GitHub and someone will look at it (but obviously this might take much longer :)).
Using TiledMapControl, the FPS is too low: 30 to 200+.
When I render one map using TDrawableImage, FPS = 110+.
If I draw with the ZenGL API, FPS = 1000+.
Because CGE is so large, I don’t know what to adjust to speed up 2D rendering. Can you point me in the right direction?
What you need is to use “batched” rendering.
E.g. you can have a look at how I implemented it here: code/batchedmap.pas · master · EugeneLoza / Kryftolike · GitLab to render absolutely enormous maps.
So, instead of calling a regular TDrawableImage.Draw(coordinates, size), you call an overloaded TDrawableImage.Draw(array of screen rectangles) (see Castle Game Engine: CastleGLImages: Class TDrawableImage). I.e. instead of

    for X := 0 to 99 do
      for Y := 0 to 99 do
        Image.Draw(X * 10, Y * 10); // this will create 10000 draw calls

you write

    Count := 0;
    for X := 0 to 99 do
      for Y := 0 to 99 do
      begin
        ScreenRects[Count] := FloatRectangle(X * 10, Y * 10, 10, 10);
        ImageRects[Count] := FloatRectangle(0, 0, 10, 10);
        Inc(Count);
      end;
    Image.Draw(@ScreenRects, @ImageRects, Count); // this will be rendered in a single draw call
Note that the problem is that in order for that to work it needs to have “all map tiles” in a single texture which could be impossible for a Tiled map in a straightforward way and will require additional workarounds (sorting tiles by the texture they use, which in some complex cases may slow down performance instead of speeding it up).
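To illustrate the “sorting tiles by the texture they use” workaround, here is a minimal sketch in plain Free Pascal, with no CGE calls. The TTile record, the TextureId field and GroupByTexture are hypothetical names invented for this example. It only shows the grouping step: once tiles sharing a texture are contiguous, each group could be sent to one batched Draw call on that texture's image.

```pascal
program GroupTilesByTexture;
{$mode objfpc}{$H+}

type
  TTile = record
    X, Y: Integer;      // tile position on the map
    TextureId: Integer; // hypothetical tileset texture index, 0..TextureCount-1
  end;
  TTileArray = array of TTile;
  TIntArray = array of Integer;

{ Counting sort: reorder Tiles so that tiles sharing a texture are contiguous.
  Afterwards Starts[T] .. Starts[T + 1] - 1 is the index range in Sorted
  that belongs to texture T. }
procedure GroupByTexture(const Tiles: TTileArray; const TextureCount: Integer;
  out Sorted: TTileArray; out Starts: TIntArray);
var
  I, T: Integer;
  Next: TIntArray;
begin
  SetLength(Starts, TextureCount + 1);
  for I := 0 to TextureCount do
    Starts[I] := 0;
  { Count tiles per texture, shifted by one, so a prefix sum gives start offsets. }
  for I := 0 to High(Tiles) do
    Inc(Starts[Tiles[I].TextureId + 1]);
  for T := 1 to TextureCount do
    Starts[T] := Starts[T] + Starts[T - 1];
  { Scatter tiles into their group, keeping the original order inside a group. }
  Next := Copy(Starts, 0, TextureCount);
  SetLength(Sorted, Length(Tiles));
  for I := 0 to High(Tiles) do
  begin
    T := Tiles[I].TextureId;
    Sorted[Next[T]] := Tiles[I];
    Inc(Next[T]);
  end;
end;

const
  Ids: array[0..4] of Integer = (1, 0, 1, 2, 0);
var
  Tiles, Sorted: TTileArray;
  Starts: TIntArray;
  I, T: Integer;
begin
  SetLength(Tiles, Length(Ids));
  for I := 0 to High(Ids) do
    Tiles[I].TextureId := Ids[I];
  GroupByTexture(Tiles, 3, Sorted, Starts);
  { Each printed range could become one batched draw call. }
  for T := 0 to 2 do
    Writeln('texture ', T, ': tiles ', Starts[T], '..', Starts[T + 1] - 1);
end.
```

The grouping is linear in the number of tiles, but as noted above it changes the draw order, so it only fits layers where tiles from different textures do not need to overlap in a specific order.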
There is a way to speed up the renderer even further, however it’ll require manually implementing the map as a set of TShapes, which is not trivial. Roughly it means doing this: https://gitlab.com/EugeneLoza/CastleCraft/-/blob/master/castlecraft.lpr#L114 just in a 2D case. The problem is that when you call the batched version of TDrawableImage.Draw it reallocates some memory every frame. It’s negligible for “small maps” but can quickly become a bottleneck if the number of tiles goes over 10-20 thousand.
------------------ CGE code ----------------------
for j := y1 to y2 do
  for i := x1 to x2 do
    if FloorArray[i, j] <> nil then
    begin
      if (j mod 2) = 0 then // even row
        tx := i * TileWidth
      else // odd row is shifted by half a tile
        tx := i * TileWidth + TileWidth div 2;
      ty := MapHeight * TileHeight div 2 - j * TileHeight div 2;
      N := FloorArray[i, j].GlobalIndex;
      tx := tx + XOffset;
      ty := ty + YOffset;
      FloorArray[i, j].TileImage.Draw(tx, ty, N); // use DrawableImage.Draw
    end;
------------------ ZenGL code ----------------------
for j := y1 to y2 do
  for i := x1 to x2 do
    if MapArray[i, j].FloorImageID >= 0 then
    begin
      if (j mod 2) = 0 then // even row
        tx := i * g_iMapTileWidth
      else // odd row is shifted by half a tile
        tx := i * g_iMapTileWidth + g_iMapTileWidth div 2;
      ty := j * g_iMapTileHeight div 2;
      N := MapArray[i, j].FloorImageID;
      img_base.DrawTile(tx, ty, N); // use csprite2d_Draw
    end;
CGE’s DrawableImage is 10 times slower than ZenGL’s csprite2d_Draw.
I’m not familiar with ZenGL or the implementation of csprite2d. Maybe they’re using dynamic batching for sprites, like CGE does for sprites but not for TDrawableImage. TDrawableImage is a rather low-level feature that you should use if you know what you’re doing. It’s not that it can’t be improved, but if you go low-level, you go there to optimize things yourself. So if you try my solution above, you should get the same or similar speed. But again, don’t compare FPS above 60.
See Optimization and profiling | Manual | Castle Game Engine for notes how to compare FPS.
If you really want to compare full FPS, disregarding the fact that some of the frames drawn were never actually displayed, be sure to set ApplicationProperties.LimitFPS to 0, otherwise CGE limits them to 100.
That being said, our Tiled drawing performance indeed can suck for larger maps. This is a larger TODO. The plan is to:
1. Advise loading Tiled maps to TCastleScene.
2. Make DynamicBatching include more cases, such that Tiled maps in TCastleScene get more optimized.
But that’s a larger work inside CGE.
There is no perfect solution right now that is easy. Some solutions:
A. You can “bake” static map layers into big images, and render them as big images underneath.
B. Or you can implement your own map rendering using TDrawableImage.Draw with batching.
Both A and B of course are far from optimal, as they mean you’re not really “just loading a Tiled map into TCastleTiledMapControl and that’s it”.
I’m sure that with B, you can achieve the same speed as with ZenGL, as underneath these are really just simple OpenGL calls. As Eugene notes, ZenGL likely does batching at a lower level, while our TDrawableImage.Draw does not do automatic batching: you need to pass an array of images yourself to one TDrawableImage.Draw. In CGE, batching is done for
Long story short: our Tiled rendering is now 30x faster.
For testing I used examples/tiled/map_viewer/data/maps/desert_big.tmx, which is a 400x400 map and clearly showed our (past) performance problems.
Before optimizations below, it had 2.7 FPS on my system. (yeah, less than 3 FPS…)
Other maps in examples/tiled/map_viewer/data/ easily give me FPS > 100 with ApplicationProperties.LimitFps = 0, so they are not useful for performance testing.
In this post, all FPS measurements are done with examples/tiled/map_viewer, compiled in release mode (castle-engine compile --mode=release), without FPS limit (ApplicationProperties.LimitFps := 0 in gameinitialize.pas).
Implemented automatic batching in our TDrawableImage, which can be activated by just surrounding your rendering with TDrawableImage.BatchingBegin and TDrawableImage.BatchingEnd. This should be useful for various cases of TDrawableImage usage.
Used automatic batching in TCastleTiledMapControl.
This increases FPS to 24 on my system for desert_big.tmx. And it stays like this even at significant zoom-out. We’re back in business.
Removed drawing off-screen images.
This increases FPS to 61 on my system. And FPS stays reasonable even at significant zoom-out.
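As an illustration of why skipping off-screen images helps so much on a big map, here is a minimal sketch in plain Free Pascal. It is not the code used inside TCastleTiledMapControl. TTileRange and VisibleTiles are hypothetical names, the map is assumed to be orthogonal, and the 32x32 tile size and 1024x768 view are assumed numbers chosen only for the example.

```pascal
program VisibleTileRange;
{$mode objfpc}{$H+}

uses Math;

type
  TTileRange = record
    X1, Y1, X2, Y2: Integer; // inclusive tile indices; empty if X2 < X1 or Y2 < Y1
  end;

{ Which tiles of an orthogonal map intersect the visible rectangle?
  The view rectangle is given in map pixels, tile sizes in pixels,
  map sizes in tiles. }
function VisibleTiles(const ViewLeft, ViewBottom, ViewWidth, ViewHeight: Single;
  const TileWidth, TileHeight, MapWidth, MapHeight: Integer): TTileRange;
begin
  Result.X1 := Max(0, Floor(ViewLeft / TileWidth));
  Result.Y1 := Max(0, Floor(ViewBottom / TileHeight));
  Result.X2 := Min(MapWidth - 1, Floor((ViewLeft + ViewWidth) / TileWidth));
  Result.Y2 := Min(MapHeight - 1, Floor((ViewBottom + ViewHeight) / TileHeight));
end;

var
  R: TTileRange;
begin
  { Assumed numbers: 400x400 map, 32x32 pixel tiles, 1024x768 view at the map origin. }
  R := VisibleTiles(0, 0, 1024, 768, 32, 32, 400, 400);
  { Only the tiles in this range need to be drawn: 825 instead of 160000. }
  Writeln('Visible tiles: ', (R.X2 - R.X1 + 1) * (R.Y2 - R.Y1 + 1),
    ' of ', 400 * 400);
end.
```

The drawing loop then runs over R.X1..R.X2 and R.Y1..R.Y2 instead of the whole map. When zooming out far enough the range grows back toward the full map, which is where batching matters most.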
Thanks for michalis’s answer. I tried 1, but the FPS did not improve. TDrawableImage.BatchingBegin and TDrawableImage.BatchingEnd look effective. Where should I modify the code? Please give a demo. When I open desert_big.tmx, it has 3 FPS on my system.
I have tried several frameworks, including CGE, ZenGL, Afterwarp Framework, SDL2, DelphiX etc.
Most frameworks have stopped updating, or are updated very slowly. Only CGE is actively developed.
Comparing their rendering performance, I think Afterwarp Framework > ZenGL > SDL2 > CGE > DelphiX.
What I want to make is a game like Flare (Diablo-like).
There are many map layers (floor, wall, obj etc.). Each layer consists of many tiles.
When I use CGE to draw the floor layer, FPS is about 100. If I use Afterwarp to draw the floor layer, FPS is about 1500+. If I use ZenGL to draw the floor layer, FPS is about 1000+.
The problem is that I have only drawn one map layer (the floor layer). There are more map layers to draw, plus character animations. If everything is rendered, FPS will drop a lot.
So I think CGE’s 2D rendering is too slow.
As for how to test it:
Just wait a few hours for Jenkins to build latest CGE with batching being used.
( You can watch it looking at Comparing snapshot...master · castle-engine/castle-engine · GitHub , right now it shows that commit “Implement almost automatic TDrawableImage batching, just use…” is not yet built in the snapshot, but in a few hours it should disappear from that page → which means that snapshots contain it. I’ll make a note here when it’s actually ready. )
Then just get latest CGE from Download | Castle Game Engine .
Then run examples/tiled/map_viewer , open in CGE editor, switch to “Release” mode using menu “Run → Release Mode” and then just “Compile And Run” from CGE editor.
There is nothing you need to do from code to use it with TCastleTiledMapControl, rendering of TCastleTiledMapControl uses it automatically, and you should not get 1 FPS anymore.
As for using TDrawableImage batching in your own applications, if you use TDrawableImage.Draw explicitly: see API docs on Castle Game Engine: CastleGLImages: Class TDrawableImage . There’s not much to do, you just surround your rendering with TDrawableImage.BatchingBegin and TDrawableImage.BatchingEnd and it should work like magic.
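A minimal sketch of what “surround your rendering” could look like in your own code, assuming BatchingBegin and BatchingEnd are called on the class exactly as written in this thread. The unit name TileGridDrawing and the helper DrawTileGrid are made up for this example, and it needs CGE on the compiler path, so treat it as an outline to adapt, not as tested code.

```pascal
unit TileGridDrawing;
{$mode objfpc}{$H+}

interface

uses CastleGLImages;

{ Hypothetical helper: draws Columns x Rows copies of Image in a grid.
  Call it only while rendering, for example from the Render method
  of your own user interface control. }
procedure DrawTileGrid(const Image: TDrawableImage;
  const Columns, Rows, TileWidth, TileHeight: Integer);

implementation

procedure DrawTileGrid(const Image: TDrawableImage;
  const Columns, Rows, TileWidth, TileHeight: Integer);
var
  X, Y: Integer;
begin
  { Everything drawn between BatchingBegin and BatchingEnd can be grouped
    by the engine, instead of issuing one draw call per tile. }
  TDrawableImage.BatchingBegin;
  try
    for Y := 0 to Rows - 1 do
      for X := 0 to Columns - 1 do
        Image.Draw(X * TileWidth, Y * TileHeight);
  finally
    { Always close the batch, even if drawing raised an exception. }
    TDrawableImage.BatchingEnd;
  end;
end;

end.
```

The individual Draw calls stay the same as before, so existing drawing loops do not need to be restructured into rectangle arrays.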
As for the rest of your measurements, we need to have a good testcase so that we’re all talking about the same thing. If you have any particular map where you’re testing CGE / ZenGL / Afterwarp and can share it, it will be much appreciated. And remember: FPS above your monitor refresh rate are easy to get and don’t mean that much. And for users, it doesn’t really matter if you have 100, 1000 or 1500 FPS, users probably see at most 60 of them anyway. See Optimization and profiling | Manual | Castle Game Engine for various notes how GPUs work.
CGE has by default Application.LimitFps set to 100, to not eat CPU (and laptop battery) rendering useless frames. So if you compare CGE, be sure to disable it using Application.LimitFps := 0
… but, as Eugene also mentioned, it is in general best to have a testcase that doesn’t have high FPS from the start. You should not compare 100 and 1000+ vs 1500+. Instead it’s better to have a testcase that forces FPS to drop below 60 (that’s why I made that desert_big.tmx) and work on optimizing it.
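One way to see why very high FPS numbers are a poor yardstick is to convert them to time per frame. The small Free Pascal program below is only an arithmetic illustration; FrameTimeMs is a name made up for this example.

```pascal
program FrameTimeCompare;
{$mode objfpc}{$H+}

uses SysUtils;

{ Convert frames per second into milliseconds spent on one frame. }
function FrameTimeMs(const Fps: Double): Double;
begin
  Result := 1000.0 / Fps;
end;

begin
  { 1000 FPS vs 1500 FPS: the gap is about a third of a millisecond. }
  Writeln(Format('1000 -> 1500 FPS saves %.2f ms per frame',
    [FrameTimeMs(1000) - FrameTimeMs(1500)]));
  { 30 FPS vs 60 FPS: the gap is almost 17 ms, which users can see. }
  Writeln(Format('30 -> 60 FPS saves %.2f ms per frame',
    [FrameTimeMs(30) - FrameTimeMs(60)]));
end.
```

So a jump from 1000 to 1500 FPS buys far less time per frame than a jump from 30 to 60 FPS, which is why a testcase that starts below 60 FPS says more.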
I would also like to see the report on how we detect your GPU, in case CGE decided to use some “safe but suboptimal” route because of something. To get this, you can use “Help → System Information” from CGE (open any project to see the menu) and then “Save To File” to get the report.
So, for further optimization, it would be best if you can tell us exactly how you are testing. If you have any project and/or map you can share, this would be best, to make sure we talk about the same thing, so we can test it too and say where the issue is (and possibly improve it in CGE). If you cannot share it publicly but can share it privately, that is also an option, you can send it e.g. by email to me.
I look forward to the performance improvements in the new version. CGE’s biggest advantage is that the author is working hard on updates.
ZenGL uses ‘batch2d_Begin … batch2d_End’ when drawing. The speed improvement may be the same as with TDrawableImage.BatchingBegin … TDrawableImage.BatchingEnd.
I want to wait for the new version, then rewrite the performance example and send it to you. It will take some time.
I hope CGE’s rendering speed becomes the same as ZenGL’s.
I found two demos which can explain this rendering performance problem.
It had 120 FPS on my system.
I found BatchingBegin in ExeFile, but I don’t know how to add it.
This demo draws more tiles.
It had 600 FPS on my system.
After I added TDrawableImage.BatchingBegin and TDrawableImage.BatchingEnd, it had 1800 FPS.
Finding this problem, which affects 2D performance, would let isometric_game rise from 120 FPS to 2000+ FPS.
isometric_game is just a very simple example, so I think 2000+ FPS would be normal for it. Then, after I add map complexity, various animations and various UI, it could still maintain more than 60 FPS. If it has 120 FPS now, then after I add more it will fall to about 10 FPS.
These two examples use very different rendering methods.
examples/deprecated_to_upgrade/isometric_game uses just direct image rendering, TDrawableImage. This can indeed be trivially sped up now using the new TDrawableImage batching.
examples/isometric_game uses a TCastleViewport, with lots of images inside (TCastleImageTransform). It can easily include more scenes, also sprite sheets, loaded from TCastleScene. Scenes have a lot of features, like easily running sprite sheet animations, collisions and physics, 3D, possible to be designed in CGE editor etc. See e.g. new CGE project from template “2D game”.
This is in general the approach we want to pursue in the future. It is much more feature-packed. And extending it helps a lot of use-cases (various types of 3D and 2D games can be constructed as a composition of TCastleScene inside a viewport). See Viewport with scenes, camera, navigation | Manual | Castle Game Engine about viewports and scenes.
However, for now, it is not as efficient. While TCastleScene is great at rendering heavy 3D stuff, it is not really that efficient when using a lot of trivial TCastleScene or trivial TCastleImageTransform for each map tile. So this case is just not optimal yet.
I know that examples/isometric_game will not be efficient if you try to build a huge map there, or even load a huge Tiled map to one TCastleScene. The good news is I know exactly why :), this is what I described in my previous post in this thread CGE 2D render too slow - #7 by michalis
So in the meantime, specifically for map rendering, using 2D TCastleTiledMapControl is a more efficient solution (albeit limited only to the simple case of 2D map). We have a manual page also How to render 2D games with images and sprites | Manual | Castle Game Engine that describes 2 approaches for 2D rendering we feature now. There is even a possibility to combine them (like render transparent TCastleViewport over TCastleTiledMapControl).
So to be clear, I know exactly what to do to speed up examples/isometric_game, and it will actually help a lot of other use-cases (not only 2D games with map). But it will take time, which is why in the meantime we advise some alternative approaches.
As for your time measurements, as we mentioned, comparing FPS above 60 is not very useful, and it is better to talk about a precise test-case than speculating. You do tests and measurements, but we should make sure that we test the same thing and that we know your use-case. Then we can advise best how to achieve it, in efficient way, using current CGE, before we implement planned optimizations in AD 2.
So, please send us your testcases. I would happily see what exactly you’re testing. And if you have any plan, I would happily see exactly what you plan, to advise best.
Note that CGE downloads on Download | Castle Game Engine now contain the latest optimizations. It took Jenkins a while, due to unrelated troubles.
Thank you very much for michalis’s guidance. I generally understand. I wrote the demo probably half a year ago, and it no longer compiles after I updated CGE. I will rewrite the demo and add various complex situations. In fact, half a year ago I gave up on using CGE after I found that the efficiency was too low. Now I know how to improve it. It will take some time, but I will finish this demo.
Note that I’m working on many renderer improvements and cleanups in a CGE branch Commits · castle-engine/castle-engine · GitHub since ~2 weeks. I just added to the features included there one code refactor that gets us closer to implementing “batching across many TCastleScene / TCastleImageTransform instances”: Big rearrangement of RenderBegin/RenderEnd code, to do it only once · castle-engine/castle-engine · GitHub . So I am really working on making TCastleScene / TCastleImageTransform usage (thus example examples/isometric_game) more optimal.
I now draw the map completely with TDrawableImage. After I added TDrawableImage.BatchingBegin and TDrawableImage.BatchingEnd, the demo achieved the highest performance, as high as ZenGL (1800+ FPS). I finally found this bug (I think).
Adding BatchingBegin … BatchingEnd is the key. It gives CGE a huge performance improvement.
I look forward to michalis’s development progress.