When I render a tiled map with ZenGL I get 700+ FPS, but with CGE I get only 60+ FPS.
How can CGE be made faster?
Note: It's not very meaningful to compare FPS values above the monitor's refresh rate. Most drivers are capped at 60 FPS (like Mesa) and everything above that is "fake", and actually "harmful": it means the CPU is (over)loaded with work that doesn't result in any useful outcome.
I.e. for a more proper test you would most likely want a very heavy Tiled map that renders at 30 FPS in ZenGL.
Now specifically about Tiled maps: I haven't used those myself, but looking at castle-engine/castletiledmap_control.inc at 599eaf9b6d6738ee474bf98eca95c1fa17cb162c · castle-engine/castle-engine · GitHub I see that it doesn't use the "batched render" feature, which can speed up rendering significantly. It's not absolutely trivial to implement, but I don't think it's too hard (unless I'm missing some crucial part of the Tiled implementation).
I think that simply no one has hit a performance issue with TiledMapControl so far, so nobody has looked into it. So, if you need to improve the performance of the component (not just for comparison, but for a real use case), you could have a look at it yourself. As I've mentioned, at first glance it's rather easy (but will potentially need a decent amount of work and require understanding the internal Tiled structure and the way it's handled in CGE); I'll explain how if you're interested in taking on this task. I'm sure such a pull request will be more than welcome. Alternatively, create a ticket on GitHub and someone will look at it (but obviously this might take much longer :)).
With TiledMapControl the FPS is too low (30 to 200+ FPS).
When I render one map using TDrawableImage, I get 110+ FPS.
If I draw with the ZenGL API, I get 1000+ FPS.
Because CGE is so large, I don't know what to adjust to speed up 2D rendering. Could you point me in the right direction?
What you need is to use "batched" rendering.
E.g. you can have a look at how I implemented it here: code/batchedmap.pas · master · EugeneLoza / Kryftolike · GitLab to render absolutely enormous maps.
So, instead of calling the regular TDrawableImage.Draw(coordinates, size) you call an overloaded TDrawableImage.Draw(array of screen rectangles) (see Castle Game Engine: CastleGLImages: Class TDrawableImage). I.e. instead of
    for X := 0 to 99 do
      for Y := 0 to 99 do
        Image.Draw(X * 10, Y * 10); // this will create 10000 draw calls
you do
    // ScreenRects and ImageRects are arrays of (at least) 10000 TFloatRectangle
    Count := 0;
    for X := 0 to 99 do
      for Y := 0 to 99 do
      begin
        ScreenRects[Count] := FloatRectangle(X * 10, Y * 10, 10, 10);
        ImageRects[Count] := FloatRectangle(0, 0, 10, 10);
        Inc(Count);
      end;
    Image.Draw(@ScreenRects[0], @ImageRects[0], Count); // this will be rendered in a single draw call
Note that for this to work, all map tiles need to be in a single texture, which may not be possible for a Tiled map in a straightforward way and will require additional workarounds (sorting tiles by the texture they use, which in some complex cases may slow down performance instead of speeding it up).
There is a way to speed up the renderer even further, however it'll require manually implementing the map as a set of TShapes, which is not trivial. Roughly it means doing this: https://gitlab.com/EugeneLoza/CastleCraft/-/blob/master/castlecraft.lpr#L114 just in a 2D case. The problem is that when you call the batched version of TDrawableImage.Draw it reallocates some memory every frame. It's negligible for "small maps" but can quickly become a bottleneck if the number of tiles goes over 10-20 thousand.
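To illustrate the "sorting tiles by the texture they use" workaround mentioned above, here is a minimal, engine-independent sketch. It is only an assumption of how such grouping could look, not code from CGE or Kryftolike; `TTile`, `TextureId` and `GroupByTexture` are hypothetical names made up for this example.

```pascal
program GroupTilesByTexture;
{$mode objfpc}{$H+}

type
  TTile = record
    TextureId: Integer; // hypothetical: index of the tileset image used by this tile
    X, Y: Integer;      // screen position in pixels
  end;
  TTileArray = array of TTile;

{ Stable counting sort by TextureId, so that tiles sharing a texture
  become contiguous and each run can go into one batched draw call. }
function GroupByTexture(const Tiles: TTileArray;
  const TextureCount: Integer): TTileArray;
var
  Start: array of Integer;
  I, T, Sum, Tmp: Integer;
begin
  Result := nil;
  SetLength(Result, Length(Tiles));
  Start := nil;
  SetLength(Start, TextureCount); // new elements are zero-filled
  // count the tiles that use each texture
  for I := 0 to High(Tiles) do
    Inc(Start[Tiles[I].TextureId]);
  // turn the counts into start offsets
  Sum := 0;
  for T := 0 to TextureCount - 1 do
  begin
    Tmp := Start[T];
    Start[T] := Sum;
    Sum := Sum + Tmp;
  end;
  // place every tile into the slot range of its texture
  for I := 0 to High(Tiles) do
  begin
    T := Tiles[I].TextureId;
    Result[Start[T]] := Tiles[I];
    Inc(Start[T]);
  end;
end;

const
  TextureIds: array [0..4] of Integer = (1, 0, 1, 0, 2);
var
  Tiles, Sorted: TTileArray;
  I: Integer;
begin
  SetLength(Tiles, Length(TextureIds));
  for I := 0 to High(Tiles) do
  begin
    Tiles[I].TextureId := TextureIds[I];
    Tiles[I].X := I * 32;
    Tiles[I].Y := 0;
  end;
  Sorted := GroupByTexture(Tiles, 3);
  for I := 0 to High(Sorted) do
    Write(Sorted[I].TextureId, ' '); // texture ids, now grouped
  WriteLn;
end.
```

Each run of equal `TextureId` could then be passed to one batched `Draw` call of the corresponding image. Keep in mind that grouping changes the draw order, which matters when tiles overlap.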
------------------ CGE code----------------------
    for j := y1 to y2 do
      for i := x1 to x2 do
      begin
        if FloorArray[i, j] <> nil then
        begin
          if (j mod 2) = 0 then
            tx := i * TileWidth
          else
            tx := i * TileWidth + TileWidth div 2;
          ty := MapHeight * TileHeight div 2 - j * TileHeight div 2;
          N := FloorArray[i, j].GlobalIndex;
          tx := tx + XOffset;
          ty := ty + YOffset;
          FloorArray[i, j].TileImage.Draw(tx, ty, N); // use DrawableImage.Draw
        end;
      end;
------------------ ZenGL code----------------------
    for j := y1 to y2 do
      for i := x1 to x2 do
      begin
        if MapArray[i, j].FloorImageID >= 0 then
        begin
          if (j mod 2) = 0 then // even row
            tx := i * g_iMapTileWidth
          else
            tx := i * g_iMapTileWidth + g_iMapTileWidth div 2;
          ty := j * g_iMapTileHeight div 2;
          N := MapArray[i, j].FloorImageID;
          img_base.DrawTile(tx, ty, N); // use csprite2d_Draw
        end;
      end;
CGE's TDrawableImage is 10 times slower than ZenGL's csprite2d_Draw.
I'm not familiar with ZenGL or the implementation of csprite2d. Maybe they're using dynamic batching for sprites, like CGE does for sprites, but not for TDrawableImage. It's a rather low-level feature that you should use if you know what you're doing. Not that it can't be improved, but if you go low-level, you go there to optimize things yourself. So if you try my solution above, you should get the same or similar speed. But again, don't compare FPS above 60.
See Optimization and profiling | Manual | Castle Game Engine for notes on how to compare FPS.
If you really want to compare full FPS, disregarding the fact that some of the frames drawn were never actually displayed, be sure to set ApplicationProperties.LimitFPS to 0, otherwise CGE limits them to 100.
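For reference, disabling the limit is a one-liner. The fragment below is a sketch meant to be pasted into an existing unit of your project (the map_viewer example does this in gameinitialize.pas); the helper name `DisableFpsLimitForBenchmark` is hypothetical, and the fragment needs the engine on the unit path to compile.

```pascal
uses CastleApplicationProperties;

{ Hypothetical helper: call once during initialization, for benchmarking only. }
procedure DisableFpsLimitForBenchmark;
begin
  // 0 means "no FPS limit"; keep the default limit in builds you ship
  ApplicationProperties.LimitFps := 0;
end;
```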
That being said, our Tiled drawing performance indeed can suck for larger maps. This is a larger TODO. The plan is to:
1. Advise loading Tiled maps to TCastleScene
2. Make DynamicBatching include more cases, such that Tiled maps in TCastleScene get more optimized.
But that's a larger work inside CGE.
There is no perfect solution right now that is easy. Some solutions:
A. You can "bake" static map layers into big images, and render them as big images underneath TCastleTiledMapControl.
B. Or you can implement your own map rendering using TDrawableImage.Draw with batching.
Both A and B of course are far from optimal, as they mean you're not really "just loading a Tiled map into TCastleTiledMapControl and that's it".
I'm sure that with B, you can achieve the same speed as with ZenGL, as underneath these are really just simple OpenGL calls. As Eugene notes, ZenGL likely does batching at a lower level, while our TDrawableImage.Draw does not do automatic batching; you need to pass an array of images to draw yourself, in one TDrawableImage.Draw call. In CGE, batching is done for TCastleScene shapes.
Optimizations!
Long story short: our Tiled rendering is now 30x faster.
Details:
Added examples/tiled/map_viewer/data/maps/desert_big.tmx, which is a 400x400 map and clearly showed our (past) performance problems.
Before the optimizations below, it had 2.7 FPS on my system. (yeah, less than 3 FPS…)
Other maps in examples/tiled/map_viewer/data/ easily give me FPS > 100 with ApplicationProperties.LimitFps = 0, so they are not useful for performance testing.
In this post, all FPS measurements are done with examples/tiled/map_viewer, compiled in release mode (castle-engine compile --mode=release), without FPS limit (ApplicationProperties.LimitFps := 0 in gameinitialize.pas).
Implemented automatic batching in our TDrawableImage, which can be activated by just TDrawableImage.BatchingBegin / TDrawableImage.BatchingEnd. This should be useful for various cases of TDrawableImage usage.
Used automatic batching in TCastleTiledMapControl. This increases FPS to 24 on my system for desert_big.tmx. And it stays like this even at significant zoom-out. We're back in business.
Removed drawing off-screen images. This increases FPS to 61 on my system. And FPS stays reasonable even at significant zoom-out.
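The "removed drawing off-screen images" step can be sketched independently of the engine. The program below is not the actual TCastleTiledMapControl code, only an illustration under simple assumptions: an orthogonal map whose first tile starts at coordinate 0, and a hypothetical helper `VisibleRange` that is applied once per axis.

```pascal
program VisibleTiles;
{$mode objfpc}{$H+}
uses Math;

{ Computes the inclusive range of tile indices along one axis that overlap
  the visible interval [ViewMin, ViewMin + ViewSize].
  If Last < First afterwards, no tile is visible on this axis. }
procedure VisibleRange(const ViewMin, ViewSize: Single;
  const TileSize, TileCount: Integer; out First, Last: Integer);
begin
  First := Max(0, Floor(ViewMin / TileSize));
  Last := Min(TileCount - 1, Floor((ViewMin + ViewSize) / TileSize));
end;

var
  X1, X2, Y1, Y2: Integer;
begin
  // Made-up numbers: 400x400 map of 32x32 tiles,
  // 800x600 view scrolled to position (250, 100)
  VisibleRange(250, 800, 32, 400, X1, X2);
  VisibleRange(100, 600, 32, 400, Y1, Y2);
  WriteLn('Columns ', X1, '..', X2, ', rows ', Y1, '..', Y2);
  WriteLn('Tiles to draw: ', (X2 - X1 + 1) * (Y2 - Y1 + 1),
    ' of ', 400 * 400);
end.
```

With these numbers only columns 7..32 and rows 3..21 are drawn, i.e. 494 tiles instead of 160000. The per-frame cost then depends on the view size rather than the map size.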
Thanks for michalis's answer. I tried option 1, but the FPS did not improve. TDrawableImage.BatchingBegin and TDrawableImage.BatchingEnd look effective. Where should I make the change? Please give a demo. When I open desert_big.tmx, I get 3 FPS on my system.
I have tried several frameworks, including CGE, ZenGL, Afterwarp Framework, SDL2, DelphiX, etc.
Most of these frameworks have stopped updating, or are updated very slowly. Only CGE is being actively developed.
Comparing their rendering performance, I think Afterwarp Framework > ZenGL > SDL2 > CGE > DelphiX.
What I want to make is a game like Flare (Diablo-like).
There are many map layers (floor, wall, object, etc.), and each layer consists of many tiles.
When I draw the floor layer with CGE, I get about 100 FPS; with Afterwarp, about 1500+ FPS; with ZenGL, about 1000+ FPS.
The problem is that I have only drawn one map layer (the floor layer). There are more map layers to draw, plus character animations. If everything is rendered, the FPS will drop a lot.
So I think CGE's 2D rendering is too slow.
As for how to test it:
Just wait a few hours for Jenkins to build the latest CGE with batching being used.
(You can watch it at Comparing snapshot...master · castle-engine/castle-engine · GitHub. Right now it shows that the commit "Implement almost automatic TDrawableImage batching, just use…" is not yet built into the snapshot, but in a few hours it should disappear from that page, which means that the snapshots contain it. I'll make a note here when it's actually ready.)
Then just get the latest CGE from Download | Castle Game Engine.
Then open examples/tiled/map_viewer in the CGE editor, switch to "Release" mode using the menu "Run → Release Mode" and then just "Compile And Run" from the CGE editor.
There is nothing you need to do in code to use it with TCastleTiledMapControl; the rendering of TCastleTiledMapControl uses it automatically, and you should not get 1 FPS anymore.
As for using TDrawableImage batching in your own applications, if you use TDrawableImage.Draw explicitly: see the API docs on Castle Game Engine: CastleGLImages: Class TDrawableImage. There's not much to do, you just surround your rendering with TDrawableImage.BatchingBegin … TDrawableImage.BatchingEnd and it should work like magic.
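A minimal sketch of what "surround your rendering" could look like in your own code. It assumes a CGE version that already includes the batching, and that the routine is called while rendering (for example from the Render method of your own user interface control). `DrawTileGrid` and its parameters are hypothetical names for this example; it is a fragment to paste into a unit and needs the engine to compile.

```pascal
uses CastleGLImages;

{ Hypothetical helper: draws the same tile image on a Columns x Rows grid. }
procedure DrawTileGrid(const TileImage: TDrawableImage;
  const Columns, Rows, TileWidth, TileHeight: Integer);
var
  X, Y: Integer;
begin
  TDrawableImage.BatchingBegin;
  try
    // The Draw calls themselves stay unchanged,
    // the engine collects them between Begin and End.
    for Y := 0 to Rows - 1 do
      for X := 0 to Columns - 1 do
        TileImage.Draw(X * TileWidth, Y * TileHeight);
  finally
    // always close the batch, even if drawing raised an exception
    TDrawableImage.BatchingEnd;
  end;
end;
```

The try..finally is a defensive choice of this sketch, not something the API docs require.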
As for the rest of your measurements, we need to have a good testcase so that we're all talking about the same thing. If you have any particular map where you're testing CGE / ZenGL / Afterwarp and can share it, it will be much appreciated. And remember:
FPS above your monitor refresh rate are easy to get and don't mean that much. And for users, it doesn't really matter if you have 100, 1000 or 1500 FPS, users probably see at most 60 of them anyway. See Optimization and profiling | Manual | Castle Game Engine for various notes on how GPUs work.
CGE has by default ApplicationProperties.LimitFps set to 100, to not eat CPU (and laptop battery) rendering useless frames. So if you compare CGE, be sure to disable it using ApplicationProperties.LimitFps := 0 … but, as Eugene also mentioned, it is in general best to have a testcase that doesn't have high FPS from the start. You should not compare 100 and 1000+ vs 1500+. Instead it's better to have a testcase that forces FPS to drop below 60 (that's why I made that desert_big.tmx) and work on optimizing it.
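One way to see why very high FPS numbers say little is to convert them to time per frame, which is simply 1000 / FPS in milliseconds. A small standalone sketch (the helper name `FrameTimeMs` is made up for this example):

```pascal
program FrameTimes;
{$mode objfpc}{$H+}

{ Time spent on one frame, in milliseconds, for a given FPS. }
function FrameTimeMs(const Fps: Double): Double;
begin
  Result := 1000.0 / Fps;
end;

begin
  // FPS values of the kind mentioned in this thread
  WriteLn('  30 FPS: ', FrameTimeMs(30):0:2, ' ms per frame');
  WriteLn('  60 FPS: ', FrameTimeMs(60):0:2, ' ms per frame');
  WriteLn(' 100 FPS: ', FrameTimeMs(100):0:2, ' ms per frame');
  WriteLn('1000 FPS: ', FrameTimeMs(1000):0:2, ' ms per frame');
  WriteLn('1500 FPS: ', FrameTimeMs(1500):0:2, ' ms per frame');
end.
```

Going from 1000 to 1500 FPS saves about 0.33 ms per frame, while going from 30 to 60 FPS saves about 16.7 ms. The second difference is the one a player can actually notice.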
I would also like to see the report of how we detect your GPU, in case CGE decided to use some "safe but unoptimal" route because of something. To get this, you can use "Help → System Information" from the CGE editor (open any project to see the menu) and then "Save To File" to get the report.
So, for further optimization, it would be best if you can tell us how exactly you are testing. If you have any project and/or map you can share, this would be best, to make sure we talk about the same thing and we can test it too and say where the issue is (and possibly improve it in CGE). If you cannot share it publicly but can share it privately, that is also an option, you can send it e.g. by email to me ([email protected]).
I look forward to the performance improvements in the new version. CGE's biggest advantage is that its author is working hard on updates.
ZenGL uses "batch2d_Begin … batch2d_End" when drawing. The speed improvement may be the same as with TDrawableImage.BatchingBegin … TDrawableImage.BatchingEnd.
I want to wait for the new version before rewriting the performance example to send to you; it will take some time.
I hope CGE's rendering speed will be the same as ZenGL's.
I found two demos which can explain this rendering performance problem:
1. isometric_game
It had 120 FPS on my system.
I found BatchingBegin in the exe file, but I don't know how to add it to this demo.
2. deprecated_to_upgrade\isometric_game
This demo draws more tiles.
It had 600 FPS on my system.
After I added TDrawableImage.BatchingBegin and TDrawableImage.BatchingEnd, it had 1800 FPS.
Finding the problem that affects 2D performance would let isometric_game rise from 120 FPS to 2000+ FPS.
isometric_game is just a very simple example, so I think 2000+ FPS is normal for it. With that, after I add map complexity, various animations and various UI, it can still maintain more than 60 FPS. If it has only 120 FPS now, after I add more it will fall to about 10 FPS.
examples/isometric_game and examples/deprecated_to_upgrade/isometric_game use very different rendering methods.
The examples/deprecated_to_upgrade/isometric_game uses just direct image rendering, TDrawableImage. This can indeed be trivially sped up now using the new TDrawableImage.BatchingBegin and TDrawableImage.BatchingEnd.
The examples/isometric_game uses a TCastleViewport, with lots of images inside (TCastleImageTransform). It can easily include more scenes, also sprite sheets, loaded from TCastleScene. Scenes have a lot of features, like easily running sprite sheet animations, collisions and physics, 3D, the possibility to be designed in the CGE editor etc. See e.g. a new CGE project from the "2D game" template.
This is in general the approach we want to pursue in the future. It is much more feature-packed. And extending it helps a lot of use-cases (various types of 3D and 2D games can be constructed as a composition of TCastleScene inside a viewport). See Viewport with scenes, camera, navigation | Manual | Castle Game Engine about viewports and scenes.
However, for now, it is not as efficient. While TCastleScene is great at rendering heavy 3D stuff, it is not really that efficient when using a lot of trivial TCastleScene or trivial TCastleImageTransform instances, one for each map tile. So this case is just not optimal yet.
I know that examples/isometric_game will not be efficient if you try to build a huge map there, or even load a huge Tiled map into one TCastleScene. The good news is I know exactly why :), this is what I described in my previous post in this thread CGE 2D render too slow - #7 by michalis
So in the meantime, specifically for map rendering, using the 2D TCastleTiledMapControl is a more efficient solution (albeit limited only to the simple case of a 2D map). We also have a manual page, How to render 2D games with images and sprites | Manual | Castle Game Engine, that describes the 2 approaches for 2D rendering we feature now. There is even a possibility to combine them (like rendering a transparent TCastleViewport over TCastleTiledMapControl).
So to be clear, I know exactly what to do to speed up examples/isometric_game, and it will actually help a lot of other use-cases (not only 2D games with a map). But it will take time, which is why in the meantime we advise some alternative approaches.
As for your time measurements, as we mentioned, comparing FPS above 60 is not very useful, and it is better to talk about a precise test-case than to speculate. You do tests and measurements, but we should make sure that we test the same thing and that we know your use-case. Then we can best advise how to achieve it, in an efficient way, using current CGE, before we implement the optimizations planned in point 2.
So, please send us your testcases. I would happily see what exactly you're testing. And if you have any plan, I would happily see exactly what you plan, to advise best.
Note that CGE downloads on Download | Castle Game Engine now contain the latest optimizations. It took Jenkins a while, due to unrelated troubles
Thank you very much for michalis's guidance. I generally understand now. I wrote the demo about half a year ago, and it no longer compiles after I updated CGE. I will rewrite the demo and add various complex situations. In fact, half a year ago I gave up on CGE after finding its efficiency too low. Now I know how to improve it. It will take some time, but I will finish this demo.
Note that I've been working on many renderer improvements and cleanups in a CGE branch Commits · castle-engine/castle-engine · GitHub for ~2 weeks. I just added one code refactor there that gets us closer to implementing "batching across many TCastleScene / TCastleImageTransform instances", Big rearrangement of RenderBegin/RenderEnd code, to do it only once · castle-engine/castle-engine@42f78b0 · GitHub . So I am really working on making TCastleScene / TCastleImageTransform usage (thus the example examples/isometric_game) more optimal.
I now draw the map entirely with TDrawableImage. After I added TDrawableImage.BatchingBegin and TDrawableImage.BatchingEnd, the demo achieved the highest performance, as high as ZenGL (1800+ FPS). I have finally found this bug (I think).
Adding BatchingBegin … BatchingEnd is the key. This gives CGE a huge performance improvement.
I look forward to michalis's development progress.