I am trying to start with the lowest LOD tree to see how many I can add.
This is using TCastlePlane instances with the same tree texture and with the billboard behavior. It looks really good except for the known issue of transparent objects in front of the transparent water. With a few hundred trees, memory use remains low, but fps drops from over 60 to struggling to hold 30. Is there any way to go lighter? Perhaps a single TCastleScene using a TriangleSet… maybe that would solve the transparency issue? Or would that have a sorting issue if they were all combined into one scene? How would you approach having a forest with many thousands of trees?
If I make the trees just a triangle (a TriangleSet with just 3 coords) with a color, it remains very fast with many trees, even with them billboarding. I think the slowness came from the transparency. I can probably come up with a tree GLSL shader that won’t make them look so flat.
Why not TCastleScene + 3 LOD, then clone them with random rotation / scale?
If you want players to zoom down to the ground level, you’ll need meshes anyway. Plus:
- you get rid of transparency over transparent water
- far LOD can be as simple as a few triangles
- casts shadows, or can have fake shadow attached
- can be easily animated, although the animation is shared with the original object (i.e. all clones and the original have the same animation frame) but having different scale / rotation will make them look different
Trees typically form a group. If the forest is fairly dense, you can model a bunch of trees grouped somewhat randomly inside one model. Then the far LOD is even simpler because some trees will be obscured anyway - you could even reduce it to a single box with painted trunks + a few triangles or a box for canopies.
Also, forests have some bushes, which helps greatly if you want to reduce trees to a few triangles, and it’ll reduce the cost of shadows too.
Bush + tree trunk + another bush + a trunk behind = 1 far LOD texture.
I understand your game offers some terraforming, so I guess you use cells in a grid. You can place 2 or 3 trees in your model roughly in some corners (somewhat randomly) but within the cell-sized area. Then, when the player edits the cell, you remove the 2-3 trees all at once and it will look quite natural - you TNT the cell out, the trees are gone. The bonus is that you’ll have 1 model for 2-3 trees instead of 2-3 separate models.
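The cell idea above can be sketched in plain Free Pascal, without the engine. This is only an illustration under assumptions: the grid size, the record layout and the names PlantCell, ClearCell and CountTrees are hypothetical, and a real game would attach one model per cell instead of storing offsets.

```pascal
program CellTreesDemo;
{$mode objfpc}{$H+}

const
  GridSize = 4;         // hypothetical tiny grid, for illustration only
  MaxTreesPerCell = 3;

type
  TTreeSlot = record
    OffsetX, OffsetZ: Single; // position inside the cell, in 0..1
  end;

  TCell = record
    TreeCount: Integer;
    Trees: array [0..MaxTreesPerCell - 1] of TTreeSlot;
  end;

  TGrid = array [0..GridSize - 1, 0..GridSize - 1] of TCell;

{ Put 2 or 3 trees at random spots inside one cell. }
procedure PlantCell(var Cell: TCell);
var
  I: Integer;
begin
  Cell.TreeCount := 2 + Random(2);
  for I := 0 to Cell.TreeCount - 1 do
  begin
    Cell.Trees[I].OffsetX := Random;
    Cell.Trees[I].OffsetZ := Random;
  end;
end;

{ Editing (digging, TNT) a cell removes all of its trees at once. }
procedure ClearCell(var Cell: TCell);
begin
  Cell.TreeCount := 0;
end;

function CountTrees(const Grid: TGrid): Integer;
var
  X, Z: Integer;
begin
  Result := 0;
  for X := 0 to GridSize - 1 do
    for Z := 0 to GridSize - 1 do
      Result := Result + Grid[X, Z].TreeCount;
end;

var
  Grid: TGrid;
  X, Z, Before, Removed: Integer;
begin
  RandSeed := 1;
  for X := 0 to GridSize - 1 do
    for Z := 0 to GridSize - 1 do
      PlantCell(Grid[X, Z]);

  Before := CountTrees(Grid);
  Removed := Grid[1, 2].TreeCount;
  ClearCell(Grid[1, 2]); // the player edits this cell
  WriteLn('Trees before: ', Before, ', after: ', CountTrees(Grid),
    ', removed with the cell: ', Removed);
end.
```

The point of the layout is that a tree never outlives its cell, so terraforming needs no per-tree lookup.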
Alternatively, take a look at a free, open source game OpenTTD. It’s not Pascal, but they have terrain maps up to 4k x 4k. Their trees are sprites, and are placed similar to the way I described above; the difference is that they use a random number of trees per cell (0-4) and each tree is a random species from a set of sprites. The default graphics are 8-bit (you can download a 32-bit version), but it’s a good example of how to make grand landscapes easily.
Thanks for the tips. I also have a ‘flora’ layer that is good for ‘clumping’ very distant trees, but it looks poor at close or middle range. I have trees that can be generated in quite some detail, but I am trying to come up with the low LOD version to maximize the number. Here I am using the same texture on triangles instead of TCastlePlane, and it stays fast with a lot of trees. So it seems like the culprit isn’t transparency but TCastlePlane vs TCastleScene with one triangle. I do have terraforming, but it will uproot trees if you dig where trees are.
I think there is something slow with TCastlePlane. I went from 1 triangle per tree to 7 triangles in an 8 vertex TriangleFanSet to get a better shadow shape with trunk. And even with shadows on, and the more complex billboard, it isn’t as slow as it was with TCastlePlane. I like the triangle fan because it will allow for lots of tree shapes, while casting a better shadow.
Now though I wonder if there is a way to billboard the tree toward the light source when it is calculating shadows. Now when you move around the shadows get really skinny at some angles. (probably moot since I will likely have shadows off for trees that are billboarded LOD)
Hmm, billboard shadows go away altogether when you look toward the light at the tree. Not to worry, I will use 3d trees when they are close enough to want shadows.
@DiggiDoggi This is my second pass at trees. My OG trees were cool but too heavy for my needs. Here is a screen shot mixing 3d trees and the flora layer simulated trees (flora height is another single that deforms the water mesh when outside of water)… that is creating the pointy trees and the green areas. This was years ago. I have worked a long time to get nowhere haha. But that was 13fps…
I made the triangle fans configurable. This is the simplest, with 4 Vector2 to define it (but it could have arbitrarily many), which it uses to build the billboard (or could be used to lathe a solid shape). This gives better shadows. And now there is no transparency, with a tree texture that will work with any shape. Since these are individual objects they could be randomized a tad to look more different. But somehow adding more trees than ever made the fps FASTER (with no shadows): 80 fps instead of the normal 60 fps. I am not sure why. With this many trees and shadows it really slows down, but I won’t use shadows for low LOD trees anyway.
(PS: it’s a bit faster if I turn off ReceiveShadowVolumes on the trees, and it looks better because the trees aren’t so black.)
With some randomization as it builds the triangle fan, combined with slightly randomized texture coordinates, the trees don’t look so cloned, yet they take the same amount of data and perform the same.
First of all you aren’t using transparency anymore. That’s a big difference in performance. On the previous pictures I can clearly see that you used textures with transparency. Transparency is expensive because every single pixel, for every single tree has to be tested and resulting colour calculated.
My observations (I write what I think, without a promise of being wise):
- Change the material roughness all the way up (like 0.9 or something) so the trees don’t shine. I guess in CGE you’d need at least a 1 pixel roughness-metallic texture, I don’t think you can set it just by a number. If you import a model, then it’s set for you, otherwise you need to do the job.
- From afar they’re good (the far LOD), but I’d question whether it’s good for mid-close LOD (when seen on a reasonably sized screen). I base my opinion on the screenshots. Quality is more important than amount (amount can be left for players to choose from settings). Also, LOD changes should be smooth and not produce visible jumps, but your method has the weakness that it’ll be difficult to fit better models (mid and close LOD) without unpleasant glitches.
- They’re flat, which means you have to turn them around Y so they face the player. It won’t work well when seen from close. Also you’ll have to set the rotation for every single tree whenever the player moves. It’s not efficient, and as you mentioned before - the shadow can be a tiny thin line when you look from a wrong angle (around +/-90deg from the Sun’s direction). If they were 3D meshes (even simple ones) you would just rotate the whole world, or the Sun, by a simple command easily executed by the engine / GPU, and the shadows would stay as they were.
- If you use clones (the TCastleTransformReference thingy) instead of separate objects (meshes), then you have 1 geometry on the GPU, and it’s rendered a hundred times with various scales and rotations without breaking a sweat. That opens the door to more detailed trees without a performance hit - and good graphics are never bad. Plus, the LOD switching is handled by the engine for each tree. When you generate the mesh procedurally, you have to handle the LOD yourself.
- You don’t really need randomly ‘personalized’ trees (shifting the texture, moving vertices), because when you use the abovementioned ‘clones’ (CGE’s transform reference) they can be rotated / scaled randomly, still being just a single geometry on the GPU.
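To make the per-tree cost of flat billboards concrete, here is a minimal engine-free sketch of the rotation around Y mentioned above. It assumes a Y-up world and a quad whose front faces +Z at zero rotation; the names YawTowardCamera and FaceCamera are hypothetical, and the engine's own billboard behavior does this for you.

```pascal
program BillboardDemo;
{$mode objfpc}{$H+}

uses
  Math;

type
  TTree = record
    X, Z: Single;
    Yaw: Single; // rotation around the Y axis, in radians
  end;

{ Angle that turns a quad facing +Z so that it faces the camera.
  Rotating (0, 0, 1) around Y by angle A gives (Sin(A), 0, Cos(A)),
  hence ArcTan2(dx, dz). The height difference is ignored. }
function YawTowardCamera(const TreeX, TreeZ, CamX, CamZ: Single): Single;
begin
  Result := ArcTan2(CamX - TreeX, CamZ - TreeZ);
end;

{ Must run again for every tree whenever the camera moves. }
procedure FaceCamera(var Trees: array of TTree; const CamX, CamZ: Single);
var
  I: Integer;
begin
  for I := 0 to High(Trees) do
    Trees[I].Yaw := YawTowardCamera(Trees[I].X, Trees[I].Z, CamX, CamZ);
end;

var
  Trees: array [0..2] of TTree;
  I: Integer;
begin
  Trees[0].X := 0;  Trees[0].Z := 0;
  Trees[1].X := 5;  Trees[1].Z := -5;
  Trees[2].X := -3; Trees[2].Z := 8;

  FaceCamera(Trees, 10, 0); // camera at x = 10, z = 0
  for I := 0 to High(Trees) do
    WriteLn('Tree ', I, ' yaw: ', RadToDeg(Trees[I].Yaw):0:1, ' deg');
end.
```

The loop in FaceCamera is the work that grows with the number of trees; a 3D mesh, even a crude one, needs no such update.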
Scale & Rotate is a common practice, and it’s cheap. Even very successful games use that trick.
For example: just 3 model variants per tree, with 3 available families (birch, oak and spruce, like in Medieval Dynasty), give you 9 different models, and with that you can easily make very diverse-looking forests, because of rotation and scale.
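A rough, engine-free sketch of what "scale & rotate" amounts to in data terms: each tree is only a small record pointing at one of the 9 shared models. In CGE that role is played by a TCastleTransformReference pointing at a shared scene; the record layout, the scale range and the name RandomInstance below are assumptions made for illustration.

```pascal
program ForestInstancesDemo;
{$mode objfpc}{$H+}

const
  ModelCount = 9;   // 3 families x 3 variants, as in the example above
  TreeCount = 1000;

type
  TTreeInstance = record
    ModelIndex: Integer; // which shared model this tree reuses
    X, Z: Single;
    YawRadians: Single;  // random rotation around Y
    Scale: Single;
  end;

function RandomInstance(const AreaSize: Single): TTreeInstance;
begin
  Result.ModelIndex := Random(ModelCount);
  Result.X := Random * AreaSize;
  Result.Z := Random * AreaSize;
  Result.YawRadians := Random * 2 * Pi;
  Result.Scale := 0.8 + Random * 0.4; // 0.8..1.2, an arbitrary range
end;

var
  Forest: array [0..TreeCount - 1] of TTreeInstance;
  I: Integer;
begin
  RandSeed := 42;
  for I := 0 to High(Forest) do
    Forest[I] := RandomInstance(500);

  WriteLn('Instances: ', Length(Forest),
    ', shared models: ', ModelCount,
    ', bytes per instance: ', SizeOf(TTreeInstance));
end.
```

The geometry itself exists once per model; what multiplies with the tree count is only the transform data.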
I don’t say your algorithm is bad; on the contrary, I envy your creativity and it has potential. However, if the trees are just a disposable obstacle that the player will bulldoze away, then BestSolution := SimpleSolution
… as long as it looks convincing.
Bulldozers coming soon. Thanks for the good insights. (from the old prototype…)
ps, I welcome you or anyone to play along at home, as the repo is all current for better or worse.
In general, it is indeed important to test here the cost of using “transparency”, and more precisely “partial-transparency by blending”. This has 2 costs:
- The smaller cost is that when rendering, each incoming color is mixed with screen color.
- The bigger cost, and this may be responsible for things you observe, is that we need to sort the shapes that use blending. See the docs about blending, which have a section “4. How transparency works”. The rendering order is important when doing this, otherwise the mixing could result in an unnatural look.
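Why the order matters can be shown with a tiny engine-free sketch. It assumes the usual "source over destination" mixing (alpha * source + (1 - alpha) * destination) and uses a simple insertion sort; this is not the engine's sorting code, and all names are hypothetical.

```pascal
program BlendOrderDemo;
{$mode objfpc}{$H+}

type
  TRgb = record
    R, G, B: Single;
  end;

  TLayer = record
    Color: TRgb;
    Alpha: Single;    // 0 = invisible, 1 = opaque
    Distance: Single; // distance from the camera
  end;

  TLayers = array of TLayer;

function MakeLayer(const R, G, B, Alpha, Distance: Single): TLayer;
begin
  Result.Color.R := R;
  Result.Color.G := G;
  Result.Color.B := B;
  Result.Alpha := Alpha;
  Result.Distance := Distance;
end;

{ Mix one partially transparent layer over what is already on screen. }
function BlendOver(const Src: TLayer; Dst: TRgb): TRgb;
begin
  Result.R := Src.Alpha * Src.Color.R + (1 - Src.Alpha) * Dst.R;
  Result.G := Src.Alpha * Src.Color.G + (1 - Src.Alpha) * Dst.G;
  Result.B := Src.Alpha * Src.Color.B + (1 - Src.Alpha) * Dst.B;
end;

{ Draw the layers in array order on top of the background. }
function Composite(const Layers: TLayers; const Background: TRgb): TRgb;
var
  I: Integer;
  Acc: TRgb;
begin
  Acc := Background;
  for I := 0 to High(Layers) do
    Acc := BlendOver(Layers[I], Acc);
  Result := Acc;
end;

{ Insertion sort, farthest layer first (back to front). }
procedure SortBackToFront(var Layers: TLayers);
var
  I, J: Integer;
  Key: TLayer;
begin
  for I := 1 to High(Layers) do
  begin
    Key := Layers[I];
    J := I - 1;
    while (J >= 0) and (Layers[J].Distance < Key.Distance) do
    begin
      Layers[J + 1] := Layers[J];
      Dec(J);
    end;
    Layers[J + 1] := Key;
  end;
end;

var
  Layers: TLayers;
  White, Unsorted, Sorted: TRgb;
begin
  White.R := 1; White.G := 1; White.B := 1;

  SetLength(Layers, 2);
  Layers[0] := MakeLayer(1, 0, 0, 0.5, 2);  // near, red
  Layers[1] := MakeLayer(0, 0, 1, 0.5, 10); // far, blue

  Unsorted := Composite(Layers, White); // near drawn first: wrong
  SortBackToFront(Layers);
  Sorted := Composite(Layers, White);   // far drawn first: correct

  WriteLn('Unsorted: ', Unsorted.R:0:2, ' ', Unsorted.G:0:2, ' ', Unsorted.B:0:2);
  WriteLn('Sorted:   ', Sorted.R:0:2, ' ', Sorted.G:0:2, ' ', Sorted.B:0:2);
end.
```

The two results differ, which is why blended shapes have to be ordered every frame, and why that ordering cost grows with the number of blended shapes.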
This points to some things:
- If your shape doesn’t use blending, it will be much faster.
- Note that you don’t need to give up on “transparency in general”. Only “partial-transparency using blending” has this cost. The “yes/no transparency using alpha test” is much less costly: no sorting, no mixing.
And to make it more practical:
- First of all, measure. Test FPS when TCastleViewport.BlendingSort is sortAuto vs sortNone. You can also try sort3DVerticalBillboards. See Blending (Rendering Partially-Transparent Objects) | Manual | Castle Game Engine for all links and docs. If sortNone makes everything fast when sortAuto is slow (and nothing else changes), then you know it’s the fault of this sorting.
- This in turn means that a good idea for far LOD is to have a shape that has no transparency, or has transparency using alpha-test. If you will only have a few shapes to sort in a frame, things will be fast.
If you want to force a scene to use alpha-test, see “5. How do we determine whether to use alpha testing or alpha blending”. In short, if you use Blender and export to glTF (which we recommend), there’s a UI in Blender to set this.
Using the TCastleTransformReference + LOD after our recent improvements is also very much advised, like @DiggiDoggi says.
(There’s also a TODO in CGE about all of this: we could possibly speed up the sorting, using the fact that most frames have similar results. Right now we sort in each frame from scratch. So, while we cannot totally eliminate the sorting cost, we can make it less noticeable in some common cases.)
No, you never need to create dummy 1x1 textures for such things in CGE (or X3D; I helped to upgrade materials in X3D 4.0). Every material parameter, including TPhysicalMaterialNode.Roughness, has a factor that can be optionally multiplied by a texture. You never need a texture if you just want the same value of this parameter across the surface. See about X3D 4.0 material nodes:
I don’t think transparency was causing poor performance for me. The culprit seemed to be the TCastlePlane. But the transparency worked poorly combined with the already transparent and everywhere water layer. If you don’t see it, it is underground, or is acting as ‘flora’. I think maybe the reason I saw a big performance increase with so many simple trees is that they were quicker for the renderer to ‘hit’ than the distant terrain? Not sure. I am happy enough with the current 10 point triangle fan billboards as the lowest LOD to proceed. Now I am trying to get the trees to flow through the server so they will be persistent. My trees will grow and die and be part of a sim… all so that you can bulldoze them haha. Oh and raging forest fires. That will be fun.
Does the alpha test mode have the issues with other transparent objects like blend mode does? I like the idea of ragged edges on the trees, or gaps in the foliage. It would also make dead trees easier.
I continue to resist the idea of using the transform reference but probably will have to for distant stuff. I really want a dynamic detailed world with individual everything. Reality will get in the way of that strategy eventually. But so far I have way more headroom on fps and memory than I used to for already a bigger world.
Hmm, I would need to see the testcase to tell. Blending, for sure, requires proper sorting to look correct (details and ways to customize in Blending (Rendering Partially-Transparent Objects) | Manual | Castle Game Engine) and the sorting unavoidably causes some additional performance hit. So that would be my first guess as to performance issues. As for correctness issues – I would need to see the testcase.
No, alpha test is trivial, requires no sorting and works nicely with everything. “Alpha test” means just that we reject from rendering some parts of the shape. Everything else works the same, in particular Z-buffer testing works as usual, just like for opaque shapes.
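A small engine-free sketch of that point: with alpha test plus an ordinary depth test, the final pixel does not depend on the drawing order. The 0.5 cutoff, the one-channel colour and all names here are assumptions for illustration, not the engine's code.

```pascal
program AlphaTestDemo;
{$mode objfpc}{$H+}

const
  AlphaCutoff = 0.5; // assumed threshold

type
  TFragment = record
    Gray: Single;  // colour, reduced to one channel for brevity
    Alpha: Single;
    Depth: Single; // smaller = closer to the camera
  end;

  TPixel = record
    Gray: Single;
    Depth: Single;
  end;

function Fragment(const Gray, Alpha, Depth: Single): TFragment;
begin
  Result.Gray := Gray;
  Result.Alpha := Alpha;
  Result.Depth := Depth;
end;

{ Alpha test: a fragment is either rejected or drawn fully opaque.
  Whatever survives goes through the usual Z-buffer test. }
procedure DrawAlphaTested(var Pixel: TPixel; const F: TFragment);
begin
  if F.Alpha < AlphaCutoff then
    Exit; // a hole in the foliage: nothing is written, not even depth
  if F.Depth >= Pixel.Depth then
    Exit; // hidden behind something already drawn
  Pixel.Gray := F.Gray;
  Pixel.Depth := F.Depth;
end;

function Render(const Frags: array of TFragment): TPixel;
var
  I: Integer;
begin
  Result.Gray := 1.0;   // white background
  Result.Depth := 1000; // far plane
  for I := 0 to High(Frags) do
    DrawAlphaTested(Result, Frags[I]);
end;

var
  Gap, Trunk, Leaf: TFragment;
  A, B: TPixel;
begin
  Gap := Fragment(0.0, 0.0, 2);   // nearest tree, transparent texel
  Trunk := Fragment(0.3, 1.0, 3); // middle tree, opaque texel
  Leaf := Fragment(0.6, 1.0, 5);  // farthest tree, opaque texel

  A := Render([Gap, Trunk, Leaf]);
  B := Render([Leaf, Trunk, Gap]);
  WriteLn('Order 1: gray ', A.Gray:0:2, ' at depth ', A.Depth:0:1);
  WriteLn('Order 2: gray ', B.Gray:0:2, ' at depth ', B.Depth:0:1);
end.
```

Both orders end with the trunk visible through the gap, so ragged edges and holes in foliage come without any sorting.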
I’ve made a quick test of TCastleImageTransform vs TCastlePlane vs TCastleScene vs Castle clones (TCastleTransformReference, “clones” for short).
The testing world here comes from an empty template: no ground, no sky, no shadows, no other interference. All settings are unchanged, e.g. the empty world Fps is about 64, no antialiasing, etc. I ran it in a debug session with about 3000x1500 screen size, windowed, but at 4k fullscreen the numbers are the same.
As an interesting observation, Blending in all tests apparently increased the Fps.
Code is trivial:
{
  Viewport blending sort: sortAuto
  Various PNG images, all 1024px high, width ~ 700-1000px,
  with alpha channel & without it (transparent color := custom color)
  Blending is: acBlending for images, plus RenderOptions.Blending for all objects

  Fps:
  AlphaChannel  # trees  Image  Plane  Scene  Clone  Clone*10
  acNone        2000     27     22     27     30     30
  acTest        2000     27     -      -      -      -
  Blending      2000     40     35     37     45     40
  acNone        5000     9.5    8      9      11.8   12
  acTest        5000     6-8    -      -      -      -
  Blending      5000     13     11     13     18     15
}
if Event.IsKey(keyF1) then
begin
  for i := 1 to 1999 do // 2000 trees, because 1 is already there from design
  begin
    // Copy the designed TreeImage, at a random position
    Img := TCastleImageTransform.Create(TreeImage.Owner);
    Img.Direction := TreeImage.Direction;
    Img.Translation := Vector3(4*Random(5000), 492, -4*Random(5000));
    Img.Scale := TreeImage.Scale;
    // This is the "acNone" configuration from the table above
    Img.AlphaChannel := acNone;
    Img.RenderOptions.Blending := false;
    Img.Url := TreeImage.Url;
    TreeImage.Parent.Add(Img);
  end;
  Exit(true); // key was handled
end;
Replacing the Image by TCastleScene (very similar code) shows almost the same values. I used a simple model with textured plane - exactly the same textures, number of triangles = 2. Then 10 different models at once with 200 / 500 instances (2000 & 5000 total objects).
The memory usage is at around the same level for every class (when scene.Cached is true).
Using clones (TCastleTransformReference) vs individual TCastleScene. I have used the same single model, then used 2 different models with clones using 50/50 distribution, then 10 scenes with 200 / 500 clones each.
I also used, in a separate run, a TCastleImageTransform as a flat ground. No difference.
In another test I made 1000 random terrain chunks (each 100x100). Having the terrain increased the Fps for rendering the ‘trees’.
Changing sortAuto → sort3D → sortNone didn’t make much difference, but here I used flat models only (as in edj’s screenshots).
So, my curiosity is satisfied.
Edit: full source code. Copyright notice: most of the images have their copyright owners, I have re-edited them and honestly have no idea where they come from. Posting them here only for testing purposes. They can’t be used in your project (They are too ugly anyway).
tree-test-0.1-src.zip (75.5 MB)
This is weird. Added to TODO to look at your testcase soon.
This is positive information: it means that blending sorting doesn’t have a real impact on the speed (it’s just fast enough to not matter), which is good news – one less thing to worry about.
I did some cleanup of the source code before posting it here, and apparently a small mistake got in. Here is the corrected one:
tree-test-0.1-src.zip (68.8 MB)
However… while testing it again I’ve checked how FPC looks vs Delphi (my original test was Delphi).
And it seems FPC beats Delphi by many, many FPS!
Viewport blending sort: sortAuto
Release mode
1) Compiler: Delphi, Windows 11
2) Compiler: Lazarus, FPC, Windows 11

Fps:
AlphaChannel  # trees  Image  Plane  Scene  Clone  Clone*10  Compiler
acNone        2000     30     22     27     30     30        Delphi
acNone        2000     63     35     45     60     61        FPC
acNone        5000     9      8      9      11.8   12        Delphi
acNone        5000     15     10     12     25     24        FPC
Blending      2000     42     37     40     41     44        Delphi
Blending      2000     72     47     63     80     80        FPC
Blending      5000     13     11     13     18     17        Delphi
Blending      5000     20     15     18     31.5   32        FPC
acTest        2000     27     -      -      -      -         Delphi
acTest        2000     50     -      -      -      -         FPC
acTest        5000     8.3    -      -      -      -         Delphi
acTest        5000     13     -      -      -      -         FPC
To be sure comparison is fair, are you 100% sure you compare both versions (FPC and Delphi) in release (not debug) mode? All compilation options support switching the debug/release modes, and they affect (for some code, a lot) the execution speed, see Optimization and profiling | Manual | Castle Game Engine .
Interesting scientific results. I have also found FPC to be much faster than Delphi, but for me it had to do with how Delphi hated doing tons of single addition in a thread compared to FPC. I am curious what your hardware setup is? I am on old i7-4790K with radeon RX580. I just ordered a new computer to prepare for the coming dark ages in USA. It should be a lot faster. I presume turning on shadows cut all your results in half?