Drawing Many Cubes Faster

MBa · July 10, 2025, 1:41pm

Hi Michalis,

In my game, I mainly work with cubes. The cubes are drawn by hand, side by side, not using the engine’s built-in cube. Houses and trees are represented with these cubes, but it’s not a Minecraft clone.

Since I want to display hundreds or even thousands of cubes, speed is crucial. I wanted to ask if it would be possible to create a small demo app with the following structure:

Draw a main cube (by hand, with a texture).

Copy this main cube via the GPU 1000 times (or so).

Pick one cube from these and modify it (e.g., change its size or texture).

Delete one cube from these 1000.

All of this should happen in just a single scene.

When I first started with the engine, I created a separate scene for each individual house. I then realized that this made my game slower. In the meantime, I’ve been using zones with individual scenes, but I’m still drawing the houses (in the background) by hand, one by one.

This approach is better, but it’s still far from what a GPU could do if I knew how to do it properly. I’m hoping that with this small demo, I could start fresh. I’ve also looked at existing demos but haven’t found any that create objects “by hand” or use just a single scene. Maybe I missed something.

Thanks a lot for your help!

edj · July 10, 2025, 3:03pm

I am curious what your project is. I made code for making parametric houses with a variety of roof shapes. They use X3D indexed triangle sets. A cube has 8 vertices and 10 triangles (I don’t do the bottom). I use a class I call TShapeBuilder that stores all the Vertices: TVector3List, Indexes: TInt32List and TexCoords: TVector2List, and builds the shape. It has methods for adding faces by indexes. My code ends up being pretty complex, so you should work from scratch. There is a demo that builds a cube from ‘faces’ in this manner and puts a squirrel image on it. I don’t recall which one, but it was how I got started working at the X3D level.
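A minimal sketch of the indexed-shape idea described above, in plain Free Pascal so it runs without the engine. It builds the 8 shared corner vertices and the 10 triangles (5 faces, no bottom) as index triples. The record and function names are hypothetical and are not edj’s TShapeBuilder; in CGE the two arrays would roughly correspond to a TVector3List and a TInt32List feeding an X3D indexed triangle set.

```pascal
program CubeBuilder;

{$mode objfpc}{$H+}

// Hypothetical sketch, not edj's TShapeBuilder: an indexed cube without a
// bottom face, built from 8 shared corners and 10 triangles.

type
  TVec3 = record
    X, Y, Z: Single;
  end;

  TCubeMesh = record
    Vertices: array of TVec3; // roughly a TVector3List in CGE
    Indexes: array of Int32;  // roughly a TInt32List in CGE
  end;

function Vec3(const X, Y, Z: Single): TVec3;
begin
  Result.X := X;
  Result.Y := Y;
  Result.Z := Z;
end;

// Adds quad A-B-C-D (counter-clockwise when seen from outside) as two triangles.
procedure AddQuad(var Mesh: TCubeMesh; const A, B, C, D: Int32);
var
  N: Integer;
begin
  N := Length(Mesh.Indexes);
  SetLength(Mesh.Indexes, N + 6);
  Mesh.Indexes[N] := A;     Mesh.Indexes[N + 1] := B; Mesh.Indexes[N + 2] := C;
  Mesh.Indexes[N + 3] := A; Mesh.Indexes[N + 4] := C; Mesh.Indexes[N + 5] := D;
end;

function BuildCubeWithoutBottom(const Size: Single): TCubeMesh;
begin
  SetLength(Result.Vertices, 8);
  Result.Vertices[0] := Vec3(0, 0, 0);          // bottom ring, Y = 0
  Result.Vertices[1] := Vec3(Size, 0, 0);
  Result.Vertices[2] := Vec3(Size, 0, Size);
  Result.Vertices[3] := Vec3(0, 0, Size);
  Result.Vertices[4] := Vec3(0, Size, 0);       // top ring, Y = Size
  Result.Vertices[5] := Vec3(Size, Size, 0);
  Result.Vertices[6] := Vec3(Size, Size, Size);
  Result.Vertices[7] := Vec3(0, Size, Size);
  SetLength(Result.Indexes, 0);
  AddQuad(Result, 3, 2, 6, 7); // front, Z = Size
  AddQuad(Result, 1, 0, 4, 5); // back, Z = 0
  AddQuad(Result, 2, 1, 5, 6); // right, X = Size
  AddQuad(Result, 0, 3, 7, 4); // left, X = 0
  AddQuad(Result, 7, 6, 5, 4); // top, Y = Size; no bottom quad
end;

var
  Mesh: TCubeMesh;
begin
  Mesh := BuildCubeWithoutBottom(1.0);
  WriteLn('vertices: ', Length(Mesh.Vertices));       // vertices: 8
  WriteLn('triangles: ', Length(Mesh.Indexes) div 3); // triangles: 10
end.
```

One caveat with 8 shared corners: each corner can carry only one texture coordinate, so a full texture on every face generally needs 4 vertices per face (20 here), or an IndexedFaceSet with its separate texCoordIndex.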

edj · July 10, 2025, 3:11pm

If your houses are the same, use TCastleTransformReference to clone them.

michalis · July 10, 2025, 4:19pm

A summary of all our optimization advice and techniques is on the Optimization and profiling manual page. It has evolved a bit as our engine grows, so if you have used the engine for a long time and haven’t consulted it lately, do look at it again.

As for how to optimize your particular application, more details would be helpful, as “the devil is in the details”. But in general, similar to what @edj said above:

Use TCastleTransformReference for the 1000 different cubes. See the API docs, the manual and e.g. this news post for examples of what it can do. The 1000 references can all refer to one TCastleBox.

It’s not a single scene as you suggest, but it will be fast: for 1000 cubes it should be more than enough.
If you need to modify one of the cubes, then at that moment instantiate a design using TCastleComponentFactory (see e.g. how rockets are instantiated here) that contains a TCastleBox inside (it’s internally a scene and that’s OK), and then modify it.
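The “modify one cube” step can be read as share-until-modified. The sketch below, in plain Free Pascal with hypothetical names, keeps one shared prefab record that 1000 instances point at; before one instance is changed, it gets its own copy, so the other 999 keep the shared look. Mapping this onto TCastleTransformReference (shared) and TCastleComponentFactory (private copy), as suggested above, is an assumption about how one might organize it, not engine code.

```pascal
program ShareUntilModified;

{$mode objfpc}{$H+}

// Hypothetical sketch: instances point at a shared prefab until one of them
// is modified; then that instance gets a private copy of the prefab.

type
  TPrefab = record
    Height: Single;
    Texture: string;
  end;

  TInstance = record
    PrefabIndex: Integer; // index into Prefabs
    X, Z: Single;
  end;

var
  Prefabs: array of TPrefab;
  Instances: array of TInstance;

function AddPrefab(const P: TPrefab): Integer;
begin
  Result := Length(Prefabs);
  SetLength(Prefabs, Result + 1);
  Prefabs[Result] := P;
end;

function UsersOf(const PrefabIndex: Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to High(Instances) do
    if Instances[I].PrefabIndex = PrefabIndex then
      Inc(Result);
end;

// Call before changing instance I: if its prefab is shared, give it a
// private copy so that the other instances keep the old look.
procedure MakeUnique(const I: Integer);
var
  Duplicate: TPrefab;
begin
  if UsersOf(Instances[I].PrefabIndex) > 1 then
  begin
    Duplicate := Prefabs[Instances[I].PrefabIndex]; // copy before the array grows
    Instances[I].PrefabIndex := AddPrefab(Duplicate);
  end;
end;

var
  Shared: TPrefab;
  I: Integer;
begin
  Shared.Height := 1;
  Shared.Texture := 'house.png';
  AddPrefab(Shared);
  SetLength(Instances, 1000);
  for I := 0 to High(Instances) do
  begin
    Instances[I].PrefabIndex := 0;
    Instances[I].X := (I mod 40) * 2;
    Instances[I].Z := (I div 40) * 2;
  end;

  // Modify cube 42 only: detach it first, then edit its own copy.
  MakeUnique(42);
  Prefabs[Instances[42].PrefabIndex].Height := 3;
  Prefabs[Instances[42].PrefabIndex].Texture := 'tall_house.png';

  WriteLn('prefabs: ', Length(Prefabs));                                     // prefabs: 2
  WriteLn('cube 0 height: ', Prefabs[Instances[0].PrefabIndex].Height:0:1);   // 1.0
  WriteLn('cube 42 height: ', Prefabs[Instances[42].PrefabIndex].Height:0:1); // 3.0
end.
```

UsersOf scans all instances, which is fine for a sketch; a real game would more likely keep a use count per prefab.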

This is what I would do for the example you describe.

I’m afraid I don’t have the resources to literally implement the demo as you described above; hopefully the information above is enough for you to experiment on your side. Moreover, I predict it will be much more useful to you if you make it yourself, because you know your use case best and can test the solution on your use case and in your context. The context, these “details in which the devil hides”, is important here, like:

How many cubes are really identical in your case?
How many of them stay identical during the game?
How can the user modify them, in discrete or continuous steps (so you can put the same, or similar, ones in a “bin” and have one TCastleTransformReference for each bin)?
If these bins are not good for close-up objects, then maybe use them only for faraway objects?
Does it make sense to modify each cube with a shader? This could reuse rendering resources. You could even go wild, with a geometry shader expanding 1000 different cubes on the GPU from some trivial description. But whether it is useful depends on everything that happens with the cubes: how they are created, animated etc.
Did you investigate exactly why your current/past approach is problematic? 1000 cubes should not be a problem, in a single scene or as 1000 scenes. You say you reached the conclusion that you want a single scene, but why? Multiple scenes are better for some cases (dynamic world). And we generally optimize the engine heavily for “lots of scenes”, though there’s always room to improve.
What happens with the cubes? Do they just lie in place statically, or do you move them somehow? Does anything collide with them, including the player?
What is the view? Is it a perspective view, and do LODs and occlusion culling make sense?
What exactly do you mean by “doing by hand” in your post above? This can mean a number of things: just creating TCastleBox instances from Pascal, or creating TCastleScene with TBoxNode, or creating TCastleScene with TIndexedFaceSetNode, or using TCastleRenderUnlitMesh to draw etc.
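To get a feel for the “bin” question, a hypothetical sketch in plain Free Pascal that groups houses by everything that affects their look (style, number of floors, owner colour) and counts the distinct groups. Each group would need only one shared model that many references point at. The THouse fields and the distribution of 1000 houses below are made up for illustration.

```pascal
program HouseBins;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

// Hypothetical sketch: houses that look the same share one bin,
// and each bin would need only one shared model.

type
  THouse = record
    Style: Integer;  // texture set
    Floors: Integer; // discrete growth step
    Owner: Integer;  // player, shown as a colour
  end;

function BinKey(const H: THouse): string;
begin
  Result := Format('%d/%d/%d', [H.Style, H.Floors, H.Owner]);
end;

// Number of distinct bins, i.e. of shared models the houses need.
function CountBins(const Houses: array of THouse): Integer;
var
  Keys: TStringList;
  I: Integer;
begin
  Keys := TStringList.Create;
  try
    Keys.Sorted := True;
    Keys.Duplicates := dupIgnore; // equal keys are stored once
    for I := 0 to High(Houses) do
      Keys.Add(BinKey(Houses[I]));
    Result := Keys.Count;
  finally
    Keys.Free;
  end;
end;

var
  Houses: array of THouse;
  I: Integer;
begin
  SetLength(Houses, 1000);
  for I := 0 to High(Houses) do
  begin
    Houses[I].Style := I mod 8;      // 8 styles
    Houses[I].Floors := 1 + I mod 5; // 1 to 5 floors
    Houses[I].Owner := I mod 4;      // 4 players
  end;
  WriteLn(Length(Houses), ' houses, ', CountBins(Houses), ' bins'); // 1000 houses, 40 bins
end.
```

With this made-up distribution the 1000 houses fall into 40 bins, so 40 models would be loaded instead of 1000. Continuous changes (arbitrary sizes, for example) would make the key almost unique per house, which is exactly the case the questions above try to rule in or out.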

These and other details would help me advise you best and determine the best approach.

If you have existing code that is slower than you think it could be, I can always investigate it, if you can share it (a working example, as simple as possible, that we can build and run to see the slowness).

MBa · July 10, 2025, 8:01pm

Hi again, and thanks to everyone for the help.

Because “edj” is interested in my project, I’ve attached two screenshots. Thanks for your interest. Please note that this is a work in progress, and has been for several years now.

There’s still an old/new menu, etc. I’m constantly trying to figure out the best way to do things.

I started with cubes because I want to draw a house using fewer triangles if possible. And I don’t necessarily want to use LOD, since I have other problems as well. One of them is that I have no idea about 3D math. That’s why I usually can’t follow the explanations in the forum. But the actual programming behind the 3D, the complex processes of a city with people, is just fun for me.

So, a house can be small at first: it’s a cube with a texture. Then it can grow to different sizes (to accommodate more people). But it’s still a cube, just with different textures for different floors. I thought this would be the fastest way to draw small and large houses. In the background, I have a few different scenes, but always in pairs. I always fill the cubes into the non-visible scene using threads, and when that is finished I switch scenes. Which scene is used depends on what it is supposed to show: a scene with no shadows (for distant objects), with shadows, transparent, with or without collision, and one for the landscape. For the landscape, I once asked how the engine’s terrain can be used with “holes”, since I have tunnels and the landscape needs to adapt somewhat when objects are placed on it. Since I had no idea how to do this with the built-in terrain, I just used my old code and draw it myself. But then there were other things I didn’t know how to solve.
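The “fill the hidden scene, then switch” pattern can be reduced to two buffers and an index saying which one is shown. In this hypothetical Free Pascal sketch each buffer stands for one TCastleScene and holds only house ids. It assumes the rebuild may run in a worker thread, but that the switch itself happens on the main thread after the worker has finished, so rendering never sees a half-filled buffer.

```pascal
program TwoScenesSwap;

{$mode objfpc}{$H+}

// Hypothetical sketch: two buffers (standing for two scenes) and the index
// of the visible one. Only the hidden buffer is ever rebuilt.

type
  TBuffer = array of Integer;

var
  Buffers: array[0..1] of TBuffer;
  Visible: Integer = 0; // index of the buffer being displayed

function HiddenIndex: Integer;
begin
  Result := 1 - Visible;
end;

// Refill only the hidden buffer; the visible one is left alone.
procedure RebuildHidden(const HouseCount: Integer);
var
  I: Integer;
begin
  SetLength(Buffers[HiddenIndex], HouseCount);
  for I := 0 to HouseCount - 1 do
    Buffers[HiddenIndex][I] := I;
end;

// Assumed to run on the main thread once the rebuild is complete.
procedure ShowHidden;
begin
  Visible := HiddenIndex;
end;

begin
  RebuildHidden(500);
  ShowHidden;
  WriteLn('shown: ', Length(Buffers[Visible])); // shown: 500
  RebuildHidden(600);                           // player still sees 500 meanwhile
  WriteLn('shown: ', Length(Buffers[Visible])); // shown: 500
  ShowHidden;
  WriteLn('shown: ', Length(Buffers[Visible])); // shown: 600
end.
```

With shared prefabs and references, changing one house could mean changing one reference rather than refilling a whole scene, which may make part of this double buffering unnecessary.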

Anyway: as a 200x200 world fills up (one cell = one house, if there’s space), I end up with hundreds of cubes for the houses. Also trees, and then a few thousand people running around, but I only show those that are nearby and can be displayed at the current speed.

Since the houses can “grow,” the scenes are constantly changing. So, I need to be able to fill, change, or delete scenes. But at the moment, everything is simply redrawn in the non-visible scene and then switched.

The user can select a house and have it enlarged. Then it’s in a “build” mode, so that changes as well.

For the trees, I use a kind of LOD: far away, only two cubes are used.
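A hypothetical sketch of choosing a tree’s detail level from its distance to the camera on the ground plane, in plain Free Pascal. FarDistance is an assumed value to tune for the game’s scale. X3D also has a LOD node, which CGE supports, if the switching should happen inside a model instead of in game code.

```pascal
program TreeDetail;

{$mode objfpc}{$H+}

// Hypothetical sketch: pick a tree's level of detail by distance.

type
  TTreeDetail = (tdFull, tdTwoCubes);

const
  FarDistance = 50.0; // assumed threshold in world units

function ChooseTreeDetail(const CameraX, CameraZ, TreeX, TreeZ: Single): TTreeDetail;
var
  DistanceSqr: Single;
begin
  // Comparing squared distances avoids a square root per tree.
  DistanceSqr := Sqr(TreeX - CameraX) + Sqr(TreeZ - CameraZ);
  if DistanceSqr > Sqr(FarDistance) then
    Result := tdTwoCubes
  else
    Result := tdFull;
end;

begin
  WriteLn(ChooseTreeDetail(0, 0, 10, 10) = tdFull);    // about 14 units away
  WriteLn(ChooseTreeDetail(0, 0, 60, 0) = tdTwoCubes); // 60 units away
end.
```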

But since you can also see the whole world from above, I’ve at least turned off the grass; with all those cubes, though, that’s still not enough. At the very beginning with the engine, I also tried using billboards for distant objects (with individual scenes), but that was VERY slow. Maybe I’d do it better today, but I still don’t quite trust it.

I also once tried to load grass patches as 3D objects and render them multiple times. That was very slow. I probably did it wrong, but I gave up on it.

Shaders: I once tried to use them to make the landscape look better. But I never really figured out how, since I draw everything myself.

To the question: “cubes: Do they just lie in place statically?” The people are moving all the time, the rest mostly stays in the same place if it doesn’t change. Trees can also be felled. And I also have moving transport objects.

Unfortunately, I can’t just make a small runnable demo version, but a cube is drawn approximately like this:

procedure AddPoint(dx, dy, dz: single);
begin
  // Append one vertex to the coordinates of the scene selected by whatScene...
  sceneStackOpponents[opp, sceneNr].coords[sceneStackOpponents[opp, sceneNr].whatScene].FdPoint.Items.Add(
    Vector3(scale * (ix + dx), scale * (iy + dy), scale * (iz + dz)));
  // ...and reference it from the face set's coordIndex
  Inc(sceneStackOpponents[opp, sceneNr].index[sceneStackOpponents[opp, sceneNr].whatScene]);
  sceneStackOpponents[opp, sceneNr].indexFaceSet[sceneStackOpponents[opp, sceneNr].whatScene].
    FdCoordIndex.Items.Add(sceneStackOpponents[opp, sceneNr].index[sceneStackOpponents[opp, sceneNr].whatScene]);
end;

procedure AddTex(aValue: integer; rotation: integer);

  procedure CalculateTextureCoordinates(row, column: integer; out texCoords: array of single);
  begin
    texLeft := column * SubTextureSize / TextureWidth;
    texRight := (column + 1) * SubTextureSize / TextureWidth;
    // Rows are also divided by TextureWidth: correct only for a square atlas
    texTop := row * SubTextureSize / TextureWidth;
    texBottom := (row + 1) * SubTextureSize / TextureWidth;

    case rotation of
      90: // 90°
        begin
          texCoords[0] := texLeft;  texCoords[1] := texBottom;
          texCoords[2] := texLeft;  texCoords[3] := texTop;
          texCoords[4] := texRight; texCoords[5] := texTop;
          texCoords[6] := texRight; texCoords[7] := texBottom;
      .....

  end;

  procedure AddTex(aValue: integer; rotation: integer);

procedure RightQuad(x, y, z: single);
begin
  for i := 0 to verticalRepeat - 1 do
  begin
    // Bottom: entranceTex, top: tex
    if i = verticalRepeat - 1 then
      currentTex := entranceTex
    else
      currentTex := tex;

    if currentTex = 0 then
      Continue;

    AddPoint(x, y - i * segmentHeight, z);
    AddPoint(0, y - i * segmentHeight, z);
    AddPoint(0, y - (i + 1) * segmentHeight, z);
    AddPoint(x, y - (i + 1) * segmentHeight, z);
    ...
end;

FrontQuad(sx, sy, sz);
BackQuad(sx, sy, sz);
RightQuad(sx, sy, sz);
LeftQuad(sx, sy, sz);
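The AddTex fragment above divides the row coordinates by TextureWidth as well, which only works for a square atlas. Below is a self-contained sketch of the same atlas lookup with separate width and height, in plain Free Pascal. The corner order, the parameter names and rotating by shifting corners are assumptions for illustration, not the elided cases of the original routine. It also flips V, assuming rows are counted from the top of the image, because X3D (and so CGE) puts texture coordinate (0, 0) at the bottom-left of the image.

```pascal
program AtlasCell;

{$mode objfpc}{$H+}

type
  TUV = record
    U, V: Single;
  end;
  // Corners in the order bottom-left, bottom-right, top-right, top-left.
  TQuadUV = array[0..3] of TUV;

function UV(const U, V: Single): TUV;
begin
  Result.U := U;
  Result.V := V;
end;

// Texture coordinates of atlas cell (Row, Column), rows counted from the top
// of the image. Rotation is 0, 90, 180 or 270; each 90-degree step shifts the
// corners by one, turning the image clockwise on the quad.
function CellCoords(const Row, Column, CellSize, AtlasWidth, AtlasHeight,
  Rotation: Integer): TQuadUV;
var
  Left, Right, Top, Bottom: Single;
  Corners: TQuadUV;
  Shift, I: Integer;
begin
  Left := Column * CellSize / AtlasWidth;
  Right := (Column + 1) * CellSize / AtlasWidth;
  // Divide rows by the height; V = 0 is the bottom of the image.
  Top := 1 - Row * CellSize / AtlasHeight;
  Bottom := 1 - (Row + 1) * CellSize / AtlasHeight;
  Corners[0] := UV(Left, Bottom);
  Corners[1] := UV(Right, Bottom);
  Corners[2] := UV(Right, Top);
  Corners[3] := UV(Left, Top);
  Shift := (Rotation div 90) mod 4;
  for I := 0 to 3 do
    Result[I] := Corners[(I + Shift) mod 4];
end;

var
  Q: TQuadUV;
  I: Integer;
begin
  // 256 x 128 atlas of 64 x 64 cells: the cell in row 0, column 1.
  Q := CellCoords(0, 1, 64, 256, 128, 0);
  for I := 0 to 3 do
    WriteLn(Q[I].U:0:2, ' ', Q[I].V:0:2);
end.
```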

I hope this makes sense. One cube routine is rotatable for the people who walk around. Others are not, etc. But because I’ve been trying out different things for quite a while, the routines are probably not that efficient.

Sorry for the long explanation, but you asked.
If necessary, I’ll just keep going as before; I still have a few ideas for how it might get faster.
But if I had a small demo that adds, changes and removes a cube, multiplied on the GPU, I could try that out.

Thanks to everyone for reading and for the help.

DiggiDoggi · July 11, 2025, 2:26pm

From what I see, you could have a cube for every floor, with an optional entrance texture. Then you can use pre-made models and rotate or even scale them, and, as mentioned by others, clone them using TCastleTransformReference. Instead of creating hundreds or thousands of individual objects, you just reuse a few prefabs. The user can pick the prefab from the menu as they (probably) do right now, but instead of creating a new individual object, you just place a “clone”. The user can still manipulate it (move and rotate) as they would with a real cube.

If the upper floors can each have a different style (texture), you can also use prefab cubes and clone them instead of painting and tiling them manually.

For the “under construction” visuals, you can simply use a prefab with “sticks and stones” and then swap it for a “ready” prefab. It’s very simple to manage as a list of cubes for each building: only the first item has an entrance, for each item you keep its prefab reference and rotation (if each floor can be rotated individually and use a different style), and only after the last item do you need the roof.
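A hypothetical sketch of that list in plain Free Pascal: each entry only names which shared prefab to show and at what height, so in CGE each entry could become one TCastleTransformReference. The “style_entrance / style_floor / style_roof” naming scheme is an assumption for illustration.

```pascal
program HouseFloors;

{$mode objfpc}{$H+}

// Hypothetical sketch: the first floor uses the entrance prefab, the others
// the plain floor prefab, and a roof entry goes on top of the last floor.

type
  TFloorEntry = record
    Prefab: string; // which shared model to show
    Height: Single; // Y translation of this entry
  end;
  TBuilding = array of TFloorEntry;

// Floors must be at least 1.
function BuildHouse(const Style: string; const Floors: Integer;
  const FloorHeight: Single): TBuilding;
var
  I: Integer;
begin
  SetLength(Result, Floors + 1); // the floors plus one roof entry
  for I := 0 to Floors - 1 do
  begin
    if I = 0 then
      Result[I].Prefab := Style + '_entrance'
    else
      Result[I].Prefab := Style + '_floor';
    Result[I].Height := I * FloorHeight;
  end;
  Result[Floors].Prefab := Style + '_roof';
  Result[Floors].Height := Floors * FloorHeight;
end;

var
  House: TBuilding;
  I: Integer;
begin
  House := BuildHouse('brick', 3, 3.0);
  for I := 0 to High(House) do
    WriteLn(House[I].Prefab, ' at ', House[I].Height:0:1);
end.
```

Growing a house adds one entry, and switching from an “under construction” prefab to a “ready” one only changes the Prefab field. The number of distinct prefabs depends on the number of styles, not on the number of houses.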

That way you also easily separate the graphics/model part from the logic part. It’s a few more triangles, but the prefabs are loaded to the GPU just once. You can save on triangle count if the cubes have no floor/ceiling. Generally, simpler solutions are easier to maintain and less prone to errors.

Modern graphics cards can handle millions of vertices without breaking a sweat, but as long as you create individual objects for every house, with custom UVs, it will always be slower than using a few prefabs and “clones”.
If, instead of manipulating UVs and vertices, you just reuse prefabs and rotate them on the GPU, then the GPU needs to load only a few geometries and UV maps.

Transformations of all cubes can be done fast on the GPU side, in parallel, but loading tons of vertices is not so fast: it involves the CPU and normal RAM, then a transfer to the GPU and its VRAM.

For 1000 houses with 10 floors on average, you have a triangle count of about 100k. Calculating that on the CPU and then loading it to the GPU is going to take some time.
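A back-of-the-envelope version of these numbers in plain Free Pascal. The assumptions are labeled in the code: a floor is a cube with walls and a ceiling but no bottom (5 faces, 10 triangles, which gives the 100k above), every face has its own 4 vertices so it can carry its own texture coordinates, and a vertex stores position, normal and texture coordinate as 8 floats. These are estimates, not measurements of CGE.

```pascal
program TriangleBudget;

{$mode objfpc}{$H+}

// Assumed numbers, not measurements:
//   1000 houses, 10 floors each, 5 faces per floor (no bottom),
//   2 triangles and 4 own vertices per face,
//   vertex = position (3) + normal (3) + texture coordinate (2) floats.
const
  Houses = 1000;
  FloorsPerHouse = 10;
  FacesPerFloor = 5;
  TrianglesPerFace = 2;
  VerticesPerFace = 4;
  BytesPerVertex = (3 + 3 + 2) * SizeOf(Single);

function TotalTriangles: Int64;
begin
  Result := Int64(Houses) * FloorsPerHouse * FacesPerFloor * TrianglesPerFace;
end;

// Vertex data if every floor is its own unique mesh.
function UniqueVertexBytes: Int64;
begin
  Result := Int64(Houses) * FloorsPerHouse * FacesPerFloor * VerticesPerFace * BytesPerVertex;
end;

// Vertex data if all floors reference one shared mesh.
function SharedVertexBytes: Int64;
begin
  Result := Int64(FacesPerFloor) * VerticesPerFace * BytesPerVertex;
end;

begin
  WriteLn('triangles: ', TotalTriangles);                     // 100000
  WriteLn('unique meshes: ', UniqueVertexBytes, ' bytes');    // 6400000
  WriteLn('one shared mesh: ', SharedVertexBytes, ' bytes');  // 640
end.
```

Sharing removes the repeated calculation and upload, but each reference still has its own per-object cost, which fits DiggiDoggi’s later observation that the number of references matters more than the triangle count.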

For comparison, I loaded (into the CGE editor) a human with each hair individually meshed: some 4M triangles in total, with 4-8k textures and normal maps. No one sane does that for a game. Loading took a few minutes, but once it loaded I could manipulate it, as the manipulation was done on the GPU. On the other hand, I had a few thousand trees, all “clones” with LOD, and they loaded in the blink of an eye because only one model had to be loaded.

Lastly, LOD is important not only for speed. You use cubes, so the geometry can’t be reduced much further, but you can reduce the textures so they look good from different distances. If you show detailed textures from a distance, the result may be just chaos, with a chance of flicker. LOD could also let you have nicer walls with more “real” windows and other details when seen close up: you could use a normal map to sculpt some details on a flat wall, then use the normal map only for the close LOD and no normal map for distant LODs.

I know I’m a bit chaotic; it’s more thinking out loud. In my game I also let players build stuff from individual parts, and in general, very general terms, it’s the kind of solution I use myself.

edj · July 11, 2025, 2:29pm

Yes, I was hoping it was some sort of city builder, as I am trying to build one too. I was also having performance issues with many simple shapes in a scene. The bottleneck for me was shadow volumes. Do you have shadows turned on in your main light source? I found that shadow maps, which are currently trickier to use, solved the performance problem, but you should wait for future development of that feature before attempting to use them. Disregard this message if you do not have shadows enabled.

MBa · July 11, 2025, 5:14pm

@DiggiDoggi
I had just hoped that I could make such a prefab-like object by hand and then copy it faster via the GPU. That was actually the original question. But I assume such a prefab object has more going on in the background than just a few rectangles and a texture. That’s more on me; I always prefer to create something in code rather than through an external tool.

That said, there is a CGE demo with a car driving down a road. I did a test with it: even with 1000 copied cars, it slowed down noticeably. Of course, I don’t know how complex the car is to draw.

I also considered using individual cubes for each floor. In fact, I did have that in a very old version. But I concluded that if it works with one cube and different textures, it would be faster. But yes, it would be simpler. I just wanted to be clever and do it with a single, faster cube.

I see, I will probably have to use an external tool after all.

But I probably still have to figure out how to add, modify, or delete the objects.
Thank you for your advice.

@edj
But I admire you. It already looks very good, from what I’ve seen in the screenshots. I also wanted to do roads and such. But now, they’re more like paths where the texture just gets darkened.

Yes, I also have shadows: the ones you can simply turn on in a scene.

But I always have double scenes: one for shadows and one for “no shadows”. When the camera moves, I draw a house in the shadow scene if it’s nearby, and in the no-shadow scene if it’s farther away. Since this happens in different threads, sometimes there’s a “hole”: the house briefly disappears while it switches from no shadow to shadow or back. But that wasn’t important to me at the moment. First, the “game” has to work. I would rather change the visible elements at the end. Until then, a lot could fundamentally change in the game anyway.

Thanks again for your interest and help.

DiggiDoggi · July 11, 2025, 9:00pm

When I say prefab, I literally mean something that is pre-made rather than any specific technology or format.

Although using external software is often a good thing (like the free Blender3D), you can still make the prefabs using your own software, creating the combinations you need in a similar way to what you’re doing now. That will include one ground floor with the doors for every style you have in the app (i.e. for every texture, like wooden, colonial, brick, etc.). Then the upper floor: one cube for each texture. So you have 2 cubes for every texture. If you offer, say, 8 different styles, you keep only 16 castle transforms.

When the model is already created (as a TCastleTransform) you use it by placing the TCastleTransformReference (a clone) in the same way you’d place the original objects.

The car model has 334 triangles.

It made me wonder. I’ll show you some test results made out of curiosity, hopefully helpful for you and other readers.

Here I have a model with 71.5k triangles (after a haircut).

And… here are 1000 clones (no LOD, just brutal referenced copy)

Tested on 2 GPUs: one is a small Radeon integrated into the CPU, the second an NVIDIA RTX 5090. In my previous tests I also used an older RTX 2080 Ti with very similar results. On the Radeon I used a second screen (2K resolution), on the NVIDIA 4K resolution.

When all 71.5 million polygons are visible, both GPUs sweat a bit at about 10 fps, with no shadows. With shadow volumes, the NVIDIA had 0.36 fps. I was too worried to test that on the integrated Radeon.
The same experiment with 100 clones (just over 7 million polygons) showed no visible performance hit (118 fps instead of 120). With shadow volumes, the fps sometimes dropped to ~20.
The same test with the car model (334 triangles, over 200 times fewer than the photorealistic human): 1000 cars dropped my fps to ~30 with no shadows.
The last test was a simple textured plane (2 triangles): 1000 clones with no performance drop at all.

This tells me that the polygon count isn’t the real performance hit: the blocky, low-poly car is just 3 times faster than the realistic human, with a complexity ratio of 1:214. I emphasise that LOD was not used.
The number of TCastleTransformReference objects actually matters more than the number of triangles inside your model, obviously because the geometry is shared.
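The arithmetic behind these figures, redone in plain Free Pascal using only the numbers quoted in the test above (no LOD, no shadows unless noted). Cutting the per-model triangle count by a factor of about 214 raised the frame rate only from about 10 to about 30 fps, which is the basis for the conclusion that per-clone cost dominates in this test.

```pascal
program CloneNumbers;

{$mode objfpc}{$H+}

// Only figures quoted in the thread are used here.

function SceneTriangles(const Clones, TrianglesPerModel: Int64): Int64;
begin
  Result := Clones * TrianglesPerModel;
end;

function FrameTimeMs(const Fps: Double): Double;
begin
  Result := 1000 / Fps;
end;

begin
  WriteLn('1000 humans: ', SceneTriangles(1000, 71500), ' triangles'); // 71500000
  WriteLn('100 humans:  ', SceneTriangles(100, 71500), ' triangles');  // 7150000
  WriteLn('1000 cars:   ', SceneTriangles(1000, 334), ' triangles');   // 334000
  WriteLn('triangle ratio human/car: ', 71500 / 334:0:1);              // 214.1
  WriteLn('speed ratio car/human:    ', 30 / 10:0:1);                  // 3.0
  WriteLn('frame time at 10 fps: ', FrameTimeMs(10):0:1, ' ms');       // 100.0
  WriteLn('frame time at 30 fps: ', FrameTimeMs(30):0:1, ' ms');       // 33.3
  // With shadow volumes, 0.36 fps means almost 3 seconds per frame.
  WriteLn('frame time at 0.36 fps: ', FrameTimeMs(0.36):0:0, ' ms');
end.
```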

MBa · July 12, 2025, 9:48am

“you can still make the prefabs using your own software”
So it is possible to do this “by hand” as well; there are no hidden parameters that only an external tool can handle. I’m a bit hesitant to ask again: which combination of CGE classes/objects do I need to create a cube with a texture that I can then clone? Sorry to ask again about adding, modifying and deleting. Maybe even changing the color, since (as shown in the screenshots) I have different computer players. I just don’t want to attach anything to such a cube that isn’t needed and would slow things down.
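For the add / modify / delete part, a hypothetical sketch in plain Free Pascal that keeps only the bookkeeping: each cube stores which shared prefab it shows (one prefab per texture and player colour combination) and where it stands. In CGE each entry could correspond to one TCastleTransformReference: adding means creating one and adding it to the viewport items, modifying means pointing its Reference at another prefab or changing its translation, and deleting means removing and freeing it. That mapping is an assumption, not a tested engine example.

```pascal
program CubeRegistry;

{$mode objfpc}{$H+}

// Hypothetical sketch: many cubes, each pointing at one of a few shared prefabs.

type
  TCube = record
    PrefabId: Integer; // shared model: texture + player colour
    X, Y, Z: Single;
  end;

var
  Cubes: array of TCube;

function AddCube(const PrefabId: Integer; const X, Y, Z: Single): Integer;
begin
  Result := Length(Cubes);
  SetLength(Cubes, Result + 1);
  Cubes[Result].PrefabId := PrefabId;
  Cubes[Result].X := X;
  Cubes[Result].Y := Y;
  Cubes[Result].Z := Z;
end;

// "Modify": point the cube at another shared prefab (other texture or colour).
procedure ChangeCube(const Index, NewPrefabId: Integer);
begin
  Cubes[Index].PrefabId := NewPrefabId;
end;

// "Delete" in constant time: move the last cube into the hole and shrink.
// Note that this changes the index of the cube that was last.
procedure DeleteCube(const Index: Integer);
begin
  Cubes[Index] := Cubes[High(Cubes)];
  SetLength(Cubes, Length(Cubes) - 1);
end;

var
  I: Integer;
begin
  for I := 0 to 999 do
    AddCube(I mod 4, (I mod 40) * 2, 0, (I div 40) * 2); // 4 players
  ChangeCube(10, 5); // cube 10 now uses prefab 5
  DeleteCube(500);   // cube 999 moves into slot 500
  WriteLn('cubes: ', Length(Cubes));                // cubes: 999
  WriteLn('cube 10 prefab: ', Cubes[10].PrefabId);   // 5
  WriteLn('cube 500 prefab: ', Cubes[500].PrefabId); // 3 (999 mod 4)
end.
```

DeleteCube keeps deletion cheap with thousands of cubes, but any code that remembers cube indexes has to account for the cube that was moved.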

Maybe there is already a really simple and fast demo available. Maybe I need to go back to that car/road example. I’m just hoping for something absolutely simple but fast, that I can use as a starting point.

“The amount of TCastleTransformReference objects actually matters more than the amount of triangles”
Does that mean that, even if I am very economical with triangles, I still won’t be able to get, let’s say, 3000-6000 cubes? With all the houses and people walking around.
Then the question is, what can I draw in the background that is faster? But the second problem would be, it would also have to look good enough from above.
But that’s more a question for myself, one that I’ve asked myself several times already.
And thank you again very much for all your tests; now I know more because of them.