Slow code causes weird behavior

spazzle · November 9, 2020, 5:58pm

I’m running the following pattern. The intention is to have several discrete objects move through the rendering volume and for a pixel to light up based on its x,y,z proximity to the objects. If a pixel is too far away it will be off, and if it is close enough to multiple objects it will use the closest one. The objects are currently moving on simple circular paths, but that can change. I update the object positions during the before render step and then compute the hue and brightness of every pixel during the render3d step. The code works fine. It doesn’t explicitly use much memory since I’m only keeping track of a small number of things.

However it can be quite slow for understandable reasons. I use a large number of pixels (currently 1440, eventually I hope to go to 2400). I use a fair number of objects, right now 6. The system has to compute the distance from each pixel (n) to each object (m) every cycle, leading to ~n*m computations. It goes faster with less pixels and less objects. With 1 object and 1440 pixels it runs at about 9 fps, with 6 it says 3.5, but probably actually less. Nothing really surprising here.

What concerns me is the following. When testing this the system itself becomes unstable in weird ways. The preview bar can take a very long time to load (20-60 seconds is not uncommon eventually). The pattern as seen on the LEDs can start relatively smooth (but slow) but become less so over time, eventually becoming very stuttery. Once the PB basically locked up and then when I went to go back to the code the pattern code itself had disappeared, though other patterns were fine. Often I have to reset the device or force reload the page. This becomes worse as I keep editing the pattern.

I’m sure my code is poorly optimized, obviously suggestions would be nice, but this is more of a bug report on how the system behaves under extreme loads.

var timespeed=0.5
var t1
var numballs=6
var ballsX=array(numballs)
var ballsY=array(numballs)
var ballsZ=array(numballs)
var ballsC=array(numballs)


function update_balls(t){
for(n=0;n<numballs;n++){
  ballsX[n]=0.5*sin((t+n/numballs)*PI2)
  ballsY[n]=0.5*cos((t+n/numballs)*PI2)
  ballsZ[n]=1
  ballsC[n]=t+n/numballs
}}

function render_balls(x,y,z){
  mindist=10
  color=0
  
  for(n=0;n<numballs;n++){
    X=ballsX[n]-x+0.5
    Y=ballsY[n]-y+0.5
    Z=ballsZ[n]-z
    rad=sqrt(X*X+Y*Y+Z*Z)
    
  if(rad<mindist){mindist=rad;color=ballsC[n]}
  }
  if(mindist<0.2){hsv(color+mindist,1-mindist,1-mindist*3)}
  else(hsv(0,0,0))  
}

export function beforeRender(delta) {
  t1 = time(timespeed)
  update_balls(t1)
}

export function render3D(index, x, y, z) {
render_balls(x,y,z)
}

zranger1 · November 10, 2020, 1:31am

@spazzle, this happens to me too, especially lately as I’m trying more physics-based computationally intensive stuff in anticipation of the Pixelblaze 3. Push hard enough, and it gets very cranky. Eventually, you’ll get the “execution steps exhausted” message in the IDE which for me, most often means a reset.

Possibly, @wizard could “fix” this by giving the PB’s web code more cycles one way or another. But I’m not sure I’d even ask for that. (unless it could be done dynamically and not take cpu from pattern execution under normal circumstances…) We’re operating near the edge of the cpu’s capability, and there’s always going to be a limit somewhere.

You can lighten the web UI and preview bar load a little by adding /?min to the PB’s web page address. Other than that – reworking the algorithm – moving as much pattern computation out of render() as possible is probably the best solution.

In this case, there are lots of potential ways to optimize. Is the display regularly spaced, like a volumetric cube or sphere? Are all your spheres going to be the same diameter? If so, maybe you can trade memory for cpu cycles by precalculating a table of distance offsets for the radius you’re interested in.

Or you can allocate an internal voxel buffer and do all the calculation for the frame in beforeRender(). You can pack hue and brightness for a given voxel into one 16.16 integer, and just have render() unpack and display. That ought to get you at least a few more fps.

Scruffynerf · November 10, 2020, 1:51am

First thoughts:

You call render balls for each pixel. Highly inefficient.

You should calculate the moving points and the. Spheres of influence (which is made much harder by making it just be the “closest”, that makes it have to look at many…). What if you just lit up with a brightness relative to distance? Now the closest points will be bright and more distant ones will be less so. (Beyond the radial cutoff, no light of course)

So now you just say “for all xyz items within [ Ball 1’s sphere area” which I believe is the way to do XYZ rendering speedily, and then Ball2 etc.

Potentially, you could also do the polar hack, and treat XYZ as radius and polar coordinates, which could be much faster. I suspect some math translation would be needed (fixed center of universe versus moving balls, gonna Google that now)
TIL:
https://planetcalc.com/7952/

X-Y-Z vs r-azimuth-z (cylinder, or polar plus z) and r-azimuth-polar (radius plus 2 angles)

Might be wrong here, but that could be a huge piece of the overall problem.

Could you post/link a video of the effect? Unsure what it is supposed to look like.

Scruffynerf · November 10, 2020, 2:23am

Answer is:

∥r−r′||
= √
r^2+r′^2−2rr′[sin⁡(θ)sin⁡(θ′)cos⁡(ϕ−ϕ′)+cos⁡(θ)cos⁡(θ′)].

And given that you don’t care about actual distance, just closest within range X, you can precalculate most of this, and not bother with squareroots (the smallest distance is still smallest, even squared)

Likewise, anything outside of a given radius range is too distant (if I want within 2 radial units, and my first point is at r=10, there is no way that r<8 or r>12 can be close enough, and same for angles (has to be within a range, unless really towards center), so you can narrow the scope down greatly before you calculate what’s left to see what is closest)

Now I totally want to do a 3d display…

spazzle · November 10, 2020, 2:58pm

Unfortunately the pixels are all over the place, and at best could be described cylindrically. Since they are described via a map, and I don’t have access to the XYZ coordinates easily outside of render3D I’d have to figure out some way of finding them out first.

Scruffynerf · November 10, 2020, 4:37pm

If you have XYZ, you can convert your map to any other format… Mostly the advantage would be to reduce math, I think.

As I said, a video of the behavior might help.

zranger1 · November 10, 2020, 8:23pm

Here’s the pattern running on a 12x12x12 (1536 pixel) cube. It works as described – I’m getting just under 4 fps.

@spazzle, just for fun, if you want to post your map generator code (or data) here, I can post a rendering and see if anybody can come up with any good geometry specific optimizations.

spazzle · November 10, 2020, 9:32pm

It would be the mapping I posted here Pavilion Mapping

spazzle · November 10, 2020, 9:50pm

Obviously the current circular paths aren’t particularly interesting, but it was more of a test to see how fast it ran.

Scruffynerf · November 10, 2020, 10:19pm

Oh yeah, that design calls out for a height (z), plus R+Polar mapping… Cylindrical, not spherical.

It’d map nicely to that… Strand goes up in Z, then at the top, R increases, then polar angle increases then R decreases, then Z drops again… Adjust angle to the next and repeat.

One of my goals is to remap as many existing patterns into polar coordinates (if they don’t work now), because often it’s a the best answer for discs, cylinders, etc which are common designs.

zranger1 · November 11, 2020, 1:57am

I like this project a lot. Even before optimizing, it’s a really nice proof of concept! Wish I could see it with the cloth pavillion up. This is the kind of thing that really shows LED lighting beautifully.

There are things you can do to optimize a full 3D approach, but I’m with @scruffynerf here – a cylindrical mapping (angle,radius, height) would gain you speed and flexibility – for one thing, you’d get easy “tree” patterns by using the height axis to determine “trunk” and “canopy”. You could also go with a 2D (angle/height) cylinder and use the strand number as the third axis – this would make radial symmetries really easy.

This looks like fun. If I can help w/code, just ask. Looking forward to seeing video of it in operation one of these days!

Scruffynerf · November 11, 2020, 2:06am

Awesome job with your simulation, and it also shows the weirdness of that algo you are using: watch the play of a color (the nearest “frond”)
It doesn’t flow like a smooth curve, the ends of the light recede.

Ideally (in my head at least, looking at that flow pattern), as the light source moves across the frond, the light would begin in the middle, and spread both directions as it traversed, same as it left (think moon wax/waning, this currently is like the crescent moon going dark at one end first…)

wizard · November 11, 2020, 2:15am

@spazzle,
I think this will be tricky to optimize, and is definitely pushing the system. I get about 15 fps on a V3 running 512 pixels (my volumetric cube), and everything is happy and usable, but the animation could be smoother. I see with some tweaking of constants that the spheres soften up a bit, and would look nice as a slower running pattern with smooth animation.

They key issue here is doing numballs expensive operations per pixel. Here are a few small optimizations that boosted FPS slightly when i was experimenting, some mentioned by others above:

Break out of the loop early when the pixel is inside one of the balls instead of continuing to find the closest ball (since they aren’t intersecting).
Use the square of the distance as a threshold, which can avoid the cost of calling sqrt. Adjust thresholds as needed.
Avoid calling render_balls when the coordinate is outside of any possible sphere position, such as the top and bottom area (i moved the balls to the center, but if you do have them just on top, then something like the lower 80% of pixels can’t have balls in it, so don’t check!)
You can offset the ball x,y coordinates by 0.5 in update_balls and avoid doing it each time inside the for loop in render_balls.

Untested ideas:

Divide and conquer. Split the world space into quadrants and pre-compute which quadrants a ball will intersect based on position and radius. Then you can reduce the scope of balls that render_balls needs to check. First, find the quadrant of the pixel (keep this relatively simple), then check fewer balls by either iterating and checking quadrant first (perhaps a bitfield test), or by precomputing an array of balls for each quadrant, reducing the number of items that needs to be iterated. You can subdivide more, trading off more upfront work.
since the balls are arranged regularly with some math, I think it would be possible to create a very similar effect without keeping an array of balls, but instead with some math on the coordinates themselves. like intersection of waves outward from the center, and one radially, with a period that creates n intersections.

spazzle · November 11, 2020, 2:15am

I think the issue it that the lines are space relatively far apart for the effect to work well. I actually liked it better in the cube render because the density of pixels is so much more uniform and small compared to the ball. It might work better if there were 32 spokes, but 16 is clearly too sparse for something like this. Better to have a more specifically designed pattern that explicitly takes the geometry into account.

spazzle · November 11, 2020, 2:22am

@wizard, The issue is that the rotating balls were simply a test to see that it turns on and runs. Ideally they would each be traversing a unique path through the 3d volume, and at that point some simplifications would break down.

But the removal of the sqrt does help, and just breaking out once one ball is found is good enough since it would simply just give priority to the lower numbers which is good enough.

I agree with the untested ideas, its probably best to just come up with an interesting functional form that only requires a single pass through every pixel.

Scruffynerf · November 11, 2020, 2:49am

I think the “roving sources” (aka numballs) is fine, it’s the massive calculation that bites you.

To flesh out how a cylindrical space would speed this up:

Numball 1 has a moving position. It’s at Radius, Angle, Height. We’ll assume all values are normalized to 0…1.

So the “stalk” consist of a number of near zero radius, climbing Z, and all manner of angles.

The “fronds” are all high Z, varying Radius, and all manner of angles,

Let’s assume Numball 1 (N1) affects pixels within .1 radius. That means when we iterate thru all pixels:

Is pixel radius within .1 units of of N1’s radius? if not, it’s too far.

Same for Z, more than .1 units away? Too far.

Angle, it depends on how big R is, for large R (over .3, ie not near the stalk), Pixels angle should be close to N1’s angle (though as R decreases towards stalk (For small R, you could be near more angles and at R = 0, any angle would be close enough. Math TBD.

If R and Z and Angle are all close enough, color pixel HSV relative to values.

We only calculate N1 once, (and N2, N3, etc)

and then we check each of all of the pixels for all 3 matches, failing any, fails N1, proceed to N2 check, N3, etc, then next pixel.

Alternatively, you could just describe the math and apply it all at once. But then you are doing all of the math on every pixel, so this way, any failure moves onward, we don’t waste time.

Hopefully that’s clear enough. I have to write it up better.

A prerender with access to 3d coordinates would help, especially if you wanted to use an history array for fading out or multiple sources