Programming for IO Expander

I am experimenting with the IO Expander board and the Text Scrolling Marquee pattern.
The first set has just a PB configured to run 512 LEDs in an 8x64 matrix configuration
(two 8x32 LED matrices connected in series).
The second set has a PB + IO Expander. The overall configuration is the same, except
that the two 8x32 LED matrices are connected to two separate channels on the IO Expander.
Both sets are running exactly the same pattern code and using exactly the
same Mapper code. Visually everything is identical and running just fine.

The reason for this experiment is to see what performance (FPS) improvement
can be achieved by using the IO Expander.
All timing measurements were done on the physical LED data line with a
digital oscilloscope.
This pattern refreshes the entire render buffer on every rendering
cycle in the beforeRender function.
Timing for the PB-only configuration is 29 ms per frame (15.6 ms for sending serial
data to the LEDs plus 13.4 ms for processing data in between frames).
Timing for the PB + IO Expander configuration is 17.6 ms per frame: 7.8 ms for
sending serial data to the LEDs (twice as fast, because all channels on the
IO Expander board run synchronously in parallel) plus 9.8 ms for data
processing in between frames (a bit faster, most likely because the serial driver
for the IO Expander is more efficient than the serial driver for the LEDs).

So the performance (FPS) is only 1.65 times better with the IO Expander than with the PB by itself.
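
As a quick check of that arithmetic, here is a minimal sketch in plain JavaScript (Node, not pattern code). It uses only the oscilloscope figures quoted above; the helper name fpsFromFrameMs is just for illustration.

```javascript
// Frame times measured on the LED data line (from the post above), in milliseconds.
const pbOnlyMs = 15.6 + 13.4;    // transmit + processing, about 29 ms
const expanderMs = 7.8 + 9.8;    // transmit + processing, about 17.6 ms

// Frames per second for a given frame period in milliseconds.
function fpsFromFrameMs(frameMs) {
  return 1000 / frameMs;
}

const pbOnlyFps = fpsFromFrameMs(pbOnlyMs);
const expanderFps = fpsFromFrameMs(expanderMs);
const speedup = pbOnlyMs / expanderMs;

console.log(pbOnlyFps.toFixed(1));   // 34.5
console.log(expanderFps.toFixed(1)); // 56.8
console.log(speedup.toFixed(2));     // 1.65
```

Note that the transmit time alone halves (15.6 ms to 7.8 ms); the overall gain is smaller because the processing time between frames shrinks much less.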

The physical serial link between the PB and the IO Expander runs at 2 MHz vs. 0.8 MHz for the
WS2812b LEDs (I am too lazy to calculate the logical speed difference).
Taking this into account, the performance increase could be about 2 times for the
same configuration.

But I keep reading posts where people achieved amazing performance
increases by adding an IO Expander.

My question is: what significant detail(s) am I missing?

Just a link to a detailed explanation of how to program the PB + IO Expander for
maximum performance would be OK.

@Vitaliy, my experience is that the expander board improves frame rates by lessening or eliminating communication speed from Pixelblaze to the LEDs as a bottleneck. There’s nothing magical going on, just a faster serial link from the Pixelblaze to the expander, and then the opportunity to send data from the expander to the LEDs in parallel. Just as you’ve already observed.

People who see the biggest frame rate improvements tend to be doing things like breaking their single long (for example, 1024 pixel) strips into (say, 4x256) sections on separate expander channels. Doing this can roughly double the effective frame rate, depending on how quickly the pattern being displayed renders pixels. (With the maximum limited, of course, by the 2 MHz serial data rate.)

Yes,
but the difference in wire serial speed is only 2.5 times (2.0 MHz / 0.8 MHz) compared to the WS2812b LEDs.
The logical speed difference could be even less, because you still have to send the same 24 bits
for each LED plus the Magic Packet plus the CRC. And with a UART you are sending 10 bits for
each byte. But yes, once all frames are transferred to the IO Expander, it will send out all
channels in parallel. Now, if you need to update, say, only 1 channel out of 8 (a very unlikely
case), it definitely will run faster. But most likely you will still need to update all channels.
In this case I don’t see how a single 2 MHz serial channel can improve overall performance
unless the IO Expander has some sort of built-in intelligence other than buffering and
synchronization.

I just tested with 1000 pixels of WS2812b – using the default “New Pattern” pattern, I get:

  • whole string from Pixelblaze - 25.9 fps
  • 5 channels of 200 pixels from expander - 62.75 fps

This seems logical - the Pixelblaze is able to send data to the expander in about half the time it would take to send directly to the LEDs, so it can get to work rendering the next frame while the expander is drawing.

Yes, I agree. An FPS increase of about 2x makes sense.
But I saw a few posts (sorry, I did not save the links) where people claimed
significant FPS improvements for multi-thousand-LED installations just by
adding the IO Expander. That didn’t make any sense.
And I started to think that maybe the IO Expander has built-in processing power.

The only case I can think of in which you might get an increase greater than 2x is if you’re driving multiple strips in parallel from the same expander channel. This means that not all pixels will be addressable.

But if you’ve built an object that uses symmetrical lighting on one or more axes, you can hook it up this way, tell Pixelblaze only about the LEDs you actually need to address, and frame generation would be very fast in comparison to the total number of LEDs.

Maybe this is what they’ve done. Otherwise, I think there is no way around the fact that fps is limited by LED count and data link speed.

Good point. I did not think about an application like this.
Another good reason to use the IO Expander board, even for a single channel,
is the HW implementation of the low-level LED driver. @wizard mentioned this in the description
of the IO Expander board. I am too lazy to check which MC is used and its spec, but apparently
it has HW blocks suitable for creating 8 LED driver channels.
Being an EE, I don’t like it when SW is messing with low-level HW protocols, especially
when timing is tight.

This will depend a lot on where the bottleneck is. Is it CPU or bandwidth? You can check bandwidth / driver by running a trivial pattern with different LED types (output drivers). You can check CPU only by setting No LEDs, which is basically a no-op driver that doesn’t send the data anywhere.

Compared to the WS2812 driver, which is limited to 800 kbps (or 33k pixels/sec at 24-bit color), the output expander is limited to 2 Mbps (10 bits per UART byte, minimal frame overhead), or about 66k pixels/sec.
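
Those two limits follow directly from the line rates. A minimal sketch in plain JavaScript, assuming 3 bytes per pixel and ignoring the expander's control frames and CRC (so the expander figure is an upper bound); the constant names are illustrative only:

```javascript
// Pixels per second each link can carry, using only the figures quoted above.
const BITS_PER_PIXEL = 24;        // 3 color bytes at 24-bit color

// WS2812: 800 kbps on the wire, one wire bit per data bit.
const ws2812PixelsPerSec = 800000 / BITS_PER_PIXEL;

// Expander UART: 2 Mbps, 10 wire bits per data byte.
// Control frames and CRC are ignored here, so this is an upper bound.
const UART_BITS_PER_BYTE = 10;
const BYTES_PER_PIXEL = 3;
const expanderPixelsPerSec = 2000000 / (BYTES_PER_PIXEL * UART_BITS_PER_BYTE);

console.log(Math.round(ws2812PixelsPerSec));   // 33333
console.log(Math.round(expanderPixelsPerSec)); // 66667
```

So although the wire clock is 2.5 times faster, the usable pixel rate is only about 2 times higher, because every byte costs 10 UART bits instead of 8.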

Of course, you can feed 66k pixels/sec to an expander and make it drive a single WS2812 output, which is still limited to 33k pixels/sec. In that case the expander will skip frames when it gets a draw command while it is still drawing. Instead, use 2 or more outputs to increase the total bandwidth to the pixels.

It could double performance which is pretty amazing! You can run an absurd number of total pixels, but the bandwidth is still limited. Still, running 2,500 WS2812 LEDs directly on a single string is going to be a lot slower than running them via an expander to multiple strips (2 or more) for the same number of LEDs.

Take this trivial pattern for example:

export function render(index) {rgb(0,0,0)}

Disable live preview (reduce overhead of UI slightly).

With “No LEDs” driver and 2,500 pixels, I get 138.45 FPS. Here the CPU is the bottleneck since bandwidth to null is infinite. Pixelblaze is rendering 346,125 trivial pixels per second. Put another way, it takes 7.22ms of CPU time to generate 2,500 black pixels.

With 2,500 WS2812 pixels, I get 12.09 FPS. Here the bandwidth is the bottleneck, and rendering doesn’t run in parallel while data is sent. That is to say, rendering is stalled until data transmission is complete. That’s 30,225 pixels/sec, fairly close to the theoretical bandwidth limit. Put another way, that’s 82.7 ms per frame, and we know 7.22 ms of that is rendering time. If it had run in parallel, we might get 13.25 FPS or 33,115 pixels/sec.

With it set to Output Expander and 5 channels of 500 pixels, I get 25.95 FPS. The bandwidth is doubled, and rendering can happen in parallel with data transmission. Rendering buffers pixels to the UART (up to 256 bytes), and continues rendering the next pixel unimpeded (unless the buffer is full). This is fully 64,875 pixels/sec, very close to the theoretical bandwidth limit of 66k pixels/sec.
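
These measurements fit a very simple frame-time model. The sketch below is plain JavaScript and is only a simplification, assuming the direct WS2812 case is strictly serial (render, then transmit) and the expander case is fully overlapped; the function names are hypothetical.

```javascript
// Simple frame-time model. All inputs come from the measurements quoted above;
// the model itself (serial vs. overlapped) is a simplification.
const PIXELS = 2500;
const renderMs = 1000 / 138.45;   // "No LEDs" run: about 7.22 ms per frame

// Time to push one frame over a link with the given pixel rate.
function transmitMs(pixels, pixelsPerSec) {
  return (pixels / pixelsPerSec) * 1000;
}

// Direct WS2812: rendering stalls during transmission, so the times add up.
function fpsSerial(rMs, txMs) {
  return 1000 / (rMs + txMs);
}

// Expander: rendering overlaps transmission, so the slower of the two dominates.
function fpsOverlapped(rMs, txMs) {
  return 1000 / Math.max(rMs, txMs);
}

const directFps = fpsSerial(renderMs, transmitMs(PIXELS, 800000 / 24));
const expanderFps = fpsOverlapped(renderMs, transmitMs(PIXELS, 2000000 / 30));

console.log(directFps.toFixed(2));   // about 12.16 (measured: 12.09)
console.log(expanderFps.toFixed(2)); // about 26.67 (measured: 25.95)
```

The small gaps to the measured values are consistent with the control frame and CRC overhead, which this model leaves out.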

The overhead of control frames and CRC is fairly low, but there is some.

If I increase this to 6,400 pixels by setting up the expander for 8x800 pixels, I get 10.3 FPS or 65,920 pixels/sec. The difference here is there’s less overhead sending control frames to the output expander relative to the pixel data. But it’s only a 1.6% difference in throughput.

A small control frame is sent for every channel so that the expander knows to disable that channel. The driver makes no assumptions about the internal state of the expander from previous render frames.

Right, you could see for yourself.

Other than parsing the input data and setting things up, the output data runs without the CPU by using hardware timers + DMA to the GPIO directly. This frees up the CPU to parse incoming packets while a draw is happening concurrently.

This bandwidth math should be in the documentation, and next time someone asks about 2000+ pixels on a PB, we can point to it and say “if a Max of 12fps at 2500 pixels is ok with you, go for it, if not, an output expander could give you max 25fps, and if you need better than that, you need to consider multiple PBs.”

We absolutely have had people who refused to listen to what the limits are, or wanted to do things we know they won’t be happy with.

Yes, I did read this a few times.

And this is very clear, even without digging into the details of which MC HW is used and how.

Bottom line:
PixelBlaze plus IO Expander is an amazingly nice product.
Thanks to @zranger1, I discovered the power of PixelBlaze for controlling addressable LEDs.
And again thanks to @zranger1, PixelBlaze is integrated with the HE Home Automation Controller.
I have already finished (sort of) my Balcony Lighting project.
Now, thanks to @jeff, I am almost done with the Scrolling Text Message project.
For this project I will use the IO Expander because I am using two 8x32 matrices.
My initial test showed only very minor improvements in performance, but this is attributable
to the pattern itself. So I will try to optimize the pattern.
Plus, the plan is to replace all existing RGB LED strips with PixelBlaze.
Every one of my LED-related projects must be integrated with the HE HA controller.

My biggest problem in this area is that all these projects are 95% SW projects.
I am well behind on all these high-level programming languages such as
Java, Groovy, Python, etc.
My Scrolling Text Message project requires me to write a custom
application for the HE because of some limitations of the PB WebSocket API
and the limited capabilities of the HE device driver (probably because of
the PB WebSocket API limitations).
So I have to learn a lot in the SW development area.

Anyway, a very BIG thank you to all of you who are helping with my adventures.

I am not one of these guys.
I was very surprised to read something like: 1000+ FPS with 1000+ LEDs.

I haven’t seen anyone claim anything like that. Not 1000 pixels and 1000 FPS at the same time. Pixelblaze can do 1000+ FPS, but only for a small number of pixels.

I am sorry, but I do not remember where I saw this.
What is worse, I cannot find it by searching the forum or even Google.
But I did see something like this and it caught my eye.
I am sorry again if this turns out to be fake info.

If you find it again, let me know please!

I’m glad you are happy with the PB + expander, and hope I could answer your questions about performance in a way that made sense.

Out of curiosity, are you looking to start a project that would need good FPS with several thousand LEDs? At some point I’d like to make multiple PB seamlessly work together for larger projects.

Certainly, I will let you know if I find this again.

Thank you very much.
You already explained everything in detail.
Plus, I am an EE and I can easily calculate all the timings, etc.

I started this thread because I saw that fancy claim about superior performance
just by adding the IO Expander. This immediately made me wonder how this is possible.
The only way for this to happen is if the IO Expander has built-in intelligence.
In that case, just sending a command to do something would definitely be a huge
performance improvement. But I did not find any reference to or evidence of any
commands other than DrawAll() for synchronization and Channel Init.
I thought I might be missing a very important set of commands.
Of course, this was not the case.
So please excuse me for bothering you with stupid questions.

So far I am (almost) OK with the performance.
My Balcony Lighting project (complete) does not require a high FPS.
My Scrolling Text Message Board project (still in progress) runs a bit slower than desired
(my wife asked me to speed it up: Wife Acceptance Factor).
I guess this is because almost all the processing (shuffling relatively big buffers around) is
done in the beforeRender function. My idea for speeding things up is to create an
oversized buffer for the entire message, fill it once, and then simply change the pointer(s)
for scrolling.
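
The oversized-buffer idea could look roughly like the sketch below. It is plain JavaScript that models the logic rather than a finished pattern, and it assumes a hypothetical layout in which each message column is stored as an 8-bit mask (bit r set means row r is lit). In a pattern, advanceOffset would be called from beforeRender and isLit from the render function; all names here are illustrative.

```javascript
// Build the oversized column buffer once, whenever the message changes.
// glyphColumns: one 8-bit mask per column of the rendered text (hypothetical layout).
function buildMessageColumns(glyphColumns, gapCols) {
  const cols = glyphColumns.slice();
  for (let i = 0; i < gapCols; i++) cols.push(0); // blank gap before wrap-around
  return cols;
}

// Per frame: only the scroll offset changes; nothing is copied.
function advanceOffset(offset, messageCols) {
  return (offset + 1) % messageCols.length;
}

// Per pixel: look up the source column through the offset.
function isLit(messageCols, offset, displayCol, row) {
  const src = (offset + displayCol) % messageCols.length;
  return ((messageCols[src] >> row) & 1) === 1;
}

// Tiny usage example: two glyph columns followed by a two-column gap.
const demoCols = buildMessageColumns([0b00000001, 0b10000000], 2);
let demoOffset = 0;
console.log(isLit(demoCols, demoOffset, 0, 0)); // true
demoOffset = advanceOffset(demoOffset, demoCols);
console.log(isLit(demoCols, demoOffset, 0, 7)); // true
```

With this approach the per-frame work in beforeRender shrinks to one addition and one modulo, and the cost moves to a cheap lookup per pixel.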

Surprisingly, my wife was very impressed with the power of the PB.
She has already asked me for a few more projects.
So I have already ordered more LED rings, matrices, single pixels and even two curtains
from AliExpress (a source I don’t like). I will definitely need a few more PBs + IO Expanders,
but at this time I am not sure how many to order.

So far none of these projects will require a very high FPS, but who knows, appetite comes with eating.

My biggest problem with all these projects is the SW.
All my projects absolutely must be integrated with the HE HA hub.
I am glad the major components for the integration already exist and work very well.
But a few things are missing or not yet supported.
For instance, the PB WebSocket API does not support variables other than decimals.
I can update/replace an entire array but have not figured out how to update just
specific array element(s).
On the HE side it is next to impossible to create string formatting
compatible with the PB API.
Unfortunately, my Scrolling Text project requires me to create a few missing
SW components or to find a reasonable workaround with what is already available.
It looks like I will be very busy for a while learning all these SW tricks.

I’m not sure I follow. What kind of variables would you want to update? setVars supports a number or an array of numbers, which can be integers or floats. In JSON this is usually written in decimal form, but I think you can also use exponent notation like 123456e-3 instead of 123.456.

Right. The JSON syntax would have to be much more complex to support selectively updating part of an array.

It’s more software, but you could implement a command + response API using these primitives in order to achieve more complex behaviors. For example, you could have a “command” var and an “args” array of some predefined length. On the pattern side, check the command var, and run an appropriate handler function, passing along some args, then clear the command var. To send a command, wait for the old command to be cleared, then setVars to set the command and args. To add a reply/response, add another command code the client checks for, and use the args as return values.
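
A minimal sketch of that command + response idea, written as plain JavaScript so the logic can be run and checked outside a Pixelblaze. The variable names (command, args, message), the command codes and the choice to reuse a "done" code as the reply are all assumptions for illustration; in a real pattern, command and args would be exported variables and handleCommand would be called from beforeRender.

```javascript
// Pattern-side state (hypothetical names; exported variables in a real pattern).
let command = 0;                          // 0 means "no command pending"
let args = [0, 0, 0, 0];                  // fixed-length argument / return-value array
const message = new Array(16).fill(32);   // message buffer, filled with spaces

const CMD_NONE = 0;
const CMD_SET_CHAR = 1;                   // args[0] = position, args[1] = character code
const CMD_DONE = 100;                     // reply code the client can poll for

// Called once per frame: run a pending command, then mark it as handled.
function handleCommand() {
  if (command === CMD_NONE || command === CMD_DONE) return;
  if (command === CMD_SET_CHAR && args[0] >= 0 && args[0] < message.length) {
    message[args[0]] = args[1];           // update a single array element
    args[0] = message.length;             // example return value
  }
  command = CMD_DONE;                     // signal completion to the client
}

// Client side: JSON text to send once the previous command has been handled.
function buildSetVars(cmd, cmdArgs) {
  return JSON.stringify({ setVars: { command: cmd, args: cmdArgs } });
}

console.log(buildSetVars(CMD_SET_CHAR, [2, 84, 0, 0]));
// {"setVars":{"command":1,"args":[2,84,0,0]}}
```

This gives per-element array updates without changing the JSON syntax: the whole args array is still replaced in one setVars call, but the pattern decides which element of its own buffer to touch.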

Ooh, so if we passed an array of ASCII values in, created by parsing a word into characters, it could do “text-y” stuff?

@vitaliy (and other Hubitat users),

Previously, I did not support directly setting arrays from the user-accessible Hub API.

However, because this nice use case just appeared, I’ve posted a new version of the driver to the experimental branch of my repo that does support this, via the “setVariables” command. (It will appear as a new button on the devices page, and in RM.)

To set a variable or variables with this API, enter a correctly formatted JSON string (according to Pixelblaze documentation) just as you would send to the websockets API. For example, to change color temperature to 6000k on one of my house fixtures, I send it this string:

{"colorTemp" : 60}

OK,
Say I want to display the message ["Test "].
Obviously I cannot pass an ASCII string to the PB.
Also, I cannot use HEX representation for the ASCII characters.
In other words, the array [0x54, 0x65, 0x73, 0x74, 0x20] also cannot be passed to the PB.
The only way to pass this message to the PB is to use Decimal (not Number)
representations of the ASCII characters.
I.e., only the array [84.0, 101.0, 115.0, 116.0, 32.0] works.
On the HE side I have no problem encoding (but only manually) a static
message for the PB in decimal character representation.
This is already working all the way from the HE to the PB.
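
The manual encoding step can be automated. The sketch below shows the idea in JavaScript for illustration only; on the HE side the same logic would have to be written in Groovy. The variable name message is a hypothetical exported array in the pattern. JSON has no hex literals, which would explain why 0x54 is rejected; whether the whole HE-to-PB chain accepts a bare 84 as readily as 84.0 is not confirmed here.

```javascript
// Convert text to an array of character codes (plain numbers, valid JSON).
function toCharCodes(text) {
  return Array.from(text, (ch) => ch.charCodeAt(0));
}

// Wrap the codes in a setVars payload; "message" is a hypothetical
// exported array variable in the pattern.
function buildMessagePayload(text) {
  return JSON.stringify({ setVars: { message: toCharCodes(text) } });
}

console.log(toCharCodes("Test "));         // [ 84, 101, 115, 116, 32 ]
console.log(buildMessagePayload("Test ")); // {"setVars":{"message":[84,101,115,116,32]}}
```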

Now say I want to display something like this: “Temperature is 72.0 F”
This message has two static parts, “Temperature is ” and “ F”, and
a Decimal (coming from the temperature sensor) in the middle.
Unfortunately, there is no easy way on the HE side to compose
the above message as a single piece.
I guess on the PB side I have to create the array for the entire message
with a placeholder for the temperature and a dedicated exported var for the
actual temperature value. Then simply replace the placeholder with the actual value
in the beforeRender function.
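
A sketch of that placeholder approach, in plain JavaScript so it can be run and checked outside the PB. The field width, the one-decimal format and all names (tempMessage, writeTemperature, and so on) are assumptions for illustration. The digit extraction uses only arithmetic, since the number arrives as a plain value rather than as text.

```javascript
// "Temperature is " + placeholder + " F", stored as character codes.
const PREFIX = "Temperature is ";
const SUFFIX = " F";
const FIELD_WIDTH = 5;            // room for values like "72.0" or "-12.5"
const FIELD_POS = PREFIX.length;  // index of the first placeholder character

// Built once; in a pattern this would be a literal array of numbers.
const tempMessage = Array.from(PREFIX + " ".repeat(FIELD_WIDTH) + SUFFIX,
                               (ch) => ch.charCodeAt(0));

// Write value with one decimal place into the placeholder, right-aligned.
// Values too wide for the field lose their leading digits.
function writeTemperature(buf, pos, width, value) {
  const negative = value < 0;
  let tenths = Math.round(Math.abs(value) * 10);
  let i = pos + width - 1;
  buf[i--] = 48 + (tenths % 10);             // fractional digit
  tenths = Math.floor(tenths / 10);
  buf[i--] = 46;                             // '.'
  buf[i--] = 48 + (tenths % 10);             // units digit is always written
  tenths = Math.floor(tenths / 10);
  while (tenths > 0 && i >= pos) {           // remaining integer digits
    buf[i--] = 48 + (tenths % 10);
    tenths = Math.floor(tenths / 10);
  }
  if (negative && i >= pos) buf[i--] = 45;   // '-'
  while (i >= pos) buf[i--] = 32;            // pad the rest with spaces
}

writeTemperature(tempMessage, FIELD_POS, FIELD_WIDTH, 72.0);
console.log(String.fromCharCode(...tempMessage));
// "Temperature is  72.0 F" (right-aligned in a 5-character field)
```

With this layout the HE only has to send one number (the temperature), and the static text never has to be composed on the HE side.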

Yes.
But at this time SW is my Achilles heel.

It looks like I am trying to use the PB in a way it was not designed for.
But I am happy it works!
Plus the WAF rating is very high (not all of my HA projects earned this; a few were rejected).
And I am having a lot of fun learning how to program the PB and/or HE.
(HE is far more complex in terms of programming, at least to my eyes.)
I really appreciate all the very valuable help I am getting from the PB community,
and I am open to all ideas, but please don’t spend your valuable time on my
project specifics.

Wow!
Very BIG Thank You, @zranger1
I will try this shortly and let you know the results.

I guess I created a use case for unusual PB projects.