Glen Pitt-Pladdy :: Blog
PICing up 433MHz Signals for OSS Home Automation - Part 8
More bugs and gradual progress....

Bugs, bugs, and more bugs

Too quiet

Following my previous circuit mods a strange thing happened. I started arriving back home to find the RX LED off and the transceiver otherwise operating happily, other than not receiving any transmissions. This was somewhat unexpected after a mod that should make things better, and I mulled over the possibility of weird scenarios affecting the PIC hardware... that comes of being a mixed signal person who has debugged many a hardware problem relating to bizarre parasitics. For those who have ever tried working with original 555 timer designs driving grounded relays, you will know what I mean.... there's a reason why all the designs in the datasheet drive a +V connected relay! :-)

After putting in a load of debug code and applying a lot of patience, I exposed an unexpected bug. I have various sanity checks that will reset the transceiver into a sane state, and one of these disables the RF modules when the host is not responding. That is detected when unsent data has sat in the buffer for more than 10s. It also turns out that, to avoid the excessive overhead of sending a byte or two at a time, I buffer data until a minimum threshold is reached before sending.

The combination of the circuit mod (quieting down excess noise), the host timeout and the buffer send threshold meant that there was sometimes insufficient data in 10s to trigger the send, so the timeout got reached. At least that demonstrates the circuit mod is reducing edges that are not valid!

The solution has been to add a buffer flush timeout so that data that has been in the buffer a while gets flushed.

Too noisy(?)

That leaves us with the other problem of occasionally getting mangled data back from the transceiver. With debug code added for the "too quiet" bug, what I suspect are overruns have been much more common. The reason is that we now have status checks tripling the number of commands sent to the transceiver.
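As an aside on the "too quiet" fix, here is a minimal sketch of how a buffer flush timeout could sit alongside the send threshold and the 10s host timeout. This is not the actual firmware: the names `tx_state_t`, `tx_queue`, `tx_should_send`, `host_timed_out` and `tx_sent` are hypothetical, and so are the threshold and flush timeout values. Only the 10s host timeout comes from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; only the 10 s host timeout is from the post. */
#define SEND_THRESHOLD   8u      /* bytes buffered before a normal send (assumed) */
#define FLUSH_TIMEOUT_MS 1000u   /* flush data older than this (assumed) */
#define HOST_TIMEOUT_MS  10000u  /* unsent data for more than 10 s: host not responding */

typedef struct {
    uint16_t pending;    /* unsent bytes currently in the buffer */
    uint32_t oldest_ms;  /* time at which the oldest unsent byte was queued */
} tx_state_t;

/* Record that nbytes were queued at time now_ms. */
void tx_queue(tx_state_t *s, uint16_t nbytes, uint32_t now_ms)
{
    if (nbytes == 0) {
        return;
    }
    if (s->pending == 0) {
        s->oldest_ms = now_ms;  /* the age clock starts with the first byte */
    }
    s->pending += nbytes;
}

/* Send when the threshold is reached OR the data has waited too long. */
bool tx_should_send(const tx_state_t *s, uint32_t now_ms)
{
    if (s->pending == 0) {
        return false;
    }
    if (s->pending >= SEND_THRESHOLD) {
        return true;
    }
    /* Unsigned subtraction stays correct across timer wrap-around. */
    return (uint32_t)(now_ms - s->oldest_ms) >= FLUSH_TIMEOUT_MS;
}

/* Sanity check: unsent data older than the host timeout. */
bool host_timed_out(const tx_state_t *s, uint32_t now_ms)
{
    return s->pending != 0 &&
           (uint32_t)(now_ms - s->oldest_ms) > HOST_TIMEOUT_MS;
}

/* Call once the buffer has actually been sent to the host. */
void tx_sent(tx_state_t *s)
{
    s->pending = 0;
}
```

With only the threshold check, a couple of bytes arriving on a quiet channel would sit unsent until `host_timed_out` fired. As long as the flush timeout is shorter than the host timeout, small amounts of data get sent well before the sanity check can trip.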
This does raise the question of how common overruns are. As it turns out I already took care of this by having a rolling counter on each receive frame, and checking the logs suggests that there are no buffer overruns occurring. So, is this another problem, or is this the only case where overruns occur?

Some evidence here is that since fixing the "too quiet" bug there have also been far fewer of these corruptions. The fix reduces the amount of data in the buffer on average, which also makes overruns less likely. Not conclusive, but important nonetheless.

Then I captured a failure sequence, and we can see the frame counters (in bold) jump at the point where 2 bytes of garbage arrive:
0xff <- receive frame

So we now have conclusive evidence of missing data with an overrun. What we have is 36 missing bytes and 4 extras, so a net loss of 32 bytes. In other examples it's 56 bytes, 54 bytes or other amounts, so at least it's variable, which would be expected with the varying data rates.

The way the transceiver firmware is written, it should not attempt to buffer whole frames when there is no space, so it's overwhelmingly likely that this is happening on the host. I suspect a lot of it is down to the non-threaded read, which means that when any significant processing occurs there is a longer than normal delay in reading the buffer.

Threading things up

As more plugins are added there will be a higher processing overhead between reads. This has to be dealt with sooner or later, so now is likely a good time.

Due to limitations with Perl threads this is a little tricky. Perl variables are not shared between threads without some extra work, and even that has limitations, such as with references inside a thread-shared object. The only practical way of doing this within the limitations is to spin up a continuous read thread and then turn the existing read into grabbing chunks of the buffer filled by the read thread. That means that we should be able to buffer a lot more and avoid overruns.

I've also hit likely bugs with the Perl, threads and/or threads::shared versions I'm using, resulting in a SEGV or a glibc trap on exit. This doesn't happen if I don't clean up threads in a destructor.

Now that there is a lightweight read thread running continuously there should be far less scope for overruns, but the first error I've caught shows a mangled frame rather than an overrun (frame counters remained sequential). This is very interesting as it suggests there may be another bug at play, perhaps even with the unthreaded overruns being a consequence of it.

Update: Ouch!

As it turns out, yes there was another bug at play.
This is just classic programming stuff: a "+" instead of a "-" in the bounds checking calculation on buffers in the PIC code (transceiver firmware).

Along with that, the read thread means that we have a negligible chance of missing anything, and as a result I'm also seeing situations where insufficient data was read to get the ACK after all the received frames in the buffer. This is fixed by looping until we have cleared off anything that might be a received frame.

This has also highlighted another potential problem: if buffers are full then sending ACKs is skipped, and it would be better if it forced flushing the buffer instead.
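To illustrate how much damage that kind of sign slip in a bounds check can do, here is a hypothetical sketch. The post does not show the firmware source, so the buffer layout, the size `BUF_SIZE` and the names `frame_buf_t`, `frame_fits` and `frame_fits_buggy` are all assumptions made purely for illustration, not the actual PIC code.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUF_SIZE 64u  /* assumed buffer size, for illustration only */

typedef struct {
    uint8_t data[BUF_SIZE];
    uint8_t used;  /* bytes currently stored; invariant: used <= BUF_SIZE */
} frame_buf_t;

/* Intended check: a frame fits only if it is no larger than the free space. */
bool frame_fits(const frame_buf_t *b, uint8_t frame_len)
{
    unsigned free_space = BUF_SIZE - b->used;
    return frame_len <= free_space;
}

/* The same check with "+" where "-" was meant: the apparent free space
 * grows as the buffer fills, so frames are accepted when there is no room. */
bool frame_fits_buggy(const frame_buf_t *b, uint8_t frame_len)
{
    unsigned free_space = BUF_SIZE + b->used;
    return frame_len <= free_space;
}
```

With 60 of 64 bytes used, `frame_fits` rejects a 10 byte frame while `frame_fits_buggy` accepts it. That would let a frame be written past the available space only when the buffer is nearly full, which may be why a bug of this kind shows up as occasional mangled data rather than a constant failure.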
Copyright Glen Pitt-Pladdy 2008-2023