Glen Pitt-Pladdy :: Blog

PICing up 433MHz Signals for OSS Home Automation - Part 6

Things have progressed gradually (been busy at work, putting major projects live etc.) but this project is now reaching the point it's becoming robust enough for regular use. The firmware has hardly changed other than a few minor things (see below), but the host/server side has seen some more significant changes.

Hook Script Fail

Previously I was having a bizarre problem where, on executing the hook scripts (after a fork() to allow the main process to continue), communications with the Transceiver stalled for no apparent reason. I had worked out a work-around which seemed to solve the problem: re-opening the device a second time.

I had a disk controller failure in my home server and consequently did a major hardware upgrade, from an ancient AMD processor I had been hanging on to for its ultra-low power consumption to a current i3 which, paired with the right mobo, actually uses about 30W less power! The result was a massive increase in speed, and the problem re-appeared. There is obviously some kind of race going on.

Reading the Perl docs on fork(), I found they note that the way Perl does things (it's not a pure POSIX fork() that Perl is doing) can, on some systems (Linux not mentioned), result in file descriptors getting corrupted on exit(), and that calling POSIX::_exit() is needed on affected systems.

As it turns out, that seems to solve things: evidently with some devices Linux is also affected, and the file descriptors were getting mangled.
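The same hazard is easy to reproduce outside Perl. Here's a minimal Python sketch (Python's os.fork()/os._exit() map directly onto the underlying POSIX calls) of why a forked child should leave via _exit() rather than a normal exit when it shares buffered file handles with the parent:

```python
import os
import tempfile

# A buffered file handle shared across fork(): if the child exits "normally",
# interpreter shutdown flushes the inherited buffer a second time, corrupting
# the stream -- the same class of problem the hook scripts were hitting.
fd, path = tempfile.mkstemp()
os.close(fd)
f = open(path, "w")
f.write("once")            # sits in the userspace buffer, not yet on disk

pid = os.fork()
if pid == 0:
    # child: do the hook-script work here, then leave WITHOUT running
    # interpreter cleanup -- os._exit() skips the duplicate buffer flush
    os._exit(0)            # a plain sys.exit(0) here could write "once" twice

os.waitpid(pid, 0)
f.close()                  # parent flushes exactly one copy
print(open(path).read())   # -> once
```

With a normal exit in the child, both copies of the buffer can get flushed through the shared file descriptor, which is exactly the kind of mangling that was stalling the Transceiver link.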

That alone has made normal "passive" listening completely reliable now and the system has run 24x7 for weeks with no problems.

Interference & Transmit Fail

Previously I described crude modifications to the 433MHz Receiver to reduce the excessive noise in idle periods which was causing over-runs. That certainly helped a lot and made it possible to reliably decode the data.

It turns out that since upgrading my main workstation, it's producing loads of interference that the 433MHz module is picking up. The result is that while the workstation is on, the buffers have significant queues, and when triggering a transmit (eg. to turn a light on) it does sometimes get itself in trouble and ends up with the data stream mangled.

What isn't clear at this stage is why. The normal running state is receiving to pick up and process all the sensors (power, temperature, humidity, remotes etc.) and to make a transmission the following is done:

  1. Put the Transceiver in "idle" (stop receiver) which in turn returns a single byte acknowledge
  2. Configure the encoding for the transmission
  3. Load the data and set transmission to occur however many times is required (open-loop protocols so we have to transmit enough times to have confidence in the actor responding)
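The three steps above can be sketched roughly as follows; note that the opcodes ('I', 'E', 'T'), the ACK byte and the framing here are purely illustrative assumptions, not the real Transceiver protocol:

```python
# Illustrative sketch of the transmit sequence; opcodes, ACK byte and framing
# are assumptions for demonstration, not the real command set.
class TransmitError(IOError):
    pass

def transmit(port, encoding, frame, repeats):
    """port is anything with write()/read(), e.g. a pyserial handle."""
    # 1. stop the receiver; the device answers with a single ACK byte
    port.write(b'I')                    # hypothetical "idle" opcode
    if port.read(1) != b'A':            # hypothetical ACK byte
        raise TransmitError("no ACK entering idle mode")
    # 2. configure the encoding for this transmission
    port.write(b'E' + bytes([encoding]))
    # 3. load the data and repeat the transmission; the protocols are
    # open-loop, so repeating gives confidence the actor actually responds
    port.write(b'T' + bytes([repeats, len(frame)]) + frame)

# stub transport standing in for the serial device
class StubPort:
    def __init__(self):
        self.tx = b''
    def write(self, data):
        self.tx += data
    def read(self, n):
        return b'A'                     # pretend the device always ACKs

port = StubPort()
transmit(port, encoding=3, frame=b'\x12\x34', repeats=5)
```

Step 1 is where things go wrong under interference: the single-byte ACK is easily lost or buried in queued receive data.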

At step 1 there seem to be a number of different things that can fail when there is a lot of interference (and a lot of receive traffic):

  • Received data not-yet read by the host process gets mangled hence it can't be decoded and fails due to invalid communications with the Transceiver
  • No acknowledge byte is received although the Transceiver clearly goes into idle mode
  • With the above often the serial link goes dead and requires re-connecting the USB to get it working again... it even appears the main loop in the PIC gets stuck - see below.
  • A huge amount of received data comes in and the safety limit for the amount of data buffered is tripped

I have some suspicions here that the transmission on a busy device is tripping up the PIC and/or Linux USB-serial subsystem resulting in a buffer overrun outside of my software.

The approaches I've looked at to solve this are:

  • Read (completely clear) the buffers before changing modes - this minimizes the chance of over-runs during mode-switch. Currently this is implemented and seems to be doing an acceptable job since the data rates are actually quite low and the real enemy is stopping long enough for buffers to fill.
  • The previous circuit modification was crude and with an extra resistor and with a change of capacitor I could make this into a much more robust circuit which will have better immunity to interference. It's tempting, but so far I've left this since I want to make the firmware and host software robust first and improving noise immunity takes away my test case.
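The buffer-clearing approach amounts to something like this (a sketch: in_waiting/read() follow the pyserial API, and the hand-off to the decode path is assumed):

```python
# Sketch of draining pending receive data before a mode switch, so a burst of
# interference can't leave queued data to overflow buffers while the device
# is being reconfigured. in_waiting/read() follow the pyserial API.
def drain(port, handle_data):
    """Read and process everything already buffered before changing modes."""
    while port.in_waiting:              # bytes pending in the driver buffer
        data = port.read(port.in_waiting)
        handle_data(data)               # hand off to the normal decode path

# stub transport for demonstration
class StubPort:
    def __init__(self, pending):
        self.buf = pending
    @property
    def in_waiting(self):
        return len(self.buf)
    def read(self, n):
        out, self.buf = self.buf[:n], self.buf[n:]
        return out

got = []
port = StubPort(b'\x01\x02\x03')
drain(port, got.append)                 # buffers empty before the mode switch
```

Since the data rates are low, emptying the queue just before the mode switch leaves the largest possible margin before the buffers can fill again.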

PIC dead-lock

One of the above problems is clearly some kind of firmware failure: I worked out that there was some kind of dead-lock condition in the PIC where I could prove the main loop had stopped, yet interrupts were clearly still working. Since this would only happen very rarely (typically 2-3 times a week) it's been extremely difficult to debug.

After weeks of adding and refining debug code each time it locked, I've finally established the cause. During all commands I disable interrupts, to prevent received data (handled in the ISR) from being merged with command responses, which would corrupt them. What I didn't take into account is that interrupts do still occur while disabled - the flag latches, just the ISR doesn't get run... until, that is, I re-enable interrupts and the ISR runs for the interrupt that occurred earlier; due to the circumstances having changed, that interrupt isn't properly handled. The end result is that the ISR re-triggers permanently, stopping the main loop.

This one was easy to fix - just clear the interrupt flag after any command that changes the status of the device. That should avoid deadlocks of this type.
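A toy model of the failure and the fix (plain Python standing in for the PIC's interrupt logic; the register names and behaviour here are illustrative, not real PIC register semantics):

```python
# Toy model of the latched-interrupt deadlock; GIE and the peripheral flag
# are modelled as plain booleans purely to illustrate the mechanism.
class Pic:
    def __init__(self):
        self.mode = "rx"        # device is normally receiving
        self.gie = True         # global interrupt enable
        self.flag = False       # interrupt flag: latches even while GIE is
                                # clear - only the ISR itself is deferred
        self.isr_entries = 0

    def isr(self):
        self.isr_entries += 1
        if self.mode == "rx":
            self.flag = False   # normal case: service the receive, clear flag
        # otherwise the handler doesn't recognise the (stale) source, the
        # flag stays set, and the ISR re-enters forever, starving main()

    def step(self):
        # one "hardware" cycle: deliver any pending interrupt
        if self.gie and self.flag:
            self.isr()

    def command(self, clear_stale_flag):
        self.gie = False        # commands run with interrupts disabled
        self.flag = True        # data arrives mid-command: the flag latches
        self.mode = "idle"      # the command changes the device state
        if clear_stale_flag:
            self.flag = False   # the fix: discard the now-invalid interrupt
        self.gie = True         # re-enabling lets any still-set flag fire

stuck = Pic()
stuck.command(clear_stale_flag=False)
for _ in range(5):
    stuck.step()                # ISR re-enters every cycle: main loop starved

fixed = Pic()
fixed.command(clear_stale_flag=True)
for _ in range(5):
    fixed.step()                # stale interrupt discarded: main loop runs
```

Without the fix the ISR fires on every cycle for an event the main loop can no longer service; with it, the stale flag is simply discarded before interrupts come back on.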

PIC Watchdog

I'd been wondering about this. Many USB devices do seem to get themselves into unusable states occasionally and that's not a good thing, especially when I might be relying on this to do something for me while I'm away from home.

I've bitten the bullet with this and implemented a watchdog ping from the host which if communications are lost will completely reset the PIC so that we get the best chance of being able to re-connect and recover the situation. This is simply using the built-in PIC watchdog timer, but resetting it on demand from the host rather than from the PIC itself. That closes the loop around the whole system so if any part fails we reset everything to a sane state.

This is working well so far with signal handling added to disable the WDT again on a normal exit of the host software.
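The host side of this can be sketched as below; the 'W' (kick) and 'w' (disarm) opcodes are illustrative assumptions, not the real command set:

```python
import signal
import sys

# Host-side watchdog handling sketch; opcodes are assumed for illustration.
def kick_watchdog(port):
    port.write(b'W')            # resets the PIC's built-in WDT; if these stop
                                # arriving, the WDT expires and the PIC resets

def disarm_watchdog(port):
    port.write(b'w')            # disable the WDT again for a clean shutdown

def install_exit_handlers(port):
    # mirror the signal handling in the host software: a normal exit must
    # disarm the WDT, or the PIC would reset shortly after we disconnect
    def on_exit(signum, frame):
        disarm_watchdog(port)
        sys.exit(0)
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, on_exit)

# in its main loop the host would call kick_watchdog() periodically,
# alongside the normal receive processing
class StubPort:
    def __init__(self):
        self.tx = b''
    def write(self, data):
        self.tx += data

port = StubPort()
install_exit_handlers(port)
kick_watchdog(port)
kick_watchdog(port)
disarm_watchdog(port)
print(port.tx)   # -> b'WWw'
```

Driving the WDT reset from the host rather than the PIC is what closes the loop: a hang anywhere along the chain (firmware, USB-serial, host process) stops the kicks and forces a full reset.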

What next?

I've been brave enough to screw the cover back on (stopping easy access to the firmware update switch) and now I'm going to be looking at making the client tools (that talk to the host/server process over the network) production ready, as well as making the server a real daemon. I need to add authentication and config files, and possibly also look at making the client side of things more reusable (eg. move it out into its own class) so that other home automation tools can use it to integrate with this.

