
PICing up 433MHz Signals for OSS Home Automation - Part 6

Things have progressed gradually (I've been busy at work, putting major projects live etc.), but this project is now reaching the point where it's robust enough for regular use. The firmware has hardly changed other than a few minor things (see below), but the host/server side has seen some more significant changes.

Hook Script Fail

Previously I was having problems with a bizarre situation where, on executing the hook scripts (after a fork() to allow the main process to continue), communications with the Transceiver stalled for no apparent reason. I had worked out a workaround which seemed to solve the problem: re-opening the device a second time.

I had a disk controller failure in my home server and consequently did a major hardware upgrade, from an ancient AMD processor I had been hanging on to for its ultra-low power consumption to a current i3 which, paired with the right mobo, actually uses about 30W less power! The result was a massive increase in speed, and the problem re-appeared. There is obviously some kind of race going on.

Reading the Perl docs on fork(), I found they note that, because of the way Perl does things (it's not a pure POSIX fork() that Perl is doing), on some systems (Linux not mentioned) file descriptors can get corrupted on exit(), and that calling POSIX::_exit() is needed on affected systems.

As it turns out, that seems to solve things. Evidently, at least with some devices, Linux is also affected and the file descriptors were getting mangled.
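
The pattern now looks something like the rough Perl sketch below (the run_hook() name and arguments are just for illustration, not the actual code): fork, run the hook script in the child, and leave the child via POSIX::_exit() so Perl's normal exit-time cleanup can't touch the file descriptors shared with the parent.

    use strict;
    use warnings;
    use POSIX ();

    # Hypothetical helper: the script path and arguments come from the caller.
    sub run_hook {
        my ( $script, @args ) = @_;
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: run the hook script, then leave via POSIX::_exit() so
            # Perl's exit-time cleanup never touches the handles (e.g. the
            # Transceiver device) shared with the parent.
            system( $script, @args );
            POSIX::_exit( $? >> 8 );
        }
        return $pid;    # parent carries on talking to the Transceiver
    }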

That alone has made normal "passive" listening completely reliable, and the system has now run 24x7 for weeks with no problems.

Interference & Transmit Fail

Previously I described crude modifications to the 433MHz Receiver to reduce the excessive noise in idle periods which was causing over-runs. That certainly helped a lot and made it possible to reliably decode the data.

It turns out that since upgrading my main workstation, it's producing loads of interference that the 433MHz module is picking up. The result is that while the workstation is on, the buffers have significant queues, and when triggering a transmit (eg. to turn a light on) it does sometimes get itself in trouble and ends up with the data stream mangled.

What isn't clear at this stage is why. The normal running state is receiving, so as to pick up and process all the sensors (power, temperature, humidity, remotes etc.), and to make a transmission the following is done (sketched after the list):

  1. Put the Transceiver in "idle" (stop receiver), which in turn returns a single acknowledge byte
  2. Configure the encoding for the transmission
  3. Load the data and set the transmission to occur however many times is required (these are open-loop protocols, so we have to transmit enough times to have confidence in the actor responding)
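
To make the ordering concrete, here is a rough host-side Perl sketch. The command names and the send_command()/read_ack() helpers are hypothetical stand-ins for the real serial protocol, which isn't reproduced here; treat it as an illustration of the sequence, not the actual implementation.

    # send_command() and read_ack() are hypothetical stand-ins for the real
    # serial I/O; the command names are made up for illustration.
    sub transmit {
        my ( $port, $encoding, $data, $repeats ) = @_;

        # 1. Put the Transceiver into "idle" (stop the receiver) and wait
        #    for the single acknowledge byte.
        send_command( $port, 'IDLE' );
        read_ack( $port ) or die "no acknowledge after idle request\n";

        # 2. Configure the encoding for this transmission.
        send_command( $port, 'SET_ENCODING', $encoding );

        # 3. Load the data and transmit it the required number of times
        #    (open-loop protocol, so repeats give confidence the actor
        #    actually responds).
        send_command( $port, 'LOAD_DATA', $data );
        send_command( $port, 'TRANSMIT', $repeats );
    }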

At step 1 there seem to be a number of different things that may fail when there is a lot of interference (and a lot of receive traffic):

  • Received data not yet read by the host process gets mangled, hence it can't be decoded and the host fails due to invalid communications with the Transceiver
  • No acknowledge byte is received although the Transceiver clearly goes into idle mode
  • With the above the serial link often goes dead and requires re-connecting the USB to get it working again... it even appears the main loop in the PIC gets stuck (see below)
  • A huge amount of received data comes in and the safety limit for the amount of data buffered is tripped

I have some suspicions here that transmitting on a busy device is tripping up the PIC and/or the Linux USB-serial subsystem, resulting in a buffer overrun outside of my software.

The approaches I've looked at to solve this are:

  • Read (completely clear) the buffers before changing modes - this minimizes the chance of over-runs during the mode switch. Currently this is implemented (see the sketch after this list) and seems to be doing an acceptable job, since the data rates are actually quite low and the real enemy is stopping for long enough for the buffers to fill.
  • The previous circuit modification was crude, and with an extra resistor and a change of capacitor I could make this into a much more robust circuit with better immunity to interference. It's tempting, but so far I've left this since I want to make the firmware and host software robust first, and improving the noise immunity takes away my test case.
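
As a rough illustration of the buffer-clearing approach, here is a minimal Perl sketch. The drain_receive() name, the $port handle and the $process callback are hypothetical; it simply assumes the serial device is open as a normal filehandle and reads whatever is already waiting before a mode change is requested.

    use strict;
    use warnings;
    use IO::Select;

    # Read everything already buffered from the Transceiver before asking it
    # to change mode; $process is a callback that decodes/queues the data.
    sub drain_receive {
        my ( $port, $process ) = @_;
        my $sel = IO::Select->new( $port );
        while ( $sel->can_read( 0 ) ) {                 # anything waiting?
            my $got = sysread( $port, my $chunk, 4096 );
            last unless $got;                           # nothing left (or EOF)
            $process->( $chunk );
        }
    }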

PIC dead-lock

One of the above problems is clearly some kind of firmware failure: I worked out that there was some kind of dead-lock condition in the PIC where I could prove the main loop had stopped, yet interrupts were clearly still working. Since this would only happen very rarely (typically 2-3 times a week) it's been extremely difficult to debug.

After weeks of adding and refining debug code each time it locked, I've finally established what the cause was. As it turns out, during all commands I disable interrupts so that data received in the interrupt can't get merged with command responses, which would corrupt them. What I didn't take into account is that interrupts do still occur, it's just that the ISR doesn't get run... until, that is, I re-enable interrupts and the ISR runs for the interrupt that occurred previously, and due to the changed circumstances that interrupt isn't properly handled. The end result is that the ISR re-triggers permanently, stopping the main loop.

This one was easy to fix - just clear the interrupt flag after the changes made by the command, if the command affected the status of the device, before re-enabling interrupts. That should avoid deadlocks of this type.

PIC Watchdog

I'd been wondering about this. Many USB devices do seem to get themselves into unusable states occasionally and that's not a good thing, especially when I might be relying on this to do something for me while I'm away from home.

I've bitten the bullet on this and implemented a watchdog ping from the host which, if communications are lost, will completely reset the PIC so that we get the best chance of being able to re-connect and recover the situation. This simply uses the built-in PIC watchdog timer, but resets it on demand from the host rather than from the PIC itself. That closes the loop around the whole system, so if any part fails we reset everything to a sane state.

This is working well so far, with signal handling added to disable the WDT again on a normal exit of the host software.
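
For illustration, here is a rough sketch of the host side of this, assuming a hypothetical $port handle, a hypothetical send_command() helper and made-up 'WDT_PING'/'WDT_OFF' command names - the real protocol differs, but the shape is the same: ping regularly while running, and turn the watchdog off again on a clean exit.

    # $port (the already-open serial handle), send_command() and the
    # command names are hypothetical stand-ins for the real code.
    my $running = 1;
    $SIG{$_} = sub { $running = 0 } for qw( INT TERM );

    while ( $running ) {
        send_command( $port, 'WDT_PING' );   # PIC resets its watchdog on this
        sleep 10;                            # well inside the watchdog period
    }

    send_command( $port, 'WDT_OFF' );        # clean exit: stop the watchdog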

What next?

I've been brave enough to screw the cover back on (stopping easy access to the firmware update switch), and now I'm going to be looking at making the client tools (which talk to the host/server process over the network) production-ready, as well as making the server a real daemon. I need to add authentication and config files, and possibly also look at making the client side of things more reusable (eg. move it out into its own class) so that it can be used by other home automation tools to integrate with this.