Glen Pitt-Pladdy :: Blog

PICing up 433MHz Signals for OSS Home Automation - Part 9

Some significant host software design changes and hopefully the end of the last major bugs....

Weird stuff

After introducing threaded reads we stopped losing data and there were never any overruns, but a new type of breakage started happening. The failure involved what appeared to be duplicated, or at least partly duplicated, received frames. This was very suspicious: it was too consistent to be chance, yet apparently not possible from the code.

After some time thinking about it I realized that Perl keeps having problems with threads / forked processes and open filehandles, which among other things get corrupted. What this change had done was introduce reads in a thread while simultaneously writing in the foreground to the same filehandle... not good!

Towards consistency

The obvious thing to do seemed to be to put all the IO into the thread so I adapted it to that. I then also put the opening of the port in the thread so that there is only one open filehandle to the port, and that's the one in the thread. All IO to the transceiver occurs via the thread now.

The thread loop now checks if there is anything in the shared transmit buffer and if so sends that, followed by the usual reads. Locking for the buffers ensures that critical sections are always kept exclusive.
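The loop described above can be sketched roughly as follows. This is Python rather than the Perl used for the host software, and FakePort, TransceiverIO and all of their method names are hypothetical stand-ins invented for illustration. The point is only the shape: the port is opened inside the thread, pending transmit data is sent first, then a read happens, and one lock guards both shared buffers.

```python
import threading
import time
from collections import deque


class FakePort:
    """Hypothetical stand-in for the serial port: echoes back what is written."""

    def __init__(self):
        self._pending = deque()

    def write(self, data):
        self._pending.append(bytes(data))

    def read(self):
        # Non-blocking read: returns b"" when nothing is waiting
        return self._pending.popleft() if self._pending else b""


class TransceiverIO:
    """All port IO happens in one thread; the foreground only touches buffers."""

    def __init__(self, open_port):
        self._open_port = open_port
        self._lock = threading.Lock()   # guards both shared buffers
        self._tx = deque()              # foreground -> IO thread
        self._rx = bytearray()          # IO thread -> foreground
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def send(self, data):
        with self._lock:
            self._tx.append(bytes(data))

    def receive(self):
        with self._lock:
            data = bytes(self._rx)
            self._rx.clear()
        return data

    def _run(self):
        port = self._open_port()        # the only handle to the port
        while not self._stop.is_set():
            with self._lock:            # critical section: take pending TX
                outgoing = list(self._tx)
                self._tx.clear()
            for chunk in outgoing:      # send anything queued first...
                port.write(chunk)
            data = port.read()          # ...followed by the usual read
            if data:
                with self._lock:        # critical section: append to RX
                    self._rx.extend(data)
            else:
                time.sleep(0.001)       # avoid spinning when idle
```

The lock is held only while swapping buffer contents, never during the port write or read, so the foreground is not blocked by slow IO.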

A side effect of this is that the SEGVs and glibc errors have gone on exit - it's now completely clean.

But...

More weird stuff

The remaining failure, which happens every few days, seems to be on particular commands and looks like a similar duplication of data over the length of the command return. That means if a command returns the two bytes 0x00 0x03 and we have, say, an RX frame of four bytes 0xff 0x36 0x12 0x73, what we end up with is 0xff 0x36 0xff 0x36 0x12 0x73.
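To make the pattern concrete, here is a small Python model of it. It describes only the symptom, not the cause, and the function names are made up for illustration: the bytes of the command return are replaced by the same number of bytes from the start of the frame that follows.

```python
def expected_stream(command_return, frame):
    """What should arrive: the command return, then the received frame."""
    return bytes(command_return) + bytes(frame)


def observed_stream(command_return, frame):
    """What arrives in the failure case: the command return bytes are
    replaced by the first len(command_return) bytes of the frame."""
    n = len(command_return)
    return bytes(frame[:n]) + bytes(frame)


command_return = bytes([0x00, 0x03])
frame = bytes([0xFF, 0x36, 0x12, 0x73])

print(expected_stream(command_return, frame).hex(" "))  # 00 03 ff 36 12 73
print(observed_stream(command_return, frame).hex(" "))  # ff 36 ff 36 12 73
```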

So it's as if the command return got overwritten with duplicated frame data. I've added some debug code to the firmware to check for possible missing sections of code where interrupts should be disabled to protect against corruptions, and that indicates things are OK, so I have confidence that this isn't an anomaly like corruptions from the interrupt.

There is also a repeating pattern here - it appears to be the first bytes of a receive frame each time that get duplicated for the number of bytes in the command response.
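Because the pattern is so regular, a host could in principle flag it when the expected response length is known. The check below is a hypothetical Python sketch and not part of the software described here. It can also give false positives when a genuine response happens to match the start of the following frame, so it is only a hint.

```python
def looks_overwritten(data, response_len):
    """True if the first response_len bytes are immediately repeated,
    which is the signature of a command return overwritten by frame data."""
    n = response_len
    if n <= 0 or len(data) < 2 * n:
        return False
    return data[:n] == data[n:2 * n]


# A corrupted read: frame start duplicated over a 2 byte command return
print(looks_overwritten(bytes.fromhex("ff36ff361273"), 2))  # True
# A healthy read: command return followed by the frame
print(looks_overwritten(bytes.fromhex("0003ff361273"), 2))  # False
```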

What is more bizarre, however, is that debug code on the host which logs each individual read from the port clearly shows the data being read in this pattern, often part way through the data in a single read iteration. That's significant because it rules out problems with stitching data together, locking between threads and similar aspects - the data is already wrong in what has come directly from a single read of the port. As the port is only open in the IO thread, I've got higher confidence that this is also not one of the Perl threading problems I see too often.
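Per-read logging of this kind can be very simple. The Python helper below is only a sketch of the idea, with an invented name and log format: each read, including the empty ones, gets a timestamp, an iteration number, a length and a hex dump, so a corruption inside a single read is visible before any buffering or stitching.

```python
import time


def format_read(data, iteration, now=None):
    """Format one raw read for the debug log (empty reads are logged too)."""
    stamp = time.time() if now is None else now
    dump = data.hex(" ") or "-"
    return f"{stamp:.3f} read#{iteration} len={len(data)} {dump}"


print(format_read(bytes.fromhex("ff36ff361273"), 41))
print(format_read(b"", 42))
```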

Another pattern seen sometimes is the (say 2) bytes of command return read corrupted, then a delay of several read iterations with no data, then the full frame. Again, this suggests that the frame data is getting mixed up with the command before it, yet appears to have made it into two separate transmissions.

So where is this coming from? Very difficult to say. Based on the debug code output, it's unlikely to be the part of the firmware I wrote, but it might be the PIC USB libraries, possibly the hardware, corruption down the wire, possibly the host USB hardware, or possibly the host kernel. It's less likely to be the host software, aside from Perl threading, which I'm often suspicious of. The one thing to note here is that this did appear to be happening to some extent before introducing threading.

Practical Solutions (recovery)

At this point I think it's very unlikely that I will be able to identify the root cause of the remaining problem within any practical time, perhaps ever. In this case I think I have to be pragmatic and go for the recovery mechanism.

I've implemented all phases of recovery and it works so effectively I'm having to hard-code skipping phases in order to test them. After weeks of testing each stage, I've put it all together and so far it has not had reason to go past level 1 of recovery.

The next part of recovery I need to implement is re-trying of any command that was sent. This is important since we want commands to succeed if they can, and only fail if all options are exhausted.
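One way the retry could fit together with the recovery levels is sketched below in Python. The recovery actions themselves are not described here, so the callables passed in are placeholders, and the function name, the attempts_per_level parameter and the use of IOError to signal a bad response are all assumptions made for the sketch.

```python
def run_with_recovery(send_command, recovery_levels, attempts_per_level=2):
    """Try a command; on failure escalate through the recovery levels,
    retrying after each, and only fail once every option is exhausted.

    Returns (result, level), where level 0 means no recovery was needed.
    """
    last_error = None
    # Level 0 has no recovery action: the command is simply attempted
    for level, recover in enumerate([None] + list(recovery_levels)):
        if recover is not None:
            recover()
        for _ in range(attempts_per_level):
            try:
                return send_command(), level
            except IOError as err:
                last_error = err
    raise RuntimeError("command failed after all recovery levels") from last_error
```

A caller would pass the recovery actions in order of increasing severity, so a command that fails twice triggers the first level and is then retried before anything more drastic happens.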

Making use of data

The main sensors in place at this point are Temperature & Humidity, and Energy (Electric) usage. By making these available to Cacti, we now have graphs that show trends, and are very useful for determining the right times to open all the doors and windows to manage our sticky summer (yes, it does happen in the UK).
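How the readings reach Cacti isn't covered here. One common route is a data input script, for which Cacti expects a single line of space separated name:value pairs when a script returns several fields. The Python sketch below assumes that route, and the field names and values are invented examples rather than real readings.

```python
def cacti_line(readings):
    """Format readings as one line of space separated name:value pairs."""
    return " ".join(f"{name}:{value}" for name, value in readings.items())


# Invented example values, for illustration only
print(cacti_line({"temperature": 21.5, "humidity": 48, "power": 350}))
# temperature:21.5 humidity:48 power:350
```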

Making use of control

Now that I have a system that isn't broken on data errors, I can actually use it to control things. For now it's lights - I have a light that turns on in the bedroom in the morning to wake me up gradually, and that's now controlled by this system.

The real use will come in with deploying home automation software in conjunction with this (likely FHEM with a bit of work to integrate it) which will then allow me remote control.

New Sensor

I've got myself an Oregon Scientific THGR122NX as a third Temperature and Humidity sensor and will write that up in time. The one thing I will say about this one is that it seems to give far more consistent readings than the cheapie sensors I had been using before, and they agree very closely with other high quality temperature and humidity devices. It is more expensive, but it's got a solid protocol and seems dependable, so I would say it's worth the extra money.
