Glen Pitt-Pladdy :: Blog

IMDB ratings for MythTV

I lead a busy life and rarely have time to waste. I've been using MythTV for a few years now, and it lets me arrange TV so that it fits around my life: I can time-shift programs, watch informative programs at 1.5X speed to get the interesting bits without all the dressing and drama added to them, take my TV with me over the network, pause live TV, set up and plan my viewing from anywhere I can get an internet connection, and much more.

What is good to watch?

Currently the program guide has around ten thousand programs in it for the next 7-8 days. Almost all of that is of no interest to me. Every channel tries to make its programs look good, so sorting the wheat from the chaff is difficult.

For movies there is one advantage: IMDB is a massive database of movies, reviews and viewer ratings, plus trivia and much more. As it is currently the place for the most dedicated movie fans, I find its viewer ratings very reliable: I enjoy almost everything with a rating of 8 or more.

IMDB also publishes a freely downloadable dump of its database, which makes importing the ratings into MythTV practical.

IMDB to MythTV

There are many other scripts around for doing this. I played with many of them, but they all had deficiencies that made them impractical for me.

One of the biggest problems is that there is little consistency in the program guide data on DVB. Some channels categorise loads of things as movies which are not (e.g. home improvement or cooking programs), some channels set the release date to 2000 irrespective of what it actually is, some titles are full of typos, sometimes the subtitle is included in the title... I could go on for ages.

If anything was going to be practical, it needed to cope with all the inconsistencies and have manual overrides for when things went beyond what could be automated.

What I did was load the database and all the aliases, apply various regular expressions to improve consistency (e.g. sometimes "one" is written out, other times just the digit is given), and then attempt different matching strategies, eventually falling through to Levenshtein edit distance, which helps where there are typos.
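To make the approach concrete, here is a minimal sketch in Python (the actual script is Perl and is not reproduced here). The number-word table, the function names, the `overrides` dictionary and the `max_distance` threshold are all assumptions made for illustration; the real script's regular expressions and matching strategies are more extensive.

```python
import re

# Hypothetical normalisation table; the real script's rules are not shown here.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}


def normalise(title):
    """Lower-case, replace punctuation with spaces and map number words to digits."""
    words = re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split()
    return " ".join(NUMBER_WORDS.get(w, w) for w in words)


def levenshtein(a, b):
    """Dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def find_rating(guide_title, ratings, overrides=None, max_distance=2):
    """Try a manual override, then an exact normalised match, then the nearest typo match."""
    overrides = overrides or {}
    if guide_title in overrides:
        # Manual override: maps a guide title straight to a known IMDB title.
        return ratings.get(overrides[guide_title])
    wanted = normalise(guide_title)
    by_normalised = {normalise(t): r for t, r in ratings.items()}
    if wanted in by_normalised:
        return by_normalised[wanted]
    # Fall through to edit distance, accepting only close matches.
    best = min(by_normalised, key=lambda t: levenshtein(wanted, t), default=None)
    if best is not None and levenshtein(wanted, best) <= max_distance:
        return by_normalised[best]
    return None
```

Here `ratings` is assumed to be a dictionary of IMDB title to rating. Keeping the edit-distance threshold small matters: a generous threshold would happily match a cooking program to a short movie title.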

The script is a bit rough-and-ready, but I have decided to make it available anyway.

Download:  Perl script to add IMDB ratings to MythTV movies

You will also need to download the IMDB dumps from one of their FTP sites. Specifically you need the Ratings, AKA titles and ISO AKA titles dumps.
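As a rough sketch of reading the Ratings dump, the Python below assumes each data line holds a ten-character vote distribution string, a vote count, a rating and then a title of the form `Name (Year)`, and that TV series titles start with a double quote. That layout is an assumption about the dump rather than something documented here, so check it against the file you download. The function name and the `min_votes` parameter are hypothetical.

```python
import re

# Assumed layout: distribution string, vote count, rating, then "Name (Year)".
RATING_LINE = re.compile(
    r"^\s*(?P<dist>[0-9.*]{10})\s+(?P<votes>\d+)\s+(?P<rating>\d+\.\d)\s+(?P<title>.+?)\s*$"
)
# Year may carry a disambiguating suffix such as "/I"; anything after it is ignored.
TITLE_YEAR = re.compile(r"^(?P<name>.+?)\s+\((?P<year>\d{4})(?:/[IVXL]+)?\)")


def parse_ratings(lines, min_votes=0):
    """Return {(name, year): rating} for lines that match the assumed layout."""
    ratings = {}
    for line in lines:
        m = RATING_LINE.match(line)
        if not m:
            continue  # headers, separators and anything else unexpected
        if int(m["votes"]) < min_votes:
            continue  # too few votes to trust the rating
        title = m["title"]
        if title.startswith('"'):
            continue  # assumed to mark a TV series rather than a movie
        t = TITLE_YEAR.match(title)
        if not t:
            continue  # e.g. unknown year
        ratings[(t["name"], int(t["year"]))] = float(m["rating"])
    return ratings
```

Keying on both name and year is a design choice for this sketch: it separates remakes that share a title, although the guide's unreliable release dates mean the year cannot always be used when matching.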

There are a bunch of variables to set for the location of various files in the script.

You also need to ensure that you have the Text::Levenshtein Perl module.

I have a cron job that runs this each morning after the standard MythTV cron job so ratings get added immediately after the program guide is updated.

Other ideas

We have created effective ways of filtering spam in email. These often rely on learning filters based on Bayesian inference.

The thing I am curious about is whether I could use these to learn my viewing habits, training a Bayesian filter to rate upcoming programs based on what I have watched in the past.
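To illustrate the idea, here is a small naive Bayes classifier in Python that learns from program descriptions labelled "watch" or "skip". It is a sketch of the general technique only, not of any particular classifier tool; the class name, the two labels and the tokenising rule are all assumptions.

```python
import math
import re
from collections import Counter


class ProgramClassifier:
    """Naive Bayes over words in program descriptions, with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"watch": Counter(), "skip": Counter()}
        self.doc_counts = Counter()

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, label, description):
        self.word_counts[label].update(self.tokens(description))
        self.doc_counts[label] += 1

    def score(self, label, description):
        """Log of prior times word likelihoods for the given label."""
        counts = self.word_counts[label]
        vocabulary = set(self.word_counts["watch"]) | set(self.word_counts["skip"])
        total = sum(counts.values())
        # Smoothed prior, so an untrained label does not produce log(0).
        log_prob = math.log(
            (self.doc_counts[label] + 1) / (sum(self.doc_counts.values()) + 2)
        )
        for word in self.tokens(description):
            # The extra +1 in the denominator reserves mass for unseen words.
            log_prob += math.log((counts[word] + 1) / (total + len(vocabulary) + 1))
        return log_prob

    def classify(self, description):
        return max(self.word_counts, key=lambda label: self.score(label, description))
```

Training would call `train("watch", description)` for programs that were actually watched and `train("skip", description)` for those that were not, then `classify()` on each upcoming guide entry. As with spam filtering, the results are only as good as the amount of training data.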

I am currently experimenting with this using dbacl as the classifier, and at this stage am waiting for enough data to train the filter properly. I may also give NLTK a try at a later date.

