Kaltura Community Edition

TL;DR version: Don’t waste your time. The Kaltura server is prone to problems and the player simply doesn’t work reliably on a number of platforms.


Video has become an important part of many websites, and for some YouTube simply does not cut it any more. Moving off YouTube can be expensive, though: some sites pay several thousand dollars a month to host, serve and encode their video for their readers and viewers. Because of the cost of the service we were using, which I will not name, we took a hard look at the Kaltura Community Edition to see if it would meet our needs and potentially save us thousands of dollars a month.

We chose to install Kaltura on a DigitalOcean droplet and, following the guide on GitHub for a single-server install on Ubuntu, had the server up and running. The first time around it was pretty easy to get the server running without SSL, and we were able to do a lot with the install, including uploading, encoding and serving video. All seemed pretty positive, but then we tried a mass upload of a few files and discovered pretty quickly that our test droplet simply did not have the power to handle what we were trying to do.

One of the reasons we chose DigitalOcean to test Kaltura was that it is easy to resize our test machine. A simple resize of our droplet had the Kaltura server up and running again with more RAM and processor power to handle a test bulk upload. We were still testing without SSL but were hopeful that adding SSL would not be too cumbersome. The Kaltura FAQ makes it sound like reconfiguring is a simple matter of running, in our case, the Debian commands

# dpkg-reconfigure kaltura-base
# dpkg-reconfigure kaltura-front
# dpkg-reconfigure kaltura-batch

and supplying the new values for the ports and SSL certificates. In theory this should have been simple; in reality it caused multiple headaches and many searches on the Kaltura support boards and in the GitHub issues section.

The problem often seemed to be that the Apache variables in the zzzkaltura.ssl.conf file were not being properly replaced. After some searching we tried editing the template for zzzkaltura.ssl.conf, located at /opt/kaltura/app/configurations/apache/kaltura.ssl.conf.template, replacing the Apache variables with our own values, and hoped that would fix our problems. No such luck. We were still unable to load Kaltura over SSL; in fact, we only ever got the default Apache SSL configuration.

At this point we tried disabling the default Apache SSL site configuration, much as we had to disable the default Apache website to get the non-SSL version of Kaltura to work. No such luck: we were then unable to load anything from the server and only got 404 errors. So we re-enabled the default SSL site, moved all of the settings from zzzkaltura.ssl.conf into the default SSL configuration and restarted Apache. Much to our surprise, the Kaltura Community Edition server worked. Once we had it working, a few searches on the support forum and issues board found no mention of such a problem, so I mention it here in the hope that it helps if you are having trouble getting Kaltura running over SSL.

Things were looking up. Once we got the server running with SSL we thought we might get lucky and actually be able to use the Community Edition for hosting, encoding and serving our video. Encoding seemed to work well, and while we had some small hiccups with the players, since we wanted to use a pre-roll HTML5 player those hiccups with the Flash players did not bother us much. Unfortunately the next hiccup was a deal breaker, and if you are still reading because you are trying to get Kaltura running, I would recommend you stop and save yourself the time and energy. The short version: the HTML5 player in Kaltura does not work with Safari, and our tests included both Mac OS and several versions of iOS.

We tried several things to get video playing in Safari using Kaltura’s HTML5 player, including some of the fixes listed in a post on the support forum, but came up empty. At that point we decided we had spent enough time trying to get Kaltura Community Edition working properly and moved on to looking for a new provider to host, encode and play our videos.

I prefer to work with open source projects whenever possible, but in this case the problems with the player and the setup issues added up to real frustration with Kaltura. The project seems aimed more at driving people to the commercial service, which, by the time we were done testing the Community Edition, we wanted nothing to do with. What would be nice to see is the ability to use the Kaltura server for managing and encoding videos, since that part of our testing did work well, combined with the ability to use another HTML5 player, like Video.js, in place of the Kaltura HTML5 player.

Update
I have had a couple of people email me asking what we ended up using. In the end we landed at StreamShark. While the StreamShark interface needs some work, we have been quite happy with their service and have saved quite a bit of money since moving there. We even built a WordPress plugin to manage our videos with their API.

Blocking WordPress Blog Spam with .htaccess

While I am a fan of Monty Python’s Spam skit, I am not a fan of automated WordPress spam, and it seems to be getting worse every day. Of course, the large majority of WordPress comment spam is just automated comments posted directly to the WordPress wp-comments-post.php file. I have used different methods in the past, but recently came across a way to help keep the spammers away.

While there are many very good plugins available for WordPress to help keep spam down, sometimes the best method is to use your web server to block it in the first place. Thanks to a very helpful post on the V7N forum, here is a way you can block a large portion of automated comment spam using your .htaccess file.

Before you add this code to the .htaccess file in the root of your WordPress installation, be sure to make a copy of the file, just in case something goes wrong. The wp-comments-post.php file lives in the root of your WordPress install, so the code needs to go in the main .htaccess file there. If you have pretty permalinks turned on you probably will not need the “RewriteEngine On” line, since pretty permalinks already turn it on.

RewriteEngine On
# Only apply to POST requests aimed at the comment handler
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} wp-comments-post\.php
# ...where the referrer is not your domain (change yourdomain.com), or
RewriteCond %{HTTP_REFERER} !yourdomain\.com [NC,OR]
# ...the user agent is empty, as it often is with spam bots
RewriteCond %{HTTP_USER_AGENT} ^$
# Send the request back to the IP address it came from
RewriteRule .* http://%{REMOTE_ADDR}/ [R=301,L]

Anyway, here is what the code does: it checks for a POST made directly to the wp-comments-post.php file, which is how automated spam bots work, and if the referrer is not your domain, or the user agent is empty, it redirects the request back to the IP address it came from.
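If you would rather not bounce the request back at all, the final rule can simply return a Forbidden response instead. This variation is my own suggestion, not part of the original V7N snippet:

# Alternative final rule: refuse direct comment posts with a 403 Forbidden
RewriteRule .* - [F,L]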

I have been testing this out for a few days now and I am pleased to say that the amount of spam making it through is down significantly. It is not all gone, but the percentage has dropped considerably. You will still need to run Akismet to catch spam submitted by real people, but this can help lower the load automated spam bots put on your WordPress site.

If spam has your WordPress website swamped give this a try and see if it helps to turn the tide in your favour a little bit.

After you have done that, take a break and enjoy the Monty Python Spam skit.

Htaccess Tools

If you want to save some time editing your .htaccess file, check out Htaccess Tools. It offers a number of handy .htaccess generators, including:

  • Htpasswd Generator
  • Htaccess Authentication
  • Hotlink protection of images
  • Block IPs with .htaccess
  • Block hitbots with .htaccess
  • Error Document
  • Redirection by Language

While you can do all of these things without an online generator, I have found that generators like these make managing a website approachable for people who would otherwise not touch an .htaccess file. Perhaps the most useful generator on the site is the hotlink image protection. Hotlink protection saves you bandwidth and prevents other websites from using your images directly.
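To give you an idea of the output, here is a minimal sketch of the kind of rules a hotlink-protection generator produces; the domain is a placeholder you would replace with your own:

RewriteEngine On
# Allow empty referrers so direct visits and feed readers still work
RewriteCond %{HTTP_REFERER} !^$
# Allow requests referred by your own site (replace yourdomain.com)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?yourdomain\.com/ [NC]
# Refuse image requests coming from anywhere else
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]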

Trackback Spam on the Rise

Anyone else notice a rise in trackback spam recently, or is it just me they feel like picking on? The last few days I have been getting upwards of 50 trackback spams. Thanks to Akismet I have not seen any of them get through, but I decided I was tired of deleting it and letting the spammers eat my server resources. A quick look in my logs showed the spam was not coming from the same IP, so banning an IP or IP range would be pretty much useless.

Here are some entries from my log file:

Host: 216.104.34.250
/2007/03/text-link-ads.html/trackback
Http Code: 200 Date: Dec 18 20:24:03 Http Version: HTTP/1.0 Size in Bytes: 78
Referer: –
Agent: TrackBack/1.6

Host: 91.186.21.51
/2007/02/blogger-label-list-for-ftp-published.html/trackback
Http Code: 200 Date: Dec 18 20:22:38 Http Version: HTTP/1.0 Size in Bytes: 78
Referer: –
Agent: TrackBack/1.6

Host: 66.90.104.22
/2007/02/has-digg-jumped-the-shark.html/trackback
Http Code: 200 Date: Dec 18 20:20:28 Http Version: HTTP/1.0 Size in Bytes: 615
Referer: –
Agent: TrackBack/1.6

Notice anything in common? The User Agent strings are all the same: Agent: TrackBack/1.6.

A quick Yahoo search turned up the post Spiders and Bots .htaccess Ban List, which looked like just what I needed. There are tons of bad bots and user agents out there, and this list is surely only a small number of them. I really only want to block the TrackBack user agent and the libwww-perl user agent, since I have been getting several hacking attempts from a libwww-perl user agent.

There are several ways I could have done this but I thought I would try adding this first and see how it goes.


# block bad bots, including the trackback bot
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot
SetEnvIfNoCase User-Agent "^TrackBack" bad_bot

<Limit GET POST>
# allow everyone except requests flagged as bad_bot above
order allow,deny
allow from all
deny from env=bad_bot
</Limit>

I may have to edit the TrackBack bot line since I did not include the version number, although since the pattern is a regular expression, ^TrackBack should already match an agent string like TrackBack/1.6. I will leave it like that for a day and see what shows up in my log files, and will update this post if/when I do edit that line.

Thanks to Brontobytes Blog for the .htaccess code. It saved me lots of time.

Hope this helps someone that is having problems with automated trackback spam.

Use .htaccess to Block a Country

There are occasions when you need to do some serious blocking on your website and shut out an entire country. I have helped people in the past block countries like China from accessing their websites. While there can be many reasons to block an entire country, it used to be a bit of a chore to create the .htaccess file to do it. Not anymore: check out block a country, and with a couple of clicks you can generate an .htaccess file that blocks the countries of your choice.

I have been playing with some screencasting software, so I took a short screencast of how to use the site. Watch closely or you might miss it. If you feel like blocking all of us friendly Canadians, it now only takes a few seconds.

After you either copy the information or download the generated .htaccess file, all that is left to do is upload it to your website or integrate it into your existing .htaccess file. It makes blocking a whole country very easy to do. I will definitely use this tool the next time I get a call or email asking to block a country from accessing a website.
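For the curious, here is a sketch of the sort of file the generator produces: a long list of deny directives covering the country’s allocated IP ranges. The ranges below are documentation placeholders, not real country allocations:

order allow,deny
allow from all
# one deny line per IP range allocated to the blocked country
deny from 198.51.100.0/24
deny from 203.0.113.0/24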

Using .htaccess to Block Comment Spam

When I checked my blog on Saturday I had a large amount of comment spam that had been caught by Akismet, more than usual for my little place on the web. Browsing through it briefly, I quickly noticed a common thread: it was all from the same IP address. I have better things to do on a Saturday (and actually most days) than wade through a bunch of comment spam, so I quickly added another line to my .htaccess file.

deny from 195.225.177.48

I then deleted all of the comment spam and went on my merry way, not thinking much about it until I went and checked my error log today.


[Sun Oct 14 13:40:17 2007] [error] [client 195.225.177.48] client denied by server configuration: /home/*****/public_html/blog/index.php
[Sun Oct 14 13:40:17 2007] [error] [client 195.225.177.48] client denied by server configuration: /home/*****/public_html/blog/wp-comments-post.php
[Sun Oct 14 12:55:59 2007] [error] [client 195.225.177.48] client denied by server configuration: /home/*****/public_html/blog/index.php
[Sun Oct 14 12:55:59 2007] [error] [client 195.225.177.48] client denied by server configuration: /home/*****/public_html/blog/wp-comments-post.php
[Sun Oct 14 12:45:11 2007] [error] [client 195.225.177.48] client denied by server configuration: /home/*****/public_html/services/order_form.php
[Sun Oct 14 12:13:27 2007] [error] [client 195.225.177.48] client denied by server configuration: /home/*****/public_html/blog/index.php
[Sun Oct 14 12:13:27 2007] [error] [client 195.225.177.48] client denied by server configuration: /home/*****/public_html/blog/wp-comments-post.php

It goes on and on, actually; I averaged 14 hits an hour from this IP address. Imagine how much comment spam I would have had if I had not blocked it. I was also curious who might be so interested in spamming the daylights out of my blog, so I ran a quick IPWHOIS on DNSStuff.com. You can look at the IPWHOIS information yourself, but what I found most interesting is that they have a complete IP address range: 195.225.176.0 – 195.225.179.255. For now I have only blocked a single IP address, and I hope it is just one bad user on their network, but the minute I see another 195.225.*.* address in my comment spam the whole range will be blocked using:

deny from 195.225
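One thing worth knowing (I am fine with it here): a partial address like that matches everything in 195.225.0.0/16, which is broader than their published range. Apache also accepts CIDR notation, so a deny covering exactly their 195.225.176.0 – 195.225.179.255 allocation would be:

# matches exactly 195.225.176.0 through 195.225.179.255
deny from 195.225.176.0/22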

I sent an email to the address on record for the host, but in my experience it will either never be read, be simply ignored, or disappear into :blackhole:.

Custom 404 Page using .htaccess

Mike posted a good question on my earlier post on how to Disable Indexes using .htaccess.

Is there a way to specify what page to redirect to if there is a 404? Currently it’s displaying one created by my web hosting company, which I would prefer to get rid of.

If you want to display your own 404 error page, something other than the standard 404 Not Found page returned by your hosting company, all you need to do is add one line to the .htaccess file in the root of your web server.

ErrorDocument 404 /404.html

Now when someone types a wrong filename, or tries to browse a folder where you have turned off indexes, the page returned will be your custom 404.html page. It does not have to be an HTML page, either. You could return a PHP page with code that emails you when someone triggers a 404 error, telling you which page they were looking for; or a page with information to help visitors find what they might be looking for; or maybe something that is just fun. You can do any number of things with your own 404 page.
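The same directive handles other status codes and dynamic pages too. A quick sketch, where the file names are just examples:

# point each status code at any page on your site, static or dynamic
ErrorDocument 403 /403.html
ErrorDocument 404 /404.php
ErrorDocument 500 /500.html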

If you are a blogger using WordPress, you should take a look at the great page “Creating an Error 404 Page” on WordPress.org.

There are other options as well. The one I have on this blog right now is not very exciting: the home page is simply returned as the 404 page.

RSS Feed Scraper

It appears that I have a fan. OK, maybe not a fan: I have a website scraper that is just not smart enough to read the content it is scraping, so it is picking up the extra content in my RSS feed and posting it on its site. They have many of my posts, and the majority of them have this at the bottom:

Copyright © LGR Webmaster Blog. This feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at is guilty of copyright infringement.

Visit the LGR Webmaster Blog for more great content.

You would think that would make it pretty obvious the content was stolen from somewhere. I have sent an email both to the address on the whois record for the domain and to the address I could find for the web host of their IP address, in hopes of having the content removed from the site. Considering the email to the whois address bounced, I don’t know if I will have much luck.

After the email to the whois address bounced, I thought I would have a little fun with this scraper site. If I can’t get the content removed, I can at least make sure people know the content is stolen, just in case they don’t read the copyright notice at the bottom of the post. I post the odd image in my posts, but from now on I will make sure there is always an image in each post, even if it is just a blank image you can’t see in the post itself. This image is important. It is placed in the WordPress uploads folder, though I suppose it could go anywhere on the website. Inside the WordPress uploads folder I have added another .htaccess file with the following:

# serve a custom image as the 403 error document
ErrorDocument 403 /images/403.gif

RewriteEngine on
# refuse any request referred by the scraper's site
RewriteCond %{HTTP_REFERER} websiteIwantBlocked\.com
RewriteRule .* - [F]

I changed the website name obviously, but you should get the idea. This stops the server from sending any image in the WordPress uploads folder to a request arriving with a referrer of websiteIwantBlocked.com, and returns the 403 error document instead. Because everything served from this folder is an image, I created a custom error document that is itself an image and placed it in another folder (images). Now when an image is requested from websiteIwantBlocked.com, instead of the image I put in the post the server returns a 403 error and my custom error image.

Now when someone visits the website that scraped my feed, they get a nice warning that the site has stolen bandwidth, content or both. It only happens for the sites I have listed, so feed readers should not be affected.

There are other things I have done as well. I have added the website’s IP address to the blog’s root .htaccess file and denied it access, in case the website is scraping the feed directly from the site. If you are wondering, it looks like this:

deny from IP ADDRESS YOU WANT BLOCKED

I use FeedBurner for my feeds, and they usually list uncommon uses of a feed, but there has been no mention of this one. I did notice that one of the bots hitting the feed identifies as WordPress, so it is possible the site is scraping the FeedBurner feed and not the site directly. One feature I wish FeedBurner had is the ability to block individual IP addresses from accessing a feed; since every website is served from an IP address, that would make this much easier.

I guess we will see if I get an email back from the web host, though I am not holding my breath. I may have to make do with this, or move the feed away from FeedBurner so I can block individual IP addresses myself.

How do other people handle very persistent RSS feed scrapers?

Disable Indexes using .htaccess

I have several personal websites on a shared server where indexes are turned on by default in Apache. That is simply annoying, because I hate having stray empty index.html files sitting all over the place. I suppose I could just leave indexes on, but I dislike the idea of anyone in the world being able to peek into folders; even if it is unlikely they will find anything very interesting, you just never know what might give someone a way into the system. Anyway, if you are like me, the easiest way to get rid of indexes is one line in an .htaccess file in the root folder:

Options -Indexes

Now if a folder does not have an index.html file, the server will respond with a 403 Forbidden error and send people your error page instead of a directory listing. Amazing how one simple line can save time and keep you from having to create empty index.html files in all those folders you don’t want people poking around in.
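If you want visitors who hit one of those folders to see something friendlier than the stock Forbidden page, you can pair the line with a custom error document, like the one described in the earlier 404 post (the file name here is just an example):

Options -Indexes
# send blocked directory requests to your own error page
ErrorDocument 403 /403.html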

If you are still wondering why you would want to do this, read the post titled “Find almost any kind of Ebook or File Online” over at Earners Blog. One line in an .htaccess file stops that from happening.

Whitehat SEO Tips for Bloggers

This video shows Matt Cutts giving a presentation at WordCamp 2007 with search engine optimization tips for bloggers.

It is a long video, just over an hour in length, so you might want to put the headphones on and let it play while you are working on something else.

There is a lot of good, basic information in the video that will help anyone who runs a blog or website. Aside from the basics of SEO, Matt encourages people to get creative in finding ways to earn links. He also shares a great security tip: using an .htaccess file to protect the WordPress admin folder. Make sure you change the IP addresses to those of your home computer and your work computer.

Put this .htaccess in /wp-admin/ (not in your root directory!):

AuthUserFile /dev/null
AuthGroupFile /dev/null
AuthName "Access Control"
AuthType Basic
order deny,allow
deny from all
# whitelist home IP address
allow from 123.45.67.89
# whitelist work IP address
allow from 89.67.45.123

Read more at: http://www.reubenyau.com/protecting-the-wordpress-wp-admin-folder/