21 · 07

CirruxCache: Advanced configuration sample

That's it! This whole blog is now cached and delivered directly by CirruxCache (only static files were cached before). My origin server is a tiny eeebox connected through my personal ISP, so this setup is a good challenge: offload my tiny web server as much as possible. I think it is also a good opportunity to show a slightly more advanced configuration example. The point is that I cannot set the same cache TTL for the whole website, and I actually want to cache several websites...
# URL mapping
urls = {}

base = (
    '/_admin/(.*)', 'Admin',
    '/_store/(.*)', 'Store',
    '/_cron/(.*)', 'Cron'
)

urls['default'] = base + (
    '(/debug/.*)', 'Debug',
    '/(.*)', 'Root'
)

urls['www.shad.cc'] = base + (
    '(/themes/.*)', 'Blog_Static',
    '(/plugins/.*)', 'Blog_Static',
    '(/admin/.*)', 'Blog_Forward',
    '(/.*)', 'Blog_Page'
)

urls['www.zaphod.eu'] = base + (
    '(/pub/.*)', 'Zaphod_Redirect',
    '(/.*)', 'Zaphod'
)

# still supporting the old config

urls['cdn.shad.cc'] = base + (
    '/blog(/.*)', 'Blog_Static',
    '/(.*)', 'Root'
)

urls['cdn.zaphod.eu'] = base + (
    '(/admin/.*)', 'Zaphod',
    '/(.*)', 'Root'
)

# POP definition
# You can define and configure your Point Of Presence

class Blog_Static(cache.Service):
    origin = 'http://orig.shad.cc'
    forceTTL = 2592000 # 1 month
    ignoreQueryString = True
    forwardPost = False
    allowFlushFrom = ['x.x.x.x']

class Blog_Page(cache.Service):
    origin = 'http://orig.shad.cc'
    forceTTL = 3600 # 1 hour
    ignoreQueryString = True
    forwardPost = True
    allowFlushFrom = ['x.x.x.x']

class Blog_Forward(forward.Service):
    origin = 'http://orig.shad.cc'

class Zaphod(cache.Service):
    origin = 'http://orig.zaphod.eu'
    forceTTL = 2592000 # 1 month
    ignoreQueryString = True
    forwardPost = False
    allowFlushFrom = ['x.x.x.x']

class Zaphod_Redirect(redirect.Service):
    origin = 'http://zaphod.eu'

# !POP
I think this configuration is readable enough to need no explanation. However, do not hesitate to leave a comment.
Finally, I created a Google group to centralize all help requests. So if you need help, go to http://groups.google.com/group/cirr... or send an email to cirruxcache 'at' googlegroups 'dot' com.
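For readers who wonder how a request ends up on a given PoP, here is a minimal sketch of the lookup, using a trimmed copy of the mapping above. It assumes webpy-style matching: the tuple alternates pattern and class name, each pattern is anchored at both ends, and the first match wins. The `resolve` helper is hypothetical and only illustrates the ordering; it is not CirruxCache code.

```python
import re

# Trimmed, illustrative copy of the mapping shown above.
base = (
    '/_admin/(.*)', 'Admin',
    '/_store/(.*)', 'Store',
    '/_cron/(.*)', 'Cron',
)

urls = {
    'default': base + (
        '(/debug/.*)', 'Debug',
        '/(.*)', 'Root',
    ),
    'www.shad.cc': base + (
        '(/themes/.*)', 'Blog_Static',
        '(/plugins/.*)', 'Blog_Static',
        '(/admin/.*)', 'Blog_Forward',
        '(/.*)', 'Blog_Page',
    ),
}

def resolve(host, path):
    """Hypothetical helper: return (service name, captured groups)."""
    mapping = urls.get(host, urls['default'])
    # The tuple alternates pattern, name, pattern, name, ...
    for pattern, name in zip(mapping[0::2], mapping[1::2]):
        match = re.match('^' + pattern + '$', path)
        if match:
            # First matching pattern wins, so order matters.
            return name, match.groups()
    return None, ()
```

Because `base` comes first, the `/_admin/`, `/_store/` and `/_cron/` routes take precedence over the catch-all patterns on every virtual host, and the catch-all must stay last.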
18 · 07

New release: CirruxCache 0.3.1

I am really glad to announce a new major release of CirruxCache.

This new release includes the following changes:

  • A storage webservice: store big files (<= 2GB) on the Blobstore in order to deliver them through CirruxCache. This feature is useful to bypass the 1MB limit on appengine.
  • An admin panel that enables users to flush objects, manage big files and see some statistics about the resources used.
  • Bugfixes

It is important to note that the admin panel has a few limitations:

  • There is no error reporting on the flush panel (it only displays the number of objects it tried to flush).
  • The storage manager displays a "500 Internal Error" when uploading. This only happens when you don't have a billing account (the Blobstore is only available on billing accounts, refer to appengine).

These two limitations will be addressed in the next release, and the statistics panel will provide more information.

The Storage WebService will be documented really soon, but you can access the admin panel through "http://your.cirruxcache.app/_admin/"

I am also taking this opportunity to announce some changes on the project website:

I hope you will enjoy this new release.

19 · 05

Minor release: CirruxCache 0.2.2

CirruxCache 0.2.2 has just been released. It contains some bugfixes (thanks to Devattas for reporting errors on Datastore latency). Webpy has been updated to the latest version.

I have also updated the documentation, in particular with more details on Point of Presence configuration and the use of cron tasks for garbage collection.

Finally, some users reported to me that the cached object size limit (currently 1MB) is a real problem. I am working on a solution: I will take advantage of the new Blobstore service on AppEngine to store objects, and maybe keep the Datastore only for metadata. This solution will raise the cache object limit to 50MB.

Stay tuned :)

11 · 03

CirruxCache 0.2.1 is released

I have just released a new version (0.2.1) of CirruxCache. As a reminder:

CirruxCache provides a software solution to dynamically cache HTTP objects on Google Appengine (using the Datastore and the Memcache services).

This new version includes an interesting set of features:

  • allow object flushing from restricted IP
  • configure a PoP (Point of Presence) according to a virtual host
  • several behaviors (cache, redirect, forward)

In more detail, the last feature is the ability to configure a point of presence to behave differently from a classical caching mechanism. For example, I may want "/admin/*" on my website to be redirected to the origin without caching.
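To make the difference between the three behaviors concrete, here is a toy sketch. It is not CirruxCache code: the factory names, the `fetch` callable standing in for the request to the origin, and the 302 status used for the redirect are all assumptions made for illustration.

```python
def make_cache(fetch):
    """Cache behavior: fetch from the origin once, then serve the stored copy."""
    store = {}
    def handle(path):
        if path not in store:
            store[path] = fetch(path)
        return 200, store[path]
    return handle

def make_forward(fetch):
    """Forward behavior: always ask the origin and store nothing."""
    def handle(path):
        return 200, fetch(path)
    return handle

def make_redirect(origin):
    """Redirect behavior: send the client to the origin (302 is an assumption)."""
    def handle(path):
        return 302, origin + path
    return handle
```

With a cache handler, two requests for the same path reach the origin once; with a forward handler they reach it twice; with a redirect handler the PoP never contacts the origin itself.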

Of course, this release includes several bugfixes, especially a fix for the "Expires" HTTP header which improves caching performance.

Do not hesitate to test this new version and to report any bugs or suggestions.

19 · 02

Adding virtual host support to webpy

Webpy is a tiny web framework. I use it a lot for my web-service applications. In general, I let my web server (lighttpd) handle virtual hosting. But as you may know, I am working on a CDN solution on top of Google App Engine, named CirruxCache. In that case, since I have absolutely no control over the server configuration, I need to handle virtual hosting from the code. Webpy maps urls by iterating through a tuple. So my solution is quite simple: wrap the tuple in an object whose __iter__ method returns the mapping selected by an environment variable (HTTP_HOST). Let's take this basic webpy example, without vhosting:

import web

urls = ('/(.*)', 'hello')

class hello(object):
    def GET(self, name):
        if not name:
            name = 'World'
        return 'Hello, %s' % name

if __name__ == "__main__":
    app = web.application(urls, globals())
    app.run()
Let's add the VhostMapper class:
import web

urls = {
    'default' : ('/(.*)', 'hello'),
    'my-vhost.domain.tld' : ('/(.*)', 'helloVhost')
}

class hello(object):
    def GET(self, name):
        if not name:
            name = 'World'
        return 'Hello, %s !' % name

class helloVhost(object):
    def GET(self, name):
        return 'Hello %s' % web.ctx.environ['HTTP_HOST']

class VhostMapper(object):
    def __iter__(self):
        url = urls['default']
        if 'HTTP_HOST' in web.ctx.environ:
            vhost = web.ctx.environ['HTTP_HOST']
            if vhost in urls:
                url = urls[vhost]
        return iter(url)

if __name__ == "__main__":
    app = web.application(VhostMapper(), globals())
    app.run()
Finally, you can use curl or wget to test your vhosts:
$> curl -H "Host: my-vhost.domain.tld" http://localhost:8080/

It is not too early to announce that the next version of CirruxCache will handle virtual hosting :) I am sure this simple hack can easily be reproduced to add virtual hosting to other REST frameworks.
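One detail worth keeping in mind with the VhostMapper above: a client that reaches the server on a non-default port usually sends the port in the Host header (for example "my-vhost.domain.tld:8080"), and host names are case-insensitive. The sketch below shows one possible way to normalize the header before the lookup. The `select_urls` helper is hypothetical, works on any WSGI-style environ dictionary, and does not handle bracketed IPv6 literals.

```python
urls = {
    'default': ('/(.*)', 'hello'),
    'my-vhost.domain.tld': ('/(.*)', 'helloVhost'),
}

def select_urls(environ, urls):
    """Hypothetical helper: pick the URL tuple for the request's Host header."""
    host = environ.get('HTTP_HOST', '')
    # Drop an optional ":port" suffix and normalize the case.
    host = host.split(':', 1)[0].lower()
    # Unknown or missing hosts fall back to the default mapping.
    return urls.get(host, urls['default'])
```

Inside `VhostMapper.__iter__`, this would amount to returning `iter(select_urls(web.ctx.environ, urls))`.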

30 · 10

CirruxCache: speeds up your HTTP app using Google Appengine as a CDN

It is a great moment: for the first time since I started working at Zoomorama, I have just released an important part of our server platform as open source.

I previously explained how to use Google AppEngine as a Content Delivery Network (CDN). The CirruxCache project turns this idea into something concrete. I released the first version, based on the one we use in production.

Here are the features it currently supports:

  • honor Cache-Control
  • cache TTL override
  • several POP (Point Of Presence) configurations mapped to custom base URLs
  • ignore query string
  • POST forwarding
  • expired entries garbage collection
  • extensibility


CirruxCache is not documented at the moment, although you should be able to use it after reading the comments in the app.py file. I'll document this app in the next few days, but if you need more documentation, don't hesitate to contact me.

The project website.

30 · 07

Speed up HTTP delivering using Google AppEngine

Google AppEngine provides a high-level cloud service, which means that your application is distributed automatically on top of the Google platform. All of your code will depend on the AppEngine SDK, so it could be risky to develop a complex application on it. I develop a webservice application for content delivery and content publishing at Zoomorama. We currently use the Akamai CDN as a simple cache layer to improve data delivery across the world. It is interesting for me to use AppEngine in the same way: without changing anything in my existing code base. I have found some blog posts dealing with this AppEngine usage, but they are not focused on dynamic HTTP caching like a real CDN. The principle is very simple: the response to every HTTP request made to my AppEngine application is fetched from the origin and copied to the AppEngine Datastore. Moreover, data delivered through AppEngine is also cached by the AppEngine servers. The code below is a tiny proof of concept:
# HTTP caching on Google App Engine
# - by shad <shad@zaphod.eu>
#

import web # webpy 0.3x
from google.appengine.ext import db
from google.appengine.api import urlfetch

origin = 'http://my.website.com'

urls = (
    '(/.*)', 'Root'
)

class Cache(db.Model):
    # Cached response body and its headers, stored as 'Name: value' strings
    data = db.BlobProperty(default=None)
    headers = db.ListProperty(str)

class Root(object):
    def GET(self, request):
        cache = self.readCache(request)
        if cache is None:
            cache = self.writeCache(request)
        # Replay the stored origin headers on the response
        for h in cache.headers:
            name, value = h.split(': ', 1)
            web.header(name, value)
        return cache.data

    def readCache(self, key):
        # Returns None when the object is not cached yet
        return Cache.get_by_key_name(key)

    def writeCache(self, request):
        # Fetch the object from the origin and store it in the Datastore
        url = origin + request
        response = urlfetch.fetch(url=url)
        if response.status_code != 200:
            raise web.NotFound()
        cache = Cache(key_name=request)
        cache.data = db.Blob(response.content)
        cache.headers = []
        for k, v in response.headers.iteritems():
            cache.headers.append('%s: %s' % (k, v))
        cache.put()
        return cache

if __name__ == '__main__':
    app = web.application(urls, globals())
    app.cgirun()
I use webpy to depend on the AppEngine SDK as little as possible.
I have almost finished the production version of this application and I am running some performance tests. The application is closed source for now, but I am going to release the source code in a few weeks. This version will include:
  • Fetch from Memcache (about 10 times faster).
  • Headers forwarding.
  • Read "Cache-Control" and "Expires" to define a TTL (rfc 2616).
  • Multi origins (according to url mount points).
  • Other small features (force TTL, ignore query string, etc...).
It is important to note that AppEngine does not keep instances of your application running (your CGI is distributed and executed on demand). So this application has to start very quickly (no configuration file, no dynamic generation, etc...).
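As an illustration of the "Cache-Control" and "Expires" item in the list above, here is a minimal sketch of how a TTL could be derived from the origin's response headers. It is not the production code: the `compute_ttl` name and its parameters are hypothetical, it treats "no-store", "no-cache" and "private" as "do not cache" (a simplification), and it ignores "s-maxage". It follows the RFC 2616 rule (section 14.9.3) that "max-age" overrides "Expires".

```python
import re
import time
from email.utils import parsedate_tz, mktime_tz

def compute_ttl(headers, now=None, default_ttl=0):
    """Hypothetical helper: return a TTL in seconds from response headers."""
    if now is None:
        now = time.time()
    # Header names are case-insensitive.
    headers = dict((k.lower(), v) for k, v in headers.items())
    cache_control = headers.get('cache-control', '').lower()
    # Simplification: any of these directives means "do not cache".
    if re.search(r'\b(no-store|no-cache|private)\b', cache_control):
        return 0
    # max-age takes precedence over Expires.
    match = re.search(r'\bmax-age\s*=\s*(\d+)', cache_control)
    if match:
        return int(match.group(1))
    expires = headers.get('expires')
    if expires:
        parsed = parsedate_tz(expires)
        if parsed is not None:
            # A date in the past yields 0, never a negative TTL.
            return max(0, int(mktime_tz(parsed) - now))
    return default_ttl
```

A "force TTL" option, as mentioned in the last item of the list, would simply bypass this function and use the configured value instead.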
17 · 07

Gmail-Notify improvements and a modified behavior

Gmail-notify is a small program written in Python. I use it because it is light and it does just what I want. I am a regular user of the firefox extension gmail-notifyer and I was uncomfortable with the Gmail-notify behavior, so I wrote a little patch to make it behave the way I want. I also added the possibility to execute a command when a new message arrives. I use it to play a sound (aplay ~/sounds/mail.wav). I don't know your habits, but maybe you will enjoy my modifications. Here is a short description:
  • A click on the event box closes it.
  • A click on the tray icon opens the mailbox.
  • The mailbox is now opened over HTTPS.
  • A command can be executed when a new mail is received (like playing a sound).
To use my modifications, download my patch and apply it to gmail-notify-1.6.1.1:
wget http://garr.dl.sourceforge.net/sourceforge/gmail-notify/gmail-notify-1.6.1.1.tar.gz
wget http://zaphod.eu/pub/gmail-notify-1.6.1.1.patch
tar zxvf gmail-notify-1.6.1.1.tar.gz
(cd gmail-notify ; patch -p1 < ../gmail-notify-1.6.1.1.patch)
rm gmail-notify-1.6.1.1.tar.gz
Then run gmail-notify.
gmail-notify/notifier.py
