Implementing a Static File Producer in Different CMS/Frameworks

Intro

The static file producer is a concept developed by the Mir coders to meet the need for very resource-friendly content distribution. Mir has a very interesting way of distributing content. In Mir, there are two kinds of servers: the production server (which has the database, Tomcat, Apache, and executable Java code running as a web application) and the mirror servers (which run only Apache). The production server runs its producer every few minutes to crank out static HTML pages. These pages are then transferred to the mirror servers via an update script (currently rsync is usually used to do the actual transfer). The result is a high degree of redundancy: the system is incredibly scalable, and any single mirror server can be pulled from the pool of available mirrors without taking down the entire site (from an end-user point of view). An added bonus is that Mir is really easy for mirror admins to set up and is really low maintenance from their perspective.

Producer Features

Here are some basic features of a producer. The goal is to bring these functions to other CMSes.

  • The producer creates a static presentation of a specific part of the website (articles, images, start page, archive page)
  • The producer is able to create individual pieces of a website (only the newswire, only the comments, etc.)

Producer Basics

Here are some basics to keep in mind:

  • In the MVC model, the producer is the controller
  • To be more exact, the producers.xml, the producer configuration file, is the controller and the producer itself is the dispatcher (sometimes called the request handler)
  • The producer does not work based on a user request
  • The producer is not a cache. All cache systems out there are request-based, so they will NOT work as a producer.

Producers.xml

The cool thing about Mir is that the site can be reconfigured on the fly simply by changing the producers.xml file. This file serves as an XML configuration which defines content types, which jobs are available to the producer, and quite a lot of other stuff.

Why build a producer? What's it for? Is there any other way to do it?

The whole question of static site production from dynamic content is more difficult than it first appears. If you've got a newswire showing, for example, it's no big deal to tell whatever framework you're using to update the static HTML of the frontpage when a new article is published (this could even happen right after the database saves the article). That's when the trouble starts.

Let's say there are 50 articles showing in the newswire on the front page. Somebody has just published a new one, so that goes at the top of the wire - this means that one article also drops off the bottom of the wire. So far, so good. However, it also means that potentially every archive list HTML page in the site is now out of sync with what's in the database, and probably all of the RSS feeds are stale as well. If the article was published in more than one region or category, every page that the article was published on in the site (along with its associated archive pages) is also out of sync. The job of the producer is basically to periodically run through the site and generate any pages that need to be updated. Keep in mind that this also applies to any article text that somebody updated, any images that somebody uploaded into a feature, etc.
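To make the ripple effect concrete, here is a minimal Python sketch of working out which static pages go stale when a single article is published. The page paths, the `Article` fields, and the `pages_affected_by` helper are all hypothetical names chosen for illustration; this is not how Mir decides what to produce.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical article record; the field names are assumptions."""
    article_id: int
    categories: list = field(default_factory=list)


def pages_affected_by(article, archive_pages_per_listing):
    """Return the set of static page paths that are stale after publishing.

    `archive_pages_per_listing` maps a listing name ("all" or a category)
    to the number of archive pages that listing currently has.
    """
    # The front page, the global feed and the article page itself always change.
    stale = {"index.html", "rss/all.xml", f"article/{article.article_id}.html"}
    for listing in ["all"] + list(article.categories):
        if listing != "all":
            stale.add(f"{listing}/index.html")
            stale.add(f"rss/{listing}.xml")
        # A new entry at the top shifts every later entry down by one,
        # so every archive page of this listing changes.
        for page in range(1, archive_pages_per_listing.get(listing, 0) + 1):
            stale.add(f"archive/{listing}/{page}.html")
    return stale


article = Article(article_id=42, categories=["uk", "climate"])
stale = pages_affected_by(article, {"all": 3, "uk": 2, "climate": 1})
print(len(stale), "pages need regenerating")
```

Even in this toy setup, one article published in two categories touches more than a dozen files; on a site with thousands of archive pages the set grows accordingly.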

Most caching systems would simply blow out the cache at this point and reconstruct all of the pages (if you were able to keep track of what needed to be reconstructed). For small sites, this is probably OK. However, for Indy sites, it's quite possible to have thousands or tens of thousands of pages of archive listings, and we don't want to have to reconstruct all of these every time somebody publishes an article. So the producer can be configured to do some things every time an article is published (e.g. regenerate the front page of the site) and do other jobs less frequently (e.g. regenerate the archive pages).
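The idea of running cheap jobs often and expensive jobs rarely can be sketched roughly like this. It is a simplified illustration, not Mir's implementation: the `Job` class and `run_producer` function are hypothetical, and frequency is expressed as "every N producer runs" rather than being triggered by a publish event.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    """A named production job that is due every `every_n_runs` producer runs."""
    name: str
    every_n_runs: int
    action: Callable[[], None]


def run_producer(jobs, run_number):
    """Run the jobs that are due on this producer run and return their names."""
    due = [job for job in jobs if run_number % job.every_n_runs == 0]
    for job in due:
        job.action()
    return [job.name for job in due]


produced = []  # records what each job would have written
jobs = [
    Job("front_page", 1, lambda: produced.append("index.html")),
    Job("archives", 12, lambda: produced.append("archive/*.html")),
]

# Simulate 24 producer runs, e.g. one every five minutes for two hours.
history = [run_producer(jobs, n) for n in range(1, 25)]
```

With these assumed settings the front page is rebuilt on every run, while the expensive archive job only fires on every twelfth run.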

Is there another way to accomplish the same thing? It's an interesting question to consider, but it really is a difficult problem for big sites where we want to serve content statically. It might be possible to take a totally different approach and try to synchronize the databases and media between servers, but this would mean more complex configuration for each mirror server (currently they don't need their own databases). One benefit of such an approach, though, would be that it would be impossible to stop a site by knocking out its production server - keep in mind that if we lose a production server, then open publishing comes to a halt on the affected site.

The producer concept and CakePHP

If we decided to go with a producer for a CakePHP site, there would be some coding and architectural implications. Currently within Mir, the admin system is totally separated from the front-end content of the site, and for good reason - the mirrors don't have any admin systems, and aren't capable of processing any requests that require an admin system. So, although it'd be nice to include things like dynamic hiding controls directly on the page within a CakePHP indy site, which would only be visible to admin users, it would potentially not work because the mirror sites would have no way of telling that the user is an admin (remember, the mirrors are just dumb HTML sites).

It might be possible to get around this through a creative use of subdomains, as Mir does. Currently, most Mir sites are set up so that, for instance, http://www0.indymedia.org.uk is the production server, showing the most recently generated HTML of the UK site. In Cake, we could potentially run all admin functions on a production server, and set things up so that we could view the site on that machine directly from the database (allowing us to do dynamic-content stuff, like having a "Currently logged in as Yossarian" message at the top of the page, having extra UI controls available to logged-in users, etc).

The producer concept would probably have implications for any advanced "web 2.0" features that we would want to include. This is not to say they shouldn't be there, just that if we want to include a lot of snazzy features, we need to have a clear architectural distinction between stuff that's going to be served from the static mirror sites (probably articles and other media) and what would have to be served by the production site (potentially some community-based features which depend heavily on the context of the currently logged-in user to figure out how to present the site).

An example for CakePHP

Occam has thoughtfully provided a sample producer.php file for us to try out as a CakePHP producer. It's in our source code repositories here.

Some further words of wisdom from the pros

From occam (dealing with the first part of this write-up):

<occam> the problem i tried to describe is that most CMS do not really have a good seperation in the MVC model.. not the ones i had a closer look into
<occam> in most cases there is not really a "dispatcher", something like a main() or the main controller who takes care of all steps of the page generation
<occam> something like the "producer" in a dynamic way
<yossarian> you mean in order to figure out what needs to be produced?
<occam> yeah
<occam> in cake its the dispatcher
<occam> there you have at least one
<occam> but
<occam> the problem with cake is, that the dispatcher just hands over the controll to the views etc pp..
<occam> it is not acting as a controller of the process
<occam> if you would have such a clear structure, you could almos use the dispacther as a "producer"
<occam> also
<occam> what i wrote about the pieces
<occam> the thing i like about Mir is that you can generate pieces of a website
<occam> its very hard to do that in most cms'es
<occam> cake has some, how they call it.,..
<occam> Components
<occam> well, not really
<occam> hmpg
<occam>  /app/views/elements/
<occam> and as i said, with cake there would be no need to have a producers.xml... you could just write a controller
<occam> in fact, i think it is just a matter of the setup
<occam> you could even create shtml pages
<occam> and use the SSI stuff
<occam> but its kind of a hack
<occam> it could work like this... for example, you only want to generate the newswire, you would write a newswire controller, for this controller you write a short views/elements . then you simply call /newswire/show/ or with the producer and write it to a shmtl file..
<occam> you would transfer all the producers.xml logic into the controller and views..
<yossarian> so the producers.php file you supplied, where does that come into this?
<occam> you would have to make some seperation to not mess everything up
<occam> that file needs to run as a daemon
<occam> or by cron or so
<yossarian> is that separate from the idea of transferring the producer functionality into the controllers?
<occam> so, basic indy publishing example
<occam> producers.XML logic... its kind of seperated, yes
<occam> let make a example process
<occam> we have a basic cake function "publish", normal dynamic cake site with a controller and a view
<occam> or even more basic, we just have a typical article model
<occam> with controller, view and model
<occam> we just add one article
<occam> you can access the article by lets say /article/show/1.php
<occam> ok, now, at the moment where you write or update the article
<occam> you add a task to a producer queue
<occam> to (re)generate the article
<yossarian> so that would occur on both "create" and "update" actions
<occam> with the next call of the producer, it will look into the queue and produce the shtml file by calling the dispatcher with the url /article/show/1.php
<occam> yeah, depends on how you write it, but both makes sense
<yossarian> in Rails that could potentially be accomplished with after_create and after_update observers to clean up the code a little, i don't know if cakephp implements nice convenient observers liek that
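
The queue-based flow occam describes above can be sketched as follows. This is a framework-neutral Python illustration, not the producer.php file mentioned earlier and not CakePHP code: `ARTICLES`, `QUEUE`, `save_article`, `dispatch` and `run_producer` are hypothetical stand-ins, and the URL is simplified to /article/show/1.

```python
import tempfile
from collections import deque
from pathlib import Path

# Hypothetical in-memory stand-ins for the database and the producer queue.
ARTICLES = {}
QUEUE = deque()


def save_article(article_id, title, body):
    """Create or update an article, then queue its page for (re)generation."""
    ARTICLES[article_id] = {"title": title, "body": body}
    url = f"/article/show/{article_id}"
    if url not in QUEUE:  # avoid producing the same page twice in one run
        QUEUE.append(url)


def dispatch(url):
    """Stand-in for the framework dispatcher: map a URL to rendered HTML.

    No HTML escaping is done here; this is an illustration only.
    """
    article = ARTICLES[int(url.rsplit("/", 1)[1])]
    return f"<h1>{article['title']}</h1>\n<p>{article['body']}</p>\n"


def run_producer(output_dir):
    """Drain the queue, writing one static .shtml file per queued URL."""
    written = []
    while QUEUE:
        url = QUEUE.popleft()
        target = Path(output_dir) / (url.strip("/") + ".shtml")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(dispatch(url), encoding="utf-8")
        written.append(target)
    return written


out = tempfile.mkdtemp()
save_article(1, "First post", "Hello")
save_article(1, "First post", "Hello, edited")  # update before the next run
written = run_producer(out)
```

In a real setup `run_producer` would be triggered by cron or a daemon, as occam suggests, and the queue would need to live somewhere persistent such as a database table.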

From zapata (on the whole subject of "producers"):

<Zapata> for me there is a separation between the idea of generating a static site and the producers.xml
<Zapata> (the first one was invented by others, the latter one by me ;-)
<Zapata> also
<Zapata> it's important to note the downsides of the mir implementation of producing:
<Zapata> * There's no automatic way of determining what needs to be produced
<Zapata> * And relatedly: there's no automatic mechanism to delete stuff
<Zapata> and of course
<Zapata> the producers.xml format isn't very handy, there's no sandbox mode, there's no documentation, ...
<Zapata> there's no easy interaction between dynamic and static parts of a mir site
<Zapata> btw for me the producers.xml concept is about customization
<Zapata> the static file generation is a scalability feature