An update on our Drupal conversion

We've pushed into mid-November the relaunch of Jacksonville.com on our new Drupal-based news site management system. We're not concerned about the technology, but we are concerned about the people. Radically changing production processes in the middle of the presidential election didn't seem to be a really bright idea.

I spent most of last week in Jacksonville teaching classes on how to use the new system, and I'm back again this week to help in any way I can. Joe Allen-Black, Jonathan Bennett and others on the team have been doing a great job teaching basic sessions while also continuing to work on site development and their day jobs. Just about everybody in the Times-Union's newsroom has been through at least a 90-minute training session, and many have had advanced training beyond that.

Are we overtraining? It's worth keeping in mind that most of the folks in the newsroom have never had any opportunity to work on the Web, being locked out through organizational and technological barriers.

One surprising thing is how much time we've spent discussing some parts you might think were pretty simple and straightforward.

Take the basic story ("editorial node") editing form, for example. It's just a news story. How complicated could it be?

Well, it turns out that the development team has blended in some very cool enhancements, such as an Ajax-driven (JavaScript) tool for finding and creating bidirectional links with related content and a tool for creating stock data links if the story is about a publicly held business. Embedded video handling is very important -- Jacksonville.com actually posts more video than the local television stations. Photo handling and the interaction of the Web system with the newsroom's print-focused legacy DTI content management system are quite sophisticated.

We're making very heavy use of some common Drupal tools that have a steep learning curve but a big payoff: the highly flexible Taxonomy system, the Views family of modules, and the Panels family. Part of the training challenge is equipping editors and senior managers with a deep enough understanding of those tools that they can actively manage the site's future growth and direction.

Back at the server farm in Augusta, a team at the network operations center is taking all this very seriously as well. We're moving from a model in which the server generally worked with flat HTML files to one in which the site is heavily dynamic, with pageviews created on the fly by PHP scripts, potentially highly personalized, and much more expensive in terms of computational horsepower. So there's a whole new cluster of Linux-based servers with an interesting configuration.

Out in front is a server running Squid (a caching server) and Squirm (a redirection/proxy server). We've used Squid for years as an accelerator in our classified systems. This new role is different: Squid/Squirm are going to let us blend new Drupal site content with old flat-file content coming from an array of separate servers, hiding all the details from the outside world, and (importantly) preserving old URLs.
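To make the front-end routing idea concrete, here is a minimal Python sketch of the decision a redirector makes for each request: leave the public URL alone, but pick the internal server that answers it. This is not Squirm's configuration syntax, and the backend hostnames and path prefixes are hypothetical placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical internal hosts and legacy path prefixes, for illustration only.
LEGACY_BACKEND = "http://legacy.internal"
DRUPAL_BACKEND = "http://drupal.internal"
LEGACY_PREFIXES = ("/archive/", "/static/")


def choose_backend(url: str) -> str:
    """Return the internal URL that should serve a public request."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Old flat-file paths keep working by being sent to the legacy server;
    # everything else goes to the Drupal array.
    if path.startswith(LEGACY_PREFIXES):
        backend = LEGACY_BACKEND
    else:
        backend = DRUPAL_BACKEND
    query = f"?{parts.query}" if parts.query else ""
    return f"{backend}{path}{query}"
```

The visitor only ever sees the public address; the mapping to a legacy or Drupal box stays behind the proxy.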

Behind Squid/Squirm will be the old Jacksonville.com server and an array of new boxes running Drupal. One of the strengths of Drupal is that it's designed for linear scalability -- you can just throw more hardware at it when your traffic grows. Drupal also makes use of a lot of fancy database query caching internally, and we're adding Memcached to take that load away from the database and put the caches directly in system RAM.
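The general pattern behind putting a RAM cache in front of the database can be sketched in a few lines of Python. This is not Drupal's cache layer or a real Memcached client; `DictCache` is a hypothetical in-process stand-in that only mimics get/set with an expiry time.

```python
import time


class DictCache:
    """Hypothetical in-process stand-in for a Memcached client."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired entries are dropped and treated as a miss.
            del self._items[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._items[key] = (value, time.monotonic() + ttl)


def cached_query(cache, key, run_query, ttl=60):
    """Return the cached result for key, running the query only on a miss."""
    value = cache.get(key)
    if value is None:
        value = run_query()
        cache.set(key, value, ttl)
    return value
```

Repeated requests for the same key within the expiry window never reach the database, which is the load the paragraph above describes moving into RAM.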

Then behind Drupal is yet another array of servers: a MySQL database cluster. MySQL is especially good at "read" (as opposed to "write") performance, which is why Google uses it to power its ad-delivery engine. We have Drupal sending all writes to a master server, while reads can come from satellites.
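The master/satellite split can be pictured as a small routing rule. The Python sketch below is an illustration only, not how Drupal or MySQL implements it; the `QueryRouter` class and the server names are hypothetical.

```python
import itertools


class QueryRouter:
    """Send SELECTs to read replicas in rotation, everything else to the master."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def pick(self, sql: str) -> str:
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # INSERT, UPDATE, DELETE and anything unrecognized go to the master.
        return self.master
```

One design caveat with any scheme like this: replicas can lag slightly behind the master, so a read that must see a just-completed write is usually safer on the master.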

All of this is designed to serve all the Morris newspapers as we go forward, so scalability is critically important.

Notably, every piece of software I've mentioned -- Linux, PHP, JavaScript, Drupal, Squid, Squirm, Memcached, MySQL -- is open source.

One (legitimate, I think) criticism of "old media" companies is that they've been unable to get their technology organizations out of the 20th century. While both "two guys in a garage" Web startups and new-media giants like Google are moving quickly with open-source tools, corporate IT departments tend to look to familiar old vendors and even closed systems for solutions. We may make some new mistakes, but we're not going to make that one.

Comments

Hello Steve, Very interesting! Long-time reader, first-time caller. About that quite sophisticated photo handling and interaction with the newsroom's print-focused legacy DTI content management system, will print photos and captions from the DTI system be published to the Web, and if so, will you be able to make that transfer without a human watching over it? I'd love to see that. Another print-to-Web question: Once a story is pulled from the DTI system and published to the Web, what's the plan for how long that URL will stay valid? Also, is there the capability for "reverse publishing" from Drupal into the DTI system? Regards, John

Yes, basically all images that are associated with a story in DTI are exported automatically. The images are pushed to the Drupal server, and an XML version of the story is passed to Drupal. The XML includes all the image references. We're using Imagecache on the Drupal side to auto-generate resized versions on the fly. Any given image might be displayed about 300 pixels wide in a JavaScript-driven slideshow (embedded in the story template), or in two larger versions using the Thickbox DHTML layering effect, or placed in a promotional slot on a section front or the homepage. That last size is the tough one, because our section front design spec calls for some inflexible sizes and images have a habit of being randomly proportioned. An image that won't fit in a promo slot is automatically cropped by Imagecache, and if an editor doesn't like the crop, there's a tool to choose a different crop. However, the right way to do it is to plan properly and make sure the first-listed image is suitable for a section front promotional slot in the first place. There is one oddity to our system: We're exporting NITF from DTI, then feeding it into some proprietary Morris technologies, then generating a custom Atom-based feed for Drupal. The Drupal FeedAPI system proved to be inadequate for dealing with multiple image references, so we wrote a custom Drupal loader. We may streamline some of those processes in the future. Once a story is published on the Web, we're not planning on removing it. The URL -- which is search-optimized using custom Pathauto rules -- should remain active indefinitely. (The Yahoo ad-network deal raises the economic value of long-tail pageviews and search-driven traffic on newspaper sites.) Regarding "reverse" publishing: We're not building an automated pipe back into DTI at this time.
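For readers wondering what "automatically cropped" to a fixed promo slot involves, here is a generic center-crop calculation in Python. It is a sketch of the common technique, not Imagecache's actual code; the function name and the sizes used with it are hypothetical.

```python
def center_crop_box(width, height, slot_width, slot_height):
    """Return (left, top, right, bottom) of the largest centered region
    of a width x height image that matches the slot's aspect ratio."""
    if min(width, height, slot_width, slot_height) <= 0:
        raise ValueError("all dimensions must be positive")
    # Compare aspect ratios by cross-multiplying to stay in integers.
    if width * slot_height > height * slot_width:
        # Image is proportionally wider than the slot: trim the sides.
        crop_w = height * slot_width // slot_height
        crop_h = height
    else:
        # Image is proportionally taller (or equal): trim top and bottom.
        crop_w = width
        crop_h = width * slot_height // slot_width
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)
```

The cropped region would then be scaled down to the slot size. A centered crop is only a default guess, which is why an editor-facing tool for picking a different crop still matters.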

Very cool. To follow up on the DTI associated-image export, are the copy desk-edited photo captions that ran in the paper mapped to the print-published images (and exported too)? How does the export deal with the odd case of one caption sandwiched between two photos (and the caption references both photos)? We've struggled with that. So once a story goes on the Web with an indefinite URL, there's got to be something clever that goes on with the request dispatching so you don't end up with story databases that have to expand to theoretical infinity. How do sites deal with that? I've often wondered. For the Web page templating, are you using Smarty or somesuch? Thanks for your time.

I don't use DTI, but I think we've set it up so that there's a 1:1 relationship between photo and caption. We haven't worried about database size; MySQL is easy to expand, and Drupal's node IDs can handle up to a billion items. As for templating: We use PHPTemplate, the standard engine shipped with Drupal.

Hi Steve: Thought I'd try your blog rather than e-mail since others will probably also be interested in this. Lately we've been using personal iPhones and Windows Mobiles for news gathering. For example, CoverItLive's Windows Mobile app is killer. It allowed me to post cell pics and entries directly to the live chat without going through e-mail, SMS or some other middleman during a recent Obama Jacksonville rally. This week Rand Miranda filed numerous SWAT situation story updates via his WordPress (our current breaking news platform) iPhone app rather than lugging a laptop and 3G card to the scene. Does Drupal have equivalent apps or mobile platforms for iPhone, Windows Mobile and Blackberry that I can spread the word on? While sophisticated mobile browsers such as Opera, Skyfire and Safari in many cases make it possible to use full-on Web-based software on smart phones, they are often still too clunky to be practical. As one of Jacksonville.com's resident news guys, I'm VERY excited about what Jacksonville.com and Morris Digital Works have put together using Drupal for the new Jacksonville.com. Bill Bortzfield Content Manager Jacksonville.com/The Florida Times-Union

The WordPress app is probably hardwired to work only with WordPress. However, I've used half a dozen desktop clients to post blog items, pages and stories into Drupal without a Web browser. Drupal is very standards-based, and implements an XML-RPC interface that supports the Blogger API, MetaWeblog API and most of the Movable Type API. This should mean that properly written client applications can post directly into Drupal, if the administrator turns that functionality on. iblogger may be a solution for iPhone users, but we would need to do some testing. I don't know whether it would be limited to the blog content type, or if it's possible to support editorial nodes. Our editorial node type is highly enhanced and blog clients may not know what to do with it.
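To show roughly what such a client sends, here is a small Python sketch that builds a MetaWeblog `metaWeblog.newPost` request body with the standard library, without contacting any server. The helper name and all argument values are hypothetical, and what a given site expects for the blog ID is an assumption to check against that site.

```python
import xmlrpc.client


def build_new_post_request(blog_id, username, password, title, body, publish=True):
    """Return the XML-RPC request body for a MetaWeblog newPost call."""
    # MetaWeblog carries the post as a struct; "description" holds the body text.
    post = {"title": title, "description": body}
    params = (blog_id, username, password, post, publish)
    return xmlrpc.client.dumps(params, methodname="metaWeblog.newPost")
```

A real client would send the same call to the site's XML-RPC endpoint, for example through `xmlrpc.client.ServerProxy`. Whether a custom content type such as our editorial node accepts it is exactly the part that needs testing.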

I have tried Drupal, but I never could understand it; all of it was too confusing for me. So I shifted to Joomla, which is much easier and has a larger community. But I've heard that Drupal can handle heavy traffic as compared to other scripts. I'm not at that point yet, so I'll stick with Joomla till then :)