Tools you can’t miss when starting a project #3 – document repository

Last week I decided to help what I think is a very good idea (more on that, I’m sure, in the future) try and get itself off the ground. In my mental preparations for this task, I began to wonder what we were going to need at the very beginning stages that would allow our work to proceed. I came up with this list:

  • Collaborative Software
  • Version Control System
  • Document Repository (with versioning abilities)
  • Issue Tracking Software

In the past couple of weeks I’ve talked about Collaborative Software as well as making sure your Version Control is up and running from the get-go. Next on the list is the newest addition to my bullet points, and the one I personally know the least about. Although it’s the tool I’ve dealt with the least, it’s also the one I’m quickly becoming a huge, huge advocate for requiring.

Here’s the problem as I see it. When a project is first starting there is a lot of communication going on surrounding lots of decisions that are hard or impossible to change once a project is in motion or growing. These communications take the form of email, shared docs, official paper thingies, napkins, business cards, pages ripped from magazines, etc. Not only should you be documenting these decisions to shed light on them down the road, but you should also be versioning them whenever possible to show future contributors or employees how they evolved over time during these heady first days of your new endeavor.

This is where most groups fall down. This tool doesn’t show up in an infrastructure until way further down the road, if it shows up at all. Those important initial documents, outlining visions and values and processes and structures, either end up as static documents or are lost altogether. I’ve seen groups tackle this two different ways in the past, and both involve another crucial tool acting as a stunt double.

First I saw a group try to shoehorn their collaborative software into this tool’s feature set. If you have a robust collaborative software application like Confluence it can work, but it’s going to be a huge effort to keep up with all of the attachments and mixes of media. It just wasn’t meant to be a document repository. Unfortunately this group was trying to use Trac, and it was a massive, confusing failure.

The second attempt I’ve seen was to use a version control system (in this case Subversion) to handle the document repository duties. Technically, this can also do the job: it CAN keep versioned copies of just about anything. However, problems arose as the company grew. First off, the hierarchy of documents was amazingly complex in structure and permissions requirements, so managing SVN was significantly more complex than just maintaining a codebase. Secondly, the IT staff had constant trouble as CPAs and Project Managers and Sales Reps and Executives and everybody else tried to navigate and maintain this maze on a daily basis. This setup is still limping along, but begging for a better solution.

The best tool I’ve seen so far to handle something like this is Alfresco, by Alfresco, Inc. It tries to be a one-stop shop for several tools (collaboration, records management, web publishing, etc. according to their website), but where it really excels is Document Management. It’s simple, well thought out and it just works. There are, however, a few drawbacks.

  1. It has a community and an “Enterprise” edition. The Enterprise edition is a completely different animal based off of the publicly available source code. I just don’t like those fauxpen source models. Never have…
  2. Technically, there is a weak link in their application chain. Whenever you view a document, you get a Flash-based preview on the page, which is great. It uses a headless OpenOffice daemon to open the file and convert it to a PDF. It then uses pdf2swf to convert the PDF to a Flash object to display on the page. Clever, to be sure. And about as stable as mashed potatoes. We actually have a Zabbix check that automatically restarts OpenOffice when it randomly eats itself on this server.
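
For anyone who wants the same kind of safety net, here is a minimal sketch of that restart-on-failure idea in Python. This is not our actual Zabbix check. It assumes the headless OpenOffice daemon listens on a local TCP port (8100 here is an assumption, so check your own install), and the restart command and service name are hypothetical placeholders.

```python
import socket
import subprocess
from typing import Callable, Sequence


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def ensure_running(is_alive: Callable[[], bool],
                   restart: Callable[[], None]) -> bool:
    """Call restart() only when is_alive() reports failure.

    Returns True if a restart was triggered, False otherwise.
    """
    if is_alive():
        return False
    restart()
    return True


def restart_via_command(command: Sequence[str]) -> None:
    """Run a restart command and raise if it exits non-zero."""
    subprocess.run(list(command), check=True)


def check_office_daemon(
    host: str = "127.0.0.1",
    port: int = 8100,  # assumed port, not taken from any real config
    restart_cmd: Sequence[str] = ("systemctl", "restart", "openoffice-headless"),  # hypothetical service name
) -> bool:
    """One watchdog pass: restart the daemon if its port is not answering."""
    return ensure_running(
        lambda: port_is_open(host, port),
        lambda: restart_via_command(restart_cmd),
    )
```

Calling check_office_daemon() from cron or from a monitoring agent every minute or so would give roughly the behavior described above. Note that a port check only catches a daemon that has died outright. One that is still listening but hung would need a deeper check, such as a test conversion.
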
So I don’t think I’ve found the best tool out there for document management. If anyone knows of anything else, I’d love to hear about it. But I do think that having this solution in place as a project gets off the ground is critically important to help document that process and inform people down the road.

Setting up a new project – Tool #2 you can’t miss from the start

Last week I decided to help what I think is a very good idea (more on that, I’m sure, in the future) try and get itself off the ground. In my mental preparations for this task, I began to wonder what we were going to need at the very beginning stages that would allow our work to proceed.  I came up with this list:

  • Collaborative Software
  • Version Control System
  • Document Repository (with versioning abilities)
  • Issue Tracking Software

I spoke about collaborative software, giving my basic thoughts and a few examples of it out there, last week.  This week I’ll take a little closer look at Version Control. I’ll take a look at why it’s needed (I think) at the earliest stages of a project, and some examples of the major players in the field.

Why is it needed when you start a project?
Even though you may not be producing a lot of code in /trunk in the early days of a project, having that platform ready is really a no-brainer. Different types of version control have different hurdles to get them set up and ready for use. While we won’t go into that here, I will say that just about any new project I start working on uses Git, a distributed version control system (DVCS), hosted on GitHub. While the repository may be empty at the beginning, having your Version Control System set up from the outset means your development team is prepared quickly and your repository is ready to roll as soon as the first line of code is ready to be checked in.

The Primary Options

There are a TON of options out there for Version Control, and the list is growing all the time. For the purposes of this post, I’m dodging anything that’s not open source and/or is very Windows-centric (the two usually go hand in hand).

CVS (Concurrent Versions System)
CVS is the granddaddy of all version control. It’s got a pretty basic feature set, and is not really used for new projects by anyone that I know (although that doesn’t mean it’s not, it just means that I’m not all-knowing). I have only ever run into it as a legacy system.

SVN (Subversion)
SVN was started in 2000 by CollabNet and is now an Apache Software Foundation project. It was designed to be a mostly-compatible replacement for CVS. Its user base is massive, and it is the prototypical centralized version control application.
Thousands of huge projects use and/or support it, and it’s easily installable (both the client and the server) from any Linux distribution I’ve ever seen.
I’m not going to get into the years-long debate of centralized vs. distributed version control systems. Google has millions of opinions on it already. What I will say is that:
  1. centralized version control has less of a learning curve and is easier for more casual users to grasp and operate
  2. most of my experience is in Subversion
  3. any new project I spin up will use a distributed version control application (Git, primarily)

Git (distributed version control)
Git has been around for some time now, as has its largest online presence, GitHub. Git actually goes way beyond being only a version control system. It’s really a pretty amazing framework for a lot of things. On top of that, it’s FAST. Like, crazy fast.
GitHub currently has ~1,000,000 users and ~2,000,000 repositories. Git itself was created by Linus Torvalds, of Linux fame.

Bazaar (distributed version control)
Bazaar is sponsored by Canonical, aka Mark Shuttleworth’s hobby. I’ve never actually used or admin’d an installation of Bazaar, but it’s always in the conversation when picking a distributed VCS. If someone wanted to offer up some opinions on it, I’d love to hear them / put them up on here.

Mercurial (“Hg” – distributed version control)
Mercurial is an open source project that is apparently sponsored by lots of well-known organizations (Google and Atlassian, among others) and is actually used as the version control for Firefox. I’ve only ever seen one poor installation of Hg, which attempted to be configuration management for a pretty awful collection of servers, but it was pretty speedy, and a stable application even when being woefully misused.

That gives at least a gloss coat of version control options out there.  Of the tools needed to start a project, it’s the least needed “on Day 0”, but you really can’t start a project without knowing what you’re using to hold the source code so I feel compelled to include it in this list.

Next up: Document Repositories (ugh)