Interview with Karl Fogel of Subversion and CollabNet





Introduction

Karl Fogel is a founding developer of the Subversion project. Subversion is sponsored by CollabNet, and as a CollabNet employee, Karl describes himself as the CollabNet-to-developer liaison. In the following, Karl explains the inception of the open source Subversion project, what it has required to build its community, and what he has learned in order to successfully maintain it. Karl's vantage is interesting not just from the perspective of managing such a community but also because the Subversion project itself is one of the required sorts of software technologies used in open source development.

Subversion is a type of software configuration management (SCM) tool known as a version control system. Tools of this kind are important for letting developers collaborate on software projects. Subversion is part of the tigris.org community's focus on building collaborative software development tools. CollabNet provides enterprises with distributed software development solutions, which are used by companies such as Sun Microsystems, HP, and Barclays Global Investors to help coordinate development teams spread out around the world.

Part III of the Concerted Disruption, Climb Aboard series.

We started Subversion about five years ago, and I think it is a little bit different from a lot of open source projects because we started with the goal of replacing a specific piece of open source software. We were trying to replace CVS.

You had a good reference point.

We had a great reference point and also that saved us from a lot of arguments about what should and shouldn't be in our first release. We could say that if it's in CVS it should be in our 1.0 version, if it's not in CVS it doesn't need to be. There was an inherent controversy reduction substance in our projects—at least before 1.0. Now we get into all those discussions that we put off. But we have a foundation/relationship already built with all these people that makes it a lot easier to do that because they all worked together to get to 1.0.

As to how we got those developers. The numbers we have right now are roughly thirty full committers—people who can commit anywhere in the source code, thirty partial committers—people that just do documentation fixes, fix support scripts, or something like that but do not have commit rights in all the code. Of those thirty full committers, I'd say roughly fifteen are really active on a day-to-day basis. You get some others that come flying in like Han Solo every now and then—they fix a bug and then they go out and you don't hear from them for a few months.

The way we founded it was mainly word-of-mouth. We knew the CVS space pretty well, we started contacting those people, they talked to their friends, and pretty soon people just showed up. We actually held physical, open-to-the-public design meetings when we began the project in San Francisco. Some of those people are still with the project today. But you know, one of our best committers is in Slovenia and he certainly didn't come to those design meetings. But we wouldn't be where we are without him.

Could you please clarify your role in the project?

I guess you could call it, founding developer. CollabNet only employs somewhere between three and four of those committers. We don't all work 100 percent on Subversion all the time. Somewhere between three and four is accurate. My role was mainly—you know I had a lot of experience working with open source projects before, and in particular with CVS, which helped to get me involved with version control—it was sort of to set the tone at the beginning of the project—a CollabNet-to-developer liaison when necessary, although there haven't been that many conflicts, we haven't needed a liaison that much. My role is also to write code.

It's hard to put my finger on it exactly but when you have a bunch of volunteers, the main currency that's going on is attention. If one of them does something, does somebody notice? They're going to do more things if somebody notices. The main part of my job that isn't coding, it's noticing. When someone makes a change I always review it, even if there is nothing, no negative comments to make about it. I'm talking about the commit e-mails now, when somebody makes a change, an e-mail gets sent out to all the developers showing the change. So usually to review it, you respond to that e-mail quoting positive things, why did you code this way, etc. and the real point of that review is partly code quality but it's also partially so that people know that attention is being paid to what they do. I'm pretty sure that if we didn't do that consistently, there would be fewer developers and they wouldn't be as active.

Do you think that's critical? Other people have said paying attention to developers' contributions is at least at the top of the list of things to do to maintain their interest. Would you agree?

Totally. If you ever talk to some open source developer and they say, "well I just want to get a working piece of software out there, that's what I'm in this for," don't believe them. That's not human nature. People working in groups want reinforcement from the group and from particular members of the group.

What are some other features like that, which you think are very important for maintaining developer interest?

Aside from other human beings paying attention to them, and responding, like taking them up on good suggestions and code and things like that, there are principles in infrastructure I think. One of them is, people can deal with any consistent amount of low-level annoyance and administrative overhead, but if you have a kind of peaks and valleys set up where there are high obstacles that they have to get over in order to get to the lowlands, then they don't do it.

A good example is OpenOffice.org. I mean, OpenOffice.org is great software but it's maybe an atypical open source project; if Sun weren't paying a lot of the developers, I don't think anything would be going on over there. Not because it's bad code but because you need a PhD in building the software just to compile it. A normal human being cannot download OpenOffice.org and build it. They have a kind of huge obstacle there and that will always be in the way of people hacking on it and fixing bugs. If you want people to join in a kind of low-overhead, low-initial-commitment way, things need to be made easy for them, which means that someone else needs to make a big upfront effort in structuring things to be that easy. Software won't be easy to build out-of-the-box unless someone takes the time to do it.

And that applies to other infrastructure as well, like the bug tracker, the mailing list, the web site in general, the version control system, the more specialist insider knowledge you have to have to use the stuff, the fewer developers you get. Because people find a bug in your code and they want to go fix it, but if they get stopped five minutes into that process by something that looks like it's going to take a lot of investment, then they'll just give up and go do something else. It's easier to live with the bug.

Would you say that's one of the goals in developing—well a lot of what I see on tigris.org is facilitating that ease.

Yeah, obviously the project infrastructure can't solve all of the problems. For example, you can run your project in the CollabNet environment or in the tigris environment, but that's not going to make your software easy to build, that just has to do with how you structure the code. Things like the issue tracker and having it linked up with the same user accounts that are in the version control system, things like that, those are basically about making things easy for people. You have one password that you use everywhere, if you send a mail you can be guaranteed that there's going to be a link in the archive to that mail later that you can then paste into the issue tracker to make it easy for someone to follow that discussion etc.

When you say to make it easy for someone to follow the discussion, that's an instance, I would think, of how it facilitates the social aspect of the development community. What are some other areas that you think socially the software comes into play?

It's hard for me to make a clear boundary between social and technical. There are some things that are definitely social, I mean if you say something on IRC and someone's feelings get hurt, that's clearly not a technical issue but a diplomacy issue. But things like making it easy for every piece of information in the project to have a canonical reference, for example a URL, or a mailing list message number, or an issue number, something like that, those are partly social; I mean when you're talking you need to have those handles. You want to be able to say issue number 908 and people just know where that is and what you're talking about. But it's also technical in that nobody could have a technical discussion if they didn't have these abbreviated handles to work with all the time. It's, you know, the limitations of the human brain.

You mentioned that one of the things for example, Subversion had going for it, was that you had this reference point going back to CVS. Do you think that in general it's best to start a community of developers around a project with an existing reference like that? Or do you think it's easier to build that community with something completely new—a totally different idea?

I don't know. It might just be that I don't know or it could be that it's too early in general for us to know the answer to that question. If you ask again in ten years we might have some idea.

Too early in the sense of open source development in general? Or just from your experience with this particular project?

I meant too early with open source in general but if you look at it, it could be that people who have observed wider trends may have an answer for that. My sense is that it's not that you get fewer or more developers for a completely new thing, it's that you get a different kind. We tended to get people who knew CVS, hated it, and wanted to solve their problems. They were after a very specific thing and even now there tends to be a kind of conservatism on the Subversion development list, where you pose some completely outrageous new feature that really would save the world but people are like "c'mon we lived for so many years without that, is it really part of our core mission? Is it worth the trouble? What else will it disrupt if we do that?" Whereas if either Subversion were really a radical design or we were just a whole different kind of project that had been something new right out of the box, probably we'd have a different type of developer and people would be less resistant to new ideas, but on the other hand you know, it might take a lot longer to get anything done and shipped out the door. I'm not saying that true, blue-sky projects never succeed, but the potential failure paths are of a different nature for that sort of project than they are for ours.

Do you continue to hold physical design meetings?

Not really because now they're [developers] spread all over the world and it might make people feel left out. We might eventually have a conference but we'd have to alternate it between Europe and America if we did. We couldn't just hold it in Silicon Valley every year because a lot of our developers are in the UK and Europe. We even have one guy in a little cabin on a hilltop in Taiwan somewhere.

How do you keep people interested now that they are working on the project and it's well-established? How do you keep people from getting bored?

Part of what I do is try to figure out what is making each developer tick. There are some people who are really about maintenance and fixing bugs and they're not interested in new feature development. And there are other people who are interested in anything as long as they get to be the leader of that effort when they do it, they don't want to be involved in things where someone else is really driving and they're just helping. For each developer you have to watch their e-mails and the kinds of changes they make in the software and the sorts of activities they get involved in. You know, first ask yourself the question, "do I want this person to stick around, do we want this person to stick around, and if so what steps should we take to keep them from getting bored or disenchanted?"

It can be something like when you run across a really hard problem you might specifically go to that person—you might mail a public list but if the topic of your mail addresses that person by name and says, "Fred, I'm having some trouble figuring out how to work this, can you tell me your thoughts on the matter?" and then you ask some technical questions. What's really going on is, yeah you're asking for help and it will be useful when you get it but it will also be reaffirming Fred's idea of his own position in the project as the go-to guy for that area. When someone is being treated like that, they don't want to leave unless they really have something better to do. Or there is some specific reason why they get disillusioned.

Every communication that takes place has two or three things going on. It could be a technical request for information but it's also a confirmation of someone's position—it also has psychological effects and those are sometimes as much the driving purpose of the communication as the technical ones. Have I been sufficiently Machiavellian for you?

Can you tell me more about Subversion, tigris.org, and CollabNet?

Tigris is a community of projects that are for the most part centered around the idea of running projects or making software projects work better. Its ambition is to be a little like SourceForge but without the 80,000 failed projects.

Since your experience is more with Subversion and CollabNet—could you talk more about those relationships?

It's a funny situation, I mean CollabNet definitely started the project and all the developers are aware of that. If you ask any of them who founded Subversion, they would say CollabNet. At the same time, CollabNet never really gets to exercise any great degree of possessiveness over the project and we certainly can't say to the developer group, "OK, we're going to release version 1.4 on Friday, September 23, 2005, so let's all get to work."

CollabNet has slightly more influence than the number of developers it employs because it provides the hosting infrastructure and it has a well-regarded history. That's a kind of moral capital that could be used up. The other developers are aware that CollabNet is using Subversion as part of its own product and while they wouldn't tolerate us making changes to it solely to support that product (which is not itself open source so it's not like those developers would get the benefit of it), they do view our use of Subversion as a good source of information as to how people are using Subversion. We have customers that tend to have larger projects with more users, more files, larger files, and deeper directory structures than the typical Subversion user. Often when a scalability problem comes up, it's noted first by CollabNet's own customer relations people, well, by CollabNet's customers first, then it goes through our engagement person, and then it filters its way through the issue process to one of us. We take it to the public community and they've been made aware of this bug that they otherwise wouldn't have found out about. They see the value of having CollabNet make corporate use of Subversion, and nowadays CollabNet isn't the only entity doing that but for a long time we were, and I think we're still the only one that has done real performance and scalability testing and made the results public, so providing good real world usage data like that is a point in CollabNet's favour.

The greatest testing source is the user community in terms of finding bugs but in terms of scalability issues, then CollabNet's customers are really driving it. There are some other large installations of Subversion out there and we get bugs from them too, so it's no longer just us. But you know the development community does know that there's this channel of information coming from our customers through the public issue tracker. We don't show the actual bug report, which is internally issued by the customer, because that's private data, but we take the relevant parts of that and copy it into a public issue. Thereafter, the internal issue just refers back to the public issue and tracks the public issue's progress.

CollabNet does have a practice for its own clients. How do clients benefit from CollabNet, how is it helping them facilitate development as opposed to installing Subversion themselves?

Subversion by itself doesn't have an issue tracker, doesn't have a mailing list, doesn't have integrated user account management, it's just a version control system. So what people are getting from using the CollabNet Suite of products, the entity now known as CollabNet Enterprise Edition, is that they get a unified user account management system and various other forms of integration between the version control, the bug tracking, the mailing list, their file sharing area, things like that. And they don't have to manage it. They can just say, press a button, we're starting this project, and fill in the fields, and the thing is set up, it goes and CollabNet's just managing it. Although there are possible scenarios in which they could manage it too.

Suppose you're starting a brand new company, you want to build an enterprise application, what's your recommendation to get this off-the-ground in terms of attracting that developer community and facilitating their collaboration on the project?

I think the most important thing is ease of entry and the appearance of credibility. I mean, credibility is always this weird sort of positive feedback loop where you know you have to look like someone who's taken seriously in order to be taken seriously. And so making the web site look well-organized and professional, having all the information developers would want easily accessible and organized in a way that's oriented toward them, is important. Specifications documents, requirements, that sort of thing, on-line and in formats that open source developers are likely to prefer such as plain text or OpenOffice.org.

And make the license clear. Make the commitment to open source clear very early. If people get the sense that the company is dithering around about whether they really want to open source it or, you know, if the license is such that the company could take everything in-house after getting a lot of contributions and it looks like they might do that, then people will be wary of contributing. But if you make it clear that the stuff is really going to be developed out in the open that comforts people.

I'd say the big thing is to avoid having any particular single high hurdle at the beginning, whether it be building the initial version of the software or wrapping your head around the requirements or whatever it is. I'm trying to coin a memorable phrase for this principle but I haven't got one yet. Maybe you can come up with it. The idea is that the amount of information somebody gets out of the project should be at all times linearly proportional to the amount of effort they're putting in. So if you spend five minutes browsing around a web page you should get five minutes' worth of information out and it should be the most important ten things about that project. If you like what you see and you put in another half hour, then the reward level has to stay at that level. It can't be that it looks interesting for the first five minutes but then they spend the next half-hour poking around and find all of a sudden that it's hard to figure out where stuff actually is. All efforts have to be rewarded, then people stick around and get involved.

I'm pretty sure I didn't come up with that actually. And I used to think that it was crucial that there be some running code when the project started that would be something people could download and play with, but Subversion didn't start like that and we certainly got developers right away, so I don't know, maybe that's just not important. I mean it certainly helps.

Do you think that the fact that the concept you had for it was tantalizing to people is what made it so that you didn't need some code to start with?

Yeah, I think that the "it's so close you can taste it" factor and knowing exactly what we were setting out to do made it less important to have running code. But if you were doing something new or something where it is going to be large and complex and you can't always be sure that the reader is going to have a sense of the scope then you probably want something they can download and start playing with right away. Because I think that once someone gets to the stage of downloading, they're already half-way to the stage of contributing, and if they actually fix a bug, send in a patch, then you've got them hooked. As long as you accept the patch and stay engaged with them, they're probably going to come back and do more.

You gave some numbers at the beginning, that there were about thirty full committers, about fifteen active on a day-to-day basis, and then you mentioned three or four are employed by CollabNet. So the ones that are not employed with the company, do you know whether many of them are contributing their time on the basis of another company that they work for, which is getting something out of the project? Is that why they're involved?

No, for the most part they're not, they're just contributing their time pro bono. Lately a few of them have started getting contract work. But most of the contract work tends to be things like installation and consulting. In a few cases, another company has paid one of those developers to fix a bug that was in our issue tracker. I guess that's starting to happen a little bit more. It's growing slowly, but that's not the main thing that's going on. For the most part they're just contributing their time.

The ones that have found some work doing consulting, would you say that was a goal they had or is that something that just so happened?

It just happened. I don't think they started out expecting that. If that was what they got involved in Subversion for, they made a really huge investment for a small gain. Because it's not like it's providing anyone with a full time living. It could, if someone wanted to dedicate themselves to it. I don't think any of them have made a goal of doing that. When I look at their consulting web sites they list Subversion as one of the things they do, but it's not their focus. People have all put in a year or more of hard work on Subversion to get the experience and credibility to be in that position. I don't think they started out trying to do that.

It sounds like for a lot of them it's a labor of love. So if that's the case, then if you want to get a successful project off the ground you're going to have to find something that can inspire that sense in people. Or present it in a way that does.

I guess so and I sort of think that implies that there are certain projects that are really hard to get started as open source projects. You know, you're never going to find somebody that wants to devote their time to writing something like an open source license server. Although there is one on SourceForge but I don't think it has a very big development community now.

Is there anything else that you'd like to add from your perspective on having a development community? Is there something you think is important to consider that we haven't talked about?

I think we've touched the most important points. I'd say if you're giving someone advice on how to start an open source project, I would say make sure that whoever they get to drive that effort is someone with good people skills, and especially good people skills in writing because the contact is not going to be face-to-face, you know, with the developers. You see a lot of people post to, especially an open source development list, and they sound kind of curmudgeonly and they're flaming people right and left, and you know, they make their technical point as harshly as they can. And often those people are very, very good programmers but I don't think they tend to gather groups of developers around them and maintain loyalty. They're sort of lone wolves but I think that point is pretty obvious.

This concludes Part Three of a four-part note. Part One provided background on what the open source community is and how to engage it. Part Two featured an interview with Jeff Bates of SourceForge.net, Slashdot, and the Open Source Technology Group. Part Four's interview with Louis Suárez-Potts, Community Development Manager for the OpenOffice.org project, covers the political and social architecture of open source communities as well as practices for successful oversight of a project.

 