Discussing the nuts and bolts of software development

Monday, July 07, 2008


Mining Project Scope with Python + SVN

After developing a component for a fairly large enterprise application, I was tasked with coordinating a complete review of all code touched by our team. As the project involved hundreds of commits across every tier of the application, this was not a trivial task. I didn’t want to make developers hunt through SVN logs and JIRA: even if the result was accurate, producing meta-documentation in the time allotted for actual documentation seemed wasteful. There had to be a way to automate it. After failing to find an appropriate SVN tool via Google, I weighed the pros and cons [1], then decided that using an SVN API would be easy enough.

To make it interesting (or rather, as part of my commitment to Macadamian’s core value of continual improvement), I decided to try out Python. The following script was maybe an hour of entertaining work that saved at least a day of tedium.

import pysvn

# Script variables. These could be read from the command line or a config file.
username = "jlennon"
password = "yoko"
projectFirstRevision = 103991
svnRoot = "http://path/to/svn/trunk/"

# Team members’ names for SVN commits.
users = set([
    # ... one SVN username string per team member ...
])

# Set up SVN callbacks.
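def get_login(realm, suggested_username, may_save):
    # Return (success, username, password, save_credentials).
    # The parameter is renamed so it no longer shadows the script-level username.
    return True, username, password, False

def get_log_message():
    # Return (success, log_message).
    return True, "log message called"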

client = pysvn.Client()
client.callback_get_login = get_login
client.callback_get_log_message = get_log_message

# Convert the revision number into a pysvn revision.
projectFirstRevision = pysvn.Revision(pysvn.opt_revision_kind.number, projectFirstRevision)

# Get all commits since the project's first commit.
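# The arguments below are an assumed reconstruction based on the variables
# defined above. discover_changed_paths=True is needed so that each log
# entry carries its changed_paths.
items = client.log(
    svnRoot,
    revision_start=pysvn.Revision(pysvn.opt_revision_kind.head),
    revision_end=projectFirstRevision,
    discover_changed_paths=True)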

# Set up variables for storing commit information
allPaths = set([])
removedPaths = set([])

# For each commit entry *made by one of this project's users*
# Add any deleted files in the commit to removedPaths
# Add any added/updated files to the list of all paths
for item in [ e for e in items if e.author in users ]:
    for changed_path in item.changed_paths:
        if changed_path.action == "D":
            removedPaths.add(changed_path.path)
        else:
            allPaths.add(changed_path.path)

# Remove deleted files from allPaths, then display
allPaths = allPaths - removedPaths

for path in sorted(allPaths):
    print(path)
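
The path bookkeeping can be tried without an SVN server. What follows is only a minimal sketch of the same filter-then-subtract idea, not the script above: the ChangedPath and LogEntry tuples are hypothetical stand-ins for pysvn's log entries, paths_touched is a helper name invented for illustration, and the sample commits are made up.

```python
from collections import namedtuple

# Hypothetical stand-ins for pysvn's log entries, for illustration only.
ChangedPath = namedtuple("ChangedPath", ["path", "action"])
LogEntry = namedtuple("LogEntry", ["author", "changed_paths"])


def paths_touched(entries, users):
    """Return the sorted paths touched by `users`, minus any they deleted."""
    all_paths = set()
    removed_paths = set()
    for entry in entries:
        # Skip commits made by people outside the team.
        if entry.author not in users:
            continue
        for changed in entry.changed_paths:
            if changed.action == "D":
                removed_paths.add(changed.path)
            else:
                all_paths.add(changed.path)
    # Drop anything the team deleted, then sort for stable output.
    return sorted(all_paths - removed_paths)


# Made-up sample data: one file added, one modified and later deleted,
# and one commit by somebody outside the team.
sample_log = [
    LogEntry("jlennon", [ChangedPath("/trunk/app/login.py", "A"),
                         ChangedPath("/trunk/app/old_login.py", "M")]),
    LogEntry("someone_else", [ChangedPath("/trunk/lib/util.py", "M")]),
    LogEntry("jlennon", [ChangedPath("/trunk/app/old_login.py", "D")]),
]

for path in paths_touched(sample_log, {"jlennon"}):
    print(path)
```

One consequence of subtracting the deleted set at the end is that a file deleted and later re-added by the team is also dropped from the list; whether that matters depends on how the review is scoped.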

Overall, I was impressed by Python. My biggest problem was with the structure of the documentation, but that’s to be expected with any new language. My biggest surprise was that the indentation rules didn’t bother me – why is this cited as a show-stopper by so many developers? I’ll definitely be coming back to Python to test its OO facilities.

[1] The main con being that a developer would much rather write a neat SVN spider than write documentation. In a leadership role, you have to think twice before eagerly firing up your compiler to solve a problem.
