A brief history of Carwow

Tom Lord
Carwow Product, Design & Engineering
5 min read · Apr 4, 2024
Black and white photo of David, Alex and James; Carwow’s co-founders

Just for fun… I wanted to explore visualisations of Carwow’s codebase history in a digestible format. I looked at several interesting tools and approaches for animating code contributions, and opted to experiment with Gource to generate an animation of the history of some of Carwow’s main GitHub repositories.

Let’s skip to the good part:

Disclaimer: The animation is intended for illustrative/entertainment purposes only. It is certainly not (as explained in more detail below) a comprehensive graphic of all Carwow’s code contributions.
It’s also worth remembering that a big proportion of “contributions” to the business are, of course, not in the form of code — so not represented at all by this animation!

There are two key areas to understand when generating this animation:

  1. How does a Gource log file work (and therefore, how can we construct an optimal one for such a big data set)?
  2. What are the available Gource command line options (and therefore, how can we generate a good visualisation)?

Creating a Gource log for the whole organisation

To answer (1), try running the following inside any supported repo (Git, Bazaar, Mercurial, SVN, Apache, or CVS):

gource --output-custom-log example.log

And observe that each line looks something like this:

1304448053|Tom Lord|A|/README

Those pipe-separated components are: Unix timestamp, username, type ((A)dded, (M)odified or (D)eleted) and file path.
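As a minimal sketch of that format, the snippet below splits the example line into its four fields. The helper name parse_log_line is hypothetical (it is not part of Gource), and the only assumption is that the first three fields never contain a "|" themselves.

```shell
#!/bin/sh
# Split one Gource custom-log line into its four pipe-separated fields.
# parse_log_line is a hypothetical helper name, not part of Gource.
parse_log_line() {
  printf '%s\n' "$1" | {
    # IFS='|' applies to this read only; the last variable receives the
    # remainder of the line, so it holds the whole file path.
    IFS='|' read -r timestamp username type file
    printf 'timestamp=%s\nusername=%s\ntype=%s\nfile=%s\n' \
      "$timestamp" "$username" "$type" "$file"
  }
}

parse_log_line '1304448053|Tom Lord|A|/README'
```

On systems with GNU date, running `date -u -d @1304448053` shows that this example timestamp falls on 3 May 2011.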

So in order to create a log for the whole organisation, we need to do two things:

  • Merge the logs of multiple projects.
  • Clean the data. For example:
    - Merge “duplicate” users (e.g. tom-lord vs Tom Lord)
    - Delete/merge “ambiguous” users (e.g. Tom, in an organisation with more than one person called Tom!)
    - Rename “inconsistent” users (e.g. Joe, who didn’t submit his code under his full name of Joe Bloggs!)
    - Redact sensitive information, if applicable.
    - …

The precise requirements/solution for data cleaning will vary by organisation. In an ideal world you wouldn’t even need to do this step, but chances are your logs are not perfectly “clean”!
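One way to find candidates for the “duplicate users” clean-up is to group usernames that become identical once case and separators are ignored. The following is only a sketch under that assumption: find_duplicate_users is a hypothetical helper, the normalisation rule (lowercase, with "-", "_" and "." treated as spaces) is a guess at what is useful, and the sample log lines are made up for illustration.

```shell
#!/bin/sh
# List usernames that collapse to the same key once case and separators
# are ignored. find_duplicate_users is a hypothetical helper name, and the
# normalisation rule is an assumption, not something Gource prescribes.
find_duplicate_users() {
  cut -d'|' -f2 "$1" | sort -u | awk '
    {
      key = tolower($0)
      gsub(/[-_.]/, " ", key)          # treat "-", "_" and "." as spaces
      names[key] = (key in names) ? names[key] " / " $0 : $0
      count[key]++
    }
    END {
      # Only report keys that were reached by more than one spelling
      for (key in count)
        if (count[key] > 1) print names[key]
    }'
}

# Made-up sample data, in the custom log format shown earlier
sample=$(mktemp)
cat > "$sample" <<'EOF'
1304448053|Tom Lord|A|/README
1304448060|tom-lord|M|/README
1304448070|David Santoro|A|/Gemfile
EOF

find_duplicate_users "$sample"   # reports Tom Lord and tom-lord as one group
rm -f "$sample"
```

This will not catch ambiguous or abbreviated names (such as a bare first name), so a manual read through the full list of usernames is still worthwhile.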

Tinkering with Gource command line options

Gource is hugely customisable. To see the full list of available options, check the project’s README — there’s no point in me repeating the documentation here, but you can reference this to understand all the options used in my script.

The Gource command options used to generate the “history of Carwow” animation, shared below, serve as a much better foundation for a “good visualisation” than the Gource defaults, in my opinion, because the defaults are much better suited to a single, small project.

One final point to note: If you want to include a captions file (as used below), then the timestamp of each caption must exactly match a timestamp in the log.

For example (based on the `1304448053|Tom Lord|A|/README` log entry shown above), this will work:

1304448053|May 2011 - Example project is created

But this will not:

2011-05-01|May 2011 - Example project is created

even though the captions file does support any ISO 8601-compliant timestamp. This is something to be especially aware of if you want to add captions for things that aren’t directly coupled to the code, like “Fundraising round completed”.
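Given that constraint, it can be handy to check a captions file against the log before rendering. The sketch below prints every caption whose timestamp does not appear anywhere in the log. The helper name check_captions is hypothetical, and the two temporary files simply reuse the example lines from above.

```shell
#!/bin/sh
# Print every caption whose timestamp does not appear in the log.
# check_captions is a hypothetical helper name, not a Gource feature.
check_captions() {
  log_file=$1
  captions_file=$2
  awk -F'|' '
    NR == FNR { seen[$1] = 1; next }   # first file: remember log timestamps
    !($1 in seen) { print "no matching log entry: " $0 }
  ' "$log_file" "$captions_file"
}

# Example data taken from the lines shown above
log=$(mktemp)
captions=$(mktemp)
printf '%s\n' '1304448053|Tom Lord|A|/README' > "$log"
printf '%s\n' \
  '1304448053|May 2011 - Example project is created' \
  '2011-05-01|May 2011 - Example project is created' > "$captions"

check_captions "$log" "$captions"
# prints: no matching log entry: 2011-05-01|May 2011 - Example project is created
rm -f "$log" "$captions"
```

For the real animation, the arguments would be the merged log and the captions file (combined.log and captions.txt in the script below).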

The full script

So, here it is:

#! /bin/sh

# 1. Generate a Gource custom log file for each repository
gource --output-custom-log research_site.log ~/projects/carwow/research_site
gource --output-custom-log quotes_site.log ~/projects/carwow/quotes_site
gource --output-custom-log dealers_site.log ~/projects/carwow/dealers_site
gource --output-custom-log car-data-app.log ~/projects/carwow/car-data-app
gource --output-custom-log dsutils.log ~/projects/carwow/dsutils
gource --output-custom-log deals_service.log ~/projects/carwow/deals_service

# 2. (optional) Make each repo appear on a separate branch instead of merged onto each other
# (-E enables extended regexes, so that (.+) is a capture group and \| is a literal pipe)
sed -i -E "s#(.+)\|#\1|/research_site#" research_site.log
sed -i -E "s#(.+)\|#\1|/quotes_site#" quotes_site.log
sed -i -E "s#(.+)\|#\1|/dealers_site#" dealers_site.log
sed -i -E "s#(.+)\|#\1|/car-data-app#" car-data-app.log
sed -i -E "s#(.+)\|#\1|/dsutils#" dsutils.log
sed -i -E "s#(.+)\|#\1|/deals_service#" deals_service.log

# 3. Join the logs together, and sort them numerically by the first column (the time):
cat research_site.log quotes_site.log dealers_site.log car-data-app.log dsutils.log deals_service.log | sort -n > combined.log

# 4. Standardise and de-duplicate names into "<first-name> <last-name>"
sed -i "s/tom-lord|/Tom Lord|/" combined.log
sed -i "s/David S|/David Santoro|/" combined.log
# ...Lots more lines like this!!

# De-duplicate various "bot" contributions to a single username
sed -i "s/dependabot-preview\[bot\]|/bot|/" combined.log
sed -i "s/dependabot\[bot\]|/bot|/" combined.log
sed -i "s/vagrant|/bot|/" combined.log
# ...Lots more lines like this!!

# Ambiguous/Unknown users. Hiding their contributions :(
sed -i "/|John|/d" combined.log
# ...A few more lines like this.
# (Thankfully, the vast majority of Carwow contributions can be easily attributed to a specific person)

# 5. Feed result into gource:
# NOTE: Remove the `--output-ppm-stream -` and `| ffmpeg ...` to just display the video without saving it!
gource -1280x720 \
--background-colour 000000 \
--date-format "%B %Y" \
--max-user-speed 200 \
--seconds-per-day 0.03 \
--auto-skip-seconds 0.5 \
--file-idle-time 4 \
--elasticity 0.006 \
--bloom-multiplier 0.30 \
--hide "filenames,progress" \
--multi-sampling \
--max-files 600 \
--highlight-all-users \
--fullscreen \
--user-image-dir photos \
--key \
--caption-file captions.txt \
--caption-duration 8 \
--logo carwow-logo.png \
--dir-name-depth 1 \
--output-ppm-stream - \
--output-framerate 60 \
combined.log \
| ffmpeg -y -r 60 -f image2pipe -vcodec ppm -i - -b:v 65536K movie.mp4
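To see what the path-prefixing in step 2 does, it can help to try the substitution on a single log line first. This is a minimal sketch: prefix_paths is a hypothetical helper that reads a log on stdin, and it uses sed's -E flag so that the parentheses act as a capture group.

```shell
#!/bin/sh
# Prefix the file path in each log line with a top-level directory.
# prefix_paths is a hypothetical helper name; it reads a log on stdin.
prefix_paths() {
  # (.+)\| greedily matches everything up to the last "|", so the prefix
  # is inserted immediately before the file path.
  sed -E "s#(.+)\|#\1|/$1#"
}

printf '%s\n' '1304448053|Tom Lord|A|/README' | prefix_paths research_site
# prints: 1304448053|Tom Lord|A|/research_site/README
```

Because every file from a repository then lives under its own top-level directory, each repository is drawn as a separate branch in the animation.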

Note that the above setup also requires:

  • A captions.txt, as mentioned above.
  • A logo (in this case, carwow-logo.png)
  • A folder with everyone’s photo. I wrote another little script to pull them from Slack.

And speaking of photos, here’s another “sanity check” script I wrote to display all users in the log file, along with whether or not we have a photo of them, ordered by their number of contributions (as measured by Gource!):

#!/bin/bash
# (bash rather than sh: the here-string `<<<` used below is not POSIX)

# Count log entries per contributor, sorted numerically (most active last)
contributors_with_count=$(cut -d"|" -f2 combined.log | sort | uniq -c | sort -n)

while read -r entry || [ -n "$entry" ]; do
  # Extract the number and name
  number=$(echo "$entry" | awk '{print $1}')
  name=$(echo "$entry" | awk '{$1=""; sub(/^[[:space:]]*/, ""); print}')

  # Check if a corresponding file exists in the photos folder
  if ls "photos/$name".* 1> /dev/null 2>&1; then
    echo "YES $number $name"
  else
    echo " NO $number $name"
  fi
done <<< "$contributors_with_count"

This is helpful to quickly check whether any “key” people included in the animation are missing a photo.

Conclusion

As mentioned at the start of this article, the animation is certainly not a comprehensive graphic of all Carwow’s code contributions: it only shows up to 600 files, from up to 4 directory levels deep, from six key repos.
I found that to be a good compromise: it illustrates how the business has grown and evolved over the years, without causing “information overload” in the animation.

What’s also pretty neat about generating the animation from a tool like Gource is that anyone can trivially re-run the script, to re-generate a new animation. (And absolutely no custom video editing is needed.)

Maybe in years to come, someone will discover this and re-run it with all the updates!
