Generating a histogram of all commits over time

This is a conversation in which I asked ChatGPT to help me with a task that I wouldn't otherwise have invested the time to figure out. It took place before ChatGPT let you share your conversations.
What would I use to generate a histogram of all the commits over time?

To generate a histogram of commits over time, you can use Git's built-in git log command in combination with tools like awk, sort, and uniq. Here's an example command that you can run in your Git repository to generate a histogram of commits per month:

git log --format="%ad" --date=format:"%Y-%m" | sort | uniq -c | awk '{print $2, $1}'

This command extracts the commit dates from your Git log, formats them to extract the year and month, sorts them, counts the number of commits for each month, and finally outputs the results as a two-column histogram of year/month and commit count.
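
As a cross-check on the shell pipeline, the same monthly tally can be sketched in Python. This is a minimal sketch rather than part of the original conversation: the function name commits_per_month is hypothetical, and it assumes the input is a list of YYYY-MM-DD strings such as those printed by git log --date=short.

```python
from collections import Counter


def commits_per_month(dates):
    """Count commits per month from ISO date strings (YYYY-MM-DD).

    Hypothetical helper: the input format is an assumption matching
    the output of `git log --format=%ad --date=short`.
    """
    # The first seven characters of an ISO date are the year and month.
    counts = Counter(d.strip()[:7] for d in dates if d.strip())
    # Sort by month so the result reads chronologically.
    return sorted(counts.items())


if __name__ == "__main__":
    sample = ["2022-01-03", "2022-01-17", "2022-02-01"]
    for month, count in commits_per_month(sample):
        print(month, count)
```

Truncating the date string works here only because ISO dates put the year and month first; other date formats would need real parsing.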

Here's an example of what the output might look like:

2022-01  6
2022-02  3
2022-03  1
2022-04  2

You can use this output to create a histogram visualization of your commits over time, for example, using a tool like gnuplot or matplotlib.

How to use this with gnuplot or matplotlib?

To use the data generated by git log with gnuplot or matplotlib, you can save the output of the git log command to a file and then load that file into gnuplot or matplotlib.

For example, to generate a histogram of the number of commits per day using gnuplot, you can use the following commands:

  1. Run git log --pretty=format:"%ad" --date=short | sort | uniq -c > commits_per_day.txt to generate a file commits_per_day.txt containing the number of commits per day.
  2. Create a gnuplot script file, for example plot_commits_per_day.gp, with the following contents:
set xdata time
set timefmt "%Y-%m-%d"
set format x "%Y-%m-%d"
set xlabel "Date"
set ylabel "Number of Commits"
set title "Commits per Day"
plot "commits_per_day.txt" using 2:1 with boxes

This script sets the x-axis to use time, specifies the format of the dates, and creates a histogram of the data in commits_per_day.txt.

  3. Run gnuplot -p plot_commits_per_day.gp to generate the histogram. The -p flag keeps the plot window open after the script finishes.

To use matplotlib instead, you can load the data from the commits_per_day.txt file using Python's built-in file handling and then plot the data using matplotlib. Here's an example Python script:

import matplotlib.pyplot as plt
import datetime

dates = []
counts = []

with open("commits_per_day.txt") as f:
    for line in f:
        count, date = line.split()
        dates.append(datetime.datetime.strptime(date, "%Y-%m-%d").date())
        counts.append(int(count))

plt.bar(dates, counts)
plt.xlabel("Date")
plt.ylabel("Number of Commits")
plt.title("Commits per Day")
plt.show()

This script reads in the data from commits_per_day.txt, converts the dates to datetime objects, and creates a bar chart of the data using matplotlib's bar function.
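
The parsing step can also be pulled out into a small function, which makes it easier to reuse and to guard against blank or malformed lines. This is a minimal sketch under stated assumptions: parse_commit_counts is a hypothetical name, and the input is assumed to be lines in the "count date" layout that uniq -c produces.

```python
import datetime


def parse_commit_counts(lines):
    """Parse `uniq -c` style lines ("  3 2022-01-05") into (date, count) pairs.

    Hypothetical helper: assumes each useful line holds exactly two
    whitespace-separated fields, a count followed by a YYYY-MM-DD date.
    """
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            # Skip blank or malformed lines instead of raising.
            continue
        count, day = fields
        pairs.append((datetime.date.fromisoformat(day), int(count)))
    return pairs


if __name__ == "__main__":
    for day, count in parse_commit_counts(["   3 2022-01-05", "   1 2022-01-07"]):
        print(day, count)
```

The returned pairs can be unzipped into the dates and counts lists that plt.bar expects.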

change this to commits per week

To convert the histogram of commits per day to commits per week, you can sum the per-day counts in groups of seven. Here's an updated script that does this:

#!/bin/bash

# create a histogram of commits per day
git log --pretty=format:"%ad" --date=short | sort | uniq -c > commits_per_day.txt

# convert to commits per week (groups of 7 rows; the END block prints any leftover partial group)
awk '{count+=$1} NR%7==0 {print count; count=0} END {if (NR%7) print count}' commits_per_day.txt > commits_per_week.txt

# plot the histogram using gnuplot
gnuplot -p -e "set term dumb; set title 'Commits per week'; plot 'commits_per_week.txt' with boxes"

In this updated script, git log extracts the commit dates and sort | uniq -c counts the commits per day, exactly as before. The resulting per-day counts are stored in commits_per_day.txt.

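The awk command then reads commits_per_day.txt and sums the counts in groups of seven rows. Because the file only lists days that have at least one commit, each group spans seven active days, which matches a calendar week only when every day has a commit. The output is stored in commits_per_week.txt.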

Finally, gnuplot plots the weekly counts. Because of set term dumb, the chart is rendered as ASCII art directly in the terminal.
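
For strict calendar weeks, the per-day counts can instead be bucketed by ISO week number. The following is a minimal sketch, not something from the original conversation: commits_per_iso_week is a hypothetical name, the "YYYY-Www" key format is an illustrative choice, and the input is assumed to be (date, count) pairs parsed from commits_per_day.txt.

```python
import datetime
from collections import defaultdict


def commits_per_iso_week(day_counts):
    """Sum (date, count) pairs into ISO calendar weeks keyed like "2022-W01".

    Hypothetical helper: the key format is an illustrative choice.
    """
    weeks = defaultdict(int)
    for day, count in day_counts:
        # isocalendar() yields (ISO year, ISO week number, ISO weekday).
        iso_year, iso_week, _ = day.isocalendar()
        weeks[f"{iso_year}-W{iso_week:02d}"] += count
    # Zero-padded keys sort chronologically as plain strings.
    return sorted(weeks.items())


if __name__ == "__main__":
    sample = [
        (datetime.date(2022, 1, 3), 2),
        (datetime.date(2022, 1, 9), 1),
        (datetime.date(2022, 1, 10), 4),
    ]
    for week, count in commits_per_iso_week(sample):
        print(week, count)
```

Keying on the ISO year rather than the calendar year keeps a week that straddles New Year in a single bucket. Weeks with no commits are simply absent from the result.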