Periklis Ntanasis:
Master's Touch


Collaborative distance in git contributors…

Some time ago I was checking my LinkedIn connection recommendations and I realized once again that the world is small. There are people we may have never talked to, but as we change jobs or projects we may end up writing to the same codebases they used to contribute to, or vice versa.

And I am not talking about open source contributors, who one could argue have contributed to our codebases one way or another by providing their open source code. I am talking about colleagues who are working on something else now or have moved to a different job, people who contributed as external consultants, etc.

The world is definitely small.

This got me thinking about the Erdős number.

Erdős number

The Erdős number (Hungarian: [ˈɛrdøːʃ]) describes the "collaborative distance" between mathematician Paul Erdős and another person, as measured by authorship of mathematical papers. The same principle has been applied in other fields where a particular individual has collaborated with a large and broad number of peers.

— Wikipedia

There are variations applied to other fields; one of the best known is the Bacon number.

Six degrees of separation

Six degrees of separation is the idea that all people are six, or fewer, social connections away from each other. Also known as the 6 Handshakes rule.

— Wikipedia

In a sense the Six degrees of separation idea is similar to the Erdős number, but the Erdős number is actually stricter. It is not enough to be acquainted with Paul Erdős; the person must actually be a scientist who has co-authored a paper with him or with one of his co-authors, and so on down the chain. Even this definition is not strict enough.

Code Collaborative Distance

Going back to what made me think of all this, it occurred to me that most of the software engineers in a region or in a specific field are possibly connected to each other through their code contributions.

By code contributions I mainly mean git commits, since Git is the most common VCS. Of course this could be any other VCS or equivalent.
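To make "contribution" concrete, here is a minimal sketch of how one could collect the contributors of a single local clone. It assumes `git` is installed and on the PATH; the function names `parse_authors` and `authors_of` are my own, hypothetical helpers, and author emails are used as a rough stand-in for people.

```python
import subprocess


def parse_authors(log_output):
    """Turn the output of `git log --format=%ae` into a set of lower-cased emails."""
    return {line.strip().lower() for line in log_output.splitlines() if line.strip()}


def authors_of(repo_path):
    """Collect the author emails of every commit reachable from HEAD.

    Assumes `repo_path` is a local git clone and `git` is on the PATH.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%ae"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_authors(result.stdout)


# Made-up sample of what the git command prints, one author email per commit.
sample = "alice@example.org\nBob@example.org\nalice@example.org\n"
print(sorted(parse_authors(sample)))  # ['alice@example.org', 'bob@example.org']
```

Calling `authors_of("/path/to/clone")` on a real repository would return the same kind of set.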

Of course the number of "hops" needed to connect 2 persons may vary and can be big. But I guess that if we could query all the private and public VCSs for a specific geographical region and/or field and/or business domain, the number would usually be pretty small. In that case the Six degrees of separation could actually hold true.
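Counting the "hops" is a shortest-path problem. The sketch below is only an illustration under simple assumptions: two people are linked if they have committed to the same repository, and the distance is found with a breadth-first search. The repository and contributor names are made up, and `build_graph` and `collaborative_distance` are hypothetical helper names.

```python
from collections import defaultdict, deque


def build_graph(repo_authors):
    """Link every pair of authors who committed to the same repository."""
    graph = defaultdict(set)
    for authors in repo_authors.values():
        for author in authors:
            graph[author].update(a for a in authors if a != author)
    return graph


def collaborative_distance(graph, start, goal):
    """Breadth-first search: number of hops between two people, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for neighbour in graph.get(person, ()):
            if neighbour == goal:
                return hops + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


# Made-up sample data: repository name -> set of contributors.
repo_authors = {
    "repo-a": {"alice", "bob"},
    "repo-b": {"bob", "carol"},
    "repo-c": {"carol", "dave"},
    "repo-d": {"erin"},
}
graph = build_graph(repo_authors)
print(collaborative_distance(graph, "alice", "dave"))  # 3
print(collaborative_distance(graph, "alice", "erin"))  # None
```

The search itself is the easy part. Getting hold of the repository-to-contributors data is where things get hard.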

Collaborative Distance in Open Source

As you can imagine, searching the private VCSs of companies to find the connections between different contributors is impossible.

However, how easy would this be for open source software? Wouldn’t it be great to find your distance from giants such as Linus Torvalds?

It turns out that this is not as easy as pie either.

Undoubtedly, Git is the most popular VCS today and some of the most successful hosting services are built around it: GitHub, GitLab and Bitbucket, to name a few. But that is not everything. Organizations host their own Git instances and there are several other hosting services out there. Also, in some cases people use another VCS and not the king, Git.

To make the problem even harder, there is no common API one could use to query every hosting service. Also, the APIs are usually rate limited, and finding the connection between 2 persons probably requires a number of requests that grows nonlinearly with each hop. This would make the rate limit a blocking issue for many searches.
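A back-of-the-envelope estimate shows how quickly this gets out of hand. The sketch assumes, purely for illustration, that looking up one person costs one API request and reveals a fixed number of new collaborators. The figures of 20 collaborators per person and 5,000 requests per hour are made-up round numbers, not the documented limits of any particular service.

```python
def requests_needed(branching, hops):
    """Requests to explore `hops` levels when each person leads to `branching` new people.

    Level 0 is the starting person, level 1 their collaborators, and so on,
    so the total is 1 + b + b^2 + ... + b^hops.
    """
    return sum(branching ** level for level in range(hops + 1))


def hours_to_finish(total_requests, limit_per_hour):
    """Time needed if the service allows only `limit_per_hour` requests."""
    return total_requests / limit_per_hour


BRANCHING = 20         # assumed collaborators discovered per person
LIMIT_PER_HOUR = 5000  # assumed rate limit

for hops in range(1, 6):
    total = requests_needed(BRANCHING, hops)
    hours = hours_to_finish(total, LIMIT_PER_HOUR)
    print(f"{hops} hops: {total} requests, about {hours:.1f} hours")
```

Under these assumptions every extra hop multiplies the work by roughly twenty, so a search that is fine for two or three hops becomes impractical soon after.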

Another potential problem is that people may have used multiple email addresses in the commits they have contributed to different repositories.
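Git itself has a `.mailmap` file for mapping several identities to one, but that only helps inside a repository that maintains it. Across repositories one would have to guess. A rough sketch of such a guess, assuming two commits belong to the same person when they share either the author name or the author email, is shown below. The heuristic will wrongly merge different people who happen to share a name, and `merge_identities` is a hypothetical helper.

```python
def merge_identities(commits):
    """Group (name, email) pairs that share a name or an email, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for name, email in commits:
        union(("name", name.strip().lower()), ("email", email.strip().lower()))

    groups = {}
    for key in list(parent):
        groups.setdefault(find(key), set()).add(key)
    return list(groups.values())


# Made-up commit authors.
commits = [
    ("Alice Smith", "alice@work.example"),
    ("Alice Smith", "alice@home.example"),
    ("asmith", "alice@home.example"),
    ("Bob", "bob@example.org"),
]
for group in merge_identities(commits):
    print(sorted(group))
```

With this sample data the three Alice entries collapse into one identity and Bob stays separate.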

Finally, people may have contributions to repositories they don’t own, so listing a person’s own repositories is not enough to find everything they have worked on.

All of the above makes finding the connections between 2 persons difficult, even within one specific public hosting provider such as GitHub.

After putting some thought into it, I certainly won’t attempt to write a program for this! If someone builds something, drop me a line in the comments! :-)

Center of the Open Source software

The magic of the Erdős or Bacon numbers is that they measure the distance from one specific person, so people can compare their values against each other.

This makes me wonder, who would be the Paul Erdős or Kevin Bacon of open source software?

Of course, this should be the most "linkable" open source contributor. This person may change as time passes. To be a good center, this person should have contributed to different projects with many contributors each. This makes me think that there is a possibility this person won’t be one of the recognizable figures we would expect.
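One way to make "most linkable" measurable is to pick the person with the smallest average distance to everyone else. That is one possible definition of a center, chosen here only for illustration. The sketch assumes a graph of who has shared a repository with whom is already available. The names are made up, and `distances_from` and `best_center` are hypothetical helpers.

```python
from collections import deque


def distances_from(graph, start):
    """Breadth-first search returning the hop count to every reachable person."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for neighbour in graph[person]:
            if neighbour not in dist:
                dist[neighbour] = dist[person] + 1
                queue.append(neighbour)
    return dist


def best_center(graph):
    """Person who reaches the most people, ties broken by lowest mean distance."""
    def score(person):
        dist = distances_from(graph, person)
        reached = len(dist) - 1
        mean = sum(dist.values()) / reached if reached else float("inf")
        return (-reached, mean)

    return min(graph, key=score)


# Made-up collaboration graph: person -> people they share a repository with.
graph = {
    "alice": {"bob"},
    "bob": {"alice", "carol", "dave"},
    "carol": {"bob", "dave"},
    "dave": {"bob", "carol", "erin", "frank"},
    "erin": {"dave"},
    "frank": {"dave"},
}
print(best_center(graph))  # dave
```

Running one search per person is fine for a toy graph like this but would be expensive on a real one.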

What do you think? Who would you like to be the Paul Erdős of open source software? Leave your comment below!

Thank you for reading! Take care!

