Wikipedia's Network of History

Created by Cameron Astor

About

Open any Wikipedia article about a state or society which no longer exists, such as Gran Colombia or the Song Dynasty, and you’ll probably find a section in the sidebar listing the states, societies, and/or cultures which preceded the one in question, and those which came after: its ‘successors’.

On its own this would not be noteworthy, but being Wikipedia, these are lists of hyperlinks. Click on one and you are taken to another article, usually with its own list of predecessors and successors. Keep following these links up or down the chain and it won’t take long to reach quite far backwards or forwards in time along the evolutionary tree of history, so to speak.

Taken as a whole, these links form a directed graph which could theoretically span the entirety of Wikipedia’s thousands of articles on historical polities. The aim of this project is to visualize this graph, and in doing so reveal Wikipedia’s ‘map’ of historical development.

Controls

Click and drag to pan, scroll to zoom (pinch zoom on mobile).

Click on a node to highlight its direct predecessors and successors. Successor edges are highlighted in green, predecessor edges in purple.

With a node selected, click on the title to be redirected to the original Wikipedia article.

Methodology

The graph layout is generated by the graphviz dot layout engine from a collection of nodes (articles) and edges (links between articles) scraped from Wikipedia. Nodes are sized by total degree (number of incoming and outgoing edges).
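As a rough illustration of the step described above, the sketch below turns an edge list into DOT source that the graphviz dot engine could lay out, with node width scaled by total degree. The function name `to_dot`, the sizing constants, the `rankdir=BT` setting, and the sample edges are all assumptions made for this example; they are not taken from the project's actual code.

```python
from collections import Counter

def to_dot(edges, min_size=0.3, scale=0.15):
    """Return DOT source for a directed graph, sizing nodes by total degree."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1  # outgoing edge counts toward the source
        degree[dst] += 1  # incoming edge counts toward the destination

    lines = [
        "digraph history {",
        "  rankdir=BT;",  # assumption: predecessors at the bottom, successors above
        "  node [shape=circle, fixedsize=true];",
    ]
    for name in sorted(degree):
        size = min_size + scale * degree[name]
        lines.append(f'  "{name}" [width={size:.2f}, height={size:.2f}];')
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Illustrative edges only (predecessor, successor); not the scraped data.
edges = [
    ("Sui dynasty", "Tang dynasty"),
    ("Tang dynasty", "Later Liang"),
    ("Tang dynasty", "Wu"),
]
dot_source = to_dot(edges)
print(dot_source)
```

Saving the printed text to a file and running it through `dot -Tsvg` would produce a layout; the real graph is far larger, so the actual pipeline likely needs additional layout tuning.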

For any particular node, the adjoining edges reflect the links given in the associated article as closely as possible, except for some adjustments made to the data as part of the scraping and visualizing process:

Erroneous or broken links are removed. These include links to image files or articles which do not exist.
Edges are added in a ‘greedy’ fashion. Imagine a scenario like this: an article has a list of successors, but many of those successors do not include that article in their own list of predecessors. In these cases, all it takes is one article to suggest a linkage for it to be added to the graph, regardless of whether it is consistently referenced in both the origin and destination articles.
Cycles are not permitted. There are many cases where the successor and predecessor of an article are the same, forming a cycle. These are especially common in cases of short-lived states, rebellions, or changes of government, such as the Central African Empire or the Emirate of Bari. In these cases, the cycle is broken and only one of the edges is kept.
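The three adjustments above can be sketched together as follows. This is a minimal illustration, not the project's implementation: the function names, the input shapes (dictionaries mapping an article to its listed successors or predecessors), and the rule that the first edge encountered wins when a cycle would form are all assumptions.

```python
def reaches(graph, start, target):
    """Depth-first search: is `target` reachable from `start`?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False

def build_edges(successors, predecessors, existing_articles):
    """Merge per-article link lists into one acyclic edge list."""
    # Greedy union: a single article claiming a link is enough to propose it.
    candidates = []
    for article, succs in successors.items():
        candidates += [(article, s) for s in succs]
    for article, preds in predecessors.items():
        candidates += [(p, article) for p in preds]

    graph = {}  # adjacency: node -> set of accepted successors
    kept = []
    for src, dst in candidates:
        # Drop broken links (missing articles, image files) and self-links.
        if src not in existing_articles or dst not in existing_articles or src == dst:
            continue
        if dst in graph.get(src, set()):
            continue  # already proposed by the article at the other end
        if reaches(graph, dst, src):
            continue  # adding src -> dst would close a cycle
        graph.setdefault(src, set()).add(dst)
        kept.append((src, dst))
    return kept

# Toy input: A and B name each other as successors; one link is an image file.
successors = {"A": ["B", "File:Flag.svg"], "B": ["A"]}
predecessors = {"C": ["B"]}
clean = build_edges(successors, predecessors, {"A", "B", "C"})
print(clean)
```

Here the image link is discarded, the edge from B to C is added even though only C mentions it, and the second half of the A/B cycle is dropped because A to B was accepted first.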

Observations

The results, while fairly noisy, reveal some interesting patterns. Although no explicit date or time data is involved in laying out the graph, there is a distinct historical chronology extending from the bottom of the graph (earliest) to the top (latest). The bottom section of the graph narrows into several ‘trunk’-like structures which terminate with some of the earliest cradles of civilization such as Predynastic Egypt, Mesopotamia, and pre-Vedic India.

Meanwhile, the upper portion of the graph is populated by many currently existing or relatively recent modern states.

The chronological aspect of the layout is inconsistent and forms more of a rough gradient than a solid structure. Certain subtrees which do not contain many nodes terminate ‘early’ relative to others. These may be indicative of areas with low rates of ‘state turnover’ or politogenesis. The best example is the British Isles, which after the unification of England in the 10th century were remarkably stable, at least from the perspective of state collapse and formation. As a result, the modern United Kingdom sits very low on the graph relative to, say, China, Russia, Germany, and many others.

The graph also reveals some interesting data issues in Wikipedia’s network of polities.

First of all, this particular graph by no means covers all of the articles on historical states that exist within Wikipedia. Indeed, some relatively large chunks of history are missing from it. For example, the Warring States in China are absent, as are some ancient city-states of Mesopotamia, and many others. This is due in large part to inconsistent predecessor and successor assignments between pairs of adjacent nodes. In the example of the Warring States, the state of Chu lists the Qin Dynasty as its successor. However, the Qin Dynasty article does not list Chu as its predecessor, meaning that Chu is unreachable when the crawl propagates from the Qin side. The crawler is written so that these missing sections can be appended to the main graph by starting propagation from within one of the disconnected sections, but this still requires manually identifying those sections. I hope to find and capture more of them in future versions.
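The reachability problem can be shown with a small sketch. The crawler below is an assumed breadth-first design, not the project's actual code, and `INFOBOXES` is a simplified, hypothetical stand-in for scraped sidebar data: only the Chu and Qin relationship mirrors the example in the text, and the remaining entries are filler.

```python
from collections import deque

# Hypothetical data: article -> (listed predecessors, listed successors).
INFOBOXES = {
    "Qin dynasty": (["Zhou dynasty"], ["Han dynasty"]),
    "Zhou dynasty": ([], ["Qin dynasty"]),
    "Han dynasty": (["Qin dynasty"], []),
    "Chu": ([], ["Qin dynasty"]),  # Chu names Qin, but Qin does not name Chu
}

def crawl(seeds, infoboxes):
    """Follow predecessor and successor links outward from the seed articles."""
    visited, edges = set(), set()
    queue = deque(seeds)
    while queue:
        article = queue.popleft()
        if article in visited or article not in infoboxes:
            continue
        visited.add(article)
        preds, succs = infoboxes[article]
        edges.update((p, article) for p in preds)
        edges.update((article, s) for s in succs)
        queue.extend(preds + succs)
    return visited, edges

main, main_edges = crawl(["Qin dynasty"], INFOBOXES)
extended, extended_edges = crawl(["Qin dynasty", "Chu"], INFOBOXES)
print(sorted(extended - main))  # articles found only with the extra seed
```

Starting from the Qin side alone never visits Chu, because no visited article links to it. Adding Chu as a second seed attaches it, along with its edge to Qin, which is the manual step described above.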

More fundamental is the issue of what sorts of articles are allowed to be part of this network in the first place. There are plenty of oddities here. Among these are the numerous articles which do not represent polities at all: ‘History of ___’ articles, as well as articles on specific leaders, events, and policies, such as the Doctrine of Lapse or the Dominican Civil War.

We also see the inclusion of sub-polity level administrative divisions, such as specific provinces. These can create severe distortions in the flow of the network such as confusing bifurcations or even the complete dispersal of a single polity across a sparsely connected web of administrative divisions. The Roman Empire is a good example: we find not only a node representing the whole empire, but also many other nodes representing specific provinces, such as Hispania and Judea, scattered around the graph. There is no clear logic behind whether to assign the whole empire or a specific province as the successor of a polity that was absorbed by Rome. More modern states have this issue as well. Every Bezirk of East Germany is listed separately from the single node for the whole state, as are several Polish voivodeships and many Russian oblasts.

It’s worth mentioning that this data was taken from English Wikipedia, and presumably entirely separate and different networks exist for every different language edition of Wikipedia. The excruciating detail with which certain European countries’ subdivisions are represented in the English edition may not be replicated in other editions.

Degree Distribution

We can also make some observations about the graph’s degree distribution. With nodes sized by degree, one can easily make out the existence of large ‘hub’ nodes like the Ottoman Empire. This invites speculation that the network may be scale-free. A scale-free network is one where the degree distribution follows a power law. These types of networks appear in some real world physical, biological, and social systems.
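As a reference point for the term, a power-law degree distribution can be written as follows, where $k$ is a node's degree and $\alpha$ is the scaling exponent (both symbols are notation introduced here, not taken from the project's code):

```latex
P(k) \propto k^{-\alpha}
\quad\Longleftrightarrow\quad
\ln P(k) = -\alpha \ln k + c
```

On log-log axes such a distribution appears as a straight line with slope $-\alpha$, which is why degree distributions are usually plotted that way when checking for scale-free behavior.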

Tests of the degree distribution’s best fitting model yielded the following results:

Power Law vs. Lognormal:
R = -0.690
p = 0.249

Power Law vs. ‘Lognormal Positive’:
R = 0.552
p = 0.139

The network's degree distribution (blue) plotted against a Power Law function (green)

Comparison to a Lognormal distribution suggests that a Lognormal is a better fit than a Power Law (R < 0), but the result is not statistically significant (p = 0.249).

Python’s powerlaw package includes a ‘Lognormal Positive’ distribution which keeps the Lognormal’s μ value positive. Since it is unclear what negative values would mean in the case of a network of historical polities or even a network of Wikipedia articles, this may provide a better comparison.

In comparison to a Lognormal Positive distribution a Power Law is favored (R > 0), but the result is again not statistically significant (p = 0.139). At most, the network may be weakly scale-free; the evidence is not definitive.
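For readers unfamiliar with these statistics, the sketch below shows the general form of a log-likelihood ratio comparison between two candidate distributions. It assumes that the R and p values above come from the powerlaw package's distribution comparison, which the text implies but does not state; the symbols $p_A$, $p_B$, $k_i$, and $n$ are notation introduced here.

```latex
R = \sum_{i=1}^{n} \left[ \ln p_A(k_i) - \ln p_B(k_i) \right],
\qquad
R > 0 \Rightarrow A \text{ favored}, \quad
R < 0 \Rightarrow B \text{ favored}

% Lognormal density; the 'Lognormal Positive' variant adds the constraint \mu > 0
p_{\mathrm{LN}}(x) = \frac{1}{x \sigma \sqrt{2\pi}}
  \exp\!\left( -\frac{(\ln x - \mu)^2}{2\sigma^2} \right)
```

Here A is the Power Law and B is the Lognormal (or Lognormal Positive) alternative, so R = -0.690 leans toward the Lognormal and R = 0.552 leans toward the Power Law. The p value indicates whether the sign of R can be trusted; at 0.249 and 0.139, neither direction is reliable. Reported ratios are sometimes normalized by their standard deviation, so the magnitudes should not be over-interpreted.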

One wonders whether the network’s degree distribution, regardless of its type, says more about the nature of Wikipedia’s network of articles or points to something deeper about the relationships between actual historical polities. Analogously, other studies on broader sections of Wikipedia’s network of articles have tried to ascertain whether this network is reflective of some deeper semantic structure.

One such study distinguishes between the ‘hyperlink network’ and the underlying ‘semantic network’ and finds that the topology of the hyperlink network is “drastically different” from that of the semantic network. However, this is a much more general case dealing with hyperlinks embedded in natural language text across wide swaths of Wikipedia. Our case is much more tightly bounded, and is not really a ‘semantic’ network at all: the hyperlinks are not embedded in arbitrary natural language but are tied to the constructs of predecessor and successor exclusively, themselves rooted in historical data (X state came after Y state, etc.). This much more rigid, grounded definition suggests that the successor-predecessor network may be at least somewhat reflective of the underlying historical reality.

To this point, hub nodes do seem to be consistent with what one would expect for such a role in a real world network of polities: large, territorial empires spanning diverse geographic regions. Examples include the Ottoman Empire, the Mughal Empire, the Seljuk Empire, the Byzantine Empire, and the Soviet Union. Note, however, that degree does not correlate directly with the size or power of a polity historically. Chinese dynasties, despite being some of the largest states in history, have a relatively low degree. What really seems to be measured is the number of distinct polities that an empire absorbed, released, or was absorbed by during its lifespan. Later Chinese states were often succeeded directly by another large polity, as in a dynastic transition, or by a small number of short-lived successors. The Ottoman Empire, by contrast, conquered an extremely ‘politically diverse’ region, absorbing and releasing many different small states during its existence.

The Concept of Succession

Beyond the specific structure of the network, the existence of these categories on Wikipedia at all raises the question: what precisely do we mean by ‘predecessor’ and ‘successor’? Because a large number of editors create this network in an uncoordinated fashion, there exist different and sometimes conflicting ideas on what counts as a predecessor or successor. One fairly consistent choice that stands out, though, is that the logic of succession is tied to geography. That is, succession is determined by what states were present in the same geographic area regardless of political or cultural (dis)continuity.

To take just one example: the case of colonies. With Spanish colonies in the Americas, we find as predecessors various indigenous groups, the Inca Empire, etc. This is certainly valid no matter how one defines succession, but conspicuously missing is the single polity that directly established these colonies: Spain itself! This highlights one issue with the ‘geographic’ model of succession: New Spain clearly inherited vast amounts of institutional, cultural, and human ‘DNA’ from the Spanish state back in Europe, yet the latter is omitted from New Spain’s succession hierarchy presumably because it is geographically far removed.

The questions raised here are unavoidably evolutionary ones. In the last few decades the study of social, cultural, and historical evolution has grown in popularity, giving birth to emerging fields such as cliodynamics and macrohistory. These new approaches are supported by interdisciplinary ideas drawn from the study of biological evolution and complex systems, and focus on synthesizing the vast amounts of historical data now available with a mathematical rigor unusual in traditional historical analysis.

Rigorously defining evolutionary ideas such as succession with regard to historical development is one small piece of these greater projects. One such project, Seshat, even makes use of the ‘preceded by’ and ‘succeeded by’ categories in its own entries on historical polities. Given these developments, perhaps more refined and rigorous definitions created in light of systems theory and general evolution will one day filter down into Wikipedia’s model.
