Seshat: The Global History Databank is a game-changing database construction project that may reveal there is actually one thing war is good for: historical statistics.
By publishing free-to-public data on historical societies, Seshat will soon become the world’s largest professional historical database.
The massive research effort has been far too much for any individual or a small team to work on alone. It is a collaborative project bringing together the expertise of historians, anthropologists, economists, and archaeologists from universities around the world.
The hope is that social science models, whether new, long-held, speculative or dogmatic, will in future be submitted to the test of historical data, letting Seshat decide which ones are right and which are wrong.
With the amount of data accessible to anyone, Seshat will also change the way historians write books and revolutionize the teaching of history.
To date, progress has been made on over 100 polities. These include polities from the periods of the Roman Empire and Ancient Egypt. There are hundreds of polities still to code.
The scale is huge. For each polity Seshat has 681 variables (and counting) on government, religion, economy and society. Seshat will also make available information on resources and agricultural productivity for natural geographic areas.
Due to the size and ambition of the database, however, there is one more area on the list for development: ancient warfare.
Warfare is a group-level human behavior which, as discussed on SEF, may have played a major role in the development of group-level cooperation. Can Seshat, in addition to everything else, also provide researchers with a database to test theories relating to warfare?
As Peter Turchin put the argument: The logic is very simple. Groups of people who can’t cooperate to put together an army, will be overrun by those who can. The result is that genetic and cultural traits for noncooperation will go extinct.
Is this true, or not? Can we test the claim that societies that possess universal norms (such as equality or charity) and large-scale ultrasocial state institutions (like bureaucracies, health and education systems) exist because over the last 3000 years ancestral societies used war to eliminate societies that did not possess their prototypes and antecedents?
With enough data on warfare – who fought whom, where, when, what were the consequences? – many evolutionary models, and non-evolutionary models, can be tested, and improved.
A Seshat contribution to warfare statistics would end where the Correlates of War (COW) database (1816-present) begins, with data running up to about the late 18th century.
Whilst data collection for COW was begun by the political scientist J. David Singer in 1963, the field of ancient warfare statistics has no comparable digital database. It is stuck at the stage where primary and secondary historical sources have been collected together into large compendia.
One example of the genre, Eggenberger’s “An Encyclopedia of Battles”, first published in 1967, contains descriptions of 1,560 battles. Since then, more recent multi-volume tomes have begun to make the bookshelves creak. Is the time now right for a digital approach?
While the analogue paper literature is very good, and provides superb “highlights programming”, it is not accessible for statistical analysis, and there does not yet exist even a reasonable presentation of the “full game”: all the wars, battles and sieges that have been recorded in history.
To illustrate this last point, I made an experimental effort, mostly using the remarkable resource that is Wikipedia, and coded over 1,800 battles and sieges for just one slice of history: from the Roman era, through the Byzantine Empire, to the Ottoman Empire up to 1700 CE.
Many of the battles and sieges did not have their own webpage (they were often described on the page of a famous general or ruler). However, the exercise suggested that all the battles and sieges recorded in history could amount to something staggering like 100,000-150,000, a magnitude beyond the current compendia.
Wikipedia will never get all available historical data on warfare and nor will Seshat. The historical record is not detailed enough. At some point you have to say that is all the data you can get and leave it to the analysts to figure out how to account for record bias.
The take-home point is that there is a lot more data out there than is being put into war compendia. Seshat will do a more complete job of putting it all together and, unlike Wikipedia and the paper volumes, it will be an accessible, expert-checked, open-for-all resource designed by and for researchers.
(Some improvised graphics above and below, made with the help of Mapsdata.co.uk, show what could be done with a large number of data points.)
What variables are covered?
A Seshat warfare database would separately code warfare variables, and variables for the battles and sieges that make up the wars.
Battle and siege pages would record the most obvious information about the belligerents and consequences of interest. The siege page would also identify the besieger who initiated the siege of a city. Raids on cities would be coded with a siege template (but not as a siege).
A “war” could be considered a polity versus polity violent conflict (e.g. Roman-Sassanid, Roman-Parthian) or a collection of military events between polities (First Punic, Second Punic etc.). In the latter case, the war is nested within a meta-conflict (such as between Rome and Carthage).
We propose variables that distinguish between types of war and the cultural distance between the belligerents.
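The nesting described above (battles and sieges inside wars, wars inside a meta-conflict) can be sketched as a small data model. This is only an illustration, not Seshat's actual schema: the class names, field names and the `attacker` field are all assumptions made for the example, and the two events are included simply as familiar cases.

```python
from dataclasses import dataclass, field
from typing import List, Optional

EVENT_KINDS = ("battle", "siege", "raid")


@dataclass
class MilitaryEvent:
    """One battle, siege or raid. Field names are hypothetical."""
    name: str
    kind: str                       # one of EVENT_KINDS
    year: int                       # negative numbers stand for BCE
    belligerents: List[str]
    attacker: Optional[str] = None  # who initiated the siege or raid

    def __post_init__(self) -> None:
        if self.kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {self.kind!r}")
        if self.uses_siege_template and self.attacker is None:
            raise ValueError("siege-template events need an attacker")

    @property
    def uses_siege_template(self) -> bool:
        # Raids share the siege template but keep their own kind,
        # so they are never counted as sieges.
        return self.kind in ("siege", "raid")


@dataclass
class War:
    """A war between polities; sub_wars lets it act as a meta-conflict."""
    name: str
    polities: List[str]
    events: List[MilitaryEvent] = field(default_factory=list)
    sub_wars: List["War"] = field(default_factory=list)

    def all_events(self) -> List[MilitaryEvent]:
        """Collect events from this war and every war nested inside it."""
        collected = list(self.events)
        for war in self.sub_wars:
            collected.extend(war.all_events())
        return collected


first_punic = War(
    "First Punic War", ["Rome", "Carthage"],
    events=[MilitaryEvent("Siege of Lilybaeum", "siege", -250,
                          ["Rome", "Carthage"], attacker="Rome")],
)
second_punic = War(
    "Second Punic War", ["Rome", "Carthage"],
    events=[MilitaryEvent("Battle of Cannae", "battle", -216,
                          ["Rome", "Carthage"])],
)
rome_carthage = War("Rome-Carthage", ["Rome", "Carthage"],
                    sub_wars=[first_punic, second_punic])

sieges = [e.name for e in rome_carthage.all_events() if e.kind == "siege"]
```

Keeping `kind` separate from the template is one way to let a raid reuse the siege fields without being counted among the sieges in later analysis.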
What are the challenges of coding warfare variables on Seshat?
Accuracy is the main challenge. This is the distinguishing feature of a professional academic database. All codes have acceptable references and are checked by experts in the field.
While accuracy is of central importance, the first thing the coder and experts may notice in their historical sources is the absence of accuracy.
As a result of the exaggeration, bias, and the inconsistent and contradictory ways in which witnesses, chroniclers and writers have reported facts about historical events, modern estimates must be used. This is again why the role of the expert is important.
The database must also be constructed with the accuracy challenge in mind. Let’s take one example. In reference to army losses in battle a source might say the army was “completely routed.”
Usually this phrase is used as part of a battle description where claimed losses range from 25% to 50%. However, a coder cannot assume that every author who writes “completely routed” means the same percentage.
This does not make data collection impossible. Seshat would provide space for written explanation and allow the coder to input values in the form of a range, or even in the form of a disagreement between experts.
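As a rough sketch of what such an entry could look like, the snippet below stores each estimate as a range with a written note, and treats more than one estimate as a disagreement between experts. The class and field names are hypothetical, and the second estimate (40-60%) is invented purely to show a disagreement; only the 25-50% range comes from the example above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Estimate:
    """One expert's value for a variable, expressed as a range."""
    low: float
    high: float
    note: str = ""      # free-text explanation from the coder
    source: str = ""    # reference supporting the estimate

    def __post_init__(self) -> None:
        if self.low > self.high:
            raise ValueError("low must not exceed high")


@dataclass
class CodedVariable:
    """A coded variable holding one or more competing estimates."""
    name: str
    estimates: List[Estimate]

    @property
    def disputed(self) -> bool:
        # More than one estimate means the experts disagree.
        return len(self.estimates) > 1

    def envelope(self) -> Tuple[float, float]:
        """Widest range covering every recorded estimate."""
        return (min(e.low for e in self.estimates),
                max(e.high for e in self.estimates))


losses = CodedVariable(
    name="army_losses_percent",
    estimates=[
        Estimate(25, 50, note='source says "completely routed"'),
        Estimate(40, 60, note="hypothetical second expert opinion"),
    ],
)
```

Here `losses.disputed` is true and `losses.envelope()` gives the range 25 to 60, so an analyst can choose between using the individual opinions or the widest plausible range.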
Some of the most inconsistent data might still be used for derivative variables: how about one for reported army size based on the logarithmic magnitude of total participants at the battle? A useful statistic based on the total size of fielded armies would be one that does not require the historical data to be exceptionally accurate (because it isn’t).
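One possible reading of such a variable is the order of magnitude of the total number of participants, that is, the whole-number part of the base-10 logarithm. The sketch below is an assumption about how this could be computed, not a Seshat definition; the function names and the sample army sizes are made up for illustration.

```python
from typing import Tuple


def size_magnitude(total_participants: int) -> int:
    """Order of magnitude of a head count: floor(log10(n)).

    Counting digits gives the same answer as floor(log10(n)) for
    positive integers and avoids floating-point rounding at exact
    powers of ten.
    """
    if total_participants <= 0:
        raise ValueError("total_participants must be positive")
    return len(str(int(total_participants))) - 1


def magnitude_range(low: int, high: int) -> Tuple[int, int]:
    """Magnitudes for the low and high ends of a reported range."""
    if low > high:
        raise ValueError("low must not exceed high")
    return (size_magnitude(low), size_magnitude(high))


# Two hypothetical, conflicting reports of the same battle
# land in the same magnitude class despite differing by 36,000.
report_a = size_magnitude(50_000)
report_b = size_magnitude(86_000)
same_class = report_a == report_b
```

The point of the design is that a variable this coarse only changes when sources disagree by roughly a factor of ten, which is the kind of robustness the inconsistent record calls for.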