April 21, 2020
by Daniele Di Clemente | 5 min read
The most famous example of how organizations and societies have always had to deal with large volumes of data takes us back almost 2,000 years, to the demographic surveys the Romans ran across their then vast empire (in English we still use the Latin term census). But the Roman census is not the only example.
A census is the procedure for systematically acquiring and recording information about the members of a given population. Censuses date back to the earliest organized societies, because governments have always needed to classify their citizens and assess their possessions for taxation purposes. This demonstrates, yet again, that money is the real mother of science.
If we go back in time, we can trace several civilizations, even geographically very distant ones, that developed methodologies to take account of their people. For example, as reported by Herodotus (probably the most famous ancient Greek historian), in ancient Egypt the practice of the census originated in the late Middle Kingdom (2055 – 1650 BC) and developed during the New Kingdom (1550 – 1069 BC), under the pharaoh Amasis. In ancient Greece, at about the same time, Cecrops, the legendary first sovereign of Athens, also conducted a census: he asked each inhabitant to bring him a stone, and the final number of stones revealed the number of taxpayers.

Much later, in 1662, John Graunt (arguably the founder of modern demography) published the first mortality tables, both to protect public health and, surprisingly, as a marketing study. The data on the number of deaths was originally collected at the request of London merchants who, during the devastating outbreaks of various epidemics, wanted to estimate the number of potential customers, namely the people alive in London, broken down by age.
Ever since, Big data, applied for a variety of purposes, has become a cornerstone for studying social phenomena and understanding the external and internal factors that influence communities, as in the case of the COVID-19 health emergency in 2020.
In ancient Rome, the practice of the census became common under Servius Tullius, a king who reigned in the 6th century BC. He officially introduced it as an administrative instrument designed to categorise citizens by their social class or their role as civil servants (military, administrative or political), in order to evaluate the right amount of taxes each citizen owed. In fact, the term census derives from the Latin verb censere, which means to evaluate.
About 500 years later, by the end of the first century BC, Rome had become an empire. Its territories extended to what we know today as Spain, Germany, Greece, North Africa, Turkey, Iran and Israel, to name a few. At that point, keeping track of the human and economic resources of all the conquered provinces under its domain had become an impossible task. The Romans quickly understood that trying to convince their subjects to march towards Rome, in order to be counted, was not a viable solution.
In response to this challenge, the Romans came up with the idea of establishing a body of officials called “censors”, whose job was to travel to the provinces (in Big data terms, we would call the provinces data sources) and carry out the data collection (yes, another Big data term) locally, rather than making every Roman subject travel to the capital for that purpose. Upon their return to Rome, the censors would submit their result tables to the authorities, and the numbers were finally put together centrally to perform the data analysis (and yes, another Big data term again).
In this way, the Romans brilliantly solved the problem of collecting and processing the necessary information on the different social and economic profiles of their vast population. In other words, the first census under the Roman Empire anticipated a typical modern Big data model, where the resources of a software system are shared among multiple computers to improve efficiency and performance.
Of course, since the simple headcounts taken for tax purposes two millennia ago, the Big data approach to profiling large populations has gone on to enable countless new use cases.
Despite the many examples history offers, the approach implemented under the Roman Empire remains the archetypal example of distributed data collection combined with centralised data analysis. That is why, even today, when designing a Big data architecture, if the amount of data to be processed surpasses a certain threshold, data management engineers favour distributed over centralised computing, and they still refer to it as “The” Roman Census Approach.
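The pattern described above can be sketched in a few lines of code. This is a minimal illustration, not a real distributed system: the province names and occupation records are invented sample data, and each function stands in for what would be a separate node in an actual Big data deployment. The key idea it shows is the one the censors discovered: aggregate locally at each data source, then merge only the small summaries centrally, instead of shipping every raw record to one place.

```python
from collections import Counter

# Each "province" holds its own raw records locally (the data sources).
# Hypothetical sample data for illustration only.
provinces = {
    "Hispania": ["farmer", "soldier", "farmer", "merchant"],
    "Gallia":   ["soldier", "soldier", "farmer"],
    "Aegyptus": ["merchant", "farmer", "merchant"],
}

def local_tally(records):
    """The 'censor' step: aggregate at the data source.

    Only this small summary travels back to the capital,
    never the raw records themselves.
    """
    return Counter(records)

def central_analysis(partial_tallies):
    """Back in 'Rome': merge the partial results into one view."""
    total = Counter()
    for tally in partial_tallies:
        total += tally
    return total

# Distributed collection, then centralised analysis.
partials = [local_tally(records) for records in provinces.values()]
census = central_analysis(partials)
print(dict(census))  # e.g. farmer: 4, soldier: 3, merchant: 3
```

The same shape, local partial aggregation followed by a central merge, is what frameworks such as MapReduce-style systems formalise: the tallies are cheap to transmit and the merge step is associative, so it scales to however many "provinces" you have.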