What is open data? Open data is usually defined as data that anyone, anywhere, can use, share, and contribute to without restrictions. One of the best definitions I’ve seen is the one on the Mexican government’s open data website. There, for data sets to be considered open data they must be:
Free: No transaction is required to obtain them.
Non-discriminatory: They’re accessible to all users without restrictions.
Unlicensed usage: Citing the source is the only requirement for using them freely.
Machine-readable: Structured, entirely or partially, to be processed and interpreted automatically by electronic equipment.
Comprehensive: They describe their subject in as much detail as possible and include the necessary metadata.
Primary: They come from the source that generated them, with the highest possible level of disaggregation.
Timely: They are updated periodically as new data is generated.
Permanent: Historically relevant versions for public use are kept available with their respective identifiers.
Open Data Exercise
Working with data to answer the questions one might have is not as overwhelming as one might think (usually the hard work lies in cleaning and preparing it). In this case that part of the process wasn’t too time-consuming since, as the definition shows, the government aims to provide structured data that is easy to read. To illustrate this point, I would like to share a small exercise (meant as an example, definitely not a thesis, essay, or profound report) that seeks insights into the following questions:
Have crime rates in Mexico been rising for decades? Or did the war on drugs in 2008 really start it all, as many perceive?
To answer these questions, I analyzed the following:
*Did femicides grow over the years?
*Did femicides between relatives increase or decrease (an increase could indicate a social issue)?
*Did the majority of cases involve domestic violence, or is there no connection?
Crime aside from homicides
*The levels of other crimes such as rape, highway armed robberies, and property crimes since before the war on drugs started in 2008.
Sources of Information
For this exercise I used two sources. The first is INEGI, a public institution that is autonomous in its data-gathering techniques and management (which means its data is not manipulated by the government). INEGI’s main focus is population and housing censuses, but here I used its open data on deaths by homicide from 1990 to 2016:
The other source is the Secretariado Ejecutivo del Sistema Nacional de Seguridad Pública (SNSP), which is part of the federal government. It sets the basis for coordinating and distributing responsibilities across the federal, state, municipal, and local levels. It constantly revises its classification of crimes to make it more precise and more disaggregated (many crimes used to be lumped together under the category “other crimes”):
From INEGI I downloaded three separate datasets, since the website doesn’t do a good job of building a multidimensional table. The three CSV tables are:
- Homicides in general, by gender.
- Homicides by gender where there was kinship between the victim and the murderer.
- Homicides by gender where there was domestic/family violence.
It’s good practice to check what each dataset contains (for very large datasets, inspecting the first and last rows is usually enough, but here that wasn’t necessary). The data frame containing homicides where there was a relationship between the murderer and the victim only has data starting from 2012:
The next data frame to check is the one for homicides where there had been domestic violence. Its data is only available from 2000 onward. It seems the government integrated the previous table into this one, so this is the one I used for the rest of the analysis.
Last is the data frame containing all murders, separated by gender:
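The checks described here (looking at the first and last rows, and at which year each table starts) can be sketched with Python’s standard library. This is a hedged illustration, not the author’s code, which was written in R; the `year`, `men`, and `women` column names and the sample values are hypothetical.

```python
import csv
import io

# Hypothetical CSV excerpt: column names and values are made up for
# illustration and do not reflect INEGI's real export format.
SAMPLE = """year,men,women
2000,100,20
2001,110,25
2002,105,22
"""


def read_rows(text: str) -> list[dict]:
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def head_tail(rows: list[dict], n: int = 2) -> tuple[list[dict], list[dict]]:
    """Return the first and last n rows, like a quick head/tail check."""
    return rows[:n], rows[-n:]


def year_range(rows: list[dict]) -> tuple[int, int]:
    """Return the earliest and latest year in the 'year' column."""
    years = [int(row["year"]) for row in rows]
    return min(years), max(years)


rows = read_rows(SAMPLE)
first, last = year_range(rows)
print(f"data covers {first}-{last}")  # data covers 2000-2002
```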
In the data frame for murders with domestic/family violence, unnecessary columns and blank values are removed, and the ggplot2 and reshape2 libraries are loaded to visualize the results:
The resulting visualization shows that murders involving domestic/family violence had already increased considerably. It also shows no clear pattern, either overall or when comparing the two genders.
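The reshape2 step is there because ggplot2 typically works best with data in long format (one row per year and gender) rather than one column per gender. A minimal Python sketch of that wide-to-long conversion, assuming a hypothetical table with `year`, `men`, and `women` columns and made-up values, might look like this (it is not the author’s R code):

```python
# Hypothetical wide table (made-up numbers): one row per year,
# one column per gender; None stands in for a blank cell.
wide = [
    {"year": 2000, "men": 100, "women": 20},
    {"year": 2001, "men": None, "women": 25},
]


def melt(rows, id_col, value_cols, var_name="gender", value_name="count"):
    """Wide -> long: one output row per (id, column) pair, similar in spirit to reshape2's melt."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            long_rows.append({id_col: row[id_col], var_name: col, value_name: row[col]})
    return long_rows


# Melt first, then drop blank values, so one empty cell doesn't discard the whole year.
long_rows = [r for r in melt(wide, "year", ["men", "women"]) if r["count"] is not None]
for r in long_rows:
    print(r)
```

Dropping blanks after melting keeps the 2001 value for women even though the 2001 value for men is missing.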
Next, the data frame containing all murders needed some cleaning: removing some columns, removing the commas from the numbers, converting the values to numeric, and keeping only data from 2000 onward. Then I plotted it with ggplot to see whether there was any pattern.
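Those cleaning steps could look roughly like this in Python’s standard library, as a sketch rather than the author’s R code. The column names, the extra `notes` column, and the numbers are invented, and the commas are assumed to be thousands separators inside quoted CSV fields.

```python
import csv
import io

# Made-up CSV excerpt: counts are quoted strings with thousands separators,
# plus an extra column we don't need. Not real INEGI figures.
RAW = '''year,men,women,notes
1999,"1,000",200,x
2000,"1,100",210,x
2001,"1,250",230,x
'''


def to_int(value: str) -> int:
    """Strip thousands-separator commas and convert to an integer."""
    return int(value.replace(",", ""))


def clean(text: str, keep=("men", "women"), from_year=2000) -> list[dict]:
    """Keep only the needed columns, as numbers, for years >= from_year."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        year = int(row["year"])
        if year < from_year:
            continue  # drop years before the cutoff
        rows.append({"year": year, **{col: to_int(row[col]) for col in keep}})
    return rows


print(clean(RAW))
```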
Looking at all murders, there is clearly a jump starting in 2008, especially among men. The numbers for men drive the overall pattern, but the numbers for women do follow the trend as it rises and falls (unlike the domestic/family violence series).
To keep exploring whether domestic/family violence is present in many femicides, I measured the percentage of all femicides that involved family violence:
From 2000 to 2016, the share of femicides involving domestic violence moved without a clear pattern, ranging from 3% at its lowest to 11% at its highest. Although the share wasn’t consistent, the number of femicides involving family/domestic violence is still alarming and illustrates the need for better education about family and relationships (they may not be half of the total, but the fact that they are not zero is a problem in itself).
Conclusion: unlike murders of men, which really shot up in 2008, the numbers for women kept increasing steadily (the data that can be consolidated only goes back to 2000, but the problem probably started decades earlier). This means femicide can’t be pinned on the war on drugs; it is a social issue that has gone unaddressed for decades.
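The percentage here is femicides with family violence divided by total femicides for each year, times 100. A sketch of that calculation in Python, using made-up counts rather than INEGI figures, that also finds the lowest and highest years:

```python
# Made-up yearly counts for illustration only (not INEGI data).
total_femicides = {2000: 1000, 2001: 1100, 2002: 1200}
with_family_violence = {2000: 50, 2001: 110, 2002: 72}


def share_by_year(part: dict, total: dict) -> dict:
    """Percentage of the total that `part` represents, for years in both series."""
    return {
        year: 100 * part[year] / total[year]
        for year in sorted(part.keys() & total.keys())
        if total[year]  # skip years with a zero total to avoid division by zero
    }


shares = share_by_year(with_family_violence, total_femicides)
lowest = min(shares, key=shares.get)
highest = max(shares, key=shares.get)
print(f"lowest: {lowest} ({shares[lowest]:.1f}%), highest: {highest} ({shares[highest]:.1f}%)")
# lowest: 2000 (5.0%), highest: 2001 (10.0%)
```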
Crime aside from homicides
To keep investigating whether Mexico has a problem not just with homicides but with crime in general, I went through the SNSP data on crime by state. I began by focusing on property crimes.
The graph shows that property crimes had already increased sharply well before the war on drugs began in 2008; more to the point, they started to decrease in 2009 (people used to argue that crime had risen because cartels were expanding into other areas, instead of focusing on the problems of the judicial system and society).
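Because the SNSP data is broken down by state, getting a national yearly series for one type of crime means summing across states. A minimal Python sketch, assuming a hypothetical record layout of (state, year, crime type, count) with placeholder state names and invented values rather than the SNSP’s actual schema:

```python
from collections import defaultdict

# Hypothetical state-level records; layout and values are made up.
records = [
    ("State A", 2007, "property", 300),
    ("State B", 2007, "property", 120),
    ("State A", 2008, "property", 310),
    ("State B", 2008, "rape", 15),
]


def national_totals(rows, crime):
    """Sum one crime type across all states, per year, sorted by year."""
    totals = defaultdict(int)
    for _state, year, kind, count in rows:
        if kind == crime:
            totals[year] += count
    return dict(sorted(totals.items()))


print(national_totals(records, "property"))  # {2007: 420, 2008: 310}
```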
Next, I looked at other crimes to see whether they show a similar pattern, starting with the rates for rape:
Sadly, like property crimes, sex crimes increased sharply starting in 2000 and show the same decrease as property crimes between 2011 and 2015.
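One way to back up a statement like a decrease between 2011 and 2015 with numbers, rather than reading it off a graph, is to compute year-over-year changes and list the years that fell. A Python sketch with made-up yearly totals (not SNSP data):

```python
def yoy_change(series: dict) -> dict:
    """Difference between each year and the previous year present in the series."""
    years = sorted(series)
    return {y: series[y] - series[prev] for prev, y in zip(years, years[1:])}


def decreasing_years(series: dict) -> list:
    """Years whose value fell compared with the previous year present."""
    return [y for y, delta in yoy_change(series).items() if delta < 0]


# Made-up yearly totals for illustration only.
rape_counts = {2010: 500, 2011: 480, 2012: 460, 2013: 440, 2014: 430, 2015: 420, 2016: 460}
print(decreasing_years(rape_counts))  # [2011, 2012, 2013, 2014, 2015]
```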
Finally, a brief look at another felony, highway robbery:
The first stretch, which shows zero crimes (1997-2001), may be because this type of crime was not reported separately at the time. Confirming this with the SNSP, and a deeper study (which this isn’t), would be needed to better understand this issue in their open data.
As with the other crimes, highway robbery started to rise at the beginning of the decade and fell between 2011 and 2015 before jumping back up.
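If the leading zeros for highway robbery really mean the crime was not reported separately (an assumption that would need confirming with the SNSP), they should be treated as missing rather than as zero crimes before plotting or computing trends. A hypothetical Python helper that does this, run on made-up values:

```python
# Made-up highway-robbery series where the early years are recorded as 0.
highway_robbery = {1997: 0, 1998: 0, 1999: 0, 2000: 0, 2001: 0, 2002: 150, 2003: 170}


def leading_zeros_as_missing(series: dict) -> dict:
    """Replace the run of zeros at the start of a series with None (missing).

    Assumption: a leading run of zeros means "not reported separately",
    not "no crimes". Zeros after the first non-zero value are kept as zeros.
    """
    result = {}
    seen_nonzero = False
    for year in sorted(series):
        value = series[year]
        if value != 0:
            seen_nonzero = True
        result[year] = value if seen_nonzero else None
    return result


print(leading_zeros_as_missing(highway_robbery))
```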
By extracting, cleaning, and working with the data from INEGI and SNSP, one can observe that crime in Mexico was already a problem, and increasing sharply, even before cartels diversified their operations when the war on drugs started. Crimes like femicide have been occurring for decades at the same levels (it’s also worth noting that missing women aren’t counted in the murder figures).
Crime in Mexico has been growing for decades and definitely calls for deeper research, since the reasons behind the trend could be many: impunity, corruption, unemployment, social injustice, complicity between criminals and governments, etc. The purpose of this exercise was simply to show how easy it can be to explore open data to answer one’s own questions and to challenge notions or generalizations with data.