Don't go near the water
John Snow, one of the fathers of modern epidemiology, produced a famous map in 1854 showing the deaths caused by a cholera outbreak in Soho, London, and the locations of water pumps in the area. He found there was a significant clustering of the deaths around a certain pump; removing the handle of the pump stopped the outbreak. For more detail about this incident and the experiment see Wikipedia.
Robin Wilson released the cholera data in multiple formats. See his blog for more information. This app is an attempt to visualize that data and draw some insights from it.
Links
You may access the application here.
Click here to download the source code and data files. To run, you need R; RStudio is recommended. Install the libraries listed at the top of app.R, and click Run App.
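As a rough sketch of that setup step, the snippet below checks which packages are missing before you run the app. The package list is an assumption drawn from the libraries named in this write-up; the list at the top of app.R is the authoritative one.

```r
# Packages mentioned in this write-up (assumed list; check the top of app.R).
required <- c("shiny", "shinydashboard", "ggplot2", "plotly",
              "leaflet", "tidyr", "jpeg")

# Which of them are not installed yet?
missing_pkgs <- setdiff(required, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}

# Then, from the folder that contains app.R:
# shiny::runApp()
```

In RStudio, opening app.R and clicking Run App does the same thing as the last line.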
If you'd rather watch a video, here it is:
The data
This application visualizes the following:
Cholera attacks and deaths that occurred during the peak period from 8/19/1854 to 9/29/1854 are presented by date as a table and a line graph.
Fatalities per 10,000 inhabitants by age group and sex are presented as a table and bar graphs.
The UK census of 1851 by age group and sex is presented as a raw data table, as a bar graph, and as pie charts of the age groups for each sex.
Cholera death locations and the locations of the water pumps at the time are presented on a map of London's streets as they are today.
The death and pump locations are also presented on John Snow's Map, modified by Robin Wilson (to make it match better with the lat-lon coordinates).
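For readers unfamiliar with the "per 10,000 inhabitants" measure, here is a minimal sketch of the arithmetic. The numbers and column names are made up for illustration and are not taken from the data set, which may already ship with the rates computed.

```r
# Rate per 10,000 inhabitants: deaths scaled by population size.
rate_per_10k <- function(deaths, population) {
  deaths * 10000 / population
}

# Hypothetical age groups and counts, for illustration only.
groups <- data.frame(
  age_group  = c("group A", "group B"),
  deaths     = c(12, 30),
  population = c(8000, 15000)
)

groups$rate <- rate_per_10k(groups$deaths, groups$population)
print(groups)  # rate column: 15 and 20 per 10,000
```

Expressing deaths as a rate makes groups of different sizes comparable, which raw counts do not.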
Application
This application is written in R, using the Shiny dashboard framework. Graphs use ggplot2 and plotly; the map view uses leaflet with location data from OpenStreetMap and CartoDB. In addition, the tidyr library is used for reshaping data and the jpeg library for reading JPEG images.
It is designed to be viewed as a dashboard with no scrolling. The hamburger icon on the top bar will show/hide the navigation pane. Select the items on the left navigation pane to switch between various views.
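To give a feel for how such a dashboard fits together, here is a minimal sketch with one navigation item and one leaflet map. It is not the app's actual code: it assumes the shinydashboard package, and the output id, tab name, and pump coordinates are hypothetical.

```r
library(shiny)
library(shinydashboard)
library(leaflet)

# Hypothetical pump location for illustration; the real app reads the data files.
pumps <- data.frame(name = "Example pump", lat = 51.5133, lon = -0.1366)

ui <- dashboardPage(
  dashboardHeader(title = "Cholera 1854"),   # header includes the sidebar toggle
  dashboardSidebar(
    sidebarMenu(menuItem("Map", tabName = "map"))
  ),
  dashboardBody(
    tabItems(
      tabItem(tabName = "map", leafletOutput("pump_map", height = 500))
    )
  )
)

server <- function(input, output) {
  output$pump_map <- renderLeaflet({
    leaflet(pumps) %>%
      addProviderTiles(providers$CartoDB.Positron) %>%  # CartoDB base map
      addCircleMarkers(lng = ~lon, lat = ~lat, label = ~name)
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch in a browser
```

Each additional view would be another menuItem in the sidebar paired with a tabItem in the body.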
Bar graphs may be viewed side-by-side or stacked. Items in the legend may be clicked to show/hide the item from the graph. Double-click on a legend item to show only that item. Hover over chart elements for more information about the point, bar or pie.
Special thanks to Stack Overflow for all the help with the gotchas in Shiny and the libraries.
Observations
By plotting the locations of the water pumps and the deaths on a map, it is easy to see that a significant number of deaths occurred around the water pump at Broadwick and Lexington. It is important to note that this correlation is a reason to explore the issue further and should not be taken as proof of causation.
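One way to put a number on that visual impression is to assign each death location to its nearest pump and total the deaths per pump. The sketch below does this in base R with made-up coordinates and counts; the column names are hypothetical and the app itself may not perform this calculation.

```r
# Hypothetical coordinates and counts, for illustration only.
pumps <- data.frame(
  name = c("Pump A", "Pump B"),
  lat  = c(51.5133, 51.5110),
  lon  = c(-0.1366, -0.1340)
)
deaths <- data.frame(
  lat   = c(51.5134, 51.5131, 51.5112),
  lon   = c(-0.1365, -0.1368, -0.1342),
  count = c(3, 1, 2)
)

# Approximate ground distance in metres (equirectangular projection;
# adequate over the few hundred metres covered by the map).
dist_m <- function(lat1, lon1, lat2, lon2) {
  r <- 6371000  # mean Earth radius in metres
  x <- (lon2 - lon1) * pi / 180 * cos((lat1 + lat2) / 2 * pi / 180)
  y <- (lat2 - lat1) * pi / 180
  r * sqrt(x^2 + y^2)
}

# For each death location, find the index of the closest pump.
nearest <- sapply(seq_len(nrow(deaths)), function(i) {
  which.min(dist_m(deaths$lat[i], deaths$lon[i], pumps$lat, pumps$lon))
})
deaths$nearest_pump <- pumps$name[nearest]

# Total deaths attributed to each pump by proximity.
deaths_by_pump <- tapply(deaths$count, deaths$nearest_pump, sum)
print(deaths_by_pump)
```

Straight-line distance ignores the street layout, so this is only a first approximation of which pump a household would actually have used.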
Given that people in the 1850s did not know what caused cholera, it is scary to see such a sudden spike in the cumulative deaths in early September 1854. One can imagine that people must have attributed the sudden deaths to all kinds of causes, natural and otherwise.
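The cumulative figure is simply a running total of the daily counts. A minimal sketch, using made-up daily numbers rather than the real data:

```r
# Made-up daily death counts, for illustration only.
daily <- data.frame(
  date   = as.Date(c("1854-08-30", "1854-08-31", "1854-09-01", "1854-09-02")),
  deaths = c(1, 3, 40, 90)
)

# Running total of deaths up to and including each date.
daily$cumulative <- cumsum(daily$deaths)
print(daily)  # cumulative column: 1, 4, 44, 134
```

A sharp jump in the daily counts shows up as a steep climb in the cumulative curve, which is what makes the spike so striking on the line graph.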
It is also interesting to see, just by looking at the census, how dramatically the number of people drops off in the older age groups. Given today's life expectancy, we can see that medicine has come a long way in the last 165 years.
The scatter plot (on the right) shows the number of fatalities per 10,000 people in the area, and highlights the sudden spike in attacks and deaths due to cholera in the first week of September 1854. The raw data, at first sight, may not portray such a dire picture.