One of my favourite new tools that I’ve added to my social scientific research into environmental activism and religion in Scotland over the past three years has been geospatial work or GIS (short for Geographic Information System). Scholarship in the sociology of religion often works with very large data sets (like censuses), but this work is very seldom parsed out on a geospatial basis. This is a huge loss, I think, as there are a number of important ways that geography inflects demographic data sets. One of the reasons that GIS remained a specialist domain for many years is likely because it required specialised software, i.e. ESRI ArcGIS which is VERY expensive. I’ve worked with ArcGIS, but aside from a few specific features (like normalising data sets quickly and running data collection in the field with tablets) I find that it is inferior to the current Free and Open Source (FOSS) competitor. The obvious reason is that ArcGIS (remarkably!) hasn’t been ported to MacOS. However, in my experience – like older versions of Microsoft Office – ArcGIS also crashes and hangs a lot.
QGIS is a fantastic tool. It has a lively development community and a huge range of plugins, it is multiplatform (with versions for Windows, MacOS, Linux and Android) and generally it “just works”. If you want to run QGIS on a Mac, you’ve generally got two options. KyngChaos provides current and stable versions of QGIS in a MacOS package along with pointers to a few external libraries you need to install. This is your best option if you want the very latest and greatest version of QGIS. Your other option is provided by the GIS software development company Boundless. Theirs is a free download as well, but you need to fill out a wee form and they’ll email you the download link. Boundless is your best option if you’re looking for a clean, simple install. They also provide support packages, which I recommend if you are writing QGIS work into a grant: this will ensure that you can call someone and get help if you’re stuck, and your money goes towards further development on QGIS. I switch back and forth between the two versions, so I don’t have a decisive word on which is better, but for our purposes, I’ll recommend that you start with the Boundless distribution by filling out that form.
Check your email, download the DMG file and then follow the Boundless instructions to get QGIS installed on your Mac. Once you’ve got the software installed, there are a number of useful guides you can follow which will help you on your way towards becoming a GIS ninja.
If you want to dive in and get a sense of what’s possible with QGIS, I recommend you start with the QGIS user guide. If you’re the linear type, like me, you may want to start by orienting yourself to the QGIS GUI or look into working with vector data. If you like learning by doing, then start with the QGIS training manual which walks you through a number of basic steps towards creating your own map. QGIS tutorials also has some good tips. Or, you can download the PDF of Michael L. Treglia’s QGIS tutorial. Once you’re underway, you’re bound to come up with some specific questions. The best place to get answers to these is on the GIS stackexchange (though be warned: the moderators on this board get justifiably grouchy if you post a question without first searching to see if it’s already been answered, which is about 99% likely).
For people who haven’t done a degree in Geography, I’ll add a few notes here that will help you get started.
- Geospatial data sets tend to come in a few flavours. In their most basic form they’re just a long list of coordinates (X and Y or latitude and longitude) with some other data associated with each point, e.g. feature name, website, notes, etc. The most common format for datasets of points is the plain text comma separated value file (or CSV – hurray for plaintext!). Beware: fields in these files can sometimes be separated by characters other than commas, often the pipe character (“|”).
- Sometimes geospatial data involves shapes. Whereas a building can serve as a point (designated by coordinates) on a map, a number of other kinds of features – a lake, ecological zone, population centre, or census data zone – will be identified by a polygon (see wikipedia if your last math course was a long time ago). Polygons are stored in shapefiles, usually in the proprietary ESRI format with the file extension SHP. However, sometimes geospatial data is stored as raster images instead. These look like a plain old TIFF or JPEG image file, but have a geospatial data file accompanying them which QGIS reads automatically. To get the full lowdown on GIS file formats, you can read more here. Most of the GIS operations you’ll do involve calculating correlations between points and shapefiles and I’ll give a few examples of this in future posts.
- One last thing you’ll inevitably run across which will likely be puzzling is “CRS”. This acronym stands for Coordinate Reference System. Here’s the problem: maps are flat and the earth is not. Depending on how big your map is (e.g. just the Seattle metropolitan area, the whole of Washington State, the USA, or North America), the angle at which you display your map and the way in which you stretch the coordinates will be different, sometimes very different. To address this problem, GIS gurus over the years have created a VERY large array of CRSes. If you’re in the UK, like me, the Ordnance Survey has a helpful guide to the CRSes you will use in the UK. I’d recommend skimming it to get started. If you’re using one of my data sets, I’m usually working off the British National Grid (BNG), also known as OSGB 1936 or EPSG:27700. If you ever download maps in KMZ/KML format from Google maps, you should know that they encode their coordinates according to WGS84 (EPSG:4326) (more on this here) and will need to be converted to the BNG. For the most part QGIS can interpret the CRS of your data sets and convert on the fly, so all you really need to know is what to choose when that CRS window pops up.
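If you’d like to peek inside a point file before loading it into QGIS, the delimiter warning in the first note above can be sketched in a few lines of Python. This is only a rough sketch using the standard library’s `csv` module: the function name `read_points`, the column names `lat` and `lon`, and the sample row are all made up for illustration, so adjust them to match whatever your own file actually contains.

```python
import csv
import io


def read_points(text, candidates=",|;\t"):
    """Parse delimited point data, guessing the field separator.

    Assumes (hypothetically) that the first line is a header and that
    the coordinate columns are called "lat" and "lon".
    """
    # Guess the separator from the header line only, so that commas
    # inside free-text fields further down can't confuse the guess.
    header = text.splitlines()[0]
    dialect = csv.Sniffer().sniff(header, delimiters=candidates)

    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    points = []
    for row in reader:
        # Coordinates arrive as text; convert them to numbers.
        row["lat"] = float(row["lat"])
        row["lon"] = float(row["lon"])
        points.append(row)
    return points


# Made-up, pipe-separated sample with a comma inside the notes field.
sample = "name|lat|lon|notes\nSite A|55.95|-3.19|example note, with a comma\n"

for point in read_points(sample):
    print(point["name"], point["lat"], point["lon"], point["notes"])
```

QGIS’s own delimited text import lets you pick the separator by hand, so a script like this is mostly useful as a quick sanity check on an unfamiliar file.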
One more thing if you’re just getting started. There is a huge amount of geospatial data out there which is under some kind of open access license (which is good since public money tends to subsidize geospatial research!). This includes a number of standard demographics. Here are a few good ones to put in your arsenal to get started:
- Download yourself an overview map of Britain via the Ordnance Survey.
- Get some bigger (i.e. “earth”-level) overview maps and other data via Natural Earth.
- If you’re a social scientist, but haven’t tinkered with GIS before, check out this really helpful page at ReStore where they break down some of the concepts I’ve touched on very briefly above in more detail and show you where to find data.
Some more specific data (with a bias towards demographics and ecology) for my fellow researchers in Scotland:
- EDINA (hosted by JISC and based in Edinburgh, EDINA is a consortium which makes commercially licensed geospatial data available to persons with a school or University affiliation – this is where you start to get Ordnance Survey data, but see also OS open data)
- Scottish Urban/Rural 8-point scale via the Scottish Government
- Scottish Index of Multiple Deprivation (very generously packaged by Alasdair Rae – more from him here and see also UK Data Explorer)
- Historic Scotland data
- Scottish Natural Heritage data (all kinds of shapefiles for protected areas, wilderness, etc.)
- The Brand Spanking New Statistics Scotland Portal has LOTS of data (131 data sets at my last count).
If you want the rest of Britain:
- English indices of multiple deprivation via gov.uk official statistics (helpful tips here on how to find data on the ONS website and a handy list of releases) or via OpenDataCommunities.
- You’ll find census data etc. via the UK data service.
Any Americans out there?
- Urban Rural classifications from 2010 census here and other census related shapefile data here.
- US Geological Survey site
- National Data from the US Department of Agriculture
- The Fish and Wildlife service has data on things like wetlands
- A big list of Washington State data compliments of the University of Washington and data on the world (with an American academic perspective)
- The US Department of Energy has quite a lot of data at OpenEI