By Allison Bearly
Time to introduce you to the work of Allison Bearly! Allison spent the spring 2020 semester working as an intern for the Special Collections and Digitisation Departments of KU Leuven Libraries, as part of her Advanced Master’s in Digital Humanities programme. We were delighted she took to the challenge of working creatively with one of our open data collections. Here’s her story.
Hello there! I’m Allison. During my internship at KU Leuven Libraries, I worked with Views of Leuven, a digitized collection within the library’s Digital Heritage Online. It contains 352 images of Leuven and the surrounding areas dating from the 16th to the 20th centuries. My work built on two foundations: the Library’s recent adoption of an open data policy for its digitized collections, and a previous intern’s pilot study on sharing the collections as computationally amenable data.
In my internship, I had two main objectives. The first was to create another machine-actionable dataset to share on the Library’s GitHub account, using the pilot study as a starting point. The second goal was to creatively reuse this data to show an example of what can be done when cultural heritage institutions share their collections as open data. Concretely, this took the form of creating an online, interactive map of Leuven through the centuries. In this blog post, I will explain the steps I took to achieve these goals.
Data Cleaning and Enriching
Thanks to the pilot study by Mariana, the previous intern, I was able to follow many of her steps for cleaning, transforming and refining the data. For a more detailed explanation, give Mariana’s blog post a read. I also chose OpenRefine for the data cleaning, since it is a free, open source tool that makes cleaning and transforming data easy.
Once I had done the basic data cleaning and transforming, I began to enrich the data with location information. To map the images, I needed coordinates for each one. Before adding coordinates, however, I added another column, which I called Place Name. The metadata already included information about the place pictured in each image, but it wasn’t always complete or consistent, especially when more than one location was featured. I had to fill in the Place Name column manually, but it was a valuable use of time: the result was a consistent name for each place, which allowed me to use OpenRefine’s text facet feature to group the records and add the coordinates in bulk.
Figure 1: Screenshot from OpenRefine: Abdij Keizersberg, which has 5 records, is selected from the text facet
I decided to use OpenStreetMap (OSM) to get the coordinates and as the base layer for the map because it is open source and widely used for various web applications.
Figure 2: View of the OpenStreetMap interface. By right clicking and choosing “Show address” at the location of Abdij Keizersberg, we get the coordinates in the search results window in decimal degree format (here highlighted in yellow).
With a Place Name selected in the text facet, I copied the latitude from OSM and pasted it into OpenRefine by clicking the edit button on the latitude cell of the first record. I then applied the “Fill down” feature to add the latitude to the rest of the records with the same Place Name, and repeated the same steps for the longitude column. The text facet feature made adding the coordinate information a relatively quick process, and it ensured that all records with the same Place Name have exactly the same coordinates.
To see the complete cleaned and enriched dataset, check out the repository on GitHub.
Figure 3: Three views of the OpenRefine interface. The first image shows the latitude being pasted into the corresponding cell of the first record. In the second image, the fill down function is applied, resulting in the third image which shows that the latitude has been added to the remaining records.
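For readers who prefer a script to the OpenRefine interface, the same idea (enter the coordinates once per Place Name, then copy them to every record sharing that name) can be sketched in Python. This is only an illustration, not the workflow used in the project: the column names (`place_name`, `latitude`, `longitude`), the titles and the coordinate values are hypothetical placeholders rather than values from the published dataset.

```python
import csv
import io

# Hypothetical sample: only the first record of each place has coordinates.
# The coordinate values are placeholders, not taken from the real dataset.
SAMPLE_CSV = """title,place_name,latitude,longitude
View of the abbey,Abdij Keizersberg,50.8889,4.7002
Abbey gate,Abdij Keizersberg,,
Abbey church,Abdij Keizersberg,,
Town hall facade,Stadhuis,50.8789,4.7012
Town hall square,Stadhuis,,
"""


def fill_coordinates(rows):
    """Copy the first known coordinates of each place to all of its records."""
    known = {}
    for row in rows:
        if row["latitude"] and row["longitude"]:
            # Keep the first coordinate pair seen for this place name.
            known.setdefault(row["place_name"], (row["latitude"], row["longitude"]))

    filled = []
    for row in rows:
        # Fall back to the row's own (possibly empty) values if the place is unknown.
        lat, lon = known.get(row["place_name"], (row["latitude"], row["longitude"]))
        filled.append({**row, "latitude": lat, "longitude": lon})
    return filled


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
filled_rows = fill_coordinates(rows)

for row in filled_rows:
    print(row["place_name"], row["latitude"], row["longitude"])
```

As with the text facet approach, every record with the same place name ends up with identical coordinates, which only works because the names were made consistent first.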
Creative Reuse: Georeferencing Images
With the data cleaning and enrichment complete, I moved on to the next step of my project. As part of my interactive map of Leuven, I wanted to feature the historical maps in the collection, so I used them as overlays on a base map. To do so, I first needed to georeference the historical maps. Georeferencing is the process of adding coordinate information to raster files (the historical maps) so that they can be aligned to a map with a coordinate system. It works by marking ground control points (GCPs) on the raster image, that is, points whose real-world coordinates are known.
To georeference all 18 of the historical maps in the collection, I turned to QGIS, an open source GIS application. QGIS has a georeferencer feature which allows you to select GCPs and assign them to coordinates on a base map. The first step is to add the OpenStreetMap tile layer as the base map and zoom in to the correct location (in this case, Leuven), and then to load the raster image, i.e. the historical map.
Figure 4: Views of the QGIS interface, adding the OSM tile (left image) and the raster file (the historical map in jpg format; right image).
Next, within the QGIS georeferencer, a point on the historical map is selected to add a GCP. When a point is chosen, a popup box lets you either enter the coordinates manually or choose them from the map canvas, in this case the OpenStreetMap layer added in the first step. Once the same spot is selected on the OSM map, the coordinates are filled in and assigned to that GCP. The same steps are then repeated to add more GCPs.
Figure 5: Views of the QGIS georeferencer showing a popup box after selecting a point on the historical map through which to add a GCP (top image), the OSM on which to select the same point (bottom left image) and the assignation of the coordinates to the GCP on the raster image (bottom right image).
At least four GCPs should be added, evenly distributed across the image. In general, the more GCPs, the more accurately the georeferenced image will align with the base map. Once enough GCPs have been added, the georeferencer is run and the historical map is aligned over the base map. Depending on how accurately the GCPs were chosen and how accurate the historical map itself is, there will be some warping and distortion of the historical map.
Figure 6: The QGIS interface shows the historical map of Leuven georeferenced on top of the base map.
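To give a feel for what happens when the georeferencer runs, here is a minimal Python sketch that fits a first-order (affine) transformation from pixel positions to map coordinates by least squares. It is an assumption-laden illustration: QGIS offers several transformation types, and this post does not say which one was used. The GCP values, the function names and the use of decimal degrees are all invented for the example.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, v):
    """Solve the 3x3 linear system m * x = v with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("GCPs are collinear; an affine fit is not possible")
    solution = []
    for i in range(3):
        mi = [row[:] for row in m]
        for r in range(3):
            mi[r][i] = v[r]
        solution.append(det3(mi) / d)
    return solution


def fit_affine(gcps):
    """Least-squares affine fit. gcps is a list of ((col, row), (x, y)) pairs."""
    if len(gcps) < 3:
        raise ValueError("an affine fit needs at least three GCPs")
    design = [(col, row, 1.0) for (col, row), _ in gcps]
    # Normal equations: (A^T A) coeffs = A^T b, solved once per output axis.
    ata = [[sum(d[i] * d[j] for d in design) for j in range(3)] for i in range(3)]

    def solve_for(k):
        atb = [sum(d[i] * target[k] for d, (_, target) in zip(design, gcps))
               for i in range(3)]
        return solve3(ata, atb)

    return solve_for(0), solve_for(1)


def apply_affine(coeffs, pixel):
    """Map a pixel (col, row) to map coordinates (x, y)."""
    (a, b, c), (d, e, f) = coeffs
    col, row = pixel
    return (a * col + b * row + c, d * col + e * row + f)


def max_residual(coeffs, gcps):
    """Largest distance between a GCP's target and its transformed position."""
    return max(math.hypot(px - x, py - y)
               for (pixel, (x, y)) in gcps
               for (px, py) in [apply_affine(coeffs, pixel)])


# Hypothetical GCPs: the four corners of a 1000 x 800 pixel scan,
# paired with made-up (longitude, latitude) values near Leuven.
GCPS = [
    ((0, 0), (4.68, 50.90)),
    ((1000, 0), (4.72, 50.90)),
    ((0, 800), (4.68, 50.876)),
    ((1000, 800), (4.72, 50.876)),
]

coeffs = fit_affine(GCPS)
centre = apply_affine(coeffs, (500, 400))
print(centre, max_residual(coeffs, GCPS))
```

With real historical maps the GCPs never fit perfectly, so the residual is larger than zero; a large residual at one GCP is usually a sign that the point was misplaced or that the old map is distorted in that area.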
Creative Reuse: Making the Interactive Online Map
With the map pages set up with the historical map overlay, the next step was to add all of the places to the map. The d3 library let me parse the CSV file and add all of the locations (by using the coordinates that were added in the data enrichment stage) as points to the map. Using the Leaflet popup feature, I added the name of the place and a Bootstrap carousel of the thumbnail images to the popup.
Figure 8: Map with locations as points added. Popup includes the place name and a carousel of thumbnail images of that place.
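The grouping behind those popups (one point per place, carrying all of that place’s thumbnails) can be sketched independently of the browser code. The Python below is an illustration only, not the d3 and Leaflet code used for the map; the column names and file names are hypothetical. It turns CSV records into a GeoJSON FeatureCollection, a format that web mapping libraries such as Leaflet can display. Note that GeoJSON stores coordinates as longitude first, then latitude.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical records: several images can share one place and its coordinates.
SAMPLE_CSV = """place_name,latitude,longitude,thumbnail
Abdij Keizersberg,50.8889,4.7002,thumb_001.jpg
Abdij Keizersberg,50.8889,4.7002,thumb_002.jpg
Stadhuis,50.8789,4.7012,thumb_003.jpg
"""


def build_feature_collection(rows):
    """Group records by place and return one GeoJSON point feature per place."""
    thumbnails = defaultdict(list)
    coordinates = {}
    for row in rows:
        name = row["place_name"]
        thumbnails[name].append(row["thumbnail"])
        # GeoJSON order is [longitude, latitude].
        coordinates.setdefault(name, [float(row["longitude"]), float(row["latitude"])])

    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": coordinates[name]},
            "properties": {"place_name": name, "thumbnails": thumbs},
        }
        for name, thumbs in thumbnails.items()
    ]
    return {"type": "FeatureCollection", "features": features}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
collection = build_feature_collection(rows)
print(json.dumps(collection, indent=2))
```

Each feature’s `properties` holds what a popup needs: the place name for the title and the list of thumbnails for the carousel.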
For some final touches, I used the Leaflet Marker Cluster plug-in to cluster markers that were close to each other, so the map wasn’t overwhelmed with markers. I used the Leaflet Extra Markers plug-in to change the color of the markers and add icons and numbers to them, which makes the category of each place easy to identify.
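To illustrate the general idea of marker clustering, here is a deliberately simplified Python sketch that groups nearby points by snapping them to a grid. It is not how the Leaflet Marker Cluster plug-in works internally; the function name, the cell size and the sample coordinates are assumptions made for the example.

```python
import math
from collections import defaultdict


def cluster_by_grid(points, cell_size_deg):
    """Group (lat, lon) points that fall into the same grid cell."""
    cells = defaultdict(list)
    for lat, lon in points:
        key = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        cells[key].append((lat, lon))

    clusters = []
    for members in cells.values():
        # Place the cluster marker at the average position of its members.
        clusters.append({
            "lat": sum(p[0] for p in members) / len(members),
            "lon": sum(p[1] for p in members) / len(members),
            "count": len(members),
        })
    return clusters


# Hypothetical marker positions: two close together, one further away.
POINTS = [(50.8791, 4.7011), (50.8793, 4.7014), (50.8890, 4.7003)]

clusters = cluster_by_grid(POINTS, cell_size_deg=0.005)
for cluster in clusters:
    print(cluster)
```

A real clustering plug-in recomputes the groups as the user zooms, so that clusters split into individual markers at higher zoom levels; a fixed grid like this one only shows the grouping at a single scale.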
Enjoy exploring our selected Views of Leuven!
Figure 9: Map with the customized, clustered markers showing the layers that are available.