The WWII RAF Aerial Photograph Collection

Since 1994 the Library of Wageningen University and Research (UR) centre has housed a collection of aerial photographs taken by the Allied Air Forces, popularly known as the RAF-collection.1 Most of the photographs date from 1944 and 1945. The collection came into our possession when it fell out of use at the Dutch institute of soil mapping in Wageningen, which had received it after World War II with the intention of using the photographs as source information for producing soil maps. Over the years several parties had shown interest in these images, among them the Dutch Explosive Ordnance Disposal Service (EOD) and the National Archive. Because of the relevance of the pictures for research on land use, the collection was donated to the library of the Agricultural University, which later became our current library.

The collection consists of 94,257 photos, taken with different cameras, from different heights and at different angles; most are vertical, but some are oblique. Some of the photos are distorted due to hazardous conditions during the reconnaissance flights and the hasty photographic production and use. The photos were taken by RAF pilots flying Mosquito aircraft in sorties (flights), each consisting of several runs of tens of photos. Within a run the photos overlap by 60%, so that they offer 3D images when studied with a stereoscopic viewer. The geographic position of the sorties is also documented on sortie maps.

As the ideas for a digital presentation to the public became more realistic, our first step was to digitize the analogue photos at a resolution of 1,200 dpi and to serve them on a plain website as 150 dpi samples. This was done with a subsidy granted within the War Heritage Programme (Erfgoed van de Oorlog) of the Ministry of Health, Welfare and Sport because of the collection’s large cultural and historical value. The purpose of this website2 was merely to make the collection visible and to set up a web shop that would reduce the need to handle the original photographs. Once this task was completed we had a digital collection of 11 TB of data. Searching the collection for relevant photographs, however, still had to be done using the old digitized sortie maps. From this originated the idea to create a geo portal according to the wishes of our clients: to be able to locate the photographs geographically and to integrate the collection with other geographical material such as current topographic maps and landscape plans. We also wanted to offer a restricted view of the high-resolution scans, in order to minimize the need to use the vulnerable originals.

Fig. 1: Example of a sortie map. The location is Zeeland, a province of the Netherlands.

Fig. 2: Example of an aerial photograph. Location is near Apeldoorn, the Netherlands.

Development of the Discovery Application

As the objective of the project was to create a digital discovery environment that could serve a variety of users, the technical requirements of the collection application had to satisfy a variety of interest groups. We kept this objective in mind throughout the design phases of the project. Important decisions were made in three phases: the georeferencing of the collection, the design of the metadata management, and the specification and design of the user interface. The latter is still a work in progress.

The architecture of the collection discovery application is based on three layers: the actual data (photographs and metadata), the data and metadata servicing layer and the presentation layer (user interface). The application can be used in a Spatial Data Infrastructure (SDI) and a Library Information System (LIS).

The Data Layer: Positioning the Photographs on the Map

In order to project the aerial photographs on a digital map, the material had to be georeferenced. Normally this is a manual exercise performed by specialized staff: using a GIS tool, an image is positioned manually upon a reference map. Geographic coordinates are assigned by clicking reference points on the image and linking them with corresponding points on the reference map. At least two reference points are needed; how many more depends on the distortion of the image, the projection of the reference map and the accuracy requirements. Because of the enormous number of photographs to be processed, we aimed for a methodology that was more cost effective than manual georeferencing. The eventual method combined automated work with a substantial portion of manual work, such as checking whether each of the almost 95,000 images was of sufficient quality for processing, and some manual georeferencing. The manual tasks took approximately ten person months in total. We developed a script that performed the definitive georeferencing and image positioning.
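Why two reference points are the minimum can be illustrated with a small sketch. Two point pairs determine exactly one similarity transform (uniform scale, rotation and shift); distorted images need more points and a higher-order transform. The code below is not the project's script. The function names are hypothetical, and it assumes that the pixel coordinates have already been flipped so that the y axis points up, as map coordinates do.

```python
# Sketch: fit a similarity transform from two control points.
# Coordinates are handled as complex numbers, so that one complex
# multiplication applies scale and rotation at the same time.

def fit_similarity(pixel_pts, map_pts):
    """Return (a, b) such that map = a * pixel + b (complex arithmetic)."""
    (p1, p2), (m1, m2) = pixel_pts, map_pts
    p1, p2 = complex(*p1), complex(*p2)
    m1, m2 = complex(*m1), complex(*m2)
    if p1 == p2:
        raise ValueError("the two control points must be distinct")
    a = (m2 - m1) / (p2 - p1)  # scale and rotation
    b = m1 - a * p1            # translation
    return a, b


def pixel_to_map(a, b, pixel):
    """Apply the fitted transform to one pixel coordinate (x, y)."""
    z = a * complex(*pixel) + b
    return (z.real, z.imag)


# Illustrative values only: two pixel positions linked to two map positions.
a, b = fit_similarity([(0, 0), (100, 0)], [(1000, 2000), (1200, 2000)])
print(pixel_to_map(a, b, (0, 100)))
```

With only two points there is no redundancy, so an error in one clicked point goes straight into the result; additional points allow such errors to be detected.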

The input for the whole process was the set of maps showing the geography of the flights (‘sorties’). The photographs at the start and end points of those sorties were georeferenced by hand. We used these as references to calculate automatically the position of every photograph taken between the start and end point of each flight. Some flights appeared to deviate from a straight line, probably because of wind or military threats (see Figure 3). This made it necessary to manually georeference some additional intermediate points between the start and end points.

Fig. 3: Off-track positions of the images of one flight (number 092_11), which made it necessary to georeference intermediate points manually.

The start and end points served as input for a script that calculated the position of every photograph in between. The calculated position was then used to georeference the photograph automatically.
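The position calculation can be sketched as a piecewise linear interpolation between the manually georeferenced photographs. This is a minimal illustration and not the project's ArcGIS-based script: it assumes equal spacing between consecutive photographs (constant flying speed and exposure interval), and the function name and the coordinates are hypothetical.

```python
# Sketch: interpolate photo centre positions along one run.
# control_points maps a photo index within the run to a manually
# georeferenced (x, y) position; the first and last photo are required.

def interpolate_run(control_points, n_photos):
    frames = sorted(control_points)
    if frames[0] != 0 or frames[-1] != n_photos - 1:
        raise ValueError("first and last photo must be georeferenced")
    positions = []
    for left, right in zip(frames, frames[1:]):
        (x0, y0), (x1, y1) = control_points[left], control_points[right]
        for i in range(left, right):
            t = (i - left) / (right - left)  # fraction of the segment covered
            positions.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    positions.append(control_points[frames[-1]])
    return positions


# Straight run: only the start and end photographs are georeferenced.
straight = interpolate_run({0: (0.0, 0.0), 4: (400.0, 0.0)}, 5)

# Off-track run: one extra manual point in the middle bends the path.
bent = interpolate_run({0: (0.0, 0.0), 2: (200.0, 50.0), 4: (400.0, 0.0)}, 5)
print(straight)
print(bent)
```

Adding an intermediate control point only changes the positions within the two segments it touches, which is why a few extra manual points are enough to correct a run that deviates from a straight line.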

A second script took care of the rotation of the images, which was necessary to make the pictures fit the cartographic background. The rotation was derived from the direction of the flight. Both scripts were developed in Python in combination with ArcGIS geoprocessing tools.
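Deriving a rotation from the flight direction amounts to computing a bearing between two positions. The sketch below shows one way to do this in projected map coordinates. It is an assumption, not the project's code: the function name is hypothetical, and whether the bearing can be applied to an image directly depends on how the camera was mounted relative to the flight direction, which the sketch does not model.

```python
import math

# Sketch: bearing of a flight line in projected (x = east, y = north)
# coordinates, in degrees clockwise from grid north.

def flight_bearing(start, end):
    dx = end[0] - start[0]  # displacement towards the east
    dy = end[1] - start[1]  # displacement towards the north
    if dx == 0 and dy == 0:
        raise ValueError("start and end point coincide")
    # atan2(dx, dy) measures the angle from the north axis, clockwise.
    return math.degrees(math.atan2(dx, dy)) % 360.0


print(flight_bearing((0, 0), (1, 0)))   # flight heading due east
print(flight_bearing((0, 0), (-1, 0)))  # flight heading due west
```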

The Data Layer: Metadata

For discovery of the collection we wanted to offer not only a visual geographic interface (the map of the Netherlands) but also text-based retrieval options based on the metadata stored with individual photographs. Users search with a certain goal in mind and within their own context (Rose and Levinson, 2004). The metadata elements determine which information about the accompanying dataset can be retrieved, so the composition of a metadata element set must match the search behaviour of a user group or harvesting system. This holds in particular for geo-information objects such as the RAF aerial photographs, because they can be of interest to a large and varied audience.

A metadata element set was therefore developed with these requirements in mind. The most important step in this process was the definition of the end user groups. A collection of metadata elements was assigned to each end user group. In most cases we were able to use a standardized metadata element set; for some we had to define additional elements. Table 1 gives an overview of the end user groups and the metadata element sets assigned to them.

For publishing the metadata in Europeana we used the Dublin Core set as part of the Europeana Semantic Elements (ESE) specification (Europeana, 2011) and the Europeana Data Model (EDM) elements (version 5.2.3, 24/02/2012). Europeana’s aim is to accumulate digitised content, standardize the data that describe it, apply linked data techniques to enrich it, and promote persistent identifiers to locate it in the long term (Europeana, 2012). Following its published Content Strategy, Europeana has extended its network of data providers and encouraged the development of aggregators that fit the needs of different countries, domains and users. In the Netherlands our RAF collection can be aggregated by the War Heritage Sources, by the Dutch State Institute for Cultural Heritage, or possibly by both.

Table 1: End user groups and related metadata element sets.

| End user group | Metadata element set |
| --- | --- |
| Cultural heritage (Europeana) | Dublin Core and the Europeana Semantic Elements |
| Geo Information Scientists | ISO 19115 and ISO 19119 standards (definition) and ISO 19139 (XML format) |
| General public | ISBD(CM): International Standard Bibliographic Description for Cartographic Materials |

To enable Geo Information Scientists to use and exchange services for the aerial photos in their spatial data infrastructures (SDIs), the aerial photos must be opened up in an interoperable fashion. Interoperability refers to the ability of one system to interact with other systems in a reliable and unambiguous manner. Within the spatial domain, the open standards of the Open Geospatial Consortium (OGC) are widely used. This is in line with the European INSPIRE directive, which requires member states to adopt common implementing rules. For that reason we used the OGC standards. Geographic metadata are served via the OGC Catalogue Service for the Web (CSW) implementation specification. A CSW supports publishing and searching collections of descriptive information (metadata) for data, services, and related information objects (OGC Catalogue). Different metadata implementation standards can be used with a CSW. In order to comply with the European INSPIRE regulations, we used the ISO 19115 and ISO 19139 metadata implementation standards in our project.
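To give an impression of how a client in an SDI might query such a catalogue, the sketch below builds a CSW 2.0.2 GetRecords request in key-value form that asks for ISO 19139 records. The endpoint URL and the function name are hypothetical, and some servers may require further parameters (for instance a namespace declaration for the gmd prefix), so this should be read as an outline rather than a tested client.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real service address is not given in the text.
CSW_ENDPOINT = "https://example.org/csw"


def csw_getrecords_url(search_text, max_records=10):
    """Build a CSW 2.0.2 GetRecords URL with a free-text CQL constraint."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "gmd:MD_Metadata",
        # Ask for ISO 19139 encoded records.
        "outputSchema": "http://www.isotc211.org/2005/gmd",
        "elementSetName": "full",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText LIKE '%{search_text}%'",
        "maxRecords": max_records,
    }
    return CSW_ENDPOINT + "?" + urlencode(params)


print(csw_getrecords_url("Apeldoorn"))
```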

Lastly, general users of the Wageningen UR Library must be able to find the photos via the common user interface of the library.3 The search facility for this collection is based on the ISBD(CM) metadata format.

The RAF photo collection itself was described with limited information. Fewer than ten metadata elements were present, describing details of the flight such as flight date, pilot name and sortie number. These elements were added to the combined collection of user-targeted metadata sets from Table 1.

The complete metadata element set consists of approximately 35 elements. The elements are organised in a nested structure in order to group them and use them at the appropriate context level. There are four levels: general metadata elements, which apply to virtually any dataset; elements that apply to almost any dataset with a spatial reference; elements that apply to all raster datasets with a spatial reference; and elements that are specific to this collection.

To describe the metadata for the RAF aerial photos, an XML Schema Definition (XSD) was created. An XSD is used to describe the contents of XML files.
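The four nesting levels can be pictured with a small sketch that builds one XML record. All element names are hypothetical, since the actual schema is not reproduced here, and the coordinate reference system is an assumption. The sortie number and scan resolution are the ones mentioned elsewhere in the text; the other values are placeholders.

```python
import xml.etree.ElementTree as ET

# Sketch: one metadata record with the four context levels as nested groups.
LEVELS = ("general", "spatial", "raster", "collection")


def build_record(general, spatial, raster, collection):
    root = ET.Element("record")
    for level_name, fields in zip(LEVELS, (general, spatial, raster, collection)):
        level = ET.SubElement(root, level_name)
        for name, value in fields.items():
            ET.SubElement(level, name).text = str(value)
    return root


record = build_record(
    general={"title": "RAF aerial photograph"},  # applies to any dataset
    spatial={"crs": "EPSG:28992"},               # assumed reference system
    raster={"scanResolutionDpi": 1200},          # raster-specific element
    collection={"sortie": "092_11"},             # specific to this collection
)
print(ET.tostring(record, encoding="unicode"))
```

An XSD for such a record would declare each level as a complex type, which makes it possible to reuse the general, spatial and raster levels for other collections.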

Serving the Image Data

For serving the aerial photographs we used the OGC Web Map Service (WMS) and Web Coverage Service (WCS) implementation specifications. A WMS dynamically produces maps of spatially referenced data, such as our georeferenced images. A “map” is in this case a portrayal of geographic information as a digital image file suitable for display on a computer screen (OGC Web Map). The WMS standard thus serves data visualization only. A WCS, on the other hand, describes and delivers multidimensional coverage data over the internet. A coverage is a georeferenced raster, for instance gridded geospatial data, or a collection of remote sensing images or aerial photos (OGC Web Coverage). A WCS provides access to the actual data.
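A WMS request for a rendered map is an ordinary URL with a fixed set of parameters. The sketch below composes a WMS 1.3.0 GetMap request. The endpoint, the layer name, the bounding box and the use of the Dutch national grid (EPSG:28992) are assumptions made for illustration; none of them are taken from the actual service.

```python
from urllib.parse import urlencode

# Sketch: compose a WMS 1.3.0 GetMap request URL.

def wms_getmap_url(endpoint, layer, bbox, size, crs="EPSG:28992"):
    """bbox is (min_x, min_y, max_x, max_y); size is (width, height) in pixels."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": size[0],
        "HEIGHT": size[1],
        "FORMAT": "image/png",
    }
    return endpoint + "?" + urlencode(params)


url = wms_getmap_url(
    "https://example.org/wms",            # hypothetical endpoint
    "raf_photos",                         # hypothetical layer name
    (190000, 460000, 200000, 470000),     # illustrative extent
    (800, 800),
)
print(url)
```

The response to such a request is a picture, not data; a client that needs the pixel values of the georeferenced photograph itself would use a WCS request instead.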

Serving the Metadata

In our application we decided not to present the complete metadata set, containing every element, to the user, as this would overload the user with irrelevant information. Instead we decided to store all the metadata in one combined metadata set and to present a subset of it, selected according to the search context of the user.

This concept is supported by the general architecture of our LIS, a three-tier environment. The first tier is the XML database, in which the full metadata set is stored. The second tier is a content management system based on WebQuery, which merely handles the store and retrieve requests to the XML database. WebQuery accepts either a URL or a form containing fields that represent a query. The query is passed to the database backend, which performs it and returns a record set. Records are always stored as XML records. WebQuery can be instructed to use XSLT (Extensible Stylesheet Language Transformations) files to transform the XML records into other formats. This architecture supports serving not only appropriate metadata formats to any user group, but also any metadata format to a harvesting system.

The presentation layer is based on XSLT. XSLT files are used to transform the full RAF collection XML metadata documents into other formats (in our case HTML for presentation mark-up in the library user interface, Dublin Core XML for description of content for Europeana, and ISO 19139 XML for use in SDIs).
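The idea of deriving a user-specific subset from one full record can be shown without XSLT. The sketch below is a Python stand-in for such a stylesheet, not the stylesheet used in the project: it copies a few elements from a full record into Dublin Core elements. The record structure, the element names and the mapping are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from paths in the full record to Dublin Core names.
DC_MAPPING = {
    "general/title": "title",
    "collection/flightDate": "date",
    "collection/sortie": "identifier",
}

# Hypothetical full record; the flight date is left out on purpose.
SAMPLE_FULL_RECORD = """
<record>
  <general><title>RAF aerial photograph</title></general>
  <raster><scanResolutionDpi>1200</scanResolutionDpi></raster>
  <collection><sortie>092_11</sortie></collection>
</record>
"""


def to_dublin_core(full_record):
    """Copy the mapped elements that are present into a Dublin Core record."""
    dc = ET.Element("metadata")
    for path, dc_name in DC_MAPPING.items():
        source = full_record.find(path)
        if source is not None and source.text:
            ET.SubElement(dc, f"{{{DC_NS}}}{dc_name}").text = source.text
    return dc


dc_record = to_dublin_core(ET.fromstring(SAMPLE_FULL_RECORD))
print(ET.tostring(dc_record, encoding="unicode"))
```

Elements that have no place in the target format, such as the scan resolution here, are simply not copied, which is how one stored record can serve several audiences.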

The metadata will be indexed for search in the central indexer of the library. This indexer, based on Apache Lucene SOLR4, allows us to present a ranked and faceted result set to the user. This ensures that the individual aerial photographs can also be found from the central text search interface of the library.
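A faceted query against a Solr index is again a plain URL. The sketch below shows the standard Solr request parameters for a text query with facet counts; the core URL and the field names are hypothetical, because the configuration of the library indexer is not described here.

```python
from urllib.parse import urlencode

# Sketch: build a Solr select URL that asks for ranked results plus facets.

def solr_facet_query(core_url, text, facet_fields, rows=20):
    params = [
        ("q", text),        # results are ranked by relevance by default
        ("rows", rows),
        ("wt", "json"),
        ("facet", "true"),
    ]
    # One facet.field parameter per field for which counts are wanted.
    params += [("facet.field", field) for field in facet_fields]
    return core_url.rstrip("/") + "/select?" + urlencode(params)


print(solr_facet_query("https://example.org/solr/library", "Apeldoorn",
                       ["collection", "year"]))
```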

Presenting to the User

The enormous number of photos, scattered over a large area of the Netherlands and sometimes positioned on top of each other, requires a well-balanced functional design for the presentation of the collection. While this is still work in progress, we are currently building a list of required functionalities. The final layout of the interface will be determined in a series of workshops with representatives of the user groups. The most important search tasks that we have identified so far are:

  1. searching for photographs of a specific location,
  2. searching for photographs via a description in the metadata,
  3. browsing through the runs, exploring areas and features of interest.

To facilitate these search tasks, the interface will offer a search facility for text entries stored in the metadata as well as a geographical search facility based on a map of the Netherlands. The map search will support both interactive search (panning, zooming) and search by entering a location name in a map search interface.
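The combination of a geographical and a textual search can be sketched as two filters over the photo location points. This is only an illustration of the intended behaviour, since the interface is still being designed; the data structure, the function name and the sample points are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhotoPoint:
    photo_id: str
    x: float          # map coordinates of the photo centre
    y: float
    description: str  # free text taken from the metadata


def search_photos(points, bbox=None, text=None):
    """Return the points inside bbox (min_x, min_y, max_x, max_y) whose
    description contains text; either filter may be omitted."""
    results = []
    for p in points:
        if bbox is not None:
            min_x, min_y, max_x, max_y = bbox
            if not (min_x <= p.x <= max_x and min_y <= p.y <= max_y):
                continue
        if text is not None and text.lower() not in p.description.lower():
            continue
        results.append(p)
    return results


# Made-up sample points, for illustration only.
points = [
    PhotoPoint("a", 100.0, 100.0, "river crossing"),
    PhotoPoint("b", 150.0, 120.0, "farmland"),
    PhotoPoint("c", 900.0, 900.0, "river mouth"),
]
print(search_photos(points, bbox=(0, 0, 200, 200), text="river"))
```

In a map interface the bounding box would follow from the current pan and zoom state, or from the extent returned when a location name is looked up.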

Since the images overlap, the geographical interface will not show the whole collection as images but as image location points, as illustrated in Figure 4.

Fig. 4: Automatically positioned images (smaller dots) added to manually georeferenced images (larger dots).

Conclusion and Discussion

The main challenge that we faced at the start of this project was to combine the geospatial perspective with the library perspective. In the course of the work we learned that the methods of both perspectives are the same, but that the content is used very differently. Technologies such as metadata management and harvesting, web services and indexing are alike. Because the content is used differently, different metadata formats and presentation techniques are needed. In order to manage the metadata content in both infrastructures (SDIs and LIS), we have used a methodology for metadata management that serves both worlds.

Some users search via geographical interfaces; others use text. We aim to develop an interface that satisfies both types of users. In addition, we plan to provide users with facilities to add features that they discover on the photographs to the metadata. We expect that this type of crowdsourced feedback will enrich the collection as a whole and improve its discovery.

The collection of photographs is very large, which led us to develop an automated method for georeferencing. The automatic calculation necessarily provides an idealized, averaged result: it is based on assumptions such as a constant flying speed, camera height and camera angle, which may not always have held in reality. The georeferencing results are therefore not as precise as they would have been if done manually. We found that deviations become visible when zooming in beyond a scale of 1:50,000. However, weighing the cost against the results, we have assumed that these results are sufficient for the interest groups. To verify this assumption we plan to invite users to indicate which areas they consider candidates for more precise georeferencing. These plans include the development of tools for user-generated georeferencing.
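The relation between a positional deviation and the map scale at which it becomes noticeable is simple arithmetic: a ground distance in metres divided by the scale denominator gives the distance on the map. The sketch below works this out. The 0.5 mm visibility threshold and the 30 m deviation are hypothetical values chosen for illustration, not measurements from the project.

```python
import math

# Sketch: how large does a georeferencing offset appear at a given map scale?

def deviation_m(auto_pos, manual_pos):
    """Ground distance in metres between a calculated and a manual position."""
    return math.hypot(auto_pos[0] - manual_pos[0], auto_pos[1] - manual_pos[1])


def visible_at_scale(deviation, scale_denominator, min_visible_mm=0.5):
    """True if the offset is at least min_visible_mm on a 1:scale map."""
    map_mm = deviation * 1000.0 / scale_denominator  # metres to map millimetres
    return map_mm >= min_visible_mm


offset = 30.0  # hypothetical deviation in metres
for scale in (100_000, 50_000, 25_000):
    print(scale, visible_at_scale(offset, scale))
```

A check of this kind could help to rank the areas that users nominate for more precise georeferencing, by comparing the calculated positions with a few manually georeferenced check points.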

Even though most parts of the application are based on existing modules, such as the web map services, the map interface and the metadata handling module, the project as a whole will take approximately 14 person months to complete. Once it is finished, the WWII aerial photograph discovery application can be accessed via the home page of our library.