1. Introduction

Have we embraced complacency and become too comfortable with the internet’s knowledge production capabilities? If so, by choosing to rest on our laurels and exploit this affordance, what happens to epistemic curiosity? (D’Arnault, 2019)

For all their rhetorical flair, these questions raised by Digital Culturalist blogger Clayton D’Arnault force us to face an inconvenient reality. Current estimates suggest that over 60 percent of the world’s population is connected to the internet (Internet World Stats, 2020; Statista, 2020), and that of those people, a substantial group relies on search engines for information about its politics and its governmental leadership (Dutton, Reisdorf, Dubois, & Blank, 2017). Therefore, citizens curious about, say, the nomination date of former Belgian Prime Minister Sophie Wilmès, are likely to satisfy their information needs by turning to Google and Wikipedia, rather than to query the online portal of the Belgian Federal Public Service Justice to consult the legal nomination document in the Belgian official journal (‘Belgisch Staatsblad’, ‘Moniteur belge’) (Belgisch Staatsblad, 2019). It is safe to say that very few would go as far as to consult this physical document at the journal’s archives. And in most cases, these knowledge-seekers would be right to assume that the world’s leading search engine and the most prominent online encyclopedia yield relevant answers and facts.

However, while the merits of Google, Wikipedia and related projects cannot be overstated, it has also been established that these platforms are marked by algorithmic, ideological, gender and other forms of bias. Criticisms have for instance been levelled at Google’s opaque ranking and rating algorithms (Wakabayashi, 2017), and an overreliance on Google-like search engines fosters what Lynch (2016) describes as ‘Google knowing’, a form of knowledge-seeking that precludes critical comparisons between sources, and which boils down to following the opinion of the majority. Along those lines, Wikipedia has been shown to be a battleground for conflicting ideological perspectives on the same topic (Rogers, 2013, Ch. 8), and to be marked by a significant gender gap in terms of both editors (Ford & Wajcman, 2017) and content (Adler, 2016; Filipacchi, 2013; Women in Red, 2020). Similar concerns about knowledge diversity have been raised for Wikidata, a multilingual knowledge graph hosted by the Wikimedia Foundation, which is predicted to become a key machine-readable knowledge base for artificial intelligence systems (Graham, 2012). Finally, research by, among others, McMahon, Johnson, and Hecht (2017) and Vincent, Johnson, and Hecht (2018), as well as investigative reporting by Wired magazine (Matsakis, 2019), has shown that the relationship between Google and Wikipedia is particularly close-knit. This is controversial, as it makes the Google-Wikipedia partnership the de facto source of knowledge on the web, and thus also a political hub. The critical examination of representations of politicians on these platforms is therefore an active area of research.
Recent scholarship in this domain has for instance uncovered that search results for politicians in Google and Wikipedia can be biased for gender and party identity (Pradel, 2020), and that editors of politicians’ pages tend to focus on particular parties and choose references from specific news outlets (Agarwal, Redi, Sastry, Wood, & Blick, 2020).

2. Research Question and Hypotheses

These well-researched problems with the web’s central knowledge sources raise the present article’s main research question: how deep does one actually need to dig into Wikipedia and Wikidata’s historical, political or biographical information content, routed through Google or not, before confronting fundamental epistemological issues? We are interested in those problems that emerge when considering representations of even the most basic data concerning governments and those in positions of government, such as their names and time in office. It is our contention that a variety of issues can be revealed through a detailed, comparative study of multilingual Wikipedia and Wikidata content on a single topic, in this case Belgian prime ministers, and that these problems transcend the affordances of the platforms under scrutiny. Specifically, we hypothesise that a micro-level analysis of these government-related data points towards fundamental problems of humanistic knowledge formalisation, such as concerns with the naming, classification and interlinking of entities.

On a foundational level, we thus address the question of how data in Wikipedia and Wikidata are imagined in the context of (digital) humanistic inquiry, thereby positioning our research in the emerging field of ‘data studies’. Following media scholar Lisa Gitelman (2013), this humanistic endeavour at the intersections of science and technology studies and media studies asks how data are ‘variously cooked within the varied circumstances of their collection, storage, and transmission’ (idem, p. 3). In the present context, this question can be approached from many angles. One might for instance choose to analyse the technological particularities of MediaWiki implementations such as Wikipedia or Wikidata as software platforms (MediaWiki, 2020a,b), investigate systemic bias (Martin, 2018; Oeberst, von der Beck, Cress, & Nestler, 2019), discuss the philosophical, sociological or economic foundations and impact of a free, open software movement (Tkacz, 2015), or explore the whole of Wikipedia or Wikidata content supported by big data approaches (Farda-Sarbas & Müller-Birn, 2019; Schroeder & Taylor, 2015). While each of these avenues is worth exploring, our approach instead draws inspiration from the epistemological criticism of information technologies and databases for humanistic knowledge in Oldman, Doerr, and Gradmann (2015), and the detailed analyses of online representations of humanistic (biographical) data and personhood in Brown and Simpson (2013). The latter effectively show how semantic web technologies, including the more sophisticated uses of ontologies such as OWL and SKOS, fail to capture the nuance, complex relationships and social meanings that characterise humanities scholarship – complexities that ‘might otherwise be overlooked or dismissed as a trivial technicality’ (idem, p. 77).
They achieve this by minutely examining the errors, blind spots and contradictions that mark the DBpedia and Project Orlando representations of the ‘outlier’ Michael Field, a pseudonym shared by the late Victorian writers Katharine Harris Bradley and Edith Emma Cooper. A thorough analysis of the problematic representations of one item of information content thus brings into view wide-ranging implications for the way in which humanist data is imagined in digital information spaces.

In the present study, we likewise assume the roles of critical knowledge-seekers, and engage in a practice that could be characterised as one of detailed information analysis, or, borrowing a term from biology, ‘nitpicking’. Despite its pejorative overtones, the act of nitpicking is an essential task for most mammals, as it prevents infectious parasites from affecting the health of the social group. In the same sense, our ‘grooming’ of Wikipedia and Wikidata targets inconsistencies and errors, which we consider symptomatic of deeper problems with how humanistic data are demarcated and organised on said platforms. Concretely, we develop a case study that sifts through different layers of knowledge representation, starting from a Google search and concentrating on Wikipedia and Wikidata, in order to assess the data quality of a multilingual representation of a single-topic item: the factual information or data concerning Belgian governments and their prime ministers. This choice of topic is doubly motivated. For one thing – and contrary to Brown & Simpson – we chose not to examine an ‘outlier’, but instead to focus on the basic data concerning one of Europe’s central democracies. It is expected that a knowledge-seeker’s inquiry into the elementary data on Belgian governmental leadership should yield straightforward answers, especially from established knowledge bases such as Wikipedia and Wikidata, thus magnifying any discrepancies. Secondly, the topic aligns with the authors’ background knowledge, which facilitates the assessment of the retrieved data’s accuracy.

In order to concentrate our efforts as well as diversify the range of potential epistemological issues that are brought to light, we scrutinise the Dutch, French, English and German Wikipedia lists of historical Belgian governments and their leaders (focusing on data such as the names of the governments, their prime ministers, and the duration of their legislatures), the Wikidata equivalents of those Wikipedia lists, as well as different language-variants of the biographical Wikipedia pages of the post-war Belgian prime ministers Achille Van Acker (1898–1975), Leo Tindemans (1922–2014), and Sophie Wilmès (born 1975). A closer examination of the retrieved data allows us to systematically document and analyse crucial points where the data display a lack of agreement, both across sources (e.g. differences between the academic information and Wikipedia) and within a source (e.g. differences between different language versions of a Wikipedia item).

3. Data Collection and Methodology

The lists of governments and prime ministers, and the biographical pages that inform our main analysis were retrieved through a series of queries on Google, Wikipedia and Wikidata. As our case study consists of a ‘close reading’ of a limited number of information sources (pages) rather than a big data analysis, we conducted our search manually through the graphical user interfaces instead of programmatically through their APIs. The data were collected between April and July 2020. For the lists of governments, only the latest versions available in July were considered; for the biographical pages on Wikipedia, we also took into consideration the edit histories up to that point. The outcomes of each of the queries are documented in the tables included in the Appendix. The main purpose of these tables is to bring together and compare those data that might otherwise be spread over different platforms such as academic research projects and language-specific same-topic Wikipedia pages. In compiling the tables, transformations to the original research objects were kept to an absolute minimum: we refrain from normalising or aggregating the data, and take them at face value. The remainder of this section documents the process of retrieving and organising the data that inform our further analysis.

3.1. Lists of Belgian Prime Ministers

The first, most general query that was executed consisted of searching the web for a list of prime ministers of Belgium. This search consisted of two stages. In a first stage, we consulted official and trusted resources in order to establish a factual baseline for the information on prime ministers. At the time of writing (July 2020), the official Belgian Federal Public Service (FPS) website of the Chancellery of the Prime Minister yielded a chronological overview of prime ministers, which proved to be incomplete as it was missing the biographical descriptions of most prime ministers before 1979, as well as that of Mark Eyskens, who was prime minister in 1981 (Chancellery, 2020). Using the search window on the official website of the Belgian Parliament did not yield any results. However, a Google search did reveal the presence of a PDF document with an overview of Belgian governments since the Second World War on the website of the Parliament (Parliament, 2020). This document cannot be reached by navigating the website and is not up to date: at the time of writing, it remains an orphaned web document fixed in time (2018). Finally, a peer-reviewed and updated academic list of prime ministers was found through the website of the Royal Historical Commission of Belgium, which was founded in 1834 and has as its mission to provide access to written sources and studies related to the history of Belgium. One of these sources is the ‘Belelite’ database project (henceforth: ‘RHC-Belelite’), which was started in 2017 under the supervision of KU Leuven Professor Emmanuel Gerard (Aspeslagh, Verleden, Matheve, Heyneman, & Gerard, 2020). This authoritative academic list of prime ministers since the independence of Belgium is used as a basis for further comparisons. With our academic baseline thus established, we executed a Google search for the term ‘List of Belgian prime ministers’, which returned a Wikipedia page with such a list as one of the first results.
As Wikipedia provides a link to all language versions of any page, the associated pages in English, Dutch, French and German are easily accessible. Appendix Tables 1a, b offer a comprehensive comparison between the RHC-Belelite data and these Wikipedia lists of prime ministers.

3.2. Biographical Pages of Prime Ministers on Wikipedia

In a second step, data including the edit histories were sourced from the English, Dutch, French and German language versions of the biographical pages of three selected prime ministers, with the aim of scrutinising problematic representations and examining the differences between and within the versions of these biographies. This changelog was accessed through the ‘View history’ tab in the top-right corner of each Wikipedia page. We limit our scope to three biographies out of approximately 70 possible historical holders of the office of prime minister, in order to allow for a more meticulous comparative analysis. The political figures under discussion here are Achille Van Acker, a socialist prime minister of multiple governments between 12 February 1945 and 3 August 1946, Leo Tindemans, a Christian-democrat leading multiple governments between 25 April 1974 and 20 October 1978, and Sophie Wilmès, the incumbent prime minister of Belgium, who replaced Charles Michel on 27 October 2019 when he was elected president of the European Council and who was the leader of a minority government between 17 March 2020 and 1 October 2020.

In our comparative investigation of these Wikipedia entries, we take into consideration any edits to the biographical text or the infobox (a boxed summary on the right-hand side of the Wikipedia page) that are documented in the page’s changelog. This notably includes changes in the text strings and hyperlinks referring to the office of Prime minister of Belgium. The detailed outcomes of this query are documented in Appendix Tables 2–4a–d.
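Although the edit histories in this study were consulted manually, the same changelog can also be requested programmatically. The following is a minimal sketch, assuming the standard MediaWiki Action API endpoint of each language edition; the helper name `revisions_url` and the chosen time window are illustrative assumptions, not part of the procedure described above. It only builds the request URL and does not contact the network.

```python
from urllib.parse import urlencode


def revisions_url(lang: str, title: str, start: str, end: str, limit: int = 50) -> str:
    """Build a MediaWiki Action API URL that lists revisions of one page.

    `revisions_url` is a hypothetical helper for illustration only.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",  # fields shown in 'View history'
        "rvdir": "newer",    # enumerate from older to newer revisions
        "rvstart": start,    # earliest timestamp of interest (ISO 8601)
        "rvend": end,        # latest timestamp of interest (ISO 8601)
        "rvlimit": limit,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


# Assumed window: from the change of prime minister to the end of data collection.
url = revisions_url("fr", "Sophie Wilmès", "2019-10-27T00:00:00Z", "2020-07-31T23:59:59Z")
print(url)
```

Fetching and paging through the results is deliberately left out; for a close reading of a handful of pages, the graphical interface remains a reasonable choice.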

3.3. Wikidata Items

A third and final step of data collection consisted of the creation of a list of Belgian prime ministers from Wikidata, with the objective of establishing a comparison between these Wikidata entries and the lists sourced from Wikipedia. A list of Belgian prime ministers, containing one record per prime minister with the start and end dates of each continuous mandate as PM, was obtained by entering the relevant SPARQL queries into the Wikidata Query Service. A first query yielded a list of 72 entries (https://w.wiki/ZX4). As this list excluded the incumbent prime minister Wilmès, a second query was created to obtain the missing entry (https://w.wiki/ZXH). Appendix Tables 5a–d contrast the outcomes of these queries with the corresponding Wikipedia lists of prime ministers.
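To give an impression of what such a query can look like, the sketch below builds a SPARQL query for holders of an office and parses a result in the standard SPARQL JSON results format. It is not the authors' query (those are available through the short links above). The properties P39 (position held), P580 (start time) and P582 (end time) are Wikidata properties; the item identifier of the office is left as a clearly marked placeholder that must be looked up, and the sample result is hand-made for illustration.

```python
from typing import List, Optional, Tuple

# Placeholder, NOT a real identifier: replace with the Wikidata item
# for the office of prime minister of Belgium before running the query.
PLACEHOLDER_QID = "Q_PRIME_MINISTER_OF_BELGIUM"


def build_query(position_qid: str) -> str:
    """Return a SPARQL query listing office holders with optional term dates."""
    return f"""
SELECT ?person ?personLabel ?start ?end WHERE {{
  ?person p:P39 ?statement .
  ?statement ps:P39 wd:{position_qid} .
  OPTIONAL {{ ?statement pq:P580 ?start . }}
  OPTIONAL {{ ?statement pq:P582 ?end . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY ?start
""".strip()


def parse_bindings(result: dict) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Extract (label, start, end) from a SPARQL JSON result; missing dates become None."""
    rows = []
    for binding in result["results"]["bindings"]:
        rows.append((
            binding["personLabel"]["value"],
            binding.get("start", {}).get("value"),
            binding.get("end", {}).get("value"),
        ))
    return rows


# Hand-made sample in the SPARQL JSON results format (not retrieved data).
sample = {"results": {"bindings": [
    {"personLabel": {"value": "Sophie Wilmès"},
     "start": {"value": "2019-10-27T00:00:00Z"}},
]}}
print(parse_bindings(sample))
```

The end date is wrapped in OPTIONAL so that a mandate without a recorded end date, such as that of an incumbent, is not dropped from the results. Whether this explains the entry missing from the first query above is not stated in the source and is not claimed here.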

4. Findings and Discussion

The tables in the Appendix allow us to systematically compare data from the different pages and platforms, with the aim of documenting inconsistencies within and across sources.

4.1. Problems with the Retrieved Lists of Belgian Governments and their Prime Ministers

A general observation with regard to retrieving lists of Belgian governments and their prime ministers is that it is non-trivial for knowledge-seekers to find an authoritative version of such a list. This can be explained by the lack of an official list of Belgian prime ministers, as well as some shortcomings in the available literature and resources, such as a lack of broad historical coverage, a lack of digital resources, and occasional errors (Aspeslagh et al., 2020). When we compare the different-language Wikipedia listings of Belgian governments and their prime ministers with the authoritative RHC-Belelite list (see Appendix Tables 1a, b), a further four types of problems can be discerned.

Firstly, the data reflect different interpretations as to who actually held the office of Prime Minister at different points in time. This is particularly the case for the listings of the first Belgian governments. According to RHC-Belelite for instance, the first Belgian government leader was Goblet (28 February 1831–23 March 1831). Yet according to all of the Wikipedia listings, Belgium’s first prime minister was De Gerlache. It should also be noted that the latter’s time in office is represented differently on the Dutch and French Wikipedias on the one hand, and the English and German ones on the other. Belgium’s second prime minister according to RHC-Belelite is De Sauvage, whereas the Wikipedia lists put forward Lebeau as prime minister, again with diverging term dates between them. Similar problems can be observed in the descriptions of the governments under De Mûelenaere and Goblet d’Alviella.

Secondly, the data display a lack of consensus about what constitutes a successor government under the same prime minister. RHC-Belelite for instance lists three governments with Jaspar as Prime Minister, whereas the Dutch, French and English Wikipedia lists discern only two, and the German Wikipedia list mentions only one. A knowledge-seeker is confronted with a similar disagreement among sources in the case of the governments of Pierlot, some of which worked from exile in London during the Second World War. While sources agree on the start date of the first Pierlot government and the end date of the last Pierlot government, RHC-Belelite lists seven governments within this timeframe, and the Wikipedia lists only attest to six governments. These differences stem from an apparent lack of a common definition of what constitutes a successor government versus a continuation of the same government with some of its ministers changed.

Thirdly, we can observe some discrepancies between the authoritative RHC-Belelite list and the Wikipedia entries stemming from what are most likely typographical errors. It appears, for instance, that days and months are switched around in the dates that mark the end of Tindemans I and the start of Tindemans II in the English Wikipedia list. According to RHC-Belelite, Tindemans II ends on 6 March 1977 (06-03-1977 in Day-Month-Year notation). The English Wikipedia, in contrast, puts forward 3 June 1977 (03-06-1977 in Day-Month-Year notation) as the end of Tindemans I and the beginning of Tindemans II.
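A suspected transposition of this kind can be checked mechanically. The following minimal sketch uses the two dates quoted above; the function name `is_day_month_swap` is an illustrative assumption, and the check is offered as one possible way to flag candidates for manual inspection, not as the procedure followed in this study.

```python
from datetime import date


def is_day_month_swap(a: date, b: date) -> bool:
    """True if b is a with day and month exchanged (and the two dates differ)."""
    if a == b or a.year != b.year:
        return False
    return a.day == b.month and a.month == b.day


baseline = date(1977, 3, 6)           # 6 March 1977, as given by RHC-Belelite
english_wikipedia = date(1977, 6, 3)  # 3 June 1977, as given by the English Wikipedia

print(is_day_month_swap(baseline, english_wikipedia))  # True
```

A positive result only shows that the two values are compatible with a transposition; it does not establish which of the two sources is in error.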

Fourthly, differences can be observed in the represented start and end dates of governments. A striking example of the possible extent of discrepancies between Wikipedia representations and authoritative sources is that of the recent governments of Michel and Wilmès. After Charles Michel, the then prime minister of the government Michel II, was elected president of the European Council, he was replaced as PM by Sophie Wilmès on 27 October 2019. Wilmès then led the government Michel II until she eventually became the prime minister of a new minority government on 17 March 2020. RHC-Belelite correctly lists three Belgian governments between 11 October 2014 and 31 July 2020: Michel I (11 October 2014–9 December 2018), Michel II/Wilmès (9 December 2018–17 March 2020), and Wilmès I (starting on 17 March 2020). The Dutch, French and English Wikipedia lists of Belgian prime ministers, by contrast, each demarcate two Michel governments and two Wilmès governments. This example demonstrates that the different sources hold contrasting interpretations of how successive governments can and should be represented: RHC-Belelite has adjusted its naming convention for the idiosyncrasy of this change of prime ministers, while the Wikipedia lists rigorously follow the Belgian political mores of naming governments after their prime minister. Consequently, the latter leads to a contradiction when there is a de facto new prime minister, but not a new government. Zooming in on the data, it can indeed be seen that all sources agree on 11 October 2014 as the start date of Michel I. However, while RHC-Belelite states that Michel I ends on 9 December 2018, the English Wikipedia list has Michel I ending on 21 December 2018. The successor government is named ‘Michel II/Wilmès’ by RHC-Belelite, with a start date of 9 December 2018. The English Wikipedia list puts this start date on 21 December 2018. This is consistent with the end date of Michel I in either list. 
RHC-Belelite has ‘Michel II/Wilmès’ ending on 17 March 2020. The English Wikipedia has the same end date for Wilmès I. Thus, according to RHC-Belelite, the incumbent government at the time of writing this article is Wilmès I, but according to the Wikipedia lists it is Wilmès II.

4.2. Problems with the Wikipedia Lists of Belgian Governments and their Prime Ministers

Further differences and inconsistencies are foregrounded when we compare the different language versions of the Wikipedia listings of governments with each other (see Appendix Table 1a, b). Firstly, the data show important differences in the spellings of first names, which are sometimes adapted to the main language of the article, but not consistently. ‘Frans Schollaert’ on the Dutch, English and German Wikipedias for instance equates to ‘François Schollaert’ on the French Wikipedia. ‘Henri Carton de Wiart’ in the French Wikipedia becomes ‘Henry Carton de Wiart’ in the English version. Another example of inconsistent naming is the concatenation of ‘Van de Vyvere’ to ‘Vande Vyvere’ in the German Wikipedia. Similarly, there is a lack of systematicity in the naming of governments, which sometimes consist of a single name and sometimes of two names. The convention of using double names for governments (e.g. De Mûelenaere-Nothomb or De Theux-Malou) is mostly the result of adding a precursor or successor to a government name, except for the case of Poullet-Vandervelde in the English Wikipedia, which apart from the name of Prime Minister Prosper Poullet also includes the name of the minister of foreign affairs Emile Vandervelde.

Secondly, the different language versions display a lack of agreement on the start and end dates of governments, in particular in the case of the first three governments. The first Belgian government, led by De Gerlache, took office on 26 February 1831 according to the Dutch and French Wikipedia, but the German and English Wikipedia posit 27 February 1831 as the start date. The difference is larger with regard to the end date of this government: the Dutch and French Wikipedia indicate 23 March 1831, but the German and English Wikipedia mention 10 March 1831, a difference of almost two weeks. The successor government of Lebeau took office on 23 March 1831 according to the Dutch and French Wikipedia, on 28 March 1831 according to the German Wikipedia, and on 10 March 1831 according to the English Wikipedia. The Lebeau government ended on 21 July 1831 in every language version except the English Wikipedia, where it is presented as ending on 24 July 1831. Furthermore, the third Belgian government, that of De Mûelenaere, took office on 24 July 1831 according to the German and English Wikipedia, but according to the Dutch and French Wikipedia this happened two days later (26 July 1831). The latter give 17 September 1832 as the end date, but the German and French Wikipedia mark 20 October as the end date of that government.
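The size of such disagreements can be made explicit with simple date arithmetic. The sketch below takes the term dates of the De Gerlache government as reported in the paragraph above, computes the spread in days between the earliest and latest value, and groups the language versions by the date they give. The variable and function names are illustrative assumptions.

```python
from datetime import date

# Term dates of the De Gerlache government per language version,
# as reported in the text (nl = Dutch, fr = French, de = German, en = English).
start_dates = {
    "nl": date(1831, 2, 26), "fr": date(1831, 2, 26),
    "de": date(1831, 2, 27), "en": date(1831, 2, 27),
}
end_dates = {
    "nl": date(1831, 3, 23), "fr": date(1831, 3, 23),
    "de": date(1831, 3, 10), "en": date(1831, 3, 10),
}


def spread_in_days(dates_by_language: dict) -> int:
    """Number of days between the earliest and the latest reported date."""
    values = list(dates_by_language.values())
    return (max(values) - min(values)).days


def group_by_date(dates_by_language: dict) -> dict:
    """Map each reported date to the sorted language versions that give it."""
    groups = {}
    for language, value in sorted(dates_by_language.items()):
        groups.setdefault(value, []).append(language)
    return groups


print(spread_in_days(start_dates))  # 1
print(spread_in_days(end_dates))    # 13
print(group_by_date(end_dates))
```

The 13-day spread for the end date corresponds to the difference of almost two weeks noted above, and the grouping reproduces the split between the Dutch and French versions on the one hand and the German and English versions on the other.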

Finally, we can observe that as a result of the inconsistent splits or concatenations of governments across the different languages, the total number of Belgian governments since 1831 is different for several of the studied Wikipedia pages. The Dutch Wikipedia page thus lists 99 Belgian governments, the French version 98 governments, and the English version 96 governments. The German version does not give a ranking number to Belgian governments, but does rank the Belgian governments in historical order (Appendix Table 1a, b).

4.3. Problems with the Prime Ministers’ Biographical Articles

A deeper analysis of the individual biographical Wikipedia pages (including a comprehensive look into their development over time) reveals the extent of content and information differences of a Wikipedia biography in different languages. The structure, contents and edit histories of these pages testify to variances in what is considered appropriate, worthwhile and correctly sourced information to be included in biographies, as well as to the differences in the pace with which this information is edited or corrected.

For one thing, these discrepancies manifest themselves on the pages’ structural level. While preformatted templates for writing articles and categorising information exist on Wikipedia, editors are not obliged to follow them. Consequently, different versions of the same topic item, such as a biography, can consist of different sections, which in themselves might contain very different types of information. This becomes readily apparent, for example, when we compare the different language versions of the biographical page of Sophie Wilmès (see Appendix Figure 3a–b for full renderings of the pages), or the corresponding infoboxes (see Figure 1).

Fig. 1: Side-by-side comparison of the infoboxes on the biographical Wikipedia pages of Sophie Wilmès (SW:NL 2020, SW:FR 2020, SW:DE 2020, and SW:EN 2020; snapshots show the last available revisions up to the end of July 2020).

The edit histories of the different language versions of the pages reveal further diversity and debate about which information to include or exclude. Until 19 April 2007, for example, the first sentence of the Dutch version of Achiel Van Acker’s biography referred to him as a ‘freemason’ (AVA:NL 2020). This qualification was removed, with one commenter explaining that Van Acker might as well be called a ‘broom binder’ or ‘basket weaver’. This relatively swift change contrasts with a rather joking reference to Van Acker’s lack of mastery of Hebrew when meeting the then prime minister of Israel, which was only removed from the Dutch page on 21 June 2013, after figuring online for nine years. The French version of the page has even featured a similar joke since 5 November 2005, albeit in its ‘anecdote’ section (AVA:FR 2020). The editors’ criteria for relevance thus seem rather heterogeneous, and the speed at which corrections are made differs between language versions.

A closer look at the edit histories of the biographies of Leo Tindemans shows some remarkable changes in the number of governments in which he participated. Tindemans was prime minister from 25 April 1974 until 20 October 1978. All his Wikipedia biographies were first created in 2004. Until the end of 2006 the Dutch biography referred to six governments (LT:NL 28-03-2004 @ 19:06 till LT:NL 14-11-2006 @ 08:44); thereafter it referred to only two governments. Until 2011, the French biography stated that he was prime minister, without mentioning the number of governments over which he presided (LT:FR 17-09-2011 @ 12:17), after which date the infobox mentions four governments. The German Wikipedia biography only refers to the fact that he was prime minister without referencing his governments (LT:DE 2020). The English Wikipedia changes the number of governments he led from six to two in November 2007 (LT:EN 05-11-2007 @ 19:12, LT:EN 10-11-2007 @ 01:31). When inspecting the Wikipedia pages of each government in the different languages at the time of writing, the Dutch Wikipedia has separate pages for two Tindemans governments (Tindemans I, II), whereas the French and German Wikipedias each have four pages (Tindemans I, II, III, IV). The English Wikipedia does not offer specific pages for governments led by Tindemans. Arguably, the reasons for these differences are political-cultural, and depend on whether or not a transformation from a minority government to a majority government (or vice versa), notably through participation of regionalist political parties, is interpreted as constituting a new government. Starting out as a minority government, the Tindemans I government was enlarged with a regionalist Walloon party (Tindemans II), which quit after three years, returning Tindemans to lead a minority government (Tindemans III). After elections a new Belgian majority government was formed (Tindemans IV).
The French Wikipedia considers these as four separate governments, whereas the Dutch Wikipedia groups the first three together into one government.

Immediately after Sophie Wilmès became prime minister, the edit histories of some of her biographical pages show the addition of information about her ancestry. Since 1 November 2019, the French version features the statement that ‘her mother is Jewish and lost multiple relatives in the Shoah’, with a reference to the newspaper the Times of Israel (SW:FR 01-11-2019 @ 05:31). This statement is debated in the discussion section, which raises concerns about the relevance and potential privacy issues concerning such information. Similar references to Wilmès’ ancestry are made on the German and English pages, which also cite Israeli newspapers as their sources (SW:DE 29-10-2019 @ 21:08, SW:EN 28-10-2019 @ 19:11). The Dutch page, by contrast, mentions the professional credentials of Wilmès’ mother, but does not refer to religion (SW:NL 2020). Thus, in this case, the texts and discussions reveal different ideological stances to the subject matter.

The problems of classification that present themselves in the body of the pages are further reflected in the historical changes made to the pages’ hyperlink texts and infoboxes (see Appendix Tables 2–4a–d for an in-depth evaluation). A detailed but crucial piece of information to consider here is the set of strings that classify the political figures under discussion as ‘prime ministers’, and the destinations to which these classifiers might lead. These strings display a high degree of variation over time and between language versions. On the Dutch page of Achiel Van Acker for instance, the string ‘premier van België’ (‘Belgian Prime Minister’) refers to a Dutch page that explains the role of ‘Eerste minister’ by contrasting it with the regional Belgian functions of Flemish ‘minister-president’ and the term used for the leader of the Dutch government (also ‘minister-president’) (Appendix Table 2a). In this version, there is no link to the aforementioned Wikipedia list of Belgian prime ministers. The French and English Wikipedia biographies do however alternate between referring to a detailed page on the role of ‘prime minister’ and the contextualising overviews of lists of prime ministers (Appendix Table 2b and Table 2d). The German page does not contain any hyperlinks from the string ‘Premierminister’ (Appendix Table 2c). Furthermore, this page never featured a summary infobox.

In addition to similar types of problems, the biographical pages of Leo Tindemans present a mismatch between the non-hyperlinked number in the succession of Belgian prime ministers, and the actual, correct number. A string such as ‘58ste Premier van België’ (Appendix Table 3a) is thus ‘hard-coded’ by the writer of the article, and in no way connected to the numberings in the Wikipedia listings of Prime Ministers discussed earlier.

Finally, a particularly striking classification choice concerns the fact that the English Wikipedia page for Sophie Wilmès includes the prime minister in the list of ‘Jewish Belgian politicians’ (a category which exists only in English, Hebrew and Urdu), as well as the list of ‘Jewish Prime Ministers’ (a category that exists only in English, Hebrew, Urdu, and Vietnamese). However, apart from mentioning her mother’s Jewish ancestry, none of the actual biographical texts discussed above claims that Wilmès herself is Jewish.

4.4. Wikidata representations

The third level of representation (which could be considered the ‘deepest’ level) comprises the Wikidata knowledge graph. While Wikidata is intended to become one of the main knowledge bases for artificial intelligence systems, a close comparison of the retrieved data with the authoritative RHC-Belelite information reveals inconsistencies similar to those found in the Wikipedia data (see Appendix Table 5a). One such inconsistency is the end date of the government of De Trooz, which is marked as 9 January 1908 in RHC-Belelite, but as 31 December 1907 throughout Wikipedia and in Wikidata.

Of particular interest here, however, are a number of discrepancies between the Wikipedia lists of prime ministers and the corresponding Wikidata items (Appendix Tables 5b–d). When we inspect, for example, the succession of the governments of Paul Vanden Boeynants by the government of Gaston Eyskens, a logical contradiction presents itself, as Wikidata lists two different start dates for this government (17 June 1968 and 17 July 1968) (Appendix Table 5b). The former date is consistent with the RHC-Belelite list, whereas the latter is consistent with Wikipedia’s listing. A similar situation presents itself in the case of the governments Vanden Boeynants-Martens (Appendix Table 5c). Here as well, Vanden Boeynants’ premiership is contradictorily presented as ending on two different dates (3 April 1979 in RHC-Belelite as well as the German and English Wikipedia listings of Belgian prime ministers, and 3 March 1979 in the Dutch and French Wikipedia listings of Belgian prime ministers). Finally, when we zoom in on the governments of Mark Eyskens, who succeeded the fourth Martens government, the start of the premiership of Mark Eyskens is likewise marked by two different dates (6 April 1981 in RHC-Belelite and 31 March 1981 in all Wikipedia listings of Belgian prime ministers) (Appendix Table 5d).
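By way of illustration only, the kind of comparison described above can be sketched in a few lines of Python. The sketch below is not the procedure used in this study: the record layout, the event labels and the function name `find_conflicts` are hypothetical, and the only data it contains are the dates quoted in the preceding paragraph. It groups dated claims by event and reports every event for which the sources disagree.

```python
from collections import defaultdict
from datetime import date

# Dates as quoted in the paragraph above. The (event, source, date) layout
# and the event labels are assumptions made for this illustration.
CLAIMS = [
    ("G. Eyskens government start", "RHC-Belelite", date(1968, 6, 17)),
    ("G. Eyskens government start", "Wikipedia listing", date(1968, 7, 17)),
    ("Vanden Boeynants premiership end", "RHC-Belelite", date(1979, 4, 3)),
    ("Vanden Boeynants premiership end", "Wikipedia listings (de, en)", date(1979, 4, 3)),
    ("Vanden Boeynants premiership end", "Wikipedia listings (nl, fr)", date(1979, 3, 3)),
    ("M. Eyskens premiership start", "RHC-Belelite", date(1981, 4, 6)),
    ("M. Eyskens premiership start", "Wikipedia listings", date(1981, 3, 31)),
]


def find_conflicts(claims):
    """Return {event: {date: [sources]}} for events with more than one distinct date."""
    by_event = defaultdict(lambda: defaultdict(list))
    for event, source, day in claims:
        by_event[event][day].append(source)
    # Keep only the events on which the sources do not agree.
    return {
        event: dict(dates)
        for event, dates in by_event.items()
        if len(dates) > 1
    }


if __name__ == "__main__":
    for event, dates in find_conflicts(CLAIMS).items():
        print(event)
        for day in sorted(dates):
            print(f"  {day.isoformat()}: {', '.join(dates[day])}")
```

Such a check only surfaces disagreement; deciding which date is correct still requires consulting an authoritative source.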

These detailed observations profoundly problematise the relation between Wikipedia and Wikidata, as the two platforms are evidently not as closely connected as their names would suggest. While Wikidata formalises some of the ontological categories that are also present in the Wikipedia data and biographical pages discussed earlier (such as start and end dates of governments), we can nonetheless observe problems on the level of the actual information content, that is, of the facts that fill these ontological categories. Arguably, the observed differences between the Wikidata items and same-topic Wikipedia information can be attributed to the fact that these projects do not necessarily share user communities (also see Wikidata, 2020 for a discussion on the relation between both projects).

4.5. Overview of findings

Our ‘nitpicking’ of the knowledge representations of Belgian governments and their prime ministers yields four main findings concerning the individual platforms under investigation, as well as the relationships between those platforms. For one thing, the examined data suggest a lack of agreement between authoritative academic sources on Belgian governments and their leadership, and the information that is presented in Wikipedia. Moreover, it has been shown that, unlike the Wikipedia information, authoritative sources are not easily retrieved by knowledge-seekers through a traditional Google search. Next, the examined data show that for basic factual information, there can be disparity between the different language Wikipedia articles on the same topic, be they lists of governments and their leaders, or biographical pages dedicated to individual prime ministers. Furthermore, it has been demonstrated that there are significant differences between the information presented in Wikipedia and that in Wikidata, suggesting a rather loose-knit relation between both platforms. Finally, the different types of errors that were discussed span different categories, including problems of naming, classification, and linking.

5. Implications and conclusions

We have investigated different representations of factual biographical information about Belgian prime ministers, in order to test the central hypothesis that a detailed analysis of these representations might reveal inconsistencies and errors that are indicative of more fundamental epistemological problems. In support of this main hypothesis, a detailed analysis of the information content on Belgian prime ministers as found on Google, Wikipedia and Wikidata indeed revealed different types of inconsistencies and errors. Yet, what are the implications of these observations for the online representation of knowledge, in particular humanistic knowledge related to those in positions of power?

First and foremost, it should be acknowledged that the observed problems can in part be attributed to the affordances of the platforms under discussion. Differences between the language versions of Wikipedia pages on the same topic, for instance, are due to the fact that these pages are by no means translations, but rather stand-alone pages that are often edited by different communities. Discussions in the ‘Discussion’ section of the French version of the pages are mostly conducted by Francophone editors, whereas discussions concerning the Dutch page are conducted in Dutch. Similarly, some of the problems with Wikidata might be attributed to the fact that this project’s data are sourced by humans and machines (bots) alike, and that much of its social and technical infrastructure is still under development.

Of course, the errors and inconsistencies observed in our case study do not render platforms like Wikipedia or Wikidata useless. Pragmatically speaking, the documented issues could be resolved manually by any engaged Wikipedian. However, a different picture presents itself when we consider that we have only discussed a fraction of the information related to the 70 historical Belgian prime ministers available in 19 of Wikipedia’s many languages, let alone all the other potential topics that could have been chosen as the object of this study. In order to tackle the observed problems at this scale, a degree of automation becomes necessary. While proposing such a technical solution is beyond the scope of this article, we argue that the type of grooming demonstrated in these pages is a necessary prerequisite for the construction of such systems, as an understanding of the fundamental problems precedes their solutions.

In this regard, the nature of the observed errors does point to deeper issues. Above all, it is striking that most of the observed problems go to the core of any knowledge representation, that is: naming, classifying and interlinking entities. The representations that were evaluated fail to capture or find a consensus on the details that define biographical personhood and identity. Indeed, our analyses have revealed problems with the spelling of names of individual prime ministers, the nomenclature of the governments in which they served, and the start and end dates of these governments. Moreover, mechanisms for resolving those issues, such as hyperlinks, were found to be missing or inadequate. Such details and problems can easily be overlooked in ‘big data’ approaches. In this regard, the outcomes of our case study align with previous research on the problem of formalising humanistic knowledge conducted by Brown and Simpson (2013).

When we finally zoom out again and, following Gitelman (2013), evaluate how humanist data might be imagined on Google, Wikipedia and Wikidata, a rather fragmented picture begins to emerge. While the three platforms under discussion are growing towards each other on an organisational level, the actual representations of same-topic items are still quite disjointed. This is not because the objects themselves are marked by differences or fragmentation (although we have acknowledged some idiosyncrasies pertaining to the Belgian situation), but rather because on the level of information contents, the platforms display important discrepancies and errors. Our findings thus stress the continued importance of critical, humanistic evaluation of data, especially in growing knowledge ecosystems where humans increasingly work alongside machines. In such environments, where errors are (semi-)automatically compounded or fed into newer knowledge systems, it is necessary to remain epistemologically curious and vigilant about information quality, in particular at the smallest scales. Future research is thus required to continuously monitor the state and quality of our trusted knowledge bases, and to develop measures for incorporating humanistic criticism into information infrastructures.