Historical Household Finance Database for the Low Countries: a common extensible data model for historical household financial data.

The World Bank’s Global Findex Database measures the extent to which present-day households use commercially available financial services to organise their finances.[1] The Hisfindex, part of the Social History of Finance project led by Oscar Gelderblom at the University of Antwerp[2], adds a historical dimension by extending Findex measurements for Belgium and the Netherlands further back in time and by expanding the set of services to include services unmediated by banks (e.g. shop credit). Historical household financial data is heterogeneous, however, and must be collected from a variety of sources, including notarial deeds, probate inventories, succession tax returns, household budget surveys, data from statistical agencies, and the annual reports and archives of financial service providers. Merging these data in a single database to calculate Hisfindex indicators and to analyse how past households managed their payments, savings, loans and insurance is therefore challenging. Moreover, the Historical Household Finance Database aims to become an international data hub for historical household finance research that can accommodate data in other formats from other countries.

We propose a solution to these challenges in the form of an extensible common data model based on GSIM, the United Nations’ General Statistical Information Model.[3] Our data model logically separates data that identifies entities (e.g. individuals and banks) from data that measures their activities in the field of household finance (e.g. borrowing and lending). In GSIM terminology, the data model has an identifier component and a measure component. The identifier component consists of the reference data for the unique identification of entities belonging to different classes (organizations, persons and households, plus sources and geographic locations). The measure component describes which characteristics (i.e. variables) of these entities are measured and how they are measured. Each data point can then be contextualised by a reference to an entity (from the identifier component) and a represented variable (from the measure component).
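The separation between the two components can be illustrated with a minimal Python sketch. It is not the project's actual schema: the class names, field names and the sample household, variable and value are all hypothetical and chosen only to show how a data point points to one entity and one represented variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Identifier component: reference data that uniquely identifies an entity."""
    entity_id: str
    entity_class: str  # e.g. "organization", "person", "household", "source", "location"
    label: str


@dataclass(frozen=True)
class RepresentedVariable:
    """Measure component: what is measured and how it is measured."""
    variable_id: str
    description: str
    unit: str


@dataclass(frozen=True)
class DataPoint:
    """A value, contextualised only through references to the two components."""
    entity_id: str
    variable_id: str
    value: float


# Hypothetical reference data and one invented observation.
entities = {"H1": Entity("H1", "household", "Example household")}
variables = {
    "shop_credit": RepresentedVariable("shop_credit", "Outstanding shop credit", "francs")
}
point = DataPoint(entity_id="H1", variable_id="shop_credit", value=12.5)


def describe(point: DataPoint) -> str:
    """Resolve both references to give the data point its context."""
    entity = entities[point.entity_id]
    variable = variables[point.variable_id]
    return (
        f"{entity.label} ({entity.entity_class}): "
        f"{variable.description} = {point.value} {variable.unit}"
    )


print(describe(point))
```

The data point itself carries no descriptive information; everything needed to interpret it is looked up in the identifier and measure components.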

Our model is extensible because the identifier component allows for the unique identification of entities and the measure component is so generic that it can be used to store information about any kind of financial service or product. It is, in other words, possible to add new variables and coding schemes without the need to add tables or columns to the model. We believe our approach can therefore inspire other researchers who are building large-scale and long-term datasets from different sources for several countries.
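A hedged sketch of this extensibility claim, using an in-memory SQLite database: the three tables, their column names and the sample rows are assumptions made for illustration, not the database's real layout. The point is only that registering a new variable is an insert of rows, while the table structure stays unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical three-table layout: identifiers, measures, and data points.
conn.executescript(
    """
    CREATE TABLE entity (
        entity_id    TEXT PRIMARY KEY,
        entity_class TEXT NOT NULL,
        label        TEXT NOT NULL
    );
    CREATE TABLE represented_variable (
        variable_id TEXT PRIMARY KEY,
        description TEXT NOT NULL,
        unit        TEXT
    );
    CREATE TABLE data_point (
        entity_id   TEXT NOT NULL REFERENCES entity(entity_id),
        variable_id TEXT NOT NULL REFERENCES represented_variable(variable_id),
        value       TEXT NOT NULL
    );
    """
)

TABLES = ("entity", "represented_variable", "data_point")


def schema():
    """Return the column names of every table, to compare before and after."""
    return {
        table: [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        for table in TABLES
    }


# Invented starting data: one household and one variable.
conn.execute("INSERT INTO entity VALUES ('H1', 'household', 'Example household')")
conn.execute(
    "INSERT INTO represented_variable VALUES "
    "('shop_credit', 'Outstanding shop credit', 'francs')"
)
conn.execute("INSERT INTO data_point VALUES ('H1', 'shop_credit', '12.5')")

schema_before = schema()

# Adding a new kind of financial product means adding rows, not columns.
conn.execute(
    "INSERT INTO represented_variable VALUES "
    "('has_life_insurance', 'Holds a life insurance policy', NULL)"
)
conn.execute("INSERT INTO data_point VALUES ('H1', 'has_life_insurance', 'yes')")

schema_after = schema()
n_points = conn.execute(
    "SELECT COUNT(*) FROM data_point WHERE entity_id = 'H1'"
).fetchone()[0]
```

Storing values as text is a simplification in this sketch; a coding scheme attached to each variable would determine how a value is to be interpreted.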

[3] https://statswiki.unece.org/display/ClickableGSIM

Johan Poukens studied History and Archival Science. He obtained his PhD in History at the University of Leuven (Belgium) in 2017. He worked as an archivist and a librarian before becoming involved in 2018 as a postdoctoral researcher, project coordinator and data manager in research infrastructure design and data collection projects at the University of Antwerp (Belgium) and the University of Groningen (the Netherlands). Since 2022, he has held a FED-tWIN mandate (funded by the Belgian Science Policy) at the University of Antwerp and the Belgian State Archives, where he respectively manages the stock exchange data and catalogues the stock exchange archives collected by the Study Center for Companies and Exchanges (SCOB).