28 Nov-1 Dec 2022 Paris (France)
The convert2xml script - publishing metadata from data files using DDI-Lifecycle
Wolfgang Zenk-Möltgen 1
1 : GESIS – Leibniz Institute for the Social Sciences



Within the ExploreData project, GESIS has brought many of its data collections into a common metadata database based on the DDI-Lifecycle 3.2 standard, to support the search and re-use of documentation. This effort focused on fully documented studies with a codebook, variable report, and complete questionnaire documentation available.

In addition to those, the archive holds many more research data collections for which documentation is available only at the dataset level, i.e. variable names, variable labels, and value labels. With the convert2xml project, we aimed to bring these studies into the same DDI-Lifecycle 3.2 standard and include them in the metadata database.

The current implementation is a Python script that reads SPSS sav files and exports their metadata to DDI 3.2 XML files. It is integrated into the data archiving infrastructure and automatically updates the metadata whenever a new version of a study is published. It supports English and German documentation, several versions of SPSS data files, and different encodings. So far, we have included more than a thousand data files in the metadata database, making them available through the GESIS search for research data on our website.
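
To illustrate the kind of mapping described above, the following is a minimal sketch and not the convert2xml script itself. It assumes the dataset-level metadata (variable names, variable labels, value labels) has already been read from the sav file into plain Python structures, for example with a library such as pyreadstat, which is an assumption and not stated in this abstract. The function name variables_to_ddi and the flattened code layout are hypothetical: real DDI-Lifecycle 3.2 requires identifiers (agency, ID, version) and links each code to a separately defined category by reference, all of which are omitted here for brevity.

```python
import xml.etree.ElementTree as ET

# DDI-Lifecycle 3.2 namespace URIs for the two modules used in this sketch.
NS = {
    "l": "ddi:logicalproduct:3_2",
    "r": "ddi:reusable:3_2",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Return a namespace-qualified tag name for ElementTree."""
    return f"{{{NS[prefix]}}}{tag}"


def variables_to_ddi(names, labels, value_labels, lang="en"):
    """Build a simplified l:VariableScheme element (hypothetical helper).

    names        -- list of variable names, in file order
    labels       -- dict: variable name -> variable label
    value_labels -- dict: variable name -> {value: value label}
    lang         -- language tag for the documentation, e.g. "en" or "de"
    """
    scheme = ET.Element(q("l", "VariableScheme"))
    for name in names:
        var = ET.SubElement(scheme, q("l", "Variable"))
        var_name = ET.SubElement(var, q("l", "VariableName"))
        ET.SubElement(var_name, q("r", "String"), {XML_LANG: lang}).text = name

        if labels.get(name):
            label = ET.SubElement(var, q("r", "Label"))
            ET.SubElement(label, q("r", "Content"), {XML_LANG: lang}).text = labels[name]

        # Simplification for illustration only: value labels are written
        # inline under each code. Schema-conformant DDI 3.2 stores labels in
        # categories and references them from the code list.
        codes = value_labels.get(name, {})
        if codes:
            code_list = ET.SubElement(var, q("l", "CodeList"))
            for value, text in codes.items():
                code = ET.SubElement(code_list, q("l", "Code"))
                ET.SubElement(code, q("r", "Value")).text = str(value)
                code_label = ET.SubElement(code, q("r", "Label"))
                ET.SubElement(code_label, q("r", "Content"), {XML_LANG: lang}).text = text
    return scheme


# Example with made-up metadata for two variables.
example = variables_to_ddi(
    names=["sex", "age"],
    labels={"sex": "Sex of respondent", "age": "Age in years"},
    value_labels={"sex": {1: "Male", 2: "Female"}},
)
print(ET.tostring(example, encoding="unicode"))
```

Keeping the XML construction separate from the file reading means the same function can be called once per documentation language, passing German labels with lang="de", without touching the part that deals with SPSS versions and encodings.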

