Create Datasets that are Easy to Combine and Reuse

Our new R software package, dataset, is released on CRAN

Daniel Antal, Cultural & Creative Sectors and Industries Observatory

Nov 22, 2022

The latest Reprex R package, dataset, was released today on the Comprehensive R Archive Network. It is a very early, conceptual package that will help make scientific achievements more open, governmental data easier to find, and stored information easier to combine.

Data interoperability is almost a buzzword, yet we see very few comprehensive, good solutions that put it into practice. Try to find information on open government portals or in big open science repositories: apart from a few good examples, most datasets are as disorganized as the hard disk of any PC that is collecting dust in a shed.

The dataset package aims to bring together the best practices of data semantics, data organization, and the use of standard metadata, so that whatever you store in a data table is immediately available for data analysis, activation, or combination in any new database.

Ambitious? It is, and dataset 0.1.9 is a very experimental product. While our other packages are aimed at intermediate users with a clear use case in mind, dataset is, at this point, aimed at package developers. Casual or even heavy R users are unlikely to download it as a standalone product. Instead, dataset aims to become a stable foundation for developers: for our existing products, for rOpenGov packages, and for many new uses.

Download [dataset](https://dataset.dataobservatory.eu/)

The metadata aim of dataset is to add standardized metadata to R data.frames, tibbles, data.tables, and other similarly structured tabular objects. The organizational and semantic objectives are to bring the tidy data concept closer to the datacube model, which is the basis of all statistical data exchanges, and to W3C standards, which foster machine-to-machine data communication on traditional web APIs and the semantic web.

A dataset organized this way:

Makes data importing easier and less error-prone;
Leaves plenty of room for documentation automation, resulting in far better reusability and reproducibility;
Makes publishing results from R in line with the FAIR principles far easier, so that the work of the R user becomes more findable, more accessible, more interoperable, and more reusable by other users;
Makes placement into relational databases, semantic web applications, archives, and repositories possible without time-consuming and costly data wrangling (see From dataset To RDF).
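
To make the idea of attaching standardized metadata to a tabular object more concrete, here is a minimal sketch in base R. It does not use the dataset package or its functions; the helper `add_metadata()`, the attribute names, and the example values are all assumptions made up for illustration, loosely inspired by common descriptive metadata fields such as title, creator, and date of issue.

```r
# Hypothetical helper, NOT part of the dataset package: store a few
# descriptive metadata fields as plain attributes of a data.frame.
add_metadata <- function(df, title, creator, issued, unit) {
  stopifnot(is.data.frame(df))
  attr(df, "Title")   <- title
  attr(df, "Creator") <- creator
  attr(df, "Issued")  <- issued
  attr(df, "Unit")    <- unit
  df
}

# Made-up illustration data: one observation per row,
# one variable per column (tidy layout).
gdp <- data.frame(
  geo   = c("AD", "LI", "SM"),
  year  = c(2020L, 2020L, 2020L),
  value = c(1.1, 2.2, 3.3)
)

gdp_described <- add_metadata(
  gdp,
  title   = "Example table",
  creator = "Jane Doe",
  issued  = as.Date("2022-11-22"),
  unit    = "million EUR"
)

# The metadata now travels with the object and can be read back.
attr(gdp_described, "Title")
names(attributes(gdp_described))
```

Plain attributes like these are not guaranteed to survive every data manipulation step, and nothing enforces which fields are present or how they are named. That is one motivation for a dedicated, standardized structure rather than ad hoc attributes.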

The first official release offers few immediate benefits. However, if you are an R package developer, we can bring you a few steps nearer to releasing your data products in a way that conforms to the FAIR metadata principles. We can also take a few steps to streamline your data wrangling, make integration with relational databases easier, and move towards the semantic web.

