Maintaining the I/O infrastructure of R
Chung-hong Chan
GESIS
2024-07-10
Source: R for Data Science
“First, you must import your data into R. This typically means that you take data stored in a file, database, or web application programming interface (API) and load it into a data frame in R. If you can’t get your data into R, you can’t do data science on it!”
Source: https://xkcd.com/2347/
Who maintains these packages?
- haven: Hadley Wickham
- readr, readxl: Jennifer Bryan
- writexl, jsonlite: Jeroen Ooms
- data.table: Tyson Barrett
- yaml: Shawn Garbett
- openxlsx: 🇦🇹 Philipp Schauberger
- foreign: R Core Team
Before 2013, data import and export
rio, since 2013
rio version 0.1.1 2013-08-26 14:02 CEST
import <- function(file="", format=NULL, header=TRUE, ... ) {
  format <- .guess(file, format)
  x <- switch(format,
              txt=read.table(file=file, sep="\t", header=header, ...), ##tab-seperate txt file
              rds=readRDS(file=file, ...),
              csv=read.csv(file=file, ...),
              dta=read.dta(file=file, ...),
              sav=read.spss(file=file,to.data.frame=TRUE, ...),
              mtp=read.mtp(file=file, ...),
              rec=read.epiinfo(file=file, ...),
              stop("Unknown file format")
              )
  return(x)
}
rio development
switch()) by Jason Becker
rio 1.0.0
rio in real world 1
readODS
readxl::read_excel() and writexl::write_xlsx()
readODS prior 2.0.0
XML, and then xml2, in pure R
readODS issue 49
readODS issue 71
odfpy (Also Julia’s wrapper: OdsIO.jl), JS SheetJS
Working but
data.table::fread() - 1s
readxl::read_excel() - 2s
“I’ll devote my 2023 to the project I tentatively called “Projekt 71”. The idea is simple: I want to have a way that can read the aforementioned “jts0501.ods” directly as an R data frame without memory issues; but yet pass at least 80% of the current unit tests of readODS. So, I am embarking on solving just one Github issue of readODS. I will put other of my R packages into maintenance mode and focus only on this.”
readODS::read_ods() in C++ (RapidXML) - super speed improvement
xml2
Reading speed: 5539 x 11
Writing speed: 3000 x 8
“Software, unlike papers or grants, is never done.”
readxl - working on minty
readxl (and other formats)
rio-based comparison
Lahman::Batting (112,164 x 22)
| Rank | Format | Export | Import | Size | Accuracy |
|---|---|---|---|---|---|
| 1 | feather | 0.9 | 0.3 | 0.5 | 0 |
| 2 | parquet | 3.6 | 0.4 | 0.3 | 0 |
| 3 | qs | 1.7 | 0.7 | 0.2 | 2 |
| 4 | csv | 1 | 1 | 1 | 2 |
| 16 | xlsx | 141.4 | 36.3 | 1.3 | 21 |
| 23 | fods | 77.3 | 119.7 | 42.2 | 21 |
| 25 | ods | 258.2 | 253.7 | 0.8 | 21 |
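A comparison of this kind can be sketched in base R. The following is only an illustration, not the benchmark behind the table: it assumes that Export, Import and Size are ratios relative to csv (csv is 1 in each of those columns), it compares only csv and rds because both ship with base R, and the helper `bench_format()` and its `mismatches` count are hypothetical stand-ins rather than the talk's Accuracy measure.

```r
# A small synthetic data frame; the talk used Lahman::Batting instead.
set.seed(1)
n <- 20000
dat <- data.frame(id = seq_len(n), x = rnorm(n),
                  g = sample(letters, n, replace = TRUE),
                  stringsAsFactors = FALSE)

# Hypothetical helper: time one export and one import, record the file size,
# and count the columns that do not survive the round trip unchanged.
bench_format <- function(writer, reader, ext, data) {
  path <- tempfile(fileext = ext)
  on.exit(unlink(path))
  export <- system.time(writer(data, path))[["elapsed"]]
  import <- system.time(back <- reader(path))[["elapsed"]]
  mismatches <- sum(!mapply(function(a, b) isTRUE(all.equal(a, b)), data, back))
  # Floor the timings at 1 ms so that the ratios below never divide by zero.
  data.frame(export = max(export, 1e-3), import = max(import, 1e-3),
             size = file.size(path), mismatches = mismatches)
}

raw <- rbind(
  bench_format(function(d, p) write.csv(d, p, row.names = FALSE), read.csv, ".csv", dat),
  bench_format(function(d, p) saveRDS(d, p), readRDS, ".rds", dat)
)
rownames(raw) <- c("csv", "rds")

# Express time and size relative to csv, so that csv is 1 by construction.
relative <- raw
for (col in c("export", "import", "size")) {
  relative[[col]] <- raw[[col]] / raw["csv", col]
}
print(round(relative, 2))
```

Timings from a single run are noisy, so a real comparison would repeat each measurement several times and use a larger data set.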
rio and readODS
“A good beginning requires enthusiasm, a good ending requires discipline.” (“Ein guter Anfang braucht Begeisterung, ein gutes Ende Disziplin.”)
The motto of the German Football Association (DFB) for the 2014 World Cup
Thank you:
rio from 2015 to 2023)
readODS)
read_ods)
rio)
rio and leading the GESIS TSA Team)
readr)
readODS)
rio and readODS contributors and users.
More about me: https://www.chainsawriot.com/
rio in real world 2
rio in real world 3
chainsawriot.github.io/salzburg_user2024