Maintaining the I/O infrastructure of R
Chung-hong Chan
GESIS
2024-07-10
Source: R for Data Science
“First, you must import your data into R. This typically means that you take data stored in a file, database, or web application programming interface (API) and load it into a data frame in R. If you can’t get your data into R, you can’t do data science on it!”
Source: https://xkcd.com/2347/
Who maintains these packages?
haven: Hadley Wickham
readr, readxl: Jennifer Bryan
writexl, jsonlite: Jeroen Ooms
data.table: Tyson Barrett
yaml: Shawn Garbett
openxlsx: 🇦🇹 Philipp Schauberger
foreign: R Core Team
Before 2013, data import and export
rio, since 2013
rio version 0.1.1 (2013-08-26 14:02 CEST):

```r
library(foreign)  # read.dta(), read.spss(), read.mtp() and read.epiinfo() come from foreign

import <- function(file = "", format = NULL, header = TRUE, ...) {
  format <- .guess(file, format)  # rio-internal helper, not shown on the slide
  x <- switch(format,
    txt = read.table(file = file, sep = "\t", header = header, ...), ## tab-separated txt file
    rds = readRDS(file = file, ...),
    csv = read.csv(file = file, ...),
    dta = read.dta(file = file, ...),
    sav = read.spss(file = file, to.data.frame = TRUE, ...),
    mtp = read.mtp(file = file, ...),
    rec = read.epiinfo(file = file, ...),
    stop("Unknown file format")
  )
  return(x)
}
```
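The slide leaves out rio's internal .guess() helper, and most branches need the foreign package. To run the same switch()-based dispatch end to end, here is a minimal sketch limited to base R readers. guess_format() and import_sketch() are hypothetical names, and inferring the format from the lower-cased file extension is an assumption about what .guess() does, not its actual code.

```r
# Hypothetical stand-in for rio's internal .guess(): use the explicit
# format if one is given, otherwise the lower-cased file extension.
guess_format <- function(file, format = NULL) {
  if (!is.null(format)) return(tolower(format))
  ext <- tolower(tools::file_ext(file))
  if (!nzchar(ext)) stop("Cannot guess the format of '", file, "'")
  ext
}

# switch()-based dispatch in the style of rio 0.1.1, restricted to base R readers
import_sketch <- function(file, format = NULL, header = TRUE, ...) {
  format <- guess_format(file, format)
  switch(format,
    txt = utils::read.table(file = file, sep = "\t", header = header, ...),
    csv = utils::read.csv(file = file, header = header, ...),
    rds = readRDS(file = file),
    stop("Unknown file format: ", format)  # unnamed last argument = fallback
  )
}

# Usage: round-trip a data frame through a temporary csv file
csv_file <- tempfile(fileext = ".csv")
utils::write.csv(mtcars, csv_file, row.names = FALSE)
dat <- import_sketch(csv_file)
dim(dat)
```

The unnamed last argument of switch() acts as the fallback, which is how both versions turn an unsupported extension into an error.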
rio development
… switch(), by Jason Becker
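A switch() like the one in rio 0.1.1's import() has to be edited every time a format is added. One common R alternative is S3 dispatch: tag the file path with a format-specific class and let each format be its own method. The sketch below assumes that approach for illustration only; import_s3(), read_format() and the fmt_* classes are hypothetical, and this is not necessarily what the change to rio looked like.

```r
# Generic: dispatches on the class attached to the file path
read_format <- function(x, ...) UseMethod("read_format")

# Fallback when no method exists for the requested format
read_format.default <- function(x, ...) {
  stop("Unknown file format: ", sub("^fmt_", "", class(x)[1]))
}

# One small method per format (hypothetical class names fmt_csv, fmt_rds, fmt_txt)
read_format.fmt_csv <- function(x, ...) utils::read.csv(unclass(x), ...)
read_format.fmt_rds <- function(x, ...) readRDS(unclass(x))
read_format.fmt_txt <- function(x, header = TRUE, ...) {
  utils::read.table(unclass(x), sep = "\t", header = header, ...)
}

# User-facing entry point, analogous to import()
import_s3 <- function(file, format = NULL, ...) {
  if (is.null(format)) format <- tools::file_ext(file)
  x <- structure(file, class = paste0("fmt_", tolower(format)))
  read_format(x, ...)
}

# Usage
rds_file <- tempfile(fileext = ".rds")
saveRDS(iris, rds_file)
identical(import_s3(rds_file), iris)  # TRUE
```

With this design, supporting a new format means defining one more method (for example a hypothetical read_format.fmt_tsv()) rather than editing the dispatcher itself.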
rio development
rio 1.0.0
rio in real world 1
readODS
readxl::read_excel() and writexl::write_xlsx()
readODS prior to 2.0.0: XML, and then xml2, in pure R
readODS issue 49
readODS issue 71
odfpy (also Julia's wrapper: OdsIO.jl), SheetJS (JavaScript)
Working, but
data.table::fread(): 1s
readxl::read_excel(): 2s
“I’ll devote my 2023 to the project I tentatively called ‘Projekt 71’. The idea is simple: I want to have a way that can read the aforementioned ‘jts0501.ods’ directly as an R data frame without memory issues; but yet pass at least 80% of the current unit tests of readODS. So, I am embarking on solving just one GitHub issue of readODS. I will put other of my R packages into maintenance mode and focus only on this.”
readODS::read_ods() in C++ (RapidXML): super speed improvement
xml2
Reading speed: 5539 x 11
Writing speed: 3000 x 8
“Software, unlike papers or grants, is never done.”
readxl: working on minty
readxl (and other formats)
rio-based comparison: Lahman::Batting (112,164 x 22)
Rank | Format | Export | Import | Size | Accuracy |
---|---|---|---|---|---|
1 | feather | 0.9 | 0.3 | 0.5 | 0 |
2 | parquet | 3.6 | 0.4 | 0.3 | 0 |
3 | qs | 1.7 | 0.7 | 0.2 | 2 |
4 | csv | 1 | 1 | 1 | 2 |
… | … | … | … | … | … |
16 | xlsx | 141.4 | 36.3 | 1.3 | 21 |
… | … | … | … | … | … |
23 | fods | 77.3 | 119.7 | 42.2 | 21 |
… | … | … | … | … | … |
25 | ods | 258.2 | 253.7 | 0.8 | 21 |
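In the table, the csv row reads 1 for Export, Import and Size, which suggests those columns are ratios against csv. Under that assumption, here is a minimal sketch of how such a comparison could be computed, using only base R formats so it runs anywhere. compare_formats() is hypothetical, and class_changes (columns whose class changed in a round trip) is an invented stand-in for the table's Accuracy column, whose definition is not given here.

```r
# Hypothetical writer/reader pairs; base R formats only
writers <- list(
  csv = function(x, f) utils::write.csv(x, f, row.names = FALSE),
  rds = function(x, f) saveRDS(x, f)
)
readers <- list(
  csv = function(f) utils::read.csv(f, stringsAsFactors = FALSE),
  rds = function(f) readRDS(f)
)

compare_formats <- function(x, formats = names(writers)) {
  stopifnot("csv" %in% formats)  # csv is the baseline
  class_of <- function(df) vapply(df, function(col) class(col)[1], character(1))
  rows <- lapply(formats, function(fmt) {
    f <- tempfile(fileext = paste0(".", fmt))
    on.exit(unlink(f))  # remove the temporary file after measuring
    export_time <- system.time(writers[[fmt]](x, f))[["elapsed"]]
    import_time <- system.time(y <- readers[[fmt]](f))[["elapsed"]]
    data.frame(format = fmt, export = export_time, import = import_time,
               size = file.size(f),
               # hypothetical accuracy proxy: columns whose class changed
               class_changes = sum(class_of(x) != class_of(y)))
  })
  out <- do.call(rbind, rows)
  base <- out[out$format == "csv", ]
  # Express time and size relative to csv, so the csv row reads 1
  for (m in c("export", "import", "size")) out[[m]] <- out[[m]] / base[[m]]
  out
}

# Usage: 30,000 rows so that the timings are measurable
big <- iris[rep(seq_len(nrow(iris)), 200), ]
res <- compare_formats(big)
res
# csv turns the Species factor into character, so class_changes is 1 for csv and 0 for rds
```

With very small data, system.time() can round to zero and the relative times become NaN or Inf, which is why the usage example repeats iris 200 times.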
rio and readODS
“A good beginning requires enthusiasm, a good ending requires discipline.”
The motto of the German Football Association for the 2014 World Cup
Thank you:
(… rio from 2015 to 2023)
(… readODS)
(… read_ods)
(… rio)
(… rio and leading the GESIS TSA Team)
(… readr)
(… readODS)
rio and readODS contributors and users.
More about me: https://www.chainsawriot.com/
rio in real world 2
rio in real world 3
chainsawriot.github.io/salzburg_user2024