--- title: Use PRQL on R output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Use PRQL on R} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r} #| include: false knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(prqlr) ``` [PRQL](https://prql-lang.org/) (Pipelined Relational Query Language, pronounced "Prequel") is a modern language for transforming data, can be compiled to SQL. This package provides a simple function to convert a PRQL query string to a SQL string. For example, this is a PRQL query. ```{prql} #| label: sample #| eval: false from mtcars filter cyl > 6 select {cyl, mpg} derive {mpg_int = math.round 0 mpg} ``` And, this is the SQL query that is compiled from the PRQL query. ```{prql} #| label: sample #| echo: false #| engine-opts: #| signature_comment: false ``` To compile a PRQL string, just pass the query string to the `prql_compile()` function, like this. ```{r} library(prqlr) " <> " |> prql_compile() |> cat() ``` This output SQL query string can be used with already existing great packages that manipulate data with SQL. ## Work with DB Using it with the `{DBI}` package, we can execute PRQL queries against the database. ```{r} library(DBI) # Create an ephemeral in-memory SQLite database con <- dbConnect(RSQLite::SQLite(), ":memory:") # Create a table inclueds `mtcars` data dbWriteTable(con, "mtcars", mtcars) # Execute a PRQL query " <> take 3 " |> prql_compile("sql.sqlite") |> dbGetQuery(con, statement = _) ``` We can also use the `sqldf::sqldf()` function to automatically register Data Frames to the database. ```{r} " <> take 3 " |> prql_compile("sql.sqlite") |> sqldf::sqldf() ``` Since SQLite is used here via `{RSQLite}`, the `target` option of `prql_compile()` is set to `"sql.sqlite"`. Available target names can be found with the `prql_get_targets()` function. ## Work with R Data Frames Using `{prqlr}` with the `{tidyquery}` package, we can execute PRQL queries against R Data Frames via `{dplyr}`. `{dplyr}` is a very popular R package for manipulating Data Frames, and the PRQL syntax is very similar to the `{dplyr}` syntax. Let's run a query that aggregates a Data Frame `flights`, contained in the `{nycflights13}` package. ```{r} library(tidyquery) library(nycflights13) " from flights filter (distance | in 200..300) filter air_time != null group {origin, dest} ( aggregate { num_flts = count this, avg_delay = (average arr_delay | math.round 0) } ) sort {-origin, avg_delay} take 2 " |> prql_compile() |> query() ``` This query can be written with `{dplyr}`'s functions as follows. ```{r} library(dplyr, warn.conflicts = FALSE) library(nycflights13) flights |> filter( distance |> between(200, 300), !is.na(air_time) ) |> group_by(origin, dest) |> summarise( num_flts = n(), avg_delay = mean(arr_delay, na.rm = TRUE) |> round(0), .groups = "drop" ) |> arrange(desc(origin), avg_delay) |> head(2) ``` Note that `{dplyr}` queries can be generated by the `tidyquery::show_dplyr()` function!