---
title: Use PRQL on R
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Use PRQL on R}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r}
#| include: false
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(prqlr)
```
[PRQL](https://prql-lang.org/) (Pipelined Relational Query Language, pronounced "Prequel")
is a modern language for transforming data, can be compiled to SQL.
This package provides a simple function to convert a PRQL query string to a SQL string.
For example, this is a PRQL query.
```{prql}
#| label: sample
#| eval: false
from mtcars
filter cyl > 6
select {cyl, mpg}
derive {mpg_int = math.round 0 mpg}
```
And, this is the SQL query that is compiled from the PRQL query.
```{prql}
#| label: sample
#| echo: false
#| engine-opts:
#| signature_comment: false
```
To compile a PRQL string, just pass the query string to the `prql_compile()` function, like this.
```{r}
library(prqlr)
"
<>
" |>
prql_compile() |>
cat()
```
This output SQL query string can be used with already existing great packages that manipulate data with SQL.
## Work with DB
Using it with the `{DBI}` package, we can execute PRQL queries against the database.
```{r}
library(DBI)
# Create an ephemeral in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")
# Create a table inclueds `mtcars` data
dbWriteTable(con, "mtcars", mtcars)
# Execute a PRQL query
"
<>
take 3
" |>
prql_compile("sql.sqlite") |>
dbGetQuery(con, statement = _)
```
We can also use the `sqldf::sqldf()` function to automatically register Data Frames to the database.
```{r}
"
<>
take 3
" |>
prql_compile("sql.sqlite") |>
sqldf::sqldf()
```
Since SQLite is used here via `{RSQLite}`, the `target` option of `prql_compile()` is set to `"sql.sqlite"`.
Available target names can be found with the `prql_get_targets()` function.
## Work with R Data Frames
Using `{prqlr}` with the `{tidyquery}` package, we can execute PRQL queries against R Data Frames via `{dplyr}`.
`{dplyr}` is a very popular R package for manipulating Data Frames,
and the PRQL syntax is very similar to the `{dplyr}` syntax.
Let's run a query that aggregates a Data Frame `flights`, contained in the `{nycflights13}` package.
```{r}
library(tidyquery)
library(nycflights13)
"
from flights
filter (distance | in 200..300)
filter air_time != null
group {origin, dest} (
aggregate {
num_flts = count this,
avg_delay = (average arr_delay | math.round 0)
}
)
sort {-origin, avg_delay}
take 2
" |>
prql_compile() |>
query()
```
This query can be written with `{dplyr}`'s functions as follows.
```{r}
library(dplyr, warn.conflicts = FALSE)
library(nycflights13)
flights |>
filter(
distance |> between(200, 300),
!is.na(air_time)
) |>
group_by(origin, dest) |>
summarise(
num_flts = n(),
avg_delay = mean(arr_delay, na.rm = TRUE) |> round(0),
.groups = "drop"
) |>
arrange(desc(origin), avg_delay) |>
head(2)
```
Note that `{dplyr}` queries can be generated by the `tidyquery::show_dplyr()` function!