In this example, we show how gor_create can be used to prepare and construct a query closure. This both reduces repetition in code and simplifies iterative workflows in GOR.

Load packages

First load the gorr package. The tidyverse package is recommended, but for the sake of simplicity we load only the packages we use here:

library(gorr)
library(magrittr) # pipe
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

Next we make a conn object that holds information on the API we’re connecting to. gor_connect takes two parameters, api_key and project; if either is left out, it tries to read the environment variable GOR_API_KEY or GOR_API_PROJECT, respectively. Here the GOR_API_KEY environment variable is already defined, so supplying the function with only a target project suffices. After this we create a query function (a closure) so that we don’t have to reference conn again:

conn <- gor_connect(project = "ukbb_hg38")
#> Warning: 'gor_connect' is deprecated.
#> Use 'platform_connect' instead.
#> See help("Deprecated")
query <- gor_create(conn = conn)
query
#> ── GOR Creation Query ──────────────────────────────────────────────────────────
#> Connection
#>  Service Root: https://platform.wuxinextcodedev.com/api/query
#>  Project: ukbb_hg38
#> Definitions
#>   None
#> Create statements & virtual relations
#>   None
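
Since gor_connect falls back on the GOR_API_KEY and GOR_API_PROJECT environment variables, it can be useful to check which of them are set before connecting. A minimal sketch in base R follows; it does not call gorr, and the value in the commented-out Sys.setenv line is a placeholder rather than a real key.

```r
# Names of the environment variables mentioned in the text above.
vars <- c("GOR_API_KEY", "GOR_API_PROJECT")

# Sys.getenv() returns "" for unset variables, so nzchar() flags the ones that are set.
is_set <- nzchar(Sys.getenv(vars))
names(is_set) <- vars
is_set

# For the current session only, a missing value could be supplied like this
# (placeholder value, not a real key):
# Sys.setenv(GOR_API_KEY = "<your-api-key>")
```

Any variable reported as FALSE has to be passed to gor_connect explicitly or defined in the environment first.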

Now we can call that function with only the query parameter. Let’s search for genes containing BRCA and save the resulting table as a local data frame, mygenes:

mygenes <- query("gor #genes# | grep BRCA")
mygenes
#> # A tibble: 3 × 4
#>   chrom gene_start gene_end gene_symbol
#>   <chr>      <int>    <int> <chr>      
#> 1 chr13   32315085 32400268 BRCA2      
#> 2 chr17   43044294 43170245 BRCA1      
#> 3 chr17   43168169 43168249 BRCA1P1
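
Calling query with nothing but a query string works because gor_create returned a closure: a function that remembers the connection it was created with. The sketch below illustrates the idea in plain R. make_query and conn_stub are hypothetical stand-ins, not part of gorr, and nothing is sent to a server.

```r
# Hypothetical stand-in for a query factory; this is NOT how gorr is implemented.
make_query <- function(conn) {
  # The returned function keeps `conn` in its enclosing environment.
  function(query) {
    # A real client would send `query` to the service; here we only
    # return what would be sent, so the example runs offline.
    list(project = conn$project, query = query)
  }
}

conn_stub <- list(project = "ukbb_hg38")  # stub, not a real connection object
run_query <- make_query(conn_stub)

# The connection no longer needs to be passed at call time.
request <- run_query("gor #genes# | grep BRCA")
request$project
```

The design benefit is the same as in the example above: the connection details are stated once, and every later call only carries the part that changes.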

Next we can expand on our previously defined query function by passing it back into gor_create as the replace parameter. This time we include some definitions using the defs parameter and alias our local data frame so that we can reference it in remote queries. In GOR these are called virtual relations:

query <- gor_create(
  defs = "def variants = #dbsnp#",
  mygenes = mygenes, 
  replace = query
)
query
#> ── GOR Creation Query ──────────────────────────────────────────────────────────
#> Connection
#>  Service Root: https://platform.wuxinextcodedev.com/api/query
#>  Project: ukbb_hg38
#> Definitions
#>   def variants = #dbsnp#;
#> Create statements & virtual relations
#>  mygenes
#>    # A tibble: 3 × 4
#>      chrom gene_start gene_end gene_symbol
#>      <chr>      <int>    <int> <chr>      
#>    1 chr13   32315085 32400268 BRCA2      
#>    2 chr17   43044294 43170245 BRCA1      
#>    3 chr17   43168169 43168249 BRCA1P1
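
The printed definition ends with a semicolon, which is how GOR scripts separate def and create statements from the final query. As a rough illustration of how a definition and a query could be combined into a single script, here is a sketch in base R. build_script is a hypothetical helper and only an assumption about the general shape of such a script, not gorr's actual implementation.

```r
# Hypothetical helper: join definitions and a query into one GOR script.
build_script <- function(defs, query) {
  defs <- sub(";\\s*$", "", trimws(defs))  # drop a trailing semicolon, if any
  # Terminate every definition with ";" and put the query on the last line.
  paste0(paste0(defs, ";", collapse = "\n"), "\n", trimws(query))
}

script <- build_script(
  defs  = "def variants = #dbsnp#",
  query = "gor [mygenes] | join -segvar variants"
)
cat(script, "\n")
```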

Now that we have our updated query function, we can use it to gor our table of genes and join it to the #dbsnp# table we aliased as variants in the definitions above. The result is a list of all variants within each gene in our table:

brca_variants <- query("
    gor [mygenes] | join -segvar variants        
")

brca_variants %>% 
  group_by(gene_symbol) %>%
  summarize(records = n(),
            variants = n_distinct(rsids))
#> # A tibble: 3 × 3
#>   gene_symbol records variants
#>   <chr>         <int>    <int>
#> 1 BRCA1         60548    49772
#> 2 BRCA1P1          34       31
#> 3 BRCA2         42346    35724
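
To make the idea of joining segments to variants more concrete, the following sketch reproduces the general principle locally on made-up data: pair genes and variants on the same chromosome, then keep the pairs where the variant position falls inside the gene. This is only a conceptual approximation. It ignores GOR's exact boundary conventions and the length of the reference allele, and all values in genes_toy and variants_toy are invented.

```r
# Invented toy data; not taken from #genes# or #dbsnp#.
genes_toy <- data.frame(
  chrom       = c("chr1", "chr1"),
  gene_start  = c(100L, 150L),
  gene_end    = c(200L, 300L),
  gene_symbol = c("GENE_A", "GENE_B")
)
variants_toy <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr2"),
  pos   = c(120L, 180L, 400L, 150L),
  rsids = c("rs1", "rs2", "rs3", "rs4")
)

# All gene/variant pairs that share a chromosome.
pairs <- merge(genes_toy, variants_toy, by = "chrom")

# Keep only pairs where the variant lies within the gene segment.
overlaps <- pairs[pairs$pos >= pairs$gene_start & pairs$pos <= pairs$gene_end, ]
overlaps[order(overlaps$gene_symbol, overlaps$pos), ]
```

A variant that lies in two overlapping genes appears once per gene, which is also why the real result above is grouped by gene_symbol before counting.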

The difference between the number of records and the number of variants above can be explained by looking into the data:

target_variant <- 
  brca_variants %>%
  count(rsids, sort = TRUE) %>%  # most frequent rsID first
  slice(1) %>%
  pull(rsids)

target_variant
#> [1] "rs397838402"
brca_variants %>% filter(rsids == target_variant) %>% select(-distance)
#> # A tibble: 48 × 8
#>    chrom gene_start gene_end gene_symbol      pos reference         allele rsids
#>    <chr>      <int>    <int> <chr>          <int> <chr>             <chr>  <chr>
#>  1 chr13   32315085 32400268 BRCA2       32395970 TTTTTTTTTTTTTTTT… T      rs39…
#>  2 chr13   32315085 32400268 BRCA2       32395971 TTTTTTTTTTTTTTTT… T      rs39…
#>  3 chr13   32315085 32400268 BRCA2       32395972 TTTTTTTTTTTTTTTTT T      rs39…
#>  4 chr13   32315085 32400268 BRCA2       32395973 TTTTTTTTTTTTTTTT  T      rs39…
#>  5 chr13   32315085 32400268 BRCA2       32395974 TTTTTTTTTTTTTTT   T      rs39…
#>  6 chr13   32315085 32400268 BRCA2       32395975 TTTTTTTTTTTTTT    T      rs39…
#>  7 chr13   32315085 32400268 BRCA2       32395976 TTTTTTTTTTTTT     T      rs39…
#>  8 chr13   32315085 32400268 BRCA2       32395977 TTTTTTTTTTTT      T      rs39…
#>  9 chr13   32315085 32400268 BRCA2       32395978 TTTTTTTTTTT       T      rs39…
#> 10 chr13   32315085 32400268 BRCA2       32395979 TTTTTTTTTT        T      rs39…
#> # … with 38 more rows
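
In the output above, the same rsID appears on 48 rows, at consecutive positions and with different reference alleles. Each row counts as a record, but all of them share a single rsID, so n() grows while n_distinct(rsids) does not. The sketch below shows the same effect on a small invented table; toy and its values are made up for illustration.

```r
library(dplyr)

# Invented data: rs1 is listed at two positions within GENE_A.
toy <- tibble(
  gene_symbol = c("GENE_A", "GENE_A", "GENE_A", "GENE_B"),
  pos         = c(10L, 11L, 20L, 30L),
  rsids       = c("rs1", "rs1", "rs2", "rs3")
)

toy_summary <- toy %>%
  group_by(gene_symbol) %>%
  summarize(records  = n(),                # one per row
            variants = n_distinct(rsids))  # one per unique rsID
toy_summary
```

Here GENE_A has three records but only two distinct rsIDs, mirroring the gap between records and variants seen for the BRCA genes.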