Skip to contents

Usage

The only function of rowcolr is extract_values(), which extracts structured data from Excel files by identifying row and column labels. The function works by drawing a horizontal line from a detected row label and a vertical line from a detected column label. The value at their intersection is extracted and stored.

The function provides flexibility by allowing both explicit identifiers and regex patterns:

  • If explicit identifiers (row_identifiers and col_identifiers) are provided, they take priority over regex.

  • If regex patterns (row_pattern, col_pattern) are provided, they act as a fallback when explicit identifiers are not exhaustive.

The package can be loaded with:

Extracting Data from an Excel File using specific row and column identifiers

If you have predefined row or column labels, you can use them directly by providing character vectors to the col_identifiers and row_identifiers arguments:

row_identifiers_example <- c("int_fixed_assets_row", "tang_fixed_assets_row",
                             "fin_fixed_assets_row", "tot_fixed_assets_row")
col_identifiers_example <- "2025_col"

dataset <- extract_values(rowcolr_example("example.xlsx"), 
                          row_identifiers = row_identifiers_example,
                          col_identifiers = col_identifiers_example)

head(dataset |> dplyr::select(-c(filename, sheet, row, col)))
#> # A tibble: 4 × 9
#>   row_label             col_label description    data_type error logical numeric
#>   <chr>                 <chr>     <chr>          <chr>     <chr> <lgl>     <dbl>
#> 1 int_fixed_assets_row  2025_col  int_fixed_ass… numeric   NA    NA       100000
#> 2 tang_fixed_assets_row 2025_col  tang_fixed_as… numeric   NA    NA       200000
#> 3 fin_fixed_assets_row  2025_col  fin_fixed_ass… numeric   NA    NA       150000
#> 4 tot_fixed_assets_row  2025_col  tot_fixed_ass… numeric   NA    NA       450000
#> # ℹ 2 more variables: date <dttm>, character <chr>

Extracting Data from an Excel File using regex patterns

Values can be extracted from an Excel file using regex patterns:

# Extract values using regex patterns
dataset <- extract_values(rowcolr_example("example.xlsx"), 
                          row_pattern = ".*_row$", 
                          col_pattern = ".*_col$")

head(dataset |> dplyr::select(-c(filename, sheet, row, col)))
#> # A tibble: 6 × 9
#>   row_label            col_label description data_type error logical  numeric
#>   <chr>                <chr>     <chr>       <chr>     <chr> <lgl>      <dbl>
#> 1 name_row             value_col _           character NA    NA            NA
#> 2 year_row             value_col _           numeric   NA    NA          2025
#> 3 kvk_row              value_col _           numeric   NA    NA      12345678
#> 4 test1_row            2026_col  _           character NA    NA            NA
#> 5 tes2_row             2026_col  _           character NA    NA            NA
#> 6 int_fixed_assets_row 2025_col  _           numeric   NA    NA        100000
#> # ℹ 2 more variables: date <dttm>, character <chr>

Extracting Data from an Excel File using both identifiers and regex patterns

You can combine the use of predefined row or column labels and regex patterns:

dataset <- extract_values(rowcolr_example("example.xlsx"), 
                          row_identifiers = row_identifiers_example,
                          col_identifiers = col_identifiers_example,
                          row_pattern = ".*_row$", 
                          col_pattern = ".*_col$")

No regex pattern or identifiers

If no identifiers or regex are provided, extract_values() will return an error:

dataset <- extract_values(rowcolr_example("example.xlsx"))

#> Error: No identifiers or regex patterns provided. Please specify 
#> `row_identifiers`, `col_identifiers`, `row_pattern`, or `col_pattern`.

Controlling Description Cleaning

By default, row and column labels are cleaned by removing row and column patterns before combining them into the description. To disable this, use clean_description = FALSE:

dataset <- extract_values(rowcolr_example("example.xlsx"), 
                          row_pattern = ".*_row$", 
                          col_pattern = ".*_col$",
                          clean_description = FALSE)

head(dataset |> 
       dplyr::select(c(row_label, col_label, description)))
#> # A tibble: 6 × 3
#>   row_label            col_label description                  
#>   <chr>                <chr>     <chr>                        
#> 1 name_row             value_col name_row_value_col           
#> 2 year_row             value_col year_row_value_col           
#> 3 kvk_row              value_col kvk_row_value_col            
#> 4 test1_row            2026_col  test1_row_2026_col           
#> 5 tes2_row             2026_col  tes2_row_2026_col            
#> 6 int_fixed_assets_row 2025_col  int_fixed_assets_row_2025_col

Conclusion

The rowcolr package is a powerful tool for extracting structured data from Excel files based on customizable row and column patterns. It is designed to be flexible, allowing for both regex-based matching and predefined lists of identifiers.