Usage
The only function of rowcolr
is
extract_values()
, which extracts structured data from Excel
files by identifying row and column labels. The function works by
drawing a horizontal line from a detected row label and a vertical line
from a detected column label. The value at their intersection is
extracted and stored.
The function provides flexibility by allowing both explicit identifiers and regex patterns:
If explicit identifiers (
row_identifiers
andcol_identifiers
) are provided, they take priority over regex.If regex patterns (
row_pattern
,col_pattern
) are provided, they act as a fallback when explicit identifiers are not exhaustive.
The package can be loaded with:
Extracting Data from an Excel File using specific row and column identifiers
If you have predefined row or column labels, you can use them
directly by providing character vectors to the
col_identifiers
and row_identifiers
arguments:
row_identifiers_example <- c("int_fixed_assets_row", "tang_fixed_assets_row",
"fin_fixed_assets_row", "tot_fixed_assets_row")
col_identifiers_example <- "2025_col"
dataset <- extract_values(rowcolr_example("example.xlsx"),
row_identifiers = row_identifiers_example,
col_identifiers = col_identifiers_example)
head(dataset |> dplyr::select(-c(filename, sheet, row, col)))
#> # A tibble: 4 × 9
#> row_label col_label description data_type error logical numeric
#> <chr> <chr> <chr> <chr> <chr> <lgl> <dbl>
#> 1 int_fixed_assets_row 2025_col int_fixed_ass… numeric NA NA 100000
#> 2 tang_fixed_assets_row 2025_col tang_fixed_as… numeric NA NA 200000
#> 3 fin_fixed_assets_row 2025_col fin_fixed_ass… numeric NA NA 150000
#> 4 tot_fixed_assets_row 2025_col tot_fixed_ass… numeric NA NA 450000
#> # ℹ 2 more variables: date <dttm>, character <chr>
Extracting Data from an Excel File using regex patterns
Values can be extracted from an Excel file using regex patterns:
# Extract values using regex patterns
dataset <- extract_values(rowcolr_example("example.xlsx"),
row_pattern = ".*_row$",
col_pattern = ".*_col$")
head(dataset |> dplyr::select(-c(filename, sheet, row, col)))
#> # A tibble: 6 × 9
#> row_label col_label description data_type error logical numeric
#> <chr> <chr> <chr> <chr> <chr> <lgl> <dbl>
#> 1 name_row value_col _ character NA NA NA
#> 2 year_row value_col _ numeric NA NA 2025
#> 3 kvk_row value_col _ numeric NA NA 12345678
#> 4 test1_row 2026_col _ character NA NA NA
#> 5 tes2_row 2026_col _ character NA NA NA
#> 6 int_fixed_assets_row 2025_col _ numeric NA NA 100000
#> # ℹ 2 more variables: date <dttm>, character <chr>
Extracting Data from an Excel File using both identifiers and regex patterns
You can combine the use of predefined row or column labels and regex patterns:
dataset <- extract_values(rowcolr_example("example.xlsx"),
row_identifiers = row_identifiers_example,
col_identifiers = col_identifiers_example,
row_pattern = ".*_row$",
col_pattern = ".*_col$")
No regex pattern or identifiers
If no identifiers or regex are provided,
extract_values()
will return an error:
dataset <- extract_values(rowcolr_example("example.xlsx"))
#> Error: No identifiers or regex patterns provided. Please specify
#> `row_identifiers`, `col_identifiers`, `row_pattern`, or `col_pattern`.
Controlling Description Cleaning
By default, row and column labels are cleaned by removing row and
column patterns before combining them into the description. To disable
this, use clean_description = FALSE
:
dataset <- extract_values(rowcolr_example("example.xlsx"),
row_pattern = ".*_row$",
col_pattern = ".*_col$",
clean_description = FALSE)
head(dataset |>
dplyr::select(c(row_label, col_label, description)))
#> # A tibble: 6 × 3
#> row_label col_label description
#> <chr> <chr> <chr>
#> 1 name_row value_col name_row_value_col
#> 2 year_row value_col year_row_value_col
#> 3 kvk_row value_col kvk_row_value_col
#> 4 test1_row 2026_col test1_row_2026_col
#> 5 tes2_row 2026_col tes2_row_2026_col
#> 6 int_fixed_assets_row 2025_col int_fixed_assets_row_2025_col