Locate Event
locate_event.Rd
Locate Event
Usage
locate_event(
text,
landmark_gazetteer,
landmark_gazetteer.name_var = "name",
landmark_gazetteer.type_var = "type",
roads,
roads.name_var = "name",
areas,
areas.name_var = "name",
event_words,
prepositions_list = list(c("at", "next to", "around", "just after", "opposite", "opp",
"apa", "hapa", "happened at", "just before", "at the", "outside", "right before"),
c("near", "after", "toward", "along", "towards", "approach"), c("past", "from",
"on")),
junction_words = c("intersection", "junction"),
false_positive_phrases = "",
type_list = NULL,
clost_dist_thresh = 500,
fuzzy_match = TRUE,
fuzzy_match.min_word_length = c(5, 11),
fuzzy_match.dist = c(1, 2),
fuzzy_match.ngram_max = 3,
fuzzy_match.first_letters_same = TRUE,
fuzzy_match.last_letters_same = TRUE,
quiet = TRUE,
mc_cores = 1
)
Arguments
- text
Vector of texts to be geolocated.
- landmark_gazetteer
sf
spatial data.frame representing landmarks.- landmark_gazetteer.name_var
Name of variable indicating
name
of landmark.- landmark_gazetteer.type_var
Name of variable indicating
type
of landmark.- roads
sf
spatial data.frame representing roads.- roads.name_var
Name of variable indicating
name
of road.- areas
sf
spatial data.frame representing areas, such as administrative areas or neighborhoods.- areas.name_var
Name of variable indicating
name
of area.- event_words
Vector of event words, representing events to be geocoded.
- prepositions_list
List of vectors of prepositions. Order of list determines order of preposition precedence. (Default:
list(c("at", "next to","around", "just after", "opposite","opp", "apa", "hapa","happened at", "just before","at the","outside", "right before"), c("near", "after", "toward", "along", "towards", "approach"), c("past","from","on"))
).- junction_words
Vector of junction words to check for when determining intersection of roads. (Default:
c("intersection", "junction")
).- false_positive_phrases
Common words found in text that include spurious location references (eg, githurai bus is the name of a bus, but githurai is also a place). These may be common phrases that should be checked and ignored in the text. (Default:
""
).- type_list
List of vectors of types. Order of list determines order or type precedence. (Default:
NULL
).- clost_dist_thresh
Distance (meters) as to what is considered "close"; for example, when considering whether a landmark is close to a road. (Default:
500
).- fuzzy_match
Whether to implement fuzzy matching of landmarks using levenstein distance. (Default:
TRUE
).- fuzzy_match.min_word_length
Minimum word length to use for fuzzy matching; vector length must be the same as
fuzzy_match.dist
. (Default:c(5,11)
).- fuzzy_match.dist
Allowable levenstein distances for fuzzy matching; vector length must be same as
fuzzy_match.min_word_length
. (Default:c(1,2)
).- fuzzy_match.ngram_max
The number of n-grams that should be extracted from text to calculate a levensteing distance against landmarks. For example, if the text is composed of 5 words: w1 w2 w3 w4 and
fuzzy_match.ngram_max = 3
, the function extractsw1 w2 w3
and compares the levenstein distance to all landmarks. Then in checksw2 w3 w4
, etc. (Default:3
).- fuzzy_match.first_letters_same
When implementing a fuzzy match, should the first letter of the original and found word be the same? (Default:
TRUE
).- fuzzy_match.last_letters_same
When implementing a fuzzy match, should the last letter of the original and found word be the same? (Default:
TRUE
).- quiet
If
FALSE
, prints text that is being geocoded. (Default:TRUE
).- mc_cores
If > 1, uses geolocates events in parallel across multiple cores relying on the
parallel
package. (Default:1
).
Examples
library(ulex)
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
## Landmarks
landmarks_sf <- data.frame(lat = runif(3),
lon = runif(3),
name = c("restaurant", "bank", "hotel"),
type = c("poi", "poi", "poi")) |>
st_as_sf(coords = c("lon", "lat"),
crs = 4326)
## Road
coords <- matrix(runif(4), ncol = 2)
road_sf <- coords |>
st_linestring() |>
st_sfc(crs = 4326)
road_sf <- st_sf(geometry = road_sf)
road_sf$name <- "main st"
## Area
n <- 5
coords <- matrix(runif(2 * n, min = 0, max = 10), ncol = 2)
coords <- rbind(coords, coords[1,])
polygon <- st_polygon(list(coords))
area_sf <- st_sfc(polygon, crs = 4326)
area_sf <- st_sf(geometry = area_sf)
area_sf$name <- "place"
## Locate Event
event_sf <- locate_event(text = "accident near hotel",
landmark_gazetteer = landmarks_sf,
roads = road_sf,
areas = area_sf,
event_words = c("accident", "crash"))
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence
#> Warning: phrase is not in sentence