abstract_lut
LutDataLoader(bounds, params)
Bases: DataLoaderInterface
Abstract class for all look-up table (LUT) datasets.
This is where large-scale operations such as importing data and renaming variables are performed. Downsampling and reprojecting are not supported by LUT dataloaders.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bounds | Boundary | Initial mesh boundary to limit scope of data ingest | required |
| params | dict | Values needed by dataloader to initialise. Unique to each dataloader | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| self.data | DataFrame | Data stored by the dataloader for use when called upon by the mesh. Must be saved in EPSG:4326 (WGS 84 latitude/longitude), with columns 'geometry' and data_name. |
| self.data_name | str | Name of scalar variable. Must be the column name in the dataframe |
Raises:
| Type | Description |
|---|---|
| ValueError | If no data lies within the parsed boundary |
add_default_params(params)
Set default values for all LUT dataloaders. This function should be overridden to include any extra params for a specific dataloader.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| params | dict | Dictionary containing attributes that are required for each dataloader. | required |
Returns:
| Type | Description |
|---|---|
| dict | Dictionary of attributes the dataloader will require, completed with default values if not provided in config. |
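The default-filling and override pattern described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the class names `BaseLutLoader` and `ExampleLutLoader`, the keys `aggregate_type` and `value`, and their default values are all assumptions made for the example.

```python
class BaseLutLoader:
    """Hypothetical stand-in for the abstract LUT dataloader."""

    def add_default_params(self, params):
        # Illustrative defaults only; the real keys depend on the dataloader.
        defaults = {"data_name": None, "aggregate_type": "MEAN"}
        # Later entries win, so values supplied in params override defaults.
        return {**defaults, **params}


class ExampleLutLoader(BaseLutLoader):
    """Hypothetical subclass that adds one loader-specific default."""

    def add_default_params(self, params):
        # Fill in the shared defaults first, then add the extra parameter.
        params = super().add_default_params(params)
        params.setdefault("value", 1.0)  # hypothetical extra parameter
        return params
```

Calling `super()` first keeps the shared defaults in one place, so a specific dataloader only has to state what it adds.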
calculate_coverage(bounds, data=None)
Calculates the fraction of the boundary covered by the dataset
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bounds | Boundary | Boundary being compared against | required |
| data | DataFrame | Dataset with shapely polygons in the 'geometry' column. Defaults to the object's internal dataset. | None |
Returns:
| Type | Description |
|---|---|
| float | Decimal fraction of boundary covered by the dataset |
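As a rough illustration of what a coverage fraction means, the sketch below uses plain `(min_x, min_y, max_x, max_y)` rectangles in place of the Boundary object and shapely polygons. The function names are hypothetical, and the calculation assumes the dataset regions do not overlap one another; the real method works on arbitrary polygons.

```python
def overlap_area(a, b):
    """Area shared by two (min_x, min_y, max_x, max_y) rectangles."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    # A negative width or height means the rectangles do not intersect.
    return max(width, 0.0) * max(height, 0.0)


def coverage_fraction(bounds, regions):
    """Fraction of bounds covered by regions (assumed mutually disjoint)."""
    bounds_area = (bounds[2] - bounds[0]) * (bounds[3] - bounds[1])
    covered = sum(overlap_area(bounds, region) for region in regions)
    return covered / bounds_area
```

For a 10 x 10 boundary with one region covering its left half and another clipping a 2 x 2 corner, the result is 0.54.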
downsample()
Downsampling not supported by LookUpTable Dataloader
get_data_col_name()
Retrieve name of data column. Used when data_name is not defined in params.
Returns:
| Type | Description |
|---|---|
| str | Name of data column |
Raises:
| Type | Description |
|---|---|
| AssertionError | If multiple possible data columns are found, so the data name cannot be retrieved |
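One way to picture this lookup is shown below, operating on a plain list of column names instead of a DataFrame. It is a sketch under the assumption that every column other than 'geometry' is a candidate data column; the function name is hypothetical and the real method may exclude further columns.

```python
def find_data_col_name(columns):
    """Return the single column name that is not 'geometry'."""
    candidates = [name for name in columns if name != "geometry"]
    # Ambiguous (or missing) data columns cannot be resolved automatically.
    assert len(candidates) == 1, (
        f"Expected exactly one data column, found {candidates}"
    )
    return candidates[0]
```

Note that this sketch also fails when no data column exists at all, which goes slightly beyond the documented "multiple columns" case.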
get_hom_condition(bounds, splitting_conds, data=None)
Retrieves homogeneity condition of data within boundary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bounds | Boundary | Boundary object with limits of datarange to analyse | required |
| splitting_conds | dict | Containing the following keys: 'boundary': | required |
Returns:
| Type | Description |
|---|---|
| str | The homogeneity condition, one of: 'CLR' (the boundary is completely contained within the LUT regions, no need to split); 'MIN' (the boundary contains no LUT data, can't split); 'HET' (the boundary contains an edge within the LUT data, should split). |
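The three outcomes can be illustrated with a simplified decision rule. The sketch below is not the library's algorithm: it uses `(min_x, min_y, max_x, max_y)` rectangles instead of Boundary objects and shapely polygons, ignores `splitting_conds`, and treats 'CLR' as "a single region contains the whole boundary". All function names are hypothetical.

```python
def contains(outer, inner):
    """True if rectangle outer fully contains rectangle inner."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def overlaps(a, b):
    """True if two rectangles share some interior area."""
    return (min(a[2], b[2]) > max(a[0], b[0])
            and min(a[3], b[3]) > max(a[1], b[1]))


def hom_condition(bounds, regions):
    """Classify bounds against LUT regions as 'CLR', 'MIN' or 'HET'."""
    if any(contains(region, bounds) for region in regions):
        return "CLR"  # entirely inside a region: nothing to split
    if not any(overlaps(region, bounds) for region in regions):
        return "MIN"  # no LUT data inside the boundary: can't split
    return "HET"      # a region edge crosses the boundary: should split
```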
get_value(bounds, agg_type=None, skipna=False, data=None)
Retrieve aggregated value from within bounds
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| agg_type | str | Method of aggregation of datapoints within bounds. Can be upper or lower case. Accepts 'MIN', 'MAX', 'MEAN', 'MEDIAN', 'STD', 'COUNT' | None |
| bounds | Boundary | Boundary object with limits of lat/long | required |
| skipna | bool | Whether to skip NaNs instead of propagating them. Defaults to False (NaNs are included). | False |
Returns:
| Type | Description |
|---|---|
| dict | {variable (str): aggregated_value (float)}. Aggregated value within bounds, computed according to agg_type |
Raises:
| Type | Description |
|---|---|
| ValueError | If the aggregation type is not in the list of available methods |
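The aggregation behaviour described above can be sketched with the standard library alone. This is an illustration rather than the library's code: the function name `aggregate` is hypothetical, it takes a plain list of floats instead of selecting data within `bounds`, and the choices of sample standard deviation for 'STD' and of counting NaNs in 'COUNT' when `skipna` is False are assumptions.

```python
import math
import statistics

# Aggregation functions keyed by upper-case name ('COUNT' is handled below).
AGGREGATORS = {
    "MIN": min,
    "MAX": max,
    "MEAN": statistics.fmean,
    "MEDIAN": statistics.median,
    "STD": statistics.stdev,  # sample standard deviation (an assumption)
}


def aggregate(values, data_name, agg_type, skipna=False):
    """Aggregate a list of floats into {data_name: aggregated_value}."""
    key = agg_type.upper()  # accept upper or lower case
    if key != "COUNT" and key not in AGGREGATORS:
        raise ValueError(f"Unknown aggregation type: {agg_type!r}")

    has_nan = any(math.isnan(v) for v in values)
    if skipna:
        values = [v for v in values if not math.isnan(v)]
    if key == "COUNT":
        return {data_name: float(len(values))}
    # Without skipna, a single NaN makes the result NaN.
    if has_nan and not skipna:
        return {data_name: math.nan}
    # Too few values to aggregate (stdev needs at least two).
    if len(values) < (2 if key == "STD" else 1):
        return {data_name: math.nan}
    return {data_name: float(AGGREGATORS[key](values))}
```

NaN propagation is handled explicitly because Python's built-in `min` and `max` do not propagate NaN reliably.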
import_data(bounds)
abstractmethod
User-defined method for importing data from files, or for generating data from scratch
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Coordinates and data being imported from file. Must have columns 'geometry' and data_name, and must have a single data column. |
reproject()
Reprojection not supported by LookUpTable Dataloader
set_data_col_name(new_name)
Sets name of data column/data variable
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| new_name | str | Name to replace currently stored name with | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Data with variable name changed |
trim_datapoints(bounds, data=None)
Trims self.data to the datapoints that lie within the boundary defined by 'bounds'. self.data can be pd.DataFrame or xr.Dataset
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bounds | Boundary | Limits of lat/long/time to select data from | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Trimmed dataset in same format as self.data |
verify_data(data=None)
Verifies that all geometries read in are Polygons or MultiPolygons. Any MultiPolygon is split out into multiple Polygons.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | DataFrame with at least columns 'geometry' and a variable. Defaults to dataloader's data attribute. | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If a geometry is read in that is not a Polygon or MultiPolygon |
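The validate-and-split step can be illustrated without shapely by using GeoJSON-style geometry dictionaries. This is a sketch under stated assumptions: the function name `split_multipolygons` is hypothetical, rows are plain `(geometry, value)` pairs instead of DataFrame rows, and each split Polygon is assumed to keep the value of its parent MultiPolygon.

```python
def split_multipolygons(rows):
    """Validate geometry types and expand each MultiPolygon into Polygons.

    rows is a list of (geometry, value) pairs with GeoJSON-style geometries.
    """
    verified = []
    for geometry, value in rows:
        if geometry["type"] == "Polygon":
            verified.append((geometry, value))
        elif geometry["type"] == "MultiPolygon":
            # A MultiPolygon's coordinates are a list of Polygon coordinates;
            # each member polygon inherits the value of its parent.
            for coords in geometry["coordinates"]:
                polygon = {"type": "Polygon", "coordinates": coords}
                verified.append((polygon, value))
        else:
            raise ValueError(f"Unsupported geometry type: {geometry['type']}")
    return verified
```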