abstract_lut

LutDataLoader(bounds, params)

Bases: DataLoaderInterface

Abstract class for all LookUp Table Datasets.

This is where large-scale operations are performed, such as importing data, downsampling, reprojecting, and renaming variables.

Parameters:

Name Type Description Default
bounds Boundary

Initial mesh boundary to limit scope of data ingest

required
params dict

Values needed by dataloader to initialise. Unique to each dataloader

required

Attributes:

Name Type Description
self.data DataFrame

Data stored by the dataloader for use when called upon by the mesh. Must be saved in EPSG:4326 (WGS 84 latitude/longitude), with columns 'geometry' and data_name.

self.data_name str

Name of scalar variable. Must be the column name in the dataframe

Raises:

Type Description
ValueError

If no data lies within the passed boundary

add_default_params(params)

Sets default values for all LUT dataloaders. Subclasses should overload this function to include any extra params for a specific dataloader.

Parameters:

Name Type Description Default
params dict

Dictionary containing attributes that are required for each dataloader.

required

Returns:

Type Description
dict

Dictionary of attributes the dataloader will require, completed with default values if not provided in config.
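
The pattern described above can be sketched with stand-in classes. This is an illustration only, not the library's implementation: the class names and the default keys ('data_name', 'aggregate_type', 'scale_factor') are hypothetical and simply show how a specific dataloader might extend the shared defaults.

```python
class BaseLoaderSketch:
    """Stand-in for a LUT dataloader base class (not the library's class)."""

    def add_default_params(self, params: dict) -> dict:
        # Hypothetical defaults shared by every loader in this sketch.
        defaults = {"data_name": None, "aggregate_type": "MEAN"}
        # Values supplied in the config take priority over the defaults.
        return {**defaults, **params}


class ScaledLoaderSketch(BaseLoaderSketch):
    """Hypothetical loader that needs one extra parameter."""

    def add_default_params(self, params: dict) -> dict:
        # Start from the shared defaults, then add the loader-specific one.
        params = super().add_default_params(params)
        params.setdefault("scale_factor", 1.0)
        return params


completed = ScaledLoaderSketch().add_default_params({"aggregate_type": "MAX"})
print(completed)
```

Calling the parent implementation first keeps the shared defaults in one place, so a subclass only has to state what it adds.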

calculate_coverage(bounds, data=None)

Calculates the fraction of the boundary covered by the dataset

Parameters:

Name Type Description Default
bounds Boundary

Boundary being compared against

required
data DataFrame

Dataset with shapely polygons in the 'geometry' column. Defaults to the object's internal dataset.

None

Returns:

Name Type Description
float

Decimal fraction of boundary covered by the dataset
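
To make the returned quantity concrete, here is a minimal sketch of a coverage calculation. It is a simplification under stated assumptions: axis-aligned boxes stand in for the shapely polygons the dataloader actually stores, the regions are assumed not to overlap one another, and areas are measured in square degrees. The function names are hypothetical.

```python
from typing import List, Tuple

# (long_min, lat_min, long_max, lat_max); a stand-in for a polygon or Boundary
Box = Tuple[float, float, float, float]


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0.0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def coverage_fraction(bounds: Box, regions: List[Box]) -> float:
    """Fraction of `bounds` covered by `regions` (assumed mutually disjoint)."""
    bounds_area = (bounds[2] - bounds[0]) * (bounds[3] - bounds[1])
    covered = sum(overlap_area(bounds, region) for region in regions)
    return covered / bounds_area


bounds = (0.0, 0.0, 4.0, 2.0)
regions = [(-1.0, 0.0, 1.0, 2.0), (3.0, 1.0, 5.0, 3.0)]
fraction = coverage_fraction(bounds, regions)
print(fraction)  # 0.375: 3 of the 8 square degrees in bounds are covered
```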

downsample()

Downsampling not supported by LookUpTable Dataloader

get_data_col_name()

Retrieves the name of the data column. Used when data_name is not defined in params.

Returns:

Name Type Description
str

Name of data column

Raises:

Type Description
AssertionError

If multiple possible data columns are found, so the data name cannot be determined
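
The lookup described above can be sketched as follows. This is an assumption-laden illustration rather than the library's code: it works on a plain list of column names, the function name is hypothetical, and it also raises when no data column is found, a case the documentation does not cover.

```python
from typing import List


def infer_data_col_name(columns: List[str]) -> str:
    """Return the single column that is not 'geometry'."""
    candidates = [col for col in columns if col != "geometry"]
    if len(candidates) != 1:
        # Ambiguous (or missing) data column: the name cannot be inferred.
        raise AssertionError(f"Expected exactly one data column, found {candidates}")
    return candidates[0]


print(infer_data_col_name(["geometry", "dummy_var"]))  # dummy_var
```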

get_hom_condition(bounds, splitting_conds, data=None)

Retrieves homogeneity condition of data within boundary.

Parameters:

Name Type Description Default
bounds Boundary

Boundary object with limits of the data range to analyse

required
splitting_conds dict

Containing the following keys:

'boundary': (boolean) True if user wants to split when polygon boundary goes through bounds

required

Returns:

Name Type Description
str

The homogeneity condition returned is one of:

'CLR' = the boundary is completely contained within the LUT regions, no need to split

'MIN' = the boundary contains no LUT data, can't split

'HET' = the boundary contains an edge within the LUT data, should split
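
The three conditions can be illustrated with a small sketch. It rests on several assumptions: axis-aligned boxes stand in for both the Boundary object and the LUT polygons, the function names are hypothetical, and returning 'CLR' when splitting_conds['boundary'] is False is a guess at the intended behaviour, not something the documentation states.

```python
from typing import List, Tuple

# (long_min, lat_min, long_max, lat_max); a stand-in for a polygon or Boundary
Box = Tuple[float, float, float, float]


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share a region of non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def hom_condition(bounds: Box, regions: List[Box], splitting_conds: dict) -> str:
    if any(contains(region, bounds) for region in regions):
        return "CLR"  # bounds sits wholly inside one LUT region
    if not any(overlaps(region, bounds) for region in regions):
        return "MIN"  # no LUT data inside bounds
    # Otherwise the edge of at least one region passes through bounds.
    # Falling back to 'CLR' when 'boundary' is False is an assumption.
    return "HET" if splitting_conds["boundary"] else "CLR"


lut_regions = [(0.0, 0.0, 10.0, 10.0)]
conds = {"boundary": True}
print(hom_condition((2.0, 2.0, 4.0, 4.0), lut_regions, conds))      # CLR
print(hom_condition((20.0, 20.0, 22.0, 22.0), lut_regions, conds))  # MIN
print(hom_condition((8.0, 8.0, 12.0, 12.0), lut_regions, conds))    # HET
```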

get_value(bounds, agg_type=None, skipna=False, data=None)

Retrieve aggregated value from within bounds

Parameters:

Name Type Description Default
agg_type str

Method of aggregation of datapoints within bounds. Can be upper or lower case. Accepts 'MIN', 'MAX', 'MEAN', 'MEDIAN', 'STD', 'COUNT'

required
bounds Boundary

Boundary object with limits of lat/long

required
skipna bool

Defines whether to propagate NaNs or not. Default = False (includes NaNs)

False

Returns:

Name Type Description
dict

{variable (str): aggregated_value (float)}. Aggregated value within bounds, computed according to agg_type

Raises:

Type Description
ValueError

If agg_type is not in the list of available methods
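
A minimal sketch of the aggregation step, using only the standard library on a plain list of values. The function name is hypothetical and several behaviours are assumptions rather than documented facts: 'STD' is taken to be the sample standard deviation, a NaN in the input makes every result except 'COUNT' NaN when skipna is False, and too few datapoints yields NaN.

```python
import math
import statistics
from typing import Dict, List

AGGREGATORS = {
    "MIN": min,
    "MAX": max,
    "MEAN": statistics.fmean,
    "MEDIAN": statistics.median,
    "STD": statistics.stdev,  # sample standard deviation (assumption)
    "COUNT": len,
}


def aggregate(values: List[float], data_name: str,
              agg_type: str = "MEAN", skipna: bool = False) -> Dict[str, float]:
    method = agg_type.upper()  # accept upper or lower case
    if method not in AGGREGATORS:
        raise ValueError(f"Unknown aggregation type: {agg_type}")

    has_nan = any(math.isnan(v) for v in values)
    if skipna:
        # Drop NaNs before aggregating.
        values = [v for v in values if not math.isnan(v)]
    elif has_nan and method != "COUNT":
        # NaNs are kept, so they propagate into the result.
        return {data_name: math.nan}

    # statistics.stdev needs at least two points; the others need one.
    min_points = 2 if method == "STD" else 1
    if method != "COUNT" and len(values) < min_points:
        return {data_name: math.nan}
    return {data_name: float(AGGREGATORS[method](values))}


samples = [1.0, 3.0, math.nan]
print(aggregate(samples, "depth", "mean", skipna=True))  # {'depth': 2.0}
print(aggregate(samples, "depth", "MAX"))                # {'depth': nan}
```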

import_data(bounds) abstractmethod

User-defined method for importing data from files, or for generating data from scratch

Returns:

Type Description

pd.DataFrame: Coordinates and data being imported from file. The DataFrame:

- Must have columns 'geometry' and data_name
- Must have a single data column

reproject()

Reprojection not supported by LookUpTable Dataloader

set_data_col_name(new_name)

Sets name of data column/data variable

Parameters:

Name Type Description Default
new_name str

Name to replace currently stored name with

required

Returns:

Type Description

pd.DataFrame: Data with variable name changed

trim_datapoints(bounds, data=None)

Trims self.data to the datapoints that lie within the boundary defined by 'bounds'. self.data can be a pd.DataFrame or xr.Dataset.

Parameters:

Name Type Description Default
bounds Boundary

Limits of lat/long/time to select data from

required

Returns:

Type Description

pd.DataFrame: Trimmed dataset in same format as self.data

verify_data(data=None)

Verifies that all geometries read in are Polygons or MultiPolygons. Any MultiPolygon is split out into multiple Polygons.

Parameters:

Name Type Description Default
data DataFrame

DataFrame with at least columns 'geometry' and a variable. Defaults to dataloader's data attribute.

None

Raises:

Type Description
ValueError

If a geometry that was read in is not a Polygon or MultiPolygon
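
The check-and-split step can be sketched without any geometry library. Note the swap: this illustration uses GeoJSON-style geometry dictionaries in a list of row dictionaries, whereas the dataloader works on shapely geometries in a DataFrame. The function name and the 'dummy_var' column are hypothetical.

```python
from typing import Dict, List


def split_multipolygons(rows: List[Dict]) -> List[Dict]:
    """Return rows with every MultiPolygon replaced by one row per Polygon."""
    verified = []
    for row in rows:
        geom = row["geometry"]
        if geom["type"] == "Polygon":
            verified.append(row)
        elif geom["type"] == "MultiPolygon":
            # Each entry of a MultiPolygon's coordinates is one Polygon's rings.
            for polygon_coords in geom["coordinates"]:
                part = {"type": "Polygon", "coordinates": polygon_coords}
                # The data value is copied onto every resulting Polygon.
                verified.append({**row, "geometry": part})
        else:
            raise ValueError(f"Unsupported geometry type: {geom['type']}")
    return verified


square = [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]
far_square = [[[5, 5], [6, 5], [6, 6], [5, 6], [5, 5]]]
rows = [
    {"geometry": {"type": "Polygon", "coordinates": square}, "dummy_var": 1.0},
    {"geometry": {"type": "MultiPolygon", "coordinates": [square, far_square]},
     "dummy_var": 2.0},
]
result = split_multipolygons(rows)
print(len(result))  # 3
```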