abstract_vector

`VectorDataLoader(bounds, params)`

Bases: DataLoaderInterface

Abstract class for all vector Datasets.

This is where large-scale operations are performed, such as importing data, downsampling, reprojecting, and renaming variables.

Parameters:

Name	Type	Description	Default
`bounds`	`Boundary`	Initial mesh boundary to limit scope of data ingest	required
`params`	`dict`	Values needed by dataloader to initialise. Unique to each dataloader	required

Attributes:

Name	Type	Description
`self.data`	`DataFrame or Dataset`	Data stored by the dataloader for use when called upon by the mesh. Must be saved in the WGS 84 geographic coordinate system (EPSG:4326), with coordinate names 'lat', 'long', and 'time' (if applicable).
`self.data_name`	`str`	Name(s) of the data variables. Must be the column name(s) if self.data is a pd.DataFrame, or the variable name(s) if self.data is an xr.Dataset.

`add_default_params(params)`

Sets default values for all vector dataloaders. This function should be overridden to include any extra params for a specific dataloader.

Parameters:

Name	Type	Description	Default
`params`	`dict`	Dictionary containing attributes that are required for each dataloader.	required

Returns:

Type	Description
`dict`	Dictionary of attributes the dataloader will require, completed with default values if not provided in config.
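
As a rough illustration of the "complete with defaults" behaviour described above, the sketch below merges a hypothetical defaults dictionary into user-supplied params. The function name `fill_default_params` and every default key other than `downsample_factors` are assumptions made for the example, not the library's actual defaults.

```python
def fill_default_params(params: dict) -> dict:
    """Return a copy of params with missing keys filled from defaults."""
    # Hypothetical defaults, chosen only for illustration.
    defaults = {
        "downsample_factors": (1, 1),
        "aggregate_type": "MEAN",
        "min_dp": 5,
    }
    # Values supplied in params take precedence over the defaults.
    return {**defaults, **params}


completed = fill_default_params({"aggregate_type": "MAX"})
```

A subclass that needs extra parameters could apply its own defaults first and then defer to the parent implementation.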

`add_mag_dir(data=None, data_names=None)`

Adds magnitude and direction variables/columns to the data for easier retrieval of values.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame or Dataset`	Data with 'lat' and 'long' columns/dimensions. Assumes that the existing data is in cartesian form (x and y components). If None, will use self.data	`None`
`data_names`	`list`	List of data columns/variables to use in the calculation. If None, will use self.data_name_list	`None`

Returns:

Name	Type	Description
`data`	`DataFrame or Dataset`	Original dataset with two new columns/variables called '_magnitude' and '_direction', containing the corresponding values for each.
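
The conversion from cartesian components to magnitude and direction might look like the sketch below. It works on plain dictionaries rather than a DataFrame or Dataset, the helper name `with_mag_dir` and the column names `u` and `v` are hypothetical, and the direction convention (degrees counter-clockwise from the positive x axis) is an assumption; the library may use a different one.

```python
import math


def with_mag_dir(rows, x_name, y_name):
    """Return copies of rows with '_magnitude' and '_direction' added."""
    out = []
    for row in rows:
        x, y = row[x_name], row[y_name]
        new_row = dict(row)
        new_row["_magnitude"] = math.hypot(x, y)
        # Assumed convention: degrees counter-clockwise from the +x axis.
        new_row["_direction"] = math.degrees(math.atan2(y, x))
        out.append(new_row)
    return out


rows = [{"lat": 0.0, "long": 0.0, "u": 3.0, "v": 4.0}]
result = with_mag_dir(rows, "u", "v")
```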

`calc_curl(bounds, data=None, collapse=True, agg_type='MAX')`

Calculates the curl of vectors in a cellbox.

Parameters:

Name	Type	Description	Default
`bounds`	`Boundary`	Cellbox boundary in which all relevant vectors are contained	required
`data`	`DataFrame or Dataset`	Dataset with 'lat' and 'long' columns/dimensions with vectors	`None`
`collapse`	`bool`	Flag determining whether to return an aggregated value, or a vector field (values for each individual vector).	`True`
`agg_type`	`str`	Method of aggregation if collapsing value. Accepts 'MAX' or 'MEAN'	`'MAX'`

Returns:

Type	Description
`float or pd.DataFrame`	Float value of aggregated curl if collapse=True, or pd.DataFrame of the curl vector field if collapse=False

Raises:

Type	Description
`ValueError`	If agg_type is not 'MAX' or 'MEAN'
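
For intuition, the sketch below computes the scalar curl of a 2D field (dv/dx - du/dy) with central differences on a regular grid, then aggregates it. It assumes gridded data with unit spacing and treats coordinates as planar; the names `curl_field` and `aggregate` are hypothetical, and the library's own differencing scheme may differ.

```python
def curl_field(u, v, dx=1.0, dy=1.0):
    """Central-difference curl (dv/dx - du/dy) at interior grid points.

    u and v are nested lists indexed as [row (y)][column (x)].
    """
    n_rows, n_cols = len(u), len(u[0])
    curls = []
    for j in range(1, n_rows - 1):
        for i in range(1, n_cols - 1):
            dv_dx = (v[j][i + 1] - v[j][i - 1]) / (2 * dx)
            du_dy = (u[j + 1][i] - u[j - 1][i]) / (2 * dy)
            curls.append(dv_dx - du_dy)
    return curls


def aggregate(values, agg_type="MAX"):
    """Collapse a list of values to a single number."""
    if agg_type == "MAX":
        return max(values)
    if agg_type == "MEAN":
        return sum(values) / len(values)
    raise ValueError(f"Unknown agg_type: {agg_type}")


# Rigid rotation u = -y, v = x has a constant curl of 2.
xs = ys = [0.0, 1.0, 2.0]
u = [[-y for _ in xs] for y in ys]
v = [[x for x in xs] for _ in ys]
curls = curl_field(u, v)
```

Returning `curls` directly corresponds to collapse=False, and passing it through `aggregate` corresponds to collapse=True.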

`calc_dmag(bounds, data=None, collapse=True, agg_type='MEAN')`

Calculates the dmag of vectors in a cellbox. dmag is defined as the magnitude of the difference between each vector and the mean vector within the bounds.

dmag = mag(vector - mean_vector)

Parameters:

Name	Type	Description	Default
`bounds`	`Boundary`	Cellbox boundary in which all relevant vectors are contained	required
`data`	`DataFrame or Dataset`	Dataset with 'lat' and 'long' columns/dimensions with vectors	`None`
`collapse`	`bool`	Flag determining whether to return an aggregated value, or a vector field (values for each individual vector).	`True`
`agg_type`	`str`	Method of aggregation if collapsing value. Accepts 'MAX' or 'MEAN'	`'MEAN'`

Returns:

Type	Description
`float or pd.DataFrame`	Float value of aggregated dmag if collapse=True, or pd.DataFrame of the dmag vector field if collapse=False

Raises:

Type	Description
`ValueError`	If agg_type is not 'MAX' or 'MEAN'
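
The formula dmag = mag(vector - mean_vector) can be worked through with a few hand-picked vectors. The sketch below uses plain (x, y) tuples, and the helper name `dmag_field` is hypothetical.

```python
import math


def dmag_field(vectors):
    """Magnitude of (vector - mean_vector) for each (x, y) vector."""
    n = len(vectors)
    mean_x = sum(x for x, _ in vectors) / n
    mean_y = sum(y for _, y in vectors) / n
    return [math.hypot(x - mean_x, y - mean_y) for x, y in vectors]


# Four vectors whose mean is (0, 0).
vectors = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
dmags = dmag_field(vectors)
mean_dmag = sum(dmags) / len(dmags)
max_dmag = max(dmags)
```

Note that (1, 0) and (-1, 0) have identical magnitudes but each contributes a dmag of 1, because the subtraction happens on the vectors before the magnitude is taken.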

`calculate_coverage(bounds, data=None)`

Calculates the fraction of the boundary covered by the dataset.

Parameters:

Name	Type	Description	Default
`bounds`	`Boundary`	Boundary being compared against	required
`data`	`DataFrame or Dataset`	Dataset with 'lat' and 'long' coordinates. Extent is calculated from the min/max of these coordinates. Defaults to the object's internal dataset.	`None`

Returns:

Type	Description
`float`	Decimal fraction of boundary covered by the dataset
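
One plausible reading of this calculation is a rectangle-overlap ratio: take the data extent from the min/max of its coordinates, intersect it with the boundary, and divide by the boundary's area. The sketch below follows that reading. The function name `coverage_fraction` and the tuple layout of `bounds` are assumptions, and areas are computed in degrees rather than on the sphere.

```python
def coverage_fraction(bounds, lats, longs):
    """Fraction of the bounds rectangle overlapped by the data extent.

    bounds is (lat_min, lat_max, long_min, long_max).
    """
    lat_min, lat_max, long_min, long_max = bounds
    # Data extent from the min/max of its coordinates.
    d_lat_min, d_lat_max = min(lats), max(lats)
    d_long_min, d_long_max = min(longs), max(longs)
    # Overlap of the two rectangles, clamped at zero when they are disjoint.
    lat_overlap = max(0.0, min(lat_max, d_lat_max) - max(lat_min, d_lat_min))
    long_overlap = max(0.0, min(long_max, d_long_max) - max(long_min, d_long_min))
    bounds_area = (lat_max - lat_min) * (long_max - long_min)
    return (lat_overlap * long_overlap) / bounds_area


# Data spans half of the latitude range and all of the longitude range.
fraction = coverage_fraction((0.0, 10.0, 0.0, 10.0), [0.0, 5.0], [0.0, 10.0])
```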

`downsample(agg_type=None)`

Downsamples imported data to be more easily manipulated. Data size should be reduced by a factor of m*n, where (m,n) are the downsample_factors defined in the params. self.data can be pd.DataFrame or xr.Dataset

Parameters:

Name	Type	Description	Default
`agg_type`	`str`	Method of aggregation to bin data by to downsample. Default is same method used for homogeneity condition.	`None`

Returns:

Type	Description
`xr.Dataset or pd.DataFrame`	Downsampled data
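
To show where the m*n reduction comes from, the sketch below bins a small 2D grid into m-by-n blocks and aggregates each block with a mean. It uses nested lists instead of a DataFrame or Dataset, and the name `downsample_grid` is hypothetical.

```python
def downsample_grid(grid, factors, agg=lambda vals: sum(vals) / len(vals)):
    """Aggregate a 2D grid in blocks of m rows by n columns."""
    m, n = factors
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for j in range(0, n_rows, m):
        row = []
        for i in range(0, n_cols, n):
            # Collect every value that falls inside the current block.
            block = [
                grid[jj][ii]
                for jj in range(j, min(j + m, n_rows))
                for ii in range(i, min(i + n, n_cols))
            ]
            row.append(agg(block))
        out.append(row)
    return out


grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
small = downsample_grid(grid, (2, 2))
```

Here 16 values become 4, a reduction by a factor of 2*2.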

`get_data_col_name()`

Retrieves the name of the data column (for pd.DataFrame) or variable (for xr.Dataset). Used when data_name is not defined in params. Variable names are joined into a single comma-separated string.

Returns:

Type	Description
`str`	Names of data columns, comma-separated

`get_data_col_name_list()`

Retrieves the names of data columns (for pd.DataFrame) or variables (for xr.Dataset). Used when data_name is not defined in params.

Returns:

Type	Description
`list`	Contains strings of data names

`get_hom_condition(bounds, splitting_conds, agg_type='MEAN', data=None)`

Retrieves homogeneity condition of data within boundary.

Parameters:

Name	Type	Description	Default
`bounds`	`Boundary`	Boundary object with limits of the data range to analyse	required
`splitting_conds`	`dict`	Dictionary containing the key 'threshold' (`float`): the threshold that data points of type 'value' within this CellBox are checked against, as being either above or below it	required

Returns:

Type	Description
`str`	The homogeneity condition, one of: 'MIN' = the cellbox contains fewer than a minimum number of data points; 'HET' = threshold values defined in config are exceeded; 'CLR' = none of the HET conditions were triggered
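
The three-way outcome can be pictured as a short decision chain. The sketch below is only a guess at the general shape: which quantity is compared against the threshold is not stated here, so it takes a list of already computed per-point values and compares their mean. The name `hom_condition` and the `min_dp` parameter are hypothetical.

```python
def hom_condition(values, splitting_conds, min_dp=5):
    """Classify a cellbox as 'MIN', 'HET' or 'CLR'."""
    # 'MIN': too few data points to judge homogeneity.
    if len(values) < min_dp:
        return "MIN"
    # Assumed aggregation: mean of the supplied per-point values.
    mean_value = sum(values) / len(values)
    # 'HET': the aggregated value exceeds the configured threshold.
    if mean_value > splitting_conds["threshold"]:
        return "HET"
    # 'CLR': nothing triggered.
    return "CLR"


conds = {"threshold": 0.5}
sparse = hom_condition([0.1, 0.2], conds)
mixed = hom_condition([1.0] * 6, conds)
clear = hom_condition([0.1] * 6, conds)
```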

`get_value(bounds, agg_type=None, skipna=True, data=None)`

Retrieves the aggregated value from within bounds.

Parameters:

Name	Type	Description	Default
`agg_type`	`str`	Method of aggregation of datapoints within bounds. Can be upper or lower case. Accepts 'MIN', 'MAX', 'MEAN', 'MEDIAN', 'STD', 'COUNT'	`None`
`bounds`	`Boundary`	Boundary object with limits of lat/long	required
`skipna`	`bool`	Defines whether to ignore NaNs (True) or propagate them (False)	`True`

Returns:

Type	Description
`dict`	{variable (str): aggregated_value (float)}. Aggregated value of each variable within bounds, computed using the requested aggregation method

Raises:

Type	Description
`ValueError`	If the aggregation type is not in the list of available methods
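
The sketch below illustrates the shape of the returned dictionary and the effect of skipna, using only the standard library. The name `aggregate_columns` and the variable names `uC` and `vC` are hypothetical, and 'STD' is computed here as a population standard deviation, which may not match the library's choice.

```python
import math
import statistics

AGGREGATORS = {
    "MIN": min,
    "MAX": max,
    "MEAN": statistics.fmean,
    "MEDIAN": statistics.median,
    "STD": statistics.pstdev,  # population std; an assumption
    "COUNT": len,
}


def aggregate_columns(columns, agg_type="MEAN", skipna=True):
    """Return {variable: aggregated value} for each column of values."""
    key = agg_type.upper()  # accept upper or lower case
    if key not in AGGREGATORS:
        raise ValueError(f"Unknown aggregation type: {agg_type}")
    result = {}
    for name, values in columns.items():
        clean = [v for v in values if not math.isnan(v)]
        if key == "COUNT":
            result[name] = len(clean)
        elif not clean or (not skipna and len(clean) < len(values)):
            # No usable data, or NaNs present and not being skipped.
            result[name] = math.nan
        else:
            result[name] = float(AGGREGATORS[key](clean))
    return result


columns = {"uC": [1.0, 3.0, math.nan], "vC": [2.0, 4.0, 6.0]}
skipped = aggregate_columns(columns, "mean")
strict = aggregate_columns(columns, "MEAN", skipna=False)
```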

`import_data(bounds)` `abstractmethod`

User-defined method for importing data from files, or even generating data from scratch.

Returns:

Type	Description
`xr.Dataset or pd.DataFrame`	Coordinates and data being imported from file. If xr.Dataset: must have coordinates 'lat' and 'long', and should have multiple data variables. If pd.DataFrame: must have columns 'lat' and 'long', and should have multiple data columns. Downsampling and reprojecting happen in the __init__() method

`reproject(in_proj='EPSG:4326', out_proj='EPSG:4326', x_col='lat', y_col='long')`

Reprojects data using pyproj.Transformer. self.data can be pd.DataFrame or xr.Dataset.

Parameters:

Name	Type	Description	Default
`in_proj`	`str`	Projection that the imported dataset is in. Must be allowed by pyproj.CRS (Coordinate Reference System).	`'EPSG:4326'`
`out_proj`	`str`	Projection required for final data output. Must be allowed by pyproj.CRS (Coordinate Reference System). Shouldn't change from the default value (EPSG:4326).	`'EPSG:4326'`
`x_col`	`str`	Name of coordinate column 1	`'lat'`
`y_col`	`str`	Name of coordinate column 2. x_col and y_col will be cast into lat and long by the reprojection.	`'long'`

Returns:

Type	Description
`pd.DataFrame`	Reprojected data with 'lat' and 'long' columns replacing 'x_col' and 'y_col'

`set_data_col_name(new_names)`

Sets the names of the data columns/data variables from a comma-separated string.

Parameters:

Name	Type	Description	Default
`new_names`	`str`	Comma-separated string of new variable names	required

Returns:

Type	Description
`xr.Dataset or pd.DataFrame`	Data with variable names changed
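
Turning a comma-separated string into a rename mapping could be sketched as follows. The helper name `rename_map` and the variable names are hypothetical, and raising an error when the counts differ is an assumption about sensible behaviour rather than documented behaviour.

```python
def rename_map(current_names, new_names):
    """Pair existing variable names with names from a comma-separated string."""
    new_list = [name.strip() for name in new_names.split(",")]
    if len(new_list) != len(current_names):
        raise ValueError("Expected one new name per existing data variable")
    return dict(zip(current_names, new_list))


mapping = rename_map(["u", "v"], "uC, vC")
```

A mapping of this form can be passed to `DataFrame.rename(columns=mapping)` in pandas or to `Dataset.rename(mapping)` in xarray.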

`set_data_col_name_list(new_names)`

Sets name of data column/data variables from a list of strings. Also updates self.data_name_list with new names from list

Parameters:

Name	Type	Description	Default
`new_names`	`list`	List of strings containing new variable names	required

Returns:

Type	Description
`pd.DataFrame or xr.Dataset`	Original dataset with data variables renamed

`trim_datapoints(bounds, data=None)`

Trims datapoints from self.data within boundary defined by 'bounds'. self.data can be pd.DataFrame or xr.Dataset

Parameters:

Name	Type	Description	Default
`bounds`	`Boundary`	Limits of lat/long/time to select data from	required

Returns:

Type	Description
`pd.DataFrame or xr.Dataset`	Trimmed dataset in same format as self.data
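
Conceptually, trimming is a filter on the coordinate columns. The sketch below keeps only rows whose 'lat' and 'long' fall inside the given ranges. It uses a list of dictionaries instead of a DataFrame or Dataset, ignores the time dimension, and treats the range edges as inclusive, which is an assumption. The name `trim_rows` and the `uC` column are hypothetical.

```python
def trim_rows(rows, lat_range, long_range):
    """Keep rows whose 'lat' and 'long' lie inside the given ranges."""
    lat_min, lat_max = lat_range
    long_min, long_max = long_range
    return [
        row
        for row in rows
        if lat_min <= row["lat"] <= lat_max
        and long_min <= row["long"] <= long_max
    ]


rows = [
    {"lat": -70.0, "long": 10.0, "uC": 0.1},
    {"lat": -60.0, "long": 10.0, "uC": 0.2},
    {"lat": -50.0, "long": 10.0, "uC": 0.3},
]
trimmed = trim_rows(rows, lat_range=(-65.0, -55.0), long_range=(0.0, 20.0))
```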
