abstract_vector

VectorDataLoader(bounds, params)

Bases: DataLoaderInterface

Abstract class for all vector datasets. This is where large-scale operations are performed, such as importing data, downsampling, reprojecting, and renaming variables.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| bounds | Boundary | Initial mesh boundary to limit scope of data ingest | required |
| params | dict | Values needed by dataloader to initialise. Unique to each dataloader | required |
Attributes:

| Name | Type | Description |
|---|---|---|
| self.data | DataFrame or Dataset | Data stored by dataloader to use when called upon by the mesh. Must be saved in the EPSG:4326 projection (WGS 84), with coordinate names 'lat', 'long', and 'time' (if applicable). |
| self.data_name | str | Name of the vector variables. Must be the column names if self.data is pd.DataFrame. Must be the variable names if self.data is xr.Dataset |
add_default_params(params)

Set default values for all vector dataloaders. This function should be overloaded to include any extra params for a specific dataloader.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| params | dict | Dictionary containing attributes that are required for each dataloader. | required |

Returns:

| Type | Description |
|---|---|
| dict | Dictionary of attributes the dataloader will require, completed with default values if not provided in config. |
add_mag_dir(data=None, data_names=None)

Adds magnitude and direction variables/columns to data for easier retrieval of values.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame or Dataset | Data with 'lat' and 'long' columns/dimensions. Assumes that the existing data is in Cartesian form (x and y components). If None, will use self.data | None |
| data_names | list | List of data columns/variables to use in calculation. If None, will use self.data_name_list | None |

Returns:

| Name | Type | Description |
|---|---|---|
| data | DataFrame or Dataset | Original dataset with two new columns/variables called '_magnitude' and '_direction', containing the corresponding values for each. |
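The underlying calculation is the standard Cartesian-to-polar conversion. A minimal sketch in plain Python (the direction convention of degrees clockwise from north is an assumption, not confirmed by the source):

```python
import math

def mag_dir(u, v):
    """Convert Cartesian vector components to magnitude and direction.

    Direction is measured in degrees clockwise from north here, a common
    convention for geographic vector data (assumed, not taken from the
    library's source).
    """
    magnitude = math.hypot(u, v)
    direction = math.degrees(math.atan2(u, v)) % 360
    return magnitude, direction

mag, ang = mag_dir(3.0, 4.0)  # → magnitude 5.0
```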
calc_curl(bounds, data=None, collapse=True, agg_type='MAX')

Calculates the curl of vectors in a cellbox.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| bounds | Boundary | Cellbox boundary in which all relevant vectors are contained | required |
| data | DataFrame or Dataset | Dataset with 'lat' and 'long' columns/dimensions with vectors | None |
| collapse | bool | Flag determining whether to return an aggregated value, or a vector field (values for each individual vector). | True |
| agg_type | str | Method of aggregation if collapsing value. Accepts 'MAX' or 'MEAN' | 'MAX' |

Returns:

| Type | Description |
|---|---|
| float or pd.DataFrame | float value of aggregated curl if collapse=True, or pd.DataFrame of curl vector field if collapse=False |

Raises:

| Type | Description |
|---|---|
| ValueError | If agg_type is not 'MAX' or 'MEAN' |
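As a hedged illustration of what a curl calculation over a gridded vector field typically looks like (the real method operates on the loader's trimmed data; this finite-difference reimplementation is an assumption):

```python
import numpy as np

def curl_z(u, v, dx=1.0, dy=1.0):
    """z-component of curl for a 2D field: dv/dx - du/dy,
    via central differences."""
    dv_dx = np.gradient(v, dx, axis=1)
    du_dy = np.gradient(u, dy, axis=0)
    return dv_dx - du_dy

# Rigid-body rotation u = -y, v = x has constant curl of 2
y, x = np.mgrid[0:5, 0:5]
field = curl_z(-y.astype(float), x.astype(float))
aggregated = np.nanmax(np.abs(field))  # 'MAX' aggregation → 2.0
```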
calc_dmag(bounds, data=None, collapse=True, agg_type='MEAN')

Calculates the dmag of vectors in a cellbox. dmag is defined as the difference in magnitude between each vector and the average vector within the bounds:

dmag = mag(vector - mean_vector)

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| bounds | Boundary | Cellbox boundary in which all relevant vectors are contained | required |
| data | DataFrame or Dataset | Dataset with 'lat' and 'long' columns/dimensions with vectors | None |
| collapse | bool | Flag determining whether to return an aggregated value, or a vector field (values for each individual vector). | True |
| agg_type | str | Method of aggregation if collapsing value. Accepts 'MAX' or 'MEAN' | 'MEAN' |

Returns:

| Type | Description |
|---|---|
| float or pd.DataFrame | float value of aggregated dmag if collapse=True, or pd.DataFrame of dmag vector field if collapse=False |

Raises:

| Type | Description |
|---|---|
| ValueError | If agg_type is not 'MAX' or 'MEAN' |
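The dmag formula can be illustrated on a tiny set of vectors; this sketch assumes 'MEAN' aggregation and plain (u, v) tuples in place of the loader's DataFrame:

```python
import math

def calc_dmag_mean(vectors):
    """dmag = mag(vector - mean_vector), averaged over all vectors."""
    n = len(vectors)
    mean_u = sum(u for u, _ in vectors) / n
    mean_v = sum(v for _, v in vectors) / n
    dmags = [math.hypot(u - mean_u, v - mean_v) for u, v in vectors]
    return sum(dmags) / n

# Two opposite unit vectors: the mean vector is (0, 0), so each dmag is 1
result = calc_dmag_mean([(1.0, 0.0), (-1.0, 0.0)])  # → 1.0
```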
calculate_coverage(bounds, data=None)

Calculates the percentage of the boundary covered by the dataset.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| bounds | Boundary | Boundary being compared against | required |
| data | DataFrame or Dataset | Dataset with 'lat' and 'long' coordinates. Extent calculated from min/max of these coordinates. Defaults to the object's internal dataset. | None |

Returns:

| Type | Description |
|---|---|
| float | Decimal fraction of boundary covered by the dataset |
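A minimal sketch of the idea, assuming both the boundary and the data extent can be treated as axis-aligned lat/long rectangles (a simplification of the real method):

```python
def coverage(bounds, extent):
    """Fraction of `bounds` overlapped by `extent`.

    Both are (lat_min, lat_max, long_min, long_max) rectangles; this
    axis-aligned overlap is an assumed stand-in for the real Boundary
    and dataset-extent types.
    """
    lat_overlap = max(0.0, min(bounds[1], extent[1]) - max(bounds[0], extent[0]))
    long_overlap = max(0.0, min(bounds[3], extent[3]) - max(bounds[2], extent[2]))
    bounds_area = (bounds[1] - bounds[0]) * (bounds[3] - bounds[2])
    return (lat_overlap * long_overlap) / bounds_area

frac = coverage((0, 10, 0, 10), (0, 5, 0, 10))  # → 0.5
```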
downsample(agg_type=None)

Downsamples imported data to be more easily manipulated. Data size should be reduced by a factor of m*n, where (m,n) are the downsample_factors defined in the params. self.data can be pd.DataFrame or xr.Dataset.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| agg_type | str | Method of aggregation to bin data by to downsample. Default is the same method used for the homogeneity condition. | None |

Returns:

| Type | Description |
|---|---|
| xr.Dataset or pd.DataFrame | Downsampled data |
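The m*n reduction amounts to binning the grid into m x n blocks and aggregating each. A plain-Python sketch with 'MEAN' aggregation (the real method works on DataFrames/Datasets; this list-of-lists version is illustrative only):

```python
def downsample_mean(grid, m, n):
    """Reduce a 2D list-of-lists by a factor of (m, n), aggregating
    each m x n block by its mean. Assumes dimensions divide evenly."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for i in range(0, rows, m):
        row = []
        for j in range(0, cols, n):
            block = [grid[a][b] for a in range(i, i + m)
                                for b in range(j, j + n)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

small = downsample_mean([[1, 1, 2, 2],
                         [1, 1, 2, 2]], 2, 2)  # → [[1.0, 2.0]]
```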
get_data_col_name()

Retrieves the name of the data columns (for pd.DataFrame) or variables (for xr.Dataset). Used when data_name is not defined in params. Variable names are appended and comma-separated.

Returns:

| Type | Description |
|---|---|
| str | Name of data columns, comma-separated |
get_data_col_name_list()

Retrieves the names of the data columns (for pd.DataFrame) or variables (for xr.Dataset). Used when data_name is not defined in params.

Returns:

| Type | Description |
|---|---|
| list | Contains strings of data names |
get_hom_condition(bounds, splitting_conds, agg_type='MEAN', data=None)

Retrieves the homogeneity condition of data within the boundary.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| bounds | Boundary | Boundary object with limits of datarange to analyse | required |
| splitting_conds | dict | Containing the following keys: 'threshold': | required |

Returns:

| Type | Description |
|---|---|
| str | The homogeneity condition returned is one of: 'MIN' = the cellbox contains fewer than a minimum number of data points; 'HET' = threshold values defined in config are exceeded; 'CLR' = none of the HET conditions were triggered |
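The MIN/HET/CLR decision might be sketched as follows. The exact splitting keys and comparisons here are assumptions modelled on the documented behaviour, not the library's implementation:

```python
def hom_condition(values, min_dp, threshold, lower_bound, upper_bound):
    """Classify a cellbox: 'MIN' (too few datapoints), 'HET' (the
    fraction of points above `threshold` lies strictly between the
    bounds, i.e. the cell is mixed), 'CLR' (otherwise). The parameter
    names and logic are assumed, not taken from the source."""
    if len(values) < min_dp:
        return 'MIN'
    frac_above = sum(v > threshold for v in values) / len(values)
    if lower_bound < frac_above < upper_bound:
        return 'HET'
    return 'CLR'

cond = hom_condition([0.1, 0.9, 0.8, 0.2], min_dp=3,
                     threshold=0.5, lower_bound=0.1, upper_bound=0.9)
```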
get_value(bounds, agg_type=None, skipna=True, data=None)

Retrieves an aggregated value from within bounds.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| bounds | Boundary | Boundary object with limits of lat/long | required |
| agg_type | str | Method of aggregation of datapoints within bounds. Can be upper or lower case. Accepts 'MIN', 'MAX', 'MEAN', 'MEDIAN', 'STD', 'COUNT' | None |
| skipna | bool | Defines whether to propagate NaNs or not. Default = True (ignores NaNs) | True |

Returns:

| Type | Description |
|---|---|
| dict | {variable (str): aggregated_value (float)} Aggregated value within bounds following agg_type |

Raises:

| Type | Description |
|---|---|
| ValueError | Aggregation type not in list of available methods |
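A common way to implement this kind of aggregation dispatch is a lookup table of functions; a sketch of the documented options (the dispatch-table approach is an assumption, not the library's code):

```python
import statistics

# Hypothetical dispatch table mirroring the documented agg_type options
AGGREGATORS = {
    'MIN': min,
    'MAX': max,
    'MEAN': statistics.fmean,
    'MEDIAN': statistics.median,
    'STD': statistics.pstdev,
    'COUNT': len,
}

def get_value(values, agg_type='MEAN'):
    """Aggregate datapoints; raises ValueError for unknown methods,
    matching the documented behaviour."""
    agg_type = agg_type.upper()  # accepted in upper or lower case
    if agg_type not in AGGREGATORS:
        raise ValueError(f'Unknown aggregation type: {agg_type}')
    return AGGREGATORS[agg_type](values)

val = get_value([1.0, 2.0, 3.0], 'mean')  # → 2.0
```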
import_data(bounds)

abstractmethod

User-defined method for importing data from files, or even generating data from scratch.

Returns:

| Type | Description |
|---|---|
| xr.Dataset or pd.DataFrame | Coordinates and data being imported from file. If xr.Dataset, must have coordinates 'lat' and 'long', and should have multiple data variables. If pd.DataFrame, must have columns 'lat' and 'long', and should have multiple data columns. Downsampling and reprojecting happen in the init() method. |
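A hedged sketch of what an override generating data from scratch could look like. The column names 'uC'/'vC', the bounds tuple, and the synthetic values are all illustrative assumptions; a real implementation would read from file and trim to the Boundary:

```python
import pandas as pd

def import_data(bounds):
    """Hypothetical import_data override producing a small synthetic
    current field. `bounds` is assumed here to be a simple
    (lat_min, lat_max, long_min, long_max) tuple rather than the
    library's Boundary type."""
    lat_min, lat_max, long_min, long_max = bounds
    records = [
        {'lat': lat, 'long': long, 'uC': 0.1 * lat, 'vC': 0.2 * long}
        for lat in range(lat_min, lat_max)
        for long in range(long_min, long_max)
    ]
    return pd.DataFrame(records)

df = import_data((0, 3, 0, 3))  # 3 x 3 grid → 9 rows
```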
reproject(in_proj='EPSG:4326', out_proj='EPSG:4326', x_col='lat', y_col='long')

Reprojects data using pyproj.Transformer. self.data can be pd.DataFrame or xr.Dataset.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| in_proj | str | Projection that the imported dataset is in. Must be allowed by pyproj.CRS (Coordinate Reference System) | 'EPSG:4326' |
| out_proj | str | Projection required for final data output. Must be allowed by pyproj.CRS (Coordinate Reference System). Shouldn't change from default value (EPSG:4326) | 'EPSG:4326' |
| x_col | str | Name of coordinate column 1 | 'lat' |
| y_col | str | Name of coordinate column 2. x_col and y_col will be cast into lat and long by the reprojection | 'long' |

Returns:

| Type | Description |
|---|---|
| pd.DataFrame | Reprojected data with 'lat', 'long' columns replacing 'x_col' and 'y_col' |
set_data_col_name(new_names)

Sets the name of the data columns/data variables from a comma-separated string.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| new_names | str | New variable names as a comma-separated string, replacing the old names in order | required |

Returns:

| Type | Description |
|---|---|
| xr.Dataset or pd.DataFrame | Data with variable names changed |
set_data_col_name_list(new_names)

Sets name of data columns/data variables from a list of strings. Also updates self.data_name_list with new names from the list.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| new_names | list | List of strings containing new variable names | required |

Returns:

| Type | Description |
|---|---|
| pd.DataFrame or xr.Dataset | Original dataset with data variables renamed |
trim_datapoints(bounds, data=None)

Trims datapoints from self.data within the boundary defined by 'bounds'. self.data can be pd.DataFrame or xr.Dataset.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| bounds | Boundary | Limits of lat/long/time to select data from | required |

Returns:

| Type | Description |
|---|---|
| pd.DataFrame or xr.Dataset | Trimmed dataset in same format as self.data |
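Conceptually this is a spatial mask; a plain-Python sketch with tuples standing in for the real DataFrame/Boundary types (both shapes are illustrative assumptions):

```python
def trim_datapoints(points, bounds):
    """Keep only datapoints inside an axis-aligned lat/long boundary.

    `points` is a list of (lat, long) tuples and `bounds` is
    (lat_min, lat_max, long_min, long_max); both are simplified
    stand-ins for the library's types."""
    lat_min, lat_max, long_min, long_max = bounds
    return [(lat, long) for lat, long in points
            if lat_min <= lat <= lat_max and long_min <= long <= long_max]

inside = trim_datapoints([(1, 1), (5, 5), (11, 3)], (0, 10, 0, 10))
# → [(1, 1), (5, 5)]
```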