Consensus.LGInform module
- class Consensus.LGInform.LGInform(api_key=None, api_secret=None, proxies={}, area='E09000023,Lewisham_CIPFA_Near_Neighbours')
Bases:
object
The class takes a dictionary of LG Inform datasets (such as {'IMD_2010': 841, 'IMD_2009': 842, 'Death_of_enterprises': 102}), finds all metrics, downloads the data, and merges them into one table. The dictionary keys can be any string of your choosing, but each integer value must be a dataset identifier listed at https://webservices.esd.org.uk/datasets?ApplicationKey=ExamplePPK&Signature=YChwR9HU0Vbg8KZ5ezdGZt+EyL4=
The main method for downloading several datasets is mp_download(), which uses multiprocessing to download data from multiple datasets simultaneously. It requires that the class is called within an if __name__ == '__main__' block. If multiprocessing is not necessary, use download() instead, which is the method the multiprocessing wrapper also calls.
- api_key
Application Key to LG Inform Plus.
- Type:
str
- api_secret
Application Secret to LG Inform Plus.
- Type:
str
- proxies
Proxy address if known.
- Type:
Dict[str, str]
- area
A comma-separated string of areas, without whitespace. You can use GSS codes, LG Inform's off-the-shelf area groups, or both. For instance, the GSS code for Lewisham is E09000023 and its CIPFA nearest neighbours group is called Lewisham_CIPFA_Near_Neighbours. Together these would be input as 'E09000023,Lewisham_CIPFA_Near_Neighbours'.
- Type:
str
- json_to_pandas(json_data: JSONDict): Transform downloaded JSON data to a Pandas dataframe.
- sign_url(url: str): Sign all URL calls with your unique secret and key.
- download_variable_data(identifier: int, latest_n: int): Download data for a given metricType, area, and period.
- download_data_for_many_variables(variables: JSONDict, latest_n: int = 20, arraytype: str = 'metricType-array'): Download the variables for an array of metricTypes.
- get_dataset_table_variables(dataset: int): Given a dataset, output all the metricType numbers (dataset columns).
- format_tables(outputs: List[JSONDict], drop_discontinued: bool = True): Format the data for each variable and create a metadata table.
- merge_tables(dataset_name: str): Merge the variables to form a table for a given dataset.
- download(datasets: Dict[str, int], output_folder: Path, latest_n: int = 5, drop_discontinued: bool = True): Download data for one or more datasets.
- mp_download(datasets: Dict[str, int], output_folder: Path, latest_n: int = 20, drop_discontinued: bool = True, max_workers: int = 8): Multiprocessing wrapper to download data for multiple datasets simultaneously.
Usage:
    from Consensus.LGInform import LGInform
    from Consensus.ConfigManager import ConfigManager
    from dotenv import load_dotenv
    from os import environ
    from pathlib import Path

    dotenv_path = Path('.env')
    load_dotenv(dotenv_path)
    lg_key = environ.get("LG_KEY")  # public key to LG Inform Plus
    lg_secret = environ.get("LG_SECRET")  # secret to LG Inform Plus

    # Use ConfigManager to save environment variables and proxy address
    # if you want the information to be stored with this package.
    conf = ConfigManager()
    conf.update_config("lg_inform_key", lg_key)
    conf.update_config("lg_inform_secret", lg_secret)

    out_folder = Path('./data/mp_test/')  # folder to store final data

    # A dictionary of datasets. The key can be any string, but the integer value must be an
    # identifier from https://webservices.esd.org.uk/datasets?ApplicationKey=ExamplePPK&Signature=YChwR9HU0Vbg8KZ5ezdGZt+EyL4=
    datasets = {'IMD_2010': 841, 'IMD_2009': 842, 'Death_of_enterprises': 102}

    # When using the multiprocessing wrapper method, you have to run it
    # under an if __name__ == '__main__' statement.
    if __name__ == '__main__':
        api_call = LGInform(area='E09000023,Lewisham_CIPFA_Near_Neighbours')
        # Normal, single threaded download:
        # api_call.download(datasets=datasets, output_folder=out_folder, latest_n=20, drop_discontinued=False)
        api_call.mp_download(datasets, output_folder=out_folder, latest_n=20, drop_discontinued=False, max_workers=8)
- __init__(api_key=None, api_secret=None, proxies={}, area='E09000023,Lewisham_CIPFA_Near_Neighbours')
Initialise the class with API key, secret, and proxy address.
- Parameters:
api_key (str) – Application Key to LG Inform Plus.
api_secret (str) – Application Secret to LG Inform Plus.
proxies (Dict[str, str]) – Proxy address if known.
area (str) – A comma-separated string of areas, without whitespace. You can use GSS codes, LG Inform's off-the-shelf area groups, or both. For instance, the GSS code for Lewisham is E09000023 and its CIPFA nearest neighbours group is called Lewisham_CIPFA_Near_Neighbours. Together these would be input as 'E09000023,Lewisham_CIPFA_Near_Neighbours'.
- Returns:
None
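The area string format described above (comma-separated, no whitespace) can be assembled from a list of identifiers. The following is a minimal sketch; build_area_string is a hypothetical helper for illustration and is not part of the Consensus package.

```python
from typing import Iterable


def build_area_string(areas: Iterable[str]) -> str:
    """Join area identifiers into a comma-separated string with no whitespace.

    Hypothetical helper: it only normalises the string format and does not
    check that the identifiers exist in LG Inform Plus.
    """
    # Trim surrounding whitespace and drop empty entries.
    cleaned = [a.strip() for a in areas if a and a.strip()]
    if not cleaned:
        raise ValueError("At least one area identifier is required")
    for a in cleaned:
        # Commas or inner whitespace would break the comma-separated format.
        if "," in a or any(ch.isspace() for ch in a):
            raise ValueError(f"Invalid area identifier: {a!r}")
    return ",".join(cleaned)


area = build_area_string(["E09000023", " Lewisham_CIPFA_Near_Neighbours "])
print(area)  # E09000023,Lewisham_CIPFA_Near_Neighbours
```

The resulting string could then be passed as the area argument, for example LGInform(area=area).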
- _multiprocessing_wrapper(input_queue)
Performs the same work as the download() method, wrapped for use with the multiprocessing library.
- Parameters:
input_queue (mp.Queue) – A multiprocessing queue.
- Return type:
None
- Returns:
None
- download(datasets, output_folder, latest_n=5, drop_discontinued=True)
Download all variables for one or more datasets, merging the variables of each dataset into one table by area and time period.
- Parameters:
datasets (Dict[str,int]) – Dictionary of format {"some_name": some_integer}, where the integer value is an identifier from https://webservices.esd.org.uk/datasets?ApplicationKey=ExamplePPK&Signature=YChwR9HU0Vbg8KZ5ezdGZt+EyL4=
output_folder (Path) – Folder to store the final data.
latest_n (int) – Number of most recent periods to download; selection is currently restricted to the latest n periods. A period can be a year, quarter, month, week or some other interval (e.g. for Indices of Multiple Deprivation, the period refers to publications, so latest_n=2 would get data for 2019 and 2015).
drop_discontinued (bool) – If you set this to False, the downloaded data will include discontinued metrics. Default is True.
- Return type:
None
- Returns:
None
- download_data_for_many_variables(variables, latest_n=20, arraytype='metricType-array')
Download the variables for an array of metricTypes using download_variable_data method.
- Parameters:
variables (JSONDict) – variables JSON from get_dataset_table_variables method.
latest_n (int) – Latest n periods. Period could be year, quarter, month, week, or some other period such as the latest n publications.
arraytype (str) – Type of variables to download. Default is metricType-array.
- Returns:
A list of JSON variables.
- Return type:
List[JSONDict]
- download_variable_data(identifier, latest_n)
Download data for a given metricType, area, and period (latest n periods).
- Parameters:
identifier (int) – metricType integer.
latest_n (int) – Latest n periods. Period could be year, quarter, month, week, or some other period such as the latest n publications.
- Returns:
Downloaded data as JSON.
- Return type:
JSONDict
- format_tables(outputs, drop_discontinued=True)
Format the data for each variable and create a metadata table.
- Parameters:
outputs (List[JSONDict]) – A list of JSONDict objects.
drop_discontinued (bool) – Boolean to select whether to include discontinued metrics.
- Return type:
None
- Returns:
None
- get_dataset_table_variables(dataset)
Given a dataset, output all the metricType numbers (dataset columns). The output is a JSON dictionary.
- Parameters:
dataset (int) – The number of the dataset from https://webservices.esd.org.uk/datasets?ApplicationKey=ExamplePPK&Signature=YChwR9HU0Vbg8KZ5ezdGZt+EyL4=
- Returns:
A JSON dictionary object
- Return type:
JSONDict
- json_to_pandas(json_data)
Transform downloaded JSON data to a Pandas dataframe.
- Parameters:
json_data (JSONDict) – JSON data to transform.
- Returns:
Downloaded data as Pandas dataframe.
- Return type:
pd.DataFrame
- merge_tables(dataset_name)
Merge the variables to form a table for a given dataset.
- Parameters:
dataset_name (str) – Dataset name string.
- Returns:
All variables of the dataset merged as one Pandas dataframe.
- Return type:
pd.DataFrame
- mp_download(datasets, output_folder, latest_n=20, drop_discontinued=True, max_workers=8)
Multiprocessing method for downloading data for multiple datasets. The dataset dictionary is split into chunks of size max_workers, and the chunks are processed one after another.
- Parameters:
datasets (Dict[str,int]) – Dictionary of format {"some_name": some_integer}, where the integer value is an identifier from https://webservices.esd.org.uk/datasets?ApplicationKey=ExamplePPK&Signature=YChwR9HU0Vbg8KZ5ezdGZt+EyL4=
output_folder (Path) – Folder to store the final data.
latest_n (int) – Number of most recent periods to download; selection is currently restricted to the latest n periods. A period can be a year, quarter, month, week or some other interval (e.g. for Indices of Multiple Deprivation, the period refers to publications, so latest_n=2 would get data for 2019 and 2015).
drop_discontinued (bool) – If you set this to False, the downloaded data will include discontinued metrics. Default is True.
max_workers (int) – Set the number of workers for multiprocessing. Typically this would be the number of logical CPUs in your system. This will also process the datasets in chunks, so that if you list 16 datasets in your datasets dictionary and have 8 workers, the script will work through the datasets in two steps (16/8 = 2).
- Return type:
None
- Returns:
None
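The chunking described for max_workers (16 datasets with 8 workers processed in two steps) can be illustrated as follows. This is only a sketch of the arithmetic: chunk_datasets is a hypothetical helper, and the actual internals of mp_download() are not shown in this documentation.

```python
from itertools import islice
from typing import Dict, Iterator


def chunk_datasets(datasets: Dict[str, int], max_workers: int) -> Iterator[Dict[str, int]]:
    """Yield successive sub-dictionaries holding at most max_workers datasets.

    Hypothetical helper illustrating the chunking idea only.
    """
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    items = iter(datasets.items())
    while True:
        # Take the next max_workers (name, identifier) pairs, if any remain.
        chunk = dict(islice(items, max_workers))
        if not chunk:
            return
        yield chunk


# 16 placeholder datasets (names and identifiers are made up for illustration).
example_datasets = {f"dataset_{i}": i for i in range(16)}
chunks = list(chunk_datasets(example_datasets, max_workers=8))
print(len(chunks))  # 2
```

When the number of datasets is not a multiple of max_workers, the last chunk is simply smaller, so 3 datasets with 2 workers would give chunks of 2 and 1.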
- sign_url(url)
Sign a URL with your application key and secret. Every URL call to the API needs to be signed.
- Parameters:
url (str) – URL to be signed.
- Returns:
Signed URL.
- Return type:
str
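To give a feel for what signing a URL can involve, here is a hedged sketch. The documentation does not state the signing algorithm. The sketch assumes an HMAC-SHA1 digest of the URL (including the ApplicationKey parameter), base64-encoded and appended as a Signature parameter. That assumption is merely consistent with the 28-character base64 signature in the example URL and may not match what sign_url() actually does. The function name sign_url_sketch and the secret value are hypothetical.

```python
import base64
import hashlib
import hmac


def sign_url_sketch(url: str, api_key: str, api_secret: str) -> str:
    """Append ApplicationKey and Signature query parameters to a URL.

    Assumption: the signature is base64(HMAC-SHA1(secret, url_with_key)).
    This is an illustration, not the confirmed LG Inform Plus scheme.
    """
    separator = "&" if "?" in url else "?"
    url_with_key = f"{url}{separator}ApplicationKey={api_key}"
    digest = hmac.new(
        api_secret.encode("utf-8"),
        url_with_key.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    # The signature is appended unencoded, mirroring the example URL above;
    # a real client may need to percent-encode it.
    signature = base64.b64encode(digest).decode("ascii")
    return f"{url_with_key}&Signature={signature}"


signed = sign_url_sketch(
    "https://webservices.esd.org.uk/datasets",
    api_key="ExamplePPK",
    api_secret="not-a-real-secret",  # placeholder secret for illustration
)
print(signed)
```

Because the digest depends on the secret, the same URL and key always produce the same signature for a given secret, which lets a server that knows the secret verify the request.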