Consensus.LGInform module

class Consensus.LGInform.LGInform(api_key=None, api_secret=None, proxies={}, area='E09000023,Lewisham_CIPFA_Near_Neighbours')

Bases: object

The class works with a dictionary of LG Inform datasets (such as {'IMD_2010': 841, 'IMD_2009': 842, 'Death_of_enterprises': 102}) passed to its download methods. It finds all the metrics in each dataset, downloads the data, and merges the metrics of each dataset into a single table. The dictionary keys can be any string you choose, but each integer value must be a dataset identifier listed at https://webservices.esd.org.uk/datasets?ApplicationKey=ExamplePPK&Signature=YChwR9HU0Vbg8KZ5ezdGZt+EyL4=

The main method for downloading multiple datasets is mp_download(), which uses multiprocessing to download several datasets simultaneously. Because of this, it must be called inside an if __name__ == '__main__': block. If multiprocessing is not needed, use the download() method instead; the multiprocessing wrapper calls download() internally.

api_key

Application Key to LG Inform Plus.

Type:: str

api_secret

Application Secret to LG Inform Plus.

Type:: str

proxies

Proxy address if known.

Type:: Dict[str, str]

area

A comma-separated string of areas, with no whitespace. You can use either GSS codes or LG Inform's off-the-shelf area groups. For instance, Lewisham's GSS code is E09000023 and its CIPFA nearest neighbours group is called Lewisham_CIPFA_Near_Neighbours. Together these would be input as 'E09000023,Lewisham_CIPFA_Near_Neighbours'.

Type:: str
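Since the area string must be comma-separated with no whitespace, it can help to build it from a list rather than typing it by hand. The helper below is a hypothetical sketch, not part of the Consensus package: build_area_string is an illustrative name, and it only enforces the formatting rule described above; it does not check that the codes or group names exist in LG Inform.

```python
from typing import Iterable


def build_area_string(areas: Iterable[str]) -> str:
    """Hypothetical helper: join GSS codes / LG Inform area groups for the `area` argument."""
    # Trim surrounding whitespace and skip empty entries.
    cleaned = [area.strip() for area in areas if area.strip()]
    # The area string must not contain whitespace, so reject names with internal spaces.
    for area in cleaned:
        if any(ch.isspace() for ch in area):
            raise ValueError(f"Area names must not contain whitespace: {area!r}")
    return ",".join(cleaned)


# Produces 'E09000023,Lewisham_CIPFA_Near_Neighbours'
area = build_area_string(["E09000023", " Lewisham_CIPFA_Near_Neighbours "])
```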

json_to_pandas(json_data: JSONDict): Transform downloaded JSON data to a Pandas DataFrame.

sign_url(url: str): Sign each URL call with your unique key and secret.

download_variable_data(identifier: int, latest_n: int): Download data for a given metricType, area, and period.

download_data_for_many_variables(variables: JSONDict, latest_n: int = 20, arraytype: str = 'metricType-array'): Download the variables for an array of metricTypes.

get_dataset_table_variables(dataset: int): Given a dataset, output all the metricType numbers (dataset columns).

format_tables(outputs: List[JSONDict], drop_discontinued: bool = True): Format the data for each variable and create a metadata table.

merge_tables(dataset_name: str): Merge the variables to form a table for a given dataset.

download(datasets: Dict[str, int], output_folder: Path, latest_n: int = 5, drop_discontinued: bool = True): Download data for one or more datasets.

mp_download(datasets: Dict[str, int], output_folder: Path, latest_n: int = 20, drop_discontinued: bool = True, max_workers: int = 8): Multiprocessing wrapper to download data for multiple datasets simultaneously.

Usage:

from Consensus.LGInform import LGInform
from Consensus.ConfigManager import ConfigManager
from dotenv import load_dotenv
from os import environ
from pathlib import Path
dotenv_path = Path('.env')
load_dotenv(dotenv_path)


lg_key = environ.get("LG_KEY")  # public key to LG Inform Plus
lg_secret = environ.get("LG_SECRET")  # secret to LG Inform Plus
conf = ConfigManager()  # Use ConfigManager to save environment variables and proxy address if you want the information to be stored with this package
conf.update_config("lg_inform_key", lg_key)
conf.update_config("lg_inform_secret", lg_secret)
out_folder = Path('./data/mp_test/')  # folder to store final data
datasets = {'IMD_2010': 841, 'IMD_2009': 842, 'Death_of_enterprises': 102}  # a dictionary of datasets. The key can be any string, but the integer value must be an identifier from https://webservices.esd.org.uk/datasets?ApplicationKey=ExamplePPK&Signature=YChwR9HU0Vbg8KZ5ezdGZt+EyL4=

if __name__ == '__main__':  # mp_download() must be called under an if __name__ == '__main__' guard
    api_call = LGInform(area='E09000023,Lewisham_CIPFA_Near_Neighbours')
    # api_call.download(datasets=datasets, output_folder=out_folder, latest_n=20, drop_discontinued=False)  # normal, single-threaded download
    api_call.mp_download(datasets, output_folder=out_folder, latest_n=20, drop_discontinued=False, max_workers=8)

__init__(api_key=None, api_secret=None, proxies={}, area='E09000023,Lewisham_CIPFA_Near_Neighbours')

Initialise the class with an API key, API secret, proxy address, and area.

Parameters:

api_key (str) – Application Key to LG Inform Plus.
api_secret (str) – Application Secret to LG Inform Plus.
proxies (Dict[str, str]) – Proxy address if known.
area (str) – A comma-separated string of areas, with no whitespace. You can use either GSS codes or LG Inform's off-the-shelf area groups. For instance, Lewisham's GSS code is E09000023 and its CIPFA nearest neighbours group is called Lewisham_CIPFA_Near_Neighbours. Together these would be input as 'E09000023,Lewisham_CIPFA_Near_Neighbours'.

Returns:

None

_multiprocessing_wrapper(input_queue)

The same as the download() method, but wrapped for use with the multiprocessing library.

Parameters:: input_queue (mp.Queue) – A multiprocessing queue.
Return type:: None
Returns:: None
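A wrapper that takes an input queue typically runs a loop that pulls tasks off the queue until it receives a stop signal. The sketch below shows that general pattern and is not the package's implementation: queue_worker, launch_workers, the (dataset_name, dataset_id) task shape, the None sentinel, and the handle callable (standing in for a call to download()) are all assumptions for illustration.

```python
import multiprocessing as mp
from typing import Callable, Dict

# Hypothetical task handler type: receives (dataset_name, dataset_id).
Handler = Callable[[str, int], None]


def queue_worker(input_queue, handle: Handler) -> None:
    """Hypothetical worker: process (name, id) tasks until a None sentinel arrives."""
    while True:
        task = input_queue.get()  # blocks until a task is available
        if task is None:  # sentinel: no more work for this worker
            break
        name, identifier = task
        handle(name, identifier)


def launch_workers(datasets: Dict[str, int], handle: Handler, max_workers: int = 8) -> None:
    """Hypothetical launcher: fan datasets out to worker processes via a queue.

    `handle` must be a module-level function so it can be pickled for the workers.
    """
    task_queue = mp.Queue()
    for item in datasets.items():
        task_queue.put(item)
    # One sentinel per worker, queued after all tasks so every task is consumed first.
    for _ in range(max_workers):
        task_queue.put(None)
    workers = [mp.Process(target=queue_worker, args=(task_queue, handle)) for _ in range(max_workers)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```

As with mp_download(), code that calls launch_workers() should sit inside an if __name__ == '__main__': block so that child processes can import the main module safely.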

download(datasets, output_folder, latest_n=5, drop_discontinued=True)

Download all variables for one or more datasets, merging each dataset's variables into one table by area and time period.

Parameters:

datasets (Dict[str,int]) – Dictionary of format {"some_name": some_integer}, where the integer value is an identifier from https://webservices.esd.org.uk/datasets?ApplicationKey=ExamplePPK&Signature=YChwR9HU0Vbg8KZ5ezdGZt+EyL4=
output_folder (Path) – Folder in which to store the final data.
latest_n (int) – Number of latest periods to download; the period is currently restricted to the latest n periods. A period can be a year, quarter, month, week, or some other unit (e.g. for the Indices of Multiple Deprivation, the period refers to publications, so latest_n=2 would get data for 2019 and 2015). Default is 5.
drop_discontinued (bool) – If you set this to False, the downloaded data will include discontinued metrics. Default is True.
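The latest_n rule amounts to "sort the available periods and keep the last n". The sketch below illustrates that selection on plain values; latest_periods is a hypothetical name, and it assumes periods sort chronologically (as years do). It is not how the package or the LG Inform API actually chooses periods.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def latest_periods(periods: Sequence[T], latest_n: int) -> List[T]:
    """Hypothetical illustration of latest_n: keep the n most recent periods."""
    if latest_n < 1:
        raise ValueError("latest_n must be at least 1")
    # Assumes periods sort chronologically; slicing past the start returns everything.
    return sorted(periods)[-latest_n:]


# IMD publication years: latest_n=2 keeps [2015, 2019]
imd_publications = [2019, 2004, 2015, 2010, 2007]
print(latest_periods(imd_publications, 2))
```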

Return type:

None

Returns:

None

download_data_for_many_variables(variables, latest_n=20, arraytype='metricType-array')

Download the variables for an array of metricTypes using the download_variable_data() method.

Parameters:

variables (JSONDict) – Variables JSON returned by the get_dataset_table_variables() method.
latest_n (int) – Latest n periods. A period could be a year, quarter, month, week, or some other unit, such as the latest n publications.
arraytype (str) – Type of variables to download. Default is 'metricType-array'.

Returns:

A list of JSON variables.

Return type:

List[JSONDict]

download_variable_data(identifier, latest_n)

Download data for a given metricType, area, and period (latest n periods).

Parameters:

identifier (int) – metricType integer.
latest_n (int) – Latest n periods. A period could be a year, quarter, month, week, or some other unit, such as the latest n publications.

Returns:

Downloaded data as JSON.

Return type:

JSONDict

format_tables(outputs, drop_discontinued=True)

Format the data for each variable and create a metadata table.

Parameters:

outputs (List[JSONDict]) – A list of JSONDict objects containing downloaded variable data.
drop_discontinued (bool) – If True (default), discontinued metrics are dropped; set it to False to include them.

Return type:

None

Returns:

None
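The drop_discontinued switch can be pictured as a boolean filter on a metadata table. The sketch below is a hypothetical illustration only: filter_discontinued is an illustrative name, and the "discontinued" flag column is an assumption about how such metadata might be stored, not the package's actual metadata schema.

```python
import pandas as pd


def filter_discontinued(
    metadata: pd.DataFrame,
    drop_discontinued: bool = True,
    flag_column: str = "discontinued",  # hypothetical column name
) -> pd.DataFrame:
    """Hypothetical sketch: optionally drop rows flagged as discontinued metrics."""
    if not drop_discontinued:
        return metadata
    # Keep only rows whose flag is False, then renumber the index.
    return metadata.loc[~metadata[flag_column].astype(bool)].reset_index(drop=True)


meta = pd.DataFrame({"metricType": [101, 102, 103], "discontinued": [False, True, False]})
kept = filter_discontinued(meta)  # metricTypes 101 and 103 remain
```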

get_dataset_table_variables(dataset)

Given a dataset, output all of its metricType numbers (dataset columns) as a JSON dictionary.

Parameters:: dataset (int) – The number of the dataset from https://webservices.esd.org.uk/datasets?ApplicationKey=ExamplePPK&Signature=YChwR9HU0Vbg8KZ5ezdGZt+EyL4=
Returns:: A JSON dictionary object
Return type:: JSONDict

json_to_pandas(json_data)

Transform downloaded JSON data to a Pandas DataFrame.

Parameters:: json_data (JSONDict) – JSON data to transform.
Returns:: Downloaded data as Pandas dataframe.
Return type:: pd.DataFrame
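A common way to turn nested JSON records into a flat table is pandas.json_normalize, which expands nested dictionaries into separate columns. The sketch below assumes a hypothetical payload where the records sit under a "values" key and each area is a nested object; the real LG Inform response layout may differ, and this is not the package's json_to_pandas implementation.

```python
from typing import Any, Dict

import pandas as pd


def json_to_frame_sketch(json_data: Dict[str, Any], record_key: str = "values") -> pd.DataFrame:
    """Hypothetical sketch: flatten a list of nested JSON records into a DataFrame."""
    # Nested keys such as {"area": {"identifier": ...}} become "area_identifier".
    return pd.json_normalize(json_data[record_key], sep="_")


# Hypothetical payload shape, for illustration only.
payload = {
    "values": [
        {"area": {"identifier": "E09000023", "label": "Lewisham"}, "period": "2019", "value": 28.6},
    ]
}
frame = json_to_frame_sketch(payload)
```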

merge_tables(dataset_name)

Merge the variables to form a table for a given dataset.

Parameters:: dataset_name (str) – Dataset name string.
Returns:: All variables of the dataset merged as one Pandas dataframe.
Return type:: pd.DataFrame
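Merging several variables into one table by area and time period can be done with repeated outer joins on the shared key columns. The sketch below shows that approach; merge_variable_frames is a hypothetical name, and the "area" and "period" column names are assumptions rather than the package's actual column labels.

```python
from functools import reduce
from typing import List, Sequence

import pandas as pd


def merge_variable_frames(
    frames: List[pd.DataFrame], keys: Sequence[str] = ("area", "period")
) -> pd.DataFrame:
    """Hypothetical sketch: outer-join per-variable tables on area and period."""
    key_columns = list(keys)
    # Outer joins keep area/period rows that are missing from some variables (as NaN).
    return reduce(
        lambda left, right: pd.merge(left, right, on=key_columns, how="outer"),
        frames,
    )


imd = pd.DataFrame({"area": ["E1", "E2"], "period": [2019, 2019], "imd": [1.0, 2.0]})
deaths = pd.DataFrame({"area": ["E1"], "period": [2019], "deaths": [5.0]})
merged = merge_variable_frames([imd, deaths])  # E2 has NaN for deaths
```

An outer join is chosen here so that no area or period is silently dropped when one variable has gaps; an inner join would keep only rows present in every variable.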

mp_download(datasets, output_folder, latest_n=20, drop_discontinued=True, max_workers=8)

Multiprocessing method for downloading data for multiple datasets. The dataset dictionary is split into chunks of size max_workers, which are processed one after another.

Parameters:

datasets (Dict[str,int]) – Dictionary of format {"some_name": some_integer}, where the integer value is an identifier from https://webservices.esd.org.uk/datasets?ApplicationKey=ExamplePPK&Signature=YChwR9HU0Vbg8KZ5ezdGZt+EyL4=
output_folder (Path) – Folder in which to store the final data.
latest_n (int) – Number of latest periods to download; the period is currently restricted to the latest n periods. A period can be a year, quarter, month, week, or some other unit (e.g. for the Indices of Multiple Deprivation, the period refers to publications, so latest_n=2 would get data for 2019 and 2015). Default is 20.
drop_discontinued (bool) – If you set this to False, the downloaded data will include discontinued metrics. Default is True.
max_workers (int) – Number of workers for multiprocessing. Typically this is the number of logical CPUs in your system. The datasets are also processed in chunks of this size: if you list 16 datasets in your datasets dictionary and have 8 workers, the script works through the datasets in two steps (16/8 = 2). Default is 8.
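The chunking behaviour described for max_workers can be reproduced by slicing the dictionary's items into groups of at most max_workers. The sketch below illustrates the arithmetic only; chunk_datasets is a hypothetical helper, not a function exported by the package.

```python
from itertools import islice
from typing import Dict, List


def chunk_datasets(datasets: Dict[str, int], max_workers: int) -> List[Dict[str, int]]:
    """Hypothetical sketch: split a datasets dictionary into chunks of size max_workers."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    items = iter(datasets.items())
    chunks: List[Dict[str, int]] = []
    while True:
        # Take up to max_workers items; an empty chunk means we are done.
        chunk = dict(islice(items, max_workers))
        if not chunk:
            break
        chunks.append(chunk)
    return chunks


# 16 datasets with 8 workers -> 2 chunks of 8
sixteen = {f"dataset_{i}": i for i in range(16)}
steps = chunk_datasets(sixteen, max_workers=8)
```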

Return type:

None

Returns:

None

sign_url(url)

Sign a URL with your unique key and secret. Every request URL must be signed.

Parameters:: url (str) – URL to be signed.
Returns:: Signed URL.
Return type:: str
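The example dataset URL above carries an ApplicationKey parameter and a 28-character base64 Signature, which is consistent with a 20-byte HMAC-SHA1 digest. The sketch below shows a generic HMAC-SHA1 URL-signing scheme under that assumption. The exact scheme the LG Inform service expects (which part of the URL is signed, and how the secret is encoded) is not stated here, so treat sign_url_sketch as a hypothetical illustration, not the package's sign_url implementation. The signature is percent-encoded so that '+' and '=' survive query-string parsing.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlparse


def sign_url_sketch(url: str, api_key: str, api_secret: str) -> str:
    """Hypothetical sketch: append ApplicationKey, then an HMAC-SHA1 Signature of the URL."""
    # Use '&' if the URL already has a query string, otherwise start one with '?'.
    separator = "&" if urlparse(url).query else "?"
    unsigned = f"{url}{separator}ApplicationKey={quote(api_key, safe='')}"
    # Assumption: the signature is HMAC-SHA1 over the full unsigned URL, keyed by the secret.
    digest = hmac.new(api_secret.encode("utf-8"), unsigned.encode("utf-8"), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"{unsigned}&Signature={quote(signature, safe='')}"


signed = sign_url_sketch("https://webservices.esd.org.uk/datasets", "ExamplePPK", "my-secret")
```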