Usage

Installation

To use Consensus, first install it using pip directly:

(.venv) $ pip install -U Consensus

or from Github:

(.venv) $ python -m pip install git+https://github.com/Ilkka-LBL/Consensus.git

Configuration

To make full use of this package, you need to do two things: 1) configure your API keys and proxies. To help with this, there is a ConfigManager() class. And 2) build a lookup file of the tables stored in Open Geography Portal.

Creating the config.json file

Note that the modules and classes in Consensus rely on the keys provided in this config file. However, you can extend the config.json file with the update_config() method to use any other API keys and secrets that you may want. The file is stored as a json file and in Python you can work with nested dictionaries.

from Consensus.ConfigManager import ConfigManager

This class has three methods for saving, updating, and resetting the config.json file. The config.json file resides in the folder “Consensus/config” inside the package installation folder.

The default config.json contents are:

self.default_config = {
            "nomis_api_key": "",
            "lg_inform_key": "",
            "lg_inform_secret": "",
            "proxies": {
               "http": "",
               "https": ""
            }
      }

For the DownloadFromNomis() class to function, you must provide at least the “nomis_api_key” API key, which you can get by signing up on www.nomisweb.co.uk and heading to your profile settings. Likewise, for LGInform() class to function, you must provide the “lg_inform_key” and “lg_inform_secret”, although your institution must also have signed up to use their platform.

Minimum configuration example using Nomis:

from Consensus.ConfigManager import ConfigManager

config_dict = {"nomis_api_key": "your_new_api_key_value"}

conf = ConfigManager()
conf.update_config(config_dict)

If you also want to add proxies:

from Consensus.ConfigManager import ConfigManager

config_dict = {
               "nomis_api_key": "your_new_api_key_value",
               "proxies": {'http': "your_proxy", 'https': "your_proxy"}
            }

conf = ConfigManager()
conf.update_config(config_dict)

Building a lookup table for Open Geography Portal

Building a Open_Geography_Portal_lookup.json file and the accompanying Open_Geography_Portal.pickle file is necessary if you want to make full use of the capabilities of this package. The Open_Geography_Portal_lookup.json file is used by the class Consensus.GeocodeMerger.SmartLinker() to search for the quickest path from your starting table (through the starting_columns attribute) to the ending table (via ending_columns attribute), whereas the Open_Geography_Portal.pickle file is used by the Consensus.EsriConnector.FeatureServer() to download the filtered data. While accessing Open Geography Portal directly through the combination of Consensus.EsriServers.OpenGeography() and Consensus.EsriConnector.FeatureServer() classes gives you more control over what you download, it is also more tedious as you will have to write longer scripts, particularly when accessing several datasets that you want to merge.

You can create Open_Geography_Portal_lookup.json and Open_Geography_Portal.pickle files by running the below snippet:

from Consensus.EsriServers import OpenGeography
import asyncio

def main():
   og = OpenGeography(max_retries=30)
   asyncio.run(og.build_lookup(replace_old=True))

if __name__ == "__main__":
   main()

or inside Jupyter notebook cells:

from Consensus.EsriServers import OpenGeography
import asyncio

og = OpenGeography(max_retries=30)
await og.build_lookup(replace_old=True)

Note that Open Geography Portal uses ESRI servers and they do not always respond to queries. To circumvent the non-responsiveness, we set max_retries=30. On rare occasions, this is not enough and you may have to increase the number of retries. The max_retries argument sets the number of times the class tries to load the list of all services available in the server, as well as how many times each layer of any given server is attempted to load into the class’ service_table attribute. This attribute is key to building the lookup table and the accompanying pickle file.

Explore UK geographies

The package contains a GeoHelper() class that is designed to help you understand UK geographies and select the starting and ending columns when using SmartLinker().run_graph(). This class also relies on the Open Geography Portal lookup file.

from Consensus.GeocodeMerger import GeoHelper

gh = GeoHelper()
print(gh.geography_keys())  # outputs a dictionary of explanations for nearly all UK geographic units.
print(gh.available_geographies())  # outputs all geographies currently available in the lookup file.
print(gh.geographies_filter('WD'))  # outputs all columns referring to wards.

Please note that the geography_keys() method does not explain all geographies as explanations were not always available when developing this method.

Example pipeline

Let’s say you’ve explored the geographies using GeoHelper() and decided you want to map Tenure data from Census 2021 to 2022 wards. Using GeoHelper().geographies_filter('WD') you found the column WD22CD. To download the relevant geometry, you can do the following:

async def get_data():
   gss = SmartLinker()
   gss.allow_geometry('geometry_only')  # use this method to restrict the graph search space to tables with geometry.
   gss.run_graph(starting_columns=['WD22CD'], ending_columns=['LAD22CD'], geographic_areas=['Lewisham', 'Southwark'], geographic_area_columns=['LAD22NM'])  # you can choose the starting and ending columns using ``GeoHelper().geographies_filter()`` method.
   codes = await gss.geodata(selected_path=0, chunk_size=50)  # the selected path is the first in the list of potential paths output by ``run_graph()`` method. Increase chunk_size if your download is slow and try decreasing it if you are being throttled (or encounter weird errors).
   print(codes['table_data'][0])  # the output is a dictionary of ``{'path': [[table1_of_path_1, table2_of_path1], [table1_of_path2, table2_of_path2]], 'table_data':[data_for_path1, data_for_path2]}``
   return codes['table_data'][0]
ward_geos = asyncio.run(get_data())

From here, you can take the WD22CD column from ward_geos and use it as input to the Consensus.Nomis.DownloadFromNomis() class if you wanted to:

from Consensus.Nomis import DownloadFromNomis
from Consensus.ConfigManager import ConfigManager
from dotenv import load_dotenv
from pathlib import Path
from os import environ

# get your API keys and proxy settings from .env file
dotenv_path = Path('.env')  # assuming .env file is in your working directory
load_dotenv(dotenv_path)
api_key = environ.get("NOMIS_API")  # assuming you've saved the API key to a variable called NOMIS_API
proxy = environ.get("PROXY") # assuming you've saved the proxy address to a variable called PROXY

# set up your config.json file - only necessary the first time you use the package
config = {
         "nomis_api_key": api_key,  # the key for NOMIS must be 'nomis_api_key'
         "proxies": {"http": proxy,  # you may not need to provide anything for proxy
                     "https": proxy}  # the http and https proxies can be different if your setup requires it
         }
conf = ConfigManager()
conf.save_config()

# establish connection
nomis = DownloadFromNomis()
nomis.connect()

# print all tables from NOMIS
nomis.print_table_info()

# Get more detailed information about a specific table. Use the string starting with NM_* when selecting a table.
# In this case, we choose TS054 - Tenure from Census 2021:
nomis.detailed_info_for_table('NM_2072_1')  #  TS054 - Tenure

# If you want the data for all geographies:
df_bulk = nomis.bulk_download('NM_2072_1')
print(df_bulk)

# And if you want just an extract for a specific geography, in our case all wards in Lewisham and Southwark:
geography = {'geography': list(ward_geos['WD22CD'].unique())}  # you can extend this list
df_lewisham_and_southwark_wards = nomis.download('NM_2072_1', params=geography)  # note that this method falls back to using bulk_download() if it fails for some reason and then applies the geography filter.
print(df_lewisham_and_southwark_wards)

Now you have both the geopandas GeoDataFrame() of the wards and the TS054 - Tenure data for the wards and you’re free to create maps and graphs as you like.