Recently, I created online maps of creative companies in Scottish Borders, the Lothians, Edinburgh and Fife (collectively ‘south-east Scotland’). This was commissioned by the Creative Informatics programme, which aims ‘to explore how data can be used to drive ground-breaking new products, businesses and experiences’, among other good things.
Without further ado, here are the maps:
Below I rant about the problems encountered in this project – they are almost all about the data.
By the way, Creative Informatics is also providing most of the funding for the Platform to Platform project which I will lead next year, so I had better say nice things about it and its contributors. And in truth, I can say nice things: it has been fun to work with Inge Panneels and Ingi Helgason. They answered my many questions during the project clearly and quickly, and were very responsive to my suggestions. They have written nice things about me, and they were very quick to sign off my invoice – thank you!
However, I can only say nasty things about the data they inherited and supplied to me. (That is, it’s not Ingi’s, Inge’s or Creative Informatics’ fault that I spent longer making the data usable than writing the mapping code.)
How’s it done?
I’m told that the data came from FAME from Bureau van Dijk, who in turn get their data from Companies House and supplement it with 118 market data. Companies House only collects data on companies, so the data mostly excludes sole traders, who make up a significant part of the creative workforce. There are other data sources but most are tightly gatekept.
Issues I could fix
For a start, the data was supplied in 48 separate spreadsheets, some of which had columns in different orders. So copying it all into a single spreadsheet for processing was less fun than it should have been.
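For anyone facing the same chore, here is a minimal sketch of how the merge could be scripted rather than done by hand. It assumes each spreadsheet has been exported to CSV with a header row and that the header names match across files; the function name and sample rows are hypothetical, not taken from the project.

```python
import csv
import io


def merge_csv_sources(sources):
    """Merge CSV texts whose columns may appear in different orders.

    csv.DictReader keys every row by header name, so the column order
    within any one source does not matter.
    """
    fieldnames = []  # union of all headers, in first-seen order
    rows = []
    for text in sources:
        reader = csv.DictReader(io.StringIO(text))
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
        rows.extend(reader)
    # Columns missing from a source are filled with empty strings
    merged = [{name: row.get(name, "") for name in fieldnames} for row in rows]
    return fieldnames, merged


# Hypothetical sample data standing in for two exported spreadsheets
sheet_a = "Company,Postcode\nAlpha Ltd,EH1 1AA\n"
sheet_b = "Postcode,Company\nKY16 9AA,Beta Ltd\n"
fieldnames, merged = merge_csv_sources([sheet_a, sheet_b])
```

Reading by header name rather than column position is what makes the differing column orders harmless. It would not help if the headers themselves were spelled differently from file to file.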
Then there was the SIC (standard industrial classification of economic activities) code data. This was supplied in this format. (Not that SIC codes are currently used in the maps, but I don’t like throwing away data.)
|Company||All SIC codes||Primary UK SIC code|
|<company name 1>||71111||71111|
|<company name 2>||62012||62012|
It took a lot of manual work to get to this format:
|Company||primary SIC code||2nd SIC code||3rd SIC code||4th SIC code|
|<company name 1>||71111|
|<company name 2>||62012||62020||62030||62090|
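The reshaping above could probably be scripted along these lines. This is only a sketch: it assumes the ‘All SIC codes’ cell holds the codes as runs of digits separated by spaces, commas or line breaks, which may not match the real export, and the function name is hypothetical.

```python
import re


def split_sic_codes(all_codes, primary):
    """Return a company's SIC codes with the primary code first.

    Assumes codes are runs of digits separated by any non-digit text.
    """
    codes = re.findall(r"\d+", all_codes)
    # dict.fromkeys removes repeats while keeping first-seen order,
    # so the primary code always ends up in the first column
    return list(dict.fromkeys([primary] + codes))


single = split_sic_codes("71111", "71111")
several = split_sic_codes("62012 62020\n62030, 62090", "62012")
```

Each returned list can then be written out as one row, padded with empty cells up to the ‘4th SIC code’ column.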
A few extra rows of data had been sourced using a Google form. This presented the data in another format that needed manual cutting and pasting to get it into the right format.
Company data also said whether companies were in ‘Edinburgh and the Lothians’, ‘Fife’ or ‘Scottish Borders’. Fortunately, Doogal’s batch geocoder, which I used to convert postcodes to latitudes and longitudes, also states which local authority each company is in. It also showed that some companies in the data are outwith south-east Scotland. (Some were in other parts of Scotland, but some were in south-east England!) So these ‘irrelevant’ companies were deleted from the data.
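Filtering on the geocoder’s local authority value might look like the sketch below. The column name `local_authority` and the exact council area spellings are assumptions, not taken from Doogal’s output, so both would need checking against the real file.

```python
# Assumed council area names; the geocoder's exact spellings may differ
SOUTH_EAST_SCOTLAND = {
    "Scottish Borders",
    "East Lothian",
    "Midlothian",
    "West Lothian",
    "City of Edinburgh",
    "Fife",
}


def split_by_area(rows, area_field="local_authority"):
    """Separate rows inside south-east Scotland from those outside it."""
    kept, dropped = [], []
    for row in rows:
        area = row.get(area_field, "").strip()
        (kept if area in SOUTH_EAST_SCOTLAND else dropped).append(row)
    return kept, dropped


# Hypothetical rows for illustration only
rows = [
    {"company": "Alpha Ltd", "local_authority": "Fife"},
    {"company": "Gamma Ltd", "local_authority": "Glasgow City"},
]
kept, dropped = split_by_area(rows)
```

Returning the dropped rows as well as the kept ones makes it easy to eyeball what is being thrown away before deleting it.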
Some company data omitted postcodes, but this was fairly quickly fixed by web-searching for the companies on the Companies House website and other online resources. But in at least one case, the Companies House website stated a street address that does not exist.
There are issues when converting SCCI codes (which were supplied for each company) to DCMS codes (which were not):
|SCCI codes||DCMS codes|
|advertising||advertising and marketing|
|visual art||museums, galleries and libraries|
|craft and antiques||crafts|
|fashion and textiles||design (product, graphic, fashion etc)|
|design||design (product, graphic, fashion etc)|
|performing arts||music, performing & visual arts|
|music||music, performing & visual arts|
|photography||film, TV, video, radio & photography|
|film and video||film, TV, video, radio & photography|
|computer games||no DCMS code|
|radio and tv||film, TV, video, radio & photography|
|writing and publishing||publishing|
|libraries and archives||no DCMS code|
|software and electronic publishing||tech – IT, software, hardware and computer services|
|cultural education||no DCMS code|
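Expressed as a lookup, the table above might look like this sketch. The dictionary and function names are hypothetical; `None` stands in for ‘no DCMS code’.

```python
# SCCI -> DCMS lookup transcribed from the table above.
# None marks SCCI codes that have no DCMS equivalent.
SCCI_TO_DCMS = {
    "advertising": "advertising and marketing",
    "visual art": "museums, galleries and libraries",
    "craft and antiques": "crafts",
    "fashion and textiles": "design (product, graphic, fashion etc)",
    "design": "design (product, graphic, fashion etc)",
    "performing arts": "music, performing & visual arts",
    "music": "music, performing & visual arts",
    "photography": "film, TV, video, radio & photography",
    "film and video": "film, TV, video, radio & photography",
    "computer games": None,
    "radio and tv": "film, TV, video, radio & photography",
    "writing and publishing": "publishing",
    "libraries and archives": None,
    "software and electronic publishing":
        "tech – IT, software, hardware and computer services",
    "cultural education": None,
}


def dcms_for(scci_code):
    """Look up the DCMS code for an SCCI code.

    Raises KeyError for an unrecognised SCCI code, so typos in the
    source data surface instead of being silently mapped to nothing.
    """
    return SCCI_TO_DCMS[scci_code.strip().lower()]
```

The lookup makes the awkward parts of the mapping easy to see: several SCCI codes collapse into one DCMS code, and three have no DCMS code at all.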
The data still contains a large number of duplicates, leading to many cases where a company has two or more markers on the SCCI map, one for each SCCI code in that company’s data. There is no obvious ‘programmatic’ way to resolve this duplication – I cannot decide which SCCI code(s) should be removed. For example:
|Company||SCCI code||DCMS code|
|<company name 3>||computer games||no DCMS code|
|<company name 3>||film and video||film, TV, video, radio & photography|
|<company name 3>||software and electronic publishing||tech etc|
|<company name 3>||visual art||museums, galleries and libraries|
|<company name 3>||writing and publishing||publishing|
This problem also occurs on the DCMS map, because each SCCI code has an equivalent DCMS code. So there are two markers on each map for many companies. On the DCMS map, a company may have two identical markers, even if its markers are different on the SCCI map. For example:
|Company||SCCI code||DCMS code|
|<company name 4>||design||design (product, graphic, fashion etc)|
|<company name 4>||fashion and textiles||design (product, graphic, fashion etc)|
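The identical-marker case, at least, could be handled mechanically. The sketch below keeps one row per distinct (company, DCMS code) pair; the field names are hypothetical. It does nothing for the harder SCCI problem, where the markers differ and someone has to decide which codes to keep.

```python
def unique_markers(rows, key_fields):
    """Keep the first row for each distinct combination of key_fields."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# The two example rows above, using the post's placeholder company name
rows = [
    {"company": "<company name 4>", "scci": "design",
     "dcms": "design (product, graphic, fashion etc)"},
    {"company": "<company name 4>", "scci": "fashion and textiles",
     "dcms": "design (product, graphic, fashion etc)"},
]
dcms_markers = unique_markers(rows, ["company", "dcms"])
scci_markers = unique_markers(rows, ["company", "scci"])
```

Keyed on the DCMS code, the two rows collapse to one marker. Keyed on the SCCI code, both survive, which is exactly the duplication described above.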
I am convinced the data isn’t complete. For example, I cannot believe that there are no creative companies in St Andrews. (I lived there for a long time – I think I’d have noticed if there weren’t any.) What about the south-west of Scottish Borders?
I am not convinced the maps are that helpful, mostly because they have so many duplicate markers and probably have a lot of missing data. Instead, I believe they give a rough flavour of what’s happening. They are also useful for showing the paucity of data available to government. This is important to me: without decent data, how can any government create and implement the right policies, or do the right amount of whatever it chooses to do? In brief:
Does the government know what it’s doing?