Recently, I created online maps of creative companies in Scottish Borders, the Lothians, Edinburgh and Fife (collectively ‘south-east Scotland’). This was commissioned by the Creative Informatics programme, which aims ‘to explore how data can be used to drive ground-breaking new products, businesses and experiences’, among other good things.
Without further ado, here are the maps:
Below I rant about the problems encountered in this project – they are almost all about the data.
By the way, Creative Informatics is also providing most of the funding for the Platform to Platform project which I will lead next year, so I had better say nice things about it and its contributors. And in truth, I can say nice things: it has been fun to work with Inge Panneels and Ingi Helgason. They answered my many questions during the project clearly and quickly, and were very responsive to my suggestions. They have written nice things about me, and they were very quick to sign off my invoice – thank you!
However, I can only say nasty things about the data they inherited and supplied to me. (That is, it’s not Ingi’s, Inge’s or Creative Informatics’ fault that I spent longer making the data usable than writing the mapping code.)
How’s it done?
It’s all based on the wonderful Leaflet library, and uses the MarkerCluster plugin. Leaflet provides a great tutorial: following that, and given the data, anyone could have produced these maps. I had a head-start: I’d already used this software to map community councils, SFC-funded GCRF projects and RIVAL network members. So I just had to adapt the codebase from one of these previous maps. Then I collated the data into a large spreadsheet, and used concatenation functions in Excel to produce lines of JavaScript, one for each row of data. There is also a JavaScript ‘programme’ for each map. The programmes are called by the web-pages: they ingest the data and invoke Leaflet to draw maps and markers on the web-pages.
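The spreadsheet step can be pictured in code. The sketch below does in JavaScript roughly what the Excel concatenation formulas did: it turns each row of data into one line of JavaScript that adds a Leaflet marker. The field names (`name`, `lat`, `lng`), the coordinates, the `clusterGroup` variable and the exact shape of the generated line are all assumptions for illustration; the real maps may well use a different format.

```javascript
// Hypothetical rows standing in for the collated spreadsheet.
// The names and coordinates are made up for illustration.
const rows = [
  { name: "<company name 1>", lat: 55.95, lng: -3.19 },
  { name: "<company name 2>", lat: 56.34, lng: -2.8 },
];

// Build one line of JavaScript per row, as the Excel concatenation did.
// JSON.stringify quotes the name and escapes any quotation marks inside it.
// clusterGroup is assumed to be a layer created elsewhere with
// L.markerClusterGroup() from the MarkerCluster plugin.
function rowToMarkerLine(row) {
  const popup = JSON.stringify(row.name);
  return `L.marker([${row.lat}, ${row.lng}]).bindPopup(${popup}).addTo(clusterGroup);`;
}

const lines = rows.map(rowToMarkerLine);
console.log(lines.join("\n"));
```

The generated lines would then be pasted into (or loaded by) the map's JavaScript file, after the map and the cluster layer have been created.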
Data source
I’m told that the data came from FAME from Bureau van Dijk, who in turn get their data from Companies House and supplement it with 118 market data. Companies House only collects data on companies, so the data mostly excludes sole traders, who make up a significant part of the creative workforce. There are other data sources but most are tightly gatekept.
Issues I could fix
For a start, the data was supplied in 48 separate spreadsheets, some of which had columns in different orders. So copying it all into a single spreadsheet for processing was less fun than it should have been.
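I did this merge by hand, but the column-order problem is one that code handles well. Here is a minimal sketch, assuming each spreadsheet has been read into an array of rows with the column headers in the first row. The column names (`Company`, `Postcode`) and the function names are hypothetical.

```javascript
// Turn one sheet into records keyed by column header, so that the
// order of the columns in that sheet no longer matters.
function sheetToRecords(sheet) {
  const [header, ...body] = sheet;
  return body.map((cells) =>
    // Cells missing at the end of a short row become empty strings.
    Object.fromEntries(header.map((column, i) => [column, cells[i] ?? ""]))
  );
}

// Combine any number of sheets into one list of records.
function mergeSheets(sheets) {
  return sheets.flatMap(sheetToRecords);
}

// Two hypothetical sheets with the same columns in different orders.
const sheetA = [
  ["Company", "Postcode"],
  ["<company name 1>", "<postcode 1>"],
];
const sheetB = [
  ["Postcode", "Company"],
  ["<postcode 2>", "<company name 2>"],
];

console.log(mergeSheets([sheetA, sheetB]));
```

This only works if the sheets spell their headers identically; any renamed columns would still need mapping by hand.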
Then there was the SIC (standard industrial classification of economic activities) code data. It was supplied in the following format. (Not that SIC codes are currently used in the maps, but I don’t like throwing away data.)
Company | All SIC codes | Primary UK SIC code |
<company name 1> | 71111 | 71111 |
<company name 2> | 62012 | 62012 |
| 62020 | |
| 62030 | |
| 62090 | |
It took a lot of manual work to get to this:
Company | primary SIC code | 2nd SIC code | 3rd SIC code | 4th SIC code |
<company name 1> | 71111 | | | |
<company name 2> | 62012 | 62020 | 62030 | 62090 |
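For anyone facing the same layout, the reshaping could probably be scripted rather than done by hand. The sketch below assumes the supplied rows look like the first table above: a company name appears only on the first row of its block, and continuation rows leave the name blank and carry one further SIC code. It also assumes the first code listed is the primary one, as in the sample. The function name is hypothetical.

```javascript
// Rows in the supplied layout: [company, SIC code, primary SIC code].
const supplied = [
  ["<company name 1>", "71111", "71111"],
  ["<company name 2>", "62012", "62012"],
  ["", "62020", ""],
  ["", "62030", ""],
  ["", "62090", ""],
];

// Collapse the blocks into one row per company:
// [company, primary SIC code, 2nd SIC code, 3rd SIC code, ...].
function toOneRowPerCompany(rows) {
  const result = [];
  for (const [company, sicCode] of rows) {
    if (company !== "") {
      // A named row starts a new company.
      result.push([company, sicCode]);
    } else if (result.length > 0) {
      // A blank-name row adds a code to the current company.
      result[result.length - 1].push(sicCode);
    }
  }
  return result;
}

console.log(toOneRowPerCompany(supplied));
```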
A few extra rows of data had been sourced using a Google form. This presented the data in another format that needed manual cutting and pasting to get it into the right format.
Company data also said whether companies were in ‘Edinburgh and the Lothians’, ‘Fife’ or ‘Scottish Borders’. Fortunately, Doogal’s batch-geocoder, which I used to convert postcodes to latitudes and longitudes, also states which local authority each company is in. It also showed that some companies in the data are outwith south-east Scotland. (Some were in other parts of Scotland, but some were in south-east England!) So these ‘irrelevant’ companies were deleted from the data.
Some company data omitted postcodes, but this was fairly quickly fixed by web-searching for the companies on the Companies House website and other online resources. But in at least one case, the Companies House website stated a street address that does not exist.
There are issues when converting SCCI codes (which were supplied for each company) to DCMS codes (which were not): some SCCI codes have no DCMS equivalent, and several SCCI codes map to the same DCMS code.
SCCI codes | DCMS codes |
advertising | advertising and marketing |
architecture | architecture |
visual art | museums, galleries and libraries |
craft and antiques | crafts |
fashion and textiles | design (product, graphic, fashion etc) |
design | design (product, graphic, fashion etc) |
performing arts | music, performing & visual arts |
music | music, performing & visual arts |
photography | film, TV, video, radio & photography |
film and video | film, TV, video, radio & photography |
computer games | no DCMS code |
radio and tv | film, TV, video, radio & photography |
writing and publishing | publishing |
libraries and archives | no DCMS code |
software and electronic publishing | tech – IT, software, hardware and computer services |
cultural education | no DCMS code |
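In code, the conversion is just a lookup. The sketch below copies the table above into a JavaScript `Map`, with `null` standing for ‘no DCMS code’. The names `SCCI_TO_DCMS` and `dcmsFor` are hypothetical, not taken from the maps’ actual code.

```javascript
// SCCI code -> DCMS code, copied from the table above.
// null marks the SCCI codes that the table lists as having no DCMS code.
const SCCI_TO_DCMS = new Map([
  ["advertising", "advertising and marketing"],
  ["architecture", "architecture"],
  ["visual art", "museums, galleries and libraries"],
  ["craft and antiques", "crafts"],
  ["fashion and textiles", "design (product, graphic, fashion etc)"],
  ["design", "design (product, graphic, fashion etc)"],
  ["performing arts", "music, performing & visual arts"],
  ["music", "music, performing & visual arts"],
  ["photography", "film, TV, video, radio & photography"],
  ["film and video", "film, TV, video, radio & photography"],
  ["computer games", null],
  ["radio and tv", "film, TV, video, radio & photography"],
  ["writing and publishing", "publishing"],
  ["libraries and archives", null],
  ["software and electronic publishing", "tech – IT, software, hardware and computer services"],
  ["cultural education", null],
]);

// Returns the DCMS code, null for 'no DCMS code', or undefined when
// the SCCI code is not in the table at all (for example, a typo).
function dcmsFor(scciCode) {
  return SCCI_TO_DCMS.get(scciCode);
}

console.log(dcmsFor("photography"));
console.log(dcmsFor("computer games"));
```

Keeping `null` and `undefined` apart makes it possible to tell ‘this SCCI code has no DCMS equivalent’ from ‘this SCCI code is misspelt in the data’.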
Remaining issues
Duplication
The data still contains a large number of duplicates, leading to many cases where a company has two or more markers on the SCCI map, one for each SCCI code in that company’s data. There is no obvious ‘programmatic’ way to resolve this duplication – I cannot decide which SCCI code(s) should be removed. For example:
Company | SCCI code | DCMS code |
<company name 3> | computer games | no DCMS code |
<company name 3> | film and video | film, TV, video, radio & photography |
<company name 3> | software and electronic publishing | tech etc |
<company name 3> | visual art | museums, galleries and libraries |
<company name 3> | writing and publishing | publishing |
This problem also occurs on the DCMS map, because each SCCI code has an equivalent DCMS code. So there are two markers on each map for many companies. On the DCMS map, a company may have two identical markers, even if its markers are different on the SCCI map. For example:
Company | SCCI code | DCMS code |
<company name 4> | design | design (product, graphic, fashion etc) |
<company name 4> | fashion and textiles | design (product, graphic, fashion etc) |
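The narrower case of identical markers on the DCMS map does look scriptable. One possible approach, sketched below, keeps only the first row for each distinct pairing of company and DCMS code. This does not settle which SCCI code is the ‘right’ one; it only removes exact repeats on the DCMS map. The field names (`company`, `scci`, `dcms`) and the function name are assumptions for illustration.

```javascript
// Keep the first row for each (company, DCMS code) pair and drop the rest.
function dedupeByDcms(rows) {
  const seen = new Set();
  return rows.filter(({ company, dcms }) => {
    // JSON.stringify gives an unambiguous key for the pair.
    const key = JSON.stringify([company, dcms]);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// The example from the table above.
const companyFour = [
  { company: "<company name 4>", scci: "design", dcms: "design (product, graphic, fashion etc)" },
  { company: "<company name 4>", scci: "fashion and textiles", dcms: "design (product, graphic, fashion etc)" },
];

console.log(dedupeByDcms(companyFour));
```

A company whose SCCI codes map to different DCMS codes would still get one marker per DCMS code, so this reduces the duplication rather than removing it.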
WordPress
There are plugins to make WordPress use Leaflet. If I’d got these to work, the maps could have been on Creative Informatics’ WordPress-based website. However, as far as I can see, the WordPress plugins don’t support clustering or the selector-panels in the top-right of my maps. Also, the plugins will display 10, or 100, or even 1000 markers, but when I tried to display all 9000 markers, my browsers crashed. Hence the maps are hosted on the Edinburgh Napier University School of Computing ‘projects’ server as HTML, CSS and JavaScript files.
Missing data?
I am convinced the data isn’t complete. For example, I cannot believe that there are no creative companies in St Andrews. (I lived there for a long time – I think I’d have noticed if there weren’t any.) What about the south-west of Scottish Borders?
Conclusion
I am not convinced the maps are that helpful, mostly because they have so many duplicate markers and probably have a lot of missing data. Instead, I believe they give a rough flavour of what’s happening. They are also useful for showing the paucity of data available to government. This is important to me: without decent data, how can any government create and implement the right policies, or do the right amount of whatever it chooses to do? In brief:
Does the government know what it’s doing?