How long will you wait for an email? (In a geospatial context)
The most frequent question I hear is "Why is Esri so slow?". I'm currently preparing a slide deck for some workshops this week, I know I'm going to be asked this question, and so I'm preparing a demo on the subject. I thought I'd share a quick taste of the content with a wider audience.
This blog post is going to show you that Esri isn't slow! I'm going to use a classic NZ dataset to show that it is the data that is slowing Esri down.
"You might have a Ferrari, but if you're stuck behind a tractor, then it's not that your car isn't fast, it's simply a case that some 3rd Party influence is slowing you."
I'm going out on a limb here, but I cannot say strongly enough that the way 'you' prepare and store 'your' data is the biggest influence on performance. It's not always about the tech, the hardware or the network, or the IT guys and gals. It's nice or even comforting to blame them, but performance is very often down to how you curate your geospatial data.
Have you ever used the LRIS Koordinates site and downloaded the data for LCDB? First, I love that you can search for and find data so quickly and easily. As a part of my demo prep, I downloaded the LCDB 5.0 dataset yesterday (as a file geodatabase) and plugged it into ArcGIS Pro on a grunty laptop with lots of memory and a fast disk. Esri is slow to draw it, bother! There is no network in the way of consuming this data; I'm using an 8th Generation i7 processor, 32GB of RAM and an NVMe SSD. I can't blame the hardware.
When I started to investigate the actual data, I found things like this polygon:
The above polygon has an outer edge, and I'm not sure how many doughnut holes sit within it. When you use the "Calculate Geometry" tool you can determine the number of vertices in the polygon. This one polygon has 983,721 coordinate pairs.
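If you would rather check the count outside ArcGIS Pro, here is a minimal sketch in plain Python. It assumes the polygon has been exported in Esri JSON form, where the geometry carries a "rings" list (one outer ring plus one ring per doughnut hole). The function name and the toy polygon are hypothetical, made up purely for illustration.

```python
def count_rings_and_vertices(geometry):
    """Return (ring_count, vertex_count) for an Esri-JSON-style polygon dict.

    Assumes the geometry looks like {"rings": [[[x, y], ...], ...]}.
    The closing vertex of each ring (a repeat of the first) is counted too.
    """
    rings = geometry.get("rings", [])
    return len(rings), sum(len(ring) for ring in rings)


# Hypothetical toy polygon: one outer square with one square hole.
toy_polygon = {
    "rings": [
        [[0, 0], [0, 10], [10, 10], [10, 0], [0, 0]],  # outer edge
        [[2, 2], [4, 2], [4, 4], [2, 4], [2, 2]],      # doughnut hole
    ]
}

ring_count, vertex_count = count_rings_and_vertices(toy_polygon)
print(f"{ring_count} rings, {vertex_count} coordinate pairs")
```

Run against a real export, the same function would answer both questions at once: how many doughnuts, and how many coordinate pairs.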
So having selected the polygon, I used the "Features to JSON" tool to extract the attributes and geometry. The resultant JSON (text) file is 37.2MB - for one polygon - not all the polygons in the image above, just that one blue polygon. Why is it so big? Well, each of the nearly 1 million coordinate pairs is stored to many decimal places:
So let me ask this, do you need accuracy greater than 1cm for a dataset like this? If we slimmed the above coordinate pair to centimetre accuracy then we see:
Then we almost halve the amount of data that we're storing! How much faster would that be?
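To get a feel for the saving, here is a rough sketch in plain Python. It builds a made-up ring of random coordinates (hypothetical values, not taken from LCDB), assumes the coordinates are in metres so that two decimal places means centimetres, and compares the length of the JSON text before and after rounding. The helper name `round_ring` is my own invention.

```python
import json
import random


def round_ring(ring, decimals=2):
    """Round every coordinate in a ring; 2 decimals = centimetres if units are metres."""
    return [[round(x, decimals), round(y, decimals)] for x, y in ring]


# Hypothetical data: 1,000 random coordinate pairs at full floating-point precision.
random.seed(0)
ring = [
    [1_700_000 + random.random() * 1000, 5_900_000 + random.random() * 1000]
    for _ in range(1000)
]

full_text = json.dumps(ring)              # every decimal place kept
slim_text = json.dumps(round_ring(ring))  # centimetre precision only

saving = 1 - len(slim_text) / len(full_text)
print(f"full: {len(full_text):,} chars, slim: {len(slim_text):,} chars")
print(f"saving: {saving:.0%}")
```

The exact saving depends on how many digits sit in front of the decimal point and how the text is formatted, so treat the percentage as indicative rather than a promise.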
By the time you take the attributes (a few hundred characters) and add the hundreds of thousands of vertices to it, there are a staggering 39,078,176 characters stored in the JSON file! That's massive. One polygon on a map, but nearly 40 million characters of information to describe it. Now bear in mind there are over 500,000 individual polygons in the whole dataset - though to be fair there are not many that are nearly as big as that.
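A quick back-of-envelope check on those numbers, as a small Python sketch. The character and vertex counts are the ones quoted above, and the page count is the 7,197 pages the Word paste produced. The two assumptions are mine: that each character takes one byte (plain ASCII text), and that the few hundred characters of attributes are small enough to ignore.

```python
total_chars = 39_078_176   # characters in the JSON file
vertex_count = 983_721     # coordinate pairs in the polygon
page_count = 7_197         # pages once pasted into Word

# Average text cost of one coordinate pair, brackets and separators included.
chars_per_pair = total_chars / vertex_count

# File size, assuming 1 byte per character and 1 MB = 1024 * 1024 bytes.
size_mb = total_chars / (1024 * 1024)

# How many characters end up on each page.
chars_per_page = total_chars / page_count

print(f"{chars_per_pair:.1f} characters per coordinate pair")
print(f"{size_mb:.1f} MB of text")
print(f"{chars_per_page:,.0f} characters per page")
```

That works out at roughly 40 characters to describe a single vertex, and the size lines up with the 37.2MB file on disk.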
Let's bring this back to the title of this blog. If I copy the JSON file (for this one polygon) and paste it to a Word document, and use A4, standard margins, Calibri size 10, then:
Word just about crashes and takes a very long time for the paste operation (>10 minutes)
There are 7,197 pages.
You can see in the following image that on the first page the attribution and metadata take up half the page. The geometry then starts (highlighted) and carries on for 7,196 pages.
The size of the geometry is directly proportional to decisions made regarding database precision, and the size and complexity of the polygon.
If you attach that 7,197-page MS Word document to an email and send it, how long will you wait? How does that expectation differ from your expectation for seeing it draw on a map?
Would you complain about MS Word performing badly with such a large document? If you had to email that document to a team member, then would you complain if it wasn't there within one second? Experience suggests the answer would be 'No' to both questions. The expectations of MS Word and Outlook/Exchange are much lower:
The image above shows the memory consumption and processing in MS Word for this copy/paste. I had been waiting ~10 minutes at this point in time. I think this illustrates that large datasets take time and resources to process. My assumption is that the poor performance is due to grammar/spell-check functions. The data at this point is pure text.
So why do we expect so much from Esri, and complain so willingly, when the actual problem is the way that we structure and store the data?
There are many tips and tricks that can be used to remedy this situation.
I've tried hard to brand geoworx as an ArcGIS Enterprise consulting/architectural company. What I haven't presented so well is the fact that to make ArcGIS Enterprise 'fly' you need to get the data right that sits beneath it. I consult at this level as well, and this is actually one of the biggest requests for my services that I'm getting right now.
If data takes a long time to process then it's chewing up your CPU. That's a no-brainer. So, rather than throwing more CPU and memory, or more servers, at your ArcGIS Enterprise deployment, why not review your data and tune that instead? If your ArcGIS Enterprise is able to read optimised data then it stands to reason that it will have to process less, and therefore go faster and offer more Return on Investment.
Don't be slow to email me and ask for a discussion. email@example.com
Also please keep an eye on future blogs. I haven't even started to talk about disk types, geodatabase types, the use of other tools to improve data or the effects of networks or server types.
The use of GIS is on an Enterprise scale. You need to leverage your 'enterprise' to make GIS truly work.