Where should I put my data? Going down a black hole...
Updated: Jan 22
A regular consulting issue I have faced over the years is 'why is Esri so slow?' Generally it doesn't take too long to prove that Esri isn't slow, but that something else in the enterprise is! There's a multitude of reasons for this, and over time I'll examine a number of them in blogs, but today: 'Where should I put my data?'
Depending on who you ask there will be different answers, so let's just say that 'SQL Server' and 'IaaS' are common choices, and in principle they're great decisions. SQL Server is robust, well understood, well supported and pervasive across many NZ customers. IaaS offers resilience, performance and centralised management on, often, top quality server hardware. So how could that combination possibly go wrong?
I'm going to discuss alternatives to databases in another post, so let's assume for today that you're using SQL Server within desktop editing workflows and that you're using NZ based IaaS. I've consulted with users who have powerful workstation class machines, the latest version of ArcGIS Desktop (or Pro), and SQL Server in a really good data centre. Yet their editing experience is slow, if not really slow.
Often the problem is not the software, the machine, or the server at the other end. Instead it is the network between the two. Two important factors come into play: bandwidth and latency. Bandwidth basically equates to the diameter of the pipe, and latency to how long that pipe is. Data travels at speeds approaching the speed of light, so it should be fast, right? For me the speed of light is intangible, so let's bring it down to the speed of a vehicle, something we can relate to.
Imagine you have a car that travels at a fixed speed of 100 km/h. The longer the tunnel, the longer it takes to get through it; that's simple physics. I once had a customer based in Wellington. They had databases in their building and performance was acceptable, even on old servers. They then moved to modern servers in IaaS at the other end of the city, while the GIS software stayed on their local machines, and their editing processes slowed down. Make a tunnel longer and it takes longer to get from one end to the other (given the vehicle has a fixed speed). For Civil Defence reasons they later moved the database to Hamilton, and things got really slow. It doesn't matter that data travels at close to the speed of light: the further away the data is, the slower things get. Imagine what would happen if you moved your database to Amazon Web Services or MS Azure (both in Australia) but kept your software on your desktop. If you ping a local server the response should take less than 1ms; the further away the server, the greater the response time. This issue is latency.
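To put rough numbers on this, here's a minimal sketch (in Python, with made-up figures; real workloads vary widely) of why latency hurts desktop editing so much. Editing workflows are 'chatty': they issue many small queries, and every single one pays the full network round-trip time (RTT).

```python
# Back-of-the-envelope: total time a session spends waiting on the network
# is roughly (number of round trips) x (round-trip time). The figures below
# are illustrative assumptions, not measurements.

def session_network_time(round_trips: int, rtt_ms: float) -> float:
    """Seconds a session spends waiting on the network."""
    return round_trips * rtt_ms / 1000.0

# Suppose an editing session makes 10,000 small queries (an assumption):
local = session_network_time(10_000, 0.5)    # server in the building: 5 s
city = session_network_time(10_000, 5.0)     # across the city: 50 s
overseas = session_network_time(10_000, 35.0)  # hosted overseas: 350 s
```

The same work, on the same machine, against the same database, goes from seconds of waiting to minutes purely because each tiny request travels further.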
You may have lots of capacity in your office network, and I guarantee that in your IaaS data centre there will be lots of capacity. But what about the pipe between the two? The fatter the pipe, the more you can get down it, but fatter pipes cost more than thinner ones.
If you're approaching a tunnel on a two or three lane road, but the tunnel bore ahead has only one lane, then you're going to be forced to slow down if there is a lot of traffic. Merging and competing to get ahead of one another is common on NZ roads, and your connection to the data centre is no different. If a new bore is drilled so that two lanes can continue through a twin bore, there is no reason to slow down. Increased bandwidth in this situation can therefore be really useful.
I once saw an organisation that, in response to a slowdown in performance (due to latency), took on a number of temporary employees to get through a data processing backlog. The impact was like doubling the number of cars on the road, all heading for the same single bore tunnel. More editors caused more network traffic. Instead of increasing the amount of work completed, they actually made things even slower!
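The tunnel analogy can be sketched with simple arithmetic too. The numbers below are hypothetical, and splitting bandwidth evenly between users is a deliberate simplification, but it shows how a bulk transfer slows when more editors share one pipe.

```python
# Illustrative only: time to move a dataset over a shared link, assuming
# the link's bandwidth divides evenly between concurrent users.

def transfer_seconds(size_mb: float, bandwidth_mbps: float, sharers: int = 1) -> float:
    """Seconds to move size_mb megabytes over a bandwidth_mbps link
    shared evenly between `sharers` users (a simplification)."""
    megabits = size_mb * 8  # megabytes -> megabits
    return megabits / (bandwidth_mbps / sharers)

# A 500 MB extract over a 100 Mbit/s link:
alone = transfer_seconds(500, 100)        # 40 s with the pipe to yourself
shared = transfer_seconds(500, 100, 10)   # 400 s when ten editors compete
```

Ten times the editors means, at best, ten times the wait for each of them, which is exactly why hiring more people to push data through the same pipe made that backlog worse.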
Another thing to bear in mind with all of this network traffic is that some organisations prioritise certain traffic through 'the pipe'. For instance, VoIP may be considered the most important thing, so a percentage of the bandwidth is ring-fenced for VoIP to the exclusion of GIS data. This is like having a two lane road and a two lane tunnel, but saying that one lane in the tunnel can only be used by buses! All the cars still have to squeeze into a single lane.
There are multiple fixes for this, and geoworx can help you decide which is best for your situation. The key thing to remember is data gravity, hence the comment about going down a black hole: either the data needs to move to the process, or the process needs to move to the data. This can mean keeping a local copy of the data for analytics/editing, or moving your editing workflows into the data centre. There are other options too, and geoworx can work with you to explore them.
A common customer reaction is to buy a bigger engine. That isn't always the best option...
I've tried to break this down into concepts and language that are accessible to anyone, but if you want to geek out a bit more then Esri have provided the detail behind this story here. I'm happy to discuss these concepts if needed. I'll follow up with more thoughts on data in later blog posts.