Geospatial Software Development – The Geometries

Featured image credit to OpenStreetMap contributors, full copyright information available here.

For the past few years I’ve been working in the Geospatial domain, developing various Geographic Information Systems (GIS). Although some consider GIS a bit of a dark art, and although in some ways they’re right, it’s actually fairly straightforward to get to grips with the underlying concepts.

The Data

Geographic data is broadly found in two different representations. Firstly you have raster data, which are images. If you bought a map in a shop you would likely be looking at a raster representation of some geographic data. The other representation is vector data. I’m sure there’s probably a super-accurate mathsy definition of vector data, but you’re really just dealing with geometries — shapes. It’s common for geospatial data to be stored in vector form and then used to create rasters. We’ll focus on the vector representation of geospatial data here.

The Geometries

We need a standard way of representing the vector data. Just like the “basic” programming types such as Boolean, String, Integer etc., there is a set of geometry types which you’ll see across languages and GIS projects. The geometries available tend to differ slightly depending on what you’re developing with, so we’ll cover the more common ones here. We’ll also focus on the 2D geometries, although 3D geometries do exist.

There are a multitude of ways that these geometries are represented. For example you have the ESRI Shapefile, GML, KML, GeoJSON and many more. For familiarity and simplicity, we’ll use GeoJSON examples.

Point

Let’s say you take a map and stick a pin in it: you’ve just created a Point. This Point has an x value and a y value, and that’s it.

{
    "type": "Point",
    "coordinates": [
        102.0, // Here's your x value
        0.5    // Here's your y value
    ]
}
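
The `//` annotations in the example are there for the reader; strict JSON has no comment syntax, so a parser would reject them. As a minimal sketch, here is the same Point built and round-tripped with Python’s standard `json` module. The helper name `make_point` is made up for illustration.

```python
import json

def make_point(x, y):
    """Return a GeoJSON-style Point geometry as a plain dict (hypothetical helper)."""
    return {"type": "Point", "coordinates": [x, y]}

point = make_point(102.0, 0.5)
encoded = json.dumps(point)      # serialise to a JSON string, no comments allowed
decoded = json.loads(encoded)    # parse it back into a dict

x, y = decoded["coordinates"]    # x comes first, then y
```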

MultiPoint

A MultiPoint is a collection of Points. To keep up the analogy, let’s say you stick three pins in your map, perhaps to mark places you would like to visit. You now have a MultiPoint.

{
 "type": "MultiPoint",
 "coordinates": [
     [             // Here's your first Point
         100.0,
         0.0
     ],
     [             // Here's your second
         101.0,
         1.0
     ],
     [             // Here's your third
         102.0,
         1.0
     ]
 ]
} 
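
Because a MultiPoint is just a list of positions, pulling the individual pins back out is a one-liner. The sketch below assumes the geometry has already been parsed into a Python dict; `split_multipoint` is a hypothetical helper, not part of any library.

```python
multipoint = {
    "type": "MultiPoint",
    "coordinates": [[100.0, 0.0], [101.0, 1.0], [102.0, 1.0]],
}

def split_multipoint(geometry):
    """Return one Point geometry per position in a MultiPoint (hypothetical helper)."""
    if geometry["type"] != "MultiPoint":
        raise ValueError("expected a MultiPoint geometry")
    return [{"type": "Point", "coordinates": list(xy)} for xy in geometry["coordinates"]]

pins = split_multipoint(multipoint)   # three separate Point geometries
```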

LineString

A LineString is a line drawn on a map. Let’s say you place two pins on a map, one to represent where you are and another to represent where you are travelling to. You link the pins with some string. There you have it, a LineString.

{
  "type": "LineString",
  "coordinates": [
    [
      100.0,
      0.0
    ],
    [
      101.0,
      1.0
    ]
  ]
}

You may have noticed that the structure of the LineString example is very similar to the MultiPoint example. This makes sense, as we are dealing with a bunch of Points in both cases; in the LineString case we simply join the Points up, in order, to create a line.
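
To make the difference concrete, here is a small Python sketch in which the same positions are read once as a LineString and once as a MultiPoint. Only the LineString has a length, because only there does the order of the positions describe segments. The `planar_length` helper is hypothetical, and it treats the coordinates as plain x/y values on a flat sheet of paper, which is an assumption rather than a real-world distance.

```python
import math

linestring = {
    "type": "LineString",
    "coordinates": [[100.0, 0.0], [101.0, 1.0]],
}

def planar_length(geometry):
    """Sum the straight-line distance between consecutive positions."""
    coords = geometry["coordinates"]
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

# Same positions, different meaning: the pins without the string.
multipoint = {"type": "MultiPoint", "coordinates": linestring["coordinates"]}

length = planar_length(linestring)   # one segment from (100, 0) to (101, 1)
```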

MultiLineString

You’ve probably guessed by now, but a MultiLineString is a collection of LineStrings. Let’s say you were planning a trip for multiple people and needed to plan each of their travel routes on a map, just like you did in the LineString example. Multiple lines in a single geometry, a MultiLineString.

{
 "type": "MultiLineString",
 "coordinates": [
     [               // First line
         [
             100.0,
             0.0
         ],
         [
             101.0,
             1.0
         ]
     ],
     [              // Second line
         [
             102.0,
             2.0
         ],
         [
             103.0,
             3.0
         ]
     ]
 ]
}

Polygon

A Polygon is a shape drawn on a map in which the first Point and the last Point are identical: it’s a “closed” shape. Let’s say you wanted to plot the outline of your house on a map. You would stick a pin in each corner and join the pins with string, visiting each pin in turn and finishing back at the first one. A Polygon can be as simple or as complex as it needs to be; it just needs to close.

{
  "type": "Polygon",
  "coordinates": [
    [
      [          // Here's your first Point
        100.0,
        0.0
      ],
      [
        101.0,
        0.0
      ],
      [
        101.0,
        1.0
      ],
      [
        100.0,
        1.0
      ],
      [         // Here's the last Point, note that they're the same
        100.0,
        0.0
      ]
    ]
  ]
}

It is valid for a Polygon to contain more than one ring. For example, to plot a donut-like shape on a map, a single Polygon would carry two rings in its coordinates array: the first for the outer boundary and a second for the hole in the middle.
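
Here is a rough Python sketch of a donut-shaped Polygon. The hole coordinates are invented for illustration, and both helpers are hypothetical. The area uses the shoelace formula on the raw x/y values, so it assumes a flat plane rather than any real-world unit.

```python
def is_closed(ring):
    """A ring needs at least four positions, with the first equal to the last."""
    return len(ring) >= 4 and ring[0] == ring[-1]

def ring_area(ring):
    """Unsigned planar area of a closed ring via the shoelace formula."""
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0

donut = {
    "type": "Polygon",
    "coordinates": [
        # Outer ring: the square from the Polygon example
        [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]],
        # Inner ring: a made-up hole, wound the opposite way to the outer ring
        [[100.2, 0.2], [100.2, 0.8], [100.8, 0.8], [100.8, 0.2], [100.2, 0.2]],
    ],
}

outer, *holes = donut["coordinates"]
all_closed = all(is_closed(ring) for ring in donut["coordinates"])
area = ring_area(outer) - sum(ring_area(hole) for hole in holes)
```

The first ring is always the outer boundary; every ring after it is a hole, which is why the hole areas are subtracted.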

MultiPolygon

For when a single Polygon is just not enough. Let’s say you were plotting a University campus, or some other place made up of multiple buildings. All the buildings are “the campus”, but they are also separate buildings. We could represent each of the buildings as a Polygon and then wrap them in a MultiPolygon so that they belong to the same geometry.

{
  "type": "MultiPolygon",
  "coordinates": [
    [               // Polygon 1
      [            
        [
          102.0,
          2.0
        ],
        [
          103.0,
          2.0
        ],
        [
          103.0,
          3.0
        ],
        [
          102.0,
          3.0
        ],
        [
          102.0,
          2.0
        ]
      ]
    ],
    [                  // Polygon 2
      [
        [
          100.0,
          0.0
        ],
        [
          101.0,
          0.0
        ],
        [
          101.0,
          1.0
        ],
        [
          100.0,
          1.0
        ],
        [
          100.0,
          0.0
        ]
      ]
    ]
  ]
}

Yes, with a MultiPolygon it would even be possible to plot a University campus made entirely of donut-like shaped buildings.
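
One way to see how all of these types relate is to count how deeply the coordinates array is nested. The sketch below uses a hypothetical `nesting_depth` helper on coordinates taken from the examples in this post; it assumes non-empty coordinate arrays.

```python
def nesting_depth(coordinates):
    """Count how many lists wrap the innermost numbers (hypothetical helper)."""
    depth = 0
    while isinstance(coordinates, list) and coordinates:
        depth += 1
        coordinates = coordinates[0]
    return depth

point = [102.0, 0.5]
line = [[100.0, 0.0], [101.0, 1.0]]
polygon = [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]]]
campus = [
    [[[102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0, 3.0], [102.0, 2.0]]],
    polygon,
]

depths = [nesting_depth(c) for c in (point, line, polygon, campus)]
# depths == [1, 2, 3, 4]
```

A Point is a single position, a LineString or MultiPoint is a list of positions, a Polygon or MultiLineString is a list of those lists, and a MultiPolygon adds one more level on top.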

Wrap-up

For more information on what sort of geometries are available, check out the OGC’s “Simple Features” standard or the ESRI Shapefile technical description.

You may have noticed that the numeric values we’ve been using for the geometry examples have been pretty arbitrary and that’s because they have been. To get a handle on how we put a shape in a certain place on the planet, we’ll need to discuss Spatial Reference Systems (SRS), which we’ll look at in the next post.

Comments? Suggestions? Drop me a comment below.

Takeaways from QCon London 2017 – Day 3

Here’s day 3. Day 1 can be found here and Day 2 can be found here.

The Talks

  1. Avoiding Alerts Overload From Microservices with Sarah Wells
  2. How to Backdoor Invulnerable Code with Josh Schwartz
  3. Spotify’s Reliable Event Delivery System with Igor Maravic
  4. Event Sourcing on the JVM with Greg Young
  5. Using FlameGraphs To Illuminate The JVM with Nitsan Wakart
  6. This Will Cut You: Go’s Sharper Edges with Thomas Shadwell

Avoiding Alerts Overload From Microservices

  • Actively slim down your alerts to only those for which action is needed
  • “Domino alerts” are a problem in a microservices environment — one service goes down and all dependent services fire alerts
  • Uses Splunk for log aggregation
  • Dashing mentioned for custom dashboards
  • Graphite and Grafana mentioned for metrics
  • Use transaction IDs (uses UUIDs) in the headers of requests to tie them all together
  • Each service to report own health with a standard “health check endpoint”
  • All errors in a service are logged and then graphed
  • Rank the importance of your services – Should you be woken up when service X goes down?
  • Have “Ops Cops” — Developers charged with checking alerts during the day
  • Deliberately break things to ensure alerts are triggered
  • Only services containing business logic should alert
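
As a rough illustration of the transaction ID point above, the Python sketch below reuses an ID if the caller supplied one and generates a UUID otherwise. The header name `X-Transaction-Id` and the helper are assumptions made for this example; the talk notes do not say which header was used.

```python
import uuid

TRANSACTION_HEADER = "X-Transaction-Id"  # hypothetical header name

def with_transaction_id(incoming_headers):
    """Reuse the caller's transaction ID, or create a new UUID at the edge."""
    transaction_id = incoming_headers.get(TRANSACTION_HEADER) or str(uuid.uuid4())
    return {TRANSACTION_HEADER: transaction_id}

first_hop = with_transaction_id({})           # no ID yet, so one is generated
second_hop = with_transaction_id(first_hop)   # the downstream call keeps the same ID
```

Logging that ID in every service is what lets the aggregated logs be tied back to a single request.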

How to Backdoor Invulnerable Code

A highly enjoyable talk of infosec war stories. 

Spotify’s Reliable Event Delivery System

  • The Spotify clients generate an event for each user interaction
  • The system is built on guaranteed message delivery
  • Runs on Google Cloud Platform
  • Hadoop and Hive used on the backend
  • Events are dropped into hourly “buckets”
  • Write it, run it culture
  • System monitoring for:
    • Data monitors – message timeliness SLAs
    • Auditing – 100% delivery
  • Microservices based system
  • Uses Elasticsearch + Kibana
  • Uses CPU based autoscaling with Docker
  • All services are stateless — cloud pub/sub
  • Machines are built with Puppet for legacy reasons
  • Apparently, Spotify experienced a lot of problems with Docker — at least once an hour
  • Services are written in Python
  • Looking to investigate Rocket in future

Event Sourcing on the JVM

  • Event sourcing is inherently functional
  • A single data model is almost never appropriate; event sourcing can feed many and keep them in sync, e.g.:
    • RDBMS
    • NoSQL
    • GraphDB
  • Kafka can be used as an event store by configuring it to persist data for a long time, however this isn’t what it is currently intended to do
  • Event Store mentioned
  • Axon Framework mentioned
    • Mature
  • Eventuate mentioned
    • Great for distributed environments/geolocated data
  • Akka Persistence
    • Great, but needs other Akka libraries
  • Reactive Streams will be a big help when dealing with event sourcing
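
To illustrate the “feed many models from one log” idea, here is a minimal Python sketch. The event shapes, account names and both read models are invented for this example and are not taken from the talk.

```python
# A hypothetical append-only event log
events = [
    {"type": "Deposited", "account": "a1", "amount": 50},
    {"type": "Withdrawn", "account": "a1", "amount": 20},
    {"type": "Deposited", "account": "b2", "amount": 10},
]

def balances(event_log):
    """Read model 1: current balance per account, a fold over the log."""
    result = {}
    for event in event_log:
        sign = 1 if event["type"] == "Deposited" else -1
        result[event["account"]] = result.get(event["account"], 0) + sign * event["amount"]
    return result

def activity_counts(event_log):
    """Read model 2: number of events per account, built from the same log."""
    result = {}
    for event in event_log:
        result[event["account"]] = result.get(event["account"], 0) + 1
    return result

current_balances = balances(events)
current_activity = activity_counts(events)
```

Both models are pure functions of the same log, so either can be thrown away and rebuilt by replaying the events.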

Using FlameGraphs To Illuminate The JVM

  • Base performance on requirements
  • Flamegraphs come out of Netflix
  • Visualisation of profiled software
  • First must collect Java stacks
  • VisualVM (jvisualvm) mentioned
  • Linux Perf mentioned

This Will Cut You: Go’s Sharper Edges

  • It is possible, in some cases, to make Go hang or crash by feeding it input (JSON, XML etc.) without closing tags; it just tries to read forever (a DoS attack)
  • Go doesn’t impose an upload size limit by default; put your Go servers behind a proxy with an upload size limit to mitigate this, e.g. NGINX or Apache HTTP Server
  • Go doesn’t have CSRF protection built in; this must be added manually
  • DNS rebinding attacks may be possible against Go servers

That about wraps it up for my summary of QCon London 2017.