Creating Simple Choropleth Maps

Although the act of map making (cartography!) has much older roots than the concept of the ‘grammar of graphics’, the two are very much compatible. With some extensions, the grammar of graphics can also be used to create maps of any kind.

florence’s coordinate systems and components natively support all kinds of geospatial visualisations or maps. In combination with the DataContainer and proj4, it furthermore supports any coordinate system supported by proj4. In this tutorial, you will learn how to construct a map that encodes data through color – a choropleth map – using GeoJSON data.

⭐️Learning Goal(s)

In this tutorial, you will make a map of average resale prices in different planning areas of Singapore. You will use the following components: Graphic, Section, PolygonLayer, Label, and DiscreteLegend.

To manage our data, we will use the DataContainer library.

What data do we need?

Spatial data can be understood as a combination of spatial ‘geometries’ with non-spatial attributes. For example, a dataset of shops might consist of a set of pairs of Cartesian or longitude-latitude coordinates (spatial geometries), and a complementary set of attributes like the names of shops and opening hours (non-spatial attributes). In online mapping, GeoJSON is the de facto standard for storing spatial data. It is a JSON-based format that follows specific rules. In GeoJSON, a point might be represented like this:

{
  type: 'Feature',
  geometry: {
    type: 'Point',
    coordinates: [125.6, 10.1]
  },
  properties: {
    name: 'Dinagat Islands'
  }
}

A ‘Feature’ has two additional keys: geometry, holding information about the spatial geometry; and properties, holding any additional non-spatial attributes. florence and the DataContainer can work with GeoJSON out of the box.
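
To make the structure more concrete, here is a minimal sketch in plain JavaScript (no florence or DataContainer involved) of how several features are commonly bundled into a ‘FeatureCollection’. The area names, coordinates and prices are invented for illustration; only the resale_price_sqm property name is taken from this tutorial.

```javascript
// Illustrative data only: two made-up square 'planning areas'
const exampleCollection = {
  type: 'FeatureCollection',
  features: [
    {
      type: 'Feature',
      geometry: {
        type: 'Polygon',
        // One outer ring; the first and last position are identical to close it
        coordinates: [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]
      },
      properties: { name: 'Area A', resale_price_sqm: 4500 }
    },
    {
      type: 'Feature',
      geometry: {
        type: 'Polygon',
        coordinates: [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]]
      },
      // null marks a missing price in this made-up example
      properties: { name: 'Area B', resale_price_sqm: null }
    }
  ]
}

// The spatial and non-spatial parts can be read separately
const geometryTypes = exampleCollection.features.map(f => f.geometry.type)
const names = exampleCollection.features.map(f => f.properties.name)

console.log(geometryTypes) // [ 'Polygon', 'Polygon' ]
console.log(names) // [ 'Area A', 'Area B' ]
```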

Importing and preparing the dataset

For this tutorial, we import a GeoJSON file of Singapore’s planning areas and mean resale prices per square meter derived from public data from the Housing & Development Board. Let’s get started by making it available to our Svelte component (scroll down all the way to the end of this tutorial for a REPL that includes the dataset).

import { geojson } from './planning_areas_data.js' 

Next, we will load this file into the DataContainer:

const convertNullToUndefined = value => value === null ? undefined : value

const dataContainer = new DataContainer(geojson)
  .mutate({ resale_price_sqm: r => convertNullToUndefined(r.resale_price_sqm) })

Note that directly after loading the geojson data into a DataContainer, we use the mutate method. What that line of code does is take all values in the resale_price_sqm column that are null (missing), and convert them to undefined. This bit of data gymnastics is necessary because we need to treat missing data in a special way. The reason for this will become clear later!

For further details on loading data to the DataContainer, refer to its documentation.
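
As a rough illustration of what the conversion does to the values of a column, the sketch below applies the same function to a plain array instead of a DataContainer. The sample prices are invented.

```javascript
const convertNullToUndefined = value => value === null ? undefined : value

// Invented sample values; null marks a missing price
const samplePrices = [4500, null, 5200, 0]
const converted = samplePrices.map(convertNullToUndefined)

// Only null is replaced; a genuine 0 stays 0
console.log(converted) // [ 4500, undefined, 5200, 0 ]
```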

Creating geospatial scales

In the case of maps, some data may be recorded in three-dimensional ‘earth’ coordinates (like longitude/latitude) while in other cases data might already be pre-projected in a two-dimensional plane for you. In both cases, you still need to translate the spatial coordinates to pixel (screen) coordinates.

When dealing with geodata, we can’t treat the x and y dimensions as separate, but must take them into account together. If we don’t, we will lose the aspect ratio between the x and y dimensions, and our map will look stretched. To avoid this, florence exports a helper function called fitScales, which uses the bounding box of our GeoJSON geometries to create the correct scales. We can calculate the bounding box with the bbox method of the DataContainer. fitScales then returns an object with a scaleX and a scaleY key, which we can pass to the Section’s scaleX and scaleY props.

Our code now looks like this:

<script>
  import { fitScales } from '@snlab/florence'
  import DataContainer from '@snlab/florence-datacontainer'

  // import data
  import { geojson } from './planning_areas_data.js'

  // load data to DataContainer
  const convertNullToUndefined = value => value === null ? undefined : value

  const dataContainer = new DataContainer(geojson)
    .mutate({ resale_price_sqm: r => convertNullToUndefined(r.resale_price_sqm) })

  // create geo scales
  const geoScales = fitScales(dataContainer.bbox())
</script>
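
To get a feel for why a shared bounding box preserves the aspect ratio, consider the following plain JavaScript sketch. It is not florence’s implementation of fitScales: the helper names bboxOfPolygons and fitUniformScales, the { x: [min, max], y: [min, max] } shape of the bounding box, and the explicit width and height arguments are all assumptions made for this example. The key idea is that both axes share a single zoom factor.

```javascript
// Hypothetical helper: bounding box of an array of GeoJSON Polygon geometries
function bboxOfPolygons (geometries) {
  let minX = Infinity
  let minY = Infinity
  let maxX = -Infinity
  let maxY = -Infinity

  for (const geometry of geometries) {
    for (const ring of geometry.coordinates) {
      for (const [x, y] of ring) {
        minX = Math.min(minX, x)
        maxX = Math.max(maxX, x)
        minY = Math.min(minY, y)
        maxY = Math.max(maxY, y)
      }
    }
  }

  return { x: [minX, maxX], y: [minY, maxY] }
}

// Hypothetical helper: scales that use ONE zoom factor for both axes
function fitUniformScales (bbox, width, height) {
  const dataWidth = bbox.x[1] - bbox.x[0]
  const dataHeight = bbox.y[1] - bbox.y[0]

  // The smaller factor guarantees the whole map fits in both directions
  const k = Math.min(width / dataWidth, height / dataHeight)

  // Center the map in the remaining space
  const offsetX = (width - k * dataWidth) / 2
  const offsetY = (height - k * dataHeight) / 2

  return {
    scaleX: x => offsetX + (x - bbox.x[0]) * k,
    scaleY: y => offsetY + (y - bbox.y[0]) * k
  }
}

// Made-up geometries that together span 4 units by 2 units
const geometries = [
  { type: 'Polygon', coordinates: [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]] },
  { type: 'Polygon', coordinates: [[[2, 0], [4, 0], [4, 1], [2, 1], [2, 0]]] }
]

const bbox = bboxOfPolygons(geometries)
const { scaleX, scaleY } = fitUniformScales(bbox, 400, 400)

console.log(bbox) // { x: [ 0, 4 ], y: [ 0, 2 ] }
console.log(scaleX(4) - scaleX(0), scaleY(2) - scaleY(0)) // 400 200
```

The data is twice as wide as it is tall, and so is the result in pixels. Two independent scales that each filled the full 400 pixels would have stretched the map vertically.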

Setting up our ‘canvas’ for mapping

We are now ready to set up our ‘canvas’ for creating our map with the Graphic and Section components. After specifying the width and height of the Graphic (which creates an empty svg of the given width and height), we use the geoScales variable from our previous step to set the scaleX and scaleY of the Section. To make our code more concise, we will use the spread syntax for this.

<script>
  import { Graphic, Section, fitScales } from '@snlab/florence'
  ...
</script>

<Graphic width={500} height={500}>

  <Section
    {...geoScales}
    flipY
  >

  </Section>

</Graphic>

Note that (as in the basic tutorial) we need to use the flipY prop of the Section to make our coordinate system go from bottom to top, instead of top to bottom.
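
The effect of flipping can be sketched in a few lines of plain JavaScript. This is only an illustration of the idea, not how florence implements flipY; the scale functions and the data range of 0 to 10 are made up. On screen, pixel y = 0 is the top edge, so an unflipped scale draws large y values near the bottom.

```javascript
const height = 500

// Hypothetical scale: maps data y in [0, 10] to pixels in [0, height]
const scaleY = y => (y * height) / 10

// Flipped version: measure from the bottom edge instead of the top edge
const flippedScaleY = y => height - scaleY(y)

console.log(scaleY(10)) // 500, the bottom edge of the graphic
console.log(flippedScaleY(10)) // 0, the top edge of the graphic
```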

Drawing the map

Next, we have to draw the actual polygons: one for each planning area. Instead of setting the x and y props separately as usual, we use the geometry prop of the PolygonLayer. The geometry prop expects GeoJSON geometries of type 'Polygon', one for each polygon in the layer. For more information on the geometry prop, please refer to the PolygonLayer documentation.

If we were plotting only a single polygon, we would use the Polygon mark instead of the PolygonLayer.

<script>
  import { Graphic, Section, PolygonLayer, fitScales } from '@snlab/florence'
  ...
</script>

<Graphic width={500} height={500}>

  <Section
    {...geoScales}
    flipY
  >

    <PolygonLayer geometry={dataContainer.column('$geometry')} />

  </Section>

</Graphic>

To make our map look more appealing, we will pass appropriate values to aesthetic properties like fill, stroke, and strokeWidth:

...

<Graphic width={500} height={500}>

  <Section
    {...geoScales}
    flipY
  >
    <!-- step 1 -->
    <PolygonLayer 
      geometry={dataContainer.column('$geometry')}
      fill={'#d3d3d3'} 
      stroke={'white'} 
      strokeWidth={1}
    />

  </Section>

</Graphic>

Our map now looks like this:

Turning our map into a choropleth

That was not too hard! Now we take it a step further and add colors to the polygons to reflect a variable in our dataset. In other words, we turn our map into a choropleth map. First, we are going to pick a set of colors for our polygons. In the basic tutorial, we used a predefined color scheme from d3-scale-chromatic, but in this tutorial we will create our own color scheme to demonstrate how that works.

const COLORS = ['#fff0d2', '#FDD1A5', '#FD9243', '#982f05', '#4e1802']

The variable we want to map to this color scheme is the resale_price_sqm variable in our DataContainer. Like in the basic tutorial, we do this using a scale. We need a scale that maps a continuous (quantitative) domain, the price per square meter, to a categorical range, a set of colors. For this, we have to create a ‘classification scheme’ that divides the observed price data into a set of classes, which we can then link to the colors. For example, if resale_price_sqm contained the values [1, 2, 3, 4] and we wanted to create two classes, it would make sense to put the numbers [1, 2] in one class and the numbers [3, 4] in another. We could then say that all numbers in the first class should be light blue, and all numbers in the second class dark blue.
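
One simple way to build such a classification scheme by hand is to cut the range between the smallest and the largest value into equally wide intervals. The sketch below does this in plain JavaScript for the [1, 2, 3, 4] example. The helper names equalIntervalBreaks and classIndex are hypothetical, and the DataContainer may compute its class boundaries differently.

```javascript
// Hypothetical helper: boundaries that split [min, max] into equally wide classes
function equalIntervalBreaks (values, numClasses) {
  const min = Math.min(...values)
  const max = Math.max(...values)
  const step = (max - min) / numClasses

  const breaks = []
  for (let i = 1; i < numClasses; i++) {
    breaks.push(min + i * step)
  }
  return breaks
}

// Hypothetical helper: index of the class that a value falls into
function classIndex (value, breaks) {
  let index = 0
  while (index < breaks.length && value >= breaks[index]) index++
  return index
}

const values = [1, 2, 3, 4]
const breaks = equalIntervalBreaks(values, 2)
const classColors = ['lightblue', 'darkblue']
const colors = values.map(v => classColors[classIndex(v, breaks)])

console.log(breaks) // [ 2.5 ]
console.log(colors) // [ 'lightblue', 'lightblue', 'darkblue', 'darkblue' ]
```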

We can do all of this automatically with the DataContainer’s classify method:

const priceColorScale = dataContainer
  .dropNA('resale_price_sqm')
  .classify(
    { column: 'resale_price_sqm', method: 'EqualInterval', numClasses: 5 },
    COLORS
  )
  .unknown('#d3d3d3')

Before calling classify, we first remove the missing (undefined) values from the resale_price_sqm column. Next, we divide the resale_price_sqm column into 5 classes using the EqualInterval method for determining the class boundaries. We then assign a color from the previously defined COLORS variable to each class. The classify method returns a d3 threshold scale. We then tell the scale that if it encounters ‘unknown’ values, like undefined or NaN, it should return a gray-ish color ('#d3d3d3'). This is why we had to convert our null values to undefined earlier: null values are interpreted by the scale as 0, not as missing/unknown!
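
The difference between null and undefined comes down to how JavaScript compares them with numbers: null is coerced to 0, while undefined becomes NaN and fails every comparison. The sketch below uses a simplified stand-in for a threshold scale to show the effect. makeThresholdScale is a hypothetical helper written for this example, not d3’s actual code, and the breaks and color names are made up.

```javascript
// Hypothetical, simplified stand-in for a threshold scale with an 'unknown' color
function makeThresholdScale (breaks, colors, unknownColor) {
  return value => {
    // undefined and NaN fail this self-comparison; null passes because it coerces to 0
    if (!(value <= value)) return unknownColor

    let index = 0
    while (index < breaks.length && value >= breaks[index]) index++
    return colors[index]
  }
}

const scale = makeThresholdScale([10, 20], ['light', 'medium', 'dark'], 'gray')

console.log(scale(15)) // medium
console.log(scale(undefined)) // gray
console.log(scale(NaN)) // gray
console.log(scale(null)) // light, because null is treated like 0
```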

In summary, the part of our code between the <script> tags should now look like this:

  import { Graphic, Section, PolygonLayer, fitScales } from '@snlab/florence'
  import DataContainer from '@snlab/florence-datacontainer'
  import { geojson } from './planning_areas_data.js'

  const convertNullToUndefined = value => value === null ? undefined : value

  const dataContainer = new DataContainer(geojson)
    .mutate({ resale_price_sqm: r => convertNullToUndefined(r.resale_price_sqm) })

  const geoScales = fitScales(dataContainer.bbox())

  const COLORS = ['#fff0d2', '#FDD1A5', '#FD9243', '#982f05', '#4e1802']

  const priceColorScale = dataContainer
    .dropNA('resale_price_sqm')
    .classify(
      { column: 'resale_price_sqm', method: 'EqualInterval', numClasses: 5 },
      COLORS
    )
    .unknown('#d3d3d3')

Adding colors to the polygons

The last thing we need to do is assign the colors to the PolygonLayer’s fill prop. Just replace the value passed to fill with dataContainer.map('resale_price_sqm', priceColorScale) and voilà! Our simple map of Singapore is now a choropleth map that shows each planning area’s mean HDB flat resale price per square meter.

...

  <PolygonLayer 
    geometry={dataContainer.column('$geometry')}
    fill={dataContainer.map('resale_price_sqm', priceColorScale)}
    stroke={'white'} 
    strokeWidth={1}
  />

...

Plot Title

Graphics, including maps, usually come with titles describing the subject of the graph. We can add a title using the Label component. Like in the basic tutorial, we will position the title in the Graphic’s coordinate system.

<script>
  import { 
    Graphic, Section, PolygonLayer, fitScales, Label
  } from '@snlab/florence'
  ...
</script>

<Graphic width={500} height={500}>

  <Section
    {...geoScales}
    padding={30}
    flipY
  >
    
    ...

  </Section>

  <Label
    x={0.5}
    y={0.14}
    text={'Mean resale price per m2 (S$)'}
    fontFamily={'Montserrat'}
    fontSize={18}
  />

</Graphic>

Legend

For the final touch, we will add a legend to our plot to make it easier to understand what the colors mean. Since we are using a set of discrete colors (as opposed to a color gradient) we will use the DiscreteLegend component.

The DiscreteLegend component requires us to specify an Array of labels for the labels prop, and in this case, an Array of colors of the same length for the fill prop. fill is easy: we can just use our previously defined COLORS variable. The labels prop is a bit trickier. In this case, we will use the getClassLabels helper function exported by florence to automatically create some labels for us based on the priceColorScale. To make the class boundaries look nice, we will turn them into integers by passing Math.floor as a formatting function to getClassLabels.

<script>
  import { 
    Graphic, Section, PolygonLayer, fitScales,
    Label, DiscreteLegend, getClassLabels
  } from '@snlab/florence'
</script>

  ...
  <Label
    ...
  />

  <DiscreteLegend
    x1={0.75} x2={1}
    y1={0.75} y2={1}
    labels={getClassLabels(priceColorScale, Math.floor)}
    fill={COLORS}
  />

The rendered graphic, titled ‘Mean resale price per m2 (S$)’, now shows a legend with the class labels < 4120, 4120 - 4722, 4722 - 5325, 5325 - 5927 and ≥ 5927.
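
To illustrate how labels can be derived from class boundaries, here is a small plain JavaScript sketch. makeClassLabels is a hypothetical helper and not florence’s getClassLabels, whose internals may differ. The break values are invented decimals, chosen only to show what a formatting function like Math.floor does.

```javascript
// Hypothetical helper: turns class boundaries into '< a', 'a - b', '≥ b' style labels
function makeClassLabels (breaks, format = x => x) {
  const formatted = breaks.map(b => format(b))

  const labels = [`< ${formatted[0]}`]
  for (let i = 1; i < formatted.length; i++) {
    labels.push(`${formatted[i - 1]} - ${formatted[i]}`)
  }
  labels.push(`≥ ${formatted[formatted.length - 1]}`)

  return labels
}

// Invented boundaries: four boundaries separate five classes
const exampleBreaks = [4120.4, 4722.9, 5325.3, 5927.8]
const labels = makeClassLabels(exampleBreaks, Math.floor)

console.log(labels)
// [ '< 4120', '4120 - 4722', '4722 - 5325', '5325 - 5927', '≥ 5927' ]
```

Note that four boundaries produce five labels, which is why the labels array lines up with the five entries of a five-color scheme.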

Result