<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>Rekha Khandhadia | CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/author/rekha-khandhadia/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog/author/rekha-khandhadia</link>
<image><url>https://www.crunchydata.com/build/_assets/rekha-khandhadia.png-OVT7VYJI.webp</url>
<title>Rekha Khandhadia | CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog/author/rekha-khandhadia</link>
<width>222</width>
<height>222</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Fri, 06 Jan 2023 10:00:00 EST</pubDate>
<dc:date>2023-01-06T15:00:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ Timezone Transformation Using Location Data & PostGIS ]]></title>
<link>https://www.crunchydata.com/blog/timezone-transformation-using-location-data-and-postgis</link>
<description><![CDATA[ Need to convert UTC timestamps to local time and store them? We have a fast way to do this with PostGIS by importing a shape file, doing a quick join, and creating a new local time field. ]]></description>
<content:encoded><![CDATA[ <p>Imagine your system captures event data from all over the world, but the data all arrives in UTC, giving no information about the local timing of an event. How can we quickly convert between the UTC timestamp and the local time zone using the GPS location? We can solve this problem quickly using PostgreSQL and PostGIS.<p>This example assumes you have a Postgres database running with PostGIS. If you’re new to PostGIS, see <a href=https://www.crunchydata.com/blog/postgis-for-newbies>PostGIS for Newbies</a>.<h3 id=steps-we-will-follow><a href=#steps-we-will-follow>Steps we will follow</a></h3><ol><li>Timezone shapefile overview: For the world timezone shapefile, I have been following a really nice project by <a href=https://twitter.com/evan_siroky>Evan Siroky</a>, the <a href=https://github.com/evansiroky/timezone-boundary-builder/releases>Timezone Boundary Builder</a>. We’ll download <a href=https://github.com/evansiroky/timezone-boundary-builder/releases/download/2022f/timezones-with-oceans.shapefile.zip>timezones-with-oceans.shapefile.zip</a> from this <a href=https://github.com/evansiroky/timezone-boundary-builder/releases/>location</a>.<li>Load the shapefile: Using <a href=https://www.bostongis.com/pgsql2shp_shp2pgsql_quickguide.bqg>shp2pgsql</a>, convert the shapefile to SQL to create and populate the timezones_with_oceans table.<li>PostgreSQL internal view pg_timezone_names: Understand the pg_timezone_names view.<li>Events table and sample data: Create the events table and insert sample data.<li>Transformation query: Transform each event’s UTC timestamp to a local timestamp.</ol><h3 id=overview-of-data-relationship><a href=#overview-of-data-relationship>Overview of data relationship</a></h3><p>Below is an overview of the data relationship and join conditions we will be using.<p><img alt=timezone_flow.png loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/d9ba7557-1e2f-4704-c18f-b8eb80233f00/public><h3 id=timezone-shape-file-overview><a 
href=#timezone-shape-file-overview>Timezone Shapefile Overview</a></h3><p>A “shapefile” commonly refers to a collection of files with .shp, .shx, .dbf, and other extensions sharing a common prefix name, which in our case is combined-shapefile-with-oceans.*. The combined-shapefile-with-oceans files contain polygons with the boundaries of the world's timezones. With this data we can start our process.<h3 id=load-shape-file><a href=#load-shape-file>Load Shapefile</a></h3><p>We will be using <a href=https://www.bostongis.com/pgsql2shp_shp2pgsql_quickguide.bqg>shp2pgsql</a> to generate a SQL file from the shapefile; the SQL file creates public.timezones_with_oceans and inserts the data. The table contains the fields gid, tzid, and geom.<p>Export the host, user, and password variables<pre><code class=language-shell>export PGHOST=p.&lt;pgcluster name&gt;.db.postgresbridge.com
export PGUSER=&lt;DBUser&gt;
export PGPASSWORD=&lt;dbPassword&gt;
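# Optional sanity check before loading (an added suggestion; assumes the
# target database is named "timestamp", as used below):
psql -d timestamp -c "SELECT postgis_full_version();"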
</code></pre><p>Create the SQL file from the shapefile<pre><code class=language-shell>shp2pgsql -s 4326 "combined-shapefile-with-oceans.shp" public.timezones_with_oceans > timezone_shape.sql
</code></pre><p>Create public.timezones_with_oceans and load the timezone data<pre><code class=language-shell>psql -d timestamp -f timezone_shape.sql
</code></pre><p>Query a bit of sample data<pre><code class=language-pgsql>SELECT tzid, ST_AsText(geom), geom FROM public.timezones_with_oceans LIMIT 10;
</code></pre><p>Visualize the sample data<p><img alt="sample data" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/6e615934-6207-4fa2-d8e2-d9cae8e7f100/public><p>Using pgAdmin, highlight the geom column and click the eye icon to visualize the geometry on the map shown below.<p><img alt="Geometry in pgAdmin" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/94c92843-0bea-4e8c-0e04-46be80ff8300/public><h3 id=postgresql-internal-view-pg_timezone_names><a href=#postgresql-internal-view-pg_timezone_names>PostgreSQL internal view pg_timezone_names</a></h3><p>PostgreSQL provides the <a href=https://www.postgresql.org/docs/current/view-pg-timezone-names.html>pg_timezone_names</a> view with a list of time zone names recognized by SET TIMEZONE. The view also provides their associated abbreviations, UTC offsets, and daylight saving time status, which we will need.<p>pg_timezone_names column descriptions<table><thead><tr><th>Column<th>Type<th>Description<tbody><tr><td>name<td>text<td>Time zone name<tr><td>abbrev<td>text<td>Time zone abbreviation<tr><td>utc_offset<td>interval<td>Offset from UTC (positive means east of Greenwich)<tr><td>is_dst<td>bool<td>True if currently observing daylight saving time</table><p>pg_timezone_names sample data<p><img alt="Sample data" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/98cd1140-dd25-481a-f5b0-6a00bf4bc200/public><h3 id=events-table-and-insert-sample-data><a href=#events-table-and-insert-sample-data>Events table and insert sample data</a></h3><p>Now that we have the timezone shapefile loaded, we can create an events table, load sample transaction data, and apply the timestamp transformation query.<pre><code class=language-pgsql>CREATE TABLE IF NOT EXISTS public.events
(
    event_id bigint NOT NULL,
    eventdatetime timestamp without time zone NOT NULL,
    event_type varchar(25) NOT NULL,
    latitude double precision NOT NULL,
    longitude double precision NOT NULL,
    CONSTRAINT events_pkey PRIMARY KEY (event_id)
);

INSERT INTO public.events(
    event_id, eventdatetime, event_type, latitude, longitude)
    VALUES (10086492,'2021-08-17 23:17:05','Walking',34.894089,-86.51148),
(50939,'2021-08-19 10:27:12','Hiking',34.894087,-86.511484),
(10086521,'2021-09-09 19:32:37','Swimming',34.642584,-86.761291),
(22465493,'2021-09-30 11:43:34','Swimming',33.611151,-86.799522),
(22465542,'2021-11-26 22:40:44.197','Swimming',34.64259,-86.761452),
(22465494,'2021-09-30 11:43:34','Hiking',33.611151,-86.799522),
(10087348,'2021-07-01 13:42:15','Swimming',25.956098,-97.535303),
(22466679,'2021-09-01 12:25:06','Hiking',25.956112,-97.535304),
(22466685,'2021-09-02 13:41:07','Swimming',25.956102,-97.535305),
(10088223,'2021-11-29 13:19:53','Hiking',25.956097,-97.535303),
(22246192,'2021-06-16 22:21:23','Walking',37.083726,-113.577984),
(9844188,'2021-06-23 20:18:43','Swimming',37.1067,-113.561401),
(22246294,'2021-06-25 21:50:06','Walking',37.118719,-113.598038),
(22246390,'2021-07-01 18:15:54','Hiking',37.109579,-113.562923),
(9844332,'2021-07-04 19:11:13','Walking',37.251538,-113.614708),
(9845242,'2021-11-04 13:25:40.425','Swimming',37.251542,-113.614699),
(84843,'2021-11-23 14:33:20','Swimming',37.251541,-113.614698),
(22247674,'2021-12-21 14:31:15','Swimming',37.251545,-113.614691),
(22246714,'2021-08-09 14:46:51','Swimming',37.109597,-113.562912),
(9845116,'2021-10-18 14:59:51','Swimming',37.082777,-113.554991);
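
-- Quick check that the sample data loaded (an added suggestion; 20 rows expected):
SELECT count(*) FROM public.events;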
</code></pre><p>Sample event data <img alt="event data" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/06b46d5f-67a7-4da4-1d86-e2fd38657300/public><h3 id=transformation-query><a href=#transformation-query>Transformation Query</a></h3><p>Now we can convert the UTC timestamp to the local time for an event. Using the PostGIS function ST_Intersects, we can find the timezones_with_oceans.geom polygon in which an event point lies. This gives the name of the timezone where the event occurred. To create our transformation query:<ul><li><p>First we create the location geometry using the longitude and latitude from the events table.<li><p>Using the PostGIS function <a href=https://postgis.net/docs/ST_Intersects.html>ST_Intersects</a>, we find the common points between timezones_with_oceans.geom and an event’s location geometry, telling us where the event occurred.<li><p>Join pg_timezone_names to timezones_with_oceans on name and tzid respectively, to retrieve the abbrev, utc_offset, and is_dst fields from pg_timezone_names.<li><p>Using the <a href=https://www.postgresql.org/docs/14/functions-datetime.html>PostgreSQL</a> AT TIME ZONE operator and pg_timezone_names, we convert the UTC event timestamp to the local event timestamp: the first AT TIME ZONE in the query below marks the stored timestamp as UTC, and the second converts it to local wall-clock time, e.g.<p><code>timestamp with time zone '2021-07-05 00:59:12+00' at time zone 'America/Denver' → 2021-07-04 18:59:12</code></ul><p><strong>Transformation Query SQL:</strong><pre><code class=language-pgsql>SELECT event_id, latitude, longitude, abbrev,
       utc_offset, is_dst, eventdatetime,
       (eventdatetime AT TIME ZONE 'UTC') AT TIME ZONE name
           AS eventdatetime_local
FROM public.events
JOIN timezones_with_oceans ON ST_Intersects(ST_Point(longitude, latitude, 4326), geom)
JOIN pg_timezone_names ON tzid = name;
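
-- Suggested addition (not part of the original load): ST_Intersects lookups
-- benefit from a spatial index on the timezone polygons.
CREATE INDEX IF NOT EXISTS timezones_with_oceans_geom_idx
    ON public.timezones_with_oceans USING GIST (geom);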
</code></pre><p><img alt="local timezone data" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/b06aea16-17df-467a-2bc6-c206873b3c00/public><h2 id=closing-thoughts><a href=#closing-thoughts>Closing Thoughts</a></h2><p>PostgreSQL and PostGIS let you solve timezone transformations easily and dynamically. I hope this blog was helpful, and we at Crunchy Data wish you happy learning. ]]></content:encoded>
<category><![CDATA[ Spatial ]]></category>
<author><![CDATA[ Rekha.Khandhadia@crunchydata.com (Rekha Khandhadia) ]]></author>
<dc:creator><![CDATA[ Rekha Khandhadia ]]></dc:creator>
<guid isPermalink="false">ddce347e9ae9f8782958499dfa51bf40bd320dd0b1507d4e3452ac55a92e2531</guid>
<pubDate>Fri, 06 Jan 2023 10:00:00 EST</pubDate>
<dc:date>2023-01-06T15:00:00.000Z</dc:date>
<atom:updated>2023-01-06T15:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Accelerating Spatial Postgres: Varnish Cache for pg_tileserv using Kustomize ]]></title>
<link>https://www.crunchydata.com/blog/accelerating-spatial-postgres-varnish-cache-for-pg_tileserv-using-kustomize</link>
<description><![CDATA[ Rekha offers a step-by-step guide for deploying Varnish Cache for pg_tileserv using Kustomize and the Postgres Operator for Kubernetes on OpenShift. ]]></description>
<content:encoded><![CDATA[ <p>Recently I worked with one of my Crunchy Data PostgreSQL clients on implementing caching for pg_tileserv. <a href=https://github.com/CrunchyData/pg_tileserv>pg_tileserv</a> is a lightweight microservice that exposes spatial data to the web and is a key service across many of our geospatial customer sites. pg_tileserv can generate a fair amount of database server load, depending on the complexity of the map data and the number of end users, so putting a proxy cache such as Varnish in front of it is a best practice. Using Paul Ramsey's <a href=https://www.crunchydata.com/blog/production-postgis-vector-tiles-caching>Production PostGIS Vector Tiles Caching</a> as a starting point, I started to experiment with the client's standard toolset of Kustomize, OpenShift, and Podman. I found this was an easy and fast way to deploy Crunchy Data Kubernetes components and caching.<p>In this blog I would like to share a step-by-step guide to deploying Varnish caching for pg_tileserv using Kustomize on the <a href=https://www.redhat.com/en/technologies/cloud-computing/openshift/container-platform>Red Hat OpenShift Container Platform</a> and <a href=https://www.crunchydata.com/products/crunchy-postgresql-for-kubernetes>PGO, the Postgres Operator from Crunchy Data</a> (PostgreSQL 13.x, PostGIS, pg_tileserv).<p><img alt=architecture loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/a9c85c55-d338-40fc-b882-a6727f33de00/public><h3 id=prerequisites><a href=#prerequisites>Prerequisites</a></h3><ul><li>Access to the Red Hat OpenShift Container Platform, with permission to deploy Kubernetes objects in the namespace.<li>Access to the OpenShift CLI 4.9 and knowledge of <code>oc</code> and Kubernetes commands.<li>Access to a deployed <dfn>PostgreSQL Operator for Kubernetes</dfn> (<abbr>PGO</abbr>) environment (PostgreSQL 13.x, PostGIS, pg_tileserv), with the ability to view information about the deployed components.<li>The pg_tileserv pod/deployment and services are deployed; in our environment the pg_tileserv service is named tileserv.</ul><h2 id=deployment-steps><a href=#deployment-steps>Deployment Steps</a></h2><h3 id=varnish-image><a href=#varnish-image>Varnish Image</a></h3><p>Find the right Varnish Cache image for your use case and upload it to your private client repository. Below are the steps we used in our client environment:<ol><li><code>podman login registry.redhat.io</code><li><code>podman pull registry.redhat.io/rhel8/varnish-6:1-191.1655143360</code><li><code>podman tag registry.redhat.io/rhel8/varnish-6:1-191.1655143360 &lt;client-registry&gt;/foldername/varnish-6:1-191.1655143360</code><li><code>podman push &lt;client-registry&gt;/foldername/varnish-6:1-191.1655143360</code></ol><h3 id=create-the-vcl-file><a href=#create-the-vcl-file>Create the VCL file</a></h3><p>Create the <code>default.vcl</code> file using a starting template from the Varnish website. VCL stands for Varnish Configuration Language and is the domain-specific language used by Varnish to control request handling, routing, caching, and several other things.<p>Configure the backend <code>.host</code> in the default section to point at the pg_tileserv service object; in our case it was tileserv.<ul><li><code>.host = tileserv</code><li><code>.port = 7800</code></ul><h4 id=sample-defaultvcl><a href=#sample-defaultvcl>Sample <code>default.vcl</code></a></h4><pre><code class=language-vcl>vcl 4.0;

backend default { .host = "tileserv"; .port = "7800"; }
</code></pre><h3 id=create-varnish_deploymentyaml><a href=#create-varnish_deploymentyaml>Create <code>varnish_deployment.yaml</code></a></h3><p>I will highlight a few important things and provide our example below.<ul><li>Configure the image from the client's private repository (<code>&lt;client-registry&gt;/varnish-6:1-191.1655143360</code>), and use an image pull secret if the private repository requires it.<li>Mount a <code>tmpfs</code>: set <code>mountPath</code> to <code>/var/lib/varnish</code>, backed by a volume of <code>emptyDir: {}</code>.<li>Volume-mount the Varnish <code>configMap</code> at <code>/etc/varnish/default.vcl</code>.<li>Configure <code>containerPort</code> to <code>8080</code>.<li>Configure the container <code>resources</code>; if these are not configured, the defaults configured at the cluster level apply, e.g. a CPU request of 1m.<li>Configure the pod command to execute <code>varnishd</code> and <code>varnishncsa</code>. The <code>varnishncsa</code> command below adds Varnish request info to the pod logs, which provides cache hit or miss information.</ul><pre><code class=language-yaml>    command: ["/bin/sh"]
    args:
      - -c
      - |
        varnishd -a :8080 -a :6081 -f /etc/varnish/default.vcl -s malloc,512M;
        varnishncsa -F '%h %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i" %{Varnish:handling}x %{Varnish:hitmiss}x'
</code></pre><h4 id=sample-varnish-deploymentyaml><a href=#sample-varnish-deploymentyaml>Sample Varnish <code>deployment.yaml</code></a></h4><pre><code class=language-yaml>---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: varnish
  labels:
    app: varnish
spec:
  replicas: 1
  selector:
    matchLabels:
      app: varnish
  template:
    metadata:
      labels:
        app: varnish
    spec:
      containers:
      - name: varnish
        image: privateclientregistry/crunchydata/varnish-6:1-191.1655143360
        imagePullPolicy: Always
        command: ["/bin/sh"]
        args:
          - -c
          - |
            varnishd -a :8080 -a :6081 -f /etc/varnish/default.vcl -s malloc,512M;
            varnishncsa -F '%h %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i" %{Varnish:handling}x %{Varnish:hitmiss}x'
        ports:
        - containerPort: 8080
        - containerPort: 6081
        resources:
          limits:
            cpu: 200m
            memory: 1024Mi
          requests:
            cpu: 100m
            memory: 512Mi
        volumeMounts:
        - mountPath: /etc/varnish/default.vcl
          name: varnish-config
          subPath: default.vcl
        - mountPath: /var/lib/varnish
          name: tmpfs
      volumes:
      - name: varnish-config
        configMap:
          name: varnish-config
          defaultMode: 0777
          items:
          - key: default.vcl
            path: default.vcl
      - name: tmpfs
        emptyDir: {}
      serviceAccount: pgo-default
      securityContext:
        runAsNonRoot: true
        fsGroupChangePolicy: "OnRootMismatch"
      terminationGracePeriodSeconds: 30
</code></pre><h3 id=create-varnish_serviceyaml-for-the-kubernetes-varnish-service-with-the-clusterip><a href=#create-varnish_serviceyaml-for-the-kubernetes-varnish-service-with-the-clusterip>Create <code>varnish_service.yaml</code> for the Kubernetes varnish service with the ClusterIP</a></h3><p>Below is our example:<pre><code class=language-yaml>---
kind: Service
apiVersion: v1
metadata:
  name: varnish-svc
  labels:
    name: varnish
spec:
  selector:
    app: varnish
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
  type: ClusterIP
</code></pre><h3 id=create-varnish_routeyaml-for-a-secure-varnish-route><a href=#create-varnish_routeyaml-for-a-secure-varnish-route>Create <code>varnish_route.yaml</code> for a secure Varnish route</a></h3><p>In our case we created an edge route with no certificates to enable HTTPS. Our recommendation is to follow your use case's security guidelines and the OpenShift documentation. Below is an example:<pre><code class=language-yaml>kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: varnish
  labels:
    name: varnish
spec:
  tls:
    insecureEdgeTerminationPolicy: Redirect
    termination: edge
  to:
    kind: Service
    name: varnish-svc
    weight: 100
  port:
    targetPort: 8080
  wildcardPolicy: None
</code></pre><h3 id=create-kustomizeyaml><a href=#create-kustomizeyaml>Create <code>kustomization.yaml</code></a></h3><p>Here is an example manifest:<pre><code class=language-yaml>---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- varnish_deployment.yaml
- varnish_service.yaml
- varnish_route.yaml

configMapGenerator:
- name: varnish-config
  files:
  - default.vcl
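
# Tip (an added suggestion): preview the rendered manifests before applying,
# e.g. with "kubectl kustomize ." (or "oc kustomize ." on newer oc clients).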
</code></pre><h3 id=deploy><a href=#deploy>Deploy</a></h3><p>The final step is to list the files and deploy the Kustomize manifests using the OpenShift CLI. The assumption here is that <code>kustomization.yaml</code> is in the current directory.<pre><code class=language-shell>$ ls
  # default.vcl
  # kustomization.yaml
  # varnish_deployment.yaml
  # varnish_route.yaml
  # varnish_service.yaml
</code></pre><pre><code class=language-shell>oc apply -k .
</code></pre><h2 id=troubleshooting><a href=#troubleshooting>Troubleshooting</a></h2><h3 id=adjust-default-vcl-cookies><a href=#adjust-default-vcl-cookies>Adjust Default VCL Cookies</a></h3><p>After the initial deployment, the Varnish pod log was showing most pages as a cache miss. After analyzing, I found that a set-cookie header was being added to the request and response. Update <code>default.vcl</code> to unset the cookies:<pre><code class=language-vcl>sub vcl_recv {
  if (req.url ~ "^[^?]*\.(pbf)(\?.*)?$") {
    unset req.http.Cookie;
    unset req.http.Authorization;
    # Only keep the following if VCL handling is complete
    return (hash);
  }
}

sub vcl_backend_response {
  if (bereq.url ~ "^[^?]*\.(pbf)(\?.*)?$") {
    unset beresp.http.Set-Cookie;
    set beresp.ttl = 1d;
  }
}
</code></pre><p>Redeploy Varnish, again assuming <code>kustomization.yaml</code> is in the current directory.<pre><code class=language-bash># delete the deployment
oc delete -k .
# Deploy
oc apply -k .
</code></pre><h3 id=encountered-errors-and-resolution><a href=#encountered-errors-and-resolution>Encountered Errors and Resolution</a></h3><table><thead><tr><th>Errors<th>Resolution<tbody><tr><td>Permission denied, cannot create &lt;some path&gt; vsm, and the pod restarts with a CrashLoopBackOff error.<td>Check the <code>tmpfs</code>: set <code>mountPath</code> to the path mentioned in the error, and map the volume to <code>emptyDir: {}</code><tr><td>503 Backend fetch failed<td>The <code>default.vcl</code> backend <code>.host</code> and <code>.port</code> in the default section are not mapped correctly to the pg_tileserv OpenShift service<tr><td>Port 80 permission denied<td>Configure <code>containerPort</code> to 8080<tr><td>Pod logs show all cache pages are MISS<td>In default.vcl, configure the vcl_recv and vcl_backend_response subroutines to unset req.http.Cookie and beresp.http.Set-Cookie. After our deployment we found that pages were not caching because our web server was setting cookies on every page request; this meant every page was treated as new, and the hash value calculated by Varnish was different each time.</table><h2 id=closing-thoughts><a href=#closing-thoughts>Closing Thoughts</a></h2><p>Our clients have complex mapping data. pg_tileserv was generating a fair amount of database server load, and rendering could be slow at times. Implementing Varnish caching improved map rendering performance by <strong>25% to 30%</strong> and correspondingly reduced the load on our database. I hope this blog was helpful, and we at Crunchy Data wish you happy learning. ]]></content:encoded>
<category><![CDATA[ Kubernetes ]]></category>
<author><![CDATA[ Rekha.Khandhadia@crunchydata.com (Rekha Khandhadia) ]]></author>
<dc:creator><![CDATA[ Rekha Khandhadia ]]></dc:creator>
<guid isPermalink="false">ed55d0c5521d93841a13deeda1e1af79ebbf645479bc36e7391f46423d946b8e</guid>
<pubDate>Fri, 05 Aug 2022 11:00:00 EDT</pubDate>
<dc:date>2022-08-05T15:00:00.000Z</dc:date>
<atom:updated>2022-08-05T15:00:00.000Z</atom:updated></item></channel></rss>