CrunchyData Blog

Advent of Code in PostgreSQL: Tips and Tricks from 2022

Greg.Sabino.Mullane@crunchydata.com (Greg Sabino Mullane) — Mon, 04 Dec 2023 08:00:00 EST

I’ve nearly finished solving the 2022 series of Advent of Code in PostgreSQL on our blog, and many of the solutions are available on our browser-based Postgres playground as well. As many of you embark on your own Advent of Code adventures for 2023 this week, or maybe watch from afar, I wanted to pull together some themes, recommendations, tips, and tricks that I’ve seen work in those solutions. If there’s anything I’ve learned, it’s that you can solve almost anything with PostgreSQL!

psql holiday presets

Before you do anything, get in the holiday spirit and set your nulls to a snowman ☃️ or any other image you’d like:

\pset null ☃

Data loading via text FDW

You’ll see that I use the file_fdw extension every time to read the input file through a foreign table. This saves me from having to load the file in first. I can query the foreign table directly, build my new relational tables, and move the data over from the foreign data wrapper.

CREATE EXTENSION file_fdw;

CREATE SERVER aoc2022 foreign data wrapper file_fdw;

CREATE FOREIGN TABLE aoc_day1 (calorie int)
  SERVER aoc2022 options(filename '/tmp/aoc2022.day1.input', null '');

I’m also a big fan of unlogged tables. These challenges are ephemeral, so durability does not matter, and everything runs faster when the tables skip write-ahead logging.
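As a minimal sketch of that pattern, the foreign table defined above can be copied into an unlogged table in one statement. The table name day1_calories is made up for this illustration and does not come from the original solutions.

```sql
-- day1_calories is a hypothetical name; aoc_day1 is the foreign table created above
CREATE UNLOGGED TABLE day1_calories AS
  SELECT calorie FROM aoc_day1;

-- Unlogged tables skip the write-ahead log, so they are emptied after a crash.
-- That is fine for throwaway puzzle data.
SELECT count(*) FROM day1_calories;
```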

Using sequences

Most of these puzzles require you to take an input file of plain ASCII text and organize it in a way that will help you build the solutions, puzzles, mazes, etc. One key PostgreSQL feature here is CREATE SEQUENCE. When used in combination with CTEs, regex, arrays, and other functions, it will help you create order out of the chaos given to you in your starting Advent of Code file.

A lot of my sequences appear at the start of a CTE; see this example from Day 22:

CREATE SEQUENCE aoc;
CREATE SEQUENCE aoc2;

WITH x AS (SELECT nextval('aoc') AS myrow, setval('aoc2',1,false), line
  FROM aoc_day22 WHERE line !~ '\d')
,y AS materialized (SELECT *, string_to_table(line, null) AS a FROM x)
,z AS (SELECT *, nextval('aoc2') AS mycol FROM y)
INSERT INTO monkeymap (y,x,item)
SELECT myrow, mycol, a FROM z WHERE a <> ' ';

A few key functions in sequences are:

setval: For setting a value
nextval: For getting the next value in a sequence.
currval: For calling the current value in a sequence.

Day 1 uses all three of these:

-- Call it once so that `currval` works. The starting number can be anything.
SELECT setval('aoc', 1);

SELECT calorie, CASE WHEN calorie is null THEN nextval('aoc') ELSE currval('aoc') END
FROM aoc_day1;
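To see what that CASE expression does without the puzzle input, here is a small self-contained sketch. The sequence name demo_seq and the four sample rows are assumptions made up for the illustration: a NULL row (a blank line in the input) bumps the counter, and every other row reuses the current value, so each group of numbers ends up tagged with the same id.

```sql
-- demo_seq and the sample values are hypothetical, for illustration only
CREATE TEMP SEQUENCE demo_seq;

-- Call setval once so that currval works in this session
SELECT setval('demo_seq', 1);

WITH input(calorie) AS (VALUES (1000), (2000), (NULL), (4000))
SELECT calorie,
       CASE WHEN calorie IS NULL THEN nextval('demo_seq')
            ELSE currval('demo_seq') END AS elf_id
FROM input;
```

With the rows read in the order listed, elf_id comes back as 1, 1, 2, 2. Postgres reads a VALUES list in order in practice, but without an ORDER BY that is not formally guaranteed.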

Window Functions

You’ll also find a lot of uses for window functions in creating sequences and keeping track of what row you’re working in. See Day 12 for an example.

WITH x AS (SELECT setval('aoc',1,false), line FROM aoc_day12)
,xgrid AS (
  SELECT row_number() over() AS row, nextval('aoc') AS col,
  regexp_split_to_table(line, '') AS height
FROM x)
SELECT * FROM xgrid LIMIT 10;

PL/pgSQL functions

If you’re brand new to doing this in Postgres, many of these games are solved by creating a large function that goes through the data and does the work for you. The follow-up part or bonus section is then another round of that. The DO command can be a good idea if you’re running one-time code for playing a game; see Day 6 for an example of that. Don’t be scared to create huge functions and build them piece by piece. I recommend lots of comments in your code here to help you debug later.
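For readers who have not used DO before, here is a minimal sketch of a one-off PL/pgSQL block. It is not taken from the Day 6 solution; the variable names are made up, and it only shows the shape: declare variables, loop, and report with RAISE NOTICE.

```sql
DO $$
DECLARE
  total int := 0;   -- running sum, hypothetical variable
  i     int;
BEGIN
  -- Add up the numbers 1 through 10
  FOR i IN 1..10 LOOP
    total := total + i;
  END LOOP;

  -- Prints: NOTICE:  total is 55
  RAISE NOTICE 'total is %', total;
END
$$;
```

Nothing is stored in the database afterwards, which is handy when the code only needs to run once.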

Recursive functions

Recursive functions are going to be a go-to for many of the games. These let you run a piece of code and feed its result into a later part of the function. Day 19 is a great example of this, where the first part of the function gets a score and the later parts of the function decide what to do with different scores:

/* In the final minute, all we care about is gathering geodes */
  IF minute >= maxminute THEN RETURN geodes + geode_robots; END IF;

  /* If we can afford a geode robot, make it */
  IF ore >= geode_robot_cost AND obsidian >= geode_robot_cost2 THEN

    geode_score = give_me_the_remote(
      ore + ore_robots - geode_robot_cost, ore,
      clay + clay_robots, clay,
      obsidian + obsidian_robots - geode_robot_cost2,
      geodes + geode_robots,
      ore_robots, clay_robots, obsidian_robots, geode_robots + 1,
      ore_robot_cost, clay_robot_cost, obsidian_robot_cost,
      obsidian_robot_cost2, geode_robot_cost, geode_robot_cost2,
      minute, maxminute, ''
    );
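The excerpt above is only a fragment of a much larger function. As a minimal sketch of the same idea, assuming nothing from the Day 19 solution, this hypothetical function calls itself with a smaller argument until it hits its exit condition:

```sql
-- countdown_sum is a made-up helper, not part of the original solutions
CREATE OR REPLACE FUNCTION countdown_sum(n int)
  RETURNS int LANGUAGE plpgsql AS $$
BEGIN
  /* Exit condition: nothing left to add */
  IF n <= 0 THEN RETURN 0; END IF;

  /* Recurse with a smaller value and use the result */
  RETURN n + countdown_sum(n - 1);
END
$$;

SELECT countdown_sum(4);  -- 4 + 3 + 2 + 1 = 10
```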

Looped functions

Looped functions are going to be common since you’ll be iterating over the data input files and trying to fill in gaps or make sense of things. You might do something like creating a Brute Force Search using a looped function like in Day 12.


FOR myrec IN SELECT * FROM heightmap WHERE id = any(currentq) LOOP
    -- Code for checking directions and updating the queue
END LOOP;

You can also limit the number of loops with a WHILE condition, as I did in Day 23, instead of relying on an explicit EXIT clause.

/* For this main loop, let's bail if we hit 5000 rounds */
WHILE myround < 5000 LOOP
  myround = myround + 1;

Locations, tracking, and arrays

Quite a few of these games deal with finding things in a grid or location and you’ll be getting pretty creative with SQL to make this work. You can use || and array_remove to populate and then remove some of your text based location data. There’s a good example of this in Day 12.

-- This array stores which points we are currently interested in
  currentq = currentq || startpoint;

  -- This array stores points that have already been checked
  visited = visited || startpoint;
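Outside of a function, the same array operations can be tried directly in a SELECT. This is a small sketch with made-up integer ids rather than the real Day 12 data:

```sql
SELECT ARRAY[1,2] || 3                  AS appended,   -- {1,2,3}
       array_remove(ARRAY[1,2,3,2], 2)  AS removed,    -- {1,3}
       2 = ANY(ARRAY[1,2,3])            AS is_member;  -- true
```

Note that array_remove takes out every matching element, not just the first one.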

The animated puzzle solutions take these even further by creating locations and moving objects. See Day 17 for an example of this.

rockpaths POINT[][] = ARRAY[
    [(0,0),(0,0),(1,0),(2,0),(3,0)], /* HLINE  */
    [(1,0),(0,1),(1,1),(2,1),(1,2)], /* CIRCLE */
    [(0,0),(1,0),(2,0),(2,1),(2,2)], /* ANGLE  */
    [(0,0),(0,0),(0,1),(0,2),(0,3)], /* VLINE  */
    [(0,0),(0,0),(0,1),(1,0),(1,1)]  /* BOX    */
];

Regex

Regular expressions are a super important part of solving Advent of Code in PostgreSQL, both for sequencing the initial data set and inside the game functions. Here are some popular regex functions to keep in your back pocket.

regexp_split_to_table()

Used for splitting strings into individual characters, see Day 2.

regexp_split_to_table(password, '')

regexp_matches()

Employed for matching and extracting specific patterns in strings, see Day 4.

regexp_matches(passport_data, '(\w+):', 'g')

regexp_replace()

Utilized for replacing specific substrings in strings, see Day 7.

regexp_replace(description, ' bags?', '', 'g')

regexp_like()

Used for checking if a string matches a specified pattern (available in PostgreSQL 15 and later), also in Day 2.

regexp_like(password, '^(\d+)-(\d+) (\w): (\w+)$')

regexp_split_to_array()

Similar to regexp_split_to_table(), used for splitting strings into arrays, see Day 3.

regexp_split_to_array(line, '')

regexp_substr()

Employed for extracting substrings based on a regular expression (available in PostgreSQL 15 and later), see Day 7.

regexp_substr(contains, '(\d+) (\w+) bags?', 1, 1, '', 1)
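The snippets above depend on columns from the puzzle tables. To try a few of these functions on their own, here is a sketch that uses literal strings instead; the sample strings are made up for the illustration.

```sql
-- Split a string into single characters
SELECT regexp_split_to_array('abc', '');                          -- {a,b,c}

-- Strip every " bag" or " bags" from a description
SELECT regexp_replace('2 muted yellow bags', ' bags?', '', 'g');  -- 2 muted yellow

-- Return one row per match when the 'g' flag is used
SELECT regexp_matches('byr:1937 iyr:2017', '(\w+):', 'g');        -- two rows: {byr} and {iyr}
```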

ASCII animations

Moving ASCII art

One of the things that made AOC in PostgreSQL really cool was the ability to actually have things move around inside the console. It was fun to move from plain numbers to actual graphics. By combining functions for moving objects in a loop, you can see your ASCII animations as the functions run. See Day 5 for a good example of this.

Drawing with data and creating a pixel screen output

You can create functions that draw table data to solve an ASCII puzzle; see Day 10 for an example of this.

NOTICE:   ##  #    #### #### #### #     ##  ####
NOTICE:  #  # #    #       # #    #    #  # #
NOTICE:  #  # #    ###    #  ###  #    #    ###
NOTICE:  ###  #    #     #   #    #    # ## #
NOTICE:  # #  #    #    #    #    #    #  # #
NOTICE:  #  # #### #### #### #    ####  ### ####

Adding ANSI colors

For more advanced visual effects, you can add color to your functions so they output with a specific look in your terminal emulator. See Day 9.

CASE WHEN myrec.head THEN E'\033[31mH\033[0m'
             WHEN myrec.middle ~ '9' THEN E'\033[33mT\033[0m'
             WHEN myrec.tailcount>0 THEN E'\033[32m*\033[0m' ELSE ' ' END);
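As a quick way to check whether your terminal handles these escape codes, here is a minimal sketch that prints a red H and a green asterisk. It assumes a terminal emulator that understands ANSI color sequences; in other clients the raw escape characters may show up instead.

```sql
DO $$
BEGIN
  -- \033[31m switches to red, \033[32m to green, \033[0m resets the color
  RAISE NOTICE '%', E'\033[31mH\033[0m' || E'\033[32m*\033[0m';
END
$$;
```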

Animating Objects inside a PostgreSQL Terminal

Day 17’s puzzle builds a falling block game akin to Tetris. The fundamentals here are creating sequences to track positions, a function to add new objects, a function for shifting objects, a function to draw the base chamber/game board, and a function that combines these other functions. What you get is a whole game like this.

My takeaways from doing AOC ALL in Postgres/SQL

Postgres is so extensible, I even created a new operator in Day 2
SQL is great for some things, atrocious for others
Sometimes fast and ugly is better than slow and perfect
Performance is often a bunch of small things over a large number of iterations
Brute force seldom works - it’s the algorithm / approach that matters
Pure SQL is possible but painful and PL/pgSQL is much easier than recursive CTEs

The series of solutions for 2022 Advent of Code in Postgres is available if you want to scroll through and use ideas for 2023. Good luck to all!

This post was co-authored with Elizabeth Christensen.

Fun with Postgres ASCII Map and Cardinal Directions

Greg.Sabino.Mullane@crunchydata.com (Greg Sabino Mullane) — Wed, 29 Nov 2023 08:00:00 EST

Disclaimer

This article will contain spoilers both on how I solved 2022 Day 23's challenge "Unstable Diffusion" using SQL, as well as general ideas on how to approach the problem. I recommend trying to solve it yourself first, using your favorite language.

AOC Day 23

Tech used in this Day:

The file_fdw Foreign Data Wrapper
Materialized (and not materialized) CTEs aka Common Table Expressions
Custom data types
Various handy functions like string_to_table and array_agg and unnest
Tweaking the plan_cache_mode parameter
Using the create as / truncate / copy trick to remove table bloat
Windowing functions along with avg and stddev
More ANSI/Unicode fun

For this challenge, we received an ASCII map representing where elves are standing and where they are not:

....#..
..###.#
#...#.#
.#...##
#.###..
##.#.##
.#..#..

The octothorpes are the elves. The idea is that a number of rounds happen. During the first half of the round, the elves look around and decide on which direction they would like to move, as long as the area in that direction is not occupied. In the second half of the round, they try to move. If two or more elves try to move into the same spot, then none of them move. Our goal is to calculate how diffuse the grid becomes after ten rounds. To start, we do our usual setup of using a foreign data wrapper to import the text file. Note that the test file above is small: the actual one is a 72 x 72 character grid.

CREATE EXTENSION if not exists file_fdw;

CREATE SERVER if not exists aoc2022 foreign data wrapper file_fdw;

DROP SCHEMA if exists aoc2022_day23_grove CASCADE;
CREATE SCHEMA aoc2022_day23_grove;
SET search_path = aoc2022_day23_grove, public;

CREATE FOREIGN TABLE aoc_day23 (line text)
  SERVER aoc2022 options(filename '/tmp/aoc2022.day23.input'
-- SERVER aoc2022 options(filename '/tmp/aoc2022.day23.testinput'
);

The first step is to transform that text file into a SQL table. We will throw things into an unlogged table, with X and Y coordinates, plus extra columns to put our proposed movements each round. Two sequences help us to put characters into the correct coordinates. Also note the WHERE a = '#' which ensures we only add the positions in which there is an elf. If the spot is empty, we do not even add it to the table (which as we will see, will be helpful later on). We use string_to_table to break each line into separate characters, and the MATERIALIZED keyword to ensure that part of the CTE does not get combined with the rest, but runs standalone.

CREATE SEQUENCE aoc;
CREATE SEQUENCE aoc2;

CREATE UNLOGGED TABLE grove (
  y INT,
  x INT,
  propx INT,
  propy INT
);

WITH x AS (SELECT nextval('aoc') AS myrow, setval('aoc2',1,false), line
  FROM aoc_day23)
,y AS materialized (SELECT *, string_to_table(line, null) AS a FROM x)
,z AS (SELECT *, nextval('aoc2') AS mycol FROM y)
INSERT INTO grove (y,x)
SELECT myrow, mycol FROM z WHERE a = '#';

CREATE INDEX groveindex on grove(x,y);
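If the two-sequence trick feels opaque, the column numbering can also be sketched with WITH ORDINALITY, which numbers the rows coming out of a set-returning function. This is a different technique from the one used in the solution above, shown here on two made-up lines instead of the real input:

```sql
-- The two sample lines and their row numbers are hypothetical
WITH lines(myrow, line) AS (VALUES (1, '..#.'), (2, '#..#'))
SELECT myrow AS y, t.col AS x
FROM lines,
     string_to_table(line, NULL) WITH ORDINALITY AS t(ch, col)
WHERE t.ch = '#';
-- Three elves: (y=1, x=3), (y=2, x=1), (y=2, x=4)
```

Like the original, this relies on string_to_table, which needs PostgreSQL 14 or later.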

Before we create the final function, we need to do a couple more things. While writing this solution, I found that a generic query plan was much faster than having Postgres use a custom one. Quick aside: when Postgres runs the same query over and over inside a function, the query becomes a prepared statement (see the recent post on doing this inside pgbouncer). Postgres has a choice of using a generic plan to cover all possible inputs, or a custom one, which generates a plan based on the specific inputs. By default, the choice is set to "auto", which means Postgres will use a custom plan the first five times, then try a generic one and pick a winner for all future runs. This strategy works well, but sometimes forcing it to one or the other works best. In our case, we want to force it to always use a generic plan from the start. Note that this only affects the current session.

SET plan_cache_mode = force_generic_plan;
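To double-check what the session is currently using, and to undo the change later, the standard SHOW and RESET commands work on this parameter. The last statement is an untested alternative that the original solution does not use: it ties the setting to the function itself, and it assumes grove_walk(int) has already been created.

```sql
SHOW plan_cache_mode;                      -- "auto" unless it has been changed
SET plan_cache_mode = force_generic_plan;  -- applies to this session only
SHOW plan_cache_mode;                      -- force_generic_plan

-- Run this later to go back to the configured default:
-- RESET plan_cache_mode;

-- Assumption: only valid once grove_walk(int) exists
-- ALTER FUNCTION grove_walk(int) SET plan_cache_mode = force_generic_plan;
```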

The other thing we need is a way to represent the exact coordinate of a single elf. We have two coordinates, so we need a custom type to bind them into a single object:

CREATE TYPE boi AS (x INT, y INT);

Here, "boi" stands for "bundle of ints" (but please pronounce it in the most fun way possible). While I could have used Postgres' built-in point type, it uses the float data type, and can be a little finicky in some of its casting and operators, so I opted to make my own type. The function below solves both parts one and two of the puzzle, hence the single argument it takes, which tells it which part to solve.

CREATE or replace FUNCTION grove_walk(puzzle int)
  returns int language plpgsql AS $$
DECLARE
  q RECORD; myround INT = 0; r RECORD; myarray boi[];
  dirloop TEXT = 'NSWE'; mydir CHAR; mymoves int = 0;

BEGIN

/* For this main loop, let's bail if we hit 5000 rounds */
WHILE myround < 5000 LOOP
  myround = myround + 1;

/* Every once in a while, rebuild the whole table */
  IF 0=myround%10 THEN
    RAISE NOTICE 'At round %, we are going to rebuild the table', myround;
    DROP TABLE if exists temp_grove;
    CREATE UNLOGGED TABLE temp_grove AS SELECT * FROM grove;
    TRUNCATE TABLE grove;
    INSERT INTO grove SELECT * FROM temp_grove ;
    ANALYZE grove;
  END IF;

  raise debug 'Round % start; previous moves was %', myround, mymoves;
  mymoves = 0;

  /* Loop through every elf in random order and find its proposed movement */
  <<elf>> FOR q IN SELECT ctid,x,y FROM grove LOOP

    /* Grab from 1 to 9 nearby points and put into an array */
    SELECT INTO myarray array_agg((x,y)) FROM grove
        WHERE y BETWEEN q.y-1 AND q.y+1
          AND x BETWEEN q.x-1 AND q.x+1;

    /* If nobody is nearby, we stay put! */
    IF array_length(myarray,1) = 1 THEN
      CONTINUE elf;
    END IF;

    /* Check each direction - the order changes each time per the rules */
    FOR mydir IN SELECT unnest(string_to_array(dirloop, NULL)) LOOP


      IF mydir = 'N' AND NOT myarray &&
        ARRAY[(q.x-1,q.y-1)::boi, (q.x,q.y-1)::boi, (q.x+1,q.y-1)::boi] THEN
          UPDATE grove SET propx = q.x+0, propy=q.y+-1 WHERE ctid=q.ctid;
          CONTINUE elf;
      ELSEIF mydir = 'S' AND NOT myarray &&
        ARRAY[(q.x-1,q.y+1)::boi, (q.x, q.y+1)::boi, (q.x+1,q.y+1)::boi] THEN
          UPDATE grove SET propx = q.x+0, propy=q.y+1 WHERE ctid=q.ctid;
          CONTINUE elf;
      ELSEIF mydir = 'W' AND NOT myarray &&
        ARRAY[(q.x-1,q.y-1)::boi, (q.x-1, q.y)::boi, (q.x-1,q.y+1)::boi] THEN
          UPDATE grove SET propx = q.x+-1, propy=q.y+0 WHERE ctid=q.ctid;
          CONTINUE elf;
      ELSEIF mydir = 'E' AND NOT myarray &&
        ARRAY[(q.x+1,q.y-1)::boi, (q.x+1, q.y)::boi, (q.x+1,q.y+1)::boi] THEN
          UPDATE grove SET propx = q.x+1, propy=q.y+0 WHERE ctid=q.ctid;
          CONTINUE elf;
      END IF;

    END LOOP; /* end of each direction */

  END LOOP; /* end of each elf */

  /* Remove all proposals that bump into each other */
  UPDATE grove SET propx=null WHERE (propx,propy)
    = ANY(SELECT propx,propy FROM grove where propx IS NOT NULL GROUP BY 1,2 HAVING count(*) > 1);

  /* Move each elf that is not going to run into another one */
  FOR r IN SELECT * FROM grove WHERE propx IS NOT NULL LOOP
    mymoves = mymoves + 1;
    -- Cannot use ctid below!
     UPDATE grove g SET propx=null,x=r.propx,y=r.propy WHERE x=r.x AND y=r.y;
   END LOOP;

  /* Leave if puzzle 2 is solved */
  IF mymoves < 1 THEN RETURN myround; END IF;

  /* Leave if puzzle 1 is solved */
  IF puzzle=1 AND myround = 10 THEN
    return 1;
  END IF;

  /* Start in a new direction next round */
  dirloop = right(dirloop,-1) || left(dirloop,1);

  END LOOP;

  return 0;

END
$$;

Let's break this function down line by line, starting with the first two:

CREATE or replace FUNCTION grove_walk(puzzle int)
  returns int language plpgsql AS $$

This part is where we give the function a name, assign its argument to the variable puzzle, tell Postgres what language is being used, and declare that it should return a single integer. Finally, we start the body of the function with the useful dollar-sign quoting technique.

DECLARE
  q RECORD; myround INT = 0; r RECORD; myarray boi[];
  dirloop TEXT = 'NSWE'; mydir CHAR; mymoves int = 0;

Next up is the declaration section, in which we tell Postgres what variables we are going to use in this function, and what types they are. We use the RECORD type when iterating through the results of a query, and the INT type for simple counting. The TEXT and CHAR types keep track of which direction we are currently trying. We also declare a variable of our new data type boi - actually, an array of them.

BEGIN

WHILE myround < 5000 LOOP
  myround = myround + 1;

The keyword BEGIN marks the actual code that will run when the function begins. We immediately enter a WHILE loop, with a safety hatch of 5000 loops, and start counting the loops with the myround variable.

/* Every once in a while, rebuild the whole table */
  IF 0=myround%10 THEN
    RAISE NOTICE 'At round %, we are going to rebuild the table', myround;
    DROP TABLE if exists temp_grove;
    CREATE UNLOGGED TABLE temp_grove AS SELECT * FROM grove;
    TRUNCATE TABLE grove;
    INSERT INTO grove SELECT * FROM temp_grove ;
    ANALYZE grove;
  END IF;

This function is going to do a LOT of updates to the grove table. Each update actually does a DELETE and an INSERT behind the scenes, due to the way MVCC works in Postgres. Those deletes are a problem, as the table becomes more bloated each round, slowing down the time it takes a round to complete. To get around this, one could tune the table to make the autovacuum daemon more aggressive, and/or do some manual vacuum cleanup. Because no other connections are using it, we can rebuild it in place by creating a temporary copy of it, truncating the original, then re-populating it by copying back from the temp table. As one might imagine, this is a heavy lift, but in this case the performance boost of completely removing all table bloat every 10 rounds is well worth it.
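To get a feel for how bloated the table is between runs, a couple of built-in functions and views can help. This is only a sketch for inspection and not part of the original solution; the statistics counters are estimates and can lag behind, especially for work done inside a still-open transaction.

```sql
-- On-disk size of the table; it keeps growing as dead row versions pile up
SELECT pg_size_pretty(pg_relation_size('grove')) AS grove_size;

-- Estimated live versus dead rows from the statistics views
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'grove';
```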

  raise debug 'Round % start; previous moves was %', myround, mymoves;
  mymoves = 0;

  /* Loop through every elf in random order and find its proposed movement */
  <<elf>> FOR q IN SELECT ctid,x,y FROM grove LOOP

A little debugging line goes next. By setting the level to DEBUG, we ensure the output will not appear unless the caller sets their message level to debug like this: SET client_min_messages = DEBUG. We also want to track how many elves have moved each round, so we reset the mymoves variable to 0. Then we run a SELECT statement and pull back each row (i.e. each elf) from the table. We are going to examine every elf and see which, if any, space it desires to move to. Because all the elves act independently of each other, we can run this LOOP without the use of an ORDER BY - but remember that a lack of ORDER BY is usually a red flag except in special circumstances like this one. We also gave this LOOP a name ("elf"). The use of a name is optional for loops inside of PL/pgSQL, but can be very handy when you have many loops and need to refer to a specific one. Rather than a SELECT * we only pull back the exact columns we need - in this case the x and y coordinates, as well as the special system column ctid which represents the physical location of the row.

    /* Grab from 1 to 9 nearby points and put into an array */
    SELECT INTO myarray array_agg((x,y)) FROM grove
        WHERE y BETWEEN q.y-1 AND q.y+1
          AND x BETWEEN q.x-1 AND q.x+1;

This next part took a little trial and error to find the best way to do this. And by "best", I mean fastest way via SQL. We are going to need to look all around each elf and see which nearby spots are free. Which spots to check depends on the direction. For example, if we want to see if an elf can go north, we need to look north (N), northeast (NE), and northwest (NW). While we could do this via a separate SELECT statement for each of the four directions, and then act on it, it is far more efficient to make a single call, and then react to that. So our statement is going to pull back all rows in the 3 x 3 block of 9 squares centered on the current location. That covers North, South, East, West, and also NE, NW, SE, and SW. We need to store all that information into a single variable, even though it will get returned as a collection of 1 to 9 rows. We always get at least one row back because the elf itself sits in the middle of its 8 neighbors. All the rows get collapsed into an array by use of the array_agg function. We also need to combine the x and y into a single value, so they get put into a boi array. Because myarray is already declared as an array of the data type boi, we do not need to cast the argument to array_agg, although one could for clarity.

    /* If nobody is nearby, we stay put! */
    IF array_length(myarray,1) = 1 THEN
      CONTINUE elf;
    END IF;

This part is simple, and reflects the first rule of the puzzle in part one: if there are no elves around us, we stay put and do not populate our proposed x and y coordinates. We do this by checking the length of the array we built. If there is only one nearby elf (us), we continue on to the next elf. Note that the array_length function has an annoying mandatory second argument, specifying which dimension of the array to check. As our array only has a single dimension, we set it as 1. It would be nice if there were a single argument array_length function that worked for simple arrays.
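For what it's worth, the built-in cardinality function comes close to that wish: it takes only the array and returns the total number of elements. The solution above does not use it, so treat this as a side note. For a one-dimensional array the two functions agree, but they differ for multidimensional arrays:

```sql
-- One dimension: both return 3
SELECT array_length(ARRAY[10,20,30], 1) AS len,
       cardinality(ARRAY[10,20,30])     AS card;

-- Two dimensions: array_length of dimension 1 is 2, cardinality counts all 4 elements
SELECT array_length(ARRAY[[1,2],[3,4]], 1) AS len,
       cardinality(ARRAY[[1,2],[3,4]])     AS card;
```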

  /* Check each direction - the order changes each time per the rules */
  FOR mydir IN SELECT unnest(string_to_array(dirloop, NULL)) LOOP

The rules also dictate that the cardinal directions get checked in a different order each time. So the first time through, we check north first, and if we find a match, we don't bother checking the other directions. Each round, the order of directions checked changes. We track this by modifying the four-character string (e.g. "SWEN") inside the dirloop variable. That string gets separated into a four-element array with the string_to_array function. The second argument of NULL ensures we split on every character. We then feed that to unnest as a way to walk through the items in the array as part of a loop.
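Both pieces can be tried on their own with a literal string. The first statement shows the split-and-unnest step, and the second shows the rotation expression the function uses at the end of each round:

```sql
-- Split into single characters and walk through them: four rows N, S, W, E
SELECT unnest(string_to_array('NSWE', NULL)) AS mydir;

-- right(..., -1) drops the first character, left(..., 1) keeps only the first
SELECT right('NSWE', -1) || left('NSWE', 1) AS next_dirloop;  -- SWEN
```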

    IF mydir = 'N' AND NOT myarray &&
        /* Anything to the N, NE, or NW? */
        ARRAY[(q.x-1,q.y-1)::boi, (q.x,q.y-1)::boi, (q.x+1,q.y-1)::boi] THEN
          UPDATE grove SET propx = q.x+0, propy=q.y+-1 WHERE ctid=q.ctid;
          CONTINUE elf;
      ELSEIF mydir = 'S' AND NOT myarray &&
        ARRAY[(q.x-1,q.y+1)::boi, (q.x, q.y+1)::boi, (q.x+1,q.y+1)::boi] THEN
          UPDATE grove SET propx = q.x+0, propy=q.y+1 WHERE ctid=q.ctid;
          CONTINUE elf;
/* Not shown: West and East */

In this section, for the current direction of interest, we see if there are any matching entries by adjusting the x and y values, then using the && operator to see if any of the spots we care about are already inside (overlap) the myarray variable. This variable is an array (or list) of x/y coordinates that we know are nearby. If there are no matches, we know that the three coordinates are empty and thus we can move in the current direction. Since we still have to account for bumping into another elf trying to move to the same place, we update the table and put our new x/y coordinates as proposed x and proposed y. Once we get a match, we do not need to check the other directions, so we continue to the next elf. Because we are already inside of a loop, a bare CONTINUE would go to the next direction, not the next elf, so we use a CONTINUE elf; instead.
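The overlap check can be tried in isolation. This sketch assumes the boi type created earlier and uses made-up coordinates: an elf at (5,5) with one neighbor to its northeast at (6,4).

```sql
-- Assumes: CREATE TYPE boi AS (x INT, y INT);
-- Left side: the elf itself plus its one neighbor. Right side: the NW, N, and NE spots.
SELECT ARRAY[(5,5)::boi, (6,4)::boi]
       && ARRAY[(4,4)::boi, (5,4)::boi, (6,4)::boi] AS north_blocked;  -- true

-- With nobody in those three spots, the arrays do not overlap
SELECT ARRAY[(5,5)::boi, (5,6)::boi]
       && ARRAY[(4,4)::boi, (5,4)::boi, (6,4)::boi] AS north_blocked;  -- false
```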

    END LOOP; /* end of each direction */

  END LOOP; /* end of each elf */


  /* Remove all proposals that bump into each other */
  UPDATE grove SET propx=null WHERE (propx,propy)
    = ANY(SELECT propx,propy FROM grove where propx IS NOT NULL GROUP BY 1,2 HAVING count(*) > 1);

We finish going in each direction, then finish up each elf. At that point, we have walked through every elf in the grid and populated the proposed x and proposed y for each elf that can move. We then build a list of all the proposed x,y coordinates, and use GROUP BY and HAVING to only list the ones used by more than one elf. That SELECT statement gets passed to ANY, which allows the outer UPDATE to void any proposed spot that more than one elf is trying to use. We don't need to null out both propx and propy, because the steps that follow only check whether propx is null.

  /* Move each elf that is not going to run into another one */
  FOR r IN SELECT * FROM grove WHERE propx IS NOT NULL LOOP
    mymoves = mymoves + 1;
    -- Cannot use ctid below!
     UPDATE grove g SET propx=null,x=r.propx,y=r.propy WHERE x=r.x AND y=r.y;
   END LOOP;

Now that we removed the conflicting propx values, we can walk through the remaining ones and perform the move, by setting their x/y to the proposed x/y. We also null out the propx while we are here. We also increment our move count, which is important for part two of the puzzle.

  /* Leave if puzzle 2 is solved */
  IF mymoves < 1 THEN RETURN myround; END IF;

  /* Leave if puzzle 1 is solved */
  IF puzzle=1 AND myround = 10 THEN
    return 1;
  END IF;

These are straightforward - we are looking for the exit conditions for each part of the puzzle. For part 1, we still have some further calculations to do, so we leave with an arbitrary and unimportant return value of "1".

  /* Start in a new direction next round */
  dirloop = right(dirloop,-1) || left(dirloop,1);

  END LOOP;
  return 0;

END
$$;

Finally, before we end our main outer loop, we switch the direction. The rules ask us to shift the order by one each turn, such that "NSWE" becomes "SWEN" and then becomes "WENS" etc. This is easy in SQL by using the left and right functions to grab the first item and then stick it behind the other three. Finally we end the loop, and end the function. For part one, we also need to see how many empty spots there are in the smallest rectangle that holds all the elves. Before we do so, we make a copy of the grove table, as part two needs to start from a pristine state.

CREATE TEMP TABLE grove_backup AS SELECT * FROM grove;
SELECT grove_walk(1);

WITH m as (SELECT (max(x) - min(x)+1) * (max(y)-min(y)+1) AS total FROM grove)
,t as (SELECT count(*) AS used FROM grove)
SELECT total - used AS aoc_2022_day23_part1 FROM m, t;

The above runs in about 4.8 seconds on my system. On to part two!

AOC Day 23 - Part Two

Part two of the puzzle asks us to go way beyond 10 rounds and find the point at which no more elves have moved. In other words, until the mymoves variable is zero. We can use the same function, although our periodic table rebuild becomes more important than ever, as the final answer (for my input) was 957 rounds. Before we run the function, we do need to start with a copy of the table that was not updated by part one, so we roll back the table to the state it was in right before we ran part one.

/* PITR */
TRUNCATE TABLE grove;
INSERT INTO grove SELECT * FROM grove_backup;
SELECT grove_walk(2) AS aoc_2022_day23_part2;

That's all for this puzzle. This part took the longest yet to run of any Day - about six minutes. There are ways to speed that up a lot - such as putting things into memory instead of constant updates to the table. But the UPDATEs and SELECTs feel more true to the goal of doing this all in SQL as much as possible.

AOC Day 23 - Bonus Round!

Earlier on, I made the choice to periodically rebuild the entire table we were using to track the location of the elves. By doing so, we got a "fresh", unbloated version of the table to appear every 10 turns. However, was I correct in thinking things slow down? And was 10 a decent default? As it turns out, analyzing data and finding trends is something databases are particularly good at! The first step was to create a simple table to collect how long each round took:

CREATE UNLOGGED TABLE public.elf_timing (
  freq int,
  rebuild bool,
  round int,
  mytime timestamptz
);

Next, we run an INSERT inside our function, once per loop, and an extra one any time that we rebuild the table:

myfreq INT = 10;
...

INSERT INTO public.elf_timing SELECT myfreq, true, myround, timeofday()::timestamptz;

  /* Every once in a while, rebuild the whole table */
  IF 0=myround % myfreq THEN
    DROP TABLE if exists temp_grove;
    CREATE UNLOGGED TABLE temp_grove AS SELECT * FROM grove;
    TRUNCATE TABLE grove;
    INSERT INTO grove SELECT * FROM temp_grove ;
    INSERT INTO public.elf_timing SELECT myfreq, false, myround, timeofday()::timestamptz;
  END IF;

The true/false lets us pick out the times when we are rebuilding the table, which will give us insight later as to how long it actually takes to rebuild this table. Be aware that the flag reads backwards from what the column name suggests: the regular once-per-round rows have rebuild set to true, and the extra row written right after a rebuild has it set to false. We use timeofday() to return the current time. If we were to use now(), it would return the same timestamp each round, as it only returns the time the current transaction started. Once those new inserts are in place, we can rerun the function and adjust the myfreq variable each time to see exactly how long each round takes. Our timing table starts to look like this:

  freq | rebuild | round |            mytime
 ------+---------+-------+-------------------------------
    10 | t       |     1 | 2023-11-22 00:05:25.265515-05
    10 | t       |     2 | 2023-11-22 00:05:25.744037-05
    10 | t       |     3 | 2023-11-22 00:05:26.22159-05
    10 | t       |     4 | 2023-11-22 00:05:26.691277-05
    10 | t       |     5 | 2023-11-22 00:05:27.161819-05
    10 | t       |     6 | 2023-11-22 00:05:27.644129-05
    10 | t       |     7 | 2023-11-22 00:05:28.135452-05
    10 | t       |     8 | 2023-11-22 00:05:28.635376-05
    10 | t       |     9 | 2023-11-22 00:05:29.138946-05
    10 | t       |    10 | 2023-11-22 00:05:29.647686-05
    10 | f       |    10 | 2023-11-22 00:05:29.693706-05
    10 | t       |    11 | 2023-11-22 00:05:30.117274-05
    10 | t       |    12 | 2023-11-22 00:05:30.539393-05
    10 | t       |    13 | 2023-11-22 00:05:30.972813-05
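The difference between now() and timeofday() is easy to see in a small transaction. This sketch also shows clock_timestamp(), a built-in that the solution above does not use; it returns the wall-clock time as a timestamptz directly, without the cast from text.

```sql
BEGIN;
SELECT now() AS txn_start, clock_timestamp() AS wall_clock;

SELECT pg_sleep(1);

-- now() is unchanged, while the other two have advanced by about a second
SELECT now()                    AS txn_start,
       clock_timestamp()        AS wall_clock,
       timeofday()::timestamptz AS from_timeofday;
COMMIT;
```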

I ran the function 20 times with the new tracking information: starting with a frequency of 5 (i.e. rebuilding the table every 5 rounds), then went up by 5 until I hit 100.

First order of business: how expensive is it to rebuild that table? It's a SELECT * plus a CREATE TABLE plus a TRUNCATE plus another SELECT * plus some index rebuilding. A good amount of work. On the other hand, the table is unlogged and very, very small. So, let's calculate some numbers. What we need to focus on is the border between rebuild false and rebuild true. Specifically, we need to see how much the mytime value changes from the last rebuild-true row of a round until the rebuild-false row that follows it. When we are trying to compare rows to nearby rows, we reach for windowing functions:

WITH x AS (SELECT *,
  CASE WHEN rebuild is false THEN mytime-lag(mytime) OVER(order by mytime) ELSE null END as cc
  FROM elf_timing)
, y AS (SELECT extract(epoch from cc) AS secs FROM x WHERE cc is not null)
SELECT min(secs),max(secs),avg(secs),stddev(secs) from y;

So the first thing the CTE above does is calculate the difference at these borders: it subtracts the previous row's mytime (via the lag function) from the current row's mytime, over a simple window ordered by mytime. If the rebuild value for the row is true, we throw away the result by setting it to null. In the next part of the CTE, labeled y, we throw away all the rows that have that null value, and change the interval generated by x into a number of seconds. Finally, we run some simple statistics on our final list of numbers.

   min    |   max    |          avg           |           stddev
----------+----------+------------------------+----------------------------
 0.010417 | 0.100946 | 0.03749472956782199401 | 0.007553743791384258077350
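
To see the lag trick in isolation, here is a small sketch on four made-up rows (the demo name and the timestamps are invented for illustration and are not from the real timing table). Only the row with rebuild = false gets a non-null difference:

```sql
WITH demo(rebuild, mytime) AS (VALUES
  (true,  '2023-11-22 00:00:01'::timestamp),
  (true,  '2023-11-22 00:00:02'),
  (false, '2023-11-22 00:00:02.04'),
  (true,  '2023-11-22 00:00:03'))
SELECT rebuild, mytime,
  -- lag() looks one row back in the window; the CASE keeps only the border rows
  CASE WHEN rebuild IS false
       THEN mytime - lag(mytime) OVER (ORDER BY mytime) END AS cc
FROM demo
ORDER BY mytime;
```

The third row comes back with an interval of 0.04 seconds, and every other row has a null cc.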

So, on average, this rebuild took about 37 milliseconds. How does this compare to not letting it run at all? Let's peek at the two rounds before the rebuild, by adding an extra argument to the lag function to have it go back an extra row:

WITH x AS (SELECT *,
  CASE WHEN rebuild is false THEN lag(mytime) OVER(order by mytime)
        - lag(mytime,2) OVER(order by mytime) ELSE null END as cc
  FROM elf_timing)
, y AS (SELECT extract(epoch from cc) AS secs FROM x WHERE cc is not null)
SELECT min(secs),max(secs),avg(secs),stddev(secs) from y;

   min    |   max    |          avg           |         stddev
----------+----------+------------------------+------------------------
 0.037867 | 1.944933 | 0.42882609798887462559 | 0.22408780679059776592

We can see from this that our table rebuild is a success, as the normal non-rebuild runs are much more expensive. But we still need confirmation that the longer we wait to rebuild, the worse the total time is. For that, we need to compare our final run (which we know was round 957) to the first run, for each of the frequencies. A slight tweak of the CTE above gives us a nice answer:

WITH
x AS (SELECT *, CASE WHEN round=957 THEN mytime-lag(mytime) OVER(order by mytime) ELSE null END as mylag
  FROM timing WHERE round=1 OR round=957 ORDER BY mytime),
y AS (SELECT * FROM x where mylag is not null ORDER BY freq),
z AS (SELECT date_trunc('minute', (min(mylag))) AS floor FROM y),
q AS (SELECT freq, mylag, round(extract(epoch from mylag-floor),0) AS stretch FROM y,z)
SELECT * FROM q ORDER BY freq;

As before, we use a lag and an IS NOT NULL to produce a list of timings. We also add a new CTE named z that truncates the smallest timing down to the minute, giving us a common floor; q then reports each run's time above that floor in seconds, for use in an upcoming function. For now, let's see the result:

 freq |      mylag      | stretch
------+-----------------+---------
    5 | 00:05:58.344671 |      58
   10 | 00:06:19.189716 |      79
   15 | 00:06:38.778634 |      99
   20 | 00:06:59.784113 |     120
   25 | 00:07:15.632686 |     136
   30 | 00:07:38.084029 |     158
   35 | 00:07:56.465293 |     176
   40 | 00:08:10.91232  |     191
   45 | 00:08:26.602071 |     207
   50 | 00:08:47.820099 |     228
   55 | 00:09:05.50898  |     246
   60 | 00:09:20.44909  |     260
   65 | 00:09:41.765944 |     282
   70 | 00:09:56.672234 |     297
   75 | 00:10:16.567748 |     317
   80 | 00:10:37.709436 |     338
   85 | 00:10:46.580778 |     347
   90 | 00:10:59.836356 |     360
   95 | 00:11:21.743917 |     382
  100 | 00:12:06.669316 |     427
(20 rows)
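
The floor arithmetic can be checked by hand. This sketch feeds the first two intervals from the result above into the same date_trunc and extract steps; the CTE name t is just for illustration, and the cast to numeric is my addition so that round() also works on older Postgres versions where extract returns double precision:

```sql
WITH t(mylag) AS (VALUES (interval '00:05:58.344671'),
                         (interval '00:06:19.189716')),
z AS (SELECT date_trunc('minute', min(mylag)) AS floor FROM t)
SELECT mylag, floor,
       -- seconds above the shared floor, rounded to a whole number
       round(extract(epoch FROM mylag - floor)::numeric, 0) AS secs_above_floor
FROM t, z
ORDER BY mylag;
```

The floor comes out as 00:05:00, and the last column gives 58 and 79, matching the first two rows of the table.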

Okay, we can see the total time to run our function increases at a regular rate as the delay of rebuild grows. But can we do better? What fun are boring numbers when we have a terminal that supports ANSI color codes and (some) Unicode characters? Let's create a quick function to output a bar chart. One annotated function coming right up:

CREATE OR REPLACE FUNCTION elf_graph()
returns void language plpgsql as $$
DECLARE
  myrec RECORD; len INT;
  mytext TEXT = chr(10); /* The final string to output: start it with a newline */
  mycolor TEXT;
  green   TEXT = E'\x1b[38;5;77m';
  red     TEXT = E'\x1b[38;5;196m';
  orange  TEXT = E'\x1b[38;5;214m';
  purple  TEXT = E'\x1b[38;5;165m';
  reset   TEXT = E'\033[0m';
BEGIN

/* Alas, we cannot use a WITH and a LOOP, so we need a temp table.
   This table does not need to hang around, so we add ON COMMIT DROP        */
CREATE TEMP TABLE myinfo ON COMMIT DROP AS
WITH
x AS (SELECT *, CASE WHEN round=957 THEN mytime-lag(mytime) OVER(order by mytime) ELSE null END as mylag
  FROM timing WHERE round=1 OR round=957 ORDER BY mytime),
y AS (SELECT * FROM x where mylag is not null ORDER BY freq),
z AS (SELECT date_trunc('minute', (min(mylag))) AS floor FROM y),
q AS (SELECT freq, mylag, round(extract(epoch from mylag-floor),0) AS stretch FROM y,z)
SELECT * FROM q;

mycolor = red;

FOR myrec in SELECT * FROM myinfo ORDER BY freq LOOP
  /*
     My screen is not wide enough to represent a strict 1:1 relationship,
     so we cut the size in 2 for our stretch column
   */
  len = (myrec.stretch)/2;

  /*
    For contrast, we will alternate between red and green lines.
    Also, because elves = christmas = red and green
  */
  mycolor = CASE WHEN mycolor=red THEN green ELSE red END;

  /*
     Every new row of information, we start a new line, change the color to purple,
     and output the frequency for the row. We use to_char with 999 to make sure
     the numbers are the same width and right-justified. We also orange-output our
     current length, which is half the stretch value (seconds above the floor).
  */

  mytext = mytext || chr(10) || purple || to_char(myrec.freq, '999');
  mytext = mytext || orange || to_char(len, ' 999  ');

  /* For each two values we find, output a Unicode "FULL BLOCK" */
  mytext = mytext || mycolor || repeat(U&'\2588', len/2);

  /*
    To help make things a little more accurate, we also output
    a Unicode "LEFT HALF BLOCK" if the number was odd
  */
  IF 0 != len %2 THEN mytext = mytext || U&'\258C'; END IF;

END LOOP;

/*
  We were lazy and did not reset the color codes above, relying instead
  on the new color to clobber the old one. However, we now need to
  reset any and all colors at the end of the string
*/
mytext = mytext || reset;

/* Output the entire graph */
RAISE NOTICE '%', mytext;

END;
$$;

Let's run it and see what happens:

Fun with Postgres Text File Mazes, Charts, and Routes

Greg.Sabino.Mullane@crunchydata.com (Greg Sabino Mullane) — Fri, 24 Nov 2023 08:00:00 EST

Disclaimer

This article will contain spoilers both on how I solved 2022 Day 22's challenge "Monkey Map" using SQL, as well as general ideas on how to approach the problem. I recommend trying to solve it yourself first, using your favorite language.

AOC Day 22

Tech used:

The file_fdw extension to read the input
Unlogged tables
Sequences
Building and modifying arrays via regexp_split_to_array and array_remove
More ASCII animation!

The first step is to read the text-based input file into a Postgres table:

CREATE EXTENSION if not exists file_fdw;

CREATE SERVER if not exists aoc2022 foreign data wrapper file_fdw;

DROP SCHEMA if exists aoc2022_day22_monkeymap CASCADE;
CREATE SCHEMA aoc2022_day22_monkeymap;
SET search_path = aoc2022_day22_monkeymap;

CREATE FOREIGN TABLE aoc_day22 (line text)
  SERVER aoc2022 options(filename '/tmp/aoc2022.day22.input'
--  SERVER aoc2022 options(filename '/tmp/aoc2022.day22.testinput'
);

AOC Day 22 - Part One

This puzzle asks us to chart a route through a maze, following specific directions about how far to walk and when to turn. The input file looks like this:

        ...#
        .#..
        #...
        ....
...#.......#
........#...
..#....#....
..........#.
        ...#....
        .....#..
        .#......
        ......#.

10R5L5R10L4R5L5

This is the small test file: the actual one is always much larger and more complex. We can see it is divided into two parts: the maze, and the instructions. Our first step will be to translate that input into SQL tables. For now, we will only focus on the map part, which we will put into a new table:

CREATE UNLOGGED TABLE monkeymap (
  id INT GENERATED ALWAYS AS IDENTITY,
  y SMALLINT,
  x SMALLINT,
  item CHAR(1),
  eswn TEXT[]
);

We will need some supporting sequences, and then we can read the file line by line and transform it into the columns above:

CREATE SEQUENCE aoc;
CREATE SEQUENCE aoc2;

WITH x AS (SELECT nextval('aoc') AS myrow, setval('aoc2',1,false), line
  FROM aoc_day22 WHERE line !~ '\d')
,y AS materialized (SELECT *, string_to_table(line, null) AS a FROM x)
,z AS (SELECT *, nextval('aoc2') AS mycol FROM y)
INSERT INTO monkeymap (y,x,item)
SELECT myrow, mycol, a FROM z WHERE a <> ' ';

In the CTE above, we first use "x" to read one line at a time from our text file, using the sequence "aoc" to represent the row number, and resetting our column number "aoc2" to 1. Next we use "y" to break that line apart character by character. Then with "z" we gather all the items from y, along with incrementing the column number "aoc2" for each item. Finally, we insert all non-empty spots on the maze into our x,y grid. The final table looks like this for the first two rows:

SELECT * FROM monkeymap where y <= 2 ORDER BY y,x;

 id | y | x  | item | eswn
----+---+----+------+------
  1 | 1 |  9 | .    | ☃
  2 | 1 | 10 | .    | ☃
  3 | 1 | 11 | .    | ☃
  4 | 1 | 12 | #    | ☃
  5 | 2 |  9 | .    | ☃
  6 | 2 | 10 | #    | ☃
  7 | 2 | 11 | .    | ☃
  8 | 2 | 12 | .    | ☃
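
As an aside, the column numbering can also be produced without a second sequence. The sketch below is an alternative technique, not the approach used above: it runs string_to_table (Postgres 14 and later) in the FROM clause and lets WITH ORDINALITY number the characters. The input is the first line of the test map, eight spaces followed by "...#":

```sql
SELECT t.col AS x, t.ch AS item
FROM string_to_table('        ...#', NULL) WITH ORDINALITY AS t(ch, col)
WHERE t.ch <> ' '   -- skip the padding, exactly as the INSERT above does
ORDER BY t.col;
```

This returns four rows with x running from 9 to 12, the same x values the table shows for y = 1.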

Because we are going to be consulting this table a lot, we are going to precompute all the possible moves from one location to another, taking into account the special rules about "wrapping" from one end to the other. So each cell (i.e. unique x/y location) will get assigned an array indicating what happens when you move east, south, west, or north from the current cell. Here is the function to do that:

CREATE or replace FUNCTION monkey_premap()
  returns INTEGER language plpgsql AS $$
DECLARE
  myrec RECORD;
  north INT; south INT; east INT; west INT;
BEGIN

FOR myrec IN SELECT * FROM monkeymap WHERE item = '.' LOOP

  -- north: x the same, y decreases
  SELECT INTO north CASE WHEN item = '.' THEN id ELSE 0 END
    FROM monkeymap WHERE x=myrec.x AND y=myrec.y-1;
  IF north IS NULL THEN
    SELECT INTO north CASE WHEN item = '.' THEN id ELSE 0 END
      FROM monkeymap WHERE x=myrec.x ORDER BY y DESC LIMIT 1;
  END IF;

  -- south: x the same, y increases
  SELECT INTO south CASE WHEN item = '.' THEN id ELSE 0 END
    FROM monkeymap WHERE x=myrec.x AND y=myrec.y+1;
  IF south IS NULL THEN
    SELECT INTO south CASE WHEN item = '.' THEN id ELSE 0 END
      FROM monkeymap WHERE x=myrec.x ORDER BY y ASC LIMIT 1;
  END IF;

  -- east: y the same, x increases
  SELECT INTO east CASE WHEN item = '.' THEN id ELSE 0 END
    FROM monkeymap WHERE y=myrec.y AND x=myrec.x+1;
  IF east IS NULL THEN
    SELECT INTO east CASE WHEN item = '.' THEN id ELSE 0 END
      FROM monkeymap WHERE y=myrec.y ORDER BY x ASC LIMIT 1;
  END IF;

  -- west: y the same, x decreases
  SELECT INTO west CASE WHEN item = '.' THEN id ELSE 0 END
    FROM monkeymap WHERE y=myrec.y AND x=myrec.x-1;
  IF west IS NULL THEN
    SELECT INTO west CASE WHEN item = '.' THEN id ELSE 0 END
      FROM monkeymap WHERE y=myrec.y ORDER BY x DESC LIMIT 1;
  END IF;

  UPDATE monkeymap SET eswn = ARRAY[east,south,west,north]
    WHERE id = myrec.id;

  END LOOP;

  return 1;

END
$$;

Before we run the function, we should create some indexes that the queries in it will benefit from, then analyze the table to generate fresh statistics:

CREATE INDEX monkeyindex ON monkeymap(x,y);
CREATE INDEX monkeyids ON monkeymap(id);
ANALYZE monkeymap;
SELECT monkey_premap();

Our table now looks like this, for the first two rows of "y":

 id | y | x  | item |    eswn
----+---+----+------+------------
  1 | 1 |  9 | .    | {2,5,0,89}
  2 | 1 | 10 | .    | {3,0,1,90}
  3 | 1 | 11 | .    | {0,7,2,91}
  4 | 1 | 12 | #    | ☃
  5 | 2 |  9 | .    | {0,0,8,1}
  6 | 2 | 10 | #    | ☃
  7 | 2 | 11 | .    | {8,11,0,3}
  8 | 2 | 12 | .    | {5,12,7,0}
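
The "look next door, otherwise wrap" logic is easier to see on a single row of the map. This is only a sketch: demo_row is a made-up stand-in for the first row of the test data, and COALESCE replaces the IF ... IS NULL step from the function. It asks what lies west of the cell at x=9:

```sql
WITH demo_row(id, x, item) AS (VALUES (1, 9, '.'), (2, 10, '.'), (3, 11, '.'), (4, 12, '#'))
SELECT COALESCE(
  -- Step 1: is there a cell directly to the west? (no row here, so this is NULL)
  (SELECT CASE WHEN item = '.' THEN id ELSE 0 END FROM demo_row WHERE x = 9 - 1),
  -- Step 2: wrap around to the far end of the row
  (SELECT CASE WHEN item = '.' THEN id ELSE 0 END FROM demo_row ORDER BY x DESC LIMIT 1)
) AS west;
```

The wrap lands on the wall at x=12, so the answer is 0, which agrees with the third element of eswn for id 1 in the table above.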

Just how big is the real data set? Even with our indexes, it took around 10 seconds to run that function. Here's what the first few table rows look like:

 id | y | x  | item |       eswn
----+---+----+------+-------------------
  1 | 1 | 51 | .    | {2,101,100,12451}
  2 | 1 | 52 | .    | {3,102,1,12452}
  3 | 1 | 53 | .    | {0,103,2,0}

Finally, we need a function to do the actual walking of the maze, based on the instructions given in the last line of the input file.

CREATE or replace FUNCTION monkeywalk()
  RETURNS int language plpgsql AS $$
DECLARE
walk TEXT[]; spin TEXT[];
myid INT;
mydir INT;
myrec RECORD;
j INT = 0;
newdir INT;
BEGIN
  /* Stick all of our distance commands into an array */
  SELECT INTO walk regexp_split_to_array(line, '\D+')
    FROM aoc_day22 WHERE line ~ '\d';

  /* Stick all of our direction commands into an array, and trim empty items */
  SELECT INTO spin array_remove(regexp_split_to_array(line, '\d+'),'')
    FROM aoc_day22 WHERE line ~ '\d';

  /* We always start in the top row, on the far left, facing east */
  SELECT INTO myid, mydir id,1 FROM monkeymap
    WHERE y=1 AND item='.' ORDER BY x ASC LIMIT 1;

  UPDATE monkeymap SET item = '>' WHERE id = myid;

  WHILE walk[j+1] IS NOT NULL LOOP
    j = j + 1;
    /* First, we walk as far as we can */
    FOR m IN 1 .. walk[j] LOOP

      /* What is in this direction? */
      SELECT eswn[mydir] INTO newdir FROM monkeymap WHERE id = myid;

      /* If we hit a wall, stop walking and go to the rotation */
      IF newdir = 0 THEN EXIT; END IF;

      /* Move to the new location */
      myid = newdir;

    END LOOP;

    /* Done walking, so time to rotate left or right */
    IF spin[j] IS NULL THEN EXIT; END IF;

    IF spin[j] = 'L' THEN
      mydir = CASE WHEN mydir = 1 THEN 4 ELSE mydir-1 END;
    ELSE
      mydir = CASE WHEN mydir = 4 THEN 1 ELSE mydir+1 END;
    END IF;
  END LOOP;

  /* Finished - display the final score */
  RETURN (y * 1000) + (x * 4) + (mydir-1) FROM monkeymap WHERE id = myid;

END
$$;

When we run it, we get the correct answer in about 1.7 seconds:

SELECT monkeywalk();

 monkeywalk
------------
     186128
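
To see what the two regexp_split_to_array calls in the function produce, here is a quick sketch that runs them against the instruction line from the small test file:

```sql
SELECT regexp_split_to_array(line, '\D+') AS walk,
       -- splitting on digits leaves empty strings at both ends, so trim them
       array_remove(regexp_split_to_array(line, '\d+'), '') AS spin
FROM (VALUES ('10R5L5R10L4R5L5')) AS v(line);
```

The walk array ends up with seven distances and the spin array with six turns, which is why the loop exits when spin[j] is null after the final walk.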

AOC Day 22 - Part Two

Part two gets...tricky. Rather than a simple two-dimensional map, we find ourselves holding a three-dimensional cube which has been flattened out. So our map actually works like this:

        1111
        1111
        1111
        1111
222233334444
222233334444
222233334444
222233334444
        55556666
        55556666
        55556666
        55556666

Each of the numbers represents a different face of the cube. Of course, all of the movement rules are different now too, as walking off the "edge" of one face of the cube makes you appear on another face, with a new orientation! I tried really hard to solve this mentally by just looking at the map, but eventually had to create a small paper cube to keep everything straight and derive the correct rules as we moved from face to face.

Our first step will be to reset our initial table, as we need things to not be affected by any updates we did in part one:

TRUNCATE TABLE monkeymap;
SELECT setval('aoc',1,false);
WITH x AS (SELECT nextval('aoc') AS myrow, setval('aoc2',1,false), line
  FROM aoc_day22 WHERE line !~ '\d')
,y AS materialized (SELECT *, string_to_table(line, null) AS a FROM x)
,z AS (SELECT *, nextval('aoc2') AS mycol FROM y)
INSERT INTO monkeymap (y,x,item)
SELECT myrow, mycol, a FROM z WHERE a <> ' ';

We need to add some more columns to track new information. Each side of the cube will be represented by a letter from A to F. Every time we go over the edge from one face to another, our orientation on the 2-D map may change, so we also need to record what sort of "twist" things take when we do so. Finally, we make an "xy" column as a shorthand array of our x and y coordinates. Because we are about to shift x and y around, xy also preserves the original map position that the final score is computed from.

ALTER TABLE monkeymap ADD COLUMN z CHAR, ADD COLUMN twist TEXT[];
ALTER TABLE monkeymap ADD COLUMN xy INT[];
UPDATE monkeymap SET xy= ARRAY[x,y];

Next, we need to map each cell, or original x/y coordinate, to one of the faces. This depends heavily on how the cube is folded. The solution below is optimized for my real data, not the test data. That's why each cube face is 50x50 characters wide.

\set Q 50
UPDATE monkeymap SET z =
CASE WHEN y <= :Q AND x <= (:Q*2)  THEN 'A'
     WHEN y <= :Q AND x > (:Q*2)   THEN 'B'
     WHEN y BETWEEN :Q+1 AND :Q*2 THEN 'C'
     WHEN y BETWEEN 1+(:Q*2) AND :Q*3 AND x <= :Q THEN 'D'
     WHEN y BETWEEN 1+(:Q*2) AND :Q*3 AND x > :Q THEN 'E'
     WHEN y >= 1+(:Q*3) THEN 'F' END;

As a sanity check, let's run a GROUP BY and confirm that each face has the same number of cells:

SELECT z, count(*) FROM monkeymap GROUP BY 1 ORDER BY 1;

 z | count
---+-------
 A |  2500
 B |  2500
 C |  2500
 D |  2500
 E |  2500
 F |  2500
(6 rows)

It kind of looks like this:

  AABB
  AABB
  CC
  CC
DDEE
DDEE
FF
FF

Our table is still a 2-D map which has "holes" that represent places where the cube faces are not. In other words, we now need to fold our table into a 3-D space, by very carefully shifting things around. For example, we need to shift the "A" values left by 50. Getting this part just right is where most of the puzzle's time was actually spent!

/*  A, C, and E get x-shifted left by Q  */
UPDATE monkeymap SET x = x-:Q WHERE z IN ('A','C','E');
/*  B gets x-shifted left by 2xQ  */
UPDATE monkeymap SET x = x - (:Q*2) WHERE z = 'B';
/*  C gets y-shifted up by Q  */
UPDATE monkeymap SET y = y - :Q WHERE z = 'C';
/*  D and E get y-shifted up by Q*2  */
UPDATE monkeymap SET y = y - (:Q*2) WHERE z IN ('D','E');
/*  F gets y-shifted up by Q*3  */
UPDATE monkeymap SET y = y - (:Q*3) WHERE z = 'F';

This part was so tricky I wrote a quick custom assertion to sanity check the results. We basically want to ensure that all cells live somewhere between 1 and 50 on both the x and y axes:

CREATE OR REPLACE FUNCTION monkey_assert(INT) RETURNS void
  language plpgsql as $$
BEGIN
PERFORM 1 FROM monkeymap WHERE x > $1;
IF FOUND THEN RAISE 'Invalid monkeymap x> %!', $1; END IF;
PERFORM 1 FROM monkeymap WHERE x > $1 OR y > $1 OR x < 1 OR y < 1;
IF FOUND THEN RAISE 'Invalid monkeymap!'; END IF;
END $$;
SELECT monkey_assert(:Q);
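
The folding can be rehearsed on a miniature version of the same layout. The sketch below is illustrative only: it generates a grid with a face size of 2 instead of 50, assigns faces with explicit x bounds (needed because the generated grid has no holes), and uses modular arithmetic in place of the per-face UPDATE statements above. The names params, grid, and faces are made up for the example:

```sql
WITH params AS (SELECT 2 AS q),
grid AS (
  SELECT q, x, y
  FROM params,
       generate_series(1, 3*q) AS x,
       generate_series(1, 4*q) AS y
),
faces AS (
  SELECT *,
    CASE WHEN y <= q AND x BETWEEN q+1 AND q*2                     THEN 'A'
         WHEN y <= q AND x > q*2                                   THEN 'B'
         WHEN y BETWEEN q+1 AND q*2 AND x BETWEEN q+1 AND q*2      THEN 'C'
         WHEN y BETWEEN q*2+1 AND q*3 AND x <= q                   THEN 'D'
         WHEN y BETWEEN q*2+1 AND q*3 AND x BETWEEN q+1 AND q*2    THEN 'E'
         WHEN y > q*3 AND x <= q                                   THEN 'F'
    END AS z
  FROM grid
)
SELECT z, count(*) AS cells,
       -- (n-1) % q + 1 maps any map coordinate to its position within a face
       min((x-1) % q + 1) AS min_x, max((x-1) % q + 1) AS max_x,
       min((y-1) % q + 1) AS min_y, max((y-1) % q + 1) AS max_y
FROM faces
WHERE z IS NOT NULL
GROUP BY z
ORDER BY z;
```

Each of the six faces should report 4 cells, with both local coordinates running from 1 to 2, which is the miniature equivalent of what monkey_assert checks.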

At this point, we are ready to write and run a function to walk the cube and work out where we end up for each direction we can head from any point, by populating the eswn array. However, unlike the previous time we did this, we also need to account for the fact that we may change our direction because we walked over the edge from one face to another! So we store that information in a second array called twist. Here's our new array population function:

CREATE or replace FUNCTION monkeycube(int)
  RETURNS int language plpgsql AS $$
DECLARE
  myrec RECORD; east INT; south INT; west INT; north INT;
  teast CHAR; tsouth CHAR; twest CHAR; tnorth CHAR;
  maxx SMALLINT = $1;
BEGIN

/* For every spot on the map we could possibly walk to,
   figure out what is in each direction, and if we hit an edge */
FOR myrec IN SELECT * FROM monkeymap WHERE item = '.' LOOP
  /*
    A: e=BE s=CS w=DE n=FE  B: e=EW s=CW w=AW n=FN
    C: e=BN s=ES w=DS n=AN  D: e=EE s=FS w=AE n=CE
    E: e=BW s=FW w=DW n=CN  F: e=EN s=BS w=AS n=DN
  */

/* Heading east */
teast = '>';
/* Is there a valid space to the east on this cube face? */
SELECT INTO east CASE WHEN item = '.' THEN id ELSE 0 END
  FROM monkeymap WHERE x=myrec.x+1 AND y=myrec.y AND z=myrec.z;

/* If not found, we must have walked over the edge to a new side of the cube */
IF east IS NULL THEN
 SELECT INTO east CASE WHEN item='.' THEN id ELSE 0 END FROM monkeymap WHERE
      (myrec.z='A' AND z='B' AND y=myrec.y AND x=1)           /* East */
   OR (myrec.z='B' AND z='E' AND y=maxx-myrec.y+1 AND x=maxx) /* West USD */
   OR (myrec.z='C' AND z='B' AND x= myrec.y AND y=maxx)       /* North */
   OR (myrec.z='D' AND z='E' AND y=myrec.y AND x=1)           /* East */
   OR (myrec.z='E' AND z='B' AND y=maxx-myrec.y+1 AND x=maxx) /* West USD */
   OR (myrec.z='F' AND z='E' AND x=myrec.y AND y=maxx);       /* North */
 teast = CASE WHEN myrec.z IN ('C','F') THEN '^'
         WHEN myrec.z IN ('B','E') THEN '<' ELSE '>' END;
END IF;

/* Heading south */
tsouth = 'v';
SELECT INTO south CASE WHEN item = '.' THEN id ELSE 0 END
  FROM monkeymap WHERE x=myrec.x AND y=myrec.y+1 AND z=myrec.z;
IF south IS NULL THEN
  SELECT INTO south CASE WHEN item='.' THEN id ELSE 0 END FROM monkeymap WHERE
       (myrec.z='A' AND z='C' AND x=myrec.x AND y=1)    /* South */
    OR (myrec.z='B' AND z='C' AND y=myrec.x AND x=maxx) /* West */
    OR (myrec.z='C' AND z='E' AND x=myrec.x AND y=1)    /* South */
    OR (myrec.z='D' AND z='F' AND x=myrec.x AND y=1)    /* South */
    OR (myrec.z='E' AND z='F' AND y=myrec.x AND x=maxx) /* West */
    OR (myrec.z='F' AND z='B' AND x=myrec.x AND y=1);   /* South */
  tsouth = CASE WHEN myrec.z IN ('B','E') THEN '<' ELSE 'v' END;
END IF;

/* Heading west */
twest = '<';
SELECT INTO west CASE WHEN item = '.' THEN id ELSE 0 END
  FROM monkeymap WHERE y=myrec.y AND x=myrec.x-1 AND z=myrec.z;
IF west IS NULL THEN
  SELECT INTO west CASE WHEN item='.' THEN id ELSE 0 END FROM monkeymap WHERE
       (myrec.z='A' AND z='D' AND y=maxx-myrec.y+1 AND x=1) /* East USD */
    OR (myrec.z='B' AND z='A' AND y=myrec.y AND x=maxx)     /* West */
    OR (myrec.z='C' AND z='D' AND x=myrec.y AND y=1)        /* South */
    OR (myrec.z='D' AND z='A' AND y=maxx-myrec.y+1 AND x=1) /* East USD */
    OR (myrec.z='E' AND z='D' AND y=myrec.y AND x=maxx)     /* West */
    OR (myrec.z='F' AND z='A' AND x=myrec.y AND y=1);       /* South */
  twest = CASE WHEN myrec.z IN ('A','D') THEN '>'
               WHEN myrec.z IN ('C','F') THEN 'v' ELSE '<' END;
END IF;

/* Heading north */
tnorth = '^';
SELECT INTO north CASE WHEN item = '.' THEN id ELSE 0 END
  FROM monkeymap WHERE x=myrec.x AND y=myrec.y-1 AND z=myrec.z;
IF north IS NULL THEN
  SELECT INTO north CASE WHEN item='.' THEN id ELSE 0 END FROM monkeymap WHERE
       (myrec.z='A' AND z='F' AND y=myrec.x AND x=1)     /* East */
    OR (myrec.z='B' AND z='F' AND x=myrec.x AND y=maxx)  /* North */
    OR (myrec.z='C' AND z='A' AND x=myrec.x AND y=maxx)  /* North */
    OR (myrec.z='D' AND z='C' AND y=myrec.x AND x=1)     /* East */
    OR (myrec.z='E' AND z='C' AND x=myrec.x AND y=maxx)  /* North */
    OR (myrec.z='F' AND z='D' AND x=myrec.x AND y=maxx); /* North */
  tnorth = CASE WHEN myrec.z IN ('A','D') THEN '>' ELSE '^' END;
END IF;

UPDATE monkeymap SET eswn = ARRAY[east,south,west,north],
  twist = ARRAY[teast, tsouth, twest, tnorth]
WHERE id = myrec.id;

END LOOP;

RETURN 1;

END
$$;

Running this takes about 2 seconds:

SELECT monkeycube(:Q);
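
Several of the edge rules in the function use the expression maxx-myrec.y+1 (the ones whose comments say "USD", which I read as "upside down"). It mirrors a coordinate across the face, so the top row of one face lines up with the bottom row of the other. A tiny sketch with maxx hardcoded to 50:

```sql
SELECT y AS leaving_y, 50 - y + 1 AS arriving_y
FROM (VALUES (1), (25), (50)) AS v(y)
ORDER BY y;
```

Row 1 maps to row 50, row 25 to row 26, and row 50 to row 1. The "+1" is there because our coordinates start at 1 rather than 0.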

Finally we can write a function to walk around the outside of the cube:

CREATE or replace FUNCTION monkey_inception()
  RETURNS int language plpgsql AS $$
DECLARE
  walk TEXT[]; spin TEXT[]; j INT = 0;
  myid INT; mydir SMALLINT; newdir INT; newflip CHAR;
  myrec RECORD; oldid INT=0; veryoldid INT=0;
BEGIN
  /* Stick all of our distance commands into an array */
  SELECT INTO walk regexp_split_to_array(line, '\D+')
    FROM aoc_day22 WHERE line ~ '\d';

  /* Stick all of our direction commands into an array, and trim empty items */
  SELECT INTO spin array_remove(regexp_split_to_array(line, '\d+'),'')
    FROM aoc_day22 WHERE line ~ '\d';

  /* We always start in the top row, on the far left, facing east */
  SELECT INTO myid, mydir id,1 FROM monkeymap
    WHERE y=1 AND item='.' AND z='A' ORDER BY x ASC LIMIT 1;

  WHILE walk[j+1] IS NOT NULL LOOP
    j = j + 1;

    /* First, we walk as far as we can */
    FOR m IN 1 .. walk[j] LOOP
      /* What is in this direction? */
      SELECT INTO newdir, newflip eswn[mydir], twist[mydir] FROM monkeymap WHERE id = myid;

      IF newdir IS NULL THEN RAISE 'newdir cannot be null for id % and dir %', myid, mydir; end if;

      /* If we hit a wall, stop walking and go to the rotation */
      IF newdir = 0 THEN EXIT; END IF;

      /* Move to the new location */
      myid = newdir;

      /* Set our new direction, as it might have changed by walking off the edge */
      mydir = CASE WHEN newflip='>' THEN 1 WHEN newflip='v' THEN 2
        WHEN newflip='<' THEN 3 ELSE 4 END;

      /* Graphical output - see below
      SELECT INTO myrec * FROM monkeymap WHERE id = myid;
      PERFORM monkeydraw(myrec.z, myid, oldid, veryoldid); PERFORM pg_sleep(0.1);
      veryoldid = oldid; oldid = myid;
      */
    END LOOP;

    /* Done walking, so time to rotate left or right */
    IF spin[j] IS NULL THEN EXIT; END IF;

    IF spin[j] = 'L' THEN
      mydir = CASE WHEN mydir = 1 THEN 4 ELSE mydir-1 END;
    ELSE
      mydir = CASE WHEN mydir = 4 THEN 1 ELSE mydir+1 END;
    END IF;

  END LOOP;

  /* Finished - display the final score */
  RETURN (xy[2] * 1000) + (xy[1] * 4) + (mydir-1) FROM monkeymap WHERE id = myid;

  END
$$;


SELECT monkey_inception();
-- Runs in 130ms !!

Running it produces the correct results in only 2s, as long as we force generic plans to run:

SET plan_cache_mode = force_generic_plan;
SELECT monkey_inception();

 monkey_inception
------------------
            34426
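
The direction bookkeeping (1=east, 2=south, 3=west, 4=north) can be checked on its own. The sketch below is an alternative formulation and not what the function uses: array_position stands in for the CASE that turns a twist arrow back into a number, and modular arithmetic stands in for the two rotation CASE expressions:

```sql
SELECT d AS mydir,
       (ARRAY['>','v','<','^'])[d] AS arrow,
       -- arrow back to a number, equivalent to the CASE on newflip
       array_position(ARRAY['>','v','<','^'], (ARRAY['>','v','<','^'])[d]) AS back_to_int,
       d % 4 + 1       AS after_right_turn,
       (d + 2) % 4 + 1 AS after_left_turn
FROM generate_series(1, 4) AS d;
```

For example, facing north (4) and turning right gives 1 (east), while facing east (1) and turning left gives 4 (north), the same results as the CASE expressions.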

This was a hard one, mostly due to all the mental gymnastics of moving from 2-D to 3-D space and trying to get that represented correctly. Is this the last we will see of the monkeys? Stay tuned, we are close to the end.

AOC Day 22 - Bonus Round!

I built a paper cube, but it would also be nice to view how people move around the outside of the cube in real time. To that end, let's make some more ANSI graphics and have psql create some animated images! Our monkey_inception() function has these calls inside of it:

  SELECT INTO myrec * FROM monkeymap WHERE id = myid;
  PERFORM monkeydraw(myrec.z, myid, oldid, veryoldid); PERFORM pg_sleep(0.1);
  veryoldid = oldid; oldid = myid;

When this is uncommented, we grab the current face (myrec.z) and pass that, along with our current position, to a new function called monkeydraw. As this is meant to be us walking through a maze, followed by others, we also pass in the previous two positions, which allows us to simulate one leader and two followers moving along the outside of the cube. We sleep for 1/10 of a second, which controls how fast the animation appears.

The monkeydraw() function is detailed below. In short, it uses ANSI color codes to draw the current face of the cube, the current location as we are walking through it, and an indicator of which face is along each edge. The details are explained in the comments:

CREATE OR REPLACE FUNCTION monkeydraw(zz TEXT, myid INT, oldid INT, veryoldid INT)
RETURNS VOID language plpgsql AS $$
DECLARE
  myrec RECORD; mytext TEXT = '';

  resetcolor TEXT = E'\033[0m';

  Acolor TEXT = E'\x1b[38;5;196m'; /* red */
  Bcolor TEXT = E'\x1b[38;5;227m'; /* yellow */
  Ccolor TEXT = E'\x1b[38;5;214m'; /* orange */
  Dcolor TEXT = E'\x1b[38;5;225m'; /* pink */
  Ecolor TEXT = E'\x1b[38;5;165m'; /* purple */
  Fcolor TEXT = E'\x1b[38;5;21m';  /* blue */
  Zcolor TEXT = E'\x1b[38;5;21m';  /* blue */


  buffy    TEXT = E'\x1b[38;5;196m'; /* red */
  willow   TEXT = E'\x1b[38;5;212m'; /* lightred */
  xander   TEXT = E'\x1b[37m';       /* white */
  yellowbg TEXT = E'\x1b[48;5;227m';

  topcolor TEXT; bottomcolor TEXT; leftcolor TEXT; rightcolor TEXT;
  topname TEXT;  bottomname TEXT;  leftname TEXT;  rightname TEXT;


BEGIN

Zcolor = CASE WHEN zz='A' THEN Acolor WHEN zz='B' THEN Bcolor
              WHEN zz='C' THEN Ccolor WHEN zz='D' THEN Dcolor
              WHEN zz='E' THEN Ecolor ELSE Fcolor END;

topcolor = CASE WHEN zz IN ('A','B') THEN Fcolor
                WHEN zz IN ('C') THEN Acolor
                WHEN zz IN ('D','E') THEN Ccolor ELSE Dcolor END;

topname = CASE WHEN zz='C' THEN 'A' WHEN zz IN('D','E') THEN 'C'
               WHEN zz='F' THEN 'D' ELSE 'F' END;

bottomcolor = CASE WHEN zz IN ('A','B') THEN Ccolor
                   WHEN zz IN ('C') THEN Ecolor
                   WHEN zz IN ('D','E') THEN Fcolor ELSE Bcolor END;

bottomname = CASE WHEN zz='C' THEN 'E' WHEN zz IN('D','E') THEN 'F'
                  WHEN zz='F' THEN 'B' ELSE 'C' END;

leftcolor = CASE WHEN zz IN ('A','C','E') THEN Dcolor ELSE Acolor END;
leftname = CASE WHEN zz IN ('A','C','E') THEN 'D' ELSE 'A' END;
rightcolor = CASE WHEN zz IN ('A','C','E') THEN Bcolor ELSE Ecolor END;
rightname = CASE WHEN zz IN ('A','C','E') THEN 'B' ELSE 'E' END;

/* Draw the top border, showing the adjacent face's color and name */
mytext = format('%s%s%s%s%s%s', chr(10), topcolor, repeat(U&'\2588',25),
                                topname,repeat(U&'\2588',26), resetcolor);

/* Walk through each cell in the current face of the cube */
FOR myrec IN SELECT * FROM monkeymap WHERE z=zz ORDER BY y,x LOOP

  /* If this is the first column, draw the left border first */
  IF myrec.x=1 THEN
    mytext = mytext || format('%s%s%s%s', chr(10), leftcolor,
      CASE WHEN myrec.y=25 THEN leftname ELSE U&'\2588' END, resetcolor);
  END IF;

  /* If we are in the middle, show the name of the current face */
  IF myrec.x = 25 AND myrec.y = 25 THEN
    mytext = mytext || format('%s%s%s', yellowbg, zz, resetcolor);

  /* If we are at the current location, show a red indicator */
  ELSEIF myrec.id = myid THEN mytext = mytext
    || format('%s%s%s', buffy, U&'\2606', resetcolor);
  /* Show our followers as well */
  ELSEIF myrec.id = oldid THEN mytext = mytext
    || format('%s%s%s', willow, U&'\2606', resetcolor);
  ELSEIF myrec.id = veryoldid THEN mytext = mytext
    || format('%s%s%s', xander, U&'\2606', resetcolor);

  /* If this is an empty space, write out ...er... an empty space */
  ELSEIF  myrec.item = '.' THEN mytext = mytext || ' ';

    /* This must be a block, so write it out in the current face's color */
  ELSE mytext = mytext || format('%s%s%s', Zcolor, U&'\2588', resetcolor);
  END IF;

  /* IF this is the last column, draw the right border */
  IF myrec.x=50 THEN
    mytext = mytext || format('%s%s%s', rightcolor,
      CASE WHEN myrec.y=25 THEN rightname ELSE U&'\2588' END, resetcolor);
  END IF;

END LOOP;

/* Write the bottom border */
mytext = mytext || format('%s%s%s%s%s%s', chr(10), bottomcolor,
  repeat(U&'\2588',25),bottomname,repeat(U&'\2588',26), resetcolor);

RAISE NOTICE '% %', chr(10), mytext;

END;
$$;

I decided against only showing a small part of the face, and went with the entire 50x50 grid. It makes the graphic a lot bigger, but the results are worth it:

Fun with Postgres Looped Functions and Linear Progressions

Greg.Sabino.Mullane@crunchydata.com (Greg Sabino Mullane) — Wed, 22 Nov 2023 08:00:00 EST

Disclaimer

This article will contain spoilers both on how I solved 2022 Day 21's challenge "Monkey Math" using SQL, as well as general ideas on how to approach the problem. I recommend trying to solve it yourself first, using your favorite language.

AOC Day 21

Tech used:

The file_fdw extension to read the input
Functions such as regexp_substr
Unlogged tables

As always, we will use file_fdw to put our text input into a virtual Postgres table:

CREATE EXTENSION if not exists file_fdw;

CREATE SERVER if not exists aoc2022 foreign data wrapper file_fdw;

DROP SCHEMA if exists aoc2022_day21_monkeymath CASCADE;
CREATE SCHEMA aoc2022_day21_monkeymath;
SET search_path = aoc2022_day21_monkeymath;

CREATE FOREIGN TABLE aoc_day21 (id text, action text)
  SERVER aoc2022 options(filename '/tmp/aoc2022.day21.input',
--  SERVER aoc2022 options(filename '/tmp/aoc2022.day21.testinput',
  FORMAT 'csv', DELIMITER ':');

AOC Day 21 - Part One

The puzzle directions are odd but parseable:

Each monkey is given a job: either to yell a specific number or to yell
the result of a math operation. All of the number-yelling monkeys
know their number from the start; however, the math operation monkeys
need to wait for two other monkeys to yell a number, and those two
other monkeys might also be waiting on other monkeys.

We don't speak monkey, but the elephants we freed in the previous rounds do. This puzzle is pretty straightforward. First, let's pull apart the text strings in the puzzle, which look like this:

cgrb: gzwb * rcfd
gfbz: bwgp - qlfm
jrbf: 2
gvvg: rjch + tjdp
vwsh: grwp * ddsv
tpwb: 1

We will separate the data in each line and store one monkey per row in a new unlogged table. As each row is guaranteed to have a colon, we declared the foreign table as a csv with a delimiter of a colon, which saves us a step. But we still need to break apart the other items into specific columns. Some simple regular expression functions can help us do this:

CREATE UNLOGGED TABLE puzzle (
  id     TEXT,
  number BIGINT,
  lefty  TEXT,
  action TEXT,
  righty TEXT
);
/* Fill sparsely, as we will be updating this table a lot */
ALTER TABLE puzzle SET (autovacuum_enabled = off, fillfactor = 20);

INSERT INTO puzzle SELECT id
  ,CASE WHEN action ~ '\d'
    THEN regexp_substr(action, '(\d+)')::BIGINT ELSE null END
  ,CASE WHEN action !~ '\d'
    THEN regexp_substr(action, '\w+') ELSE null END
  ,CASE WHEN action !~ '\d'
    THEN regexp_substr(action, '[+*/-]') ELSE null END
  ,CASE WHEN action !~ '\d'
    THEN ltrim(regexp_substr(ltrim(action), ' (\w+)')) ELSE null END
FROM aoc_day21;

For each line, we check whether it contains a number. If it does, we extract the monkey name ("id") and the number it yells out. If there is no number, we extract the names of the two other monkeys involved and the mathematical symbol. Afterwards, the table looks like this (we used \pset NULL ☃ in our .psqlrc file to produce a better null indicator):

  id  | number | lefty | action | righty
------+--------+-------+--------+--------
 cgrb |      ☃ | gzwb  | *      | rcfd
 gfbz |      ☃ | bwgp  | -      | qlfm
 jrbf |      2 | ☃     | ☃      | ☃
 gvvg |      ☃ | rjch  | +      | tjdp
 vwsh |      ☃ | grwp  | *      | ddsv
 tpwb |      1 | ☃     | ☃      | ☃
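The parsing can be tried without the input file. Here is a minimal sketch that runs the same expressions against a few literal strings; the sample values, including the leading space we assume is left over after the colon, are made up for illustration, and regexp_substr needs Postgres 15 or later:

```sql
/* Hypothetical sample rows standing in for the "action" column of aoc_day21 */
SELECT action
  ,CASE WHEN action ~ '\d'  THEN regexp_substr(action, '\d+')::BIGINT END AS number
  ,CASE WHEN action !~ '\d' THEN regexp_substr(action, '\w+') END AS lefty
  ,CASE WHEN action !~ '\d' THEN regexp_substr(action, '[+*/-]') END AS op
  ,CASE WHEN action !~ '\d' THEN ltrim(regexp_substr(ltrim(action), ' \w+')) END AS righty
FROM (VALUES (' gzwb * rcfd'), (' 2'), (' bwgp - qlfm')) AS sample(action);
```

The first \w+ match is always the left monkey, while the pattern with a leading space skips past the operator, because the space before the operator is not followed by a word character.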

A function will be used to walk through monkey by monkey, apply any math that is needed, and keep running through until finally the monkey named "root" says a number, which will be our solution.

CREATE FUNCTION riddle_me_this()
  RETURNS BIGINT language plpgsql AS $$
DECLARE
  myrec RECORD; first BIGINT; second BIGINT;
BEGIN

LOOP
  /* Walk through every monkey that has not yelled a number yet. Order does not matter */
  FOR myrec IN SELECT * FROM puzzle WHERE number IS NULL LOOP
      /* Record the number yelled by the first monkey we are listening to */
      SELECT INTO first  p.number FROM puzzle p WHERE id = myrec.lefty;
      IF first IS NULL THEN continue; END IF;
      /* Record the number yelled by the second monkey */
      SELECT INTO second  p.number FROM puzzle p WHERE id = myrec.righty;
      IF second IS NULL THEN continue; END IF;

      /* At this point, we have numbers from two other monkeys, so perform an action */
      UPDATE puzzle SET number =
        CASE WHEN myrec.action = '-' THEN first - second
             WHEN myrec.action = '+' THEN first + second
             WHEN myrec.action = '*' THEN first * second
             WHEN myrec.action = '/' THEN first / second
        END
      WHERE id = myrec.id;

      IF myrec.id = 'root' THEN
        RETURN number FROM puzzle WHERE id = myrec.id;
      END IF;
    END LOOP;
  END LOOP;
END
$$;

We are just about ready to run the function. As we are doing a lot of lookups based on the "id" column, we want to create an index for it:

CREATE INDEX monkey_id ON puzzle(id);

Finally, we analyze the table, turn on timing, and run the function to get the correct answer:

ANALYZE puzzle; /* This helps a lot! */
\timing on
SELECT riddle_me_this();  /* Took 630ms on my system for a 1619 line input file */

 riddle_me_this
-----------------
 158731561459602

AOC Day 21 - Part Two

For the second part of the puzzle, we need to figure out what number to feed into the "humn" row such that the "root" row will eventually have the same left and right values. To achieve this, we'll loop through a few times. Based on the previous days' puzzles, a brute-force search would require a LOT of rounds, so instead we run once with a guess of one, run again with a much larger guess of one billion (the variable holding it in the function below is named trillion, but its value is 1_000_000_000), and then compute how far the "root" values moved between the two runs. All these monkeys are forming a simple linear progression, so we can quickly narrow in at that point until we get matching "root" numbers and have our answer.

CREATE FUNCTION i_am_humn()
  RETURNS BIGINT language plpgsql AS $$
DECLARE
  round INT = 0; myrec RECORD;
  first BIGINT; oldfirst BIGINT; second BIGINT;
  human_value BIGINT; changerate FLOAT;
  trillion BIGINT = 1_000_000_000;
BEGIN
<<outer>> LOOP
  round = round + 1;
  IF round = 1 THEN
    human_value = 1;
  ELSIF round = 2 THEN
    human_value = trillion;
  END IF;

  RAISE INFO '-> Round %: starting with human_val of %', round,
    to_char(human_value, 'FM999G999G999G999G999');

  /* reset to initial state */
  TRUNCATE TABLE puzzle;
  INSERT INTO puzzle SELECT id
    ,CASE WHEN action ~ '\d'  THEN
      regexp_substr(action, '(\d+)')::BIGINT ELSE null END
    ,CASE WHEN action !~ '\d' THEN
      regexp_substr(action, '\w+') ELSE null END
    ,CASE WHEN action !~ '\d' THEN
      regexp_substr(action, '[+*/-]') ELSE null END
    ,CASE WHEN action !~ '\d' THEN
      ltrim(regexp_substr(ltrim(action), ' (\w+)')) ELSE null END
  FROM aoc_day21;

  <<inner>> LOOP

    FOR myrec IN SELECT * FROM puzzle WHERE number IS  NULL LOOP
      SELECT INTO first   p.number FROM puzzle p WHERE id = myrec.lefty;
      IF first IS NULL THEN continue; END IF;
      SELECT INTO second  p.number FROM puzzle p WHERE id = myrec.righty;
      IF second IS NULL THEN continue; END IF;

      /* Discard the original values for "humn" as the goal is for us to provide them */
      IF myrec.lefty  = 'humn' THEN first = human_value; END IF;
      IF myrec.righty = 'humn' THEN second = human_value; END IF;

      UPDATE puzzle SET number =
        CASE WHEN myrec.action = '-' THEN first - second
             WHEN myrec.action = '+' THEN first + second
             WHEN myrec.action = '*' THEN first * second
             WHEN myrec.action = '/' THEN first / second
        END
      WHERE id = myrec.id;

      /* If this is monkey "root" AND the values are the same, we have finished */
      IF myrec.id = 'root' THEN
        IF first = second THEN RETURN human_value; END IF;
        EXIT inner;
      END IF;
    END LOOP;
  END LOOP;

  /* If this is our second run, see how far the jump between our two guesses has pushed us */
  IF round = 2 THEN
    changerate = (first-oldfirst) / trillion::float;
  END IF;

  /* Once we know how fast we change based on the input, we can refine our guess */
  IF round >= 2 THEN
    IF first-second < 0 THEN
      human_value = floor(human_value - abs((first-second)/changerate));
    ELSE
      human_value = human_value + abs((first-second)/changerate);
    END IF;
  END IF;

  oldfirst = first;

END LOOP;
END
$$;
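The arithmetic behind the guessing is easier to see in isolation. This is a minimal sketch, not the function itself: it assumes a made-up linear relationship (root's left side is 7 * humn + 40, root's right side is a constant) in place of the real monkey tree, and it divides by the exact gap between the two guesses, whereas the function above divides by the second guess itself, which is nearly the same thing for such a large jump:

```sql
/* Hypothetical stand-in for the monkey tree: left = 7 * humn + 40,
   right = 26387716 (which is 7 * 3769668 + 40) */
WITH guesses AS (
  SELECT 1::numeric AS h1, 1000000000::numeric AS h2, 26387716::numeric AS target
), observed AS (
  SELECT h1, h2, target
    ,7 * h1 + 40 AS left1   /* root's left value for the first guess */
    ,7 * h2 + 40 AS left2   /* root's left value for the second guess */
  FROM guesses
), rate AS (
  /* How much the left value moves per unit of humn */
  SELECT *, (left2 - left1) / (h2 - h1) AS changerate FROM observed
)
SELECT changerate
  ,h2 - (left2 - target) / changerate AS next_guess
FROM rate;
```

With an exactly linear relationship the next guess lands right on the answer (3769668 in this made-up case). The real puzzle uses integer division along the way, which is presumably why the function needs one more round to settle.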

There are two things in the function above to make things easier for us humans. First, the use of to_char(human_value, 'FM999G999G999G999G999') rather than just human_value ensures that a bigint like 3769668748355 gets output as 3,769,668,748,355. Second, one of my favorite features of Postgres 16 is the ability to write long numbers in a friendly manner. That's why instead of the confusing BIGINT = 1000000000 we can now simply write BIGINT = 1_000_000_000. Those of you copying and pasting this into an earlier-than-16 version will see: ERROR: trailing junk after numeric literal at or near "1_". Let's run the function:

SET client_min_messages = 'INFO';
SELECT i_am_humn();

INFO:  -> Round 1: starting with human_val of 1
INFO:  -> Round 2: starting with human_val of 1,000,000,000
INFO:  -> Round 3: starting with human_val of 3,769,668,748,355
INFO:  -> Round 4: starting with human_val of 3,769,668,716,709
   i_am_humn
---------------
 3769668716709
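The number formatting used in those INFO lines is easy to try on its own. A small sketch, assuming nothing beyond the to_char call already used in the function:

```sql
/* G inserts the group separator for the current lc_numeric setting;
   FM suppresses the leading padding */
SELECT to_char(3769668748355, 'FM999G999G999G999G999') AS grouped;
```

The separator comes from the lc_numeric setting, so a server configured for another locale may print periods or spaces where the output above shows commas.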

This produced the answer in 2.7 seconds, which I am going to call a win. Hopefully this is the last we see of the monkeys this year!

Fun with Postgres Floats, Positioning, and Sequencing

Greg.Sabino.Mullane@crunchydata.com (Greg Sabino Mullane) — Fri, 10 Nov 2023 08:00:00 EST

Disclaimer

This article will contain spoilers both on how I solved 2022 Day 20's challenge "Grove Positioning System" using SQL, as well as general ideas on how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. Will I get these all posted before next year's AOC starts? Consider it a bonus challenge! :)

AOC Day 20

Tech used:

CTEs (Common Table Expressions)
Using a non-integer type to help simulate a linked list
The ever useful file_fdw extension
Sequences
The built-in mod function (https://www.postgresql.org/docs/current/functions-math.html) and a custom implementation!
Using CALL to implement our stored procedures

As with the other days, there is some general setup to get a FDW to read the file.

CREATE EXTENSION if not exists file_fdw;

CREATE SERVER if not exists aoc2022 foreign data wrapper file_fdw;

DROP SCHEMA if exists aoc2022_day20_decrypt CASCADE;
CREATE SCHEMA aoc2022_day20_decrypt;
SET search_path = aoc2022_day20_decrypt;

/* I found this commented line was the easiest way to toggle test/real data: */

CREATE FOREIGN TABLE aoc_day20 (val bigint)
  SERVER aoc2022 options(filename '/tmp/aoc2022.day20.input');
--  SERVER aoc2022 options(filename '/tmp/aoc2022.day20.testinput');

For this challenge, we have an encrypted file that we need to decrypt with some very particular rules:

The encrypted file is a list of numbers. To mix the file, move each number
forward or backward in the file a number of positions equal to the value
of the number being moved. The list is circular, so moving a number off one
end of the list wraps back around to the other end as if the ends were connected.

To start, let's create a table to hold our information. We need to keep track of where each number starts, and where it will end up. Because the virtual table created by file_fdw is strictly read-only, we will keep that as the "initial position" table and create a new one to track changes. To speed things up, we should use an unlogged table, and to prevent autovacuum from firing, we disable it for this table. We will use a sequence to insert the items in the order they first appear:

CREATE SEQUENCE aoc;

CREATE UNLOGGED TABLE puzzle (
  val BIGINT,
  slot FLOAT /* Why a float? Keep reading... */
) WITH (autovacuum_enabled = off);

INSERT INTO puzzle SELECT val, nextval('aoc') FROM aoc_day20;
CREATE UNIQUE INDEX puzzle_slot ON puzzle(slot);

Next, we need a procedure to do the actual decryption via "mixing" according to the rules of the contest. We will walk through all the numbers in order, and shift them to their new location based on their value. The number needs to get moved somewhere between two other numbers. In many languages, a linked list is the obvious solution. Since we are doing this in SQL, we will instead set the position of the number to something between the two other numbers' positions. Hence the use of the data type float above, which allows us to subdivide numbers. Note: float is actually a synonym for double precision, but quicker to type.

In other words, if we need to move something between 18 and 20, we can assign it a slot of 19. If we need to stick it between 18 and 19, we assign it a slot of 18.5. If we need something between 18 and 18.5, we assign it a slot of 18.25, and so on. This trick allows us to still keep things in order, while maintaining a very large pool of potential values. Here is the complete procedure:

CREATE or replace PROCEDURE mixit()
  language plpgsql AS $$
DECLARE
  maxslots SMALLINT; slotcount SMALLINT; myrec RECORD;
  y FLOAT; z FLOAT; halfval FLOAT;
BEGIN

  /* We need to know when to "roll over" to the other side */
  SELECT INTO maxslots count(*)-1 FROM puzzle;

  /* Always set this back to 1 to be safe, so we can re-run this function at will */
  PERFORM setval('aoc', 1, false);

  /* This is using file_fdw, so unlike a regular table, we don't need to worry about an ORDER BY! */
  FOR myrec IN SELECT val, nextval('aoc') AS slot FROM aoc_day20 LOOP

    /* A value of 0 moves no spaces, so we simply ignore it */
    IF myrec.val = 0 THEN CONTINUE; END IF;

    /*
        Postgres does truncated division for mod, which is not the approach we need here,
        so we do it ourselves for negative numbers!
    */
    myrec.val = CASE WHEN myrec.val < 0
      THEN myrec.val - (maxslots  * floor( myrec.val::float / maxslots))
      ELSE mod(myrec.val, maxslots)
    END;

    /* Find the slot that is X more than our current position */
    SELECT INTO y slot FROM puzzle WHERE slot >= myrec.slot
      ORDER BY slot LIMIT 1 OFFSET myrec.val;
    IF y IS NULL THEN
      /* No slot found, so we fell off the right end. Circle to the front */
      SELECT INTO slotcount count(*) FROM puzzle WHERE slot > myrec.slot;
      myrec.val = myrec.val - slotcount;
      /* Grab our new left and right boundaries */
      SELECT INTO y slot FROM puzzle WHERE slot <> myrec.slot
        ORDER BY slot ASC LIMIT 1 OFFSET (myrec.val)-1;
      SELECT INTO z slot FROM puzzle WHERE slot <> myrec.slot
        ORDER BY slot ASC LIMIT 1 OFFSET (myrec.val);
    ELSE
      /* We found a left boundary, can we find a matching right one? */
      SELECT INTO z slot FROM puzzle WHERE slot >= myrec.slot
        ORDER BY slot LIMIT 1 OFFSET myrec.val+1;
      IF z IS NULL THEN
        /* We ran off the right edge - simply move this to the end */
        SELECT INTO slotcount max(slot) from puzzle;
        UPDATE puzzle SET slot = slotcount+1 WHERE slot = myrec.slot;
        CONTINUE;
      END IF;
    END IF;

    /*
       Create a value that is halfway between our left and right slots.
       Because this is a float, we can subdivide the gap many times over
       and still get a unique number.
       Technically, it just needs to be between, not halfway...
    */
    SELECT INTO halfval (z+y)/2;

    /* Finally, we can set the value to the new position in the read-write table: */
    UPDATE puzzle SET slot=halfval WHERE slot = myrec.slot;

  END LOOP;

END
$$;
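The special handling of negative numbers in the procedure is worth a quick look on its own. As a small sketch with a made-up list size of 3, compare the built-in mod with the floor-based expression used above:

```sql
/* builtin_mod keeps the sign of val; floored_mod always lands in 0 .. 2 */
SELECT v AS val
  ,mod(v, 3) AS builtin_mod
  ,v - (3 * floor(v::float / 3)) AS floored_mod
FROM (VALUES (-7), (-1), (7)) AS sample(v);
```

For -7 the built-in gives -1 while the floored version gives 2. That is the behavior a circular list needs: moving one step backward is the same as moving forward by one less than the number of other items.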

Because there is no data to return, we made this a procedure instead of a function. The way to run a procedure is:

CALL mixit();
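To see the float trick from the procedure in isolation, here is a minimal sketch using three made-up rows. The first query moves a row between two neighbors by giving it their midpoint. The second counts how often the gap just above 18 can be halved before a double precision midpoint stops being distinct, since the pool of values is very large but not endless:

```sql
/* Three hypothetical rows in slot order 18, 19, 20; move val 30 between the first two */
WITH slots(val, slot) AS (
  VALUES (10, 18::float), (20, 19::float), (30, 20::float)
)
SELECT val
  ,CASE WHEN val = 30 THEN (18::float + 19::float) / 2 ELSE slot END AS new_slot
FROM slots
ORDER BY new_slot;  /* the order is now 10, 30, 20 */

/* Keep halving the gap above 18 until the midpoint collides with a neighbor */
WITH RECURSIVE halving(step, lo, hi) AS (
  SELECT 0, 18::float, 19::float
  UNION ALL
  SELECT step + 1, lo, (lo + hi) / 2
  FROM halving
  WHERE (lo + hi) / 2 > lo AND (lo + hi) / 2 < hi
)
SELECT max(step) AS usable_halvings FROM halving;
```

A few dozen halvings of the same gap are available, which is plenty here because the moves are spread across the whole list rather than piling into one spot.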

After the decryption of the file, there is still one more step to generate the answer. The rules say:

the grove coordinates can be found by looking at the
1000th, 2000th, and 3000th numbers after the value 0,
wrapping around the list as necessary.

While we could solve this programmatically, the lazy way is to make a table that has at least 3000 rows after the last possible row in the original table. We'll create a quick scratch table for that by doubling the original table (which has 5000 rows), making sure we maintain the order:

CREATE TABLE bigpuzzle AS SELECT * FROM puzzle ORDER BY slot;
INSERT INTO  bigpuzzle    SELECT * FROM puzzle ORDER BY slot;

Finally, we can solve it with a quick CTE. At the top level, we find the row that has a value of 0, and store the CTID for that row. Then, we make three more sections to grab the values at x+1000, x+2000, and x+3000. Once we have all those, we sum them together to get our final answer:

WITH x AS (SELECT ctid AS c FROM bigpuzzle WHERE val=0 ORDER BY ctid ASC LIMIT 1)
,y1 AS (SELECT val FROM bigpuzzle WHERE ctid >= (SELECT c FROM x)
  ORDER BY ctid LIMIT 1 OFFSET 1000)
,y2 AS (SELECT val FROM bigpuzzle WHERE ctid >= (SELECT c FROM x)
  ORDER BY ctid LIMIT 1 OFFSET 2000)
,y3 AS (SELECT val FROM bigpuzzle WHERE ctid >= (SELECT c FROM x)
  ORDER BY ctid LIMIT 1 OFFSET 3000)
SELECT y1.val+y2.val+y3.val AS aoc2022_day20_part1 FROM y1,y2,y3;

 aoc2022_day20_part1
---------------------
               19070
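For anyone curious about the programmatic route mentioned earlier, here is one possible sketch. It numbers the rows by slot and wraps around with the % operator instead of doubling the table. The seven sample rows and the sample_puzzle name are made up so that the query runs on its own; against the real data the first CTE would be replaced by the puzzle table:

```sql
/* Hypothetical already-mixed list, in slot order */
WITH sample_puzzle(val, slot) AS (
  VALUES (1, 1::float), (2, 2), (-3, 3), (4, 4), (0, 5), (3, 6), (-2, 7)
), ordered AS (
  /* Zero-based position of every row, plus the total row count */
  SELECT val
    ,row_number() OVER (ORDER BY slot) - 1 AS pos
    ,count(*) OVER () AS total
  FROM sample_puzzle
), zero AS (
  SELECT pos FROM ordered WHERE val = 0
)
SELECT sum(o.val) AS grove_sum
FROM zero z
CROSS JOIN (VALUES (1000), (2000), (3000)) AS k(step)
JOIN ordered o ON o.pos = (z.pos + k.step) % o.total;
```

For this sample the three lookups land on 4, -3, and 2, for a sum of 3. Joining against the three steps, rather than using IN, keeps the sum correct even if two steps wrap around to the same row.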

AOC Day 20 - Part Two

Part Two adds two new rules: a base number multiplier, and a process multiplier:

First, you need to apply the decryption key, 811589153.
Multiply each number by the decryption key before you begin;
this will produce the actual list of numbers to mix.

Second, you need to mix the list of numbers ten times.
The order in which the numbers are mixed does not change
during mixing; the numbers are still moved in the order
they appeared in the original, pre-mixed list

We can re-use our original puzzle table for this. The first step is to wipe it clean and repopulate, but multiply each number by that "decryption key" as it goes in:

SET aoc.decryption_key = 811589153;
TRUNCATE TABLE puzzle;
ALTER TABLE puzzle ADD COLUMN id SMALLSERIAL;
CREATE INDEX puzzle_id ON puzzle(id);
SELECT setval('aoc',1,false);
INSERT INTO puzzle(val,slot)
  SELECT val * current_setting('aoc.decryption_key')::int, nextval('aoc') FROM aoc_day20;
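The SET and current_setting pair is a handy way to carry a constant around a session. A small sketch of how it behaves, using the same aoc.decryption_key name as above:

```sql
SET aoc.decryption_key = 811589153;

/* The setting always comes back as text, so cast it before doing math */
SELECT current_setting('aoc.decryption_key') AS raw_value
  ,pg_typeof(current_setting('aoc.decryption_key')) AS raw_type
  ,3 * current_setting('aoc.decryption_key')::bigint AS tripled;
```

The key itself fits in an int, but even tripling it (2434767459) does not, so the products need to be bigint. In the INSERT above that happens automatically because val is already a bigint.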

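Next we need a new procedure. This one is almost identical to the previous one, with two changes: any values coming from our file_fdw table aoc_day20 get multiplied by the decryption key, and each row is now found through the new id column rather than by its slot, because after the first mix the slots no longer match the original positions: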

CREATE or replace PROCEDURE sir_mix_a_slot()
  language plpgsql AS $$
DECLARE
  maxslots SMALLINT; slotcount SMALLINT; myrec RECORD;
  y FLOAT; z FLOAT; halfval FLOAT; currslot FLOAT;
BEGIN

  SELECT INTO maxslots count(*)-1 FROM puzzle;
  PERFORM setval('aoc',1,false);

  FOR myrec IN SELECT val, nextval('aoc') AS id FROM aoc_day20 LOOP

    IF myrec.val = 0 THEN CONTINUE; END IF;

    /* Make our numbers much much bigger, because they said so */
    myrec.val = myrec.val * current_setting('aoc.decryption_key')::bigint;

    myrec.val = CASE WHEN myrec.val < 0
      /* Special case as Postgres does a weird mod(-X,Y) */
      THEN myrec.val - (maxslots  * floor( myrec.val::float / maxslots))
      ELSE mod(myrec.val, maxslots)
    END;

    /* Find the slot value of the one we are adjusting */
    SELECT INTO currslot slot FROM puzzle WHERE id = myrec.id;

    /* Find the slot that is X more than our current position */
    SELECT INTO y slot FROM puzzle WHERE slot >= currslot
      ORDER BY slot LIMIT 1 OFFSET myrec.val;

    IF y IS NULL THEN
      /* No slot found, so we fell off the right end. Circle to the front */
      SELECT INTO slotcount count(*) FROM puzzle WHERE slot > currslot;
      myrec.val = myrec.val - slotcount;
      /* Grab our new left and right boundaries */
      SELECT INTO y slot FROM puzzle WHERE slot <> currslot
        ORDER BY slot ASC LIMIT 1 OFFSET (myrec.val)-1;
      SELECT INTO z slot FROM puzzle WHERE slot <> currslot
        ORDER BY slot ASC LIMIT 1 OFFSET (myrec.val);
    ELSE
      /* We found a left boundary, can we find a matching right one? */
      SELECT INTO z slot FROM puzzle WHERE slot >= currslot
        ORDER BY slot LIMIT 1 OFFSET myrec.val+1;
      IF z IS NULL THEN
        /* We ran off the right edge - simply move this to the end */
        SELECT INTO slotcount max(slot) FROM puzzle;
        UPDATE puzzle SET slot = slotcount+1 WHERE id = myrec.id;
        CONTINUE;
      END IF;
    END IF;

    SELECT INTO halfval (z+y)/2;
    UPDATE puzzle SET slot=halfval WHERE id = myrec.id;

  END LOOP;

END
$$;

Let's call it ten times in a row. Again, sometimes lazy is best. No loops, just literally run it ten times in a row. We also vacuum between each run, as we are doing a lot of updating:

vacuum puzzle; CALL sir_mix_a_slot(); /* One */
vacuum puzzle; CALL sir_mix_a_slot();
vacuum puzzle; CALL sir_mix_a_slot();
vacuum puzzle; CALL sir_mix_a_slot();
vacuum puzzle; CALL sir_mix_a_slot(); /* Five */
vacuum puzzle; CALL sir_mix_a_slot();
vacuum puzzle; CALL sir_mix_a_slot();
vacuum puzzle; CALL sir_mix_a_slot();
vacuum puzzle; CALL sir_mix_a_slot();
vacuum puzzle; CALL sir_mix_a_slot(); /* Ten! */

As before, we'll create a giant table we can OFFSET into without worrying about wrapping:

DROP TABLE IF EXISTS bigpuzzle;
CREATE TABLE bigpuzzle AS SELECT * FROM puzzle ORDER BY slot;
INSERT INTO  bigpuzzle    SELECT * FROM puzzle ORDER BY slot;

Then we can use our exact same CTE from above to get the solution:

WITH x AS (SELECT ctid AS c FROM bigpuzzle WHERE val=0 ORDER BY ctid ASC LIMIT 1)
,y1 AS (SELECT val FROM bigpuzzle WHERE ctid >= (SELECT c FROM x)
  ORDER BY ctid LIMIT 1 OFFSET 1000)
,y2 AS (SELECT val FROM bigpuzzle WHERE ctid >= (SELECT c FROM x)
  ORDER BY ctid LIMIT 1 OFFSET 2000)
,y3 AS (SELECT val FROM bigpuzzle WHERE ctid >= (SELECT c FROM x)
  ORDER BY ctid LIMIT 1 OFFSET 3000)
SELECT y1.val+y2.val+y3.val AS aoc2022_day20_part2 FROM y1,y2,y3;

 aoc2022_day20_part2
---------------------
      14773357352059

This was one of the easier days, and as such, there is no real ASCII animation to provide this time. Only five more days to go (and yes, they get a lot harder!)