justin appears

Logo

Some technical writing by Justin Giancola. I work on the platform team at Precision Nutrition.

27 February 2021

Analyzing Anki data with Postgres

Analyzing Anki data with Postgres

Anki, the popular open source SRS system, stores all of your review information in an sqlite database. sqlite is a simple, ubiquitous RDBMS. It is embedded in a staggering number of applications and can be found many other places from your modem to your DVR.

If you want to take a look at your Anki data, you could use the simple sqlite client that you very, very likely have installed somewhere on your computer. However, the client isn’t very good, and sqlite itself does not support a large number of features found in the SQL standard. For instance, if you want to use any of the statistical tools you will be out of luck.

I had some questions I wanted to answer based on my own review data so I imported my Anki database into Postgres to do the analysis. However, this proved to be a little trickier than I had anticipated because Anki’s database structure is somewhat broken. I thought I’d write up the approach I ended up using in case anyone finds it useful.

PGLoader

PGLoader is an incredible beast of a library written by Postgres contributor Dimitri Fontaine. I’ve never used a tool for moving data between databases that comes anywhere close to it in terms of ease of use, accuracy, performance, and features.

Normally, importing an sqlite database into Postgres is as simple as:

$ pgloader source_db.sqlite postgresql://username@host:port/target_db

However, because the Anki database is somewhat malformed, you have to define a command file for PGLoader to give it more specific instructions.

This is what I used:

load database
    from source_db.sqlite
    into pgsql://username@host:port/target_db

with include drop, create tables, create indexes, reset sequences

cast column notes.sfld to text drop typemod

set work_mem to '16MB', maintenance_work_mem to '512 MB';

So now you can successfully1 import via:

$ pgloader anki.load

This is more or less what PGLoader does by default if you don’t use a command file. The only difference is:

cast column notes.sfld to text drop typemod

The issue here is that while notes.sfld is defined as an integer field in the Anki sqlite database, the content of this column is actually text. Interestingly, sqlite does not care if you insert text into an integer column. Also interesting, there is no way (so far as I can tell) to change the type of a column in sqlite. So to work around this, we simply tell PGLoader to cast this field as text.

What about analyzing Anki data with Postgres?

My original intention was to talk about the structure of the Anki database and share some interesting things I’ve found in my review data. However, I thought it was worth sharing how I did the import because this step proved far more difficult than it should have due to the aforementioned quirks of sqlite. Hopefully this saves someone a yak shave and they can instead spend that hour analyzing their data.

I’ll write about the Anki database structure and some of my analysis in a future post.

  1. Well, there are still a couple of issues aside from the one that the above command file fixes. The tags table doesn’t import due to an issue with unicase collation, and the sqlite_stat table doesn’t import for reasons I haven’t bothered to investigate. However, I’ve not needed data from either of these tables thus far. 

tags: postgres - databases - anki