At the time of writing, my laptop is running a migration script. It’s evening and I’m quite tired, because I spent the whole day trying to make the script work again, almost one year after its last commit. Yes, one year.
We had decided to migrate this client’s site from our own homegrown CMS, built with Rails 2.3 and MongoDB, to WordPress. The migration took almost two weeks to complete. Then, for various reasons, the final switch didn’t happen, and our client kept using the old CMS for another year. We kept making small changes to the CMS, small incompatibilities with the migration script crept in, and here I am now, trying to adjust this bunch of classes to make the old database fit the WordPress back office.
If I’m lucky, this will be the last attempt. It’s running and I’m tailing the logs. It’s a humbling experience; something just doesn’t feel right today. I guess I might as well write down some lessons learned along the way.
1 – Give the right value to time
This database is pretty big: thousands of articles and attached images. At the time, I didn’t even think of writing a multithreaded script to speed up the writes. Speed just wasn’t a concern, and the migration was already complex enough without adding a multithreaded approach on top. Now, reading my code back, I feel I could have run several things in parallel with little effort and a big gain in execution time, especially considering point #2 below.
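In hindsight, parallelizing the writes needn’t have been hard. Here is a minimal sketch using only Ruby’s standard library: a fixed pool of threads drains a `Queue` of record ids. `migrate_in_parallel` is a hypothetical helper, not code from my script. The block passed to it stands in for "read one article from MongoDB, write it to WordPress". Those writes mostly wait on I/O, so MRI threads can overlap them despite the global VM lock.

```ruby
# Minimal worker pool, standard library only. `migrate_in_parallel` is a
# hypothetical helper; the block stands in for "read one article, write it
# to WordPress".
def migrate_in_parallel(ids, workers: 4, &migrate_article)
  queue = Queue.new
  ids.each { |id| queue << id }
  queue.close # once closed and empty, pop returns nil and each worker exits its loop

  results = Queue.new # thread-safe collection for the workers' return values
  threads = Array.new(workers) do
    Thread.new do
      while (id = queue.pop)
        results << migrate_article.call(id)
      end
    end
  end
  threads.each(&:join) # re-raises here any exception that killed a worker

  Array.new(results.size) { results.pop }
end

migrated = migrate_in_parallel(1..10, workers: 3) { |id| "article-#{id}" }
puts "#{migrated.size} articles migrated"
```

Results come back in completion order, not input order. An exception raised in a worker resurfaces at `join`, so a broken record still stops the run loudly.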
2 – Try to apply incremental changes
I can’t remember whether I considered it at the time, but I’m almost sure I didn’t. This migration script is not incremental: it expects to find an empty destination database every time you run it. Combined with point #1, this means that if something goes wrong at some point, you have to fix it and start over from scratch, and you’ve lost a lot of time.
Planning for incremental changes clearly adds complexity to the whole, but it can save you a lot of time, so you have to find the balance between complexity and speed. Incremental changes can also allow for a smoother switch between the two systems, for example by re-running the migration to pick up content added to the old CMS in the meantime.
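Here is a minimal sketch of what "incremental" could mean in practice. It assumes the migration can tell which source documents it has already handled. An in-memory `Set` plays that role here, but in a real migration it would have to be persistent, for example a hypothetical post meta entry holding the old MongoDB id. `IncrementalMigration` is an illustrative name, not part of my script.

```ruby
require "set"

# Hypothetical incremental migration. `done` stands in for a persistent record
# of already-migrated source ids; here it lives in memory for illustration only.
class IncrementalMigration
  attr_reader :done

  def initialize(done = Set.new)
    @done = done
  end

  # Yields every source id not yet recorded as done. If the block raises,
  # ids migrated before the failure stay in `done`, so a rerun resumes there.
  def run(source_ids)
    source_ids.each do |id|
      next if done.include?(id)
      yield id
      done << id
    end
  end
end

migration = IncrementalMigration.new
attempts = []

# First run: record 3 is broken, so only 1 and 2 get migrated.
begin
  migration.run(1..5) do |id|
    raise "broken record #{id}" if id == 3
    attempts << id
  end
rescue RuntimeError => e
  puts "stopped: #{e.message}"
end

# After the fix, a second run skips 1 and 2 and resumes at 3.
migration.run(1..5) { |id| attempts << id }
p attempts # => [1, 2, 3, 4, 5]
```

The second run does not need an empty destination database. It picks up where the first one stopped, and it would equally pick up records added to the old CMS since the last run.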
3 – It’s not different from any other project
I don’t know why, but for this migration I didn’t write a single test. I use tests in my daily job building web apps: tests first, red-green-refactor. I’m not a TDD fundamentalist, so I’m also fine with writing tests afterwards, for instance after a spike to try out viable solutions. In this case, though, I didn’t leave behind a single executable test. That’s weird, because there are several post types I could have mocked up very easily to make sure everything ended up in the right field of the destination database.
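For instance, here is a sketch of the kind of test I could have written with Minitest, which is bundled with Ruby. The mapping function `to_wp_post` and the source field names (`title`, `body`, `kind`, `published_at`) are hypothetical. The destination keys mirror columns of WordPress’s `wp_posts` table.

```ruby
require "minitest/autorun"

# Hypothetical mapping from old CMS post kinds to WordPress post types.
POST_TYPES = { "article" => "post", "page" => "page" }.freeze

# Maps one old CMS document (a plain Hash here) to wp_posts-style fields.
# Source field names are assumptions for illustration, not the real schema.
def to_wp_post(doc)
  {
    post_title:   doc.fetch("title"),
    post_content: doc.fetch("body", ""),
    post_type:    POST_TYPES.fetch(doc.fetch("kind")), # unknown kinds raise KeyError
    post_status:  doc["published_at"] ? "publish" : "draft",
    post_date:    doc["published_at"]
  }
end

class ToWpPostTest < Minitest::Test
  def test_published_article_becomes_published_post
    wp = to_wp_post("title" => "Hello", "body" => "Text", "kind" => "article",
                    "published_at" => "2013-05-01 10:00:00")
    assert_equal "post", wp[:post_type]
    assert_equal "publish", wp[:post_status]
    assert_equal "Text", wp[:post_content]
  end

  def test_unpublished_page_becomes_draft
    wp = to_wp_post("title" => "About", "kind" => "page")
    assert_equal "page", wp[:post_type]
    assert_equal "draft", wp[:post_status]
    assert_equal "", wp[:post_content]
  end

  def test_unknown_kind_fails_loudly
    assert_raises(KeyError) { to_wp_post("title" => "X", "kind" => "gallery") }
  end
end
```

Using `fetch` for the post type makes an unmapped kind fail loudly instead of silently landing in the wrong place. That is exactly the kind of drift that crept in during the year the switch was postponed.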
I guess I didn’t think tests were worth writing because I saw the script as something disposable, a one-shot thing: run it, migrate the site and never use it again. But code tends to stick around for an incredible amount of time. Deals shift, sometimes by years, as in this case, and it’s not unusual to have to get your hands into code someone else wrote years before. So be wise and keep everything clean. There’s no such thing as disposable code. The only disposable code is what you type at the command line, and even that you often reuse (Ctrl-R, anyone?).
4 – It’s hard to stay focused with a long running script
I already knew this from years of practicing TDD: if tests are fast, you work fast; if tests are slow, you work slowly, and the relationship is non-linear. The same goes for a migration: the faster it runs, the faster you can fix bugs and deploy changes. In that respect, it’s no different from any other project I’ve worked on. Naturally, you should favor correctness over speed, but… here comes lesson number 5.
5 – Embrace concurrency
Always remember that you have many cores and that databases are already optimized for concurrent writes, so try to leverage both as much as possible. Ruby doesn’t make it easy to think in terms of concurrency by default, but luckily several projects help with that and make switching to a multithreaded approach easy. If you are familiar with Sidekiq, you know what I’m thinking about.
Sidekiq is widely adopted to move slow work out of the HTTP request cycle of web apps, but it fits many other scenarios too. Since it processes jobs concurrently in multiple threads, a large database migration is one of the scenarios where it would fit very well. If I were to rewrite my script today, I would definitely use it.
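Sidekiq needs Redis and its own process, so it can’t fit in a self-contained snippet. As a stand-in, here is an in-process imitation of the shape it encourages. Each record gets one small job that carries only its id, a pool of threads processes the jobs, and a failed job is retried a few times before being given up on. This is not Sidekiq’s API. `JobRunner` and the fixed retry count are assumptions for illustration. Real Sidekiq persists jobs in Redis and retries failures with an increasing delay.

```ruby
# In-process imitation of a Sidekiq-style job queue, standard library only.
# This is NOT Sidekiq's API; JobRunner is a hypothetical name.
class JobRunner
  def initialize(workers: 4, max_attempts: 3)
    @workers = workers
    @max_attempts = max_attempts
    @queue = Queue.new
    @dead = Queue.new # ids whose job failed on every attempt
  end

  # One small job per record: only the id goes through the queue.
  def enqueue(id)
    @queue << id
  end

  # Drains the queue with a pool of threads, calling `perform` once per id and
  # retrying it up to max_attempts times in total. Single-use: the queue is
  # closed here. Returns the ids that never succeeded, sorted.
  def run(&perform)
    @queue.close
    threads = Array.new(@workers) do
      Thread.new do
        while (id = @queue.pop)
          attempts = 0
          begin
            attempts += 1
            perform.call(id)
          rescue StandardError
            retry if attempts < @max_attempts
            @dead << id
          end
        end
      end
    end
    threads.each(&:join)
    Array.new(@dead.size) { @dead.pop }.sort
  end
end

failures = Hash.new(0)
lock = Mutex.new
runner = JobRunner.new(workers: 3, max_attempts: 3)
(1..10).each { |id| runner.enqueue(id) }

dead = runner.run do |id|
  count = lock.synchronize { failures[id] += 1 } # attempts seen for this id
  # Hypothetical flaky writes: article 4 fails once, article 9 always fails.
  raise "timeout" if id == 4 && count == 1
  raise "bad data" if id == 9
end
p dead # => [9]
```

A failed record ends up in a list instead of aborting the whole run. This also plays well with point #2: you fix the bug and re-run just those ids.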
The point is that you should think in threads by default. It’s a mind shift you can’t postpone any longer. If you’ve already made it, good for you, honestly. I’m starting today.