Solr Rocks!

After struggling with large FULLTEXT indexes in MySQL, I found Solr came to the rescue: 16 million records ingested in 20 minutes – wow!

One small gotcha was the security classes, which have evidently moved since the documentation was written (see the fix at the end of the post).

For web apps I live off MySQL, albeit nowadays often wrapped with my own NoSQLite libraries to do Mongo-style databases over the LAMP stack. I’d also recently had a successful experience using MySQL FULLTEXT indices with a smaller database (tens of thousands of records) for the HCI Book search. So when I wanted to index 16 million book titles with their author names from OpenLibrary, I thought I might as well have a go.

For some MySQL table types, the normal recommendation used to be to insert records without an index and add the index later.  However, in the past I have had a very bad experience with this approach as there doesn’t appear to be a way to tell MySQL to go easy with this process – I recall the disk being absolutely thrashed and Fiona having to restart the web server 🙁

Happily, Ernie Souhrada reports that for MyISAM tables, incremental inserts with an index are no worse than a bulk insert followed by adding the index. So I went ahead and set off a script adding batches of 10,000 records at a time, with small gaps ‘just in case’. The ‘just in case’ was definitely the case: 16 hours later I’d barely managed a million records and MySQL was getting slower and slower.

I cut my losses and tried an upload without the FULLTEXT index: 20 minutes later, that was fine … but there was no way I dared run that ‘CREATE FULLTEXT’!

In my heart I knew that Lucene/Solr was the right way to go. These are designed for search-engine performance, but I dreaded the pain of trying to install and come up to speed with yet another system that might not end up any better.

However, I bit the bullet, and my dread was utterly unfounded. Fiona got the right version of Java running and then, within half an hour of downloading Solr, I had it up and running with one of the examples. I then tried experimental ingests with small chunks of the data: 1000 records, 10,000 records, 100,000 records, a million records … Solr lapped it up, utterly painless. The only fix I needed was for quote characters in my tab-separated records, which needed mangling.
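The post doesn’t say exactly how the quotes were mangled. As far as I know, Solr’s CSV loader treats the double quote as its default field encapsulator, so an unmatched quote inside a title can swallow the separators that follow it. A minimal sketch of one possible fix, assuming it is acceptable to swap double quotes for single quotes in book titles (the function names are hypothetical):

```python
def clean_tsv_line(line: str) -> str:
    """Swap double quotes for single quotes in one tab-separated line.

    Hypothetical fix: with no '"' left, the CSV parser never sees an
    opening encapsulator that it cannot match. Tabs and the trailing
    newline are left untouched.
    """
    return line.replace('"', "'")


def clean_tsv(lines):
    """Yield cleaned versions of an iterable of lines (e.g. an open file)."""
    for line in lines:
        yield clean_tsv_line(line)
```

Working line by line means a multi-gigabyte file never has to be held in memory at once.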

So, a quick split into million-record chunks (I couldn’t bring myself to do a single multi-gigabyte POST … but maybe that would have been OK!), set the ingest going and 20 minutes later – hey presto, 16 million full-text indexed records 🙂 I then realised I’d forgotten to give field names, so the ingest had taken the first record’s values as a header line. No problem: just clear the database and re-ingest … at 20 minutes for the whole thing, who cares!
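Splitting into million-record chunks and naming the fields up front might look something like the sketch below. It assumes Solr’s CSV update handler and its `separator`, `header` and `fieldnames` request parameters; the core URL `http://localhost:8983/solr/books` and the field names `id`, `title` and `author` are hypothetical, not taken from the post.

```python
from itertools import islice
from urllib.parse import urlencode
import urllib.request

CHUNK_SIZE = 1_000_000  # records per POST, as in the post


def chunks(lines, size=CHUNK_SIZE):
    """Yield successive lists of at most `size` lines."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def csv_update_url(core_url, fieldnames):
    """Solr CSV update URL for tab-separated data that has no header line."""
    params = {
        "separator": "\t",                   # tab-separated input
        "header": "false",                   # first line is data, not names...
        "fieldnames": ",".join(fieldnames),  # ...so supply the names here
        "commit": "true",
    }
    return f"{core_url}/update?{urlencode(params)}"


def post_chunk(url, batch):
    """POST one chunk of newline-terminated lines; returns the HTTP status."""
    body = "".join(batch).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/csv; charset=utf-8"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A run would then loop over `chunks(f)` for an open file `f`, calling `post_chunk(url, batch)` on each batch. Passing `fieldnames` with `header=false` is what avoids the first record being swallowed as a header line.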

As noted, there was one slight gotcha. The Securing Solr section of the Solr Reference Guide explains how to set up the security.json file. This kept failing until I realised Solr could not find the classes solr.BasicAuthPlugin and solr.RuleBasedAuthorizationPlugin (solr.log is your friend!). After a bit of listing the contents of jars, I found that these are now in … I also found that the JSON parser struggled a little with the indentation (I think maybe tab characters), but after explicitly selecting the indents and re-typing them as spaces, yay! I have a fully secured Solr instance with 16 million book titles – wow 🙂
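For what it’s worth, JSON itself allows tabs between tokens (RFC 8259 permits space, tab, line feed and carriage return), so when a file has been pasted from a web page, another plausible culprit is an invisible non-breaking space. A small Python sketch to hunt for such characters before handing the file to Solr (function names are hypothetical):

```python
import json
import unicodedata

# RFC 8259 allows only these four characters as whitespace between JSON tokens.
JSON_WHITESPACE = {" ", "\t", "\n", "\r"}


def suspicious_whitespace(text):
    """List (line, column, name) for whitespace-like characters JSON rejects."""
    found = []
    # split("\n") rather than splitlines(), which would also split on some
    # of the very characters we are looking for
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch.isspace() and ch not in JSON_WHITESPACE:
                found.append((lineno, col, unicodedata.name(ch, hex(ord(ch)))))
    return found


def check_json(text):
    """Return (parses_ok, suspicious_characters) for a candidate security.json."""
    try:
        json.loads(text)
        ok = True
    except json.JSONDecodeError:
        ok = False
    return ok, suspicious_whitespace(text)


def normalise_whitespace(text):
    """Replace each suspicious whitespace character with a plain space.

    Note: this also changes any such characters inside string values.
    """
    return "".join(
        " " if ch.isspace() and ch not in JSON_WHITESPACE else ch for ch in text
    )
```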

This is my final security.json file (actual credentials obscured, of course!):

    "blockUnknown": true,
      "tom":"blabbityblabbityblabbityblabbityblabbityblo= blabbityblabbityblabbityblabbityblabbityblo=",
      "dick":"blabbityblabbityblabbityblabbityblabbityblo= blabbityblabbityblabbityblabbityblabbityblo=",
      "harry":"blabbityblabbityblabbityblabbityblabbityblo= blabbityblabbityblabbityblabbityblabbityblo="},

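Each credential above is two base64 strings separated by a space. As far as I understand Solr’s Sha256AuthenticationProvider, the first is a SHA-256 hash of salt plus password, hashed a second time, and the second is the random salt; treat the sketch below as an assumption to check against your Solr version. (Solr can also generate these values itself through the set-user command of its authentication API.)

```python
import base64
import hashlib
import os


def solr_credential(password, salt=None):
    """Build a 'hash salt' credential string for security.json (assumed scheme).

    Assumption: mirrors Solr's Sha256AuthenticationProvider as I understand it:
    SHA-256 over salt + password, then SHA-256 again, both values base64-encoded.
    """
    if salt is None:
        salt = os.urandom(32)  # fresh random salt for each user
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    digest = hashlib.sha256(digest).digest()
    return (base64.b64encode(digest).decode("ascii") + " "
            + base64.b64encode(salt).decode("ascii"))
```

A 32-byte value encodes to 44 base64 characters ending in a single `=`, which matches the shape of the obscured strings above.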

why is the wind always against you? part 2 – side wind

In the first part of this two-part post, we saw that cycling into the wind takes far more additional effort than a tail wind saves.

However, Will Wright’s original question, “why does it feel as if the wind is always against you?”, was not just about head winds. It was about the feeling that when cycling around Tiree, even though the wind comes at all sorts of angles, it seems to be against you more often than with you.

Is he right?

So in this post I’ll look at side winds, and in particular start with wind dead to the side, at 90 degrees to the road.

Clearly, a strong side wind will need some compensation, perhaps leaning slightly into the wind to balance, and on Tiree with gusty winds this may well cause the odd wobble. However, I’ll take the best-case scenario and assume a completely constant wind with no gusts.

There is a joke about the engineer who, when asked a question about giraffes, begins, “let’s first assume a spherical giraffe”. I’m not going to make Will + bike spherical, but will assume that the air drag is similar in all directions.

Now my guess is that, given the way Will is bent low over his handlebars, he may well actually have a larger side area to the wind than from in front. Also, I have no idea about the complex ways the moving spokes behave as the wind blows through them, although I am aware that a well-designed turbine absorbs a fair proportion of the wind, so I would not be surprised if the wheels added a lot of side drag too.

If the drag for a side wind is indeed bigger than from the front, then reality will be worse than the following calculations suggest; so working with a perfectly cylindrical Will is effectively the best case!

To make calculations easy I’ll have the cyclist going at 20 miles an hour, with a 20 mph side wind also.

When you have two speeds at right angles, you can essentially ‘add them up’ as if they were sides of a triangle.  The resultant wind feels as if it is at 45 degrees, and approximately 30 mph (to be exact it is 20 x √2, so just over 28mph).
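The triangle rule is just Pythagoras, so the apparent wind can be checked in a couple of lines (a minimal sketch using the post’s 20 mph figures):

```python
import math

bike_speed = 20.0   # mph, straight ahead
side_wind = 20.0    # mph, at 90 degrees to the road

# Speeds at right angles combine like the sides of a right-angled triangle.
apparent_speed = math.hypot(bike_speed, side_wind)                # 20 * sqrt(2), about 28.3 mph
apparent_angle = math.degrees(math.atan2(side_wind, bike_speed))  # 45 degrees off the nose
```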

Recalling the squaring rule, the force is proportional to 30 squared, that is 900 units of force acting at 45 degrees.

In the same way as we added up the wind and bike speeds to get the apparent wind at 45 degrees, we can break this 900 unit force at 45 degrees into a side force and a forward drag. Using the sides-of-the-triangle rule again, we get a side force and forward drag of around 600 units each.
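Without the rounding, the total force is 800 units rather than 900 (the 900 comes from rounding 28 mph up to 30), and each component is about 566 rather than 600. A small sketch of the split, assuming drag of size |v|² that points along the apparent wind v (the function name is hypothetical):

```python
import math


def drag_components(bike_speed, side_wind):
    """Split quadratic drag from the apparent wind into forward and side parts.

    The force has size |v|**2 (in the post's 'units') and points along the
    apparent wind v, so each component is |v| times that component of v.
    """
    speed = math.hypot(bike_speed, side_wind)
    total = speed ** 2
    forward = speed * bike_speed   # same as total * cos(angle)
    side = speed * side_wind       # same as total * sin(angle)
    return total, forward, side


total, forward, side = drag_components(20.0, 20.0)
```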

For the side force I’ll just assume you lean into it (and hope that you don’t fall off if the wind gusts!), so let’s just focus on the forward force against you.

If there were no side wind the force from the air drag would be due to the 20 mph bike speed alone, so would be (squaring rule again) 400 units.  The side wind has increased the force against you by 50%.  Remembering that more than three quarters of the energy you put into cycling is overcoming air drag, that is around 30% additional effort overall.

Expressed as a head wind, this is equivalent to the additional drag of cycling into a direct head wind of about 4 mph (I made a few approximations; the exact figure is 3.78 mph).
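The unrounded version of these percentages, and the 3.78 mph figure, can be reproduced as below. The sketch takes the ‘more than three quarters’ share of effort spent on air drag as exactly 0.75; with exact numbers the drag rises by about 41% rather than 50%, which works out at roughly 31% more effort overall.

```python
import math

BIKE = 20.0        # mph
SIDE_WIND = 20.0   # mph, at 90 degrees
DRAG_SHARE = 0.75  # assumed fraction of effort spent on air drag

still_drag = BIKE ** 2                                   # 400 units
side_wind_drag = math.hypot(BIKE, SIDE_WIND) * BIKE      # forward part, about 565.7 units
drag_increase = side_wind_drag / still_drag - 1          # about 0.41
effort_increase = DRAG_SHARE * drag_increase             # about 0.31

# Head wind h giving the same forward drag: (BIKE + h) ** 2 == side_wind_drag
equivalent_head_wind = math.sqrt(side_wind_drag) - BIKE  # about 3.78 mph
```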

This feels utterly counterintuitive: a pure side wind causes additional forward drag! It perhaps feels even more counterintuitive if I tell you that, in fact, the wind needs to be about 10 degrees behind a pure side wind before it actually helps.

There are two ways to understand this.

The first is plain physics/maths.

For very small objects (around a hundredth of a millimetre), air drag is directly proportional to speed (linear). At this scale, when you split the force back into its components ahead and to the side, they are exactly the same as if you look at the forces from the side wind and the cycle speed independently. So if you are a cyclist the size of an amoeba, side winds don’t feel like head winds … but then that is probably the least of your worries.

For ordinary-sized objects, the squaring rule (quadratic drag) means that after you have combined the speeds, squared the result and then separated the force out again, you get more than you started with!
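The difference between the two regimes shows up directly if you compute the forward part of the drag with and without the side wind. A minimal sketch, with the drag constant set to 1 and a hypothetical `drag` helper:

```python
import math


def drag(vx, vy, law):
    """Drag force vector for apparent wind (vx ahead, vy across), constant = 1.

    'linear'    : F = v        (very small objects)
    'quadratic' : F = |v| * v  (everyday objects, including cyclists)
    """
    scale = 1.0 if law == "linear" else math.hypot(vx, vy)
    return scale * vx, scale * vy


# Forward drag at 20 mph on a still day versus with a 20 mph side wind.
results = {
    law: (drag(20.0, 0.0, law)[0], drag(20.0, 20.0, law)[0])
    for law in ("linear", "quadratic")
}
```

For the linear law both numbers are 20, so the side wind adds nothing ahead; for the quadratic law the forward drag rises from 400 to about 566.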

The second way to look at it, which is not the full story, but not so far from what happens, is to consider the air just in front of you as you cycle.

You’ll know that cyclists often try to ride in each other’s slipstream to reduce drag, sometimes called ‘drafting’.

The lead cyclist is effectively dragging the air along behind them, and this helps the next cyclist, and that cyclist helps the one after. In a race formation, this reduces the energy needed by the following riders by around a third.

You also create a small area in front where the air is moving faster, almost like a little bubble of speed. This is one of the reasons why even the lead cyclist gains from the followers, albeit much less (one site estimates 5%). Now imagine adding the side wind: that lovely bubble of air is forever being blown away, meaning you constantly have to speed up a new bubble of air in front.

I did the above calculations for an exact side wind at 90 degrees to make the sums easier. However, you can work out precisely how much additional force the wind causes for any wind direction, and hence how much additional power you need when cycling.
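Here is one way to do that for any angle, measuring the wind direction from straight ahead (0 degrees is a pure head wind, 180 a pure tail wind). The extra drag only changes sign once as the wind swings round from head to tail, so a simple bisection finds the breakeven. By my calculation it comes out at about 102 degrees for 20 mph bike and wind speeds, roughly 12 degrees behind a pure side wind. Since power is just this force times the bike speed, the breakeven is the same for power. The function names are hypothetical.

```python
import math


def forward_drag(bike, wind, angle_deg):
    """Forward part of quadratic drag for wind from angle_deg (0 = head, 180 = tail)."""
    a = math.radians(angle_deg)
    head = bike + wind * math.cos(a)   # apparent wind component against you
    side = wind * math.sin(a)          # apparent wind component across you
    return math.hypot(head, side) * head


def extra_drag(bike, wind, angle_deg):
    """Drag compared with a still day; positive means the wind is hurting."""
    return forward_drag(bike, wind, angle_deg) - bike ** 2


def breakeven_angle(bike, wind, lo=0.0, hi=180.0, tol=1e-6):
    """Bisection for the angle where the wind stops hurting and starts helping."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if extra_drag(bike, wind, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Evaluating `extra_drag(20, 20, a)` for `a` from 0 to 180 traces the shape of the graph: large and positive towards a head wind, crossing zero a little behind the side.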

Here is a graph showing that additional power for every wind direction, from a pure head wind to a pure tail wind (all for a 20 mph wind). For a tail wind the additional force is negative: the wind is helping you. However, you can see that the breakeven point is about 10 degrees behind a pure side wind (the green dashed line). Also evident (depressingly) is that the area where the wind is making things worse is a lot larger than the area where it is helping.

… and if you aren’t depressed enough already, most of my assumptions were ‘best case’. The bike almost certainly has more side drag than head drag; you will need to steer slightly into the wind to avoid being blown across the road; and, as noted in the previous post, you will cycle more slowly into a head wind and so spend more time with it.

So in answer to the question …

why does it feel as if the wind is always against you?

… because most of the time it is!

why is the wind always against you? part 1 – head and tail winds

Sometimes it feels like the wind is always against you.

Is it really?

I’ve just been out for a run. It is not terribly windy today by Tiree standards (the Met Office reports 18mph from the north-west), but it was enough to feel as I ran, and the gritted teeth of cyclists going past the window certainly suggest it is plenty windy for them.

As I was running I remembered a question that Will Wright once asked me, “why does it feel as if the wind is always against you?”

Now Will competes in Ironman events, and is behind Tiree Fitness, which organises island keep-fit activities and the annual Tiree 10k & half marathon and Ultra-Marathon (which I’m training for now). In other words, Will is in a different league to me … but still he feels the wind!

Will was asking about cycling rather than running, and I suspect that the main effect of a head wind for a runner is simply the way it knocks the breath out of you, rather than actual wind resistance. That is, as with most things in exercise, the full story is a mixture of physiology, psychology and physics.

For this post I’ll stick to the ‘easy’ cases when the wind is dead in front of or behind you. I’ll leave side winds to a second post, as the physics for this is a little more complicated and the answer somewhat more surprising.

In fact, today I ran to and fro along the same road, although its angle to the wind varied. For the purposes of this post I’ll imagine straightening it out so that it runs directly along the direction of the wind: running or cycling one way the wind is directly behind you, and in the other it is directly in front.

running with the wind

To make the sums easier I’ll make the wind speed 15 mph and have me run at 5mph.

On the outward leg the wind is behind me. I am running at 5mph and the wind is blowing at 15mph, so if I had a little wind gauge as I ran it would register a 10mph tail wind.

When I turn into the wind I am now running at 5mph into a 15mph head wind, so the apparent wind speed for me is 20mph.

So half the time the wind is helping me to the tune of 10mph and half the time resisting me by 20mph; surely that averages out as 5mph resistance for the whole journey, the same as if I was just running at 5mph on a still day?

average apparent wind (?) = (  –10 * 4 miles  +  +20 * 4 miles ) / 8 miles = 5

Unfortunately wind resistance does not average quite like that!

Wind resistance increases with the square of your speed.  So a 10mph tail wind creates 100 units of force to help you, whereas a 20mph head wind resists you with 400 units of force, four times as much.  That is, it is like one person pushing you from behind for half the course, but four people holding you back on the other half.

It is this force, the squared speed, that makes more sense to average:

average resistance (wind) = (  –100 * 4 miles  +  +400 * 4 miles ) / 8 miles = 150

Compare this to the effects of running on a still day at 5mph.

average resistance (no wind) = 25  (5 squared)

The average wind resistance over the course is six times as much, even though half the distance is into the wind and half is with it.
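Both averages above can be checked in a few lines. This sketch follows the post in counting the tail-wind leg as a negative (helping) quantity and weighting each leg by its 4 miles:

```python
RUN_SPEED = 5.0   # mph
WIND = 15.0       # mph
LEG = 4.0         # miles out, and 4 miles back

tail_apparent = WIND - RUN_SPEED   # 10 mph pushing you along
head_apparent = WIND + RUN_SPEED   # 20 mph against you

# Tempting but wrong: average the apparent speeds (negative = helping)
naive_average = (-tail_apparent * LEG + head_apparent * LEG) / (2 * LEG)          # 5.0

# Better: average the forces, which go as the square of the apparent speed
average_force = (-tail_apparent ** 2 * LEG + head_apparent ** 2 * LEG) / (2 * LEG)  # 150.0
still_force = RUN_SPEED ** 2                                                        # 25.0
ratio = average_force / still_force                                                 # 6.0
```

Note that `-tail_apparent ** 2` is `-(tail_apparent ** 2)` in Python, since `**` binds more tightly than unary minus.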

It really is harder!

In fact, for a runner, wind resistance (physics) is probably not the major effect on speed, and despite the wind my overall time was not significantly slower than on a still day.  The main effects of the wind are probably the ‘knocking the breath out of you’ feeling and the way the head wind affects your stride (physiology).  Of course both of these make you more aware of the times the wind is in your face and hence your perception of how long this is (psychology).

cycling hard

For a cyclist, wind resistance is a far more significant issue¹. Think about Olympic sprinters, who run upright, compared with cyclists, who bend low and even wear those Alien-like cycling helmets to reduce drag.

This is partly due to the different physical processes: for example, on a bike on a still day you keep going forward even if you don’t pedal, whereas if you stop moving your legs while running you get nowhere.

It is also partly due to the different speeds.  Even Usain Bolt only manages a bit over 20mph and that for just 100 metres, and for long-distance runners this drops to around 12 mph.  Equivalent cycling events are twice as fast and even a moderately fit cyclist could compete with Usain Bolt.

So let’s imagine our Tiree cyclist, grimacing as they head into the wind.

I’m going to assume they are cycling at 15mph.

If it were a still day the air resistance would be entirely due to their own forward speed of 15mph, hence (squared remember) 225 units of force against them.

However, with a 15mph wind, when they are cycling with the wind there is no net air flow, the only effort needed to cycle is the internal resistance of chain and gears, and the rubber on the road.  In contrast, when cycling against the wind, total air resistance is equivalent to 30mph. Recalling the squaring rule, that is 900 units of resistance, 4 times as high as on a still day.

Even averaging with the easy leg, you have twice as much effort needed to overcome the air resistance.  No wonder it feels tough!
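The constant-speed case in a few lines. The sketch uses a signed version of the squaring rule so that a wind from behind would count as a push; that sign convention is a modelling convenience, not something the post spells out.

```python
def air_force(ground_speed, tail_wind):
    """Signed air resistance in the post's units (apparent speed squared).

    Positive means the air pushes against you; negative means it pushes you along.
    """
    apparent = ground_speed - tail_wind
    return apparent * abs(apparent)


BIKE, WIND = 15.0, 15.0
still = air_force(BIKE, 0.0)            # 225
with_wind = air_force(BIKE, WIND)       # 0: no net air flow
into_wind = air_force(BIKE, -WIND)      # 900: like riding at 30 mph on a still day
average = (with_wind + into_wind) / 2   # 450, twice the still-day figure
```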

However, that is all assuming you keep a constant speed irrespective of the wind.  In practice you are likely to slow down against a head wind and speed up with a tail wind.  Let’s assume you slow down to 10mph in the head wind and manage a respectable 20mph with the wind behind you.

I’ll not do the force calculations as the numbers get a little less tidy, but crucially this means you spend twice as long doing the head wind leg as the tail wind leg.  Although you cycle the same distance with head and tail winds, you spend twice as long battling that head wind.  And I’ll bet with it feeling so tough, it seems like even longer!
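For anyone who does want the untidy numbers, this sketch works them out under the same squaring rule, using the 10 mph and 20 mph speeds from the text and an arbitrary leg length (which cancels out of the ratios):

```python
WIND = 15.0
HEAD_SPEED, TAIL_SPEED = 10.0, 20.0   # slower into the wind, faster with it
LEG = 10.0                            # miles each way; any value gives the same ratios

head_force = (HEAD_SPEED + WIND) ** 2  # 625 units against you
tail_force = (TAIL_SPEED - WIND) ** 2  # 25 units: at 20 mph you still outrun the wind

head_time = LEG / HEAD_SPEED           # 1.0 hour
tail_time = LEG / TAIL_SPEED           # 0.5 hour
head_share = head_time / (head_time + tail_time)   # 2/3 of the ride is into the wind
```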



  1. The Wikipedia page on Bicycle performance includes estimates suggesting that over 75% of effort goes into overcoming drag … even with no wind.