Comparing Embeddings and Definitions

We've talked about word vectors. We've talked about dictionary definitions. What do they have in common? One thing you could argue is that they both purport to describe word meaning. That is fairly obvious when it comes to definitions—I don't think I need to do much convincing there, that's just what dictionaries are.

It's perhaps less obvious that we can expect word vectors to be meaning representations as well. There are a number of good arguments for that, but the one piece of evidence you should keep in mind here is that we don't associate words at random when we speak—shocking, I know! In fact, we use words based on what they mean. So we can expect that the contexts in which we use a word (i.e., what we called its distribution) tell us something of its meaning. That is something we've discussed at greater length in a previous post (post #4 here).
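To make the distributional idea concrete, here's a toy sketch (not from any real embedding library—the corpus and function names are made up for illustration): we represent each word by counts of the words that appear around it, and words used in similar contexts end up with similar vectors.

```python
from collections import Counter
from math import sqrt

# A toy corpus. "cat" and "dog" occur in similar contexts; "milk" does not.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
]

def context_vector(target, window=1):
    """Count the words appearing within `window` positions of `target`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == target:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda w: sqrt(sum(c * c for c in w.values()))
    return dot / (norm(u) * norm(v))

cat, dog, milk = context_vector("cat"), context_vector("dog"), context_vector("milk")
print(cosine(cat, dog))   # high: "cat" and "dog" share most of their contexts
print(cosine(cat, milk))  # much lower
```

Real word vectors are learned rather than counted, and over vastly more text, but the underlying bet is the same: distribution reflects meaning.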

But is the meaning of word vectors the same as the meaning of definitions?

I know this sounds like a gibberish question. There are two key points I'd like you to keep in mind before you start booing too loudly. First, the word vectors that we have are constructed from large amounts of data using software. Moreover, they rely on estimation techniques like gradient descent (see post #8 here). These techniques are neither deterministic nor precise: we have no guarantee that what we end up with is the optimum we set out to find.
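You can see this lack of guarantees with a tiny example (a made-up one-dimensional "loss", not an actual embedding objective): gradient descent run from two different starting points lands in two different valleys, and only one of them is the true minimum.

```python
def loss(x):
    # A simple non-convex function with two valleys, one deeper than the other.
    return x**4 - x**2 + 0.2 * x

def grad(x):
    # Derivative of the loss above.
    return 4 * x**3 - 2 * x + 0.2

def descend(x, lr=0.01, steps=2000):
    """Plain gradient descent from starting point x."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Same algorithm, different initializations, different answers:
a = descend(1.0)    # settles in the shallower valley (a local minimum)
b = descend(-1.0)   # settles in the deeper valley (the global minimum)
print(a, loss(a))
print(b, loss(b))
```

Since embedding models are initialized randomly and their objectives are far less tame than this one, two training runs on the same data can yield genuinely different vectors.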

The second point I'd like you to keep in mind is that nobody knows what meaning is in the first place. Or at least it's very much an open debate in linguistics. One thing you can say for sure is that the "meaning of sentences" and the "meaning of words" are different kinds of beast, as they have very different properties—for instance, you can say whether a sentence is true or false, but it's entirely meaningless to ponder whether that sentence is a synonym of "gorse".

We don't know what "word-meaning" is. Therefore, we can't be sure whether the concept of meaning that we implicitly rely on when we read or make up definitions is any different from the concept of word-meaning that we can study through distribution. Nor is this a trivial question without consequences: whenever we have two different scientific accounts of the same set of facts, it's worth sitting down to see whether that difference is merely cosmetic or whether it has deeper implications.

For instance, heliocentrism and describing the movement of planets using epicycloids (that's saying they trace spirograph patterns in the night sky, basically) are, at the end of the day, more or less equivalent mathematically speaking. What changes here is the optics of it: heliocentrism is a lot more economical, and means we don't have to treat Earth in a special way: we apply the same rules to this planet and to all the others. Don't get me wrong, optics do matter: having the right scientific framework helps a lot when trying to build on previous results and theories.

The second type of difference is more like that between Newtonian gravity and Einstein's general relativity. These theories are not mathematically equivalent: they predict different orbits for the planet Mercury. So you can't just say that they mean the same thing: at most one of the two can accurately describe the motions of the planets; simply put, they describe different facts.

Going back to word-meaning, it's worth asking which of the two situations we're in. Are definitions and word distributions equivalent descriptions of meaning? If so, then that would make for a very enlightening difference in optics, which might help us to better understand what word-meaning is, fundamentally speaking. If not, then we'll have two competing descriptions of word-meaning, and we'll have to look into which of them fails to match reality.

The next few posts are going to be about methods to compare word embeddings and definitions, but I'll stop here for today. See you next week!