This week I learned to stop worrying and love n-dimensionality. It started with my vow at the end of last week to learn more about matrices. As it turns out, I'm still on vectors.
Last week, I referred to vectors in two ways: as a row in a matrix, and as a set of coordinates. Both of these conceptions are more or less accurate. What I hadn't considered yet was directionality, which, if you think about it, is inherent to a set of coordinates. I just hadn't thought about it yet.
Part of the reason I hadn't thought about it is that in my mathematical education so far, numbers were numbers and coordinates were coordinates, and you put one into math problems and you put the other on a graph. I didn't get to the part where these things became one thing, explicitly, even though using graphs to solve algebra problems really should have tipped me off. (Perhaps one of you can tell me if that part was geometry or calculus or linear algebra.)
In a vector way of thinking, the kinds of numbers you put into math problems are simply 1-dimensional vectors, or vectors in R1 space, as I am told they say:
[2] + [2] = [4]
On a number line. In the flatland. Any vector quantity is a movement from an origin point in some direction. At least one direction. Because it can be in more than one direction. If you have a vector in an R2 space, it has two values in it, and you go one direction for the first value and then a right angle away for the second value, ie along the x and y axes. For a third value, you go in another direction at right angles to both of the prior directions, ie along the z axis. And right about then, our brain runs out of axes.
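Because I learn by poking at things, here is a minimal sketch of that idea in Python (the function name and the toy numbers are mine, just for illustration):

```python
# Vectors as lists of "direction quantities": one number per axis.
# Adding them works the same way whether there are 1, 3, or 300 axes.

def add(a, b):
    """Add two vectors componentwise (they must have the same number of dimensions)."""
    return [x + y for x, y in zip(a, b)]

print(add([2], [2]))              # R1, the number line: [4]
print(add([2, 3], [1, -1]))       # R2, x and y:         [3, 2]
print(add([2, 3, 5], [1, 1, 1]))  # R3, x, y, and z:     [3, 4, 6]
# ...and nothing stops the same code from handling 300 axes, even if our brains stop at 3.
```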
Despite this human brain limit, the vector can keep moving in n-many directions. On that, I read this insanely reassuring passage in Dominic Widdows's Geometry of Meaning this week:
"Do not hold yourself back from understanding what dimensions are by trying to visualize more than three dimensions at once: it's a good way of getting frustrated because it's just not within our physical experience. Spaces with many dimensions are easy to live and work with once you accept that several dimensions are perfectly valid and consistent, without trying to forcibly reconcile each dimension with visual experience simultaneously" (144).*
Okay, it's not my fault that I can't see the dimensions! I can focus on practicing how to think them, because they make some pretty remarkable spatial things possible with relatively easy math (an insight kicked off by Descartes, whom I'm more used to hating on for being associated with the idea that minds exist in an ideal state independent of bodies). It's addition, multiplication, and a square root. That's almost all you need. You might need to do them a lot of times, but it's all you need. Specifically, you can find the length of any vector in n-dimensional space using an extended version of the Pythagorean theorem: you take the square root of the sum of all the direction quantities squared. You can do the same thing to calculate the distance between two vectors: take the square root of the sum of the squared differences between each pair of coordinates, ie (a1 - b1) squared plus (a2 - b2) squared, and so on, which gives you the Euclidean distance between the two vectors. No matter how many dimensions they represent. Widdows again reassures us, "This is another example of the fairly relaxed mentality you need to accept the ways vectors and dimensions are used" (155). Free your mind, the n-dimensionality will follow.
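Here is a minimal sketch of those two formulas in Python (the function names and toy numbers are mine, just to watch the Pythagorean pattern keep working past three dimensions):

```python
from math import sqrt

def length(v):
    """Length of a vector: the square root of the sum of the squared direction quantities."""
    return sqrt(sum(x * x for x in v))

def distance(a, b):
    """Euclidean distance between two vectors: the length of their difference."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(length([3, 4]))             # 5.0 -- the classic 3-4-5 right triangle
print(length([1, 2, 2, 4]))       # 5.0 -- same formula, four dimensions
print(distance([1, 2], [4, 6]))   # 5.0 -- sqrt((1-4)^2 + (2-6)^2)
```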
You might need one more thing, and that is division (although this could also be fraction multiplication? right? guys?), which you use for normalizing vectors, ie scaling them so they have a length of 1. This is a crucial step for word vectors, because it amounts to scaling that accounts for the wildly varying lengths of different documents and for words that are much more frequent than others. Because vectors aren't trying to map word frequency, they are trying to map word relationships. To normalize a vector, you divide every direction quantity (ie every number in the list that makes up the vector) by the length of the entire vector (the square root of the sum of all the direction quantities squared). Then you have what is called a unit vector: a set of directional quantities, a set of steps you take into n-dimensional space, that ultimately takes you 1 unit of distance but points in the same direction as the original vector that took you farther. And once we've turned whatever we're representing into unit vectors, be they keywords or documents, we can measure how similar they are by measuring the cosine of the angle between the vectors they create. That angle is always there, because vectors are not primarily quantities. They are directions.
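Sketched the same way, normalizing and then comparing (the three "word" vectors here are made-up counts, not real data):

```python
from math import sqrt

def length(v):
    return sqrt(sum(x * x for x in v))

def normalize(v):
    """Divide every direction quantity by the vector's length to get a unit vector."""
    n = length(v)
    return [x / n for x in v]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: the dot product of their unit vectors."""
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))

cat = [10, 2, 0]   # invented co-occurrence counts, just for illustration
dog = [8, 3, 1]
car = [0, 1, 9]

print(length(normalize(cat)))       # 1.0 (up to rounding) -- a unit vector
print(cosine_similarity(cat, dog))  # close to 1: the vectors point in similar directions
print(cosine_similarity(cat, car))  # close to 0: nearly perpendicular, not very similar
```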
The words themselves are not really quantified. Their spatial relationships to other words are, represented by co-occurrence frequency and proxied through their vector space location. Quantification in vector representation isn't changing the words, it's changing how we imagine the meanings of words to be created--as Gavin has been telling me, and as Kristeva told us before that.
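To make the co-occurrence part concrete for myself, here is a toy sketch (the three-sentence "corpus" and the window size are invented, and real word-vector models do much more on top of raw counts, but the mechanics start here):

```python
from collections import Counter, defaultdict

sentences = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the car sat in the garage",
]

window = 2  # count words that appear within two positions of each other
cooccur = defaultdict(Counter)

for sentence in sentences:
    words = sentence.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                cooccur[w][words[j]] += 1

# Each word's vector is its row of co-occurrence counts;
# words used in similar contexts end up with similar rows.
print(cooccur["cat"])
print(cooccur["dog"])
```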
I still have a pretty big question about how word vector space is visualized. It sounds like you have to have a particular comparison at the heart of everything you graph. Cosine similarity doesn't tell you anything about a word. It allows you to compare a word with another word. Yet, I see visualizations with many, many words in them. Are they all from the perspective of one other word, or is there another step that brings them all onto a different kind of plane together, where you could just look and see which ones ended up close together?
Is this magical, or kind of unsettling? Well, it's definitely magical. Centuries of human curiosity and creativity brought us a way of thinking about things--and I do mean things, at the most abstracted, interchangeable level--that allows us to map their qualities and similarities without even knowing what they are first. Using this system, similar things (defined mathematically) land closer together and dissimilar things land further apart.
On another level, it's also unsettling. Understanding the relationships between words, and the concepts those relationships represent, is the work of years, and while it is time-intensive, it is perfectly accessible to human readers, at least on a smaller scale of selected works. If we could shortcut all that, what would be the motivation to read so deeply? Well, the motivation to read deeply is always going to be there, for a certain subset of the population. The more unsettling question is, what is going to be the motivation to pay people enough to allow them to read, or at least to create the environment in which reading is possible and valued?
Of course it comes back to the word "understanding." Vector space represents, but it doesn't explain. It doesn't even suggest--we do that, through the connections that its visualizations spark. The problem here is not what the computers can do, but what we are so primed to do with their results. We see that they've found a relationship between words, and we assume they know what words mean, because for a long time, whenever we encountered words put together, someone meant to put them that way. That's not the case for the word relationships computers generate. They can only model back to us the relationships that we create between words by using them, over and over, in many contexts. The more we think the computers know what words mean, the more poorly prepared we are going to be to evaluate the words they show us.
And then going back up/down the cosine curve of delight/terror, you can also see the properties of vector space as a reminder that when we read a word in a text and assume we understand it, we may also be missing out on its alternate contexts. Words can be related to more things than we already know.
After his careful explanation of the process of locating vectors in space, Widdows cites a nineteenth century satirical novel called Flatland, by Edwin Abbott, in which "the world of a humble square is rocked by the intersection of a solid sphere with his two-dimensional existence. At first resistant, the square becomes convinced that a third dimension is possible" (Widdows 166, emphasis in original).
Perhaps embracing the simultaneous non-reality and infinite potential of n-dimensions is a way of reminding ourselves that the projections of mathematics and computation are, ultimately, one creative way of representing the world. Not the world itself. The representations can surprise, delight, and even reveal, but they don't mean without our intention.
So see you back next week, when I’ll again be channeling my inner square.
* Widdows is a charming writer, and I would now read any math book he ever writes. He wrote Geometry of Meaning in 2004, during what looks like a postdoctoral stint at Stanford. After that, he left the academy and is now working on quantum cognition. I take some faint hope that a person who took the time to write such a lovely and accessible book is also going to do humane work on what sounds like a frightfully consequential topic.