“Though social scientists care what people think it’s also important to observe what people do, especially if what they think they do turns out to be different from what they actually do.”

– Scott Golder. “Scaling Social Science with Hadoop”

“Big Data is going to be extremely important but we can never lose track of the context in which this data is produced and the cultural logic behind its production. We must continue to ask ‘why’ questions that cannot be answered through traces alone…”

Big Data presents new opportunities for understanding social practice. Of course the next statement must begin with a “but.” And that “but” is simple: Just because you see traces of data doesn’t mean you always know the intention or cultural logic behind them. And just because you have a big N doesn’t mean that it’s representative or generalizable. Scott [Golder, speaking about Hadoop] knows this, but too many people obsessed with Big Data don’t.

Increasingly, computational scientists are having a field day with Big Data. This is exemplified by the “web science” community and highly visible in conferences like CHI and WWW and ICWSM and many other communities in which I am a peripheral member. In these communities, I’ve noticed something that I find increasingly worrisome… Many computational scientists believe that because they have large-N data, they know more about people’s practices than any other social scientist. Time and time again, I see computational scientists mistake behavioral traces for cultural logic. And this both saddens me and worries me, especially when we think about the politics of scholarship and funding. I’m getting ahead of myself.

Let me start with a concrete example. Just as social network sites were beginning to gain visibility, I reviewed a computational science piece (that was never published) where the authors had crawled Friendster, calculated numbers of friends, and used this to explain how social network sites were increasing friendship size. My anger in reading this article resulted in a rant that turned into a First Monday article. As is now common knowledge, there’s a big difference between why people connect on social network sites and why they declare relationships when being interviewed by a sociologist. This is the difference between articulated networks and personal networks.

On one hand, we can laugh at this and say, oh folks didn’t know how these sites would play out, isn’t that funny. But this beast hasn’t yet died. These days, the obsession is with behavioral networks. Obviously, the people who spend the most time together are the REAL “strong” ties, right? Wrong. By such a measure, I’m far closer to nearly everyone that I work with than my brother or mother who mean the world to me. Even if we can calculate time spent interacting, there’s a difference in the quality of time spent with different people.

Big Data is going to be extremely important but we can never lose track of the context in which this data is produced and the cultural logic behind its production. We must continue to ask “why” questions that cannot be answered through traces alone, that cannot be elicited purely through experiments. And we cannot automatically assume that some theoretical body of work on one data set can easily transfer to another data set if the underlying conditions are different.

– Danah Boyd 

http://www.zephoria.org/thoughts/archives/2010/04/17/big-data-opportunities-for-computational-and-social-sciences.html

// Fascinating to eavesdrop on amazing minds being applied to the early days of big data. Tragic to see failure with small data [decisions] & small minds (Zimmerman).
