a visual living record of your life

January 27, 2012

image: Mashable

source: All Things D, “Developers Get Ready To Tell Facebook About Every ‘Action’ You Take” by Liz Gannes

Facebook will on Wednesday launch the Open Graph applications it first debuted last September (video below), sources told AllThingsD. These are the apps, made by outside developers, that “frictionlessly” and continuously share users’ actions back to Facebook after a user has given permission once.

The new apps behave similarly to the “read,” “listen” and “watch” Open Graph applications that have already rolled out in the past few months, which include the Washington Post, Spotify and Hulu. So every time your friends read an article or listen to a song, you might now learn about it on Facebook, and possibly even join them in reading or listening at the same time.

Outside developers have been furiously coding other custom actions since September, and recently many have been waiting on Facebook so they can make them available to users.

Facebook has invited press to an unveiling event on Wednesday evening in San Francisco — where it will launch the first batch of these apps, sources confirmed.

Facebook did not reply to requests for comment, though it did send us invitations to the event.

This may well be one of Facebook’s last big press conferences before it files to go public and enters a quiet period, during which financial regulations keep it from commenting on its products, business or criticism from competitors and analysts.

The timing of the Wednesday press event aligns with Facebook’s last public guidance on the subject. The company told developers in late December that since its Timeline profile design was being rolled out worldwide, Open Graph Actions would start being approved in January.

Currently, Facebook Timeline is available to users on an opt-in basis. At some point soon—perhaps as early as this week—Facebook will start requiring users to migrate to the new design.

That’s because Open Graph and Timeline go hand in hand; the idea is for each user’s activity across various Web sites and apps, both on and off of Facebook, to be aggregated as a visual living record of his or her life.

What kind of Actions will developers build on the Open Graph? Some examples include tracking a workout with a GPS device, completing a recipe from a cooking site, or buying an item on an e-commerce site. Those Actions could be expressed on Facebook with verbs like “run,” “cook” or “purchase.”

Along with the new verbs will surely come Facebook’s usual problems: Unanticipated incursions into user privacy, people who hate change, and profligate oversharing.

Sources said that in the lead-up to the launch, Facebook has been busy working on things like how to conjugate the verbs for the Open Graph Actions.

Facebook told developers their Actions must be “simple, genuine and non-abusive.”

To Facebook CEO Mark Zuckerberg, who is known to have studied Latin, I say: Just remember, “Veni, vidi, vici!”


open academia

January 26, 2012

image: Research to Action

source: Bly, Adam, Kathleen Fitzpatrick and Katherine Rowe. (2010, September 22). Tea Party Online, Craigslist and Free Speech, and Open Academia. BrianLehrer.TV.

From the Brian Lehrer videocast (more transcript excerpts below): Will the net kill the age-old system of peer review in academic publishing? Knowledge is both liberated and threatened by web publishing. It sounds like common sense. Publish-or-perish is the toughest part of being a professor: research, write, and be OK'd by experts in the field. That is peer review, and publication in an elite journal is a must for a top job and tenure. It has been the system for decades. The web is a disruptive force: publishing without a press, open peer review, showing your work in progress, publicly. What does this mean for the advancement of learning itself and for the politics of academia?


teachers don’t like creative students

January 26, 2012

image: Jennifer Crute
source: Marginal Revolution, “Teachers Don’t Like Creative Students”
by Alex Tabarrok

One of the most consistent findings in educational studies of creativity has been that teachers dislike personality traits associated with creativity. Research has indicated that teachers prefer traits that seem to run counter to creativity, such as conformity and unquestioning acceptance of authority (e.g., Bachtold, 1974; Cropley, 1992; Dettmer, 1981; Getzels & Jackson, 1962; Torrance, 1963). The reason for teachers’ preferences is quite clear: creative people tend to have traits that some have referred to as obnoxious (Torrance, 1963). Torrance (1963) described creative people as not having the time to be courteous, as refusing to take no for an answer, and as being negativistic and critical of others. Other characteristics, although not deserving the label obnoxious, nonetheless may not be those most highly valued in the classroom.

….Research has suggested that traits associated with creativity may not only be neglected, but actively punished (Myers & Torrance, 1961; Stone, 1980). Stone (1980) found that second graders who scored highest on tests of creativity were also those identified by their peers as engaging in the most misbehavior (e.g., “getting in trouble the most”). Given that research and theory (e.g., Harrington, Block, & Block, 1987) suggest that a supportive environment is important to the fostering of creativity, it is quite possible that teachers are (perhaps unwittingly) extinguishing creative behaviors.

From “Creativity: Asset or Burden in the Classroom?” (embed below), a good review paper. What the paper shows is that the characteristics that teachers use to describe their favorite student correlate negatively with the characteristics associated with creativity. In addition, although teachers say that they like creative students, teachers also say creative students are “sincere, responsible, good-natured and reliable.” In other words, the teachers don’t know what creative students are actually like. (FYI, the research design would have been stronger if the researchers had actually tested the students for creativity.) As a result, schooling has a negative effect on creativity.

My experience as a parent is consistent with the idea that teachers don’t like creative students but I try not to blame the teachers too much. Creative people, for better and worse, ignore social conventions. Thus, it can be hard for teachers to deal with creative students in a classroom setting where they must guide 20-30 students en masse. As Jonah Lehrer puts it:

Would you really want a little Picasso in your class? How about a baby Gertrude Stein? Or a teenage Eminem? The point is that the classroom isn’t designed for impulsive expression – that’s called talking out of turn. Instead, it’s all about obeying group dynamics and exerting focused attention. Those are important life skills, of course, but decades of psychological research suggest that such skills have little to do with creativity.

One hope I have for personalized learning, à la the Khan Academy, is that teachers will not feel the need to suppress creative students when classroom dynamics do not require that all the students follow all the rules all the time.

Hat Tip: Erik Barker.


the social side of the Internet

January 21, 2012

source: Pew Internet & American Life Project, “The social side of the internet” by Lee Rainie, Kristen Purcell and Aaron Smith

overview

The internet is now deeply embedded in group and organizational life in America. A new national survey by the Pew Research Center’s Internet & American Life Project has found that 75% of all American adults are active in some kind of voluntary group or organization and internet users are more likely than others to be active: 80% of internet users participate in groups, compared with 56% of non-internet users. Moreover, social media users are even more likely to be active: 82% of social network users and 85% of Twitter users are group participants.

“One of the striking things in these data is how purposeful people are as they become active with groups,” noted Kristen Purcell, the research director at Pew Internet and co-author of the report. “Many enjoy the social dimensions of involvement, but what they really want is to have impact. Most have felt proud of a group they belong to in the past year and just under half say they accomplished something they couldn’t have accomplished on their own.”

“It is important to note that 25% of American adults are not active in any of the groups we addressed,” said Aaron Smith, senior research specialist at Pew Internet and co-author of the report. “They often report they are time-stressed or have health or other issues that limit their ability to be involved. And about a fifth of them say that lack of access to the internet is a hindrance. Even in its absence, the internet seems to be a factor in the reality of how groups perform in the digital age.”

about the survey

This report is based on the findings of a survey on Americans’ use of the Internet. The results in this report are based on data from telephone interviews conducted by Princeton Survey Research Associates International from November 23 to December 21, 2010, among a sample of 2,303 adults, age 18 and older.  Telephone interviews were conducted in English and Spanish by landline (1,555) and cell phone (748, including 310 without a landline phone). For results based on the total sample, one can say with 95% confidence that the error attributable to sampling is plus or minus 2.3 percentage points.  For more information, please see the Methodology section of this report.
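As a rough check on the reported margin, the textbook formula for a simple random sample can be applied to the sample size above. This is only a sketch: the 1.96 multiplier and the worst-case 50% proportion are standard textbook assumptions, not details taken from the Pew methodology, and the gap between the result and the reported 2.3 points is presumably explained by survey weighting and design effects.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a proportion,
    assuming a simple random sample (an assumption, not Pew's stated method)."""
    return z * math.sqrt(p * (1 - p) / n)

n = 2303  # total sample size reported in the survey
srs_moe = margin_of_error(n)
reported_moe = 0.023  # plus or minus 2.3 percentage points, as reported

# How much larger the reported margin is than the simple-random-sample
# margin, expressed as a ratio of variances (an implied design effect).
implied_design_effect = (reported_moe / srs_moe) ** 2

print(f"simple random sample margin: {srs_moe * 100:.2f} points")
print(f"implied design effect: {implied_design_effect:.2f}")
```

The simple formula gives a margin a little above 2 points, so the reported figure is in the range one would expect once weighting is taken into account.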


assessing teacher compensation

January 21, 2012

image: SodaHead
source: NCPA Daily Policy Digest, “Critical Issues in Assessing Teacher Compensation”

A recent report by the Heritage Foundation concluded that, on average, public school teachers receive total compensation that is roughly 50 percent higher than what they would receive in private-sector employment. Critics have since attacked the study, making numerous charges of bias and analytical flaws. The Heritage Foundation has recognized these accusations and others, and addresses each one in turn:

  • Researchers failed to account for hours of work completed outside of the school day: the authors point out that they did not make assumptions, but instead based their hour figures on reliable, self-reporting among teachers which suggested that the median work week for teachers is 40 hours.
  • The study does not take into account how hard teachers work at their jobs: the study did not include quantifiable data for how hard teachers work at their jobs beyond the hours of their work week; however, private school teachers receive 10 percent less in compensation despite working equally hard, lending credence to the belief that public school teachers receive inflated compensation packages.
  • Teachers pay for many supplies out of their own pocket: teachers do in fact pay for some classroom materials with their own money. Nevertheless, many private sector workers pay for business-related items out of their own pockets as well. Furthermore, the government provides a $250 tax deduction for classroom materials.
  • Researchers should not have included a benefit of job security, as many teachers have been laid off recently: despite tighter budgets, public school teachers were still only half as likely as other white-collar workers to be laid off in the last five years.
  • Teachers should be paid more so that we can attract better teachers: studies show that it is not below-market salaries that drive away high-quality teachers, but hiring practices that ignore important qualifications such as college grade point averages and specialized degrees.


Haiti earthquake aftermath

January 20, 2012


The Huffington Post reported on the weakened state of education in Haiti, focusing on the country’s challenge with illiteracy following the 2010 earthquake. My thought is that a surge of interest in environmental and geological issues might facilitate more science education in the region—via distance learning, remote research programs, etc. I wonder, are there any science ed opportunities here?

Two years after a 7.0 magnitude earthquake rocked Haiti, 600,000 illiterate children remain out of school, leaving the country’s next generation of leaders on the streets and without the education, mentors and tools necessary to move beyond a life of destruction and disappointment.

As aid groups devise ways to speed up the rebuilding of a country where more than half a million people are living in tents, they are focusing their efforts on repairing infrastructure, treating diseases and providing clean drinking water. While schooling is key to empowering young people, charities are simply strapped for resources and the education system remains largely privatized in Haiti.

via HuffPost Impact: Haiti Earthquake Aftermath—Charities Tackle Illiteracy In Country’s Slums


big data revealed

January 20, 2012

Source: radar.oreilly.com via C.DLT on Pinterest

source: Radar, “What is big data?”
by Edd Dumbill

Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.

The hot IT buzzword of 2012, big data has become viable as cost-effective approaches have emerged to tame the volume, velocity and variety of massive data. Within this data lie valuable patterns and information, previously hidden because of the amount of work required to extract them. To leading corporations, such as Walmart or Google, this power has been in reach for some time, but at fantastic cost. Today’s commodity hardware, cloud architectures and open source software bring big data processing into the reach of the less well-resourced. Big data processing is eminently feasible for even the small garage startups, who can cheaply rent server time in the cloud.

The value of big data to an organization falls into two categories: analytical use, and enabling new products. Big data analytics can reveal insights hidden previously by data too costly to process, such as peer influence among customers, revealed by analyzing shoppers’ transactions, social and geographical data. Being able to process every item of data in reasonable time removes the troublesome need for sampling and promotes an investigative approach to data, in contrast to the somewhat static nature of running predetermined reports.

The past decade’s successful web startups are prime examples of big data used as an enabler of new products and services. For example, by combining a large number of signals from a user’s actions and those of their friends, Facebook has been able to craft a highly personalized user experience and create a new kind of advertising business. It’s no coincidence that the lion’s share of ideas and tools underpinning big data have emerged from Google, Yahoo, Amazon and Facebook.

The emergence of big data into the enterprise brings with it a necessary counterpart: agility. Successfully exploiting the value in big data requires experimentation and exploration. Whether creating new products or looking for ways to gain competitive advantage, the job calls for curiosity and an entrepreneurial outlook.

what does big data look like?

As a catch-all term, “big data” can be pretty nebulous, in the same way that the term “cloud” covers diverse technologies. Input data to big data systems could be chatter from social networks, web server logs, traffic flow sensors, satellite imagery, broadcast audio streams, banking transactions, MP3s of rock music, the content of web pages, scans of government documents, GPS trails, telemetry from automobiles, financial market data, the list goes on. Are these all really the same thing?

To clarify matters, the three Vs of volume, velocity and variety are commonly used to characterize different aspects of big data. They’re a helpful lens through which to view and understand the nature of the data and the software platforms available to exploit them. Most probably you will contend with each of the Vs to one degree or another.

volume

The benefit gained from the ability to process large amounts of information is the main attraction of big data analytics. Having more data beats out having better models: simple bits of math can be unreasonably effective given large amounts of data. If you could run that forecast taking into account 300 factors rather than 6, could you predict demand better?

This volume presents the most immediate challenge to conventional IT structures. It calls for scalable storage, and a distributed approach to querying. Many companies already have large amounts of archived data, perhaps in the form of logs, but not the capacity to process it.

Assuming that the volumes of data are larger than those conventional relational database infrastructures can cope with, processing options break down broadly into a choice between massively parallel processing architectures (data warehouses or databases such as Greenplum) and Apache Hadoop-based solutions. This choice is often informed by the degree to which one of the other “Vs,” variety, comes into play. Typically, data warehousing approaches involve predetermined schemas, suiting a regular and slowly evolving dataset. Apache Hadoop, on the other hand, places no conditions on the structure of the data it can process.

At its core, Hadoop is a platform for distributing computing problems across a number of servers. First developed and released as open source by Yahoo, it implements the MapReduce approach pioneered by Google in compiling its search indexes. Hadoop’s MapReduce involves distributing a dataset among multiple servers and operating on the data: the “map” stage. The partial results are then recombined: the “reduce” stage.
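To make the two stages concrete, here is a minimal single-machine sketch of the map and reduce steps, using word counting as the example problem. It does not use Hadoop at all: the function names are illustrative, the "servers" are simulated by a plain loop, and the grouping step in the middle is written out by hand where a framework would handle it.

```python
from collections import defaultdict

def map_stage(chunk):
    """Emit (word, 1) pairs for one chunk of the dataset."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group intermediate values by key, as a framework would between stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_stage(key, values):
    """Recombine the partial results for one key."""
    return key, sum(values)

def word_count(chunks):
    # In a real cluster each chunk would be mapped on a different server;
    # here the distribution is simulated by iterating over the chunks.
    intermediate = []
    for chunk in chunks:
        intermediate.extend(map_stage(chunk))
    grouped = shuffle(intermediate)
    return dict(reduce_stage(k, v) for k, v in grouped.items())

chunks = ["big data is big", "data is messy"]
counts = word_count(chunks)
print(counts)
```

Because each call to `map_stage` looks at only one chunk, the map work can be spread across machines, and only the much smaller per-key totals need to be brought together at the end.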

To store data, Hadoop utilizes its own distributed filesystem, HDFS, which makes data available to multiple computing nodes. A typical Hadoop usage pattern involves three stages:

  • loading data into HDFS,
  • MapReduce operations, and
  • retrieving results from HDFS.

This process is by nature a batch operation, suited for analytical or non-interactive computing tasks. Because of this, Hadoop is not itself a database or data warehouse solution, but can act as an analytical adjunct to one.

One of the most well-known Hadoop users is Facebook, whose model follows this pattern. A MySQL database stores the core data. This is then reflected into Hadoop, where computations occur, such as creating recommendations for you based on your friends’ interests. Facebook then transfers the results back into MySQL, for use in pages served to users.
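The round trip described above can be sketched in miniature. This is an assumption-laden toy, not Facebook's system: Python's built-in `sqlite3` stands in for MySQL, ordinary Python stands in for the Hadoop job, and every table and column name is hypothetical.

```python
import sqlite3
from collections import Counter

# sqlite3 stands in for the MySQL store; all table names are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE friendships (person TEXT, friend TEXT);
    CREATE TABLE interests (person TEXT, interest TEXT);
    CREATE TABLE recommendations (person TEXT, interest TEXT, score INTEGER);
""")
db.executemany("INSERT INTO friendships VALUES (?, ?)",
               [("ann", "bob"), ("ann", "cat"), ("bob", "ann")])
db.executemany("INSERT INTO interests VALUES (?, ?)",
               [("bob", "jazz"), ("cat", "jazz"), ("cat", "chess"), ("ann", "chess")])

# Stage 1: export the core data for batch processing (the "load" step).
friendships = db.execute("SELECT person, friend FROM friendships").fetchall()
interests = db.execute("SELECT person, interest FROM interests").fetchall()

# Stage 2: the batch computation (the "MapReduce" step, here plain Python).
likes = {}
for person, interest in interests:
    likes.setdefault(person, set()).add(interest)

scores = Counter()
for person, friend in friendships:
    # Count interests a friend has that the person does not have yet.
    for interest in likes.get(friend, set()) - likes.get(person, set()):
        scores[(person, interest)] += 1

# Stage 3: write the results back to the serving database.
db.executemany("INSERT INTO recommendations VALUES (?, ?, ?)",
               [(p, i, s) for (p, i), s in scores.items()])
db.commit()

rows = db.execute(
    "SELECT person, interest, score FROM recommendations ORDER BY person, interest"
).fetchall()
print(rows)
```

The point of the pattern is that pages served to users only ever read the small, precomputed `recommendations` table, while the expensive work happens offline in the batch stage.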

velocity

The importance of data’s velocity — the increasing rate at which data flows into an organization — has followed a similar pattern to that of volume. Problems previously restricted to segments of industry are now presenting themselves in a much broader setting. Specialized companies such as financial traders have long turned systems that cope with fast moving data to their advantage. Now it’s our turn.

Why is that so? The Internet and mobile era means that the way we deliver and consume products and services is increasingly instrumented, generating a data flow back to the provider. Online retailers are able to compile large histories of customers’ every click and interaction: not just the final sales. Those who are able to quickly utilize that information, by recommending additional purchases, for instance, gain competitive advantage. The smartphone era increases again the rate of data inflow, as consumers carry with them a streaming source of geolocated imagery and audio data.

It’s not just the velocity of the incoming data that’s the issue: it’s possible to stream fast-moving data into bulk storage for later batch processing, for example. The importance lies in the speed of the feedback loop, taking data from input through to decision. A commercial from IBM makes the point that you wouldn’t cross the road if all you had was a five-minute old snapshot of traffic location. There are times when you simply won’t be able to wait for a report to run or a Hadoop job to complete.

Industry terminology for such fast-moving data tends to be either “streaming data,” or “complex event processing.” This latter term was more established in product categories before streaming data processing gained more widespread relevance, and seems likely to diminish in favor of streaming.

There are two main reasons to consider streaming processing. The first is when the input data are too fast to store in their entirety: in order to keep storage requirements practical some level of analysis must occur as the data streams in. At the extreme end of the scale, the Large Hadron Collider at CERN generates so much data that scientists must discard the overwhelming majority of it — hoping hard they’ve not thrown away anything useful. The second reason to consider streaming is where the application mandates immediate response to the data. Thanks to the rise of mobile applications and online gaming this is an increasingly common situation.
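The first reason, analyzing data as it streams in because it cannot all be stored, can be illustrated with a small sketch that keeps only running summaries. The sensor feed and function names are hypothetical, and a real streaming framework would add distribution and fault tolerance on top of this idea.

```python
def running_stats(stream):
    """Yield (count, mean, maximum) after each reading without storing the stream."""
    count = 0
    mean = 0.0
    maximum = float("-inf")
    for value in stream:
        count += 1
        mean += (value - mean) / count  # incremental mean update
        maximum = max(maximum, value)
        yield count, mean, maximum

def sensor_readings():
    # Hypothetical sensor feed; a real source would be unbounded.
    for value in [3.0, 5.0, 4.0, 8.0]:
        yield value

final = None
for final in running_stats(sensor_readings()):
    pass  # each intermediate summary could trigger an immediate response here
print(final)
```

Memory use stays constant no matter how many readings arrive, and an up-to-date summary is available after every reading, which also serves the second reason: responding immediately.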

Product categories for handling streaming data divide into established proprietary products such as IBM’s InfoSphere Streams, and the less-polished and still emergent open source frameworks originating in the web industry: Twitter’s Storm, and Yahoo S4.

As mentioned above, it’s not just about input data. The velocity of a system’s outputs can matter too. The tighter the feedback loop, the greater the competitive advantage. The results might go directly into a product, such as Facebook’s recommendations, or into dashboards used to drive decision-making.

It’s this need for speed, particularly on the web, that has driven the development of key-value stores and columnar databases, optimized for the fast retrieval of precomputed information. These databases form part of an umbrella category known as NoSQL, used when relational models aren’t the right fit.

variety

Rarely does data present itself in a form perfectly ordered and ready for processing. A common theme in big data systems is that the source data is diverse, and doesn’t fall into neat relational structures. It could be text from social networks, image data, a raw feed directly from a sensor source. None of these things come ready for integration into an application.

Even on the web, where computer-to-computer communication ought to bring some guarantees, the reality of data is messy. Different browsers send different data, users withhold information, they may be using differing software versions or vendors to communicate with you. And you can bet that if part of the process involves a human, there will be error and inconsistency.

A common use of big data processing is to take unstructured data and extract ordered meaning, for consumption either by humans or as a structured input to an application. One such example is entity resolution, the process of determining exactly what a name refers to. Is this city London, England, or London, Texas? By the time your business logic gets to it, you don’t want to be guessing.
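A toy version of the London example shows the shape of the problem. This is a deliberately naive sketch: the gazetteer, its hint words, and the scoring rule are all made up for illustration, and production entity resolution relies on far richer signals.

```python
# Hypothetical gazetteer: each candidate place has context words that hint at it.
GAZETTEER = {
    "London": [
        {"id": "London, England", "hints": {"thames", "uk", "england", "tube"}},
        {"id": "London, Texas", "hints": {"texas", "tx", "ranch"}},
    ],
}

def resolve(name, context):
    """Return the best-matching candidate id, or None when the context gives no signal."""
    words = {w.strip(".,").lower() for w in context.split()}
    best_id, best_score = None, 0
    for candidate in GAZETTEER.get(name, []):
        score = len(candidate["hints"] & words)
        if score > best_score:
            best_id, best_score = candidate["id"], score
    return best_id

texas = resolve("London", "Shipping to a ranch outside London, TX.")
unknown = resolve("London", "Meet me in London.")
print(texas, unknown)
```

Returning `None` rather than a default when the context carries no hint keeps the guess out of the business logic, which is the concern raised above.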

The process of moving from source data to processed application data involves the loss of information. When you tidy up, you end up throwing stuff away. This underlines a principle of big data: when you can, keep everything. There may well be useful signals in the bits you throw away. If you lose the source data, there’s no going back.

Despite the popularity and well understood nature of relational databases, it is not the case that they should always be the destination for data, even when tidied up. Certain data types suit certain classes of database better. For instance, documents encoded as XML are most versatile when stored in a dedicated XML store such as MarkLogic. Social network relations are graphs by nature, and graph databases such as Neo4J make operations on them simpler and more efficient.

Even where there’s not a radical data type mismatch, a disadvantage of the relational database is the static nature of its schemas. In an agile, exploratory environment, the results of computations will evolve with the detection and extraction of more signals. Semi-structured NoSQL databases meet this need for flexibility: they provide enough structure to organize data but don’t require the exact schema of the data before storing it.
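The flexibility described here can be sketched with plain JSON documents held in a Python list. No particular NoSQL product is modeled: the field names are hypothetical, and the list simply stands in for a document store.

```python
import json

# Records with different fields, stored as-is; no schema is declared up front.
raw_documents = [
    '{"user": "ann", "clicks": 12}',
    '{"user": "bob", "clicks": 7, "location": {"city": "Austin"}}',
    '{"user": "cat", "sentiment": 0.8}',
]
store = [json.loads(doc) for doc in raw_documents]

def field_values(documents, field, default=None):
    """Read a field from every document, tolerating documents that lack it."""
    return [doc.get(field, default) for doc in documents]

# A newly extracted signal can be added to one document without migrating the rest.
store[0]["sentiment"] = 0.3

clicks = field_values(store, "clicks", default=0)
sentiments = field_values(store, "sentiment")
print(clicks, sentiments)
```

In a relational table, adding the `sentiment` signal would mean altering the schema for every row first. Here the cost moves to read time, where code must cope with missing fields.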

in practice

We have explored the nature of big data, and surveyed the landscape of big data from a high level. As usual, when it comes to deployment there are dimensions to consider over and above tool selection.

cloud or in-house?

The majority of big data solutions are now provided in three forms: software-only, as an appliance or cloud-based. Decisions between which route to take will depend, among other things, on issues of data locality, privacy and regulation, human resources and project requirements. Many organizations opt for a hybrid solution: using on-demand cloud resources to supplement in-house deployments.

big data is big

It is a fundamental fact that data that is too big to process conventionally is also too big to transport anywhere. IT is undergoing an inversion of priorities: it’s the program that needs to move, not the data. If you want to analyze data from the U.S. Census, it’s a lot easier to run your code on Amazon’s web services platform, which hosts such data locally, and won’t cost you time or money to transfer it.

Even if the data isn’t too big to move, locality can still be an issue, especially with rapidly updating data. Financial trading systems crowd into data centers to get the fastest connection to source data, because that millisecond difference in processing time equates to competitive advantage.

big data is messy

It’s not all about infrastructure. Big data practitioners consistently report that 80% of the effort involved in dealing with data is cleaning it up in the first place, as Pete Warden observes in his Big Data Glossary: “I probably spend more time turning messy source data into something usable than I do on the rest of the data analysis process combined.”

Because of the high cost of data acquisition and cleaning, it’s worth considering what you actually need to source yourself. Data marketplaces are a means of obtaining common data, and you are often able to contribute improvements back. Quality can of course be variable, but will increasingly be a benchmark on which data marketplaces compete.

culture

The phenomenon of big data is closely tied to the emergence of data science, a discipline that combines math, programming and scientific instinct. Benefiting from big data means investing in teams with this skillset, and surrounding them with an organizational willingness to understand and use data for advantage.

In his report, “Building Data Science Teams,” D.J. Patil characterizes data scientists as having the following qualities:

  • Technical expertise: the best data scientists typically have deep expertise in some scientific discipline.
  • Curiosity: a desire to go beneath the surface and discover and distill a problem down into a very clear set of hypotheses that can be tested.
  • Storytelling: the ability to use data to tell a story and to be able to communicate it effectively.
  • Cleverness: the ability to look at a problem in different, creative ways.

The far-reaching nature of big data analytics projects can have uncomfortable aspects: data must be broken out of silos in order to be mined, and the organization must learn how to communicate and interpret the results of analysis.

Those skills of storytelling and cleverness are the gateway factors that ultimately dictate whether the benefits of analytical labors are absorbed by an organization. The art and practice of visualizing data is becoming ever more important in bridging the human-computer gap to mediate analytical insight in a meaningful way.

know where you want to go

Finally, remember that big data is no panacea. You can find patterns and clues in your data, but then what? Christer Johnson, IBM’s leader for advanced analytics in North America, gives this advice to businesses starting out with big data: first, decide what problem you want to solve.

If you pick a real business problem, such as how you can change your advertising strategy to increase spend per customer, it will guide your implementation. While big data work benefits from an enterprising spirit, it also benefits strongly from a concrete goal.


Check out this Strata Conference presentation from 2011; Edd Dumbill chats up challenges with big data.


e-framework for education and research

January 19, 2012

JISC e-framework

What if across the education community there was one standard description for each course? What if researchers could avoid duplicating their efforts by working across institutions in virtual organizations?

As end users, both students and researchers can gain from a service oriented approach to linking digital systems and applications. Science student retention in the United States is dipping lower and lower, which threatens our standing as a leader in the global innovation economy. Exchanging Course Related Information (ECRI), the project that encompasses course marketing, quality assurance, enrollment and reporting requirements, puts front and center the correlation between loss of student retention and loss of funding. From a research perspective, the MyGrid project provides a shared toolkit for researchers to share, reuse and repurpose experiments. This is just one example of how restructuring the education and research framework can save valuable time and resources for learning and research institutions as well as the talented individuals who might comprise them.

I transcribed a section of this Joint Information Systems Committee (JISC) video in the hopes that anyone experienced with the SOA mentioned would share some feedback, either here or on diigo. Video transcript below.

The e-Framework for Education and Research is an initiative by the UK’s Joint Information Systems Committee (JISC) and Australia’s Department of Education, Science and Training (DEST). The primary goal of the e-Framework is to facilitate technical interoperability within and across education and research through improved strategic planning and implementation processes:

The service oriented approach to linking software systems and applications is transforming the way many organizations share data. this approach improves existing methods of data sharing by providing a service lab between systems. sharing data between applications is a well-established principle that works on a simple level. as systems expand, the solution that connects two software applications will often not work for a third. what began as a workable system then locks valuable data in a silo. sharing the data is still possible but requires adaptations and tweaks. the process is laborious and time consuming, leading to bottlenecks and overload. the reality of this world is unnecessary effort expended in duplication of data.

by applying a service lab to the applications you wish to connect up, data is offered up in a common format for reuse elsewhere. the service oriented approach works with existing software systems and does not require you to remove the monolithic application. placement applications reuse the links already made and can plug into the service lab without affecting other users of the data. new applications can be added to grow overall system architectures in the same way. data in each application is offered up as a service which any other application can consume. when the service oriented approach is evolved into a system-wide architecture, it allows connections and service-sharing opportunities between organizations nationally and globally.

Take monitoring of student progress for example. Results are logged by a virtual learning environment, but how much of that data is shared with a management system that is tracking progression against funding? If the motivation and commitment of struggling students is not addressed and they decide to walk away, there is a direct impact on funding. If the data is freed up and shared across a system using a service oriented approach, it could contribute to retaining student motivation and thereby maximizing income.
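
To make the student progress example a little more concrete, here is a rough sketch of my own (not from the video) of what "offering data up as a service" might look like. Everything in it is an assumption for illustration: the `ProgressService` class, the `ProgressRecord` format, the 40 percent pass mark, and the sample rows are all hypothetical. The point is only that two different consumers read the same common format without knowing how the virtual learning environment stores its results.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ProgressRecord:
    """Hypothetical common format in which progress data is offered up."""
    student_id: str
    module: str
    score: float  # percentage, 0 to 100


class ProgressService:
    """Hypothetical service wrapping a virtual learning environment (VLE)."""

    def __init__(self, vle_rows):
        # vle_rows stands in for the VLE's internal storage:
        # an iterable of (student_id, module, score) tuples.
        self._rows = list(vle_rows)

    def get_progress(self):
        """Return every logged result as a plain dict in the common format."""
        return [asdict(ProgressRecord(*row)) for row in self._rows]


def students_at_risk(service, pass_mark=40.0):
    """Consumer 1: a retention tool lists students with any score below the pass mark."""
    low = {r["student_id"] for r in service.get_progress() if r["score"] < pass_mark}
    return sorted(low)


def average_by_module(service):
    """Consumer 2: a management system averages scores per module."""
    scores = {}
    for r in service.get_progress():
        scores.setdefault(r["module"], []).append(r["score"])
    return {module: sum(vals) / len(vals) for module, vals in scores.items()}


# Made-up sample data, purely for illustration.
service = ProgressService([
    ("s1", "maths", 35.0),
    ("s2", "maths", 65.0),
    ("s1", "physics", 55.0),
])
print(students_at_risk(service))
print(average_by_module(service))
```

A third consumer could be added later by calling `get_progress()` in the same way, without touching the two functions above, which is roughly the property the transcript attributes to the service oriented approach.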


illustration: student loan debt

January 18, 2012

Outrage! How Student Loans are Destroying Us


freeing knowledge creation

January 17, 2012

unlock my posts

source: The Meta-Activism Project, “Fixing Peer Review, Freeing Knowledge Creation” by Mary C. Joyce

In my last post I vented some frustration about the inefficiencies of academia: how current standards for intellectual property, peer review, and tenure are actually limiting academia’s avowed goal of creating and disseminating knowledge, not only in the field of digital activism, but in others as well.

In this post I’d like to dig into one element of that problem: peer review. As the dysfunctional cornerstone of tenure, academic publication, and the validation of truth, peer review clearly has problems. Let’s start by looking at the current process. The graphic below is from the web site Understanding Science, created by the University of California Museum of Paleontology.  It shows a rational system of academic self-regulation in which ideas are filtered through a process of expert analysis before publication, ensuring that the final public product is accurate and useful:
image: peer review

Yet this diagram glosses over some key stages in the process in order to draw its picture of a rational system. It implies a single step between “studying something” and writing an academic article.  In reality, scientists and scholars (a broader term) often start by collecting data.  After they collect the data they analyze it and from that they draw conclusions.  Once they have those conclusions they can choose to write an academic article about them.

Assuming the process, which can take years, goes well, the result is a single academic journal article, which can be used by other scholars studying similar problems and which can be used by the scholars who wrote it to gain the employment security of tenure.

From this perspective it is pretty easy to see areas for improvement:

  1. Scholars Collect Data: Why just two guys at a telescope?  Why not hundreds of guys (and women) at hundreds of telescopes, compiling data in the cloud?
  2. Data Analysis: Why are these two guys analyzing their data alone?  Why don’t they share it so other scholars can bring their unique perspective, leading not to one set of conclusions but dozens or hundreds?
  3. Writing About It: Peer-reviewed journals clearly have their place, but there are many other options. Conclusions can be easily blogged, even during the analysis process, creating linkages and dialogue between the different scholars studying the data and engaging new scholars in processes of inquiry by offering multiple opportunities for public engagement and discussion.
  4. Peer Review Process: Assuming that an article is written for peer review, this process continues uninterrupted, except that, with proper coordination, there might be multiple articles, on multiple aspects of the data, that are also in this process.  In addition, peer review is not slowing publication. Much of the research has already been made public informally through blogging, open data sets, listservs, and other forms of sharing.
  5. More Than One Article: Not one but several sets of conclusions are made public. There may be a few journal articles in addition to dozens of blog posts and hundreds of discussions. In addition, the data and discussion of conclusions have been in the public domain throughout the formal peer review process, removing the delay of analysis by others.

By opening the process, data has been accessible to more people and has been processed by more minds, more quickly.  The final knowledge created is greater by almost any measure, be it by total publications, awareness of the issue, or individual insights generated.

Here’s how peer review could look:

image: Understanding Science

Why make these changes now? Because publication, mass collaboration, and remote coordination have never been easier or cheaper.  In fact, at the moment it is much harder to keep information secret than to let it be free (just ask the State Department or Stewart Brand).  Quick and broad transfer of information is the new normal.  When the goal is knowledge creation, why fight it?

Why haven’t these changes been made yet? Because there are institutional disincentives to sharing.  First of all, tenure largely rides on publication of books and peer reviewed journal articles.  If a scholar must publish in this manner in order to secure his livelihood, he is going to ensure that it is he who publishes the analysis of the data he collected.

Secretiveness about intellectual property (in industry even more than in academia) is often based on the supposition that someone smarter, with better insight, might take the information and turn it into a consumable product more quickly than the person who made the initial investment to collect the data.  In business, where each firm has an individual profit motive, this might make sense.  But in academia, where the goal should be to create knowledge, preventing someone with greater insight from using available data makes no sense at all.

Does any of this matter? Peer review, tenure, the ivory tower – sounds like pretty dry stuff.  I suppose in a field like art history or literary criticism, where the findings of research are of most interest to the scholarly community of that particular field, the current slow process of peer review might not have a grave effect.  However, in fields like medicine, environmental science, and digital activism new insights can affect the lives of millions of people.  (For a rare dramatization of how current academic processes cost lives, rent And The Band Played On, which addresses the deleterious effects of academic competition and secrecy in the race to fight AIDS.)

What is the role of organizations like the Meta-Activism Project? One of the roles of the Meta-Activism Project and other research projects outside of academia is to test-drive these new methods of knowledge creation, showing that more open processes do work.  Beyond the knowledge gained about digital activism through the Global Digital Activism Data Set (below), we hope that the project will have demonstration value in proving the merit of open and digital methods of knowledge creation.  Humanity today is beset by a range of complex and existential challenges.  We do ourselves a disservice when we throttle our own efforts to study these phenomena and find solutions.

