2017 — Year in Review

We (Catherine D’Ignazio and Rahul Bhargava) launched DataBasic.io in 2016, hoping to help other folks take a more creative, engaging and welcoming approach to learning how to work with data. Why? Learn more in our blog posts about building a data culture or building data literacy without complicated software. Hard to believe that was 2 years ago! We decided this January to share some of the ways we assess our impact. It is important to do this in public, to share our successes and own our weaknesses. So here is a quick review of some of the quantitative data about how people are using our online tools and activities.

This year we connected with new users, built out the tool suite, and helped people of all backgrounds work with data in fun and interesting ways. In addition to launching Connect the Dots, which helps you see how your data is connected, we led numerous workshops. These included nonprofits like the Maine Data Initiative and the World Food Program, more than a hundred journalism students in Greater Boston, librarians in Michigan and Massachusetts, arts-based organizations like Theater Communications Group, and even middle school girls at Greenwich Academy in Connecticut.

New users of DataBasic made up over ⅔ of our total 2017 audience

In 2017, DataBasic had over 18,000 new users! These new folks made up ⅔ of the total uses of our tools online. That is a lot of growth over the last year, so we are excited that the outreach we are doing is working!

International visitors made up more than ½ of our total users

Surprisingly, just under half of new users came from the United States. We’ve built out our suite in Spanish and Portuguese, so we are trying to fill the need in other languages, but we still expected the majority of our users to be in the US. Over 50% of users in 2017 originated from countries like Spain, the UK, and Taiwan. This suggests our audience is international, and perhaps that we should get some help translating our tools into French! We do this translation and localization work because we think it is super important to build tools for learners that are left out of the data hype right now, and our experience has taught us that non-English speakers make up a big chunk of that group. This data supports our approach and motivates us to keep expanding our multilingual offerings.

People visiting the site are actually using it

Websites typically measure “conversion” rates to assess how many people go from a homepage to some kind of result that indicates they did something. For us, conversion is about whether people try out one of our tools or not (i.e., do they get to a page showing results of analyzing some data?). Across all our tools this hovers around 40%. That might sound low, but for the web it is a surprisingly high number! That tells us that something is working about the way we have built DataBasic. We’ll be doing some more qualitative analysis to understand why, but it is a great start to understand what is happening with the quantitative data.
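For anyone who wants to run the same kind of check on their own site, the arithmetic behind a conversion rate is simple. Here is a minimal Python sketch; the function name and the visitor counts are made up for illustration and are not taken from our analytics.

```python
def conversion_rate(visitors: int, converted: int) -> float:
    """Return the fraction of visitors who reached a results page."""
    if visitors <= 0:
        raise ValueError("visitors must be a positive number")
    if not 0 <= converted <= visitors:
        raise ValueError("converted must be between 0 and visitors")
    return converted / visitors


# Made-up counts for illustration only: 500 homepage visitors,
# 200 of whom went on to see a results page.
rate = conversion_rate(visitors=500, converted=200)
print(f"Conversion rate: {rate:.0%}")  # prints "Conversion rate: 40%"
```

The same function can be applied per tool to compare how often each one is actually tried.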

WordCounter, our approachable tool for analyzing text, was most heavily utilized in terms of number of users, number of sessions, and session duration. This isn’t too surprising, because it is the tool we use most in workshops. It is also worth mentioning that this list of conversion rates matches the order of tool links on our homepage from left to right. Perhaps we should put in some code to randomize this and see if it impacts which tools folks try out.
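The randomization idea could be sketched roughly like this. This is a hypothetical Python sketch, not code from the site: the `shuffled_tools` helper is invented for illustration, and the starting order of the list is an assumption, since we have not spelled out the homepage order here.

```python
import random

# The four DataBasic tools; this starting order is an assumption.
TOOLS = ["WordCounter", "WTFcsv", "SameDiff", "Connect the Dots"]


def shuffled_tools(tools, rng=None):
    """Return a new list with the tools in random order.

    The input list is left untouched. Pass a seeded random.Random
    as rng to get a repeatable order while testing.
    """
    rng = rng or random.Random()
    order = list(tools)  # copy so the original order is preserved
    rng.shuffle(order)
    return order


# Each page load would get its own order of homepage links.
print(shuffled_tools(TOOLS))
```

To learn anything from this, the order shown to each visitor would need to be recorded alongside which tool they tried, so that the effect of position can be separated from the appeal of the tool itself.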

Folks are using our learning-centered features

We wrote an academic paper a while back about how and why we built DataBasic to support data literacy (read it in the Journal of Community Informatics). Now we have enough usage data to assess whether the design principles we describe are effective in practice or not. A quick and easy one to evaluate is the tooltips feature we’ve included on vocabulary to be more inviting.

Tooltips saw healthy usage: more than two-thirds of new users used at least one tooltip! Again, we have to do some qualitative interviews and such to understand why, but knowing that the engagement on these explanatory tooltips is so high suggests this particular feature is having an impact.

Coming Soon…

That’s it for now! If DataBasic is new to you, check out the hands-on activities and focused tools to help you introduce a data culture to your organization in interesting ways. Thank you for being a part of our journey in 2017.

Up next? Over the last few months we’ve been testing out new ways to help organizations run these activities themselves. 30 brave organizations from around the world partnered with us, and we’ve learned loads from how they tailor the activities to their specific needs. We’re putting all that learning and experience together into a self-service curriculum for organizations trying to build their capacity to work with data. We’re calling it “Data Culture Project”, and it’ll launch with an introductory webinar online this March! Totally free. Available online. So stay tuned….

Many thanks to Connie Yee for her work analyzing all this data.

DataBasic at Harvard Law’s Systemic Justice Project

Rahul was invited again this year to join Professor Jon Hanson’s Systemic Justice course at Harvard, to introduce law students to how to include data within their arguments. The DataBasic activities provided a perfect way to explore asking questions (with WTFcsv) and sketching stories (with WordCounter).

The students also got a chance to practice tailoring data-driven arguments to different audiences, using a new participatory activity that we’re still workshopping.


New DataBasic Tool Lets You “Connect the Dots” in Data

Catherine and I have launched a new DataBasic tool and activity, Connect the Dots, aimed at helping students and educators see how their data is connected with a visual network diagram.

By showing the relationships between things, networks are useful for finding answers that aren’t readily apparent through spreadsheet data alone. To that end, we’ve built Connect the Dots to help teach how analyzing the connections between the “dots” in data is a fundamentally different approach to understanding it.

The new tool gives users a network diagram to reveal links as well as a high-level report about what the network looks like. Network analysis helped Google revolutionize search technology and was used by journalists who investigated the connections between people and banks during the Panama Papers leak.
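To make the idea of "connections between the dots" concrete, here is a minimal Python sketch of turning pairs of linked things into a network and counting how many links each one has. It is not how Connect the Dots is implemented; the function names and the sample pairs are invented for illustration.

```python
from collections import defaultdict


def build_network(pairs):
    """Turn (a, b) pairs into a mapping of each node to its neighbors."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)  # links go both ways in this sketch
    return dict(neighbors)


def degree_report(neighbors):
    """List (node, number of links), most connected first, ties by name."""
    counts = ((node, len(links)) for node, links in neighbors.items())
    return sorted(counts, key=lambda item: (-item[1], item[0]))


# Invented sample data: people and the restaurants they named.
pairs = [
    ("Ana", "Taco Stand"),
    ("Ana", "Noodle Bar"),
    ("Ben", "Noodle Bar"),
    ("Cy", "Noodle Bar"),
]

for node, links in degree_report(build_network(pairs)):
    print(f"{node}: {links}")
```

A spreadsheet of these pairs shows four separate rows; the network view makes it obvious that one restaurant ties three people together.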

Connect the Dots is the fourth and most recent addition to DataBasic, a growing suite of easy-to-use web tools launched last year to make data analysis and storytelling more accessible to a general, non-technical audience.

As with the previous three tools released in the DataBasic suite, Connect the Dots was designed so that its lessons can be easily planned to help students learn how to use data to tell a story. Connect the Dots comes with a learning guide and introductory video made for classes and workshops for participants from middle school through higher education. The learning guide has a 45-minute activity that walks people through an exercise in naming their favorite local restaurants and seeking patterns in the networks that result. To get started using the tool, sample data sets such as Donald Trump’s inside connections and characters from the play Les Miserables have also been included to help introduce users to vocabulary terms and the algorithms at work behind the scenes. Like the other DataBasic tools, Connect the Dots is available in English, Portuguese, and Spanish.

Learn more about Connect the Dots and all the DataBasic tools here.

Have you used DataBasic tools in your classroom, organization, or personal projects? If so, we’d love to hear your story! Write to [email protected] and tell us about your experience.

Designing Tools for Learners (Not Users)

We (Catherine and Rahul) just co-authored an article in the Journal of Community Informatics called Design Principles, Tools and Activities for Data Literacy Learners. In it, we make the case that most tools that help people work with data prioritize flashy visualizations and outputs rather than helping to scaffold a learning process. This ends up making the process of data analysis like a black box (especially for people from non-technical backgrounds). We pose the question: what would it be like if we designed tools for learners rather than users? We offer four qualities that a tool designed for learners should aspire to have: focused, guided, inviting, and expandable. We go on to talk about DataBasic as a case study. Here are the four qualities:

A focused tool strives to do one thing well. These tools are easily learnable and relatively constrained. Focused tools do not provide many types of options, and thus can provide a low entry point for data literacy learners. They create a small playground that is rich enough for the learner to play within, but not so rich that they get lost.

A guided tool is introduced with strong activities to get the learner started. Blank-slate websites and software packages require novice users to imagine usage scenarios. Guided tools combat this by introducing themselves with an activity that holds the learner’s hand as they get started. These tools might immediately present an on-ramp for learners via example data and example outputs.

An inviting tool is introduced in a way that is appealing to the learner. This might involve using data on a topic that is relevant or meaningful to them, or simply using humor and playfulness to invite the learner to experiment.  Inviting tools make conscious decisions about visual design, user interface and copywriting to offer a consistent, appealing, and non-intimidating invitation to the learner. Inviting activities use familiar materials to produce playful outputs that attract interest and excitement from learners.

An expandable tool is appropriate for the learner’s abilities, but also offers them paths to deeper learning (perhaps by leaving the tool and graduating to more complicated tools).  They overcome a single-minded focus by including call-outs and capabilities that allow the learner an opportunity and pathway to learn more about how the tool works.  Expandable tools recognize that they are steps along the path to building stronger data literacy for the learner, and help bridge from previous work to next steps.

Check out the full paper here. It is part of a special issue on data literacy published by the Journal of Community Informatics.


Workshopping a New Tool

DataBasic is constantly developing, driven by the needs we see in the communities around us and by our ability to get funding to support development and testing. Over the summer we’ve been designing and developing a new tool we’re calling “Connect the Dots”. The goal is to help introduce the language and approach of network analysis, and how it can let you ask different kinds of questions.

Since we don’t build new tools in isolation, we brought together a bunch of educators, journalists, students, visualization folks, and designers to give us feedback on our first design.  Here’s a “behind the scenes” video of that first workshop, where we tried out the technology and the activity.

Looks like fun, doesn’t it? So keep your eyes peeled – we expect to launch Connect the Dots as the latest member of the DataBasic suite later this fall!

Workshop at the Data Literacy Conference

We hosted a workshop titled “Making Data Fun with an Arts-Based Approach” for attendees at the Data Literacy Conference. Participants used WTFcsv and WordCounter to learn how to ask questions and sketch out a story. The audience included government agencies, journalists, educators, and others. This diversity generated a really good conversation about how they could use these tools and approaches in their own work.

One of the groups made a fun image about how, when they looked at a heavy metal band’s lyrics with WordCounter, they found them to be much more lovey-dovey than they had expected!

Here’s the abstract describing the workshop:

Looking for creative ways to present data, empower community, and create art? We will share hands-on techniques for bringing people together to find and tell stories with data. Our activities can help you run workshops that lead a group through finding stories in data, picking a story to tell, and sketching or building something to tell that story. You will walk away with skills to facilitate capacity building and critical data literacy activities with youth and adults.

Big Data and Development at the MIT Media Lab

We conducted a workshop as part of the Data-Pop Alliance’s Global Professional Training Program on Big Data and Development at the MIT Media Lab. Data-Pop’s program focuses on building capacity for working with data for global professionals who are involved in development work and policymaking. You can read more about their approach here. In attendance were around 30 folks from universities, the civil sector, and government from a variety of countries, including Colombia, Senegal, France and the US.

Our workshop was titled Big, small, and popular data: engaging communities with data. First, we did a group critique of an infographic about global food production. This followed the structure of DataTherapy’s Activity: Critique a Gallery of Visualizations where we explored the story’s message, the audience and the visual techniques they used to tell a data-driven story. We then showed a basic process for working with data and gave some examples of how you can build in stakeholder participation at every stage of the process. The GoBoston2030 public engagement process run by the City of Boston for their transportation master planning process is a great example of this in government. They did community data analysis and interpretation events in order to make meaning out of thousands of qualitative data stories that they collected from citizens.

Finally, we presented the basic design goals of DataBasic and participants worked in groups to tell a story from quantitative text data using WordCounter and to ask questions of a spreadsheet using WTFcsv. Groups came up with compelling ways to tell stories about their data in less than ten minutes. We had circle diagrams, sophisticated Simpsons’ cartooning and compelling concepts. We followed the workshop with a Q&A about how to take simple, participatory methods back to their contexts.


DataBasic at Boston Civic Media

Rahul recently led a short workshop on DataBasic at the Boston Civic Media annual meeting. The one-hour hands-on session focused on how to use the arts to tell your civic data story.

Curious about how to use the arts to tell your civic data story? Wondering how to use the arts to help learn how to work with data?  We will introduce a suite of free easy-to-use web tools for beginners that introduce how to work with data through arts-based activities.  Finding and telling data-driven stories can help you streamline operations, spread your message, and bring people together for creative activities.  You’ll walk away with new skills for yourself, fun techniques for helping others learning to work with data, and inspirations for how to combine civic data and the arts.


Participants used WordCounter to try sketching out stories based on lyrics from popular musicians.

The workshop brought together a diverse set of people interested in using the arts to engage people with data.  Reaching these non-traditional consumers of data is super important to us, so it was a joy to be able to engage with them.

Our round-up of DataBasic workshops and demos from Spring 2016

This has been an eventful spring for DataBasic! After launching in January to great success, we have been traveling to classes, conferences and workshops to help different groups of people learn about working with data.

Catherine led four workshops for graduate and undergraduate journalism students at Emerson College. The students learned how to work with qualitative and quantitative datasets from open data portals and start telling stories. Rahul led a workshop for his Data Storytelling Studio class at MIT. Students studying art and civic engagement at Emerson College used DataBasic to analyze citizen ideas for the future of transportation in Boston, which they later presented to guests from the City of Boston and State of MA.

Rahul led a workshop for a coalition of organizations that work with youth in the arts sector. It was an exciting chance to share how the DataBasic approach can help arts organizations think about telling stories of their impact with the rich qualitative and quantitative data they have. The group loved WTFcsv’s visual approach to finding stories.

The Institute for Infinitely Small Things, a public art group, used DataBasic as part of their project Campaign Limericks, where they worked with students and community members to create limericks out of the top phrases spoken by presidential candidates. Want to check out our corpus of candidates’ speeches? There are over 100K words for Trump, Clinton, Cruz and Sanders. Catherine and the Institute later created an art installation and data visualization of four limericks at the Harvard Center for American Political Studies.

Four large limerick visualizations created from DataBasic analysis are up at the Harvard Center for American Political Studies in Cambridge through August 2016.

In April, Rahul shared DataBasic as a tool for participatory data analysis at TICTeC 2016 in Barcelona. Organized by mySociety, the conference showcased many different technologies and methods for evaluating the impact of civic technologies for people from 20+ countries.


Catherine spoke about Data & Community Engagement at the White House and demo’d DataBasic to 150+ people in law enforcement and technology at the White House’s celebration of one year of the Police Data Initiative. More than 53 police departments across the country have signed on to opening up their data in the next year. The event was inspiring and showcased law enforcement departments like Dallas and Orlando who are at the cutting edge of transparency and community engagement.

Also, Catherine got to take a photograph with U.S. CTO Megan Smith, which was kind of awesome!

And in early May, Catherine ran a Data Storytelling 101 workshop for journalists on the education beat at the Education Writers Association conference. We worked with data from the Chicago Public School system on student suspensions and started asking questions about race and school ratings in conjunction with suspensions. We also spent a good portion of our time talking about cleaning and merging data.

The spring wrapped up with a workshop for 50 municipal government workers participating in the CityAccelerator project organized by Living Cities and facilitated by Eric Gordon and the Engagement Lab in New Orleans. Teams from Seattle, Albuquerque, Baltimore, Atlanta and New Orleans worked on analyzing citizen comments with WordCounter and spreadsheets related to their accelerator projects with WTFcsv. We also brainstormed other datasets internal to their organizations that they might use with DataBasic.


We are thrilled at the reception up to this point and have learned a lot from our participants’ ideas about how they can use DataBasic in the context of journalism, media literacy, the arts, community engagement and local government.

v1.2.0: New Features and Bug Fixes

We just released a new version of DataBasic – version 1.2.0!  Yeah, it doesn’t sound that exciting but trust us, there is some good stuff here 🙂

The highlights:

  • CSV parsing is now much more robust
  • We’ve added the idea of “normalization” to SameDiff
  • We cleaned up some of our sample data
  • We sped up everything
  • We show you when your results are going to expire soon (they last for 60 days and then we delete them)

You can browse our GitHub Issue tracker to see all the bug fixes in this release.

Go try it out now!