DataBasic.io is live!!! After 6 months of planning, building, trying things out with folks, and rebuilding, we’re open to the public 🙂
We’ve got three tools for you to start playing with – WTFcsv, WordCounter, and SameDiff. Pop on over to https://databasic.io and give them a try. Right now the tools support Spanish and English, and they are accessible to visually impaired users via screen-reader software.
Don’t forget to watch the short intro videos on each homepage, and check out the activity guides.
Big thanks to the Emerson Engagement Lab team for helping us get these done – specifically Jay and Jordan! And of course this wouldn’t have been possible without the support of our sponsor, the amazing Knight Foundation.
Remember those pictures from the video shoot that we shared recently? Well, the editing is all done! Here are three videos that will be featured on the tools’ homepages. These will be launching soon, so stay tuned…
Welcome to DataBasic
We just held our second BETA workshop to test out the DataBasic tools and activities. Our invitation brought a wonderfully diverse set of journalists, students, community organizers, educators, and folks who work in the arts! We had to limit attendance to 40 people, simply because of the size of the room.
We gathered another round of invaluable feedback, documented some more initial uses, brainstormed potential applications, and had a bunch of fun! Here are two quick drawings participants made, comparing the lyrics that various musicians use with WordCounter and SameDiff:
We’re hosting a BETA workshop on our DataBasic suite of tools on Tues 12/8, from 6-8pm. This workshop is designed for journalists, educators and community organizations that are just starting to work with data. Register on our Eventbrite page:
The tools focus on understanding what is in a CSV file, and also starting to analyze large sets of text data in quantitative ways. We introduce each with a fun, hands-on activity, so it isn’t just staring at screens all evening 🙂 Read the invite for more about why you might want to attend. We’d appreciate your help testing these tools out and want to get more feedback from real folks before we launch them publicly!
(Plus free dinner!)
Sometimes online tools for working with data can be confusing and overwhelming when you first visit them. One way we can try to address this is by having short, friendly introductory videos to tell you why you might want to use each tool. We wrote some scripts, found some clothes that match the logos, and started shooting video intros for each of the three tools.
They are in post-production now, but you’ll be able to watch them soon on the homepage of each tool in our suite! Here are some photos to whet your appetite.
Trying to look casual is hard!
Haven’t had to do this much memorizing since grade school
On November 8th, 2015, we ran our first pilot workshop of the DataBasic suite of tools at the MIT Media Lab. We hosted around 10-12 people, mostly trusted friends who we could rely on to be honest but kind about the inevitable bugs and shortcomings of the tools at this early stage. For this event, we first outlined our high-level design goal: Design tools that support data literacy learners, not just folks who already know what they are doing with data. We also introduced them to the principles behind DataBasic and noted that our target audiences are journalists, educators, community organizations and students.
We then introduced our three tools, one at a time, with activities that we have designed to teach the tools in a fun way. For example, in WTFcsv – a tool that provides column-by-column descriptions of .csv files – learners chose to develop data-driven questions around UFO sightings. Did particular cities have higher per-capita sightings of UFOs? Why did so many people see UFOs in the form of “fireballs”? If we combined this data with weather data, would some interesting patterns surface?
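For the curious, here is a rough idea of what a column-by-column description of a CSV file can look like in code. This is a minimal Python sketch, not WTFcsv's actual code: the function name `summarize_csv`, the rule for deciding whether a column is numeric, and the three sample rows are all made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_csv(text):
    """Return a per-column summary for CSV text whose first row is a header."""
    rows = list(csv.DictReader(io.StringIO(text)))
    summary = {}
    for column in (rows[0].keys() if rows else []):
        # Ignore empty cells so they do not skew the counts.
        values = [row[column] for row in rows if row[column] not in (None, "")]
        try:
            numbers = [float(value) for value in values]
        except ValueError:
            numbers = None  # at least one value is not a number
        if numbers:
            summary[column] = {
                "type": "number",
                "count": len(numbers),
                "min": min(numbers),
                "max": max(numbers),
                "mean": sum(numbers) / len(numbers),
            }
        else:
            summary[column] = {
                "type": "text",
                "count": len(values),
                "most_common": Counter(values).most_common(3),
            }
    return summary


# Invented rows in the spirit of the UFO sightings activity (not real data).
sample = """city,shape,duration_minutes
Boston,fireball,5
Austin,disk,12
Boston,fireball,3
"""

for column, stats in summarize_csv(sample).items():
    print(column, stats)
```

Seeing the most common values in a text column is the kind of summary that can prompt a question like the "fireballs" one above.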
If Kanye West and Elvis Presley had a song-baby, the lyrics would probably look like this.
WordCounter and SameDiff teach basic principles of quantitative text analysis. In the activities for these tools, learners worked with sample data from musicians’ lyrics. They used crayons and simple drawings to illustrate patterns from individual artists’ lyrics as well as results from comparative analysis (SameDiff). The above image presents a sample song of the kind Kanye West and Elvis Presley might write together.
Overall, we learned a great deal from our learners. They had excellent ideas to make the tools more fun, approachable and instructive for new users. We also learned where we had over-complicated things and needed to go back and simplify. And, of course, we had lots of small fixes and feature suggestions that we are working on for our public launch in late December. Stay tuned!
Small fixes and features – suggestions from an astute group
DataBasic’s Guiding Principles
These are the design principles that we used to build DataBasic. They come from our paper Designing Tools and Activities for Data Literacy Learners. In order to support learners (rather than users who already know what they are doing), we say that tools should be:
- Centered around one user activity
- Doesn’t have too many options
- Newbies can do something meaningful quickly
- Has sample data baked in
- Can be run from home page
- Clear, contextual documentation to get new users started
- Has a sense of humor!
- Described in non-technical language
- Can be used in a real context
- Built for novices
- Includes information about how it works (not a black box)
- Places itself in a pipeline of analysis
DataBasic is a suite of focused and simple tools and activities for journalists, data journalism classrooms and community advocacy groups. We’re happy to announce that we’ve received funding from the Knight Foundation to build and test DataBasic over the next 6 months!
What is DataBasic?
Though there are numerous data analysis and visualization tools for novices, we have identified some significant gaps through prior research. DataBasic is designed to fill these gaps for people who do not know how to code, and to provide a low barrier to further learning about data analysis for storytelling.
In the first iteration of this project we will build three tools, develop three training activities and run one workshop with journalists and students for feedback. The three tools include:
- WTFcsv: A web application that takes as input a CSV file and returns a summary of the fields, their data type, their range, and basic descriptive statistics. This is a prettier version of R’s “summary” command and aids at the outset of the data analysis process.
- WordCounter: A basic word counting tool that takes unstructured text as input and returns word frequency, bigrams (two-word phrases) and trigrams (three-word phrases).
- SameDiff: A tool that gives you various ways to compare two text documents, to see how they are similar and/or different.
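
To make the text-analysis vocabulary above a little more concrete, here is a minimal Python sketch of counting words, bigrams and trigrams, plus one very simple way to compare two documents. It is only an illustration under our own assumptions, not the code behind WordCounter or SameDiff: the helper names (`tokenize`, `ngram_counts`, `compare`), the tokenizing rule, and the two sample lines are all invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(words, n):
    """Count every run of n consecutive words (n=2 gives bigrams)."""
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def compare(text_a, text_b):
    """Split the vocabulary of two texts into shared and unique words."""
    a, b = set(tokenize(text_a)), set(tokenize(text_b))
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


# Two made-up lines standing in for documents (not real lyrics).
first = "the night is long and the night is cold"
second = "the day is long"

words = tokenize(first)
print(ngram_counts(words, 1).most_common(3))  # single-word frequency
print(ngram_counts(words, 2).most_common(3))  # bigrams
print(ngram_counts(words, 3).most_common(3))  # trigrams
print(compare(first, second))
```

Comparing sets of words is just one of many possible ways to look at similarity; it ignores how often each word appears.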
More importantly, we’ll be providing an introductory video and simple training activities for each tool as a way to scaffold learning about data analysis at the same time as doing it. These activities will include fun datasets to start off with, and introduce vocabulary terms and the algorithms at work behind the scenes. We strongly believe in building tools for learners, and will be putting those ideas into practice on these tools and activities.
Who is Building This?
Catherine D’Ignazio is an Assistant Professor of Data Visualization and Civic Media at Emerson College and a Fellow at the Engagement Lab. She has a background in software development, media analysis and the arts and currently teaches journalism students.
Rahul Bhargava is a Research Scientist at the MIT Center for Civic Media. He works in quantitative media analysis and leads data literacy workshops for students and community groups.
Is it Ready Yet?
We are still developing the first prototypes so we can try them out with folks. Expect to see more updates here as we build them out over the fall.