Teddy, the Review Explorer

This page contains the source code and supplementary material for our CHI 2020 paper: "Teddy: A System for Interactive Review Analysis".

  1. Introduction
  2. Motivation: An Interview Study into Review Analysis Practices and Challenges
  3. How to use the data and source code in this repo?
  4. The Dataset
  5. Citing Teddy
  6. Contact

Introduction

Teddy (Text Exploration for Diving into Data deeplY) is an interactive system that enables data scientists to quickly obtain insights from reviews and improve their extraction and modeling pipelines. Please watch the demo video for an overview of the system and our contributions.

You can also try Teddy online here!

Above: The Teddy user interface. From left to right we have the Entity View displaying the entities mentioned in reviews, the Cluster View for exploring aggregate statistics over hierarchical clusters of reviews, the Detail View for viewing and filtering/sorting individual reviews, and the Schema Generation View for recording aspects of interest from the reviews.



Above: The Teddy review exploration pipeline. Users can customize the data processing pipeline based on their task, whether it is classification, opinion extraction, or representation learning, and use Teddy to gain insights about their data and model. They can also use the application to iterate on the data processing pipeline, for example by creating a new schema that describes attributes of their review corpus.

Motivation: An Interview Study into Review Analysis Practices and Challenges

We conducted an interview study with fifteen participants to better understand the workflows and rate-limiting tasks of data scientists working on reviews, which motivated the development of features in Teddy. We used an iterative coding method to aggregate the collected data.

Download the results of our iterative coding here.

Anonymized notes from individual interviews and our interview question template are also available in the results/ folder.

How to use the data and source code in this repo?

Important Folders

  • app/ server and front-end code
  • data/ subdirectories containing Trip Advisor data or your own datasets
  • libs/ python libraries for data processing
  • tests/ testing code for the code in libs/

Local Installation

Teddy requires Python 3.5 or above. Make sure you have venv installed. If you don't, run python3 -m pip install virtualenv.

Copy the contents of /app/react-app/.env.example to /app/react-app/.env.

# Install dependencies
make install ENV=local
# Build dependencies
make build
# These will automatically run in a virtual environment called 'venv'
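The prerequisites above can also be checked from Python before running make. The following is a minimal sketch, not part of the repository: the function names check_python, has_venv, and prepare_env_file are hypothetical, and it assumes the /app/react-app/ path in the instructions is relative to the repository root.

```python
import shutil
import sys
from pathlib import Path


def check_python(min_version=(3, 5)):
    """Return True if the running interpreter meets the stated minimum version."""
    return sys.version_info[:2] >= min_version


def has_venv():
    """Return True if the standard-library venv module can be imported."""
    try:
        import venv  # noqa: F401
    except ImportError:
        return False
    return True


def prepare_env_file(repo_root):
    """Copy app/react-app/.env.example to .env unless .env already exists.

    Returns True if a copy was made, False if .env was already present.
    Assumption: repo_root is the top-level directory of the checkout.
    """
    react_dir = Path(str(repo_root)) / "app" / "react-app"
    example = react_dir / ".env.example"
    target = react_dir / ".env"
    if target.exists():
        return False  # never overwrite an existing .env
    shutil.copyfile(str(example), str(target))
    return True
```

From the repository root, calling prepare_env_file(".") would create the .env file once and leave it alone on later runs, so local edits to it are not lost.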

API Keys (Optional)

Google API keys are needed only to render the map and the hotel images. See the Google Maps Platform documentation for how to obtain an API key, then enable the Maps JavaScript API and the Places API.

Running the Application

# start the backend server
make server
# start the user interface
make ui

Then navigate to http://localhost:3000 in your browser.
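If the page does not load, it can help to confirm that something is actually listening on the port. The sketch below is an illustrative helper, not part of Teddy; is_listening is a hypothetical name, and port 3000 is the only port stated in this README.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

After running make ui, is_listening("localhost", 3000) should return True once the user interface has finished starting.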

The Dataset

The reviews included to demonstrate the application are provided by Trip Advisor under the Creative Commons Attribution Non-Commercial 4.0 International License.

(Barkha Bansal. (2018). TripAdvisor Hotel Review Dataset. Zenodo. http://doi.org/10.5281/zenodo.1219899).

A subset of the reviews for San Francisco hotels has been selected and modified by (1) extracting (aspect, opinion) pairs and (2) clustering the reviews and computing statistics over those clusters.
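To make step (2) concrete, here is a toy sketch of computing per-cluster statistics over reviews that carry (aspect, opinion) pairs. Everything in it is an assumption for illustration: the record layout, the field names, the sample reviews, and the chosen statistics are hypothetical and do not reflect the files in data/ or the code in libs/. It also uses flat cluster ids, whereas Teddy's clusters are hierarchical.

```python
from collections import Counter, defaultdict

# Hypothetical records: a cluster id, a star rating, and extracted
# (aspect, opinion) pairs. This layout is invented for the example.
reviews = [
    {"cluster": 0, "rating": 5, "pairs": [("room", "clean"), ("staff", "friendly")]},
    {"cluster": 0, "rating": 3, "pairs": [("room", "small")]},
    {"cluster": 1, "rating": 2, "pairs": [("wifi", "slow")]},
]


def cluster_stats(records):
    """Return review count, mean rating, and most frequent aspect per cluster."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["cluster"]].append(rec)

    stats = {}
    for cid, recs in grouped.items():
        # Count how often each aspect is mentioned across the cluster's reviews.
        aspects = Counter(aspect for r in recs for aspect, _ in r["pairs"])
        stats[cid] = {
            "n_reviews": len(recs),
            "mean_rating": sum(r["rating"] for r in recs) / len(recs),
            "top_aspect": aspects.most_common(1)[0][0] if aspects else None,
        }
    return stats


stats = cluster_stats(reviews)
```

For the sample records, cluster 0 has two reviews with a mean rating of 4.0 and "room" as its most frequent aspect.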

Some of the icons used in our application are made by Freepik and can be found at www.flaticon.com.

Citing Teddy

Please cite the CHI paper.

@inproceedings{zhang2020teddy,
 title = {Teddy: A System for Interactive Review Analysis},
 author = {
   Xiong Zhang AND
   Jonathan Engel AND
   Sara Evensen AND
   Yuliang Li AND
   {\c{C}}a{\u{g}}atay Demiralp AND 
   Wang-Chiew Tan
   }, 
 booktitle = {ACM Human Factors in Computing Systems (CHI)},
 year = {2020}
}

Contact

To get help with problems using Teddy or replicating our results, please submit a GitHub issue.

For personal communication related to Teddy, please contact Jonathan Engel ([email protected]), Sara Evensen ([email protected]), or Çağatay Demiralp ([email protected]).