9781449326265

Agile Data

Russell Jurney

Printed in the United States of America.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or .

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. !!FILL THIS IN!! and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Revision History
YYYY-MM-DD
First release

Preface
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
I. Setup
1. Theory
Agile Data
Big Words Defined
Agile Data Teams
Opportunity and Problem
Adapting to Change
Agile Data Stacks
Agile Data Process
Code Review and Pair Programming
Agile Environments: Engineering Productivity
Collaboration Space
Private Space
Personal Space
Realizing Ideas with Large Format Printing
2. Data
Email
Working with Raw Data
Raw Email
Structured vs. Semi-Structured Data
SQL and NoSQL
Serialization
Extracting and Exposing Features in Evolving Schemas
Data Pipelines
Data Structures as Perspectives
Social Networks
Time Series
Natural Language
Probability Theory
Conclusion
3. Agile Tools
Scalability = Simplicity
Agile Data Processing
Setting up a Virtual Environment for Python
Serializing Data with Avro
Avro for Python
Collecting Data
Data Processing with Pig
Introduction
Installing Pig
Publishing Data with MongoDB
Introduction
Installing MongoDB
Installing MongoDB's Java Driver
Installing mongo-hadoop
Pushing data to MongoDB from Pig
Searching Data with ElasticSearch
Installation
ElasticSearch and Pig with Wonderdog
Reflecting on our Workflow
Lightweight Web Applications
Python and Flask
Conclusion
Presenting our Data
Introduction
Installing Bootstrap
Booting Bootstrap
Visualizing Data with D3.js
Summary
4. To the Cloud!
Introduction
Github
Heroku
Echo on Heroku
Heroku Workers
Amazon Web Services
Simple Storage Service - S3
Elastic MapReduce
MongoDB as a service
Instrumentation
Google Analytics
Logentries to S3
5. Cloud Patterns
Introduction
Scaling Up
Shifting Processing
Building Indexes
Creating Keysets: Curating Ontologies
II. Climbing the Stack
6. The Data Value Stack
Introduction
Climbing the Stack
7. Collecting and Displaying Records
Introduction
Putting it all together
Collect and Serialize our Inbox
Process and Publish our Emails
Presenting Emails in a Browser
Serving emails with Flask and pymongo
Rendering HTML5 with Jinja2
Checkpoint
Listing Emails
Listing Emails with MongoDB
Anatomy of a Presentation
Searching our Email
Indexing our Email with Pig, ElasticSearch and Wonderdog
Searching our Email on the Web
Conclusion
8. Visualizing Data with Charts
Introduction
Good Charts
Extracting Entities: Email Addresses
Introduction
Extracting Emails
Visualizing Time
9. Exploring Data with Reports
Introduction
Building Reports with Multiple Charts
Linking Records
Conclusion
10. Making Predictions
Working with Sparse Data
Predicting Response Rates to Emails
Personalization
Conclusion
11. Driving Actions
Introduction
Site last updated on: July 26, 2012 at 05:06:45 AM PDT