Python Data Analysis with Pandas and Matplotlib Workshop

Date: April 17-18, 2018 9am-4:30pm

Location: Information Technology Center Room 105A/B, UH Manoa

Presenters: Mahdi Belcaid (HDSI), Sean Cleveland (UH), Ron Merrill (UH), David Schanzenbach (UH), and Jennifer Geis (UH)

This FREE workshop is sponsored by the Hawai’i Data Science Institute, the University of Hawai’i Information Technology Services Cyberinfrastructure group, and Hawai’i EPSCoR.

This workshop focuses specifically on the Python skills necessary for data analysis – as opposed to software development – and introduces some of the libraries that have made Python a popular alternative for working with data at any scale.

Takeaways:

By the end of this workshop students will be able to:

  • Work with the Pandas library to conduct essential data analysis tasks such as reading, exploring, filtering, and summarizing data.
  • Slice, shape and pivot tables.
  • Implement calculations on rows, columns, and tables.
  • Use split-apply-combine to summarize data.
  • Merge, concatenate and filter data from multiple sources.
  • Visualize data using Matplotlib.
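
As a rough preview of the tasks listed above, here is a minimal sketch of filtering, split-apply-combine, pivoting, and merging with pandas. The tables, column names, and values are invented for illustration and are not taken from the workshop notebooks or data files.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration only.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "year": [2016, 2017, 2016, 2017],
    "units": [10, 15, 7, 12],
})
regions = pd.DataFrame({
    "region": ["East", "West"],
    "manager": ["Ana", "Kai"],
})

# Filtering: keep only the rows that satisfy a boolean condition.
recent = sales[sales["year"] == 2017]

# Split-apply-combine: split rows by region, sum each group, combine the results.
units_by_region = sales.groupby("region")["units"].sum()

# Pivot: one row per region, one column per year.
units_wide = sales.pivot_table(
    index="region", columns="year", values="units", aggfunc="sum"
)

# Merge: attach the manager column from the second table.
merged = sales.merge(regions, on="region", how="left")

print(recent)
print(units_by_region)
print(units_wide)
print(merged)
```

Plotting is left out of this sketch; the Tuesday and Wednesday notebooks cover visualization separately.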

Participants should bring their laptops and plan to participate actively. Each laptop needs a web browser for accessing the Jupyter notebook resources.

Required Software Installation

Installing the Python environment with Anaconda

Python is a popular language for research computing, and great for general-purpose programming as well. Installing all of its research packages individually can be a bit difficult, so we recommend Anaconda, an all-in-one installer. Regardless of how you choose to install it, make sure you install Python version 3.6. We will extensively use the Jupyter programming environment that runs in a web browser. For this to work you will need a reasonably up-to-date browser. The current versions of the Chrome, Safari and Firefox browsers are all supported (some older browsers, including Internet Explorer version 9 and below, are not).

Installation Instructions for Windows

  • Browse to http://continuum.io/downloads
  • Download the Python installer for Windows.
  • Install Python 3.6 using all of the defaults for installation, except make sure to check “Make Anaconda the default Python”.

Installation Instructions for Mac OS X

  • Browse to http://continuum.io/downloads
  • Download the Python 3.6 installer for OS X.
  • Install using all of the defaults for installation.
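
After installing, one way to confirm the environment is ready is a short check like the sketch below. It is not part of the official instructions: the package list is an assumption based on the libraries named in this announcement (pandas, matplotlib, seaborn), and the helper names `missing_packages` and `python_ok` are hypothetical. Treating any version at or above 3.6 as acceptable is also an assumption.

```python
import importlib.util
import sys

# Assumed package list, based on the libraries mentioned in this announcement.
REQUIRED = ["pandas", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(version_info=sys.version_info, minimum=(3, 6)):
    """Return True if the interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


print("Python version:", sys.version.split()[0])
print("Python 3.6 or newer:", python_ok())
print("Missing packages:", missing_packages(REQUIRED) or "none")
```

Run it from a terminal or paste it into a Jupyter notebook cell. If any packages are reported as missing, re-running the Anaconda installer with the default options is the simplest fix.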

Schedule

TUESDAY APRIL 17

  • 9AM BEGIN WORKSHOP
  • 10:30AM BREAK
  • 10:45AM RESUME
  • NOON LUNCH
  • 1PM RESUME
  • 2:30PM BREAK
  • 2:45PM RESUME
  • 4PM STOP FOR THE DAY

WEDNESDAY APRIL 18

  • 9AM BEGIN WORKSHOP
  • 10:30AM BREAK
  • 10:45AM RESUME
  • NOON LUNCH
  • 1PM RESUME
  • 2:30PM BREAK
  • 2:45PM RESUME
  • 4PM STOP FOR THE DAY

Workshop Materials

Tuesday

Preliminaries.ipynb https://bit.ly/2H5N9Xl

Introduction_to_Python.ipynb https://bit.ly/2vuWAP1

Intro_to_pandas.ipynb https://bit.ly/2J3Y3xp

Plotting_and_visualization.ipynb https://bit.ly/2EW4Erk

Exploring_data.ipynb https://bit.ly/2J3Y4RZ

Missing_values.ipynb https://bit.ly/2voIHl6

Data Files: ALL DATA FILES

ZIP OF FILES: once unzipped, all the files are in the “data” folder.

Wednesday

Grouping Dataframes https://bit.ly/2Ha9lE3

Merging Joining Data https://bit.ly/2JXCgZN

Plotting with Seaborn https://bit.ly/2HyqjLC

Survey

Please fill out the demographic survey.