Welcome to the Earth Science Information Partners (ESIP) 2018 Summer Meeting! The 2018 theme is Realizing the Socioeconomic Value of Data. The theme is based on one of the goals in the 2015 - 2020 ESIP Strategic Plan, which provides a framework for ESIP’s activities over the next three years.

All Presentations are being added to a Google Folder temporarily and then will be moved to FigShare and linked to the sessions here. 


Sabino
Tuesday, July 17
 

9:30am

Introduction to Jupyter technologies and how they are used in the ESIP community
You’ve heard a lot about Jupyter. There are Notebooks and Hubs, but what are they? Do they make it easier for you to do or share your work?

Participants in this session will be given an overview of how ESIP members are using the Jupyter Project’s applications to accelerate their own research. This breakout session is intended as an introduction to Jupyter applications and their usage in ESIP member organizations. Workshops using the technologies via ESIPhub later in the Meeting will also be discussed. We will hold a ten-minute discussion after the presentations on the topics brought up during the talks and how we as a community can use the ESIPhub resource.

Frank Greguska, NASA JPL (15min)
Title: Using Apache Science Data Analytics Platform from Jupyter
Description: Apache Science Data Analytics Platform (SDAP) is an open source Apache Incubator project that, among other things, allows for analysis of scientific data on the cloud. SDAP consists of a collection of webservices that enable science and allow user interaction through Jupyter notebooks. This talk will introduce the Apache SDAP project and walk attendees through some of the algorithms that are available for use.

Tyler Erickson, Google (15min)
Title: Jupyter and Google Earth Engine
Description: Google Earth Engine is a cloud-based geospatial analysis platform that supports analysis of multi-petabyte archives via JavaScript and Python APIs. For users of the JavaScript API, the Earth Engine team maintains an online GUI. For the Python API, we promote the use of Jupyter project tools (JupyterLab, JupyterHub, Jupyter Widgets) for accessing data and developing algorithms.
Presentation: g.co/earth/esip2018-jupyter

John Readey, HDF Group (15min)
Title: HDF Kita Lab
Description: HDF Kita Lab is a Jupyter environment hosted on AWS that provides the ability to easily read and write large HDF datasets. Users can utilize HDF Server to access data that would otherwise be too large to copy to the user disk volume. Data used by HDF Server is stored in AWS S3, which provides cost-effective and reliable storage. HDF Kita Lab can be accessed at: https://hdflab.hdfgroup.org (HDFGroup registration is required).

Rich Signell, USGS (15min)
Title: Jupyter Success Stories from IOOS and USGS
Description: The Integrated Ocean Observing System and the US Geological Survey have been using Jupyter technologies since 2012 to help spread the use of effective and efficient tools across their communities.  These notebooks often demonstrate reproducible workflows based on catalog and data web services and come with reproducible environments made possible by the conda-forge project.  A series of notebooks will be demonstrated, from notebooks demonstrating catalog-driven workflows, to notebooks on binder that appear like web applications.

Keith Maull, NCAR Library (15min)
Title: ESIPHub Pilot | Exploring services and infrastructure to support computational geosciences research and collaboration with JupyterHub
Description: ESIPHub, a JupyterHub-based infrastructure for the ESIP community, is now available and being used within several workshops during the summer meeting.
In this talk, I will discuss the pilot of ESIPHub with UCAR/NCAR's highly successful Research Experiences for Undergraduates (REU) program, SOARS (Significant Opportunities in Atmospheric Research and Science; https://www.soars.ucar.edu). Over the last three years, we have been developing computational workshops to introduce SOARS Protégés to Python, Jupyter, computational thinking and data analysis, and this summer, we piloted ESIPHub within these workshops. I will report on the exciting potential the platform has not only for education and training, but also for collaborative research.

Discussion (10min)

Learn more about Jupyter and attend the other workshops using ESIPhub:

* Directly after this session is the Metadata Improvement Lab, where participants will learn how to translate their XML into JSON-LD using the schema.org vocabulary Google recommends for datasets.
http://sched.co/Eypl
* Wednesday afternoon is a workshop for cloud-based analysis.
http://sched.co/EyqK
* Thursday morning we'll learn about some custom widgets for earth science.
http://sched.co/EyqX

Speakers & Moderators

Tyler Erickson

Developer Advocate, Google

Sean Gordon

Metadata Developer, The HDF Group
Talk to me about the ESIP Labs project and ESIPhub, a JupyterHub-based shared computational environment for workshops at Meetings. My research focuses on the connections between documentation structures and the evaluation of content for the metadata needs of diverse communities of practice...

Rich Signell

Oceanographer, USGS
Ocean Modeling, Python, NetCDF, THREDDS, ERDDAP, UGRID, SGRID, CF-Conventions, Jupyter, JupyterHub, CSW, TerriaJS



Tuesday July 17, 2018 9:30am - 11:00am
Sabino

11:30am

Metadata Evaluation Lab at ESIP: Assessing if community metadata is ready for Schema.org
In this workshop, participants will determine whether metadata from around ESIP is ready to be translated to schema.org JSON-LD using the Google recommendation for datasets, which enhances discoverability in search engines. They will also learn a simple way to translate their XML into compliant JSON-LD that they can improve for their metadata dialect's particular needs.

* Participants will hear about NOAA's experience in creating this information for their datasets and what was needed to apply it.
* Participants will learn how to analyze metadata collections for conceptual content and absolute content.
* Participants will apply a conceptual version of the Google recommendation for dataset metadata using the schema.org vocabulary to their collections and create a report detailing collection metrics, regardless of metadata dialect.
* Finally participants will learn how to create JSON-LD for the records in their collection and validate using the Google Structured Data Testing Tool.
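
The XML-to-JSON-LD translation step listed above can be sketched roughly as follows. This is a minimal illustration, not the workshop's actual notebook: the XML element names (`title`, `abstract`, `keyword`, `landingPage`) belong to a hypothetical simplified dialect, the sample record is invented, and only a handful of schema.org Dataset properties are mapped.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata record. Real dialects (ISO 19115, EML, ...)
# use different element names and XML namespaces.
SAMPLE_XML = """
<metadata>
  <title>Sea Surface Temperature, 2017</title>
  <abstract>Daily gridded sea surface temperature.</abstract>
  <keyword>oceans</keyword>
  <keyword>temperature</keyword>
  <landingPage>https://example.org/datasets/sst-2017</landingPage>
</metadata>
"""


def xml_to_jsonld(xml_text):
    """Map a few XML elements onto schema.org Dataset properties."""
    root = ET.fromstring(xml_text)
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": root.findtext("title"),
        "description": root.findtext("abstract"),
        "url": root.findtext("landingPage"),
        "keywords": [k.text for k in root.findall("keyword")],
    }
    # Drop properties that the source record did not provide.
    return {key: value for key, value in record.items() if value}


jsonld = xml_to_jsonld(SAMPLE_XML)
print(json.dumps(jsonld, indent=2))
```

A record produced this way would still need to be checked with a validator such as the Structured Data Testing Tool mentioned above, since a real dialect will have required fields that this sketch does not cover.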

We will use ESIPhub, a JupyterHub-based shared computational environment, to run the workshop. This means you won't need to set up your computer to participate in the workshop; just bring a device with a connected web browser.

Learn more about Jupyter and attend the other workshops using ESIPhub:

* Tuesday morning before this session includes a general overview of Jupyter usage in our community.
http://sched.co/Eype
* Wednesday afternoon is a workshop for cloud-based analysis.
http://sched.co/EyqK
* Thursday morning we'll learn about some custom widgets for earth science.
http://sched.co/EyqX

Once you've learned how to make your own schema.org JSON-LD, learn how to improve and publish it in these later sessions:

* Tuesday afternoon includes a two-parter on Semantics in Action.
http://sched.co/Eypw 
* Wednesday afternoon goes into depth on Publishing schema.org datasets.
http://sched.co/EyqH

Speakers & Moderators

Sean Gordon

Metadata Developer, The HDF Group
Talk to me about the ESIP Labs project and ESIPhub, a JupyterHub-based shared computational environment for workshops at Meetings. My research focuses on the connections between documentation structures and the evaluation of content for the metadata needs of diverse communities of practice...

John Relph

Disruptor, NESDIS/NCEI
OneStop, Metadata, Archival, Automation, Data Management, Canaan Dogs



Tuesday July 17, 2018 11:30am - 1:00pm
Sabino

2:00pm

HDF Workshop 1: Learning about HDF using Jupyter Notebooks

Jupyter Notebooks have been developed by the HDF Group and many others to help scientists and other users understand how to use HDF to create and access datasets in many disciplines. HDF Lab is a tool for bringing these resources together with data in the cloud. The Lab will include sample datasets and notebooks that use them to demonstrate HDF capabilities at many levels. It will also be a place for sharing data examples and related notebooks from users in many disciplines. ESIP members will play an important role in building this resource and ensuring that it is a useful forum for sharing community expertise. Please join us at the ground level to make sure it works.

 


Speakers & Moderators

Tuesday July 17, 2018 2:00pm - 3:30pm
Sabino

4:00pm

HDF Workshop 2: HDF analysis from the desktop to the cloud
The HDF Group is exploring many approaches to providing access to HDF data in the cloud with the goal of protecting data producers and users from disruption as data move to the cloud. These approaches include a RESTful interface, a plug-in replacement for h5py (h5pyd) that uses that interface, and an implementation of xarray that uses that plug-in. The Highly Scalable Data Server (HSDS) also uses this interface and will operate on HDF files or objects created from the metadata and data in those files. We are also developing several HDF5 library plug-ins that implement a Virtual Object Layer (REST-VOL) for access to the cloud using the RESTful interface and a Virtual File Driver (S3-VFD) for accessing data in the cloud. We will demonstrate use cases for all of these approaches and discuss how each alternative minimizes disruption for data providers and users.

Speakers & Moderators

Tuesday July 17, 2018 4:00pm - 5:30pm
Sabino
 
Wednesday, July 18
 

2:00pm

Publishing schema.org Dataset: Lessons Learned and Paths Forward
Progress surrounding the schema.org type Dataset has made it an attractive way for repositories to expose dataset metadata to search engines. The NSF EarthCube initiative funded a short-term project, P418, to explore what could be achieved if repositories could adopt schema.org as a mechanism for self-publishing information using a common schema. As part of this project, a number of repositories volunteered to try publishing schema.org by embedding it in their websites.

In this session, we will:
* introduce the P418 project goals and the philosophy behind using schema.org (15min), and then
* explore some real-world schema.org publishing stories (30min) to:
  * hear about the various techniques used and challenges encountered for embedding the schema.org markup in web pages,
  * understand how well schema.org covers a repository’s own metadata model,
  * discuss where schema.org needs extensions and how the geoscience community can collectively move forward to improving the quality of the markup.
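
For readers unfamiliar with the embedding technique mentioned above, here is a rough sketch of how a schema.org Dataset record is typically placed in a web page: as JSON-LD inside a `<script type="application/ld+json">` element. The dataset record, the page title, and the helper name `embed_jsonld` are all hypothetical and are not taken from any P418 repository.

```python
import json
from html import escape


def embed_jsonld(page_title, jsonld):
    """Return a minimal HTML page carrying the record as embedded JSON-LD."""
    # Escape "</" so the payload cannot prematurely close the script element;
    # "\/" is a valid JSON escape for "/", so the record still parses unchanged.
    payload = json.dumps(jsonld, indent=2).replace("</", "<\\/")
    return (
        "<html>\n<head>\n"
        f"<title>{escape(page_title)}</title>\n"
        '<script type="application/ld+json">\n'
        f"{payload}\n"
        "</script>\n</head>\n<body></body>\n</html>"
    )


# Hypothetical dataset record used only for illustration.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example cruise CTD profiles",
    "description": "Conductivity, temperature and depth profiles.",
    "url": "https://example.org/dataset/1234",
}

page = embed_jsonld("Example cruise CTD profiles", dataset)
print(page)
```

How the markup is generated in practice (templated server-side, injected by JavaScript, or written by hand) varies by repository, which is one of the challenges this session discusses.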

For more schema.org sessions see:
Tuesday, July 17 • 11:30am - 1:00pm Metadata Evaluation Lab at ESIP: Assessing if community metadata is ready for Schema.org
Tuesday, July 17 • 4:00pm - 5:30pm Semantics in Action

Speakers & Moderators

John Relph

Disruptor, NESDIS/NCEI
OneStop, Metadata, Archival, Automation, Data Management, Canaan Dogs

Adam Shepherd

Technical Director, BCO-DMO
Linked Data | Semantic Web | Vocabularies



Wednesday July 18, 2018 2:00pm - 3:30pm
Sabino

4:00pm

TaskAPI - A Scalable Computing Platform for Large Scientific Data Systems
TaskAPI is a workflow platform and DSL (Domain Specific Language) that provides automatic horizontal and vertical scaling of multi-language data-intensive scientific software systems using a functionally declarative workflow paradigm. TaskAPI is capable of quickly wrapping legacy systems, provides structured guidance for best-practices in continued or new development via its JSON DSL, and automatically provides system components with a unified, straightforward API for centralized logging, job and task killing, and configurable property use.

TaskAPI was developed to serve as the backbone for the reengineered US ASOS Ingest software system and exists as its own distributable package for use by other large polyglot systems.

This session will begin by providing a broad overview (surface skim) of the TaskAPI platform, including motivation, capabilities, current and potential use cases, and design and performance characteristics.

The summary will lead into a more detailed look at the TaskAPI structure, including the DSL setup, workflow branching, task types, multi-language parallelization techniques in Java, C, and Fortran, and current and planned language support (Python via Jep, Clojure, Scala via drivers), and other features (Kafka messaging queues).

After we thoroughly explain the system and its capabilities, we will deep dive into a live example of TaskAPI as it was implemented in ASOS, examining real-life challenges we faced and how to think about and implement best use practices.

Finally, we will assist session attendees and participants in determining if this system could serve their own projects, provide assistance in TaskAPI download and setup, and solicit feature requests and needs.

Speakers & Moderators

Ryan Berkheimer

Software Research, GST at NOAA NCEI



Wednesday July 18, 2018 4:00pm - 5:30pm
Sabino
 
Thursday, July 19
 

9:30am

Custom Built Jupyter Widgets for Earth Science
Presentation slides: g.co/earth/esip2018-widgets

For many scientific questions in the Earth sciences, the sheer volume of observed and/or modeled data is a barrier to progress, as it is difficult to explore and analyze using the traditional paradigm of downloading datasets to a local computer for analysis. Furthermore, methods for communicating Earth science algorithms that operate on large datasets in an easily understandable and reproducible way are needed. The Jupyter project has created several tools for general data science that can be leveraged for exploratory data analysis of terabyte- to petabyte-scale geospatial datasets.

This session will be a hands-on introduction to:
  • JupyterLab (the Jupyter project's next-generation UI)
  • Jupyter Widgets (the interconnection between the UI and a Python kernel)
  • Earth Engine (Google's cloud-based geospatial analysis API)
  • Examples of satellite data exploration and analysis
In addition, we will be using the following technologies:
  • JupyterHub for hosting the multi-user environment
  • Docker for packaging up JupyterLab, the Earth Engine Python API, and dozens of scientific Python packages
  • GitHub for sharing all of the session content
Learn more about Jupyter and attend the other workshops using ESIPhub:

* Tuesday morning includes a general overview of Jupyter usage in our community.
http://sched.co/Eype
* Just after the overview session is a Metadata Improvement Lab focused on schema.org for datasets.  
http://sched.co/Eypl
* Wednesday afternoon is a workshop for cloud-based analysis.
http://sched.co/EyqK


Speakers & Moderators

Tyler Erickson

Developer Advocate, Google

Sean Gordon

Metadata Developer, The HDF Group
Talk to me about the ESIP Labs project and ESIPhub, a JupyterHub-based shared computational environment for workshops at Meetings. My research focuses on the connections between documentation structures and the evaluation of content for the metadata needs of diverse communities of practice...



Thursday July 19, 2018 9:30am - 11:00am
Sabino

11:30am

Machine Learning Working Session
Machine Learning engagement activities to increase the connectivity among data providers, Earth scientists, machine learning practitioners and computer service providers.

Speakers & Moderators

Erin Robinson

Executive Director, Earth Science Information Partners (ESIP)
Erin Robinson works at the intersection of community informatics, Earth science and non-profit management. Over the last 10 years, she has honed an eclectic skill set both technical and managerial, creating communities and programs with lasting impact around science, data, and technology...


Thursday July 19, 2018 11:30am - 1:00pm
Sabino
 
Friday, July 20
 

9:30am

Metadata Times, They Are Changing - New Capabilities and Applications
We will cover new developments in metadata standards from a variety of communities: ISO, DataCite, DataOne, NASA.

Speakers & Moderators

Ted Habermann

The HDF Group

Matt Jones

Director of Informatics, UC Santa Barbara
Data Federation | Open Science | Provenance and Semantics

Tyler Stevens

Senior Discipline Engineer, NASA EED-2 / SGT



Friday July 20, 2018 9:30am - 11:00am
Sabino

11:30am

HDF Townhall
Data in HDF continues to play an important role for Earth Scientists in the U.S. and around the world. The HDF Group will update ESIP members on interesting projects that have come to fruition during the last year, including the TerraFusion project, which brings the entire history of Terra, as well as recent releases of HDF5. We will also demonstrate how HDF tools support HDF-EOS data from product design to production and standards compliance testing to user support.

Suggestion to include:

Potential benefits of shuffle
Third-party compression filters
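
As background for the shuffle suggestion, the idea behind a byte-shuffle filter can be sketched in a few lines. This is a pure-Python illustration of the general technique using zlib and synthetic integers, not HDF5's own implementation, and the helper names `shuffle` and `unshuffle` are made up for this sketch.

```python
import struct
import zlib


def shuffle(raw, itemsize):
    """Group byte 0 of every element together, then byte 1, and so on."""
    return b"".join(raw[i::itemsize] for i in range(itemsize))


def unshuffle(shuffled, itemsize):
    """Invert shuffle(): interleave the byte planes back into elements."""
    count = len(shuffled) // itemsize
    out = bytearray(len(shuffled))
    for i in range(itemsize):
        out[i::itemsize] = shuffled[i * count:(i + 1) * count]
    return bytes(out)


# Synthetic, slowly varying 32-bit integers (an assumption chosen for
# illustration; real gains depend on the data).
values = list(range(10000))
raw = struct.pack("<10000i", *values)

plain_size = len(zlib.compress(raw))
shuffled_size = len(zlib.compress(shuffle(raw, 4)))
print("compressed without shuffle:", plain_size, "bytes")
print("compressed with shuffle:   ", shuffled_size, "bytes")
```

On data like this, where the high-order bytes of neighboring values are nearly identical, grouping them gives the compressor long repetitive runs to work with; noisy data may benefit much less.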

Speakers & Moderators

Friday July 20, 2018 11:30am - 1:00pm
Sabino