Welcome to the Earth Science Information Partners (ESIP) 2018 Summer Meeting! The 2018 theme is Realizing the Socioeconomic Value of Data. The theme is based on one of the goals in the 2015 - 2020 ESIP Strategic Plan, which provides a framework for ESIP’s activities over the next three years.

If you haven’t already, register here!

Room Block Update: Our block is full. We recommend the AC Hotel Tucson Downtown, which is about 5 minutes by car and is accessible via the Tucson Streetcar in about fifteen minutes.


Pima
Tuesday, July 17
 

9:30am

Optimizing Data for the Cloud

Session Description: When data is shared in the cloud, anyone can analyze it without having to download it or store it themselves, which lowers the cost of new product development, reduces the time to scientific discovery, and can accelerate innovation. However, staging large-scale datasets for analysis in the cloud requires consideration of how data should be prepared and organized to allow fast, efficient, and programmatic access from distributed computing systems. This workshop will provide a forum for members of the community to share lessons learned as they explore ways to use the cloud to expand access to data. It seeks to encourage dialog between users interested in leveraging data in the AWS Cloud for research and application development.


Data Optimization for the cloud: Tools and Services (July 17th, 9:30 am – 11:30 am):

AGENDA

Joe Flasher, AWS (10 min)

Introduction

Dan Pilone, Element84 (10 min)
Title: Interdisciplinary research, heterogeneous data, and the case for Archives of Convenience
Description: Earth Science data is measured in petabytes, represents decades of data collection and evolution of technology and practices, and provides an unparalleled view of our planet. The pace of change is only accelerating: NASA and other agencies are on their way to making hundreds of petabytes of data available in the cloud, highly scalable processing and analysis architectures and tools are in active use with more being developed every day, and each of these brings with it opportunities for optimization and innovation. This talk demonstrates leveraging the elastic nature of the cloud using GOES-16 data to create ephemeral Archives of Convenience, targeting individual researcher needs, optimized for their problems and tool suites, instead of trying to settle on a single "cloud optimized" solution.

Ilya Khamushkin, Intertrust (10 min)
Title: Earth Data for Everyone
Description: At Intertrust, we believe that working with Earth science data should be easy. Too often file formats, transfer protocols, and cumbersome access interfaces make it difficult for users without domain knowledge to incorporate these data into their workflows. During this session we’ll share our experiences from the past five years building and operating the Planet OS Datahub, our cloud-based data as a service platform.

Marty J. Sullivan, Cornell University (10 min)
Title: The Need for Data Lakes in Climate Science
Description: Climate data is massive. The archive data formats used in the field are difficult to retrieve and analyze, and the data come from many different sources. Learn how and why Cornell University’s Department of Earth & Atmospheric Sciences is moving toward the concept of building geospatial data lakes in Amazon S3 and using tools like Amazon Athena.

Sudhir Shrestha, ESRI (10 min)
Title: Scientific Earth Science Data to Cloud Optimized Web Services
Description: Working with earth science data to extract information can be challenging due to its diversity and complexity. In this session, we will demonstrate real-world examples of successful application of open earth science data in the ArcGIS platform. We will briefly share the workflow of optimized scientific data management (ingesting, managing, analyzing and sharing) in the cloud and how you can quickly spin up web applications to share your information products, including analytics, with the larger community. We will share a few use cases, such as NOAA High-Resolution Rapid Refresh (HRRR), Sentinel data and other webmap applications, that demonstrate how we access large collections of near real-time data that are stored on-premise or in the cloud, disseminate them dynamically, process and analyze them on-the-fly, and serve them to a variety of geospatial applications.

General discussion (10 min)

Breakout groups: focus on tools and services (30 minutes)

(Continue conversation over coffee - 30 minutes)




Tuesday July 17, 2018 9:30am - 11:00am
Pima
  • Subject Jump In, Deep Dive
  • Remote Participation Link https://global.gotomeeting.com/join/752150301
  • Remote Participation Access Code 752-150-301
  • Remote Participation Phone # (646) 749-3129 More phone numbers Australia: +61 2 9087 3604 Austria: +43 7 2081 5427 Belgium: +32 28 93 7018 Canada: +1 (647) 497-9391 Denmark: +45 32 72 03 82 Finland: +358 942 72 1060 France: +33 170 950 594 Germany: +49 692 5736 7317 Ireland: +353 16 572 651 Italy: +39 0 247 92 13 01 Netherlands: +31 202 251 017 New Zealand: +64 9 280 6302 Norway: +47 21 93 37 51 Spain: +34 932 75 2004 Sweden: +46 853 527 827 Switzerland: +41 435 5015 61 United Kingdom: +44 20 3713 5028
  • Tags Cloud Computing, Data Analytics

11:30am

Optimizing Data for the Cloud
Session Description: When data is shared in the cloud, anyone can analyze it without having to download it or store it themselves, which lowers the cost of new product development, reduces the time to scientific discovery, and can accelerate innovation. However, staging large-scale datasets for analysis in the cloud requires consideration of how data should be prepared and organized to allow fast, efficient, and programmatic access from distributed computing systems. This workshop will provide a forum for members of the community to share lessons learned as they explore ways to use the cloud to expand access to data. It seeks to encourage dialog between users interested in leveraging data in the AWS Cloud for research and application development.


Data Optimization for the cloud: Data Formats (July 17th, 11:30 am – 1:00 pm):

AGENDA


Otis Brown and Jonathan Brannock, CICS-NC (10 min)
Title: Big Data Project (BDP) Data Broker Update
Description: The NOAA Big Data Project Data Broker role and current datasets being provided by CICS-NC are reviewed. NOAA datasets under consideration for provision to the cloud partners are described. An update on GOES-16 accession from AWS S3, including usage by volume and users, is given. New policy challenges associated with reformatting datasets and online updates are discussed.

Rich Signell, USGS (10 min)
Title: Cloud-friendly ndarray formats
Description: There is a tremendous amount of scientific multidimensional array data (ndarray) stored in NetCDF or HDF files. Since the cloud uses object storage, not conventional filesystems, there is a need for a "cloud-friendly" storage format that can support the NetCDF and HDF data models. Several solutions have been proposed, including HSDS, Zarr, TileDB, and S3-Netcdf, and these can be compared with FUSE, which provides a POSIX layer to make object storage look like a filesystem. This talk will discuss what the Pangeo project is doing to explore these data formats and the challenges that remain for the community.

Rob Emanuele, Azavea (10 min)
Title: Cloud Optimized GeoTiffs: enabling efficient cloud workflows
Description: Cloud Optimized GeoTIFFs (COGs) are a raster data format that is a key component to enabling cloud-native geospatial workflows. COGs enable faster reading, writing, and processing of raster data on the cloud without the need for local copies. This talk will include a brief overview of what COGs are and show examples of how they can be used to leverage cloud deployment for research and application development.

John Readey, The HDF Group (10 min)
Title: HDF Data in the Cloud
Description: Amazon S3 is a great storage technology for the cloud: scalable, built-in redundancy, and cost-effective. However, HDF5 files stored on S3 have traditionally not worked well (or at all) with applications that expect data to be stored on POSIX filesystems, requiring files to be copied to local storage before being accessed. In order to enable HDF data for cloud-based analytics over massive datasets, The HDF Group has developed new methods for storing HDF data on S3 that take full advantage of the storage platform, allow data to be accessed in place, and are compatible with existing applications. This talk will review these technologies and outline some future directions.

General discussion (10 min)

Breakout groups: focus on data formats (30 min)

Report findings from breakout groups (10 min)
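
As a rough illustration of the chunked, object-store-friendly layout discussed in the "Cloud-friendly ndarray formats" talk above, the sketch below maps a multidimensional array onto one object key per chunk, so that a reader only needs to fetch the chunks that overlap a requested slice. It is a minimal sketch under stated assumptions, not the implementation of any format listed in the agenda: the function names, the `temperature` prefix, and the array and chunk shapes are hypothetical, and the dot-separated key naming is only loosely modeled on the style used by Zarr.

```python
from itertools import product
from math import ceil


def chunk_keys(shape, chunks, prefix="temperature"):
    """Return one object key per chunk of an array with the given shape.

    `prefix` is a hypothetical array name; keys look like "temperature/3.0.1".
    """
    # Number of chunks along each dimension (the last chunk may be partial).
    grid = [ceil(n / c) for n, c in zip(shape, chunks)]
    return [
        prefix + "/" + ".".join(str(i) for i in idx)
        for idx in product(*(range(g) for g in grid))
    ]


def keys_for_slice(start, stop, chunks, prefix="temperature"):
    """Return only the chunk keys overlapping the half-open box [start, stop)."""
    # For each dimension, find the first and last chunk index touched.
    ranges = [
        range(lo // c, (hi - 1) // c + 1)
        for lo, hi, c in zip(start, stop, chunks)
    ]
    return [
        prefix + "/" + ".".join(str(i) for i in idx)
        for idx in product(*ranges)
    ]


# Hypothetical daily global grid: (time, lat, lon), chunked as (30, 360, 720).
SHAPE = (365, 720, 1440)
CHUNKS = (30, 360, 720)

all_keys = chunk_keys(SHAPE, CHUNKS)
small_read = keys_for_slice((0, 0, 0), (10, 100, 100), CHUNKS)
edge_read = keys_for_slice((25, 350, 0), (35, 370, 10), CHUNKS)
```

With these assumed shapes the array splits into 52 objects; a small read near the origin touches a single object, while a read that straddles chunk boundaries in time and latitude touches four. Choosing chunk shapes to match the expected access pattern is the kind of trade-off the breakout groups on data formats would weigh.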






Tuesday July 17, 2018 11:30am - 1:00pm
Pima
  • Subject Jump In, Deep Dive
  • Remote Participation Link https://global.gotomeeting.com/join/752150301
  • Remote Participation Access Code 752-150-301
  • Remote Participation Phone # (646) 749-3129 More phone numbers Australia: +61 2 9087 3604 Austria: +43 7 2081 5427 Belgium: +32 28 93 7018 Canada: +1 (647) 497-9391 Denmark: +45 32 72 03 82 Finland: +358 942 72 1060 France: +33 170 950 594 Germany: +49 692 5736 7317 Ireland: +353 16 572 651 Italy: +39 0 247 92 13 01 Netherlands: +31 202 251 017 New Zealand: +64 9 280 6302 Norway: +47 21 93 37 51 Spain: +34 932 75 2004 Sweden: +46 853 527 827 Switzerland: +41 435 5015 61 United Kingdom: +44 20 3713 5028
  • Tags Cloud Computing, Data Analytics

2:00pm

REO: Work to date, lessons, and possible data/software coming from USGS use of the ESIP Testbed
USGS has been taking advantage of the ESIP Testbed for a number of data systems that are part of our evolving Modular Science Framework, a vision for enabling scientific infrastructure. Some of our developments are reaching a level of maturity where we will start deploying them to production capacity on USGS infrastructure, but there are some other components that might be useful across the broader community. Examples are our Spatial Feature Registry, a system for integrating usable named/identified spatial features through time for analytical uses, and the Taxa Information Registry, a component that assembles best available information from across disparate data sources on biological taxa of interest in our biogeographic work. We offer this session to share what we're doing in these projects, offer opportunities for other groups to share similar work on the ESIP Testbed, and open discussion on how best to conduct this work moving forward.

We're particularly looking for feedback on what parts of what we are doing would be best developed by us and others as ESIP common resources. We're building an inherently distributed architecture, and we can run operational components all over the web. As we take ideas from research to engineering, help us figure out the best pathway that will provide maximum value for USGS and for others.

Speakers & Moderators

Daniel Wieferich

Physical Scientist, US Geological Survey
python, database management, landscape ecology, machine learning


Tuesday July 17, 2018 2:00pm - 3:30pm
Pima

4:00pm

Research Object Citation and FAIR Guidance Materials for Data Managers and Librarians
Work with us as we define the topics and identify resources that would be valuable to data managers and librarians as they assist researchers with open and FAIR practices for data and other research products, as well as best practices for citation. As journals and repositories move to requiring data citations that support published research, the community needs consistent guidance that incorporates the best practices it has developed.

Speakers & Moderators

Nancy Hoebelheinrich

Principal, Knowledge Motifs LLC
See my LinkedIn profile at: https://www.linkedin.com/in/nancy-hoebelheinrich-0576ba3

Shelley Stall

American Geophysical Union
Shelley Stall is the Director of Data Programs at the American Geophysical Union. Shelley has more than two decades of experience working in high-volume, complex data management environments. She has helped organizations in not-for-profit, commercial, defense, and federal civilian...


Tuesday July 17, 2018 4:00pm - 5:30pm
Pima
  • Subject Jump In
  • Remote Participation Link https://global.gotomeeting.com/join/752150301
  • Remote Participation Access Code 752-150-301
  • Remote Participation Phone # (646) 749-3129 More phone numbers Australia: +61 2 9087 3604 Austria: +43 7 2081 5427 Belgium: +32 28 93 7018 Canada: +1 (647) 497-9391 Denmark: +45 32 72 03 82 Finland: +358 942 72 1060 France: +33 170 950 594 Germany: +49 692 5736 7317 Ireland: +353 16 572 651 Italy: +39 0 247 92 13 01 Netherlands: +31 202 251 017 New Zealand: +64 9 280 6302 Norway: +47 21 93 37 51 Spain: +34 932 75 2004 Sweden: +46 853 527 827 Switzerland: +41 435 5015 61 United Kingdom: +44 20 3713 5028
  • Tags Documentation, Education, Data Management Training, Data Citation, Software and Services Citation
 
Wednesday, July 18
 

2:00pm

Using Jupyter for Cloud-based Analysis
Python and cloud computing have become ubiquitous in the Earth Science community. Jupyter provides a browser-based workbench environment for researchers to create sharable notebooks with code snippets to interact with data and services on the internet. Participants in this workshop will learn about using Jupyter notebooks to interact with cloud-based services for scientific research and analysis.

Speakers & Moderators

Wednesday July 18, 2018 2:00pm - 3:30pm
Pima
  • Subject Jump In
  • Remote Participation Link https://global.gotomeeting.com/join/752150301
  • Remote Participation Access Code 752-150-301
  • Remote Participation Phone # (646) 749-3129 More phone numbers Australia: +61 2 9087 3604 Austria: +43 7 2081 5427 Belgium: +32 28 93 7018 Canada: +1 (647) 497-9391 Denmark: +45 32 72 03 82 Finland: +358 942 72 1060 France: +33 170 950 594 Germany: +49 692 5736 7317 Ireland: +353 16 572 651 Italy: +39 0 247 92 13 01 Netherlands: +31 202 251 017 New Zealand: +64 9 280 6302 Norway: +47 21 93 37 51 Spain: +34 932 75 2004 Sweden: +46 853 527 827 Switzerland: +41 435 5015 61 United Kingdom: +44 20 3713 5028
  • Tags Cloud Computing

4:00pm

What to do at a PROV roadblock
To make the PROV graph sing, all nodes should be identifiers to references and should be resolvable and maintained by the appropriate community group. So, when this isn't the case... how do we move ahead anyway?

This session will focus on:

A. How to come to community consensus when roadblocks arise. 
B. How to resolve issues independently and move forward! 

The session will also explore how the ESIP Community Ontology Repository fits into these challenges.

Speakers & Moderators

Annie Burgess

ESIP Lab Director, ESIP


Wednesday July 18, 2018 4:00pm - 5:30pm
Pima
  • Subject Deep Dive
  • Remote Participation Link https://global.gotomeeting.com/join/752150301
  • Remote Participation Access Code 752-150-301
  • Remote Participation Phone # (646) 749-3129 More phone numbers Australia: +61 2 9087 3604 Austria: +43 7 2081 5427 Belgium: +32 28 93 7018 Canada: +1 (647) 497-9391 Denmark: +45 32 72 03 82 Finland: +358 942 72 1060 France: +33 170 950 594 Germany: +49 692 5736 7317 Ireland: +353 16 572 651 Italy: +39 0 247 92 13 01 Netherlands: +31 202 251 017 New Zealand: +64 9 280 6302 Norway: +47 21 93 37 51 Spain: +34 932 75 2004 Sweden: +46 853 527 827 Switzerland: +41 435 5015 61 United Kingdom: +44 20 3713 5028
  • Tags Disaster Lifecycle, Documentation, Information Quality, Semantic Technologies
 
Thursday, July 19
 

9:30am

Using Cloud Object Stores for Data Storage and Data Services
This session will review and discuss a variety of approaches for using cloud object store technologies to support earth system science (ESS) data storage and data services. On the data storage side, we will discuss a range of projects from those storing existing ESS data files in object stores to projects developing data formats designed with object stores in mind (Zarr, TileDB, etc.).

On the data service end, we will discuss architectures and solutions to leverage object stores for improving data access and analysis (NEXUS, WSWM, HSDS, etc.).

Speakers & Moderators

Ethan Davis

UCAR Unidata

Thomas Huang

Technical Group Supervisor, JPL


Thursday July 19, 2018 9:30am - 11:00am
Pima
  • Subject Deep Dive
  • Remote Participation Link https://global.gotomeeting.com/join/752150301
  • Remote Participation Access Code 752-150-301
  • Remote Participation Phone # (646) 749-3129 More phone numbers Australia: +61 2 9087 3604 Austria: +43 7 2081 5427 Belgium: +32 28 93 7018 Canada: +1 (647) 497-9391 Denmark: +45 32 72 03 82 Finland: +358 942 72 1060 France: +33 170 950 594 Germany: +49 692 5736 7317 Ireland: +353 16 572 651 Italy: +39 0 247 92 13 01 Netherlands: +31 202 251 017 New Zealand: +64 9 280 6302 Norway: +47 21 93 37 51 Spain: +34 932 75 2004 Sweden: +46 853 527 827 Switzerland: +41 435 5015 61 United Kingdom: +44 20 3713 5028
  • Tags Cloud Computing

11:30am

Interactive Data Analysis on Cloud Environment
From hype to real-world applications, cloud computing has proven itself as a platform for tackling our big data challenges. With the elasticity of the cloud, we can develop solutions that let our community analyze and interact with large collections of data, with the actual computing performed in the cloud next to the data. This session welcomes speakers to discuss and demonstrate their cloud-based solutions for interactive science analysis. Invited speakers: Emily Law/JPL - JPL's Trek technology for interactive exploration of planetary and earth data; Rich Signell/USGS - Interactive, data-proximate analysis of earth system model data on the Cloud; Sudhir Shrestha/ESRI - Improved decision support system with Real time flood inundation forecast Applications; Joe Jacob/JPL - OceanWorks: Ocean Science Data Analytics using Apache Science Data Analytics Platform

Speakers & Moderators

Thomas Huang

Technical Group Supervisor, JPL

Rich Signell

Oceanographer, USGS
Ocean Modeling, Python, NetCDF, THREDDS, ERDDAP, UGRID, SGRID, CF-Conventions, Jupyter, JupyterHub, CSW, TerriaJS


Thursday July 19, 2018 11:30am - 1:00pm
Pima
  • Subject Jump In
  • Remote Participation Link https://global.gotomeeting.com/join/752150301
  • Remote Participation Access Code 752-150-301
  • Remote Participation Phone # (646) 749-3129 More phone numbers Australia: +61 2 9087 3604 Austria: +43 7 2081 5427 Belgium: +32 28 93 7018 Canada: +1 (647) 497-9391 Denmark: +45 32 72 03 82 Finland: +358 942 72 1060 France: +33 170 950 594 Germany: +49 692 5736 7317 Ireland: +353 16 572 651 Italy: +39 0 247 92 13 01 Netherlands: +31 202 251 017 New Zealand: +64 9 280 6302 Norway: +47 21 93 37 51 Spain: +34 932 75 2004 Sweden: +46 853 527 827 Switzerland: +41 435 5015 61 United Kingdom: +44 20 3713 5028
  • Tags Cloud Computing, Data Analytics, Science Software
 
Friday, July 20
 

9:30am

Preparing Three Dimensional Data for Virtual and Augmented Reality
Many scientists and researchers have been inspired by the VR and AR demos that they have seen at ESIP, AGU and elsewhere. A common question that surfaces is, “how do I get my data into VR?” In this session, a 3D data expert will share information about how to prepare data for immersive data visualization and ideas about automating the ingest of data into immersive visualization platforms.



Speakers & Moderators

Shayna Skolnik

Co-founder / CEO, Navteca
Virtual reality, data visualization, science storytelling in VR, cloud computing, entrepreneurship, NASA ESTO Discover AQ project | creativity + technology = awesome


Friday July 20, 2018 9:30am - 11:00am
Pima
  • Subject Jump In
  • Remote Participation Link https://global.gotomeeting.com/join/752150301
  • Remote Participation Access Code 752-150-301
  • Remote Participation Phone # (646) 749-3129 More phone numbers Australia: +61 2 9087 3604 Austria: +43 7 2081 5427 Belgium: +32 28 93 7018 Canada: +1 (647) 497-9391 Denmark: +45 32 72 03 82 Finland: +358 942 72 1060 France: +33 170 950 594 Germany: +49 692 5736 7317 Ireland: +353 16 572 651 Italy: +39 0 247 92 13 01 Netherlands: +31 202 251 017 New Zealand: +64 9 280 6302 Norway: +47 21 93 37 51 Spain: +34 932 75 2004 Sweden: +46 853 527 827 Switzerland: +41 435 5015 61 United Kingdom: +44 20 3713 5028
  • Tags Cloud Computing, Discovery, Education, Information Quality, Science Communication, Science Software, VR/AR

11:30am

Information management code registry for earth and environmental sciences
Earth and environmental scientists and data managers write significant amounts of code each year for large scale data manipulation. However, publishing and sharing this code is not a common practice and is hampered by the lack of thematic code registries that are designed to make code easily discoverable and reusable. In an exploratory session at last year’s ESIP summer meeting we discussed community practices, existing repositories, challenges, and recommendations for what might be termed ‘information management’ code publication, i.e., code developed to prepare data for a specific research question.

This information management code registry has been implemented using OntoSoft (http://imcr.ontosoft.org/#) and an initial hackathon has been conducted. Its focus is on making code more discoverable when that code addresses common procedures that earth and environmental information managers encounter when organizing, cleaning, manipulating, documenting, and archiving data sets. It will live in the niche between scientific analysis code and short code snippets as found on Stack Overflow. It will be a community-maintained resource containing everything from example information management code to programs with multiple functions that are generalized to be easily reused. In this session we aim to build a community and discuss best practices for publishing code, code metadata, and repository governance/maintenance. After a short introduction to the registry in OntoSoft and a report on lessons learned from the hackathon, we will launch into discussion of best practices, governance and a wish-list of code priorities. An agenda for this session is here: https://docs.google.com/document/d/1eGaIBCsYN7tog8hLNzMEQqrlG4Z4-ithu8VYCDH_vVk/edit?usp=sharing

Speakers & Moderators

Colin Smith

Data manager, Environmental Data Initiative (EDI)
I work on accelerating the archive and reuse of data in ecological science. My interests are in software development and data harmonization.


Agenda docx

Friday July 20, 2018 11:30am - 1:00pm
Pima
  • Subject Skim the Surface, Jump In
  • Remote Participation Link https://global.gotomeeting.com/join/752150301
  • Remote Participation Access Code 752-150-301
  • Remote Participation Phone # (646) 749-3129 More phone numbers Australia: +61 2 9087 3604 Austria: +43 7 2081 5427 Belgium: +32 28 93 7018 Canada: +1 (647) 497-9391 Denmark: +45 32 72 03 82 Finland: +358 942 72 1060 France: +33 170 950 594 Germany: +49 692 5736 7317 Ireland: +353 16 572 651 Italy: +39 0 247 92 13 01 Netherlands: +31 202 251 017 New Zealand: +64 9 280 6302 Norway: +47 21 93 37 51 Spain: +34 932 75 2004 Sweden: +46 853 527 827 Switzerland: +41 435 5015 61 United Kingdom: +44 20 3713 5028
  • Tags Discovery, Documentation, Science Software, Sustainable Data Management