Welcome to the Earth Science Information Partners (ESIP) 2018 Summer Meeting! The 2018 theme is Realizing the Socioeconomic Value of Data. The theme is based on one of the goals in the 2015 - 2020 ESIP Strategic Plan, which provides a framework for ESIP’s activities over the next three years.

If you haven’t already, register here!

Room Block Update: Our block is full. We recommend the AC Hotel Tucson Downtown, which is about 5 minutes away by car or about 15 minutes via the Tucson Streetcar.


Tuesday, July 17
 

9:30am

Optimizing Data for the Cloud

Session Description: When data is shared in the cloud, anyone can analyze it without having to download it or store it themselves, which lowers the cost of new product development, reduces the time to scientific discovery, and can accelerate innovation. However, staging large-scale datasets for analysis in the cloud requires consideration of how data should be prepared and organized to allow fast, efficient, and programmatic access from distributed computing systems. This workshop will provide a forum for members of the community to share lessons learned as they explore ways to use the cloud to expand access to data. It seeks to encourage dialog between users interested in leveraging data in the AWS Cloud for research and application development.


Data Optimization for the cloud: Tools and Services (July 17th, 9:30 am – 11:30 am):

AGENDA

Joe Flasher, AWS (10 min)

Introduction

Dan Pilone, Element84 (10 min)
Title: Interdisciplinary research, heterogeneous data, and the case for Archives of Convenience
Description: Earth Science data is measured in petabytes, represents decades of data collection and evolution of technology and practices, and provides an unparalleled view of our planet. The pace of change is only accelerating: NASA and other agencies are on their way to making hundreds of petabytes of data available in the cloud, and highly scalable processing and analysis architectures and tools are in active use, with more being developed every day. Each of these brings with it opportunities for optimization and innovation. This talk demonstrates leveraging the elastic nature of the cloud using GOES-16 data to create ephemeral Archives of Convenience, targeting individual researcher needs, optimized for their problems and tool suites, instead of trying to settle on a single "cloud optimized" solution.

Ilya Khamushkin, Intertrust (10 min)
Title: Earth Data for Everyone
Description: At Intertrust, we believe that working with Earth science data should be easy. Too often, file formats, transfer protocols, and cumbersome access interfaces make it difficult for users without domain knowledge to incorporate these data into their workflows. During this session we’ll share our experiences from the past five years building and operating the Planet OS Datahub, our cloud-based data-as-a-service platform.

Marty J. Sullivan, Cornell University (10 min)
Title: The Need for Data Lakes in Climate Science
Description: Climate data is massive. The archive data formats used in the field are difficult to retrieve and analyze, and the data comes from many different sources. Learn how and why Cornell University’s Department of Earth & Atmospheric Sciences is moving toward building geospatial data lakes in Amazon S3 and using tools like Amazon Athena.

Sudhir Shrestha, ESRI (10 min)
Title: Scientific Earth Science Data to Cloud Optimized Web Services
Description: Working with Earth science data to extract information can be challenging because of its diversity and complexity. In this session, we will demonstrate real-world examples of successful applications of open Earth science data in the ArcGIS platform. We will briefly share the workflow for optimized scientific data management (ingesting, managing, analyzing, and sharing) in the cloud and show how you can quickly spin up web applications to share your information products, including analytics, with a larger community. We will share a few use cases, such as NOAA High-Resolution Rapid Refresh (HRRR), Sentinel data, and other web map applications, that demonstrate how we access large collections of near-real-time data stored on-premises or in the cloud, disseminate them dynamically, process and analyze them on the fly, and serve them to a variety of geospatial applications.

General discussion (10 min)

Breakout groups: focus on tools and services (30 minutes)

(Continue conversation over coffee - 30 minutes)




Tuesday July 17, 2018 9:30am - 11:00am
Pima
  • Subject Jump In, Deep Dive
  • Remote Participation Link https://global.gotomeeting.com/join/752150301
  • Remote Participation Access Code 752-150-301
  • Remote Participation Phone # (646) 749-3129 More phone numbers Australia: +61 2 9087 3604 Austria: +43 7 2081 5427 Belgium: +32 28 93 7018 Canada: +1 (647) 497-9391 Denmark: +45 32 72 03 82 Finland: +358 942 72 1060 France: +33 170 950 594 Germany: +49 692 5736 7317 Ireland: +353 16 572 651 Italy: +39 0 247 92 13 01 Netherlands: +31 202 251 017 New Zealand: +64 9 280 6302 Norway: +47 21 93 37 51 Spain: +34 932 75 2004 Sweden: +46 853 527 827 Switzerland: +41 435 5015 61 United Kingdom: +44 20 3713 5028
  • Tags Cloud Computing, Data Analytics

11:30am

Optimizing Data for the Cloud


Data Optimization for the cloud: Data Formats (July 17th, 11:30 am – 1:00 pm):

AGENDA


Otis Brown and Jonathan Brannock, CICS-NC (10 min)
Title: Big Data Project (BDP) Data Broker Update
Description: The NOAA Big Data Project Data Broker role and the current datasets being provided by CICS-NC are reviewed. NOAA datasets under consideration for provision to the cloud partners are described. An update on GOES-16 accession from AWS S3, including usage by volume and users, is given. New policy challenges associated with reformatting datasets and online updates are discussed.

Rich Signell, USGS (10 min)
Title: Cloud-friendly ndarray formats
Description: There is a tremendous amount of scientific multidimensional array (ndarray) data stored in NetCDF or HDF files. Since the cloud uses object storage, not conventional filesystems, there is a need for a "cloud-friendly" storage format that can support the NetCDF and HDF data models. Several solutions have been proposed, including HSDS, Zarr, TileDB, and S3-Netcdf, and these can be compared with FUSE, which provides a POSIX layer to make object storage look like a filesystem. This talk will discuss what the Pangeo project is doing to explore these data formats and the challenges that remain for the community.

Rob Emanuele, Azavea (10 min)
Title: Cloud Optimized GeoTIFFs: enabling efficient cloud workflows
Description: Cloud Optimized GeoTIFFs (COGs) are a raster data format that is key to enabling cloud-native geospatial workflows. COGs enable faster reading, writing, and processing of raster data on the cloud without the need for local copies. This talk will include a brief overview of what COGs are and show examples of how they can be used to leverage cloud deployment for research and application development.

John Readey, The HDF Group (10 min)
Title: HDF Data in the Cloud
Description: Amazon S3 is a great storage technology for the cloud: scalable, cost-effective, and with built-in redundancy. However, HDF5 files stored on S3 traditionally haven’t worked well (or at all) with applications that expect data to be stored on POSIX filesystems, requiring files to be copied to local storage before being accessed. To enable HDF data for cloud-based analytics over massive datasets, The HDF Group has developed new methods for storing HDF data on S3 that take full advantage of the storage platform, allow data to be accessed in place, and are compatible with existing applications. This talk will review these technologies and outline some future directions.

General discussion (10 min)

Breakout groups: focus on data formats (30 min)

Report findings from breakout groups (10 min)






Tuesday July 17, 2018 11:30am - 1:00pm
Pima
  • Subject Jump In, Deep Dive
  • Remote Participation Link https://global.gotomeeting.com/join/752150301
  • Remote Participation Access Code 752-150-301
  • Remote Participation Phone # (646) 749-3129
  • Tags Cloud Computing, Data Analytics
 
Wednesday, July 18
 

2:00pm

Using Jupyter for Cloud-based Analysis
Python and cloud computing have become ubiquitous in the Earth Science community. Jupyter provides a browser-based workbench environment for researchers to create sharable notebooks with code snippets that interact with data and services on the internet. Participants in this workshop will learn about using Jupyter notebooks to interact with cloud-based services for scientific research and analysis.


Wednesday July 18, 2018 2:00pm - 3:30pm
Pima
  • Subject Jump In
  • Remote Participation Link https://global.gotomeeting.com/join/752150301
  • Remote Participation Access Code 752-150-301
  • Remote Participation Phone # (646) 749-3129
  • Tags Cloud Computing