NETL researchers, working closely with experts at the U.S. Department of Energy (DOE) Office of the Chief Information Officer (OCIO), have designed a multi-cloud computational solution that complements on-site resources and will accelerate clean energy research across the agency.
The team then tested the cloud environment using the powerful NETL-developed deep-learning tool SmartSearch©, which helps to mitigate one of the biggest draws on a researcher’s time — searching for, acquiring, and transforming relevant data.
“DOE has some of the fastest and most powerful scientific computing clusters in the world,” said NETL’s Kelley Rose, technical director for the Lab’s Science-Based Artificial Intelligence and Machine Learning Institute (SAMI). “These systems are always in high demand. They’re also configured for key DOE-aligned applications, and cloud computing allows for additional and complementary compute resources in support of DOE research and development (R&D).”
The computing capability developed by Rose and her colleagues can run on any cloud-based service, but the team worked closely with Google Cloud Platform (GCP) to architect and first demonstrate an efficient and cost-effective solution that would work for researchers across DOE. The effort matured NETL’s SmartSearch tool while testing the GCP environment for the OCIO, a win for NETL, OCIO and DOE at large.
“OCIO was looking for research teams with big AI computing needs to help test the GCP environment for DOE authorization to operate,” Rose said. “And NETL was looking for the right platform to help scale up our award-winning SmartSearch tool as a hybrid cloud system that would support infinite scalability for DOE R&D. By testing with SmartSearch, we helped fulfill an agency-wide need while also advancing the Lab’s research interests.”
SmartSearch represents a paradigm shift in the way researchers search for data. Rather than typing keywords into a search engine and skimming the results, researchers feed SmartSearch a collection of data similar to what they hope to find. The tool ingests this data, analyzes the content and finds similar data by crawling multiple data repositories, which can include internal data stores or open-source assets, including the World Wide Web. After finding potentially matching data, the tool's recommendation engine compares the new data to the data it was fed and informs the researcher of the relevance of the new data. SmartSearch can run on systems ranging from a desktop computer to a cloud cluster.
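The "search by example" idea described above can be illustrated with a small sketch. This is not SmartSearch's actual method, which the article describes only as a deep-learning tool; it substitutes a plain bag-of-words cosine similarity, and every function name here (`tokenize`, `term_vector`, `cosine_similarity`, `rank_candidates`) is hypothetical and chosen for illustration only. It shows the general shape of the workflow: build a profile from the data the researcher supplies, then score each newly found item by how closely it resembles that profile.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_vector(texts):
    """Combine a list of texts into one term-frequency vector."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no measurable similarity
    return dot / (norm_a * norm_b)


def rank_candidates(seed_texts, candidates):
    """Score each candidate against the seed collection, most relevant first.

    Assumption: relevance is approximated by word overlap with the seed
    data, a simple stand-in for a learned similarity model.
    """
    seed = term_vector(seed_texts)
    scored = [(cosine_similarity(seed, term_vector([c])), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    seeds = ["natural gas pipeline locations", "oil wells and refineries"]
    found = ["recipe for apple pie", "gas pipeline and refinery map"]
    for score, text in rank_candidates(seeds, found):
        print(f"{score:.2f}  {text}")
```

In this sketch the seed collection plays the role of the data a researcher feeds the tool, and the ranked scores play the role of the relevance feedback; a production system would replace the word-count vectors with learned representations and the candidate list with results from crawling repositories.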
Rose and her team first realized the potential of a hybrid cloud environment when they worked to develop the Global Oil and Gas Infrastructure (GOGI) database, winner of a Carnegie Science Award. GOGI provides decision-makers with critical information on natural gas infrastructure and has been used in advanced studies to understand fugitive methane emission risks across the globe and to address public health, safety and security concerns.
“We were able to use SmartSearch to deliver GOGI in four months by first searching using our own expertise and then feeding these data sets to SmartSearch to find additional relevant data,” Rose said. “The database includes information about more than 6 million individual features — such as wells, pipelines and refineries — from over 1 million data sets in 193 countries and Antarctica. The data processing required for this type of big data project sometimes strained on-premises resources, and we realized that a multi-cloud approach could be used to overcome these scaling limitations.”
Once the team developed the needed multi-cloud solution using GCP, they used real projects as training resources to test how it would perform for researchers. For example, the team leveraged SmartSearch using GCP for materials data curation. The project, sponsored by DOE’s Advanced Manufacturing Office, resulted in the generation of two data sets comprising more than 50,000 resources for specific types of alloy for which only sparse data previously existed.
Those data were subsequently used by materials scientists in the project for advanced modeling of high-temperature alloy performance, filling in data gaps and improving model performance. In another example of testing SmartSearch using the GCP environment, NETL researchers enriched carbon storage data sets available on DOE’s data repository, the Energy Data eXchange®, producing thousands of new results. This open carbon storage database is already being used by the SMART initiative to support artificial-intelligence-informed models to address subsurface energy storage and resource characterization goals.
“These and other projects illustrate how our hybrid cloud approach and our SmartSearch tool can work together to support and accelerate clean energy research,” Rose said. “And we are just getting started. We have current projects leveraging SmartSearch using GCP. We have another materials discovery project through DOE’s eXtremeMAT consortium. We also have a project automating data discovery for ocean and geohazard analyses that is utilizing these capabilities to enrich their efforts as well. SmartSearch is a tool built by researchers for researchers, and with this collaboration with OCIO, we have delivered a solution that will parse the information ‘forest’ to see the data ‘trees’ needed by the world’s top energy researchers at DOE.”
NETL is a U.S. Department of Energy national laboratory that drives innovation and delivers technological solutions for an environmentally sustainable and prosperous energy future. By leveraging its world-class talent and research facilities, NETL is ensuring affordable, abundant and reliable energy that drives a robust economy and national security, while developing technologies to manage carbon across the full life cycle, enabling environmental sustainability for all Americans.