Andy Burnett is CEO of Knowinnovation Inc, a consulting firm with offices in Cambridge, Paris, and Buffalo NY. The firm focuses on mechanisms to accelerate scientific innovation.
Tuesday, September 15, 9:00 AM
HUBzero provides a range of unique features, but in a world where new services are coming online daily, how can we take advantage of those developments? Should we think of HUBzero as our hub, or do pieces of HUBzero become parts of other portals? This talk will cover a range of experiments we have been conducting to help answer this question.
A geographer by education, Steve spent the past 10 years working with universities and state and local governments on adopting Amazon Web Services’ cloud platform and cloud-based Geographic Information Systems. Steve joined AWS in early 2012 and works with universities on integrating AWS into their enterprise IT portfolio, delivering AWS as-a-service, and supporting researchers’ use of AWS across a variety of scientific disciplines. Prior to AWS he worked at Esri, the world’s leading provider of GIS software, and graduated with a degree in Geography from Ohio Wesleyan University. Steve lives outside of Philadelphia, Pennsylvania, tries to spend as much time as possible with his family, and is generally drawn toward anything related to earth sciences, cloud computing, and Philadelphia sports teams.
Monday, September 14, 11:00 AM
Amazon Web Services is used by over 1 million customers and 4,500 academic institutions around the globe. AWS users leverage Amazon’s cloud for everything from mission-critical space operations to launching new business and social apps to running large compute jobs across thousands of cores. AWS 101 will teach you about AWS, the various infrastructure and cloud services available in Amazon’s cloud, and how they impact science, research, and higher education. *Please also see the hands-on workshop teaching attendees how to launch the HUBzero Essentials AMI in AWS.*
Mr. Matthew Jacobsen is the project manager and technical lead for the development and acquisition of enterprise scientific toolsets for the Materials and Manufacturing Directorate (RX) of the Air Force Research Laboratory. Prior to leading RX-wide efforts such as the Integrated Collaborative Environment (ICE), he served as a systems developer and analyst for a variety of software platforms, including financial and portfolio management, scientific process management, and data analysis. Mr. Jacobsen holds a BS in Management Information Systems, and an MS in Logistics and Supply Chain Management, both from Wright State University in Dayton, Ohio.
Tuesday, September 15, 10:30 AM
The dramatic rise in the volume, complexity, and utility of digital research data has triggered a need for scientific software systems that can manage these data across a laboratory enterprise. With no turnkey solutions available for this very difficult task, highly customizable, modular architectures implemented within the organization to meet specific organizational needs are required. The development of these architectures presents a tremendous opportunity to address the unique and complex requirements of individual research teams, but they are not without tradeoffs relative to their commercial off-the-shelf (COTS) counterparts. The Integrated Collaborative Environment (ICE), currently under development at the Air Force Research Laboratory Materials and Manufacturing Directorate, is one such architecture, employing many different technologies, including the HUBzero platform. ICE is the conduit through which Integrated Computational Materials Science and Engineering (ICMSE) is delivered to the Air Force Research Laboratory.
An overview of the current state of ICE implementation will be provided, with particular emphasis on the federated integration of numerous disparate components into a single software and hardware ecosystem. Key aspects such as material intelligence, laboratory management, identity and digital object management, robust API development, visual workflow, and machine integration will be explored. Additionally, three case studies will be presented to show how process-specific requirements, including data collection and archival, sample/specimen management, and laboratory workflow, are addressed by various ICE components. Several themes will be emphasized, including integration of numerous technologies with HUBzero, short lead time development cycles to mitigate failure risks, and the importance of software developers and materials scientists cooperating during the development of ICMSE software systems.
Initially a marine biologist focusing on population structure, Yvan received a PhD in quantitative genetics and genomics from the University of Rennes. After a one-year postdoc at INSERM dedicated to integrative genomics, he investigated an e-Science approach for the life sciences during a three-year postdoc project at INRIA/IRISA Rennes. One outcome of this project, called e-Biogenouest, is an innovative Virtual Research Environment (VRE) based on existing open-source IT solutions and standards for the life sciences communities of western France. The success of this VRE proof of concept, which focused on scientific collaboration, project management, and data management and analysis, led to the CeSGO project, funded by the Brittany region for three years. Yvan is now working on this project, which aims to develop and implement the first French e-Science center and to extend the VRE concept to other scientific domains such as the human sciences, mechanics, and electronics.
Tuesday, September 15, 9:30 AM
Research processes in the life sciences are evolving at a rapid pace. This evolution, driven by technological breakthroughs, makes it possible to address more ambitious scientific problems and generalizes the digital nature of research data in the life sciences. While the current data deluge represents a challenge, it also offers an opportunity to change and enhance the way we tackle research tasks and disseminate science. For some disciplines, scientific data management and analysis must be reconsidered in order to offer services and developments matching these new uses.
Adopting a system-of-systems strategy, we have used the Galaxy portal in combination with ISAtools and HUBzero to build a life sciences Virtual Research Environment (VRE). Each tool offers complementary functionality: the ISAtools software suite for metadata management, HUBzero for scientific collaboration, and Galaxy for computation. The resulting combination allows scientists to manage their projects from collaboration to data management and analysis. This VRE is being tested in partnership with scientific communities in western France. The evaluation will give us insights into the usage and acceptance of new tools in a scientific field undergoing profound modification of its traditional processes.
Gerry McCartney serves as system CIO and is responsible for overseeing Purdue University’s information technology organization. Under McCartney's leadership, Purdue has developed the nation's largest campus cyberinfrastructure for research, with five supercomputers listed in the internationally known Top500 list. During his tenure, Purdue has developed some of the nation's most advanced learning and classroom technologies, including Signals student data analytics program and a portfolio of mobile student learning apps. McCartney is also an associate professor in Purdue's College of Technology and is the inaugural recipient of the Olga Oesterle England Professorship of Information Technology.
Monday, September 14, 11:30 AM
Michael McLennan received a Ph.D. in 1990 from Purdue University, supported as an SRC Graduate Fellow, for his dissertation on dissipative quantum mechanical electron transport in semiconductor heterostructure devices. He spent 14 years working in industry at Bell Labs and Cadence Design Systems, developing software for computer-aided design of integrated circuits. He returned to Purdue in 2004 as a senior research scientist working on nanoHUB.org. He has been director of the HUBzero Platform for Scientific Collaboration since its inception in 2008. HUBzero powers more than 60 scientific Web sites around the world.
Monday, September 14, 8:30 AM
Welcome to the world of HUBzero! A world where new scientific web sites can be created in minutes via Amazon Web Services, or in hours via Red Hat and Debian packages running on your own hardware. A world where researchers can create their own space online, to share research data and collaborate privately before publishing their data and analysis tools for the scientific community. A world where educators can leverage interactive tools for hands-on homework assignments, and share teaching materials with other educators. A world that is continually changing as new capabilities are added to the platform to support discovery and learning in new ways. This talk will review the latest changes to the HUBzero platform and the current state of the hub community.
Saurabh Sinha is an Associate Professor of Computer Science and the Institute of Genomic Biology at the University of Illinois, Urbana-Champaign. He received his Ph.D. in Computer Science (2002) from the University of Washington, Seattle, and did post-doctoral work under Prof. Eric Siggia at the Rockefeller University in New York. Prof. Sinha’s research interests include regulatory genomics and evolution. He also serves as the co-Director of the NIH-funded BD2K Center of Excellence at UIUC.
Monday, September 14, 9:30 AM
I will describe our ongoing work on the development of the “Knowledge Engine for Genomics” (KnowEnG), an e-Science framework for genomics in which biomedical scientists will have access to powerful methods of data mining, network mining, and machine learning to extract knowledge from genomics data. Scientists will come to KnowEnG with their own data sets in the form of spreadsheets and ask KnowEnG to analyze those data sets in light of a massive knowledge base of community data sets called the “Knowledge Network” that will be at the heart of the system. Associated discovery projects aimed at testing the utility of KnowEnG span a broad range of subjects, from pharmacogenomics to transcriptomics of social behavior. We are using HUBzero as the framework for constructing KnowEnG.
Dr. Carol Song is a Senior Research Scientist and director of the Scientific Solutions group at the Rosen Center for Advanced Computing, Purdue University. Carol received her Ph.D. in computer science from the University of Illinois at Urbana-Champaign. Her current research interests include high performance computing and distributed systems, cyberinfrastructure science, and data-driven methods and applications. Carol is the principal investigator for several NSF-funded projects, including DRINET, TeraGrid, and XSEDE. She served as Chair of the XD Service Provider Forum from 2011 to 2013 and is currently a member of the XSEDE advisory board. Her most recent project is an NSF Data Infrastructure Building Blocks (DIBBs) implementation grant to develop and integrate geospatial tools and data support into HUBzero, furthering the HUBzero platform for scientific collaboration.
Monday, September 14, 10:30 AM
Geospatial data are present everywhere today with the proliferation of location-aware computing devices. This is especially true in the scientific community, where large amounts of data are driving research and education activities in many domains. Collaboration over geospatial data, for example in modeling, data analysis, and visualization, must still overcome the barriers of specialized software and expertise, among other challenges. In addressing these needs, the GABBs project aims to build geospatial modeling, data analysis, and visualization capabilities into HUBzero. Funded by NSF’s Data Infrastructure Building Blocks initiative, GABBs is creating a geospatial data architecture that integrates spatial data management, mapping, visualization, and interfaces in HUBzero, and will make these available through open source releases. This presentation will report on and demonstrate the progress made in the GABBs project, and seek feedback and suggestions from the HUBzero developer and user communities.
Amy Walton is a Program Director in the Advanced Cyberinfrastructure Division at the National Science Foundation, contributing to three major activities in which NSF has a leadership role.
She directed a series of advanced research programs for the processing, analysis, management and visualization of Earth and space science data at the California Institute of Technology Jet Propulsion Laboratory. She has a Ph.D. from Princeton University.
Monday, September 14, 9:00 AM
NSF is involved in a number of challenging research initiatives, such as understanding the human brain; making interdependent critical infrastructure systems more resilient; securing and protecting food, energy and water resources; designing cyber-enabled materials that sense, respond and adapt to the environment; and educating and training professionals in emerging STEM fields. These initiatives address questions of increasing complexity, and require multidisciplinary approaches and expertise. Unprecedented growth in data (both simulations and observations) and rapid advances in technology (instrumentation at all scales, and cyberinfrastructure deployed to connect, compute, visualize, store, and discover) are changing the conduct and practice of science.
Cyberinfrastructure plays a critical role in these research initiatives. Through engagement across NSF directorates, the Advanced Cyberinfrastructure Division is developing collaborative cyberinfrastructure programs that support multidisciplinary research: examples include Data Infrastructure Building Blocks (DIBBs); Software Infrastructure for Sustained Innovation (SI2), and Computational and Data-enabled Science and Engineering (CDS&E). Other mechanisms include public access policies, interdisciplinary collaboration, and interagency and international partnerships. While these investments and activities are accelerating the progress of scientific discovery and innovation, significant challenges remain.
As an Entrepreneur in Residence, Michael helps Purdue faculty and students commercialize their innovations. As a senior research scientist and assessment team lead, his research focuses on studying data-driven user behavior patterns on nanoHUB to determine the impact of nanoHUB on the international community in education and in advancing science, as well as on developing visualizations to illustrate this impact. Michael is also currently the CEO of SPEAK MODalities, a Purdue startup with software that helps children with autism develop language skills. Prior to joining Purdue, Michael was a founder or senior team member of several information technology startup companies, where he created innovative solutions for extracting patterns from data, collaboration, and constrained optimization. Michael has consulted with many Fortune 500 companies to apply these technologies to business problems including operations scheduling, strategic capital investment, process improvement, and new product innovation and creation. Michael holds a BS in Chemical Engineering from the University of Illinois, an MS and Ph.D. in Chemical Engineering from Purdue University, an MBA from Purdue’s Krannert School of Management, and an MBA from the TIAS Business School of Tilburg University in The Netherlands.
Tuesday, September 15, 8:30 AM
Like employees, data are resources necessary for getting the job done. Data serve essential roles for analyzing impact retrospectively, taking immediate tactical action, or strategically planning the future. Unlike employees, we do not get to choose our data. Rather, they or the situation that creates them are placed upon us; and it is our job to learn how to manage these data to extract the value they contain. The focus of this talk will be on several traits of data that are analogous to those of bad employees, with examples from applications of nanoHUB and CatalyzeCare in the domains of nano materials safety, medical device informatics, insurance claims, and hub usage information.
Larry Biehl is a research engineer in the Scientific Solutions group in Research Computing, Purdue University. He received his B.S. in Electrical Engineering in 1973 and his M.S. in Engineering in 1974, both from Purdue University. Since that time he has been involved in many remote sensing and image processing projects at Purdue. He currently manages the Purdue Terrestrial Observatory, serves as a resource for geospatial software licensing on campus, directs IndianaView, and works on the U2U (Useful to Usable) and GABBs (Geospatial Modeling and Data Analysis Building Blocks for HUBzero) projects.
Monday, September 14, 3:30 PM
MultiSpec is a remote sensing analysis tool being developed on MyGeoHub.org as part of the NSF-funded Geospatial Analysis Building Blocks (GABBs) project and the USGS-funded IndianaView/AmericaView project. The tool has been adapted from the desktop Macintosh and Windows versions (https://engineering.purdue.edu/~biehl/MultiSpec/). MultiSpec is an image processing program for displaying and analyzing geospatial images. The current working version on MyGeoHub, a hub dedicated to geospatial data applications, is a subset of the Macintosh and Windows desktop applications. Its features include displaying images in many different file formats (GeoTIFF, TIFF, PNG, HDF4, HDF5, and netCDF, to name a few), handling images with one to hundreds of channels (bands), running unsupervised classification (cluster) analyses, running principal components analyses, and generating new images from combinations of the channels in the original images, such as vegetation indices or principal components images. Several tutorials are available that can be used as the basis for remote sensing image analysis training sessions. There is interest in using this web-based tool in some teacher training workshops in the coming months.
James Fourman is a software engineer and the Lead Developer for the Integrated Collaborative Environment (ICE), an enterprise scientific toolset for the Materials and Manufacturing Directorate of the Air Force Research Laboratory (AFRL). James is responsible for managing the task workload for the other ICE developers, developing and maintaining the Common Service Bus (CSB) at the core of the ICE platform, and managing the software development life cycle for deployments of ICE. James received a BS in Computer Engineering from Wright State University in Dayton, Ohio.
Tuesday, September 15, 3:20 PM
The Integrated Collaborative Environment (ICE) is a federated software framework that aims to connect disparate systems for the purpose of enhanced productivity in the AFRL community. To meet demands for greater efficiency and better research, ICE acts as a common mediator between numerous optimized services. Achieving that goal of federated interconnection requires a central system communication channel. Without such a channel, all sub-systems (both hardware and software) within the architecture require direct connections, including myriad extraction and translation routines. For n systems in the architecture, one would need (n*(n-1))/2 connections for each system to communicate with every other. The sheer programming workload required to create such an architecture is unmanageable for most software development teams.
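The quadratic growth of point-to-point wiring is easy to verify. A minimal sketch (the system names here are invented for illustration, not taken from ICE):

```python
# Compare point-to-point integration counts against a shared bus:
# full pairwise wiring of n systems needs n*(n-1)/2 connectors,
# while a bus needs only one adapter per system.
from itertools import combinations

def pairwise_connections(systems):
    """Every pair of systems needs its own connector (plus translators)."""
    return list(combinations(systems, 2))

systems = ["LIMS", "HPC", "Hub", "Archive", "Instrument"]  # hypothetical names
pairs = pairwise_connections(systems)
n = len(systems)
assert len(pairs) == n * (n - 1) // 2  # 5*4/2 = 10 connectors
print(f"{n} systems -> {len(pairs)} point-to-point connectors vs {n} bus adapters")
```

At n = 5 the difference is already 10 versus 5, and it widens quadratically as systems are added, which is the motivation for a central bus.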
The Common Service Bus (CSB) aims to become the broker of all transactions of ICE parties, with several objectives in mind: to reduce the number of custom interconnections needed, to ensure consistency of data transmission, to foster identity management for all system objects, and finally to enable otherwise self-governed systems to participate in the ICE ecosystem. An implementation case study will be presented with a focus on the creation of a robust RESTful API that provides controlled access to research-dependent data models residing in disparate systems. This case study will detail platform-agnostic connections between CSB and the Hub.
Elçin Içten is a PhD candidate in the School of Chemical Engineering at Purdue University. Elçin is working on the development and control of a dropwise additive manufacturing process for pharmaceuticals. Prior to joining Purdue, Elçin received her BS in Chemical Engineering from Bogazici University, Istanbul, Turkey.
Monday, September 14, 2:30 PM
A workflow-based knowledge management system, KProMS, that functions as a HUBzero component has been developed at Purdue University. It captures the complete provenance of knowledge by modeling the details of the associated knowledge generation steps as workflows. Its unique workflow representation captures relationships between the processing steps, material and information flows, and data input and output. The general framework can be used for experimental, scientific, and business workflows as well as manufacturing recipes. In this talk, we will discuss the use of KProMS to manage and analyze experimental data for an innovative test bed for manufacturing drug products using dropwise deposition of drug formulations onto edible substrates.
The processing steps in the test bed were modeled as a workflow. For each run, a new instance of the workflow was created and values of the operating parameters were specified. Selected values were then uploaded from the hub to the LabVIEW system that implemented the specified conditions. After the run, the data generated by LabVIEW were uploaded to the workflow. Thus, the workflow instance provided the full context for the associated run. KProMS has built-in workflows for performing preliminary statistical analysis and generating graphs. The workflow-based framework facilitates data selection for analysis or display: any parameter value or column of data can be uniquely identified by a four-tuple, and context-sensitive choice lists make the selection process intuitive. As demonstrated in this presentation, KProMS was used by a team of undergraduate research students during their summer internship to manage the data associated with their experimental work.
Rajesh is a graduate student working in the Scientific Solutions Group at RCAC, Purdue University. He has been primarily involved in building data-driven collaborative web applications and tools on the HUBzero platform, with a focus on geospatial data. He previously worked on data portals in the GridSphere framework that leverage HPC resources to run climate data simulations. In a parallel life he is striving to advance the state of the art in artificial intelligence by building an interactive proof assistant.
Monday, September 14, 4:10 PM
As part of an effort to establish a community-based data sharing environment, we previously developed a HUBzero-based web component, iData. Built on the open-source data management software iRODS, iData provides hub users and tools with the ability to publish, manage, discover, and consume data in a variety of file formats. The hub’s user groups in turn provide a natural data sharing framework for iData. By managing user files in iRODS, iData can expose a familiar nested directory structure for organizing files and leverage the iRODS metadata catalog to enable indexed searches. In response to growing data storage demands, new storage resources can be easily integrated into the iRODS server while still providing access to them under a unified namespace. Rather than forcing the user to go through the cumbersome process of entering metadata, iData uses the iRODS support for trigger-based actions to automatically extract and capture some file metadata (in particular geospatial metadata) on upload. This potentially avoids duplicated work when metadata has already been attached to a file during creation, for instance in simulation tool outputs. In this talk we will describe how iRODS was used in developing the iData component for self-managing and sharing scientific data on the hub. We will also discuss options for more seamless integration of iRODS into HUBzero: for example, using iRODS as a storage solution for other file-intensive areas of the hub such as Hub Projects, federating data and tools across hubs, and constructing non-trivial hub tool workflows leveraging the iRODS rule engine and trigger-based event handling.
Derrick Kearney is a software engineer for HUBzero and Purdue University.
Tuesday, September 15, 1:30 PM
Drew is an Assistant Professor in the Department of Biology at The College of William and Mary. He completed his PhD in mathematics at The University of Texas at Austin in 2005.
His research interests are in quantitative biology, broadly defined to include mathematical modeling, data analysis, and computer simulation. He is a principal investigator of the NSF sponsored QUBES project (Quantitative Undergraduate Biology Education and Synthesis - https://qubeshub.org), which focuses on strengthening quantitative skills for both faculty and students in the undergraduate life sciences curriculum.
Monday, September 14, 1:50 PM
QUBES (Quantitative Undergraduate Biology Education and Synthesis) is a 5-year, multi-institutional, NSF-funded project that addresses a “call to action” put forth in reports by many institutions, including NSF, NIH, AAAS, and HHMI. The challenges in these reports include preparing biology students for 21st-century science by addressing the need for students to gain skills in quantitative biology (QB) reasoning and literacy. Quantitative, as we see it, encompasses the mathematical (modeling), computational (simulation), and statistical (data analysis) realms.
In this talk I will quickly introduce the five key components of the QUBES project: the Consortium (institutions and societies with common goals in QB education), Hub (the online virtual community leveraging HUBzero infrastructure), Faculty Mentoring Networks (supporting faculty in the understanding and implementation of QB in the classroom), Metrics (redefining professional metrics of success in QB education), and Implementation Research (studying project success). For the remaining time, I will discuss some project visions for QUBES Hub, with possible topics including the exploration of what does and does not work in creating faculty mentoring spaces online, attempts to realize the 5R permissions of Open Education Resources (retain, reuse, revise, remix, and redistribute), and interactive repositories for models and data.
Tuesday, September 15, 4:00 PM
The Integrated Collaborative Environment (ICE) is a federated software framework that aims to connect disparate systems for the purpose of enhanced productivity in the AFRL community. To meet demands for greater efficiency and better research, ICE acts as a common mediator between numerous optimized services. Achieving that goal of federated interconnection requires a common language to map data from one system to another. That common language must be capable of ensuring data integrity and traceability while providing the convenience of an interconnected web of software and services.
Rather than adopting an existing persistent identifier (PID) scheme such as DOI or Handle, the ICE team opted to create a custom framework which addresses the unique requirements of DOD research, such as security. These persistent identifiers are the first-class master names for all objects that exist within ICE, guaranteeing that users can transfer and house data within the ICE ecosystem as they wish, all while retaining a complete map of how data relate to one another and where they live. Data elements can then be related from one system to another through the Hub, for instance, such that all elements relating to some defined category can be depicted with full confidence to the user. Moreover, ICE PIDs are agnostic to the platforms with which they communicate, enabling a PID to act through the ICE Common Service Bus as a mediator in truly federated exchanges. Through the use of PIDs, the research community is provided with data that are more accessible, more accurate, and more traceable.
Juan Lalinde has a BS in Computer Science, a BS in Mathematics, and a PhD in Telecommunications. He is a professor at EAFIT University in Medellin, Colombia, South America, and is the Scientific Director of Apolo, the scientific computing center at the same university. He has been a visiting researcher at the Institute for Human and Machine Cognition (IHMC) and Purdue University.
Monday, September 14, 2:10 PM
IMS Learning Tools Interoperability (IMS LTI) is a standard specification that allows rich learning applications to be integrated into learning environments. Its main goal is to enable a transparent relationship between subsystems within a learning environment, where users can be authorized to use external tools in a secure way. Since HUBzero has the ability to run simulations, evaluate models, and integrate with other platforms through its plugins and components, this talk presents a solution developed at EAFIT University that allows any educational institution to integrate Moodle (an LMS) with HUBzero using LTI, allowing Moodle users to use the tools provided by HUBzero without a formal registration.
The solution was implemented as a plugin for the HUBzero platform, with its services exposed as an API for the LMS to consume, achieving a fluent user experience between the two systems.
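LTI 1.1 launch requests are plain HTTP POSTs signed with OAuth 1.0 HMAC-SHA1, which the receiving tool must verify before trusting the user identity they carry. The sketch below shows that signature check in outline; the function names, URL, and parameter values are illustrative, not taken from the EAFIT plugin.

```python
# Minimal sketch of verifying the OAuth 1.0 HMAC-SHA1 signature on an
# LTI 1.1 launch request, as a tool-provider plugin might do it.
import base64
import hashlib
import hmac
from urllib.parse import quote

def _enc(s: str) -> str:
    # RFC 3986 percent-encoding; quote() already leaves unreserved chars alone.
    return quote(str(s), safe="")

def lti_signature(method, url, params, consumer_secret):
    """Compute oauth_signature over the normalized request parameters."""
    items = sorted((k, v) for k, v in params.items() if k != "oauth_signature")
    param_str = "&".join(f"{_enc(k)}={_enc(v)}" for k, v in items)
    base = "&".join([method.upper(), _enc(url), _enc(param_str)])
    key = _enc(consumer_secret) + "&"  # no token secret in an LTI launch
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

def launch_is_valid(method, url, params, consumer_secret):
    expected = lti_signature(method, url, params, consumer_secret)
    return hmac.compare_digest(expected, params.get("oauth_signature", ""))

# Hypothetical launch from an LMS consumer:
params = {
    "lti_message_type": "basic-lti-launch-request",
    "lti_version": "LTI-1p0",
    "resource_link_id": "course-42",
    "oauth_consumer_key": "moodle-demo",
    "oauth_nonce": "abc123",
    "oauth_timestamp": "1442232000",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_version": "1.0",
}
url = "https://hub.example.org/lti/launch"
params["oauth_signature"] = lti_signature("POST", url, params, "s3cret")
print(launch_is_valid("POST", url, params, "s3cret"))  # True
print(launch_is_valid("POST", url, params, "wrong"))   # False
```

A production check would also validate the nonce and timestamp to prevent replay, and map the launch's user and role claims onto a hub session.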
Shawn received his M.S. in Information Technology from Purdue University in 2004. With over 16 years of web development experience, his skills and interests run the gamut from server-side programming to client-side scripting to graphic design to accessibility and usability testing.
In spring of 2005 he joined the Network for Computational Nanotechnology as a web developer, working primarily on nanoHUB.org, contributing to both the visual design and code. With the creation of HUBzero, Shawn took on the role of senior web developer, spending his time writing code, doing graphic design, and offering support for various projects. He now manages HUBzero's web development efforts and occasionally finds time to sleep.
Monday, September 14, 1:30 PM
Tuesday, September 15, 1:30 PM
Jeanette Sperhac is a Scientific Programmer at the University at Buffalo's Center for Computational Research (CCR). She earned an M.S. in Computer Science from the University at Buffalo and an M.S. in Chemistry from the University of Colorado. She has worked in both private industry and the public sector.
Jeanette supports vidia.ccr.buffalo.edu, a HUBzero instance hosted at CCR. She is a programmer for XDMoD, an auditing framework that provides metrics for high performance computing centers. Jeanette is also the primary instructor for the Eric Pitman Annual Summer Workshop in Computational Science, hosted at CCR.
Monday, September 14, 1:30 PM
In order to expose undergraduates in the humanities to data-intensive computing, the University at Buffalo's Center for Computational Research (CCR) teamed with the State University of New York (SUNY) at Oneonta.
The resulting HUBzero-based Virtual Infrastructure for Data Intensive Analysis (VIDIA) has now supported three semesters of coursework at SUNY Oneonta. Using VIDIA, Social Science faculty have integrated analysis of large text-based datasets into their coursework and research, using tools such as RapidMiner and RStudio. The environment is now more than 300 users strong.
The collaboration is expanding to additional institutions and disciplines, to prepare more students for careers in data analytics. One new collaborator, SUNY Geneseo, offers courses that use VIDIA for linguistic analysis. Students perform research in digital humanities, focusing on text encoding, collaborative interpretation, text mining, and visualization.
We will present an overview of the recent work supported on VIDIA, student impressions, and the current status of the hub.
Jason Thiese is a software developer and statistician working with the Integrated Collaborative Environment (ICE) development team at the Air Force Research Laboratory (AFRL). Within ICE, Jason has contributed to the development of a graphical workflow system, forms, and keyword metatags. He is also a developer of other software, such as an order management and accounting system for government purchase cards. In a different role, Jason has performed statistical design and analysis at AFRL, and has coauthored two papers investigating the influence of process cycle on geometry and properties of non-autoclave polymer matrix composites. Jason received a BS in Statistics and an MS in Applied Statistics from Wright State University.
Tuesday, September 15, 4:20 PM
A graphical workflow management system is being developed to meet multiple research community needs, such as the coordination of experiments and simulations, data capture using consistent formats, avoidance of data loss, scheduling of resources, and traceability of the overall research process. The workflow management system allows research processes to be created and sequenced using a drag-and-drop flowchart interface. The interface includes the ability to assign processes to specific researchers, specify the computational or experimental resources to use, define inputs and outputs, and set other properties as needed. All workflow entities – including sub-workflows, processes, decisions, and events – can be saved as templates for reuse.
While the web service endpoints are potentially callable from any platform, development efforts are specifically focused on implementing a component/plug-in to tie the workflow management system to HUB projects. A demonstration of the HUB workflow component will be conducted, in which the component will be used to replace spreadsheet-based experimental monitors.
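As a rough illustration of the data model such a system implies (a hypothetical sketch, not the actual ICE implementation; all names here are invented), workflow entities like processes and sub-workflows can be modeled as nodes that serialize to reusable templates:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Process:
    """A single workflow step: who runs it, on what resource, with which I/O."""
    name: str
    assignee: str = ""          # researcher assigned to the step
    resource: str = ""          # computational or experimental resource
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

@dataclass
class Workflow:
    """An ordered sequence of processes; nesting would give sub-workflows."""
    name: str
    steps: list = field(default_factory=list)

    def to_template(self) -> str:
        """Strip researcher assignments so the structure can be reused."""
        tmpl = Workflow(self.name,
                        [Process(s.name, "", s.resource, s.inputs, s.outputs)
                         for s in self.steps])
        return json.dumps(asdict(tmpl), indent=2)

# Example: a two-step experimental monitor saved as a reusable template
wf = Workflow("tensile-test", [
    Process("prepare-specimen", assignee="jdoe", resource="lab-A"),
    Process("run-test", assignee="jdoe", resource="load-frame",
            inputs=["specimen-id"], outputs=["stress-strain.csv"]),
])
template = wf.to_template()
```

The template keeps the structure, resources, and I/O contracts but drops per-project assignments, which is one plausible reading of how saved workflow entities could be reused across research efforts.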
Christopher Thompson has been a research programmer with the Rosen Center for Advanced Computing at Purdue University since 2010. He has contributed to the development of many tools on the DiaGrid hub and is a support consultant for the Scientific Gateways group within the Extended Collaborative Support Services branch of XSEDE.
Tuesday, September 15, 1:50 PM
The DiaGrid hub has a history of creating tools on the HUBzero platform that connect users with scientific models running on computing resources around the world. This presentation surveys the ways HUBzero supports remote execution, with examples drawn from real, working tools on DiaGrid. Through the designs and code snippets of deployed tools, we show how developers can use the platform to submit and monitor remote execution of scientific models within their own tools. We also discuss how developers can work with the HUBzero team to add support for new remote computing resources, and the kinds of issues that arise when adding remote computing to a tool.
Karan Vahi is a Computer Scientist in the Science Automation Technologies Group at USC Information Sciences Institute.
Karan has been associated with the Pegasus Project since its inception, first as a Graduate Research Assistant and then as a full-time programmer. He is currently in charge of development for Pegasus WMS and works closely with the user community to drive its development. He is also involved in two NIH-funded projects, PAGE and CGSMD, where computational workflows are being developed for quality control analysis and imputation analysis. Before that, he was the technical lead for the STAMPEDE Project, which developed high-performance monitoring infrastructure for workflow systems; that work has since been integrated into the Pegasus and Triana workflow systems. From 2006 to 2008, he was also the lead developer on an AFRL/IARPA-funded project to develop a framework for running automated, on-demand intelligence analysis on multiple and varied data sources as part of a Terrorism Surveillance System.
Karan received an M.S. in Computer Science from the University of Southern California and a B.E. in Computer Engineering from Thapar University, India. His research interests include scientific workflows and distributed computing systems.
Tuesday, September 15, 1:30 PM
The HUBzero platform for scientific collaboration enables tool developers to build tools that are easily shared with both researchers and educators, so users can log in and start their analysis without worrying about setup and configuration. Once the analysis is done, researchers can examine the results using built-in capabilities for plotting and visualization. To handle more complex workloads, we have integrated the Pegasus Workflow Management System with “submit”, the main tool that HUBzero tool developers use to send analyses to local and remote compute resources. Pegasus WMS represents an application workflow in an abstract form that is independent of the resources available to run it and of the location of data and executables. It compiles these abstract workflows into an executable form that can run on local or remote distributed resources. Pegasus also captures the provenance of the workflow lifecycle from the planning stage, through execution, to the final output data, which lets users easily debug and monitor computations that occur on remote resources. The advanced data management capabilities of Pegasus allow tool developers to execute the tightly coupled parts of their workloads on an HPC cluster while farming out the remaining tasks to a distributed HTCondor-based computing infrastructure.
The talk will introduce scientific workflows with Pegasus and focus on the integration of Pegasus WMS with “submit”, showing how it enables tool developers, whether using the Rappture toolkit or “submit” directly, to use scientific workflows. Tools currently benefiting from this integration include BLASTer, an online tool to run BLAST on the DiaGrid hub; CryoEM, a tool to reconstruct the 3-D structure of macromolecular assemblies; OpenSees workflows through NEEShub; and parameter sweep workflows that study ballistic transport in field-effect transistors using OSG resources through nanoHUB.
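To make the abstract-versus-executable distinction concrete, here is a minimal, hypothetical sketch (not the Pegasus API; the catalog names and data structures are invented for illustration) of "planning" a resource-independent workflow onto concrete resources:

```python
# Hypothetical sketch of abstract-workflow planning (not the Pegasus API):
# an abstract task names only a logical transformation and logical files;
# "planning" binds each task to a concrete site and resolves physical paths.

from dataclasses import dataclass

@dataclass
class AbstractTask:
    transformation: str   # logical name, e.g. "blastall"
    inputs: tuple         # logical file names
    outputs: tuple

def plan(tasks, site_catalog, replica_catalog):
    """Compile abstract tasks into site-bound, executable job descriptions."""
    jobs = []
    for t in tasks:
        site = site_catalog[t.transformation]            # where it can run
        paths = [replica_catalog[f] for f in t.inputs]   # physical inputs
        jobs.append({"exe": site + "/" + t.transformation,
                     "inputs": paths,
                     "outputs": list(t.outputs)})
    return jobs

# The same abstract workflow could be planned onto a different site catalog
# without any change to the tasks themselves.
abstract = [AbstractTask("blastall", ("query.fa",), ("hits.out",))]
executable = plan(abstract,
                  site_catalog={"blastall": "condor-pool"},
                  replica_catalog={"query.fa": "/data/query.fa"})
```

The point of the sketch is the separation of concerns the abstract describes: the workflow definition never mentions resources or physical file locations, so the planner can retarget it.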
Elizabeth Wirrig serves as the Lead Analyst for ICE, an enterprise scientific toolset for the Materials and Manufacturing Directorate (RX) of the Air Force Research Laboratory. Elizabeth works closely with the scientists and engineers in AFRL, gathering requirements to map out their research processes and finding ways to customize ICE to fit their specific needs. Elizabeth received a B.S. in Management Information Systems from the University of Dayton.
Tuesday, September 15, 3:00 PM
E. Wirrig, M. Jacobsen, L. Simmons, L. O’Connell, K. Servaites, and K. Porter, Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson AFB, OH 45433
One purpose of the Materials Genome Initiative is to reduce the lead time of materials development. In support of this initiative, Integrated Computational Materials Science and Engineering (ICMSE) in the Materials and Manufacturing Directorate at the Air Force Research Laboratory is developing a path towards model based research. A large scale software infrastructure is required to provide these capabilities. In order for this infrastructure to be adopted by the scientific community, it must be tailored to research requirements. In the federal space, most enterprise software development efforts are marked by the failure of commercial turn-key solutions. The Integrated Collaborative Environment (ICE) is a federated software framework that delivers dynamic and customized functionality to a diverse and complex research landscape. A key driver of success for ICE is the utilization of process and data analytics to identify data management requirements of representative research efforts.
Good design practices and requirements definition are an effective means of both developing robust software and defining standard processes. These practices include requirements gathering and elicitation using Business Process Modeling Notation, Unified Modeling Language class diagrams, use cases, and discrete milestone success criteria. This presentation will describe the use of these tools and methods in the context of an Organic Matrix Composite research project. The results are twofold: rapid delivery of functionality for the research team in ICE, integrated with the Hub, and a well-defined sequence for the research process, “research on research.” Such an involved requirements process also fosters an ICMSE culture: a cooperation of software engineering and materials engineering.
Lan Zhao is a research scientist at the Rosen Center for Advanced Computing (RCAC) at Purdue University. She has been working on the design and development of data-driven cyberinfrastructure systems for multiple cross-disciplinary projects, including GEOSHARE (Geospatial Open Source Hosting of Agriculture, Resource and Environmental Data), WaterHUB, U2U (Useful to Usable), GABBS (Geospatial Modeling and Data Analysis Building Blocks in HUBzero), the XSEDE CESM modeling gateway, DRINET (Drought Research Initiative Network), and IsoMAP (Isoscapes Modeling, Analysis and Prediction). Her interests include infrastructure for scientific data storage, retrieval, provenance, and processing; integration and sharing of heterogeneous data sets and models; and composition of data-driven scientific workflows.
Monday, September 14, 3:10 PM
A large climate model output data archive was produced by an unprecedented coordinated effort among more than 20 climate modeling groups around the world through the Coupled Model Intercomparison Project Phase 5 (CMIP5), in collaboration with the Intergovernmental Panel on Climate Change (IPCC). This dataset is critical to advancing our knowledge of climate processes and their impacts on a number of biophysical and social dimensions. However, researchers, especially those from social science domains, often face significant obstacles in discovering, accessing, and analyzing the spatial data, as doing so requires specialized expertise and computational resources. We have developed a climate data aggregation tool to make the CMIP5 data easily accessible and thus more usable. Currently hosted on the GeoShare hub, the tool enables a seamless workflow of selecting data of interest, downloading it from the archive using Globus Online, aggregating the downloaded climate data to a user-specified level, and visualizing the result on a map. In this presentation we will describe the design and implementation of the CMIP5 climate data tool and its impact on the user community.
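At its core, the aggregation step in a pipeline like this amounts to rolling gridded values up to a coarser, user-specified resolution. A toy sketch (hypothetical; not the GeoShare tool's actual code) of averaging point values into one-degree cells:

```python
# Toy sketch of spatial aggregation (hypothetical, not the tool's code):
# average gridded climate values up to a coarser cell size chosen by the user.
from collections import defaultdict

def aggregate(points, cell_deg):
    """points: iterable of (lat, lon, value); returns {cell: mean value}.

    Cells are indexed by floor division, so cell_deg controls the
    user-specified aggregation level."""
    sums = defaultdict(lambda: [0.0, 0])
    for lat, lon, val in points:
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        sums[cell][0] += val
        sums[cell][1] += 1
    return {cell: s / n for cell, (s, n) in sums.items()}

# Two nearby points fall into the same 1-degree cell and are averaged.
grid = [(40.1, -75.2, 10.0), (40.4, -75.1, 14.0), (42.3, -71.0, 8.0)]
regional = aggregate(grid, cell_deg=1.0)
```

A real tool would read NetCDF output and handle area weighting, but the cell-binning idea is the same.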
Monday, September 14, 3:50 PM
Geospatial data is growing rapidly in volume thanks to advances in large observatories, sensor networks, GPS technologies, and personal devices. Such data is crucial to research and education across many disciplines and has a great impact on daily life. However, exploring, analyzing, and sharing geospatial data is not an easy task for researchers and end users. In our past experience, it took a computer science graduate student months of development to create a web application that shared a climate dataset online via a map interface.
The NSF-funded GABBS project (mygeohub/groups/gabbs) is developing building blocks that help non-experts easily develop tools to meet their geospatial data needs. As part of the GABBS project, we created the GeoBuilder tool using the HUBzero platform and the GABBS geospatial building blocks. GeoBuilder guides users through a step-by-step, wizard-style interface to load geospatially referenced CSV files, configure a data viewer on a map, and explore the data by plotting it dynamically. It also allows data providers to save a GeoBuilder-generated viewer of their dataset and share it with collaborators or the public. In this talk we will describe the design and implementation of the GeoBuilder tool and demonstrate how it can be used to share geospatial data with a map interface and plotting functions in less than three minutes, with no programming required.
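For a sense of what the wizard automates, here is a minimal sketch of the parsing step a tool like GeoBuilder performs on a geospatially referenced CSV (hypothetical code with invented column names; the real tool does this, plus the map configuration, through its interface without any user code):

```python
# Hypothetical sketch: parse a geospatially referenced CSV into point
# records that a map viewer could plot. Column names are assumptions;
# the real GeoBuilder wizard lets the user pick them interactively.
import csv
import io

def load_points(csv_text, lat_col="lat", lon_col="lon"):
    """Return [(lat, lon, attributes)] from CSV text with a header row."""
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = float(row.pop(lat_col))
        lon = float(row.pop(lon_col))
        points.append((lat, lon, row))   # remaining columns become attributes
    return points

sample = ("lat,lon,station,temp_c\n"
          "43.0,-78.8,Buffalo,21.5\n"
          "40.4,-86.9,Lafayette,24.0\n")
points = load_points(sample)
```

Everything beyond the coordinate columns is carried along as attributes, which is the shape of data a configurable map viewer and dynamic plots would consume.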