DSN 2017 offers two tutorials!

Tutorial 1: LLFI and the Art of Fault Injection

Presenter: Karthik Pattabiraman, University of British Columbia (UBC), Canada

Abstract

Fault injection has been extensively researched in the DSN community. However, there is currently no easy-to-use framework that allows ordinary developers and testers to inject both hardware and software faults in a customizable fashion. Our group at UBC has been developing one such framework, LLFI, over a number of years (github link). LLFI is based on the popular open-source LLVM compiler and allows fault injection to be combined with program analysis techniques. This enables targeted injection of faults into specific program constructs and data, and easy correlation of the fault injection results with the program's code. LLFI can inject both hardware and software faults, and has been used in a wide variety of fault injection studies (both in our group and elsewhere). LLFI has been designed to be easy to use and has both a GUI and a scriptable command-line interface. It also has a domain-specific language (FIDL) for writing new fault injectors within the framework.

This tutorial will teach the audience the basics of LLFI, its design philosophy, and how to use it. We will start by learning to perform simple fault injection experiments for both hardware and software faults using LLFI. We will also explore how to interpret the results of the fault injection experiments and use them in real-world case studies. Finally, we will delve into the internals of LLFI and learn how to write new fault injectors using FIDL and the framework's APIs.
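For readers new to the topic, the following is a minimal, generic sketch of what a fault injection experiment looks like. It does not use LLFI or any of its APIs: the single-bit-flip fault model, the toy program `count_above`, and the names `flip_bit` and `run_campaign` are all assumptions made for illustration. Each trial corrupts one randomly chosen bit of the input, reruns the program, and compares the result against a fault-free ("golden") run.

```python
import random

WIDTH = 32  # assumed word width for the illustrative fault model


def flip_bit(value, bit, width=WIDTH):
    """Return value with one bit inverted (a single-bit-flip fault)."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def count_above(data, threshold):
    """Toy program under test: count the elements larger than threshold."""
    return sum(1 for x in data if x > threshold)


def run_campaign(data, threshold, trials, seed=0):
    """Inject one bit flip per trial and classify each outcome.

    'benign' means the output matched the golden run (the fault was masked);
    'sdc' means the output silently differed (silent data corruption).
    """
    rng = random.Random(seed)
    golden = count_above(data, threshold)
    outcomes = {"benign": 0, "sdc": 0}
    for _ in range(trials):
        faulty = list(data)
        index = rng.randrange(len(faulty))
        bit = rng.randrange(WIDTH)
        faulty[index] = flip_bit(faulty[index], bit)
        result = count_above(faulty, threshold)
        outcomes["benign" if result == golden else "sdc"] += 1
    return outcomes


if __name__ == "__main__":
    sample = [10, 200, 3000, 40000, 500000]
    print(run_campaign(sample, threshold=1000, trials=1000))
```

A framework such as LLFI automates this loop at the compiler level; the sketch only conveys the inject, run, and compare structure of a campaign.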

Bio

Karthik Pattabiraman is an associate professor in the Electrical and Computer Engineering (ECE) department at the University of British Columbia (UBC). Karthik's research interests span the areas of fault tolerance and software engineering. Karthik received his M.S. and PhD degrees from the University of Illinois at Urbana-Champaign (UIUC). Karthik has been a post-doctoral researcher at Microsoft Research (MSR), Redmond. He was awarded the William Carter award at DSN 2008, the best paper runner-up award at ICST 2013, and the SIGSOFT distinguished paper award at ICSE 2014. He was general chair of PRDC 2013, and is program co-chair of ISSRE'17. He is a member of the IFIP WG 10.4, and a senior member of the IEEE. Find out more about him here.

Tutorial 2: A practical view of modeling and quantification of network survivability

Presenters:

Prof. Poul E. Heegaard (poul.heegaard@ntnu.no), NTNU (Norwegian University of Science and Technology), Department of Information Security and Communication Technology

Prof. Kishor S. Trivedi (ktrivedi@duke.edu), Pratt School of Engineering, Electrical & Computer Engineering Department, Duke University

About

The goal of this tutorial is to introduce the concept and definition of survivability and to demonstrate approaches to modeling and quantifying survivability in networks. In our tutorial we define survivability as the "ability to provide services in compliance with the requirement even in presence of major and minor failures in network infrastructure and service platforms caused by undesired events that might be external or internal". Network survivability is quantified following the definition of the ANSI T1A1.2 committee: the transient performance from the instant an undesirable event occurs until a steady state with an acceptable performance level is attained. Examples are taken from the survivability of mobile networks and of virtual connections over an IP network, as well as from the smart grid.
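To make the transient view concrete, here is a minimal sketch that computes the expected performance level over time after a failure, using a small continuous-time Markov chain solved by uniformization. It is not taken from the tutorial material: the three recovery states, the repair rates, and the reward (performance) values are hypothetical, chosen only to illustrate the idea.

```python
import math

# Hypothetical recovery model, conditioned on a failure at t = 0.
# States: 0 = down, 1 = degraded, 2 = fully restored (absorbing).
# Rates are per hour and are assumptions for illustration only.
Q = [
    [-2.0, 2.0, 0.0],   # down -> degraded at rate 2.0
    [0.0, -1.0, 1.0],   # degraded -> restored at rate 1.0
    [0.0, 0.0, 0.0],    # restored: no further transitions in this model
]
REWARD = [0.0, 0.6, 1.0]  # assumed fraction of nominal performance per state


def transient_distribution(Q, pi0, t, eps=1e-10, max_terms=10000):
    """State probabilities at time t, computed by uniformization."""
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # uniformization rate
    if lam == 0.0 or t == 0.0:
        return list(pi0)
    # Embedded discrete-time chain: P = I + Q / lam
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    weight = math.exp(-lam * t)  # Poisson probability of k = 0 jumps
    v = list(pi0)                # pi0 * P^k, starting with k = 0
    result = [weight * x for x in v]
    acc, k = weight, 0
    # Add Poisson-weighted terms until the neglected mass is below eps.
    while 1.0 - acc > eps and k < max_terms:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k
        acc += weight
        for j in range(n):
            result[j] += weight * v[j]
    return result


def survivability(t):
    """Expected performance level t hours after the failure."""
    pi = transient_distribution(Q, [1.0, 0.0, 0.0], t)
    return sum(p * r for p, r in zip(pi, REWARD))


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0, 5.0):
        print(f"t = {t:4.1f} h  expected performance = {survivability(t):.4f}")
```

Plotting `survivability(t)` against `t` gives the kind of transient curve the definition refers to: performance drops at the failure instant and climbs back toward its steady-state level as recovery progresses.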

After an introduction to survivability and the modelling framework for its quantification, the tutorial will present an exercise based on an example from a 5G mobile system. The objective of the exercise is to provide useful insight into, and experience with, the use of an analytic tool (SHARPE) and the programming of a discrete event simulator.

Tentative time schedule (6 hours including breaks)

  • Introduction of concept and framework (90 min)
  • Break (15 min)
  • Tool: SHARPE (20 min)
  • Tool: DEMOS (20 min)
  • Introducing the exercise (20 min)
  • Lunch (45 min)
  • Work in groups (120 min)
  • Present and discuss solution (30 min)

As a full-day tutorial (6 hours): both analytic and simulation exercises. [preferred]
As a half-day tutorial (3 hours): only analytic exercise.

The tutorial is in the area of models and methodologies for programming, evaluating, and assessing dependable and secure systems (performance, dependability and security evaluation; analytical and numerical methods; simulation; experimentation; benchmarking; verification; field data analysis; data mining techniques), with hands-on exercises using analytic (and simulation) tools.

Bio

Poul E. Heegaard is Professor at NTNU (Norwegian University of Science and Technology), Department of Telematics (Department of Information Security and Communication Technology from Jan 1, 2017). Heegaard has been on the faculty at NTNU since 2006, and was Head of Department 2009-2013. He has also been a Senior Research Scientist at Telenor R&I, and Senior Scientist at SINTEF Telecom and Informatics. Heegaard is the author/co-author of more than 150 articles and has supervised 9 PhDs. He has given numerous tutorials and talks at international meetings and conferences. His research interests cover performance, dependability and survivability evaluation and management of communication systems. His special interests have been rare event simulation techniques, and monitoring, routing and management in dynamic networks. His current research focus is on performance, dependability and survivability in interacting complex systems, which includes distributed, autonomous and adaptive management and routing in communication networks and services. Examples are Software Defined Networking and ICT-power system integration (Smart Grid).

Kishor S. Trivedi holds the Hudson Chair in the Department of Electrical and Computer Engineering at Duke University, Durham, NC. He has been on the Duke faculty since 1975. He is the author of a well-known text entitled Probability and Statistics with Reliability, Queuing and Computer Science Applications, published by Prentice-Hall; a thoroughly revised second edition (including its Indian edition) of this book has been published by John Wiley. He has also published three other books. He is a Life Fellow of the IEEE, and a Golden Core Member of the IEEE Computer Society. He is the recipient of the IEEE Computer Society Technical Achievement Award for his research on Software Aging and Rejuvenation. World Scientific has listed him in the top 100 computer scientists worldwide based on the h-index. Trivedi has published over 500 articles, has supervised 46 Ph.D. dissertations, and has given numerous keynotes, tutorials, and talks at international meetings and conferences. His research interests are in reliability, availability, performance, performability, security and survivability evaluation of computer and communication systems. He works closely with industry in carrying out reliability/availability analysis, providing short courses on reliability, availability and performability modeling, and in the development and dissemination of software packages such as SHARPE and SPNP.