Advanced Systems Lab - Fall 2017

Course Organization & Materials

Find below the dates and details of tutorials (T) and exercises (E):

Date/Time  Type  Description  Materials
16. Nov.   E  Exercise session on networks of queues.  [Slides]
14. Nov.   T  Tutorial session on analyzing a system.  [Slides]
09. Nov.   E  Exercise session on M/M/1 and M/M/m queues.  [Slides]
07. Nov.   T  Tutorial session on queueing theory (cont.)
02. Nov.   E  Exercise session on 2^k experiments.  [2K exp]
31. Oct.   T  There will be no tutorial session on October 31st.
26. Oct.   E  This exercise session focuses on good and bad practices when plotting and explaining your results.  [Slides]
24. Oct.   T  Tutorial session on queueing theory.  [Slides]
19. Oct.   E  The fifth exercise session will cover GnuPlot, and good and bad practices when generating plots.  [Plots] [GnuPlot Examples]
17. Oct.   T  Tutorial session on planning experiments.  [Slides]
12. Oct.   E  The fourth exercise session will cover good and bad practices in Java middleware development.  [Java]
10. Oct.   T  There will be no tutorial session on October 10th.
05. Oct.   E  The third exercise session will take place in individual groups. Please look up your assigned group in the table below.  [Baselines]
03. Oct.   T  Tutorial session on the life cycle of an experiment.  [Slides]
28. Sept.  E  The second exercise session will take place on Sept. 28th in CAB G61. It focuses on Microsoft Azure, Bash and Git.  [Azure Slides] [Bash Slides]
21. Sept.  E  The first exercise session will take place on Sept. 21st in CAB G61. It gives an overview of the project.  [Slides]
19. Sept.  T  The first tutorial session will take place on Sept. 19th in CAB G61.  [Intro] [Slides]

Project Details

Project Description: [Project Description]

Report: [Report Outline] [Report Outline Tex]

Programming: [Java Main Class] [ANT Build File] [Bash Script Examples]

Azure: [VM Template]  Caution: When creating the VMs, the machines are started automatically. Stop them if you do not run experiments right away.

Project Deadline: December 18th, 2017 at 17:00 (CET)

Azure Billing Information: [Internal page]


Literature

"Art of Computer Systems Performance Analysis" - Raj Jain
John Wiley & Sons Inc; Auflage: 2 Rev ed. (21. September 2015)

"The Art of Computer Systems Performance Analysis" - Raj Jain
Wiley Professional Computing, 1991

Of particular relevance from the 1st edition are the following chapters:

  • Chapters 1, 2, 3 (General introduction, Common terminology)
  • Chapters 4, 5, 6 (Workloads)
  • Chapter 10 (Data presentation)
  • Chapters 12, 13, 14 (Probability and statistics)
  • Chapters 16, 17, 18, 20, 21, 22 (Experimental design)
  • Chapters 30, 31, 32, 33, 36 (Queueing theory)

 


Lecturer

Gustavo Alonso

 


Course Hours

Tutorials: Tuesday, 17:00 – 19:00, CAB G 61.

Exercises: Thursday, 17:00 – 19:00.

General Contact: sg-asl [at] lists.inf.ethz.ch


Exercise Sessions

Exercise sessions are held on Thursdays from 17:00 to 19:00 in small groups. In the exercise sessions, we answer high-level questions related to the project and the report.

Assistant        Room     Email                      Last names assigned
Claude Barthels  CHN D42  claudeb [at] inf.ethz.ch   A-Cl
David Sidler     CHN D44  dasidler [at] inf.ethz.ch  Co-I
Kaan Kara        CAB G52  kkara [at] inf.ethz.ch     J-M
Muhsen Owaida    CAB G56  mewaida [at] inf.ethz.ch   N-Sto
Zsolt Istvan     CHN D46  zistvan [at] inf.ethz.ch   Stu-Z

 


Office Hours

Office hours are intended to provide you with advice that will help you complete the project and the report. To make an appointment, contact your teaching assistant by email.

  • Make sure you come prepared with concrete and well-formulated questions. If possible, include them in your email.
  • We will not complete the assignment for you, and we will neither recommend nor make design decisions on your behalf.
  • We will not debug your code, provide technical support for your setup/scripts/data analysis, or give hints about whether what you have done so far is enough.
  • We will not grade your project in advance, so please avoid questions that try to determine whether what you have done is correct or sufficient for a passing grade.
Time                     Assistant
Thursday, 10:00 – 12:00  Muhsen Owaida
Thursday, 15:00 – 17:00  Kaan Kara
Thursday, 15:00 – 17:00  Claude Barthels
Friday, 09:00 – 11:00    Zsolt Istvan
Friday, 09:00 – 11:00    David Sidler

 


FAQ / Tips

Q: How do I check if I have been assigned a voucher? 

A: First, go to https://www.microsoftazuresponsorships.com/Manage and make sure that you see a subscription there. Then log in to https://portal.azure.com/ and change your directory as explained in the slides. If you do not see the directory, check the spam folder of the email address you provided.

 

Q: Which Java version am I allowed to use? 

 A: Use Java 7 or 8. 

 

Q: Which Ubuntu version am I allowed to use? 

A: You can use Ubuntu 14 or 16.

 

Q: What types of clients and commands will be used to automatically test my middleware implementation? 

A: We run automatic tests as a way to reduce the complexity of grading so many reports. There are no hidden unit tests or performance bounds that must be reached. The tests check basic things: that the system actually runs when connected to memtier clients and memcached servers; that it actually processes requests as required (get, set, multi-get); and that the constraints specified in the project specification regarding message sizes, control commands, etc. are followed. We perform such tests because, in the past, we have received incomplete submissions, some not working at all. This made it next to impossible to understand the results presented in the report, since we can only work on the assumption that the submitted project follows the specification. We do not have the capacity to manually check the code of every student, so instead we do a basic sanity check to ensure that the submitted system is the platform that was used to run the experiments.

Although we do not do it for all projects, if your performance numbers reflect a system behavior we cannot understand, we may try to reproduce your results by running the same workloads you claim to have run and checking whether we get approximately the same results. If we cannot reproduce your results, we will call you to a meeting in person to clarify the situation.

Finally, all submitted code will be checked against this and last year's projects. Any plagiarism will be reported and will lead to a failing grade.

 

Q: Can I aggregate my statistics outside of the middleware (after the experiment has terminated)?

A: Yes, you can collect the data for each thread separately and then combine it after the experiment with scripts. In general, you have considerable freedom in how you implement this; for instance, you could also introduce a helper thread to collect the statistics. What matters is that each thread collects the required statistics and that you then either combine these results inside the middleware (online) or afterwards with scripts (offline).
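Purely as an illustration of the offline variant (all class, method, and file names below are made up, not part of the project specification), each worker thread could own its own statistics object and write it to a per-thread file at the end of the experiment, so that no synchronization between threads is needed:

    // Sketch: per-thread statistics, merged offline after the experiment.
    // All names here are hypothetical, not prescribed by the project.
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;

    class ThreadStats {
        long requestCount = 0;           // requests handled by this thread
        long totalServiceTimeNanos = 0;  // accumulated service time

        void record(long serviceTimeNanos) {
            requestCount++;
            totalServiceTimeNanos += serviceTimeNanos;
        }

        // Called once by the owning thread on shutdown; since no other thread
        // ever touches this object, no locking is required.
        void dump(int threadId) throws IOException {
            PrintWriter out = new PrintWriter(new FileWriter("stats-thread-" + threadId + ".txt"));
            out.println(requestCount + " " + totalServiceTimeNanos);
            out.close();
        }
    }

A merge script then only needs to sum the per-file counters; in the online variant, a helper thread inside the middleware would combine the per-thread objects instead.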

 

Q: Why do I have to collect/aggregate statistics over a 5-second window?

A: The 5-second window is an upper limit we suggest in order to reduce the overhead of logging (writing to a file) and the amount of data you collect in memory. If your implementation can deal with smaller windows, and can thereby collect more data points without introducing visible overhead, feel free to do so.
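A minimal sketch of this idea, with hypothetical names (not prescribed code): keep counters in memory and emit one log line per window instead of one line per request.

    // Sketch: aggregate counters over a window and write a single log line
    // per window. WINDOW_NANOS can be lowered if the overhead stays invisible.
    import java.io.PrintWriter;

    class WindowedLogger {
        static final long WINDOW_NANOS = 5_000_000_000L; // 5 s, the suggested upper limit
        private final PrintWriter out;
        private long windowStart = System.nanoTime();
        private long opsInWindow = 0;

        WindowedLogger(PrintWriter out) { this.out = out; }

        // Called once per completed request by the owning thread.
        void recordOp() {
            long now = System.nanoTime();
            if (now - windowStart >= WINDOW_NANOS) {
                out.println(windowStart + " " + opsInWindow); // one write per window
                opsInWindow = 0;
                windowStart = now;
            }
            opsInWindow++;
        }
    }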

 

Q: Should I measure service times separately for each request type? For each thread?

A: Ideally, you measure the service time separately for each request type (SET, GET, multi-GET). As we showed in the exercise session, aggregating (averaging) over different request types can lead to a value that in practice rarely or never occurs. For building the M/M/m model, it is necessary to have the service time (service rate) of each thread separately.
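One possible shape for such bookkeeping, sketched with hypothetical names; each worker thread would own one of these objects, which also yields the per-thread service rates needed for the M/M/m model:

    // Sketch: service-time sums kept separately per request type, so SET,
    // GET and multi-GET are never averaged together.
    enum RequestType { SET, GET, MULTI_GET }

    class ServiceTimeStats {
        private final long[] count = new long[RequestType.values().length];
        private final long[] totalNanos = new long[RequestType.values().length];

        void record(RequestType type, long serviceNanos) {
            count[type.ordinal()]++;
            totalNanos[type.ordinal()] += serviceNanos;
        }

        double averageMillis(RequestType type) {
            int i = type.ordinal();
            return count[i] == 0 ? 0.0 : totalNanos[i] / (count[i] * 1.0e6);
        }
    }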

 

Q: Why don't you just tell me exactly what I need to measure?

A: There are different ways to measure the same things, depending on how the system has been constructed. It is important to look at the whole report and the questions we ask in it, because they provide very accurate information on what is needed. For instance, regarding per-worker-thread service rate data: the report requires you to demonstrate that the middleware balances the load across all memcached servers, which is difficult to do if you have no information on what each worker thread is doing. We strongly recommend going over the report template, making a list of all the data required, and then implementing the instrumentation accordingly.

 

Q: Should I implement a network queue?

A: From a functional point of view, the network queue is not strictly necessary, as worker threads could get the requests directly from the network interface. However, if there is no network queue in which requests are placed before being processed, many of the measurements needed to understand the performance of the system become difficult. For instance, it will not be possible to measure the time a message waits to be processed, or the average number of requests waiting in the queue at any given time. Without such information, the experimental analysis, and certainly the analytical treatment, become next to impossible and the results will be very poor. This is why we require that the middleware include a network queue.
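A minimal sketch of such a queue, assuming a standard java.util.concurrent.BlockingQueue; the Request class and all other names are illustrative, not prescribed:

    // Sketch: a blocking queue decouples the network thread from the workers
    // and makes waiting time and queue length directly observable.
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    class Request { long enqueueNanos, dequeueNanos; /* payload omitted */ }

    class NetworkQueue {
        private final BlockingQueue<Request> queue = new LinkedBlockingQueue<Request>();

        // Called by the network thread for every incoming request.
        void enqueue(Request r) throws InterruptedException {
            r.enqueueNanos = System.nanoTime();   // start of the waiting time
            queue.put(r);
            // queue.size() sampled here gives the instantaneous queue length.
        }

        // Each worker thread blocks here until a request is available.
        Request dequeue() throws InterruptedException {
            Request r = queue.take();
            r.dequeueNanos = System.nanoTime();   // waiting time = dequeue - enqueue
            return r;
        }
    }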

Q: What do I put into the network queue?

A: You decide the type of the objects you place on the network queue. Each design decision has implications for the performance you will observe because, depending on this decision, the way messages are actually processed varies. It also determines what you can measure inside the middleware as overall response time, service time, waiting time, etc.
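Purely as an illustration, one possible queue element could carry the raw request bytes plus the timestamps from which response, waiting, and service time can later be derived (all names are hypothetical):

    // Sketch: a queue element holding the undecoded request and its timestamps.
    // Whether parsing happens before or after enqueueing is exactly the kind of
    // design decision referred to above; it changes what "service time" includes.
    import java.nio.channels.SocketChannel;

    class QueuedRequest {
        final byte[] rawBytes;        // request as read from the client socket
        final SocketChannel client;   // where the reply must be sent
        long arrivalNanos;            // read from the network
        long enqueueNanos;            // placed on the network queue
        long dequeueNanos;            // picked up by a worker thread
        long completionNanos;         // reply written back to the client

        QueuedRequest(byte[] rawBytes, SocketChannel client) {
            this.rawBytes = rawBytes;
            this.client = client;
        }
    }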

Q: Why don't you just tell me exactly what I need to do instead of pointing to the several choices available?

A: The course is about making design choices, building a system, and then explaining, experimentally as well as analytically, why the system behaves the way it does. How you implement the network queue, where you actually do the message parsing, what objects you hand to the worker threads, etc., will influence how the system behaves. We expect you to make a design choice and then explain the consequences of that choice in terms of the performance observed. To make such choices, you should think carefully about the overall system behavior and the overhead each operation will cause, and, very importantly, look at what we are asking you to do in the complete report. Some of the questions, such as whether a network queue is needed, can be easily answered from what we ask you to measure since, as pointed out above, such measurements become very difficult without the queue.

Q: What if, in Section 4.2, a different number of clients produces the maximum throughput in each worker-thread scenario? How should I show this?

A: In general, it is a good idea throughout the report to be as explicit as possible about your setup and measurement points. This helps us interpret your results and makes it possible for you to quickly verify and compare them within and across sections. In this particular case, you can either include the number of clients as part of the explanation of the table, or add it as an additional row in the table.