A Case Study: Automated Continuous Software Engineering Cycle

Abstract

Software development uses many different processes and methods. Automated continuous software engineering extends popular existing methods, such as continuous integration and continuous delivery [1], by closing the loop between software engineering and customer feedback. Automating each phase makes the process more effective and productive for software and application development. It fulfils the goal of letting customers verify system health and having system issues corrected automatically.

Keywords: software development, continuous integration, continuous delivery, continuous engineering.

I. Introduction

From time to time, many software development teams across the industry wonder which process best suits their needs. That is mainly because different teams develop different systems or applications, such as Microsoft Windows, an HR web site, or services. It is also because people are used to traditional methods of developing software, such as the waterfall model [2].

In today’s software industry, there is a trend toward developing software with the Scrum method. That is because the traditional waterfall method is too slow to keep up with a changing world: a modern application needs to be updated and released into production almost instantly. During this change, the challenge is how to maintain quality, which was traditionally guaranteed by lengthy system and user acceptance tests.

By the nature of software development, we cannot skip the basic processes, such as environment server provisioning, code compilation and build, code deployment, build verification test, production support, and re-engineering. We cannot avoid those procedures, but we can make them as efficient as possible.

Most teams focus on one direction, which starts at engineering development and ends at production support. But more and more emphasis needs to be put on the reverse direction, from production support back to software development. How to make this process more automated and efficient is a big challenge, and it is a weak area for many teams developing systems and applications.

Automated Continuous Software Engineering Cycle (ACSEC) is a concept with one goal: to practice continuous engineering. It emphasizes the whole cycle, starting from development through production release, and then continuously working back from production monitoring to re-engineering. The key is to automate each phase of the cycle to make the process fast, robust, and efficient.

II. Continuous Build

As soon as the development code is checked in, a build process is kicked off. This is the first step in verifying code quality, and it filters out basic code issues such as compilation errors, style and format mismatches, and code metrics warnings.

The build process can also be integrated with unit tests, build verification tests, and regression tests, for which code coverage statistics are provided.
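The build-time checks described above can be thought of as an ordered gate. The following is a minimal sketch in Python, not the behavior of any particular build service; the `BuildResult` type, the check names, and the 80% coverage minimum are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical result type; not part of any real build service.
@dataclass
class BuildResult:
    passed: bool
    failures: List[str] = field(default_factory=list)

Check = Tuple[str, Callable[[], bool]]

def run_build_gate(checks: List[Check]) -> BuildResult:
    """Run every check in order and collect the names of the ones that fail."""
    failures = [name for name, check in checks if not check()]
    return BuildResult(passed=not failures, failures=failures)

def coverage_check(covered_lines: int, total_lines: int, minimum: float = 0.8) -> bool:
    """Return True when line coverage meets the assumed minimum ratio."""
    return total_lines > 0 and covered_lines / total_lines >= minimum

if __name__ == "__main__":
    # Each lambda stands in for a real compile, style, or test step.
    result = run_build_gate([
        ("compile", lambda: True),
        ("style", lambda: True),
        ("unit tests", lambda: True),
        ("coverage", lambda: coverage_check(850, 1000)),
    ])
    print(result)
```

Running every check instead of stopping at the first failure gives the developer the full list of problems from a single check-in.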

Figure 1 shows a sample report generated by a build service that is integrated with Microsoft Visual Studio.

Figure 1 Sample of Build Service Report

III. Continuous Provisioning

After a full build completes successfully, a hardware provisioning process can be kicked off automatically in the continuous cycle.

Many companies leverage Microsoft products, such as Genesis or Azure, to meet their provisioning needs with Virtual Machines (VMs). In Genesis, a VM template can be created from the server configuration requirements, such as the list of server components and their VMs, the Operating System (OS) on each VM, the SQL database, and so on. Some required software applications, such as a remote debugging tool and Fiddler, can even be pre-loaded onto the template. In addition, some administrative work and network configuration can be done ahead of time, such as firewall configuration, enabling remote access, and adding a list of administrators to the servers.
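To make the idea of a VM template concrete, the sketch below models one as plain Python data with a validation step. It is only an illustration: the class and field names are hypothetical and do not mirror the Genesis template schema, which is not described in this paper.

```python
from dataclasses import dataclass, field
from typing import List

# All names here are illustrative; they do not mirror the Genesis template schema.
@dataclass
class VmSpec:
    role: str                      # e.g. "web" or "sql"
    os: str                        # operating system installed on the VM
    preloaded_tools: List[str] = field(default_factory=list)

@dataclass
class EnvironmentTemplate:
    name: str
    vms: List[VmSpec]
    administrators: List[str] = field(default_factory=list)
    open_firewall_ports: List[int] = field(default_factory=list)
    remote_access_enabled: bool = True

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the template is usable."""
        problems = []
        if not self.vms:
            problems.append("template defines no VMs")
        if not self.administrators:
            problems.append("no administrators listed")
        if any(not 0 < port < 65536 for port in self.open_firewall_ports):
            problems.append("firewall port out of range")
        return problems
```

Validating the template before provisioning is triggered catches configuration mistakes before any hardware is allocated.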

The Genesis API is called against the template whenever a provision is triggered. The needed hardware is then provisioned automatically. In the future, more and more VM provisioning will occur in the cloud, such as Microsoft Azure, which will make the provisioning process more efficient.

Figure 2 shows sample code that calls a Genesis API.

Figure 2 Sample Code to Call Genesis API

IV. Continuous Deployment

After the hardware on which the software is going to be installed has been provisioned successfully, an automated deployment process is kicked off.

The deployment can occur on either existing hardware or newly provisioned hardware.

The deployment can also occur in either a one-box environment, in which one VM contains all of the components of the system, or a multi-box environment. The one-box environment is limited in how well it can simulate real system configurations. Therefore, a multi-box environment, which is closer to production, is needed. In a multi-box environment, the network connections and security settings, such as the firewall, can be validated. Remote access permissions for specific service accounts can be validated as well. Furthermore, some support software, such as Fiddler [3] and the Windows remote debugger, can be pre-installed and verified.
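The difference between the two layouts can be illustrated with a small sketch. The round-robin placement and the function names below are assumptions chosen for simplicity, not the way any real deployment tool assigns components.

```python
from typing import Dict, List

def plan_deployment(components: List[str], hosts: List[str]) -> Dict[str, List[str]]:
    """Assign components to hosts round-robin.

    With one host this yields a one-box layout; with several hosts the
    components are spread out, a simple stand-in for a multi-box layout.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    plan: Dict[str, List[str]] = {host: [] for host in hosts}
    for index, component in enumerate(components):
        plan[hosts[index % len(hosts)]].append(component)
    return plan

def cross_host_links(plan: Dict[str, List[str]]) -> int:
    """Count host pairs whose connectivity (firewall, permissions) could be validated."""
    used = [host for host, comps in plan.items() if comps]
    return len(used) * (len(used) - 1) // 2

if __name__ == "__main__":
    components = ["web", "service", "sql"]
    print(cross_host_links(plan_deployment(components, ["vm1"])))
    print(cross_host_links(plan_deployment(components, ["vm1", "vm2", "vm3"])))
```

A one-box plan has no cross-host links at all, which is why it cannot exercise firewall rules or remote access permissions.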

Figure 3 shows a script that deploys software bits onto a one-box environment. Here, the number in green is the Genesis environment ID, which identifies the specific one-box environment. The same approach can be applied to a multi-box environment.

Figure 3 Sample Script to Deploy in One Box Environment from Build Location

V. Continuous Test

As soon as the deployment completes successfully, a Build Verification Test (BVT) can be kicked off immediately.

The BVT contains a set of basic test cases that verify the health of the build as a whole. If any of the BVT test cases fails, the build should be marked as failed and hence cannot be deployed to production. The failure must then be investigated, which results in either a bug being identified or the BVT test code being updated.
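The pass/fail rule for a BVT run can be sketched as follows. This is an illustrative Python model only; the `run_bvt` function and the shape of a test case (a callable that takes an environment ID) are assumptions and are unrelated to the Visual Studio tooling shown in the figures.

```python
from typing import Callable, Dict, List, Tuple

def run_bvt(environment_id: str,
            test_cases: Dict[str, Callable[[str], bool]]) -> Tuple[bool, List[str]]:
    """Run every BVT case against one environment.

    Returns (build_is_healthy, failed_case_names). A case that raises is
    treated as failed so that one broken test cannot hide the others.
    """
    failed: List[str] = []
    for name, case in test_cases.items():
        try:
            ok = case(environment_id)
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return (not failed, failed)
```

The list of failed case names is the starting point for the investigation that decides whether the product code or the test code needs to change.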

Figure 4 gives a sample command that executes a suite of BVT test cases against a specific environment. The number in green is the environment ID passed as an input parameter, which can refer to either a one-box environment or a multi-box environment.

Figure 4 Sample Script to Execute BVT Using Visual Studio

VI. Continuous Monitoring & Reporting

After the build of the feature is deployed to production and the BVTs pass, the customers may run some user acceptance tests (UAT). If the UAT passes, the feature is considered live in production.

But the software engineering process does not stop at this phase, since there are always some issues in production that are found only after customers start using the product, no matter how carefully people test it.

Many support teams set up robust monitoring for their systems, such as SCOM, telemetry, Performance Monitor, and Application Insights, in order to capture real-time production data, monitor system status, and analyze system health intelligently. An email, a message, or even a phone call ought to be triggered and sent to an on-call person if a system metric value exceeds its pre-defined threshold.

The metrics data are captured continuously, and alert thresholds can be set up ahead of time. Alerts can be fired at different levels of the system, such as infrastructure, service, IIS, and database, and at different levels of severity. Furthermore, an alert can be specified at the application level, on metrics such as service response time, throughput, number of API errors, and service level agreement (SLA) violations.
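Threshold-based alerting with severity levels can be sketched in a few lines. The rule fields, the two severity names, and the sample threshold values below are assumptions for illustration; real monitoring products define their own rule formats.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class AlertRule:
    metric: str        # e.g. "response_time_ms"
    level: str         # e.g. "infrastructure", "service", "database", "application"
    warning: float     # value above which a warning is raised
    critical: float    # value above which the on-call person is contacted

def evaluate(rule: AlertRule, value: float) -> Optional[str]:
    """Return "critical", "warning", or None for a single metric sample."""
    if value > rule.critical:
        return "critical"
    if value > rule.warning:
        return "warning"
    return None

def fired_alerts(rules: List[AlertRule], sample: Dict[str, float]) -> List[str]:
    """Describe each alert fired by one snapshot of metric values."""
    alerts = []
    for rule in rules:
        if rule.metric in sample:
            severity = evaluate(rule, sample[rule.metric])
            if severity:
                alerts.append(f"{severity}: {rule.level}/{rule.metric}={sample[rule.metric]}")
    return alerts
```

Keeping the rules as data, separate from the evaluation logic, is what allows the thresholds to be set up ahead of time and adjusted without code changes.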

The captured metric data can be re-processed and aggregated, and a meaningful reporting system can be built on the result.
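As one possible form of this re-processing step, the sketch below groups raw samples into fixed time buckets and summarizes each bucket. The sample layout (metric name, Unix timestamp, value) and the one-minute default bucket are assumptions, not a description of any specific telemetry pipeline.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[str, int, float]  # (metric name, unix timestamp in seconds, value)

def aggregate(samples: Iterable[Sample],
              bucket_seconds: int = 60) -> Dict[Tuple[str, int], Dict[str, float]]:
    """Group raw samples into fixed time buckets and summarize each bucket."""
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for metric, timestamp, value in samples:
        # Round the timestamp down to the start of its bucket.
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[(metric, bucket_start)].append(value)
    return {
        key: {"count": len(values), "mean": mean(values), "max": max(values)}
        for key, values in buckets.items()
    }
```

Summaries like these, rather than the raw samples, are what a dashboard or report would typically plot.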

A clear and brief report or dashboard helps the customers, the support team, and the engineering team respond promptly and properly. Setting up meaningful and helpful reporting is a business intelligence process.

In order to capture meaningful and useful data, a comprehensive design is required. The design ought to include not only the basic system functions but also monitoring and reporting. Logging and tracing features must be in place to fulfil this goal. In the long term, implementing meaningful and useful monitoring and reporting will save the team a lot of cost and energy.

Figure 5 shows a basic sample of a telemetry dashboard in the open source product Kibana, which provides the capability to search seamlessly with Elasticsearch while interacting with real-time data.

Figure 5 A Sample of Telemetry Dashboard

VII. Continuous Regression

If there is a failure or error in production, the support team normally tries to troubleshoot the issue in production first. If they cannot identify and resolve the issue quickly, they normally escalate it to the engineering team. The first thing the engineering team wants to do is reproduce (re-pro) the failure if they don’t have direct access to production, which is the case for many systems and companies.

In order to re-pro the failure, the first step is often to grab the production data that caused the failure and load it into the pre-production environment. In many cases, this step is a manual process. If the process can be automated, it will save many teams a lot of time, speed up troubleshooting, and achieve better customer satisfaction.
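One way this capture-and-replay step could be automated is sketched below. Everything in it is hypothetical: the request is modeled as a dictionary, the list of sensitive field names is an assumption, and the pre-production system is represented by a plain Python callable.

```python
import json
from typing import Any, Callable, Dict, List

SENSITIVE_FIELDS = {"password", "token"}  # assumed list; adjust per system

def capture_failure(request: Dict[str, Any], error: str) -> str:
    """Serialize a failing request, masking sensitive fields, for later replay."""
    scrubbed = {k: ("***" if k in SENSITIVE_FIELDS else v) for k, v in request.items()}
    return json.dumps({"request": scrubbed, "error": error}, sort_keys=True)

def replay(captured: List[str], handler: Callable[[Dict[str, Any]], Any]) -> List[bool]:
    """Replay captured requests against a pre-production handler.

    Each entry is True when the original error type is reproduced.
    """
    results = []
    for record in captured:
        data = json.loads(record)
        try:
            handler(data["request"])
            results.append(False)  # the call succeeded, so the failure did not re-pro
        except Exception as exc:
            results.append(type(exc).__name__ == data["error"])
    return results
```

Masking sensitive fields at capture time matters because production data is being copied into an environment with weaker access controls.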

VIII. Continuous Correction

The systems raise issues at different levels. Some of them can easily be resolved automatically, for example, a service that is down. If the services are monitored, any service that goes down is detected immediately. The information is then passed to a system controller, which does some analysis and either restarts the service or notifies the support team, based on pre-defined settings. It can also trigger a process to raise a support ticket with the support team. A carefully selected list of auto-correction items should be set up properly to resolve frequently occurring issues.
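The controller's decision for a service-down event can be sketched as a small function. The function name, the shape of the settings (a map from service name to an auto-restart flag), and the injected `restart` and `notify` callables are assumptions made so that the example is self-contained.

```python
from typing import Callable, Dict

def handle_service_down(service: str,
                        auto_restart: Dict[str, bool],
                        restart: Callable[[str], bool],
                        notify: Callable[[str], None]) -> str:
    """Decide what to do when monitoring reports that a service is down.

    Services flagged for auto-restart are restarted; if the restart fails,
    or the service is not flagged, the support team is notified.
    """
    if auto_restart.get(service, False) and restart(service):
        return "restarted"
    notify(f"service '{service}' is down and needs attention")
    return "escalated"
```

Passing `restart` and `notify` in as parameters keeps the decision logic testable without touching real services or ticketing systems.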

Of course, there are always some issues that require the engineering team to work on them, such as code bugs. A new engineering process is then kicked off against those issues. But before the new sprint starts, it is ideal to automate as much as possible to shorten the process of development, build, deployment, test, monitoring, reporting, re-pro, and regression. By doing this, overall development efficiency is improved and high productivity is achieved.

IX. Conclusions

There is a set of typical methods across the industry that different companies and teams use to develop a software system or application, such as Waterfall and Scrum [4]. Continuous Integration (CI) is a method that people leverage to improve the integration process. More and more, there is a need not only to practice CI but also to work continuously in each phase of the software engineering cycle. This includes the phases of production monitoring, production issue regression, and production issue correction, which have been mostly ignored by many project teams.

Software development has never been only about features. It also includes design, performance, quality validation, and process. A bottleneck in any of those will eventually have a dramatic impact on product and service delivery and quality, so each one needs to be automated. The automated continuous software engineering cycle (ACSEC) promotes applying automation in each phase of the cycle. It should be applied not only in the forward direction, through build, provisioning, deployment, build verification test, quality validation, user acceptance test, and production release, but also in the reverse direction, through production monitoring and reporting, bug raising, issue re-pro, issue correction, and regression in pre-production environments. In short, it needs to occur in each phase of the software life cycle.

This paper is a first attempt to address the complexity and inefficiency in software engineering, and to bring some of these concepts and integrations into reality.

X. Acknowledgements

I would like to thank Mr. Srinivasa Rao Malladi and N.J. Wang, team leads in MPSIT, who took the time to review the paper and provided valuable feedback and corrections.

XI. References

[1] Booch, Grady (1991). Object Oriented Design: With Applications. Benjamin Cummings. p. 209. ISBN 9780805300918.

[2] Royce, Winston (1970). Managing the Development of Large Software Systems. Proceedings of IEEE WESCON 26 (August): 1–9.

[3] Lawrence, Eric (June 15, 2012). Debugging with Fiddler: The complete reference from the creator of the Fiddler Web Debugger. ISBN 978-1475024487.

Xin Bai, Dawn Wang, Venkata Siva Prasad Vitakula

The Microsoft Corporation

Redmond, WA 98052

xinbai@microsoft.com

dawang@microsoft.com

vvitak@microsoft.com

Posted by: Xin Bai, Senior SDET, Microsoft Corporation, United States (14-Nov-2014)