Software System Design: Stay Calm by Following Design Principles

You deployed a multitier enterprise application into production and celebrated with your happy clients. Congratulations!

What next? What are the most challenging tasks after deployment? You might experience unresponsive servers, data loss, failing API calls, database timeouts, slow responses, missing or unhelpful logs, and so on. Most of the time we face more non-functional issues in production than functional ones.

How do we address these situations proactively? Most of them can be avoided if we properly address non-functional requirements in the project's initial phase: identifying the application's scalability and reliability requirements, ensuring maintainability, choosing the right software and hardware components, and so on.

Here, I discuss the key characteristics of any software design and how they help you build the most sustainable design.

Without further ado, let's start. The key characteristics are:

  1. Availability
  2. Reliability
  3. Scalability
  4. Maintainability


You might be working on an internal application with a limited number of users, or on a data-intensive application with a million concurrent users. Either way, the customer's primary requirement is availability: the system should be reachable at any time, irrespective of its performance. Of course, scalability is one way to support availability, but we also need to ensure that the application stays up in the face of software, hardware, or manual faults.
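Availability targets are commonly stated as a percentage of time the system is up. As a rough illustration (the function names here are made up for this sketch, and a 365-day year is assumed), a target can be converted into a yearly downtime budget like this:

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year, 8760 hours


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year the system may be down while still meeting the target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def measured_availability(uptime_hours: float, downtime_hours: float) -> float:
    """Availability as a percentage of the total observed time."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return 100 * uptime_hours / total


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% available -> up to {hours:.2f} hours of downtime per year")
```

Agreeing on such a budget with the customer early makes it easier to judge how much redundancy the design needs.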

Software Fault: Responsibility for avoiding faults starts with the developer and the architect. Some recommendations: follow the SOLID principles while designing the application, and apply proper error handling, logging, externalization of configuration, and localization while developing it.
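As a minimal sketch of error handling, logging, and externalized configuration working together, consider the snippet below. The order structure, the `OrderError` class, and the `ORDER_MAX_ITEMS` environment variable are all hypothetical names chosen for illustration.

```python
import logging
import os

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("orders")

# Externalized setting (hypothetical name): read from the environment
# instead of hardcoding the limit in the business logic.
MAX_ITEMS = int(os.environ.get("ORDER_MAX_ITEMS", "100"))


class OrderError(Exception):
    """Raised when an order cannot be processed."""


def process_order(order: dict) -> float:
    """Return the order total, or raise OrderError with the cause logged."""
    order_id = order.get("id", "<unknown>")
    try:
        items = order["items"]
        total = sum(item["price"] * item["quantity"] for item in items)
    except (KeyError, TypeError) as exc:
        # Log the stack trace once, then raise a domain-specific error.
        logger.exception("Malformed order %s", order_id)
        raise OrderError(f"order {order_id} is malformed") from exc
    if len(items) > MAX_ITEMS:
        logger.warning("Order %s rejected: %d items", order_id, len(items))
        raise OrderError(f"order {order_id} exceeds {MAX_ITEMS} items")
    logger.info("Order %s processed, total=%.2f", order_id, total)
    return total
```

The caller only has to handle one exception type, and the log line carries the order id needed to trace a production issue.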

Hardware Fault: Expect infrastructure-level faults such as a corrupted OS patch, a failing Linux kernel, a broken CPU fan, or an air-conditioning failure in the server room. To reduce their impact, schedule planned maintenance for low-traffic windows (such as midnight or holidays), or run several sets of hardware so that some servers remain on standby while others receive software updates. Update domains and fault domains for Azure VMs are one of the best approaches to handling OS updates in large applications.
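The idea of updating one group of servers while the others keep serving can be sketched as a rolling update. This is a simplified illustration only, not how Azure implements update domains; the domain names and the `apply_patch` callback are assumptions.

```python
from typing import Callable, Dict, List


def rolling_update(
    domains: Dict[str, List[str]],
    apply_patch: Callable[[str], bool],
) -> List[str]:
    """Patch servers one domain at a time and return the servers patched.

    apply_patch (hypothetical callback) returns True when the server is
    healthy after the patch. The rollout stops at the first failure so the
    untouched domains keep serving traffic.
    """
    if len(domains) < 2:
        raise ValueError("need at least two domains so one can keep serving")
    patched: List[str] = []
    for domain_name, servers in domains.items():
        # Only the servers of the current domain are out of rotation here.
        for server in servers:
            if not apply_patch(server):
                return patched
            patched.append(server)
    return patched


if __name__ == "__main__":
    groups = {"domain-0": ["vm-a", "vm-b"], "domain-1": ["vm-c"]}
    print(rolling_update(groups, lambda server: True))
```

Stopping at the first unhealthy server limits a bad patch to a single domain.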

Manual Fault: The goal is to reduce manual intervention in every phase of implementation and deployment. Useful practices include TDD, CI/CD, and automated testing.


Reliability is confusing for most people when it is discussed right after availability, or vice versa :)

So, how can we say our system is reliable? Consider this scenario: you designed a highly available enterprise system hosted on multiple VMs, and it has been working fine for a long time (say, 5 years). But what if you never installed operating system updates, antivirus updates, and the like on the servers after the first deployment? That may cause serious security issues, even data loss, and damage your customers' trust. Here the system is available (and scalable too) but not reliable.


Maintainability shows up in everyday thoughts like these: with proper log files, we could analyze and fix production issues quickly and save time; deployment would be easier if we decoupled the business logic from the API layer; a constants file would be better than hardcoding values in multiple places.

Software maintenance accounts for approximately 75% of the total cost on average, so maintainability is a key deciding factor in choosing the right software components for any application. For example, suppose your design already relies heavily on SQL Server and you need a small queue (to interact with another component). The easiest and most maintainable approach is a custom queue modeled as a SQL Server table, instead of bringing in Kafka or Azure Service Bus.

But if you are targeting complex queues with millions of transactions, then the best choice is an enterprise queueing solution.

Maintainability applies in different phases, such as procuring hardware, designing software components, and coding.
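A table-backed queue of the kind described above could look roughly like the following sketch. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server so that it runs anywhere; the table and column names are assumptions, and it assumes a single consumer (concurrent consumers would need row locking that is not shown here).

```python
import sqlite3
from typing import Optional, Tuple


def create_queue(conn: sqlite3.Connection) -> None:
    """Create the queue table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS message_queue (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'pending'
        )
        """
    )
    conn.commit()


def enqueue(conn: sqlite3.Connection, payload: str) -> int:
    """Add a message and return its id."""
    cursor = conn.execute(
        "INSERT INTO message_queue (payload) VALUES (?)", (payload,)
    )
    conn.commit()
    return cursor.lastrowid


def dequeue(conn: sqlite3.Connection) -> Optional[Tuple[int, str]]:
    """Return the oldest pending (id, payload) and mark it done, or None."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT id, payload FROM message_queue "
            "WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE message_queue SET status = 'done' WHERE id = ?", (row[0],)
        )
        return row


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_queue(connection)
    enqueue(connection, "first")
    enqueue(connection, "second")
    print(dequeue(connection))
```

Because the queue lives in the database the team already operates, there is no extra broker to deploy, monitor, or patch, which is the maintainability argument made above.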


A simple definition: scalability is the capability of the system to handle a higher workload.

Identify the scalability requirements during the design phase, such as how to handle peak workload. But how do we calculate them?

Every application has its own load parameters (or workload parameters), such as throughput (how many batches of data are processed per second) and response time (the request-response time of a web application). We need to calculate scaling parameters from these, based on customer expectations, so that we can introduce the right scaling.
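To make the calculation concrete, here is a small sketch that derives throughput, a response-time percentile, and a rough instance count from measured numbers. The nearest-rank percentile method, the 20% headroom default, and the sample figures are assumptions for illustration, not recommendations.

```python
import math
from typing import List


def throughput_per_second(items_processed: int, elapsed_seconds: float) -> float:
    """Items (requests or batches) processed per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return items_processed / elapsed_seconds


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=95 for the p95 response time."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def instances_needed(
    peak_per_second: float,
    capacity_per_instance: float,
    headroom: float = 0.2,  # assumption: keep 20% spare capacity
) -> int:
    """Instances required to serve the peak load plus headroom."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    return math.ceil(peak_per_second * (1 + headroom) / capacity_per_instance)


if __name__ == "__main__":
    response_times_ms = [120, 80, 200, 95, 110, 450, 130, 105, 90, 100]
    print("throughput:", throughput_per_second(500, 10), "per second")
    print("p95 response time:", percentile(response_times_ms, 95), "ms")
    print("instances for peak:", instances_needed(1000, 160))
```

Percentiles are used here instead of the average because a few slow requests can hide behind a healthy-looking mean.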