Site24x7 is an all-in-one cloud-based performance monitoring solution by the Zoho Corporation, an Indian software development company that has recently celebrated its 25th anniversary. I sat down for an inspiring talk with Rajalakshmi Srinivasan, Director of Site24x7, to discuss the changing needs of IT administrators and the technologies that help tackle them.
Please describe the story behind the company: What sparked the idea, and how has it evolved so far?
We started Site24x7 in 2006, as cloud technology was starting to pick up. Originally, it was more of an experimental startup for monitoring our own websites and infrastructure. We quickly realized we were onto something that many companies would want to utilize. That is how we started bundling Site24x7 as a product for external users.
The first need that we had at Zoho was to monitor the availability and performance of our websites across different parts of the globe. Then, we went on to add all the other monitoring capabilities that are needed for infrastructure and global presence monitoring.
Since then, we have kept adding solutions and features on top of it to enhance its capabilities and align with the latest technologies. Site24x7 is now a mature product that has been on the market for almost 15 years.
If you have an international website, it needs to be monitored from India as well as the US and all other parts of the globe to ensure performance, uptime, and a good user experience on the front end, no matter where that is.
But websites aren’t the only thing you need to monitor. People want to monitor their infrastructure and cloud architecture, which has many different layers that all have to be monitored as well. So we added server monitoring capabilities to the same product, followed by application performance monitoring and real user monitoring. The latter lets you know who the user is, where they are accessing your platform from, and how they are experiencing it.
When cloud services started picking up, we added our Cloud Cost Management tool to help businesses avoid overspending.
As a business, being transparent is very important to us. You need to let the users know the status of all your features. So we launched StatusIQ as a subproduct.
With all this put together, Site24x7 is now a full-stack, all-in-one monitoring suite.
Who are your typical clients?
Almost all major industries, from retail to finance to healthcare to education to government, will need a monitoring tool if they want to have a website or an infrastructure. Those are the types of industries we work with.
As for our target audience, the person using the product is typically the IT administrator who manages the IT infrastructure, as well as anyone working on the product, such as DevOps and Site Reliability Engineers (SREs). Typically, any person who develops and maintains a website and infrastructure would need a monitoring tool, and that is what Site24x7 is here to do.
What are some of the challenges that IT teams are faced with, and how does Site24x7 help?
The primary challenge of being a SaaS and cloud provider is uptime and availability. This is very important because the industry expects 99.999% availability. Notice that the expectation has risen from 99.9% and now contains five nines.
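To put the jump from three nines to five nines in perspective, the availability target can be converted into a yearly downtime budget. The short sketch below is illustrative arithmetic only; the function name and the 365-day year are assumptions made for the example, not anything taken from Site24x7.

```python
# Downtime budget implied by an availability target (illustrative arithmetic only).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a non-leap, 365-day year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the maximum downtime per year, in seconds, for a given availability."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


three_nines = downtime_seconds_per_year(99.9)
five_nines = downtime_seconds_per_year(99.999)

print(f"99.9%   allows about {three_nines / 3600:.2f} hours of downtime per year")
print(f"99.999% allows about {five_nines / 60:.2f} minutes of downtime per year")
```

Under these assumptions, 99.9% leaves roughly 8.76 hours of downtime per year, while 99.999% leaves only about 5.26 minutes.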
To get to that level, you need to have all your resources up and running. By resources, I don’t just mean the website, but your entire cloud architecture. It could be your website, which is the end layer, your application layer, or the platform layer, where you have your micro-services, database, and the various components of your platform and infrastructure. It could also be your physical and virtual servers, cloud servers, and network firewalls and switches. All those components have to be up and running. If one of them goes down, the entire system will be affected.
When something goes wrong, you will be alerted immediately. The monitoring happens from 100 different locations across the globe for all the metrics that we collect. That’s the primary challenge that Site24x7 is here to solve.
The second challenge is performance. We are living in a digital era where we want everything to be very quick. There is no point in having all those resources up and running if they are going to perform slowly. People don’t have the patience to sit and wait for a page to load. Likewise, when searching for a product in a search engine, we don’t move past the first page; we just settle for whatever we find there. For all these reasons, performance is very important.
Some studies and surveys suggest that people expect a page to load within two seconds. These things are now measured in microseconds and nanoseconds. If your page is taking 10 seconds to load, you’re outdated. You have to find out where the problem is and fix it.
Site24x7 will help you identify which resources are causing the delay. It could be something in the back end, a database query, a browser issue, or a heavy image on the front end. You need to know exactly what is causing the delay and optimize it. That’s how you improve your application performance.
Of course, being in the cloud means you have to make sure that all your user data is secured, so that is another important factor that you need to monitor. You shouldn’t be waiting for somebody to come and hack your system. You can do it yourself, as a business, by setting up periodic reviews and audits. Self-audit and self-testing are very important things. Businesses should keep this in mind and properly train all their employees about it.
What kind of insights does your platform generate?
The metrics vary depending on the resource you are monitoring. If it is a server, you will monitor the CPU, memory, disk space, and the processes running on the server. If it is an application, the metrics will be different.
Site24x7 auto-discovers the resources, collects relevant metrics, does all the correlations, and alerts users immediately when something goes wrong. Reports and dashboards are available anytime, and we also have an AI engine applied on top of the metrics to take it even further.
Of course, users don’t have to configure everything themselves. The tool will do auto configurations and adjust thresholds accordingly. For example, if you know your website takes five seconds to load, you may not want to get alerted about it. So based on the historical data that we collect, the thresholds can be adjusted for different users. This is brought in using AI capabilities.
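The idea of deriving an alert threshold from historical data can be sketched in a few lines. This is a minimal illustration, not Site24x7's actual algorithm: the function names, the sample data, and the "mean plus three standard deviations" rule are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def adaptive_threshold(history_ms, k=3.0):
    """Hypothetical baseline: mean of past load times plus k sample standard deviations."""
    if len(history_ms) < 2:
        raise ValueError("need at least two historical samples")
    return mean(history_ms) + k * stdev(history_ms)


def should_alert(latest_ms, history_ms, k=3.0):
    """Alert only when the latest load time exceeds the learned threshold."""
    return latest_ms > adaptive_threshold(history_ms, k)


# Made-up history for a site that normally takes about five seconds to load.
history = [4900, 5100, 5000, 4950, 5050]

print(should_alert(5100, history))  # within the usual range for this site
print(should_alert(9000, history))  # far slower than the historical baseline
```

With this history, a five-second page load does not trigger an alert because it is normal for this site, while a nine-second load does. A fixed two-second threshold would have alerted on every single check.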
In addition, we offer log management. Logs are thought of in the industry as one of the pillars of observability today. Logs across the different data centers have to be collected and monitored.
Cloud Cost Management is also important. Now that cloud is picking up, people just purchase tools without realizing they have made a commitment. Once you’ve given your card information, you will be billed irrespective of whether or not you’re using the tool. You shouldn’t be paying for resources you’re not using. Our Cloud Cost Management tool gives you an analysis of unnecessary spending so you can cut those costs or guide your team on how to use those tools.
Similarly, when something goes wrong, you don’t want to be getting thousands of tickets saying the site is down. Instead, there can be a public page that customers can subscribe to, where they can see that something is wrong or that there is some scheduled maintenance. That is called a public status page, and it is there for transparent communication with customers. Rather than leaving people not knowing what is happening, you’re telling them that the issue is being addressed, which builds trust and credibility. Those are some of the insights that the platform can provide.
How does your system interact with third-party applications?
Typically, any ad-hoc integration is possible with Site24x7. We are living in the era of customized products. Nobody wants to just take the product and use it as it is. They want to make some tweaks and build their own dashboard, so customization and integration options are very important. Our tool integrates with all standard third-party providers.
All monitoring tools come with alerting, so when you get an alert, it can be forwarded to PagerDuty or Slack, for example, or if you have some other alerting tools, their alerts can be forwarded to Site24x7.
Similarly, you can use our API support to fetch data for analysis. You can call the REST API to build your own client. If people want to integrate with IT service management tools like ServiceNow and other ticketing software, that integration is also possible.
For communication and collaboration, people want to integrate with Microsoft Teams, and that is certainly possible. Sometimes people want to integrate with project management tools like Jira so that any problem is raised as a Jira ticket, and that is also possible.
You can also bring in metrics from other tools, and you can take metrics from our tool and put them in any other tool that you have and grow as an ecosystem.
What can you tell us about the free version of Site24x7?
Typically, any person who has a website or is into SaaS-based development will need a monitoring tool, and Site24x7 is a tool you can use for free, for life. The free version supports up to five websites and has limited resources. But if you are a small provider, or you just have a personal blog or website hosted on a public domain, it should be sufficient for monitoring your statistics, performance metrics, access logs, traffic sources, and global performance. Our team is also available to help with any request.
What are some typical mistakes that IT/Dev/Ops make, and how can they be avoided?
Site24x7 started as a tool for monitoring Zoho’s services and even today, our entire infrastructure is monitored using Site24x7. Over the years, we’ve learned many lessons from our own IT teams, which are possibly similar to what other businesses experienced as well.
One of the most common mistakes IT teams make when something goes wrong is not going into the root cause of the problem. They just do some trial and error, fix it, and let it run. If you don’t fix the root cause, it is bound to happen again. Immediate, quick fixes are ok if you’re just bringing a website back up and running, but subsequently, you need to have a proper plan to find out why it happened and fix the root cause. That is a common mistake that IT teams make and our tools can definitely help.
The other thing is that some people have a fear of new technologies. They don’t want to embrace the changes that are coming up because they would rather stick to what’s familiar to them. If you want to be in line with technology changes, the IT team has to be open to learning new things.
As always, when you’re trying new things, there can be problems. Don’t do everything at once; take small steps. Observe the scene, make sure that it’s not affecting your existing customers, but at the same time, don’t avoid it.
The other typical mistake is overspending. Often, in their enthusiasm to try new things, people purchase services that they end up not using. Our tool lets you monitor that proactively to make sure you’re not wasting resources.
How do you expect the recent developments in user privacy to impact your business and industry?
Being in the cloud means privacy and security are a top priority. Zoho has been a very early adopter of cloud technologies, and user privacy has always been our prime motive. We undergo regular periodic audits, both internally and externally, and conduct rigorous penetration testing every six months.
User privacy is very important if you’re storing customers’ data in the cloud. You have to handle it with utmost care and make sure security and privacy are thought through right from the design phase.
Security is not an afterthought. A lot of businesses make the mistake of first finishing everything off and only then thinking about security and privacy.
When you’re designing a feature in your database, you have to make sure that one user’s data is not visible to the other user. User segregation has to be brought in during the design stages.
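One way to picture user segregation at the design stage is a data-access layer in which every read is scoped to the owner of the data. The sketch below uses Python's built-in sqlite3 module with a hypothetical `notes` table and `owner_id` column; it illustrates the principle only and is not a description of how Zoho or Site24x7 store data.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical per-user notes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notes ("
        "id INTEGER PRIMARY KEY, owner_id INTEGER NOT NULL, body TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO notes (owner_id, body) VALUES (?, ?)",
        [(1, "alice's note"), (2, "bob's note")],
    )
    return conn


def notes_for_user(conn, owner_id):
    """Return only the rows owned by owner_id; no code path reads without this filter."""
    rows = conn.execute(
        "SELECT body FROM notes WHERE owner_id = ? ORDER BY id", (owner_id,)
    )
    return [body for (body,) in rows]


conn = make_db()
print(notes_for_user(conn, 1))  # user 1 sees only their own rows
print(notes_for_user(conn, 3))  # a user with no data gets an empty list
```

Because the owner filter lives inside the only function that reads the table, a new feature built on top of it cannot accidentally expose one user's rows to another.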
If you’re storing data, you need to know which sensitive information you are collecting and ask yourself whether it is all needed. If you’re collecting usernames and passwords, make sure that they are all encrypted using the latest technologies, both at rest and in transit. Data should only ever be transmitted over secure layers.
At any point in time, even if the system is hacked, the attacker should not be able to make sense of the data. That level of encryption should be there if you want to take care of user privacy. That is how any business should take care of user data so that they don’t compromise on user privacy and security.
Wherever possible, make sure you are getting the explicit consent of the user. There should also be an option for customers to opt-out, and that right has to be respected across all product segments. The entire system, from sales to marketing to product design to product promotions, should all be inter-tied to make sure that the user is at the center, and that their privacy is respected.
All of those things have to come from the organization culture, given that user privacy is important to them. Site24x7 makes it possible.
Which trends or technologies do you find to be exciting these days?
There are several trends that we follow and have implemented into our product over the past 15 years. We have re-written our product suite multiple times to ensure we are in line with the latest technologies and can pass that benefit to customers.
In some cases, we may not be able to apply certain technologies to the existing product as is. In those cases, the bold decision would be to take it in small steps so that we can adopt the technology without interfering with the customer experience.
One trend that will be very interesting in the coming years is containerization. We have moved from the monolith application architecture to a micro-service architecture comprised of containers.
Kubernetes is also picking up. It is easy to deploy micro-services and containers, but the challenge is monitoring such an environment, because trends come and go very quickly and the industry is already moving from micro-services to serverless.
Function-as-a-service is picking up, posing even more challenges to monitoring because, at the end of the day, you have nothing to monitor, not even a server. Monitoring such an environment is a real challenge, but the industry is moving in that direction nevertheless, and we all need to adapt.