Juniper Networks
23rd September 2009
Duration: 00:45:39
Presenter: Dale Bender

Operator: Thank you for standing by and welcome to the Managed Services Fault Isolation with Juniper AIS conference call. At this time all participants are in a listen-only mode. There will be a presentation followed by a question and answer session, at which time if you wish to ask a question you will need to press *1 on your telephone. I must advise you this conference is being recorded today, Wednesday 23rd September 2009. I would now like to hand over to your speaker today, Dale Bender. Please go ahead and thank you sir.

Dale Bender: Thank you, Jenny. Hello everyone. Welcome to the presentation. So our topic today is Juniper Advanced Insight Solutions. Let me start my camera and voice. So welcome again. So Advanced Insight Solutions, these are the topics that we will cover. First of all, what is AIS and how does it work? I will also cover some of the benefits to MSPs in particular. Then I will do a demo: I will show you what automated incident management looks like, where we can streamline the process of getting technical support using automation and smart systems with JUNOS. Then I will talk about specifics of how AIS can be used by partners and MSPs, then I will go a bit deeper into select topics, and then we will have a Q&A.

So first, what is Advanced Insight Solutions? It is a support automation platform that was developed by Juniper, leveraging JUNOS technology as well as other software products and services. It streamlines the detection, isolation and resolution of network faults and incidents. So it is all about speeding up the process of getting technical support and getting the timing right: collecting the information that is needed to solve problems at the right time, when they occur. Some of the features are immediate automated notification to JTAC personnel.
So again this is getting to the finish line faster; getting the information that JTAC or that
internal technical support personnel need to help solve the problem, which then allows determining the root cause to happen more quickly. We have seen it happen up to 300 times faster just by cutting out some of the manual processes involved in doing diagnostics and troubleshooting. Also there are other capabilities of AIS that go more towards preventing issues from happening in the first place. So rather than just dealing with problems as they come up, this technology allows us to help you prevent them. All products that run JUNOS are supported, including the T Series, TX, J, MX, EX and SRX. So what I want to show now is a typical scenario, a typical set of steps involved in troubleshooting an incident when it occurs. At the top you see there is a console alarm, so somehow you are notified that there is an issue. The typical process then involves consulting a playbook or a run book or some process document; establishing the severity and a plan for troubleshooting; verifying that the issue has occurred; logging into the OSS system and into the device itself to collect relevant information; centralising that information; doing some escalations potentially; beginning diagnostic procedures; eventually escalating to Juniper, opening a case with our JTAC and then interacting with them, et cetera. So if we consider this to be a typical process without AIS, with AIS we have the ability to shrink this down significantly, and we do this again through automation and intelligent systems. The end result is reduced complexity, since fewer steps means less complexity; increased control and predictability, since the steps that are left are fairly predictable and can be replicated; and finally reduced time to resolution, reducing the time to really being productive in resolution activities. So another way to look at AIS benefits is how the technology can eliminate some of the guesswork that occurs.
So when you are running your network, when you are trying to keep your network performing very well and with very little downtime, these are some of the questions that our customers and partners typically have. What undiscovered incidents are happening on the network; what is it that I am not seeing? What do certain log messages mean? In JUNOS, tens of thousands of different log messages can occur in the log file. What do they mean? What information must be collected to work a case? So for a particular type of issue, hardware or software related issues, resource utilisation, an initialisation error, an ASIC issue, what information do I
need to collect specifically to help solve this problem? Should a log message require a case? So which log messages really indicate that there is a troubling condition occurring? How should a case be prioritised and escalated? So when I see trouble, how serious is it and how should I go about escalating it? What will be the impact of changes on the network? Changing a version of JUNOS, changing configuration, reconfiguring the hardware and the chassis, all these things can have impacts on the network. Understanding what those impacts are is important, as is how to check inventory for EOL, upgrade planning, et cetera. So all these questions AIS can help with. So if we look specifically at three buckets of benefits to partners and MSPs: on the OPEX side, through this automation and through intelligent systems, we can cut some of the time needed for tech operations folks to do their job. This leads to faster time to root cause definition and then restoring the network. AIS also fits in with your centralised OSS or your centralised network management infrastructure. And yes, there is less investment required for level two support engineers. By the time the incident is detected with AIS, the information is presented: the severity, the priority, a description of what happened and why it happened. This is all time savings directly for the level one and level two support engineers. On the sales side, this technology really can differentiate you from your competitors with customers. We have seen this around the world with partners and MSPs that we have worked with; this message that I am delivering to you today does resonate with end customers as well. Also we can speed time to market. When you are rolling out new products or a new service, AIS can help flatten the learning curve that is involved in your tech support folks getting up to speed on the technology, the new features, how to troubleshoot, how to interact with our JTAC, et cetera.
Increased SLA compliance, improved visibility into your customers' networks or into your service, and improved customer satisfaction. On the customer experience side, there is less downtime, better stability, predictability and reliability, and also better responses to problems when customers report them. So let me delve now a bit into the components of AIS. The first one is the network element itself, so this can be a switch, a firewall, a router, any of the products that run JUNOS. So AIS depends very heavily on the
JUNOS operating system. From the very beginnings of JUNOS there were built-in manageability capabilities, and those now are represented in a variety of ways. One of those is what is called the JUNOS script embedded infrastructure. What JUNOS scripting allows customers to do is to write scripts and install or load those on to the devices to automate manual operational tasks. AIS leverages this existing JUNOS infrastructure for supporting these operational and event scripts, and that is how the intelligence comes about. The Advanced Insight scripts that are part of Advanced Insight Solutions are written by ex-JTAC engineers who are now full-time developers. They know very well the types of conditions that customers report that should be attended to, that should get the proper level of attention from our tech support organisation. They also know what type of information should be collected for each, and all that knowledge, all that JTAC experience, is put into the scripts themselves. So the next one down is the Advanced Insight Manager, the AIM application. This is a web application, a Java web application that runs on your network. We provide the software, you install it on the server and this is your management point. From this point you can view incidents, take action, collaborate internally and communicate to Juniper. Juniper can communicate to you. You can communicate to your customers. So this is really the customer-visible, the partner-visible component of AIS. Finally there is what we call the Juniper Support Systems. We have built centralised applications and services that allow your Advanced Insight Manager to connect directly to our CRM systems. This for instance allows you, using AIS, to submit a case with Juniper with just one mouse click. I will describe that in more detail. So let's take a look now at a scenario and how AIS works for automated incident management.
So here you can see that there is the customer network, your network and Juniper, represented in a simplistic way. When an event occurs, the scripts that are installed on, let's say, a router provide that router with an understanding of what to do. The router then collects all the troubleshooting information that is specific to that type of event, packages it all up in an XML file called a Juniper Message Bundle, or JMB for short, and then automatically transfers that file over to the Advanced Insight Manager application. All this happens very quickly after the event has occurred. So the Advanced Insight Manager
then when it sees this new JMB it processes it and sends out notifications. It will send an email or an SNMP trap to whomever you decide should receive that information. That person can then just click a link in the email and see the information about the device, the condition that occurred, what to do next, the severity, the priority and all the troubleshooting data that was collected. So that is all now at their disposal to make a decision as to what to do next. If they choose to open a case with Juniper then they just click a button in the web user interface, which tells AIM to go ahead and connect to Juniper Support Systems and upload all the information that was collected into a case. A case ID is then returned and shows up in Advanced Insight Manager. On the preventative side we use the same infrastructure. What that means is we still leverage these Advanced Insight scripts running on the JUNOS devices. We leverage the AIM application on your network and the Juniper Support Systems, but the information that comes through is not specific to an issue or an event but rather just periodic snapshots that are taken of each device. Very detailed information about the identity, the system status, you know, are the processes running, any type of errors. I mean just lots of information about what is going on in that device and what that device is. So that information, if you choose, can flow through to Juniper, and then we can cross-reference data from other knowledge assets that we have with information about those devices. So examples: our bug database. Consider that a known issue or bug is typically specific to certain versions of software and potentially even to the features that are running. So the bug would only be seen if, for instance, you are running BGP on a router.
So if we have that information within our bug tracking system, we can cross-reference that with the versions and the product platform and the features and the hardware inventory of your devices to match the two up and understand what potential issues you could be exposed to. The same is true for the EOL/EOS database. We can take this identity information about devices, cross-reference it with our EOL/EOS database information and understand what the EOL/EOS status is of all the thousands of parts that you can have in your network. And there is more to come too; we are just starting to really build out these capabilities. Again the whole point is to help you prevent issues in the first place and to understand your exposure, et cetera.
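The cross-referencing described above can be pictured with a minimal Python sketch. It is not Juniper's implementation: the `KnownIssue` and `Device` records, the `find_exposures` function, the issue IDs, host names and version strings are all made up for illustration. It only shows the matching idea, where an issue applies to a device when the platform, the software version and the features in use all line up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KnownIssue:
    """One entry from a hypothetical known-issue list."""
    issue_id: str
    platforms: frozenset
    versions: frozenset
    required_features: frozenset = frozenset()


@dataclass
class Device:
    """Identity data of the kind carried in a periodic snapshot."""
    hostname: str
    platform: str
    version: str
    features: set = field(default_factory=set)


def find_exposures(devices, issues):
    """Map each hostname to the IDs of issues it could be exposed to."""
    exposures = {}
    for device in devices:
        matches = [
            issue.issue_id
            for issue in issues
            if device.platform in issue.platforms
            and device.version in issue.versions
            # An issue tied to a feature only matters if that feature is in use.
            and issue.required_features <= device.features
        ]
        if matches:
            exposures[device.hostname] = matches
    return exposures


# Made-up sample data, for illustration only.
issues = [
    KnownIssue("ISSUE-1", frozenset({"M7i"}), frozenset({"9.3R2"}), frozenset({"bgp"})),
    KnownIssue("ISSUE-2", frozenset({"M7i", "MX480"}), frozenset({"9.3R2"})),
]
devices = [
    Device("edge-1", "M7i", "9.3R2", {"bgp", "ospf"}),
    Device("edge-2", "M7i", "9.3R2", {"ospf"}),
    Device("core-1", "MX480", "9.5R1", {"bgp"}),
]
exposure_report = find_exposures(devices, issues)
```

In this sample, `edge-2` is not flagged for `ISSUE-1` because it is not running BGP, which mirrors the point that a bug may only be seen when a particular feature is in use. An EOL/EOS lookup could follow the same pattern, keyed on part numbers instead of versions.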
A couple of quick slides and then I will go into the demo. This is a case study that we did with Telefonica. They are a service provider in Europe. They are in 30 countries. They are an important customer of Juniper's, and they have been using AIS for a year and a half or a couple of years. They agreed in fact to let us talk about their experience. I will just go through this quickly. It demonstrates, directly from them, some of the points that I was making earlier. They believe they were able to improve their network support to their customers through the automation, through the time savings in getting to a productive point where they can troubleshoot. This then allowed them to reduce their time to resolve and their OPEX: less time, less complexity, less back and forth between their tech ops and our JTAC organisation. Then from a process perspective, they have seen the benefit of getting the right information at the right time. And finally proactive resolution: they have been able to solve problems on their own with the information that is collected, without even having to open a case with Juniper. So now I suspect this is hard for you to see, so I will describe it. This is just to set up the demo. Imagine that you have AIS deployed, and in this particular case it is for a router. This is an incident that has occurred on a specific device and it is an RPD OSPF neighbour down. So this is a protocol issue. You can see the description there: the OSPF adjacency with the indicated neighbouring router was terminated, the local router no longer exchanges routing information with or directs traffic to the neighbouring router, et cetera. So there is information here describing what happened. The default priority is a two, which is a high priority, so this incident should receive some attention. Now imagine that your tech ops person clicks on the link that you see in the email, which will take them to the AIS application.
And then I will show what they would do from that point. So I will switch over to presentation mode; bear with me this will take a few seconds. So hopefully now you can see my desktop. This is Advanced Insight Manager. So when the level one or level two operator clicks on the link in the email they are taken here to the Advanced Insight Manager web application. And they see here AIS incidents. So here we see that incident that has just come in. It is a protocol error and that is a
priority two by default, it is an M7i platform, this is the host ID, this is the time it occurred. The status is initial, which means that nothing yet has been done, no action has been taken. If I click on the link for the synopsis, this takes me to the next level of detail and provides me with additional options. So now I can change the priority from high to critical or to medium depending on, for instance, the role of this device in my network, the level of redundancy I have, et cetera. So let's leave it at high; status is initial. I can also flag this incident to other users within my organisation, so if I have a counterpart on a different shift and I think he should know about this, I can flag this incident to him. I can assign ownership, et cetera. So these are all collaboration capabilities within Advanced Insight Manager. But now, based on what I see here, I decide to go ahead and submit a case with Juniper. So I click the submit case button. When I do that, the Advanced Insight Manager connects to Juniper and uploads the information that I am going to show you in just a minute, which is all the troubleshooting information that was collected. So within Juniper's ticketing system a case is now created, and it is going to be picked up by our JTAC organisation. The difference between an AIS case and another case that you would open over the phone or using the web with our JTAC is that all of the information is there already. All the information that was collected at the right time, that they need to be productive, is already there. So the next thing I will show you is I will click on view JMB. So the Juniper Message Bundle, this is the XML file that was collected from the router in this case at the time that the incident occurred. Here I can see some of the information that I viewed in the other levels of detail, the other pages within Advanced Insight Manager. But there is more here.
There is Master Routing Engine information; if a core file was created I would see the stack trace of the core, but this incident did not result in a core being created. So now I can see the chassis inventory; I can see what cards and PICs and other components are installed on the chassis. You can see the serial number, part number, description. The next section we call trend data, and for all the separate systems that are running, the routing engine, the FPCs, we characterise the state of the system. So you can see here, these parameters, these counters and metrics are all collected. So you can understand what the conditions were
on the device at the time the event occurred. The same is true for all these components; the PFE system, the kernel, it is all here. The next section is called attachments. The attachments are show commands. There are typically about 20 show commands that are run, and which ones are run varies depending on the product platform and also on the nature of the issue. If it is a hardware issue or a software issue, different CLI commands will be run and collected. So here is show log messages. You can see it here. These are the logs from the device. Let me skip down a bit and I will show you more. Okay, so here are some additional ones. Show system virtual memory and show system boot messages; here is PFE stats error, et cetera. This is show system processes extensive. So this information has now all been uploaded to JTAC, and it is also here at my disposal if I am a tech support engineer and I am interested in doing some diagnostics myself. Let me get back to incident manager now. We can see that the status is still submitted, so the process is still underway. It usually takes three or four minutes from the time that the submit case button is clicked to the point at which a case has been created and an updated case ID has been returned. So very soon the status will change from submitted to updated with a case ID. If I click here then a new browser window will open and I will see the case in Juniper's case manager application, which is how customers and partners view cases that are opened with Juniper. So there is more to Advanced Insight Manager, but I wanted to focus on this automated incident management use case, and now I will go back to the presentation. Okay, so here we are. Now the next topic I want to cover is AIS for MSPs and Juniper partners. The current version of the AIS software is 1.3; it has been available for about a year, coming up on two years, and we have got about 60 or 70 customers and partners who are using it.
And from the very beginning, we built in features and capabilities that are specific to partners, specific to how partners run their networks, how they interact with their customers and MSPs of course as well. So what I will show you are the deployment options. I will describe how you can deploy AIS in different ways,
depending on your relationship with your customer or depending on how you are offering services to your customers. So the first is this traditional way that I have been talking about so far. Imagine that you have two customers, customer A and customer B. You can deploy the Advanced Insight Manager centrally within your support centre and have these Juniper Message Bundles from your customers forwarded to the AIM on your premises. For MSPs this is typically an interesting option, because if you are offering turnkey services to your customers and you are responsible for managing and monitoring and troubleshooting and dealing with issues that may occur, then this is likely an interesting option for deploying. In this case the Juniper Message Bundles are transferred using secure FTP or secure copy from the device to the AIM server. It is always a push model; an AIM server never connects to the device itself. Once the JMBs are seen by the AIM server, the interaction with Juniper is what I have been describing. AIM opens an HTTPS connection with services.juniper.net, which is the service that listens for communication from AIM devices. And again, none of our personnel, none of our systems or applications ever connect inbound to the AIM server. AIM always connects outbound. The next option is where you may have a bit of a different relationship with the customer, or in fact it is just not possible for whatever reason to get JMBs from the customer's network to your Advanced Insight Manager. In this case the Advanced Insight Manager is deployed on the customer network. There are role-based access control capabilities within Advanced Insight Manager that allow you to share access with customers or not. So just because the Advanced Insight Manager is on their network, on the customer network, doesn't necessarily mean they are an AIM user.
Also you can access the AIM server of course using a web browser, so as long as you have HTTPS web access to the AIM server, you can control it remotely. The next model is what we call a partner proxy model, and AIM can be licensed in such a way that it changes from what we call a standalone Advanced Insight Manager to a partner proxy Advanced Insight Manager. What that means is that the AIM actually becomes a proxy between AIMs at the customer site and Juniper. So this allows you to provide the
capabilities and the visibility and the ability to collaborate and take action. It allows you to provide this to your customer, giving them the option to escalate issues to you, allowing them to participate in the management process. But it keeps the control with you. So you own the relationship with the customer. None of the actions that are taken by the customer go directly to Juniper. In fact Juniper isn't involved at all directly with the customer in this case. So if there is an incident, the customer submits a case. They submit it to your AIM and then you can choose to escalate it to Juniper. I have a slide that goes into this in a bit more detail. But this model is of interest again when you're providing technical support more as a partnership with the customer rather than as an outsourcer. So the customer isn't outsourcing all of their technical operations to you; rather you are an escalation point for them. So real quick, and this is fairly straightforward: when in partner proxy mode, when an issue occurs it shows on the customer's AIM server. They do what I showed in the demo; they review and decide to escalate to you. Over HTTPS their AIM connects to your AIM and the incident now is opened with you. So compared to the direct model, now you are the Juniper in the direct model; you are taking that role. You have access and the JMB is available to you to view; the end customer can establish the priority before submitting to you. All the things that I demonstrated are still true, except you are playing that role. If you choose to escalate to Juniper, then it looks traditional from the perspective that the Advanced Insight Manager connects to our Juniper Support Systems, uploads the message bundle, and a case is created, a case under your site ID. So this is a case for you, not the end customer. We don't even know who the end customer is in fact. And then when a case ID is returned, it is returned to your AIM server, et cetera.
So you are playing the role here of owner of the technical support relationship with the customer. So at this point I will drill down into a couple of subjects that are generally interesting and really show the power of Advanced Insight Manager and Advanced Insight Solutions. The first is the role of the scripts. I will try to demystify a little bit how the automated incident management works and how a router can know not only when an issue is occurring but also why, what the severity is, and what information to collect that is specific to that type of event and will help in troubleshooting efforts, et cetera. It starts with JUNOS, as I mentioned earlier, and specifically our implementation of structured syslog. So all the different subsystems in
JUNOS, so here we see chassisd, but there is RPD, there is cosd, there are many processes that are running. Each is creating these messages continuously, and these messages can be provided to a syslog server, or we can do other things with them, and that is what Advanced Insight Solutions does. So specifically these structured syslog messages and also other unstructured events are sent continuously to (let me go to the next slide) a management daemon or process called eventd. So these events, these messages, some are informational but some are interesting. There are about 300 or so that we have instrumented (an example here is RPD LDP session down) that indicate a troubling condition is occurring; they provide some warning as to an unstable condition. So with our scripts we can essentially tell eventd which of these types of conditions should result in a JMB being created, and then again what information to include in the JMB. So that is how it works. When we load our Advanced Insight scripts on a router, switch or firewall, also installed is what is called event policy, and that event policy essentially tells eventd what to do and when. What types of events and what options do we have for triggering JMBs, for creating JMBs, for collecting information at the time the event has occurred? We can use the event name, we can use count, we can correlate different events. So there is a lot of different business logic we can use, depending on what the situation is, to better understand what is happening. We have a lot of tools at our disposal to cover a lot of conditions that should be escalated. So I am not going to go into this in detail. The point here is that when you install the AI scripts it is a fairly straightforward process. You have to make a relatively small change to the configs and then you have to commit the script bundle.
But with the AI scripts running on a router, switch or firewall, there is no additional code. I mean the scripts are just files. There are no additional processes running. There are no additional resources used by the device to enable AIS. So we leverage what is already there, we leverage what is already in JUNOS. So this is not an agent, this is not additional software that you install. These are just script files that sit in a directory and then are called by eventd, and eventd is always running.
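To make the triggering idea above concrete, here is a minimal Python sketch of a count-based trigger: act only when a named event has been seen a given number of times within a time window. This is an illustration of the logic only, not how eventd or the Advanced Insight scripts are implemented. The class `CountTrigger`, its parameters and the event name string used in the example are assumptions made for the sketch.

```python
from collections import deque


class CountTrigger:
    """Fire when `count` matching events arrive within `window_seconds`."""

    def __init__(self, event_name, count, window_seconds):
        self.event_name = event_name
        self.count = count
        self.window_seconds = window_seconds
        self._seen = deque()  # timestamps of recent matching events

    def observe(self, event_name, timestamp):
        """Record one event; return True if a bundle should be created now."""
        if event_name != self.event_name:
            return False
        self._seen.append(timestamp)
        # Forget occurrences that have fallen out of the time window.
        while timestamp - self._seen[0] > self.window_seconds:
            self._seen.popleft()
        if len(self._seen) >= self.count:
            self._seen.clear()  # start counting afresh after firing
            return True
        return False


# Illustrative use: three neighbour-down events within 60 seconds.
trigger = CountTrigger("RPD_OSPF_NBRDOWN", count=3, window_seconds=60)
results = [trigger.observe("RPD_OSPF_NBRDOWN", t) for t in (0, 10, 20)]
```

Correlating different events, which the talk also mentions, would extend this by tracking more than one event name per trigger; that is left out here to keep the sketch short.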
Let's talk a little bit more about the Juniper Message Bundles. Juniper Message Bundles, as I showed you, have these three sections. The first is called the manifest, and this is the basic data: show version, show chassis hardware, show firmware. So this is the identity of the device. Then there is the event data: problem description, priority, severity, synopsis, and the trend data. What we collect for the trend data is all built into our scripts; it is specific to the type of device and also, if it is a chassis-based product, to what cards are installed and how the chassis has been configured. So it is not just a generic set of commands that we run. The scripts are intelligent in that they know the product platform, they know the version and they run the appropriate commands. And again the attachments are the show commands. So trend data: show chassis routing engine, show chassis FPC, SEB, so all the boards and cards are characterised. And also the JMB data structure allows for easy additions and deletions of data elements. The way we structure the file is how we can accommodate all the different products and platforms and cards. It is very flexible. So the last topic I wanted to cover before we get into questions is a little more about Advanced Insight Manager. I mentioned that it is a web application, so it is a Java application; it runs on the JBoss application server with a MySQL database. It runs on Solaris and Red Hat Enterprise Linux. It has role-based access control capabilities; I think I have a slide on that in a minute. It allows you to set up user accounts with different permissions, et cetera. It has got logging and other traditional capabilities of a network management related application. So for notification, notification is set up in Advanced Insight Manager with what we call (I can't think of it now, I will think of it in a minute) the reaction policy.
So with reaction policy you can define when you want to be notified and under what conditions. Do you want to just be notified when a new incident comes in? Do you want to be notified when a case is updated by Juniper? Do you want to be notified when a new informed message, which we didn't talk much about, is received? So under what circumstances do you want to be notified, and how do you want to be notified: email or SMS or SNMP trap and these sorts of things. So it allows you to really customise the way in which the Advanced Insight Manager notifies you when certain things happen.
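As a rough picture of what such a policy amounts to, the following Python sketch models it as a list of rules, each pairing a trigger with a delivery channel and a recipient. This is not the Advanced Insight Manager data model: the `ReactionRule` record, the `plan_notifications` function, the trigger names and the addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionRule:
    trigger: str    # hypothetical trigger name, e.g. "new_incident"
    channel: str    # "email" or "snmp_trap"
    recipient: str  # mailbox or trap receiver address


def plan_notifications(trigger, rules):
    """Return (channel, recipient) pairs for every rule matching the trigger."""
    return [(r.channel, r.recipient) for r in rules if r.trigger == trigger]


# Made-up policy: who hears about what, and how.
rules = [
    ReactionRule("new_incident", "email", "noc@example.com"),
    ReactionRule("new_incident", "snmp_trap", "192.0.2.10"),
    ReactionRule("case_updated", "email", "l2-support@example.com"),
]
new_incident_plan = plan_notifications("new_incident", rules)
```

Keeping the rules as plain data means that changing who is notified, or by which channel, is a matter of editing the list rather than the dispatch logic.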
I showed you an email already, so this just shows the information that is contained within an email. If it was an SNMP trap, the SNMP trap would contain similar information. The one thing I didn't mention is that the Juniper Message Bundle is attached to the email in XML form. So if the person who receives the email is for whatever reason not able to access Advanced Insight Manager, they can open up the file in an XML editor and look at all the information. The way that the XML is structured is very usable. It is very easy to pull information out using an XML editor. Data security and confidentiality: I mentioned some of the protocols that we use. They are all secure protocols: HTTPS, secure FTP, secure copy. It is always a push model. The router pushes the JMB to AIM. AIM connects upstream to JSS. You can set filters on what information you share with Juniper, so typically for configuration, if it contains sensitive information, you can apply filters there. And also all the information that is shared with Juniper is stored in the filtered format and the original format, so you can audit, et cetera. So that is it; we are about out of time. So Jenny, if you could take us into the Q&A?

Operator: I will indeed. Thank you, Mr Bender. We will now begin the question and answer session. If you wish to ask a question please press *1 on your telephone keypad and wait for your name to be announced. Well, there are no questions at this point, Mr Bender. Sir, please continue.

Dale Bender: So I want to again thank everyone for attending. I appreciate your time, and if you are interested in more information you can find it on Juniper.net, and also there are some links that you will see as we go into the next part of the process here that will provide you more information. So again, thank you very much and have a great day.

Operator: With many thanks to our speaker, Mr Bender, that does conclude our conference for today. Thank you all for participating. You may now disconnect.