List of Figures
Figure 3.0.1 Feature extraction
Figure 3.0.2 Event processing
Figure 3.8.1 Basic Architecture of Bayesian Enabled Network
The Problem
Cloud network security is critical for online data and transactions. Attackers routinely probe cloud systems for vulnerabilities. Some malware and viruses are easily detected, while others are not. With advances in technology and growing computing knowledge, network attackers use sophisticated software that can defeat most antivirus products and firewalls, gaining access to a system without being noticed (Kumar, 2015). Malicious code, also referred to as malware, continues to grow in volume, with new variants that may differ from existing ones.
At the same time, the volume of data carried over cloud networks keeps increasing; examples include medical records, financial transactions and other personal or organizational business. Many tools are currently available to curb malicious programs, but they are not sufficient to detect all newly written programs intended for malicious use (Kumar, 2015). A cloud network operates across a heterogeneous network, so it can be blocked or accessed by unauthorized users whose main intention is to reach business transactions or documents that are sensitive to organizations or individuals.
The National Institute of Standards and Technology describes cloud computing as convenient, ubiquitous and on-demand, since it is accessed and used by many people. Computing resources such as servers, networks, applications, services and storage are shared locally and internationally to improve service delivery and business transactions. Distributed Denial of Service (DDoS) and man-in-the-middle attacks are among the threats that cloud networks and other computing networks face in operation (Kumar, 2015). IaaS (Infrastructure-as-a-Service) clouds face zombie-related problems, since users at times access insecure links or websites, leading to malicious attacks.
In this paper, the main proposal is to use Bayesian predictive analysis to detect unrecognized malware in a cloud network. Bayesian predictive analysis uses graphs, networks and calculations to flag intrusions that were not detected by firewalls, antivirus products and other software measures taken by individuals and organizations. Using dynamic Bayesian graphs to monitor attacks, mitigation and risk assessment is very important in maintaining system security, reliability, integrity and productivity. PaaS (Platform as a Service), IaaS (Infrastructure as a Service) and SaaS (Software as a Service) are service models that run on hybrid, private and public cloud deployment models (Kumar, 2015).
Even though cloud computing comes with several benefits, programs and data require proper security against malicious attacks. Big data stores can be protected against detected intrusions and against known attacks or malware. However, current network security does not provide a way to prevent or detect unknown malware and attacks. Consequently, cloud computing still does not offer 100% protection against the compromise of data and its integrity in datacenters.
This paper argues for building a structured security model that uses Bayesian predictive analysis to protect cloud networks against unrecognized malware. Digital assets must be well protected from threats that evolve swiftly and replicate to attack as many machines as possible. Malware is often stored in hidden files that are difficult to prevent or detect using the measures currently put in place by service providers (Patel et al., 2013). Technology advances every day, creating the need for analysis that is more sophisticated yet simple to use, in order to detect or prevent the unknown malware that attackers use to access vital information online.
More research on network security is needed, because cloud networks require a malware prediction system that can protect them from possible attacks. Bayesian analysis commonly relies on path knowledge based on data flow and previous attack patterns. Paths are weighted, and the resulting values, expressed as percentages, are used to estimate and predict both known and unknown malware and other threats such as viruses.
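The path weighting described above can be sketched in a few lines of Python. This is a minimal illustration and not the proposal's implementation: the node names, the edge weights and the assumption that steps and paths succeed independently are all hypothetical.

```python
from math import prod

# Hypothetical edge weights: probability that an attacker moves from one
# node to the next, as might be estimated from previous attack patterns.
EDGE_PROB = {
    ("internet", "web_server"): 0.30,
    ("web_server", "app_server"): 0.20,
    ("app_server", "database"): 0.10,
    ("internet", "vpn_gateway"): 0.05,
    ("vpn_gateway", "database"): 0.40,
}

def path_probability(path):
    """Probability that every step of the path succeeds (steps assumed independent)."""
    return prod(EDGE_PROB[(a, b)] for a, b in zip(path, path[1:]))

def reach_probability(paths):
    """Probability that at least one path succeeds (paths assumed independent)."""
    return 1.0 - prod(1.0 - path_probability(p) for p in paths)

paths_to_db = [
    ["internet", "web_server", "app_server", "database"],
    ["internet", "vpn_gateway", "database"],
]

for p in paths_to_db:
    print(" -> ".join(p), f"{path_probability(p):.1%}")
print(f"any path: {reach_probability(paths_to_db):.2%}")
```

Expressing each weight as a percentage, as the text suggests, makes it easy to rank paths and to see which single edge contributes most to the overall risk.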
1.1 Justification for Research
A malware infection on any single computer is troublesome; the same infection on a cloud network can be a disaster. Known malware can be stopped by signature-based antispyware, antivirus and intrusion detection systems, but these cannot stop unknown malware. Therefore, this research explains how Bayesian analysis and algorithms can be used to predict unknown malware in a cloud network.
1.2 Problem Statement
Breaching a cloud network is like breaking into a central bank and collecting all the money stored in the vaults. People lose billions every year to online fraud and data loss, mainly through online transactions and communication facilities. Cybercrime is an international offence that is treated like any other crime, which leaves room for attackers to evolve and adopt new skills to avoid being noticed or arrested when they breach internet regulations. Most malware attacks take code patterns that cannot be detected using a signature-based approach or any antispyware available today. In practice, most malware is detected a month or so after great damage has been done, through symptoms that appear during internet operations.
Data Breach Investigation Researchers came across over 200 compromised records and over 1000 attack incidents in the year 2013. Malware is written purposely to disrupt computer operations and to gather vital information transmitted online for malicious use (Patel et al., 2013). Worms and Trojans are the most commonly known malware, alongside different versions of viruses. Some programmers write software that is deliberately coded to act as a harmful bug, also known as malware. Several techniques have been employed, but detecting and correcting the damage caused by malware is not enough; it is prudent to protect networks and computers from that damage in the first place. The best damage is no damage at all. With this principle in mind, Bayesian analysis is used to predict malware occurrences before they are detected by the system.
1.3 Research Objectives
The objectives of this proposal are threefold. First, to predict possible malware attacks using Bayesian analysis in Windows applications. Second, to analyze forms of malware and the measures currently taken to detect intrusions into the system (Patel et al., 2013). Third, to illustrate a new architecture that differs slightly from the architectures already designed for preventive measures.
1.3.1 Research Questions
These are the main research questions:
1: Can Bayesian analysis accurately predict network security breaches without interfering with normal data flow, given existing network and security controls such as DNS and AD?
2: Can Bayesian analysis predict possible attacks quickly enough for preventive measures to be taken before the actual infection?
1.3.2 Feasibility of Malware Classification
Malware classification is conducted using a simple behaviour-based SVC working under a heuristic approach that analyzes both the code and its behaviour. In this approach, the signature or network services are determined, making it easier to identify a suitable vaccine that prevents the malware from infecting new devices and treats devices that are already infected. Malware programmers usually have specific targets and therefore modify signatures in order to hide from existing detection and prevention tools. Bayesian theory can therefore be used to analyze malware signatures and code and to alert the network admin.
1.3.3 Feasibility of Infection Prediction
Predicting malware attacks or virus infection on a computer or network requires considerable analysis. Signatures, code and logs must be properly analyzed before the malware is terminated by the admin. Various algorithms have previously been used to classify malware before termination, but this does not stop programmers from producing hidden malware that resembles normal network flow, which makes possible attacks difficult to prevent (Kumar, 2015). The Bayesian approach can be assessed by its ability to identify malware reinfection in the network and to identify various malware species. If Bayesian analysis achieves both, it fulfils its purpose of driving infections towards zero.
1.4 Scope of the Study
The study is based purely on Bayesian analysis used to predict unknown malware in a cloud network or computer system. The study uses a Bayesian graph to analyze the capability of detecting the code and signatures used by programmers who intend to breach cloud network security (Kumar, 2015). The results are to be used to develop a prediction system that can be adopted by individuals and organizations across the world. Bayesian analysis, rather than linear analysis, is run to predict malware occurrences from event patterns and behaviour. DNS data, file system data, Active Directory data and big data are all added to the analysis for testing purposes.
Literature Survey
Bayesian inference, especially in time models, often provides exact out-of-sample predictive distributions, which are evaluated using five alternative asset models (Patel and Srivastava 23). Comparing these predictive distributions gives a likelihood of future occurrences, and the performance of the five alternative models is evaluated and compared on that basis. Scholars have predicted the possible evolution of code used by software, some of it destructive and some less so (Jansen 10). Vulnerability assessments are also carried out on Android devices, since they form part of the cloud network and hackers may be interested in all network activity for malicious purposes.
Droid-Checker is used to check for leakages within Android applications and network systems. Intrusion detection systems have recently received substantial attention in both the corporate world and the academic environment. Signature-based malware detection is commonly used for known viruses (Jamil and Hassan, 2672). The main challenge of signature-based detection in current networks is that it often cannot identify newly created malware on its first appearance. Using an IDS algorithm, zero-day infections in a system can be recognized in a timely manner.
Predictive analytics has occasionally been used in cloud networks, so it is not a new idea. The literature on predictive analytics resembles that on other forms of statistics and system analysis, with the main difference being the tools used (Brodkin 2). Three main activities aid predictive analysis: aggregation, data collection and model formation. The literature submits that these three themes are vital to a predictive model's lifecycle (Zissis and Dimitrios 587). Model formation relies heavily on statistical methods such as neural networks and logistic regression. Although the themes overlap, together they cover the entire lifecycle of a predictive model.
Malware often attempts to create more malicious code within digital memory. The memory access trend can be checked using the algorithm before and after the dumping access point. Many attacks mutate even after the intrusion has been detected and removed. In such cases, IDSs (Intrusion Detection Systems) are modified to match the mutated code by its corresponding behavior. Conventionally, antivirus products stop malware by identifying unique code. Nonetheless, hackers who are also programmers have found new ways of spreading malicious code undetected using Web 2.0, since it allows users to add content. Dynamic code obfuscation, combined with polymorphism, helps hide JavaScript exploits (Martínez, Gustavo and Andrés 132).
A Bayesian algorithm assigns distributions to events based on their occurrences during data collection. Bayesian techniques apply Bayes' theorem to update these distributions immediately after the data is obtained. In cloud networks, mail security is achieved when both malicious and spam payloads are identified and blocked. Firewalls cannot provide 100% network security. Currently, apart from firewalls, IPS-IDS systems are used to identify both known and unknown malware attacks over networks (Dillon et al. 65).
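As a worked illustration of applying Bayes' theorem once data has been obtained, the sketch below updates the probability that traffic is malicious after an IDS alert. The prior, detection rate and false-alarm rate are made-up values chosen only to show the arithmetic, and the function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood, false_alarm):
    """P(malicious | alert) via Bayes' theorem.

    prior:       P(malicious) before seeing the alert
    likelihood:  P(alert | malicious)
    false_alarm: P(alert | benign)
    """
    evidence = likelihood * prior + false_alarm * (1.0 - prior)
    return likelihood * prior / evidence

# Assumed figures: 1% of flows are malicious, the IDS fires on 90% of
# malicious flows and on 5% of benign flows.
p_first = posterior(prior=0.01, likelihood=0.90, false_alarm=0.05)
print(f"after one alert:  {p_first:.3f}")

# A second, independent alert uses the first posterior as its prior.
p_second = posterior(prior=p_first, likelihood=0.90, false_alarm=0.05)
print(f"after two alerts: {p_second:.3f}")
```

With a low prior, a single alert still leaves the posterior modest, which is one reason repeated evidence matters when firewalls and IDS alerts are combined.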
Documented research indicates that network security in most organizations is far poorer than news outlets suggest. This can be attributed to a shortage of individuals with the skills required to stop attacks from swiftly evolving and mutating. Malware attacks are normally derivatives of past successful attacks (Zhang, Lu Cheng, and Raouf 16). Malware coders are well aware of their targets; they understand that software programmers, like any other human beings, are bound to make mistakes. Furthermore, programmers tend to make the same mistakes repeatedly, which makes exploiting vulnerabilities quite simple once they are detected. Accordingly, if a network attacker finds that a certain part of the network has been penetrated with a particular byte pattern, they will frequently create several variations of this pattern to exploit in the future to gain entry into the network (Pym and Martin Sadler 185). Big corporate and government networks are prime targets for malware attacks from black-hat hackers, hostile government agencies and harmful NGOs (Gelman 34). Although these networks are well built and highly complex, each component of the network, such as a user, application, data source or sensor, still increases the threat surface for malware attacks. Experts contend that reducing network complexity is not the solution to network vulnerabilities. Current network technologies such as anti-malware and firewalls are not able to stop the most lethal types of zero-day attacks.
Proposed Methodology
In the methodology, data is collected for analysis, specification and identification of known and unknown malware, and Bayesian algorithms are used to predict malware occurrences from their behaviour and past events. The Bayesian identifier determines the uncertainty and probabilities of malicious code and its signatures even when they are hidden from other antispyware (Pym and Martin Sadler 185). A cloud network covers a very large area, so any attack can cause great damage. Figure 3.0.1 below shows feature extraction using a mobile device and a computer, and Figure 3.0.2 shows how events are processed.
Figure 3.0.1 Feature extraction
Figure 3.0.2 Event processing
3.1 Data Collection
To obtain data sets ranging from big data to file system and Active Directory data, a simulated environment must be set up. With an operating system such as Windows, event logs can be traced along the data traffic flow, and analysis begins immediately. Oracle and SaaS applications can provide big data, while mobile applications provide other data for Bayesian analysis. Both clients and servers are at risk, since attackers target the information flowing through the network.
3.2 Comparison of Classification Algorithms
The investigation is conducted to determine whether MS is present in the traffic log and events of environment (A) as compared to environments (B) and (C). The logical units are categorized into two levels: operating as host (H) and as event (E). Depending on the types of events available, A and B attacks can be recorded separately and then compared logically. The mathematical representation is $p(H \text{ is } C \mid O) + p(H \text{ is } D \mid O) = 1$, where the two terms represent (A) and (B) (Patel et al., 2013). Historical data can be sorted and tabulated using a conditional probability table (CPT). Future predictions can be expressed as $p(Q \mid D) = \sum_k p(Q \mid M_k, D)\, p(M_k \mid D)$, where $p(Q \mid M_k, D)$ is the posterior distribution of the quantity $Q$ under model $M_k$ given data $D$, and $p(M_k \mid D)$ is the posterior probability of model $M_k$ (Kumar, 2015). Once the results for A and B have been tabulated, the same criteria are applied with different values for the purpose of averaging. According to Appendix A, the classifications stated have various limitations even though they have different features and approaches.
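The conditional probability table and the averaging over models can be illustrated with a short sketch. It assumes the standard Bayesian model averaging sum, in which each model's prediction is weighted by that model's posterior probability; the event names, host states and numbers are hypothetical and serve only to show the mechanics.

```python
from collections import Counter

# Hypothetical historical observations: (event type seen on host, host state).
history = [
    ("failed_login", "infected"), ("failed_login", "clean"),
    ("failed_login", "infected"), ("dns_burst", "infected"),
    ("dns_burst", "infected"), ("normal", "clean"),
    ("normal", "clean"), ("normal", "clean"),
]

def build_cpt(rows):
    """Conditional probability table P(state | event) estimated from counts."""
    pair_counts = Counter(rows)
    event_counts = Counter(event for event, _ in rows)
    return {
        (event, state): n / event_counts[event]
        for (event, state), n in pair_counts.items()
    }

def model_average(predictions, model_posteriors):
    """Weighted sum over models: sum_k p(Q | M_k, D) * p(M_k | D)."""
    return sum(q * w for q, w in zip(predictions, model_posteriors))

cpt = build_cpt(history)
print(cpt[("failed_login", "infected")])  # share of failed-login hosts that were infected

# Two hypothetical models predict P(infection) = 0.8 and 0.6, and the data
# gives them posterior weights 0.7 and 0.3.
print(model_average([0.8, 0.6], [0.7, 0.3]))
```

For each event type, the table entries over the two host states sum to one, which mirrors the constraint that the clean and infected probabilities for a host add up to 1.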
3.3 Comparison of Prediction Algorithms
Predictive algorithms can be measured in two ways. In the first, the trial MS set O represents reinfection by previously identified malware. In the second, the trial MS set O is fixed before algorithm training and represents zero-day malware that is yet to be identified. The two can be compared to trace attack trends from previous attacks to new and possible attacks. To evaluate the reliability of predictive algorithms, it is important to take into account both true and false predictions.
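One common way to account for both true and false predictions is to count true and false positives and negatives and derive rates from them. The sketch below assumes labels encoded as 1 for malware and 0 for benign; the example labels are invented, and the choice of metrics is an assumption rather than something the proposal specifies.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives; 1 = malware, 0 = benign."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn

def rates(tp, fp, tn, fn):
    """Derive detection and false-alarm rates, guarding against empty classes."""
    return {
        "true_positive_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }

# Hypothetical ground truth and predictions for eight samples.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(actual, predicted)
print(counts)
print(rates(*counts))
```

The same counts can be computed separately for the reinfection trial and the zero-day trial, so that the two measures described above can be compared on equal terms.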
3.4 Algorithm Types
Different types of algorithms are available for prediction and classification problems. They include support vector machines, nearest neighbor classification, principal component analysis, parametric Bayesian analysis, discriminant analysis (quadratic and linear) and decision trees. These algorithms are already routinely available as open-source implementations for the research community. During the investigation, the effective algorithms are selected.
3.5 Bayesian Classifier Training
Three Bayesian classification models were analyzed using a set of 2000 samples, of which over 1000 were malware, as illustrated in Appendix B. Testing and evaluation are intended to filter large amounts of data in the form of apps. By Bayes' theorem, the probability that a feature vector belongs to class C is given as:
in this formula, estimated probabilities include and
Therefore the cloud network represented by vector begins with
Bayes' rule is $P(h \mid D) = \dfrac{P(D \mid h)\,P(h)}{P(D)}$, where $P(h \mid D)$ is the probability of hypothesis $h$ once the relevant events $D$ are taken into account, and $P(h)$ is the prior probability directly associated with hypothesis $h$.
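To make the classifier training step concrete, the following is a minimal naive Bayes sketch over binary permission features. It assumes the standard naive Bayes form, in which features are treated as conditionally independent given the class, with Laplace smoothing; it does not reproduce the exact expressions or models used in this study. The four training samples are invented, and the function names `train` and `predict_proba` are hypothetical.

```python
import math

def train(samples, labels, features):
    """Estimate P(C) and P(feature present | C) with Laplace smoothing."""
    prior, cond = {}, {}
    for c in sorted(set(labels)):
        rows = [s for s, y in zip(samples, labels) if y == c]
        prior[c] = len(rows) / len(samples)
        cond[c] = {
            f: (sum(f in r for r in rows) + 1) / (len(rows) + 2)
            for f in features
        }
    return prior, cond

def predict_proba(sample, prior, cond):
    """Posterior P(C | feature vector) under the naive independence assumption."""
    log_joint = {}
    for c in prior:
        lp = math.log(prior[c])
        for f, p in cond[c].items():
            lp += math.log(p if f in sample else 1.0 - p)
        log_joint[c] = lp
    top = max(log_joint.values())  # subtract the maximum for numerical stability
    unnorm = {c: math.exp(v - top) for c, v in log_joint.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}

# Toy data: each sample is the set of permissions an app requests.
FEATURES = ["SEND_SMS", "READ_CONTACTS", "INTERNET"]
samples = [
    {"SEND_SMS", "READ_CONTACTS", "INTERNET"},
    {"SEND_SMS", "INTERNET"},
    {"INTERNET"},
    {"READ_CONTACTS", "INTERNET"},
]
labels = ["malware", "malware", "benign", "benign"]

prior, cond = train(samples, labels, FEATURES)
result = predict_proba({"SEND_SMS", "INTERNET"}, prior, cond)
print(result)
```

Working in log space avoids underflow when the feature vector is long, which would matter with a full set of requested permissions rather than the three used here.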
3.6 User Systems
Several user systems are built for information exchange, and some form part of business transactions, for example the Jumia and OLX websites. Non-standard programs including Skype, Twitter, Facebook and malware sites are commonly affected by viruses and malicious interests from programmers. Users' actions can easily trigger attacks, since attackers hide behind attractive sites, such as porn sites and appealing photos, that prompt users to click without thinking twice (Mell and Grance 162). With this in mind, user behaviour is an area that should be mapped for Bayesian analysis for future predictions.
3.7 Basic Architecture
With cloud networks, several systems are used to access the internet, including smartphones, laptop computers, desktop computers and Wi-Fi enabled systems. In this case, the basic architecture of the network is very simple, provided two servers are installed with the Bayesian algorithm. Figure 3.8.1 below illustrates the basic architecture.
Figure 3.8.1 Basic Architecture of Bayesian Enabled Network
3.8 Intruder Systems
Penetration testing is conducted to explore existing vulnerabilities in the system. Cloud networks, especially virtual ones, are developed on network traffic protocols that protect the system from intrusions. Servers are paired to increase their efficiency and usability (Zissis and Lekkas 231). Statistical analyses are conducted on collected data to find real-time correlations. However, the SIEM (Security Information and Event Management) system must be tuned in order to predict possible attacks once the new system is loaded with the Bayesian algorithm alongside new guidelines. Finally, as attacks mutate, the system learns from every attempted attack, enabling it to predict future attacks from previous ones using Bayesian analysis. The algorithms are designed to predict cloud network attacks as soon as new malware is launched. A SIEM based on the Bayesian algorithm changes as the malware changes, so attackers cannot learn its patterns.
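Learning from every attempted attack can be illustrated with a sequential Bayesian update. The sketch below assumes a Beta-Bernoulli model, which is one simple choice and not necessarily the one a tuned SIEM would use; the class name `BetaBelief` and the sequence of outcomes are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief about how often an event pattern is malicious."""
    alpha: float = 1.0  # prior pseudo-count of malicious outcomes
    beta: float = 1.0   # prior pseudo-count of benign outcomes

    def update(self, malicious: bool) -> None:
        """Fold one confirmed outcome into the belief."""
        if malicious:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        """Current estimate of P(malicious) for this pattern."""
        return self.alpha / (self.alpha + self.beta)

belief = BetaBelief()
for outcome in [True, True, False, True]:  # hypothetical confirmed incidents
    belief.update(outcome)
    print(f"P(malicious) estimate: {belief.mean:.3f}")
```

Because each observation only increments a count, the estimate can be kept up to date per event pattern as incidents are confirmed, without retraining on the full history.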
3.9 Current Architecture
The current architecture cannot predict anomalies, especially unknown malware. The fact that the current architecture can prevent and detect known malware does not mean the network cannot be attacked, or repeatedly attacked, by the same malware (Velte, Velte and Elsenpeter 113). Antispyware, firewalls and the filtering methods adopted by organizations can still be penetrated undetected by malware as technology advances.
4.0 New Architecture
The new architecture uses previous attacks to predict possible attacks. Currently, programmers may develop an unknown signature and let it access the internet without the network administrator knowing about it. In the new system, a Bayesian graph is used to analyze previous occurrences of known malware as well as to understand new code that runs through the system. Here, new code means newly developed malware that is reported for the first time. Introducing Bayesian analysis into the system does not mean that firewalls and installed antispyware are replaced; it adds value to the cloud network by predicting possible attacks before they happen. The function analyzes data and code, and if it finds unknown code, it stops and informs the network admin.
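The flow described above, in which known signatures are still handled by existing tools and unknown code is scored and reported, might be sketched as follows. Everything here is an assumption made for illustration: the hash-based signature store, the threshold value, and the `score_unknown` callable that stands in for whatever Bayesian model the system would use.

```python
import hashlib

# Hypothetical signature store: SHA-256 digests of payloads already known
# to be malicious.
KNOWN_MALWARE_HASHES = {hashlib.sha256(b"known-bad-payload").hexdigest()}
ALERT_THRESHOLD = 0.5  # hypothetical cut-off; would need tuning on real data

def inspect(payload, score_unknown, notify_admin=print):
    """Return 'block', 'hold' or 'allow' for a payload.

    score_unknown: callable returning an estimated P(malicious) for a payload
                   (a stand-in for the Bayesian model).
    notify_admin:  callable used to inform the network admin.
    """
    digest = hashlib.sha256(payload).hexdigest()
    if digest in KNOWN_MALWARE_HASHES:
        return "block"  # existing signature-based tools already cover this
    probability = score_unknown(payload)
    if probability >= ALERT_THRESHOLD:
        notify_admin(f"unknown code held, estimated P(malicious) = {probability:.2f}")
        return "hold"
    return "allow"

# Usage with fixed scores in place of a trained model.
print(inspect(b"known-bad-payload", lambda p: 0.0))
print(inspect(b"never-seen-before", lambda p: 0.9))
print(inspect(b"routine-traffic", lambda p: 0.1))
```

Keeping the signature check first reflects the point that the Bayesian component supplements firewalls and antispyware and does not replace them.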
References
Patel, Krunal and Srivastava, Rohit. Classification of Cloud Data using Bayesian Classification. Available at: <http://www.ijsr.net/archive/v2i6/IJSRON20131090.pdf>, 2013.
Zissis, Dimitrios, and Dimitrios Lekkas. "Addressing cloud computing security issues." Future Generation computer systems 28.3 (2012): 583-592.
Velte, Toby, Anthony Velte, and Robert Elsenpeter. Cloud computing, a practical approach. McGraw-Hill, Inc., 2009.
Ramgovind, Sumant, Mariki M. Eloff, and Elme Smith. "The management of security in cloud computing." Information Security for South Africa (ISSA), 2010. IEEE, 2010.
Mell, Peter, and Tim Grance. "The NIST definition of cloud computing." (2011).
Gelman, Andrew, et al. Bayesian data analysis. Vol. 2. London: Chapman & Hall/CRC, 2014.
Guan, Qiang, Ziming Zhang, and Song Fu. "Ensemble of bayesian predictors for autonomic failure management in cloud computing." Computer Communications and Networks (ICCCN), 2011 Proceedings of 20th International Conference on. IEEE, 2011.
Pym, David, and Martin Sadler. "Information Stewardship in cloud computing." Grid and Cloud Computing: Concepts, Methodologies, Tools and Applications (2012): 185.
Zhang, Qi, Lu Cheng, and Raouf Boutaba. "Cloud computing: state-of-the-art and research challenges." Journal of internet services and applications 1.1 (2010): 7-18.
Rimal, Bhaskar Prasad, Eunmi Choi, and Ian Lumb. "A taxonomy and survey of cloud computing systems." INC, IMS and IDC, 2009. NCM'09. Fifth International Joint Conference on. IEEE, 2009.
Dillon, Tharam, Chen Wu, and Elizabeth Chang. "Cloud computing: issues and challenges." Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on. IEEE, 2010.
Brodkin, Jon. "Gartner: Seven cloud-computing security risks." Infoworld 2008 (2008): 1-3.
Jansen, Wayne. "Cloud hooks: Security and privacy issues in cloud computing." System Sciences (HICSS), 2011 44th Hawaii International Conference on. IEEE, 2011.
Jamil, Danish, and Hassan Zaki. "Security issues in cloud computing and countermeasures." International Journal of Engineering Science and Technology (IJEST) 3.4 (2011): 2672-2676.
Martínez, Cristian Adrián, Gustavo Isaza Echeverri, and Andrés G. Castillo Sanz. "Malware detection based on cloud computing integrating intrusion ontology representation." Communications (LATINCOM), 2010 IEEE Latin-American Conference on. IEEE, 2010.
Appendix A
Appendix B
Top 20 permissions requested across 1000 malware samples