ARTICLE
Year : 2011 | Volume : 57 | Issue : 5 | Page : 413-422
Algorithm for Web Server Security
Brijendra Singh, Pooja Agarwal
Department of Computer Science, University of Lucknow, Lucknow, Uttar Pradesh, India
Correspondence Address:
Brijendra Singh
Department of Computer Science, University of Lucknow, Lucknow, Uttar Pradesh, India
Abstract
The Web application layer is the number one target for malicious online attacks. Millions of Web sites regulate access to highly sensitive information including social security numbers, credit card numbers, names, addresses, birthdates, intellectual property, financial records, trade secrets, medical data, and more. These data must be rigorously protected from intruders. To reduce the risk of losses, brand damage, theft of intellectual property, legal liability, and fines, enterprises need timely information about how Web sites are penetrated and how they can be defended. This article presents an algorithm to generate a security report. The proposed algorithm uses experimental data from 69 different Indian research/educational Web sites, and the security report is generated after checking all possible perspectives of the obtained experimental attribute dataset. This article also presents the changes in the security settings of the Indian Web sites over 3 years, on the basis of a study of open ports across 3 years of experimental results. Based on port scanning, preventive security measures can be taken by organizations as a policy matter.
How to cite this article:
Singh B, Agarwal P. Algorithm for Web Server Security. IETE J Res 2011;57:413-422
How to cite this URL:
Singh B, Agarwal P. Algorithm for Web Server Security. IETE J Res [serial online] 2011 [cited 2013 Jun 18 ];57:413-422
Available from: http://www.jr.ietejournals.org/text.asp?2011/57/5/413/90150
1. Introduction
In the modern Internet, manually reviewing each networked system for security flaws is no longer feasible. Operating systems, applications, and network protocols have grown so complex over the last decade that it takes a dedicated security administrator to keep even a relatively small network shielded from attack.
To avoid security risks in the real world, network scanning should be used on our systems. Network scanning involves using a port scanner to identify all hosts potentially connected to an organization's network, the network services operating on those hosts, such as the file transfer protocol (FTP) and hypertext transfer protocol (HTTP), and the specific application running the identified service, such as WU-FTPD, Internet Information Server (IIS), and Apache for the HTTP service. The result of the scan is a comprehensive list of all active hosts and services, printers, switches, and routers operating in the address space scanned by the port-scanning tool, i.e., any device that has a network address or is accessible to any other device [1],[2],[3],[4],[5].
Port scanning is one of the most popular inspection techniques attackers use to discover services they can break into. All machines connected to a Local Area Network (LAN) or the Internet run many services that listen at well-known and not so well-known ports. A port scan helps the attacker find which ports are available for attack. Essentially, a port scan consists of sending a message to each port, one at a time, and listening for an answer [6]. The received response indicates the port status and can be helpful in determining a host's operating system and other information relevant to launching a future attack. This paper analyzes and characterizes port scanning traffic. By defining a set of heuristics and applying them to the network trace data, we were able to isolate suspicious packets and group them into sets of scans. These sets were further analyzed to extract properties of the port scanning traffic and to collect relevant statistics. Based on port scanning, preventive security measures can be taken by the organization as a policy matter.
1.1 Port Scanning Techniques
Once you have identified an active host, you can attempt to identify the ports and services running on that host by performing port scanning. Port scanning sends a request to solicit a reply from ports on a target computer [7]. There are many different types of port scanning techniques. Most of them can be loosely categorized as follows:
Connect scan: Connect scans perform a full TCP three-way handshake and open a connection to the target. These scans are easily detected and often logged by the host. If a TCP port is listening and not firewalled, it will respond with a SYN/ACK packet; otherwise the host responds with a RST/ACK packet.
Half-open scan: A half-open scan does not complete the full TCP three-way handshake. It is also referred to as a SYN scan. With a half-open scan, when the scanner receives a SYN/ACK from the target host, implying an open port on the target, the scanner immediately tears down the connection with a RST.
Stealth scan: Stealth scans use various flag settings, fragmentation, and other types of evasion techniques to go undetected. Some examples are a SYN/ACK scan, a FIN scan, an ACK scan, and a NULL scan.
Port scanning solicits a variety of responses by setting different TCP flags or sending UDP packets with various parameters. Both TCP and UDP have 65,536 possible ports (0 through 65,535). It is routine to scan the well-known ports below 1024 that are associated with common services such as FTP, SSH, Telnet, SMTP, DNS, and HTTP [6].
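A connect scan of the kind described above can be sketched with Python's standard `socket` module. This is a minimal illustration only, not the tool used in this study (the experiments used NMAP); the function name `connect_scan`, the timeout value, and the status labels are assumptions made for the example.

```python
import socket

def connect_scan(host, ports, timeout=0.5):
    """Try a full TCP connection to each port and record the outcome.

    Hypothetical helper for illustration; returns a dict mapping each
    port to "open" or "closed/filtered".
    """
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 when the three-way handshake completes,
            # and an error code when the port refuses or does not answer.
            status = s.connect_ex((host, port))
            results[port] = "open" if status == 0 else "closed/filtered"
    return results
```

For example, `connect_scan("127.0.0.1", [21, 22, 80, 443])` returns one entry per port. Because the handshake is completed, such a scan is exactly the kind that the target host can easily log.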
Fingerprinting: The information gathered during this open port scan will often identify the target operating system. This process is called operating system fingerprinting. For example, if a host has TCP ports 135 and 139 open, it is most likely a Windows NT or 2000 host. Other items such as the TCP packet sequence number generation and responses to ICMP packets, e.g., the time to live (TTL) field, also provide a clue to identifying the operating system. Operating system fingerprinting is not foolproof. Firewalls filter (block) certain ports and types of traffic, and system administrators can configure their systems to respond in nonstandard ways to camouflage the true operating system.
Port scanning can also identify the application running on a particular port. For example, if a scanner identifies that TCP port 80 is open on a host, it often means that the host is running a web server. However, identifying which web server product is installed can be critical for identifying vulnerabilities [8]. For example, the vulnerabilities of Microsoft's IIS server are very different from those associated with the Apache web server. The application can be identified by "listening" on the remote port to capture the "banner" information transmitted by the remote host when a client connects. The banner can provide a wealth of information, including the application type, application version, and even operating system type and version, as illustrated by sample 1 in the Annexure.
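The banner capture described above can be sketched as follows. This is a hedged illustration using only the standard `socket` module; the function name `grab_banner`, the optional `probe` parameter, and the 1024-byte read size are assumptions for the example, and services that do not speak first (such as HTTP) would need a probe to be sent before anything is returned.

```python
import socket

def grab_banner(host, port, timeout=2.0, probe=b""):
    """Connect to host:port and return whatever the service sends first.

    Hypothetical helper; returns an empty string if nothing arrives
    before the timeout.
    """
    with socket.create_connection((host, port), timeout=timeout) as s:
        if probe:
            s.sendall(probe)  # some services only answer after a request
        try:
            data = s.recv(1024)
        except socket.timeout:
            return ""
    # latin-1 never fails to decode, so odd bytes in a banner are preserved
    return data.decode("latin-1").strip()
```

The returned string is the raw greeting of the service, which typically names the application and its version.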
This article is organized into five sections. Section 1 is the Introduction, which describes the background of the work. Section 2 describes the methodology in detail: we first perform port scanning and then apply the proposed algorithm to the dataset obtained from the different attributes of a given Web site to generate the security report. Section 3 describes the proposed algorithm for Web security, including the details of its implementation. Section 4 presents the Analysis, which is based on the experimental dataset. Section 5 gives the conclusion, and it is followed by the references.
1.2 Notations
[INLINE:1]
2. Methodology
This article describes the methodology for the development of an algorithm for web security, which is based on the experimental data of port scanning. The present work is focused on the Web sites of Indian national institutes/research organizations, which we chose randomly. In this experiment we examined several different Web sites over 3 years: in the first year we worked on 30 different Web sites, in the second year on 68 different Indian Web sites, and in the third year on 69 Web sites. We applied the proposed algorithm to the experimentally collected data, and the security report is generated by the algorithm accordingly.
In the proposed approach, the generated report is classified into four different categories (based on the metric values).
Group I {NR} (Secure/Host Down): The metric value is NR, which shows that either the host is down or it is secured well enough (e.g., sample 2 in the Annexure).
Group II {0-5} (Normal): The metric value lies between 0 and 5, which shows that the host has normal settings.
Group III {6-10} (Interesting): The metric value lies between 6 and 10, which shows that the host's settings are not proper.
Group IV {11-above} (Alert): The metric value is 11 or above, which shows that the host's settings require attention.
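The four groups listed above amount to a simple threshold rule on the metric value. A minimal sketch, assuming the metric is an integer and that a missing scan result (NR) is represented by `None`; the function name `classify_host` is hypothetical:

```python
def classify_host(metric):
    """Map a metric value to one of the four report groups.

    `metric` is None when the scan returned no result (NR).
    """
    if metric is None:
        return "Group I (Secure/Host Down)"
    if metric <= 5:
        return "Group II (Normal)"
    if metric <= 10:
        return "Group III (Interesting)"
    return "Group IV (Alert)"
```

For instance, a host with a metric value of 7 falls into Group III, and one with a value of 13 into Group IV.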
Here, we applied basic port scanning techniques with the help of the NMAP version 5.21 tool. In this experiment we used a PC with the following hardware and software specifications:
2.1 Hardware
Core 2 Duo processor
One GB RAM
280 HDD
2.2 Software
Operating system: MS Windows XP SP2
Internet connectivity (24 × 7)
NMAP is an open source utility for exploring networks and auditing their security. It scans large networks (even those consisting of hundreds of thousands of machines, claims one of the users) quite rapidly, and it also works fine against single hosts. NMAP is free. It first identifies active hosts in the address range specified by the user using Transmission Control Protocol/Internet Protocol (TCP/IP) Internet Control Message Protocol (ICMP) ECHO and ICMP ECHO_REPLY packets. In this experiment, however, we focused on TCP ports and related protocols [9].
During the experiment, advanced scanning was used to obtain the following additional information apart from open ports:
Fingerprinting
Banner grabbing
Trace route
TCP wrappers
Identify vulnerable services, devices
rDNS record
Identify SOA, A, NS, and MX records
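Several of the items listed above correspond to standard Nmap options. The exact options used in the experiment are not listed here, so the following is only a sketch of how such a scan command might be assembled; the function name `build_nmap_command` and the default port range are assumptions, and the DNS record lookups (SOA, A, NS, MX) are not covered by it.

```python
def build_nmap_command(target, ports="1-1024"):
    """Assemble an Nmap command line for one target (illustrative only)."""
    return [
        "nmap",
        "-sT",           # TCP connect scan
        "-sV",           # probe open ports for service/version information
        "-O",            # operating system fingerprinting (needs privileges)
        "--traceroute",  # trace the route to the host
        "-p", ports,     # port range to scan
        target,
    ]

cmd = build_nmap_command("192.168.1.1")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine where Nmap is installed.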
Based on the port scanning experiment, a database was prepared from the different Web sites for the various attributes specified above. We used the proposed algorithm, which can interpret the level of security of a web server. To assess the security of a web server, we require only the address of the Web site for which the security report is to be generated.
[Figure 1] shows the block diagram of the algorithm: to obtain the security report, the user needs to give only the address of the Web site, and the rest of the work is done by the algorithm itself. {Figure 1}
The logic of the algorithm shows how it works on the different perspectives of the given Web site to generate its security report. The algorithm works on several characteristics or attributes (according to their scanned values), finds the right combination of attributes, checks it against the dataset in the existing database, and then applies the rule set to it. Finally, it produces the report. [Figure 2] shows the logic of the algorithm in the form of a flow chart. {Figure 2}
Thus, this article proposes an algorithm that automates the interpretation of scan results. It applies different rule sets to the dataset reported by the scan, according to the attributes or characteristics of the experimental scanning results, on the basis of the classified metrics specified above.
3. Proposed Algorithm for Web Security
As mentioned earlier, the proposed algorithm uses the experimentally scanned data. This section describes in detail how the proposed algorithm works on the scanned data of the experimental Web sites.
Once a port is discovered, a network scanner may perform additional examination to determine the actual version of the service running on the open port. As with host discovery, port scanning is also subject to intervention by routers and firewalls, and thus port responses may be dropped. Response time is also an important aspect of this scanning. In our observations we also determined the relative factor in the scanned details.
Each reported attribute set is h = {Rtime, proto, srcIP, Oport, Fport, Cport, srcPort, destIP, destPort, OS, Device, Latency, rDNS, MX, rebot, …}, where the Rtime attribute reflects the occurrence time of the result and the proto attribute identifies the network protocol for the traffic. The srcIP, srcPort, destIP, and destPort attributes describe the source IP address, source port, destination IP address, and destination port of the traffic, respectively.
Each scan yields a dataset D(h) = {(proto, Oport), (proto, Oport, rDNS), (proto, Oport, rebot), (proto, Oport, OS), (proto, Oport, OS), (proto, Oport, rDNS, OS), …}, which is the set of attribute combinations.
H(g) is the dataset of all the g attributes from the existing database.
The feature dataset Fe_Rset contains all the possible combinations of the different feature rule sets, i.e., Fe_Rset = {R(proto, Oport), R(proto, Oport, rDNS), R(proto, Oport, rebot), R(proto, Oport, OS), R(proto, Oport, OS), R(proto, Oport, rDNS, OS), …}.
The working of the algorithm is described in the steps given below:
Step 1: h is the set of all attributes obtained from the scanning.
Step 2: For all resulting attributes h in D(h) that have Did = id, count the number of occurrences.
Step 3: In order to filter, remove the irrelevant combinations and set c (c is a set that contains the details of the remaining combinations). Check the reported dataset D(h) against the existing dataset H(g).
Step 4: If the above estimation procedure does not yield a result, it is repeated for the (proto, destIP, destPort), (proto, srcIP, destPort), and (proto, destPort) attribute combinations (in the given order), until one of the procedures returns a result or the last procedure terminates.
Step 5: During the analysis, we focus on protocol and destination-related attributes and exclude the srcPort attribute altogether, since source attributes are usually associated with attackers and can have a wide range of possible values (especially srcPort).
In the Annexure, samples 1, 3, and 4 show the experimental scanning data from which we obtained the set of attributes (h) and the dataset D(h) that we generated after the processing was over.
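The fallback over attribute combinations described in the steps above can be sketched as an ordered search. This is an illustration under stated assumptions, not the authors' implementation: the scanned record h is modelled as a Python dict, the existing dataset H(g) as a dict mapping each attribute combination to the set of value tuples already stored for it, and the names `FALLBACK_ORDER` and `first_match` are hypothetical.

```python
# Combinations tried in the order given in the text.
FALLBACK_ORDER = [
    ("proto", "destIP", "destPort"),
    ("proto", "srcIP", "destPort"),
    ("proto", "destPort"),
]

def first_match(h, known, order=FALLBACK_ORDER):
    """Return (combination, values) for the first combination found in `known`.

    h     -- dict of scanned attributes for one host
    known -- dict: combination -> set of value tuples (stand-in for H(g))
    """
    for combo in order:
        values = tuple(h.get(name) for name in combo)
        if values in known.get(combo, set()):
            return combo, values
    return None  # the last procedure terminated without a result
```

Note that srcPort does not appear in any of the combinations, in line with its exclusion from the analysis.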
3.1 Algorithm
Input: A, the address of a web server (e.g., 192.168.1.1)
Output: the generated report
function func()
{
    Randomize (h)
    Discretization D(h)
    for each D(h) in H(g)
    {
        if (D(h) matches H(g)) then
        {
            Did := id
            V = C_dataset(h, D)
            Select_feset(D, Fe_Rset[])
            if { V V, Fe_Rset(h) } then generate Report
            if (does not exist) then return 0
        }
    }
}
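One possible reading of the pseudocode above, written as runnable Python, is sketched below. Several details are assumptions rather than part of the published algorithm: h is a dict of scanned attributes, H(g) is a dict mapping each attribute combination to its known value tuples, each entry of Fe_Rset is a callable rule that returns a finding or `None`, and the randomization step is omitted. The name `generate_report` is hypothetical.

```python
def generate_report(h, H_g, Fe_Rset):
    """Match scanned attributes against H(g) and apply the selected rule.

    Returns a small report dict, or 0 when no rule produces a finding
    (mirroring "return 0" in the pseudocode).
    """
    for combo, known_values in H_g.items():
        V = tuple(h.get(name) for name in combo)   # role of C_dataset(h, D)
        if V not in known_values:                  # D(h) does not match H(g)
            continue
        rule = Fe_Rset.get(combo)                  # role of Select_feset
        if rule is None:
            continue
        finding = rule(V)
        if finding is not None:
            return {"combination": combo, "values": V, "finding": finding}
    return 0

# Hypothetical data: one known combination and one illustrative rule.
H_g = {("proto", "Oport"): {("TCP", 21)}}
Fe_Rset = {("proto", "Oport"): lambda v: "FTP port open" if v[1] == 21 else None}
report = generate_report({"proto": "TCP", "Oport": 21}, H_g, Fe_Rset)
```

Representing each rule as a callable keeps the rule sets independent of the matching loop, so a different combination of features only requires a different entry in `Fe_Rset`.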
3.2 Function Description
Functions used by the classifier:
attribute(h): returns the dataset D(h) of attribute names for the frequent end point set h; e.g., if h contains (proto, destPort) tuples, attribute(h) returns {proto, destPort}.
C_dataset(h, D): extracts the values of the attributes in the attribute set D from the record h and creates a tuple from them, in order to search for a frequent end point set; e.g., C_dataset((10, TCP, 192.168.1.1, 1234, rDNS, 10.1.1.1, 53), {proto, destPort}) returns (TCP, 53).
Select_feset(D, Fe_Rset(h)): decides which feature rule set is used for the check.
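The C_dataset example given above can be reproduced with a short projection function. The field layout in `FIELDS` is an assumption inferred from the seven values in the example tuple; it is not specified in the text, and the lowercase name `c_dataset` is used only for this sketch.

```python
# Assumed order of the values in one scanned record (hypothetical layout).
FIELDS = ("Rtime", "proto", "srcIP", "srcPort", "rDNS", "destIP", "destPort")

def c_dataset(record, attr_set, fields=FIELDS):
    """Project `record` onto the attributes in `attr_set`, keeping field order."""
    return tuple(value for name, value in zip(fields, record) if name in attr_set)

result = c_dataset(
    (10, "TCP", "192.168.1.1", 1234, "rDNS", "10.1.1.1", 53),
    {"proto", "destPort"},
)
print(result)  # ('TCP', 53)
```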
4. Analysis
We applied the proposed algorithm to 69 Web sites of different Indian research/educational institutions during the experiment. Here we present the dataset of only one perspective to show the working of the algorithm, but the report was generated after checking all the possible perspectives. [Table 1] presents the processed dataset arranged in four different groups (on the basis of the open ports), with port details (in %), rDNS (in %), and the corresponding mean latency for all four classified groups with standard deviation, standard error, and 95% confidence interval. We also recorded the response time and service details with their values. {Table 1}
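The summary statistics named above (mean, standard deviation, standard error, and 95% confidence interval) can be computed per group as in the following sketch. How the interval in [Table 1] was actually computed is not stated; the normal approximation with z = 1.96 used here is an assumption, and the sample values are made up for the example rather than taken from the table.

```python
import math
import statistics

def summarize(values, z=1.96):
    """Return mean, sample standard deviation, standard error and 95% CI."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1)
    se = sd / math.sqrt(n)          # standard error of the mean
    return {"mean": mean, "sd": sd, "se": se,
            "ci95": (mean - z * se, mean + z * se)}

# Hypothetical latency readings for one group.
summary = summarize([0.2, 0.3, 0.4])
```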
Latency: It is the amount of time a message takes to traverse a system.
Response time: It is obtained by measuring and calculating the amount of time nodes spend actively processing and sending frames, as well as the amount of time that frames spend traversing the network.
Reverse DNS (rDNS): It is a method of resolving an IP address into a domain name, just as the domain name system resolves domain names into associated IP addresses.
Service details: Mail services, FTP server, name server, Application services, etc.
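The reverse DNS lookup defined above can be tried with the standard library. This is a minimal sketch; the function name `reverse_dns` is hypothetical, and the result depends entirely on the resolver configuration of the machine running it.

```python
import socket

def reverse_dns(ip):
    """Return the host name registered for `ip`, or None if there is none."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # Covers socket.herror and socket.gaierror: no PTR record or
        # the lookup could not be completed.
        return None
```

Calling `reverse_dns("192.168.1.1")` on a network without a matching PTR record simply returns `None`.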
Here we can see that the maximum percentage of open ports is found in Group 4 (1300%) and the minimum in Group 2 (275%); Group 3 has 745.45% and Group 1 is NR. A larger number of open ports increases the possibility of threats in Group 4, which justifies the classification.
In the case of rDNS, the results do not fully match the pattern of open ports, although the margin is comparatively small: the tabular data show the minimum in Group 2 (66.7%) and the maximum in Group 3 (90.9%), while Group 4 has 86.7%. This shows some contradiction with the classification in the maximum values.
In the case of latency, the results again match the pattern of open ports. The tabular data show the minimum mean value in Group 2 (0.21) and the maximum in Group 4 (0.33), while Group 3 has a mean of 0.22. This also justifies the classification.
Response time, like rDNS, does not fully follow the classification: the tabular data show the minimum mean in Group 3 (423.33) and the maximum in Group 4 (711.41), while Group 2 has a mean of 593.43. This shows some contradiction with the classification in the minimum values.
In the case of the service details, the results again match the classification. The tabular data show the minimum percentage in Group 2 (8.33%) and the maximum in Group 4 (40%), while Group 3 has 27.27%.
As mentioned in the logic of the algorithm, the correct combination of attributes and feature sets is important, because the classification of the feature set decides which set of rules is imposed by the algorithm. The classifier thresholds are also rebuilt according to the chosen features, and the rule set is imposed accordingly.
The graphical representation of the data shown in the table is given in [Figure 3], [Figure 4], [Figure 5], [Figure 6], and [Figure 7]; these graphics show the data of DNS, ports, and services in percentages, and the means of response time and latency with their standard deviations. {Figure 3}{Figure 4}{Figure 5}{Figure 6}{Figure 7}
In [Figure 4], the graphics show the data of rDNS. The experimental percentage values show that the highest point of rDNS is in Group 3 and the lowest in Group 1, but Group 1 shows NR. This means that we found the maximum details under Group 3 and the minimum in Group 2. This is also important because, by using DNS rebinding, an attacker can circumvent firewalls to spider corporate intranets, exfiltrate sensitive documents, and compromise unpatched internal machines. An attacker can also hijack the IP addresses of innocent clients to send spam e-mail, commit click fraud, and frame clients for misdeeds [10].
In [Figure 5], the highest peak of latency is in Group 4, and we know that high-latency networks can resist strong attackers who can watch the whole network and control a large part of the network infrastructure. To prevent this "global attacker" from linking senders to recipients by correlating when messages enter and leave the system, high-latency networks introduce large delays into message delivery times and are thus only suitable for applications like e-mail and bulk data delivery; most users are not willing to wait half an hour for their web pages to load.
Low-latency networks, on the other hand, are fast enough for web browsing, secure shell, and other interactive applications, but have a weaker threat model: an attacker who watches or controls both ends of a communication can trivially correlate message timing and link the communicating parties [11].
In the experimental data shown in [Figure 6], we found the most service details in Group 4, and more details mean more chance of threats.
After experimenting with several patterns, we conclude that the security of a Web site cannot be judged on just one or two characteristics; the correct combination of feature sets is very important for reaching the correct conclusion.
In [Figure 8] and [Figure 9], we present a graphical comparison of the open ports of the accessed Web sites over two years and three years, respectively. In this comparison we found that some of the Web sites that did not respond previously had a number of ports open this time. In some cases we saw that Web sites had improved their settings and no longer showed the details they provided earlier. Port 80 was found open on the largest number of Web sites, followed by port 443 (ssl/http) and port 21 (FTP). Based on the number of open ports, preventive security measures can be implemented, which can help the organization, but it is not easy to comment on the security settings, which are defined by each organization as per its own policy. {Figure 8}{Figure 9}
5. Conclusion
In this article, an algorithm for web server security has been proposed, which uses the results of experimental scanning performed under a specified environment. This article analyzed the effect of a change in the specified environment and its impact on the generated security report. Based on the classification into security groups (1-4) during the experiments, we found differences in the results for various fields, e.g., response time, scanning pattern's response, and details of services.
This article describes an algorithm for web server security and the number of ports found open during port analysis. Preventive security measures can be implemented as part of an organization's web server security policy.
[INLINE:2]
References
[1] B Singh, Network Security and Management, New Delhi: Prentice-Hall of India; 2006.
[2] J P Anderson, "Computer Security Threat Monitoring and Surveillance", James P Anderson Co., Fort Washington, PA, USA, Technical Report 98-17, April 1980.
[3] D E Denning, "An intrusion-detection model", IEEE Transactions on Software Engineering, vol. 13, no. 2, pp. 222-32, Feb. 1987.
[4] A A Cárdenas, J S Baras, and K Seamon, "A Framework for the Evaluation of Intrusion Detection Systems", Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P'06), pp. 1081-6011, 2006.
[5] C Herringshaw, "Detecting Attacks on Networks", IEEE Magazine Computer, Dec. 2007.
[6] B Singh and P Agarwal, "Study of Security Measures in Indian Websites", Proceedings of National conference on Research and Development Trends in ICT, University of Lucknow, Lucknow, pp. 23-9, Feb. 2010.
[7] C B Lee, C Roedel, and E Silenok, "Detection and Characterization of Port Scan Attacks", Department of Computer Science and Engineering, University of California, San Diego.
[8] P Agarwal, "Overview on the Network Security", Proceedings of National conference on Research and Development Trends in ICT, University of Lucknow, Lucknow, pp. 66-73, Feb. 2010.
[9] A Orebaugh and B Pinkard, "NMAP in the Enterprise: Your Guide to Network Scanning", 30 Corporate Drive, Burlington, MA: Syngress Publishing, Inc., Elsevier, Inc.
[10] C Jackson, A Barth, A Bortz, W Shao, and D Boneh, "Protecting Browsers from DNS Rebinding Attacks", CCS'07, Alexandria, Virginia, USA, October 29-November 2, 2007. ACM 978-1-59593-703-2/07/.
[11] R Dingledine and N Mathewson, "Anonymity Loves Company: Usability and the Network Effect". Available from: http://weis2006.econinfosec.org/docs/41.pdf [Last cited on 29 March 2010].