Why Rule-Based Log Correlation Is Almost a Good Idea... (Part 5)
Performance tolls - why you can't correlate 100% of your logs
By: Gorka Sadowski
Jan. 10, 2012 06:00 AM
A correlation engine works really hard, even when dealing with a limited set of scenarios.
So the correlation engine needs to be fed very carefully: don't give it more than it can chew, or it will essentially run out of resources and die.
Knowing which logs need to be part of scope is an important part of tuning a correlation engine.
No, you cannot ask your correlation solution to manage all of your logs. It's not designed for that. Managing only the most critical ones is already a daunting task for it.
Correlation load for one simple correlation rule over one hour
Assumptions - at that time, we have:
o 5 logins/sec average from local logins
o 5 logins/sec average from VPN logins
o 1 000 events/sec average total infrastructure
o Logs kept for a total of 1 week - for reporting etc.
Total data space of 1000*3600*24*7 = 604 800 000 events
For each local login event - which occurs 5 times per second:
o Look in the VPN login events - for the past 3600 seconds - and check if that same person logged in through VPN
Total data space of 3600*5 = 18 000 VPN login events
Total of 18 000 * 5 = 90 000 checks per second
Size of that 1 hour data space in which to perform the 90 000 checks
o 1000 events/sec * 3600 = 3 600 000 events
So, for this one correlation rule:
90 000 database reads and checks per second
While at the same time doing 1000 record writes and inserts per second
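The arithmetic above can be reproduced with a short script. The figures are the article's own assumptions; the variable names are illustrative, not from any particular SIEM product:

```python
# Back-of-the-envelope load for one simple correlation rule:
# "alert if a user who just logged in locally also logged in
# via VPN during the past hour", under the article's assumed rates.

LOCAL_LOGINS_PER_SEC = 5      # average local login events per second
VPN_LOGINS_PER_SEC = 5        # average VPN login events per second
TOTAL_EVENTS_PER_SEC = 1_000  # average events per second, whole infrastructure
WINDOW_SECONDS = 3_600        # 1-hour sliding window for the rule

# VPN login events held in the 1-hour window
vpn_window_events = VPN_LOGINS_PER_SEC * WINDOW_SECONDS        # 18 000

# Each local login (5/sec) is checked against every VPN login in the window
checks_per_sec = LOCAL_LOGINS_PER_SEC * vpn_window_events      # 90 000

# Size of the full 1-hour event space sitting behind those checks
window_events = TOTAL_EVENTS_PER_SEC * WINDOW_SECONDS          # 3 600 000

# One week of retained events (kept for reporting etc.)
week_events = TOTAL_EVENTS_PER_SEC * WINDOW_SECONDS * 24 * 7   # 604 800 000

print(checks_per_sec)  # database reads/checks per second for this one rule
print(window_events)   # events in the 1-hour data space
print(week_events)     # total retained events
```

All of this happens while the engine is simultaneously inserting 1,000 new records per second.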
Correlation load for one complex global rule over one day
The assumptions are then:
Same 1 week = 604 800 000 events
1000 * 3600 * 24 = 86 400 000 events
The rule is tested 100 times per second, and each test looks into every record in the 1-day sliding-window data space
100 * 86 400 000 = 8 640 000 000 tests per second
That's over 8.6 billion reads per second!
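The same kind of quick calculation shows where that number comes from. Again, the rates are the article's assumptions and the script is just a sketch:

```python
# Load for one complex global rule evaluated over a 1-day sliding
# window, under the article's assumed rates.

TOTAL_EVENTS_PER_SEC = 1_000   # average events per second, whole infrastructure
DAY_SECONDS = 3_600 * 24       # seconds in the 1-day window
TESTS_PER_SEC = 100            # the rule is evaluated 100 times per second

# Events held in the 1-day sliding window
day_events = TOTAL_EVENTS_PER_SEC * DAY_SECONDS   # 86 400 000

# Each evaluation scans every record in that window
reads_per_sec = TESTS_PER_SEC * day_events        # 8 640 000 000

print(reads_per_sec)  # database reads per second for this one rule
```

One rule, one day of data, and the read load is already in the billions per second, before any enrichment is applied.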
Imagine having to enrich this one correlation rule with geolocation information, or otherwise adding a dynamic dimension to it.
Imagine having hundreds or thousands of correlation rules - what would the impact be on the number of database reads and the overall load?
This is just not practical, and you cannot always solve this problem by throwing more hardware at it.
Did you know that APT attacks can last weeks and months? Stay tuned for what this means for static rule-based correlation...