From 1000/day to 1000/sec


1 From 1000/day to 1000/sec: The evolution of our big data system. Yoav Cohen, VP Engineering

2 This Talk: A walk-through of how we built our big-data system

3 About Incapsula: Vendor of a cloud-based Application Delivery Controller - Web Application Firewall, Load Balancing, CDN & Optimizer, DDoS Protection

4 How does it work?

5 Modeling Web-Traffic
1. The first request to a website starts a new session
2. Subsequent requests are part of the same session
3. After being idle for 30 minutes the session ends
Example timeline: 10:03:01 GET (session 1 starts), 10:03:10 GET (session 1, request 1), 10:03:12 GET (session 1, request 2), session 1 ends, 10:35:05 GET (session 2 starts)
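A minimal sessionization sketch in Java to make the 30-minute idle rule concrete; the class and field names are hypothetical, since the deck does not show the actual implementation.

import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Assigns each request to a session per visitor; a new session starts after
// 30 minutes of inactivity (or for a visitor we have not seen before).
class Sessionizer {
    private static final Duration IDLE_TIMEOUT = Duration.ofMinutes(30);

    private static final class Session {
        final long id;
        Instant lastSeen;
        Session(long id, Instant lastSeen) { this.id = id; this.lastSeen = lastSeen; }
    }

    private final Map<String, Session> sessionsByVisitor = new HashMap<>();
    private long nextSessionId = 1;

    // Returns the session id for a request from the given visitor at the given time.
    long sessionFor(String visitorId, Instant requestTime) {
        Session s = sessionsByVisitor.get(visitorId);
        if (s == null || Duration.between(s.lastSeen, requestTime).compareTo(IDLE_TIMEOUT) > 0) {
            s = new Session(nextSessionId++, requestTime);   // idle too long, or new visitor: new session
            sessionsByVisitor.put(visitorId, s);
        } else {
            s.lastSeen = requestTime;                        // same session: refresh the idle timer
        }
        return s.id;
    }
}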

6 The Data: A stream of messages in Google Protobuf format. Example message (values partially omitted):
msgid: ...
type: SESSION_MESSAGE_CREATE
siteid: 7
starttime: ...
clientip: ******
countrycode: "US"
entryurlid: ...
visitorid: "7e59c804-f...a0df-35d9b02eb747"
useragent: "Incapsula Site Monitor - OPS"
visitorclappid: 209
requeststarttime: ...
responsestarttime: ...
responseendtime: ...
sessionid: ...
urlid: ...
request_id: ...
querystring: ""
postbody: ""
statuscode: 200
serialnumber: 1
content_length: 6350
protocol: HTTP
requestresult: REQ_CACHED_FRESH
...

7 The Problem: Transforming the stream of messages into readable data. Concerns: processing throughput, read performance, scalability.
(Diagram: the session message stream - session 1 starts, requests 1 and 2, session 1 ends, session 2 starts - feeding into the system to be designed.)

8 Architecture

9 System Evolution: Gen 1, Gen 2, Gen 3, Gen 4

10 Gen 1: Code Name rtproc

11 Gen 1: OLAP Cube
A textbook solution: dimensions (Time x IP x Country x ...) map to counters (# requests, # attacks, ...)
Slice and dice to answer any question (e.g. how many attacks from Germany in Jan-2010?):
select sum(number_of_attacks) from Attacks where site_id=140 and country_code='DE' and time > ... and time < ...

12 Gen 1: OLAP Cube - Loading data for individual attacks requires joins.

13 Gen 1: Analysis
> Generic solution
> Very big tables
> Overly complex (lots of moving parts)
(Scorecard: processing, read, scalability)

14 System Evolution: Gen 1, Gen 2, Gen 3, Gen 4

15 Gen 2: Code Name rtprocng
Main problems to solve:
> Read performance
> Simplify
New approach:
> Count things on the edge instead of centrally
> NoSQL model to improve read performance (no joins)

16 Gen 2: Simpler Design

17 Gen 2: Stats NoSQL Storage
> One document per day, containing all the data to build the charts
> Read performance improved (one lookup for all charts)
> Can even load parts of the data (MongoDB feature); see the sketch below
Example document (truncated):
{ "_id" : "7_...", "pageviews" : [ NumberLong(2369), NumberLong(2380), NumberLong(2520), NumberLong(5651), NumberLong(2912), NumberLong(3357), NumberLong(3723), NumberLong(3301), NumberLong(3092), NumberLong(2984), NumberLong(3791), NumberLong(3069) ], "humsess" : [ NumberLong(213), NumberLong(258), NumberLong(298), ... ] }
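The "load parts of the data" point refers to MongoDB field projection. A minimal sketch with the MongoDB Java driver; the connection string, database, collection, and _id value are assumptions, since the deck only shows the "7_..." key prefix.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Projections.fields;
import static com.mongodb.client.model.Projections.include;
import static com.mongodb.client.model.Projections.slice;

// Sketch: fetch only part of a daily stats document instead of the whole thing.
public class StatsLoader {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> stats = client.getDatabase("stats").getCollection("daily");

            // Project only the "pageviews" array, and only its first 12 buckets.
            Document doc = stats.find(eq("_id", "7_20140101"))
                    .projection(fields(include("pageviews"), slice("pageviews", 12)))
                    .first();

            System.out.println(doc == null ? "not found" : doc.toJson());
        }
    }
}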

18 Gen 2: Events NoSQL Storage
> One document per session, containing all its actions
> Lookups are easy (no joins)
> Searches use MongoDB indexes (OK but not great); see the sketch below
Example document (truncated):
{ "_id": ..., "start": { "$date": "...T10:19:00Z" }, "cc": ["CA"], "securityflags": ["rid4"], "badbot": true, "prxy": [226], "clappt": 1, "actns": [ { "reqres": 10, "u": "...", "attack": [ { "loc": 1, "acode": 0, "act": 7, "rid": 4, "more": 0, "atype": 314, "hidden": false, "match": "", "pval": "" } ... }
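A minimal sketch of the index-backed search, again with the MongoDB Java driver; the collection name is hypothetical, and the field name "cc" comes from the example document above.

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

import static com.mongodb.client.model.Filters.eq;

// Sketch: index a searchable field of the per-session documents and query through it.
class SessionSearch {
    static void findCanadianSessions(MongoCollection<Document> sessions) {
        sessions.createIndex(Indexes.ascending("cc"));             // index the country-code field
        for (Document session : sessions.find(eq("cc", "CA")).limit(10)) {
            System.out.println(session.toJson());                   // each hit is a whole session, no joins
        }
    }
}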

19 Gen 2: Python Processor
Batch process:
> Process the files in the directory for up to X minutes
> Flush to storage and exit
How to achieve good processing throughput?
> Cache objects in memory
> When processing messages, update the object in memory
> When the process finishes, flush all the objects from memory to storage

20 Gen 2: Storage Bottleneck
Single DB for all sessions. Reality check:
> MongoDB coarse-grained locking (one lock per DB server)
> When the batch process flushes, UIs are stuck (the lock prefers writes)
> Dropping old data is impossible
> Fragmentation caused excessive disk usage

21 Gen 2: Storage Re-Factoring
Single DB -> DB per day
> Drop DBs that are X days old (see the sketch below)
Live sessions -> Live DB; dead sessions -> per-day DBs
> 0% fragmentation in per-day DBs
> Daily maintenance of the Live DB (but it's relatively small)
DB locking not resolved (later MongoDB versions have a lock per DB)
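With one database per day, retention becomes a cheap dropDatabase call instead of slow per-document deletes. A minimal sketch with the MongoDB Java driver; the "sessions_yyyyMMdd" naming scheme and the 30-day cutoff are assumptions, since the deck only says "X days old".

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Sketch: drop per-day databases that are older than the retention window.
public class RetentionJob {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    public static void main(String[] args) {
        int retentionDays = 30;                                    // the slide's "X days old"
        LocalDate cutoff = LocalDate.now().minusDays(retentionDays);
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            for (String name : client.listDatabaseNames()) {
                if (!name.startsWith("sessions_")) continue;       // skip admin/local/Live DBs
                LocalDate day = LocalDate.parse(name.substring("sessions_".length()), DAY);
                if (day.isBefore(cutoff)) {
                    // Dropping the whole per-day database avoids document-level deletes
                    // and the fragmentation they caused in the single-DB design.
                    client.getDatabase(name).drop();
                }
            }
        }
    }
}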

22 Gen 2: Analysis
> Simple and scalable
> MongoDB is easy to get started with, but over time TCO increases
> Reached batch processing limits
(Scorecard: processing, read, scalability)

23 System Evolution: Gen 1, Gen 2, Gen 3, Gen 4

24 Gen 3: Code Name Graceland
Main problems to solve:
> Faster, online processing
> Better search capabilities
New approach:
> Multi-threaded Java-based processor:
  - Faster protobuf library than Python's
  - Keep objects in memory for longer periods of time and reduce flushes to storage
> Lucene for search
> A DB we can understand and control

25 Gen 3: Design

26 Gen 3: Multi-Threaded Java Processor
One reader thread reads the files and distributes the data between the workers.
Workers process the data (see the sketch below):
> Load the object from cache
> If not in cache, load from storage
> Update the object
> Flush to storage
  - Periodically
  - On certain events
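A minimal reader/worker sketch of that flow; the Message shape, the Storage interface, the queue sizes, and the flush policy are all assumptions, since the deck only describes the behavior.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the pipeline: one reader thread distributes messages to per-worker queues
// (sharded by session id, so a session is always handled by the same worker); workers
// update session objects in memory and flush to storage periodically.
public class ProcessorSketch {

    record Message(long sessionId, String payload) {}            // hypothetical message shape

    static class SessionObject {
        final long id;
        int requestCount;
        SessionObject(long id) { this.id = id; }
    }

    interface Storage {                                           // hypothetical storage interface
        SessionObject loadOrCreate(long sessionId);
        void flush(SessionObject session);
    }

    public static void main(String[] args) throws InterruptedException {
        int workerCount = 4;
        Storage storage = new Storage() {
            public SessionObject loadOrCreate(long id) { return new SessionObject(id); }
            public void flush(SessionObject s) {
                System.out.println("flush session " + s.id + " requests=" + s.requestCount);
            }
        };

        @SuppressWarnings("unchecked")
        BlockingQueue<Message>[] queues = new BlockingQueue[workerCount];
        for (int i = 0; i < workerCount; i++) {
            BlockingQueue<Message> queue = queues[i] = new ArrayBlockingQueue<>(10_000);
            Thread worker = new Thread(() -> {
                Map<Long, SessionObject> cache = new HashMap<>();
                try {
                    while (true) {
                        Message m = queue.take();
                        // Load from cache, fall back to storage, then update in memory.
                        SessionObject s = cache.computeIfAbsent(m.sessionId(), storage::loadOrCreate);
                        s.requestCount++;
                        // Flush periodically / on certain events; here simply every 3 requests.
                        if (s.requestCount % 3 == 0) storage.flush(s);
                    }
                } catch (InterruptedException e) {
                    // shutdown
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        // The reader thread's role, played here by main: distribute messages by session id.
        for (int i = 0; i < 30; i++) {
            Message m = new Message(i % 5, "GET /");
            queues[(int) (m.sessionId() % workerCount)].put(m);
        }
        Thread.sleep(200);                                        // let the daemon workers drain
    }
}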

27 Gen 3: Cache Design
Design goal: large cache, but not all in JVM heap
Layered LRU cache (extends LinkedHashMap)
One layer is the map, backing layer on tmpfs or disk (see the sketch below)
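A minimal sketch of the layered LRU idea, built on LinkedHashMap's access order; the BackingLayer interface is hypothetical (the deck only says the second layer lives on tmpfs or disk).

import java.util.LinkedHashMap;
import java.util.Map;

// Layered LRU cache: the in-heap layer is a LinkedHashMap in access order; entries
// evicted from it are demoted to a backing layer (e.g. files on tmpfs or disk).
class LayeredLruCache<K, V> extends LinkedHashMap<K, V> {

    interface BackingLayer<K, V> {                                // hypothetical second layer
        void put(K key, V value);
        V remove(K key);                                          // returns null if absent
    }

    private final int maxHeapEntries;
    private final BackingLayer<K, V> backing;

    LayeredLruCache(int maxHeapEntries, BackingLayer<K, V> backing) {
        super(16, 0.75f, true);                                   // accessOrder=true gives LRU iteration order
        this.maxHeapEntries = maxHeapEntries;
        this.backing = backing;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxHeapEntries) {
            backing.put(eldest.getKey(), eldest.getValue());      // demote instead of dropping
            return true;                                          // remove from the heap layer
        }
        return false;
    }

    // Look in the heap layer first, then try to promote from the backing layer.
    V lookup(K key) {
        V value = get(key);
        if (value == null) {
            value = backing.remove(key);
            if (value != null) {
                put(key, value);
            }
        }
        return value;
    }
}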

28 Gen 3: Stats Storage ("Segmented Storage")
> Binary file per day
> Keep recent files separate, archive older files
(Diagram: recent per-day .pbz files alongside archive.pbz files.)

29 Gen 3: Stats Storage (Segmented Storage)
> Files are served via nginx
> Clients keep a cache

30 Gen 3: Events Storage - Tried different DBs:
> Storing the raw session data inside the Lucene index
  - Index memory footprint grew (all the session data got memory-mapped)
> LevelDB, KyotoCabinet
  - Couldn't get these to work reliably
> Cassandra
  - Rule of thumb: if your DB has its own conference, you need a DBA
  - We felt it's easier to write our own than to read the docs

31 Gen 3: Events Storage ("Indexing Partition")
A partition (directory) per day, containing:
> Lucene index of sessions (see the indexing sketch below)
> Big file with the sessions in it
Same approach as in Gen 2 for live sessions:
> Live sessions -> Live partition
> Dead sessions -> per-day partitions
> 0% fragmentation
> Complicates searching a bit
> Live partitions require cleanup or re-building
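A minimal sketch of how a per-day indexing partition could be written with Lucene: searchable fields go into the index, while the full session record stays in the big file and is referenced only by its byte offset and length. The field names, directory layout, and method signature are assumptions.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;

// Sketch: one Lucene document per session in the per-day partition. The document holds
// the searchable fields plus a pointer (offset/length) into the big session file.
public class PartitionIndexer {

    public static void indexSession(String partitionDir, long sessionId, String countryCode,
                                    long fileOffset, int recordLength) throws IOException {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get(partitionDir, "index")), config)) {
            Document doc = new Document();
            doc.add(new StringField("sessionId", Long.toString(sessionId), Field.Store.YES));
            doc.add(new StringField("cc", countryCode, Field.Store.YES));   // searchable country code
            doc.add(new StoredField("offset", fileOffset));                 // where the session record lives
            doc.add(new StoredField("length", recordLength));               // in the partition's big file
            writer.addDocument(doc);
        }
    }
}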

32 Gen 3: Events Storage ("Indexing Partition")
Searches are more efficient:
> Search requests are served directly from the index
> Session data is loaded only on demand, via nginx using the HTTP Range header (see the sketch below)
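A minimal sketch of the on-demand load: given the offset and length recorded in the index, fetch just those bytes of the big per-day file through nginx with an HTTP Range request. The URL and the numbers are hypothetical.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: read one session record out of the big per-day file via an HTTP Range request,
// so only the needed bytes travel over the wire.
public class SessionFetcher {
    public static byte[] fetchRecord(String url, long offset, int length) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Range", "bytes=" + offset + "-" + (offset + length - 1))
                .build();
        HttpResponse<byte[]> response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 206) {                        // expect 206 Partial Content
            throw new IllegalStateException("range request not honored: " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Offset and length would come from the search hit in the Lucene index.
        byte[] record = fetchRecord("http://pop-data.example/2014-01-01/sessions.bin", 1_048_576, 4096);
        System.out.println("fetched " + record.length + " bytes");
    }
}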

33 Gen 3: Analysis
> Good processing throughput
> Good read performance
> Reaching JVM issues (big heap)
(Scorecard: processing, read, scalability)

34 System Evolution: Gen 1, Gen 2, Gen 3, Gen 4

35 Gen 4: 2015
Based on Gen 3. Distribute work to more than one system:
> One data server in each POP (> 20 POPs)
> Each POP processes and stores its own data
> Upload processed outputs to central servers, or search on all POP servers

36 Summary
Understanding how your system works is as important as understanding every other aspect of your business.
At some point we realized it's better for us to build our software from scratch than to use off-the-shelf products as black boxes:
> We need to find people who know the products
  - Which is crazy, since we tried tons of them over the last 4 years
> We usually have fewer requirements
  - Who needs multi-DC replication from day 1?
> We prefer coding it to reading documentation and Stack Overflow threads
  - Then we can hack it in the middle of the night if needed
  - It's way more fun (at least for the developers)

37 Questions?

38 Types of Data: Statistics - just numbers, used for charts, billing, etc.

39 Types of Data: Events - in-depth information, used for forensics and research