BIG DATA RESEARCH AND TEACHING AT THE UNIVERSITY OF JYVÄSKYLÄ

UNIVERSITY OF JYVÄSKYLÄ
FACULTY OF INFORMATION TECHNOLOGY
4.7.2014

CONTENTS

INTRODUCTION
1 BIG DATA IN RESEARCH AT THE UNIVERSITY OF JYVÄSKYLÄ
  1.1 Scientific computing and data analysis
  1.2 Cyber security
  1.3 Big data and social and health care (SOTE)
  1.4 Statistics
  1.5 Biological and environmental science
  1.6 Physics
2 BIG DATA IN EDUCATION AT THE UNIVERSITY OF JYVÄSKYLÄ
  2.1 Master's programme in applied mathematics
  2.2 Master's programme in computational sciences
  2.3 Web Intelligence and Service Engineering (WISE) master's programme
  2.4 Education in statistics
  2.5 Methodology Centre for Human Sciences (IHME)
  2.6 Big data related teaching in the academic year
3 CENTRE OF SIMULATION AND DATA-ANALYSIS
4 MONITORING THE DEVELOPMENT OF THE BIG DATA FIELD
  4.1 Dissertations related to data analysis
APPENDIX 1: ONLINE COURSES RELATED TO THE CURRICULUM OF THE IT FACULTY OF THE UNIVERSITY OF JYVÄSKYLÄ
APPENDIX 2: SUMMARY OF THE BIG DATA FIELD AND ITS APPLICATIONS AT THE IT FACULTY OF THE UNIVERSITY OF JYVÄSKYLÄ
APPENDIX 3: BIG DATA - PARADIGM, MAJOR TOPICS AND TRENDS
APPENDIX 4: SUITABILITY OF ONLINE LEARNING MATERIALS AS COURSE MATERIAL FOR THE IT FACULTY

INTRODUCTION

This report covers the research and education in big data and data analysis carried out at the University of Jyväskylä. The University of Jyväskylä wants to take a broad part in big data research and education and in the development of the field.

An expert in data analysis must be capable of demanding statistical modelling and must know programming and data management. Learning these skills in turn requires a foundation in mathematics. Because the field is demanding, researcher training is often necessary for practical work. Technical mastery of software is not enough to produce added value: making use of big data requires a deep understanding of both the application domain and the analysis methods. Only in this way can the reliability and long-term usability of processes and results be ensured.

Master's and doctoral education in data analysis and big data at the University of Jyväskylä builds on solid research in mathematics, information technology and statistics. The university's multidisciplinary environment provides an excellent starting point for developing new data analysis and big data methods and for applying them in different fields of science and in both the business world and the public sector.

This report describes the operating environment of the Centre of Simulation and Data-analysis at the University of Jyväskylä.

Jyväskylä
Dean, Professor Pekka Neittaanmäki

1 BIG DATA IN RESEARCH AT THE UNIVERSITY OF JYVÄSKYLÄ

An expert in data analysis must be capable of demanding statistical modelling and must know programming and data management. Learning these skills in turn requires a foundation in mathematics. Because the field is demanding, researcher training is often necessary for practical work. Technical mastery of software is not enough to produce added value: making use of big data requires a deep understanding of both the application domain and the analysis methods. Only in this way can the reliability and long-term usability of processes and results be ensured.

The Faculty of Information Technology carries out internationally high-quality IT research in two departments: the Department of Mathematical Information Technology and the Department of Computer Science and Information Systems. The faculty's research projects are often research and development projects carried out jointly with national and international research partners and industry.

Research at the Department of Mathematical Information Technology is based mainly on analytic-constructive methods applied from a technical, computational, mathematical or pedagogical perspective. The department teaches and researches data analysis as part of its education in computational sciences and software engineering. (Examples of industrial applications of data analysis can be found at [link missing].)

Research at the Department of Computer Science and Information Systems examines information systems and data processing from four perspectives: technological, human-centred, business and information-centred. Together these perspectives form the department's general mission: to understand, develop, design and manage information systems and data processing, and their effects, comprehensively in their context of use.
In addition to the IT Faculty, research related to data analysis and big data is carried out at the Department of Mathematics and Statistics and at the Methodology Centre for Human Sciences (IHME), and it is applied in several research groups across the university, including in biological and environmental science, physics, chemistry and the social sciences. The following sections briefly describe the research activity related to data analysis and big data at the University of Jyväskylä.

1.1 Scientific computing and data analysis

Research areas in scientific computing include mathematical modelling, reliable model- and data-based simulation, optimisation, adaptive and efficient numerical methods, accounting for uncertainty in numerical simulation, control of distributed systems, spline and spline wavelet techniques in signal and image processing, dynamical systems, and the modelling of nanoelectronics.

Research areas in data analysis include the development of analysis methods, in particular numerics and classification techniques for massive data, and the development of analysis techniques for hyperspectral camera data and their application in cell biology, medicine, environmental science, agriculture and forestry, chemical weapons, and crime scene investigation. There are also joint projects in areas such as physics and brain research.

1.2 Cyber security

The Department of Mathematical Information Technology studies information technology from a technical-mathematical perspective. The research focuses on the efficient automation of information processing. The focus areas relate to core areas of information technology, such as the design of new kinds of data processing applications and software, the design and management of data transmission systems in networks, and the use of numerical and mathematical methods and models that exploit efficient computing, for example in the design of industrial products, the control of industrial processes, scientific modelling and the analysis of large data sets.

The department's research projects with a strong connection to big data include:

Cyber Attacks Protection of Critical Infrastructures
The research develops an innovative method for securing information systems. The method searches large volumes of data for abnormal behaviour patterns and, based on the analysis, draws conclusions about how serious the observations are for the security of the information system.
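The idea described above, searching data for abnormal behaviour and grading how serious each finding is, can be illustrated with a minimal Python sketch. It is not the method developed in the project: it uses a generic robust z-score (median and median absolute deviation) as a stand-in, and the data, function names and thresholds are all assumptions made for illustration.

```python
from statistics import median


def robust_z_scores(values):
    """Score each value by its distance from the median, in scaled MAD units."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half of the values equal the median, so any deviation
        # at all is treated as maximally unusual.
        return [0.0 if v == med else float("inf") for v in values]
    # The factor 1.4826 makes the MAD comparable to a standard deviation
    # when the data are roughly normally distributed.
    return [abs(v - med) / (1.4826 * mad) for v in values]


def severity(score, warn=3.5, critical=10.0):
    """Map an anomaly score to a coarse severity label (thresholds are assumed)."""
    if score >= critical:
        return "critical"
    if score >= warn:
        return "warning"
    return "normal"


# Hypothetical data: failed-login counts per hour for a single host.
failed_logins = [3, 4, 2, 5, 3, 4, 3, 60, 4, 2, 7, 3]

scores = robust_z_scores(failed_logins)
flagged = [
    (hour, count, severity(score))
    for hour, (count, score) in enumerate(zip(failed_logins, scores))
    if severity(score) != "normal"
]
print(flagged)  # [(7, 60, 'critical'), (10, 7, 'warning')]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value (here the hour with 60 failed logins) would otherwise inflate the baseline and hide smaller deviations.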
The project studies technologies for automatically identifying, detecting and classifying different kinds of malware.

Big Data Analytics - Data-Driven Methods for Cyber Security
The research project develops automatic and semi-automatic computational methods, analysis tools and software algorithms for analysing large data repositories in order to find and characterise unknown activities, related trends and relationships between different types of data, including the detection of malware.

Organizing and analyzing massive high dimensional datasets
The research project focuses on the analysis of high-dimensional data. The research organises, clusters and classifies high-dimensional data and identifies exceptions and anomalies in it. It develops core technologies for the automatic detection of malware.

Through both basic and applied big data research, the IT Faculty is able to continually produce high-quality innovations and scientific breakthroughs.

1.3 Big data and social and health care (SOTE)

Example 1: Hospital planning
The hospital environment is a good example of the opportunities for exploiting big data. By analysing data on the logistics of patients, staff, materials and equipment, on the treatment procedures performed on patients, and on the use of resources, a hospital's care and support service processes can be designed to run optimally. The IT Faculty of the University of Jyväskylä focuses on developing hospital operations and making processes more efficient. A good example is the New Hospital project of the Central Finland Health Care District, in which the faculty has been involved for two years: https://agoracenter.jyu.fi/projects/uuden-sairaalan-logistiikka.

Example 2: Social and health care processes
When aiming at comprehensive optimisation of patient care, the scope must be widened from the level of the organisation to the patient's entire care chain. In this context big data thinking rises to a completely new level. Computational models built on this data can be used to search for the optimal way to carry out the care of different patients and to forecast the stage at which care should intervene so that the patient's condition does not have time to deteriorate. The Computational Process Analytics research group is currently developing a new kind of tool to make this approach concrete and to exploit big data more effectively: https://agoracenter.jyu.fi/projects/remaster.
In addition, the IT Faculty has launched a project that adds cost variables and the different funding and paying parties to this picture: https://www.jyu.fi/ajankohtaista/arkisto/2014/07/tiedote

1.4 Statistics

Statistics answers the questions of how data should be collected, how to proceed when the data are selective, and how uncertainty can be managed. At the University of Jyväskylä, the research areas in statistics are the cost-efficient design of studies, biometrics and environmental statistics, structural equation models, the handling of missing data, nonparametric methods, spatial statistics and time series analysis, https://www.jyu.fi/maths/tutkimus/tilastotiede. Research is carried out in close cooperation with other disciplines, research institutes (THL, RKTL, METLA, SYKE) and companies. A joint project with THL considers how the health of the population could be monitored efficiently as participation in traditional health examination surveys declines. In cooperation with business representatives, it has been studied how companies with no direct contact with their customers could determine the value of their customer segments from data and use this information to steer their business (J. Karvanen, A. Rantanen, L. Luoma (2014). Survey data and Bayesian analysis: a cost-efficient way to estimate customer equity. Accepted for publication in Quantitative Marketing and Economics).

1.5 Biological and environmental science

In biological and environmental science, especially in bio-imaging, the growth in the amount of data is enormous. High-resolution microscopy produces a large amount of data, especially as the aim today is to produce three-dimensional data over time (four-dimensional data). Microscopy imaging is nowadays used largely for testing potential drugs in large sample sets, or for identifying which of the cell's own molecules are important regulators of its physiological or pathological processes. These are so-called high-throughput studies, which produce huge amounts of microscopy data in a short time, and the results must be interpreted from the data efficiently and automatically. The Department of Biological and Environmental Science has developed, in collaboration, its own software platform BioImageXD (www.bioimagexd.net), which aims at efficient processing, analysis and animation of four-dimensional data. In the near future the software will be developed to ease the analysis of high-throughput data sets.
An example of this research is Kankaanpää et al., BioImageXD - an open general purpose and high throughput image processing and analysis platform for biomedical images, Nat Methods Jun 28;9(7):

1.6 Physics

At the Department of Physics, research knowledge related to big data is applied in, among others, the cyclotron laboratory, nanoscience and materials science. The main scientific task of the Accelerator Laboratory of the Department of Physics of the University of Jyväskylä (JYFL-ACCLAB) is to produce basic knowledge on the subatomic structure of matter by studying exotic atomic nuclei. The research is experimental in nature. Research data are produced in several measurement setups, which are supplied with ion beams by the laboratory's three particle accelerators. The amount of data produced by research projects varies greatly, from a few kilobytes to tens of terabytes per measurement.

In total, the amount of data produced by the laboratory varies between 30 and 70 TB per year. The life cycle of a project from measurements to publication may span several years, so the data need a reliable medium-term storage solution. The laboratory's researchers also take part in experiments carried out at other laboratories (e.g. CERN/Isolde, GSI/FAIR), and the same storage criteria apply to the data produced there. The data are mainly measurement data from experimental research projects in nuclear physics. Reuse and reanalysis of the data within the field is normal practice. The possibilities for using the data in other fields of research are limited.

2 BIG DATA IN EDUCATION AT THE UNIVERSITY OF JYVÄSKYLÄ

The Faculty of Information Technology responds to the research and education challenges brought by evolving information technology and digitalisation. The faculty comprehensively combines the perspectives of technology, information, organisations and business, and people in research, education and stakeholder cooperation alike. It educates broadly skilled, international experts in information technology in both the business and the natural science fields of education. The faculty has a central role in developing human-centred technology, one of the university's focus areas. Its key strength is the ability to view information technology broadly, combining several perspectives and recognising the combined effects of different phenomena. This is combined with internationally respected top-level research in its leading fields and with active engagement with the surrounding society. The IT Faculty has achieved a leading position in computational sciences, cyber security and information systems science, and it is the only IT faculty to represent research and teaching in cognitive science.

Education in big data and data analysis is given in the following multidisciplinary master's programmes of the IT Faculty: applied mathematics, computational sciences, and Web Intelligence and Service Engineering (WISE).

2.1 Master's programme in applied mathematics

Applied mathematics seeks to solve real-world problems. Its aim is to model different phenomena, describe them and try to understand them. Studies in applied mathematics combine the concepts and methods of scientific computing, which are applied to questions arising at the interfaces between mathematics and other disciplines. At the University of Jyväskylä, the studies focus on areas such as functional analysis, measure and integration theory, complex analysis, numerical analysis, optimisation and simulation.

2.2 Master's programme in computational sciences

The master's programme in computational sciences covers the principles and applications of continuous and discrete simulation. The aims are to cover the most common discretisation methods for continuous simulation models and the basic principles of implementing them efficiently on modern computer architectures, as well as the principles and solution methods of single- and multi-objective nonlinear optimisation. The teaching builds mathematical simulation models for phenomena in engineering and the natural sciences. It covers a broad range of concepts and methods from statistics, numerical computing and programming. Data analysis teaching and research deal with methods and approaches for building models, and higher-level or more precise information, from data collected in various ways.

2.3 Web Intelligence and Service Engineering (WISE) master's programme

Web Intelligence and Service Engineering focuses on designing web-based applications that support the online service society in both the public and the private sector. The programme is directly linked to the goals of the big data strategy, such as the "intelligence" needed to process data on the web and the services built on top of it.
On completion of the programme, the graduates will:
- be able to use and design complex self-managed Web-based public and industrial systems, digital ecosystems, platforms, services and applications;
- be able to connect their designs with publicly available data and Web-based capabilities as services;
- be able to figure out and approach various challenging aspects of wicked problems world-wide, which require self-managed service-based architectures for their solutions;
- understand and professionally utilize for that purpose knowledge on enabling technologies and tools;
- be able to perform academic doctoral level studies;
- be skilful in international communication due to the integrated language and communication studies.

Students who graduate from the programme will think beyond the routine and will be able not just to adapt to a change but to help to create and control it.

2.4 Education in statistics

Statistics, which can be studied at the University of Jyväskylä as either a major or a minor subject, gives good skills for practical data analysis. Statistics answers the questions of how data should be collected, how to proceed when the data are selective, and how uncertainty can be managed. The major studies in statistics also include a good deal of mathematics and information technology, so the degree provides an excellent foundation for big data work.

One example of big data education is the statistics course "Industrial data science", held at the Jyväskylä Summer School in 2013, whose lecturers represented the top of Finnish big data expertise in the business world.

2.5 Methodology Centre for Human Sciences (IHME)

The Faculty of Information Technology makes a significant contribution to the teaching of the Methodology Centre for Human Sciences in data analytics education and development work. The Methodology Centre for Human Sciences (IHME) offers postgraduate students and researchers at the University of Jyväskylä training in research methods and research ethics across faculty boundaries, and it promotes interdisciplinary research cooperation and the innovative use of research methods. The big data approach is a theme in IHME's training, which builds on data-driven analyses in multi-method, multidisciplinary methodological education. The training is based on research done at the IT Faculty and follows the development of the field. It aims to increase the data awareness of doctoral students in different fields, in particular their understanding of what kinds of analyses, and interpretations based on them, can reliably be made with the various big data methods, how the big data approach can be applied in research in different disciplines, and what research-ethical competences and procedures the big data approach requires. The teaching that the IT Faculty gives to other faculties is also delivered through the IHME cooperation.

2.6 Big data related teaching in the academic year

The teaching programme for the academic year is to be completed. The programme will include several courses related to big data, data classification, computational statistics, databases (including NoSQL) and applications. In addition, students are offered the opportunity to take online courses. Appendix 1 lists the online courses related to the curriculum of the IT Faculty of the University of Jyväskylä. An updated version of the list will be published.

3 CENTRE OF SIMULATION AND DATA-ANALYSIS

The amount of data and its role in society are changing radically:
- the amount of data is growing exponentially
- refined and analysed data is an increasingly central driver of productivity and competitiveness
- producing and refining data are becoming significant areas of business
- the forms and means of presenting knowledge created from data are diversifying
- data analysis is one of the fastest-growing areas of technology
- the processing of large masses of data has become a new scientific paradigm
- data analysis is significantly changing digital service production

This change sets many tasks for research. On the one hand, research is needed on the technical management of data, its transfer, analysis, refinement and security, especially in support of decision-making. Such research supports the competitiveness and output of the national economy. On the other hand, research is needed that helps steer the development of the information society. Here the object of research is the human perspective on data processing, its confidentiality, and the role of the individual as the subject of the use of data analysis results.

Big data analysis can be approached through the commonly used four Vs:
- Volume: the amount of data (both observations and variables)
- Variety: the diversity and heterogeneity of the data
- Velocity: the speed at which data is generated
- Veracity: the quality of the data

Finland has expertise in, among others, medical research, mobile technologies, the games industry and environmental monitoring, all of which are highly data-intensive fields that rely on diverse analysis of that data. Finland also has strong methodological and IT expertise which, adapted and put to use through education, research and the sharing of expertise, could be brought into big data development work.
The use of big data in the public sector is still in its infancy, but it offers great opportunities for improving and streamlining both services and processes and for new ways of working. Finland has been among the pioneers in open data, and the public sector is opening up its data sets. This culture of openness and of availability of public data resources should also be exploited in big data development work. Combining and analysing public and private data sets can yield significant results that benefit the different parties.

Effective big data research requires diverse research groups from different disciplines. Data is mined and analysed in cooperation with others, and ever new questions are put to the data. This way of working makes it possible to understand digital material broadly and to produce ever broader and more precise grounds, analyses and forecasts for decision-makers.

Big data research has several application areas. In social and health care (SOTE), Finland has great potential with regard to big data. Finland has databases that are exceptionally high in quality and coverage by global standards. The SOTE reform offers an opportunity to study, trial and adopt solutions based on big data. Data-based treatment methods and practices have already produced significant results, and good solutions can bring financial savings.

Big data as a way of thinking (data awareness) and as a technology gives public administration new perspectives for advancing the main strategic goals of a more productive society and of countering the sustainability gap, while increasing citizens' satisfaction with public services. Big data makes it possible to realise productivity gains in most areas of administration. A more data-driven public administration can be examined in the following areas:
- data-driven decision-making and continuous organisational development
- digital public services for citizens
- better involvement of companies and citizens in the development of public services

The industrial internet (IoT) gives big data research several application areas, such as manufacturing processes and their optimisation, predictive maintenance, energy use management and asset management. There are also broad opportunities for research in the field elsewhere in business life, such as in trade and logistics, construction and property management, and the production of municipal and other public services.
Figure 1 shows the research environment for simulation and data analysis.

FIGURE 1 Centre of simulation and data-analysis

4 MONITORING THE DEVELOPMENT OF THE BIG DATA FIELD

The big data field and its applications are among the strategic education and research areas of the IT Faculty. Summaries of the development of the field are prepared every six months. The current summary is attached as Appendix 2. An updated version will be published in August.

4.1 Dissertations related to data analysis

The following dissertations related to data analysis and big data have been published during the academic year:
- Guy Wolf: Big high-dimensional data analysis with diffusion maps
- Ilkka Pölönen: Discovering knowledge in various applications with a novel hyperspectral imager
- Tuomo Sipola: Knowledge discovery using diffusion maps
- Mikhail Zolotukhin: On data mining applications in mobile networking and network security

Five dissertations in the area of data analysis will be published towards the end of 2014.

APPENDIX 1: ONLINE COURSES RELATED TO THE CURRICULUM OF THE IT FACULTY OF THE UNIVERSITY OF JYVÄSKYLÄ

An updated version of the list will be published.

Name
1 Statistics One
2 Statistics: Making Sense of Data
3 Statistics
4 Introduction to Statistics: Descriptive Statistics
5 Mathematical Statistics
6 Introduction to Probability and Statistics
7 Statistics for Applications
8 Probability and statistics
9 Probability & Statistics
10 Statistical Reasoning
11 An Introduction to Interactive Programming in Python
12 Introduction to Programming for Digital Artists
13 Creative, Serious and Playful Science of Android Apps
14 Introduction to Computer Science
15 Introduction to Computer Science and Programming
16 Introduction to Computer Science I
17 Introduction to Computer Science and Programming
18 Computer Science
19 Principles of Computing
20 Media Programming
21 Learn to Program: The Fundamentals
22 Ohjelmoinnin MOOC
23 Object-Oriented programming with Java, part I
24 Peliohjelmoinnin MOOC
25 Introduction to Systematic Program Design
26 Algoritmien MOOC
27 Algorithms, Part I
28 Algorithms, Part II
29 Algorithms: Design and Analysis, Part 1
30 Algorithms: Design and Analysis, Part 2
31 Algorithms
32 Introduction to Algorithms
33 Computer Algorithms in Systems Engineering
34 Game Theory

35 Games without Chance: Combinatorial Game Theory
36 General Game Playing
37 Advanced Algorithms
38 Introduction to Theoretical Computer Science
39 Interactive 3D Graphics
40 Foundations of Computer Graphics
41 Computational Geometry
42 Computer Graphics
43 Computer System Engineering
44 Programming Languages
45 Design of Computer Programs
46 Introduction to Data Science
47 Database Systems
48 Introduction to Computer Networks
49 Computer Architecture
50 Computer System Architecture
51 Artificial Intelligence for Robotics
52 Control of Mobile Robots
53 Cryptography I
54 Cryptography II
55 Cryptography and Cryptanalysis
56 Network and Computer Security
57 Applied Cryptography
58 Computer Security
59 Information Security and Risk Management in Context
60 Malicious Software and its Underground Economy: Two Sides to Every Story
61 Selected Topics in Cryptography
62 Advanced Topics in Cryptography
63 Designing and Executing Information Security Strategies
64 Building an Information Risk Management Toolkit
65 Network and Computer Security
66 Automata, Computability, and Complexity
67 Automata
68 Software Debugging
69 Software Testing
70 Functional Programming Principles in Scala
71 Creative, Serious and Playful Science of Android Apps
72 Computational Methods for Data Analysis
73 Computing for Data Analysis
74 Web Intelligence and Big Data
75 Data Mining
76 Statistics and Visualization for Data Analysis and Inference
77 Scientific Computing
78 Computing for Data Analysis
79 Data Analysis

80 High Performance Scientific Computing
81 Statistics: Making Sense of Data
82 Computational Methods for Data Analysis
83 Metadata: Organizing and Discovering Information
84 Machine Learning
85 Neural Networks for Machine Learning
86 Machine Learning
87 Computational Neuroscience
88 Dynamical Modeling Methods for Systems Biology
89 Introduction to Artificial Intelligence
90 Artificial Intelligence
91 Digital Signal Processing
92 Introduction to Communication, Control, and Signal Processing
93 Signal Processing: Continuous and Discrete
94 Discrete-Time Signal Processing
95 Digital Signal Processing
96 Signals and Systems
97 Machine Vision
98 Pattern Recognition for Machine Vision
99 Computer Vision
100 Computer Vision: The Fundamentals
101 Computer Vision: From 3D Reconstruction to Visual Recognition
102 Fundamentals of Digital Image and Video Processing
103 Linear and Discrete Optimization
104 Linear and Integer Programming
105 Systems Optimization
106 Optimization Methods
107 Nonlinear Programming
108 Everything is the Same: Modeling Engineered Systems
109 Introduction to Numerical Simulation
110 Introduction to Modeling and Simulation
111 Functional Hardware Verification
112 Introduction to Parallel Programming
113 Heterogeneous Parallel Programming
114 Parallel Computing
115 Theory of Parallel Systems
116 Differential Equations in Action
117 Linear Algebra
118 Coding the Matrix: Linear Algebra through Computer Science Applications
119 Calculus One
120 Calculus: Single Variable
121 Pre-Calculus
122 Visualizing Algebra
123 Web Development
124 HTML5 Game Development

125 Pattern-Oriented Software Architectures for Concurrent and Networked Software
126 CS169.1x: Software as a Service
127 Networked Life
128 Social Network Analysis
129 Videogames and Learning
130 Fundamentals of Online Education: Planning and Application
131 Gamification
132 Live!: A History of Art for Artists, Animators and Gamers
133 Online Games: Literature, New Media, and Narrative
134 How to Build a Startup
135 Startup Engineering
136 Startup
137 Leading Strategic Innovation in Organizations
138 Grow to Greatness: Smart Growth for Private Businesses, Part I
139 Grow to Greatness: Smart Growth for Private Businesses, Part II
140 Design Thinking for Business Innovation
141 An Introduction to Operations Management
142 Developing Innovative Ideas for New Companies
143 Creativity, Innovation, and Change
144 New Models of Business in Society
145 Foundations of Business Strategy
146 Design Thinking for Business Innovation
147 Content Strategy for Professionals: Engaging Audiences for Your Organization
148 International Organizations Management
149 Inspiring Leadership through Emotional Intelligence
150 Critical Perspectives on Management
151 Law and the Entrepreneur
152 Copyright
153 Markets with Frictions
154 An Introduction to Financial Accounting
155 Introduction to Finance
156 Corporate Finance
157 Organizational Analysis
158 Understanding economic policymaking

APPENDIX 2: SUMMARY OF THE BIG DATA FIELD AND ITS APPLICATIONS AT THE IT FACULTY OF THE UNIVERSITY OF JYVÄSKYLÄ

Memo
Jyväskylä

Subject: Big data
Further information: Dean, Professor Pekka Neittaanmäki

Lectures

Summer 2013, Jyväskylä International Summer School
- Amir Averbuch: TIEJ658 COM6: Advanced Methods for Classification of Big High Dimensional Data (JSS23), 2 credits

Academic year
- Gil David: ITKST47 Advanced Anomaly Detection: Theory, Algorithms and Applications, 5 credits
- Gil David: ITKST48 Advanced Persistence Threat, 5 credits, Advanced Persistence Threat exploitation cycle
- Mauri Leppänen: ITKA204 Tietokannat ja tiedonhallinnan perusteet (Databases and basics of data management), 4 credits
- Tommi Kärkkäinen: TIES445 Tiedonlouhinta (Data mining), 5 credits
- Oleksiy Mazhelis: TJTSM61 Business Analytics and Big Data Management, 5 credits
- Michael Cochez: a course called "Big Data Engineering" in January 2015

Dissertations
- Guy Wolf: Big high-dimensional data analysis with diffusion maps
- Ilkka Pölönen: Discovering knowledge in various applications with a novel hyperspectral imager
- Tuomo Sipola: Knowledge discovery using diffusion maps

Dissertation in progress
- Limor Gavish: Memcached - NoSQL and big data databases, particularly for caching

Projects
- Pekka Neittaanmäki: New System for Cyber Attacks Protection of Critical Infrastructures, CAP project (Tekes)

- Jari Veijalainen: Mining social media, MineSocMed project (Academy of Finland); the aim is to develop algorithms for analysing social media.
- Timo Hämäläinen: Organising and analysing large multidimensional data sets, HIDE project (Tekes)
- Timo Hämäläinen: Intelligent analysis of data from building automation systems, KIIAUDATA project (Tekes)
- Amir Averbuch: MeBUD: Methods For Big Unstructured High Dimensional Data (funding applied for from the Academy of Finland)

Master's education

The multidisciplinary master's programme in data analysis (DATA) is a joint programme of the Department of Information Technology and the Department of Mathematics and Statistics. The teaching covers a broad range of concepts and methods from statistics, numerical computing and programming. Data analysis teaches and studies methods and approaches for building models and deriving higher-level or more precise information from data collected in different ways. The master's programme responds to a changing world in which the automated analysis of large data sets has become a key tool in many fields. The aim is to give students specialist expertise in data analysis, covering both statistical methods and their implementation on computers.

Other research
- Michael Cochez: a book chapter, 'Toward Evolving Knowledge Ecosystems for Big Data Understanding', in the book Big Data Computing, 2013. The chapter proposes using ecosystems known from biology to tackle the big data problem.
- Michael Cochez: much of his research relates to big data, mainly the alignment of huge ontologies.
- Timo Hämäläinen: research on network traffic analysis (also covered in the TIES326 information security course). The work continues with new batches of data (from the JAMK laboratory, among others).

Timo Hämäläinen: Network traffic analysis

Nowadays HTTP servers and applications are some of the most popular targets for network attacks.
The easiest way to carry out such attacks is to inject malicious code into HTTP request messages. Such intrusive requests can be extremely dangerous, since they can corrupt the server or collect confidential information from the server databases. One option for detecting these attacks is to process all HTTP queries as text lines and transform them into numeric feature matrices, which are then used to find intrusions with anomaly detection algorithms. However, since requests to different HTTP applications can vary widely, one feature matrix is, as a rule, constructed and analyzed for each unique web resource. Thus, for a huge HTTP server several thousand matrices would need to be built, which is not efficient from the computing resources point of view. In addition, it is difficult to define normal user behavior for resources that have been requested only a few times.

In this research, we considered an algorithm for HTTP intrusion detection based on simple clustering algorithms and advanced processing of HTTP requests, which allows all queries to be analyzed at once without separating them by resource. The proposed method allows HTTP intrusions to be detected in continuously updated web applications. The algorithm was tested using logs acquired from a large real-life web service; almost all attacks in these logs were detected, while the number of false alarms remained very low. [1]

We have also studied online detection of anomalous HTTP requests with Growing Hierarchical Self-Organizing Maps (GHSOMs). Feature matrices were formed by applying an n-gram model to HTTP requests from network logs. GHSOMs are then used to analyze these matrices and detect anomalous requests among the new requests received by the web server. The proposed system is self-adaptive and allows online detection of malicious attacks in continuously updated web applications. The method was tested with network logs that include normal and intrusive requests. Almost all anomalous requests in these logs were detected while the false positive rate was kept very low. [2]

In addition, real-world network logs were analyzed using a dimensionality reduction technique called Diffusion Map (DM). First, features such as n-grams or character frequencies were calculated to form a feature matrix. The dimensionality of this matrix was then reduced for visualization and to facilitate subsequent clustering and other analysis phases. Principal Component Analysis (PCA) was used as a comparison, since it is one of the most frequently used methodologies. The main advantage of Diffusion Maps is that they can handle non-linear dependencies in the data, which PCA cannot.
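The feature extraction and dimensionality reduction steps described above can be illustrated with a minimal sketch. It is not the implementation used in the cited work: the request strings are invented, the function names `ngram_matrix` and `diffusion_map` are hypothetical, and the Gaussian kernel with a median-based bandwidth is only one common textbook choice.

```python
import numpy as np

def ngram_matrix(requests, n=2):
    """Count character n-grams per request; returns (feature matrix, vocabulary)."""
    vocab = sorted({r[i:i + n] for r in requests for i in range(len(r) - n + 1)})
    index = {gram: j for j, gram in enumerate(vocab)}
    X = np.zeros((len(requests), len(vocab)))
    for row, r in enumerate(requests):
        for i in range(len(r) - n + 1):
            X[row, index[r[i:i + n]]] += 1
    return X, vocab

def diffusion_map(X, epsilon=None, dims=2):
    """Basic diffusion map: Gaussian kernel, Markov normalisation, leading eigenvectors."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # squared distances
    if epsilon is None:
        epsilon = np.median(sq[sq > 0])  # assumed bandwidth heuristic
    K = np.exp(-sq / epsilon)
    d = K.sum(axis=1)
    # Symmetric matrix with the same eigenvalues as the Markov matrix P = D^-1 K.
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]  # right eigenvectors of P
    # Skip the first (constant) eigenvector, which carries no information.
    return psi[:, 1:dims + 1] * vals[1:dims + 1]

# Invented example requests; the last one imitates an injection attempt.
requests = [
    "GET /index.php?id=1",
    "GET /index.php?id=2",
    "GET /index.php?id=17",
    "GET /about.php?lang=en",
    "GET /index.php?id=1' OR '1'='1",
]
X, vocab = ngram_matrix(requests, n=2)
embedding = diffusion_map(X, dims=2)
print(np.round(embedding, 3))
```

The low-dimensional coordinates in `embedding` are what a clustering algorithm would then work on. Building the full distance matrix costs memory quadratic in the number of requests, so a real log would need sampling or an out-of-sample extension.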
DM was used successfully to find actual intrusions in the data and to visualize the structure of the network traffic. Subsequently, a clustering algorithm such as k-means can be used to find anomalies and structures in the data. These help network administrators detect intrusions and other anomalies in a network.

A rule extraction algorithm was also implemented and tested. The idea comes from the world of credit card fraud detection. Network traffic is first analyzed and clustered using the methods described previously. The problem is that running dimensionality reduction and clustering algorithms continuously can be computationally expensive. For this reason, conjunctive rule extraction is used to automatically generate signatures that approximate the traffic clustering result. Applying these rules is very efficient, and they can be updated periodically and completely automatically. This approach combines the accurate clustering of the advanced algorithms with the efficiency of using simple signature rules to classify traffic as normal or anomalous. [A,3,4]

[A] T. Sipola, A. Juvonen, J. Lehtonen: Dimensionality Reduction Framework for Detecting Anomalies from Network Logs, Engineering Intelligent Systems, 20(1):87-97, 2012
[1] M. Zolotukhin, T. Hämäläinen: Detection of Anomalous HTTP Requests Based on Advanced N-gram Model and Clustering Techniques, NEW2AN 2013: International Conference on Next Generation Wired/Wireless Networking, August 28, St. Petersburg, Russia, 2013
[2] M. Zolotukhin, T. Hämäläinen, A. Juvonen: Online Anomaly Detection by Using N-gram Model and Growing Hierarchical Self-Organizing Maps, IEEE IWCMC 2012, August 27-31, Limassol, Cyprus, 2012
[3] A. Juvonen, T. Sipola: Adaptive Framework for Network Traffic Classification Using Dimensionality Reduction and Clustering, in Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), International Congress on, St. Petersburg, Russia, October, IEEE
[4] A. Juvonen, T. Sipola: Combining Conjunctive Rule Extraction with Diffusion Maps for Network Intrusion Detection, The 18th IEEE Symposium on Computers and Communications (ISCC 13), July 7-10, 2013, Split, Croatia
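The idea of replacing repeated clustering with cheap signature rules can be sketched as follows. This is only an illustration under strong assumptions and not the algorithm of [4]: a conjunctive rule is taken here to be a per-feature interval (`lo[j] <= x[j] <= hi[j]` for every feature `j`), the cluster labels are assumed to come from an earlier clustering step, and the coordinates and the names `extract_rules` and `is_normal` are invented.

```python
import numpy as np

def extract_rules(points, labels, normal_labels):
    """Build one conjunctive rule (an axis-aligned box) per cluster labelled normal."""
    rules = []
    for c in normal_labels:
        members = points[labels == c]
        rules.append((members.min(axis=0), members.max(axis=0)))
    return rules

def is_normal(x, rules):
    """A point is normal if it satisfies every condition of at least one rule."""
    return any(np.all(lo <= x) and np.all(x <= hi) for lo, hi in rules)

# Hypothetical low-dimensional coordinates with labels from a previous clustering run.
points = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],   # cluster 0 (normal)
                   [1.0, 1.1], [1.1, 0.9],                 # cluster 1 (normal)
                   [3.0, -2.0]])                           # cluster 2 (anomalous)
labels = np.array([0, 0, 0, 1, 1, 2])

rules = extract_rules(points, labels, normal_labels=[0, 1])
for candidate in (np.array([0.12, 0.2]), np.array([2.5, -1.5])):
    print(candidate, "normal" if is_normal(candidate, rules) else "anomalous")
```

Checking a new point against the rules costs a handful of comparisons, while the expensive clustering only has to be rerun when the rules are refreshed.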

APPENDIX 3: BIG DATA - PARADIGM, MAJOR TOPICS AND TRENDS

Mariia Gavriushenko, supervisor: Pekka Neittaanmäki
24/6/2014

1. INTRODUCTION

Throughout 2013, big data and analytics were among the most-hyped themes in IT. Many companies are making an effort to use big data to improve their reputation. Big data is a popular concept used to describe the exponential growth and availability of data, both structured and unstructured, and it may become as important to business and society as the Internet has.

There are still challenges to consider. Many organizations are concerned that the amount of amassed data is becoming so large that it is difficult to find the most valuable pieces of information. Several questions follow: What if your data volume gets so large and varied that you don't know how to deal with it? Do you store all your data? Do you analyze it all? How can you find out which data points are really important? How can you use the data to your best advantage?

Big data technology has to support search, development, governance and analytics services for all data types, from transaction and application data to machine and sensor data to social, image and geospatial data, and more. Several sectors of particular importance for development are quite data intensive: education, health, government, and communication host one third of the data in the country.

Figure 1 illustrates a typical knowledge discovery life-cycle for big data, which consists of the following steps:
1. Data Generation: Data may be generated by instruments, experiments, sensors, or supercomputer simulations.
2. Data Processing and Organization: This phase entails (re)organization, deriving subsets, reduction, visualization, query analytics, distribution and other aspects of data processing.
3. Data Analytics, Mining and Knowledge Discovery: Given the size and complexity of the data and the need for both top-down and bottom-up discovery, scalable algorithms and software need to be deployed in this phase.
4. Actions, Feedback and Refinement: Insights and discoveries from the previous phases help close the loop by influencing new simulations, models, parameters, settings and observations, thereby making the closed loop a virtuous cycle for big data [1].

Figure 1 A knowledge-discovery life-cycle for Big Data [1]

Big data in companies

No single business trend in the last 10 years has had as much potential impact on incumbent IT investments as big data. Big data promises to upend legacy technologies at many big companies. As IT modernization initiatives gain traction and the accompanying cost savings hit the bottom line, executives in both line-of-business and IT organizations are getting serious about the technology solutions tied to big data. Companies are not only replacing legacy technologies with open source solutions like Apache Hadoop; they are also replacing proprietary hardware with commodity hardware, custom-written applications with packaged solutions, and decades-old business intelligence tools with data visualization. This new combination of big data platforms, projects, and tools is driving new business innovations, from faster product time-to-market to an authoritative single view of the customer to custom-packaged product bundles and beyond.

Big companies with large investments in their data warehouses have neither the resources nor the will to simply replace an environment that works well doing what it was designed to do. At the majority of big companies, a coexistence strategy that combines the best of legacy data warehouse and analytics environments with the new power of big data solutions offers the best of both worlds. Many companies continue to rely on incumbent data warehouses for standard BI and analytics reporting, including regional sales reports, customer dashboards, or credit risk history. In this new environment, the data warehouse can continue with its standard workload, using data from legacy operational systems and storing historical data to provision traditional business intelligence and analytics results. The data warehouse can also serve as a data source for the big data environment. Likewise, Hadoop can consolidate key data output that can populate the data warehouse for subsequent analytics. [2]

2. BIG DATA PARADIGM

The advent of Big Data offers a cost-effective prospect of improving decision-making in critical development areas such as health care, employment, economic productivity, crime and security, and natural disaster and resource management. Well-known caveats of the Big Data debate, such as privacy concerns, interoperability challenges, and the almighty power of imperfect algorithms, are aggravated in developing countries by long-standing development challenges such as a lack of technological infrastructure and the scarcity of economic and human resources.

The Big Data paradigm provides loads of additional data to fine-tune the models and estimates that inform all sorts of decisions. As such, the crux of the Big Data paradigm is not the increasingly large amount of data itself, but its analysis for intelligent decision-making (in this sense, the term Big Data Analysis would be more fitting than Big Data by itself).

Figure 1 shows an established three-dimensional conceptual framework that models the process of digitization as an interplay between technology, social change, and guiding policy strategies. The framework comes from the ICT4D (Information and Communication Technology for Development) literature [3] and is based on a Schumpeterian notion of social evolution through technological innovation [4].

Figure 1 The three-dimensional framework applied to Big Data [5]

The most interesting dimension is tracking:

Tracking words

Analyzing comments, searches or online posts can produce nearly the same results for statistical inference as household surveys and polls. The tracking of words can be combined with other databases, as is done by Global Viral Forecasting, which specializes in predicting and preventing pandemics [6], or by the World Wide Anti-Malarial Resistance Network, which collates data to inform and respond rapidly to the malaria parasite's ability to adapt to drug treatments [7].

Tracking locations

Location-based data are usually obtained from four primary sources: in-person credit or debit card payment data; indoor tracking devices, such as RFID tags on shopping carts; GPS chips in mobile devices; or cell-tower triangulation data on mobile devices. Systems built on such data can predict future traffic conditions by matching current data to historical data and combining it with weather forecasts and information on past traffic patterns. Such traffic analysis not only saves time and gasoline for citizens and businesses, but is also useful for public transportation, police and fire departments, and, of course, road administrators and urban planners.

Tracking nature

A recent project by the United Nations University uses climate and weather data to analyze where the rain falls in order to improve food security in developing countries [8]. Combining Big Data on nature and on social practices, several bakeries used relatively cheap standard statistical software to discover that the demand for cake grows with rain and the demand for salty goods with temperature. Sensors, robotics and computational technology have also been used to track river and estuary ecosystems, which helps officials monitor water quality and supply through the movement of chemical constituents and through large volumes of underwater acoustic data that track the behavior of fish and marine mammal species [9].
Tracking behavior

Behavioral abnormalities are usually spotted by analyzing variations in the behavior of individuals in light of the collective behavior of the crowd. With Big Data, a simple analysis of variations makes it possible to detect unwarranted variations, which originate from the underuse, overuse, or misuse of medical care [10]. These affect the means of health care, but not its ultimate end. Multiplayer online games are by now also used to track and influence behavior at the same time. Health insurance companies are currently developing multiplayer online games that aim to increase the fitness levels of their clients. The tracking of who relates to whom quickly produces vast amounts of data on social network structures, and it also defines the dynamics of opinion leadership and peer pressure, which are extremely important inputs for behavioral change [11].

3. BIG DATA MAJOR TOPICS

BigData 2014's major topics [12] include, but are not limited to: Big Data Architecture, Big Data Modeling, Big Data As A Service, Big Data for Vertical Industries (Government, Healthcare, etc.), Big Data Analytics, Big Data Toolkits, Big Data Open Platforms, Economic Analysis, Big Data for Enterprise Transformation, Big Data in Business Performance Management, Big Data for Business Model Innovations and Analytics, Big Data in Enterprise Management Models and Practices, Big Data in Government Management Models and Practices, and Big Data in Smart Planet Solution.

While the big news for most businesses has been the development of big data technology like Hadoop, the growing prevalence of cloud computing and the decreasing cost of flash storage, the big news for enterprise technology vendors is that a huge portion of that innovation is coming from startups and open source projects rather than from the tech giants of the past. Hadoop itself is an open source project that startups have now integrated with the cloud to offer big data services. Startups have also developed hybrid and other flash solutions to help cut costs and improve performance. Some examples of startups and the technology they offer:
- MobileIron: provides software solutions for BYOD and other mobile programs.
- Arista: a cloud service that specializes in high-frequency trading and also offers big data solutions.
- Tegile: offers hybrid arrays that leverage the speed of flash while using lower-cost disk to boost capacity. It claims that its product offers five times the performance with 75 percent less capacity required when compared to legacy arrays.
- PernixData: offers a Flash Virtualization Platform that accelerates business applications through a scale-out architecture that virtualizes server-side flash.
- FireEye: created a virtual cyber-threat defense system that promises real-time protection.

The future of on-the-go big data: According to Nathaniel Mott, the future of computing will be a question of head vs. wrist instead of desktop vs. mobile. The coming years will probably be flooded with new, currently unknown mobile devices, and all of them will require a different approach to on-the-go big data. The challenge ahead for organizations is to accept this and adapt in time to meet the needs of the mobile future.

4. BIG DATA TRENDS

Data sources, analysis devices, and simulations are connected by current-generation networks that are faster and capable of moving significantly larger volumes of data than previous generations. These trends all relate to big data.

A reflection on the 2013 Big Data trends [13]

1) On-the-go Big Data, meaning the ability to view Big Data visualizations on mobile devices, became important, and in 2013 we saw the rise of a bunch of new mobile devices, including smart watches and Google Glass. Some Big Data startups, such as Roambi, have a very clear understanding of on-the-go Big Data and are capable of bringing real-time interactive visualizations to mobile devices.

2) Big Data does not require big bucks, because of the plethora of open source Big Data tools available in the market as well as the decreasing cost of storage. The price of storage does indeed continue to fall, but the amount of data also grows exponentially. Will we be able to keep up, or will the amount of data outgrow the available storage? The number of open source tools is growing rapidly, but licensed Big Data solutions are also on the rise, because open source tools require experienced Big Data personnel and many organizations do not yet have such staff. So getting started with Big Data does not have to cost the world, but developing and implementing a complete Big Data solution can be expensive, although the results can also be significant.

3) Big real-time data. 2013 did indeed show important growth in real-time analytics.
More and more tools are becoming available that add a layer on top of Hadoop to deal with real-time data, and Hadoop 2.0's YARN framework enables real-time data analysis. In the coming years this will become more important as many industries see the advantages of real-time analytics.

4) Big consumer data, or the quantified-self movement, really took off. Wearable technologies that can measure everyday life have started to appear on a massive scale, and more and more consumers are measuring at least some aspect of their behavior, be it their sleeping patterns or their running results. Recent research from the Pew Research Center revealed that 69% of U.S. adults keep track of at least one health indicator such as weight, diet, exercise routine or symptom.

5) Big Data related to privacy. In 2013 there was the PRISM leak by Edward Snowden, which showed that privacy in the Big Data world is indeed an endangered species, and there is probably a lot more to be revealed in the coming months. Privacy is indeed affected by Big Data, and as consumers and companies we will have to get used to this new reality.

New trends [13]

1) An Internet that will dramatically affect the industrial sector. Next year, machine-to-machine data will grow significantly, and it will continue to do so in the years after. By 2020, 40% of all data will come from sensors, and this will unlock a $1 trillion global market in 2020 (currently it is a $121 billion global market). In addition, GE reports that the Industrial Internet could add $10 to $15 trillion to global GDP in the coming years. These sensors will completely change the way companies, factories and supply chains are operated and managed. Technology is transforming the industrial sector, creating machines that can see, feel, sense and react, so that they can be operated far more efficiently. There are already some great examples of how this will affect companies, ranging from airlines that reduce turn-around time by monitoring the plane during the flight, to analyzing many different variables to pick the best places around the world to locate wind turbines in order to harvest the most energy at the lowest cost. As sensors and storage become cheaper every day, algorithms improve and organizations increasingly see the need for smart factories, the Industrial Internet will really take off.

2) It is going to be cloudy: Big-Data-as-a-Service solutions. More and more Big Data startups are creating Big-Data-as-a-Service solutions to help organizations apply Big Data without the heavy costs involved.
Such solutions are especially useful for small and medium-sized enterprises that want to develop a data-driven, information-centric organization but do not have the capacity to develop and maintain a full-fledged Big Data solution on premises. Big-Data-as-a-Service is a combination of Analytics-as-a-Service, Infrastructure-as-a-Service and Data-as-a-Service, and it will spur the adoption of Big Data by small and medium-sized organizations as well. IIA calls this adoption ready-made analytics in the cloud. These solutions offer an attractive alternative for organizations that want to start with Big Data or want to scale existing programs easily. Gartner predicts that cloud computing will make up the bulk of IT spending by 2016, and 2014 will be a turning point in the acceptance of the cloud as part of a Big Data strategy.

3) Security to protect privacy. If there is one thing that the NSA documents have shown, it is that governments around the world have almost unprecedented access to the data files of organizations and consumers. Organizations will start to focus more heavily on securing their data to protect the privacy of their customers. The first signs of this are that tech companies called for aggressive NSA reforms at the White House meeting in December. Executives from 15 companies expressed their concern that the NSA's wide-ranging surveillance activities had undermined the trust of their users. Of course, a reform of the NSA's activities is one side of the coin; the other side will be increased security measures by companies to protect their data. More and more organizations will start to use Big Data techniques to secure their IT infrastructure and to avoid being hacked and having their data monitored or stolen. Log data will form an important part of this, and organizations will start to see the importance of monitoring and analyzing their IT infrastructure log data in order to keep their infrastructure and data safe. This will help to restore and keep the trust of their customers.

4) Personalization will become personal. Consumers create massive amounts of data through every click, like, tweet, cell-phone call and purchase, and through the self-tracking applications they use. Companies like Amazon have used these kinds of data for many years to create a personal online shopping experience with recommendations, personal homepages, personal discounts or personally targeted mass campaigns. In 2014, however, more organizations will start to see the value of such a personalized approach, be it online or offline. A good example is the Australian shoe retailer Shoes of Prey, which has developed an analytics system that lets it look at individual customer spend and profitability and begin upselling based on the fashion tastes of its clients. Personalization will make a giant leap forward in the coming years, and 2014 could very well be the inflection point in the offering of personalized offers and their acceptance by consumers. Consumers will start to see that their data is valuable, and they will want something in return for providing it.
Consumers are thus willing to cooperate and share their data if it brings them personalized discounts.

5) Education will be essential for success. As more organizations try to understand Big Data and prepare their staff for the Big Data era, education becomes a crucial aspect. Already in 2011, McKinsey predicted a shortage of Big Data scientists and Big Data managers in the coming years. Organizations will therefore encourage their employees to become more skilled in Big Data. Many organizations are heading for a major skills gap and will have to take action to be ready for the big data era. Fresh graduates and students will also see the Big Data trend and, in a competitive job market, will feel the need to differentiate themselves to stand a chance. In 2014 we will therefore see a steep increase in the number of online and offline big data courses available. Apart from online providers such as Coursera or Udacity, many more universities around the world will start offering a big data course or program. These courses and programs will range from Big Data strategy courses to deep analytics and machine-learning programs for Big Data scientists, and anything in between, to cater for the massive increase in Big Data students.

6) Big data moves into mixed data. In past years Big Data was all about obtaining as much data as possible, and the perception was that massive datasets are required to gain insights from those data sources. In the coming year, however, organizations will start to see that the most important aspect of Big Data is not so much the volume of a dataset as the insights derived from combining several smaller datasets. Organizations that do not have exabytes or petabytes can still obtain very valuable insights from a larger number of smaller data sets. Of course, more data does mean more accurate insights, but it does not per se mean more insights. In the coming year, more organizations will understand this and take their first steps in the direction of Big Data. They will start mixing and combining several data sets, which they will analyze to derive insights. So in 2014 Big Data will become mixed data.

7) Proof of Concept. In past years we saw a lot of talk around Big Data. More conferences, new books, more Big Data startups and more interesting best practices are being shared and distributed online and offline. In 2014 many more organizations will start working towards a Big Data strategy and start developing a Proof of Concept (PoC) to investigate what Big Data can mean for them. The PoC will help organizations gain a better understanding of Big Data, help them get educated and help them better predict the ROI of future Big Data projects. A Proof of Concept is a vital part of a Big Data strategy when you start with Big Data.

Future for BigData [14]

Big data in cloud really means private cloud. In 2013, most of the big data projects we've seen were put on top of bare metal infrastructure in the enterprise.
We expect to see an evolution toward a virtualized infrastructure. We're seeing a lot of investment in products that make this happen, such as Serengeti for vSphere, Savanna for OpenStack and Ironfan for Amazon Web Services (AWS). These projects allow us to automate the deployment of big data platforms to a virtualized infrastructure.

The era of analytic applications begins. In 2013, enterprises learned a lot about how to use the big data infrastructure that was new to the market. This coming year, those lessons will be applied to analytic applications. In 2014, we will see some great use cases happen on that big data infrastructure. This will be the year of "What can I do with big data?" rather than "What is big data?" Given this refocus on analytic applications, 2014 will create an even greater demand for people with skills in data science.

The Hadoop clone wars end. We feel confident that the big data industry will consolidate down to a couple of Hadoop distributions. Currently, many distributions of Hadoop exist, some proprietary and some open source. In 2014, the industry will consolidate to two of these. The others will become less relevant, either because they are consolidated by acquisition into one of the survivors or because they exit the market.

Real-time in-memory analytics, complex event processing and ETL will combine. Speaking of exits, serial extract, transform, load (ETL) processes will largely go away. As the velocity of data increases, especially social data, there's more need to analyze data in real time as a stream. Currently, Hadoop is being pressed into service for this, something it's not well suited for. In-memory analytics and complex event processing give us the capability to analyze these streams in real time and extract intelligence on the fly. That eliminates the need to perform the traditional ETL steps.

MDM will provide the dimensions for big data facts. Master data management (MDM) is used to create a single definition of data from an internal standpoint. As people realize that external data sources are going to add more dimensions to their internal problems, they'll look for a single definition, or a single piece of data that will help describe that new definition or that new dimension, even though it's coming from the outside world. If you realize that external data sources help solve a problem, you'll want an external MDM focus as well.

The consolidation of NoSQL will continue. NoSQL means "not only SQL" rather than the absence of SQL, which makes it more inclusive than exclusive. NoSQL means there are many ways to look at data other than the structured and ordered approach that SQL requires. NoSQL was created to offer a way to look at data without forcing it into a concrete schema. That has been extremely successful, and we're seeing massive growth in NoSQL. There will be no slowdown in the adoption of NoSQL, but just as with Hadoop distributions, the industry is beginning to settle on a few major players, which will bring a similar consolidation of NoSQL database distributions.
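The contrast drawn above between batch ETL and analyzing data in real time as a stream can be made concrete with a small in-memory sketch. It assumes events arrive with non-decreasing timestamps; the class name `SlidingWindowCounter`, the window length and the example keys are all hypothetical, and a production system would use a dedicated stream-processing engine instead.

```python
from collections import deque

class SlidingWindowCounter:
    """Keep only the events of the last `window` seconds in memory and count them per key."""

    def __init__(self, window):
        self.window = window
        self.events = deque()  # (timestamp, key) pairs, oldest first
        self.counts = {}       # key -> number of events currently inside the window

    def add(self, timestamp, key):
        """Register a new event and drop everything that has fallen out of the window."""
        self.events.append((timestamp, key))
        self.counts[key] = self.counts.get(key, 0) + 1
        self._expire(timestamp)

    def _expire(self, now):
        while self.events and self.events[0][0] <= now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self):
        """Return the most frequent key in the current window, or None if it is empty."""
        return max(self.counts, key=self.counts.get) if self.counts else None

# Invented stream of (seconds, hashtag) events.
counter = SlidingWindowCounter(window=10)
for ts, tag in [(0, "a"), (1, "a"), (5, "b"), (12, "b")]:
    counter.add(ts, tag)
    print(ts, counter.top(), dict(counter.counts))
```

Each event is processed once on arrival and the answer is available immediately, with no separate extract, transform and load pass over stored data.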
Looking ahead to 2014, a call was made on social media asking experts for their big data predictions. Here are some of the responses received [15]:

"The hot new data of 2013 was 'exhaust data' powered by the Internet of Things. This will take further hold during 2014, but the hot data of next year will be human data. With knowledge that employee data has been shown to do everything from help manage organizational health to be a leading indicator of quarterly consumer demand, this area will be a hot area for startups. Expect another round of TOS updates from LinkedIn." --Dan Malligner, data science practice lead, Think Big Analytics

"The data-information-insight-decision lifecycle will get shortened due to machine learning-based automated decision systems; increased adoption of Cloud ETL to analyze on-premise and off-premise open data; open-source solutions such as R will replace legacy solutions like SAS; the number of offerings of reporting solutions embedded in cloud with 90-day deployment will increase." --Milind Kelkar, leader, Smart Decisions Lab, Genpact

"The future of big data in 2014 will be 'where.' Location intelligence is of paramount importance to companies as their customers are increasingly handling the majority of their daily activity on mobile, from online banking to restaurant check-ins and social sharing in real time. Having precise location data helps organizations understand relationships between specific locations so that they can identify growth opportunities, improve information sharing internally and to their customers, and make better strategic business decisions." --James Buckley, SVP, customer data and location intelligence, Pitney Bowes Software

"It's a major problem for businesses that their online and offline data management systems don't talk to one another. For example, when prospective customers start their buying process online but choose to make a phone call for assistance, the online analytics vanish. Businesses will be seeking means of appending specified data points to those interactions so they have the appropriate IDs and device information to retarget that customer on any given channel. The focus will increasingly be on automating how that data is reported, packaged, and sent to other systems." --Eric Holmen, CMO, Invoca

"While Hadoop is still immature, technology advances like YARN are contributing to an enterprise-friendly big data future. Because of YARN, we'll increase opportunities to use new and more optimally efficient engines and expand Hadoop possibilities." --Mike Hoskins, CTO, Actian

5. INDUSTRIAL INTERNET

Three elements embody the essence of the Industrial Internet: intelligent machines, advanced analytics, and people at work (Figure 2 [16]).

Figure 2. Key elements of the Industrial Internet

Connecting and combining these elements offers new opportunities across firms and economies. As system monitoring has advanced and the cost of information technology has fallen, the ability to work with larger and larger volumes of real-time data has been expanding. High-frequency real-time data brings a whole new level of insight into system operations. Machine-based analytics offers yet another dimension to the analytic process. The combination of physics-based approaches, deep sector-specific domain expertise, more automation of information flows, and predictive capabilities can join with the existing suite of big data tools. The result is that the Industrial Internet combines traditional approaches with newer hybrid approaches that can leverage the power of both historic and real-time data with industry-specific advanced analytics. Remote data storage, big data sets and more advanced analytic tools that can process massive amounts of information are maturing and becoming more widely available. Together these changes are creating exciting new opportunities when applied to machines, fleets and networks.

Advanced Analytics: Advances in big data software tools and analytic techniques provide the means to understand the massive quantities of data that are generated by intelligent devices. Together, these forces are changing the cost

BIG DATAN TUTKIMUS JA OPETUS JYVÄSKYLÄN YLIOPISTOSSA 4.7.2014

BIG DATAN TUTKIMUS JA OPETUS JYVÄSKYLÄN YLIOPISTOSSA 4.7.2014 BIG DATAN TUTKIMUS JA OPETUS JYVÄSKYLÄN YLIOPISTOSSA 4.7.2014 JYVÄSKYLÄN YLIOPISTO INFORMAATIOTEKNOLOGIAN TIEDEKUNTA 2014 2 SISÄLLYS JOHDANTO... 3 1 BIG DATA JYVÄSKYLÄN YLIOPISTON TUTKIMUKSESSA... 4 1.1

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

MEng, BSc Applied Computer Science

MEng, BSc Applied Computer Science School of Computing FACULTY OF ENGINEERING MEng, BSc Applied Computer Science Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give a machine instructions

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics February 11, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

MEng, BSc Computer Science with Artificial Intelligence

MEng, BSc Computer Science with Artificial Intelligence School of Computing FACULTY OF ENGINEERING MEng, BSc Computer Science with Artificial Intelligence Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give

More information

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

Data-intensive HPC: opportunities and challenges. Patrick Valduriez Data-intensive HPC: opportunities and challenges Patrick Valduriez Big Data Landscape Multi-$billion market! Big data = Hadoop = MapReduce? No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard,

More information

Integrating a Big Data Platform into Government:

Integrating a Big Data Platform into Government: Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

REAL-TIME OPERATIONAL INTELLIGENCE. Competitive advantage from unstructured, high-velocity log and machine Big Data

REAL-TIME OPERATIONAL INTELLIGENCE. Competitive advantage from unstructured, high-velocity log and machine Big Data REAL-TIME OPERATIONAL INTELLIGENCE Competitive advantage from unstructured, high-velocity log and machine Big Data 2 SQLstream: Our s-streaming products unlock the value of high-velocity unstructured log

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014 5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

Demystifying Big Data Government Agencies & The Big Data Phenomenon

Demystifying Big Data Government Agencies & The Big Data Phenomenon Demystifying Big Data Government Agencies & The Big Data Phenomenon Today s Discussion If you only remember four things 1 Intensifying business challenges coupled with an explosion in data have pushed

More information

Big Data better business benefits

Big Data better business benefits Big Data better business benefits Paul Edwards, HouseMark 2 December 2014 What I ll cover.. Explain what big data is Uses for Big Data and the potential for social housing What Big Data means for HouseMark

More information

White Paper: SAS and Apache Hadoop For Government. Inside: Unlocking Higher Value From Business Analytics to Further the Mission

White Paper: SAS and Apache Hadoop For Government. Inside: Unlocking Higher Value From Business Analytics to Further the Mission White Paper: SAS and Apache Hadoop For Government Unlocking Higher Value From Business Analytics to Further the Mission Inside: Using SAS and Hadoop Together Design Considerations for Your SAS and Hadoop

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

Buyer s Guide to Big Data Integration

Buyer s Guide to Big Data Integration SEPTEMBER 2013 Buyer s Guide to Big Data Integration Sponsored by Contents Introduction 1 Challenges of Big Data Integration: New and Old 1 What You Need for Big Data Integration 3 Preferred Technology

More information

The 4 Pillars of Technosoft s Big Data Practice

The 4 Pillars of Technosoft s Big Data Practice beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed

More information

Adaptive Framework for Network Traffic Classification using Dimensionality Reduction and Clustering

Adaptive Framework for Network Traffic Classification using Dimensionality Reduction and Clustering IV International Congress on Ultra Modern Telecommunications and Control Systems 22 Adaptive Framework for Network Traffic Classification using Dimensionality Reduction and Clustering Antti Juvonen, Tuomo

More information

Big Data & the Cloud: The Sum Is Greater Than the Parts

Big Data & the Cloud: The Sum Is Greater Than the Parts E-PAPER March 2014 Big Data & the Cloud: The Sum Is Greater Than the Parts Learn how to accelerate your move to the cloud and use big data to discover new hidden value for your business and your users.

More information

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,

More information

Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems

Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems Volker Markl volker.markl@tu-berlin.de dima.tu-berlin.de dfki.de/web/research/iam/ bbdc.berlin Based on my 2014 Vision Paper On

More information

Industry 4.0 and Big Data

Industry 4.0 and Big Data Industry 4.0 and Big Data Marek Obitko, mobitko@ra.rockwell.com Senior Research Engineer 03/25/2015 PUBLIC PUBLIC - 5058-CO900H 2 Background Joint work with Czech Institute of Informatics, Robotics and

More information

Big Data and Analytics in Government

Big Data and Analytics in Government Big Data and Analytics in Government Nov 29, 2012 Mark Johnson Director, Engineered Systems Program 2 Agenda What Big Data Is Government Big Data Use Cases Building a Complete Information Solution Conclusion

More information

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Software Engineering for Big Data CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Big Data Big data technologies describe a new generation of technologies that aim

More information

CORE CLASSES: IS 6410 Information Systems Analysis and Design IS 6420 Database Theory and Design IS 6440 Networking & Servers (3)

CORE CLASSES: IS 6410 Information Systems Analysis and Design IS 6420 Database Theory and Design IS 6440 Networking & Servers (3) COURSE DESCRIPTIONS CORE CLASSES: Required IS 6410 Information Systems Analysis and Design (3) Modern organizations operate on computer-based information systems, from day-to-day operations to corporate

More information

MSCA 31000 Introduction to Statistical Concepts

MSCA 31000 Introduction to Statistical Concepts MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced

More information

The University of Jordan

The University of Jordan The University of Jordan Master in Web Intelligence Non Thesis Department of Business Information Technology King Abdullah II School for Information Technology The University of Jordan 1 STUDY PLAN MASTER'S

More information

locuz.com Big Data Services

locuz.com Big Data Services locuz.com Big Data Services Big Data At Locuz, we help the enterprise move from being a data-limited to a data-driven one, thereby enabling smarter, faster decisions that result in better business outcome.

More information

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Is a Data Scientist the New Quant? Stuart Kozola MathWorks Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by

More information

Addressing government challenges with big data analytics

Addressing government challenges with big data analytics IBM Software White Paper Government Addressing government challenges with big data analytics 2 Addressing government challenges with big data analytics Contents 2 Introduction 4 How big data analytics

More information

Concept and Project Objectives

Concept and Project Objectives 3.1 Publishable summary Concept and Project Objectives Proactive and dynamic QoS management, network intrusion detection and early detection of network congestion problems among other applications in the

More information

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data

More information

Chapter 6 - Enhancing Business Intelligence Using Information Systems

Chapter 6 - Enhancing Business Intelligence Using Information Systems Chapter 6 - Enhancing Business Intelligence Using Information Systems Managers need high-quality and timely information to support decision making Copyright 2014 Pearson Education, Inc. 1 Chapter 6 Learning

More information

Interactive data analytics drive insights

Interactive data analytics drive insights Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has

More information

Scaling Big Data Mining Infrastructure: The Smart Protection Network Experience

Scaling Big Data Mining Infrastructure: The Smart Protection Network Experience Scaling Big Data Mining Infrastructure: The Smart Protection Network Experience 黃 振 修 (Chris Huang) SPN 主 動 式 雲 端 截 毒 技 術 架 構 師 About Me SPN 主 動 式 雲 端 截 毒 技 術 架 構 師 SPN Hadoop 基 礎 運 算 架 構 師 Hadoop in Taiwan

More information

Formal Methods for Preserving Privacy for Big Data Extraction Software

Formal Methods for Preserving Privacy for Big Data Extraction Software Formal Methods for Preserving Privacy for Big Data Extraction Software M. Brian Blake and Iman Saleh Abstract University of Miami, Coral Gables, FL Given the inexpensive nature and increasing availability

More information

Sustainable Development with Geospatial Information Leveraging the Data and Technology Revolution

Sustainable Development with Geospatial Information Leveraging the Data and Technology Revolution Sustainable Development with Geospatial Information Leveraging the Data and Technology Revolution Steven Hagan, Vice President, Server Technologies 1 Copyright 2011, Oracle and/or its affiliates. All rights

More information

IBM Software Hadoop in the cloud

IBM Software Hadoop in the cloud IBM Software Hadoop in the cloud Leverage big data analytics easily and cost-effectively with IBM InfoSphere 1 2 3 4 5 Introduction Cloud and analytics: The new growth engine Enhancing Hadoop in the cloud

More information

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank Agenda» Overview» What is Big Data?» Accelerates advances in computer & technologies» Revolutionizes data measurement»

More information

From Big Data to Smart Data Thomas Hahn

From Big Data to Smart Data Thomas Hahn Siemens Future Forum @ HANNOVER MESSE 2014 From Big to Smart Hannover Messe 2014 The Evolution of Big Digital data ~ 1960 warehousing ~1986 ~1993 Big data analytics Mining ~2015 Stream processing Digital

More information

Review of IT Service Management Tools Currently in Use in Finland

Review of IT Service Management Tools Currently in Use in Finland Jussi Suominen & Lasse Tuomi Review of IT Service Management Tools Currently in Use in Finland ITIL, Implementation and Functionality Helsinki Metropolia University of Applied Sciences Bachelor of Engineering

More information

Business Analytics In a Big Data World Ted Malone Solutions Architect Data Platform and Cloud Microsoft Federal

Business Analytics In a Big Data World Ted Malone Solutions Architect Data Platform and Cloud Microsoft Federal Business Analytics In a Big Data World Ted Malone Solutions Architect Data Platform and Cloud Microsoft Federal Information has gone from scarce to super-abundant. That brings huge new benefits. The Economist

More information

A New Era Of Analytic

A New Era Of Analytic Penang egovernment Seminar 2014 A New Era Of Analytic Megat Anuar Idris Head, Project Delivery, Business Analytics & Big Data Agenda Overview of Big Data Case Studies on Big Data Big Data Technology Readiness

More information

Master of Science in Health Information Technology Degree Curriculum

Master of Science in Health Information Technology Degree Curriculum Master of Science in Health Information Technology Degree Curriculum Core courses: 8 courses Total Credit from Core Courses = 24 Core Courses Course Name HRS Pre-Req Choose MIS 525 or CIS 564: 1 MIS 525

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

A BUSINESS CASE FOR BEHAVIORAL ANALYTICS. White Paper

A BUSINESS CASE FOR BEHAVIORAL ANALYTICS. White Paper A BUSINESS CASE FOR BEHAVIORAL ANALYTICS White Paper Introduction What is Behavioral 1 In a world in which web applications and websites are becoming ever more diverse and complicated, running them effectively

More information

Create and Drive Big Data Success Don t Get Left Behind

Create and Drive Big Data Success Don t Get Left Behind Create and Drive Big Data Success Don t Get Left Behind The performance boost from MapR not only means we have lower hardware requirements, but also enables us to deliver faster analytics for our users.

More information

Master s Degree Programme in International Business Management

Master s Degree Programme in International Business Management Lahti University of Applied Sciences Master s Degree Programme in International Business Management Study Guide 2014-2015 15.9.2014 0 Sisällysluettelo International Business Management... 2 DEGREE PROGRAMME

More information

3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India

3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India 3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India Call for Papers Cloud computing has emerged as a de facto computing

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

Big data and its transformational effects

Big data and its transformational effects Big data and its transformational effects Professor Fai Cheng Head of Research & Technology September 2015 Working together for a safer world Topics Lloyd s Register Big Data Data driven world Data driven

More information

Enhance Collaboration and Data Sharing for Faster Decisions and Improved Mission Outcome

Enhance Collaboration and Data Sharing for Faster Decisions and Improved Mission Outcome Enhance Collaboration and Data Sharing for Faster Decisions and Improved Mission Outcome Richard Breakiron Senior Director, Cyber Solutions Rbreakiron@vion.com Office: 571-353-6127 / Cell: 803-443-8002

More information

Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches.

Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches. Detecting Anomalous Behavior with the Business Data Lake Reference Architecture and Enterprise Approaches. 2 Detecting Anomalous Behavior with the Business Data Lake Pivotal the way we see it Reference

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

DATA MANAGEMENT FOR THE INTERNET OF THINGS

DATA MANAGEMENT FOR THE INTERNET OF THINGS DATA MANAGEMENT FOR THE INTERNET OF THINGS February, 2015 Peter Krensky, Research Analyst, Analytics & Business Intelligence Report Highlights p2 p4 p6 p7 Data challenges Managing data at the edge Time

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411

More information

MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012

MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012 MEDICAL DATA MINING Timothy Hays, PhD Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012 2 Healthcare in America Is a VERY Large Domain with Enormous Opportunities for Data

More information

Big Data Defined Introducing DataStack 3.0

Big Data Defined Introducing DataStack 3.0 Big Data Big Data Defined Introducing DataStack 3.0 Inside: Executive Summary... 1 Introduction... 2 Emergence of DataStack 3.0... 3 DataStack 1.0 to 2.0... 4 DataStack 2.0 Refined for Large Data & Analytics...

More information

1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India

1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India 1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India Call for Papers Colossal Data Analysis and Networking has emerged as a de facto

More information

White Paper. Version 1.2 May 2015 RAID Incorporated

White Paper. Version 1.2 May 2015 RAID Incorporated White Paper Version 1.2 May 2015 RAID Incorporated Introduction The abundance of Big Data, structured, partially-structured and unstructured massive datasets, which are too large to be processed effectively

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Data Centric Computing Revisited

Data Centric Computing Revisited Piyush Chaudhary Technical Computing Solutions Data Centric Computing Revisited SPXXL/SCICOMP Summer 2013 Bottom line: It is a time of Powerful Information Data volume is on the rise Dimensions of data

More information

雲 端 運 算 願 景 與 實 現 馬 維 英 博 士 微 軟 亞 洲 研 究 院 常 務 副 院 長

雲 端 運 算 願 景 與 實 現 馬 維 英 博 士 微 軟 亞 洲 研 究 院 常 務 副 院 長 雲 端 運 算 願 景 與 實 現 馬 維 英 博 士 微 軟 亞 洲 研 究 院 常 務 副 院 長 Important Aspects of the Cloud Software as a Service (SaaS) Platform as a Service (PaaS) Infrastructure as a Service (IaaS) Information and Knowledge

More information

This Symposium brought to you by www.ttcus.com

This Symposium brought to you by www.ttcus.com This Symposium brought to you by www.ttcus.com Linkedin/Group: Technology Training Corporation @Techtrain Technology Training Corporation www.ttcus.com Big Data Analytics as a Service (BDAaaS) Big Data

More information

EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials

EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials 5th August 2015 Contents Introduction 1 Dendrogram 2 Tree Map 3 Heat Map 4 Raw Group Data 5 For an online, interactive version of the visualisations

More information

Big Data Analytics for Space Exploration, Entrepreneurship and Policy Opportunities. Tiffani Crawford, PhD

Big Data Analytics for Space Exploration, Entrepreneurship and Policy Opportunities. Tiffani Crawford, PhD Big Analytics for Space Exploration, Entrepreneurship and Policy Opportunities Tiffani Crawford, PhD Big Analytics Characteristics Large quantities of many data types Structured Unstructured Human Machine

More information

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Datenverwaltung im Wandel - Building an Enterprise Data Hub with Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees

More information

CONTENTS. Introduction 3. IoT- the next evolution of the internet..3. IoT today and its importance..4. Emerging opportunities of IoT 5

CONTENTS. Introduction 3. IoT- the next evolution of the internet..3. IoT today and its importance..4. Emerging opportunities of IoT 5 #924, 5 A The catchy phrase Internet of Things (IoT) or the Web of Things has become inevitable to the modern world. Today wireless technology has reached its zenith making it possible to interact with

More information

THE EXPERT SYSTEMS ANALYSIS USING THE CONCEPT OF BIG DATA AND CLOUD COMPUTING SERVICES

THE EXPERT SYSTEMS ANALYSIS USING THE CONCEPT OF BIG DATA AND CLOUD COMPUTING SERVICES THE EXPERT SYSTEMS ANALYSIS USING THE CONCEPT OF BIG DATA AND CLOUD COMPUTING SERVICES Violeta Nicoleta OPRIŞ 1 Ciprian RACUCIU 2 1 Inf. Ph.D. Student 1 Military Technical Academy, Faculty of Military

More information

Safe Harbor Statement

Safe Harbor Statement Defining a Roadmap to Big Data Success Robert Stackowiak, Oracle Vice President, Big Data 17 November 2015 Safe Harbor Statement The following is intended to outline our general product direction. It is

More information

Ganzheitliches Datenmanagement

Ganzheitliches Datenmanagement Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist

More information
