Crawling, parsing & coding of online jobs to enable text mining for skills
Semantic Recruitment Technology
Jakub Zavrel, Textkernel
InGRID CEPS Workshop, 20-10-2014
Language gap
- "I like programming, but I'm interested in taking on more project management responsibility."
- "Is there a job in our organisation that better fits my degree?"
- "I'd like to work on our mobile strategy. I've helped a friend develop a mobile app."
- "I'd like to do more with my organisational talent."
We are looking to hire: An experienced tech team lead
The ideal candidate has:
- min. 5yr of experience
- Certified Scrum Master
- Exp. w/ iOS, Android
- Completed academic studies Computer Science or related
- 30% travel for customer presentations
The Job ad searches directly in a database and identifies relevant candidates (or vice versa)
Textkernel
- Spin-off from R&D in machine learning and language technology
- Founded 2001; offices in Amsterdam (HQ), Frankfurt and Paris; 52 employees; strong R&D focus
- Deloitte Fast 50 in 2007 and 2010; 30% YoY growth
- Core technology: understanding unstructured text data, multilingual
- Market: job boards, recruitment software, staffing and recruitment, mobility, large employers
- Products:
  - Multilingual tools (15 languages) to extract CVs and jobs
  - Jobfeed: largest real-time DB for job market analysis
  - Search! & Match! to connect people and jobs
- Customers: UWV, Pole Emploi, Adecco, Randstad, USG, Monster, Stepstone, XING, SAP, Unisys, Bosch, Axa, Philips, etc. (>400 direct, 10,000+ indirect)
- Large partner network (HR & recruitment software)
Jobfeed
Search and analyse real-time online job ads as well as historical data
Jobfeed
- Spidering (wide & targeted)
- Classification
- Cleaning web pages
- Extracting (>30 fields)
- Normalisation and matching
- De-duplication
- Expired jobs
- Monitoring
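The slides do not say how the de-duplication step is implemented. As one possible illustration only, the sketch below flags near-duplicate ads by comparing sets of word shingles with Jaccard similarity. The function names, the shingle size and the 0.8 threshold are all assumptions made for this example, not Jobfeed's actual method.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased, tokenised text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(ads, threshold=0.8):
    """Keep the first ad of every group of near-duplicates.

    The threshold is an assumed value; a real system would tune it.
    """
    kept, kept_shingles = [], []
    for ad in ads:
        s = shingles(ad)
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(ad)
            kept_shingles.append(s)
    return kept


ads = [
    "Truck driver wanted for international transport, valid licence required",
    "Truck driver wanted for international transport, valid licence required!",
    "Administrative assistant with MS Office and Excel experience",
]
unique = deduplicate(ads)
print(len(unique))
```

The pairwise comparison is quadratic in the number of ads, so at the scale of a full job market a hashing scheme such as MinHash would be the more likely choice; the sketch only shows the similarity idea.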
Jobfeed
- Knowledge of all online demand for labour in the European job market
- Sales leads for recruitment and staffing companies
- Real-time labour market analytics tools
- Largest database of jobs for matching the unemployed
- Perfect data source for text mining
Jobfeed
- Real-time collection of online job ads from any (unstructured) source
- Available in NL, DE, FR, IT
- Gradually rolling out in the rest of Europe (2015: BE, UK, AT)
- Richly semantically structured data
Occupation coding
- Coding follows extraction
- Customer-specific or standard taxonomies
- Normalisation based on string similarity
- Many synonyms per language
- Distance = confidence
- Problem cases: ambiguity, context, long tail
- More complex models can help (classifiers, multivariate models)
- Semantic matching works better (occupation coding errors are counterbalanced by other variables)
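A minimal sketch of string-similarity-based coding, where the similarity to the closest synonym doubles as a confidence score. The synonym table, the codes `ADM-001` and `DRV-001`, and the 0.75 cut-off are hypothetical; the slides do not specify which similarity measure is used, so Python's standard `difflib.SequenceMatcher` stands in for it here.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table: synonym -> occupation code (codes are invented).
SYNONYMS = {
    "administrative assistant": "ADM-001",
    "administrative employee": "ADM-001",
    "assistant clerk": "ADM-001",
    "office support": "ADM-001",
    "truck driver": "DRV-001",
    "lorry driver": "DRV-001",
}


def normalise(title):
    """Lowercase the title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def code_occupation(title, min_confidence=0.75):
    """Return (code, confidence) for the closest synonym.

    The code is None when the best match falls below min_confidence,
    which is an assumed cut-off for this example.
    """
    query = normalise(title)
    best_code, best_conf = None, 0.0
    for synonym, code in SYNONYMS.items():
        conf = SequenceMatcher(None, query, synonym).ratio()
        if conf > best_conf:
            best_code, best_conf = code, conf
    if best_conf < min_confidence:
        return None, best_conf
    return best_code, best_conf


print(code_occupation("Administrative  Assistant"))
print(code_occupation("Truck drivr"))
print(code_occupation("quantum physicist"))
```

A misspelt title still maps to the right code with slightly lower confidence, while a title from the long tail falls below the cut-off and is left uncoded. That is the kind of case where the classifiers mentioned above would take over.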
Jobfeed: Multilingual Occupation Taxonomy
Occupations:
- >4000 codes
- 4 languages
- 3-layer hierarchy
- >50K synonyms
Link to other concepts:
- Skills
- Education level
- Sector
- O*NET
- UWV (Dutch Employment Agency)
- ROME
Example:
- NL: administratief medewerker; EN: administrative assistant; FR: employé administratif; DE: Verwaltungsassistent (m/w)
- Group: administrative personnel
- Class: Administration and Customer Service
- Synonyms: administrative employee, assistant clerk, office support
- Skills: ms office, excel, english language, etc.
- O*NET: 43-9199.00: Office and Administrative Support Workers, All Other
- UWV: 1000402563: Administratief medewerker secretariaat
Based on millions of jobs, years of customer feedback and experience!
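To make the structure of a taxonomy entry concrete, here is one way the example above could be represented in code. The `Occupation` class, its field names and the internal code `EXAMPLE-0001` are assumptions for illustration; only the labels, synonyms, skills and external codes come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Occupation:
    code: str                 # hypothetical internal taxonomy code
    labels: dict              # language code -> preferred label
    group: str                # middle layer of the 3-layer hierarchy
    occupation_class: str     # top layer of the hierarchy
    synonyms: list = field(default_factory=list)
    skills: list = field(default_factory=list)
    external_codes: dict = field(default_factory=dict)  # e.g. O*NET, UWV


admin = Occupation(
    code="EXAMPLE-0001",
    labels={
        "nl": "administratief medewerker",
        "en": "administrative assistant",
        "fr": "employé administratif",
        "de": "Verwaltungsassistent (m/w)",
    },
    group="administrative personnel",
    occupation_class="Administration and Customer Service",
    synonyms=["administrative employee", "assistant clerk", "office support"],
    skills=["ms office", "excel", "english language"],
    external_codes={"O*NET": "43-9199.00", "UWV": "1000402563"},
)


def build_label_index(occupations):
    """Map every label and synonym (lowercased) to its occupation."""
    index = {}
    for occ in occupations:
        for name in list(occ.labels.values()) + occ.synonyms:
            index[name.lower()] = occ
    return index


index = build_label_index([admin])
print(index["office support"].external_codes["O*NET"])
```

An index of this kind lets a label in any of the four languages resolve to the same occupation, and from there to the linked skills and external classifications.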
Skill mining
Example: job title: Truck driver
Number of unique skills for this job title: 586

Skill                  Skill probability   Skill relevance   Relevance score
Bulk-Auto              0.0034              7.22              4.699
Gültiger LKW-Schein    0.0034              6.53              2.349
Sattelzug              0.0051              5.97              2.014
word                   0.0017              1.12              0.005
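The slide does not define how skill probability and skill relevance are computed. One plausible reading, assumed here purely for illustration, is that the probability is the share of ads for a job title that mention the skill, and the relevance is the lift of that share over the skill's share across all ads. The sketch below computes both on invented toy data; it does not reproduce the numbers in the table, and the relevance score column is left out because its definition is unknown.

```python
from collections import Counter


def skill_statistics(ads, job_title):
    """Return {skill: (probability, lift)} for one job title.

    ads is a list of (job_title, set_of_skills) pairs.
    probability = share of ads with this title that mention the skill.
    lift        = probability / share of all ads that mention the skill
                  (an assumed stand-in for the slide's "skill relevance").
    """
    title_ads = [skills for title, skills in ads if title == job_title]
    if not title_ads:
        return {}
    overall = Counter(s for _, skills in ads for s in set(skills))
    within = Counter(s for skills in title_ads for s in set(skills))
    stats = {}
    for skill, count in within.items():
        p_given_title = count / len(title_ads)
        p_overall = overall[skill] / len(ads)
        stats[skill] = (p_given_title, p_given_title / p_overall)
    return stats


# Invented toy data, not taken from Jobfeed.
ads = [
    ("truck driver", {"Sattelzug", "word"}),
    ("truck driver", {"Sattelzug"}),
    ("administrative assistant", {"word", "excel"}),
    ("administrative assistant", {"word"}),
]
stats = skill_statistics(ads, "truck driver")
for skill, (prob, lift) in sorted(stats.items()):
    print(skill, prob, round(lift, 3))
```

Under this reading a generic skill such as "word" gets a lift below 1 for truck drivers because it is more common elsewhere, which matches the pattern in the table where "word" ranks far below the driving-specific skills.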
Example output
Tag Cloud for a Job category Truck-driver
Tag Cloud for a Job category Chauffeur
Semantic Recruitment Technology Thanks!