NORTHEAST RESEARCH CONSORTIUM
Understanding the Labor Market in New Ways
Guide to Using Real Time Data for LMI Analysts
March 2012


This workforce solution was funded by a grant awarded by the U.S. Department of Labor's Employment and Training Administration. The solution was created by the grantee and does not necessarily reflect the official position of the U.S. Department of Labor. The Department of Labor makes no guarantees, warranties, or assurances of any kind, express or implied, with respect to such information, including any information on linked sites and including, but not limited to, the accuracy, completeness, timeliness, usefulness, adequacy, continued availability, or ownership of the information. This solution is copyrighted by the institution that created it. Internal use by an organization and/or personal use by an individual for non-commercial purposes is permissible. All other uses require the prior authorization of the copyright holder.

Section 1: THE LMI UNCERTAINTY PRINCIPLE

Perhaps the most important distinction to make when dealing with online job postings is that they are not the same as job openings or vacancy rates. They are created by employers for their own unique and specific purposes, which do not include developing reporting data for use by LMI analysts. Online job postings may be used to attract applicants, but they are also often written to solicit a specific response or simply to test the market. Online job postings are prevalent in some occupational areas and limited or nonexistent in others. The postings themselves are usually vague about geographic location and frequently omit information such as salary, educational requirements, and experience requirements. Also, there is not necessarily a one-to-one relationship between an online job posting and a job opening. An employer might:

- Create a single posting for several openings with the same or similar qualifications, in the same location or in multiple locations;
- Use an online job posting to better understand the local workforce without having an actual job available;
- Post several times on different websites for a single job opening, sometimes using different language in each description, making de-duplication difficult or impossible;
- Never make some vacancies public, instead filling them internally, through word of mouth, etc.;
- Use advertising methods that do not generate online postings: signs in windows, union hiring halls, radio ads, and employee referrals.

As a result, in the world of online job postings, there are no standard practices, rules, or conventions as to form or function. Guide to Using Real Time Data for LMI Analysts Page 1

How Online Job Postings Differ from Job Vacancy Data

THE LMI UNCERTAINTY PRINCIPLE 1: Output (i.e., projections and conclusions) cannot be better than input (i.e., the online job postings themselves).

Postings are sourced, using spidering technology, from thousands of sites, frequently capturing multiple copies of a single posting. Both the postings and the websites are of varying quality. The postings are designed primarily for recruitment, not for analysis. Analysis has shown that the postings are biased toward the upscale end of the labor market. There is significant daily fluctuation in online job postings, and posting volume is influenced by factors unrelated to real changes in labor demand. Trying to extract real information from short time periods produces more noise than signal; three- to six-month chunks of data appear to be required to smooth out the noise. Geographic units of analysis are likely to be large (i.e., at the state level) because of the way job locations are posted. Zip-code-level data is not reliable because very few postings include an exact location that can be tied to a zip code.

THE LMI UNCERTAINTY PRINCIPLE 2: The more granular you make any one parameter, the less you can know about the other parameters.

Analyzing skills at the occupational level will require large spans of both time and geography. Projections may be possible, but they may not be capable of being isolated to individual occupations except in areas with very high concentrations of that occupation.
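The smoothing recommendation above can be sketched as a simple trailing moving average over monthly posting counts. This is a minimal illustration with hypothetical numbers; a real analysis would use the vendor's actual monthly totals:

```python
def moving_average(counts, window=3):
    """Smooth a monthly posting-count series with a trailing moving average."""
    smoothed = []
    for i in range(window - 1, len(counts)):
        smoothed.append(sum(counts[i - window + 1 : i + 1]) / window)
    return smoothed

# Hypothetical monthly posting counts with a one-month spike
monthly = [100, 105, 400, 110, 115, 120]
print(moving_average(monthly))  # the spike is spread across three months
```

The damped series reflects why quarterly or three-month moving-average figures are preferred over single-month counts.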

What Have We Gleaned From Online Job Postings Data?

Advertising Behaviors: An analysis of postings based on job family (two-digit SOC level) showed common patterns of advertising behavior. Different posting methods were found by job family, even within the same employer. Large employers and corporations were more likely to post all jobs online; smaller businesses were more likely to post only executive or high-skill positions online.

Data Quality and Quantity: The ability of automated software to generate accurate data from online postings is affected by both the quality and the quantity of the postings. In some job families, larger quantities of postings have allowed the software to produce measurably accurate data for that job family. In other job families, posting quality (i.e., detailed content) is commonly found, which likewise allows the software to produce measurably accurate data. Job families with a negligible number of postings (e.g., agriculture, forestry and fishing; building and grounds cleaning and maintenance) were found to be of limited accuracy. Job families with quality postings have the potential to be analyzed at the four-digit SOC level if there is also sufficient quantity (e.g., computer and math occupations, healthcare practitioners and technical occupations).

Context Is Important: Understanding the context of terminology matters. Some terms and phrases have different meanings depending on the occupation to which they are applied. For example, "reuse" applies to software reuse, workforce reuse, and water reuse, but the meaning is quite different in each case. Context matters because the meaning of the words can change the classification of the skill: the word "reuse" on its own could be green, but it is not green when modified by "software" and probably is green when modified by "water." Analysis software can be trained to recognize these context items, but such training requires significant work by trained LMI analysts.
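The "reuse" example above can be illustrated with a deliberately minimal rule: classify the phrase as green only when its modifier is on a green list. The modifier lists here are hypothetical; a production system would rely on trained natural-language parsing rather than a lookup table:

```python
# Hypothetical modifier lists; a production parser would be far richer.
GREEN_MODIFIERS = {"water", "materials", "wastewater"}
NON_GREEN_MODIFIERS = {"software", "workforce"}

def classify_reuse(phrase):
    """Classify a '<modifier> reuse' phrase as green, not green, or unresolved."""
    words = phrase.lower().split()
    if len(words) == 2 and words[1] == "reuse":
        if words[0] in GREEN_MODIFIERS:
            return "green"
        if words[0] in NON_GREEN_MODIFIERS:
            return "not green"
    return "needs analyst review"

print(classify_reuse("water reuse"))     # green
print(classify_reuse("software reuse"))  # not green
print(classify_reuse("reuse"))           # needs analyst review
```

Note that the fallback routes ambiguous phrases to an analyst, mirroring the report's point that context training requires human review.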
Postings Data Are Complementary to Existing LMI: As with most LMI data, online job postings do not represent a complete picture of the labor market. The potential value of this piece is its timeliness and the fact that it measures the flow of jobs, as opposed to the measure of stock usually found in LMI. In preliminary comparisons, the online postings data was found to follow patterns similar to JOLTS data, one of the few flow measures in traditional LMI. Potentially, existing LMI could be used as a benchmark for online job posting patterns, providing confidence that past data was extracted within acceptable error limits and, by extension, a measure of confidence that future data is also being extracted within acceptable error limits.
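The benchmarking idea above amounts to checking how closely the postings series tracks a traditional flow series. A simple sketch is a Pearson correlation between the two series; the quarterly index values below are hypothetical, not actual JOLTS or Burning Glass figures:

```python
def correlation(xs, ys):
    """Pearson correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical quarterly indices: online postings vs. a JOLTS-style benchmark
postings = [100, 95, 80, 85, 90, 97]
benchmark = [210, 200, 170, 175, 185, 205]
r = correlation(postings, benchmark)
print(round(r, 2))  # a persistently high r suggests extraction is tracking the benchmark
```

A sustained drop in the correlation between new postings data and the benchmark could signal an extraction problem rather than a real market shift.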

Not All Areas Are Created Equal: Geographic location for an advertised job is not universally available. Rural areas were found to have a limited share of job ads, while urban areas hold the greatest share. Frequently the actual location of a job is not included in the posting, causing the data parser to default the listing to the central zip code of a large labor market, clearly an unreliable zip-code location. Because of these location accuracy problems, the Consortium's goal of providing data for all substate areas was not achievable at present and may never be. Real-time data reported at any level below a major metropolitan area is suspect and will either substantially understate or overstate the online posting counts. Real-time LMI providers that produce city-, county-, or zip-code-level data should be asked to disclose their coding and modeling methodologies for review, and data produced at those levels should be considered questionable for any use until the analyst is satisfied with the response.

Data Source Is Important: Sources primarily affect the data in two ways. First, data from some spidered sites simply lack content, compromising the parsing and occupational coding processes; you cannot analyze what does not exist. It is important to understand your vendor's spidering and de-duplication processes. Second, there can be artificial increases in the volume of job postings when source domains are added to the spidering process. This can be irrelevant for a point-in-time analysis but an impediment to time series analysis. It is important to know whether your vendor is continuously adding new sites in an attempt to represent the full scope of the labor market, or limiting collection to a stable subset of reliable and representative sites.
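The de-duplication process mentioned above can be sketched with an exact-match fingerprint: normalize each posting's text and hash it, keeping only the first copy seen. This is an assumption-laden simplification; vendors use far more robust near-duplicate detection, and a reworded posting defeats an exact hash:

```python
import hashlib

def fingerprint(posting_text):
    """Crude de-duplication key: collapse whitespace and case, then hash."""
    normalized = " ".join(posting_text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def dedupe(postings):
    """Keep the first posting seen for each fingerprint."""
    seen, unique = set(), []
    for p in postings:
        fp = fingerprint(p)
        if fp not in seen:
            seen.add(fp)
            unique.append(p)
    return unique

ads = ["RN needed, Boston area", "RN  needed,  Boston area", "Registered Nurse, Boston"]
print(len(dedupe(ads)))  # 2: the reworded third ad survives exact-match de-duplication
```

The surviving reworded ad illustrates why the report calls de-duplication across differently worded postings "difficult or impossible."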

There may not be a single version of a real-time database that can be used for all purposes, and maintaining multiple versions may prove prohibitively expensive. Expanding spidering to cover all large corporate sites, large national boards, and smaller regional and occupational niche boards may be the only way to get representative counts across all local areas and all occupations. However, narrowing the number of sites to a stable but representative group, so that a time series can be created, produces data that tracks more closely with other external measures (the Help Wanted Online database is the best example of this approach). For analysis of the skills and other requirements of a particular class of job, it might prove useful to create a database fed only by high-quality corporate, national, and niche job boards. "High quality" in this context means boards whose postings are extensive and have some or all of the information fielded (e.g., education or experience requirements are always found in the same place). Free posting sites like Craigslist or Snagajob would likely be excluded. This constrained-source database would not be used to gauge absolute volume, but it may be very useful in showing proportions: skill mix, education requirement distributions, experience requirement distributions, etc.
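The proportions idea above can be sketched as a distribution over a reliably fielded variable. The sample records and field names below are hypothetical, standing in for postings drawn from curated boards where education is consistently fielded:

```python
from collections import Counter

def education_distribution(postings):
    """Share of postings by stated education level (postings without one are excluded)."""
    counts = Counter(p["education"] for p in postings if p.get("education"))
    total = sum(counts.values())
    return {level: count / total for level, count in counts.items()}

# Hypothetical postings from curated boards where education is reliably fielded
sample = [
    {"title": "RN", "education": "Associate"},
    {"title": "Software Engineer", "education": "Bachelor"},
    {"title": "Data Analyst", "education": "Bachelor"},
    {"title": "Cashier"},  # no stated education; excluded from the distribution
]
print(education_distribution(sample))  # Associate ~0.33, Bachelor ~0.67
```

Because only proportions are reported, the absolute posting volume of the constrained database never enters the analysis, matching the report's caveat.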

What Issues Must Be Addressed When Publishing Postings Data?

Data Labeling: Results should be labeled "online job postings," "job ad volume," "online job ads," or something similar, to accurately reflect the data being analyzed. The data does not represent job openings or vacancy rates. Incorrect uses of these terms (i.e., confusing job ads with job openings) have already been seen in media stories on various published data series.

Data Accuracy: Data made available to the public should come only from data fields with a reasonable measure of accuracy. Data fields that cannot be verified against some standardized measure, whether due to lack of information within the postings or due to error in the analysis software, should not be published. Continuing standardization will be needed for ongoing research; job posting data is fluid, and continued analysis will be required.

Buyer Beware: When publishing information based on online job postings, you must also present data limitations, caveats, potential error rates, etc. Some end users may choose to use the data inappropriately anyway, but the same holds true for any published LMI (which is frequently misquoted and used out of context). We recommend that only data with minimal caveats or disclaimers be released to the public. We also recommend that high-level policy makers use online postings data in their decision making only in combination with traditional LMI. Although a great deal of analytical and procedural work remains, the information extracted from online job postings is worthwhile if used appropriately.

Realities and Limitations of the Data

(It is important to note that there was not complete consensus among all of the Consortium members on all of the points on this page and the next, or on the value of online job postings in the decision making process. However, these two pages represent the best aggregation that everyone could live with.)

DATA FIELD: CURRENT LEVEL OF CONFIDENCE

Occupation: Generally two-digit, depending on job family.
Geography: State level.
Industries: Unknown.
Skills: Currently of limited utility, and only with analyst review; shows promise, but needs more research with an analyst looking at context and job family.
Job Title: Yes.
Firm Name: Sometimes, but questionable due to the many variations of a firm's name; requires an analyst to vet.
Education: Sometimes, but interpretation and context are important (a degree might be required or might be preferred). Also, many online postings for jobs with clear degree requirements (e.g., lawyers, nurses) will not include any stated requirement, leaving the degree level implied. Caveat: a low percentage of postings list this field.
Certification: When listed, it is generally valid, but it still needs analyst review, given the low percentage of listings.
Greenness: Sometimes; should be used only for research purposes, with analyst review for interpretation and context. Analysis based on skill words generally produces much more accurate results than coding by occupation. The primary problem is the rate of false positives (calling a job green when it is not).
Timeframe: Better to use quarterly data or a three-month moving average to smooth out spikes; monthly is possible (calendar and 30-day timeframes), depending on the report.
Job Source: Yes.

WHAT THE DATA CAN DO:
- Can show which websites post job ads
- Can show which firms post job ads
- Can describe data at the state level or other large geographic areas
- Can display accurate information for most two-digit occupational codes
- Can compare job titles with other variables
- Can use postings on a quarterly or, sometimes, monthly basis, depending on the report; it is usually better to use quarterly data or a three-month moving average to smooth out spikes
- Can improve quality by selectively removing bad sources of data, using Quality Assurance & Data workgroups
- Can show educational levels matched to job titles, especially in high-skill jobs, when education is listed or inferred and with an analyst reviewing interpretation and context
- Can provide information on certifications within certain occupations, when posted and with analyst review
- Can look at skills and certifications together for a more complete picture, with an analyst reviewing, especially for context and job family
- May be able to query by firm name (good for green jobs, comparative analysis, and profiling online job ad posters)
- May be able to get an up/down trend on a known list of skills
- May be able to show increases or decreases in job titles
- Can show job titles in demand
- Can show top-volume O*NET codes

WHAT THE DATA CANNOT DO:
- Cannot equate a job posting with a job vacancy
- Cannot eliminate all duplicates
- Cannot capture the universe of job openings
- Cannot code all jobs to six-digit occupational codes
- Cannot identify industries
- Cannot know the negative biases of anything not posted or parsed
- Cannot use skills without vetting
- Cannot project job postings yet, although modeling shows promise out to six months
- Cannot provide consistent representation of postings from month to month
- Cannot show salary or benefits
- Cannot publish zip code or any other substate-level data
- Cannot determine educational needs in most postings
- Cannot always determine required vs. preferred educational level (where listed)

Section 2: OVERVIEW OF REAL TIME DATA

With employers increasingly turning to the Internet to advertise opportunities, online job postings have become the primary vehicle for many job seekers. There have been mixed responses to this development. Some argue that, given the size of today's applicant pool, firms have become ever more selective when sorting through résumés, prompting rumors of discrimination against the unemployed.1 Others contend that online intermediaries alleviate labor market imperfections, often associated with imperfect information and adverse selection.2

Labor market analysts, workforce professionals, and researchers are increasingly in need of diverse and robust sources of data in order to be more responsive to the changing economy. In recent years, real-time data, most commonly in the form of online job postings, has become a staple for some labor market information (LMI) staff, and its use is expected to increase in the years ahead. For example, the Projections Managing Partnership, which coordinates the nationwide employment projections, has stated that the new projections interface will likely include data from online postings alongside standard information such as total employment and new and replacement job growth.3

In this changing environment it is critical for researchers, policy makers, and other data users to have a broad understanding of what real-time data is, how it is obtained, and what it can and cannot accomplish, as well as how it relates to more commonly used sources of data, or traditional labor market information. This section provides an overview of the analytical uses of online job postings, outlines the methodology used by the Northeast Consortium's data provider (Burning Glass Technologies), and briefly discusses the relationship between online postings and traditional sources of labor market information.
1 "The Help-Wanted Sign Comes With a Frustrating Asterisk," New York Times ( wanted ads exclude the long term jobless.html?_r=1&smid=fb nytimes&wt.mc_id=bu SM E FB SM LIN HWA NYT NA&WT.mc_ev=click)
2 "The Economics of Labor Market Intermediation," VOX Policy Research ( crisis debate.com/index.php?q=node/2500)
3 Presentation by Ms. Alexandra Hall, Director, Office of Government, Policy and Public Relations, Colorado Dept. of Labor & Employment

The Promise and the Pitfalls

Traditional LMI includes systematically collected data from administrative records, such as the Quarterly Census of Employment and Wages (QCEW), or from surveys, such as the Occupational Employment Statistics (OES) wage survey. This methodologically sound data is both valid and reliable; however, there is a significant time lag between when the data is collected and when it is released for analytical purposes. Online job postings can be compiled into a database and analyzed as a proxy for recent demand in the labor market. In contrast to traditional LMI, online job postings ("real-time data" and "online job postings" are used interchangeably in this report) offer a more timely data source for analysis.

As previously stated, online job postings were designed for recruitment, not analysis; therefore, postings databases lack the validity and reliability of traditional LMI. They nonetheless have the potential to provide an innovative tool for analyzing labor demand, especially in areas such as the green economy, where many of the occupations and their accompanying knowledge, skills, and credentials are still emerging. Aggregated job postings databases include many variables that are absent from traditional data sources, such as skills, education requirements, industry-based certifications, and employee benefits associated with job postings. The promise of the data has resulted in increased interest in its use for analysis. The potential benefits of job postings analysis include: 1) unique features that position postings as a resource for curriculum development; 2) incorporation into models for more accurate short-term projections; and 3) a source of emerging skills that have not yet made it into the standardized knowledge, skills, and abilities taxonomies produced by O*NET. These benefits and others have driven the increased interest in and use of postings in recent years.
Job postings databases compiled from various Internet sources are not without limitations. While the diversity of sites used as data sources is an asset, it also creates problems not found in traditional labor market data. Quality varies widely across postings. Postings often do not include all of the skills, knowledge, or credentials required for a position (and it is impossible to determine what proportion of the requirements is included). In addition, postings are a proxy for demand, but they do not represent actual hiring. While not all postings are for open positions, it is also true that not all open

positions are posted on the Internet. Word of mouth, referrals, and even signs in the window continue to be preferred recruitment methods for some industries and firms. There are also occupations for which recruitment rarely occurs online. The number of jobs posted online is higher in fields such as computer science and management, and lower in fields like construction and farming. Many of these factors give the postings a generally upscale bias in terms of the types of positions listed. That upscale bias may actually be a benefit when using real-time data to help guide curriculum development, because many of the positions not recruited online are either low-skill or require only on-the-job training.

Data Field Specifics

What the Data Can Do

Several variables in the online postings are of value for data users and consumers.

Unit of time: The job postings data used by the Consortium includes a variable for the date of the posting. Data quality reviews concluded that the minimum unit of time for analysis is one month, because of the variability of postings on a daily or weekly basis, but that it is generally better to use at least quarterly or three-month moving average data. The use of a moving average is common for highly volatile data series, such as unemployment claims data. As noted above, the more granular the other elements of analysis are (e.g., occupations, levels of geography, skills), the longer the time period of analysis should be.

Geography: States and other large geographic areas are the best geographic units of analysis. As described in the limitations section, the location information available in the postings is not as specific as it may seem. Zip-code-level data was deemed unreliable and misleading.

Job title: Job titles from postings are useful variables for analysis. Titles can be compared over time to determine whether some are increasing or decreasing in frequency. They are also valuable when used in conjunction with other variables (e.g., education, where available). Titles may offer a window into emerging occupations and, with further analysis, emerging skills.

Firm name: The name of the firm is not available for all postings (because it is not listed on some sites), but it can be analyzed where it is included. Useful information that can be gained from firm-name analysis includes the types of postings (occupations and job titles) listed by specific firms, seasonal hiring patterns, and a better understanding of which firms post jobs online.
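The title-frequency comparison described above can be sketched as counting titles per period and reporting the change. The quarterly title lists below are hypothetical stand-ins for titles extracted from the postings database:

```python
from collections import Counter

def title_trend(titles_q1, titles_q2):
    """Change in job-title frequency between two quarters; positive means increasing."""
    c1, c2 = Counter(titles_q1), Counter(titles_q2)
    return {t: c2[t] - c1[t] for t in set(c1) | set(c2)}

# Hypothetical extracted titles for two consecutive quarters
q1 = ["data analyst", "data analyst", "web developer"]
q2 = ["data analyst", "data analyst", "data analyst", "solar installer"]
trend = title_trend(q1, q2)
print(trend["data analyst"])   # 1
print(trend["web developer"])  # -1
```

Titles that newly appear (here, "solar installer") are exactly the kind of signal the report suggests may point to emerging occupations.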

Skills and certifications: Skills are not clearly represented in the postings, and certifications are included in only a limited group of occupations. "Skills," in this context, is shorthand for a mix of qualifications and job-duty statements that might more broadly be thought of as skills, abilities, theoretical and applied knowledge, common tasks, and types of experience. Terms and abbreviations can have more than one meaning; for example, "P.T." might mean Physical Therapist or Part Time. Postings frequently do not specify qualifications, apparently on the assumption that those who are qualified already know them. For example, lawyers know they need to pass the Bar, and nurses know they need a degree of some kind and must be licensed. Furthermore, the data analysis showed that the skills field often functions as a catch-all category into which skills, certifications, occupations, and qualifications were parsed. Specific skills and knowledge areas, such as knowledge of a computer language, are relatively easy to identify, as are basic skills such as communication. Other terms do not parse as neatly. This area requires a great deal more research. Further research into extracting and categorizing skills and certifications should bear fruit, because online postings are the only real source of this data. As noted earlier, one of the key efforts is to increase the ability of analytical software to understand context.

Source domain: The website from which a job posting was captured can be a useful analytical tool. It can point to websites that generate disproportionate errors in the analytical software because of how the job information is formatted or how the site displays search terms and related jobs. It is potentially useful in generating stable time series databases. However, analysts need to be aware of shifts in domains that are unrelated to real changes in the data stream.
Tracking successor and predecessor domains is a critical component of being able to track real changes, rather than simply a company's decision to redesign its website. This successor/predecessor issue should be familiar to any analyst who has tried to clean up wage record or QCEW files. Analysts also need to be cautious about spikes in data from particular domains. Increases or decreases may be the result of changes in spidering, such as the addition of new websites or a broken spider. Such shifts may change the relative importance of a site in terms of producing "master" jobs (the original job against which duplicates are measured).
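The domain-spike caution above can be sketched as a simple ratio check over per-domain monthly counts. The domain names, counts, and the 2x threshold are all hypothetical; the point is flagging candidates for analyst review, not automated exclusion:

```python
def flag_spikes(domain_counts, threshold=2.0):
    """Flag source domains whose latest monthly count jumped or collapsed.

    domain_counts maps domain -> list of monthly posting counts. A ratio above
    `threshold` (or below 1/threshold) may reflect a spidering change, such as
    a newly added site or a broken spider, rather than real demand shifts.
    """
    flagged = []
    for domain, counts in domain_counts.items():
        prev, last = counts[-2], counts[-1]
        if prev > 0 and (last / prev >= threshold or last / prev <= 1 / threshold):
            flagged.append(domain)
    return flagged

# Hypothetical per-domain monthly posting counts
history = {
    "bigjobboard.example": [900, 950, 940],
    "newlyadded.example": [50, 60, 300],  # likely a spidering change
}
print(flag_spikes(history))
```

Flagged domains would then be checked against known spidering changes (added sites, successor domains) before the series is used for time series analysis.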

What the Data Cannot Do

In addition to the types of analysis that can be conducted with online job postings data, there are a number of analyses that are not advisable at this time.

Location: Exact job locations are difficult to pinpoint. Recruitment often takes place regionally, and job postings commonly list the largest city near the work location, such as "the Boston area." Unless the posting includes a precise address for the company (and even this is not always where the new employee would be working), the general location is at best a proxy. The location-related variables (e.g., city, county, zip code, latitude and longitude) should be used with caution. Geographies below the metro-area level are likely to contain significant overcounts or undercounts, depending on where the analysis software decided to locate a job that lists only a broad recruitment area.

Salary: Most online job postings do not include salary information. When a salary is listed, it can be presented in many different forms (e.g., annually, monthly, weekly, or hourly). Postings with some salary information may include dollar signs or other abbreviations that cause inconsistency when parsing the information into variables. For all of these reasons, this data field is not considered usable.

Occupational coding: Occupational coding beyond the two-digit SOC level is problematic. Two factors influence the ability to analyze data by job family (two-digit SOC). First, job postings do not, as a general rule, include occupational codes. Thus, the accuracy of occupational coding depends on the data provider's ability to accurately classify a job title into an occupational taxonomy. Within the data file used by the Consortium, the overall accuracy varied by type of job. For example, computer and mathematics jobs were usually coded correctly, but farming, fishing, and forestry jobs were rarely coded correctly.
The second factor influencing job family analysis is that the jobs posted online do not represent the universe of all occupations. Even if correctly coded into a taxonomy, some job families (e.g., construction) have limited representation, while others (e.g., information technology and computer-related occupations) have substantial representation in the postings. The extent of the under-representation can be estimated for job families that have a tight link to a three-digit NAICS code. For instance, NAICS 722 (Food Services and Drinking Places) is tightly linked to SOC family 35 (Food Preparation and Serving Related Occupations). Using separations and new-hire data from Census LED, one could estimate the proportion of turnover that would be expected in that industry compared to all others. That estimate of turnover could then be compared to the proportion of job postings found in job family 35 to gauge the extent of the likely under-representation.
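The turnover comparison above reduces to a single ratio. The shares below are hypothetical illustrations, not actual Census LED or postings figures:

```python
def representation_ratio(posting_share, expected_turnover_share):
    """Compare a job family's share of online postings with its expected share
    of hires. Values well below 1.0 suggest the family is under-represented
    online. Both inputs are shares of their respective totals.
    """
    return posting_share / expected_turnover_share

# Hypothetical: food prep (SOC 35) is 3% of postings but 12% of expected hires
ratio = representation_ratio(0.03, 0.12)
print(round(ratio, 2))  # 0.25: postings capture roughly a quarter of expected demand
```

A ratio near 1.0 for a job family would suggest online postings roughly mirror its real hiring activity; construction-type families would be expected to fall well below that.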

Qualifications: Qualifications for job applicants, such as education, can be difficult to determine. This manifests itself in a number of ways: most postings do not include the desired level of education, and when they do, it can be unclear whether it is a required or a preferred qualification. In addition, basic prerequisites (e.g., passing the Bar exam for an attorney) that are an industry or occupational standard are rarely listed, because it is assumed that qualified applicants already know the requirement. As noted above, the full skill set required for each posting is typically not included.

Industries: Online job postings are just that: jobs. Industry information, and thus the ability to accurately assign a NAICS code, is rarely included in the postings. Frequently the posting is spidered from a site other than the employer's, and a firm name is not listed.

Burning Glass Methodology

The Consortium's job postings data was provided by Burning Glass Technologies, a Boston-based firm that aggregates online postings and offers a variety of products and services related to job matching. Postings are collected daily from thousands of private and government job boards and websites, newspapers and other media outlets, corporate job boards and websites, and community employment sites. They are collected through spidering technology, which crawls the web for suitable content. The text of each posting is then analyzed using natural-language-based artificial intelligence, which allows context, not simply a list of rules or lookup tables, to be taken into account when parsing the text into variables. A dataset of over 60 variables is produced, ranging from the most basic (job title, company name, city) to more complex concepts (skills and credentials). For the Northeast Consortium project, a taxonomy of green skills was developed to serve as a reference, so that when the parsing technology encounters a term representing a green skill it can be standardized in the database.

Job Postings Data and Traditional LMI

Limited analysis has been done comparing trends in online job posting data to traditional labor market information. The Job Openings and Labor Turnover Survey (JOLTS) is not identical, but it is a helpful source of comparison. The two data sources differ in size, although the gap has changed over time. In September 2004, there were over 4.1 million total private job openings from JOLTS, versus about 1.3 million total online postings from Burning Glass; by summer 2010 this gap had narrowed to about three million openings for JOLTS and two million online postings for Burning Glass. This was likely due to a number of factors, such as increased use of online posting by firms and an increase in the sites spidered by Burning Glass. More importantly, each series follows the same broad trends over time. And although job postings data do track closely with the Current Employment Statistics (CES), there is insufficient evidence to suggest that the postings data is a leading indicator.

Section 3: DEFINING GREEN

Despite the recent momentum behind green jobs, there is still no universally accepted definition of green. A number of scholars have explored the workings of the green economy, but they all face the challenge of defining and quantifying this nebulous concept. In an effort to advance the conceptualization of the green economy, the Consortium chose to deviate from the traditional survey-based approach and concentrate instead on analysis of online job postings. Data mining a universe of job ads gives researchers not only a time advantage, but also greater flexibility with respect to defining green. It also gives them the ability to modify the framework for what constitutes green (i.e., clusters, search triggers, etc.). More importantly, this method allows us to track the greening of the economy more efficiently than using green revenue as a proxy. 4 Despite its strengths, this approach also has limitations. The Consortium had intended to conduct an analysis of robust data; however, much of the time and focus was spent on improving the artificial intelligence parser and on the ongoing identification of green occupations and green skills. Initially it was believed that an industry-based (NAICS) approach would yield the most effective outcomes. It quickly became evident that not all jobs within an industry, or even a job family, could be considered green and, most importantly, industry coding proved to be one of the most error-prone variables in those rare instances where it was included in an online job posting. Therefore, a two-tiered approach was established: first, identification of green skills phrases, and second, creation of a green firms list. The foundation of this methodology is the green taxonomy, which is primarily derived from O*NET's descriptions of green occupations.
In cooperation with our vendor, we closely examined each occupation's description and, in turn, built and continuously updated a database of nearly 900 key phrases pertaining to green. 5 In effect, this is the basis of real time green demand analysis: the prevalence of these tasks and phrases in real time data serves as a gauge of demand for green jobs. More specifically, postings that are validated through this taxonomy are flagged as green, signaling that certain green skills were listed in the job posting. Consider O*NET's description for Construction Managers, for example: "Apply green building strategies to reduce energy costs or minimize carbon output or other sources of harm to the environment." Working from the above occupation, the following strings would all be included in the taxonomy to serve as trigger words in identifying green jobs: green, building, minimize carbon output, environment. The taxonomy can be easily improved with the addition of like terms and, similarly, the elimination of those deemed obsolete.

4 To proxy green jobs, some earlier studies used firms' green revenue as a share of total revenue. The accuracy of such estimates is unsettled.
5 The Appendix contains the full taxonomy and clusters.
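The trigger-word mechanism described above amounts to scanning each posting's text for taxonomy phrases. A minimal sketch follows; the trigger strings are the four drawn from the Construction Managers example, and everything else is illustrative.

```python
# Trigger strings from the Construction Managers O*NET description above;
# the full taxonomy holds nearly 900 such phrases.
GREEN_TRIGGERS = {"green", "building", "minimize carbon output", "environment"}

def flag_green(posting_text):
    """Flag a posting as green if any taxonomy phrase appears in its text."""
    text = posting_text.lower()
    matched = sorted(t for t in GREEN_TRIGGERS if t in text)
    return bool(matched), matched

is_green, hits = flag_green(
    "Seeking a construction manager to apply green building strategies "
    "and minimize carbon output on commercial projects."
)
print(is_green, hits)
```

Note that naive substring matching of a broad trigger like "green" would also fire on, say, "Greenwich" or "evergreen," which previews the context-sensitivity and false-positive problems discussed later in this section.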

The green jobs report developed by the Consortium shows the volume of online job ads containing terms from the green skills taxonomy. The report shows a point-in-time snapshot of the top green skills for the specified timeframe and geography. LMI users should be aware that due to changes in parsing software, changes to the green skills list, and fluctuations in the number of websites spidered, there are too many inconsistencies to conduct a reliable time series analysis of individual skills. While the first approach provides a proxy for green labor demand via select keywords, the second tier was expected to give us parity with part of the BLS definition: green goods-producing and service-providing establishments. Postings from establishments that only produce green goods or provide green services are indisputably green, so it follows that such analysis would shed light on recent green developments, such as the skills and certifications demanded by the green business community. By matching a Consortium-wide list of green firms 6 to real time data, we had hoped to reveal postings that may not have been triggered by our taxonomy, but may still be considered green since they are derived from a green employer. In practice, however, this method has proven to be very challenging. Agreeing on which establishments produce only green goods and services is problematic, but accurately matching establishment names to those appearing in real time data is simply not feasible at this time. This is largely due to the nature of online job postings (explained in the next section), whose format and content can vary substantially. First, roughly half of the postings contain no employer information. Among the ads in which the employer name is present, slight discrepancies in spelling account for the remaining incongruity. For instance, in real time data, company XYZ may take the form of XYZ, XYZ Inc., XYZ Corp., XYZ Ltd., etc., making the matching process not viable.
One possible solution would be to add a handful of variations to each establishment name, along with the official name as it appears in tax records. For these reasons, the advancement of this approach was not pursued as initially planned. As with most empirical work, we faced broader methodological limitations in defining green. Our reliance on algorithms to sort through the complexity of online job postings was the largest source of concern. As previously stated, postings are crafted for recruitment purposes and not for analysis, and thereby often omit information of interest to researchers. Even when the algorithms are on target, the context of each posting is perhaps the most delicate factor in the process, as many key words are acutely context-sensitive; that is, some terms have different meanings depending on the occupation to which they are applied. For example, reuse can apply to software reuse, workforce reuse, or water reuse, guaranteeing skewed green posting counts. Context also plays a factor when the parser cannot distinguish between a web site's page navigation, advertisements, and other source code, and the text of a job posting. When a green skill is pulled from a source outside the actual job posting text, it creates a false positive, leading to an overcounting of green jobs. Also, it is wrong to assume that the distribution of green job postings mirrors the patterns of the real green economy; as previously explained, the analysis clearly shows that online job postings are not representative of all economic sectors.

6 See Green Firm Identification.
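One way to reduce the XYZ / XYZ Inc. / XYZ Corp. mismatches described above is to normalize employer names before matching. This is a partial remedy at best, and the suffix list below is an illustrative assumption rather than an exhaustive one:

```python
import re

# Common corporate suffixes to strip; an assumption-driven, incomplete list.
SUFFIXES = r"\b(incorporated|inc|corp|corporation|co|ltd|llc|lp)\b\.?"

def normalize_employer(name):
    """Lowercase, strip corporate suffixes and punctuation for fuzzy matching."""
    name = name.lower()
    name = re.sub(SUFFIXES, "", name)
    name = re.sub(r"[^\w\s]", " ", name)  # drop stray punctuation
    return " ".join(name.split())         # collapse whitespace

variants = ["XYZ", "XYZ Inc.", "XYZ Corp.", "XYZ, Ltd."]
print({normalize_employer(v) for v in variants})  # all collapse to one form
```

Even with normalization, misspellings, abbreviations, and missing employer fields would still defeat a large share of matches, which is consistent with the Consortium's decision not to pursue this approach further.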

With the current occupational and industrial coding systems there is no perfect method for identifying green jobs. More specifically, green crosses the boundaries of today's job classification systems, making the green economy difficult to measure. Three different methods have attempted to overcome the difficulties: 1) counting the green revenue share and apportioning employment via the given ratio; 2) directly surveying employers and asking them to list green jobs; and 3) using real time job ads to proxy green labor demand. The Consortium attempted to fill the void left by previous studies, which could offer only a retrospective look at the green economy, and we believe our methodology will enable researchers to track the gradual greening of the economy more precisely than other measures. While attempting to overcome the rigid nature of existing studies, our approach is not error-free, and we have learned that real time data requires further refinement and standardization to be a credible tool. With further improvement in artificial intelligence, analysis of the green economy through the lens of real time data should eventually yield more robust findings.

Section 4: THE DATA ITSELF

There have been mixed responses to the increasing role of the Internet in recruitment. Some argue that given the size of today's applicant pool, firms have become ever more selective when sorting through résumés, prompting rumors of discrimination against the unemployed. 7 Others contend that online intermediaries alleviate labor market imperfections, often associated with imperfect information and adverse selection. 8 The prevalent use of the Internet in job search has greatly expanded the geographic and skill scope available to employers, but, more importantly, the explosive growth in online job postings holds great potential for the research and LMI communities as a cutting-edge tool to examine labor market conditions with less lag time than standard LMI data. As this approach continues to gain traction, end users need a better understanding of the underlying data. One goal of this Consortium was to fill that void by providing an insider view of real time data. At present, the Consortium appears to be the first to critically evaluate the robustness of real time data. This section outlines critical issues of which analysts must be aware prior to working with real time data.

7 "The Help-Wanted Sign Comes With a Frustrating Asterisk," NYT.
8 "The Economics of Labor Market Intermediation," VOX Policy Research.

Nature of Real Time (Not Traditional LMI) Data

It is important to highlight our ability to closely examine the data in raw form, a practice not well embraced by the industry: most vendors provide preset, aggregated job postings and related analytics, which prevent gaining deeper insight into the subject matter. To mine the data, the Northeast Consortium partnered with Burning Glass Technologies (BG), a Boston-based firm that develops technological methods of matching résumés to job postings. The data collection process relies on spiders that crawl the web, aggregating data from job boards, employer sites, government agencies, and newspapers. The goal of this process is to gather a comprehensive jobs database. The data is then locally stored, parsed, and coded into over 60 variables, such as location, employer name, and occupational and industry codes. Due to data quality, only a handful of variables have been found to be suitable for analysis at this time, though there is room for optimism about expanding the number of reliable variables, given the considerable improvements witnessed in our short time of working with real time data. 9

To understand the complexity of real time data, one need look no further than the data deluge that has accompanied the Internet boom. Online content is growing exponentially, with only a fraction of it being verifiable, valid, and usable material. The same can be argued about online job postings. The limitations of real time data are largely a consequence of: (1) how the postings are originally crafted, and (2) the effectiveness of the artificial intelligence, or parser, in properly coding the data. Unlike working with a static, homogeneous database structure, the postings are aggregated from over 16,000 non-standardized sources. For example, Craigslist, a free classifieds service, is a significant source of postings volume, but it also creates a substantial share of data problems.
Unfortunately, artificial intelligence cannot yet effectively distinguish between a job, a personal ad, or spam. Whatever ends up on a job board is spidered, parsed, and becomes part of the real time database. This highlights a major difference between real time data and other job measures. Real time data depends on how firms and recruiters craft their ads, and postings do not necessarily translate into a hire or represent a vacancy. In other words, postings are designed for recruitment and not for research and analysis; employers do not have to reveal much information to receive a flood of applications. In fact, it is not uncommon for key variables, such as the exact location, salary, and even the employer's name, to be omitted. While some variables are often omitted in job postings, this does not negate the value of those variables when they are present. For example, quality control reviews found that educational requirements were specifically identified in just one third of total postings. Among those postings that did not specify education, the educational requirement could be inferred for nearly 20 percent.

9 See Appendices for a summary of variables, their definitions, and quality control results.
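Measuring how often a field is actually populated, as the quality control reviews did for education, is a straightforward coverage calculation. A sketch with toy data (the field names and records are illustrative assumptions):

```python
def field_coverage(postings, field):
    """Share of postings in which a given parsed field is present (non-empty)."""
    present = sum(1 for p in postings if p.get(field))
    return present / len(postings)

# Toy sample; the report's review found education specified in about
# one third of real postings.
sample = [
    {"job_title": "Nurse", "education": "Bachelor's"},
    {"job_title": "Cook", "education": None},
    {"job_title": "Analyst", "education": "Master's"},
    {"job_title": "Driver", "education": None},
    {"job_title": "Clerk", "education": None},
    {"job_title": "Engineer", "education": "Bachelor's"},
]
print(field_coverage(sample, "education"))
```

Running such a coverage check per field is one way an analyst can decide which of the 60+ variables are reliable enough to use.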

Postings without educational requirements were generally for occupations with common educational levels for qualification, i.e., professional or technical occupations in health care, computer science, finance, or law. When an education level is specific to a position, it is often an employer requirement, and thus the employer is more likely to include that information in a job posting. On the other hand, if education is an occupational requirement, as frequently established for State-issued licenses, it is not uncommon for the educational level to be omitted, as qualified applicants are presumed to have already met the requirement. This observation revealed an interesting dynamic in online job postings. The 2010 American Community Survey estimates show that about 37 percent of the region's population aged 25 to 64 has a bachelor's degree or higher. 10 In a sample of 8,000 job postings for the entire region, among those that specified an educational level, just over 70 percent required a baccalaureate or higher. 11 This measure indicates an upscale bias in online job postings, meaning that jobs with lower skill requirements are less likely to be found online than jobs requiring higher levels of educational preparation. Analysis of zip code level and other sub-state locations has indicated a significant issue. While value was found in the available educational requirement information, geographic location questions were not so easily resolved. Some job postings are designed for national recruitment, while others are not. By the same token, some postings are only advertised internally or by word of mouth. With specific location data commonly omitted from ads, the parser infers a zip code for the location of the job, sometimes using the firm headquarters, the nearest metropolitan area, or even the job board's physical location. For example, a posting for a job in Westchester, New York might be advertised on multiple job boards as being in the greater New York area.
The parser infers a zip code identification for New York City, since that is the only geographic area mentioned, but it is incorrect. This common lack of detailed geographic information brings into question the usefulness of geographic specificity below the State level, or perhaps below the level of large metropolitan areas. Unlike traditional survey-based methods, real time data is not subject to standard sampling errors, but it is influenced by factors other than true labor demand. In essence, it is extremely difficult to control exogenous factors, such as the full extent of removal of duplicates (de-duplication), job board under- or over-coverage, and spidering processes, among other issues. For instance, by plotting daily

10 American Community Survey 1-Year Estimates, Table B23006: Educational Attainment by Employment Status for the Population 25 to 64 Years. Data summed for the eight-state Consortium region.
11 Estimated from a sample of 8,000 postings from the period 8/1/10 to 2/5/11 from all eight Consortium states.

postings, it becomes evident that the spiders are inactive on some days while spiking on others. This volatility, along with an inability to adjust for seasonality, makes one month the minimum period for conducting time series analyses, with a three-month period frequently being preferred. More problematic are the approximately 16,000 sites from which the data are sourced. In some cases, the spider that gathers data from these sites cannot differentiate between job posting text and the unique HTML code and detailed meta-information 12 contained in a web page's source code. If page code and metadata are parsed into the real time database, they can undermine the parser's coding accuracy. Close examination of the data has shown that a single month may have as many records of parsed page code and metadata as the three previous months combined. As a result, the data has had to be continuously monitored and the parser updated with unique patches to address quality issues. Creating good real time data requires substantial and continued investment in human analysts who will not only monitor the system for quality, but also be involved in the efforts needed to improve the analytical software and provide context for the words and phrases found in the job ads.

12 Meta-information, or metadata, describes other data, such as the length of a document, its author, or the date it was created. Web page HTML source code can include page navigation, text from other advertisements, page headers and footers, etc.
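The daily-volume volatility described above can be surfaced by flagging days whose counts deviate sharply from the overall average. A rough sketch; the thresholds are arbitrary illustrations, not values from the report:

```python
from statistics import mean

def flag_anomalous_days(daily_counts, low_factor=0.25, high_factor=3.0):
    """Flag days whose posting count is far below or above the overall mean.

    Threshold factors are illustrative assumptions, not report parameters.
    """
    avg = mean(daily_counts)
    return [
        (day, count)
        for day, count in enumerate(daily_counts)
        if count < avg * low_factor or count > avg * high_factor
    ]

# Toy daily posting counts: day 2 is nearly idle (spider inactive),
# day 5 spikes (e.g., a burst of parsed page code or a scraping run).
counts = [1000, 1100, 10, 950, 1050, 9000, 1000]
print(flag_anomalous_days(counts))
```

A check like this catches both the idle-spider days and the metadata-driven spikes that the text says require continuous monitoring.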

Duplicates and De-Duplication

Another major challenge in producing reliable analysis is the prevalence of job scraping, an industry-wide practice that consists of copying postings from corporate client sites, state job banks, and other sources to other job boards. Job postings are produced to attract applicants, and some are written by employers to solicit a specific response or to test the labor market. Duplication is beneficial for employers, as expanded market coverage increases the odds of reaching the talent pool, further reinforcing the point that real time data does not directly track job openings or vacancies. Thus, the interests of job seekers, employers, and job boards are usually in conflict with those of an analyst. This highlights the need for a standardized methodology to identify duplicates and to de-duplicate the data. Industry research suggests that most vacancies are filled within two to three months; therefore, postings that re-appear within a 60-day period are flagged as duplicates by our vendor. (Other vendors may use a different time period, and analysts should always query the vendor about the specifics of the de-duplication methodology.) This window is subject to debate, however, as those estimates were developed prior to the recent recession. The Great Recession created an employer's market, with more job seekers than job openings. Further delaying the hiring process is the gap between candidates' skills and experience and those desired by firms, leading recruiters to woo candidates from competitors. The 60-day window is a subtle element, and to date the Consortium has not produced significant analyses on the role of time frames in producing more robust data. Generally, duplicates are believed to complicate the reconciliation of total job count reports and, in turn, cause fluctuations in the distribution of key variables until 60 days after the original posting.
To further distinguish duplicates from unique postings, the de-duplication algorithm relies on a number of select variables (namely, job title, employer, city, state, and skills) and the presence of employer information. The standard de-duplication algorithm used by our vendor is able to identify and remove a share of the duplicates based on defined criteria, such as a recurring job title posted by the same employer name. But it still takes an experienced human eye to verify the extent of duplicates in the data, which is costly. De-duplication concerns are often de-emphasized because the algorithm relies on a script with a strict set of rules that do not always pick up small subtleties in the postings. One cannot fully rule out the possibility of a duplicate even if the content inside the posting has been modified. Analyses of the rates at which select postings and/or occupations reappear may provide a useful gauge for determining the difficulty in filling certain vacancies. Even after being filtered, the dataset may still contain some percentage of duplicates, primarily due to slight variations in either the job title or the employer name, such as Vanguard, Vanguard Inc., or Vanguard, Inc. This variation can be

attributed to how online job postings are uploaded, but is more likely the result of postings being sourced from thousands of different sites. Each job board is unique to the parsing technology, and some sites are parsed with metadata 13 while others are not. This presents a substantial qualitative difference, as metadata often alters the parsed variables, making analysis and de-duplication efforts ineffective. Lastly, it is worth mentioning the incidence of postings from large chains and big-box stores. Similar to staffing firms, some companies aggressively manipulate the online job market arena, at times posting to essentially every location they operate in a given geography. For example, an opening for a cook at a national chain restaurant tends to be scraped across all of the chain's locations in the vicinity. Unless cooks at all these locations come and go concurrently, as the real time data would suggest, there will be an over-count of cooks in the data. This potential over-count needs to be balanced against a likely under-count for low-skill jobs, because they are less frequently posted on the Internet. Duplication remains a great challenge in interpreting real time data. As long as the marginal cost of posting jobs on the Internet keeps diminishing, duplicates will continue to complicate matters for researchers. However, it should be noted that duplicates impact different types of analysis in different ways. Pure demand analysis is most impacted by duplicates because they will inflate the job counts. Skill analysis is likely to be less impacted because that analysis usually focuses on the distribution and proportion of skill words and phrases rather than the absolute level.

13 Metadata is used to describe a page's content and information. In our context, this is a spidering issue rather than a parsing one. Issues arise when header information that is not part of the job text itself is spidered and parsed into real time variables.
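The de-duplication logic described in this section — a match key built from select variables, compared within a 60-day window — can be sketched as follows. The key fields come from the text above; the data and the decision to refresh the window on every sighting are illustrative assumptions.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=60)  # the vendor's re-appearance window, per the text

def dedup_key(posting):
    """Match key from the variables named above (title, employer, city, state)."""
    return tuple(posting[f].strip().lower() for f in ("title", "employer", "city", "state"))

def mark_duplicates(postings):
    """Flag postings whose key re-appears within 60 days of an earlier posting."""
    last_seen = {}
    flags = []
    for p in sorted(postings, key=lambda p: p["date"]):
        key = dedup_key(p)
        is_dup = key in last_seen and p["date"] - last_seen[key] <= WINDOW
        flags.append((p["id"], is_dup))
        last_seen[key] = p["date"]  # refresh the window on each sighting
    return flags

postings = [
    {"id": 1, "title": "Cook", "employer": "Chain Diner", "city": "Albany", "state": "NY", "date": date(2011, 1, 5)},
    {"id": 2, "title": "Cook", "employer": "Chain Diner", "city": "Albany", "state": "NY", "date": date(2011, 2, 10)},
    {"id": 3, "title": "Cook", "employer": "Chain Diner", "city": "Albany", "state": "NY", "date": date(2011, 6, 1)},
]
print(mark_duplicates(postings))
```

Note how an exact-match key is defeated by the slight employer-name variations discussed above (Vanguard vs. Vanguard Inc.), which is why a residual share of duplicates survives filtering.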

What We've Learned About Online Job Ads

Reality did not meet expectations: Based on published data from other vendors, we expected to have data that covered the universe of occupations and industries and would be useful at sub-state levels. This turned out not to be true. Job ads posted online are prevalent in some occupational areas and non-existent in others. The postings are usually vague in geographic location details and frequently do not include information we had hoped to analyze, such as salary, educational requirements, experience requirements, and required skills.

There is still a great deal of analytical and procedural work to be done: Initially, we expected to obtain a complete and standardized set of data, but this turned out to be more bleeding edge than most of us thought it would be. At the present time, the online data still requires intensive (costly and time-consuming) review by knowledgeable analysts. We are only part way up the learning curve in developing a standardized methodology for extracting, analyzing, and publishing reliable volumes of real time data.

The information extracted from the postings data is still a useful tool, if used appropriately: This information is particularly indicative of rapidly changing trends in technology when used in point-in-time contexts. It could also be a useful tool for employment counselors, identifying employers with high posting levels for select job titles. Some occupational areas are well represented in the data, offering good insight into the market for those fields with much less lag time than traditional labor market information.

Identification of green jobs and skills appears to be possible: Some green skill phrases and specific certifications (such as LEED) were successfully extracted from postings data where they existed. But it should be noted that job posting analysis, as well as green jobs and green skills identification, are in their infancy.
This work still requires a great deal of refinement and further analysis. Improvements also depend on employers continuing to deem green practices beneficial to their business and to seek employees with green skills.

Green job requirements are still difficult to isolate: As with any emerging sector, green key words and phrases and their related job postings change quickly. Many key words are acutely context-sensitive, having multiple meanings when taken at face value. Other key words may be common in job postings but may or may not represent a green job duty when placed in context. The good news is that the current ability to identify some green skills in job postings indicates that, with continued investment, mining green skills and certifications should be possible.

There is value in the current product: While the data did not meet the original expectation of being a flawless, finished product, several job families were found to have consistently reliable data, enough to be published with a reasonable level of confidence. State-level data also had a measure of accuracy, enough to be published with confidence. The data now has value as a point-in-time measure, but not as a time-over-time measure.

What We've Learned About Working With Vendors

Having a good vendor, and a good relationship with that vendor, is critical: Data collection, parsing, and coding are time-consuming and costly processes. For the data to be useful, the vendor must be able to continuously improve the spidering process, de-duplicate data sets, and adjust coding. The vendor needs to conduct some measure of data quality review. Two-way communication is also critical in the quality analysis process. For data improvements to take place, the vendor must be willing to accept and act on feedback from analysts. At this point in the Consortium's work, continuous quality analysis is still needed to monitor data quality.

When selecting a vendor, know what you are getting: When contracting with a vendor for job postings data collection and processing, understand what you are getting. Decide what you desire and expect from a vendor, and determine whether the vendor will indeed be able to meet your needs. Some questions you might ask:

Will there be a single point of contact for the project? Will the vendor provide timelines and remain in contact with the customer to resolve issues?
What de-duplication process is used, including the time frame? Are data available both with and without duplicates?
What quality control processes are in place? Does the vendor check for data spikes or other anomalies? How much quality analysis will you have to provide back to the vendor?
Is there a standardized data set for consistency over time? Will the vendor accept and act to correct submitted data errors?
What process is used to code data, such as occupational or geographic area coding? Will the vendor customize inferred data for the customer's use?
Does the vendor supply data sets directly, or must a third party be involved (which may cause a lag in obtaining data)?

Conclusion

As exciting as real time data developments may seem, we have yet to realize their full potential. Analysts must recognize that the data is not produced for analysis and fails to meet basic standards applied to most other labor market data. It is, however, a tool that, with ongoing refinements and longer time series, can complement areas of labor market research and perhaps advance public policy discussion. The Consortium has been able to expose numerous issues that should motivate greater interest in this under-researched field. Technological advancements should mitigate some of the problems with duplicates and coding, but future efforts will inevitably be affected by the nature of this data. Real time data requires sustained monitoring in order to alleviate quality concerns, but future parser refinements should allow for richer analysis. Given the challenges of reducing statistical noise, real time data remains very much experimental. Real time data reports should not stand alone, but should be reviewed by experienced LMI analysts and combined with traditional labor market information. It is the responsibility of analysts to use their expertise in local area markets to ensure that any reports based on real time data are appropriately presented in conjunction with other labor market data.

APPENDIX I: GLOSSARY

Canon: A variable field that is the result of standardizing raw postings data. Different entries in a variable field which represent the same entity are combined into one specific, standardized entry. E.g., G.E. and General Electric represent the same company and would have the same Canon Employer title. This is done to improve the efficiency of the algorithm and is still a work in progress.

De-duplication: A process by which copies of job ads that have been identified as duplicates are removed from the dataset of collected postings, in an attempt to have each unique job ad represented only once. It is important to note that it may not be possible to remove all duplicate job ads.

Green: The Consortium embraces O*NET's identification of green; rather than a job being exclusively green or non-green, a gradual greening of jobs is observed. The general use is based on the keywords approach, with those skills that are identified as having green properties being indicative of the greenness of a job. A more thorough description of the competing green definitions can be found in an adjoining Consortium report, Methodological Appendix: Northeast Research Consortium Green Definition and Identifying Green Jobs.

Job Posting: The actual content of an advertisement for a potential job opportunity that appears on an online site. It is important to note that a job posting does not equate to a job vacancy.

Parser: An artificial intelligence tool that analyzes a job posting's content and separates it into variable fields for the dataset. The use of artificial intelligence allows for continual improvement in field accuracy through a learning process.

Real Time Data: The universe of online job postings at a specific point in time.
It offers potential benefits not offered by traditional Labor Market Information (LMI), but has significant caveats to be considered, as detailed in other reports, especially this Guide to Using Real Time Data for LMI Analysts.

Spider: A computer program that automatically retrieves web pages. Spiders are used to feed web pages to search engines. The term spider is synonymous with web crawler, because spiders crawl the web.

Dataset Variables

Canon Certification: Standardized variable for any certifications found in the Certification field; e.g., CDL becomes Commercial Driver's License.

Canon City: Standardized variable for the city field, used in the event of an alias for a city; e.g., Anderson Acres becomes Reno. When posting information is scarce, this field sometimes erroneously reports the corporate headquarters or even the job board's location.

Canon County: Standardized variable for the county associated with the job posting.

Canon Employer: Standardized form of the employer name extracted from the job posting.

Canon Intermediary: Standardized version of any hiring intermediaries.

Canon Job Title: Standardized version of the job title extracted from the parsed job title information in the job posting.

Canon Job Type: Standardized version of the JobType field.

Canon Skill: Collection of the standardized skills pulled from the job posting. This field is strictly derived from a given taxonomy; many of the skills may be subjective (e.g., Facebook, Microsoft, etc.).

Canon Skill Clusters: Standardized variable for skill cluster names that are applicable to the job posting.

Canon State: Standardized variable for the State referenced in the job posting.

Canon Min Years Of Experience: Standardized variable for the computed minimum experience, in months.

Canon Years of Experience Level: Standardized variable grouping the years of experience taken from the other fields into three ranges: low (0 to 1 years), mid (1 to 6 years), and high (6 or more years).

Certification: Any certifications that are found in and extracted from the online job posting.

Clean Job Title: An intermediary job title field that is meant to remove extraneous text and/or noise from the field.

Consolidated Inferred NAICS: The NAICS code for the online job posting, inferred from the employer information and any existing NAICS code related to that employer.
When the background information relating to the company is used, the inference may or may not produce multiple NAICS codes.

Consolidated Degree: Degree qualifications as detailed in the job posting (e.g., Bachelor's, Master's, etc.).
Consolidated O*NET: The O*NET code of the job posting, as determined by the Autocoder.
Green O*NET: Binary variable indicating whether the occupation found in the job posting matches one that the O*NET definitions mark as green.
Green O*NET Type: A listing of any applicable green O*NET subcategories, if the occupation code found is considered green by O*NET, e.g., green enhanced demand, green new and emerging, green enhanced skills, etc.
Intermediary: Identifier that indicates the presence of a recruiter or staffing intermediary in the posting.
Is Green: Identifier that specifies whether any skills in the skills variable fall into the green skills category; if so, this field reports True/1.
Is Duplicate: Identifier of whether the job posting is considered a duplicate of an older ad, based on a comparison against other job postings in the database within a 60-day window.
Is Duplicate Of: Identifier of the JobID of the earlier posting that an identified duplicate matches.
Job Date: Date the posting was acquired and added to the database.
Job Domain: Domain name of the website where the job posting was acquired.
Job ID: A unique identifier generated for an online job posting.
Job Type: Identifier of the type of job, whether permanent, contractual, or temporary.
LMA: Identifier of the Labor Market Area, as defined by the 2010 LMA Directory from the Bureau of Labor Statistics.
Major: Identifier of a specific advanced degree, if it is requested in the job posting.
MSA: Identifier of the Metropolitan Statistical Area, as defined by the Office of Management and Budget 2009 MSA lookup.
O*NET Salary Local Annual: Annual salary data for the O*NET code referenced in the dataset, at the local level.
O*NET Salary Local Hourly: Hourly salary data for the O*NET code referenced in the dataset, at the local level.

O*NET Salary National Annual: Annual salary data for the O*NET code referenced in the dataset, at the national level.
O*NET Salary National Hourly: Hourly salary data for the O*NET code referenced in the dataset, at the national level.
Raw Description: The raw text file extracted from the website, containing all the information from which the software produces entries for each particular job posting.
Root Title: The most basic form of the job title in question; e.g., Nurse Practitioner would have Nurse as the root title.
Salary: The salary, as extracted (when available) from the job posting, and used to standardize the other salary fields in the database for the job.
Source ID: Identifier of the specific source of the job posting.
Years Of Experience: The number of years of experience, as listed in the job posting.
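One way to picture the record layout these variables describe is as a single structured object per posting. The sketch below covers only a handful of the fields; the Python names and types are illustrative assumptions, not the vendor's actual schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class JobPosting:
    """A few of the dataset variables described above (illustrative only)."""
    job_id: str                        # Job ID: unique identifier for the posting
    job_date: str                      # Job Date: date acquired and added to the database
    canon_job_title: str               # standardized job title
    canon_employer: Optional[str]      # None when the employer is undisclosed
    canon_state: Optional[str]         # standardized state
    canon_skills: List[str] = field(default_factory=list)  # standardized skills
    is_duplicate: bool = False         # duplicate of an older ad within a 60-day window
    is_duplicate_of: Optional[str] = None  # Job ID of the matched earlier posting

posting = JobPosting("1", "2012-03-01", "Registered Nurse", None, "ME")
```

Optional fields such as the employer name are typed as nullable because, as the quality-control appendix notes, many postings simply omit them.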

APPENDIX II: Summary Results of the Second Quality Control

Introduction

Due to size constraints, previous quality control efforts have been omitted. This round of quality control looked at 450 green postings and 900 non-green postings in order to assess the accuracy of several variables of interest, such as greenness, skills, NAICS, job family, and source of data. A stratified random sample was drawn, with postings stratified by state and by source. The postings were restricted to the eight Northeastern states and were drawn to represent the respective proportions of the five online posting sources given below:

(1) Employer job boards
(2) Job boards
(3) Recruiters
(4) Free job boards
(5) Unknown

Summary findings: There are limitations imposed by how firms and recruiters post job openings; e.g., the employer name is not disclosed in some postings, the education requirements for some job openings are not stated, etc. This absence reduces the usefulness of those fields for analytical purposes. Despite these limitations, findings indicated that the parser showed considerable improvement in this second round of quality control over the first round. The degree of accuracy achieved for some fields leads us to suggest the data will be useful for select research purposes and reporting. We categorized fields into three groups based on their rate of accuracy. Fields with an accuracy rate of 80 percent or more were deemed useful for reporting. Fields with accuracy between 70 and 80 percent were deemed usable for research and reporting, but with some caution. Fields with an accuracy rate of less than 70 percent were considered not yet ready for reporting. These cut-offs have no scientific basis, but are based mainly on our understanding of the limitations of the postings data. It is also important to note the differences in the evaluation process by State representatives: some chose to recognize Microsoft Office as a skill, while others did not.
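The three-way categorization above amounts to a simple threshold rule. A minimal sketch, using the report's 80 and 70 percent cut-offs (the function name is ours, not the report's):

```python
def reporting_readiness(accuracy_pct):
    """Categorize a field by its accuracy rate, per the report's cut-offs."""
    if accuracy_pct >= 80:
        return "useful for reporting"
    if accuracy_pct >= 70:
        return "research and reporting, with some caution"
    return "not yet ready for reporting"

# e.g., the state field (93%) vs. the green identifier (69%)
print(reporting_readiness(93))  # useful for reporting
print(reporting_readiness(69))  # not yet ready for reporting
```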

According to the above-mentioned categorization, we considered the state, city, occupation title, 2-digit occupation code, and limited skills to be ready for research and reporting. Any research that uses detailed occupation codes (of 3 digits or more) should be reported with caution. Industry code, employer name, and education requirement are not ready for reporting. The green posting identifier, with an accuracy rate of 69 percent, falls below the reporting threshold. Yet this approach has the highest accuracy rate when compared to other existing non-survey-based approaches. While we strove to achieve an accuracy rate of at least 80 percent, we believe that the green field can be used for analysis.

Statistical Guidelines

The purpose of this review was to arrive at the accuracy of the entire postings database (the population) using a random sample. It is important to discuss how the estimated accuracy rate for the sample translates to the population. The population accuracy rate is given as a range around the sample accuracy rate. The size of that range depends on the sample size and the expected level of confidence. The table below provides the 95% confidence interval for different accuracy rates, for sample sizes of 800 and of 400 postings. This means that the true population accuracy is the sample accuracy ± the population correction, with 95% confidence. The larger the sample, the smaller the range of the population accuracy rate. The discussion below provides only the sample accuracy rate, but the reader should bear in mind that the true population accuracy rate lies within a narrow band of the sample accuracy rate.

Table 1: Correction for the Population Accuracy by Sample Size and Sample Accuracy

Sample Accuracy (%) | Population correction (± error), sample size=400 | Population correction (± error), sample size=800
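Table 1's correction values did not survive transcription, but for a proportion measured on a random sample they are presumably the standard normal-approximation margin of error, z * sqrt(p(1-p)/n) with z = 1.96 at 95% confidence. A sketch under that assumption:

```python
import math

def population_correction(sample_accuracy, n, z=1.96):
    """95% normal-approximation margin of error, in percentage points,
    for a sample accuracy rate measured on n postings."""
    p = sample_accuracy / 100.0
    return 100 * z * math.sqrt(p * (1 - p) / n)

# A sample accuracy of 80% on 400 postings carries roughly twice the
# uncertainty band of the same accuracy on 800 postings... scaled by sqrt(2).
for n in (400, 800):
    print(n, round(population_correction(80, n), 1))
```

Run as written, this reproduces the pattern the text describes: the 800-posting band (about ±2.8 points at 80% accuracy) is narrower than the 400-posting band (about ±3.9 points).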

Quality of Non-green Postings

Out of the 800 non-green postings, 12 were either not a job opening or were outside of the US. Table 2 summarizes the results from the 800 non-green postings. The table provides four data points for each of the key fields in the postings data: the number of postings with the field correctly identified; the number of postings with the field incorrectly identified; the number of postings where the particular information is not available in the job text, and thus the field is missing; and, lastly, the number of postings that had some entry in the field whose accuracy cannot be verified from the job text. The rate of accuracy is reported as a share of all 800 postings and also as a share of all except those where the information is missing in the job text. Though one can argue that 788 (= 800 - 12 postings) should be used as the base, the reported parser accuracy rate uses all 800 postings as the base.

The accuracy of the occupation titles was in the upper 80s. This partly explains the 81 percent accuracy rate of the 2-digit occupation code (ConsolidatedOnet). The 6-digit occupation code has an accuracy level of 73 percent. This is a considerable improvement from before. The parser is able to identify the state of the posting 93 percent of the time, and the city is correctly identified 80 percent of the time. This again is an improvement that increases the reliability of the data.

The accuracy level of the employer name is of concern. The employer name has an overall accuracy rate of 55 percent. However, this number increases to 75 percent when the postings that do not provide an employer name are removed before calculating the ratio. The low accuracy rate for the employer name is partly due to how employers and recruiters post job openings online. Given this limitation, the maximum accuracy that the parser can achieve will be around 80 percent.
Also, it is not surprising that the accuracy of the industry code (InferredNAICS) remains low (50%-60%), given the low accuracy in identifying the employer name. However, there are a few instances in which an industry code is flagged using the description of the company even when the employer name is not available. Education requirements (MinDegreeLevel) are available in only a little more than half of the postings. When available in the posting, the parser seems to identify the education requirement accurately 85 percent of the time.
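The two accuracy bases used throughout these tables can be sketched as follows. The counts here are hypothetical, chosen only to reproduce the employer-name example of 55 percent overall and 75 percent after excluding postings that omit the field:

```python
def accuracy_rates(correct, not_available, total=800):
    """Accuracy as a share of all postings, and as a share of only those
    postings where the field actually appears in the job text."""
    overall = correct / total
    excluding_missing = correct / (total - not_available)
    return overall, excluding_missing

# Hypothetical counts: 440 correct employer names, 213 postings with no name given
overall, excluding = accuracy_rates(440, 213)
print(round(overall, 2), round(excluding, 2))  # 0.55 0.75
```

The gap between the two rates is largest for fields that postings frequently omit, such as employer name and education requirement.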

Table 2: Summary results of quality control of 800 non-green postings

Rate of accuracy, excluding postings where the information is not available in the job text:

Canontitle: 89%
Cleantitle: 87%
CanonEmployer: 75%
Canoncity: 80%
CanonState: 93%
InferredNAICS: 76% (2-digit), 67% (3-digit), 65% (4-digit)
ConsolidatedOnet: 81% (2-digit), 77% (3-digit), 73% (6-digit)
MinDegreeLevel: 85%
Certification: 46%
YearsofExperience: 88%
CanonSkills: 86% of skill phrases correctly identified; 7% captured incorrectly

The accuracy rate of the skills field is reported using two metrics. The first is the average share of skill phrases correctly captured by the parser; the second is the average share of skill phrases incorrectly captured. The average share of skill phrases correctly captured is 86 percent, and the likelihood of incorrect skill phrases is very low, at seven percent. 14

Observations indicate differences in the degree of detail state representatives applied when evaluating the skills field. There were differences in the amount of detail/attention given to the field from one state to another. Where one representative would carefully examine the job text so as to include all skill phrases, both general (e.g., communication, Microsoft, etc.) and job-specific, another would place the emphasis only on the more job-specific skills. This is especially true when reviewing green postings, where the focus appeared to be exclusively on green skills. The total number of skill phrases is derived by summing the counts of what the states mark as correctly captured skill phrases and skill phrases not captured. Differences in state-reported not-captured skill phrases translate into differences in what we expect to be the total number of skill phrases in a posting. These differences would impact the estimates of the share of skill phrases detected and not detected by the parser. Thus, the calculations for the skills field should be considered rough estimates that provide some guidance as to the level of reliability. A more standardized review of the skills field is required to obtain a reliable estimate of the accuracy of the parser.

Quality of Green Postings

Out of the 400 green postings, seven postings were in Canada and three were identified as not being a posting. Table 3 summarizes the results from reviewing the quality of the 400 green postings. The findings are similar to what we saw in non-green postings.
The variable of principal interest is the IsGreen field, which flags the posting as green or not based on the green skill phrases identified in the posting. These 400 postings flagged as green were reviewed to assess the accuracy of the parser in identifying green skills and labeling those postings as green. Summary results indicate that 69 percent of the time, the parser was able to correctly identify green postings. The remaining 31 percent of postings were false positives. These false positives were mainly a result of the parser capturing green-specific phrases out of context in the job description or posting (like LEED certified building from the company description or the metadata of the website, as an analysis by Connecticut pointed out) or of phrases that are not unique to green postings (such as recycling and water treatment). This accuracy rate is somewhat higher than what we initially expected and, as pointed out by the Connecticut analysis, there is hope that the false positives can be reduced further by some quick fixes.

14 Note that the two shares (correct and incorrect share of skills) do not add up to 100, as the denominators used are different. The share of correctly captured skill phrases uses the total number of expected skill phrases in the denominator. The share of incorrectly captured skill phrases uses the total number of skill phrases captured by the parser.
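The two denominators described in footnote 14 can be made concrete with a small sketch. The counts below are hypothetical, picked only to land near the reported 86 and 7 percent:

```python
def skill_shares(correct, not_captured, incorrect):
    """Correct share uses the expected skill phrases (the reviewer's count)
    as the denominator; incorrect share uses the phrases the parser captured.
    The two shares therefore need not sum to 100 percent."""
    expected = correct + not_captured   # skill phrases actually in the job text
    captured = correct + incorrect      # skill phrases the parser emitted
    return correct / expected, incorrect / captured

# Hypothetical posting: 43 phrases captured correctly, 7 missed, 3 spurious
correct_share, incorrect_share = skill_shares(43, 7, 3)
print(round(correct_share, 2), round(incorrect_share, 2))  # 0.86 0.07
```

Because the "not captured" count depends on how thoroughly a reviewer reads the job text, reviewer-to-reviewer variation moves the correct-share denominator, which is exactly the instability the text warns about.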

Table 3: Summary results of quality control of 400 green postings

Rate of accuracy, excluding postings where the information is not available in the job text:

Canontitle: 87%
Cleantitle: 91%
CanonEmployer: 74%
Canoncity: 85%
CanonState: 92%
InferredNAICS: 77% (2-digit), 60% (3-digit), 58% (4-digit)
ConsolidatedOnet: 78% (2-digit), 74% (3-digit), 71% (6-digit)
MinDegreeLevel: 91%
Certification: 50%
YearsofExperience: 87%
Clusters: 75%
Green: 74%
IsGreen: 69%
CanonSkills: 97% of skill phrases correctly identified; 5% captured incorrectly

Table 4, below, summarizes the number of green postings that state reviewers identified to be truly green out of the 50 Burning Glass-flagged green postings that each State reviewed. Note that the State here mainly identifies the person who reviewed the 50 postings, rather than whether the postings were from that particular State. As can be seen from the State breakdown, there is the possibility that a degree of personal variation is introduced into all results, but especially into the green-related fields, when identifying what is a green job.

Table 4: Accuracy of 'IsGreen' field by Reviewed State, reporting the counts of correctly and incorrectly flagged postings and the resulting accuracy rate for each reviewing state (GU, CT, ME, NH, NY, RI, VT, NJ) and the total/average.

Accuracy by the Source of the Posting

Table 5 reports the accuracy of each key field by the source of the online posting. The sources Free job boards and Unknown have the lowest accuracy rates in most fields. These sources mainly fail at identifying the employer name and the industry code. The employer name is less likely to be identified when the posting is captured from a recruiter site. However, as recruiters provide a description of the company, the parser was able to tag the industry code for some of the postings from recruiter sites. The ability of the parser to capture skill phrases does vary by source, but the variation is much less than what we had expected, ranging between 79 and 91 percent by source of the postings.

Table 5: Parser Accuracy by Source of Posting

Sources: Employer posted (Source ID=1,4,5); Job board (Source ID=2,6); Free job board (Source ID=201,202); Recruiter (Source ID=3,7,8); Unknown (Source ID=0). Accuracy rates below are listed in the order Employer posted / Job board / Free job board / Recruiter / Unknown:

Canontitle: 92% / 88% / 89% / 89% / 76%
Cleantitle: 88% / 86% / 89% / 89% / 72%
CanonEmployer: 81% / 56% / 11% / 29% / 45%
Canoncity: 77% / 80% / 85% / 83% / 62%
CanonState: 91% / 92% / 98% / 95% / 69%
InferredNAICS 2-digit: 84% / 55% / 26% / 65% / 45%
InferredNAICS 4-digit: 78% / 44% / 22% / 56% / 38%
ConsolidatedOnet 2-digit: 72% / 81% / 87% / 88% / 55%
ConsolidatedOnet 6-digit: 63% / 75% / 74% / 84% / 38%
MinDegreeLevel: 54% / 49% / 28% / 37% / 28%
Certification: 12% / 8% / 2% / 5% / 3%
YearsofExperience: 59% / 53% / 15% / 49% / 31%
Share of skills correctly identified: 91% / 87% / 82% / 79% / 79%
Share of skills captured incorrectly: 5% / 8% / 5% / 6% / 4%


Anatomy of a Decision

Anatomy of a Decision [email protected] @BlueHillBoston 617.624.3600 Anatomy of a Decision BI Platform vs. Tool: Choosing Birst Over Tableau for Enterprise Business Intelligence Needs What You Need To Know The demand

More information

Methodological Issues for Interdisciplinary Research

Methodological Issues for Interdisciplinary Research J. T. M. Miller, Department of Philosophy, University of Durham 1 Methodological Issues for Interdisciplinary Research Much of the apparent difficulty of interdisciplinary research stems from the nature

More information

10426: Large Scale Project Accounting Data Migration in E-Business Suite

10426: Large Scale Project Accounting Data Migration in E-Business Suite 10426: Large Scale Project Accounting Data Migration in E-Business Suite Objective of this Paper Large engineering, procurement and construction firms leveraging Oracle Project Accounting cannot withstand

More information

A Management Report. Prepared by:

A Management Report. Prepared by: A Management Report 7 STEPS to INCREASE the RETURN on YOUR BUSINESS DEVELOPMENT INVESTMENT & INCREASE REVENUES THROUGH IMPROVED ANALYSIS and SALES MANAGEMENT Prepared by: 2014 Integrated Management Services

More information

ORACLE ENTERPRISE DATA QUALITY PRODUCT FAMILY

ORACLE ENTERPRISE DATA QUALITY PRODUCT FAMILY ORACLE ENTERPRISE DATA QUALITY PRODUCT FAMILY The Oracle Enterprise Data Quality family of products helps organizations achieve maximum value from their business critical applications by delivering fit

More information

Building a Strategic Workforce Planning Capability at the U.S. Census Bureau 1

Building a Strategic Workforce Planning Capability at the U.S. Census Bureau 1 Building a Strategic Workforce Planning Capability at the U.S. Census Bureau 1 Joanne Crane, Sally Obenski, and Jonathan Basirico, U.S. Census Bureau, and Colleen Woodard, Federal Technology Services,

More information

Building a Database to Predict Customer Needs

Building a Database to Predict Customer Needs INFORMATION TECHNOLOGY TopicalNet, Inc (formerly Continuum Software, Inc.) Building a Database to Predict Customer Needs Since the early 1990s, organizations have used data warehouses and data-mining tools

More information

Solvency II Data audit report guidance. March 2012

Solvency II Data audit report guidance. March 2012 Solvency II Data audit report guidance March 2012 Contents Page Introduction Purpose of the Data Audit Report 3 Report Format and Submission 3 Ownership and Independence 4 Scope and Content Scope of the

More information

Direct Marketing of Insurance. Integration of Marketing, Pricing and Underwriting

Direct Marketing of Insurance. Integration of Marketing, Pricing and Underwriting Direct Marketing of Insurance Integration of Marketing, Pricing and Underwriting As insurers move to direct distribution and database marketing, new approaches to the business, integrating the marketing,

More information

730 Yale Avenue Swarthmore, PA 19081 www.raabassociatesinc.com [email protected]

730 Yale Avenue Swarthmore, PA 19081 www.raabassociatesinc.com info@raabassociatesinc.com Lead Scoring: Five Steps to Getting Started 730 Yale Avenue Swarthmore, PA 19081 www.raabassociatesinc.com [email protected] Introduction Lead scoring applies mathematical formulas to rank potential

More information

Predicting the Stock Market with News Articles

Predicting the Stock Market with News Articles Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is

More information

US Behavior Analyst Workforce: Understanding the National Demand for Behavior Analysts

US Behavior Analyst Workforce: Understanding the National Demand for Behavior Analysts US Behavior Analyst Workforce: Understanding the National Demand for Behavior Analysts Produced by Burning Glass Technologies on behalf of the Behavior Analyst Certification Board. Electronic and/or paper

More information

Removing Web Spam Links from Search Engine Results

Removing Web Spam Links from Search Engine Results Removing Web Spam Links from Search Engine Results Manuel EGELE [email protected], 1 Overview Search Engine Optimization and definition of web spam Motivation Approach Inferring importance of features

More information

Occupational Demand/ Program Supply Analysis using Web Sources

Occupational Demand/ Program Supply Analysis using Web Sources Occupational Demand/ Program Supply Analysis using Web Sources MCCA Student Success Summit September 19, 2013 Institutional Research Dept Washtenaw Community College I. Online Data Sources II. Application

More information

Measurement Information Model

Measurement Information Model mcgarry02.qxd 9/7/01 1:27 PM Page 13 2 Information Model This chapter describes one of the fundamental measurement concepts of Practical Software, the Information Model. The Information Model provides

More information

Position Classification Flysheet for Logistics Management Series, GS-0346

Position Classification Flysheet for Logistics Management Series, GS-0346 Position Classification Flysheet for Logistics Management Series, GS-0346 Table of Contents SERIES DEFINITION... 2 SERIES COVERAGE... 2 EXCLUSIONS... 4 DISTINGUISHING BETWEEN LOGISTICS MANAGEMENT AND OTHER

More information

2007 Denver Regional Workforce Gap Analysis. New Picture Here (this is a placeholder)

2007 Denver Regional Workforce Gap Analysis. New Picture Here (this is a placeholder) 2007 Denver Regional Workforce Gap Analysis New Picture Here (this is a placeholder) September 14, 2007 ABOUT DEVELOPMENT RESEARCH PARTNERS Development Research Partners specializes in economic research

More information

How To Choose the Right Vendor Information you need to select the IT Security Testing vendor that is right for you.

How To Choose the Right Vendor Information you need to select the IT Security Testing vendor that is right for you. Information you need to select the IT Security Testing vendor that is right for you. Netragard, Inc Main: 617-934- 0269 Email: [email protected] Website: http://www.netragard.com Blog: http://pentest.netragard.com

More information

Spam Testing Methodology Opus One, Inc. March, 2007

Spam Testing Methodology Opus One, Inc. March, 2007 Spam Testing Methodology Opus One, Inc. March, 2007 This document describes Opus One s testing methodology for anti-spam products. This methodology has been used, largely unchanged, for four tests published

More information

Franchise Success Statistics and Factors:

Franchise Success Statistics and Factors: Franchise Success Statistics and Factors: Are Franchised Businesses More Successful Than Independent Businesses? What Information Should Individuals Rely on Before Buying a Franchise? Industry Claims US

More information

Billions of dollars are spent every year

Billions of dollars are spent every year Forecasting Practice Sales Quota Accuracy and Forecasting MARK BLESSINGTON PREVIEW Sales-forecasting authority Mark Blessington examines an often overlooked topic in this field: the efficacy of different

More information

Errors in Operational Spreadsheets: A Review of the State of the Art

Errors in Operational Spreadsheets: A Review of the State of the Art Errors in Operational Spreadsheets: A Review of the State of the Art Stephen G. Powell Tuck School of Business Dartmouth College [email protected] Kenneth R. Baker Tuck School of Business Dartmouth College

More information

Performing a data mining tool evaluation

Performing a data mining tool evaluation Performing a data mining tool evaluation Start with a framework for your evaluation Data mining helps you make better decisions that lead to significant and concrete results, such as increased revenue

More information

User Stories Applied

User Stories Applied User Stories Applied for Agile Software Development Mike Cohn Boston San Francisco New York Toronto Montreal London Munich Paris Madrid Capetown Sydney Tokyo Singapore Mexico City Chapter 2 Writing Stories

More information

SalesStaff White Paper Collection. All Leads Are Not Created Equal: Why Lead Quality Matters

SalesStaff White Paper Collection. All Leads Are Not Created Equal: Why Lead Quality Matters SalesStaff White Paper Collection All Leads Are Not Created Equal: Why Lead Quality Matters 1 Lead generation is not simply a game of producing as many leads as possible. That s because not all leads are

More information

Contribution of S ESOPs to participants retirement security

Contribution of S ESOPs to participants retirement security Contribution of S ESOPs to participants retirement security Prepared for the Employee-Owned S Corporations of America March 2015 Executive summary Since 1998, S corporations have been permitted to maintain

More information