DETECCIÓN DE MIEMBROS CLAVE EN UNA COMUNIDAD VIRTUAL DE PRÁCTICA MEDIANTE ANÁLISIS DE REDES SOCIALES Y MINERÍA DE DATOS AVANZADA


UNIVERSIDAD DE CHILE
FACULTAD DE CIENCIAS FÍSICAS Y MATEMÁTICAS
DEPARTAMENTO DE INGENIERÍA INDUSTRIAL

DETECCIÓN DE MIEMBROS CLAVE EN UNA COMUNIDAD VIRTUAL DE PRÁCTICA MEDIANTE ANÁLISIS DE REDES SOCIALES Y MINERÍA DE DATOS AVANZADA
(Key-member detection in a virtual community of practice through social network analysis and advanced data mining)

Thesis submitted for the degree of Magíster en Gestión de Operaciones
Dissertation submitted for the professional title of Ingeniero Civil Industrial

HÉCTOR IGNACIO ÁLVAREZ GÓMEZ

Advisor: SEBASTIÁN A. RÍOS PÉREZ
Committee members: FELIPE I. AGUILERA VALENZUELA, GASTÓN A. L'HUILLIER CHAPARRO, LUIS A. GUERRERO BLANCO

SANTIAGO, CHILE
OCTOBER 2010

This thesis is dedicated to everyone who believes that things can be achieved through willpower. Nothing is impossible; it is only more or less complicated. Always look ahead, because there will always be a solution. And if you cannot find it, rest assured that there will always be a friendly hand ready to help.

Acknowledgements

It is always hard to begin the acknowledgements, especially when there are so many people I appreciate who should be mentioned here. I will try to keep this as orderly as possible so that it is easy to read.

First, I want to thank my parents for their constant support and affection, for always being there when I needed them, and for giving me the space I needed to step back a little and move forward in leaps. Without their great patience, none of what has happened to me would have been possible.

To my brother Héctor Matías, who helped me develop my thesis without realizing it. Questions that seem so simple to answer when they are asked, but whose answers are not trivial in reality, helped me resolve pending points in my work. I think you have a talent there that you should polish, since it is a fundamental part of being an engineer. I am very proud of how you are progressing in your degree; keep up that discipline and commitment, and you will go far.

Now I would like to thank Andrea, my girlfriend and now fiancée, for those pushes given at just the right moments, and for the scoldings that shook me up and made me move forward. Thank you for the motivation you gave me, especially in the final stretch; were it not for that, I might still be stuck in my work. Thank you for being the complement I was always looking for, and for all the companionship and support you have given me during these 5 years, which will turn into many more. I love you very much.

To my committee, thank you very much for your support and for putting up with the changes made during the work; thank you for reacting so well to them. To Sebastián Ríos, my advisor, for introducing me to an area that is as interesting as it is current. This work opened up a world that I could not get to know during my time at the school because it was not covered in any course. The desire to go deeper into the subject is there, so I hope not to stray too far from the area.

To Felipe Aguilera, my co-advisor, thank you for your comments during the preliminary presentations, for helping me in the initial data collection phase, and for giving me access to the community that was the cornerstone of my work. To Gastón L'Huillier, fellow student and research colleague, thank you very much for your contribution both to the publications we wrote together and to day-to-day life in the office. I will never forget your good spirits and positive attitude. Finally, to Luis Guerrero, thank you very much for your patience and for responding to everything I asked of you at the end of the thesis. Thank you also for the angle of your questions, which made me see the work from another perspective.

During my time at the University, I have met many people who in one way or another are part of who I am today. Even if they do not read this, I would like to thank professors Raúl Uribe, Jaime Gonzales, Juan Paulo Wiff, Francisco Santamaría, Eduardo Olguín and Juan Velásquez, who appeared at key moments in my studies. Their way of being makes them professors I will not easily forget. Thank you very much for your teachings.

To my friends from freshman year, companions in epic battles (academic and non-academic) and marathon study sessions. We went through a lot while we were together in Plan Común. Although our paths parted later, I still remember you with great affection, and I hope that we never lose touch and that you never lack rock. Special mention to Murgas and la Viole: always stay true to your paths. I am very proud of you both.

I cannot forget all those who were in the DOCODE room, my office for this whole year. It was very pleasant to work there; I doubt I will find such a pleasant environment anywhere else. To Edumerlo, my partner in social network work: I am very proud of your work on your dissertation, you deserve it. Gastón, who was always in the room, making a key almost unnecessary. Pato Moya, who gave the room a different twist (it is always good to work with people from other programs). Gabriel, Gerardo and Felipe, you are next; you have already proven yourselves with your previous work, so this final phase will be nothing more than a formality. To all of you, thank you very much.

On the academic side, I must thank Vicente for trusting me from the beginning to be the teaching assistant for his course. It has been very rewarding to work during these 4 years as the teaching assistant for Conta, which led to other teaching positions. To Fernando Ordoñez, for the faith he had in Álvaro and me when he accepted us as his teaching assistants; I hope we have been up to the task during these three semesters. And finally, to Richard Weber, for his confidence in choosing me as a lecturer. It is a very important step for me, and I am enormously grateful for the opportunity. To Julie Lagos, the pillar of the MGO, thank you very much for your efforts and infinite patience, without which the deadlines could not have been met.

Finally, my thanks to all the people I have come to know at this university: classmates, fellow master's students, lunch companions, boletineros, seishines, the assistants I have worked with and fellow teaching assistants. To my students, both in the auxiliary classes and in lectures: even if it does not show, believe me that I keep you very much in mind, and class after class I try to contribute a little more than explaining a couple of formulas on a slide or a whiteboard; anyone can do that. Always remember that behind the numbers there are people, and that empathy and teamwork are not just things that go on a résumé; they must be part of life itself.

A great chapter in my life closes today and, as in life itself, a new one automatically opens with even greater challenges. I hope that all those I hold dear will still be there, giving me the valuable support that has helped me overcome every challenge I have set myself. You are all great.

Héctor I. Álvarez Gómez
October 2010

Executive Summary

Today the Internet is used for many purposes, among them Social Networks. Their main objective is to connect people as much as possible, regardless of their geographic location. The interaction between users generates a culture of exchange, creating a Community within the Social Network. There are different types of Communities, among them Communities of Practice, defined as communities where the interaction between users is based on the need to learn about a specific technical area. When the interactions between community members take place over the Internet, they are called Virtual Communities of Practice (VCoP).

In this type of community, it is important that the purpose for which it was created is fulfilled over time by its members. For administrators, it is not easy to identify the members whose contributions are important to the community, because of the number of users and posts generated every day. These key-members are generally discovered with Social Network Analysis (SNA) techniques, but these techniques do not consider the content the members provide. Conversely, document data mining techniques can measure the content developed in the community, but they do not consider the interactions between users. A methodology that combines both approaches is therefore needed in order to find key-members in terms of content.

In this thesis, a hybrid system was developed that combines content processed with Data Mining and Social Network Analysis techniques to find key-members. The main idea is to obtain a graph representation of the network that considers the concepts or topics contained in a post. For this purpose, three configurations were defined, according to whom a member is replying to when posting. In addition, two content filters were applied, obtaining the concepts and topics discussed in the community. Degree and HITS algorithms were used to find key-members, which made it possible to define two types: motivators, who attract other members to participate, and repliers, who answer the community's questions.

The results show that key-members were found for all network configurations, with a precision of 70% or higher per algorithm. The administrators provided lists of key-members before and after seeing the results, and the number of key-members increased, showing that applying SNA techniques helps improve the search for key-members. Applying content filters did not produce significant improvements. However, including the content will help future work, where thematic networks can be generated according to the concepts or topics of the community.

Summary

Nowadays the Internet is used for many different purposes, among them Social Networks. The main objective of Social Networks is to connect people as much as possible, regardless of their geographical location. The interaction between users generates a sharing culture, creating a Community within the network itself. There are different classes of Communities, among them Communities of Practice: communities where interaction between members is based on the need to learn about a specific topic. When the interaction between community members takes place over the Internet, it is called a Virtual Community of Practice (VCoP).

In this kind of community, it is important that the purposes for which the community was created are fulfilled over time through members' contributions. For administrators, it is not easy to identify the members whose contributions are important for the community, because of the number of users and posts generated every day. These key-members are commonly discovered with techniques such as Social Network Analysis (SNA), but these techniques consider only member participation and not the content users contribute. On the other hand, document Text Mining techniques are used to measure the content developed in the community, but they do not consider the interaction between members. It is therefore necessary to develop a methodology that combines both approaches in order to find key-members in terms of content.

In this thesis, a hybrid approach is developed that combines the content contribution obtained by Text Mining with SNA for key-member discovery. The main idea is to obtain a graph representation of the community network that considers the concepts or topics contained in members' posts. Three network configurations were defined, according to the assumption about whom a member is replying to when posting. In addition, two content filters were applied, obtaining the concepts and topics discussed in the community. The Degree and HITS algorithms for SNA key-member discovery were applied, finding two kinds of key-members: motivators, who encourage other members to participate, and repliers, who answer the questions of the network.

The results show that it is possible to find key-members when content is included in the graph representation; in addition, a precision of 70% or higher was obtained for each network configuration and ranking algorithm. Administrators produced lists of key-members before and after reviewing the results, and the number of key-members increased, demonstrating that SNA techniques help to improve key-member discovery. Applying content filtering did not yield meaningful improvements. However, including the content will help future work, in which thematic networks will be built from the community's concepts or topics.

Contents

Acknowledgements
Executive Summary
Summary
Contents
List of Tables
List of Figures

1 INTRODUCTION
  Key-member discovery problem
  Objectives
  General Objective
  Specific Objectives
  Expected Results
  Methodology
  Thesis Structure

2 PREVIOUS WORK
  Social Network Analysis
  Graph representation of a network
  Metrics used in Social Network Analysis
  Social Network Analysis Applications
  Social Network Analysis on Virtual Communities of Practice (VCoP)
  Text Mining for content reduction

3 METHODOLOGY
  Data Selection
  Preprocessing Data
  Text Processing
  Concept based text mining
  Using Latent Dirichlet Allocation for Topic classification
  Network Configuration
  Concept-based & Topic-based Network Filtering
  Network Construction
  Network Visualization
  Social Network Analysis key-member discovery
  Analysis and Evaluation

4 APPLICATION
  A real Virtual Community of Practice
  SNA-KDD application
  Data Selection
  Text Processing
  Network Configuration and key-member discovery
  Results and Discussion
  Topics obtained
  Resulted Networks
  Key-members Discovered
  Key-members detection algorithm comparison
  Filter algorithm

5 CONCLUSIONS AND FUTURE WORK
  SNA-KDD methodology
  5.2 Content Filtering
  Density reduction
  Key-members discovered
  Future Work
  Content contribution
  Concepts approach
  Thematic networks
  SNA and Topic extraction computational tool

REFERENCES

Appendix A SNA-KDD Results
  A.1 Key-member obtained
  A.2 Key-member Database
  A.3 Topic extracted

List of Tables

Table 3.1 List of goals and membership values of a singular post
Table 4.1 Plexilandia activity
Table 4.2 Plexilandia general measures
Table 4.3 Database Evolution through SNA-KDD development
Table 4.4 Topics obtained
Table 4.5 Topic meanings
Table Densities
Table 4.7 Creator Oriented Motivators Key-members
Table 4.8 Creator Oriented Repliers Key-members
Table 4.9 Administrators Key-members
Table 4.10 Global Key-members precision
Table 4.11 Type A Key-members precision
Table 4.12 Type B Key-members precision
Table 4.13 Type C Key-members precision
Table 4.14 Administrator Key-members enhancements
Table 4.15 Global Key-members precision enhancement
Table 4.16 Kendall τ coefficient for rank algorithm
Table 4.17 Kendall τ coefficient for filter algorithm compared with Counting graph
Table A.1 Reply Oriented Motivators Key-members
Table A.2 Reply Oriented Repliers Key-members
Table A.3 All Previous Oriented Motivators Key-members
Table A.4 All Previous Oriented Repliers Key-members
Table A.5 Topic Extracted

List of Figures

Figure 1.1 Key-member Discovery Process
Figure 2.1 Directed and Undirected graph examples
Figure 2.2 Complex Network: A School Network
Figure 3.1 SNA-KDD Process
Figure 3.2 Thread Post Sequence
Figure 3.3 Networks Configurations
Figure 4.1 Initial Database
Figure 4.2 Key-member Database
Figure 4.3 Creator Oriented Graphs
Figure 4.4 Reply Oriented Graphs
Figure 4.5 All previous Oriented Graphs
Figure densities comparison
Figure 4.7 Density reduction
Figure Creator Oriented Density
Figure Reply Oriented Density
Figure All Previous Oriented Density
Figure 4.11 Motivators Creator reply ranks Scatter plot
Figure 4.12 Repliers Creator oriented ranks Scatter plot
Figure 5.1 Kamada-Kawai Visualization for 2009 Creator oriented networks
Figure A.1 Key-member Final Database

List of Algorithms

1 Initialize Semantic Weights Matrix
Creator Reply Network
Last Reply Network
All Previous Reply Network

Chapter 1

Introduction

This chapter presents the general background and purpose of this thesis, followed by its general and specific objectives. The methodology used to develop the thesis is then discussed. Finally, the thesis structure is presented, with a brief introduction to each chapter.

1.1 Key-member discovery problem

Nowadays, people are more connected thanks to the Internet. Internet users have experienced a proliferation of web services and applications that are used daily and are becoming more popular, such as e-business (a way to buy and sell products on-line), on-line games (role-playing games, interactive games), streaming media (like YouTube and Grooveshark), instant messaging software (like G-talk or MSN) and on-line social networks (like Facebook, LinkedIn and MySpace). In On-Line Social Networks (OSNs), users can interact and communicate with each other, make new friends or contacts, re-establish ties with old classmates, discuss different topics and organize events, among other features. As a result, people are not only more connected than in the last decade; their communication is also more fluid and quick.

In Social Networks, continuous interaction between members creates a sense of belonging, which benefits the network itself by increasing its activity in terms of sharing experiences, knowledge or opinions, all of it taking place in a community context. This exchange not only reinforces ties between users but also creates new ones, growing the connections between them and maintaining or improving their health.

In a traditional community (one that develops in a real-life context), members need both physical and temporal coordination in order to interact and share with each other. When a community interacts through a computer-mediated space, it is called a Virtual Community [26]. In this type of community, members from different places around the world can interact in a friendly environment where physical coordination and temporal synchronization are replaced by virtual interaction through a web page.

There are different kinds of Virtual Communities. Kim et al. [28] organize Social Web Communities by describing the kind of users, the uses and the features needed for every type of community. Wenger [58] identified three different Virtual Communities (VC), depending on the objective pursued: access to information (VC of Interest), completing a particular objective (VC of Purpose) or knowledge about a specific topic, skill or profession (VC of Practice). Specifically, Wenger defines Virtual Communities of Practice (VCoPs) as groups of people who share knowledge about specific topics and deepen their own knowledge and expertise by interacting through a friendly interface.

On-Line Social Networks can contain VCoPs. In fact, communities begin their life within an existing Social Network and the collaboration between its users. When their interactions are brought together by shared interests, goals, needs or practices, a community appears and the users who participate become its members [9]. The participation of community members invites new users to join, improving the quality of the conversations between them, which in turn contributes to the creation of new community members. VCoPs are commonly related to professional organizations, academic communities, and even groups of artists who want to improve their techniques by learning from colleagues.
This means that the main objective of VCoPs is to share information about specific topics, and members are expected to collaborate by proposing discussions around these Main topics. Sometimes, however, members' own topics do not match the community's: users may propose topics unrelated to the desired ones, increasing the number of topics, or may simply not know how to use the virtual space in which the community develops, which diverts the discussion from the community's Main topics [20]. The task of VCoP administrators is therefore to check that the topics generated by members contribute to the Main topics.

There are, however, members whose contributions do match the community's Main topics. Moreover, some members lead conversations towards these topics and motivate others to participate in the discussion. Administrators want to know which members contribute to the healthy development of the community in the terms described above. In other words, administrators want to know who the key-members of the community are.

There are several definitions of what a key-member represents: the most participative member [49], the member who answers other members' questions [35] or the member who encourages other members to participate [5]. In other words, key-members participate more than other users, keep the community active and help less experienced members by answering their questions. All of these definitions, however, characterize key-members by their participation and do not take into account how meaningful their contributions are for the community. With this approach, it only matters whether a member replies in a certain topic, not whether the post answers the original question or contributes to the general discussion.

Once key-members are identified, they can help other community members develop their skills, increase their knowledge, make discussions easier to understand and generate participation. Without them, knowledge generation stalls and member activity can decrease to the point of causing the death of the community [49]. In other words, they are the members who keep the community alive and running. There are also communities where key-members are paid to develop content. Therefore, if administrators detect key-members incorrectly, their decisions will not have the impact on the community that they expect. For that reason, it is important to recognize who the key-members are and to have a methodology to find them.

Detecting them takes administrators' time and effort, because they have to review which content contributes to the community. If the community is small in terms of members and posts, administrators can find the key-members manually, but generally this is not the case. The number of registered users in OSNs is increasing year by year. For example, comparing Twitter's member count for an earlier February with its count for February 2009 gives a growth of 1382%. This growth has been experienced in many other OSNs, so one of the great challenges for administrators is to discover key-members in large communities, where manual inspection is not possible. Therefore, an automatic key-member detection system is needed.
To address the scalability problem, administrators commonly use techniques such as Social Network Analysis (SNA), which delivers a graph representation of the structure of the network and of the patterns of interaction between members [57]. This helps to visualize the community and to recognize its behaviour, for example by identifying the relevant members with algorithms that measure their participation in the community. Another approach is Text Mining (TM), which is the application of Data Mining to text data. In this case, community members' posts are used as input for the TM algorithms, which produce patterns of the content present in the community. These patterns are helpful when administrators want to know the top topics of the community at a given moment and compare them with the expected Main topics.

The two methods analyze different aspects of the community. On the one hand, SNA can measure the activity of the community, but not the content of that activity. On the other hand, TM can discover the content relevant to the community, but not the relationships between members. Applied separately, they can only study either the participation of members or the content generated in the community (in global terms). But administrators also need to measure how much a member contributes to the community's Main topics, and neither method can evaluate this aspect when applied alone.

For that reason, the hypothesis of this work is that combining SNA with Text Mining methods will improve the understanding of VCoPs in terms of the relations between members, including the content they share. This will help administrators discover key-members according to their participation and how aligned they are with the Main topics, improving not only the process of finding them but also the understanding of the network structure and of members' behaviour within it.

1.2 Objectives

General Objective

The main objective of this thesis is to design and develop a hybrid approach, using SNA and Text Mining, in order to enhance key-member discovery in Virtual Communities of Practice.

Specific Objectives

1. Characterize VCoPs in order to build a graph representation.
2. Use the different graph configurations to evaluate which configuration is best for extracting experts in a VCoP.
3. Improve the traditional SNA approach with KDD.
4. Evaluate the proposed approach in a real-world Virtual Community of Practice.

Expected Results

The results expected from this thesis include:

- A methodology for key-member discovery that includes the interaction between members and the content they generate.
- A classification of community content.
- A database containing information about members, the discussions held in the community and the structure of interactions.
- Graph representations of the community.
- An algorithm that builds a graph representation of a Virtual Community of Practice, considering both SNA-only and SNA-with-Text-Mining configurations.
- For each graph or network structure, the application of Social Network Analysis in order to discover key-members.

- The key-members of the community identified by different methods. These key-members are also expected to change in favor of the real key-members once content is included in the algorithms.
- As content is included in the discovery methodology, it is expected not only that key-members will be found, but also that interactions will be filtered, which could result in a less dense graph representation.

1.3 Methodology

The general idea of the key-member discovery process is shown in Figure 1.1. First, the required data from the VCoP is collected and processed to obtain a graph representation of the network. Then SNA is applied in order to find key-members. With these results, administrators have the information they need to make decisions about community development.

Figure 1.1: Key-member Discovery Process.

The methodology used for the development of this thesis is structured in the following steps:

1. Related work
For this thesis, knowledge of Social Network Analysis and its applications to Virtual Communities of Practice is necessary. Text Mining methods for content extraction are also required in order to assess the relevance of the content generated by community members. For that reason, the state of the art in both SNA and TM will be reviewed to establish the most appropriate methods for fulfilling the main purpose of the thesis.

2. Graph definition and configuration
The graph representation of the community depends on its definition, that is, on the definition of both nodes and arcs. How the graph is configured in terms of links between nodes is also relevant for further experiments, because the links represent the interaction between community members. Key-member discovery algorithms will be applied.

3. Social Network Analysis definition and previous work
Social Network Analysis will be reviewed, specifically the state of the art of SNA applied to OSNs, VCs and VCoPs. This review will consider not only key-member detection work but also other issues that can be addressed with SNA, such as clustering, structure analysis and broker detection, among others.

4. Application and Evaluation of the Proposed Model
Text Mining methods will be applied to a real-world VCoP, resulting in a new representation of the members' content. Different graph representations will then be defined and built according to the community data and to how replies between members are considered. SNA will then be applied to the different network configurations in order to find the community's key-members.

5. Results analysis and conclusions
After the SNA and TM methods are applied, different results are presented: an improved visualization, a database structure to store all the information about key-members, a content classification according to the TM methods, and a new method to find key-members. This method is evaluated in two ways: first, by comparing the results with the administrators' key-members, and second, through a benchmark between SNA methods of how different the results obtained by each one are.

1.4 Thesis Structure

The next chapter presents related work on the state of the art in Social Network Analysis, its applications in Virtual Communities of Practice for key-member detection, and Text Mining for topic extraction and content reduction. The main idea of this chapter is to establish that current approaches do not consider the content that community members develop.
Chapter 3 presents the main contribution of this thesis, following the SNA-KDD process: the community structure and the data required for this work; the Text Mining approach for community content reduction; the network configuration, according to how replies in the community are defined; network filtering, considering the results of the Text Mining methods; the network construction algorithm, which explains how to build the graph with and without the filter; network visualization methods, to provide a graphical understanding; the SNA methods applied to find key-members in the different network configurations; and finally how the results will be analyzed and evaluated.

Chapter 4 presents an experiment on a real-life VCoP. The VCoP is described in terms of content, users and main topics. The text processing method for the required content representation and the evaluation method are also presented. The main results for both the traditional and the proposed system are then presented and analyzed. These results are presented according to the evaluation criteria previously introduced, including the benchmark between different combinations of graph and detection methods.

Finally, Chapter 5 presents the main conclusions, including our main findings and contributions, as well as future work and lines of research.
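As a rough illustration of what a network configuration "according to how the replies in the community are defined" could look like, the Python sketch below derives reply arcs from a single thread under three rules named after the configurations listed in this thesis (creator oriented, reply oriented, all previous oriented). The precise rules belong to Chapter 3; the interpretations used here, the function name `reply_arcs`, the configuration keys, the sample thread and the choice to drop self-replies are all assumptions made for illustration only.

```python
def reply_arcs(thread, configuration):
    """Return (replier, replied_to) arcs for one thread.

    thread: author names in posting order; thread[0] is the thread creator.
    configuration (assumed meanings, not taken from the thesis):
      "creator"      - every later post replies to the thread creator
      "last"         - every post replies to the author of the previous post
      "all_previous" - every post replies to all authors who posted before it
    """
    creator = thread[0]
    arcs = []
    for i in range(1, len(thread)):
        author = thread[i]
        if configuration == "creator":
            targets = [creator]
        elif configuration == "last":
            targets = [thread[i - 1]]
        elif configuration == "all_previous":
            # dict.fromkeys removes repeated authors while keeping their order
            targets = list(dict.fromkeys(thread[:i]))
        else:
            raise ValueError(f"unknown configuration: {configuration}")
        # Assumption: a member replying to themselves adds no arc.
        arcs.extend((author, target) for target in targets if target != author)
    return arcs


# Hypothetical thread: ana opens it, then ben, carla and ben again post.
thread = ["ana", "ben", "carla", "ben"]
for name in ("creator", "last", "all_previous"):
    print(name, reply_arcs(thread, name))
```

Under these assumed rules the same thread yields three different arc lists, which is why the choice of configuration matters for any ranking computed on the resulting graph.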

Chapter 2

Previous Work

This chapter reviews the state of the art and previous work. First, the Social Network Analysis approach and its applications are presented. Second, research on SNA in Virtual Communities of Practice is reviewed, including key-member discovery approaches. Finally, Text Mining techniques for content reduction are presented, covering both semi-automatic and automatic techniques.

2.1 Social Network Analysis

Social Network Analysis [57] helps to understand relationships in a given community by analyzing them through a graph representation. It focuses on studying ties between people, groups of people, organizations and even countries. When these ties are combined, they form a network, which is the object to be analyzed. The main goal of social network analysis is detecting and interpreting patterns of social ties among actors [12].

The main concepts related to SNA are: first, there are interdependencies between actors, because their actions in the network affect others; second, the ties (or linkages) between actors represent a transfer of a certain resource; and third, network models conceptualize structures as patterns of relations among actors. In other words, SNA analyzes a set of actors and the linkages among them, which together represent the network, instead of analyzing the behaviour of each actor individually as other approaches do.

Network representation is a primary task, because the patterns that SNA finds depend on the interactions defined. In this step, the actors, their relationships and how to represent them are established. Then, techniques to find network patterns are applied. Which technique is used depends on which aspect of the network is studied:

Cohesion: Who is related to whom, whether there are sub-networks, how strong they are, and what happens if some of these sub-networks are removed from the network.
Brokerage: How information is transported in the community and who collaborates in this task, both those who generate the information and those who act as a bridge to deliver it to others.
Ranking: Who the most important, most pointed-to, or most popular actors are, and how this popularity affects the development of the network.
Roles: Beyond finding the most popular actors, members sometimes perform a specific task in the network, so it is necessary to identify the role that actors assume when they interact with others.

One of the main benefits of using SNA is that visualization improves analysis and comprehension of the network [44, 62] compared with statistical analysis. For example, Melo et al. [11] show that applying SNA facilitates data comprehension and that it works better than applying statistics such as the box score. They use NBA data as an example, because the amount of data generated by this league is too large for classic statistics to predict a team's success over the full season.

Graph representation of a network

In general, a network is defined as follows: a set $A$ of $a$ actors and a set $R$ of $r$ relations between actors. The graph $G$ that represents a network is then a pair $G(N, E)$, where $N = \{n_1, \dots, n_a\}$ corresponds to the nodes and $E = \{e_1, \dots, e_r\}$ to the edges or arcs of the network. In the case of SNA, nodes are the actors of the network and edges or arcs are the ties between them.

Figure 2.1: Graph examples.

There are two kinds of ties: directed and undirected. When edges are used, the relationship is undirected. For example, if $a$ interacts with $b$ ($a$ and $b$ being actors of the network), then the edge $e_1 = (a, b) = (b, a)$ represents their interaction, and it is drawn in the graph as a line between $a$ and $b$. In contrast, arcs represent a directed relationship. If $a$ starts an interaction with $b$, then the arc $a_1 = (a, b)$ represents this tie, and it is drawn as an arrow from $a$ to $b$. Figure 2.1 illustrates both possible configurations of the network.
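As a minimal sketch of the two kinds of ties described above, the following Python snippet stores an undirected edge and a directed arc for the same pair of actors. The class name `Network` and its methods are hypothetical, introduced only for illustration.

```python
class Network:
    """Toy container for a network with edges and arcs (illustrative names)."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()  # undirected ties, stored as frozensets
        self.arcs = set()   # directed ties, stored as ordered pairs

    def add_edge(self, a, b):
        # (a, b) and (b, a) denote the same undirected tie
        self.nodes.update((a, b))
        self.edges.add(frozenset((a, b)))

    def add_arc(self, source, target):
        # the arc points away from the actor who starts the interaction
        self.nodes.update((source, target))
        self.arcs.add((source, target))

    def has_edge(self, a, b):
        return frozenset((a, b)) in self.edges

    def has_arc(self, source, target):
        return (source, target) in self.arcs


g = Network()
g.add_edge("a", "b")
g.add_arc("a", "b")
print(g.has_edge("b", "a"))  # True: an edge has no direction
print(g.has_arc("b", "a"))   # False: the arc only goes from a to b
```

Storing edges as unordered sets and arcs as ordered pairs keeps the distinction between the two configurations explicit in the data structure itself.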

There are more kinds of configurations, depending on what is to be studied with the network. For example, to represent a school, the actors are students and teachers, and two relationships could be whether a student is in a certain teacher's class and which students are friends. Figure 2.2 shows this network: the blue squares are teachers, the red circles are students, the continuous lines are class relationships, and the broken lines are friendship relationships. The class relationship is directed because a student is in the class of a given teacher, and the friendship relationship is undirected because friendship is, generally, a mutual relation.

Figure 2.2: A School Network

Metrics used in Social Network Analysis

The classical SNA metrics are presented below:

Degree: The basic measure in social networks. In general terms, it counts the total number of arcs and/or edges that a node has. If the network is directed, two sets are defined. The first is $D^+(i)$, $i \in N$, which corresponds to all arcs starting from node $i$, and the second is $D^-(j)$, $j \in N$, which represents all arcs ending at node $j$. Then, the Out-degree of a node $j$ is defined as the number of arcs $a \in A$ where $a \in D^+(j)$, while the In-degree of a node $i$ is the number of arcs where $a \in D^-(i)$.
Centrality: It is defined as the ratio between the degree of a node and the maximum degree of any node in the network. Closeness centrality of a node $i \in N$ is the ratio between the number of nodes reachable from $i$ and the sum of the distances between $i$ and these nodes. Betweenness centrality of a node $i$ is the proportion between the number of shortest paths between pairs of other nodes that include $i$ and the total number of shortest paths in the network.
Core: Filters the network according to the degree of its nodes. A k-core is a sub-network that contains only nodes with a degree greater than or equal to $k$.
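The metrics above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the implementation used in this work: the function names and the toy network are assumptions, betweenness is omitted for brevity, and the k-core is computed with the usual iterative pruning (nodes are dropped until every remaining node has degree at least k inside the sub-network).

```python
from collections import deque


def out_degree(arcs, node):
    """Number of arcs that start at `node`, i.e. the size of D+(node)."""
    return sum(1 for source, _ in arcs if source == node)


def in_degree(arcs, node):
    """Number of arcs that end at `node`, i.e. the size of D-(node)."""
    return sum(1 for _, target in arcs if target == node)


def degree_centrality(arcs, nodes):
    """Degree of each node divided by the largest degree in the network."""
    degree = {n: in_degree(arcs, n) + out_degree(arcs, n) for n in nodes}
    top = max(degree.values())
    return {n: (d / top if top else 0.0) for n, d in degree.items()}


def closeness(arcs, node):
    """Reachable nodes divided by the sum of their distances from `node`."""
    successors = {}
    for source, target in arcs:
        successors.setdefault(source, []).append(target)
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search gives shortest distances
        current = queue.popleft()
        for nxt in successors.get(current, []):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def k_core(edges, k):
    """Nodes left after repeatedly dropping nodes with degree below k.

    Pairs are treated as undirected edges (directions are ignored).
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    changed = True
    while changed:
        changed = False
        for n in [n for n, nb in neighbours.items() if len(nb) < k]:
            for m in neighbours.pop(n):
                if m in neighbours:
                    neighbours[m].discard(n)
            changed = True
    return set(neighbours)


# Hypothetical toy network with four actors
arcs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
nodes = ["a", "b", "c", "d"]

print(out_degree(arcs, "a"), in_degree(arcs, "c"))  # 2 2
print(degree_centrality(arcs, nodes)["c"])          # 1.0
print(closeness(arcs, "a"))                         # 0.75
print(sorted(k_core(arcs, 2)))                      # ['a', 'b', 'c']
```

In the toy network, node d has a single tie, so it is the only node removed from the 2-core.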

Social Network Analysis Applications

Some general research has been presented previously. Chakrabarti et al. [7] present a survey on Graph Mining, explaining the patterns and algorithms that can be used in a network. Getoor et al. [22] describe the benefits of using Link Mining for SNA and present a taxonomy of common Link Mining tasks. According to related work, SNA applications can be classified into the following categories.

1. Network behaviour

SNA is useful to understand different behaviours in the network. In the case of user behaviour, Musial et al. [40] present the most common measures for understanding users, such as node degree, prestige, and centrality, which were described above. On the other hand, to describe the behaviour of the network itself, Mislove et al. [39] study and analyze the structure of different large-scale on-line social networks. They describe their structure in terms of users, links, and groups, as explained in section 2.1. They also confirm the existence of phenomena such as Small-world (networks that have a small diameter and exhibit high clustering) and Power-law (the probability that a node has degree $k$ is proportional to $k^{-\gamma}$, for large $k$ and $\gamma > 1$). Depending on what administrators need to study, one of these approaches would be used.

With a more applied approach, Wang et al. [56] studied the correlation between electronic Word of Mouth (e-WOM) and product sales prediction in a cell phone discussion board from an SNA perspective. They use the interactions between users to see whether conversations affect the decision to buy a cell phone. Pfeil et al. [46, 45] analyzed the behaviour of an older people's community to see how its members seek and give support to each other.

2. Community properties

On occasion, ties among a group of members are stronger than with the rest, forming a sub-community. That is why one of the objectives of SNA applications is to find these sub-communities inside the network. Fortunato [17] presents a survey on sub-community detection in graphs. Covering different domains, such as biological and social networks, it explains in depth the characteristics of a graph and the algorithms to find communities in it. Also, Cocciolo et al. [10] present how to discover community behaviour in an on-line document repository. With an interesting approach, Alberich et al. [1] use SNA on the Marvel Universe, a world of comic book characters. They studied the relations between the characters and found that the Marvel Universe satisfies the Small-world and Power-law properties; they also discovered a high level of interaction between good characters, who form a community. In addition, Gleiser [23] explained that the community formed by the superheroes confirms why good always wins: the villains do not have a community and commonly fight in isolation, in contrast with the collaborative work of the heroes.

3. Users behaviour

To understand user behaviour and why users belong to a certain community, Kwon et al. [33] investigate why a user chooses whether or not to leave a social network. They found that perceived usefulness and perceived ease of use are reasons for which users stay in the community. Also, individuals who have higher social identity, altruism, and telepresence are more likely to participate in these communities. At this point, issues like the contents of networks or the goals that users pursue are not included in the analysis, so it does not apply to every community.

Regarding members of the community, role understanding and key-member discovery are very similar. In role discovery, SNA is applied to understand the different classifications of users. For example, Yelupula et al. [60] use information to understand the roles inside a company. These roles were compared with the organizational structure of the enterprise, and the results were significant, with high levels of accuracy for each cluster found. On the other hand, key-member discovery aims to find the most important users of a social network: it ranks users but does not classify them.

Further research was carried out by Kumar et al. [31]. They present a segmentation of the network into three regions: isolated users, who have entered the network but never interact with the rest of the users or members (they are merely witnesses of the interactions); sub-networks that interact almost only within themselves; and a giant well-connected core that does not need key-members to persist in time.

Key-members help to keep the community active, but there is another kind of member that helps with this purpose: middle-man members (also known as brokers), who facilitate the transfer of information between members. Kossinets et al. [30] demonstrate how much information is lost if brokers are deleted from the network.
The number of brokers affects the network, because if there are many of them, the network will not suffer a loss of information. More about SNA for key-member discovery is explained in the next section, detailing how the approach is useful for Virtual Communities of Practice.

Social Network Analysis on Virtual Communities of Practice (VCoP)

For VCoPs it is very important to generate, store, and keep the knowledge resulting from member interaction. The success of a VCoP depends on a governance mechanism [49] and on the participation of key members (the so-called leaders [5] or core members [49]). Likewise, every VCoP member's goal is to learn specific knowledge from the community. Therefore, the content of the posts that are interesting to members must be considered in the analysis.

As explained in section 1.1, there are many definitions of what a key member is: the most participative member [49], the member who answers other members' questions [35], or the member who encourages other members to participate [5]. However, none of these definitions takes into consideration the content or the meaning of the interactions (posts, replies, etc.). Moreover, these approaches do not measure the contribution that members make to the main topics when they interact with other members. Interaction is usually measured by whether user A replies to a post of user B; whether A successfully answers the question or not is not taken into account. These approaches consider only participation, even if the post is about a topic totally different from that of the thread or post to which it replies. Therefore, we hypothesize that key members defined this way lead to an incomplete analysis of their behaviour in the community.

In this thesis, a key member is defined as a member who participates (asking or answering) according to a specific purpose of the VCoP. The more aligned a member's participation is with the VCoP's purposes, the greater the member's degree of importance. This way, key members are obtained by combining post content with SNA techniques in a single process.

Key-member discovery is a very important task for administrators, because these members are the ones who keep the community alive. They share their experiences and knowledge, create tutorials, develop videos on a subject to help non-expert members, etc. Administrators or community owners often pay these experts to develop content for the community, since they know that this content will have a high impact on the community, producing more interaction between members and helping to attract new members.

In small communities, administrators or owners know almost all members and their participation, because the number of both members and posts is small enough to be checked manually. Therefore, they know who is who in the community. However, in bigger communities, where thousands of members publish thousands of posts daily, this task becomes unmanageable. In general, administrators do not have time to read every post, or the number of posts makes it impossible for a human administrator to analyze them.

Since a social network can contain a community, it is very common to apply SNA to VCoPs.
As explained in the previous section, the result is a graphical representation that helps to find the community core, sub-communities, network clusters, peripheral members, etc. Key members belong to the community core; therefore, core algorithms such as HITS [29], or the measures described above, such as degree or centrality [40], should be applied to discover them.

The same issues treated in SNA are interesting to solve in VCoPs, for example community detection [17], moderation management [20], and member analysis [15]. Some cases of SNA on VCoPs are as follows:

1. Members behaviour

Toral et al. [54] use SNA in a VCoP to analyze the role of brokers. As explained before, these members are the link between askers and repliers. They used a graph representation of the community based on the replies between members, where the value of an arc is a measure of the members' interaction. They describe the evolution of the brokers through time and how valuable they are for the community. They do not consider the content generated by the community; they only use SNA in the traditional way.

Lin et al. [34] focus on understanding which factors or perceptions encourage a member to participate and share knowledge in a professional VC. They calculated ratios such as reciprocity, compatibility, and loyalty. This study is relevant because it establishes some qualities that define a key-member.

Fang et al. [16] perform a statistical study to establish why members keep their knowledge-sharing intention through time. They conducted a survey that measures three research streams related to members' intentions. It not only measures the trust between members, but also shows that trust between members and managers influences the community as well. This approach is very important, because a healthy community depends on the knowledge-sharing intention of its members, especially of key-members.

Chen et al. [8] carry out a study similar to that of Fang [16], but its core was to explain why members give knowledge to, or receive knowledge from, other community members. Among the factors, interpersonal trust and knowledge-sharing self-efficacy are the most relevant to understand contributions in a community. Both works help to emphasize that trust between members is important for a healthy community. Therefore, key-members play a relevant role in knowledge contribution, motivating other members to participate and frequently bringing up new key-members.

In addition, [13] presents a marketing approach to determine the influence that virtual communities have on customers' consumption decisions. The authors identified six categories of members according to their interest and participation in the community, highlighting the core members, who are the most frequent visitors and the ones who spend the most time sharing their knowledge and participating in different threads.

2. Key-members discovery approaches

Expert detection approaches focus on extracting member interactions and recognizing the most participative members with algorithms such as betweenness [50], centrality, or HITS [29]. Liu et al. [35] use a community based on a Q&A discussion board and define an expert as a person who answered similar questions in the past. Zhang et al.
[61] work in a web forum and define an expert as a person whose knowledge matches the words obtained by a query browser. They used different ranking algorithms to recognize the experts and then classify them into expertise levels. A pending issue in this work is that they cannot measure the expertise of the members: they know who is an expert, but not how much of an expert he or she is.

Fu et al. [19] use data as a VCoP and create the network based on who is replying to whom, and on whether the reply is direct, a copy, or a background copy. This configuration is helpful as a reference for the network configuration of the present work. Campbell et al. [6] use data too, and extract the quality of the corpus to determine whether a user is an expert or not. They then build an expert graph and apply HITS to find the experts in this graph. This work has a similar approach to the present work, except that the amount of data they use is smaller and it is not strictly a VCoP.

Ehrlich et al. [15] use a work network and establish the relationship between members as who knows whom. They then determine three levels of expert: the person who knows the most people, the person who has the most knowledge, and the person who is a bridge between others (in other words, the broker).

Amatriain et al. [2] take another approach. First, they recognize the experts of the community, and then they use this information in a recommendation system. This work is relevant to understand how important it is to know who the experts are in order to make better enhancements to the community.

Expertise is measured either by member participation or by user-generated content only, but the two are not combined. Both alternatives present difficulties: ignoring the content created by users could result in experts such as flooders, trolls, or spammers and, worse, in not considering the real community experts. On the other hand, ignoring the interactions and considering only the content as an isolated factor could label a non-participative member as an expert. A hybrid methodology that considers the content of members' interactions would not only greatly improve key-member detection, but also provide new features that neither approach can provide separately.

2.2 Text Mining for content reduction

As described in the previous section, several techniques have been proposed to extract key members [42], classify users according to their relevance within the community [47, 60], and discover and describe the resulting sub-communities [32], among other applications. However, all these approaches leave aside the meaning of the relationships among users. Therefore, an analysis based only on replies to mails or posts is not a good indicator of the strength or weakness of relationships. It is necessary to incorporate the content, either the corpus or the post replies.

Web Text Mining [55] is useful to find patterns in web text content. In this work in particular, since members generate content in posts, patterns can be used as a reduction of the community's content. Two approaches to obtain patterns that reduce community content are presented below.

The first is presented by Ríos et al. in [52], who used a Concept-based Text Mining approach to extract the goal accomplishment of a VCoP. The objective of a VCoP is to generate knowledge through member interaction. This knowledge is classified by administrators or community owners into a set of goals.
Each goal is related to a set of terms, which are scored according to how important they are for the goal. This way, the contribution of a post to knowledge development is measured by a set of goal scores.

The other approach is presented by Blei et al. [4], who use a probabilistic model to find underlying topics. Basically, given a number of topics, they estimate the probability that a word belongs to each of them. Applications of this approach are presented by McCallum et al. [37, 38]. They determine roles and topics in text-based social networks, where topics depend on the interaction between members. Usually, link structure is used to find sub-communities but, as with key-member discovery, including the content improves the precision of the discovery. Pathak et al. [43] include topic extraction, which is used to study relationships between members. This results in a sub-community for each topic. Also, member roles were discovered from their interactions in the community. However, both approaches were applied to data that might not have a clear purpose or goal to accomplish.

Chapter 3 Proposed methodology for Key-member Discovery

In this chapter, the main contribution of this thesis is presented. A methodology to solve the key-member discovery problem is described using an SNA-KDD approach, including text preprocessing such as LDA [4] and Concept-based text mining [52], graph topologies and construction, the incorporation of community content, SNA expert detection algorithms, and the evaluation of the results.

As explained in Section 1.1, the key-member discovery problem has the purpose of finding relevant members in a VCoP. A key-member is defined as a person who answers other members' questions or encourages and promotes community participation. Section 2 presents approaches that consider only member participation to find them, but this is not sufficient, because the content they post may not contribute to the community. It is necessary to incorporate community content into the members' interactions, because it improves the relation by filtering out replies that do not contribute to the community's main topics and also measures how meaningful members' comments are.

To explain the methodology applied to combine these approaches, an adaptation of Knowledge Discovery in Databases (also known as KDD), called SNA-KDD, is used. The idea is to use the KDD steps and incorporate Text Processing and SNA into the process. Figure 3.1 illustrates the modifications to the KDD approach: first, the preprocessing step is carried out with Text Mining techniques; in the transformation step, the data is used to configure a network in order to obtain a graph representation of the community; and the data mining step is replaced by an SNA step, in which patterns are extracted from the graph to discover the key-members of the configuration. All of these steps are explained below.

Figure 3.1: SNA-KDD Process.

3.1 Data Selection

VCoPs are usually supported by forum systems (such as vBulletin, phpBB, etc.). The forum is the virtual place in which members interact with each other and generate knowledge. The forum has categories where different topics are discussed. For example, a general forum may have categories like Sports, Movies, Music, etc., which are not related to each other. In VCoPs, on the contrary, the categories are related to the main purposes of the practice that members are interested in developing. Each conversation in the VCoP is arranged in a thread, generally started by a member's question, and every member can participate in a thread by replying with a post. In other words, member interaction in a VCoP is represented by the posts in the different threads.

As the object of this thesis is to discover the key-members of VCoPs, member data is necessary. Data such as nicknames or user IDs are used to identify members, to know with whom they are interacting, and to associate content with the correct member. Other relevant data is the community content, represented by the members' posts. As with other web features, the content of a post could be text, images, hyperlinks, videos, etc. For the purpose of the present work, the content used is the text of the community posts. All the data related to a post is necessary, such as the thread and category it belongs to, the date of the post, who posted it, and its text.

3.2 Preprocessing Data

In order to use content for key-member discovery, members' posts are used, but it may not be possible to use them directly. In terms of the message itself, forums sometimes include quotes, so a member's content would be replicated in other members' posts. In this case, a member would be classified as a key-member only because of the content that he or she is quoting. So a first filter is to identify the quotes and delete them from the post, keeping only the newly generated content. Also, there are posts that do not represent a contribution to the community, such as spam, trolling, or flood posts. These kinds of messages have to be detected and ignored in the later analysis that compares members' replies.

From the point of view of the words of a post, misspellings and acronyms make the comparison between a pair of posts difficult. Also, forums use terms that do not correspond to words, such as emoticons or terms like *laughs*, hahaha, LOL, ROFL, or XD. To solve this problem, stemming and stopword filtering are applied. Stemming reduces each word to its root. For example, conjugated verbs were replaced by their infinitive, plural words by their singular, and all expressions that do not represent a contribution to the content were replaced with the word useless. Also, other terms like hyperlinks, images, or forum tags were replaced by the word misc. Once posts were stemmed, every stopword, such as articles, pronouns, adverbs, misc, and useless, was deleted from the text. The result is a filtered post with words that could be useful for further analysis.

But even after applying stemming and stopword filtering, the number of useful words could be too high for a word-to-word comparison. It is not necessary to use every single word to study whether a member is replying to a question or not; instead, the topics or concepts mentioned in posts can be used to study member behaviour. When reply content is included, it is possible to obtain a better graph representation. If a reply is well aligned with the VCoP's main topics, it is a positive interaction and should be kept; otherwise, it should not be considered for further analysis.
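The filtering idea can be sketched as follows. The snippet assumes that each reply already carries a score in [0, 1] measuring how aligned its content is with the VCoP's main topics, and that a reply is kept when this score reaches a threshold. The `Reply` type, the member names, the scores, and the threshold of 0.5 are hypothetical choices made only for illustration; they are not the filter actually used in this work.

```python
from typing import NamedTuple


class Reply(NamedTuple):
    author: str        # member who writes the reply
    replied_to: str    # member whose post is answered
    alignment: float   # assumed content score in [0, 1]


def filter_replies(replies, threshold):
    """Keep only replies whose content score reaches the threshold."""
    return [r for r in replies if r.alignment >= threshold]


def build_arcs(replies):
    """Count replies per (author, replied_to) pair as weighted arcs."""
    arcs = {}
    for r in replies:
        key = (r.author, r.replied_to)
        arcs[key] = arcs.get(key, 0) + 1
    return arcs


replies = [
    Reply("ana", "ben", 0.82),
    Reply("carl", "ben", 0.05),  # off-topic reply
    Reply("ana", "ben", 0.64),
    Reply("ben", "carl", 0.40),
]

full_network = build_arcs(replies)
filtered_network = build_arcs(filter_replies(replies, threshold=0.5))
print(full_network)      # three arcs, including the off-topic ones
print(filtered_network)  # only the arc from ana to ben remains
```

Building both networks from the same reply list makes it easy to compare the traditional graph with its content-filtered version.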
As a result, once we apply content reduction and include it in the members' interactions, the resulting network is a filtered version of the original, which keeps only meaningful relationships.

Text Processing

To represent the text data for text processing, the following notation is introduced. Let $V$ be a vector of all the different words that define the vocabulary used in the community after the preprocessing step. We refer to a word $v$ as a basic unit of discrete data, indexed by $\{1, \dots, |V|\}$. A post message $p_i$ is a sequence of a subset of $S_i$ words from $V$, where $|p_i| = S_i$. To compose the post, let $w_{ij}$ be defined as

\[ w_{ij} = \begin{cases} 1 & \text{if word } v_i \in p_j \\ 0 & \text{otherwise} \end{cases} \tag{3.1} \]

and $\sum_{i=1}^{|V|} w_{ij} = S_j$. Finally, a corpus is defined as a collection of $|P|$ post messages, denoted by $C = (p_1, \dots, p_{|P|})$.

A vectorial representation of the post corpus is given by TF-IDF $= (m_{ij})$, $i \in \{1, \dots, |V|\}$ and $j \in \{1, \dots, |P|\}$, where $m_{ij}$ is a weight that expresses whether a given word is more important than another one in a post. The $m_{ij}$ weights considered in this research are defined as an improvement of the tf-idf term [53] (term frequency times inverse document frequency):

\[ m_{ij} = \frac{f_{ij}}{\sum_{k=1}^{|V|} f_{kj}} \times \log\left( \frac{|C|}{1 + n_i} \right) \tag{3.2} \]

where $f_{ij}$ is the frequency of the $i$-th word in the $j$-th post and $n_i$ is the number of posts containing word $i$. The tf-idf term is a weighted representation of the importance of a given word in a post that belongs to the corpus. The term frequency (TF) indicates the weight of each word in a post, while the inverse document frequency (IDF) states whether the word is frequent or uncommon across posts, setting a lower or higher weight respectively. As posts were stemmed and filtered to eliminate stopwords, there could be posts without words. To avoid an undefined tf-idf value, the IDF was adapted as shown in the equation above.

Concept based text mining

Fuzzy Logic for Conceptual Classification

The following approach is based on the thesis of Sebastián Ríos [51]. Some definitions are needed to start.

The values of linguistic variables (LV) are not numbers but words or sentences in natural language. These variables are more complex but less precise. Let $u$ be an LV; we can obtain a set of terms $T(u)$ that covers its universe of discourse $U$, e.g. $T(\text{temperature}) = \{\text{cold}, \text{nice}, \text{hot}\}$ or $T(\text{pressure}) = \{\text{high}, \text{ok}, \text{low}\}$.

A Fuzzy Relation ($\times$) is a representation of the membership value between spaces of objects. Let $A = \{a_1, \dots, a_n\}$ and $B = \{b_1, \dots, b_m\}$ be two sets of objects; then a fuzzy relation $A \times B$ is defined by Equation 3.3, and the membership value between $a_i$ and $b_j$ is represented as $a_i \times b_j = \mu_{ij}$.
\[ \mu : A \times B \rightarrow \mu_{ij} \in [0, 1] \tag{3.3} \]

A Fuzzy Composition ($\circ$) is a rule to compose fuzzy relations. Let $A$, $B$, and $C$ be sets of objects, $Q(A, B)$ and $R(B, C)$ fuzzy relations on $A \times B$ and $B \times C$ respectively, and $\mu_Q$ and $\mu_R$ the membership functions of $Q$ and $R$. Then the fuzzy composition is defined by Equation 3.4, where $\oplus$ and $\otimes$ are the compositional rules and $\circ$ represents the composition between the fuzzy relations.

\[ \mu_{[A \times B] \circ [B \times C]} = \mu_{Q \circ R}(a, c) = \oplus \{ \mu_Q(a, b) \otimes \mu_R(b, c) \} \tag{3.4} \]

In order to use LV for conceptual classification, we assume that a post can be represented as a fuzzy relation [Concepts $\times$ Posts], also called [$C \times P$], which is a matrix where each row is a concept and each column is a post. To obtain this matrix, we can rewrite the relation in a more convenient manner, as in Equation 3.5 [36]. In this expression, the Terms are words that can be used to define a concept, and $P$ refers to the set of posts.

\[ [\text{Concepts} \times \text{Posts}] = [\text{Concepts} \times \text{Terms}] \circ [\text{Terms} \times \text{Posts}] \tag{3.5} \]

As defined above, let $|P|$ be the total number of web posts in the whole VCoP, $|V|$ the total number of different words among all posts, and $K$ the total number of concepts defined for the VCoP site. Then we can characterize the matrix [Concepts $\times$ $P$] by its membership function, shown in Equation 3.6, where $\mu_{C \times P} = \mu_{[C \times T] \circ [T \times P]}$ represents the membership function of the fuzzy composition in Equation 3.5, and membership values are in $[0, 1]$. In other words, it expresses how much a post of $P$ belongs to a concept of $C$.

\[ \mu_{C \times P}(x, z) = \begin{pmatrix} \mu_{1,1} & \mu_{1,2} & \cdots & \mu_{1,|P|} \\ \mu_{2,1} & \mu_{2,2} & \cdots & \mu_{2,|P|} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{K,1} & \mu_{K,2} & \cdots & \mu_{K,|P|} \end{pmatrix} \tag{3.6} \]

There are several alternatives to perform the fuzzy composition; [41] performed a study comparing six different reasoning models. To decide which one is used in this thesis, two aspects were considered. On the one hand, the fact that a concept appears in a post does not imply that all terms related to it were mentioned, because what is measured is the membership degree of a post with respect to a concept. Moreover, a subset of terms should be sufficient to express the meaning of a concept [51]. Then, if some terms are not present in a post, the degree to which it expresses a concept should not be altered.
On the other hand, the membership degree is measured through a fuzzy relation between concepts and posts, as defined above, meaning that the membership value ranges over $[0, 1]$. Any value over 1 obtained from the equations has no interpretation. For that reason, equation values over 1 are assigned a membership value of one. This is a reason to use the compositional rule in Equation 3.7.

\[ \mu_{Q \circ R}(c, p) = \min\left\{ 1, \sum_{t \in T} \mu_Q(c, t) \cdot \mu_R(t, p) \right\} \tag{3.7} \]

Here $Q(C, T)$ and $R(T, P)$ are the fuzzy relations [Concepts $\times$ Terms] and [Terms $\times$ Posts], which share the set of Terms, and $\mu_Q(c, t)$ with $c \in C$, $t \in T$ and $\mu_R(t, p)$ with $t \in T$, $p \in P$ are the membership functions of $Q$ and $R$ respectively. Comparing with Equation 3.4, $\oplus$ is the limited sum defined by $\min(1, \sum)$ and $\otimes$ is the algebraic product $\otimes = (a \cdot b)$.

Identification and Definition of Concepts

In order to apply the above proposal, it is necessary to begin by identifying the relevant concepts for the study. It is important to remark that the purpose is not to build a conceptual classification for information retrieval, which may include thousands of concepts and terms in order to retrieve all relevant documents regardless of the keywords used in the user's query. What is required are concepts that describe visitors' alignment with the community's purposes. To obtain them, we use the knowledge of experts, who identify the most interesting concepts to describe visitors' behavior on the web site. Then, we use a thesaurus and dictionaries to extract the terms that define the relevant concepts, i.e. to express every concept as a list of terms (assuming that a concept is an LV). We used synonyms, quasi-synonyms, antonyms, etc.

Afterwards, we need to define the membership values for the fuzzy relations [Concepts $\times$ Terms] and [Terms $\times$ $P$]. The membership values of matrix [Terms $\times$ $P$] were obtained with Equation 3.2, which computes the relative frequency of words in a post. The definition of the [Concepts $\times$ Terms] values is more complex. We performed this operation by asking the expert to assign the degree to which a term represents a concept. To do so, he compared two terms at a time and gave a value between 0 and 1.
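Once both membership matrices are available, the compositional rule of Equation 3.7 can be applied directly: for each concept and post, the products of the memberships over the shared terms are added and the result is capped at 1. The following sketch illustrates this; the function name `compose` and the toy matrices are hypothetical and are not taken from the thesis data.

```python
def compose(concepts_terms, terms_posts):
    """Limited-sum / algebraic-product composition of two fuzzy relations.

    mu(c, p) = min(1, sum over t of mu_Q(c, t) * mu_R(t, p))
    """
    n_terms = len(terms_posts)
    n_posts = len(terms_posts[0])
    result = []
    for row in concepts_terms:
        assert len(row) == n_terms  # both relations must share the terms
        result.append([
            min(1.0, sum(row[t] * terms_posts[t][p] for t in range(n_terms)))
            for p in range(n_posts)
        ])
    return result


# Rows: 2 concepts, columns: 3 terms (degrees an expert might assign)
concepts_terms = [
    [1.0, 0.75, 0.0],
    [0.0, 0.25, 1.0],
]
# Rows: 3 terms, columns: 2 posts (term weights within each post)
terms_posts = [
    [0.5, 0.0],
    [0.75, 0.0],
    [0.0, 0.5],
]

concepts_posts = compose(concepts_terms, terms_posts)
print(concepts_posts)  # [[1.0, 0.0], [0.1875, 0.5]]
```

In the first cell the raw sum is 1.0625, so the cap at 1 applies; this is the case that motivates the limited sum.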
For instance, when the expert rates a term against a concept, a synonym can receive a value near 1; a quasi-synonym may receive a value between 0.65 and 1; an antonym can be set to 0, etc. This is an indirect method with a single expert. Finally, we obtained the fuzzy relation $\mu_{G \circ P}(x, z)$ by applying Equation 3.4. Table 3.1 presents a column of the matrix $\mu_{G \circ P}(x, z)$, which represents the goal classification of a single post from the VCoP. From this table we can say that the post has a strong relation with goals 1 and 2, and almost no relation with goals 3, 4 and 5.

On the other hand, approaches for automatic fuzzification could also be applied. The problem with these approaches is that they are usually designed to include all possible concepts. They

take all words, or the most repeated words in the corpus, as concepts. Then they go to a thesaurus and use an algorithm for automatic fuzzification. As a result, the algorithms incorporate all possible concepts, which makes the results very complex to understand. Most of these approaches are used for information retrieval, where hundreds of concepts with hundreds or thousands of terms are needed to retrieve all relevant documents when the user enters a query. This is not our case: our main goal is to perform a semantic filtering of the network, not to retrieve documents based on a query.

Table 3.1: List of goals and membership values of a single post

Goals     $\mu_{G \circ P}$
Goal 1
Goal 2
Goal 3    0
Goal 4
Goal 5

Using Latent Dirichlet Allocation for Topic classification

A topic model can be considered as a generative probabilistic model that relates documents and words through variables which represent the main topics inferred from the text itself. In this context, a document can be considered as a mixture of topics, represented by probability distributions which can generate the words in a document given these topics. The inference of the latent variables, or topics, is the key component of this model, whose main objective is to learn from text data the distribution of the underlying topics in a given corpus of text documents.

One of the main topic models is Latent Dirichlet Allocation (LDA) [3, 4, 25]. LDA is a Bayesian model where the latent topics of documents are inferred from probability distributions estimated over the training dataset. The key idea of LDA is that every document of the corpus has a probability distribution over a set of topics ($T$), where every topic is modeled as a probability distribution over a set of words ($v_i \in V$). These multinomial distributions are sampled from Dirichlet distributions.
The advantage of this method over the concept-based approach is that the topics do not need to be defined beforehand, because they are discovered by the algorithm. Only the experts' opinion is needed, to provide a description for each discovered topic. As described in [4], the latent Dirichlet allocation model can be represented as a probabilistic generative process described by the following sequence of events:

For a given post $p$:

1. The words which appear are independent events. To represent this, let $S \sim \text{Poisson}(\xi)$ be the number of words in a given post. The $s$-th word of the post is represented by $w_s$.
2. Let $\beta$ be the distribution over words for each topic and $\theta$ a multinomial distribution over topics for each post, where $\theta \sim \text{Dir}(\alpha)$. The Dirichlet distribution is used in Bayesian statistics to estimate the hidden parameters of a categorical distribution [21], which in this case are the topics.
3. Then, for each word $w_s \in p$, let $z_s \sim \text{Multinomial}(\theta)$ be a vector where $z_s^p$ is the probability that topic $s$ is in post $p$.
4. Finally, a word $w_s$ is chosen from $p(w_s \mid z_s, \beta)$, which is a multinomial probability conditioned on the topic $z_s$.

The final set of topics $T$ is built from the top $k$ topics $z_s$ of $n$ words, where $k$ and $n$ must be defined a priori in the experimental setup.

For LDA, given the smoothing parameters $\beta$ and $\alpha$ and a joint distribution of a topic mixture $\theta$, the idea is to determine the probability distribution that generates, from a set of topics $T$, a post composed of a set of $S$ words $w$ ($p = (w_1, \ldots, w_S)$):

\[ p(\theta, z, p \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{s=1}^{S} p(z_s \mid \theta)\, p(w_s \mid z_s, \beta) \tag{3.8} \]

where $p(z_s \mid \theta)$ can be represented by the random variable $\theta_i$, such that topic $z_s$ is present in document $i$ ($z_s^i = 1$). A final expression can be deduced by integrating Equation 3.8 over the random variable $\theta$ and summing over the topics $z \in T$. Given this, the marginal distribution of a message can be defined as follows:

\[ p(w \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{s=1}^{S} \sum_{z_s \in T} p(z_s \mid \theta)\, p(w_s \mid z_s, \beta) \right) d\theta \tag{3.9} \]

The final goal of LDA is to estimate the distributions described above in order to build a generative model for a given corpus of messages.
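The generative process above can be made concrete with a small simulation. The sketch below is only an illustration of steps 1 to 4 under stated assumptions: it uses the Python standard library, draws the Dirichlet sample by normalising Gamma draws, and samples the Poisson length with Knuth's multiplication method. The function names, the four-word vocabulary and the values of `alpha`, `beta` and `xi` are hypothetical. It generates a post from given parameters; it does not perform the inference that LDA actually requires.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dir(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_poisson(xi, rng):
    """Draw S ~ Poisson(xi) with Knuth's method (fine for small xi)."""
    limit, count, prod = math.exp(-xi), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def generate_post(alpha, beta, vocabulary, xi, rng):
    """Follow steps 1-4: length, topic mixture, topic per word, word."""
    n_words = sample_poisson(xi, rng)           # step 1
    theta = sample_dirichlet(alpha, rng)        # step 2
    topics = list(range(len(alpha)))
    words = []
    for _ in range(n_words):
        z = rng.choices(topics, weights=theta)[0]                  # step 3
        words.append(rng.choices(vocabulary, weights=beta[z])[0])  # step 4
    return theta, words


# Hypothetical setup: 2 topics over a 4-word vocabulary.
rng = random.Random(0)
vocabulary = ["cable", "jack", "precio", "mail"]
beta = [[0.5, 0.5, 0.0, 0.0],   # topic 0 only emits the first two words
        [0.0, 0.0, 0.5, 0.5]]   # topic 1 only emits the last two words
theta, words = generate_post([0.5, 0.5], beta, vocabulary, xi=8, rng=rng)
```

Inference runs in the opposite direction: given only the observed words of many posts, it estimates `theta` and `beta`.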
There are several methods for making inference over these probability distributions, such as variational expectation-maximization [4], a variational discrete approximation of Equation 3.9 used empirically by [59], and a Gibbs sampling Markov chain Monte Carlo model [24], which has been efficiently implemented and applied by [48].

Both approaches (concept-based and LDA) are helpful for reducing content and for measuring how much a post contributes to the community. To obtain these measures, an algorithm which scores the community

posts according to their concept or topic contribution is implemented. Algorithm 1 presents the pseudocode for post filtering, which takes the filtered posts as input and produces a topic or concept score for each post. When LDA is applied, the set of parameters {k, n, α, β} is determined according to the size of the vocabulary and the number of documents used (in this case, the number of posts) [48]. Afterwards, the TF-IDF matrix is multiplied by the Topic or Concept matrix (called the Semantic Matrix) in order to clean up the overall vectorial representation of the corpus. The output of this algorithm is a topic or concept score for each post, which measures how strongly the topics or concepts are contained in the post. Interaction between members can then be evaluated by comparing their topic or concept scores and how similar they are.

Algorithm 1 Initialize Semantic Weights Matrix
Input: V (Vocabulary)
Input: P (Filtered Posts)
Input: k (Number of Topics or Concepts)
Output: Semantic Weights Matrix SWM[|P|, k]
  TF-IDF[|P|, |V|] ← Eq. 3.2
  SM[k, |V|] ← Build SM (semantic matrix) by classifying according to Topics or Concepts
  SWM[|P|, k] ← TF-IDF × SM^T

3.3 Network Configuration

To build the social network, members' interaction must be taken into consideration. In general, a member's activity is followed through his or her participation in the forum, and participation occurs when a member posts in the community. Because the activity of the VCoP is described in terms of members' participation, the network is configured as follows: nodes are the VCoP members, and arcs represent the interaction between them. How to link the members and how to measure their interactions to complete the network is our main concern. There are two kinds of forums.
Directed Forums show clearly to whom a member is replying, so posts are arranged according to the member being replied to and the time of posting. In Undirected Forums it is not possible to identify to whom a member is replying, so posts are arranged only by the time at which they were posted. Figure 3.2 illustrates both forum classifications. For Undirected Forums, it is necessary to make assumptions about which members are being replied to. In this thesis, three VCoP network representations are defined, according to the following reply schemes of members:

1. Creator reply Network: When a member creates a thread, every reply is related to him/her. This is the least dense network representation (density is measured in terms of the number of arcs that the network has).

2. Last reply Network: Every reply in a thread is a response to the last post. This network representation has an intermediate density.
3. All previous reply Network: Every reply in a thread is a response to all the posts which are already in that thread. This is the densest network representation.

Figure 3.2: An example of thread post sequence.

Figure 3.3 presents these three forum reply representations. Arcs represent members' replies and nodes represent the members who made the posts. In a traditional approach, the weight of an arc is a simple count of how many times a given member replies to another.

Figure 3.3: Three different network configurations which represent a given thread interaction.

In order to consider members' replies according to the community purpose (for any of these configurations), and to filter noisy posts, both concept-based and topic-based message reduction are performed.

Concept-based & Topic-based Network Filtering

Previous work [52] provides a method to evaluate the accomplishment of community goals. In this work we use that approach to classify members' posts according to the VCoP's goals. These goals are defined as a set of terms, which are composed of a set of keywords or statements in natural language.
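The post scores that this filtering relies on are the output of Algorithm 1 (Section 3.2). As a minimal sketch of its last step only, the code below multiplies a TF-IDF matrix by the transposed semantic matrix to obtain one score per post and per topic or concept. The TF-IDF values are taken as given, since the exact weighting of Equation 3.2 is not reproduced here, and the function name and all numbers are hypothetical.

```python
def semantic_weights(tfidf, sm):
    """Return SWM = TF-IDF x SM^T.

    tfidf[p][v]: weight of vocabulary word v in post p.
    sm[k][v]:    weight of vocabulary word v in topic or concept k.
    SWM[p][k]:   score of topic or concept k in post p.
    """
    return [
        [sum(w * s for w, s in zip(post_row, topic_row)) for topic_row in sm]
        for post_row in tfidf
    ]


# Hypothetical example: 2 posts, 3 vocabulary words, 2 topics.
tfidf = [[0.5, 0.0, 0.2],
         [0.0, 0.4, 0.0]]
sm = [[1.0, 0.0, 0.5],
      [0.0, 1.0, 0.0]]

swm = semantic_weights(tfidf, sm)
```

Each row of `swm` is the score vector of one post, which is the quantity compared between a post and its reply when the network is filtered.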

The idea is to compare two members' posts with a distance measure. If the distance is over a certain threshold θ, an interaction between them is considered. We support the idea that this helps us to avoid irrelevant interactions. For example, in a VCoP with $k$ concepts (or topics), for a thread $t$, let $P_j^t$ be a post of user $j$ that is a reply to post $P_i^t$ of user $i$. The distance between them is calculated with Equation 3.10, which is the cosine of the angle between the two score vectors, so higher values mean more similar posts.

\[ d_m(P_i^t, P_j^t) = \frac{\sum_k g_{ik}^t\, g_{jk}^t}{\sqrt{\sum_k (g_{ik}^t)^2}\, \sqrt{\sum_k (g_{jk}^t)^2}} \tag{3.10} \]

Here $g_{ik}^t$ is the score of concept (or topic) $k$ in the post of user $i$ in thread $t$, calculated by Algorithm 1 as explained in Section 3.2. The distance exists only if $P_j^t$ is a reply to $P_i^t$ in any of the three types defined above. After that, the weight of arc $a_{i,j}$ is calculated according to Equation 3.11.

\[ a_{i,j} = \begin{cases} 1 & \text{if } d_m(P_i^t, P_j^t) \geq \theta \\ 0 & \text{otherwise} \end{cases} \tag{3.11} \]

We used this criterion in all three configurations previously described (Creator reply, Last reply and All previous reply).

Network Construction

Now that posts are measured, the next step is to construct the filtered graph with the Concept Based or LDA approach. Algorithm 2 presents the pseudocode showing how the graph $G_c = (N, A)$ is built using the Creator reply network. Equation 3.10 is used to compare post scores. The result is a graph filtered by content, where members reply to thread creators. A similar algorithm is used for the other orientations; the only change is the post against which each reply is compared. In the Creator reply network algorithm, it is the post which created the thread. The Last reply network, on the other hand, uses the last post of the thread included by the algorithm. Algorithm 3 presents the construction of this network. In the All previous reply network, each post is compared with every post already present in the thread. Algorithm 4 presents the pseudocode for this last network topology.
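Equations 3.10 and 3.11 can be sketched in a few lines. The example assumes that each post is already represented by its vector of topic or concept scores; the function names are hypothetical, and returning 0 for a post whose scores are all zero is an assumption made here to avoid a division by zero. The default threshold of 0.8 is the value used in this thesis.

```python
import math


def similarity(g_i, g_j):
    """Equation 3.10: cosine between two post score vectors."""
    norm_i = math.sqrt(sum(x * x for x in g_i))
    norm_j = math.sqrt(sum(x * x for x in g_j))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # assumption: an all-zero post matches nothing
    return sum(a * b for a, b in zip(g_i, g_j)) / (norm_i * norm_j)


def arc_weight(g_i, g_j, theta=0.8):
    """Equation 3.11: 1 if the posts are similar enough, else 0."""
    return 1 if similarity(g_i, g_j) >= theta else 0


# Hypothetical score vectors over two topics.
original_post = [0.6, 0.0]
on_topic_reply = [0.5, 0.1]
off_topic_reply = [0.0, 0.7]
```

With these hypothetical vectors, the on-topic reply produces an arc and the off-topic reply does not, which is the filtering effect the threshold is meant to achieve.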
To build a non-filtered graph, the algorithms are the same, except that Equation 3.10 is not used. Regarding arc weights, the approach of this thesis is only interested in whether an appropriate interaction exists between two members, not in how many interactions exist between them. In all these pseudocodes the result is a graph with arc weights equal to 1, even if there is more than one pair of posts over θ.

Algorithm 2 Creator Reply Network
Input: {V, P, k, Users}
Output: Network G_c = (N, A)
  Build SWM according to Algorithm 1
  Initialize N = {}, A = {}
  for each thread t ∈ P do
    i ← t.creator
    N ← N ∪ {i}
    for each j ∈ {t.replies}, i ≠ j do
      if d_m(P_i^t, P_j^t) ≥ θ then
        N ← N ∪ {j}
        a_{i,j} ← 1
        A ← A ∪ {a_{i,j}}
      end if
    end for
  end for

Algorithm 3 Last Reply Network
Input: {V, P, k, Users}
Output: Network G_c = (N, A)
  Build SWM according to Algorithm 1
  Initialize N = {}, A = {}
  for each thread t ∈ P do
    i ← t.creator
    N ← N ∪ {i}
    for each j ∈ {t.replies}, i ≠ j do
      if d_m(P_i^t, P_j^t) ≥ θ then
        N ← N ∪ {j}
        a_{i,j} ← 1
        A ← A ∪ {a_{i,j}}
      end if
      i ← j
    end for
  end for
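The three reply schemes differ only in which earlier posts a reply is compared with, so they can be sketched as one function with a `mode` switch. This is an illustrative sketch and not the thesis implementation. It assumes each thread is a time-ordered list of hypothetical `(member, score_vector)` posts whose first element is the creator's post. Arcs are stored as `(replied_member, replier)` pairs to follow the index order of `a_{i,j}`, and, unlike the pseudocode, both endpoints of an arc are added to the node set. Keeping arcs in a set gives every arc weight 1 regardless of how many post pairs pass the threshold.

```python
import math


def cosine(u, v):
    """Cosine between two score vectors (0 if either is all zeros)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def build_network(threads, mode="creator", theta=0.8):
    """Return (nodes, arcs) for the chosen reply scheme."""
    nodes, arcs = set(), set()
    for thread in threads:
        creator_post = thread[0]
        nodes.add(creator_post[0])
        previous = [creator_post]
        for reply in thread[1:]:
            if mode == "creator":
                targets = [creator_post]
            elif mode == "last":
                targets = [previous[-1]]
            elif mode == "all_previous":
                targets = list(previous)
            else:
                raise ValueError(f"unknown mode: {mode}")
            for member, scores in targets:
                if member != reply[0] and cosine(scores, reply[1]) >= theta:
                    nodes.update((member, reply[0]))
                    arcs.add((member, reply[0]))
            previous.append(reply)
    return nodes, arcs


# Hypothetical thread with five posts scored over two topics.
thread = [("A", [1.0, 0.0]), ("B", [0.9, 0.1]), ("C", [0.0, 1.0]),
          ("D", [0.1, 0.9]), ("E", [1.0, 0.0])]
creator_net = build_network([thread], mode="creator")
last_net = build_network([thread], mode="last")
all_net = build_network([thread], mode="all_previous")
```

In this toy thread the All previous reply scheme yields the most arcs, in line with the density ordering described for the three representations.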

Regarding the threshold θ, its function is to filter out replies which are not similar, in terms of content scores, to the post being replied to. For that reason, and as a first approach, a strict filter threshold of 0.8 is used in this thesis. This way, only very similar interactions are considered in the filtered graphs. It remains proposed to determine which value of θ is appropriate for filtering the network.

Algorithm 4 All Previous Reply Network
Input: {V, P, k, Users}
Output: Network G_c = (N, A)
  Build SWM according to Algorithm 1
  Initialize N = {}, A = {}, Prev = {}
  for each thread t ∈ P do
    Prev = {}
    i ← t.creator
    N ← N ∪ {i}
    Prev ← Prev ∪ {i}
    for each j ∈ {t.replies}, i ≠ j do
      for each k ∈ Prev do
        if d_m(P_k^t, P_j^t) ≥ θ then
          N ← N ∪ {j}
          a_{k,j} ← 1
          A ← A ∪ {a_{k,j}}
        end if
      end for
      Prev ← Prev ∪ {j}
    end for
  end for

Network Visualization

There are many techniques to present a network. The most common is a Circular visualization, where the nodes are arranged around a circle and the arcs between them are drawn inside it. Other approaches focus on the aesthetics of the graph: avoiding crossing edges or arcs; minimizing the distance between nodes or between non-incident edges or arcs, or the angle formed by two incident edges or arcs at a vertex; preserving the symmetry of the visualization; or minimizing the drawing area.

Force directed methods for graph visualization define a system of forces which act on nodes and arcs or edges, adjusting the position of the nodes depending on the distance $r$ between a pair of them, the attraction force $f_a$ and the repulsive force $f_r$. Eades [14] presents a first approach, replacing arcs with springs of a specific natural length and placing springs of infinite natural length between non-adjacent nodes. Then, the attraction force is defined by

Equation 3.12, where $c_a$ is the attraction factor, and the repulsive force by Equation 3.13, where $c_r$ is the repulsion factor. The system begins with a random distribution and then iterates until the changes in node positions are small enough.

\[ f_a = c_a \log(r) \tag{3.12} \]

\[ f_r = \frac{c_r}{r^2} \tag{3.13} \]

Fruchterman and Reingold [18] include the number of nodes and restrict the drawing area by defining a parameter $k = c \sqrt{\text{area} / \#\text{nodes}}$, where $c$ is found experimentally. In this case, $f_a$ and $f_r$ are calculated by Equations 3.14 and 3.15 respectively.

\[ f_a = \frac{r^2}{k} \tag{3.14} \]

\[ f_r = \frac{k^2}{r} \tag{3.15} \]

Finally, Kamada and Kawai [27] not only consider the geometrical distance, but also include in their system the graph distance, in terms of the shortest path between a pair of nodes. They then minimize the difference between both distances by including them in the system's energy function. This function is defined by Equation 3.16, where $k$ is a constant, $p_u$ and $p_v$ are the positions of nodes $u, v \in G(N, E)$, and $d(u, v)$ is the shortest path between nodes $u$ and $v$.

\[ E_s = \sum_{u, v \in G(N, E)} k \big( \lVert p_u - p_v \rVert - d(u, v) \big)^2 \tag{3.16} \]

In Eades' approach, the system finds its equilibrium by iterating over the node positions. In contrast, Fruchterman-Reingold and Kamada-Kawai solve complex equation systems to find the equilibrium. The running time depends on the number of nodes and arcs of the graph, because both the iterative process and the number of equations depend on these two factors.

For this thesis, the Circular visualization and the Kamada-Kawai approach are used to present the resulting networks of the VCoP. The purpose is to see how dense the different network configurations are (in the case of the Circular visualization), and to present the interaction in the community (in the case of Kamada-Kawai).
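The force definitions in Equations 3.12 to 3.16 can be written as small functions to see how they behave. This sketch only evaluates the formulas and does not implement a full layout algorithm. The function names are hypothetical, positions are assumed to be 2-D tuples, and the Kamada-Kawai energy is summed once over each unordered pair of nodes.

```python
import math


def eades_forces(r, c_a, c_r):
    """Return (attraction, repulsion) from Equations 3.12 and 3.13."""
    return c_a * math.log(r), c_r / r ** 2


def fruchterman_reingold_forces(r, area, n_nodes, c=1.0):
    """Return (attraction, repulsion) from Equations 3.14 and 3.15."""
    k = c * math.sqrt(area / n_nodes)
    return r ** 2 / k, k ** 2 / r


def kamada_kawai_energy(positions, graph_distance, k=1.0):
    """Equation 3.16, summed once over each unordered pair of nodes."""
    nodes = sorted(positions)
    energy = 0.0
    for index, u in enumerate(nodes):
        for v in nodes[index + 1:]:
            (x1, y1), (x2, y2) = positions[u], positions[v]
            geometric = math.hypot(x1 - x2, y1 - y2)
            energy += k * (geometric - graph_distance[(u, v)]) ** 2
    return energy


# Hypothetical path graph a - b - c with shortest-path distances.
graph_distance = {("a", "b"): 1, ("a", "c"): 2, ("b", "c"): 1}
straight = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0)}
bent = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (1.0, 1.0)}
```

In the Fruchterman-Reingold case, attraction and repulsion balance when `r` equals `k`. In the Kamada-Kawai case, the straight layout has zero energy because every geometric distance matches the graph distance, while the bent layout is penalised.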

3.4 Social Network Analysis key-member discovery

Once the graphs are built according to Section 3.3, it is possible to apply SNA to them. Section 2 explained the different applications of SNA; each of them can be applied to these graphs, but only key-member discovery is relevant for the scope of this thesis. Thanks to the incorporation of member-generated content, the participation represented in the graphs corresponds to a similarity between the asker's and the repliers' posts.

To find key-members, members' degree is measured. As explained in Section 2.1.2, degree corresponds to the number of arcs that a member has. On occasions this is sufficient; if not, in-degree and out-degree are used. There are different techniques to find the most relevant members of the community, such as centrality and degree (or in-degree or out-degree), which are very common in SNA [57], or HITS [29].

HITS is a technique originally used to classify web pages according to their link structure. When users search for web pages only by keywords, the results may not include relevant pages, presenting instead pages that contain the keywords but no relevant information. Kleinberg [29] established that if a page has relevant information, other pages have to reference it (through hyperlinks), giving this page an authority over those pages. There are also pages which do not present the information themselves, but link to pages which do, acting as a hub between the user and the required information. Therefore, this approach focuses on the link structure of a set of pages, finding the best authorities and hubs among them: a good hub is a web site which points to good authority web sites and, vice versa, a good authority is pointed to by good hubs.

The iterative algorithm defines, for each page $p \in P$ (where $P$ is a set of pages), an authority value ($a_p$) and a hub value ($h_p$). At initialization, all pages have the same score for both authority and hub.
Then, at each iteration these values change according to Equations 3.17 and 3.18: the authority of a page depends on how highly valued the pages pointing to it are, and the hub value depends on the authority of the pages that the page itself points to.

\[ a_p = \sum_{q : (q, p) \in E} h_q \tag{3.17} \]

\[ h_p = \sum_{q : (p, q) \in E} a_q \tag{3.18} \]

After this reevaluation of the page values, the vectors which represent both values are normalized. The algorithm stops when there is no significant variation between consecutive iterations, that is, when the difference between vectors is less than or equal to $\epsilon$. The result is a ranking of the authority and hub values of the set of pages.
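The iteration described above can be sketched with the standard library alone. This is an illustrative implementation under stated assumptions, not the one used in the thesis: edges are hypothetical `(source, target)` pairs, hub scores are updated from the freshly computed authority scores (the order used in Kleinberg's original formulation), both vectors are normalised to unit Euclidean length, and the stopping rule compares the largest change in any score with `eps`.

```python
import math


def hits(edges, eps=1e-8, max_iter=1000):
    """Iterate Equations 3.17 and 3.18 over directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return {}, {}
    auth = dict.fromkeys(nodes, 1.0)
    hub = dict.fromkeys(nodes, 1.0)

    def normalise(scores):
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {n: v / norm for n, v in scores.items()}

    for _ in range(max_iter):
        new_auth = dict.fromkeys(nodes, 0.0)
        for q, p in edges:            # Eq. 3.17: sum hubs pointing to p
            new_auth[p] += hub[q]
        new_auth = normalise(new_auth)
        new_hub = dict.fromkeys(nodes, 0.0)
        for p, q in edges:            # Eq. 3.18: sum authorities p points to
            new_hub[p] += new_auth[q]
        new_hub = normalise(new_hub)
        change = max(
            max(abs(new_auth[n] - auth[n]), abs(new_hub[n] - hub[n]))
            for n in nodes
        )
        auth, hub = new_auth, new_hub
        if change <= eps:
            break
    return auth, hub


# Hypothetical link structure: b and c point to a, and c also points to d.
authority, hubs = hits([("b", "a"), ("c", "a"), ("c", "d")])
```

In this toy graph, `a` obtains the highest authority because two nodes point to it, and `c` is the best hub because it points to both authorities.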

In a VCoP context, since the action of pointing represents a reply, a good hub is a member who replies to good askers, and a good authority is an asker who is replied to by good hubs. With this approach, the key-members appear separately in the different rankings: top hubs are motivator key-members, who generate content according to the community's purposes or goals, and top authorities are replier key-members, who encourage or attract other members to participate and, consequently, cause experts to reply to their questions. As two kinds of key-members were defined, the output of HITS is useful for identifying them, in contrast to other techniques which only order members by their participation without knowing what kind of contribution (motivating or replying) the key-member makes. The result of this technique is a ranking of members, ordered by their participation in the community and taking into account their content contribution to the community.

3.5 Analysis and Evaluation

The tasks developed above offer different points of analysis, beginning with the data preprocessing. The first analysis concerns the meaning of the resulting LDA topics. Just as with K-Means, every topic extracted with LDA has to be analyzed to understand what it represents. Community administrators can explain what the set of words representing a topic means, providing a characterization of each extracted topic. In contrast, the Concept-based approach does not require this analysis, because the administrators decide beforehand which main concepts they want to measure in the community.

In the case of graph construction, different topologies can be analyzed. A content-filtered graph has two objectives: the first is to include community content in the graph structure in order to compare interactions, and the other is graph reduction. Filtering graphs implies that some members' interactions are not considered when building the network, eliminating arcs that are considered in traditional SNA.
In conclusion, it is possible to compare topologies in terms of number of members, number of arcs and graph density.

The evaluation step is relevant for the algorithms that find key-members. Comparing the algorithms helps to understand their respective results. Two kinds of key-members were defined in the previous section, motivator and replier key-members, depending on whether a member is highly pointed to or points to many others, respectively. Therefore, the algorithm results for both types are different. Motivators are ranked by Out-degree and Hub, and repliers by In-degree and Authority. The degree and HITS algorithms are also compared to evaluate how similar they are.

It is also important to have the administrators' point of view, because they know the community, its members and their contributions. The precision of the algorithms is evaluated against the administrators' key-members, and serves two purposes: on the one hand, to see how similar the algorithm results are and, on the other, to understand the administrators' criteria for considering some members as key-members.
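One simple way to make this evaluation concrete is sketched below. It is an assumption about how precision could be measured, not the definition used in the thesis: precision is taken as the fraction of an algorithm's top `k` members who also appear in the administrators' list, and the similarity of two algorithms as the fraction of members shared by their top `k`. All function names and member identifiers are hypothetical.

```python
def precision_at_k(ranking, admin_key_members, k):
    """Fraction of the top-k ranked members named by the administrators."""
    top_k = ranking[:k]
    if not top_k:
        return 0.0
    found = sum(1 for member in top_k if member in admin_key_members)
    return found / len(top_k)


def overlap_at_k(ranking_a, ranking_b, k):
    """Fraction of members shared by the top k of two rankings."""
    if k <= 0:
        return 0.0
    return len(set(ranking_a[:k]) & set(ranking_b[:k])) / k


# Hypothetical rankings, ordered from highest to lowest score.
hub_ranking = ["m1", "m2", "m3", "m4"]
out_degree_ranking = ["m3", "m1", "m7", "m2"]
admin_key_members = {"m2", "m4", "m9"}

hub_precision = precision_at_k(hub_ranking, admin_key_members, 3)
degree_vs_hits = overlap_at_k(hub_ranking, out_degree_ranking, 3)
```

The same two functions can be applied to each pair of measures that targets the same kind of key-member, for example Out-degree against Hub.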

Chapter 4

Application in a real VCoP

In this chapter, the application of the thesis to a real VCoP is presented, following the steps of the methodology explained in Section 3. A description of the experimental VCoP¹ is given first, followed by the basic database and how it is completed as the steps are carried out. Then, the results of each step are presented and discussed with their preliminary improvements.

4.1 A real Virtual Community of Practice

Plexilandia² is a VCoP formed by a group of people who have come together around the building of music effects, amplifiers and audio equipment (in a "do it yourself" style). It was created with the purpose of sharing common experiences in the construction of plexies³. Today, Plexilandia has more than 2500 members after almost 8 years of existence. Throughout these years they have been sharing and discussing their knowledge about building their own plexies and effects. There are also other related topics such as luthier, professional audio and buying/selling parts. Although the community has a basic information web page, most of its members' interactions take place on the discussion forum. Table 4.1 presents the activity in the different categories of the forum from the beginning of the community until 2010.

¹ Work developed by Felipe Aguilera and Sebastián Ríos.
³ Plexi is the nickname given to Marshall amp heads model 1959 that have the clear perspex (a.k.a. plexiglass) fascia to the control panel with a gold backing sheet showing through, as opposed to the metal plates of the later models.

In the beginning there was only one administrator. Today, due to the growth of the community, this task is performed by several administrators (in 2008 there were 5 administrators). In fact, the amount of information generated weekly makes it impossible to leave the administration task

in the hands of just one person. Nowadays, an administrator has the following tasks:

- Re-classification of posts: members frequently post a message in the wrong forum category. For example, buy and sell advertisements should be placed in the general forum, but newcomers place them in other sections. Therefore, administrators have to move the post to the right category.
- Members moderation: administrators must ensure that members use the forums to discuss topics which are related to the community and that they use appropriate language. This task is not as frequent as re-classification, because other active members help to detect these situations, making the administrators' work easier.
- Participation: although the community's knowledge is distributed among all its members, some members have a greater degree of knowledge or expertise about some topics. For diverse reasons (they are community founders, experts in an area, etc.), administrators are important knowledge generators. Therefore, administrators are active participants in most discussions. They motivate discussions, create new threads and create new categories.

Table 4.1: Plexilandia activity. Rows (Forum): Amplifiers, Effects, Luthier, General, Pro Audio, Synthesizers, TOTAL. Last column: TOTAL.

In its first six years of life, this community underwent a great sustained growth in members' contributions, reaching a peak. Since that peak year, community participation has been constantly decreasing, so the community owners want measures which help to enhance the community in order to improve members' participation. The vision that administrators and experts have of the community is based mostly on experience and on the time they have spent participating in it. They also have some basic and global measures, for example the total number of posts, connected members, etc. However, they do not have information about members' browsing behavior, the quality of members' content or how members contribute to the community purposes.
To characterize the community, Table 4.2 summarizes general measures obtained over the development of the forum.

Table 4.2: Plexilandia general measures

Measures
Total Users      2857
Categories       6
Total Threads
Total Posts

4.2 SNA-KDD application

A database is involved in every step of the SNA-KDD process. It is completed with information relevant for key-member discovery every time a step is fulfilled. Table 4.3 shows which table is updated in each step of the SNA-KDD process.

Table 4.3: Database evolution through SNA-KDD development. Columns (steps): Data Selection, Text Processing, Network Configuration, SNA. Rows (tables, each marked with an x under the step that updates it): users, post, concepts, terms, time, topics, words, CB post score, LDA post score, graph, orientation, filter, ranking, scores.

In the Data Selection step, all the information about members and their posts is inserted into tables users and posts. Also, as the concepts are defined beforehand by the administrators, tables concepts and terms contain all the related information. Then LDA is run, and the extracted topics, the words which compose these topics and their probabilities are inserted into tables topics and words. With both topics and concepts defined, the content reduction algorithm of the Text Processing step is applied, obtaining the content scores of the posts. Tables CB post score and LDA post score contain the scores of each post. The Network Configuration step creates the graphs with their different topologies, and metadata about them is inserted into tables graph, orientation and filter, whose combination describes a graph. After this, the SNA step is applied to the graphs, obtaining the members' ranks for each topology, which are

inserted in table scores, while table ranking holds the information about the ranking algorithms used.

Data Selection

To obtain the data, it was first necessary to review the forum database. The required data was dispersed across different tables of the original database, so it was extracted to build a small, summarized database.

The structure of the initial database is illustrated in Figure 4.1. The first tables hold the data of the users and the posts in the community. Then, two tables related to the concepts are included: one of them contains the concepts' metadata (concepts) and the other holds the terms which represent each concept, with their corresponding scores (terms).

Figure 4.1: Preliminary database

Text Processing

Both text processing algorithms are executed as explained in Section 3.2. There are differences between the two approaches. In the Concept-based case, the concepts are already detailed in the database. In the LDA case, a preliminary algorithm is run over the community texts in order to obtain the topics. After that, table topics is built, which has the same structure as terms, that is, a table with the words which compose a topic and their respective probabilities (instead of the term scores).

The next step of text processing is to obtain the scores of each post for both the LDA and CB algorithms, using the fuzzy classification explained in the methodology. Once the post scores have been calculated, some posts will have scores equal to zero for every concept or topic. These posts have to be deleted, because their comparisons with other posts will be null. Ignoring them beforehand also saves execution time in the network configuration step. In summary, tables CB post score and LDA post score contain only posts with at least one topic or concept score different from zero.

Network Configuration and key-member discovery

The graph has several variables which modify its configuration.

1. Time: One dimension not mentioned before is time. Depending on the time period to be analyzed, it is possible to have monthly, annual or historic networks. It is also possible to have graphs for other specific periods, such as half-years or whatever the administrator wants. However, building a network for a specific time period is not straightforward, because topic and concept scores differ depending on which posts are considered, affecting measures like TF-IDF. This means that the post score tables must hold, for each post, as many scores as there are periods to analyze.
2. Graph Filtering: In addition to the traditional non-filtered graph, the other two configurations correspond to graphs filtered by topics or by concepts, according to Algorithm 1.
3. Interaction topology: According to the assumption about to whom a member is replying, three possible configurations appear: Creator reply (Algorithm 2), Last reply (Algorithm 3) and All previous reply (Algorithm 4).

To configure the network and obtain the graph representation, all three of these variables have to be decided.
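The three variables above span a grid of graph configurations, which the following sketch enumerates. The label strings and the period identifiers are hypothetical; only the number of options per variable comes from the text. The last lines also count the ranking scores per member and period, assuming the four measures In-degree, Out-degree, Hub and Authority.

```python
from itertools import product

FILTERS = ("non-filtered", "concept-based", "LDA")
TOPOLOGIES = ("creator reply", "last reply", "all previous reply")
MEASURES = ("in-degree", "out-degree", "hub", "authority")


def graph_configurations(periods):
    """One (period, filter, topology) triple per graph to be built."""
    return list(product(periods, FILTERS, TOPOLOGIES))


# Hypothetical monthly periods.
configurations = graph_configurations(["2009-01", "2009-02"])

# Ranking scores held for one member in one period:
# 3 filters x 3 topologies x 4 measures.
scores_per_member = len(list(product(FILTERS, TOPOLOGIES, MEASURES)))
```

With two periods this yields 18 graphs, and the per-member count of 36 agrees with the number of ranking scores reported for the scores table.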
In this thesis, monthly and annual networks have been built and, so that they can be compared, the threshold is set to 0.8 for both filters, as explained in Section 3.3. Then, for each interaction representation, the result is a graph with the members who posted in a specific period of time and have an interaction greater than or equal to the filter threshold. The graphs are saved and table graph is created, which holds the number of nodes, the number of arcs and the density of each graph. Tables orientation and filter are also created to describe these items, which define the graph topologies.

Once the graphs are built, the next step is to find the key-members. Here, HITS, in-degree, out-degree and centrality are applied to each temporal, filtered and orientation topology, obtaining a score for every node of the graph. The last table, scores, is created here; it contains the member ranking score for each topology. Due to the combination of variables involved in a specific member score, this table contains a huge amount of data. For a single period of time, there are two ranking approaches (degree and HITS), and each of them has two measures (In-degree and Out-degree, Hub and Authority). Also, these

rankings are applied to the three network topologies, filtered by the Concept-based or LDA approach or not filtered at all. In other words, a member in a single period of time has 36 different ranking scores. With this structure, information can be presented by moving across different combinations of dimensions.

When the graph rank scores are sorted from high to low, the most highly valued members appear first. The potential motivator and replier key-members of a specific time period and topology are then the members with the highest rank scores.

4.3 Results and Discussion

As described in Section 1.2.3, one of the expected results is the key-member discovery database, which administrators can browse to see the key-members for different times, topologies or ranking algorithms. The database tables are shown in Figure 4.2. Appendix A.2 presents the ER model of the database.

Figure 4.2: Key-members discovery database

Topics obtained

The application of LDA to the text content resulted in 50 topics with 100 words each and their respective probabilities. Table 4.4 presents the first 20 words of Topics 6, 43 and 3 obtained by LDA.

Table 4.4: Topics obtained (words only)

Cable connections    Components buy/sell requests    Classic questions
cable                mand                            cos
jack                 vend                            electronica
pata                 precio                          hac
entr                 mail                            com
negar                pag                             bas
conectado            cos                             aprend
bateria              pedir                           gust
sal                  envio                           conocer
conectar             avis                            construir
conecta              traer                           empezar
circuit              interesa                        trabaj
lado                 necesit                         proyect
plug                 contact                         busc
enchuf               msn                             hay
switch               interesado                      harto
conexion             cotiz                           informacion
extremo              habl                            libro
rojo                 import                          estudio
out                  via                             diseño
cen                  respond                         mat

The results of LDA were presented to the administrators [4], who provided a meaning for many of the topics obtained. Only 9 of the 50 topics were considered unhelpful for understanding the community, and Topic 34 was described as a set of badly written words. Table 4.5 lists 7 of the 40 successful topics as a sample of the meanings found with LDA. The rest of the extracted topics, with their respective meanings, are presented in Appendix A.3. The words of each resulting topic were used to construct the graphs filtered by LDA.

Table 4.5: Topic meanings

Topic Id   Meaning
03         Classic questions of new members
06         Cable connections
13         Electronic supply stores
23         Images and videos of amplifier construction improvements
25         Places where to buy
35         Community coexistence norms
43         Component buy and sell requests

[4] Work developed by Felipe Aguilera and Sebastián Ríos.

Resulted Networks

Monthly and annual graph representations of the community were built, but only the results for 2009 are presented in this section. The graphs are compared according to their interaction representation. Results are arranged by filter technique (non-filtered, Concept Based and LDA), and a circular visualization of the graph was chosen in order to show how member interaction changes when the different filters are applied.

Figure 4.3 illustrates the resulting graphs for the Creator reply network. The changes between Concept Based and LDA filtering are very significant. This phenomenon is explained by the topology itself. A member creates a thread because he or she needs an answer to a specific problem. This problem is related to the community concepts to a certain degree, or has a certain probability of containing the extracted topics. Therefore, only replies whose content agrees with the concepts or topics of the original post are considered.

Figure 4.4 shows the Last reply network. In this case, no significant change is observed. Threads contain so many posts that it is possible to see, within the same thread, discussions that are not aligned with the original post. As this network measures the interaction of consecutive posts, two posts can be strongly related to each other without being aligned with the original post. In conclusion, this configuration measures how similar consecutive posts are, without considering whether their content agrees with the original post.

Finally, Figure 4.5 presents the graphs oriented by All Previous replies. In this case, when a member replies in a thread, the assumption is that he or she is replying to every member who posted earlier in the thread. The total number of arcs between nodes is greater than in the other configurations, but the filters remain effective, and a considerable decrease in network density can be observed.
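The three interaction representations described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name build_arcs, the representation of a thread as an ordered list of author IDs, and the rule that a member replying to himself or herself produces no arc are all assumptions made for the example. Content filtering is not applied here.

```python
def build_arcs(thread_authors, orientation):
    """Return (replier, addressee) arcs for a single thread.

    thread_authors: author IDs in posting order; index 0 is the thread creator.
    orientation: "creator", "last" or "all_previous" (hypothetical labels).
    """
    arcs = []
    for i in range(1, len(thread_authors)):
        replier = thread_authors[i]
        if orientation == "creator":
            # Every reply is addressed to the member who opened the thread.
            targets = [thread_authors[0]]
        elif orientation == "last":
            # Every reply is addressed to the author of the post just before it.
            targets = [thread_authors[i - 1]]
        elif orientation == "all_previous":
            # Every reply is addressed to each member who posted earlier;
            # dict.fromkeys keeps posting order and drops repeated authors.
            targets = list(dict.fromkeys(thread_authors[:i]))
        else:
            raise ValueError(f"unknown orientation: {orientation}")
        # Assumption: a member replying to his or her own post creates no arc.
        arcs.extend((replier, target) for target in targets if target != replier)
    return arcs


thread = ["ana", "ben", "cat"]  # made-up author IDs
creator_arcs = build_arcs(thread, "creator")
last_arcs = build_arcs(thread, "last")
all_previous_arcs = build_arcs(thread, "all_previous")
```

For the same thread, the All Previous orientation always yields at least as many arcs as the other two, which is consistent with the higher densities reported for that configuration.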
Figure 4.3: Creator reply Networks

Density is a ratio that compares the number of arcs in the graph with the number of all potential arcs that could be built in it. Equation 4.1 gives the density of a graph G(N, E). For the present analysis, Equation 4.1 is adapted so that each graph can be compared with the original graph. In other words, if a graph G(N, E) is filtered, the density of the filtered graph G_f(N_f, E_f) is calculated with Equation 4.2, which keeps the number of nodes of the original graph in the denominator. Table 4.6 shows the results for each network configuration for 2009.

d = \frac{|E|}{|N| \, (|N| - 1)} \qquad (4.1)

d_f = \frac{|E_f|}{|N| \, (|N| - 1)} \qquad (4.2)

Figure 4.4: Last reply Networks

Figure 4.5: All previous reply Networks

The interaction representation affects the graph density. In fact, Figure 4.6 shows that, except for the LDA filtered graphs, All Previous reply is denser than the other configurations. This exception could be explained by the fact that LDA filters with more topics, meaning that two non-consecutive posts are less similar than consecutive ones, and consecutive posts are exactly what the Last reply orientation considers.
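As a small worked example of Equations 4.1 and 4.2, the sketch below computes both densities. The function names and the toy numbers are made up for illustration; the only point it makes is that the filtered density divides by the potential arcs of the original graph, so filtered and unfiltered values stay comparable.

```python
def density(num_arcs, num_nodes):
    """Equation 4.1: arcs divided by all potential directed arcs."""
    if num_nodes < 2:
        return 0.0  # no arc is possible with fewer than two nodes
    return num_arcs / (num_nodes * (num_nodes - 1))


def filtered_density(num_filtered_arcs, num_original_nodes):
    """Equation 4.2: filtered arcs over the potential arcs of the original graph."""
    return density(num_filtered_arcs, num_original_nodes)


# Toy graph: 4 members and 6 arcs before filtering, 3 arcs after filtering.
d = density(6, 4)             # 6 / (4 * 3)
d_f = filtered_density(3, 4)  # 3 / (4 * 3)
```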

Table 4.6: 2009 Densities (columns: Counting, Concept Based, LDA; rows: Creator, Last Reply, All Previous)

Figure 4.6: 2009 densities comparison

Figures 4.3, 4.4 and 4.5 give a preliminary idea of the graph density. The filter techniques effectively reduce the interaction between members, keeping only the relations that are meaningful for the community. Figure 4.7 compares the filtered graphs with the traditional approach for each reply topology, to evaluate how much the graphs are reduced after applying each filter algorithm. The results are impressive, especially in the All Previous approach, where density was dramatically reduced. It is not relevant whether Concept Based or LDA yields the lower density for a given representation; the main result is that the filtered graphs have a lower density than the original graph.

After this general view, it is important to evaluate whether the same behaviour appears in each network configuration. Monthly graph densities throughout 2009 were used to compare the behaviour of the filters. Figure 4.8 displays the density evolution for the Creator reply graphs. It can be seen that there are months of high activity, which coincide with the summer vacations. Most interesting is that this behaviour is common to the three configurations. The density reduction achieved is considerable: both filters reduce density from 2% to 0.05% or less. In this case, the LDA filter is stronger than Concept Based.

Figure 4.9 illustrates the density evolution for the Last reply graphs. In this case it is interesting to see how both Concept Based and LDA preserve the behaviour of the original graph, meaning that during the year the direct replies and the concepts or topics treated are very similar. Here too, the LDA filter is stronger than Concept Based. Besides, both filters achieve their purpose of reducing the interactions.

Figure 4.7: Density reduction

Finally, Figure 4.10 presents the evolution of the density throughout 2009 for the All Previous reply graphs. Here the density reduction is very noticeable. Both filters eliminate most of the community interaction, which, combined with the previous figures, means that replies are oriented to the original post or to the previous reply rather than to all the posts of the thread.

Figure 4.8: 2009 Creator Oriented Density

Figure 4.9: 2009 Reply Oriented Density

Figure 4.10: 2009 All Previous Oriented Density

It is important to highlight that a reduction of the density is not the only result of applying content filter algorithms. As described in Section 1.2.3, community behaviour is very similar among all filters; even the peaks are reached in the same periods and, in the case of the Last reply orientation, the behaviour of the Concept Based network is the same as that of the original network but with a lower density. Nevertheless, it is possible to conclude that, at the same threshold θ for content filtering, Concept Based is better for Reply oriented networks, while LDA is recommended for Creator oriented networks in terms of density reduction. For the All Previous oriented network it is not clear which filter is better, because in magnitude both perform well.

Key-members Discovered

Once the graphs are obtained, the proposed algorithms and traditional SNA techniques are applied to discover key-members. The HITS [29] and degree [57] algorithms were chosen. The reason for using these algorithms is that both reflect in their results the two approaches that define a key-member. Authority and In-degree consider the replies received by a member, while Hub and Out-degree use the replies generated by a member. Key-members are therefore, respectively, people who motivate participation and people whose replies agree with the community purposes.
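The two ranking approaches can be illustrated with a short sketch. It is a plain power-iteration version of HITS next to simple degree counts, written under assumptions that are not taken from the thesis: arcs are unweighted (replier, addressee) pairs, the scores are normalized with the Euclidean norm, and 50 iterations are used. The member IDs are made up.

```python
import math
from collections import defaultdict


def degrees(arcs):
    """In-degree (replies received) and out-degree (replies written) per member."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for replier, addressee in arcs:
        out_deg[replier] += 1
        in_deg[addressee] += 1
    return dict(in_deg), dict(out_deg)


def hits(arcs, iterations=50):
    """Hub and authority scores by power iteration (illustrative sketch)."""
    nodes = {node for arc in arcs for node in arc}
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the members who reply to the node.
        new_auth = {node: 0.0 for node in nodes}
        for replier, addressee in arcs:
            new_auth[addressee] += hub[replier]
        # Hub: sum of the authority scores of the members the node replies to.
        new_hub = {node: 0.0 for node in nodes}
        for replier, addressee in arcs:
            new_hub[replier] += new_auth[addressee]
        # Normalization choice (Euclidean norm) is an assumption of this sketch.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {node: v / a_norm for node, v in new_auth.items()}
        hub = {node: v / h_norm for node, v in new_hub.items()}
    return hub, auth


arcs = [("ben", "ana"), ("cat", "ana"), ("cat", "ben")]
in_deg, out_deg = degrees(arcs)
hub, auth = hits(arcs)
top_motivator = max(auth, key=auth.get)  # highest Authority
top_replier = max(hub, key=hub.get)      # highest Hub
```

In this toy graph the member who receives the most replies leads both In-degree and Authority, and the member who writes the most replies leads both Out-degree and Hub; on real networks the two rankings can differ because HITS weights each arc by the score of the member at its other end.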

As a sample of the key-members discovered, Tables 4.7 and 4.8 show the Top 10 motivator and replier key-members for the Creator Oriented graph, respectively. Member nicknames were replaced by their user IDs. The tables for the Reply and All Previous orientations are shown in Appendix A.1.

Table 4.7: Creator Oriented Motivators Key-members
In-Degree (Counting, Concept Based, LDA) and Authority (Counting, Concept Based, LDA)
User User User User User User User User677 User User User677 User User User User677 User User User User User User User User User677 User User User User User User User677 User User User User User User User User User677 User User User36 User User793 User36 User User User User User User User User User User User User User User793

Table 4.8: Creator Oriented Repliers Key-members
Out-Degree (Counting, Concept Based, LDA) and Hub (Counting, Concept Based, LDA)
User User User User User User User User User User User677 User User User User User User User User1 User1 User User User User677 User User User User User User User161 User User36 User User User User User161 User User User User User36 User User32 User User User7 User User32 User1 User User User User32 User677 User677 User1 User User677

Now that key-members are identified for each network, the precision of the detection has to be evaluated. For this evaluation, administrators were asked who the key-members are from their perspective. They not only provided a list of key-members, but also classified them from A to C according to their relevance for the community, A being the most important key-members and C the least. Table 4.9 shows the administrators' key-members. It is important to emphasize that administrators consider key-members of the same category to be equally relevant, so the order within a category does not represent an internal ranking. Comparing the key-members detected automatically with the administrators' key-members has two objectives.
First, to evaluate the precision of the algorithms from an administrator's point of view, and second, to understand which aspects administrators consider when they classify users as key-members.

Table 4.9: Administrators Key-members

User      Classification
User1     A
User      A
User3     A
User85    A
User36    B
User      B
User      B
User677   B
User      B
User      B
User      B
User161   B
User671   C
User759   C
User32    C
User7     C
User127   C
User733   C
User136   C
User28    C
User658   C
User22    C

Precision is measured with the classical Information Retrieval ratio. From the set of K key-members, let tp be the number of administrators' key-members that an algorithm detected. Then, precision p is calculated with Equation 4.3.

p = \frac{tp}{K} \qquad (4.3)

As key-members are classified in three categories, four types of precision were calculated. Table 4.10 presents the results for global key-member precision (of the key-members detected, how many really are key-members) for each combination of filter and orientation. The results are promising: except for the LDA - Creator combination, the precision of the algorithms is 50% or more for both motivators and repliers, which means that at least half of the key-members detected are also considered key-members by the administrators.

The All Previous orientation presents the best results. The explanation is that motivator key-members ask questions that are relevant for the community, and the effect of a good question is measured by how many people participate in the discussion. From the point of view of the replies, the All Previous orientation also presents good results. As experts, replier key-members give highly valued answers, so a reply to all previous replies will have a high value.
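Equation 4.3 can be illustrated with a few lines of code. This is a sketch under an assumption: K is read here as the number of key-members in the administrators' list for the category being evaluated. The user IDs are hypothetical and are not taken from the tables.

```python
def precision(detected, admin_key_members):
    """Equation 4.3: p = tp / K.

    detected: members ranked as key-members by an algorithm.
    admin_key_members: key-members named by the administrators (K of them).
    """
    admin = set(admin_key_members)
    if not admin:
        raise ValueError("the administrators' list must not be empty")
    tp = len(set(detected) & admin)  # detected members the administrators confirm
    return tp / len(admin)


# Hypothetical IDs: four administrator key-members, two of them detected.
admin_list = ["u1", "u2", "u3", "u4"]
detected_list = ["u1", "u9", "u3"]
p = precision(detected_list, admin_list)
```

With four members in a category, finding one member more or less moves the value by 25 percentage points, which is worth keeping in mind when reading results for small categories.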

Table 4.10: Global Key-members precision

                       Creator   Reply    Previous
Counting In-Degree     54.5%     63.6%    68.2%
Counting Authority     54.5%     63.6%    68.2%
CB In-Degree           59.1%     63.6%    68.2%
CB Authority           59.1%     59.1%    68.2%
LDA In-Degree          40.9%     63.6%    63.6%
LDA Authority          40.9%     63.6%    63.6%
Counting Out-Degree    72.7%     59.1%    68.2%
Counting Hub           54.5%     63.6%    63.6%
CB Out-Degree          68.2%     63.6%    63.6%
CB Hub                 50.0%     59.1%    68.2%
LDA Out-Degree         59.1%     59.1%    63.6%
LDA Hub                50.0%     63.6%    63.6%

The differences between Degree and HITS are not noticeable in the case of motivators, where both have almost always the same precision. Repliers, on the contrary, present a less uniform result: Out-Degree and Hub do not have the same precision for the same filter-orientation combination. Despite this, the variation is not significant and could be explained by the exclusion of a single member in one of the two algorithms.

The following precision measures are related to the type of key-member found. As administrators have their own key-member classification, it was necessary to apply it to the detected members, assuming that the rank algorithms behave similarly in that sense.

The results for the precision of Type A detection are presented in Table 4.11. Because there are only four Type A key-members, a variation of one member implies an increase or decrease of 25 percentage points in the precision of the algorithm. For the same reason, an increase in precision is not that significant. In general, the detection of replier key-members gives better results than the detection of motivators. This helps to understand what kind of key-member Type A is. A member is classified as Type A because his or her contribution is very helpful for the community and, from the administrators' point of view, this contribution consists of good answers in community discussions. The Creator Out-Degree combination has the best precision because it counts how many good answers a member posts.
Either way, in nearly every configuration the algorithms are capable of finding at least one Type A member, which is a good result considering how few of them there are.

Table 4.11: Type A Key-members precision

                       Creator   Reply    Previous
Counting In-Degree     25.0%     50.0%    50.0%
Counting Authority     25.0%     50.0%    50.0%
CB In-Degree           50.0%     25.0%    50.0%
CB Authority           50.0%     25.0%    50.0%
LDA In-Degree           0.0%     50.0%    25.0%
LDA Authority           0.0%     50.0%    25.0%
Counting Out-Degree    75.0%     25.0%    50.0%
Counting Hub           25.0%     50.0%    50.0%
CB Out-Degree          75.0%     50.0%    50.0%
CB Hub                 50.0%     25.0%    50.0%
LDA Out-Degree         50.0%     25.0%    25.0%
LDA Hub                25.0%     50.0%    25.0%

The precision of Type B key-member detection is shown in Table 4.12. The precision is more homogeneous than for Type A, probably because it is easier to find a key-member in a bigger set of candidates, and it is not necessary for the same member to be found by every algorithm. The results are similar for Degree and HITS, and they are high for each combination. The filters only have an effect in the Creator orientation, meaning that member contributions do not usually agree with the original post. In fact, the similarity in precision could be interpreted as describing a member who is both a motivator and a replier key-member, almost like a broker who on some occasions gives good answers and on others contributes by asking questions that continue the discussion or by bringing in Type A members who answer the original post.

Table 4.12: Type B Key-members precision

                       Creator   Reply    Previous
Counting In-Degree     75.0%     75.0%    75.0%
Counting Authority     75.0%     75.0%    75.0%
CB In-Degree           62.5%     75.0%    75.0%
CB Authority           62.5%     75.0%    75.0%
LDA In-Degree          37.5%     75.0%    75.0%
LDA Authority          37.5%     75.0%    75.0%
Counting Out-Degree    75.0%     75.0%    75.0%
Counting Hub           75.0%     75.0%    75.0%
CB Out-Degree          75.0%     75.0%    75.0%
CB Hub                 50.0%     75.0%    75.0%
LDA Out-Degree         50.0%     75.0%    75.0%
LDA Hub                50.0%     75.0%    75.0%

The results for Type C key-members do not present an easy-to-understand pattern. Table 4.13 presents the results, whose precision is lower than that shown previously. A first reading is that the algorithms are not capable of finding members of this type, but the results presented before demonstrate that they are. The second reading is therefore to interpret what type of key-member the administrators are defining. As the analysis has been made from an administrator's point of view, an explanation could be that this type of key-member was not active during the period studied: either they are historic key-members who participated in the community some time ago and whose participation is now moderate, or they simply did not participate much in the period analyzed. For that reason administrators recognize them as key-members, but the algorithms do not.

Besides, the behaviour across algorithms, in terms of precision range, is similar. Comparing Degree and HITS, no meaningful difference appears in their precisions. The All Previous orientation, however, presents the best results for this type of key-member, finding in general 30% of them. As in the previous analysis, the reason is the larger number of interactions that this configuration presents, which gives the opportunity to consider the same contribution multiple times.

Table 4.13: Type C Key-members precision

                       Creator   Reply    Previous
Counting In-Degree     10.0%     20.0%    30.0%
Counting Authority     10.0%     20.0%    30.0%
CB In-Degree           20.0%     30.0%    30.0%
CB Authority           20.0%     20.0%    30.0%
LDA In-Degree          20.0%     20.0%    30.0%
LDA Authority          20.0%     20.0%    30.0%
Counting Out-Degree    30.0%     20.0%    30.0%
Counting Hub           10.0%     20.0%    20.0%
CB Out-Degree          20.0%     20.0%    20.0%
CB Hub                 10.0%     20.0%    30.0%
LDA Out-Degree         30.0%     20.0%    30.0%
LDA Hub                20.0%     20.0%    30.0%

The previous analysis considers only the key-members that administrators could remember. Once the algorithms were applied, the results were presented to the administrators, and they were asked to recognize key-members among the members returned by the algorithms, in order to discover those who had been ignored in their previous review. The improvements are reflected not only in quantity, but also in algorithm precision. Table 4.14 presents a comparison between the number of administrator key-members before and after the review of the algorithm results. The increase in Global key-members produced by the administrators' review is noticeable: of the previous key-members, eight of Type B and C were re-classified as Type A, and nine of Type C were re-classified as Type B. Also, twelve Type A, nine Type B and seven Type C members were discovered by the algorithms and added to the set of administrator key-members.

Table 4.14: Administrator Key-members enhancements

           Admin   Algorithm
Type A     4       26
Type B     8       16
Type C     10      7
Global

Considering this new set of key-members, precision was calculated again.
Table 4.15 shows the improvements in the Global precision for each graph configuration. Content filters do not have a better precision than the SNA approach; they find a similar proportion of key-members. However, that does not mean that the algorithms find the same key-members. Two factors affect this result: first, administrators collected the new key-members by reviewing all the algorithm results together, and second, precision was calculated by considering how many of the 49 administrator key-members are in the top-25 rank of each configuration. Both factors help to understand the results presented in Table 4.15 and to realize that these numbers can be similar without representing the same members.

Table 4.15: Global Key-members precision enhancement

                       Creator   Reply    Previous
Counting In-Degree     84%       84%      84%
Counting Authority     100%      80%      88%
CB In-Degree           100%      80%      88%
CB Authority           84%       84%      84%
LDA In-Degree          48%       80%      72%
LDA Authority          48%       80%      72%
Counting Out-Degree    84%       80%      76%
Counting Hub           84%       80%      84%
CB Out-Degree          84%       84%      80%
CB Hub                 84%       84%      84%
LDA Out-Degree         72%       84%      84%
LDA Hub                68%       80%      80%

Apparently, with this approach it is not possible to establish that content filtering improves the finding of key-members. At least, its performance is similar to that of traditional SNA. The value of the threshold θ has to be reviewed, as well as the application of content filtering to find sub-networks that cannot be found by traditional SNA.

Key-members detection algorithm comparison

The previous section suggests that the results obtained by Degree and HITS are similar in terms of precision. It is then natural to think that it does not matter which algorithm is chosen to detect key-members, because both give similar results. But even though both algorithms work with node degrees, the difference lies in how they use them. While degree only counts the number of arcs and where they point, HITS uses the arcs in an iterative process whose output is a normalized rank score for each member.

In this section, the administrators' key-members are not needed, because it is the behaviour of the algorithms themselves that is measured. Also, even though the order in which key-members appear was not relevant before, it is considered in this case, because both algorithms arrange members according to the relevance calculated from their degrees.
To compare both algorithms, for each reply topology and content filter, In-degree is compared with Authority for motivator key-members, and Out-degree with Hub for replier key-members. The Kendall tau rank correlation coefficient is used for this analysis. This statistic calculates how similar two rankings are according to how the elements are arranged in them. Let X and Y be two comparable rankings of n elements, and let (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) be the pairs formed by the positions of each element in both rankings. A pair of observations (x_i, y_i) and (x_j, y_j) is defined as concordant if the ranks of both elements agree, i.e. x_i > x_j and y_i > y_j, or x_i < x_j and y_i < y_j. Otherwise, the pair is defined as discordant. The Kendall tau correlation coefficient is defined by Equation 4.4.

\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{\frac{1}{2} \, n (n - 1)} \qquad (4.4)

This coefficient lies in the range [-1, 1]: it is 1 when the two rankings are identical and -1 when one ranking is the inverse of the other. In addition to this coefficient, a significance test was made in which the null hypothesis is that the rankings are independent. The expected result is that degree and HITS are similar for each combination of topology, filter and kind of key-member, meaning that the coefficient will be close to 1 and the null hypothesis of ranking independence will be rejected.

Table 4.16: Kendall τ coefficient for rank algorithm

            Motivators Key-members               Repliers Key-members
            Counting   Concept Based   LDA       Counting   Concept Based   LDA
Creator     0.6179     0.8884          0.8830               0.6358          0.7558
Reply       0.7374     0.9222          0.8347    0.4898     0.4159          0.4725
Previous    0.6036     0.8622          0.8258    0.4272     0.3125          0.5008

Table 4.16 shows the Kendall τ coefficient between Degree and HITS for both motivator and replier key-members. The results are significant: the null hypothesis was rejected for every combination, which means that the two algorithms are dependent. This first conclusion is very logical, because both degree and HITS use the same network. The coefficients show that both algorithms arrange the members very similarly in the case of motivator key-members. For replier key-members, the algorithms show the same behaviour in the Creator orientation, but in Reply and All Previous the correlation is not that clear.

Graphically, the results are more evident. As a sample, scatter plots of the ranks given by Degree and HITS were made.
Figure 4.11 illustrates, for each filter, the scatter plots of In-Degree and Authority for Creator oriented networks from a motivator key-member point of view. It is clear that there is a correlation between both ranks, which confirms the results given by Kendall τ. The more elements lie on the diagonal, the more similar the rankings are.

Figure 4.12 shows the scatter plot for replier key-members in the Creator reply network. In this case, the Counting graph does not present a specific order, unlike the filtered graphs, which show a great similarity. This figure helps to emphasize a main purpose of this work. Filtered networks ignore the interactions that do not agree with the community concepts or topics. By deleting these interactions, an order of relevance among members appears that was not clear in the traditional network. For the other cases, where the correlation is not high, this only means that the algorithms are not so similar; in no case is it possible to judge which one is better.
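The Kendall τ coefficient used in this section can be computed directly from its definition. The sketch below counts concordant and discordant pairs over all pairs of elements; it assumes that neither ranking contains ties (a tied pair is counted as neither concordant nor discordant, which is a choice made for this example), and the sample rankings are made up.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau between two rankings given as position lists of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("rankings must have the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both rankings order the pair the same way
        elif sign < 0:
            discordant += 1   # the rankings disagree on the pair
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Positions of three members in two hypothetical rankings.
tau_same = kendall_tau([1, 2, 3], [1, 2, 3])
tau_reversed = kendall_tau([1, 2, 3], [3, 2, 1])
tau_one_swap = kendall_tau([1, 2, 3], [1, 3, 2])  # 2 concordant, 1 discordant
```

This direct version is quadratic in the number of members, which is acceptable for rankings of a few hundred members; a statistics library would normally be used for larger rankings and for the significance test.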

Figure 4.11: Motivators Creator reply rank Scatter plot

Figure 4.12: Repliers Creator oriented rank Scatter plot

Filter algorithm

The "Key-members Discovered" section presents the results obtained by comparison with the administrators' key-members. But on some occasions, key-members are unknown even to administrators. In this case, both traditional SNA and the proposed SNA and Text Mining approach are helpful to discover key-members. Compared with the traditional approach, the filters only consider valuable contributions, so if there is a loss of information, it comes from posts that do not contribute to the community, such as trolling, flooding, spam or other malicious posts. The key-member results therefore have to be different, because the set of relations considered is reduced. However, key-members are expected to appear whether the network is filtered or not, because participation is a factor in the algorithms. So, as mentioned in Section 1.2.3, the other expected result is that the ranking of the key-members changes in favor of the real key-members.

To evaluate the changes caused by network filtering, Kendall τ was used to obtain a correlation between the filtered and the traditional graphs. Table 4.17 contains the correlations between the filtered graphs and the traditional approach (Counting graph), for motivator and replier key-members. The correlations are low, which means that members are rearranged according to the content they generate. Besides, this correlation indicates that there is a relation between both graphs, which is true, because the filtered graphs derive from the Counting graph, and also because traditional SNA has served for years as a preliminary solution to the key-member discovery problem.

Table 4.17: Kendall τ coefficient for filter algorithm compared with Counting graph

                 Creator                  Reply                    Previous
                 In-Degree   Authority    In-Degree   Authority    In-Degree   Authority
Concept-Based    0.2755      0.2642       0.2915      0.3389       0.4052      0.4383
LDA              0.2291      0.1932       0.4373      0.4127       0.3312      0.3820

                 Out-Degree  Hub          Out-Degree  Hub          Out-Degree  Hub
Concept-Based    0.2832      0.2184       0.3012      0.3392       0.2842      0.4051
LDA              0.2593      0.1726       0.3409      0.4278       0.2271      0.2961

Summarizing, using text content filtering on community relationships is an improvement because, besides discovering the members who ask more questions or receive more replies, it captures the content that these members contribute to the community. It therefore presents key-members not only in terms of participation, but also in terms of contribution to the community.

Chapter 5 Conclusions and Future Work

The VCoP key-member discovery problem is commonly solved with SNA techniques. The present work suggests that applying only this approach results in a wrong judgment, because the content generated in the community is not considered. For that reason, the main objective of this thesis was to incorporate community content through Web Text Mining techniques, such as Concept Based and LDA, in order to improve the discovery of key-members.

As presented in Section 4.3, the main objective was accomplished by following the specific objectives stated earlier. Each objective was fulfilled, and their contribution to this thesis is presented in the following list.

1. In Section 3.3, the elements that compose the community's network were determined. Also, three topologies were designed according to how and to whom a member replies in the community.

2. The Concept Based (Section 3.2.2) and LDA (Section 3.2.3) algorithms were implemented in order to measure the content generated by members. With different approaches, both algorithms give a score to each post of the community, related to the concepts that administrators defined or to the topics extracted from the community.

3. An algorithm that includes community content was developed and explained. Three graphs resulted: the traditional graph and the filtered graphs made with Concept Based and LDA.

4. As presented in Chapter 4, SNA was applied as detailed in Section 3.4. Degree and HITS were applied, resulting in two kinds of key-members: motivator and replier key-members. The two kinds were analyzed and treated separately.

5. In Section 4.3, the results are shown, stating the improvements generated by using content to discover key-members.

In the following, the improvements obtained are fully detailed.

5.1 SNA-KDD methodology

In order to have a scheme that explains the novel approach proposed in this thesis, KDD was adapted into the SNA-KDD methodology, which includes an SNA step, used in this thesis for key-member discovery, as explained in Chapter 3. This approach will be helpful as a basis for later work.

The network configuration was also established. The three topologies, based on assumptions about how a user replies in a thread, are complementary, depending on what administrators need. The Creator reply network is helpful to find key-members who motivate participation. The Last reply network obtains the members whose replies agree with the post immediately before, measuring their capacity for direct interaction. Finally, the All Previous reply network helps to find members whose replies agree with the community purposes, because each reply is considered a global reply to the thread.

As the idea was to present a work structure for SNA that includes community content, the idea for future work is to add other features that SNA can provide, such as finding sub-communities or brokers, as discussed in Section 2.1, among others treated in [57].

5.2 Content Filtering

Concept Based and LDA were successfully implemented as content filters [1]. In the case of LDA, the topics extracted were recognized by administrators as relevant topics for the community. The filters were then applied to the community, obtaining a reduced graph that keeps only the interactions related to the community purposes or to the topics extracted, as explained previously. When the filters are used, the graph contains only highly valued interactions. This result is a great improvement for the SNA state of the art, because it proves that it is possible to obtain a graph representation that captures not only the interaction but also the quality of the community discussions, improving the later application of each SNA technique.
The interaction threshold used for this work was 0.8, meaning that the community was heavily filtered. The filtration is more noticeable with LDA than with Concept Based, as LDA has the lowest density in the majority of the cases. Which threshold is better for each filter was not discussed in this work, because the idea was to evaluate the quality of the filter. In future work, the value of the threshold has to be discussed in order to obtain the best filtration of the community.

[1] LDA was implemented by Gastón L'Huillier.
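The role of the threshold can be illustrated with a minimal sketch. It assumes that every arc already carries a content score between 0 and 1 produced by one of the filters (Concept Based or LDA); how that score is computed is outside the example. The function name, the tuple layout and the sample values are hypothetical.

```python
THETA = 0.8  # interaction threshold reported for this work


def filter_arcs(scored_arcs, theta=THETA):
    """Keep the arcs whose content score is greater than or equal to theta.

    scored_arcs: iterable of (replier, addressee, score) tuples, where score
    is the hypothetical content score assigned by a filter.
    """
    return [(replier, addressee)
            for replier, addressee, score in scored_arcs
            if score >= theta]


scored = [("ben", "ana", 0.92), ("cat", "ana", 0.80), ("cat", "ben", 0.35)]
kept = filter_arcs(scored)               # threshold 0.8
kept_loose = filter_arcs(scored, 0.3)    # a lower threshold keeps every arc
```

Running the same filter with several values of theta and comparing the resulting densities would be one simple way to start the threshold discussion left for future work.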

5.3 Density reduction

With the implementation of content filters, network density was reduced. In Section 4.3.2, annual networks were shown, demonstrating that the novel approach filters the content and reduces the amount of interaction. Moreover, the circular visualization illustrates how much the interactions decrease, which suggests that other types of visualization could be used to help understand the community behaviour graphically. Figure 5.1 displays the 2009 Creator orientation networks with the Kamada-Kawai visualization.

Figure 5.1: Kamada-Kawai Visualization for 2009 Creator oriented networks

Visualization would help administrators to reinforce the results obtained by SNA, and also to verify their own judgment when classifying users as key-members. In addition, this presentation could be clarifying as a preliminary analysis of the network, helping to discover sub-communities, for example by changing the color of each node depending on the sub-community it belongs to. Summarizing, visualization aims to bring a point of view of the network that is impossible to obtain without applying filters, because of the huge amount of interaction between members.

5.4 Key-members discovered

Traditional SNA finds key-members in terms of their participation in the community. Two definitions of key-members were presented according to the relations that arcs represent. The first type, called motivator key-members, corresponds to members whose posts are replied to by others, motivating and generating participation. The second type, called replier key-members, corresponds to members who reply more than others. Degree and HITS were used to find them, with In-Degree and Authority as the ranks for motivators, and Out-Degree and Hub as the ranks for repliers.
The first conclusion is that both algorithms find key-members with the same precision, but this does not mean that they are interchangeable, because the key-members found by each of them are not the same.
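The two rankings can be sketched in a few lines. The sketch assumes that every arc points from the replier to the member whose post is replied to, so that In-Degree and Authority rank motivators while Out-Degree and Hub rank repliers. The HITS loop is a plain power iteration with L2 normalization and is not the implementation used in this thesis; the member names are hypothetical.

```python
from collections import defaultdict


def degree_ranks(arcs):
    """Return (in_degree, out_degree) dictionaries for arcs (replier, poster)."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for src, dst in arcs:
        out_deg[src] += 1
        in_deg[dst] += 1
    return dict(in_deg), dict(out_deg)


def normalize(scores):
    """Scale a score dictionary to unit L2 norm (left unchanged if all zero)."""
    norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(arcs, iterations=50):
    """Basic HITS power iteration; returns (authority, hub) dictionaries."""
    nodes = {node for arc in arcs for node in arc}
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the members replying to this one.
        new_auth = {node: 0.0 for node in nodes}
        for src, dst in arcs:
            new_auth[dst] += hub[src]
        auth = normalize(new_auth)
        # Hub: sum of the authority scores of the members this one replies to.
        new_hub = {node: 0.0 for node in nodes}
        for src, dst in arcs:
            new_hub[src] += auth[dst]
        hub = normalize(new_hub)
    return auth, hub


def top(scores, k):
    """Top-k members by score, ties broken by name."""
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]


# Hypothetical reply network: (replier, poster).
arcs = [("b", "a"), ("c", "a"), ("d", "a"), ("d", "b"), ("d", "c")]
in_deg, out_deg = degree_ranks(arcs)
auth, hub = hits(arcs)

print("motivators:", top(in_deg, 1), top(auth, 1))
print("repliers:", top(out_deg, 1), top(hub, 1))
```

In this toy network both algorithms agree on the top member of each type. On a real community the two lists can differ, which is consistent with the observation that the algorithms are not substitutes.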

As explained in section 4.3.5, no loss of information was found when the filters were applied, only a rearrangement of the ranking, which means that a content-filtered network can be used to discover key-members. The precision decreases slightly, but rather than a loss in finding quality, this reflects a new way of presenting key-members. The reason is that this kind of graph considers only participation that contributes to the community, so the key-members found are members whose posts agree with the community purposes or the topics treated. Finding this new kind of key-member helps administrators to refine their point of view, by differentiating two definitions of key-members and by presenting to them members who are not considered key-members but in fact are, which enhances the list of the community's key-members.

With the filtered graph, motivator key-members are members whose posts fulfill the community concepts or topics and motivate the participation of other members in terms of post content. Replier key-members, on the other hand, are members whose replies agree both with the post being replied to and with the content desired by the community.

The results were shown to the administrators in order to obtain their evaluation. The outcome was an increase in the number of key-members recognized by the administrators, which implies a better precision for each algorithm and graph configuration. It is possible to conclude that the SNA and Content Filtering approaches present similar results when finding key-members, at least in terms of proportion. Therefore, it is important to determine how Content Filtering could represent a real improvement to the key-member discovery problem.

5.5 Future Work

The present work is a starting point for different work areas, because it considers many different approaches.
In this section, guidelines for future work are presented, focused on the content of the community, new representations of a network, and the development of SNA applications.

Content contribution

At this moment, community content is used only as an interaction filter. If a reply does not match the post it replies to, the interaction is deleted. Moreover, if an interaction is valid and another valid interaction appears between the same members, the second one does not count, because all that matters is whether an interaction exists between two members. The next step for this approach is to consider the contribution of the contents generated by a member. Incorporating an interaction weight could further improve key-member detection, because it would consider not only which members participate and contribute, but also the quality of all their contributions.
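A minimal sketch of the weighting idea, under the assumption that the weight of an arc is simply the number of valid replies between two members. A quality score per reply could replace the count, but how to define it is left open in this work. All names and data below are hypothetical.

```python
from collections import Counter


def weighted_arcs(valid_replies):
    """Count every valid reply, so repeated interactions add weight to an arc."""
    return Counter((replier, poster) for replier, poster in valid_replies)


def weighted_in_degree(weights):
    """Sum the weights of the incoming arcs of every member."""
    totals = Counter()
    for (_, poster), weight in weights.items():
        totals[poster] += weight
    return totals


# Hypothetical valid replies after content filtering: (replier, poster).
valid_replies = [
    ("u2", "u1"), ("u2", "u1"), ("u3", "u1"),
    ("u1", "u3"), ("u2", "u3"), ("u2", "u3"), ("u2", "u3"),
]

weights = weighted_arcs(valid_replies)

# Current behaviour: an arc either exists or not, whatever the number of replies.
binary_in = Counter(poster for (_, poster) in weights)
# Weighted alternative: repeated valid replies are accumulated.
weighted_in = weighted_in_degree(weights)

print("binary in-degree:", dict(binary_in))
print("weighted in-degree:", dict(weighted_in))
```

In this example u1 and u3 are tied in the unweighted network, while the weighted version separates them.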

How to incorporate the quality of contributions, how to measure multiple contributions, and which ranking algorithms to use remain to be discussed in later research. One possibility is to use a weighted-HITS approach, but the value and range of the arc weights have to be discussed first. The value of the threshold θ also has to be evaluated. In this thesis, θ has the value 0.8, which represents strong filtering. In future work, different values of θ have to be applied to filter content, and the quality of the resulting graphs has to be evaluated.

Concepts approach

Concept Based is used to score posts according to previously established concepts, defined by the administrators. On occasion, administrators may not include all the concepts treated in the community. Moreover, the community evolves through time, so the concepts change too, and administrators may not capture these changes. LDA extracts the topics of the community, which makes it possible to obtain the actual topics treated by its members. The work of the administrators is then to recognize what these topics mean. In the future, how to use automatic topic extraction systems has to be evaluated, as well as how to analyze topic evolution through time in a VCoP.

The concepts approach also has more applications than key-member discovery. If administrators define concepts of bad behaviour, such as trolling or spam, the approach could help to moderate the community by catching the key-members of these concepts. It could also help to study the health of the community in terms of how many junk posts it contains, and to analyze the evolution of these concepts through time.

The evolution of purposes through time in a VCoP was studied in [52]. That work analyzed the community as a whole, but not the evolution of its members through time.
If the same idea is used to study the behaviour of members through time, it could be possible to research issues such as the evolution of members from common member to key-member, the evolution of member participation, and member churn detection.

Thematic networks

In this thesis, all concepts or topics are compared to measure the global contribution of a post, but it is possible to isolate the concepts in order to build a network that includes only the highly valued posts of a specific concept or topic, creating a new kind of network representation, defined as a Thematic network. With this network configuration, the key-members found are related to specific concepts or topics. With this approach, administrators' efforts to enhance the community can be more focused, because the question of who knows what will be answered.
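The thematic network idea can be sketched as follows, assuming that each reply carries one score per topic and that a reply enters the network of a topic when its score for that topic is at least θ. The topic names, the scores, and the helper functions are hypothetical. The thesis does not prescribe this construction.

```python
from collections import Counter


def thematic_network(scored_replies, topic, theta):
    """Arcs (replier, poster) whose score for the given topic reaches theta."""
    return {
        (replier, poster)
        for replier, poster, scores in scored_replies
        if scores.get(topic, 0.0) >= theta
    }


def in_degree_leader(arcs):
    """Member with the most incoming arcs, i.e. the top motivator of the topic."""
    return Counter(poster for _, poster in arcs).most_common(1)[0][0]


# Hypothetical replies: (replier, poster, score of the reply for each topic).
scored_replies = [
    ("u2", "u1", {"hardware": 0.90, "software": 0.10}),
    ("u3", "u1", {"hardware": 0.85, "software": 0.20}),
    ("u1", "u4", {"hardware": 0.10, "software": 0.95}),
    ("u3", "u4", {"hardware": 0.30, "software": 0.90}),
]

hardware_net = thematic_network(scored_replies, "hardware", theta=0.8)
software_net = thematic_network(scored_replies, "software", theta=0.8)

print("hardware motivator:", in_degree_leader(hardware_net))
print("software motivator:", in_degree_leader(software_net))
```

Each topic yields its own network and therefore its own ranking, which is what would let administrators see who contributes on which subject.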

SNA and Topic extraction computational tool

Every algorithm used in this thesis was programmed only to obtain the data needed. The code was not optimized and the implemented algorithms are not efficient. Even graphical interfaces were omitted in favor of batch processes that produce all the network configurations needed. Therefore, a benchmark of the software used is not possible. A database was modeled and used to store and process the data, which could be useful for later work, but a computational tool that performs all of the data processing described above is still needed. Among the features that such a tool should have are:

- A multidimensional database with the needed information, as shown in section 4.3.
- A step-by-step framework that extracts the topics, measures the concept and topic post scores, configures the filtered and non-filtered networks (with the option of varying the threshold), applies the SNA features, and presents a benchmark between algorithms.
- An end-user OLAP tool that presents the evolution of concepts, topics and user behaviour through time in the community.
- Graphical visualization of the networks and of the results obtained by SNA.
- A repository to store the resulting networks.
- A report generator for the experiments performed.

This software would make the results easier and quicker to obtain, and would establish a standard for VCoP analysis that would benefit later research.

Conferences and Workshops

The authors would like to thank the continuous support of Instituto Sistemas Complejos de Ingeniería (ICM: P F, CONICYT: FBO16); Initiation into Research Funding (FONDECYT), project code , entitled "Semantic Web Mining Techniques to Study Enhancements of Virtual Communities"; and the Web Intelligence Research Group (wi.dii.uchile.cl).

[1] H. I. Alvarez, S. Ríos, F. Aguilera, G. L'Huillier. Enhancing SNA with a Concept-based Text Mining Approach to discover key members on a VCoP. In KES 10: 14th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems. Cardiff, Wales, England,
[2] G. L'Huillier, H. I. Alvarez, S. Ríos, F. Aguilera. Topic-Based Social Network Analysis for Virtual Communities of Interests in the Dark Web. ISI-KDD 10: ACM SIGKDD Workshop on Intelligence and Security Informatics. Washington DC, USA,

REFERENCES

[1] R. Alberich, J. Miro-Julia, and F. Rossello. Marvel universe looks almost like a real social network. February
[2] Xavier Amatriain, Neal Lathia, Josep M. Pujol, Haewoon Kwak, and Nuria Oliver. The wisdom of the few: a collaborative filtering approach based on expert opinions from the web. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages , Boston, MA, USA, ACM.
[3] István Bíró, Dávid Siklósi, Jácint Szabó, and András A Benczúr. Linked latent dirichlet allocation in web spam filtering. pages 37-40,
[4] D Blei, A Ng, and M Jordan. Latent dirichlet allocation. The Journal of Machine Learning ..., Jan
[5] A. Bourhis, L. Dubé, R. Jacob, et al. The success of virtual communities of practice: The leadership factor. The Electronic Journal of Knowledge Management, 3(1):23-34,
[6] Christopher S. Campbell, Paul P. Maglio, Alex Cozzi, and Byron Dom. Expertise identification using communications. In Proceedings of the 12th international conference on Information and knowledge management, pages , New Orleans, LA, USA, ACM.
[7] Deepayan Chakrabarti and Christos Faloutsos. Graph mining: Laws, generators, and algorithms. ACM Comput. Surv., 38(1):2,
[8] Chih-Jou Chen and Shiu-Wan Hung. To give or to receive? factors influencing members' knowledge sharing and community promotion in professional virtual communities. Information & Management, 47(4): , May
[9] Chao-Min Chiu, Meng-Hsiang Hsu, and Eric T.G. Wang. Understanding knowledge sharing in virtual communities: An integration of social capital and social cognitive theories. Decision Support Systems, 42(3): , December
[10] Anthony Cocciolo, Hui Soo Chae, and Gary Natriello. Using social network analysis to highlight an emerging online community of practice. In Proceedings of the 8th international conference on Computer supported collaborative learning, pages , New Brunswick, New Jersey, USA, International Society of the Learning Sciences.

[11] Pedro O.S. Vaz de Melo, Virgilio A.F. Almeida, and Antonio A.F. Loureiro. Can complex network metrics predict the behavior of NBA teams? In Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages , Las Vegas, Nevada, USA, ACM.
[12] Wouter de Nooy, Andrej Mrvar, and Vladimir Batagelj. Exploratory Social Network Analysis with Pajek. Cambridge University Press,
[13] Kristine de Valck, Gerrit H. van Bruggen, and Berend Wierenga. Virtual communities: A marketing perspective. Decision Support Systems, 47(3): , June
[14] Peter Eades. A heuristic for graph drawing. Congressus Numerantium, 42: ,
[15] Kate Ehrlich, Ching-Yung Lin, and Vicky Griffiths-Fisher. Searching for experts in the enterprise: combining text and social network analysis. In Proceedings of the 2007 international ACM conference on Supporting group work, pages , Sanibel Island, Florida, USA, ACM.
[16] Yu-Hui Fang and Chao-Min Chiu. In justice we trust: Exploring knowledge-sharing continuance intentions in virtual communities of practice. Computers in Human Behavior, 26(2): , March
[17] Santo Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75-174, February
[18] Thomas M. J. Fruchterman and Edward M. Reingold. Graph drawing by force-directed placement. Software: Practice and Experience, 21(11): ,
[19] Yupeng Fu, Rongjing Xiang, Yiqun Liu, Min Zhang, and Shaoping Ma. Finding experts using social network analysis. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, pages IEEE Computer Society,
[20] Joaquín Gairín-Sallán, David Rodríguez-Gómez, and Carme Armengol-Asparó. Who exactly is the moderator? a consideration of online knowledge management network moderation in educational organisations. Computers & Education, 55(1): , August
[21] Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis, Second Edition (Chapman & Hall/CRC Texts in Statistical Science). Chapman & Hall, 2 edition, July
[22] Lise Getoor and Christopher P. Diehl. Link mining: a survey. SIGKDD Explor. Newsl., 7(2):3-12,
[23] Pablo M Gleiser. How to become a superhero. Journal of Statistical Mechanics: Theory and Experiment, 2007(09):P09020-P09020,
[24] T Griffiths. Finding scientific topics. Number 101, pages ,
[25] G Heinrich. Parameter estimation for text analysis. Technical report,

[26] John Hagel III and Arthur G. Armstrong. Net gain: expanding markets through virtual communities. Harvard Business School Press,
[27] T. Kamada and S. Kawai. An algorithm for drawing general undirected graphs. Inf. Process. Lett., 31(1):7-15, April
[28] Won Kim, Ok-Ran Jeong, and Sang-Won Lee. On social web sites. Information Systems, 35(2): , April
[29] Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46(5): ,
[30] Gueorgi Kossinets. Effects of missing data in social networks. Social Networks, 28(3): , July
[31] Ravi Kumar, Jasmine Novak, and Andrew Tomkins. Structure and evolution of online social networks. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages , Philadelphia, PA, USA, ACM.
[32] Haewoon Kwak, Yoonchan Choi, Young-Ho Eom, Hawoong Jeong, and Sue Moon. Mining communities in networks: a solution for consistency and its evaluation. pages ,
[33] Ohbyung Kwon and Yixing Wen. An empirical study of the factors affecting social network service use. Computers in Human Behavior, 26(2): , March
[34] Ming-Ji James Lin, Shiu-Wan Hung, and Chih-Jou Chen. Fostering the determinants of knowledge sharing in professional virtual communities. Computers in Human Behavior, 25(4): , July
[35] Xiaoyong Liu, W. Bruce Croft, and Matthew Koll. Finding experts in community-based question-answering services. In Proceedings of the 14th ACM international conference on Information and knowledge management, pages , Bremen, Germany, ACM.
[36] Stanley Loh, José Palazzo M. de Oliveira, and Mauricio A. Gameiro. Knowledge discovery in texts for constructing decision support systems. Applied Intelligence, 18(3): , May
[37] Andrew McCallum, Andrés Corrada-Emmanuel, and Xuerui Wang. Topic and role discovery in social networks. In Proceedings of the 19th international joint conference on Artificial intelligence, pages , Edinburgh, Scotland, Morgan Kaufmann Publishers Inc.
[38] Andrew McCallum, Xuerui Wang, and Andrés Corrada-Emmanuel. Topic and role discovery in social networks with experiments on enron and academic . J. Artif. Int. Res., 30(1): ,
[39] Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, pages 29-42, San Diego, California, USA, ACM.

[40] Katarzyna Musiał, Przemysław Kazienko, and Piotr Bródka. User position measures in social networks. In Proceedings of the 3rd Workshop on Social Network Mining and Analysis, pages 1-9, Paris, France, ACM.
[41] H Nakanishi, IB Turksen, and M Sugeno. A review and comparison of six reasoning methods. Fuzzy Sets and Systems, Jan
[42] Robert D Nolker and Lina Zhou. Social computing and weighting to identify member roles in online communities. Web Intelligence, IEEE / WIC / ACM International Conference on, 0:87-93,
[43] Nishith Pathak, Colin Delong, Arindam Banerjee, and Kendrick Erickson. Social topic models for community extraction. Aug
[44] Adam Perer and Ben Shneiderman. Integrating statistics and visualization: case studies of gaining clarity during exploratory data analysis. In Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, pages , Florence, Italy, ACM.
[45] Ulrike Pfeil. Online support communities for older people: investigating network patterns and characteristics of social support. SIGACCESS Access. Comput., (89):35-41,
[46] Ulrike Pfeil and Panayiotis Zaphiris. Investigating social network patterns within an empathic online community for older people. Computers in Human Behavior, 25(5): , September
[47] Ulrike Pfeil and Panayiotis Zaphiris. Investigating social network patterns within an empathic online community for older people. Computers in Human Behavior, 25(5): , September
[48] X H Phang and CT Nguyen. Gibbslda++ (http://gibbslda.sourceforge.net/),
[49] Gilbert Probst and Stefano Borzillo. Why communities of practice succeed and why they fail. European Management Journal, 26(5): , October
[50] Indra Rajasingh, Bharati Rajan, and Florence Isido D. Betweeness-Centrality of grid networks. In Computer Technology and Development, International Conference on, volume 1, pages , Los Alamitos, CA, USA, IEEE Computer Society.
[51] Sebastián Ríos. A Study on Web Mining Techniques for Off-Line Enhancements of Web Sites. PhD thesis, University of Tokyo,
[52] Sebastián Ríos, Felipe Aguilera, and Luis Guerrero. Virtual communities of practice's purpose evolution analysis using a Concept-Based mining approach. In Knowledge-Based and Intelligent Information and Engineering Systems, pages
[53] G Salton, A Wong, and C S Yang. A vector space model for automatic indexing. Commun. ACM, Vol. 18(11): ,

[54] S.L. Toral, M.R. Martínez-Torres, and F. Barrero. Analysis of virtual communities supporting OSS projects using social network analysis. Information and Software Technology, 52(3): , March
[55] J.D. Velasquez and V. Palade. Adaptive Web Sites: A Knowledge Extraction from Web Data Approach. IOS Press,
[56] Jyun-Cheng Wang, Chui-Chen Chiu, and Jr jing Tang. The correlation study of ewom and product sales predictions through SNA perspectives: an exploratory investigation by taiwan's cellular phone market. In Proceedings of the 7th international conference on Electronic commerce, pages , Xi'an, China, ACM.
[57] S Wasserman and K Faust. Social Network Analysis: Methods and Applications
[58] Etienne Wenger, Richard Arnold McDermott, and William Snyder. Cultivating communities of practice. Harvard Business Press,
[59] Dongshan Xing and Mark Girolami. Employing latent dirichlet allocation for fraud detection in telecommunications. Pattern Recognition Letters, Vol. 28(13): ,
[60] K. Yelupula and Srini Ramaswamy. Social network analysis for classification. In Proceedings of the 46th Annual Southeast Regional Conference on XX, pages , Auburn, Alabama, ACM.
[61] Jun Zhang, Mark S. Ackerman, and Lada Adamic. Expertise networks in online communities: structure and algorithms. In Proceedings of the 16th international conference on World Wide Web, pages , Banff, Alberta, Canada, ACM.
[62] Bin Zhu, Stephanie Watts, and Hsinchun Chen. Visualizing social network concepts. Decision Support Systems, 49(2): , May

Appendix

Appendix A

SNA-KDD Results

The following results are shown as a complement to the results presented and discussed in section 4.3.

A.1 Key-member obtained

The top 10 motivators and repliers are shown in the following tables. The nicknames were replaced by their community user IDs.

Table A.1: Reply Oriented Motivators Key-members

In-Degree (Counting, Concept Based, LDA) | Authority (Counting, Concept Based, LDA)

User User User User User User User1 User1 User1 User1 User1 User1 User User User User User User User User User User User User36 User User User36 User User User User User36 User User User36 User User36 User User User36 User User User161 User677 User User161 User677 User User User User User User User User User User User User161

Table A.2: Reply Oriented Repliers Key-members

Out-Degree (Counting, Concept Based, LDA) | Hub (Counting, Concept Based, LDA)

User User User User User User User User User User1 User1 User1 User1 User User User User User User User User User User User User User1 User1 User36 User User User User User User User User36 User161 User User161 User User36 User User User32 User677 User User User677 User32 User User User User161 User User User36 User32 User User User161

Table A.3: All Previous Oriented Motivators Key-members

In-Degree (Counting, Concept Based, LDA) | Authority (Counting, Concept Based, LDA)

User User User User User User User1 User User36 User1 User User36 User User1 User User User1 User User36 User User User36 User User User User User1 User User User1 User User User161 User User User User User161 User User User161 User161 User User677 User User User677 User User User32 User User User32 User User161 User User User161 User User

Table A.4: All Previous Oriented Repliers Key-members

Out-Degree (Counting, Concept Based, LDA) | Hub (Counting, Concept Based, LDA)

User User User User User User User User User User1 User User36 User User User User36 User1 User User1 User User677 User User User User36 User1 User36 User User User1 User User User1 User User User161 User User User161 User User161 User User161 User161 User User User677 User User User36 User User User32 User User32 User User User161 User User

A.2 Key-member Database

Figure A.1 displays the resulting database after the key-member discovery process.

Figure A.1: Key-member Final Database.


More information

90 HOURS PROGRAMME LEVEL A1

90 HOURS PROGRAMME LEVEL A1 90 HOURS PROGRAMME LEVEL A1 GENERAL AIMS On completing this course, students should be able to: be familiar with the Spanish alphabet letters and signs and relate them to the corresponding sounds. recognise

More information

ENVIRONMENT: Collaborative Learning Environment

ENVIRONMENT: Collaborative Learning Environment Guía Integrada de Actividades Contexto de la estrategia de aprendizaje a desarrollar en el curso: The activity focuses on the Task Based Language Learning (TBLL). The task is used by the student in order

More information

Taller de Emprendimiento 2 IESE Business School Version 06.07 LMC

Taller de Emprendimiento 2 IESE Business School Version 06.07 LMC Taller de Emprendimiento 2 IESE Business School Version 06.07 LMC . Anuncio del taller de emprendimiento madrid 2013.pdf Bibliografía The Startup Owners Manual, Steve Blank, Ranch House 2012 http://steveblank.com/2012/11/27/open-source-entrepreneurship/

More information

TEACHER GUIDE STRATEGIES ACHIEVE READING SUCCESS. CURRICULUM ASSOCIATES, Inc. STARS SERIES E SPANISH EDITION

TEACHER GUIDE STRATEGIES ACHIEVE READING SUCCESS. CURRICULUM ASSOCIATES, Inc. STARS SERIES E SPANISH EDITION TEACHER GUIDE STARS SERIES E SPANISH EDITION STRATEGIES TO ACHIEVE READING SUCCESS PROPORCIONA ACTIVIDADES DE ENSEÑANZA PARA 12 ESTRATEGIAS DE LECTURA USA UN SISTEMA DE VARIOS PASOS PARA LOGRAR ÉXITO EN

More information
