HKUST-MIT Research Alliance Consortium
Call for Proposal
Data Science and E-learning Research
[Draft: 28 Feb 2015]

Background

Heterogeneous data derived from pervasive sensing, web and mobile technologies are being generated at an unprecedented scale and complexity. Data originating from various domains, such as medical sensors, energy monitoring networks, scientific measurements, financial transactions, e-learning platforms, web interactions and social media, are becoming more readily available. These data are pervasive and persist over time in different forms of representation. The term Big Data was introduced to describe such enormous data. So far, there is no clear definition of Big Data. Wikipedia describes Big Data as datasets that grow so large that they become awkward to work with using on-hand database management tools. The challenges include capture, curation, storage, search, sharing, analysis and visualization.

Managing and analyzing big data brings many challenges. To address them, we need advanced techniques for capturing data (sensors, RFID, cameras), storing data (data center and network architecture), modeling data (databases), mining data (machine learning and data mining), and visualizing data (visualization). However, despite successes in each of these areas separately, a collective and integrated effort by researchers from these different fields to address the challenges of big data has yet to occur.

The Data Science and E-learning Research Cluster aims to support multidisciplinary teams in conducting research and developing technologies in data science and e-learning. In this Call for Proposals, proposals are sought to address challenging problems that will advance teaching and learning through research in data science, enhance teaching and learning on and off campus, and expand access to high quality education for everyone.
The main topics of interest include, but are not limited to, the following areas:

- Design and development of e-learning platforms
- E-learning content development
- Learning analytics on structured and unstructured data
- Crowdsourcing and social networks for educational purposes
- Knowledge mining from e-learning content
- Bridging big data analytics and behavioral sciences

Although e-learning has been identified as the main problem domain, we expect that the research results and the technologies developed will be applicable to other problem domains, including energy management, health care systems, retail and financial services, public policy formulation and social networks.

Data Analytics in E-learning
Influenced by recent studies and trends, education has gone through a paradigm shift from teacher-centered to learner-centered pedagogy. Much emphasis is now placed on learning outcomes and mastery of subjects using new pedagogical approaches such as blended, active and collaborative learning. With the rapid development of information and communication technology, access to high quality digital learning resources, such as multimedia content, is becoming increasingly pervasive. The availability of learning management platforms and social media provides better support for interaction and collaboration among learners. Advances in Web computing (HTML5) allow real-time monitoring of students, potentially providing new insights into how students learn. Emerging mobile and wireless technologies have created ubiquitous learning environments that allow learning to be carried out anytime and anywhere using mobile devices. The convergence of these new pedagogical and technological developments has led to new approaches in the delivery of teaching and learning, including blended learning, flipped courses and Massive Open Online Courses (MOOCs).

MOOCs drew a lot of attention in 2012 following the founding of Coursera by two Stanford professors and the launching of edX by MIT and Harvard. The New York Times hailed 2012 as The Year of the MOOC [1] and Time Magazine featured MOOCs as the cover story in Oct 2012 [2]. While Coursera and edX focus primarily on university level courses, Khan Academy, a non-profit educational startup founded by Sal Khan in 2006, mainly targets primary and secondary school students. To date, Khan Academy has produced over 4,500 lectures covering topics ranging from Mathematics, Biology, Physics and Chemistry to History and Economics. The site has delivered over 260 million lessons to millions of students. This groundbreaking initiative has attracted investments from both Bill Gates and Google.
The MOOC landscape is currently dominated by US providers, in particular Coursera and edX. However, other countries and regions, including the UK, Australia and the EU, have recently announced the establishment of their own MOOC platforms. For fear of being left behind, major universities in Mainland China (C9) announced in July 2013 that they would jointly develop an e-learning/MOOC platform in China [3]. Subsequently, Tsinghua launched XuetangX in October 2013 using open-source Open edX technology. In September 2013, edX teamed up with Google to co-develop the portal mooc.org [4], and the following month edX announced that it would work with the French Ministry of Higher Education to create a national online learning portal using Open edX technology [5]. Since September 2012, HKUST has become a partner of both Coursera and edX, while CUHK and HKU have joined Coursera and edX respectively.

MOOC platforms allow the learning activities of students to be recorded (the resulting records are often referred to as clickstreams). As reported in [6], over 230 million clickstream events (including video views, discussion forum postings, quiz participations, etc.), together with the associated IP addresses and IDs of around 150,000 students, were recorded when the MIT course Circuits and Electronics was offered on edX. Since April 2013, HKUST has offered the first three MOOCs from Asia on Coursera and attracted over 130,000 students worldwide. Learning analytics have been performed on data collected from these MOOCs and some encouraging preliminary results have been obtained.
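As a rough illustration of the kind of clickstream summarization described above, the following minimal Python sketch counts events of each type per student and lists students with no recorded activity of a given type. The record layout (student ID, event type), the event names and the sample data are hypothetical assumptions made for illustration; they do not reflect the actual Coursera or edX data formats, nor the analyses performed at HKUST.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream records: (student_id, event_type).
# The field layout and event names are illustrative assumptions,
# not an actual Coursera or edX export schema.
SAMPLE_EVENTS = [
    ("s1", "video_view"),
    ("s1", "video_view"),
    ("s1", "quiz_attempt"),
    ("s2", "forum_post"),
    ("s2", "video_view"),
    ("s3", "video_view"),
]


def summarize_clickstream(events):
    """Count how many events of each type were recorded per student."""
    per_student = defaultdict(Counter)
    for student_id, event_type in events:
        per_student[student_id][event_type] += 1
    return {student: dict(counts) for student, counts in per_student.items()}


def inactive_students(summary, event_type):
    """Return, in sorted order, the students with no event of the given type."""
    return sorted(
        student
        for student, counts in summary.items()
        if counts.get(event_type, 0) == 0
    )


summary = summarize_clickstream(SAMPLE_EVENTS)
print(summary["s1"])
print(inactive_students(summary, "quiz_attempt"))
```

A per-student summary of this kind is only a starting point; real clickstream data would also carry timestamps and resource identifiers, which are omitted here for brevity.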
It is important to emphasize that the impact of e-learning/MOOC platforms goes far beyond content development and delivery. Educators can collect data on students' study patterns and perform learning analytics to understand how students learn and how teaching could be enhanced in brick-and-mortar institutions.

Perspectives from Education Industry

Hong Kong has an exemplary education system and is one of the top spenders per capita in education. Its K-12 and higher education systems enjoy both competitive rankings and a glowing reputation worldwide. Annually, more than 6,000 new Science and Engineering graduates feed into the talent pool, ready to power the fast-paced and rapidly evolving technology industry. Furthermore, the penetration rate of internet and mobile devices is amongst the highest in the world. A strong legal system and guaranteed freedom of the press secure Hong Kong's position as a global information hub. Eastern and Western cultures coexist and integrate seamlessly side by side.

Close proximity to mainland China provides a huge geographic and cultural advantage to Hong Kong. China is already the biggest mobile market in the world. Combined with the high propensity of Chinese families to invest a significant proportion of their household income in education, this means that China is potentially an immense and dynamic market for educational products and services in the coming decades. All these factors favor Hong Kong economically, culturally and strategically as a major resource and player in spearheading e-learning education in the greater China region.

A forum was organized on 28 September 2013 at HKUST to brainstorm on the challenges and opportunities of utilizing data science for e-learning developments. The forum was attended by practitioners from the tertiary, primary and secondary sectors as well as representatives from the IT and education industries.
The following issues were identified as areas of interest to stakeholders of the IT and education sectors in Hong Kong.

Research challenges in Data Science and E-learning

The goal of Data Science is to take advantage of the increasing volume, velocity and variety of scientific data in order to make new discoveries and create new knowledge. The field is based on mathematics, data analytics, machine learning, statistical learning and many areas of computation. It requires advances in high performance computing, cloud computing, data warehousing, unstructured data stores, data integration across multiple media, and visualization. Using these techniques, we seek to make sense of data and discover new insights about the world around us by bringing together data, models and analytics. Data science is spawning new fields of research such as Social Physics, Network Science and Connection Science, and requires that a new generation of researchers be trained in these techniques. It is being applied to problems in areas such as education, healthcare, the Internet of Things, cyber security, business analytics and city science ([7], [8] & [9]). Some of the research challenges in data science and e-learning are briefly elaborated below.
E-learning platform

Various e-learning platforms are currently available to host e-learning content. Web and cloud computing are the enabling technologies that allow smooth delivery of multimedia digital content to a large number of users around the world, and potentially allow new data gathering techniques to be applied. Big data research could make use of the large amounts of data collected from e-learning platforms to enhance the learning experience of students. Research will be conducted to investigate design and performance issues in the development of e-learning platforms.

- Web and cloud computing
- Data storage and retrieval
- Mobile and cross-platform integrations
- Design and performance evaluation of new e-learning platforms and tools

Content development

Authoring tools are essential components for creating digital educational content. Teachers can make use of existing learning materials such as those produced for the Khan Academy and other MOOC platforms. In particular, Khan Academy's video tutorials in Mathematics, Physics, Chemistry and Biology are deemed highly relevant to the primary and secondary curricula in Hong Kong. Khan Academy style videos have been used by many K-12 schools in the United States to experiment with innovative blended learning approaches. Studies need to be conducted to explore how these approaches, together with additional content in Chinese and English languages and Liberal Studies, could be adapted in Hong Kong and the greater China region. Online lessons similar to the Advanced Placement (AP) courses being developed by edX in collaboration with the College Board in the US [10] can also be developed for secondary students in Hong Kong and the region. Standards and cross-platform tools for online and mobile learning need to be developed for porting, migrating and sharing e-learning content.

- Authoring tools for learning objects
- New applications and tools to support students' learning
- New tools for remote testing and accreditation
- Creating and adopting Khan Academy style lecture videos
- Content development in STEM, Chinese, English and Liberal Studies
- Bridging courses for familiarizing secondary students with university education

Learning Analytics

Learning analytics will be performed on both structured data (e.g. data resulting from student profiles, video access and quizzes) and unstructured data (e.g. data extracted from discussion forums and peer-to-peer assessments). Visualization and utility tools will need to be developed for analyzing the vast amount of data and presenting it in an interactive manner. Based on the results of learning analytics, tools can be developed to deliver just-in-time feedback, offer advice on improving the delivery of teaching, identify students with special needs and design assessment tasks to evaluate learning outcomes. Tools and models will need to be developed to validate the causal relationships derived from learning analytics.

- Learning analytics on structured and unstructured data
- Machine learning and data mining
- Visualization and utility tools for big data
- Development of feedback-based teaching plans and advising systems
- Design of formative assessment tasks
- Validation tools for the causal relationships derived from learning analytics

Crowdsourcing and social network

Most e-learning platforms provide discussion forums and collaborative tools that facilitate interactions between teachers and students as well as among students. Participation in such forums helps students to develop transferable skills such as critical thinking, teamwork, and questioning and answering skills. Research in data mining, machine learning, social networks and natural language processing could provide a better understanding of how peer instruction and social and collaborative learning could help to engage students more effectively. Such studies can have far-reaching implications for understanding the behavioral patterns and psychology of learners. Models can be developed for innovative online assessment approaches, including peer assessment.

- Collaborative tools
- Peer instruction and social learning
- Behavioral patterns and psychology of users
- Peer assessments

Knowledge mining from e-learning content

A knowledge representation framework can be designed by analyzing the vast amount of multimedia content in online courses (including video, speech, transcripts, PowerPoint presentations, HTML5, etc.) to capture the interdependence of concepts and map out the relationships across topics. Knowledge representations can be used as a structure to personalize learning strategies for students from different backgrounds and with varying abilities.

- Knowledge mining from multimedia e-learning content
- Generation of learning strategies
- Personalized learning

Bridging big data analytics and behavioral sciences

User online behavior covers various actions, including friendship requesting, messaging, content publishing, rating, commenting, retweeting, browsing, clicking, searching, purchasing, and so on. The corresponding user behavior data, including social media interactions, navigation paths, clicks, queries, purchasing decisions, marketing responsiveness and other specific metrics such as interaction time and order information, can be captured and compiled. User online behavior analytics analyzes the captured behavior data to understand the user's intrinsic interests and build the user's profile, allowing prediction of future behavior and discovery of trends.

Behavior analytics is a fundamental research problem of the social sciences, with applications from economics to psychology, sociology, politics, and beyond. With the abundance of user behavior data from e-learning platforms and online social networks, online behavior analytics is becoming increasingly important in commercial, political, and social environments to increase marketing value, help decision making and improve the networking experience. The techniques developed for big data analytics can be used to mine knowledge relevant to behavioral science. Behavior analytics mainly focuses on mining users' intrinsic interests from historical behavior data for personalized recommendation and marketing, and on predicting time-varying user behavior. In addition to historical behavior data, other social information, including social structure and content, can be leveraged for behavior analysis.

- Learner behavior modeling
- Mining user behavior data
- Investigating the impact of teacher/peer attention on learner behavior
- Designing integrated methods to conduct behavior analysis

References:

[1] http://www.nytimes.com/2012/11/04/education/edlife/massive-open-online-courses-are-multiplying-at-a-rapid-pace.html?pagewanted=all
[2] http://content.time.com/time/covers/0,16641,20121029,00.html
[3] http://news.sjtu.edu.cn/info/1005/137742.htm
[4] https://www.edx.org/alert/edx-announces-partnership-google/1115
[5] https://www.edx.org/alert/edx-work-french-ministry-higher/1179
[6] Seaton, D. T., Bergner, Y., Chuang, I., Mitros, P., & Pritchard, D. E. (2013). Who does what in a massive open online course? Communications of the ACM.
[7] http://columbiadatascience.com/
[8] http://requestinfo.datascience.berkeley.edu/
[9] http://aba.mit.edu
[10] http://www.nytimes.com/2013/12/05/education/professors-in-deal-to-design-online-lessons-for-ap-classes.html?_r=0