The document discusses how data science may reinvent learning and education. It begins with background on the author's experience in data teams and teaching. It then questions what an "Uber for education" may look like and discusses definitions of learning, education, and schools. The author argues interactive notebooks like Project Jupyter and flipped classrooms can improve learning at scale compared to traditional lectures or MOOCs. Content toolchains combining Jupyter, Thebe, Atlas and Docker are proposed for authoring and sharing computational narratives and code-as-media.
Hector Guerrero- Road to Business Analytics
Erika Marr
This document provides an overview of key concepts in business analytics including:
- Definitions of data science, data scientist, and analytics which involve extracting insights from data.
- A process map of data science including data collection, cleaning, modeling, and communication.
- A brief history and timeline of developments in computer technology, statistics, and analytics from the 1960s to present.
- Emerging areas like artificial intelligence, autonomous systems, and the impact of technology on jobs and society.
The document discusses IBM's Watson artificial intelligence system from an academic perspective. It summarizes that Watson is interesting from a research perspective because of its underlying "cognitive pipeline" approach to parallelizing reasoning using a large memory, as a different approach to memory-based reasoning, and as a validation of the paradigm of AI as a collection of small processes linked through learned contexts rather than as a monolithic system. The document argues that Watson demonstrates that intelligence relies on an ability to appropriately retrieve relevant information from memory, and opens up new areas for cognitive computing research.
GALE: Geometric active learning for Search-Based Software Engineering
CS, NcState
Multi-objective evolutionary algorithms (MOEAs) help software engineers find novel solutions to complex problems. When automatic tools explore too many options, they are slow to use and hard to comprehend. GALE is a near-linear time MOEA that builds a piecewise approximation to the surface of best solutions along the Pareto frontier. For each piece, GALE mutates solutions towards the better end. In numerous case studies, GALE finds comparable solutions to standard methods (NSGA-II, SPEA2) using far fewer evaluations (e.g. 20 evaluations, not 1,000). GALE is recommended when a model is expensive to evaluate, or when some audience needs to browse and understand how an MOEA has made its conclusions.
As we move into a new era of ITSM computing, new big data and machine learning tools and methodologies are being developed to support IT staff by intelligently extracting insights and making predictions from the enormous amounts of data accumulated from the organization. According to Gartner, I&O leaders must take a comprehensive approach to incorporate advanced big data and machine learning technologies into their organizations or risk becoming irrelevant. But what exactly are big data and machine learning all about? How can you introduce these concepts into your existing Service Desk?
Join USF’s distinguished Computer Science and Engineering Professor Lawrence Hall and SunView Software’s VP of Marketing and Product Strategy John Prestridge as they break down the fundamentals of big data and machine learning and provide real-world examples of the impact the technologies will have on ITSM.
Crowdsourced Data Processing: Industry and Academic Perspectives
Aditya Parameswaran
This document provides a tutorial on crowdsourced data processing from both academic and industry perspectives. The tutorial is divided into three parts. Part 0 provides a background on crowdsourcing and surveys Parts 1 and 2. Part 1 surveys crowdsourced data processing algorithms from academia, discussing unit operations, cost models, error models, and examples like filtering and sorting. Part 2 surveys crowdsourced data processing in industry, finding that many large companies use internal platforms at large scale for tasks like categorization and content moderation, and that academic research is not yet widely used in industry.
The Unreasonable Effectiveness of Metadata
James Hendler
Invited talk at VIVO 2017 conference - explores the view of the semantic web as enriched metadata, and how that kind of information can be used in new and interesting ways.
Talk given at Los Alamos National Labs in Fall 2015.
As research becomes more data-intensive and platforms become more heterogeneous, we need to shift focus from performance to productivity.
This document provides an overview of machine learning, including definitions, common applications, and examples of companies using machine learning. It discusses how BuildFax, a company that provides building permit data and services to industries like insurance, used Amazon Machine Learning to build more accurate predictive models for roof age and job cost estimates. By leveraging Amazon ML, BuildFax was able to build models much faster and provide more precise, property-specific predictions to customers through APIs.
This document summarizes a seminar on machine learning using big data. It discusses the history of data storage and traditional databases. It then introduces machine learning and the types of learning, including supervised and unsupervised learning. Specific algorithms for each type are covered such as k-means clustering for unsupervised and naive Bayes for supervised. Case studies on applications like Amazon product recommendations are presented. The document concludes by discussing tools for machine learning and future applications as more connected devices generate extensive data.
Why Watson Won: A cognitive perspective
James Hendler
In this talk, we present how the Watson program, IBM's famous Jeopardy-playing computer, works (based on papers published by IBM), look at some aspects of potential scoring approaches, and examine how Watson compares to several well-known systems. We close with some preliminary thoughts on using it in future artificial intelligence and cognitive science approaches.
Machine Learning in the Cloud with GraphLab
Danny Bickson
The document discusses machine learning in the cloud using GraphLab. It introduces the need for machine learning with big data and the shift towards parallelism using GPUs, multicore processors, clusters and clouds. It describes GraphLab as providing high-level abstractions for parallel and distributed machine learning through its data representation as a graph and use of update functions. Examples of algorithms it supports include PageRank, collaborative filtering, and label propagation.
The document discusses techniques for scaling up automated content analysis projects. It begins by looking back at the workflow and techniques covered in previous sessions, such as developing components separately, writing functions, and making the code robust. It then looks forward by discussing additional techniques that were not covered, such as using Selenium for dynamic web scraping, databases for storing large datasets, word embeddings, and more advanced natural language processing and machine learning models. The document also introduces the INCA project, which aims to scale up content analysis by collecting data in a way that allows for reuse across multiple projects, using a database backend and reusable preprocessing and analysis code. The goal is to make automated content analysis usable with minimal Python knowledge.
Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...
James Hendler
IJCAI 16 keynote on the need to bring modern AI accomplishments of recent years into connection with the more traditional goals of symbolic AI (and vice versa).
1. The document discusses future directions for software engineering research, including tools to support "citizen scientists" and proposed services for next-generation data repositories.
2. It suggests that data mining tools could provide more services beyond data repositories, such as supporting verification, compression, privacy, and streaming of data.
3. The talk outlines several topics, including software tools for citizen scientists, issues around decision software, and lessons learned regarding certification envelopes, goals, locality, and the need for repair and verification tools.
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
The Hive
This document discusses setting up an environment for agile data science and analytics applications. It recommends:
- Publishing atomic records like emails or logs to a "database" like MongoDB in order to make the data accessible to designers, developers and product managers.
- Wrapping the records with tools like Pig, Avro and Bootstrap to enable viewing, sorting and linking the records in a browser.
- Taking an iterative approach of refining the data model and publishing insights to gradually build up an application that discovers insights from exploring the data, rather than designing insights upfront.
- Emphasizing simplicity, self-service tools, and minimizing impedance between layers to facilitate rapid iteration and collaboration across roles.
Big Data & Machine Learning - TDC2013 Sao Paulo
OCTO Technology
BigData and Machine Learning: Usage and Opportunities for your IT department
Talk presented at The Developer Conference in São Paulo - 12/07/13
Mathieu DESPRIEE
In this talk I review some of the early visions of the Semantic Web, some of the different views, and I follow through on a thread of how Semantic Web technology has been adopted in search engines (and other companies). I end with a challenge to the research community to keep pursuing this research, rather than letting industry take over the "low end" and keep new work from flourishing.
Jupyter for Education: Beyond Gutenberg and Erasmus
Paco Nathan
O'Reilly Learning is focusing on evolving learning experiences using Jupyter notebooks. Jupyter notebooks allow combining code, outputs, and explanations in a single document. O'Reilly is using Jupyter notebooks as a new authoring environment and is exploring features like computational narratives, code as a medium for teaching, and interactive online learning environments. The goal is to provide a better learning architecture and content workflow that leverages the capabilities of Jupyter notebooks.
Use of standards and related issues in predictive analytics
Paco Nathan
My presentation at KDD 2016 in SF, in the "Special Session on Standards in Predictive Analytics In the Era of Big and Fast Data" morning track about PMML and PFA http://dmg.org/kdd2016.html
GraphX: Graph analytics for insights about developer communities
Paco Nathan
The document provides an overview of Graph Analytics in Spark. It discusses Spark components and key distinctions from MapReduce. It also covers GraphX terminology and examples of composing node and edge RDDs into a graph. The document provides examples of simple traversals and routing problems on graphs. It discusses using GraphX for topic modeling with LDA and provides further reading resources on GraphX, algebraic graph theory, and graph analysis tools and frameworks.
Microservices, containers, and machine learning
Paco Nathan
http://www.oscon.com/open-source-2015/public/schedule/detail/41579
In this presentation, an open source developer community considers itself algorithmically. This shows how to surface data insights from the developer email forums for just about any Apache open source project. It leverages advanced techniques for natural language processing, machine learning, graph algorithms, time series analysis, etc. As an example, we use data from the Apache Spark email list archives to help understand its community better; however, the code can be applied to many other communities.
Exsto is an open source project that demonstrates Apache Spark workflow examples for SQL-based ETL (Spark SQL), machine learning (MLlib), and graph algorithms (GraphX). It surfaces insights about developer communities from their email forums. Natural language processing services in Python (based on NLTK, TextBlob, WordNet, etc.) get containerized and used to crawl and parse email archives. These produce JSON data sets, then we run machine learning on a Spark cluster to find insights such as:
* What are the trending topic summaries?
* Who are the leaders in the community for various topics?
* Who discusses most frequently with whom?
This talk shows how to use cloud-based notebooks for organizing and running the analytics and visualizations. It reviews the background for how and why the graph analytics and machine learning algorithms generalize patterns within the data, based on open source implementations for two advanced approaches, Word2Vec and TextRank. The talk also illustrates best practices for leveraging functional programming for big data.
See 2020 update: https://derwen.ai/s/h88s
SF Python Meetup, 2017-02-08
https://www.meetup.com/sfpython/events/237153246/
PyTextRank is a pure Python open source implementation of *TextRank*, based on the [Mihalcea 2004 paper](http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) -- a graph algorithm which produces ranked keyphrases from texts. Keyphrases are generally more useful than simple keyword extraction. PyTextRank integrates use of `TextBlob` and `SpaCy` for NLP analysis of texts, including full parse, named entity extraction, etc. It also produces auto-summarization of texts, making use of an approximation algorithm, `MinHash`, for better performance at scale. Overall, the package is intended to complement machine learning approaches -- specifically deep learning used for custom search and recommendations -- by developing better feature vectors from raw texts. This package is in production use at O'Reilly Media for text analytics.
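To make the idea of ranking terms with a graph algorithm concrete, here is a minimal sketch of the core TextRank step: build a word co-occurrence graph, then score the words with PageRank-style iteration. It is not PyTextRank's API or implementation. The function name `textrank_keywords`, the tiny `STOPWORDS` set, and the parameter defaults are assumptions made for illustration, and the sketch skips the part-of-speech filtering and phrase merging that the paper describes.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "and", "for", "in", "on", "to",
             "is", "are", "from", "which", "that", "with"}


def textrank_keywords(text, window=3, damping=0.85, iterations=50, top_n=5):
    """Rank words by PageRank-style scores over a co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS]

    # Undirected graph: link each word to the words that follow it
    # within a sliding window of `window` tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate the unnormalized PageRank update:
    # score(w) = (1 - d) + d * sum(score(n) / degree(n) for n in neighbors(w))
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in neighbors
        }

    return sorted(scores, key=scores.get, reverse=True)[:top_n]


sample = ("graph algorithms rank graph nodes; graph ranking extracts "
          "keyphrases from graph text")
print(textrank_keywords(sample))
```

Words that co-occur with many other words accumulate score from their neighbors, which is why the most connected term in the sample ranks first. A fuller implementation would filter candidates by part of speech and merge adjacent top-ranked words into multi-word keyphrases.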
Apache Spark and the Emerging Technology Landscape for Big Data
Paco Nathan
The document discusses Apache Spark and its role in big data and emerging technologies for big data. It provides background on MapReduce and the emergence of specialized systems. It then discusses how Spark provides a unified engine for batch processing, iterative jobs, SQL queries, streaming, and more. It can simplify programming by using a functional approach. The document also discusses Spark's architecture and performance advantages over other frameworks.
How Apache Spark fits into the Big Data landscape
Paco Nathan
How Apache Spark fits into the Big Data landscape http://www.meetup.com/Washington-DC-Area-Spark-Interactive/events/217858832/
2014-12-02 in Herndon, VA and sponsored by Raytheon, Tetra Concepts, and MetiStream
GalvanizeU Seattle: Eleven Almost-Truisms About Data
Paco Nathan
http://www.meetup.com/Seattle-Data-Science/events/223445403/
Almost a dozen almost-truisms about Data that almost everyone should consider carefully as they embark on a journey into Data Science. There are a number of preconceptions about working with data at scale where the realities beg to differ. This talk estimates that number to be at least eleven, though probably much larger. At least that number has a great line from a movie. Let's consider some of the less-intuitive directions in which this field is heading, along with likely consequences and corollaries -- especially for those who are just now beginning to study the technologies, the processes, and the people involved.
Databricks Meetup @ Los Angeles Apache Spark User Group
Paco Nathan
This document summarizes a presentation on Apache Spark and Spark Streaming. It provides an overview of Spark, describing it as an in-memory cluster computing framework. It then discusses Spark Streaming, explaining that it runs streaming computations as small batch jobs to provide low latency processing. Several use cases for Spark Streaming are presented, including from companies like Stratio, Pearson, Ooyala, and Sharethrough. The presentation concludes with a demonstration of Python Spark Streaming code.
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Paco Nathan
Spark and Databricks component of the O'Reilly Media webcast "2015 Data Preview: Spark, Data Visualization, YARN, and More", as a preview of the 2015 Strata + Hadoop World conference in San Jose http://www.oreilly.com/pub/e/3289
Microservices, Containers, and Machine Learning
Paco Nathan
Session talk for Data Day Texas 2015, showing GraphX and SparkSQL for text analytics and graph analytics of an Apache developer email list -- including an implementation of TextRank in Spark.
QCon São Paulo: Real-Time Analytics with Spark Streaming
Paco Nathan
The document provides an overview of real-time analytics using Spark Streaming. It discusses Spark Streaming's micro-batch approach of treating streaming data as a series of small batch jobs. This allows for low-latency analysis while integrating streaming and batch processing. The document also covers Spark Streaming's fault tolerance mechanisms and provides several examples of companies like Pearson, Guavus, and Sharethrough using Spark Streaming for real-time analytics in production environments.
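The micro-batch idea described above can be sketched in plain Python, without Spark: slice a timestamped stream into fixed-length intervals and hand each slice to an ordinary batch function. This is only an illustration of the concept, not Spark Streaming's API. The names `micro_batches` and `word_count`, the `(timestamp, value)` event shape, and the sample data are assumptions introduced here.

```python
from collections import Counter


def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-length windows.

    Assumes events arrive sorted by timestamp. Intervals with no events
    yield an empty batch, so downstream code sees every interval.
    """
    batch, window_end = [], None
    for timestamp, value in events:
        if window_end is None:
            window_end = timestamp + batch_interval
        # Close every window that ended before this event arrived.
        while timestamp >= window_end:
            yield batch
            batch, window_end = [], window_end + batch_interval
        batch.append(value)
    if batch:
        yield batch


def word_count(batch):
    """An ordinary batch job, reused unchanged on each micro-batch."""
    return Counter(word for line in batch for word in line.split())


# Hypothetical sample stream: (seconds, text line).
events = [
    (0.0, "spark streaming"),
    (0.4, "spark"),
    (1.2, "streaming batch"),
    (3.1, "spark"),
]

for counts in map(word_count, micro_batches(events, batch_interval=1.0)):
    print(dict(counts))
```

The design point is that `word_count` knows nothing about streaming, which is what lets the same logic serve both batch and streaming workloads. Shrinking `batch_interval` lowers latency at the cost of more, smaller jobs.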
The document provides an overview of Apache Spark, including its history and key capabilities. It discusses how Spark was developed in 2009 at UC Berkeley and later open sourced, and how it has since become a major open-source project for big data. The document summarizes that Spark provides in-memory performance for ETL, storage, exploration, analytics and more on Hadoop clusters, and supports machine learning, graph analysis, and SQL queries.
Graph analytics can be used to analyze a social graph constructed from email messages on the Spark user mailing list. Key metrics like PageRank, in-degrees, and strongly connected components can be computed using the GraphX API in Spark. For example, PageRank was computed on the 4Q2014 email graph, identifying the top contributors to the mailing list.
The document discusses the future of data science, including increased use of functional programming, cloud notebooks, and probabilistic modeling of large and diverse datasets from IoT devices, drones, and satellites. It also predicts data scientists will displace traditional product managers as data becomes more important for decision making. Overall, the future involves analyzing exponentially larger volumes of diverse data using scalable cloud tools and probabilistic algorithms.
This document discusses Spark, an open-source cluster computing framework. It provides a brief history of Spark, describing how it generalized MapReduce to support more types of applications. Spark allows for batch, interactive, and real-time processing within a single framework using Resilient Distributed Datasets (RDDs) and a logical plan represented as a directed acyclic graph (DAG). The document also discusses how Spark can be used for applications like machine learning via MLlib, graph processing with GraphX, and streaming data with Spark Streaming.
OSCON 2014: Data Workflows for Machine Learning
Paco Nathan
This document provides examples of different frameworks that can be used for machine learning data workflows, including KNIME, Python, Julia, Summingbird, Scalding, and Cascalog. It describes features of each framework such as KNIME's large number of integrations and visual workflow editing, Python's broad ecosystem, Julia's performance and parallelism support, Summingbird's ability to switch between Storm and Scalding backends, and Scalding's implementation of the Scala collections API over Cascading for compact workflow code. The document aims to familiarize readers with options for building machine learning data workflows.
Big Data is changing abruptly, and where it is likely heading
Paco Nathan
Big Data technologies are changing rapidly due to shifts in hardware, data types, and software frameworks. Incumbent Big Data technologies do not fully leverage newer hardware like multicore processors and large memory spaces, while newer open source projects like Spark have emerged to better utilize these resources. Containers, clouds, functional programming, databases, approximations, and notebooks represent significant trends in how Big Data is managed and analyzed at large scale.
MOOCs are arguably a revolutionary innovation in education. But are they really that new? Do we need to stick to a course format? Do they have to be online or is blending also acceptable? How open are they really? Should they be massive and what is massive anyway? Do they democratise education, as is often claimed?
Keynote lecture at 2016 NTU Learning and Teaching Seminar - Students as Partn...
Simon Bates
Keynote lecture at 2016 NTU Learning and Teaching Seminar - Students as Partners in Learning and Teaching. In this keynote, I will consider the role of students as partners in learning with reference to what current research can tell us about how people learn, what students have to say about what supports their learning, and where technology can help.
The document discusses the future of online learning and personal learning environments (PLEs). It notes that online learning has advanced significantly since 1995 with the growth of the World Wide Web. PLEs are centered around the learner's interests and support immersive, hands-on learning through connections to resources around the world. The document outlines key elements of PLEs, including tools for modeling concepts, demonstrating expertise, providing practice environments, enabling reflection, and allowing for learner choice, identity and creativity. It argues that PLEs will resemble social networks and enable learning through network-based approaches like associationism.
This document discusses pedagogy, retention, attainment, and the use of new technologies in education. It provides examples of how some colleges in Scotland are innovating with blended learning, MOOCs, learning tools, and digital skills development for staff and students. It suggests colleges could make better use of online resources and tools to enhance teaching and learning. The document also discusses the importance of authentic assessment, staff development, analytics, and embracing informal learning opportunities.
The MOOC movement is only four years old, but has already had a tremendous impact on teaching and learning. While some of the original hype surrounding MOOCs has not been realized, the reality is that they are here for good and are influencing institutional thinking. This talk will discuss the past, present and future of MOOCs.
Adventures in Designing a MOOC with OER--STEMTech Denver, CO Nov. 2014
cccschamp
This presentation was part of a session on creating a Technical Math MOOC with open educational resources. In October 2013, Colorado Community College System was awarded a TAACCCT 3 grant for Advanced Manufacturing. Our Advanced Manufacturing Industry partners were actively engaged in helping our faculty tailor their courses and course content to industry needs. Yet, the industry partners still had some complaints: I would like to send my employees to your colleges for courses, certificates and training but you want them to take and pass a technical math course before they can complete a course or certificate; my employees or I can’t afford the time and money to have them pass through the “gate keeping course.” Attendees will hear how CCCS created a viable solution: a free Technical Math MOOC that works for faculty, industry and our students.
The document discusses the 7Cs of learning design proposed by Gráinne Conole. The 7Cs include: conceptualize, capture, communicate, collaborate, consider, consolidate, and continue. Conole outlines how new technologies have led to more open, social, and participatory approaches to learning. However, replicating old pedagogies with new tools does not fully leverage their potential. The learning design process emphasizes explicit design methods and sharing of practices. It encourages reflecting on how to harness new technologies and resources while rethinking support and assessment of learning.
The document discusses adopting MOOCs for corporate employee learning and development. It outlines opportunities, challenges, and a proposed framework. A working group is exploring using MOOCs and developing employee experiences with initial pilots. The group aims to understand the impact on learning functions and develop guidance on effectively using MOOCs for capabilities, skills and business drivers. Open discussion is encouraged around learning strategy, governance models and addressing common concerns regarding quality, reporting, and content relevance for organizations.
The document discusses the 7Cs framework for learning design proposed by Gráinne Conole. It outlines characteristics of new media technologies and their implications for learning, teaching and research. Some key points include: new technologies allow for peer critiquing, user-generated content, and networked and personalized learning. However, their potential is not fully realized as existing pedagogies are often replicated without taking advantage of new opportunities. The 7Cs framework - conceptualize, create, communicate, consume, collaborate, contribute, and critique - provides a design-based approach that encourages reflective practices and sharing. It can help educators harness new technologies while rethinking design, support and assessment of learning.
E Learning in Medical Education. E-learning (or eLearning) is the use of electronic media, educational technology and information and communication technologies (ICT) in education. E-learning includes numerous types of media that deliver text, audio, images, animation, and streaming video, and includes technology applications and processes such as audio or video tape, satellite TV, CD-ROM, and computer-based learning, as well as local intranet/extranet and web-based learning. Information and communication systems, whether free-standing or based on either local networks or the Internet in networked learning, underlie many e-learning processes.
This document discusses using MOOCs to increase lifelong learning skills. It proposes blending MOOCs into classroom lessons to make learning more passionate and self-regulated. An ongoing project combines MOOCs with content and language integrated learning (CLIL) in French and English classes. Preliminary results show increased student motivation and digital literacy, as well as teachers learning to offer guidance while trusting students. The document advocates shifting towards lifelong learning for all by creating personal, passionate learning journeys using open educational resources like MOOCs.
This document discusses considerations around massive open online courses (MOOCs). It provides an overview of what MOOCs are, their history and current landscape. It examines potential pros and cons of MOOCs from the perspectives of students, faculty, universities and teaching/learning. It also addresses frequently asked questions around MOOCs and revenue/accreditation models. Overall, the document aims to inform decisions around whether and how an institution might engage with MOOCs.
Presentation for my EDDE 801 course (Athabasca University EdD program) on MOOCs. Covers a brief history of MOOCs, an initial taxonomy of issues around MOOCs and the taxonomy applied (briefly) to the Greek Open Course effort (ca. 2014)
Learning Portals – User Centric Gateway to Learning & Knowledge
LearningCafe
In the age of information glut, Learning Portals can provide Learners a way through the chaos to Learning and Knowledge that is useful and easier to access. However, success stories are few and far between due to technology and design challenges. In many organisations the LMS is viewed as a Learning portal, but not one that provides the flexibility and user experience required.
With the Learning ecosystem becoming more complex and connected, Learner experience expectations are rising, as is the need to reduce costs.
Is it possible to implement a Learning portal that meets these requirements? We discuss with an experienced panel the state of Learning portals and which way they are heading.
Date & Time : Thu, 29th June 2017, 12 – 1 pm Sydney Time
We Discuss
Should Learning Portals be the gateway for all learning and knowledge in the organisation?
What user experience is expected from a Learning Portal?
What are the benefits and drawbacks of using the LMS as a Learning Portal?
Can a Learning Portal be developed in the face of IT and policy restrictions?
Research Webinar: OERS and Cognitive Science
iNACOL
This webinar provides practical information on how to use published research findings and make contact with cognitive scientists in order to improve K-12 and university students’ learning from digital online resources, like Khan Academy videos or interactive mathematics exercises. The webinar focuses on how students’ motivation and grades have been increased by helping them believe they can take charge of their learning and become smarter, and how students can be supported in reflective thinking and seeking deep understanding when questions and prompts for students to explain are inserted in videos and interactive exercises.
MOOCs provide opportunities for teachers and learners. For teachers, MOOCs allow for professional development by learning new content and teaching styles. MOOCs can also be added to traditional classes by using MOOC content and discussions. For learners, MOOCs increase access to education and provide flexible, self-paced learning. However, learners need computer access and time to benefit. MOOCs are also driving changes to education through the globalization and digitization of learning.
The document summarizes key points from a discussion on reimagining authentic curriculum and assessment in the age of generative AI. It includes:
1. Three major challenges are contract cheating, impersonation, and generative AI which can produce written work.
2. There are opportunities to use AI to enhance student learning and productivity if designed appropriately. Students could become creators by using AI to aid understanding or produce new learning resources.
3. Authentic assessment needs to move beyond essays and emphasize real-world skills through activities like presentations that cannot be produced by AI as well as balancing written work with other assessments.
MOOCs (Massive Open Online Courses) have made a lot of headlines and captured attention from university management. What is a MOOC? Do we care? Should we? This paper offers a briefing on the first things you need to know.
This paper explains what the different types of MOOCs are and the different pedagogical theories they assume, and discusses how they might evolve in the future. We identify the various motivations each stakeholder group might have for becoming involved in a MOOC as a learner, teacher or institution. Do MOOCs mark the dawn of a golden age of adult education and free CPD; or the final collapse of large courses into impersonal production lines? We discuss some apparent challenges to "normal" HE standards e.g. attrition rates, likely workloads.
Human in the loop: a design pattern for managing teams working with ML
Paco Nathan
Strata CA 2018-03-08
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/64223
Although it has long been used for use cases like simulation, training, and UX mockups, human-in-the-loop (HITL) has emerged as a key design pattern for managing teams where people and machines collaborate. One approach, active learning (a special case of semi-supervised learning), employs mostly automated processes based on machine learning models, but exceptions are referred to human experts, whose decisions help improve new iterations of the models.
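The active learning loop described above can be sketched roughly as follows, assuming the simplest routing rule (a confidence threshold): the model's confident predictions are accepted automatically, and everything else goes to a human expert whose labels join the training data for the next model iteration. All names here (`route_predictions`, `hitl_round`, `toy_model`, `toy_expert`) and the threshold value are hypothetical, and the stand-in model is a toy rule rather than a trained classifier.

```python
def route_predictions(items, predict_proba, threshold=0.8):
    """Split items into auto-accepted labels and cases referred to a human."""
    accepted, referred = [], []
    for item in items:
        probabilities = predict_proba(item)  # mapping: label -> probability
        label, confidence = max(probabilities.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            accepted.append((item, label))
        else:
            referred.append(item)
    return accepted, referred


def hitl_round(items, predict_proba, ask_expert, training_set, threshold=0.8):
    """Run one round: automate the confident cases, learn from the rest."""
    accepted, referred = route_predictions(items, predict_proba, threshold)
    for item in referred:
        # The expert's decision becomes a new labeled training example.
        training_set.append((item, ask_expert(item)))
    return accepted, training_set


def toy_model(text):
    """Hypothetical stand-in for a trained classifier."""
    p = 0.95 if "python" in text else 0.55
    return {"programming": p, "other": 1 - p}


def toy_expert(text):
    """Hypothetical stand-in for a human subject matter expert."""
    return "other"


accepted, training_set = hitl_round(
    ["python tips", "garden tips"], toy_model, toy_expert, training_set=[]
)
print(accepted)
print(training_set)
```

In a real pipeline the model would be retrained on the enlarged `training_set` before the next round, so the share of items needing human review should shrink over time. The threshold trades automation rate against the risk of accepting wrong labels.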
Human-in-the-loop: a design pattern for managing teams that leverage ML
Paco Nathan
Strata Singapore 2017 session talk 2017-12-06
https://conferences.oreilly.com/strata/strata-sg/public/schedule/detail/65611
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called active learning allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.
This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We’ll consider some of the technical aspects — including available open source projects — as well as management perspectives for how to apply HITL:
* When is HITL indicated vs. when isn’t it applicable?
* How do HITL approaches compare/contrast with more “typical” use of Big Data?
* What’s the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time:
* In what ways do the humans involved learn from the machines?
* In particular, we’ll examine use cases at O’Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging open source [Project Jupyter](https://jupyter.org/) for implementation.
Human-in-a-loop: a design pattern for managing teams which leverage ML
Paco Nathan
Big Data Spain, 2017-11-16
https://www.bigdataspain.org/2017/talk/human-in-the-loop-a-design-pattern-for-managing-teams-which-leverage-ml
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called _active learning_ allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.
This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We'll consider some of the technical aspects -- including available open source projects -- as well as management perspectives for how to apply HITL:
* When is HITL indicated vs. when isn't it applicable?
* How do HITL approaches compare/contrast with more "typical" use of Big Data?
* What's the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time
* In what ways do the humans involved learn from the machines?
In particular, we'll examine use cases at O'Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging open source [Project Jupyter](https://jupyter.org/) for implementation.
Humans in a loop: Jupyter notebooks as a front-end for AI
Paco Nathan
JupyterCon NY 2017-08-24
https://www.safaribooksonline.com/library/view/jupytercon-2017-/9781491985311/video313210.html
Paco Nathan reviews use cases where Jupyter provides a front-end to AI as the means for keeping "humans in the loop". This talk introduces *active learning* and the "human-in-the-loop" design pattern for managing how people and machines collaborate in AI workflows, including several case studies.
The talk also explores how O'Reilly Media leverages AI in Media, and in particular some of our use cases for active learning such as disambiguation in content discovery. We're using Jupyter as a way to manage active learning ML pipelines, where the machines generally run automated until they hit an edge case and refer the judgement back to human experts. In turn, the experts train the ML pipelines purely through examples, not feature engineering, model parameters, etc.
Jupyter notebooks serve as one part configuration file, one part data sample, one part structured log, one part data visualization tool. O'Reilly has released an open source project on GitHub called `nbtransom` which builds atop `nbformat` and `pandas` for our active learning use cases.
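To make the "one part configuration file, one part data sample, one part structured log" idea concrete, here is a rough sketch that hand-builds a notebook-shaped JSON document with only the standard library. It does not use or reproduce the `nbtransom` API; the `role` metadata key and the helper names are assumptions made for illustration.

```python
import json


def markdown_cell(text, role):
    """Build a markdown cell in the notebook v4 JSON layout."""
    return {"cell_type": "markdown", "metadata": {"role": role}, "source": text}


def code_cell(source, role):
    """Build an unexecuted code cell in the notebook v4 JSON layout."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"role": role},  # hypothetical tag, not a Jupyter convention
        "outputs": [],
        "source": source,
    }


# One notebook holds the pipeline's settings, a data sample, and its log.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        code_cell('config = {"threshold": 0.8}', role="config"),
        code_cell('samples = ["spark streaming", "jupyter notebook"]', role="data"),
        markdown_cell("round 1: 2 items referred to experts", role="log"),
    ],
}


def append_log(nb, message):
    """Record one pipeline event as a new markdown cell."""
    nb["cells"].append(markdown_cell(message, role="log"))


append_log(notebook, "round 2: 0 items referred to experts")

# Round-trip through JSON, as a machine or a person opening the file would.
serialized = json.dumps(notebook, indent=1)
restored = json.loads(serialized)
log_entries = [
    cell["source"]
    for cell in restored["cells"]
    if cell["metadata"].get("role") == "log"
]
```

Because the document is plain JSON, both the pipeline and the human expert can read and append to the same file.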
This work anticipates upcoming work on collaborative documents in JupyterLab, based on Google Drive. In other words, where the machines and people are collaborators on shared documents.
Humans in the loop: AI in open source and industry
Paco Nathan
Nike Tech Talk, Portland, 2017-08-10
https://niketechtalks-aug2017.splashthat.com/
O'Reilly Media gets to see the forefront of trends in artificial intelligence: what the leading teams are working on, which use cases are getting the most traction, previews of advances before they get announced on stage. Through conferences, publishing, and training programs, we've been assembling resources for anyone who wants to learn. An excellent recent example: Generative Adversarial Networks for Beginners, by Jon Bruner.
This talk covers current trends in AI, industry use cases, and recent highlights from the AI Conf series presented by O'Reilly and Intel, plus related materials from Safari learning platform, Strata Data, Data Show, and the upcoming JupyterCon.
Along with reporting, we're leveraging AI in Media. This talk dives into O'Reilly uses of deep learning -- combined with ontology, graph algorithms, probabilistic data structures, and even some evolutionary software -- to help editors and customers alike accomplish more of what they need to do.
In particular, we'll show two open source projects in Python from O'Reilly's AI team:
• pytextrank built atop spaCy, NetworkX, datasketch, providing graph algorithms for advanced NLP and text analytics
• nbtransom leveraging Project Jupyter for a human-in-the-loop design pattern approach to AI work: people and machines collaborating on content annotation
Lessons learned from 3 (going on 4) generations of Jupyter use cases at O'Reilly Media. In particular, about "Oriole" tutorials which combine video with Jupyter notebooks, Docker containers, backed by services managed on a cluster by Marathon, Mesos, Redis, and Nginx.
https://conferences.oreilly.com/fluent/fl-ca/public/schedule/detail/62859
https://conferences.oreilly.com/velocity/vl-ca/public/schedule/detail/62858
O'Reilly Media has experimented with different uses of Jupyter notebooks in their publications and learning platforms. Their latest approach embeds notebooks with video narratives in online "Oriole" tutorials, allowing authors to create interactive, computable content. This new medium blends code, data, text, and video into narrated learning experiences that run in isolated Docker containers for higher engagement. Some best practices for using notebooks in teaching include focusing on concise concepts, chunking content, and alternating between text, code, and outputs to keep explanations clear and linear.
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Paco Nathan
London Spark Meetup 2014-11-11 @Skimlinks
http://www.meetup.com/Spark-London/events/217362972/
To paraphrase the immortal crooner Don Ho: "Tiny Batches, in the wine, make me happy, make me feel fine." http://youtu.be/mlCiDEXuxxA
Apache Spark provides support for streaming use cases, such as real-time analytics on log files, by leveraging a model called discretized streams (D-Streams). These "micro batch" computations operate on small time intervals, generally from 500 milliseconds up. One major innovation of Spark Streaming is that it leverages a unified engine. In other words, the same business logic can be used across multiple use cases: streaming, but also interactive, iterative, machine learning, etc.
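The micro-batch idea can be illustrated without Spark at all. The sketch below is plain Python, not the Spark Streaming API: `micro_batches` and `count_errors` are hypothetical helpers, and the log events are made up. It shows a stream being cut into 500 ms intervals and the same function serving both the batch and the streaming case.

```python
from collections import defaultdict


def micro_batches(events, interval_ms=500):
    """Group (timestamp_ms, value) events into consecutive fixed-width batches."""
    buckets = defaultdict(list)
    for timestamp_ms, value in events:
        buckets[timestamp_ms // interval_ms].append(value)
    # Return batches in time order; empty intervals are simply skipped.
    return [buckets[key] for key in sorted(buckets)]


def count_errors(lines):
    """The 'business logic': identical for a full batch or a micro batch."""
    return sum(1 for line in lines if "ERROR" in line)


# Made-up log events: (timestamp in ms, log line).
events = [
    (0, "INFO start"),
    (120, "ERROR disk"),
    (510, "INFO ok"),
    (990, "ERROR net"),
    (1000, "ERROR net"),
]

per_batch = [count_errors(batch) for batch in micro_batches(events)]  # streaming view
total = count_errors(line for _, line in events)                      # batch view
```

The per-interval counts add up to the batch total, which is the point of reusing one piece of logic across both modes.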
This talk will compare case studies for production deployments of Spark Streaming, emerging design patterns for integration with popular complementary OSS frameworks, plus some of the more advanced features such as approximation algorithms, and take a look at what's ahead — including the new Python support for Spark Streaming that will be in the upcoming 1.2 release.
Also, let's chat a bit about the new Databricks + O'Reilly developer certification for Apache Spark…
How Apache Spark fits into the Big Data landscape
Paco Nathan
Boulder/Denver Spark Meetup, 2014-10-02 @ Datalogix
http://www.meetup.com/Boulder-Denver-Spark-Meetup/events/207581832/
Apache Spark is intended as a general purpose engine that supports combinations of Batch, Streaming, SQL, ML, Graph, etc., for apps written in Scala, Java, Python, Clojure, R, etc.
This talk provides an introduction to Spark — how it provides so much better performance, and why — and then explores how Spark fits into the Big Data landscape — e.g., other systems with which Spark pairs nicely — and why Spark is needed for the work ahead.
PRESS RELEASE - UNIVERSITY OF GHANA, JULY 16, 2024.pdf
nservice241
The University of Ghana has launched a new vision and strategic plan, which will focus on transforming lives and societies through unparalleled scholarship, innovation, and result-oriented discoveries.
How to Make a Field Storable in Odoo 17 - Odoo Slides
Celine George
Let’s discuss how to make a field in an Odoo model storable. For this, a College management module has been created, with a model that stores the student details.
Topics to be Covered
Beginning of Pedagogy
What is Pedagogy?
Definition of Pedagogy
Features of Pedagogy
What Is Pedagogy In Teaching?
What Is Teacher Pedagogy?
What Is The Pedagogy Approach?
What are Pedagogy Approaches?
Teaching and Learning Pedagogical approaches?
Importance of Pedagogy in Teaching & Learning
Role of Pedagogy in Effective Learning
Pedagogy Impact on Learner
Pedagogical Skills
10 Innovative Learning Strategies For Modern Pedagogy
Types of Pedagogy
Dr. Nasir Mustafa CERTIFICATE OF APPRECIATION "NEUROANATOMY"Dr. Nasir Mustafa
CERTIFICATE OF APPRECIATION
"NEUROANATOMY"
DURING THE JOINT ONLINE LECTURE SERIES HELD BY
KUTAISI UNIVERSITY (GEORGIA) AND ISTANBUL GELISIM UNIVERSITY (TURKEY)
FROM JUNE 10TH TO JUNE 14TH, 2024
How to install python packages from Pycharm
Celine George
In this slide, let's discuss how to install Python packages from PyCharm. In case we do any customization in our Odoo environment, sometimes it will be necessary to install some additional Python packages. Let’s check how we can do this from PyCharm.
Email Marketing in Odoo 17 - Odoo 17 Slides
Celine George
Email marketing is used to send advertisements or commercial messages to specific groups of people by email. It also helps to track a campaign’s overall effectiveness. This slide will show the features of Odoo 17 email marketing.
How to Create an XLS Report in Odoo 17 - Odoo 17 Slides
Celine George
XLSX reports are essential for structured data analysis, customizable presentation, and compatibility across platforms, facilitating efficient decision-making and communication within organizations.
A history of Innisfree in Milanville, Pennsylvania
ThomasRue2
A history of Innisfree in Milanville, Damascus Township, Wayne County, Pennsylvania. By TOM RUE, July 23, 2023. Innisfree began as "an experiment in democracy," modeled after A.S. Neill's "Summerhill" school in England, "the first libertarian school".
Life of Ah Gong and Ah Kim ~ A Story with Life Lessons (Hokkien, English & Ch...
OH TEIK BIN
A PowerPoint Presentation of a fictitious story that imparts Life Lessons on loving-kindness, virtue, compassion and wisdom.
The texts are in Romanized Hokkien, English and Chinese.
For the Video Presentation with audio narration in Hokkien, please check out the Link:
https://vimeo.com/manage/videos/987932748
1. 2015-08-24 • San Jose
Paco Nathan, @pacoid
Director, O’Reilly Learning
Data Science Reinvents Learning?
Beyond Gutenberg and Erasmus
meetup.com/SF-Bay-ACM/events/221693508/
2. 2
Some Background…
• O’Reilly Learning: you may only hear about us in
a few instances, if we do our job well; ACM is a great
forum for this discussion
• prior: built-out the community evangelism and training
program for Apache Spark at Databricks
• prior: led Data teams for several years, working on
large-scale ML apps in industry, including: one of the
largest Hadoop instances running in AWS (2008);
one of the first 100% AWS system architectures (2006)
• …
• ancient prior: Stanford CSD teaching fellowship (1984-86,
Alice Supton, Stuart Reges) peer-teaching CS course
which later became Residential Computing
6. 6
Intro
Ostensibly that leads to a question, how might
an “Uber for Education” look?
a) Similar to Cthulhu, we might regret actually seeing that
10. 10
Intro
Ostensibly that leads to a question, how might
an “Uber for Education” look?
a) Similar to Cthulhu, we might regret actually seeing that
b) Would we really need that anywho?
c) Uber itself might not take that approach …
Perhaps “Uber for Learning” might be somewhat
more apt?
In any case, what comes after Books,
Kindle, MOOCs?
13. 13
Some Definitions…
Even the best schools these days question
what they will become in 5-10 years
Not-so-best schools are perhaps questioning
much more than that
14. 14
Some Definitions…
Oh BTW, too many (funded) teams seem to
have this mediocre idea for “education”:
1. assessment: collect test scores ➜
2. define “quantified student” ➜
3. reuse online marketing funnel ad-tech ➜
4. invoke agile coding teams ➜
5. ship mobile/cloud-based SaaS platform ➜
6. ...
7. profit
15. 15
Some Definitions…
LMS
16. K-12 not so much, except perhaps in the
case of Safari for Schools
undergrad textbooks?
graduate textbooks, conferences?
professional focus of our audience
16
Some Definitions…
17. 17
• vocational:
making a career move
• aspirational:
improvement within a career path
• proficiency:
has a specific pain-point, needs to resolve it
• familiarity:
wants to join in a team dialog about a topic,
e.g., conversational programmer
Learner Personas for professional category
19. 19
What about MOOCs?
Massive Open Online Courses –
seven year trend, beginning with:
Connectivism and Connective Knowledge
George Siemens, Stephen Downes
University of PEI (2008)
http://cck11.mooc.ca/
21. 21
What about MOOCs?
Anthony Joseph
UC Berkeley
early Jun 2015
edx.org/course/uc-berkeleyx/uc-berkeleyx-cs100-1x-introduction-big-6181
Ameet Talwalkar
UCLA
late Jun 2015
edx.org/course/uc-berkeleyx/uc-berkeleyx-cs190-1x-scalable-machine-6066
22. 22
What about MOOCs?
Pros:
• cost-effective to reach a large audience
• popular with students
• ¿ addresses “train the trainers” bottleneck ?
Cons:
• expensive to produce and curate
• most students are sampling
• low completion rates
• somewhat chaotic
• lecture fatigue
• ¿ reinforces advantage of the elites ?
23. 23
What about MOOCs?
Online education: MOOCs taken by educated few
Ezekiel Emanuel, Nature 503, 342 (2013-11-21)
• 80% students already have an advanced degree
• 80% come from the richest 6% of the population
Michael Shanks @Stanford: retrenchment around traditional
disciplines will make disparities even more pronounced
An Early Report Card on Massive Open Online Courses
Geoffrey Fowler, WSJ (2013-10-08)
Amherst, Duke, etc., have rejected edX
see: Open edX Universities Symposium @GWU, 2015-11-11
24. 24
• search engines surface too many choices
among the available learning content
• we must get people wanting to interact with
the material – generally due to social context
• academe strives to decontextualize, which
is the opposite of learning in context
• how do we recognize that learning has
occurred?
• what is the learning promise?
What about MOOCs?
26. 26
Introduction to Robotics
Peter Corke @QUT
https://moocs.qut.edu.au/learn/introduction-to-robotics-august-2015
• effective use of peer review for scaling
• worked well reaching into Africa, India
Peer Review
27. 27
Effective Thinking Through Mathematics
Michael Starbird @UT/Austin
https://www.edx.org/course/effective-thinking-through-mathematics-utaustinx-ut-9-01x
• getting students to articulate their
epiphany moments is more interesting
than other results – Donna Kidwell
Epiphany Moments
28. 28
Caltech Offers Online Course with
Live Lectures in Machine Learning
Yaser Abu-Mostafa (2012-03-30)
http://www.caltech.edu/news/caltech-offers-online-course-live-lectures-machine-learning-4248
• significant improvement through the use
of “flipped” a.k.a. inverted classrooms
Inverted Classrooms
29. 29
Scalable Learning
David Black-Schaffer @Uppsala
Sverker Janson @KTH SICS
https://www.scalable-learning.com/
• active learning: Flipped Classroom and Just-in-Time Teaching
• exams built directly into specific diagrams within videos
• metrics for where in video+code that students get stuck
• instructor can customize subsequent classroom discussions
(active teaching phase) based on stuck/unstuck metrics
Inverted Classrooms
30. 30
How to Flip a Class
CTL @UT/Austin
http://ctl.utexas.edu/teaching/flipping-a-class/how
1. identify where the flipped classroom model makes
the most sense for your course
2. spend class time engaging students in application
activities with feedback
3. clarify connections between inside and outside
of class learning
4. adapt your materials for students to acquire course
content in preparation of class
5. extend learning beyond class through individual
and collaborative practice
Inverted Classrooms
31. 31
Learning programming at scale
Philip Guo
O’Reilly Radar (2015-08-13)
http://radar.oreilly.com/2015/08/learning-programming-at-scale.html
• PythonTutor
• Codechella
Tutors could keep an eye on around
50 learners during a 30-minute session,
start 12 chat conversations, and
concurrently help 3 learners at once
Collaborative Learning
32. 32
Data-driven Education and the Quantified Student
Lorena Barba @GWU
PyData Seattle 2015
https://youtu.be/2YIZ2SY9mW4
• keynote talk: abstract, slides
• homepage
If you study just one link in this entire talk…
34. 34
If by some bizarre chance you haven’t used
it already, go to https://jupyter.org/
• 50+ different language kernels
• new funding 2015-07
• UC Berkeley, Cal Poly
• nbgrader autograder by Jess Hamrick
• jupyterhub multi-user server
• curating a list of examples
• repeatable science!
see also:
Teaching with Jupyter Notebooks
http://tinyurl.com/scipy2015-education
Project Jupyter
35. 35
Deploying JupyterHub for Education
Jessica Hamrick
Rackspace blog (2015-03-24)
https://developer.rackspace.com/blog/deploying-jupyterhub-for-education/
Project Jupyter
36. 36
Literate Programming
Don Knuth
Univ of Chicago Press (1992)
literateprogramming.com/
Instead of imagining that our main task is
to instruct a computer what to do, let us
concentrate rather on explaining to human
beings what we want a computer to do
Evoking some earlier works…
37. 37
Most definitely check out CodeNeuro,
both online and the conf/hackathon…
Some great examples:
Jeremy Freeman, HHMI Janelia Farm
http://notebooks.codeneuro.org/
Matthew Conlen, NY Data Company
http://lightning-viz.org/
Olga Botvinnik, UCSD
http://yeolab.github.io/flotilla/docs/gallery/
Great Examples
40. 40
Embracing Jupyter Notebooks at O'Reilly
Andrew Odewahn
O’Reilly Media (2015-05-07)
https://beta.oreilly.com/ideas/jupyter-at-oreilly
O’Reilly Media is using our Atlas platform
to make Jupyter Notebooks a first class
authoring environment for our publishing
program
Jupyter, Thebe, Atlas, Docker, etc.
Content Toolchain
42. 42
On Demand Analytic and Learning Environments with Jupyter
Kyle Kelley, Andrew Odewahn
lambdaops.com/jupyter-environments-odsc2015/
Exploring a couple themes, in particular:
• computational narratives
- exploratory data analysis
- software development/collaboration
- API exploration
- technical papers
- reports, exec dashboards
• code-as-media
- Thebe project, etc.
Content Toolchain
43. 43
Personal experiences during 2012-2015
as an author and instructor…
Just Enough Math
Paco Nathan
O’Reilly Media (2014)
http://justenoughmath.com
Content Toolchain
44. 44
Learnings based on working on this
project with Kyle and Andrew…
How to transition from roles of data scientist,
software developer, engineering director –
into roles of author, teacher – and vice versa
Content Toolchain
45. 45
Interactive notebooks:
Sharing the code
Helen Shen
Nature (2014-11-05)
nature.com/news/interactive-notebooks-sharing-the-code-1.16261
Content Toolchain
46. 46
Content Toolchain
Atlas is our content platform backed by Git,
for project collaboration among authors,
editors, et al.
https://atlas.oreilly.com/
47. 47
Content Toolchain
Thebe (a moon of Jupiter) provides a layer
atop Jupyter that is needed for publishing,
white-labeled content, etc.
https://github.com/oreillymedia/thebe
49. 49
Content Toolchain
Contrast our current talent workflow and this
new world of Jupyter+Docker+Thebe+cloud …
How would it work with known successes such
as Head First?
Diagram labels, spanning authoring, production, and presentation:
• Thebe: player
• Jupyter: notebook
• Docker: container
• web page: interaction
• Git: versioning
• Atlas: publications, various formats
• cloud infra
53. 53
The Learning Architecture:
Defining Development and Enabling Continuous Learning
David Mallon, Dani Johnson
Bersin (2014-05-06)
http://www.bersin.com/Practice/Detail.aspx?docid=17435&mode=search&p=Learning-@-Development
This report is designed to help leaders
and talent development and learning
professionals to take positive steps
toward understanding and implementing
learning architectures
Sidebar: Learning Architecture
54. Think of a favorite open source framework …
who (or where) are the experts in this graph?
Sidebar: Innovators vs. Experts
Diffusion of Innovation
Everett Rogers (1962)
http://sphweb.bumc.bu.edu/otlt/MPH-Modules/SB/SB721-Models/SB721-Models4.html
54
55. 55
Building Blocks
In software engineering, we rarely hand a
developer the spec for some app and say
“Start from scratch, then come back when
you’re done.” Instead:
• focus on MVP
• leverage APIs, libraries, microservices, etc.
• iterate on small, incremental changes
• this allows for TDD, CI, etc.
• plus, customer experiments ➜ data science
Compare/contrast that with how publishers
approach authors, speakers, instructors?
56. 56
Building Blocks
Proposing a new format spec to replace
EPUB, MOBI, etc.:
• video segments + transcripts
• notebooks in Jupyter+Thebe+Docker
• metadata (persona, topics, cues, etc.)
• links to Git repos, Dat data
• annotations atop existing content
• webcast/livestream
• social interaction (TA/mentoring)
• evaluation modules
• discourse analytics
most reused across a spectrum
of synchronous to async
instrumented for experiments,
analytics, iteration
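One way to picture such a format is as a manifest that lists a unit's building blocks. The sketch below is purely illustrative: the `LearningUnit` class, its field names, and the placeholder file names are assumptions, not part of any published spec.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class LearningUnit:
    """Hypothetical manifest for one reusable unit of learning content."""

    title: str
    video_segments: List[str] = field(default_factory=list)
    transcripts: List[str] = field(default_factory=list)
    notebooks: List[str] = field(default_factory=list)
    metadata: Dict[str, List[str]] = field(default_factory=dict)  # persona, topics, cues
    repo_links: List[str] = field(default_factory=list)
    annotations: List[str] = field(default_factory=list)
    evaluation_modules: List[str] = field(default_factory=list)

    def missing_parts(self) -> List[str]:
        """Names of the building blocks this unit has not filled in yet."""
        return [
            name
            for name, value in asdict(self).items()
            if name != "title" and not value
        ]


# Placeholder values only; none of these files exist.
unit = LearningUnit(
    title="Placeholder unit",
    video_segments=["segment-01.mp4"],
    transcripts=["segment-01.txt"],
    notebooks=["exercise-01.ipynb"],
    metadata={"persona": ["proficiency"], "topics": ["placeholder-topic"]},
)
missing = unit.missing_parts()
```

A manifest like this keeps each block independently reusable, and a check such as `missing_parts` gives a simple hook for instrumentation.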
57. 57
Dimension groups: learner, topic, experience
• Do you have sufficient familiarity with the topic? (total newbie ↔ good overview)
• Can you build on familiarity with a related topic? (utterly confused ↔ familiar territory)
• Do you have necessary proficiency in the topic? (must get unstuck ↔ send pull request)
• How many boundaries must you span to achieve structural literacy for this topic? (concise topic ↔ interdisciplinary)
• What is your primary motivation to learn this topic? (want to, for myself ↔ have to, for my job)
• Where are you on the "diffusion of innovation" curve w.r.t. the topic? (bleeding edge ↔ COBOL 2020)
• How high is the transaction cost for the experience delivered to you? (on-demand ↔ major event)
• Does the learning experience immerse you within a diverse, supportive social context? ("go read the code" ↔ full-team participation)
Dimensional Reduction
Did we mention intense needs for data analytics at scale?
58. 58
Is it possible to measure “distance” between
a learner and a subject community?
From Amateurs to Connoisseurs:
Modeling the Evolution of User
Expertise through Online Reviews
Julian McAuley, Jure Leskovec
http://i.stanford.edu/~julian/pdfs/www13.pdf
Recommender Systems
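As a very rough illustration of what a "distance" could mean, one could score a learner and a subject community on the eight questions from the Dimensional Reduction slide and compare the two vectors. This is not the model from the McAuley and Leskovec paper; the dimension names, the 0-to-1 scoring, and the normalization are all assumptions made for this sketch.

```python
import math

# Hypothetical keys, one per question on the Dimensional Reduction slide.
DIMENSIONS = [
    "familiarity",
    "related_familiarity",
    "proficiency",
    "boundaries",
    "motivation",
    "diffusion",
    "transaction_cost",
    "social_context",
]


def distance(learner, community):
    """Euclidean distance between two profiles, scaled to the range 0..1.

    Each profile maps every name in DIMENSIONS to a score between 0 and 1.
    """
    a = [learner[d] for d in DIMENSIONS]
    b = [community[d] for d in DIMENSIONS]
    # Dividing by sqrt(n) makes the largest possible distance equal to 1.
    return math.dist(a, b) / math.sqrt(len(DIMENSIONS))


# Made-up profiles: a newcomer at one end of every scale,
# and a community sitting in the middle of every scale.
newcomer = {d: 0.0 for d in DIMENSIONS}
community = {d: 0.5 for d in DIMENSIONS}

gap = distance(newcomer, community)
```

A real system would learn these positions from behavior rather than ask for them, but even this toy version shows how learner-to-community fit could become a number to rank on.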
59. 59
Back to “Uber for Learning” – approaching from a learner
(audience) perspective, generally within a social context
Given that:
• books aren’t used by learners as much anymore
• experts don’t have time to write books anymore
If we can:
• fit learners’ needs to topics w.r.t. subject communities,
based on their S-curve positions
• personalize lectures for learners’ pain-points
• reuse containerized building blocks
Imagine the extent to which our current data science
tooling and techniques can be leveraged?
Summary
60. 60
PS: If you are interested in opportunities
to write, speak, teach, mentor, code, etc.,
based on these approaches, let us know
Get Involved!