Mozilla’s Common Voice project

Any high-quality speech-to-text engines require thousands of hours of voice data to train them, but publicly available voice data is very limited and the cost of commercial datasets is exorbitant. This prompted the question, how might we collect large quantities of voice data for Open Source machine learning?

Source: Branson, M. (2018). We’re intentionally designing open experiences, here’s why.

One of the big problems with the development of AI is that few organisations have the large, inclusive, diverse datasets that are necessary to reduce the inherent bias in algorithmic training. Mozilla’s Common Voice project is an attempt to create a large, multilanguage dataset of human voices with which to train natural language AI.

This is why we built Common Voice. To tell the story of voice data and how it relates to the need for diversity and inclusivity in speech technology. To better enable this storytelling, we created a robot that users on our website would “teach” to understand human speech by speaking to it through reading sentences.

I think that voice and audio are probably going to be the next computer-user interface so this is an important project to support if we want to make sure that Google, Facebook, Baidu and Tencent don’t have a monopoly on natural language processing. I see this project existing on the same continuum as OpenAI, which aims to ensure that “…AGI’s benefits are as widely and evenly distributed as possible.” Whatever you think about the possibility of AGI arriving anytime soon, I think it’s a good thing that people are working to ensure that the benefits of AI aren’t mediated by a few gatekeepers whose primary function is to increase shareholder value.

Most of the data used by large companies isn’t available to the majority of people. We think that stifles innovation. So we’ve launched Common Voice, a project to help make voice recognition open and accessible to everyone. Now you can donate your voice to help us build an open-source voice database that anyone can use to make innovative apps for devices and the web. Read a sentence to help machines learn how real people speak. Check the work of other contributors to improve the quality. It’s that simple!

The datasets are openly licensed and available for anyone to download and use, alongside other open language datasets that Mozilla links to on the page. This is an important project that everyone should consider contributing to. The interface is intuitive and makes it very easy to either submit your own voice or to validate the recordings that other people have made. Why not give it a go?

I enjoyed reading (December)

This post is really delayed, mainly because I took a break from blogging over December and January. I was starting to feel an “obligation to blog”, which is when I know that I need to step back a bit and take some time off. There’s nothing worse than writing because you feel you have to, rather than actually wanting to. Now that I’ve had a break, I find myself feeling excited at the prospect of blogging again, which is a much better place to be.

9 reasons why I am NOT a social constructivist (Donald Clark): Interesting critique of the concept of social constructivism as a theory that explains learning. To be honest, I’ll admit to having accepted the authenticity of the theory because it fits in with how I believe the world is. However, I haven’t been at all critical of it. In the spirit of adopting a more critical view of my beliefs, this was a very good post to read.

Educators nod sagely at the mention of ‘social constructivism’ confirming the current orthodoxy in learning theory. To be honest, I’m not even sure that social constructivism is an actual theory, in the sense that it’s verified, studied, understood and used as a deep, theoretical platform for action. For most, I sense, it’s a simple belief that learning is, well, ‘social’ and ‘constructed’. As collaborative learning is a la mode, the social bit is accepted without much reflection, despite its obvious flaws. Constructivism is trickier but appeals to those with a learner-centric disposition, who have a mental picture of ideas being built in the mind.

Going Beyond ‘Learning to Code’: Why 2014 is the Year of Web Literacy (Doug Belshaw): I like the idea of people having a sense of how technology works. As more and more of our lives become integrated with technology, isn’t it important to understand how it affects us? How are the decisions we make increasingly influenced by those who write the code of the applications and devices we use? Think about pacemakers that determine the frequency and regularity of your heartbeat. Wouldn’t you want to make sure that there are as few software bugs as possible? My interest in this topic is more related to the idea of open source software and the importance of ensuring that as much code as possible is open for review by an objective and independent community. Mozilla’s Web Literacy standard is one small aspect of developing competence in a range of skills that are increasingly relevant to our ability to interact with others in the world.

In this post I want to argue that learning to code is part of a larger landscape that we at Mozilla call ‘web literacy’. I see that landscape as being increasingly relevant in 2014 as we come to realise that “learn to code!” is too simplistic and de-contextualised to be a useful exhortation. Web Literacy, on the other hand, is reasonably well-defined as the skills and competencies required to read, write and participate effectively online. We’ve included ‘coding/scripting’ as just one part of a wider strand identified as ‘Building’ (i.e. writing) the web. Other competencies in this strand include ‘remixing’ and ‘composing for the web’.

Do What You Love: A Selfish and Misguided Message (Dean Shareski):

By keeping us focused on ourselves and our individual happiness, DWYL [Do What You Love] distracts us from the working conditions of others while validating our own choices and relieving us from obligations to all who labor, whether or not they love it. It is the secret handshake of the privileged and a worldview that disguises its elitism as noble self-betterment. According to this way of thinking, labor is not something one does for compensation, but an act of self-love. If profit doesn’t happen to follow, it is because the worker’s passion and determination were insufficient.

Academic publishers must sort out their outdated electronic submission and review processes (Dorothy Bishop):

My relationships with journals are rather like a bad marriage: a mixture of dependency and hatred. Part of the problem is that journal editors and academics often have a rather different view of the process. Scientific journals could not survive without academics. We do the research, often spending several years of our lives to produce a piece of work that is then distilled into one short paper, which the fond author invariably regards as a fascinating contribution to the field. But when we try to place our work in a journal, we find that it’s a buyer’s market: most journals are overwhelmed with more submitted papers than they can cope with, and rejection rates are high. So there is a total mismatch: we set out naively dreaming of journals leaping at the opportunity to secure our best work, only to be met with coldness and rejection.

Side note: The above post included a screenshot of this tweet, which I enjoyed.

PHT402 online course accreditation

The #pht402 Professional Ethics course has just been accredited by the South African Society of Physiotherapists and Health Professions Council of South Africa for 6 Level 2 Ethics CPD points. If you are a South African physiotherapist and would like to take part in the course, please register here before 9th August.

Image from opensourceway’s Flickr stream

Over the past few weeks I’ve been running an open, online course in Professional Ethics for my 3rd year students, in collaboration with Physiopedia. Check out the project page for the details of the course, including the context and background. I also received ethical clearance from our institutional review board to study the process and outcomes.

One of the major decisions we made was to invite qualified physiotherapists to participate as well. We wanted to encourage interaction between our students and the “real world”, that intangible place we say we’re preparing our students for. In return, participants external to the university would receive a badge from Physiopedia. These badges are compatible with Mozilla’s Open Badge standard and so have value outside of the Physiopedia ecosystem.

Until recently the course was only an interesting experiment among our 3rd year students and the 26 international physiotherapists who are also participating. However, I’m now very happy to announce that the SASP and HPCSA have accredited the course for 6 Level 2 Ethics CPD points. They had an additional requirement for participants to write a short test at the end but other than that, the course was accepted as is.

By accrediting the course, the SASP and HPCSA, two organisations that I think are traditionally quite conservative, have given this method of learning a degree of legitimacy that I find really exciting. It’s one thing for it to be recognised as an interesting research project and quite another for the professional bodies to recognise its potential to provide learning opportunities for geographically distributed professionals. A significant challenge for qualified South African physiotherapists obtaining their annual Ethics CPD points is that the courses are most often only offered in major city centres (requiring travel and sometimes overnight accommodation) and the registration fees are usually quite high. Our course is online and self-paced, which acknowledges the unique time constraints of individuals, and is free.

Now that we’ve set a precedent, we’ll offer the course every year and try to build a model for physiotherapy education for appropriate subjects through distance learning. This has potentially massive implications for the profession in terms of:

  • Moving learning away from the classroom, which will impact on physical space requirements
  • Connecting the university to health care professionals at a global level, bringing in many unique perspectives from “the real world”
  • Introducing a host of digital and information literacies for participants
  • Emphasising a student-centred, self-directed approach to learning that empowers learners to take control of their learning
  • Opening up further opportunities for collaboration between academia and the profession

Watch this space for further details. On a related note, I’ve also entered the course into the Reclaim Open Learning Contest, which is being run by MIT. I’ll be sure to post the outcome here.

Connectivism and connective knowledge, 2009

I just registered for the Connectivism and connective knowledge (CCK09) course that’s going to start in September.  I first came across it when I did the Mozilla open education course earlier this year and have been keeping an eye on it in the meantime.  It’s a massively open online course that so far has 1000+ registered participants, and is hosted by George Siemens and Stephen Downes.

From the 2008 course outline, the Connectivism and connective knowledge course is a “…twelve week course that will explore the concepts of connectivism and connective knowledge and explore their application as a framework for theories of teaching and learning. It will outline a connectivist understanding of educational systems of the future.”

Here’s the syllabus for the 2008 course, and the Moodle outline.  If you register for the CCK09 course, let me know so that we can keep in touch.

Mozilla Open Education project blueprint

If you’ve been following my recent posts, you’ll have realised that I’m participating in the Mozilla Open Education course, jointly hosted by the Mozilla Foundation, ccLearn and the Peer to Peer University. The course has involved participating in online seminars over the past 6 weeks with the objective of creating a project blueprint that takes into account the concepts of open education, open technology and open licensing.

I decided that my project was going to involve something I’ve been thinking about for a few months and saw the course as an opportunity to take the first few tentative steps. The idea was to create an online, distributed authoring environment that would allow physiotherapy clinicians, educators and students to participate in collaboratively writing a national physiotherapy textbook. The problem with imported (American and British) textbooks is a complete lack of cultural and contextual relevance, as well as being associated with a high cost and not being adaptable to local needs (think, multiple languages).

I won’t go into any more detail here (check out the blueprint page), only to say that the idea is taking shape slowly and that I’m quite excited at the prospect of refining it over the next month or so. The course was so information-heavy (not so much from the organisers, but from the back chatter of participants) that it’s going to take some time to review the aggregated content.

Mozilla Open Education course: seminar 6

I know that this is all out of sync but the audio for sessions 4 and 5 isn’t up yet and I haven’t had a chance to go through the slideshows.  Today’s session was about the actual practice of teaching, using “open” as a framework.  Here are my notes:

Session 6 – Open pedagogy

Focus on educators and the impact of “open” on them.

Jason Jones

Initially started using wikis for groupwork.

Noticed a few problems when teaching – no one takes notes in class, “no real content”, inattention.  Also, when taking notes, educators aren’t always sure what notes are being taken.  Notes can “go wrong” when other thoughts intrude or when students mis-hear.

Paper notes are hard to improve and are private and difficult to organise.

Wikis are public and solve some of the problems just mentioned.  Everyone collaborates and there is negotiation of content.

An unexpected result was noticing that under the old system of teaching the only way you would know if the students have the wrong information is when they fail a test.  With a public wiki, you realise more quickly that students may be on the wrong track.

Lessons learned along the way.  Merely pointing students towards the wiki doesn’t work.  Students don’t always understand technology.  They’re also not sure what to record when taking notes, so templates are useful.  Students can sometimes find it difficult to use other resources (one benefit of using wikis / being online).

Problem of using old assessment techniques with new approaches to teaching and learning.

Garin Fons

Using wikis to get faculty to put teaching materials online, as well as collaborating with dedicated classmates to build community (reflect on communities of practice).

With wikis, faculty get a chance to have materials edited and reviewed in a way they can’t do alone.

Participatory pedagogy – John Seely Brown and the social view of learning.  We can no longer look at the classroom in a Cartesian system.  We participate, therefore we learn.

Melanie McBride

Students create blogs as emerging professionals, rather than personal blogs (about what’s happening in their industry).

Found that some students weren’t very keen on blogging.  Reasons included: “I don’t know who I am yet, or who I want to be” (powerful statement)…and that some don’t like the idea of being told what to do.  Anonymity was also an issue.

Students did take ownership of their own emerging industry knowledge.

“Banking” model of education = passive recipients of education.

Concerned with progressive assessment models.  Using wiki as means of checking in on student learning.

Issues of social justice and equity.  Not every student has access to tech (in America…try Africa).  Educators must be aware of that.

Pre-defined roles fall away with open pedagogy – students take ownership of courses and rewrite / restructure them.  Allow this to happen.  This can make teachers nervous.  Dichotomy of losing control but giving freedom.  Be careful about too much freedom.

Teachers and control…depends on the teacher, if they’re willing to dive into the participatory learning environment.  Getting teachers involved in the process.  What does their classroom look like normally and what is their teaching style?  Are they willing to break out of that?  If not, it’s difficult to move forward with this approach.

Mozilla Open Education course: seminar 3

Open web tech

Again, I missed this seminar because of poor internet connectivity on the day and am catching up on the audio after the fact.  Here are my notes from the presentation given by Mozilla’s Chris Blizzard.

  1. Open as a concept
  2. Innovation and change = important building blocks
  3. Relevance and why open matters
  4. Repurposing key web technologies

“Open”: what does it mean?  First of all, the opposite of open is not necessarily “closed”…though useful terms, in this context they shouldn’t be seen as polarising.  In the context of the open web, the opposite of open may be thought of as opaque…you don’t understand how it works, can’t see inside it, don’t know how it came about.  Gives a sense of the visual.  Therefore, open could be thought of as “transparent”.

Not requiring permission is an important component of open because it relates to patents, licensing, etc.  Comparison of video codecs like H.264 and Ogg Theora and the difference that open licensing makes with regard to permission to use the code.

Side note: all content from this course is available under an open license for anyone to re-purpose for any use.

“Generative” – word that is used widely in academia.  Meaning that through your action you allow others to do something as well. It allows people other than the original creator of the work to change the work and use it for things that the creator didn’t think of; it facilitates the multiplication of efforts and exploration.

“Innovation” is over-used in many circles…a black box in which things are improved but where the process is invisible.  The most important characteristic of innovation is that it represents change (both good and bad change).  Intentional disruption = standing up to make a difference in a way that’s going to be uncomfortable…and people are often reluctant to change because it’s uncomfortable.  Setting things up to purposefully be uncomfortable and going up against various interests (possibly commercial or political) who would not benefit from that change.  Setting yourself up against the status quo.  In an open model where you’re trying to encourage change / innovation / disruption, you’re going to run up against issues.

Where does experimentation come from?  Assuming that progress and innovation stem from experimentation and failure (learning from our mistakes), it’s important to understand this process as it leads to change.  The core group of contributors to large projects are not necessarily the ones doing the experimenting; it usually comes from the periphery.  How do you set yourself up to have “edges” in the community and be open in order to promote experimentation and innovation?  This disruption is difficult for business to commit to because it’s hard to determine future value in experimentation and innovation.

As messy and painful as it is, the open web has worked well.  Very few other inventions have disrupted communication so comprehensively before the web (maybe the printing press, telephone).  An instantaneous communication network that people are continually changing and re-purposing without having to ask permission from anyone is very important.  The nature of the web made this possible i.e. intentionally built on a model of open technology / software where anyone could make changes without permission.

What makes something open web technology?  Web browser is the gateway to the web and we spend a lot of time using it, therefore it should be comfortable and easy to use.  Can you see the page source to understand how it works?  Being able to look at somebody’s source is part of the transparency / open-ness of the web.  Source is delivered (HTML, JavaScript) and compiled / executed locally.  Historical mistake where originally authors were writing simple documents where source didn’t matter as much.  Now, this presents as a learning opportunity where others can see what you’ve done and use it in other ways.  This doesn’t mean that you should copy and paste everything, rather figure out how it works and learn that way.

If you have access to the source you may be able to figure out the API (or the API is open), which means that you can then re-purpose the application.  Twitter is an example…even though it’s only a simple application (status updates), others have figured out how to use it in different, more complex ways because of its open API and a whole ecosystem has developed around it.

Another example is how people have changed Google search by implementing code in the browser, even though Google hasn’t explicitly given that permission.  An example of people using the open-ness of the web to figure things out and make changes that have not explicitly been allowed by an open license.

Key pieces of open web technology:

  • HTML = core of open web, describes document structure, content, continually improving and evolving
  • XML = more generalised data management (not as widely used), semantic meaning is important in the open web
  • CSS = controls presentation of content (unlike HTML), can imply visual structure, media context, also implies semantic meaning
  • Images = static visual medium that conveys expression (jpg, png are simple but allows everyone to use), adds context to the open web
  • JavaScript = integration of all the other pieces, makes the static web dynamic
  • Open video = transparent, generative, not closed implementation of web video (in contrast to Flash), using Ogg Theora (patent- and royalty-free video codec)

Mozilla Open Education course: seminar 2

Open educational resources

I missed the second session of the Mozilla Open Education course that was held about two weeks ago because of Internet issues, and only just had the opportunity to listen to the audio. Here are my notes from the session, which featured a panel of experienced users and creators of Open Education Resources (OER).

Began with an overview of the open ed movement / background to set the context against which the case studies are set…what is the big picture? OER features many people involved at many levels, using many technologies, and business models are being built around this idea…shows it’s an idea whose time has come.

Create a movement of diversity, seeing how different ideas play off one another.

Fundamental adherence to openness means that ideas and content designed for one task need not be delimited to that task but can be “re-packaged” for others, i.e. you needn’t design materials for everybody, just for your own needs, but then endow them with the characteristics (legal and technical) that make them available for everybody to redesign.

OER should be:

  • designed to give learners access to a broad array of tools
  • available for anyone to use/share/adapt to their needs
  • relevant for formal/informal and lifelong learning needs

Open licensing is crucial – current systems undermine the premise that creative content can be shared and changed, therefore OER is important for catalysing new ways of learning, critical thinking, collaboration, engagement, reflection

Education is the key to an informed population, therefore it needs broad, optimistic ideas that do away with the notion that “you don’t get to have an education because of your circumstances”.

4 topics that came from previous interviews:

  1. Open means not being afraid to solve problems publicly (and to fail publicly)
  2. Open means creating space for people to do things that you don’t anticipate
  3. Open means giving up control
  4. Open means sharing models that others build on for quick diffusion of good ideas

What is an edupunk and how does it relate to online learning? Edupunk came from a notion that you could do a lot in education by yourself, and not being afraid to fail. Moving against the corporate base who designs courses based around management, rather than learning (isn’t this a bigger problem within Learning Management Systems? Take this further with the idea of “managed learning”). Also, proprietary, no control, they shape our learning experience.

Traditional methods of learning and teaching are clean, easy and simple for lecturers to follow, textbooks are available, curriculum can be moved through in a predetermined way, boundaries are evident. Open source communities allow involvement with real things, which can be scary…you don’t always know where it’s going. The opportunities to talk about things that won’t come up in other contexts add to a richer experience. Better place to learn because it scales.

Discussed issues with institutions catching on to and embracing change, e.g. hosting content on external servers.

Difficult to get students to contribute to blogs:

  • Thought no-one would read it
  • Thought that if they did read it, they’d think it was stupid

Realised that by aggregating content, they could draw a much larger audience. Students were blown away by comments on blogs (profound moment when the person you’re blogging about comments on your blog). Aggregation helps build critical mass. Powerful idea that people from all over the world are reading your work and following it.

A key competency is understanding how to manage online identities. Posts can’t be thrown out there; reflection before posting is important because these conversations are available forever. People become more conscious about how ideas and conversations can travel.

Surprised at how few students read and understand how blogs work. Need to teach them how the internet works. Communication needs to change, tone, strategy. “Learning to write in a way that honours the web”. We need to spend time teaching students how to communicate online, in a living and open way. It’s wrong to think that this is the Facebook generation and that they know how to do this.

Students taking control of their work and presenting or “re-presenting” themselves online. Where they live online and how they work online. Online identity and data portability. Moving beyond the limited view of institutional services…not about email addresses or university webspaces…framing their own online identity outside of the institution.

Regarding Weave for an “educational passport”. Students taking their own digital identity and learning experiences with them when they leave university…portfolios of learning that they own. Storing personal information through the browser that the student owns and can always access. Aggregating online identity through your own domain.

Not about building resources, it’s about building community. Forget about building the one hoop that you can re-use every year to make new students jump through. How can I make sure that my community of students is healthy and finding their own hoops?

Mozilla Open Education course and other thoughts

I was unable to participate in the second session for the Mozilla Open Education course due to local Internet problems that meant I had no sound.  While it was frustrating to begin with, I realised that this is the reality of the situation in most countries and that while we talk about open this and open that, we’re not going to make real progress in South Africa until we get decent bandwidth, lower access costs and deeper penetration of the service.

Taking this idea a little further, I went on to work out that I’m one of the fortunate people in the top 1% of people in South Africa who have a broadband connection at home, which means that the majority of citizens in this country will remain completely unaware of everything I do that relates to the use of technology in education.  This really helps to keep things in perspective, as high levels of poverty and crime are far more important issues in terms of social change than the results of my blogging assignment.

I guess my point is that it’s easy to get frustrated with the technical problems experienced as part of this online web seminar, but that I live in a developing country where my lack of streaming audio is the least of our problems.

PS.  In case you’re wondering “Why bother if the technology is so limited?” my plan is to use technology to improve physiotherapy education, which will create better physiotherapists, who will then improve the health service, which will have a positive effect on large numbers of the population 🙂

Note: I calculated the percentage of people with broadband by taking the number of ADSL subscribers in 2008 as a percentage of the population from the 2008 census.  It’s not very accurate but gives a decent estimate.

Twitter Weekly Updates for 2009-04-05

  • @sharingnicely for what it’s worth, my vote goes to #mozopened in reply to sharingnicely #
  • @reflectivelrnr Sometimes, they find you 🙂 in reply to reflectivelrnr #
  • Just went through Alltop Twitterati (http://bit.ly/CoiAC). Are the people with the most to say the least interesting to follow? #
  • Very excited to be participating in Mozilla open education online course http://bit.ly/82ksO #
  • Insightful post: “9 great reasons why teachers should use Twitter” http://bit.ly/qexSG #
  • I hate to be cliched, but “Slumdog Millionaire” is the best movie I’ve seen in 5 years #
  • Participating in online, open education course with Mozilla, ccLearn and Peer 2 Peer University #
  • Great first seminar on #mozopenedcourse, minor tech glitches. Lots to think about. Looking forward to next week http://bit.ly/82ksO #
  • Just watched “Accepted”…it came on and the remote was too far away. Light hearted comedy about higher education http://bit.ly/WE2mV #
  • @JasonCalacanis Every year the rich pledge a lot of money to the world’s poor. They have yet to deliver. Just another empty promise… in reply to JasonCalacanis #
  • Just posted my notes from today’s #mozopenedcourse seminar. Interesting session, plenty of food for thought http://bit.ly/9DL3G #
  • “Physiopedia”, an awesome evidence-based physiotherapy reference site with really great content http://bit.ly/14IyvT #
  • Just watched “Sicko”…scary, tragic, sad, criminal…all the things that healthcare shouldn’t be http://bit.ly/gvYOO #
  • Another reason to not be a fan of Blackboard. Just my opinion http://bit.ly/gMCFB #
  • Using wikis in learning and teaching, from Leeds University, interesting stuff incl. tips on assessing wiki content http://bit.ly/Ery7 #
  • Great resource for summaries of physio-related articles, available at Physiospot http://bit.ly/wCTER #
