Mining of Massive Datasets
The book, like the course, is designed at the undergraduate computer science level with no formal prerequisites. To support deeper explorations, most of the chapters are supplemented with further reading references.
The Mining of Massive Datasets book has been published by Cambridge University Press. You can get a 20% discount by applying the code MMDS20 at checkout.
By agreement with the publisher, you can download the book for free from this page. Cambridge University Press does, however, retain copyright on the work, and we expect that you will obtain their permission and acknowledge our authorship if you republish parts or all of it.
We welcome your feedback on the manuscript.
The MOOC (Massive Open Online Course)
We are running the third edition of an online course based on the Mining Massive Datasets book:
The course starts September 12, 2015 and will run for 9 weeks with 7 weeks of lectures. More information and registration.
The 3rd edition of the book (v3.0 beta)
We are developing the third edition of the book.
You can see the current state of the new edition, along with a description of the changes so far here.
The 2nd edition of the book (v2.1)
The following is the 2nd edition of the book. There are three new chapters, on mining large graphs, dimensionality reduction, and machine learning. There is also a revised Chapter 2 that treats map-reduce programming in a manner closer to how it is used in practice.
Together with each chapter there is also a set of lecture slides that we use for teaching the Stanford CS246: Mining Massive Datasets course. Note that the slides do not necessarily cover all the material covered in the corresponding chapters.
Download the latest version of the book as a single big PDF file (511 pages, 3 MB).
Download the full version of the book with a hyper-linked table of contents that makes it easy to jump around: PDF file (513 pages, 3.69 MB).
The Errata for the 2nd edition of the book: HTML.
Note to the users of the provided slides: We would be delighted if you found our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org/.
Comments and corrections are most welcome. Please let us know if you are using these materials in your course and we will list and link to your course.
Stanford big data courses
CS246: Mining Massive Datasets is a graduate-level course that discusses data mining and machine learning algorithms for analyzing very large amounts of data. The emphasis is on MapReduce as a tool for creating parallel algorithms that can process very large amounts of data.
CS341 Project in Mining Massive Data Sets is an advanced project-based course. Students work on data mining and machine learning algorithms for analyzing very large amounts of data. Both interesting big datasets and computational infrastructure (a large MapReduce cluster) are provided by course staff. Generally, students first take CS246, followed by CS341.
CS341 is generously supported by Amazon, which provides us access to their EC2 platform.
CS224W: Social and Information Networks is a graduate-level course that covers recent research on the structure and analysis of large social and information networks and on models and algorithms that abstract their basic properties. The class examines how to practically analyze large-scale network data and how to reason about it through models for network structure and evolution.
You can take Stanford courses!
If you are not a Stanford student, you can still take CS246 as well as CS224W, or earn a Stanford Mining Massive Datasets graduate certificate by completing a sequence of four Stanford Computer Science courses. A graduate certificate is a great way to keep the skills and knowledge in your field current. More information is available at the Stanford Center for Professional Development (SCPD).
If you are an instructor interested in using the Gradiance Automated Homework System with this book, start by creating an account for yourself here. Then, email your chosen login and the request to become an instructor for the MMDS book to [email protected]. You will then be able to create a class using these materials. Manuals explaining the use of the system are available here.
Students who want to use the Gradiance Automated Homework System for self-study can register here. Then, use the class token 1EDD8A1D to join the “omnibus class” for the MMDS book. See The Student Guide for more information.
Previous versions of the book
The following materials are equivalent to the published book, with errata corrected to July 4, 2012.