Had00b
Big data made simple
Sorry, the page you were looking for in this blog does not exist.
Home
Popular Posts
Reservoir Sampling in MapReduce
We consider the problem of picking a random sample of a given size k from a large dataset of some unknown size n. T...
Had00b - Introduction
Had00b is a Big Data blog for readers ranging from n00bs to advanced. We will discuss common algorithmic problems that arise in Big Da...
MapReduce: a gentle introduction with examples
A brief history of MapReduce and Hadoop. MapReduce is a programming framework for distributed processing originally developed at Google in...
Setup Apache Hadoop on your machine (single-node cluster)
Let's get your machine ready for some big data crunching! Installing Apache Hadoop on a single machine is very simple. Of course, the ...
Labels
big data (1)
distributed file system (1)
easy (1)
hadoop (4)
hadoop streaming (1)
hdfs (2)
linux (1)
mac os x (1)
machine learning (1)
mapreduce (3)
matrix transpose (1)
pseudo-distributed (1)
python (1)
random subset (1)
reservoir sampling (1)
sampling (1)
setup (1)
word count (1)
Blog Archive
2013 (4)
August (3)
Reservoir Sampling in MapReduce
Setup Apache Hadoop on your machine (single-node c...
MapReduce: a gentle introduction with examples
July (1)