posted by 코딩 공부중 2020. 1. 3. 11:16

Counting words in website text

Using Java MapReduce

1. WordCount class (driver)

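import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();

        // Expect exactly two arguments: the input path and the output path.
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input> <output>");
            System.exit(2);
        }

        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, "WordCount");

        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // Read the input line by line and write plain-text key/value output.
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Types of the final (reducer) output: word -> count.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Block until the job finishes and report success or failure via the exit code.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}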

2. Mapper class

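import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Reused for every emitted pair to avoid allocating new objects per token.
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    // key: byte offset of the line in the input file; value: the line itself.
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // Split the line on whitespace and emit (word, 1) for each token.
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}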

3. Reducer class

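import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Reused output value holding the total count for the current word.
    private final IntWritable result = new IntWritable();

    // Called once per distinct word with all the 1s the mappers emitted for it.
    // The parameter must be Iterable<IntWritable>; a raw Iterable does not compile
    // with the typed for-each loop below.
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }

        result.set(sum);
        context.write(key, result);
    }

}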
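
To make the data flow of the three classes easier to follow, here is a minimal sketch that imitates the map, shuffle, and reduce steps in plain Java, without Hadoop. It is only an illustration under assumptions: the class name LocalWordCountSketch, its method names, and the sample input lines are hypothetical and not part of the original job, and a real cluster run distributes these steps across many tasks.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free imitation of the WordCount data flow.
class LocalWordCountSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token,
    // the same tokenization the mapper applies to one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Reduce step: sum all values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map on every line, groups the pairs by word (the "shuffle"),
    // then reduces each group. TreeMap keeps the keys sorted.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input; each string stands for one line of an input file.
        List<String> lines = Arrays.asList("hello world hello", "world of hadoop");
        // Print one "word<TAB>count" pair per line.
        for (Map.Entry<String, Integer> e : countWords(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Because the tokenizer splits only on whitespace, tokens such as "Hello" and "hello," would be counted as different words in both this sketch and the mapper above; normalizing case or stripping punctuation would be a separate design decision.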

4. Execution result
