Machine Learning and Data mining trends in CET project
Shinichi Takayanagi of Recruit Communications and Recruit Lifestyle talked about “CET” (Capture Everything), a real-time analysis platform developed by the Recruit group. His team used Apache Spark, Google Cloud Platform, Leaflet and other tools to build a real-time analysis dashboard that displays the reservation history of “Jalan”, a major Japanese hotel reservation service. His team also developed a prediction engine for web forms. The engine predicts and pre-fills default values in hotel reservation forms (payment type, etc.) to improve customer satisfaction. Details of CET are described in this article.
Plotting Data on Map in R with Leaflet
Non-tabular Data processing using Purrr
@sinhrks, a pandas committer, gave a presentation about purrr (pronounced PU-RAA? *please let us know the correct pronunciation*).
purrr is a functional programming and data processing package. It lets us apply functions to data frames and other structures in a concise way. He recommended using purrr together with the machine learning package caret to build and evaluate models, and with the visualization package ggfortify to create charts.
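To give a feel for the functional style that purrr encourages, here is a minimal sketch. It is written in Python rather than R, and it is not taken from the talk: the records and the `safe_mean` helper are hypothetical. It mirrors the ideas behind purrr's `map`, `keep` and `reduce` on nested, non-tabular records.

```python
from functools import reduce
from statistics import mean

# Hypothetical nested records (not from the talk); each holds a
# variable-length list, so the data does not fit a flat table.
records = [
    {"user": "a", "scores": [1, 2, 3]},
    {"user": "b", "scores": [10]},
    {"user": "c", "scores": []},
]


def safe_mean(values):
    """Return the mean of a list, or None when the list is empty."""
    return mean(values) if values else None


# Rough analogue of purrr::map: apply one function to every element.
means = list(map(lambda r: safe_mean(r["scores"]), records))

# Rough analogue of purrr::keep: keep only elements passing a predicate.
non_empty = list(filter(lambda r: r["scores"], records))

# Rough analogue of purrr::reduce: fold all elements into a single value.
total = reduce(lambda acc, r: acc + sum(r["scores"]), records, 0)
```

The point of this style is that each step is a small function applied to every element, so no explicit loop or index bookkeeping is needed.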
Room, Shirt and Me
Kazuya Wada (@wdkz), a data mining engineer, gave a presentation titled “Room, Shirt and Me” (originally a pop song sung by Aya Matsuura). He talked about a web application built with shiny, rApache and DeployR. According to him, this combination corresponds to “Room”, “Shirt” and “Me” (I could not understand why…).
He demonstrated his web application running on DeployR, which returns words with similar meanings using the text analysis tool Word2Vec. He said that building a web application with DeployR is the best way to share your deliverables with people who cannot code. An audience member also commented that there is a similar tool named OpenCPU, which has flexible output features.
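Looking up similar words from Word2Vec-style vectors usually comes down to ranking words by cosine similarity. The following is a minimal Python sketch of that idea, not the speaker's application: the three-dimensional vectors are made up for illustration (real embeddings have hundreds of dimensions and are learned from text), and `most_similar` is a hypothetical helper.

```python
import math

# Hypothetical toy vectors; a real Word2Vec model learns these from a corpus.
vectors = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
    "banana": [0.12, 0.18, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, topn=2):
    """Return the topn other words ranked by cosine similarity to word."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


nearest_to_king = most_similar("king")
nearest_to_apple = most_similar("apple")
```

With these toy vectors, "queen" ranks first for "king" and "banana" ranks first for "apple".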
Lightning Talks (Small presentations)
More than 20 people gave presentations (and that took 2 hours!). I will share some of the presentations from the LT session.
Gepuro task views
Atsushi Hayakawa (@gepuro), a host of Japan.R, wanted to find R packages that are useful but not widely known. He also wanted to search R packages hosted on GitHub. That’s why he developed a website named “Gepuro task views”. His website lists useful R packages that are crawled from GitHub by a batch process and automatically categorized by his algorithm.
Hot topics of Julia
Kenta Sato (@bicycle1885) shared hot topics in Julia. He attended the Julia Summer of Code and developed features and packages for Julia with his teammates. He stated that 700-800 add-on packages are now available, and introduced some topics: a threading feature developed by Intel, and a financial economics model released by the FRB that runs on Julia.
Estimating the effect of advertising with Machine learning
Shota Yasui (@housecat442) from CyberAgent implemented an algorithm for estimating the effect of advertising. Most marketers believe that advertising surfboards in California is more effective than in Arizona. But such a comparison suffers from selection bias, so the two cannot be compared fairly. He therefore implemented an algorithm from the paper Varian (2014), using a store sales data set provided by Kaggle. In his program, he used gradient boosted decision trees with the xgboost package to estimate the ad effect.
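One way to read this kind of approach is as counterfactual prediction: fit a model on data from a period without the ad, predict what sales would have been during the ad period, and take the gap between actual and predicted sales as the estimated effect. The Python sketch below illustrates only that idea and is not the speaker's code. It swaps in a plain least-squares line for the gradient boosted trees, and all numbers and names (`pre_x`, `ad_sales` and so on) are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: one covariate (say, store traffic) and sales.
# Period without the ad: used only to train the model.
pre_x = [1, 2, 3, 4, 5]
pre_sales = [12, 14, 16, 18, 20]

# Period with the ad: the model has never seen these rows.
ad_x = [6, 7, 8]
ad_sales = [27, 29, 31]

slope, intercept = fit_line(pre_x, pre_sales)

# Counterfactual: predicted sales if the ad had not run.
predicted = [slope * x + intercept for x in ad_x]

# Estimated ad effect: average gap between actual and predicted sales.
effect = sum(a - p for a, p in zip(ad_sales, predicted)) / len(ad_sales)
```

In this toy data the estimated effect is 5 units of sales per period. A real analysis would use a richer model and more covariates so that the counterfactual prediction is credible.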
Naming with R
@hoxo_m had a baby this October. His wife asked him to name the baby, but he was not good at naming. So he decided to write an R program that generates an appropriate baby name from millions of names. His program uses the rvest and lambdaR packages to scrape data from “enanae.net”, a web service that suggests baby names that bring good fortune. In the end, his wife chose the baby's name from a list created by his R program!
SparkR and Parquet
Ryuji Tamagawa (@tamagawa_ryuji) is a translator who works on the Japanese editions of O’Reilly books. He introduced his new book, the Japanese edition of “Advanced Analytics with Spark”, and demonstrated some SparkR code. Although his slot was only 5 minutes, he managed to demonstrate large data manipulation from RStudio and showed that SparkR is a fast and easy solution.
SeekR Annual Search Trends Report 2015
Takekatsu Hiramura (@hiratake55) is the webmaster of “SeekR”, a search engine for R users. In his presentation, he gathered the most frequent search keywords of 2015 and introduced related tools, articles and R packages, such as “fft”, “Kriging”, “RMarkdown”, “RPresentation” and “bitcoin”.
A pizza party and an izakaya party were held after the conference, and we kept talking until midnight.
Japan.R was a great opportunity for me to share knowledge and build a network. Please let us know if you want to attend or give a presentation at the next Japan.R!
All the presentations (in Japanese) are listed on this blog.