The Annual Japanese R user conference “Japan.R 2015”

The annual Japanese R user conference “Japan.R” was held on December 5, 2015 at the Recruit Ginza 8 Building and was attended by more than 200 R users. In this post, I will summarize some of the presentations in English.

IMG_20151205_133705

Talk sessions

Machine Learning and Data mining trends in CET project

Shinichi Takayanagi from Recruit Communications and Recruit Lifestyle talked about the real-time analysis platform “CET” (Capture Everything) developed by the Recruit group. His team used Apache Spark, Google Cloud Platform, Leaflet and other tools to provide a real-time analysis dashboard that displays the reservation history of “Jalan”, a major Japanese hotel reservation service. His team also developed a prediction engine for web forms. The engine predicts and sets default values in hotel reservation forms (payment type, etc.) to improve customer satisfaction. Details of CET are described in this article.

IMG_20151205_135331

Plotting Data on Map in R with Leaflet

Kazuhiro Maeda (@kazutan) gave a tutorial on visualizing geolocation data using Leaflet. The R package leaflet lets us draw geolocation data (circles, polygons, polylines, etc.) on OpenStreetMap without any JavaScript coding skills. Generated maps can be exported as an HTML file and embedded into your blog, shinyapps or RPubs. He demonstrated a typhoon path visualization app and a restaurant map for conference attendees. He also introduced some experimental features such as MiniMaps, ScaleBars, Measures and AwesomeMarkers (these features are available only in the GitHub version).

IMG_20151205_143333

Slides: Plotting Data on Map in R with Leaflet
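
As a rough illustration of the kind of code the tutorial covered, the sketch below draws circle markers and a polyline on OpenStreetMap tiles and exports the map as an HTML file. It is not taken from the slides: the place names, approximate coordinates and output file name are placeholders chosen for this example.

```r
library(leaflet)
library(htmlwidgets)

# Hypothetical points (approximate coordinates, for illustration only)
places <- data.frame(
  name = c("Tokyo Station", "Ginza"),
  lng  = c(139.767, 139.765),
  lat  = c(35.681, 35.672)
)

# Build the map step by step: default OpenStreetMap tiles,
# circle markers with popups, and a line connecting the points
m <- leaflet(places)
m <- addTiles(m)
m <- addCircleMarkers(m, lng = ~lng, lat = ~lat, popup = ~name)
m <- addPolylines(m, lng = ~lng, lat = ~lat)

# Export the map as an HTML file that can be embedded in a blog
out_file <- file.path(tempdir(), "map.html")
saveWidget(m, out_file, selfcontained = FALSE)
```

Printing `m` in RStudio shows the interactive map in the viewer pane.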

Non-tabular Data processing using Purrr

@sinhrks, a pandas committer, gave a presentation about purrr (pronounced “PU-RAA”? Please let us know the correct pronunciation).
purrr is a functional programming toolkit and a data processing package. It lets us apply functions to data frames in a concise way. He recommended using purrr together with the machine learning package caret to create and evaluate models, and with the visualization package ggfortify to create charts.

Slides: Non-tabular Data processing using Purrr
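
To give a feel for this style, here is a minimal sketch that is not taken from the slides. It fits one model per group and pulls a summary value out of each fitted model. To keep the example small it uses the built-in mtcars data and base R's lm() instead of caret, and it omits the ggfortify plotting step.

```r
library(purrr)

# Split the data frame into a list of data frames, one per cylinder count
by_cyl <- split(mtcars, mtcars$cyl)

# Fit one linear model per group; the result is a list of model objects
models <- map(by_cyl, ~ lm(mpg ~ wt, data = .x))

# Extract R-squared from each model as a named numeric vector
r_squared <- map_dbl(models, ~ summary(.x)$r.squared)
print(r_squared)
```

The list of models is exactly the kind of non-tabular data that is awkward to handle with data frame tools alone.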

Room, Shirt and Me

Kazuya Wada (@wdkz), a data mining engineer, gave a presentation named “Room, Shirt and Me” (originally the title of a pop song sung by Aya Matsuura). In his presentation, he talked about a web application made with shiny, rApache and DeployR. According to him, this combination resembles “Room”, “Shirt” and “Me” (I could not understand why…).
He demonstrated his web application running on DeployR, which uses the text analysis tool Word2Vec to suggest words with similar meanings. He said that building a web application with DeployR is the best way to share your deliverables with people who cannot code. An audience member also commented that there is a similar tool named OpenCPU, which has flexible output features.

Lightning Talks (Short presentations)

More than 20 people gave presentations (and that took 2 hours!). I will share some of the presentations given in the LT session.

Gepuro task views

Atsushi Hayakawa (@gepuro), a host of Japan.R, wanted to find R packages that are useful but not widely known. He also wanted to search R packages hosted on GitHub. That’s why he developed a website named “Gepuro task views”. His website lists useful R packages crawled from GitHub by a batch process and categorizes them automatically with his algorithm.

Hot topics of Julia

Kenta Sato (@bicycle1885) shared hot topics of Julia. He attended the Julia Summer of Code and developed features and packages for Julia with his teammates. He stated that 700-800 add-on packages are now available, and he introduced several topics, including the threading feature developed by Intel and a financial economics model released by the FRB that runs on Julia.

Estimating the effect of advertising with Machine learning

Shota Yasui (@housecat442) from CyberAgent implemented an algorithm for estimating the effects of advertising. Most marketers believe that advertising surfboards in California is more effective than advertising them in Arizona. But such a comparison suffers from selection bias, so the two cannot be compared fairly. So he implemented an algorithm from the paper Varian (2014) using a store sales data set provided by Kaggle. In his program, he used gradient boosted decision trees with the xgboost package to estimate the ad effect.
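
The general idea can be sketched as follows: fit a model on the cases without advertising, predict what the advertised cases would have sold without the ad, and compare the prediction with the actual sales. This is only an illustration under stated assumptions, not his implementation. The data are simulated, the variable names are made up, and a plain lm() stands in for the gradient boosted trees so that the example needs no extra packages.

```r
set.seed(42)

# Simulated store data (hypothetical): ads are shown more often on hot days,
# which creates selection bias in a naive comparison
n <- 500
stores <- data.frame(temperature = runif(n, 10, 35))
stores$advertised <- rbinom(n, 1, plogis((stores$temperature - 22) / 5))
true_effect <- 5
stores$sales <- 20 + 1.5 * stores$temperature +
  true_effect * stores$advertised + rnorm(n, sd = 2)

control <- stores[stores$advertised == 0, ]
treated <- stores[stores$advertised == 1, ]

# Naive comparison: biased, because treated stores are hotter on average
naive_effect <- mean(treated$sales) - mean(control$sales)

# Train on the control group only, then predict the counterfactual
# (sales without the ad) for the treated group
fit <- lm(sales ~ temperature, data = control)
counterfactual <- predict(fit, newdata = treated)

# Estimated ad effect: actual sales minus predicted sales without the ad
estimated_effect <- mean(treated$sales - counterfactual)

print(c(naive = naive_effect, estimated = estimated_effect))
```

In this simulation the naive difference is inflated by temperature, while the counterfactual estimate should land close to the true effect of 5.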

Naming with R

@hoxo_m had a baby this October. His wife asked him to name the baby, but he was not good at naming. So he decided to write an R program that generates an appropriate baby name from millions of names. Using the rvest and lambdaR packages, his program scrapes data from the web service “enanae.net”, which suggests auspicious baby names. In the end, his wife chose the baby’s name from a list created by his R program!

SparkR and Parquet

Ryuji Tamagawa (@tamagawa_ryuji) is a translator who works on the Japanese editions of O’Reilly books. He introduced his new book, the Japanese edition of “Advanced Analytics with Spark”, and demonstrated some SparkR code. Although his slot was only 5 minutes, he managed to demonstrate large data manipulation from RStudio and showed that SparkR is a fast and easy solution.

SeekR Annual Search Trends Report 2015

Takekatsu Hiramura (@hiratake55) is the webmaster of “SeekR”, a search engine for R users. In his presentation, he summarized the frequent search keywords of 2015 and introduced related tools, articles and R packages, such as “fft”, “Kriging”, “RMarkdown”, “RPresentation” and “bitcoin”.

 

Party

A pizza party and an izakaya party were held after the conference, and we talked until midnight.

PANO_20151205_225234

Conclusion

Japan.R was a great opportunity for me to share knowledge and build a network. Please let us know if you want to attend or give a presentation at the next Japan.R!

All the presentations (in Japanese) are listed on this blog.

Posted in Meetup, R

How to import a large CSV File to SAP HANA from HANA Studio / HANA Tools

In this post, I’ll show how to import a CSV file (or a Microsoft Excel file (.xls, .xlsx)) into SAP HANA from the Eclipse-based development tools HANA Studio / HANA Tools.

1. Select “File” > “Import” from menu

2. Select “SAP HANA Content” > “Data From Local File”

3. Select target System (Database)

4. Specify the file location, file layout and table name.
(In this case, the first row contains the column names)

5. Check table definition and data mapping

6. Check upload progress and table content

(* You can open the table content by right-clicking the table and selecting “Open Data Preview”)

These steps are very easy, and you can also export datasets from SAP HANA with HANA Studio / HANA Tools.

See also

HANA Academy – Importing Data from CSV file – YouTube

Posted in SAP HANA

RForcecom Demo Video

Recently, I created a demo video of an R package named RForcecom, which connects to Salesforce.com and Force.com from R.

The video consists of 4 parts.

  1. Install and load RForcecom
  2. Sign in to Salesforce.com
  3. Get the opportunity list from Salesforce.com
  4. Visualize opportunities as a decision tree (using the rpart and rpart.plot packages)

This tutorial can be used for win probability analysis if your organization uses Salesforce.com. The demonstration code is also available on GitHub.
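
The last step can be sketched without a Salesforce.com account. The example below is not the code from the video. It uses a simulated data frame in place of the opportunity list, and the column names (Amount, LeadSource, IsWon) are assumptions based on standard Opportunity fields. The plotting step with rpart.plot is left as a comment so that the example only depends on rpart.

```r
library(rpart)

set.seed(1)

# Simulated stand-in for the opportunity list retrieved from Salesforce.com
n <- 300
opps <- data.frame(
  Amount     = round(runif(n, 1000, 100000)),
  LeadSource = factor(sample(c("Web", "Partner", "Phone"), n, replace = TRUE))
)

# In this simulation, partner-sourced opportunities are won more often
p_win <- ifelse(opps$LeadSource == "Partner", 0.8, 0.3)
won <- rbinom(n, 1, p_win)
opps$IsWon <- factor(ifelse(won == 1, "Won", "Lost"), levels = c("Lost", "Won"))

# Fit a classification tree that predicts whether an opportunity is won
fit <- rpart(IsWon ~ Amount + LeadSource, data = opps, method = "class")

# Predicted win probability for each opportunity
win_prob <- predict(fit, newdata = opps, type = "prob")[, "Won"]

# To draw the tree (requires the rpart.plot package):
# rpart.plot::rpart.plot(fit)
```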

RForcecom

Posted in R, Salesforce.com

Tokyo-based R Meetup TokyoR #45

Last week, I attended a Tokyo-based meetup for the statistical software R named “TokyoR #45”, held on Jan 17 at the VOYAGE GROUP office in Shibuya, Tokyo. Almost all presentations were given in Japanese, but in this post I’ll share a brief summary of those presentations in English.

2015-01-17 22.01.26

The meetup consisted of 3 sections: beginner sessions, advanced sessions and Lightning Talks (LT).

Beginner sessions

Nobuaki Oshiro (@doradora09) gave a three-minute version of his “Learning R in 10 minutes” presentation, which covers what R is, who should use R, how to install R, how to write R code and where to find information about R on the web.

Takashi Minoda (@aad34210)’s session covered R fundamentals such as if statements, loops and plotting graphs. He also mentioned how to visualize data using the rCharts and googleVis packages.

Advanced sessions

Tetsuro Ito (@tetsuroito) talked about “Hot topics of R in 2015”. He introduced some packages, such as “AnomalyDetection” by Twitter and “ver 1.0 of ggplot2” by Hadley Wickham. He also stated that most hot packages are now hosted on GitHub, not CRAN. Additionally, he introduced the newly released book “The Lean Analytics”, a methodology for data scientists.

Shinya Uryu (@u_ribo)’s presentation was “Data pretreatment for Data pretreatment”. According to his presentation, reducing the time spent on data pretreatment leaves more time for data analysis, which leads to higher-quality output. He also recommended using R project files (.Rproj) and R Markdown files (.Rmd) in RStudio to bring team members and their deliverables together.

Yatsuta Toshihisa (@tyatsuta) talked about “Typed Function”. As you may know, processing a huge dataset in R can take a lot of CPU time; however, the “Typed Function” enables fast processing with efficient memory allocation. According to him, the “Typed Function” is 250 times faster than a normal loop.

Yohei Sato (@yokkuns) talked about kernel multivariate analysis. The background of his slides was Colonel Sanders: the KFC character “Colonel Sanders” is known as “Kernel Ojisan” in Japanese.

Lightning Talks (Short presentations)

Yoshio Tokorosawa (@dichika), also known as the Serial Package Creator (Seripac), developed a project schedule management system in R. His system uses the R packages sinchokuR, AnomalyDetection and twitteR. sinchokuR retrieves the schedule data from GitHub, and AnomalyDetection checks whether the project is behind schedule. When the system detects a delay, it notifies the developer via Twitter. He also announced that he is translating the book “Advanced R” with some colleagues and that it will be released in Japanese.

K Mori (@wonder_zone) developed a favorite anime character recommendation system with SVM (Support Vector Machine) using the dplyr and e1071 packages. The source code of his system is available in his GitHub repository.

I, Takekatsu Hiramura (@hiratake55), talked about the newly released book “R and cloud computing” written by Ajay Ohri and myself. I introduced some cloud service providers that are compatible with R, such as Amazon Web Services, Google Prediction API, BigML, Microsoft Azure ML, plot.ly and Yhat.

Tatsuya Tojima (@salinger001101) shared the idea of using the continuous integration tool Jenkins as an analytical reporting tool. His project automatically produces daily reports with R and Jenkins as a batch process.

@ksmzn developed a web application for learning probability distributions with shiny, rCharts and nvd3.js. His app is available at ShinyApps.io.

Networking Event

There were more than 80 R users from software, consulting, banking, social media and other industries. They shared ideas, talked and drank a lot.

2015-01-17 19.54.09

The next meetup, TokyoR #46, is scheduled for Feb 21. If you have an opportunity to visit Tokyo, please join our meetup. If you have any questions or requests, please feel free to contact me.

Posted in Meetup, R

How to do a silent install of R

In this post, I’ll show how to do a silent install of R. Suppose you are an instructor of an R course and need to prepare an R environment on each student’s PC. In this case, you can install R, RStudio and R packages with just one click by using their silent install modes.

1. R silent installation
According to the R FAQ, the R installer has the command line options “/SILENT” and “/VERYSILENT” for silent installation. Download the R installer and run the command “R-3.1.0-win.exe /SILENT” from the command prompt to do a silent install.

p02

2. RStudio silent installation
RStudio also has a silent installation option. This support page describes how to run the installer in silent mode. According to the page, the RStudio installer has the silent option “/S”, and the command “RStudio-0.98.507.exe /S” performs a silent install.

p03

3. R package silent installation
R packages such as ggplot2 or plyr are installable from the command line.

3-1. Download the R packages from CRAN site
Download the packages and all the required/dependent packages listed on their CRAN pages.

p04

3-2. Run a silent installation command
Below is an example of the command.

"%ProgramFiles%\R\R-3.1.0\bin\R" CMD INSTALL Rcpp_0.11.1.zip

p07

4. Making a silent installation script
Create a silent installation script to enable one-click installation.

4-1. Download installers and R packages and store them into the same folder
p08

4-2. Make a BAT file
Below is a code example. Save it as a BAT file (e.g., Rinstall.bat).

R-3.1.0-win.exe /SILENT
RStudio-0.98.507.exe /S
"%ProgramFiles%\R\R-3.1.0\bin\R" CMD INSTALL Rcpp_0.11.1.zip
"%ProgramFiles%\R\R-3.1.0\bin\R" CMD INSTALL plyr_1.8.1.zip
pause

p05

4-3. Run the BAT file as an administrator
p06
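
If the list of packages changes often, the BAT file from step 4-2 could also be generated from R instead of being edited by hand. The helper below is a hypothetical sketch: the function name `build_install_script` and its arguments are made up for this example. It simply assembles the same command lines shown above and writes them to a file.

```r
# Hypothetical helper: build the lines of a silent installation BAT file
build_install_script <- function(r_installer, rstudio_installer, package_zips,
                                 r_home = "%ProgramFiles%\\R\\R-3.1.0") {
  # One "R CMD INSTALL" line per downloaded package zip file
  pkg_cmds <- sprintf("\"%s\\bin\\R\" CMD INSTALL %s", r_home, package_zips)
  c(paste(r_installer, "/SILENT"),   # silent install of R
    paste(rstudio_installer, "/S"),  # silent install of RStudio
    pkg_cmds,
    "pause")
}

bat_lines <- build_install_script(
  r_installer       = "R-3.1.0-win.exe",
  rstudio_installer = "RStudio-0.98.507.exe",
  package_zips      = c("Rcpp_0.11.1.zip", "plyr_1.8.1.zip")
)

# Write the script; here it goes to a temporary folder for demonstration
bat_file <- file.path(tempdir(), "Rinstall.bat")
writeLines(bat_lines, bat_file)
```

In practice the file would be written to the folder that holds the installers and package zip files.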

These procedures are quite simple and can also be used when you update your R environment. Give them a try when you become an R lecturer.

Posted in R

Today is my 10,000 days old birthday

For the past few days, I’ve been calculating my 10,000-days-old birthday in R.
I found that calculating it in R is quite simple.

From my birthday to my 10,000-days-old birthday:

> as.Date("1986-09-21") + 10000
[1] "2014-02-06"

Days elapsed since my birthday:

> Sys.Date() - as.Date("1986-09-21")
Time difference of 10000 days
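
The same date arithmetic can be wrapped in a small function to look up other milestones. This is just a sketch, and the function name `day_milestone` is made up for this example.

```r
# Date on which someone born on `birthday` becomes `n_days` days old
day_milestone <- function(birthday, n_days) {
  as.Date(birthday) + n_days
}

# 10,000-day and 20,000-day milestones for the birthday used above
milestones <- day_milestone("1986-09-21", c(10000, 20000))
print(milestones)

# Number of days between the birthday and a given date, as a plain integer
days_old <- as.integer(as.Date("2014-02-06") - as.Date("1986-09-21"))
print(days_old)
```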

Just for reference, below is an R function to convert a birthday to age.

# Convert a birthday (Date or "YYYY-MM-DD" string) to an age in full years
birthday2age <- function(birthday){
  today <- Sys.Date()
  birthday <- as.Date(birthday)
  td.y <- as.integer(format(today, "%Y"))
  td.m <- as.integer(format(today, "%m"))
  td.d <- as.integer(format(today, "%d"))
  bd.y <- as.integer(format(birthday, "%Y"))
  bd.m <- as.integer(format(birthday, "%m"))
  bd.d <- as.integer(format(birthday, "%d"))
  # Subtract one year if this year's birthday has not come yet
  return(td.y - bd.y - (td.m < bd.m || (td.m == bd.m && td.d < bd.d)))
}
> birthday2age("1986-09-21")
[1] 27

cake
Happy birthday with R.

Reference:
年齢の計算は暗算でもできる (Age can be calculated even by mental arithmetic)

Posted in R

RForcecom – An R package that provides a connection between R and Salesforce.com

In this post, I’ll introduce the R package RForcecom and its usage. As you may know, the R statistical computing environment is the most popular statistical computing software, and Salesforce.com is the world’s most innovative cloud-based SaaS (Software-as-a-Service) CRM package.

RForcecom enables you to connect to Salesforce.com from R. It is provided as an add-on package for R, and its source code is available on GitHub.

1. Install the latest version of R
You can download the latest R statistical computing environment from the R-Project website.
http://cran.r-project.org/

2. Install and load RForcecom
Type the following commands in your R console to install and load RForcecom.

install.packages("RForcecom")
library(RForcecom)

3. Sign in to Force.com or Salesforce.com
To sign in to Salesforce.com, use the rforcecom.login() function. Set your username, password, instance URL and API version as follows.
Note: DO NOT FORGET to append your security token to your password in the password field.

username <- "yourname@yourcompany.com"
password <- "YourPasswordSECURITY_TOKEN"
instanceURL <- "https://na14.salesforce.com/"
apiVersion <- "26.0"
(session <- rforcecom.login(username, password, instanceURL, apiVersion))

rforcecom-02

4. Retrieving records
To retrieve a dataset, use the rforcecom.retrieve() function. Set the parameters as follows.

objectName <- "Account"
fields <- c("Id", "Name", "Phone")
rforcecom.retrieve(session, objectName, fields)

rforcecom-03
rforcecom-04

5. Execute a SOQL
To retrieve a dataset using SOQL (Salesforce Object Query Language), use the rforcecom.query() function. Set the parameters as follows.

soqlQuery <- "SELECT Id, Name, Phone FROM Account WHERE AnnualRevenue > 50000 LIMIT 5"
rforcecom.query(session, soqlQuery)

rforcecom-05
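
When the object, fields or filter vary, the SOQL string can be assembled in R before it is passed to rforcecom.query(). The helper below is a hypothetical sketch and is not part of RForcecom. The name `build_soql` and its arguments are made up for this example, and it does no escaping, so it should only be used with trusted input.

```r
# Hypothetical helper: assemble a simple SOQL SELECT statement
build_soql <- function(object, fields, where = NULL, limit = NULL) {
  query <- sprintf("SELECT %s FROM %s", paste(fields, collapse = ", "), object)
  if (!is.null(where)) query <- paste(query, "WHERE", where)
  if (!is.null(limit)) query <- paste(query, "LIMIT", as.integer(limit))
  query
}

soqlQuery <- build_soql(
  object = "Account",
  fields = c("Id", "Name", "Phone"),
  where  = "AnnualRevenue > 50000",
  limit  = 5
)
print(soqlQuery)

# With a valid session from rforcecom.login(), the query would be run as:
# rforcecom.query(session, soqlQuery)
```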

6. Create a record
To create a record, use the rforcecom.create() function.

objectName <- "Account"
fields <- c(Name="R Analytics Service Ltd", Phone="5555-5555-5555")
rforcecom.create(session, objectName, fields)

rforcecom-08
rforcecom-07

7. Retrieve a server timestamp
To retrieve a server timestamp from the Salesforce.com server, use the rforcecom.getServerTimestamp() function.

rforcecom.getServerTimestamp(session)

rforcecom-06

These procedures are very easy and very useful for projects that use R and Salesforce.com. In the next post, I’ll introduce an example use case of RForcecom.

RForcecom website

Posted in R, Salesforce.com