An introduction to R, the open source statistical and data analysis programming language, which offers an alternative to commercial products such as SAS and SPSS.
I first came across R a couple of years ago while working for a government agency, where I used it briefly on a proof of concept (POC) building Kohonen networks before we moved to SAS at the behest of the client. Since then I hadn't come across it and had just about forgotten its existence until I read an article in the New York Times, "Data Analysts Captivated by R’s Power", and a subsequent post, which inspired me to have another look.
What is it and who uses it:
R is a statistical programming language which, in very basic terms, can be thought of as a supercharged version of Excel. It comes with built-in packages for organising and manipulating data, and there is a large repository of user-contributed packages which can be downloaded, giving you a very strong statistical library to begin with which can be used straight away or tweaked to your specific needs. So whether you are doing financial modelling, basket analysis or social network analysis, R should be a tool you can use.
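To make the idea of packages a little more concrete, here is a minimal sketch of how you might look at what ships with R and load a package. The package name kohonen in the commented-out lines is only an example of a contributed package, and installing it needs a network connection, so treat those lines as an assumption about your setup.

```r
# Packages that ship with a standard R installation (priority "base").
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Load one of the bundled packages and call a function from it.
library(stats)
sd(c(2, 4, 4, 4, 5, 5, 7, 9))   # sample standard deviation

# Installing a contributed package from CRAN needs a network connection,
# so it is left commented out; "kohonen" is used purely as an example name.
# install.packages("kohonen")
# library(kohonen)
```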
R seems to be growing in popularity and has been adopted by firms in a diverse range of industries, including Google, financial services organisations and government agencies: essentially anywhere there are large volumes of data that need to be analysed. The best news about its rise in popularity is the number of packages which are freely available to use. There is also a commercial offering from REvolution.
R is not a super technical tool and you won't need a Computing or Maths degree to get going, but you will need a decent knowledge of your data and of the statistics or analysis that you want to use. When I started working heavily with stats I found the book Business Statistics very useful for getting back up to speed with the maths, and there is a lot of material on the web.
I want it:
R can be installed on Windows, Mac or Linux from http://www.r-project.org/. Once installed you are ready to use R:
R can be used as a command line calculator
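As a quick illustration of the calculator idea, this is roughly what you might type at the prompt. The numbers are arbitrary and the variable name total is just something I have picked for the example.

```r
# Basic arithmetic: multiplication binds tighter than addition.
2 + 3 * 4        # 14

# Built-in maths functions work on single numbers...
sqrt(144)        # 12

# ...and on whole vectors of numbers created with c().
total <- sum(c(4, 6, 10))
total            # 20
mean(c(4, 6, 10))
```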
To see some demos of the visualisation techniques you can run the following command:
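Assuming the built-in graphics demo is what is wanted here, a sketch of the commands would look like the following. demo() is part of base R; the variable name demos is my own.

```r
# List the demos that ship with the graphics package.
demos <- demo(package = "graphics")
demos$results[, "Item"]

# Run the general graphics demo. At an interactive prompt R pauses between
# plots; when run as a script the plots are written to a file instead.
demo(graphics)
```

Running demo() with no arguments lists every demo available in the packages you currently have loaded.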
Linux Screen Shots on the left, Windows on the right.
There is a lot of documentation available to get you started and to get deep into R.
Show me more:
R covers a very large subject area, so it would be difficult if not impossible for me to pick an example which everyone can relate to. Instead I am going to show a very basic demo so that you can get going with your own data sets: importing a basic CSV file called Orders and creating a basic chart.
1. Get data in...
Possibly the most important step. Whether you are using Excel, CSV files, Oracle, SQL Server or another data source, the power of R will only be seen once the data is in a form where it can be manipulated. See the R Data Import/Export manual for details of how to import data: http://cran.r-project.org/doc/manuals/R-data.html#Top.
The CSV file, Orders, contains the following:
20080101,"Agile Cards",4,6.00,1001,"J Smith"
Now I am going to read this data into a variable called myOrders. Note the parameters which specify the format of the CSV file (there are many more options):
> myOrders <- read.csv (file="C:/R_Samples/Orders.csv", header=TRUE, sep=",", quote="\"", dec=".")
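If you don't have a file at that path, here is a self-contained sketch of the same import which writes a stand-in Orders.csv to a temporary folder first. Because header=TRUE tells R to treat the first line as column names, the file needs a header line; the header names below (OrderDate, Product, Volume, Cost, CustomerId, Customer) are my assumption, as only the data row is shown above.

```r
# Write a small stand-in for Orders.csv to a temporary file.
# ASSUMPTION: the header names are invented for illustration; the data row
# is the one shown in the post.
orders_path <- file.path(tempdir(), "Orders.csv")
writeLines(c(
  "OrderDate,Product,Volume,Cost,CustomerId,Customer",
  "20080101,\"Agile Cards\",4,6.00,1001,\"J Smith\""
), orders_path)

# Same call as in the post, pointed at the temporary file.
myOrders <- read.csv(file = orders_path, header = TRUE, sep = ",",
                     quote = "\"", dec = ".")

str(myOrders)    # column names and the types R inferred
nrow(myOrders)   # data rows only; the header line is not counted
```

Checking the result with str() straight after an import is a cheap way to confirm that numbers were read as numbers rather than text.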
2. Pull out the cost (column 4) and volume (column 3) columns and assign them to the variables Cost and Volume:
> Cost <- myOrders[, 4]
> Volume <- myOrders[, 3]
3. Plot these points onto a graph and add a title:
> plot (Cost, Volume)
> title ("Hello World - Cost vs Volume Example")
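Steps 2 and 3 can also be sketched end to end without the CSV file. In the version below the data frame values and column names are made up purely for illustration, and the chart is written to a PDF in a temporary folder (my choice, so the example also runs without a screen) rather than drawn in a window.

```r
# ASSUMPTION: these values and column names are invented for illustration;
# they are not the contents of the original Orders file.
orders <- data.frame(
  Volume = c(4, 2, 10, 7),
  Cost   = c(6.00, 3.00, 15.00, 10.50)
)

# Send the plot to a PDF file instead of the screen.
plot_path <- file.path(tempdir(), "cost_vs_volume.pdf")
pdf(plot_path)
plot(orders$Cost, orders$Volume,
     xlab = "Cost", ylab = "Volume",
     main = "Hello World - Cost vs Volume Example")
dev.off()   # close the device so the file is written out

file.exists(plot_path)
```

Passing main= to plot() sets the title in one call, which does the same job as the separate title() line above.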
4. Here is one I prepared earlier:
As always I would really like to hear your comments and thoughts,