Wednesday, September 25, 2019

R - Importing an Excel file

The R-library openxlsx offers a performant tool for importing Excel files.

Example for openxlsx::read.xlsx:

library(openxlsx)
openxlsx::read.xlsx(file, 
  sheet = 1, 
  colNames = T, 
  rows = seq(2,100)

This package performs better than the read.xlsx in package xlsx. However, there is one thing to notice: columns in date format will not be correctly handled by openxlsx::read.xlsx. The function xlsx::read.xlsx handles them correctly.

Example for xlsx::read.xlsx:

library(xlsx)
xlsx::read.xlsx(file,
  sheetIndex = 1,
  header = T, 

  rowIndex = seq(2,100)

There is also read.xlsx2 function in the library xlsx, which is written in java, and performs better than xlsx::read.xlsx.

library(xlsx)
xlsx::read.xlsx2(file,
  sheetIndex = 1,
  header = T, 

  rowIndex = seq(2,100)

Using the xslx functions requires that the environments is aware of the location of the java runtime:

Sys.setenv(JAVA_HOME='C:\\ieu\\java\\openjdk-11') 

Tuesday, September 24, 2019

R: create a list of dataframes

gnr1 <- c("1111","4444", "3333","5555", "2222","9999")
gnr2 <- c("7777", "2222","1111","5555","3333","4444","8888")
prime1 <- c(1000, 4000, 3000, 5000, 2000, 9999)
prime2 <- c(7777, 2001, 1001, 5001, 3001, 4001,8888)
val1 <- c(1,2,3,4,5,6)
df1 <- data.frame(gnr1)
df2 <- data.frame(gnr2)
df1$prime <- prime1
df2$prime <- prime2
df1$val <- val1
mylist <- list()
mylist[[1]] <- df1
mylist[[2]] <- df2

Monday, September 23, 2019

R: DataFrames : creating a R Dataframe


# how to create a dataframe in r > diets <- data.frame ('diet'=1:4, 'protein'=c(0,0,1,1), 'vitamin'=c(0,1,0,1))

R: DataFrame: Selecting A Subset of a R Data Frame, merging.

Notes from the Tutorial "Meet The R Dataframe: Examples of Manipulating Data In R":

Use the ChickWeight dataset for this example

data("ChickWeight")


> head (ChickWeight)
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1

Selecting A Subset of a R Data Frame
1. using the function subset
> subset(ChickWeight, Diet==4)

2. with a conditional indexing
> ChickWeight[ChickWeight$Diet==4,]

3. using the function which
> ChickWeight[which((ChickWeight$Diet == 4) & (ChickWeight$Time==21)), names(ChickWeight) %in% c("weight","Time")]

Wednesday, September 4, 2019

Outer join recap

Given the two tables COUNTRIES and LOCATIONS, in a 1-> m relationship, with a foreign key in the LOCATIONS table.

Not all the countries have a location child record.

These two statements illustrate the use of a left outer join:

1. Display all the records of LOCATIONS, and the related records in COUNTRIES:

select c.country_name, loc.*
from locations loc
left outer join countries c 
on c.country_id = loc.country_id;

The result is equivalent (in this case) to a inner join, since no record in LOCATIONS have no parent in COUNTRIES.

2. Display all the records of COUNTRIES, and the related records in LOCATIONS (if any):

select c.country_name, loc.city
from countries c
left outer join locations loc 
on c.country_id = loc.country_id
order by c.country_name;

COUNTRY_NAME                         NVL(LOC.CITY,'NULL')          
------------------------------------ ------------------------------
Argentina                                NULL                       
Australia                                Sydney                     
Belgium                                  NULL                       
Brazil                                   Sao Paulo                  
...