Wednesday, May 29, 2019

R: replace a string

I have a data frame sampleUvg50 with a column GNR whose values are in this format:

=TEXT(12040373;00000000)

I would like to remove the parts "=TEXT(" and ";00000000)" so that only the number remains.

To replace part of a string in R, I can use the gsub function with a regular expression, as follows:

sampleUvg50$GNR <- gsub(";\\d*\\)", "", gsub("=TEXT\\(", "", sampleUvg50$GNR))
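
The same cleanup can probably also be done in one pass with a capture group. This is only a sketch: the small data frame below is made up for illustration (the second value is not from my real data), and the pattern assumes every value has exactly the =TEXT(number;format) shape.

```r
# Hypothetical sample data in the same format as the GNR column.
sampleUvg50 <- data.frame(
  GNR = c("=TEXT(12040373;00000000)", "=TEXT(98765;00000000)"),
  stringsAsFactors = FALSE
)

# Keep only the digits between "=TEXT(" and ";" (capture group 1).
# Values that do not match the whole pattern are left unchanged by sub().
sampleUvg50$GNR <- sub("^=TEXT\\((\\d+);\\d*\\)$", "\\1", sampleUvg50$GNR)

print(sampleUvg50$GNR)
```

The result is still a character column; as.numeric() could be applied afterwards if numbers are needed, at the cost of losing any leading zeros.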

Friday, May 24, 2019

R: select from data frame (join types):

Selecting rows of a data frame with dplyr:

Find the rows of one data frame that have a match in another data frame:

df1 %>%
  inner_join(df2, by="myColumn")

Find the rows of one data frame that have no match in another data frame:

df1 %>%
  anti_join(df2, by="myColumn")
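
To see what the two joins return, here is a small self-contained sketch. The data frames and their columns value1 and value2 are invented for the example; only myColumn comes from the snippets above.

```r
library(dplyr)

# Two hypothetical data frames sharing the key column myColumn.
df1 <- data.frame(myColumn = c("a", "b", "c"), value1 = 1:3)
df2 <- data.frame(myColumn = c("b", "c", "d"), value2 = c(20, 30, 40))

# Rows of df1 with a match in df2; the columns of df2 are added.
matched <- df1 %>%
  inner_join(df2, by = "myColumn")

# Rows of df1 without a match in df2; only the columns of df1 are kept.
unmatched <- df1 %>%
  anti_join(df2, by = "myColumn")

print(matched)
print(unmatched)
```

If only the matching rows of df1 are wanted, without the extra columns from df2, dplyr's semi_join(df2, by = "myColumn") may be the closer counterpart to anti_join.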

Wednesday, May 15, 2019

Naive Bayes Classifier

The basic assumption when using a Naive Bayes classifier is that the features are independent of one another given the class, and that each feature contributes equally to the outcome.
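
The independence assumption can be made concrete with a minimal sketch in base R. It assumes Gaussian likelihoods for each feature and uses the built-in iris data set; both choices are mine for illustration, and the helper name predict_one is made up. It is not a replacement for a tested package implementation.

```r
# Minimal Gaussian Naive Bayes sketch on the built-in iris data set.
features <- iris[, 1:4]
classes  <- iris$Species

# Prior probability of each class (named numeric vector).
priors <- sapply(levels(classes), function(k) mean(classes == k))

# Per-class mean and standard deviation of every feature
# (matrices with one row per class and one column per feature).
means <- sapply(features, function(x) tapply(x, classes, mean))
sds   <- sapply(features, function(x) tapply(x, classes, sd))

# Score one observation for every class: log prior plus the SUM of the
# per-feature log likelihoods. Summing the logs is the independence
# assumption, i.e. the joint likelihood is a product over features.
predict_one <- function(x) {
  scores <- sapply(levels(classes), function(k) {
    log(priors[[k]]) +
      sum(dnorm(x, mean = means[k, ], sd = sds[k, ], log = TRUE))
  })
  names(which.max(scores))
}

predicted <- apply(as.matrix(features), 1, predict_one)

# Accuracy on the training data itself, so it is an optimistic estimate.
accuracy <- mean(predicted == classes)
print(accuracy)
```

Each feature enters the score through its own term and no interaction between features is modelled, which is what makes the classifier "naive".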

Naive Bayes classifiers have worked quite well in many real-world situations, e.g. document classification and spam filtering.

They require a small amount of training data to estimate the necessary parameters.

Naive Bayes learners and classifiers can be extremely fast compared to more sophisticated methods.