We have previously covered the basic process flow for text mining and how RapidMiner makes it very intuitive to set up. The figure below shows the high-level process flow for most text mining activities. In this article – which contains some excerpts from our upcoming book on Predictive Analytics using RapidMiner – we […]
As data science extends its domain over more and more types of businesses, it is inevitable that we will need to work with unstructured text data. This means dirty data takes on many different avatars. Cleaning structured data usually means accounting for “incorrectly” entered numbers (which implies outliers), accounting for missing values, and so on. With […]
In the first article of this series, we discussed how regular expressions, when appropriately set up, can increase the power of RapidMiner’s operators. We also compared the two most popular open source data mining tools for their preprocessing abilities applied to text mining. We found that R, with its very solid data framing capabilities, can excel when […]
R-bloggers recently posted an interesting text mining article which attempts to text mine the entire collected works of William Shakespeare. R really shines at some aspects of data mining, particularly in preprocessing. The large number of functional shortcuts available for working with data frames virtually spoils the analyst! Here we do a step-by-step comparison of […]
Inspired by the really cool video series on text mining by Vancouver Data Blog, we are going to kick off our article series on text mining (also) using RapidMiner. Neil McGuigan does a great job covering this topic in those compact 10-min videos. In our article series we will try to get into a little bit […]
Understanding the needs of your customers is a critical aspect of business. This requires proper customer segmentation. There are many different approaches to segmenting customers: based on their behavior, based on their business structures (e.g. small, medium, large) or even based on the amount of revenue they generate for you, as performed in an 80-20 customer analysis. Text […]
In a recent article, “How Predictive Analytics Can Boost Your Social Media Campaigns”, we saw that social media data is becoming a great source of data for understanding product trends by applying predictive analytics. We also emphasized that in order to stay ahead of the competition, businesses need to make better use of the social media chatter […]
By now most people may be numb to the overused statistic about the growth of data. I am not going to repeat that, but here is one stat which was quite intriguing: 99.5% of all the data that is collected is never analyzed. Does this mean more opportunity for data scientists? Or is this data even worth […]
There are terabytes of data that come from surveys, and most of this is unstructured – the kind where respondents type in their views in an open box. Of course, these questions come mixed in with the standard structured survey questions, such as “On a scale of 1 to 5, how happy …”. Structured or numerical […]
Text analytics with AI involves converting unstructured data into a semi-structured format before applying any standard machine learning algorithm. Several intermediate steps are necessary before we get to this point. With so many steps needed to deliver the final result, the process design can get pretty complicated, which also makes debugging or experimenting […]