Data falls into two broad categories: numerical (continuous or discrete) and categorical, i.e. character data, which is further divided into three types (binary, nominal and ordinal). Only the representation of the data changes. Different data systems offer many data types, such as int, real, float, double and char, and these vary with the memory size they use on each system. Binary means there are only two options, either this or that. Nominal and ordinal differ slightly: nominal values cannot be ranked, whereas ordinal values can. Note that we are talking about categories here; a variable such as the name of a person cannot be categorized, even though it is character data.
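To make the difference between binary, nominal and ordinal concrete, here is a minimal Python sketch. It is only an illustration: the gender and marital status values are the ones used later in this post, while SatisfactionLevel and the helper can_rank are hypothetical names added for the example.

```python
from enum import Enum, IntEnum


class Gender(Enum):
    """Binary categorical: exactly two options, no order."""
    MALE = "male"
    FEMALE = "female"


class MaritalStatus(Enum):
    """Nominal categorical: more than two options, no meaningful rank."""
    MARRIED = "married"
    UNMARRIED = "unmarried"
    DIVORCED = "divorced"


class SatisfactionLevel(IntEnum):
    """Ordinal categorical (hypothetical example): values can be ranked."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def can_rank(category_type) -> bool:
    """Return True when the members of the enum class support ordering."""
    try:
        sorted(category_type)
        return True
    except TypeError:
        # Plain Enum members define no "<", so sorting them fails.
        return False
```

Calling can_rank on SatisfactionLevel succeeds because its members carry an order, while Gender and MaritalStatus cannot be sorted, which is exactly the nominal versus ordinal distinction.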
To keep it simple, data types are divided into two parts: numeric and character. We can take an example to describe numerical as well as categorical data:
This is the kind of data we are referring to: data from the banking industry, which we can also use for predictive modeling. Here we will build a basic understanding in Excel only; later, the same kind of data can be used for Machine Learning training as well. The point is to learn the habit of looking at the data first.
To understand this data, start with the first variable: the name of the customer, which is straightforward. The second is the customer id. Whenever we prepare data, it should contain some key data. We are not talking about a primary, secondary or foreign key here, but about key data that uniquely identifies a row. In plain language, when we open a bank account we get a bank account number and a customer id, so both of them are key attributes. We can be identified as a customer of the bank by that number alone.
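As a rough sketch of what "uniquely identifies a row" means, the check below tests whether a column has a distinct value in every row. The sample rows, the column names and the helper is_key are all made up for illustration and are not taken from the actual banking data.

```python
from collections import Counter

# Hypothetical sample rows; names and ids are invented for this example.
customers = [
    {"name": "Asha", "customer_id": "C001"},
    {"name": "Ravi", "customer_id": "C002"},
    {"name": "Asha", "customer_id": "C003"},
]


def is_key(rows, column):
    """Return True if every row has a distinct value in the given column."""
    counts = Counter(row[column] for row in rows)
    return all(count == 1 for count in counts.values())
```

In this sample, customer_id passes the check, while name does not because two customers share the same name. That is why the id, not the name, serves as key data.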
The third variable is the number of credit cards the customer has. It is numeric, and numeric data can be either discrete or continuous. We count credit cards, so the value cannot be 3.5 or 2.7; it can only be 1, 2, 3 and so on, which makes it discrete. The fourth is the age of the customer, which ranges from a minimum of 18 to a maximum of 70 and is also discrete, since it is recorded in whole years.
Next is the gender of the customer, which can be male or female, so we call it categorical. Categorical data is bifurcated further: whenever the category has only two options, either this or that, we call it a binary categorical variable. The next one is marital status, which has three options: married, unmarried and divorced. It is also a categorical variable, but not a binary one, so it must be nominal or ordinal. When we can rank the categories (that is, define an order), the variable is ordinal; when we cannot rank them, it is a nominal categorical variable. Marital status has no natural ranking, so it is nominal.
The next variable is salary, which is usually a discrete variable, although we cannot deny that it can be continuous. The last one, monthly credit card usage, is alphanumeric here, so it is character data.
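The walk-through above can be sketched as a small rule-based classifier. This is a simplified illustration under stated assumptions, not a general-purpose tool: the function classify_column, its ordered flag and the sample values are hypothetical, and whether a category can be ranked has to be supplied by the person who knows the data.

```python
def classify_column(values, ordered=False):
    """Classify a column as numeric (discrete/continuous) or categorical.

    `ordered` is supplied by the analyst: it says whether the categories
    have a meaningful rank, which cannot be inferred from the values.
    """
    def is_number(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if all(is_number(v) for v in values):
        if all(isinstance(v, int) for v in values):
            return "numeric (discrete)"
        return "numeric (continuous)"
    if len(set(values)) == 2:
        return "categorical (binary)"
    return "categorical (ordinal)" if ordered else "categorical (nominal)"


# Hypothetical sample values modeled on the variables described above.
columns = {
    "credit_cards": [1, 3, 2, 1],
    "age": [18, 45, 70, 33],
    "gender": ["male", "female", "female", "male"],
    "marital_status": ["married", "unmarried", "divorced", "married"],
}

for name, values in columns.items():
    print(name, "->", classify_column(values))
```

Identifier-like columns such as the customer name or customer id should be set aside by hand before running a check like this, because a simple rule would otherwise label them as nominal categories.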
In a nutshell, we can categorize variables as numeric (discrete or continuous) or categorical (binary, nominal or ordinal).
