Introduction to numerical features¶
Column Descriptions¶
The following are the numerical features in the dataset:
Administrative¶
Number of account management pages visited by the visitor.
Administrative duration¶
Total amount of time (in seconds) spent by the visitor on account management pages.
Informational¶
Number of pages visited by the visitor about the website, communication, and address information of the shopping site.
Informational duration¶
Total amount of time (in seconds) spent by the visitor on informational pages.
Bounce rate¶
Average bounce rate value of the pages visited by the visitor. The bounce rate of a page is the percentage of visitors who enter the site from that page and then leave without triggering any other requests to the analytics server during that session.
Exit rate¶
Average exit rate value of the pages visited by the visitor. The exit rate of a specific web page is calculated over all pageviews of that page: it is the percentage of those pageviews that were the last in the session.
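To make the bounce rate and exit rate definitions concrete, here is a minimal sketch that computes both values per page from a handful of made-up sessions. The session data and the function name `page_rates` are hypothetical and not part of the dataset, which stores only the per-visitor averages of these page-level values.

```python
from collections import Counter


def page_rates(sessions):
    """Return {page: (bounce_rate, exit_rate)} for a list of sessions.

    Each session is an ordered list of page names (hypothetical data).
    """
    entrances = Counter()  # sessions that started on the page
    bounces = Counter()    # single-page sessions that started on the page
    pageviews = Counter()  # all views of the page
    exits = Counter()      # views of the page that ended a session

    for pages in sessions:
        if not pages:
            continue
        entrances[pages[0]] += 1
        if len(pages) == 1:
            bounces[pages[0]] += 1
        pageviews.update(pages)
        exits[pages[-1]] += 1

    rates = {}
    for page in pageviews:
        # Bounce rate is only defined for pages that served as an entry page.
        bounce = bounces[page] / entrances[page] if entrances[page] else 0.0
        exit_rate = exits[page] / pageviews[page]
        rates[page] = (bounce, exit_rate)
    return rates


# Four made-up sessions, for illustration only.
sessions = [
    ["home", "product", "cart"],
    ["home"],
    ["product", "home"],
    ["home", "product"],
]
rates = page_rates(sessions)
for page, (bounce, exit_rate) in rates.items():
    print(f"{page}: bounce={bounce:.3f}, exit={exit_rate:.3f}")
```

In this toy data, `home` is the entry page of three sessions, one of which ends immediately, so its bounce rate is 1/3. It is viewed four times and is the last page twice, so its exit rate is 0.5.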
Page value¶
Average page value of the pages visited by the visitor. It represents the average value of a web page that a user visited before completing an e-commerce transaction.
Special day¶
Closeness of the site visiting time to a special day, on a scale from 0 to 1.
Column statistics¶
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12330 entries, 0 to 12329
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Administrative 12330 non-null int64
1 Administrative_Duration 12330 non-null float64
2 Informational 12330 non-null int64
3 Informational_Duration 12330 non-null float64
4 ProductRelated 12330 non-null int64
5 ProductRelated_Duration 12330 non-null float64
6 BounceRates 12330 non-null float64
7 ExitRates 12330 non-null float64
8 PageValues 12330 non-null float64
9 SpecialDay 12330 non-null float64
dtypes: float64(7), int64(3)
memory usage: 963.4 KB
There are no null entries in the numerical data. The columns hold both integer (int64) and floating-point (float64) data.
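A check of this kind can be sketched with pandas as follows. The small DataFrame is a made-up stand-in for the real data (only the column names are taken from the dataset, and `Month` is included purely as an example of a non-numerical column), so this is an illustration rather than the notebook's original code.

```python
import pandas as pd

# Toy stand-in for the real data: column names match the dataset,
# the values are made up for illustration.
df = pd.DataFrame({
    "Administrative": [0, 1, 4],
    "Administrative_Duration": [0.0, 7.5, 93.25],
    "BounceRates": [0.2, 0.0, 0.016],
    "Month": ["Feb", "Mar", "May"],  # non-numerical, excluded below
})

# Keep only the integer and floating-point columns.
numerical = df.select_dtypes(include="number")

null_counts = numerical.isna().sum()            # null entries per column
dtype_counts = numerical.dtypes.value_counts()  # number of columns per dtype

numerical.info()
print(null_counts)
print(dtype_counts)
```

`select_dtypes(include="number")` keeps integer and float columns alike, which is why both dtypes appear in the summary.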
| Administrative | Administrative_Duration | Informational | Informational_Duration | ProductRelated | ProductRelated_Duration | BounceRates | ExitRates | PageValues | SpecialDay |
---|---|---|---|---|---|---|---|---|---|---|
count | 12330.000000 | 12330.000000 | 12330.000000 | 12330.000000 | 12330.000000 | 12330.000000 | 12330.000000 | 12330.000000 | 12330.000000 | 12330.000000 |
mean | 2.315166 | 80.818611 | 0.503569 | 34.472398 | 31.731468 | 1194.746220 | 0.022191 | 0.043073 | 5.889258 | 0.061427 |
std | 3.321784 | 176.779107 | 1.270156 | 140.749294 | 44.475503 | 1913.669288 | 0.048488 | 0.048597 | 18.568437 | 0.198917 |
min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 7.000000 | 184.137500 | 0.000000 | 0.014286 | 0.000000 | 0.000000 |
50% | 1.000000 | 7.500000 | 0.000000 | 0.000000 | 18.000000 | 598.936905 | 0.003112 | 0.025156 | 0.000000 | 0.000000 |
75% | 4.000000 | 93.256250 | 0.000000 | 0.000000 | 38.000000 | 1464.157214 | 0.016813 | 0.050000 | 0.000000 | 0.000000 |
max | 27.000000 | 3398.750000 | 24.000000 | 2549.375000 | 705.000000 | 63973.522230 | 0.200000 | 0.200000 | 361.763742 | 1.000000 |
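The mean time a visitor spends differs widely across page types. Users visit and spend the most time on product-related pages (mean of about 31.7 pages and 1194.7 seconds), followed by administrative pages (about 2.3 pages and 80.8 seconds) and then informational pages (about 0.5 pages and 34.5 seconds). However, the maximum time spent on product-related pages in a single session (63973.5 seconds, nearly 18 hours) looks like an outlier. The informational columns are highly skewed: the 75th percentile of both Informational and Informational_Duration is 0, while their maxima are 24 and 2549.375.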
Note that the maximum value of both the bounce rate and exit rate columns is 0.2. The bounce rate and special day columns are also highly skewed.
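Observations about skew and suspicious maxima can be checked numerically. The sketch below uses made-up values shaped like the real columns (mostly zeros with a few large entries) and applies Tukey's 1.5 × IQR rule as one common outlier heuristic. Both the data and the choice of rule are assumptions made for illustration, not something taken from the original analysis.

```python
import pandas as pd

# Made-up values: one heavily right-skewed column, one symmetric column.
df = pd.DataFrame({
    "Informational_Duration": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.0, 40.0, 2500.0],
    "ExitRates": [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10],
})

summary = df.describe()  # count, mean, std, min, quartiles, max
skewness = df.skew()     # sample skewness per column; near 0 means symmetric

# Flag values above Q3 + 1.5 * IQR (Tukey's rule).
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)
outlier_counts = (df > upper_fence).sum()

print(summary)
print(skewness)
print(outlier_counts)
```

A large positive skewness, together with a 75th percentile far below the maximum, is the pattern the summary table shows for the informational columns.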