Trying to format Google Analytics data ready for machine learning, need help looping through a list of values and creating the same number of columns
I am currently working with analytics data that features the landing page of users and I have already made the Gender and Age Bracket data friendly for machine learning via the following method:
# Pull the raw categorical columns out of the DataFrame
age = dataset.pop('ga:userAgeBracket')
gender = dataset.pop('ga:userGender')
# One 0.0/1.0 indicator column per age bracket
dataset['18-24'] = (age == '18-24') * 1.0
dataset['25-34'] = (age == '25-34') * 1.0
dataset['35-44'] = (age == '35-44') * 1.0
dataset['45-54'] = (age == '45-54') * 1.0
dataset['55-64'] = (age == '55-64') * 1.0
dataset['65+'] = (age == '65+') * 1.0
# One 0.0/1.0 indicator column per gender
dataset['Male'] = (gender == 'male') * 1.0
dataset['Female'] = (gender == 'female') * 1.0
This method works fine for data that has only a few different variants.
However, ga:landingPagePath can take a potentially unbounded number of distinct values, so I was wondering whether there is an easy way to loop through them and create the same column structure (one indicator column per value) that I have for the other fields.
Just for reference, this is how the dataset looks after I clean the gender and age columns:
With the regex filter I have in place, this particular dataset has a little over 70 unique URLs.
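For illustration, the manual approach above can be generalized with a loop over the distinct values of the column. This is only a minimal sketch: the sample DataFrame and the ga:sessions column are made up to stand in for the real Analytics export.

```python
import pandas as pd

# Hypothetical sample standing in for the real Analytics export;
# "ga:sessions" is an assumed extra column, not taken from the question.
dataset = pd.DataFrame({
    "ga:landingPagePath": ["/home", "/books/book-1.html", "/home", "/about"],
    "ga:sessions": [12, 3, 7, 5],
})

# Remove the raw column, as done for age and gender in the question.
landing = dataset.pop("ga:landingPagePath")

# Add one 0.0/1.0 indicator column per distinct landing page.
for path in sorted(landing.unique()):
    dataset[path] = (landing == path) * 1.0

print(dataset)
```

Each row ends up with exactly one landing-page column set to 1.0, which matches the structure already used for the age and gender columns.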
python python-3.x pandas numpy machine-learning
Clarifying question: what do you mean by "same column structure"? A problem you'll run into down the road is sparsity: you might have only one hit to a rare landing page, and you'll have a lot of those. You might instead think about creating classes based on the page path, so example.com/books/book-1.html would get just the class "books". In my opinion you'll have to roll the URL data up into a higher-level class.
– Jarad
Mar 26 at 15:59
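One way to read the roll-up suggested in this comment is to keep only the first path segment of each landing page. The sketch below assumes the paths start with "/" and carry no host name; the helper name top_level_section and the "(root)" label are hypothetical choices, not something from the thread.

```python
import pandas as pd

def top_level_section(path: str) -> str:
    """Return the first path segment, or '(root)' for the site root."""
    # Drop any query string, then keep the non-empty segments.
    segments = [s for s in path.split("?")[0].split("/") if s]
    return segments[0] if segments else "(root)"

# Hypothetical landing page paths.
landing = pd.Series(["/books/book-1.html", "/books/book-2.html", "/blog/post-9", "/"])

# Map every path to its higher-level class.
sections = landing.map(top_level_section)
print(sections.tolist())  # ['books', 'books', 'blog', '(root)']
```

The resulting classes can then be turned into indicator columns the same way as any other categorical field, with far fewer and less sparse columns than one per URL.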
Hi @Jarad, there is a filter in place to keep obscure URLs out of the pull from Analytics, so in total there are a little over 70 URLs that I need to create columns for.
– Yarry T
Mar 26 at 16:00
Do you know about pd.get_dummies()? You could also use scikit-learn's label_binarize I believe.
– Jarad
Mar 26 at 16:16
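As a rough sketch of the pd.get_dummies() suggestion, the call below builds every indicator column in one step. The sample data and the "page" prefix are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample standing in for the real Analytics export;
# "ga:sessions" is an assumed extra column.
dataset = pd.DataFrame({
    "ga:landingPagePath": ["/home", "/books/book-1.html", "/home", "/about"],
    "ga:sessions": [12, 3, 7, 5],
})

# Replace the landing page column with one float indicator column
# per distinct value; the other columns are left untouched.
encoded = pd.get_dummies(
    dataset,
    columns=["ga:landingPagePath"],
    prefix="page",
    dtype=float,
)

print(encoded.columns.tolist())
```

Passing several column names in columns (for example the age bracket and gender fields too) would encode them all in the same call. Note that the set of generated columns depends on the values present in the data, so a later export with different URLs would produce different columns.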
asked Mar 26 at 15:50 by Yarry T, edited Mar 26 at 16:06