Trying to format Google Analytics data for machine learning: need help looping through a list of values and creating the same number of columns

I am currently working with Google Analytics data that includes each user's landing page, and I have already made the gender and age-bracket data friendly for machine learning with the following method:



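```python
# dataset is the pandas DataFrame loaded from the Analytics export.
# pop() removes each raw column and returns it as a Series.
age = dataset.pop('ga:userAgeBracket')
gender = dataset.pop('ga:userGender')

# One 0.0/1.0 indicator column per age bracket.
dataset['18-24'] = (age == '18-24') * 1.0
dataset['25-34'] = (age == '25-34') * 1.0
dataset['35-44'] = (age == '35-44') * 1.0
dataset['45-54'] = (age == '45-54') * 1.0
dataset['55-64'] = (age == '55-64') * 1.0
dataset['65+'] = (age == '65+') * 1.0

# One 0.0/1.0 indicator column per gender.
dataset['Male'] = (gender == 'male') * 1.0
dataset['Female'] = (gender == 'female') * 1.0
```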


This works fine for columns that have only a few distinct values.



However, ga:landingPagePath can potentially have an unbounded number of distinct values, and I was wondering if there is an easy way to loop through them and create the same column structure (one indicator column per value) as I have for the other data points.



Just for reference, this is how the dataset looks after I clean the gender and age columns:
[screenshot of the cleaned dataset]



With the regex filter I have in place, this particular dataset has about 70+ unique URLs.
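A minimal sketch of the loop described above, assuming dataset is a pandas DataFrame with a ga:landingPagePath column. The sample paths and the ga:sessions column are hypothetical placeholders, and the "page=" prefix on the new column names is an illustrative choice to keep them from colliding with existing columns. It follows the same pattern as the hand-written age and gender columns, just generated from the distinct values.

```python
import pandas as pd

# Hypothetical sample data standing in for the Analytics pull.
dataset = pd.DataFrame({
    "ga:landingPagePath": ["/books/book-1.html", "/", "/books/book-2.html", "/"],
    "ga:sessions": [3, 5, 1, 2],
})

# Remove the raw URL column and keep it as a Series.
pages = dataset.pop("ga:landingPagePath")

# One 0.0/1.0 indicator column per distinct landing page,
# mirroring the manual age/gender columns.
for page in sorted(pages.unique()):
    dataset[f"page={page}"] = (pages == page) * 1.0

print(dataset.columns.tolist())
```

With 70+ distinct pages this adds 70+ columns one at a time; pd.get_dummies(pages, prefix="page") builds equivalent indicators in a single call.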










  • Clarifying question: what do you mean by "same column structure"? A problem you'll run into down the road is sparsity. For example, you might have only 1 hit to a rare landing page (and you'll have a lot of these). You might instead think about creating classes based on the page path, so example.com/books/book-1.html would get just the class "books". My opinion is that you'll have to roll URL data up into a higher-level class.

    – Jarad
    Mar 26 at 15:59











  • Hi @Jarad, there is a filter in place that prevents obscure URLs from appearing in the pull from Analytics, so in total there are about 70+ URLs that I need to create columns for.

    – Yarry T
    Mar 26 at 16:00












  • Do you know about pd.get_dummies()? I believe you could also use scikit-learn's label_binarize.

    – Jarad
    Mar 26 at 16:16
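Putting the two comment suggestions together, a sketch that rolls each URL up to its first path segment and then lets one pd.get_dummies() call replace all the hand-written indicator columns. It assumes the column names from the question and landing paths shaped like /books/book-1.html (no domain); the sample rows, the "(root)" label for the home page, and pinning the age brackets with pd.Categorical (so a bracket missing from one pull still gets a column) are illustrative choices, not part of the original thread.

```python
import pandas as pd

# Hypothetical rows using the column names from the question.
dataset = pd.DataFrame({
    "ga:userAgeBracket": ["18-24", "35-44", "18-24"],
    "ga:userGender": ["male", "female", "female"],
    "ga:landingPagePath": ["/books/book-1.html", "/music/?ref=home", "/"],
})

# Roll each URL up to its first path segment:
# drop any query string, then "/books/book-1.html" -> "books".
paths = dataset.pop("ga:landingPagePath").str.split("?").str[0]
section = paths.str.strip("/").str.split("/").str[0]
dataset["section"] = section.replace("", "(root)")  # home page "/" -> "(root)"

# Pin the expected age brackets so every bracket gets a column,
# even one (like "65+") that is absent from this particular pull.
age_brackets = ["18-24", "25-34", "35-44", "45-54", "55-64", "65+"]
dataset["ga:userAgeBracket"] = pd.Categorical(
    dataset["ga:userAgeBracket"], categories=age_brackets
)

# One call encodes every listed column, each with its own prefix.
encoded = pd.get_dummies(
    dataset,
    columns=["ga:userAgeBracket", "ga:userGender", "section"],
    prefix=["age", "gender", "section"],
    dtype=float,
)
print(encoded.columns.tolist())
```

Only the age column is pinned here; if the model must see identical columns across data pulls, the gender and section columns could be converted to pd.Categorical with a fixed category list in the same way.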


















asked Mar 26 at 15:50 by Yarry T (edited Mar 26 at 16:06)











