How can I parse CSV file while preserving dictionary values within rowsHow do I sort a list of dictionaries by a value of the dictionary?How do I sort a dictionary by value?How can I check for NaN values?How can I read and parse CSV files in C++?How can I parse a YAML file in PythonParsing CSV files in C#, with headerHow to import CSV file data into a PostgreSQL table?CSV file written with Python has blank lines between each rowReading CSV file and storing values into an arrayHow can I sort a dictionary by key?

Has the United States ever had a non-Christian President?

Why does sound not move through a wall?

How do I, as a DM, handle a party that decides to set up an ambush in a dungeon?

Hostile Divisor Numbers

Understanding ties

Install Firefox without updates

As black, how should one respond to 4. Qe2 by white in the Russian Game, Damiano Variation?

Should homeowners insurance cover the cost of the home?

Motion-trail-like lines

Firefox 66.0.3 (64-bit) addons broken

Does running exec do anything?

In linear regression why does regularisation penalise the parameter values as well?

Why would one crossvalidate the random state number?

How can a hefty sand storm happen in a thin atmosphere like Martian?

Sci-fi/fantasy book - ships on steel runners skating across ice sheets

As a GM, is it bad form to ask for a moment to think when improvising?

What happens if I accidentally leave an app running and click "Install Now" in Software Updater?

GitLab account hacked and repo wiped

Is disk brake effectiveness mitigated by tyres losing traction under strong braking?

Dirichlet series with a single zero

Sparring against two opponents test

Clarification of algebra in moment generating functions

What is the closest airport to the center of the city it serves?

Which US defense organization would respond to an invasion like this?



How can I parse CSV file while preserving dictionary values within rows


How do I sort a list of dictionaries by a value of the dictionary?How do I sort a dictionary by value?How can I check for NaN values?How can I read and parse CSV files in C++?How can I parse a YAML file in PythonParsing CSV files in C#, with headerHow to import CSV file data into a PostgreSQL table?CSV file written with Python has blank lines between each rowReading CSV file and storing values into an arrayHow can I sort a dictionary by key?






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;








0















*If anyone has suggestions on how to format this better I am more than willing to do so.



This is for a homework assignment, the end goal is to normalize the data and insert into a schema that is in 2NF (Second Normal Form) then perform queries on the database. I am thinking I could store the data in a bunch of lists and then insert into the schema from there, how to setup the schema in a way to ensure 2NF is separate question but any suggestions would really help.



As far as the main question regarding parsing goes I am not expecting someone to just code it up for me but I am very much stumped on how this can be done in a way that preserves the dictionary like formatting of certain values.



Below is a snippet from the CSV file (each row is a movie):



budget , genres, homepage, id, keywords, original_language, original_title, overview,popularity, production_companies, production_countries, release_date, revenue, runtime, spoken_languages, status, tagline, title, vote_average, vote_count

237000000,"[""id"": 28, ""name"": ""Action"", ""id"": 12, ""name"": ""Adventure"", ""id"": 14, ""name"": ""Fantasy"", ""id"": 878, ""name"": ""Science Fiction""]",http://www.avatarmovie.com/,19995,"[""id"": 1463, ""name"": ""culture clash"", ""id"": 2964, ""name"": ""future"", ""id"": 3386, ""name"": ""space war"", ""id"": 3388, ""name"": ""space colony"", ""id"": 3679, ""name"": ""society"", ""id"": 3801, ""name"": ""space travel"", ""id"": 9685, ""name"": ""futuristic"", ""id"": 9840, ""name"": ""romance"", ""id"": 9882, ""name"": ""space"", ""id"": 9951, ""name"": ""alien"", ""id"": 10148, ""name"": ""tribe"", ""id"": 10158, ""name"": ""alien planet"", ""id"": 10987, ""name"": ""cgi"", ""id"": 11399, ""name"": ""marine"", ""id"": 13065, ""name"": ""soldier"", ""id"": 14643, ""name"": ""battle"", ""id"": 14720, ""name"": ""love affair"", ""id"": 165431, ""name"": ""anti war"", ""id"": 193554, ""name"": ""power relations"", ""id"": 206690, ""name"": ""mind and soul"", ""id"": 209714, ""name"": ""3d""]",en,Avatar,"In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.",150.437577,"[""name"": ""Ingenious Film Partners"", ""id"": 289, ""name"": ""Twentieth Century Fox Film Corporation"", ""id"": 306, ""name"": ""Dune Entertainment"", ""id"": 444, ""name"": ""Lightstorm Entertainment"", ""id"": 574]","[""iso_3166_1"": ""US"", ""name"": ""United States of America"", ""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""]",2009-12-10,2787965087,162,"[""iso_639_1"": ""en"", ""name"": ""English"", ""iso_639_1"": ""es"", ""name"": ""Espau00f1ol""]",Released,Enter the World of Pandora.,Avatar,7.2,11800
300000000,"[""id"": 12, ""name"": ""Adventure"", ""id"": 14, ""name"": ""Fantasy"", ""id"": 28, ""name"": ""Action""]",http://disney.go.com/disneypictures/pirates/,285,"[""id"": 270, ""name"": ""ocean"", ""id"": 726, ""name"": ""drug abuse"", ""id"": 911, ""name"": ""exotic island"", ""id"": 1319, ""name"": ""east india trading company"", ""id"": 2038, ""name"": ""love of one's life"", ""id"": 2052, ""name"": ""traitor"", ""id"": 2580, ""name"": ""shipwreck"", ""id"": 2660, ""name"": ""strong woman"", ""id"": 3799, ""name"": ""ship"", ""id"": 5740, ""name"": ""alliance"", ""id"": 5941, ""name"": ""calypso"", ""id"": 6155, ""name"": ""afterlife"", ""id"": 6211, ""name"": ""fighter"", ""id"": 12988, ""name"": ""pirate"", ""id"": 157186, ""name"": ""swashbuckler"", ""id"": 179430, ""name"": ""aftercreditsstinger""]",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, has come back to life and is headed to the edge of the Earth with Will Turner and Elizabeth Swann. But nothing is quite as it seems.",139.082615,"[""name"": ""Walt Disney Pictures"", ""id"": 2, ""name"": ""Jerry Bruckheimer Films"", ""id"": 130, ""name"": ""Second Mate Productions"", ""id"": 19936]","[""iso_3166_1"": ""US"", ""name"": ""United States of America""]",2007-05-19,961000000,169,"[""iso_639_1"": ""en"", ""name"": ""English""]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
245000000,"[""id"": 28, ""name"": ""Action"", ""id"": 12, ""name"": ""Adventure"", ""id"": 80, ""name"": ""Crime""]",http://www.sonypictures.com/movies/spectre/,206647,"[""id"": 470, ""name"": ""spy"", ""id"": 818, ""name"": ""based on novel"", ""id"": 4289, ""name"": ""secret agent"", ""id"": 9663, ""name"": ""sequel"", ""id"": 14555, ""name"": ""mi6"", ""id"": 156095, ""name"": ""british secret service"", ""id"": 158431, ""name"": ""united kingdom""]",en,Spectre,"A cryptic message from Bond’s past sends him on a trail to uncover a sinister organization. While M battles political forces to keep the secret service alive, Bond peels back the layers of deceit to reveal the terrible truth behind SPECTRE.",107.376788,"[""name"": ""Columbia Pictures"", ""id"": 5, ""name"": ""Danjaq"", ""id"": 10761, ""name"": ""B24"", ""id"": 69434]","[""iso_3166_1"": ""GB"", ""name"": ""United Kingdom"", ""iso_3166_1"": ""US"", ""name"": ""United States of America""]",2015-10-26,880674609,148,"[""iso_639_1"": ""fr"", ""name"": ""Franu00e7ais"", ""iso_639_1"": ""en"", ""name"": ""English"", ""iso_639_1"": ""es"", ""name"": ""Espau00f1ol"", ""iso_639_1"": ""it"", ""name"": ""Italiano"", ""iso_639_1"": ""de"", ""name"": ""Deutsch""]",Released,A Plan No One Escapes,Spectre,6.3,4466
250000000,"[""id"": 28, ""name"": ""Action"", ""id"": 80, ""name"": ""Crime"", ""id"": 18, ""name"": ""Drama"", ""id"": 53, ""name"": ""Thriller""]",http://www.thedarkknightrises.com/,49026,"[""id"": 849, ""name"": ""dc comics"", ""id"": 853, ""name"": ""crime fighter"", ""id"": 949, ""name"": ""terrorist"", ""id"": 1308, ""name"": ""secret identity"", ""id"": 1437, ""name"": ""burglar"", ""id"": 3051, ""name"": ""hostage drama"", ""id"": 3562, ""name"": ""time bomb"", ""id"": 6969, ""name"": ""gotham city"", ""id"": 7002, ""name"": ""vigilante"", ""id"": 9665, ""name"": ""cover-up"", ""id"": 9715, ""name"": ""superhero"", ""id"": 9990, ""name"": ""villainess"", ""id"": 10044, ""name"": ""tragic hero"", ""id"": 13015, ""name"": ""terrorism"", ""id"": 14796, ""name"": ""destruction"", ""id"": 18933, ""name"": ""catwoman"", ""id"": 156082, ""name"": ""cat burglar"", ""id"": 156395, ""name"": ""imax"", ""id"": 173272, ""name"": ""flood"", ""id"": 179093, ""name"": ""criminal underworld"", ""id"": 230775, ""name"": ""batman""]",en,The Dark Knight Rises,"Following the death of District Attorney Harvey Dent, Batman assumes responsibility for Dent's crimes to protect the late attorney's reputation and is subsequently hunted by the Gotham City Police Department. Eight years later, Batman encounters the mysterious Selina Kyle and the villainous Bane, a new terrorist leader who overwhelms Gotham's finest. The Dark Knight resurfaces to protect a city that has branded him an enemy.",112.31295,"[""name"": ""Legendary Pictures"", ""id"": 923, ""name"": ""Warner Bros."", ""id"": 6194, ""name"": ""DC Entertainment"", ""id"": 9993, ""name"": ""Syncopy"", ""id"": 9996]","[""iso_3166_1"": ""US"", ""name"": ""United States of America""]",2012-07-16,1084939099,165,"[""iso_639_1"": ""en"", ""name"": ""English""]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106


There are 19 columns (column attributes are shown in row 1), the 4th column attribute "id" is the unique identifier for each row i.e. movie. Certain columns contain a list of dictionaries, for example the genres column in row 2 contains:



[""id"": 28, ""name"": ""Action"", ""id"": 12, ""name"": ""Adventure"", ""id"": 14, ""name"": ""Fantasy"", ""id"": 878, ""name"": ""Science Fiction""]


Below is what I am currently doing which is just resulting in a list of strings for each attribute column:



import csv

budgets = []
genres = []
homepage = []
mainID = []
keywords = []
original_language = []
original_title = []
overview = []
popularity = []
production_companies = []
production_countries = []
release_date = []
revenue = []
runtime = []
spoken_languages = []
status = []
tag_line = []
title = []
vote_avg = []
vote_count = []
key_id = []

with open('tmdb_5000_movies.csv', 'r', encoding='utf-8') as csv_file:
csv_reader = csv.reader(csv_file)
skip_attributes_row = next(csv_reader)
for item in csv_reader:

budgets.append(item[0])
genres.append(item[1])
homepage.append(item[2])
mainID.append(item[3])
keywords.append(item[4])
original_language.append(item[5])
original_title.append(item[6])
overview.append(item[7])
popularity.append(item[8])
production_companies.append(item[9])
production_countries.append(item[10])
release_date.append(item[11])
revenue.append(item[12])
runtime.append(item[13])
spoken_languages.append(item[14])
status.append(item[15])
tag_line.append(item[16])
title.append(item[17])
vote_avg.append(item[18])
vote_count.append(item[19])









share|improve this question






















  • Please provide a sample of expected output.

    – gmds
    Mar 23 at 3:04











  • I can edit the post, essentially I want each column in its own list but columns like "genres" are causing problems and need to be separated into two lists, one storing genre_id and the other storing genre_name.

    – SSJ
    Mar 23 at 3:17











  • "each column in its own list": As I don't see the benefit of doing so you have to flatten the column genres list doing genres.extend(<column generes list>). Are you aware that you are dealing with json column data?

    – stovfl
    Mar 23 at 9:21


















0















*If anyone has suggestions on how to format this better I am more than willing to do so.



This is for a homework assignment, the end goal is to normalize the data and insert into a schema that is in 2NF (Second Normal Form) then perform queries on the database. I am thinking I could store the data in a bunch of lists and then insert into the schema from there, how to setup the schema in a way to ensure 2NF is separate question but any suggestions would really help.



As far as the main question regarding parsing goes I am not expecting someone to just code it up for me but I am very much stumped on how this can be done in a way that preserves the dictionary like formatting of certain values.



Below is a snippet from the CSV file (each row is a movie):



budget , genres, homepage, id, keywords, original_language, original_title, overview,popularity, production_companies, production_countries, release_date, revenue, runtime, spoken_languages, status, tagline, title, vote_average, vote_count

237000000,"[""id"": 28, ""name"": ""Action"", ""id"": 12, ""name"": ""Adventure"", ""id"": 14, ""name"": ""Fantasy"", ""id"": 878, ""name"": ""Science Fiction""]",http://www.avatarmovie.com/,19995,"[""id"": 1463, ""name"": ""culture clash"", ""id"": 2964, ""name"": ""future"", ""id"": 3386, ""name"": ""space war"", ""id"": 3388, ""name"": ""space colony"", ""id"": 3679, ""name"": ""society"", ""id"": 3801, ""name"": ""space travel"", ""id"": 9685, ""name"": ""futuristic"", ""id"": 9840, ""name"": ""romance"", ""id"": 9882, ""name"": ""space"", ""id"": 9951, ""name"": ""alien"", ""id"": 10148, ""name"": ""tribe"", ""id"": 10158, ""name"": ""alien planet"", ""id"": 10987, ""name"": ""cgi"", ""id"": 11399, ""name"": ""marine"", ""id"": 13065, ""name"": ""soldier"", ""id"": 14643, ""name"": ""battle"", ""id"": 14720, ""name"": ""love affair"", ""id"": 165431, ""name"": ""anti war"", ""id"": 193554, ""name"": ""power relations"", ""id"": 206690, ""name"": ""mind and soul"", ""id"": 209714, ""name"": ""3d""]",en,Avatar,"In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.",150.437577,"[""name"": ""Ingenious Film Partners"", ""id"": 289, ""name"": ""Twentieth Century Fox Film Corporation"", ""id"": 306, ""name"": ""Dune Entertainment"", ""id"": 444, ""name"": ""Lightstorm Entertainment"", ""id"": 574]","[""iso_3166_1"": ""US"", ""name"": ""United States of America"", ""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""]",2009-12-10,2787965087,162,"[""iso_639_1"": ""en"", ""name"": ""English"", ""iso_639_1"": ""es"", ""name"": ""Espau00f1ol""]",Released,Enter the World of Pandora.,Avatar,7.2,11800
300000000,"[""id"": 12, ""name"": ""Adventure"", ""id"": 14, ""name"": ""Fantasy"", ""id"": 28, ""name"": ""Action""]",http://disney.go.com/disneypictures/pirates/,285,"[""id"": 270, ""name"": ""ocean"", ""id"": 726, ""name"": ""drug abuse"", ""id"": 911, ""name"": ""exotic island"", ""id"": 1319, ""name"": ""east india trading company"", ""id"": 2038, ""name"": ""love of one's life"", ""id"": 2052, ""name"": ""traitor"", ""id"": 2580, ""name"": ""shipwreck"", ""id"": 2660, ""name"": ""strong woman"", ""id"": 3799, ""name"": ""ship"", ""id"": 5740, ""name"": ""alliance"", ""id"": 5941, ""name"": ""calypso"", ""id"": 6155, ""name"": ""afterlife"", ""id"": 6211, ""name"": ""fighter"", ""id"": 12988, ""name"": ""pirate"", ""id"": 157186, ""name"": ""swashbuckler"", ""id"": 179430, ""name"": ""aftercreditsstinger""]",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, has come back to life and is headed to the edge of the Earth with Will Turner and Elizabeth Swann. But nothing is quite as it seems.",139.082615,"[""name"": ""Walt Disney Pictures"", ""id"": 2, ""name"": ""Jerry Bruckheimer Films"", ""id"": 130, ""name"": ""Second Mate Productions"", ""id"": 19936]","[""iso_3166_1"": ""US"", ""name"": ""United States of America""]",2007-05-19,961000000,169,"[""iso_639_1"": ""en"", ""name"": ""English""]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
245000000,"[""id"": 28, ""name"": ""Action"", ""id"": 12, ""name"": ""Adventure"", ""id"": 80, ""name"": ""Crime""]",http://www.sonypictures.com/movies/spectre/,206647,"[""id"": 470, ""name"": ""spy"", ""id"": 818, ""name"": ""based on novel"", ""id"": 4289, ""name"": ""secret agent"", ""id"": 9663, ""name"": ""sequel"", ""id"": 14555, ""name"": ""mi6"", ""id"": 156095, ""name"": ""british secret service"", ""id"": 158431, ""name"": ""united kingdom""]",en,Spectre,"A cryptic message from Bond’s past sends him on a trail to uncover a sinister organization. While M battles political forces to keep the secret service alive, Bond peels back the layers of deceit to reveal the terrible truth behind SPECTRE.",107.376788,"[""name"": ""Columbia Pictures"", ""id"": 5, ""name"": ""Danjaq"", ""id"": 10761, ""name"": ""B24"", ""id"": 69434]","[""iso_3166_1"": ""GB"", ""name"": ""United Kingdom"", ""iso_3166_1"": ""US"", ""name"": ""United States of America""]",2015-10-26,880674609,148,"[""iso_639_1"": ""fr"", ""name"": ""Franu00e7ais"", ""iso_639_1"": ""en"", ""name"": ""English"", ""iso_639_1"": ""es"", ""name"": ""Espau00f1ol"", ""iso_639_1"": ""it"", ""name"": ""Italiano"", ""iso_639_1"": ""de"", ""name"": ""Deutsch""]",Released,A Plan No One Escapes,Spectre,6.3,4466
250000000,"[""id"": 28, ""name"": ""Action"", ""id"": 80, ""name"": ""Crime"", ""id"": 18, ""name"": ""Drama"", ""id"": 53, ""name"": ""Thriller""]",http://www.thedarkknightrises.com/,49026,"[""id"": 849, ""name"": ""dc comics"", ""id"": 853, ""name"": ""crime fighter"", ""id"": 949, ""name"": ""terrorist"", ""id"": 1308, ""name"": ""secret identity"", ""id"": 1437, ""name"": ""burglar"", ""id"": 3051, ""name"": ""hostage drama"", ""id"": 3562, ""name"": ""time bomb"", ""id"": 6969, ""name"": ""gotham city"", ""id"": 7002, ""name"": ""vigilante"", ""id"": 9665, ""name"": ""cover-up"", ""id"": 9715, ""name"": ""superhero"", ""id"": 9990, ""name"": ""villainess"", ""id"": 10044, ""name"": ""tragic hero"", ""id"": 13015, ""name"": ""terrorism"", ""id"": 14796, ""name"": ""destruction"", ""id"": 18933, ""name"": ""catwoman"", ""id"": 156082, ""name"": ""cat burglar"", ""id"": 156395, ""name"": ""imax"", ""id"": 173272, ""name"": ""flood"", ""id"": 179093, ""name"": ""criminal underworld"", ""id"": 230775, ""name"": ""batman""]",en,The Dark Knight Rises,"Following the death of District Attorney Harvey Dent, Batman assumes responsibility for Dent's crimes to protect the late attorney's reputation and is subsequently hunted by the Gotham City Police Department. Eight years later, Batman encounters the mysterious Selina Kyle and the villainous Bane, a new terrorist leader who overwhelms Gotham's finest. The Dark Knight resurfaces to protect a city that has branded him an enemy.",112.31295,"[""name"": ""Legendary Pictures"", ""id"": 923, ""name"": ""Warner Bros."", ""id"": 6194, ""name"": ""DC Entertainment"", ""id"": 9993, ""name"": ""Syncopy"", ""id"": 9996]","[""iso_3166_1"": ""US"", ""name"": ""United States of America""]",2012-07-16,1084939099,165,"[""iso_639_1"": ""en"", ""name"": ""English""]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106


There are 19 columns (column attributes are shown in row 1), the 4th column attribute "id" is the unique identifier for each row i.e. movie. Certain columns contain a list of dictionaries, for example the genres column in row 2 contains:



[""id"": 28, ""name"": ""Action"", ""id"": 12, ""name"": ""Adventure"", ""id"": 14, ""name"": ""Fantasy"", ""id"": 878, ""name"": ""Science Fiction""]


Below is what I am currently doing which is just resulting in a list of strings for each attribute column:



import csv

budgets = []
genres = []
homepage = []
mainID = []
keywords = []
original_language = []
original_title = []
overview = []
popularity = []
production_companies = []
production_countries = []
release_date = []
revenue = []
runtime = []
spoken_languages = []
status = []
tag_line = []
title = []
vote_avg = []
vote_count = []
key_id = []

with open('tmdb_5000_movies.csv', 'r', encoding='utf-8') as csv_file:
csv_reader = csv.reader(csv_file)
skip_attributes_row = next(csv_reader)
for item in csv_reader:

budgets.append(item[0])
genres.append(item[1])
homepage.append(item[2])
mainID.append(item[3])
keywords.append(item[4])
original_language.append(item[5])
original_title.append(item[6])
overview.append(item[7])
popularity.append(item[8])
production_companies.append(item[9])
production_countries.append(item[10])
release_date.append(item[11])
revenue.append(item[12])
runtime.append(item[13])
spoken_languages.append(item[14])
status.append(item[15])
tag_line.append(item[16])
title.append(item[17])
vote_avg.append(item[18])
vote_count.append(item[19])









share|improve this question






















  • Please provide a sample of expected output.

    – gmds
    Mar 23 at 3:04











  • I can edit the post, essentially I want each column in its own list but columns like "genres" are causing problems and need to be separated into two lists, one storing genre_id and the other storing genre_name.

    – SSJ
    Mar 23 at 3:17











  • "each column in its own list": As I don't see the benefit of doing so you have to flatten the column genres list doing genres.extend(<column generes list>). Are you aware that you are dealing with json column data?

    – stovfl
    Mar 23 at 9:21














0












0








0








*If anyone has suggestions on how to format this better I am more than willing to do so.



This is for a homework assignment, the end goal is to normalize the data and insert into a schema that is in 2NF (Second Normal Form) then perform queries on the database. I am thinking I could store the data in a bunch of lists and then insert into the schema from there, how to setup the schema in a way to ensure 2NF is separate question but any suggestions would really help.



As far as the main question regarding parsing goes I am not expecting someone to just code it up for me but I am very much stumped on how this can be done in a way that preserves the dictionary like formatting of certain values.



Below is a snippet from the CSV file (each row is a movie):



budget , genres, homepage, id, keywords, original_language, original_title, overview,popularity, production_companies, production_countries, release_date, revenue, runtime, spoken_languages, status, tagline, title, vote_average, vote_count

237000000,"[""id"": 28, ""name"": ""Action"", ""id"": 12, ""name"": ""Adventure"", ""id"": 14, ""name"": ""Fantasy"", ""id"": 878, ""name"": ""Science Fiction""]",http://www.avatarmovie.com/,19995,"[""id"": 1463, ""name"": ""culture clash"", ""id"": 2964, ""name"": ""future"", ""id"": 3386, ""name"": ""space war"", ""id"": 3388, ""name"": ""space colony"", ""id"": 3679, ""name"": ""society"", ""id"": 3801, ""name"": ""space travel"", ""id"": 9685, ""name"": ""futuristic"", ""id"": 9840, ""name"": ""romance"", ""id"": 9882, ""name"": ""space"", ""id"": 9951, ""name"": ""alien"", ""id"": 10148, ""name"": ""tribe"", ""id"": 10158, ""name"": ""alien planet"", ""id"": 10987, ""name"": ""cgi"", ""id"": 11399, ""name"": ""marine"", ""id"": 13065, ""name"": ""soldier"", ""id"": 14643, ""name"": ""battle"", ""id"": 14720, ""name"": ""love affair"", ""id"": 165431, ""name"": ""anti war"", ""id"": 193554, ""name"": ""power relations"", ""id"": 206690, ""name"": ""mind and soul"", ""id"": 209714, ""name"": ""3d""]",en,Avatar,"In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.",150.437577,"[""name"": ""Ingenious Film Partners"", ""id"": 289, ""name"": ""Twentieth Century Fox Film Corporation"", ""id"": 306, ""name"": ""Dune Entertainment"", ""id"": 444, ""name"": ""Lightstorm Entertainment"", ""id"": 574]","[""iso_3166_1"": ""US"", ""name"": ""United States of America"", ""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""]",2009-12-10,2787965087,162,"[""iso_639_1"": ""en"", ""name"": ""English"", ""iso_639_1"": ""es"", ""name"": ""Espau00f1ol""]",Released,Enter the World of Pandora.,Avatar,7.2,11800
300000000,"[""id"": 12, ""name"": ""Adventure"", ""id"": 14, ""name"": ""Fantasy"", ""id"": 28, ""name"": ""Action""]",http://disney.go.com/disneypictures/pirates/,285,"[""id"": 270, ""name"": ""ocean"", ""id"": 726, ""name"": ""drug abuse"", ""id"": 911, ""name"": ""exotic island"", ""id"": 1319, ""name"": ""east india trading company"", ""id"": 2038, ""name"": ""love of one's life"", ""id"": 2052, ""name"": ""traitor"", ""id"": 2580, ""name"": ""shipwreck"", ""id"": 2660, ""name"": ""strong woman"", ""id"": 3799, ""name"": ""ship"", ""id"": 5740, ""name"": ""alliance"", ""id"": 5941, ""name"": ""calypso"", ""id"": 6155, ""name"": ""afterlife"", ""id"": 6211, ""name"": ""fighter"", ""id"": 12988, ""name"": ""pirate"", ""id"": 157186, ""name"": ""swashbuckler"", ""id"": 179430, ""name"": ""aftercreditsstinger""]",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, has come back to life and is headed to the edge of the Earth with Will Turner and Elizabeth Swann. But nothing is quite as it seems.",139.082615,"[""name"": ""Walt Disney Pictures"", ""id"": 2, ""name"": ""Jerry Bruckheimer Films"", ""id"": 130, ""name"": ""Second Mate Productions"", ""id"": 19936]","[""iso_3166_1"": ""US"", ""name"": ""United States of America""]",2007-05-19,961000000,169,"[""iso_639_1"": ""en"", ""name"": ""English""]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
245000000,"[""id"": 28, ""name"": ""Action"", ""id"": 12, ""name"": ""Adventure"", ""id"": 80, ""name"": ""Crime""]",http://www.sonypictures.com/movies/spectre/,206647,"[""id"": 470, ""name"": ""spy"", ""id"": 818, ""name"": ""based on novel"", ""id"": 4289, ""name"": ""secret agent"", ""id"": 9663, ""name"": ""sequel"", ""id"": 14555, ""name"": ""mi6"", ""id"": 156095, ""name"": ""british secret service"", ""id"": 158431, ""name"": ""united kingdom""]",en,Spectre,"A cryptic message from Bond’s past sends him on a trail to uncover a sinister organization. While M battles political forces to keep the secret service alive, Bond peels back the layers of deceit to reveal the terrible truth behind SPECTRE.",107.376788,"[""name"": ""Columbia Pictures"", ""id"": 5, ""name"": ""Danjaq"", ""id"": 10761, ""name"": ""B24"", ""id"": 69434]","[""iso_3166_1"": ""GB"", ""name"": ""United Kingdom"", ""iso_3166_1"": ""US"", ""name"": ""United States of America""]",2015-10-26,880674609,148,"[""iso_639_1"": ""fr"", ""name"": ""Franu00e7ais"", ""iso_639_1"": ""en"", ""name"": ""English"", ""iso_639_1"": ""es"", ""name"": ""Espau00f1ol"", ""iso_639_1"": ""it"", ""name"": ""Italiano"", ""iso_639_1"": ""de"", ""name"": ""Deutsch""]",Released,A Plan No One Escapes,Spectre,6.3,4466
250000000,"[""id"": 28, ""name"": ""Action"", ""id"": 80, ""name"": ""Crime"", ""id"": 18, ""name"": ""Drama"", ""id"": 53, ""name"": ""Thriller""]",http://www.thedarkknightrises.com/,49026,"[""id"": 849, ""name"": ""dc comics"", ""id"": 853, ""name"": ""crime fighter"", ""id"": 949, ""name"": ""terrorist"", ""id"": 1308, ""name"": ""secret identity"", ""id"": 1437, ""name"": ""burglar"", ""id"": 3051, ""name"": ""hostage drama"", ""id"": 3562, ""name"": ""time bomb"", ""id"": 6969, ""name"": ""gotham city"", ""id"": 7002, ""name"": ""vigilante"", ""id"": 9665, ""name"": ""cover-up"", ""id"": 9715, ""name"": ""superhero"", ""id"": 9990, ""name"": ""villainess"", ""id"": 10044, ""name"": ""tragic hero"", ""id"": 13015, ""name"": ""terrorism"", ""id"": 14796, ""name"": ""destruction"", ""id"": 18933, ""name"": ""catwoman"", ""id"": 156082, ""name"": ""cat burglar"", ""id"": 156395, ""name"": ""imax"", ""id"": 173272, ""name"": ""flood"", ""id"": 179093, ""name"": ""criminal underworld"", ""id"": 230775, ""name"": ""batman""]",en,The Dark Knight Rises,"Following the death of District Attorney Harvey Dent, Batman assumes responsibility for Dent's crimes to protect the late attorney's reputation and is subsequently hunted by the Gotham City Police Department. Eight years later, Batman encounters the mysterious Selina Kyle and the villainous Bane, a new terrorist leader who overwhelms Gotham's finest. The Dark Knight resurfaces to protect a city that has branded him an enemy.",112.31295,"[""name"": ""Legendary Pictures"", ""id"": 923, ""name"": ""Warner Bros."", ""id"": 6194, ""name"": ""DC Entertainment"", ""id"": 9993, ""name"": ""Syncopy"", ""id"": 9996]","[""iso_3166_1"": ""US"", ""name"": ""United States of America""]",2012-07-16,1084939099,165,"[""iso_639_1"": ""en"", ""name"": ""English""]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106


There are 19 columns (column attributes are shown in row 1), the 4th column attribute "id" is the unique identifier for each row i.e. movie. Certain columns contain a list of dictionaries, for example the genres column in row 2 contains:



[""id"": 28, ""name"": ""Action"", ""id"": 12, ""name"": ""Adventure"", ""id"": 14, ""name"": ""Fantasy"", ""id"": 878, ""name"": ""Science Fiction""]


Below is what I am currently doing which is just resulting in a list of strings for each attribute column:



import csv

budgets = []
genres = []
homepage = []
mainID = []
keywords = []
original_language = []
original_title = []
overview = []
popularity = []
production_companies = []
production_countries = []
release_date = []
revenue = []
runtime = []
spoken_languages = []
status = []
tag_line = []
title = []
vote_avg = []
vote_count = []
key_id = []

with open('tmdb_5000_movies.csv', 'r', encoding='utf-8') as csv_file:
csv_reader = csv.reader(csv_file)
skip_attributes_row = next(csv_reader)
for item in csv_reader:

budgets.append(item[0])
genres.append(item[1])
homepage.append(item[2])
mainID.append(item[3])
keywords.append(item[4])
original_language.append(item[5])
original_title.append(item[6])
overview.append(item[7])
popularity.append(item[8])
production_companies.append(item[9])
production_countries.append(item[10])
release_date.append(item[11])
revenue.append(item[12])
runtime.append(item[13])
spoken_languages.append(item[14])
status.append(item[15])
tag_line.append(item[16])
title.append(item[17])
vote_avg.append(item[18])
vote_count.append(item[19])









share|improve this question














*If anyone has suggestions on how to format this better I am more than willing to do so.



This is for a homework assignment, the end goal is to normalize the data and insert into a schema that is in 2NF (Second Normal Form) then perform queries on the database. I am thinking I could store the data in a bunch of lists and then insert into the schema from there, how to setup the schema in a way to ensure 2NF is separate question but any suggestions would really help.



As far as the main question regarding parsing goes I am not expecting someone to just code it up for me but I am very much stumped on how this can be done in a way that preserves the dictionary like formatting of certain values.



Below is a snippet from the CSV file (each row is a movie):



budget , genres, homepage, id, keywords, original_language, original_title, overview,popularity, production_companies, production_countries, release_date, revenue, runtime, spoken_languages, status, tagline, title, vote_average, vote_count

237000000,"[""id"": 28, ""name"": ""Action"", ""id"": 12, ""name"": ""Adventure"", ""id"": 14, ""name"": ""Fantasy"", ""id"": 878, ""name"": ""Science Fiction""]",http://www.avatarmovie.com/,19995,"[""id"": 1463, ""name"": ""culture clash"", ""id"": 2964, ""name"": ""future"", ""id"": 3386, ""name"": ""space war"", ""id"": 3388, ""name"": ""space colony"", ""id"": 3679, ""name"": ""society"", ""id"": 3801, ""name"": ""space travel"", ""id"": 9685, ""name"": ""futuristic"", ""id"": 9840, ""name"": ""romance"", ""id"": 9882, ""name"": ""space"", ""id"": 9951, ""name"": ""alien"", ""id"": 10148, ""name"": ""tribe"", ""id"": 10158, ""name"": ""alien planet"", ""id"": 10987, ""name"": ""cgi"", ""id"": 11399, ""name"": ""marine"", ""id"": 13065, ""name"": ""soldier"", ""id"": 14643, ""name"": ""battle"", ""id"": 14720, ""name"": ""love affair"", ""id"": 165431, ""name"": ""anti war"", ""id"": 193554, ""name"": ""power relations"", ""id"": 206690, ""name"": ""mind and soul"", ""id"": 209714, ""name"": ""3d""]",en,Avatar,"In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.",150.437577,"[""name"": ""Ingenious Film Partners"", ""id"": 289, ""name"": ""Twentieth Century Fox Film Corporation"", ""id"": 306, ""name"": ""Dune Entertainment"", ""id"": 444, ""name"": ""Lightstorm Entertainment"", ""id"": 574]","[""iso_3166_1"": ""US"", ""name"": ""United States of America"", ""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""]",2009-12-10,2787965087,162,"[""iso_639_1"": ""en"", ""name"": ""English"", ""iso_639_1"": ""es"", ""name"": ""Espau00f1ol""]",Released,Enter the World of Pandora.,Avatar,7.2,11800
300000000,"[""id"": 12, ""name"": ""Adventure"", ""id"": 14, ""name"": ""Fantasy"", ""id"": 28, ""name"": ""Action""]",http://disney.go.com/disneypictures/pirates/,285,"[""id"": 270, ""name"": ""ocean"", ""id"": 726, ""name"": ""drug abuse"", ""id"": 911, ""name"": ""exotic island"", ""id"": 1319, ""name"": ""east india trading company"", ""id"": 2038, ""name"": ""love of one's life"", ""id"": 2052, ""name"": ""traitor"", ""id"": 2580, ""name"": ""shipwreck"", ""id"": 2660, ""name"": ""strong woman"", ""id"": 3799, ""name"": ""ship"", ""id"": 5740, ""name"": ""alliance"", ""id"": 5941, ""name"": ""calypso"", ""id"": 6155, ""name"": ""afterlife"", ""id"": 6211, ""name"": ""fighter"", ""id"": 12988, ""name"": ""pirate"", ""id"": 157186, ""name"": ""swashbuckler"", ""id"": 179430, ""name"": ""aftercreditsstinger""]",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, has come back to life and is headed to the edge of the Earth with Will Turner and Elizabeth Swann. But nothing is quite as it seems.",139.082615,"[""name"": ""Walt Disney Pictures"", ""id"": 2, ""name"": ""Jerry Bruckheimer Films"", ""id"": 130, ""name"": ""Second Mate Productions"", ""id"": 19936]","[""iso_3166_1"": ""US"", ""name"": ""United States of America""]",2007-05-19,961000000,169,"[""iso_639_1"": ""en"", ""name"": ""English""]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
245000000,"[""id"": 28, ""name"": ""Action"", ""id"": 12, ""name"": ""Adventure"", ""id"": 80, ""name"": ""Crime""]",http://www.sonypictures.com/movies/spectre/,206647,"[""id"": 470, ""name"": ""spy"", ""id"": 818, ""name"": ""based on novel"", ""id"": 4289, ""name"": ""secret agent"", ""id"": 9663, ""name"": ""sequel"", ""id"": 14555, ""name"": ""mi6"", ""id"": 156095, ""name"": ""british secret service"", ""id"": 158431, ""name"": ""united kingdom""]",en,Spectre,"A cryptic message from Bond’s past sends him on a trail to uncover a sinister organization. While M battles political forces to keep the secret service alive, Bond peels back the layers of deceit to reveal the terrible truth behind SPECTRE.",107.376788,"[""name"": ""Columbia Pictures"", ""id"": 5, ""name"": ""Danjaq"", ""id"": 10761, ""name"": ""B24"", ""id"": 69434]","[""iso_3166_1"": ""GB"", ""name"": ""United Kingdom"", ""iso_3166_1"": ""US"", ""name"": ""United States of America""]",2015-10-26,880674609,148,"[""iso_639_1"": ""fr"", ""name"": ""Franu00e7ais"", ""iso_639_1"": ""en"", ""name"": ""English"", ""iso_639_1"": ""es"", ""name"": ""Espau00f1ol"", ""iso_639_1"": ""it"", ""name"": ""Italiano"", ""iso_639_1"": ""de"", ""name"": ""Deutsch""]",Released,A Plan No One Escapes,Spectre,6.3,4466
250000000,"[""id"": 28, ""name"": ""Action"", ""id"": 80, ""name"": ""Crime"", ""id"": 18, ""name"": ""Drama"", ""id"": 53, ""name"": ""Thriller""]",http://www.thedarkknightrises.com/,49026,"[""id"": 849, ""name"": ""dc comics"", ""id"": 853, ""name"": ""crime fighter"", ""id"": 949, ""name"": ""terrorist"", ""id"": 1308, ""name"": ""secret identity"", ""id"": 1437, ""name"": ""burglar"", ""id"": 3051, ""name"": ""hostage drama"", ""id"": 3562, ""name"": ""time bomb"", ""id"": 6969, ""name"": ""gotham city"", ""id"": 7002, ""name"": ""vigilante"", ""id"": 9665, ""name"": ""cover-up"", ""id"": 9715, ""name"": ""superhero"", ""id"": 9990, ""name"": ""villainess"", ""id"": 10044, ""name"": ""tragic hero"", ""id"": 13015, ""name"": ""terrorism"", ""id"": 14796, ""name"": ""destruction"", ""id"": 18933, ""name"": ""catwoman"", ""id"": 156082, ""name"": ""cat burglar"", ""id"": 156395, ""name"": ""imax"", ""id"": 173272, ""name"": ""flood"", ""id"": 179093, ""name"": ""criminal underworld"", ""id"": 230775, ""name"": ""batman""]",en,The Dark Knight Rises,"Following the death of District Attorney Harvey Dent, Batman assumes responsibility for Dent's crimes to protect the late attorney's reputation and is subsequently hunted by the Gotham City Police Department. Eight years later, Batman encounters the mysterious Selina Kyle and the villainous Bane, a new terrorist leader who overwhelms Gotham's finest. The Dark Knight resurfaces to protect a city that has branded him an enemy.",112.31295,"[""name"": ""Legendary Pictures"", ""id"": 923, ""name"": ""Warner Bros."", ""id"": 6194, ""name"": ""DC Entertainment"", ""id"": 9993, ""name"": ""Syncopy"", ""id"": 9996]","[""iso_3166_1"": ""US"", ""name"": ""United States of America""]",2012-07-16,1084939099,165,"[""iso_639_1"": ""en"", ""name"": ""English""]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106


There are 19 columns (column attributes are shown in row 1), the 4th column attribute "id" is the unique identifier for each row i.e. movie. Certain columns contain a list of dictionaries, for example the genres column in row 2 contains:



[""id"": 28, ""name"": ""Action"", ""id"": 12, ""name"": ""Adventure"", ""id"": 14, ""name"": ""Fantasy"", ""id"": 878, ""name"": ""Science Fiction""]


Below is what I am currently doing which is just resulting in a list of strings for each attribute column:



import csv

budgets = []
genres = []
homepage = []
mainID = []
keywords = []
original_language = []
original_title = []
overview = []
popularity = []
production_companies = []
production_countries = []
release_date = []
revenue = []
runtime = []
spoken_languages = []
status = []
tag_line = []
title = []
vote_avg = []
vote_count = []
key_id = []

with open('tmdb_5000_movies.csv', 'r', encoding='utf-8') as csv_file:
csv_reader = csv.reader(csv_file)
skip_attributes_row = next(csv_reader)
for item in csv_reader:

budgets.append(item[0])
genres.append(item[1])
homepage.append(item[2])
mainID.append(item[3])
keywords.append(item[4])
original_language.append(item[5])
original_title.append(item[6])
overview.append(item[7])
popularity.append(item[8])
production_companies.append(item[9])
production_countries.append(item[10])
release_date.append(item[11])
revenue.append(item[12])
runtime.append(item[13])
spoken_languages.append(item[14])
status.append(item[15])
tag_line.append(item[16])
title.append(item[17])
vote_avg.append(item[18])
vote_count.append(item[19])






python csv






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Mar 23 at 2:39









SSJSSJ

1




1












  • Please provide a sample of expected output.

    – gmds
    Mar 23 at 3:04











  • I can edit the post, essentially I want each column in its own list but columns like "genres" are causing problems and need to be separated into two lists, one storing genre_id and the other storing genre_name.

    – SSJ
    Mar 23 at 3:17











  • "each column in its own list": As I don't see the benefit of doing so you have to flatten the column genres list doing genres.extend(<column generes list>). Are you aware that you are dealing with json column data?

    – stovfl
    Mar 23 at 9:21


















  • Please provide a sample of expected output.

    – gmds
    Mar 23 at 3:04











  • I can edit the post, essentially I want each column in its own list but columns like "genres" are causing problems and need to be separated into two lists, one storing genre_id and the other storing genre_name.

    – SSJ
    Mar 23 at 3:17











  • "each column in its own list": As I don't see the benefit of doing so you have to flatten the column genres list doing genres.extend(<column generes list>). Are you aware that you are dealing with json column data?

    – stovfl
    Mar 23 at 9:21

















Please provide a sample of expected output.

– gmds
Mar 23 at 3:04





Please provide a sample of expected output.

– gmds
Mar 23 at 3:04













I can edit the post, essentially I want each column in its own list but columns like "genres" are causing problems and need to be separated into two lists, one storing genre_id and the other storing genre_name.

– SSJ
Mar 23 at 3:17





I can edit the post, essentially I want each column in its own list but columns like "genres" are causing problems and need to be separated into two lists, one storing genre_id and the other storing genre_name.

– SSJ
Mar 23 at 3:17













"each column in its own list": As I don't see the benefit of doing so you have to flatten the column genres list doing genres.extend(<column generes list>). Are you aware that you are dealing with json column data?

– stovfl
Mar 23 at 9:21






"each column in its own list": As I don't see the benefit of doing so you have to flatten the column genres list doing genres.extend(<column generes list>). Are you aware that you are dealing with json column data?

– stovfl
Mar 23 at 9:21













0






active

oldest

votes












Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55310092%2fhow-can-i-parse-csv-file-while-preserving-dictionary-values-within-rows%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55310092%2fhow-can-i-parse-csv-file-while-preserving-dictionary-values-within-rows%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Kamusi Yaliyomo Aina za kamusi | Muundo wa kamusi | Faida za kamusi | Dhima ya picha katika kamusi | Marejeo | Tazama pia | Viungo vya nje | UrambazajiKuhusu kamusiGo-SwahiliWiki-KamusiKamusi ya Kiswahili na Kiingerezakuihariri na kuongeza habari

Swift 4 - func physicsWorld not invoked on collision? The Next CEO of Stack OverflowHow to call Objective-C code from Swift#ifdef replacement in the Swift language@selector() in Swift?#pragma mark in Swift?Swift for loop: for index, element in array?dispatch_after - GCD in Swift?Swift Beta performance: sorting arraysSplit a String into an array in Swift?The use of Swift 3 @objc inference in Swift 4 mode is deprecated?How to optimize UITableViewCell, because my UITableView lags

Access current req object everywhere in Node.js ExpressWhy are global variables considered bad practice? (node.js)Using req & res across functionsHow do I get the path to the current script with Node.js?What is Node.js' Connect, Express and “middleware”?Node.js w/ express error handling in callbackHow to access the GET parameters after “?” in Express?Modify Node.js req object parametersAccess “app” variable inside of ExpressJS/ConnectJS middleware?Node.js Express app - request objectAngular Http Module considered middleware?Session variables in ExpressJSAdd properties to the req object in expressjs with Typescript