How can I concatenate multiple rows of excel data into one?How can I represent an 'Enum' in Python?How can I safely create a nested directory?How can I remove a trailing newline?How can I make a time delay in Python?How can you profile a Python script?How do I concatenate two lists in Python?How can I count the occurrences of a list item?“Large data” work flows using pandasHow to iterate over rows in a DataFrame in Pandas?Importing Multiple Excel Files using OpenPyXL

Why is the UH-60 tail rotor canted?

Considerations when providing money to one child now, and the other later?

Can't understand how static works exactly

How often should alkaline batteries be checked when they are in a device?

Where can I find maps and other historical resources / references of Calcutta / Kolkata in the Victorian era?

Adding gears to my grandson's 12" bike

Origin of the suffix in hippocampus

Why are Oscar, India, and X-Ray (O, I, and X) not used as taxiway identifiers?

Could I play forever with this loop combination?

Why does the salt in the oceans not sink to the bottom?

What kind of world would drive brains to evolve high-throughput sensory?

How can Kazakhstan perform MITM attacks on all HTTPS traffic?

Found old paper shares of Motorola Inc that has since been broken up

Are stackless C++20 coroutines a problem?

How can I show that the speed of light in vacuum is the same in all reference frames?

How am I supposed to put out fires?

Is an easily guessed plot twist a good plot twist?

Book in which the "mountain" in the distance was a hole in the flat world

Do you have to hide in order for other creatures to not know your location inside a fog cloud?

Why is DC so, so, so Democratic?

Create a dropshadow only layer

How does mathematics work?

Is it better to merge "often" or only after completion do a big merge of feature branches?

Is the apartment I want to rent a scam?

How can I concatenate multiple rows of excel data into one?

How can I represent an 'Enum' in Python?How can I safely create a nested directory?How can I remove a trailing newline?How can I make a time delay in Python?How can you profile a Python script?How do I concatenate two lists in Python?How can I count the occurrences of a list item?“Large data” work flows using pandasHow to iterate over rows in a DataFrame in Pandas?Importing Multiple Excel Files using OpenPyXL

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;

I'm currently facing an issue where I need to bring all of the data shown in the images below into one line only.

enter image description here

So using Python and Openpyxl, I tried to write a parsing script that reads the line and only copies when values are non-null or non-identical, into a new workbook.

I get out of range errors, and the code does not keep just the data I want. I've spent multiple hours on it, so I thought I would ask here to see if I can get unstuck.

I've read some documentation on Openpyxl and about making lists in python, tried a couple of videos on youtube, but none of them did exactly what I was trying to achieve.

import openpyxl
from openpyxl import Workbook

path = "sample.xlsx"
wb = openpyxl.load_workbook(path)
ws = wb.active
path2 = "output.xlsx"
wb2 = Workbook()
ws2 = wb2.active

listab = []
rows = ws.max_row
columns = ws.max_column

for i in range (1, rows+1):
 listab.append([])

cellValue = " " 
prevCell = " "

for c in range (1, rows+1):
 for r in range(1, columns+1):
 cellValue = ws.cell(row=r, column=c).value
 if cellValue == prevCell:
 listab[r-1].append(prevCell)
 elif cellValue == "NULL":
 listab[r-1].append(prevCell)
 elif cellValue != prevCell:
 listab[r-1].append(cellValue)
 prevCell = cellValue 

for r in range(1, rows+1):
 for c in range (1, columns+1):
 j = ws2.cell(row = r, column=c)
 j.value = listab[r-1][c-1]

print(listab)

wb2.save("output.xlsx")

There should be one line with the below information:

ods_service_id | service_name| service_plan_name| CPU | RAM | NIC | DRIVE |

edited Mar 26 at 14:45

Mailerdaimon

4,7651 gold badge26 silver badges40 bronze badges

asked Mar 26 at 14:08

Mathieu Moquin

192 bronze badges

It's not entirely clear what you want to do. Please describe what you want to do with individual rows.

– Charlie Clark
Mar 26 at 14:40

add a comment |

I'm currently facing an issue where I need to bring all of the data shown in the images below into one line only.

enter image description here

So using Python and Openpyxl, I tried to write a parsing script that reads the line and only copies when values are non-null or non-identical, into a new workbook.

I get out of range errors, and the code does not keep just the data I want. I've spent multiple hours on it, so I thought I would ask here to see if I can get unstuck.

I've read some documentation on Openpyxl and about making lists in python, tried a couple of videos on youtube, but none of them did exactly what I was trying to achieve.

import openpyxl
from openpyxl import Workbook

path = "sample.xlsx"
wb = openpyxl.load_workbook(path)
ws = wb.active
path2 = "output.xlsx"
wb2 = Workbook()
ws2 = wb2.active

listab = []
rows = ws.max_row
columns = ws.max_column

for i in range (1, rows+1):
 listab.append([])

cellValue = " " 
prevCell = " "

for c in range (1, rows+1):
 for r in range(1, columns+1):
 cellValue = ws.cell(row=r, column=c).value
 if cellValue == prevCell:
 listab[r-1].append(prevCell)
 elif cellValue == "NULL":
 listab[r-1].append(prevCell)
 elif cellValue != prevCell:
 listab[r-1].append(cellValue)
 prevCell = cellValue 

for r in range(1, rows+1):
 for c in range (1, columns+1):
 j = ws2.cell(row = r, column=c)
 j.value = listab[r-1][c-1]

print(listab)

wb2.save("output.xlsx")

There should be one line with the below information:

ods_service_id | service_name| service_plan_name| CPU | RAM | NIC | DRIVE |

edited Mar 26 at 14:45

Mailerdaimon

4,7651 gold badge26 silver badges40 bronze badges

asked Mar 26 at 14:08

Mathieu Moquin

192 bronze badges

It's not entirely clear what you want to do. Please describe what you want to do with individual rows.

– Charlie Clark
Mar 26 at 14:40

add a comment |

I'm currently facing an issue where I need to bring all of the data shown in the images below into one line only.

enter image description here

So using Python and Openpyxl, I tried to write a parsing script that reads the line and only copies when values are non-null or non-identical, into a new workbook.

I get out of range errors, and the code does not keep just the data I want. I've spent multiple hours on it, so I thought I would ask here to see if I can get unstuck.

I've read some documentation on Openpyxl and about making lists in python, tried a couple of videos on youtube, but none of them did exactly what I was trying to achieve.

import openpyxl
from openpyxl import Workbook

path = "sample.xlsx"
wb = openpyxl.load_workbook(path)
ws = wb.active
path2 = "output.xlsx"
wb2 = Workbook()
ws2 = wb2.active

listab = []
rows = ws.max_row
columns = ws.max_column

for i in range (1, rows+1):
 listab.append([])

cellValue = " " 
prevCell = " "

for c in range (1, rows+1):
 for r in range(1, columns+1):
 cellValue = ws.cell(row=r, column=c).value
 if cellValue == prevCell:
 listab[r-1].append(prevCell)
 elif cellValue == "NULL":
 listab[r-1].append(prevCell)
 elif cellValue != prevCell:
 listab[r-1].append(cellValue)
 prevCell = cellValue 

for r in range(1, rows+1):
 for c in range (1, columns+1):
 j = ws2.cell(row = r, column=c)
 j.value = listab[r-1][c-1]

print(listab)

wb2.save("output.xlsx")

There should be one line with the below information:

ods_service_id | service_name| service_plan_name| CPU | RAM | NIC | DRIVE |

edited Mar 26 at 14:45

Mailerdaimon

4,7651 gold badge26 silver badges40 bronze badges

asked Mar 26 at 14:08

Mathieu Moquin

192 bronze badges

I'm currently facing an issue where I need to bring all of the data shown in the images below into one line only.

enter image description here

So using Python and Openpyxl, I tried to write a parsing script that reads the line and only copies when values are non-null or non-identical, into a new workbook.

I get out of range errors, and the code does not keep just the data I want. I've spent multiple hours on it, so I thought I would ask here to see if I can get unstuck.

I've read some documentation on Openpyxl and about making lists in python, tried a couple of videos on youtube, but none of them did exactly what I was trying to achieve.

import openpyxl
from openpyxl import Workbook

path = "sample.xlsx"
wb = openpyxl.load_workbook(path)
ws = wb.active
path2 = "output.xlsx"
wb2 = Workbook()
ws2 = wb2.active

listab = []
rows = ws.max_row
columns = ws.max_column

for i in range (1, rows+1):
 listab.append([])

cellValue = " " 
prevCell = " "

for c in range (1, rows+1):
 for r in range(1, columns+1):
 cellValue = ws.cell(row=r, column=c).value
 if cellValue == prevCell:
 listab[r-1].append(prevCell)
 elif cellValue == "NULL":
 listab[r-1].append(prevCell)
 elif cellValue != prevCell:
 listab[r-1].append(cellValue)
 prevCell = cellValue 

for r in range(1, rows+1):
 for c in range (1, columns+1):
 j = ws2.cell(row = r, column=c)
 j.value = listab[r-1][c-1]

print(listab)

wb2.save("output.xlsx")

There should be one line with the below information:

ods_service_id | service_name| service_plan_name| CPU | RAM | NIC | DRIVE |

python openpyxl

edited Mar 26 at 14:45

Mailerdaimon

4,7651 gold badge26 silver badges40 bronze badges

asked Mar 26 at 14:08

Mathieu Moquin

192 bronze badges

edited Mar 26 at 14:45

Mailerdaimon

4,7651 gold badge26 silver badges40 bronze badges

asked Mar 26 at 14:08

Mathieu Moquin

192 bronze badges

edited Mar 26 at 14:45

Mailerdaimon

4,7651 gold badge26 silver badges40 bronze badges

edited Mar 26 at 14:45

Mailerdaimon

4,7651 gold badge26 silver badges40 bronze badges

edited Mar 26 at 14:45

Mailerdaimon

4,7651 gold badge26 silver badges40 bronze badges

asked Mar 26 at 14:08

Mathieu Moquin

192 bronze badges

asked Mar 26 at 14:08

Mathieu Moquin

192 bronze badges

asked Mar 26 at 14:08

Mathieu Moquin

192 bronze badges

It's not entirely clear what you want to do. Please describe what you want to do with individual rows.

– Charlie Clark
Mar 26 at 14:40

add a comment |

It's not entirely clear what you want to do. Please describe what you want to do with individual rows.

– Charlie Clark
Mar 26 at 14:40

It's not entirely clear what you want to do. Please describe what you want to do with individual rows.

– Charlie Clark
Mar 26 at 14:40

add a comment |

3 Answers
3

active

oldest

votes

Personally I would go with pandas.

import pandas as pd

#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA",inplace=True) ## Replaced nan values with "NO DATA"
unique_ids = df_data.ods_service_ids.unique()

#Storing pd into a list
records_list = df_data.to_dict('records') 
keys_to_check = ['service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']
processed = 

#Go through unique ids
for key in unique_ids:
 processed[key] = 

 #Get related records
 matching_records = [y for y in records_list if y['ods_service_ids'] == key]
 #Loop through records
 for record in matching_records:
 #For each key to check, save in dict if non null
 processed[key]['ods_service_ids'] = key
 for detail_key in keys_to_check:
 if record[detail_key] != "NO DATA" :
 processed[key][detail_key] = record[detail_key]
 ##Note : doesn't handle duplicate values for different keys so far


#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_ids','service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']]

#Export to Excel
df.to_excel("output.xlsx",sheet_name='Sheet_name_1', index=False)

The above should work but I wasn't really sure on how you wanted to save duplicated records for the same id. Do you look to store them as DRIVE_0, DRIVE_1, DRIVE_2 ?

EDIT:

df could be exported in a different way. Replaced below #export to Excel with the following :

df.to_excel("output.xlsx",sheet_name='Sheet_name_1')

EDIT 2:

with no input data it was hard to see any flows. Corrected the code above with fake data

edited Mar 26 at 17:34

answered Mar 26 at 15:21

Sebastien D

2,2072 gold badges8 silver badges29 bronze badges

I think doing the processing Pandas is fine but then I'd write rows directly with openpyxl.

– Charlie Clark
Mar 26 at 16:28

I tried running this and I get: ```` >>> xcel Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'xcel' is not defined >>> >>> writer = pd.ExcelWriter('output.xlsx') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'pd' is not defined >>> df.to_excel(writer,'Sheet1') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'df' is not defined >>> writer.save() File "<stdin>", line 1, in <module> NameError: name 'writer' is not defined >>> ````

– Mathieu Moquin
Mar 26 at 16:52

Just edited the code for exporting

– Sebastien D
Mar 26 at 17:01

@SebastienD it says that df is not defined, which is weird given the fact that you actually define its values. Could it be an error with my Visual Code configuration? I set Environment variables for Python, do I need to do it for each library I add?

– Mathieu Moquin
Mar 26 at 17:10

1

Mathieu, I just ran the code and indeed there were a couple errors. Corrected them, hope it will work. Anyway, do not focus on my approach, please consider what @CharlieClarck suggested which looks interesting

– Sebastien D
Mar 26 at 17:35

add a comment |

To be honest, I think you've managed to get confused by data structures and come up with something far more complicated than you need.

One approach that would suit would be to use Python dictionaries for each service, updating them row by row.

wb = load_workbook("sample.xlsx")
ws = wb.active
objs = 
headers = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
for row in ws.iter_rows(min_row=2, values_only=True):
 if row[0] not in objs:
 obj = key:value for key, value in zip(headers, row)
 objs[obj['ods_service_id']] = obj
 else:# update dict with non-None values
 extra = key:value for key, value in zip(headers[3:], row[3:]) if value != "NULL"
 obj.update(extra)

# write to new workbook
wb2 = Workbook()
ws2 = wb2.active
ws2.append(headers)
for row in objs.values(): # do they need sorting?
 ws2.append([obj[key] for key in headers])

Note how you can do everything without using counters.

edited Mar 27 at 8:33

answered Mar 26 at 16:44

Charlie Clark

11k2 gold badges26 silver badges37 bronze badges

For the most part this runs, but it is telling me that for "header", in zip(header, row), header is not defined. because of that error though, the script fails to write anything in output.xlsx

– Mathieu Moquin
Mar 26 at 19:02

seems the header issue was indeed the S missing. So the code continues running here and I'm getting a keyerror on line 13 : objs[obj['ods_service_id']] = obj

– Mathieu Moquin
Mar 26 at 19:11

What does obj.keys() give you? Or what's the first value of ´headers`?

– Charlie Clark
Mar 27 at 8:31

add a comment |

-1

I would suggest using the pandas library for this and then you can easily do any kind of transformation.

import pandas as pd

exceldata = pd.read_excel('tmp.xlsx', index_col=0) 

print(exceldata)

You can easily remove null/na value or you can replace it and export it into excel format.

Reference For Help:

Reading Excel

Drop Na Value

Replace NA Value

edited Mar 26 at 15:31

baduker

1,2224 gold badges11 silver badges21 bronze badges

answered Mar 26 at 14:59

Prakash Choudhary

849 bronze badges

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55359200%2fhow-can-i-concatenate-multiple-rows-of-excel-data-into-one%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Personally I would go with pandas.

import pandas as pd

#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA",inplace=True) ## Replaced nan values with "NO DATA"
unique_ids = df_data.ods_service_ids.unique()

#Storing pd into a list
records_list = df_data.to_dict('records') 
keys_to_check = ['service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']
processed = 

#Go through unique ids
for key in unique_ids:
 processed[key] = 

 #Get related records
 matching_records = [y for y in records_list if y['ods_service_ids'] == key]
 #Loop through records
 for record in matching_records:
 #For each key to check, save in dict if non null
 processed[key]['ods_service_ids'] = key
 for detail_key in keys_to_check:
 if record[detail_key] != "NO DATA" :
 processed[key][detail_key] = record[detail_key]
 ##Note : doesn't handle duplicate values for different keys so far


#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_ids','service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']]

#Export to Excel
df.to_excel("output.xlsx",sheet_name='Sheet_name_1', index=False)

The above should work but I wasn't really sure on how you wanted to save duplicated records for the same id. Do you look to store them as DRIVE_0, DRIVE_1, DRIVE_2 ?

EDIT:

df could be exported in a different way. Replaced below #export to Excel with the following :

df.to_excel("output.xlsx",sheet_name='Sheet_name_1')

EDIT 2:

with no input data it was hard to see any flows. Corrected the code above with fake data

edited Mar 26 at 17:34

answered Mar 26 at 15:21

Sebastien D

2,2072 gold badges8 silver badges29 bronze badges

I think doing the processing Pandas is fine but then I'd write rows directly with openpyxl.

– Charlie Clark
Mar 26 at 16:28

I tried running this and I get: ```` >>> xcel Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'xcel' is not defined >>> >>> writer = pd.ExcelWriter('output.xlsx') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'pd' is not defined >>> df.to_excel(writer,'Sheet1') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'df' is not defined >>> writer.save() File "<stdin>", line 1, in <module> NameError: name 'writer' is not defined >>> ````

– Mathieu Moquin
Mar 26 at 16:52

Just edited the code for exporting

– Sebastien D
Mar 26 at 17:01

@SebastienD it says that df is not defined, which is weird given the fact that you actually define its values. Could it be an error with my Visual Code configuration? I set Environment variables for Python, do I need to do it for each library I add?

– Mathieu Moquin
Mar 26 at 17:10

1

Mathieu, I just ran the code and indeed there were a couple errors. Corrected them, hope it will work. Anyway, do not focus on my approach, please consider what @CharlieClarck suggested which looks interesting

– Sebastien D
Mar 26 at 17:35

add a comment |

Personally I would go with pandas.

import pandas as pd

#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA",inplace=True) ## Replaced nan values with "NO DATA"
unique_ids = df_data.ods_service_ids.unique()

#Storing pd into a list
records_list = df_data.to_dict('records') 
keys_to_check = ['service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']
processed = 

#Go through unique ids
for key in unique_ids:
 processed[key] = 

 #Get related records
 matching_records = [y for y in records_list if y['ods_service_ids'] == key]
 #Loop through records
 for record in matching_records:
 #For each key to check, save in dict if non null
 processed[key]['ods_service_ids'] = key
 for detail_key in keys_to_check:
 if record[detail_key] != "NO DATA" :
 processed[key][detail_key] = record[detail_key]
 ##Note : doesn't handle duplicate values for different keys so far


#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_ids','service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']]

#Export to Excel
df.to_excel("output.xlsx",sheet_name='Sheet_name_1', index=False)

The above should work but I wasn't really sure on how you wanted to save duplicated records for the same id. Do you look to store them as DRIVE_0, DRIVE_1, DRIVE_2 ?

EDIT:

df could be exported in a different way. Replaced below #export to Excel with the following :

df.to_excel("output.xlsx",sheet_name='Sheet_name_1')

EDIT 2:

with no input data it was hard to see any flows. Corrected the code above with fake data

edited Mar 26 at 17:34

answered Mar 26 at 15:21

Sebastien D

2,2072 gold badges8 silver badges29 bronze badges

I think doing the processing Pandas is fine but then I'd write rows directly with openpyxl.

– Charlie Clark
Mar 26 at 16:28

I tried running this and I get: ```` >>> xcel Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'xcel' is not defined >>> >>> writer = pd.ExcelWriter('output.xlsx') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'pd' is not defined >>> df.to_excel(writer,'Sheet1') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'df' is not defined >>> writer.save() File "<stdin>", line 1, in <module> NameError: name 'writer' is not defined >>> ````

– Mathieu Moquin
Mar 26 at 16:52

Just edited the code for exporting

– Sebastien D
Mar 26 at 17:01

@SebastienD it says that df is not defined, which is weird given the fact that you actually define its values. Could it be an error with my Visual Code configuration? I set Environment variables for Python, do I need to do it for each library I add?

– Mathieu Moquin
Mar 26 at 17:10

1

Mathieu, I just ran the code and indeed there were a couple errors. Corrected them, hope it will work. Anyway, do not focus on my approach, please consider what @CharlieClarck suggested which looks interesting

– Sebastien D
Mar 26 at 17:35

add a comment |

Personally I would go with pandas.

import pandas as pd

#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA",inplace=True) ## Replaced nan values with "NO DATA"
unique_ids = df_data.ods_service_ids.unique()

#Storing pd into a list
records_list = df_data.to_dict('records') 
keys_to_check = ['service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']
processed = 

#Go through unique ids
for key in unique_ids:
 processed[key] = 

 #Get related records
 matching_records = [y for y in records_list if y['ods_service_ids'] == key]
 #Loop through records
 for record in matching_records:
 #For each key to check, save in dict if non null
 processed[key]['ods_service_ids'] = key
 for detail_key in keys_to_check:
 if record[detail_key] != "NO DATA" :
 processed[key][detail_key] = record[detail_key]
 ##Note : doesn't handle duplicate values for different keys so far


#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_ids','service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']]

#Export to Excel
df.to_excel("output.xlsx",sheet_name='Sheet_name_1', index=False)

The above should work but I wasn't really sure on how you wanted to save duplicated records for the same id. Do you look to store them as DRIVE_0, DRIVE_1, DRIVE_2 ?

EDIT:

df could be exported in a different way. Replaced below #export to Excel with the following :

df.to_excel("output.xlsx",sheet_name='Sheet_name_1')

EDIT 2:

with no input data it was hard to see any flows. Corrected the code above with fake data

edited Mar 26 at 17:34

answered Mar 26 at 15:21

Sebastien D

2,2072 gold badges8 silver badges29 bronze badges

Personally I would go with pandas.

import pandas as pd

#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA",inplace=True) ## Replaced nan values with "NO DATA"
unique_ids = df_data.ods_service_ids.unique()

#Storing pd into a list
records_list = df_data.to_dict('records') 
keys_to_check = ['service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']
processed = 

#Go through unique ids
for key in unique_ids:
 processed[key] = 

 #Get related records
 matching_records = [y for y in records_list if y['ods_service_ids'] == key]
 #Loop through records
 for record in matching_records:
 #For each key to check, save in dict if non null
 processed[key]['ods_service_ids'] = key
 for detail_key in keys_to_check:
 if record[detail_key] != "NO DATA" :
 processed[key][detail_key] = record[detail_key]
 ##Note : doesn't handle duplicate values for different keys so far


#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_ids','service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']]

#Export to Excel
df.to_excel("output.xlsx",sheet_name='Sheet_name_1', index=False)

The above should work but I wasn't really sure on how you wanted to save duplicated records for the same id. Do you look to store them as DRIVE_0, DRIVE_1, DRIVE_2 ?

EDIT:

df could be exported in a different way. Replaced below #export to Excel with the following :

df.to_excel("output.xlsx",sheet_name='Sheet_name_1')

EDIT 2:

with no input data it was hard to see any flows. Corrected the code above with fake data

edited Mar 26 at 17:34

answered Mar 26 at 15:21

Sebastien D

2,2072 gold badges8 silver badges29 bronze badges

edited Mar 26 at 17:34

answered Mar 26 at 15:21

Sebastien D

2,2072 gold badges8 silver badges29 bronze badges

answered Mar 26 at 15:21

Sebastien D

2,2072 gold badges8 silver badges29 bronze badges

answered Mar 26 at 15:21

Sebastien D

2,2072 gold badges8 silver badges29 bronze badges

I think doing the processing Pandas is fine but then I'd write rows directly with openpyxl.

– Charlie Clark
Mar 26 at 16:28

I tried running this and I get: ```` >>> xcel Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'xcel' is not defined >>> >>> writer = pd.ExcelWriter('output.xlsx') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'pd' is not defined >>> df.to_excel(writer,'Sheet1') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'df' is not defined >>> writer.save() File "<stdin>", line 1, in <module> NameError: name 'writer' is not defined >>> ````

– Mathieu Moquin
Mar 26 at 16:52

Just edited the code for exporting

– Sebastien D
Mar 26 at 17:01

@SebastienD it says that df is not defined, which is weird given the fact that you actually define its values. Could it be an error with my Visual Code configuration? I set Environment variables for Python, do I need to do it for each library I add?

– Mathieu Moquin
Mar 26 at 17:10

1

Mathieu, I just ran the code and indeed there were a couple errors. Corrected them, hope it will work. Anyway, do not focus on my approach, please consider what @CharlieClarck suggested which looks interesting

– Sebastien D
Mar 26 at 17:35

add a comment |

I think doing the processing Pandas is fine but then I'd write rows directly with openpyxl.

– Charlie Clark
Mar 26 at 16:28

I tried running this and I get: ```` >>> xcel Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'xcel' is not defined >>> >>> writer = pd.ExcelWriter('output.xlsx') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'pd' is not defined >>> df.to_excel(writer,'Sheet1') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'df' is not defined >>> writer.save() File "<stdin>", line 1, in <module> NameError: name 'writer' is not defined >>> ````

– Mathieu Moquin
Mar 26 at 16:52

Just edited the code for exporting

– Sebastien D
Mar 26 at 17:01

@SebastienD it says that df is not defined, which is weird given the fact that you actually define its values. Could it be an error with my Visual Code configuration? I set Environment variables for Python, do I need to do it for each library I add?

– Mathieu Moquin
Mar 26 at 17:10

1

Mathieu, I just ran the code and indeed there were a couple errors. Corrected them, hope it will work. Anyway, do not focus on my approach, please consider what @CharlieClarck suggested which looks interesting

– Sebastien D
Mar 26 at 17:35

I think doing the processing Pandas is fine but then I'd write rows directly with openpyxl.

– Charlie Clark
Mar 26 at 16:28

I tried running this and I get: ```` >>> xcel Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'xcel' is not defined >>> >>> writer = pd.ExcelWriter('output.xlsx') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'pd' is not defined >>> df.to_excel(writer,'Sheet1') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'df' is not defined >>> writer.save() File "<stdin>", line 1, in <module> NameError: name 'writer' is not defined >>> ````

– Mathieu Moquin
Mar 26 at 16:52

Just edited the code for exporting

– Sebastien D
Mar 26 at 17:01

@SebastienD it says that df is not defined, which is weird given the fact that you actually define its values. Could it be an error with my Visual Code configuration? I set Environment variables for Python, do I need to do it for each library I add?

– Mathieu Moquin
Mar 26 at 17:10

Mathieu, I just ran the code and indeed there were a couple errors. Corrected them, hope it will work. Anyway, do not focus on my approach, please consider what @CharlieClarck suggested which looks interesting

– Sebastien D
Mar 26 at 17:35

add a comment |

To be honest, I think you've managed to get confused by data structures and come up with something far more complicated than you need.

One approach that would suit would be to use Python dictionaries for each service, updating them row by row.

wb = load_workbook("sample.xlsx")
ws = wb.active
objs = 
headers = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
for row in ws.iter_rows(min_row=2, values_only=True):
 if row[0] not in objs:
 obj = key:value for key, value in zip(headers, row)
 objs[obj['ods_service_id']] = obj
 else:# update dict with non-None values
 extra = key:value for key, value in zip(headers[3:], row[3:]) if value != "NULL"
 obj.update(extra)

# write to new workbook
wb2 = Workbook()
ws2 = wb2.active
ws2.append(headers)
for row in objs.values(): # do they need sorting?
 ws2.append([obj[key] for key in headers])

Note how you can do everything without using counters.

edited Mar 27 at 8:33

answered Mar 26 at 16:44

Charlie Clark

11k2 gold badges26 silver badges37 bronze badges

For the most part this runs, but it is telling me that for "header", in zip(header, row), header is not defined. because of that error though, the script fails to write anything in output.xlsx

– Mathieu Moquin
Mar 26 at 19:02

seems the header issue was indeed the S missing. So the code continues running here and I'm getting a keyerror on line 13 : objs[obj['ods_service_id']] = obj

– Mathieu Moquin
Mar 26 at 19:11

What does obj.keys() give you? Or what's the first value of ´headers`?

– Charlie Clark
Mar 27 at 8:31

add a comment |

To be honest, I think you've managed to get confused by data structures and come up with something far more complicated than you need.

One approach that would suit would be to use Python dictionaries for each service, updating them row by row.

wb = load_workbook("sample.xlsx")
ws = wb.active
objs = 
headers = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
for row in ws.iter_rows(min_row=2, values_only=True):
 if row[0] not in objs:
 obj = key:value for key, value in zip(headers, row)
 objs[obj['ods_service_id']] = obj
 else:# update dict with non-None values
 extra = key:value for key, value in zip(headers[3:], row[3:]) if value != "NULL"
 obj.update(extra)

# write to new workbook
wb2 = Workbook()
ws2 = wb2.active
ws2.append(headers)
for row in objs.values(): # do they need sorting?
 ws2.append([obj[key] for key in headers])

Note how you can do everything without using counters.

edited Mar 27 at 8:33

answered Mar 26 at 16:44

Charlie Clark

11k2 gold badges26 silver badges37 bronze badges

For the most part this runs, but it is telling me that for "header", in zip(header, row), header is not defined. because of that error though, the script fails to write anything in output.xlsx

– Mathieu Moquin
Mar 26 at 19:02

seems the header issue was indeed the S missing. So the code continues running here and I'm getting a keyerror on line 13 : objs[obj['ods_service_id']] = obj

– Mathieu Moquin
Mar 26 at 19:11

What does obj.keys() give you? Or what's the first value of ´headers`?

– Charlie Clark
Mar 27 at 8:31

add a comment |

To be honest, I think you've managed to get confused by data structures and come up with something far more complicated than you need.

One approach that would suit would be to use Python dictionaries for each service, updating them row by row.

wb = load_workbook("sample.xlsx")
ws = wb.active
objs = 
headers = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
for row in ws.iter_rows(min_row=2, values_only=True):
 if row[0] not in objs:
 obj = key:value for key, value in zip(headers, row)
 objs[obj['ods_service_id']] = obj
 else:# update dict with non-None values
 extra = key:value for key, value in zip(headers[3:], row[3:]) if value != "NULL"
 obj.update(extra)

# write to new workbook
wb2 = Workbook()
ws2 = wb2.active
ws2.append(headers)
for row in objs.values(): # do they need sorting?
 ws2.append([obj[key] for key in headers])

Note how you can do everything without using counters.

edited Mar 27 at 8:33

answered Mar 26 at 16:44

Charlie Clark

11k2 gold badges26 silver badges37 bronze badges

To be honest, I think you've managed to get confused by data structures and come up with something far more complicated than you need.

One approach that would suit would be to use Python dictionaries for each service, updating them row by row.

wb = load_workbook("sample.xlsx")
ws = wb.active
objs = 
headers = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
for row in ws.iter_rows(min_row=2, values_only=True):
 if row[0] not in objs:
 obj = key:value for key, value in zip(headers, row)
 objs[obj['ods_service_id']] = obj
 else:# update dict with non-None values
 extra = key:value for key, value in zip(headers[3:], row[3:]) if value != "NULL"
 obj.update(extra)

# write to new workbook
wb2 = Workbook()
ws2 = wb2.active
ws2.append(headers)
for row in objs.values(): # do they need sorting?
 ws2.append([obj[key] for key in headers])

Note how you can do everything without using counters.

edited Mar 27 at 8:33

answered Mar 26 at 16:44

Charlie Clark

11k2 gold badges26 silver badges37 bronze badges

edited Mar 27 at 8:33

answered Mar 26 at 16:44

Charlie Clark

11k2 gold badges26 silver badges37 bronze badges

answered Mar 26 at 16:44

Charlie Clark

11k2 gold badges26 silver badges37 bronze badges

answered Mar 26 at 16:44

Charlie Clark

11k2 gold badges26 silver badges37 bronze badges

For the most part this runs, but it is telling me that for "header", in zip(header, row), header is not defined. because of that error though, the script fails to write anything in output.xlsx

– Mathieu Moquin
Mar 26 at 19:02

seems the header issue was indeed the S missing. So the code continues running here and I'm getting a keyerror on line 13 : objs[obj['ods_service_id']] = obj

– Mathieu Moquin
Mar 26 at 19:11

What does obj.keys() give you? Or what's the first value of ´headers`?

– Charlie Clark
Mar 27 at 8:31

add a comment |

For the most part this runs, but it is telling me that for "header", in zip(header, row), header is not defined. because of that error though, the script fails to write anything in output.xlsx

– Mathieu Moquin
Mar 26 at 19:02

seems the header issue was indeed the S missing. So the code continues running here and I'm getting a keyerror on line 13 : objs[obj['ods_service_id']] = obj

– Mathieu Moquin
Mar 26 at 19:11

What does obj.keys() give you? Or what's the first value of ´headers`?

– Charlie Clark
Mar 27 at 8:31

For the most part this runs, but it is telling me that for "header", in zip(header, row), header is not defined. because of that error though, the script fails to write anything in output.xlsx

– Mathieu Moquin
Mar 26 at 19:02

seems the header issue was indeed the S missing. So the code continues running here and I'm getting a keyerror on line 13 : objs[obj['ods_service_id']] = obj

– Mathieu Moquin
Mar 26 at 19:11

What does obj.keys() give you? Or what's the first value of ´headers`?

– Charlie Clark
Mar 27 at 8:31

add a comment |

-1

I would suggest using the pandas library for this and then you can easily do any kind of transformation.

import pandas as pd

exceldata = pd.read_excel('tmp.xlsx', index_col=0) 

print(exceldata)

You can easily remove null/na value or you can replace it and export it into excel format.

Reference For Help:

Reading Excel

Drop Na Value

Replace NA Value

edited Mar 26 at 15:31

baduker

1,2224 gold badges11 silver badges21 bronze badges

answered Mar 26 at 14:59

Prakash Choudhary

849 bronze badges

add a comment |

-1

I would suggest using the pandas library for this and then you can easily do any kind of transformation.

import pandas as pd

exceldata = pd.read_excel('tmp.xlsx', index_col=0) 

print(exceldata)

You can easily remove null/na value or you can replace it and export it into excel format.

Reference For Help:

Reading Excel

Drop Na Value

Replace NA Value

edited Mar 26 at 15:31

baduker

1,2224 gold badges11 silver badges21 bronze badges

answered Mar 26 at 14:59

Prakash Choudhary

849 bronze badges

add a comment |

-1

I would suggest using the pandas library for this and then you can easily do any kind of transformation.

import pandas as pd

exceldata = pd.read_excel('tmp.xlsx', index_col=0) 

print(exceldata)

You can easily remove null/na value or you can replace it and export it into excel format.

Reference For Help:

Reading Excel

Drop Na Value

Replace NA Value

edited Mar 26 at 15:31

baduker

1,2224 gold badges11 silver badges21 bronze badges

answered Mar 26 at 14:59

Prakash Choudhary

849 bronze badges

I would suggest using the pandas library for this and then you can easily do any kind of transformation.

import pandas as pd

exceldata = pd.read_excel('tmp.xlsx', index_col=0) 

print(exceldata)

You can easily remove null/na value or you can replace it and export it into excel format.

Reference For Help:

Reading Excel

Drop Na Value

Replace NA Value

edited Mar 26 at 15:31

baduker

1,2224 gold badges11 silver badges21 bronze badges

answered Mar 26 at 14:59

Prakash Choudhary

849 bronze badges

edited Mar 26 at 15:31

baduker

1,2224 gold badges11 silver badges21 bronze badges

edited Mar 26 at 15:31

baduker

1,2224 gold badges11 silver badges21 bronze badges

edited Mar 26 at 15:31

baduker

1,2224 gold badges11 silver badges21 bronze badges

answered Mar 26 at 14:59

Prakash Choudhary

849 bronze badges

answered Mar 26 at 14:59

Prakash Choudhary

849 bronze badges

answered Mar 26 at 14:59

Prakash Choudhary

849 bronze badges

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Styjun

3 Answers
3

EDIT:

EDIT 2:

Your Answer

Post as a guest

3 Answers
3

3 Answers
3

EDIT:

EDIT 2:

EDIT:

EDIT 2:

EDIT:

EDIT 2:

EDIT:

EDIT 2:

Post as a guest

Popular posts from this blog

밀양 대씨 역사 각주 함께 보기 둘러보기 메뉴밀양 대씨

1973년 목차 사건 문화 탄생 사망 노벨상 달력 둘러보기 메뉴

3 Answers 3

EDIT:

EDIT 2:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

3 Answers 3

3 Answers 3

EDIT:

EDIT 2:

EDIT:

EDIT 2:

EDIT:

EDIT 2:

EDIT:

EDIT 2:

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

밀양 대씨 역사 각주 함께 보기 둘러보기 메뉴밀양 대씨

1973년 목차 사건 문화 탄생 사망 노벨상 달력 둘러보기 메뉴

3 Answers
3

3 Answers
3

3 Answers
3