How can I concatenate multiple rows of excel data into one?How can I represent an 'Enum' in Python?How can I safely create a nested directory?How can I remove a trailing newline?How can I make a time delay in Python?How can you profile a Python script?How do I concatenate two lists in Python?How can I count the occurrences of a list item?“Large data” work flows using pandasHow to iterate over rows in a DataFrame in Pandas?Importing Multiple Excel Files using OpenPyXL
Why is the UH-60 tail rotor canted?
Considerations when providing money to one child now, and the other later?
Can't understand how static works exactly
How often should alkaline batteries be checked when they are in a device?
Where can I find maps and other historical resources / references of Calcutta / Kolkata in the Victorian era?
Adding gears to my grandson's 12" bike
Origin of the suffix in hippocampus
Why are Oscar, India, and X-Ray (O, I, and X) not used as taxiway identifiers?
Could I play forever with this loop combination?
Why does the salt in the oceans not sink to the bottom?
What kind of world would drive brains to evolve high-throughput sensory?
How can Kazakhstan perform MITM attacks on all HTTPS traffic?
Found old paper shares of Motorola Inc that has since been broken up
Are stackless C++20 coroutines a problem?
How can I show that the speed of light in vacuum is the same in all reference frames?
How am I supposed to put out fires?
Is an easily guessed plot twist a good plot twist?
Book in which the "mountain" in the distance was a hole in the flat world
Do you have to hide in order for other creatures to not know your location inside a fog cloud?
Why is DC so, so, so Democratic?
Create a dropshadow only layer
How does mathematics work?
Is it better to merge "often" or only after completion do a big merge of feature branches?
Is the apartment I want to rent a scam?
How can I concatenate multiple rows of excel data into one?
How can I represent an 'Enum' in Python?How can I safely create a nested directory?How can I remove a trailing newline?How can I make a time delay in Python?How can you profile a Python script?How do I concatenate two lists in Python?How can I count the occurrences of a list item?“Large data” work flows using pandasHow to iterate over rows in a DataFrame in Pandas?Importing Multiple Excel Files using OpenPyXL
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
I'm currently facing an issue where I need to bring all of the data shown in the images below into one line only.

So using Python and Openpyxl, I tried to write a parsing script that reads the line and only copies when values are non-null or non-identical, into a new workbook.
I get out of range errors, and the code does not keep just the data I want. I've spent multiple hours on it, so I thought I would ask here to see if I can get unstuck.
I've read some documentation on Openpyxl and about making lists in python, tried a couple of videos on youtube, but none of them did exactly what I was trying to achieve.
import openpyxl
from openpyxl import Workbook
path = "sample.xlsx"
wb = openpyxl.load_workbook(path)
ws = wb.active
path2 = "output.xlsx"
wb2 = Workbook()
ws2 = wb2.active
listab = []
rows = ws.max_row
columns = ws.max_column
for i in range (1, rows+1):
listab.append([])
cellValue = " "
prevCell = " "
for c in range (1, rows+1):
for r in range(1, columns+1):
cellValue = ws.cell(row=r, column=c).value
if cellValue == prevCell:
listab[r-1].append(prevCell)
elif cellValue == "NULL":
listab[r-1].append(prevCell)
elif cellValue != prevCell:
listab[r-1].append(cellValue)
prevCell = cellValue
for r in range(1, rows+1):
for c in range (1, columns+1):
j = ws2.cell(row = r, column=c)
j.value = listab[r-1][c-1]
print(listab)
wb2.save("output.xlsx")
There should be one line with the below information:
ods_service_id | service_name| service_plan_name| CPU | RAM | NIC | DRIVE |
python openpyxl
add a comment |
I'm currently facing an issue where I need to bring all of the data shown in the images below into one line only.

So using Python and Openpyxl, I tried to write a parsing script that reads the line and only copies when values are non-null or non-identical, into a new workbook.
I get out of range errors, and the code does not keep just the data I want. I've spent multiple hours on it, so I thought I would ask here to see if I can get unstuck.
I've read some documentation on Openpyxl and about making lists in python, tried a couple of videos on youtube, but none of them did exactly what I was trying to achieve.
import openpyxl
from openpyxl import Workbook
path = "sample.xlsx"
wb = openpyxl.load_workbook(path)
ws = wb.active
path2 = "output.xlsx"
wb2 = Workbook()
ws2 = wb2.active
listab = []
rows = ws.max_row
columns = ws.max_column
for i in range (1, rows+1):
listab.append([])
cellValue = " "
prevCell = " "
for c in range (1, rows+1):
for r in range(1, columns+1):
cellValue = ws.cell(row=r, column=c).value
if cellValue == prevCell:
listab[r-1].append(prevCell)
elif cellValue == "NULL":
listab[r-1].append(prevCell)
elif cellValue != prevCell:
listab[r-1].append(cellValue)
prevCell = cellValue
for r in range(1, rows+1):
for c in range (1, columns+1):
j = ws2.cell(row = r, column=c)
j.value = listab[r-1][c-1]
print(listab)
wb2.save("output.xlsx")
There should be one line with the below information:
ods_service_id | service_name| service_plan_name| CPU | RAM | NIC | DRIVE |
python openpyxl
It's not entirely clear what you want to do. Please describe what you want to do with individual rows.
– Charlie Clark
Mar 26 at 14:40
add a comment |
I'm currently facing an issue where I need to bring all of the data shown in the images below into one line only.

So using Python and Openpyxl, I tried to write a parsing script that reads the line and only copies when values are non-null or non-identical, into a new workbook.
I get out of range errors, and the code does not keep just the data I want. I've spent multiple hours on it, so I thought I would ask here to see if I can get unstuck.
I've read some documentation on Openpyxl and about making lists in python, tried a couple of videos on youtube, but none of them did exactly what I was trying to achieve.
import openpyxl
from openpyxl import Workbook
path = "sample.xlsx"
wb = openpyxl.load_workbook(path)
ws = wb.active
path2 = "output.xlsx"
wb2 = Workbook()
ws2 = wb2.active
listab = []
rows = ws.max_row
columns = ws.max_column
for i in range (1, rows+1):
listab.append([])
cellValue = " "
prevCell = " "
for c in range (1, rows+1):
for r in range(1, columns+1):
cellValue = ws.cell(row=r, column=c).value
if cellValue == prevCell:
listab[r-1].append(prevCell)
elif cellValue == "NULL":
listab[r-1].append(prevCell)
elif cellValue != prevCell:
listab[r-1].append(cellValue)
prevCell = cellValue
for r in range(1, rows+1):
for c in range (1, columns+1):
j = ws2.cell(row = r, column=c)
j.value = listab[r-1][c-1]
print(listab)
wb2.save("output.xlsx")
There should be one line with the below information:
ods_service_id | service_name| service_plan_name| CPU | RAM | NIC | DRIVE |
python openpyxl
I'm currently facing an issue where I need to bring all of the data shown in the images below into one line only.

So using Python and Openpyxl, I tried to write a parsing script that reads the line and only copies when values are non-null or non-identical, into a new workbook.
I get out of range errors, and the code does not keep just the data I want. I've spent multiple hours on it, so I thought I would ask here to see if I can get unstuck.
I've read some documentation on Openpyxl and about making lists in python, tried a couple of videos on youtube, but none of them did exactly what I was trying to achieve.
import openpyxl
from openpyxl import Workbook
path = "sample.xlsx"
wb = openpyxl.load_workbook(path)
ws = wb.active
path2 = "output.xlsx"
wb2 = Workbook()
ws2 = wb2.active
listab = []
rows = ws.max_row
columns = ws.max_column
for i in range (1, rows+1):
listab.append([])
cellValue = " "
prevCell = " "
for c in range (1, rows+1):
for r in range(1, columns+1):
cellValue = ws.cell(row=r, column=c).value
if cellValue == prevCell:
listab[r-1].append(prevCell)
elif cellValue == "NULL":
listab[r-1].append(prevCell)
elif cellValue != prevCell:
listab[r-1].append(cellValue)
prevCell = cellValue
for r in range(1, rows+1):
for c in range (1, columns+1):
j = ws2.cell(row = r, column=c)
j.value = listab[r-1][c-1]
print(listab)
wb2.save("output.xlsx")
There should be one line with the below information:
ods_service_id | service_name| service_plan_name| CPU | RAM | NIC | DRIVE |
python openpyxl
python openpyxl
edited Mar 26 at 14:45
Mailerdaimon
4,7651 gold badge26 silver badges40 bronze badges
4,7651 gold badge26 silver badges40 bronze badges
asked Mar 26 at 14:08
Mathieu MoquinMathieu Moquin
192 bronze badges
192 bronze badges
It's not entirely clear what you want to do. Please describe what you want to do with individual rows.
– Charlie Clark
Mar 26 at 14:40
add a comment |
It's not entirely clear what you want to do. Please describe what you want to do with individual rows.
– Charlie Clark
Mar 26 at 14:40
It's not entirely clear what you want to do. Please describe what you want to do with individual rows.
– Charlie Clark
Mar 26 at 14:40
It's not entirely clear what you want to do. Please describe what you want to do with individual rows.
– Charlie Clark
Mar 26 at 14:40
add a comment |
3 Answers
3
active
oldest
votes
Personally I would go with pandas.
import pandas as pd
#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA",inplace=True) ## Replaced nan values with "NO DATA"
unique_ids = df_data.ods_service_ids.unique()
#Storing pd into a list
records_list = df_data.to_dict('records')
keys_to_check = ['service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']
processed =
#Go through unique ids
for key in unique_ids:
processed[key] =
#Get related records
matching_records = [y for y in records_list if y['ods_service_ids'] == key]
#Loop through records
for record in matching_records:
#For each key to check, save in dict if non null
processed[key]['ods_service_ids'] = key
for detail_key in keys_to_check:
if record[detail_key] != "NO DATA" :
processed[key][detail_key] = record[detail_key]
##Note : doesn't handle duplicate values for different keys so far
#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_ids','service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']]
#Export to Excel
df.to_excel("output.xlsx",sheet_name='Sheet_name_1', index=False)
The above should work but I wasn't really sure on how you wanted to save duplicated records for the same id. Do you look to store them as DRIVE_0, DRIVE_1, DRIVE_2 ?
EDIT:
df could be exported in a different way. Replaced below #export to Excel with the following :
df.to_excel("output.xlsx",sheet_name='Sheet_name_1')
EDIT 2:
with no input data it was hard to see any flows. Corrected the code above with fake data
I think doing the processing Pandas is fine but then I'd write rows directly with openpyxl.
– Charlie Clark
Mar 26 at 16:28
I tried running this and I get: ```` >>> xcel Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'xcel' is not defined >>> >>> writer = pd.ExcelWriter('output.xlsx') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'pd' is not defined >>> df.to_excel(writer,'Sheet1') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'df' is not defined >>> writer.save() File "<stdin>", line 1, in <module> NameError: name 'writer' is not defined >>> ````
– Mathieu Moquin
Mar 26 at 16:52
Just edited the code for exporting
– Sebastien D
Mar 26 at 17:01
@SebastienD it says that df is not defined, which is weird given the fact that you actually define its values. Could it be an error with my Visual Code configuration? I set Environment variables for Python, do I need to do it for each library I add?
– Mathieu Moquin
Mar 26 at 17:10
1
Mathieu, I just ran the code and indeed there were a couple errors. Corrected them, hope it will work. Anyway, do not focus on my approach, please consider what @CharlieClarck suggested which looks interesting
– Sebastien D
Mar 26 at 17:35
add a comment |
To be honest, I think you've managed to get confused by data structures and come up with something far more complicated than you need.
One approach that would suit would be to use Python dictionaries for each service, updating them row by row.
wb = load_workbook("sample.xlsx")
ws = wb.active
objs =
headers = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
for row in ws.iter_rows(min_row=2, values_only=True):
if row[0] not in objs:
obj = key:value for key, value in zip(headers, row)
objs[obj['ods_service_id']] = obj
else:# update dict with non-None values
extra = key:value for key, value in zip(headers[3:], row[3:]) if value != "NULL"
obj.update(extra)
# write to new workbook
wb2 = Workbook()
ws2 = wb2.active
ws2.append(headers)
for row in objs.values(): # do they need sorting?
ws2.append([obj[key] for key in headers])
Note how you can do everything without using counters.
For the most part this runs, but it is telling me that for "header", in zip(header, row), header is not defined. because of that error though, the script fails to write anything in output.xlsx
– Mathieu Moquin
Mar 26 at 19:02
seems the header issue was indeed the S missing. So the code continues running here and I'm getting a keyerror on line 13 : objs[obj['ods_service_id']] = obj
– Mathieu Moquin
Mar 26 at 19:11
What doesobj.keys()give you? Or what's the first value of ´headers`?
– Charlie Clark
Mar 27 at 8:31
add a comment |
I would suggest using the pandas library for this and then you can easily do any kind of transformation.
import pandas as pd
exceldata = pd.read_excel('tmp.xlsx', index_col=0)
print(exceldata)
You can easily remove null/na value or you can replace it and export it into excel format.
Reference For Help:
Reading Excel
Drop Na Value
Replace NA Value
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55359200%2fhow-can-i-concatenate-multiple-rows-of-excel-data-into-one%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Personally I would go with pandas.
import pandas as pd
#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA",inplace=True) ## Replaced nan values with "NO DATA"
unique_ids = df_data.ods_service_ids.unique()
#Storing pd into a list
records_list = df_data.to_dict('records')
keys_to_check = ['service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']
processed =
#Go through unique ids
for key in unique_ids:
processed[key] =
#Get related records
matching_records = [y for y in records_list if y['ods_service_ids'] == key]
#Loop through records
for record in matching_records:
#For each key to check, save in dict if non null
processed[key]['ods_service_ids'] = key
for detail_key in keys_to_check:
if record[detail_key] != "NO DATA" :
processed[key][detail_key] = record[detail_key]
##Note : doesn't handle duplicate values for different keys so far
#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_ids','service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']]
#Export to Excel
df.to_excel("output.xlsx",sheet_name='Sheet_name_1', index=False)
The above should work but I wasn't really sure on how you wanted to save duplicated records for the same id. Do you look to store them as DRIVE_0, DRIVE_1, DRIVE_2 ?
EDIT:
df could be exported in a different way. Replaced below #export to Excel with the following :
df.to_excel("output.xlsx",sheet_name='Sheet_name_1')
EDIT 2:
with no input data it was hard to see any flows. Corrected the code above with fake data
I think doing the processing Pandas is fine but then I'd write rows directly with openpyxl.
– Charlie Clark
Mar 26 at 16:28
I tried running this and I get: ```` >>> xcel Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'xcel' is not defined >>> >>> writer = pd.ExcelWriter('output.xlsx') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'pd' is not defined >>> df.to_excel(writer,'Sheet1') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'df' is not defined >>> writer.save() File "<stdin>", line 1, in <module> NameError: name 'writer' is not defined >>> ````
– Mathieu Moquin
Mar 26 at 16:52
Just edited the code for exporting
– Sebastien D
Mar 26 at 17:01
@SebastienD it says that df is not defined, which is weird given the fact that you actually define its values. Could it be an error with my Visual Code configuration? I set Environment variables for Python, do I need to do it for each library I add?
– Mathieu Moquin
Mar 26 at 17:10
1
Mathieu, I just ran the code and indeed there were a couple errors. Corrected them, hope it will work. Anyway, do not focus on my approach, please consider what @CharlieClarck suggested which looks interesting
– Sebastien D
Mar 26 at 17:35
add a comment |
Personally I would go with pandas.
import pandas as pd
#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA",inplace=True) ## Replaced nan values with "NO DATA"
unique_ids = df_data.ods_service_ids.unique()
#Storing pd into a list
records_list = df_data.to_dict('records')
keys_to_check = ['service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']
processed =
#Go through unique ids
for key in unique_ids:
processed[key] =
#Get related records
matching_records = [y for y in records_list if y['ods_service_ids'] == key]
#Loop through records
for record in matching_records:
#For each key to check, save in dict if non null
processed[key]['ods_service_ids'] = key
for detail_key in keys_to_check:
if record[detail_key] != "NO DATA" :
processed[key][detail_key] = record[detail_key]
##Note : doesn't handle duplicate values for different keys so far
#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_ids','service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']]
#Export to Excel
df.to_excel("output.xlsx",sheet_name='Sheet_name_1', index=False)
The above should work but I wasn't really sure on how you wanted to save duplicated records for the same id. Do you look to store them as DRIVE_0, DRIVE_1, DRIVE_2 ?
EDIT:
df could be exported in a different way. Replaced below #export to Excel with the following :
df.to_excel("output.xlsx",sheet_name='Sheet_name_1')
EDIT 2:
with no input data it was hard to see any flows. Corrected the code above with fake data
I think doing the processing Pandas is fine but then I'd write rows directly with openpyxl.
– Charlie Clark
Mar 26 at 16:28
I tried running this and I get: ```` >>> xcel Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'xcel' is not defined >>> >>> writer = pd.ExcelWriter('output.xlsx') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'pd' is not defined >>> df.to_excel(writer,'Sheet1') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'df' is not defined >>> writer.save() File "<stdin>", line 1, in <module> NameError: name 'writer' is not defined >>> ````
– Mathieu Moquin
Mar 26 at 16:52
Just edited the code for exporting
– Sebastien D
Mar 26 at 17:01
@SebastienD it says that df is not defined, which is weird given the fact that you actually define its values. Could it be an error with my Visual Code configuration? I set Environment variables for Python, do I need to do it for each library I add?
– Mathieu Moquin
Mar 26 at 17:10
1
Mathieu, I just ran the code and indeed there were a couple errors. Corrected them, hope it will work. Anyway, do not focus on my approach, please consider what @CharlieClarck suggested which looks interesting
– Sebastien D
Mar 26 at 17:35
add a comment |
Personally I would go with pandas.
import pandas as pd
#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA",inplace=True) ## Replaced nan values with "NO DATA"
unique_ids = df_data.ods_service_ids.unique()
#Storing pd into a list
records_list = df_data.to_dict('records')
keys_to_check = ['service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']
processed =
#Go through unique ids
for key in unique_ids:
processed[key] =
#Get related records
matching_records = [y for y in records_list if y['ods_service_ids'] == key]
#Loop through records
for record in matching_records:
#For each key to check, save in dict if non null
processed[key]['ods_service_ids'] = key
for detail_key in keys_to_check:
if record[detail_key] != "NO DATA" :
processed[key][detail_key] = record[detail_key]
##Note : doesn't handle duplicate values for different keys so far
#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_ids','service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']]
#Export to Excel
df.to_excel("output.xlsx",sheet_name='Sheet_name_1', index=False)
The above should work but I wasn't really sure on how you wanted to save duplicated records for the same id. Do you look to store them as DRIVE_0, DRIVE_1, DRIVE_2 ?
EDIT:
df could be exported in a different way. Replaced below #export to Excel with the following :
df.to_excel("output.xlsx",sheet_name='Sheet_name_1')
EDIT 2:
with no input data it was hard to see any flows. Corrected the code above with fake data
Personally I would go with pandas.
import pandas as pd
#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA",inplace=True) ## Replaced nan values with "NO DATA"
unique_ids = df_data.ods_service_ids.unique()
#Storing pd into a list
records_list = df_data.to_dict('records')
keys_to_check = ['service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']
processed =
#Go through unique ids
for key in unique_ids:
processed[key] =
#Get related records
matching_records = [y for y in records_list if y['ods_service_ids'] == key]
#Loop through records
for record in matching_records:
#For each key to check, save in dict if non null
processed[key]['ods_service_ids'] = key
for detail_key in keys_to_check:
if record[detail_key] != "NO DATA" :
processed[key][detail_key] = record[detail_key]
##Note : doesn't handle duplicate values for different keys so far
#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_ids','service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']]
#Export to Excel
df.to_excel("output.xlsx",sheet_name='Sheet_name_1', index=False)
The above should work but I wasn't really sure on how you wanted to save duplicated records for the same id. Do you look to store them as DRIVE_0, DRIVE_1, DRIVE_2 ?
EDIT:
df could be exported in a different way. Replaced below #export to Excel with the following :
df.to_excel("output.xlsx",sheet_name='Sheet_name_1')
EDIT 2:
with no input data it was hard to see any flows. Corrected the code above with fake data
edited Mar 26 at 17:34
answered Mar 26 at 15:21
Sebastien DSebastien D
2,2072 gold badges8 silver badges29 bronze badges
2,2072 gold badges8 silver badges29 bronze badges
I think doing the processing Pandas is fine but then I'd write rows directly with openpyxl.
– Charlie Clark
Mar 26 at 16:28
I tried running this and I get: ```` >>> xcel Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'xcel' is not defined >>> >>> writer = pd.ExcelWriter('output.xlsx') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'pd' is not defined >>> df.to_excel(writer,'Sheet1') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'df' is not defined >>> writer.save() File "<stdin>", line 1, in <module> NameError: name 'writer' is not defined >>> ````
– Mathieu Moquin
Mar 26 at 16:52
Just edited the code for exporting
– Sebastien D
Mar 26 at 17:01
@SebastienD it says that df is not defined, which is weird given the fact that you actually define its values. Could it be an error with my Visual Code configuration? I set Environment variables for Python, do I need to do it for each library I add?
– Mathieu Moquin
Mar 26 at 17:10
1
Mathieu, I just ran the code and indeed there were a couple errors. Corrected them, hope it will work. Anyway, do not focus on my approach, please consider what @CharlieClarck suggested which looks interesting
– Sebastien D
Mar 26 at 17:35
add a comment |
I think doing the processing Pandas is fine but then I'd write rows directly with openpyxl.
– Charlie Clark
Mar 26 at 16:28
I tried running this and I get: ```` >>> xcel Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'xcel' is not defined >>> >>> writer = pd.ExcelWriter('output.xlsx') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'pd' is not defined >>> df.to_excel(writer,'Sheet1') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'df' is not defined >>> writer.save() File "<stdin>", line 1, in <module> NameError: name 'writer' is not defined >>> ````
– Mathieu Moquin
Mar 26 at 16:52
Just edited the code for exporting
– Sebastien D
Mar 26 at 17:01
@SebastienD it says that df is not defined, which is weird given the fact that you actually define its values. Could it be an error with my Visual Code configuration? I set Environment variables for Python, do I need to do it for each library I add?
– Mathieu Moquin
Mar 26 at 17:10
1
Mathieu, I just ran the code and indeed there were a couple errors. Corrected them, hope it will work. Anyway, do not focus on my approach, please consider what @CharlieClarck suggested which looks interesting
– Sebastien D
Mar 26 at 17:35
I think doing the processing Pandas is fine but then I'd write rows directly with openpyxl.
– Charlie Clark
Mar 26 at 16:28
I think doing the processing Pandas is fine but then I'd write rows directly with openpyxl.
– Charlie Clark
Mar 26 at 16:28
I tried running this and I get: ```` >>> xcel Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'xcel' is not defined >>> >>> writer = pd.ExcelWriter('output.xlsx') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'pd' is not defined >>> df.to_excel(writer,'Sheet1') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'df' is not defined >>> writer.save() File "<stdin>", line 1, in <module> NameError: name 'writer' is not defined >>> ````
– Mathieu Moquin
Mar 26 at 16:52
I tried running this and I get: ```` >>> xcel Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'xcel' is not defined >>> >>> writer = pd.ExcelWriter('output.xlsx') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'pd' is not defined >>> df.to_excel(writer,'Sheet1') Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'df' is not defined >>> writer.save() File "<stdin>", line 1, in <module> NameError: name 'writer' is not defined >>> ````
– Mathieu Moquin
Mar 26 at 16:52
Just edited the code for exporting
– Sebastien D
Mar 26 at 17:01
Just edited the code for exporting
– Sebastien D
Mar 26 at 17:01
@SebastienD it says that df is not defined, which is weird given the fact that you actually define its values. Could it be an error with my Visual Code configuration? I set Environment variables for Python, do I need to do it for each library I add?
– Mathieu Moquin
Mar 26 at 17:10
@SebastienD it says that df is not defined, which is weird given the fact that you actually define its values. Could it be an error with my Visual Code configuration? I set Environment variables for Python, do I need to do it for each library I add?
– Mathieu Moquin
Mar 26 at 17:10
1
1
Mathieu, I just ran the code and indeed there were a couple errors. Corrected them, hope it will work. Anyway, do not focus on my approach, please consider what @CharlieClarck suggested which looks interesting
– Sebastien D
Mar 26 at 17:35
Mathieu, I just ran the code and indeed there were a couple errors. Corrected them, hope it will work. Anyway, do not focus on my approach, please consider what @CharlieClarck suggested which looks interesting
– Sebastien D
Mar 26 at 17:35
add a comment |
To be honest, I think you've managed to get confused by data structures and come up with something far more complicated than you need.
One approach that would suit would be to use Python dictionaries for each service, updating them row by row.
wb = load_workbook("sample.xlsx")
ws = wb.active
objs =
headers = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
for row in ws.iter_rows(min_row=2, values_only=True):
if row[0] not in objs:
obj = key:value for key, value in zip(headers, row)
objs[obj['ods_service_id']] = obj
else:# update dict with non-None values
extra = key:value for key, value in zip(headers[3:], row[3:]) if value != "NULL"
obj.update(extra)
# write to new workbook
wb2 = Workbook()
ws2 = wb2.active
ws2.append(headers)
for row in objs.values(): # do they need sorting?
ws2.append([obj[key] for key in headers])
Note how you can do everything without using counters.
For the most part this runs, but it is telling me that for "header", in zip(header, row), header is not defined. because of that error though, the script fails to write anything in output.xlsx
– Mathieu Moquin
Mar 26 at 19:02
seems the header issue was indeed the S missing. So the code continues running here and I'm getting a keyerror on line 13 : objs[obj['ods_service_id']] = obj
– Mathieu Moquin
Mar 26 at 19:11
What doesobj.keys()give you? Or what's the first value of ´headers`?
– Charlie Clark
Mar 27 at 8:31
add a comment |
To be honest, I think you've managed to get confused by data structures and come up with something far more complicated than you need.
One approach that would suit would be to use Python dictionaries for each service, updating them row by row.
wb = load_workbook("sample.xlsx")
ws = wb.active
objs =
headers = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
for row in ws.iter_rows(min_row=2, values_only=True):
if row[0] not in objs:
obj = key:value for key, value in zip(headers, row)
objs[obj['ods_service_id']] = obj
else:# update dict with non-None values
extra = key:value for key, value in zip(headers[3:], row[3:]) if value != "NULL"
obj.update(extra)
# write to new workbook
wb2 = Workbook()
ws2 = wb2.active
ws2.append(headers)
for row in objs.values(): # do they need sorting?
ws2.append([obj[key] for key in headers])
Note how you can do everything without using counters.
For the most part this runs, but it is telling me that for "header", in zip(header, row), header is not defined. because of that error though, the script fails to write anything in output.xlsx
– Mathieu Moquin
Mar 26 at 19:02
seems the header issue was indeed the S missing. So the code continues running here and I'm getting a keyerror on line 13 : objs[obj['ods_service_id']] = obj
– Mathieu Moquin
Mar 26 at 19:11
What doesobj.keys()give you? Or what's the first value of ´headers`?
– Charlie Clark
Mar 27 at 8:31
add a comment |
To be honest, I think you've managed to get confused by data structures and come up with something far more complicated than you need.
One approach that would suit would be to use Python dictionaries for each service, updating them row by row.
wb = load_workbook("sample.xlsx")
ws = wb.active
objs =
headers = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
for row in ws.iter_rows(min_row=2, values_only=True):
if row[0] not in objs:
obj = key:value for key, value in zip(headers, row)
objs[obj['ods_service_id']] = obj
else:# update dict with non-None values
extra = key:value for key, value in zip(headers[3:], row[3:]) if value != "NULL"
obj.update(extra)
# write to new workbook
wb2 = Workbook()
ws2 = wb2.active
ws2.append(headers)
for row in objs.values(): # do they need sorting?
ws2.append([obj[key] for key in headers])
Note how you can do everything without using counters.
To be honest, I think you've managed to get confused by data structures and come up with something far more complicated than you need.
One approach that would suit would be to use Python dictionaries for each service, updating them row by row.
wb = load_workbook("sample.xlsx")
ws = wb.active
objs =
headers = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
for row in ws.iter_rows(min_row=2, values_only=True):
if row[0] not in objs:
obj = key:value for key, value in zip(headers, row)
objs[obj['ods_service_id']] = obj
else:# update dict with non-None values
extra = key:value for key, value in zip(headers[3:], row[3:]) if value != "NULL"
obj.update(extra)
# write to new workbook
wb2 = Workbook()
ws2 = wb2.active
ws2.append(headers)
for row in objs.values(): # do they need sorting?
ws2.append([obj[key] for key in headers])
Note how you can do everything without using counters.
edited Mar 27 at 8:33
answered Mar 26 at 16:44
Charlie ClarkCharlie Clark
11k2 gold badges26 silver badges37 bronze badges
11k2 gold badges26 silver badges37 bronze badges
For the most part this runs, but it is telling me that for "header", in zip(header, row), header is not defined. because of that error though, the script fails to write anything in output.xlsx
– Mathieu Moquin
Mar 26 at 19:02
seems the header issue was indeed the S missing. So the code continues running here and I'm getting a keyerror on line 13 : objs[obj['ods_service_id']] = obj
– Mathieu Moquin
Mar 26 at 19:11
What doesobj.keys()give you? Or what's the first value of ´headers`?
– Charlie Clark
Mar 27 at 8:31
add a comment |
For the most part this runs, but it is telling me that for "header", in zip(header, row), header is not defined. because of that error though, the script fails to write anything in output.xlsx
– Mathieu Moquin
Mar 26 at 19:02
seems the header issue was indeed the S missing. So the code continues running here and I'm getting a keyerror on line 13 : objs[obj['ods_service_id']] = obj
– Mathieu Moquin
Mar 26 at 19:11
What doesobj.keys()give you? Or what's the first value of ´headers`?
– Charlie Clark
Mar 27 at 8:31
For the most part this runs, but it is telling me that for "header", in zip(header, row), header is not defined. because of that error though, the script fails to write anything in output.xlsx
– Mathieu Moquin
Mar 26 at 19:02
For the most part this runs, but it is telling me that for "header", in zip(header, row), header is not defined. because of that error though, the script fails to write anything in output.xlsx
– Mathieu Moquin
Mar 26 at 19:02
seems the header issue was indeed the S missing. So the code continues running here and I'm getting a keyerror on line 13 : objs[obj['ods_service_id']] = obj
– Mathieu Moquin
Mar 26 at 19:11
seems the header issue was indeed the S missing. So the code continues running here and I'm getting a keyerror on line 13 : objs[obj['ods_service_id']] = obj
– Mathieu Moquin
Mar 26 at 19:11
What does
obj.keys() give you? Or what's the first value of ´headers`?– Charlie Clark
Mar 27 at 8:31
What does
obj.keys() give you? Or what's the first value of ´headers`?– Charlie Clark
Mar 27 at 8:31
add a comment |
I would suggest using the pandas library for this and then you can easily do any kind of transformation.
import pandas as pd
exceldata = pd.read_excel('tmp.xlsx', index_col=0)
print(exceldata)
You can easily remove null/na value or you can replace it and export it into excel format.
Reference For Help:
Reading Excel
Drop Na Value
Replace NA Value
add a comment |
I would suggest using the pandas library for this and then you can easily do any kind of transformation.
import pandas as pd
exceldata = pd.read_excel('tmp.xlsx', index_col=0)
print(exceldata)
You can easily remove null/na value or you can replace it and export it into excel format.
Reference For Help:
Reading Excel
Drop Na Value
Replace NA Value
add a comment |
I would suggest using the pandas library for this and then you can easily do any kind of transformation.
import pandas as pd
exceldata = pd.read_excel('tmp.xlsx', index_col=0)
print(exceldata)
You can easily remove null/na value or you can replace it and export it into excel format.
Reference For Help:
Reading Excel
Drop Na Value
Replace NA Value
I would suggest using the pandas library for this and then you can easily do any kind of transformation.
import pandas as pd
exceldata = pd.read_excel('tmp.xlsx', index_col=0)
print(exceldata)
You can easily remove null/na value or you can replace it and export it into excel format.
Reference For Help:
Reading Excel
Drop Na Value
Replace NA Value
edited Mar 26 at 15:31
baduker
1,2224 gold badges11 silver badges21 bronze badges
1,2224 gold badges11 silver badges21 bronze badges
answered Mar 26 at 14:59
Prakash ChoudharyPrakash Choudhary
849 bronze badges
849 bronze badges
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55359200%2fhow-can-i-concatenate-multiple-rows-of-excel-data-into-one%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
It's not entirely clear what you want to do. Please describe what you want to do with individual rows.
– Charlie Clark
Mar 26 at 14:40