Count number of episodes that has a hash


I would like some help with a SQL query. I'm using SQLAlchemy, but I don't even understand how to express the query in raw SQL.



I'm phashing (perceptual hashing) every frame of all videos in a season and adding the hashes to the DB.
My goal is to find the intros in the videos by checking for the same recurring frames across the videos.



My table looks like:



|id|tvdbid|hash |season|episode|offset
---+------+-----+------+-------+------
|1 |1337  |a1a1a|1     |1      |42
|2 |1337  |a1a1a|1     |1      |68
|3 |1337  |a1a1b|1     |2      |92
|4 |1337  |a1a1a|1     |2      |116
|5 |1337  |a1a1a|1     |3      |42
|6 |1337  |a1a1a|1     |3      |42


The result I'm looking for is a list of rows where the hash matches in n episodes (it can only match one episode at a time) and has the same tvdbid and season number.



At the moment I'm doing:



import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Hashes(Base):
    __tablename__ = 'hashes'

    id = sa.Column(sa.Integer, primary_key=True)
    season = sa.Column(sa.Integer)
    episode = sa.Column(sa.Integer)
    tvdbid = sa.Column(sa.Text(length=100))
    hash = sa.Column(sa.Text(length=16))
    offset = sa.Column(sa.Integer)

h = Hashes.__table__

async def some_web_request(request):
    # I need to use raw SQL or Core, as the DB library requires it.
    # My CLI tool uses a sync method to insert the rows into the DB.
    query = h.select().where(sa.and_(
        h.c.tvdbid == request.path_params['tvdbid'],
        h.c.season == request.path_params['season'])).group_by('hash', 'episode')
    result = await DB.fetch_all(query)
    return result


This seems to work just fine, but it isn't exactly what I want, so I have to clean the result up with Python, and that will not be viable in the long run. The table will have between 5 and 500 million rows.



My current workaround:



from collections import defaultdict

def clean_up(result):
    # Map each hash to the set of episodes it appears in.
    d = defaultdict(set)
    for row in result:
        d[row.hash].add(row.episode)

    final_result = []
    for k, v in d.items():
        if len(v) > 4:  # 4 is the number of episodes.
            final_result.append(k)

    return final_result



The desired output should have been:



|id|tvdbid|hash |season|episode|offset
---+------+-----+------+-------+------
|1 |1337  |a1a1a|1     |1      |42



as the hash needs to be present in at least 50% of the episodes.
Or it could simply be a1a1a; I don't really need the entire rows now. (They will be needed later to check for recaps etc.)
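
For illustration, the "present in at least 50% of the episodes" requirement could probably be expressed in raw SQL with GROUP BY and HAVING COUNT(DISTINCT episode). The sketch below is only an assumption about what is wanted: it uses an in-memory SQLite database so it can run on its own, the function name hashes_in_at_least_half is hypothetical, and it returns just the hash and its episode count rather than full rows. It has not been tried against a table with millions of rows.

```python
import sqlite3

# In-memory SQLite database, used here only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hashes (
        id INTEGER PRIMARY KEY,
        tvdbid TEXT,
        hash TEXT,
        season INTEGER,
        episode INTEGER,
        "offset" INTEGER
    )
""")

# The sample rows from the table in the question.
rows = [
    (1, "1337", "a1a1a", 1, 1, 42),
    (2, "1337", "a1a1a", 1, 1, 68),
    (3, "1337", "a1a1b", 1, 2, 92),
    (4, "1337", "a1a1a", 1, 2, 116),
    (5, "1337", "a1a1a", 1, 3, 42),
    (6, "1337", "a1a1a", 1, 3, 42),
]
conn.executemany("INSERT INTO hashes VALUES (?, ?, ?, ?, ?, ?)", rows)

QUERY = """
    SELECT hash, COUNT(DISTINCT episode) AS episodes
    FROM hashes
    WHERE tvdbid = :tvdbid AND season = :season
    GROUP BY hash
    -- Keep hashes seen in at least half of the distinct episodes of the season.
    HAVING COUNT(DISTINCT episode) * 2 >= (
        SELECT COUNT(DISTINCT episode)
        FROM hashes
        WHERE tvdbid = :tvdbid AND season = :season
    )
    ORDER BY hash
"""

def hashes_in_at_least_half(conn, tvdbid, season):
    """Return (hash, episode_count) for hashes found in >= 50% of episodes."""
    params = {"tvdbid": tvdbid, "season": season}
    return conn.execute(QUERY, params).fetchall()

print(hashes_in_at_least_half(conn, "1337", 1))  # [('a1a1a', 3)]
```

COUNT(DISTINCT episode) counts each episode once per hash, which mirrors what the defaultdict(set) in the workaround does. If a fixed n is wanted instead of 50%, the subquery could be replaced by a bound parameter.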










python sqlalchemy














asked Mar 22 at 14:58 by steffen fredriksen
edited Mar 22 at 15:35 by steffen fredriksen












  • I'm sorry if I didn't understand your data well, but wouldn't it be enough to just SELECT DISTINCT? Or rather make a SELECT DISTINCT out of your SELECT results?

    – reportgunner
    Mar 22 at 15:00

  • Could you please update the question by adding some more rows of sample data and the desired result? That would help in understanding the problem better.

    – Haleemur Ali
    Mar 22 at 15:12

  • I have updated what I'm trying to do, with more sample data and the desired output. Thanks!

    – steffen fredriksen
    Mar 22 at 15:36
















