Closed
Changes from 1 commit
183 commits
c83e06f
Add files via upload
webcoderz Sep 4, 2020
41bf592
Update TwintPool.py
webcoderz Sep 4, 2020
ddba27c
delete twint source
webcoderz Sep 4, 2020
e164212
comment out testing func left in
webcoderz Sep 4, 2020
cd18fb2
adding acct write
webcoderz Sep 4, 2020
099da1f
trying to fix tweet type inference
webcoderz Sep 4, 2020
247cfa0
trying to fix tweet type inference
webcoderz Sep 4, 2020
fdff8af
get info fix
webcoderz Sep 4, 2020
4ebacb6
get info fix
webcoderz Sep 4, 2020
f5499be
changed hydration status to partial
webcoderz Sep 4, 2020
2692d17
enrich usr info fx change can enrich any df with df["user_screen_name…
webcoderz Sep 5, 2020
3a827e7
enrich usr info fx change can enrich any df with df["user_screen_name…
webcoderz Sep 5, 2020
575096a
enrich usr info fx change can enrich any df with df["user_screen_name…
webcoderz Sep 5, 2020
1f1d8f7
enrich usr info fx change can enrich any df with df["user_screen_name…
webcoderz Sep 5, 2020
40552bc
enrich usr info fx change can enrich any df with df["user_screen_name…
webcoderz Sep 5, 2020
7dcf8fc
enrich usr info fx change can enrich any df with df["user_screen_name…
webcoderz Sep 5, 2020
7462633
enrich usr info fx change can enrich any df with df["user_screen_name…
webcoderz Sep 5, 2020
b1396a4
drop dupes b4 user acct enrichment
webcoderz Sep 5, 2020
984b556
drop dupes b4 user acct enrichment
webcoderz Sep 5, 2020
f72c1db
drop dupes b4 fix user acct enrichment and before write
webcoderz Sep 5, 2020
61bcf68
drop dupes b4 fix user acct enrichment and before write
webcoderz Sep 5, 2020
4577311
usr info enrichment fx improvement
webcoderz Sep 5, 2020
8f2eead
usr info enrichment fx improvement
webcoderz Sep 5, 2020
7b7d4a8
usr info enrichment fx improvement
webcoderz Sep 5, 2020
879b4d5
usr info enrichment fx improvement
webcoderz Sep 5, 2020
e27ec6d
usr info enrichment fx improvement
webcoderz Sep 5, 2020
16b07b1
twint egg to be installed via pip instead of twint directly from pip …
webcoderz Sep 5, 2020
ac7e91f
removed twint get just use fh.search_time_range
webcoderz Sep 6, 2020
b41dd05
removed twint get just use fh.search_time_range
webcoderz Sep 6, 2020
43f7736
removed twint get just use fh.search_time_range
webcoderz Sep 6, 2020
daf0fb8
added scale and changed version of nonrapids compose to 2
webcoderz Sep 6, 2020
0634538
added scale and changed version of nonrapids compose to 2
webcoderz Sep 6, 2020
ec37636
added scale and changed version of nonrapids compose to 2
webcoderz Sep 6, 2020
e755491
added replicas and changed version of nonrapids compose to 3
webcoderz Sep 6, 2020
14f4cc9
uncommenting out batch to improve on periodic write frequency
webcoderz Sep 6, 2020
568c82a
recommenting out batch to improve on periodic write frequency
webcoderz Sep 6, 2020
48735df
Merge branch 'master' into webcoderz-twint-patch-1
webcoderz Sep 7, 2020
bc28029
recommenting out batch to improve on periodic write frequency
webcoderz Sep 7, 2020
79e0f3c
removed container name from prefect agent for scaling
webcoderz Sep 7, 2020
464acd8
added jobs container to nonrapids-docker-compose.yml
webcoderz Sep 7, 2020
6c71492
added jobs container to nonrapids-docker-compose.yml
webcoderz Sep 7, 2020
2889dc3
added jobs container to nonrapids-docker-compose.yml
webcoderz Sep 7, 2020
5416a68
added jobs container to nonrapids-docker-compose.yml
webcoderz Sep 7, 2020
2ff4507
added jobs container to nonrapids-docker-compose.yml
webcoderz Sep 7, 2020
6e4ffee
added jobs container to nonrapids-docker-compose.yml
webcoderz Sep 7, 2020
5ae817f
refining jobs container
webcoderz Sep 7, 2020
d55e7e0
refining jobs container
webcoderz Sep 7, 2020
04e34bd
refining jobs container
webcoderz Sep 7, 2020
821d73d
fixing twintpool import
webcoderz Sep 7, 2020
aa2c6b2
fixing twintpool import
webcoderz Sep 7, 2020
80b7b0b
fixing jobs container import
webcoderz Sep 7, 2020
7ea96dc
logger type error fix
webcoderz Sep 7, 2020
6fe89aa
logger type error fix
webcoderz Sep 7, 2020
e888f63
added datastream-docker-compose.yml
webcoderz Sep 7, 2020
9f38550
command to kick job off on compose
webcoderz Sep 7, 2020
d2cb695
command to kick job off on compose
webcoderz Sep 7, 2020
55b2cab
fix fh debug
webcoderz Sep 7, 2020
92fe851
fix fh debug
webcoderz Sep 7, 2020
eb6af3b
change dir for check hydrate creds
webcoderz Sep 7, 2020
559ac27
fixed nonrapids compose template
webcoderz Sep 7, 2020
6743fa1
async loop not needed on job in container
webcoderz Sep 7, 2020
cb81014
changed agent job run interval to 10 seconds
webcoderz Sep 7, 2020
1d1ecb4
renamed job dockerfile to datastream-Dockerfile for uniformity
webcoderz Sep 7, 2020
c6faf8f
changing logging.info to logging.debug to minimize log production
webcoderz Sep 8, 2020
f2d4eb4
added relationship writer
webcoderz Sep 8, 2020
056a144
added relationship writer
webcoderz Sep 8, 2020
3837d77
fix
webcoderz Sep 8, 2020
4e15e20
fix
webcoderz Sep 8, 2020
7161af1
writer fix- debug still says its writing relationships yet, not regis…
webcoderz Sep 8, 2020
54e375e
writer fix- debug still says its writing relationships yet, not regis…
webcoderz Sep 8, 2020
d49b6d2
writer fix- debug still says its writing relationships yet, not regis…
webcoderz Sep 9, 2020
c065688
added tor node into agent
webcoderz Sep 9, 2020
2feb3c3
twint import fix
webcoderz Sep 9, 2020
ac5c011
datastream tor fix to not expose 9050 outside container
webcoderz Sep 9, 2020
d99a3ea
fix
webcoderz Sep 9, 2020
d78a942
change directory check hydrate checks for creds
webcoderz Sep 9, 2020
c8784ba
removed tor dir just put dockerfile with rest
webcoderz Sep 9, 2020
077e8da
removed tor dir just put dockerfile with rest
webcoderz Sep 9, 2020
ef66b13
fixing unused continuation in dockerfile
webcoderz Sep 9, 2020
3269790
adding tor to datastream compose
webcoderz Sep 9, 2020
e00cf1a
added iptables bash script to route all container traffic through tor
webcoderz Sep 9, 2020
1ffab43
tor container name to compose
webcoderz Sep 9, 2020
44053c2
tor container name to compose
webcoderz Sep 9, 2020
b690c12
tor container name to compose
webcoderz Sep 9, 2020
7bf36f9
tor container name to tor compose
webcoderz Sep 9, 2020
fb1adb8
tor container name to tor compose
webcoderz Sep 9, 2020
64c6110
tor container fix for iptables and tor proxy
webcoderz Sep 9, 2020
6e09494
ra
webcoderz Sep 9, 2020
299bd92
remove restart policy for data stream compose
webcoderz Sep 9, 2020
a09b270
adding info logs into writer
webcoderz Sep 10, 2020
ebe46c4
reset_tables.sh
webcoderz Sep 10, 2020
eafdf9a
reset_tables.sh
webcoderz Sep 10, 2020
60edc21
reset_tables.sh
webcoderz Sep 10, 2020
2ea3ad7
writer related logs to info
webcoderz Sep 10, 2020
ab0cb04
writer related fixes
webcoderz Sep 10, 2020
d28eb7d
writer related df cleanups
webcoderz Sep 10, 2020
68dea2e
writer related df cleanups
webcoderz Sep 10, 2020
61a8d1b
writer related df cleanups
webcoderz Sep 10, 2020
44b5211
fixed relationship writer, using old twarc writer.
webcoderz Sep 10, 2020
f813b95
, using old twarc writer.
webcoderz Sep 10, 2020
993906b
, using old twarc writer.
webcoderz Sep 10, 2020
a44ffb9
added logs to writer
webcoderz Sep 10, 2020
6e48b12
added logs to writer
webcoderz Sep 10, 2020
51cd8e6
added logs to writer
webcoderz Sep 10, 2020
d05c3be
writer improvement
webcoderz Sep 10, 2020
41a4434
writer improvement
webcoderz Sep 11, 2020
6583550
writer improvement
webcoderz Sep 11, 2020
d209e27
writer improvement
webcoderz Sep 11, 2020
97c6ca6
writer improvements cleanup
webcoderz Sep 11, 2020
49b698c
added clocks to enrichment fx
webcoderz Sep 11, 2020
67c55e1
aiohttp_socks
webcoderz Sep 12, 2020
6abbf07
aiohttp_socks
webcoderz Sep 12, 2020
1ed1d70
removing debug forloop in twintdf enrichment fx
webcoderz Sep 12, 2020
1ff9b37
removing debug forloop in twintdf enrichment fx
webcoderz Sep 12, 2020
abe83a3
removing debug forloop in twintdf enrichment fx
webcoderz Sep 12, 2020
8d5ed3a
removing debug forloop in twintdf enrichment fx
webcoderz Sep 12, 2020
c980ab8
increased default limit to 1000
webcoderz Sep 12, 2020
eed898d
added clocks to writer and each enrichment from twint to neo
webcoderz Sep 12, 2020
b180bdd
added clocks to twintpool and each enrichment from twint to neo
webcoderz Sep 12, 2020
ebbb353
added clocks to twintpool and each enrichment from twint to neo
webcoderz Sep 12, 2020
8c484d7
commented out acct enrichment
webcoderz Sep 13, 2020
1773635
commented out acct enrichment
webcoderz Sep 13, 2020
a07e6a2
logger to info
webcoderz Sep 13, 2020
ccb3864
logger to info
webcoderz Sep 13, 2020
fa17653
logger to info
webcoderz Sep 13, 2020
6651382
logger to info
webcoderz Sep 13, 2020
7930fc8
logger to info
webcoderz Sep 13, 2020
1c5f7e6
set parameters hydrated status to PARTIAL
webcoderz Sep 13, 2020
52983b3
fix(twint): container and tor setup
lmeyerov Sep 16, 2020
ae2eb1d
adding is_tor flag in and setting in twintpool in checkhydrate proper…
webcoderz Sep 16, 2020
ff45ea0
adding is_tor flag in and setting in twintpool in checkhydrate proper…
webcoderz Sep 16, 2020
3c279f1
adding is_tor flag in and setting in twintpool in checkhydrate proper…
webcoderz Sep 16, 2020
10b241e
adding is_tor flag in and setting in twintpool in checkhydrate proper…
webcoderz Sep 16, 2020
14017fd
updating job
webcoderz Sep 16, 2020
3d7c969
updating job
webcoderz Sep 16, 2020
43b02d9
updating job
webcoderz Sep 16, 2020
9230edc
updating job
webcoderz Sep 16, 2020
88ed1ce
setting up datastream-Dockerfile for twitterscraper library
webcoderz Sep 24, 2020
06e322c
setting up datastream-Dockerfile for twitterscraper library
webcoderz Sep 24, 2020
9580d40
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
c36852b
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
a8ac909
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
5ecbedd
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
df9bd0b
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
2f84bcb
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
047430e
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
59d512b
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
a282d8d
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
540efe2
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
5ea76b8
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
58a66c7
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
7461321
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
332a4d9
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
431af04
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
07a7181
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
9d25744
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
c1e0eb9
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
a6983a4
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
10e600b
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
fc18003
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
0bd26b6
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
47297c5
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
84060b6
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
fad3bce
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 24, 2020
15463b4
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 25, 2020
dec74eb
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 25, 2020
631eb57
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 25, 2020
1aa8191
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 25, 2020
580a697
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 25, 2020
779cc99
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 25, 2020
b572c8e
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 25, 2020
0f0290f
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 25, 2020
a292cfe
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 25, 2020
b6a8ef3
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 25, 2020
2fed016
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 25, 2020
e862eef
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 25, 2020
b8e9757
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 25, 2020
39bc1e9
setting up datastream-Dockerfile for twitterscraper library w/ geckod…
webcoderz Sep 25, 2020
badca9f
ublock extension
webcoderz Sep 26, 2020
b03e85b
ublock extension
webcoderz Sep 26, 2020
b269256
ublock extension
webcoderz Sep 26, 2020
2fa3400
fix
webcoderz Sep 26, 2020
e4ca790
fix
webcoderz Oct 2, 2020
removed twint get just use fh.search_time_range
webcoderz committed Sep 6, 2020
commit ac7e91ff4669e121f49de7ce5b59fb8e061dbbd5
5 changes: 3 additions & 2 deletions modules/FirehoseJob.py
@@ -717,10 +717,11 @@ def search_time_range(self,
         logger.debug('hits %s to %s: %s', t0, t1, len(df))
         if self.save_to_neo:
             logger.debug('writing to neo4j')
-            df2 = Neo4jDataAccess(self.neo4j_creds).save_twintdf_to_neo(df, job_name, job_id)
+            df = tp.check_hydrate(df)
+            res = Neo4jDataAccess(self.neo4j_creds).save_twintdf_to_neo(df, job_name, job_id)
             # df3 = Neo4jDataAccess(self.debug, self.neo4j_creds).save_df_to_graph(df2, job_name)
             logger.debug('wrote to neo4j, # ', len(df2))
-            yield df2
+            yield res
         else:
             yield df
         logger.debug('done')
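A note on the `logger.debug('wrote to neo4j, # ', len(df2))` line: going by the 3 additions and 2 deletions reported for this file, it appears to be left unchanged. With the `df2` assignment removed, `len(df2)` would fail unless `df2` is bound elsewhere in the function, and the message has no `%s` placeholder for its argument. A minimal sketch of how the line could read against `res`, assuming `res` supports `len()`; the logger name and the stand-in list are hypothetical and not part of this commit:

```python
import io
import logging

# Hypothetical logger writing to an in-memory stream so the output can be inspected.
stream = io.StringIO()
logger = logging.getLogger("firehose_sketch")
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)

# Stand-in for the value returned by save_twintdf_to_neo (assumed to support len()).
res = ["row-a", "row-b", "row-c"]

# One %s placeholder per argument; logging formats the message only when
# the record is actually emitted.
logger.debug("wrote to neo4j, # %s", len(res))

logged = stream.getvalue().strip()
```

Whether `res` is a DataFrame, a count, or something else is not visible in this hunk, so the `len(res)` call is itself an assumption.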
17 changes: 1 addition & 16 deletions modules/TwintPool.py
@@ -68,22 +68,7 @@ def _get_timeline(self, username, limit):
         tweets_df = twint.storage.panda.Tweets_df
         return tweets_df
 
-    def twint_get(self, Search, Since, Until, job_name, Limit, **kwargs):
-        from .FirehoseJob import FirehoseJob
-        neo4j_creds = None
-        with open('../neo4jcreds.json') as json_file:
-            neo4j_creds = json.load(json_file)
-        fh = FirehoseJob(neo4j_creds=neo4j_creds, PARQUET_SAMPLE_RATE_TIME_S=30, save_to_neo=False, writers={})
-        dfs = []
-        for df in fh.search_time_range(
-                Search=Search,
-                Since=Since,
-                Until=Until,
-                job_name=job_name,
-                Limit=Limit):
-            dfs.append(df)
-        df = pd.concat(dfs)
-        return df
 
+
     def _get_user_info(self, username):
         self.config.User_full = True
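With `twint_get` removed, callers that relied on it would need to drain `fh.search_time_range` themselves. The sketch below shows one way to do that, assuming the generator yields one batch per iteration as the removed loop suggests. `collect` and `fake_search_time_range` are hypothetical names for illustration; the stand-in yields lists of dicts so the example runs without pandas or Neo4j credentials.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def collect(batches: Iterable[T], combine: Callable[[List[T]], T]) -> T:
    """Drain a generator of batches and merge them with `combine`."""
    parts = list(batches)
    if not parts:
        # Guard the empty case explicitly instead of passing an empty list on.
        raise ValueError("search yielded no batches")
    return combine(parts)


def fake_search_time_range(Search, Since, Until, job_name, Limit):
    """Hypothetical stand-in for FirehoseJob.search_time_range."""
    for window in (Since, Until):
        yield [{"window": window, "query": Search, "job": job_name}]


merged = collect(
    fake_search_time_range(
        Search="covid",
        Since="2020-09-01",
        Until="2020-09-02",
        job_name="demo",
        Limit=10,
    ),
    # Flatten the lists of rows yielded by the stand-in.
    combine=lambda parts: [row for part in parts for row in part],
)
```

With real DataFrames, `combine=pd.concat` would play the same role as the `pd.concat(dfs)` call in the removed method. The explicit empty check matters there because `pd.concat` raises a `ValueError` when given an empty list.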