Webcoderz twint update#85

Closed
webcoderz wants to merge 29 commits into master from webcoderz-twint-update

Conversation

@webcoderz
Collaborator

No description provided.

Comment thread modules/TwintPool.py Outdated
neo4j_df['user_followers_count'] = None
neo4j_df['user_friends_count'] = None
neo4j_df['date'] = df['date']
# neo4j_df['user_created_at'] = None
Contributor

@webcoderz afaict, and following the naming convention, neo4j_df['user_created_at'] is the user account creation date and neo4j_df['created_at'] is the tweet creation date. So instead of adding neo4j_df['date'], we should put it in neo4j_df['user_created_at'] or neo4j_df['created_at'].

Collaborator Author

Yeah, we don't really need it. I made this commit before I noticed the bug with neo4j_df["created_at"]; they're both the same now. In twint, when you get the user info, the user creation date would be df["join_datetime"].
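For readers following the thread, here is a minimal sketch of the naming convention under discussion. It is not the code in this PR. It assumes the twint dataframe exposes the tweet time as `date` and, only when user info was fetched, the account creation time as `join_datetime`. The helper name `to_neo4j_timestamps` is hypothetical.

```python
import pandas as pd


def to_neo4j_timestamps(df: pd.DataFrame) -> pd.DataFrame:
    """Map twint-style columns onto the created_at / user_created_at convention.

    Hypothetical helper for illustration only.
    """
    out = pd.DataFrame(index=df.index)
    # Tweet creation time: taken from 'date' (assumed column name, per the thread).
    out['created_at'] = (
        pd.to_datetime(df['date']) if 'date' in df.columns else pd.NaT
    )
    # Account creation time: 'join_datetime' is assumed to exist only after
    # user info has been fetched; otherwise the column is left as NaT.
    out['user_created_at'] = (
        pd.to_datetime(df['join_datetime'])
        if 'join_datetime' in df.columns
        else pd.NaT
    )
    return out
```

Keeping the two columns separate avoids a third `date` column whose meaning overlaps with `created_at`.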

  'user_created_at': pd.to_datetime(
-     df['user_created_at']) if 'user_created_at' in row else None,
+     row['join_datetime']) if 'join_datetime' in row else None,
  'user_profile_image_url': row[
Contributor

join_datetime seems to be in https://github.com/twintproject/twint/blob/master/twint/storage/panda.py#L125, but the rest of the fields here aren't... surprised this works? Guessing manual tests were fine, so I'll just note it here in case we need to revisit.

Collaborator Author

I removed it this morning; I forgot I did account enrichments at a different stage in the transformation when I committed this.

Collaborator Author

Going to do a round trip into neo4j today on everything.

lmeyerov previously approved these changes Dec 28, 2020
Contributor

@lmeyerov lmeyerov left a comment

Has this been tested with both tweet and account timestamps, etc., all the way through a neo4j round trip?
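One rough way to frame such a check, as a sketch only: the code below uses ISO 8601 strings as a stand-in for the store and makes no real neo4j driver calls. The function names and the record layout are hypothetical, so an actual test would need to swap `roundtrip` for a real write followed by a read.

```python
from datetime import datetime, timezone

TIMESTAMP_FIELDS = ('created_at', 'user_created_at')


def roundtrip(record: dict) -> dict:
    """Stand-in for write-then-read: datetimes go out as ISO strings and come back parsed."""
    stored = {
        k: (v.isoformat() if isinstance(v, datetime) else v)
        for k, v in record.items()
    }
    return {
        k: (datetime.fromisoformat(v)
            if k in TIMESTAMP_FIELDS and v is not None else v)
        for k, v in stored.items()
    }


def timestamps_survive(record: dict) -> bool:
    """True if both tweet and account timestamps are unchanged after the round trip."""
    back = roundtrip(record)
    return all(back.get(k) == record.get(k) for k in TIMESTAMP_FIELDS)
```

A missing account timestamp (`None`) is treated as valid here, since account enrichment may happen at a different stage.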

@lmeyerov lmeyerov closed this Sep 23, 2022
@lmeyerov lmeyerov deleted the webcoderz-twint-update branch September 23, 2022 02:57

@ghost ghost left a comment

@webcoderz I'm just now seeing this. Definitely got some skills.
