Sentiment Analysis of Data Science Job Listings on LinkedIn
Southern Methodist University
By Barry Daemi
Updated: May 23, 2023
1. Proof of Concept
With the dawn of applicant tracking systems (ATS) in the hiring processes of so many large U.S. firms, it has become important for an applicant to understand how an ATS functions in the hiring process, and how best to sell themselves both to this complex HR software and to the human element behind it, the human resources team. Long gone are the days of simple resumes and human communication through vetting/screening interviews; automated vetting and text sentiment analysis from modern ATS arrived in the mid-2010s and have only become more popular among Fortune 500 companies, with more limited adoption by mid-size and small U.S. firms due to cost limitations.
Being of self-interest, this project is aimed at discerning the keywords needed to sell oneself to an ATS to land a data science role at a large U.S. firm. The data were collected from data science job listings on LinkedIn; to scrape data from a LinkedIn job listing, e.g. the role description, the Python package Beautiful Soup [1] was used to extract the role description from the scraped HTML of the page. To conduct the text sentiment analysis, the Natural Language Toolkit [2] was used for both the formatting and the analysis itself. In total, the following packages were utilized: nltk (the Natural Language Toolkit), pprint, bs4 (Beautiful Soup), re, requests, numpy, pandas, and matplotlib.pyplot.
import nltk
from pprint import pprint
import requests
from bs4 import BeautifulSoup
import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
The following script generates the soup, in other words a parsed version of the scraped HTML page; from this, one can extract the role description. Note that <Response [200]> is printed to the console when requests successfully connects to the webpage and retrieves its HTML contents.
url='https://www.linkedin.com/jobs/view/3422401649/?eBP=CwEAAAGIKwjYj_vQEnYbvBBn2V6wiKWn9ClN6m9vyujQQerrMecjbzIiAGBMoQtO075kD9qxVfPYdICvZOFJH8TgWvs_pyK5rK3IxlOdjyOkgdw6i6Ijaq5kHN9XjUlxkeIkorciw9wzIbpf7MDORrpc_oIbDCGYzj_jqY5BqT-AXYpZOdqEBqUyUzGwPKOgpvf9wZIoWneDJayglEX89z3v2ubRTxuRPdvUonTrm0y7Zg-5HJvy7M-wCjHIBolo8FSjl2TrpUqlE0XhZllwshCwuMrZfT8zhxfFoVQFd_3oucfQ8m3mWRZo_5beZ7eqckh17GvZYyuV5KBgKzjlrNLV6m1HJyUFHbTbixkdna3Nw4fG5TrALIbeQo8XtMyPE4TaIwC6TDr3RUqJ&recommendedFlavor=IN_NETWORK&refId=2oxC6zYa%2Fj%2F0ApGKWLNEbA%3D%3D&trackingId=1cPJJhWY2gdGWL9BkbjRig%3D%3D&trk=flagship3_search_srp_jobs';
response = requests.get(url)
print(response)
soup = BeautifulSoup(response.content, 'html.parser')
<Response [200]>
HTML (HyperText Markup Language) is a markup language, not a programming language: it is tag-based, where tags, delimited by '<' and '>', define the structure and presentation of a page. These HTML tags can be used to narrow the scope of scraping, and their class/id attributes can narrow the scraping further, down to a specific piece of text. To accomplish this, one uses .find() from BeautifulSoup, passing the specific HTML tag and class/id as arguments. There is a significant dilemma with the resulting output, however: it still contains HTML tags. These tags need to be removed before any text sentiment analysis is conducted; if not removed, these frivolous tokens will skew the result of the sentiment analysis, which is an undesired outcome.
The following script extracts the job description with find(), then strips the HTML tags from the string with a regular expression; the result was printed to the console.
d = soup.find('div', class_="description__text description__text--rich")
CLEANR = re.compile('<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});')

def cleanhtml(raw_html):
    cleantext = re.sub(CLEANR, '', raw_html)
    return cleantext

text = cleanhtml(str(d))
text
'\n\n\n Job Id: 22598145The Data Science Lead Analyst is a strategic professional who stays abreast of developments within own field and contributes to directional strategy by considering their application in own job and the business. Recognized technical authority for an area within the business. Requires basic commercial awareness. There are typically multiple people within the business that provide the same level of subject matter expertise. Developed communication and diplomacy skills are required in order to guide, influence and convince others, in particular colleagues in other areas and occasional external customers. Significant impact on the area through complex deliverables. Provides advice and counsel related to the technology or operations of the business. Work impacts an entire area, which eventually affects the overall performance and effectiveness of the sub-function/job family.Responsibilities:Conducts strategic data analysis, identifies insights and implications and make strategic recommendations, develops data displays that clearly communicate complex analysis.Mines and analyzes data from various banking platforms to drive optimization and improve data quality.Deliver analytics initiatives to address business problems with the ability to determine data required, assess time effort required and establish a project plan.Consults with business clients to identify system functional specifications. 
Applies comprehensive understanding of how multiple areas collectively integrate to contribute towards achieving business goals.Consults with users and clients to solve complex system issues/problems through in-depth evaluation of business processes, systems and industry standards; recommends solutions.Leads system change process from requirements through implementation; provides user and operational support of application to business users.Formulates and defines systems scope and goals for complex projects through research and fact-finding combined with an understanding of applicable business systems and industry standards.Impacts the business directly by ensuring the quality of work provided by self and others; impacts own team and closely related work teams.Considers the business implications of the application of technology to the current business environment; identifies and communicates risks and impacts.Drives communication between business leaders and IT; exhibits sound and comprehensive communication and diplomacy skills to exchange complex information.Conduct workflow analysis, business process modeling; develop use cases, test plans, and business rules; assist in user acceptance testing.Collaborate on design and implementation of workflow solutions that provide long term scalability, reliability, and performance, and integration with reporting.Develop in-depth knowledge and proficiency of supported business areas and engage business partners in evaluating opportunities for process integration and refinement.Gather requirements and provide solutions across Business SectorsPartner with cross functional teams to analyze, deconstruct, and map current state process and identify improvement opportunities including creation of target operation models.Assist in negotiating for resources owned by other areas in order ensure required work is completed on scheduleDevelop and maintain documentation on an ongoing basis, and train new and existing usersDirect the 
communication of status, issue, and risk disposition to all stakeholders, including Senior ManagementDirect the identification of risks which impact project delivery and ensure mitigation strategies are developed and executed when necessaryEnsure that work flow business case / cost benefit analyses are in line with business objectivesDeliver coherent and concise communications detailing the scope, progress and results of initiatives underwayDevelop strategies to reduce costs, manage risk, and enhance servicesDeploy influencing and matrix management skills in order to ensure technology solutions meet business requirementsPerforms other duties and functions as assigned.Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm\'s reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency.Qualifications:MBA or Advanced Degree Information Systems, Business Analysis / Computer Science6-10 years experience using tools for statistical modeling of large data setsProcess Improvement or Project Management experienceEducation:Bachelor’s/University degree or equivalent experience, potentially Masters degreeThis job description provides a high-level review of the types of work performed. 
Other job-related duties may be assigned as required.-------------------------------------------------Job Family Group: Technology-------------------------------------------------Job Family:Data Science------------------------------------------------------Time Type:Full time------------------------------------------------------Primary Location:Irving Texas United States------------------------------------------------------Primary Location Salary Range:$121,560.00 - $182,340.00------------------------------------------------------Citi is an equal opportunity and affirmative action employer.Qualified applicants will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.Citigroup Inc. and its subsidiaries ("Citi”) invite all qualified interested applicants to apply for career opportunities. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity review Accessibility at Citi.View the "EEO is the Law" poster. View the EEO is the Law Supplement.View the EEO Policy Statement.View the Pay Transparency Posting\n\n\n Show more\n\n \n\n\n Show less\n\n \n\n\n'
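As a quick sanity check that the regular-expression cleaner behaves as intended, the definitions above can be exercised on a small hypothetical HTML snippet (a minimal sketch; the snippet is illustrative, not taken from LinkedIn):

```python
import re

# Repeat the cleaner defined above so this snippet runs standalone
CLEANR = re.compile('<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});')

def cleanhtml(raw_html):
    return re.sub(CLEANR, '', raw_html)

sample = '<div class="description"><p>Build &amp; deploy models</p></div>'
print(repr(cleanhtml(sample)))  # → 'Build  deploy models'
```

Note that removed tags and entities leave their surrounding whitespace behind, which is why extra spaces (and the leading '\n' runs above) survive into the cleaned text.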
After cleaning the text string, the next step is breaking it up into its component parts, its words and special characters, which are called tokens. To tokenize a text, word_tokenize() from nltk is used; it splits words on whitespace and separates special characters into tokens of their own. To demonstrate the result, the first 15 tokens were printed to the console.
tokens = nltk.word_tokenize(text)
# display the first 15 tokens
tokens[0:15]
['Job', 'Id', ':', '22598145The', 'Data', 'Science', 'Lead', 'Analyst', 'is', 'a', 'strategic', 'professional', 'who', 'stays', 'abreast']
With the text tokenized, the next step is to remove the transition words and special characters, as these are used for flow, context, and tense in written communication but are not the core content itself. Consequently, to discern the true content of a text, one needs to strip these frivolous words and special characters from it, leaving the content words that carry the sentiment and meaning of the text.
Fortunately for us, the Natural Language Toolkit (nltk) provides a predefined English stopword list through nltk.corpus.stopwords.words("english"), which serves as a basis for a stopword dictionary: simply the transition words and special characters that are filler in a text and that one removes to reach the content words. Nonetheless, as the predefined English stopword list proved insufficient to remove all frivolous words and special characters, the list was amended with additional special characters and words; this amended list is what defines our stopword dictionary, stopwords.
With the stopword dictionary complete, a list comprehension was used to remove the defined words from the text: as a result, the text went from 6,212 characters to 538 content tokens. These 538 tokens are the content words of the data science job posting; these are the words that carry meaning in the text.
# Stop words: transition words and special characters
Additional_Stopwords = ['-', '\n', '--', '-Job', 'Id', '.', '``', "''", ')', '(', ',', ':', ';', '’', '/']
stopwords = nltk.corpus.stopwords.words("english") + Additional_Stopwords

# Remove the stop words from the text
ModifiedText = [w for w in tokens if w.lower() not in stopwords]

# Display the original character count alongside the new token count
print("Text: " + str(len(text)))
print("Text - stopwords removed: " + str(len(ModifiedText)))
Text: 6212 Text - stopwords removed: 538
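Note that len(text) measures characters while len(ModifiedText) measures tokens, so the two figures above describe different units and are not directly comparable. A tiny hand-made illustration (the sentence and its token list are hypothetical, hand-tokenized the way word_tokenize would split them, to keep this standalone):

```python
sentence = "Data science, at scale."
# Hand-tokenized: words split on whitespace, punctuation as separate tokens
tokens = ["Data", "science", ",", "at", "scale", "."]

print(len(sentence))  # 23 characters
print(len(tokens))    # 6 tokens
```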
What are the ten most frequent content words? To answer that question, one first creates a frequency distribution of the content words with nltk.FreqDist(), then calls its .most_common() method and slices the first ten entries; the result can then be printed with pprint().
fd = nltk.FreqDist(ModifiedText)
content_words = fd.most_common()
pprint(content_words[0:10])
[('business', 23),
('data', 6),
('complex', 5),
('work', 5),
('communication', 4),
('required', 4),
('areas', 4),
('process', 4),
('strategic', 3),
('within', 3)]
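nltk.FreqDist is a subclass of the standard library's collections.Counter, so the counting step can be sketched without nltk at all; here on a toy token list (the tokens are illustrative):

```python
from collections import Counter

toy_tokens = ['data', 'science', 'data', 'team', 'data', 'science']
fd = Counter(toy_tokens)  # FreqDist(toy_tokens) would behave the same way

print(fd.most_common(2))  # → [('data', 3), ('science', 2)]
```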
Due to limitations with find() from BeautifulSoup on these pages, this metadata had to be hand-entered into Python lists to generate a Pandas dataframe. The following column variables were used in the generation of the dataframe, named dataframe: Date, Company, Industry, Position, Experience, Employment, Salary, Job_description, Content_words, and Website (the listing's URL).
Date=['Two weeks ago','One week ago','One day ago','One week ago','Three weeks ago','Two days ago','One month ago'
,'Three days ago','Three days ago','Three days ago','One month ago','One week ago','Five days ago'
,'Six days ago','One month ago','Two weeks ago','One week ago','One week ago','One month ago'
,'One month ago'];
Company=['Citi','Charles Schwab','Stryker','AEGIS Hedging','Aditi Consulting','Gartner','Concurrency Inc'
,'Quantlab Group','Tata Consultancy Services','Google','Balyasny Asset Management L.P.'
,'Biamp','Amherst','Texas Capital Bank','Abbott','Medpace','Strategic Staffing Solutions'
,'SoFi','incedo','AE Studio'];
Industry=['Financial Services','Financial Services','Medical Equipment Manufacturing','Financial Services'
,'IT Services and IT Consulting','Information','IT Services and IT Consulting','Financial Services'
,'IT Services and IT Consulting','Technology, Information and Internet','Investment Management'
,'Appliances, Electrical, and Electronics Manufacturing','Investment Management','Banking'
,'Hospitals and Healthcare','Pharmaceutical Manufacturing','IT Services and IT Consulting'
,'Financial Services','Information Technology & Services','Software Development'];
Position=['Data Scientist','Data Scientist','Data Scientist - Sales Operations (Remote)','Python/R Quantitative Developer'
,'Machine Learning Engineer','Data Scientist','Data Scientist','Quantitative Developer','Data Scientist'
,'AI Consultant, Google Cloud','Investment Data Analyst - Equities','AI Opportunities Analyst',
'Financial Data Scientist','Data Analyst','Data Scientist','Statistical Analyst - Experienced'
,'Data Scientist (Remote)','Staff Data Scientist - Machine Learning','Machine Learning Engineer'
,'Data Scientist'];
Experience=['Senior','Entry','Associate','Senior','Mid-Senior','Associate','Entry','Associate','Mid-Senior'
,'Senior','Associate','Entry','Mid-Senior','Mid-Senior','Associate','Mid-Senior',
'Entry','Senior','Mid-Senior','Entry'];
Employment=['Full-Time','Full-Time','Full-Time','Full-Time','Contract','Full-Time','Full-Time','Full-Time'
,'Full-Time','Full-Time','Full-Time','Full-Time','Full-Time','Full-Time','Full-Time'
,'Full-Time','Contract','Full-Time','Full-Time','Full-Time'];
Salary=['121,560/yr-182,340/yr','81,800/yr-167,200/yr','83,000/yr-176,800/yr','Not-posted','115,200/yr-134,400/yr'
,'Not-posted','Not-posted','146,000/yr-194,000/yr','Not-posted','215,000/yr-315,000/yr','Not-posted'
,'Not-posted','Not-posted','Not-posted','Not-posted','Not-posted','Not-posted','162,000/yr-247,500/yr'
,'Not-posted','120000/yr-220000/yr'];
Job_description=[];
Job_description.append(text);
Content_words=[];
Content_words.append(content_words);
Response=[' '];
Website=['https://jobs.citi.com/job/-/-/287/42525672736?source=LinkedInJB&dclid=CJKQydmBh_8CFc3HGAIdw90DJA'
,'https://www.linkedin.com/jobs/view/3613370923/?alternateChannel=search&refId=3JbP0Tt1XIXpTWTT0lD8gw%3D%3D&trackingId=%2BpfA1xUVxZ1eWp3i1tX3Lg%3D%3D&trk=d_flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3613108464/?alternateChannel=search&refId=%2FkWkKxTFjB3FfdAProdkbQ%3D%3D&trackingId=nA%2BLlIWXwZiwfKoIPJNE2w%3D%3D&trk=d_flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3599977799/?eBP=CwEAAAGIP8g534pplxr1vYZ_6J-B0u9Xl4s3mN-A8kzMYJyvUcZ5adabeKC0Cuf0_4zpcyXokuzF44-TuDZQ_Vq8fhsCGQ2G4_P9bHPRHO8UulZlQuLnrp_JxcdTyCQ29K_nwTai-s9xgCb3rqwt4XPJ3Cy4dIRw7ZlD0nDatE6yzphRJ4elK9qjJTth5YUXt2YTcxUvITdvrPvAwCfBX7RqV5rCs7T1NuEMejJNP0QrrAUMYs94Eujwrn6JViC3Elrh8Ig4-qybRymK_E7v-YDAabbISh0C_-_PLSfZMeF7dfHvqPNqdNblLm_-GY4d-QUieZtY0IAKP0s1b5au1qWXFPgNj8C-zKF-QS-t9fv84rGbt00yhlFGqoWKdpE2crvavQKX1SY&recommendedFlavor=SCHOOL_RECRUIT&refId=%2FkWkKxTFjB3FfdAProdkbQ%3D%3D&trackingId=%2FPLKxHVvQFxnGJi8r4nPgA%3D%3D&trk=flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3582138574/?alternateChannel=search&refId=3JbP0Tt1XIXpTWTT0lD8gw%3D%3D&trackingId=J7Cx84BH16Eyl%2BQRVSskfg%3D%3D&trk=d_flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3477165235/?eBP=CwEAAAGIP8g0KXzFEofkEzElfx54RNh0nOlnShDjndi4QqQOkqwwN8hbMGVioOW7NHTak_v6IDEWwPDc4oEoPd2r4twihcCYJnGi3g9bByLR2A8ecEjIGinHlUd4WSI-5-1rmOm61lFsuU63LwxjWCXO-IsZCyv-daOqM6DMYLxlcE8kkqoJqgy877PmFH5ojerEelowOFmdWxXKbfngAiLZNaahF4rfXeVuvmg0wszA1hS89nxMiNoepeoqRJkm4-tomwXpTD8tkUSOyeKNhKH0gbOJ1sITrsiOw2-J9r3ZJejufgIV63i44tTn9rSfku6dX3-bqwkDXy_sziqE63jd7Nq7irvezhVyUUJFrSQasT-mUohTDfOFT8Z4VSeCQycvYWep3Bk&recommendedFlavor=COMPANY_RECRUIT&refId=3JbP0Tt1XIXpTWTT0lD8gw%3D%3D&trackingId=TqZ7Qpvo3OX1rKQsbE4%2Fnw%3D%3D&trk=flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3576885404/?alternateChannel=search&refId=%2FkWkKxTFjB3FfdAProdkbQ%3D%3D&trackingId=pYdhv1%2FLXu%2BXatill6KqwA%3D%3D&trk=d_flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3533888707/?eBP=CwEAAAGIQHmkElZsjKXsgv7Uio5Yz3seGaXbtH0VJedWjQX2hiNSv84sJRpLrBUCVZJ0DJO3Aj7p_Ie8gk7dgRjTFhfTZP6W7wAkKatNrqi33SOxMlQxcp5f0XMLFo_J12KreXV8rlQMV6o_6pGkvHYezURpb5zWmewq9PqR68DWQ57rSUcnvtv9TB-xBNO1DRYmKaF10dCeIhojY4BhwB2b6X5KpO1dypD7E-iRJlyPG80XhhpZe_w_XGdbuclMgTclU1f0Fy0vS5vzzFnV31PErrat5Q4k6CqNGSA6m_w7FbkOikrf60Dd4_FWJaW2TtVzeIwUX6NuygD4lxmBn74iwACJLvyhhNBJ633_T6cYk_LA3uWEImM4T99JWYFxHXBAd7N3_jpjew&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=UxEkMI4vaGKfDjn9ytiG0A%3D%3D&trk=flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3606141065/?alternateChannel=search&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=H7TRoaQDIoi3GV9cOP1Ngw%3D%3D&trk=d_flagship3_postapply_demographics'
,'https://www.linkedin.com/jobs/view/3583462407/?alternateChannel=search&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=1jewIfuP8C7DJJUywH8Khw%3D%3D&trk=d_flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3535282529/?alternateChannel=search&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=HbWwvjEbZbKIPFSxxQ1mDw%3D%3D&trk=d_flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3580011415/?alternateChannel=search&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=iDN25XLEt7TXdPZ%2Fh7J4VA%3D%3D&trk=d_flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3575142950/?alternateChannel=search&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=mX30J8rjqeH00qDJfKLu6A%3D%3D&trk=d_flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3603254581/?alternateChannel=search&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=55VlMw5TbhgR7BOJc9wz4Q%3D%3D&trk=d_flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3607337417/?alternateChannel=search&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=MskjAWWvcsLFt5yiDfYdfQ%3D%3D&trk=d_flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3595564767/?alternateChannel=search&refId=YlVwMFTzTZ%2BK4mzVOGQVMw%3D%3D&trackingId=8%2B2pgOHhQrW%2B8KImLsU%2FSA%3D%3D&trk=d_flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3601784881/?alternateChannel=search&refId=YlVwMFTzTZ%2BK4mzVOGQVMw%3D%3D&trackingId=iWmX3DowrpGGxKspMX8Czg%3D%3D&trk=d_flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3602626868/?alternateChannel=search&refId=YlVwMFTzTZ%2BK4mzVOGQVMw%3D%3D&trackingId=PaIUJiUbsYox8Qyh8sNowg%3D%3D&trk=d_flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3568734219/?eBP=CwEAAAGIQHmtM7hiePfMP6z0GXdQ9JFGDpC8Arz8Rehh-uUKB0hOyxYFJxtvNWf7UiU-fqt21o6NnB_186LSAWNNHMQxrlMwLCubP-5FeseBtYvZ1BWK3d9b56uururxV8OpZcgBKz0InlxPlVxGEb-xe5B3Ak8z1sYZUhwUpQ-z6z92beMRnxTkFwKh_BQHWhmu6QRmmCIQDZLw0GW52mihT3tQqAXYJWACOVhFl3EQRFCceQ2B3GQR4SlWurvxDe9SEo2ZAUo8CeWXmzyWY5n_z0J8Nmh0Yti-B0GyDt2tMJtt6IEM5zAwHLGfgxgdnNFRye-Lwge7zVzjzAhMBKDtwFKjwvrDf5L8Db15KS-lBFNsfZvfDlDmDjRgAhSrPszGsX6uqw5spg&recommendedFlavor=SCHOOL_RECRUIT&refId=YlVwMFTzTZ%2BK4mzVOGQVMw%3D%3D&trackingId=Yd%2FKGb8lRSDuZ9VLo%2B5jow%3D%3D&trk=flagship3_search_srp_jobs'
,'https://www.linkedin.com/jobs/view/3586405131/?eBP=JOB_SEARCH_ORGANIC&refId=YlVwMFTzTZ%2BK4mzVOGQVMw%3D%3D&trackingId=cAIjHogg8mgMWkONgwQP8g%3D%3D&trk=flagship3_search_srp_jobs'];
2. Automating the Concept and Full Data Analysis
The next step is automating the job-description scraping script; the next two code blocks accomplish this objective. The LinkedIn URLs were stored in a Notepad .txt file, which was opened with open() and read line by line. Using a for loop over those lines, each URL was fed through the Job_Data function; note that nothing is returned, since the job descriptions and content words are appended to their respective lists.
def Job_Data(url, Job_description, Content_words, Response):
    response = requests.get(str(url))
    web = BeautifulSoup(response.content, 'html.parser')
    temp = web.find('div', class_="description__text description__text--rich")
    temp_text = cleanhtml(str(temp))
    Job_description.append(temp_text)
    tokens = nltk.word_tokenize(temp_text)
    ModifiedText = [w for w in tokens if w.lower() not in stopwords]
    fd = nltk.FreqDist(ModifiedText)
    content_words = fd.most_common()
    Content_words.append(content_words)
    # list.append returns None, so append in place rather than reassigning Response
    Response.append(' ')
file = open("LinkedIn_urls_Full.txt", "r")
for links in file.readlines():
    Job_Data(links, Job_description, Content_words, Response)
To generate the Pandas dataframe, one only needs to call pd.DataFrame() with an argument of the form {'Name of column': data_list}. For convenience, the resulting dataframe was printed to the console.
dataframe=pd.DataFrame({"Date":Date, "Company":Company, "Industry":Industry, "Position":Position
, "Experience":Experience, "Employment":Employment, "Salary":Salary
, "Job_description":Job_description, "Content_words":Content_words
, "Website":Website});
dataframe
| | Date | Company | Industry | Position | Experience | Employment | Salary | Job_description | Content_words | Website |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Two weeks ago | Citi | Financial Services | Data Scientist | Senior | Full-Time | 121,560/yr-182,340/yr | \n\n\n Job Id: 22598145The Data Science... | [(business, 23), (data, 6), (complex, 5), (wor... | https://jobs.citi.com/job/-/-/287/42525672736?... |
| 1 | One week ago | Charles Schwab | Financial Services | Data Scientist | Entry | Full-Time | 81,800/yr-167,200/yr | \n\n\n Your OpportunityAt Schwab, you’r... | [(data, 14), (business, 7), (experience, 5), (... | https://www.linkedin.com/jobs/view/3613370923/... |
| 2 | One day ago | Stryker | Medical Equipment Manufacturing | Data Scientist - Sales Operations (Remote) | Associate | Full-Time | 83,000/yr-176,800/yr | \n\n\n Why join Stryker?We are proud to... | [(data, 10), (information, 5), (Stryker, 4), (... | https://www.linkedin.com/jobs/view/3613108464/... |
| 3 | One week ago | AEGIS Hedging | Financial Services | Python/R Quantitative Developer | Senior | Full-Time | Not-posted | \n\n\nCompany: AEGIS Hedging SolutionsAEGIS si... | [(data, 13), (financial, 9), (models, 6), (AEG... | https://www.linkedin.com/jobs/view/3599977799/... |
| 4 | Three weeks ago | Aditi Consulting | IT Services and IT Consulting | Machine Learning Engineer | Mid-Senior | Contract | 115,200/yr-134,400/yr | \n\n\nDetails/Scope of the project: We are see... | [(data, 7), (skills, 6), (team, 4), (pipelines... | https://www.linkedin.com/jobs/view/3582138574/... |
| 5 | Two days ago | Gartner | Information | Data Scientist | Associate | Full-Time | Not-posted | \n\n\nAbout the role: This is a unique opportu... | [(data, 10), (science, 7), (e.g., 6), (product... | https://www.linkedin.com/jobs/view/3477165235/... |
| 6 | One month ago | Concurrency Inc | IT Services and IT Consulting | Data Scientist | Entry | Full-Time | Not-posted | \n\n\nWho We AreWe are change agents. We are i... | [(data, 12), (Data, 6), (machine, 5), (learnin... | https://www.linkedin.com/jobs/view/3576885404/... |
| 7 | Three days ago | Quantlab Group | Financial Services | Quantitative Developer | Associate | Full-Time | 146,000/yr-194,000/yr | \n\n\nWe are seeking a Quantitative Developer ... | [(Quantlab, 9), (work, 5), (trading, 4), (writ... | https://www.linkedin.com/jobs/view/3533888707/... |
| 8 | Three days ago | Tata Consultancy Services | IT Services and IT Consulting | Data Scientist | Mid-Senior | Full-Time | Not-posted | \n\n\nAbout TCS :Tata Consultancy Services is ... | [(analysis, 4), (data, 4), (Job, 3), (Neo4j, 3... | https://www.linkedin.com/jobs/view/3606141065/... |
| 9 | Three days ago | Google | Technology, Information and Internet | AI Consultant, Google Cloud | Senior | Full-Time | 215,000/yr-315,000/yr | \n\n\n This role may also be located in... | [(USA, 12), (technical, 9), (Google, 8), (cust... | https://www.linkedin.com/jobs/view/3583462407/... |
| 10 | One month ago | Balyasny Asset Management L.P. | Investment Management | Investment Data Analyst - Equities | Associate | Full-Time | Not-posted | \n\n\nROLE OVERVIEWWithin a global team of dat... | [(data, 15), (Data, 5), (working, 5), (experie... | https://www.linkedin.com/jobs/view/3535282529/... |
| 11 | One week ago | Biamp | Appliances, Electrical, and Electronics Manufa... | AI Opportunities Analyst | Entry | Full-Time | Not-posted | \n\n\nThe role, at a glance: The AI Opportun... | [(AI, 12), (Biamp, 10), (business, 7), (role, ... | https://www.linkedin.com/jobs/view/3580011415/... |
| 12 | Five days ago | Amherst | Investment Management | Financial Data Scientist | Mid-Senior | Full-Time | Not-posted | \n\n\nResponsibilities:Support modeling analyt... | [(business, 4), (modeling, 3), (team, 3), (tea... | https://www.linkedin.com/jobs/view/3575142950/... |
| 13 | Six days ago | Texas Capital Bank | Banking | Data Analyst | Mid-Senior | Full-Time | Not-posted | \n\n\nA Data Analyst collects data about an or... | [(data, 13), (experience, 10), (skills, 9), (m... | https://www.linkedin.com/jobs/view/3603254581/... |
| 14 | One month ago | Abbott | Hospitals and Healthcare | Data Scientist | Associate | Full-Time | Not-posted | \n\n\n Abbott is a global healthcare le... | [(data, 8), (Abbott, 6), (company, 6), (people... | https://www.linkedin.com/jobs/view/3607337417/... |
| 15 | Two weeks ago | Medpace | Pharmaceutical Manufacturing | Statistical Analyst - Experienced | Mid-Senior | Full-Time | Not-posted | \n\n\nResponsibilities Write statistical progr... | [(Medpace, 5), (clinical, 4), (work, 4), (stat... | https://www.linkedin.com/jobs/view/3595564767/... |
| 16 | One week ago | Strategic Staffing Solutions | IT Services and IT Consulting | Data Scientist (Remote) | Entry | Contract | Not-posted | \n\n\n STRATEGIC STAFFING SOLUTIONS (S3... | [(Data, 6), (S3, 5), (Expert, 5), (business, 4... | https://www.linkedin.com/jobs/view/3601784881/... |
| 17 | One week ago | SoFi | Financial Services | Staff Data Scientist - Machine Learning | Senior | Full-Time | 162,000/yr-247,500/yr | \n\n\nEmployee Applicant Privacy NoticeWho we ... | [(learning, 10), (machine, 8), (Risk, 6), (mod... | https://www.linkedin.com/jobs/view/3602626868/... |
| 18 | One month ago | incedo | Information Technology & Services | Machine Learning Engineer | Mid-Senior | Full-Time | Not-posted | \n\n\nJob Title: Machine Learning Engineer Loc... | [(work, 3), (experience, 3), (knowledge, 3), (... | https://www.linkedin.com/jobs/view/3568734219/... |
| 19 | One month ago | AE Studio | Software Development | Data Scientist | Entry | Full-Time | 120000/yr-220000/yr | \n\n\n AE Studio is an LA-based company... | [(equity, 11), (AE, 9), (!, 9), (projects, 8),... | https://www.linkedin.com/jobs/view/3586405131/... |
The following simply prints the number of elements in each column variable; e.g., the column variable Date contains twenty dates.
print("Date: " + str(len(Date)))
print("Company: " + str(len(Company)))
print("Industry: " + str(len(Industry)))
print("Position: " + str(len(Position)))
print("Experience: " + str(len(Experience)))
print("Employment: " + str(len(Employment)))
print("Salary: " + str(len(Salary)))
print("Job_description: " + str(len(Job_description)))
print("Content_words: " + str(len(Content_words)))
print("Response: " + str(len(Response)))
print("Website: " + str(len(Website)))
Date: 20 Company: 20 Industry: 20 Position: 20 Experience: 20 Employment: 20 Salary: 20 Job_description: 20 Content_words: 20 Response: 20 Website: 20
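The eleven print statements above can be condensed into one check; here is a minimal sketch with illustrative stand-in lists (the real column lists each hold twenty entries):

```python
# Stand-in column lists, two entries each, just to demonstrate the pattern
columns = {
    "Date": ['Two weeks ago', 'One week ago'],
    "Company": ['Citi', 'Charles Schwab'],
    "Experience": ['Senior', 'Entry'],
}

lengths = {name: len(col) for name, col in columns.items()}
print(lengths)  # → {'Date': 2, 'Company': 2, 'Experience': 2}

# pandas.DataFrame requires all columns to be the same length
assert len(set(lengths.values())) == 1
```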
For the convenience of the reader, we generate easily clickable links to the job postings.
# Easy access to the LinkedIn urls
file = open("LinkedIn_urls_Full.txt", "r")
for links in file.readlines():
    print(links)
https://www.linkedin.com/jobs/view/3613370923/?alternateChannel=search&refId=3JbP0Tt1XIXpTWTT0lD8gw%3D%3D&trackingId=%2BpfA1xUVxZ1eWp3i1tX3Lg%3D%3D&trk=d_flagship3_search_srp_jobs https://www.linkedin.com/jobs/view/3613108464/?alternateChannel=search&refId=%2FkWkKxTFjB3FfdAProdkbQ%3D%3D&trackingId=nA%2BLlIWXwZiwfKoIPJNE2w%3D%3D&trk=d_flagship3_search_srp_jobs https://www.linkedin.com/jobs/view/3599977799/?eBP=CwEAAAGIP8g534pplxr1vYZ_6J-B0u9Xl4s3mN-A8kzMYJyvUcZ5adabeKC0Cuf0_4zpcyXokuzF44-TuDZQ_Vq8fhsCGQ2G4_P9bHPRHO8UulZlQuLnrp_JxcdTyCQ29K_nwTai-s9xgCb3rqwt4XPJ3Cy4dIRw7ZlD0nDatE6yzphRJ4elK9qjJTth5YUXt2YTcxUvITdvrPvAwCfBX7RqV5rCs7T1NuEMejJNP0QrrAUMYs94Eujwrn6JViC3Elrh8Ig4-qybRymK_E7v-YDAabbISh0C_-_PLSfZMeF7dfHvqPNqdNblLm_-GY4d-QUieZtY0IAKP0s1b5au1qWXFPgNj8C-zKF-QS-t9fv84rGbt00yhlFGqoWKdpE2crvavQKX1SY&recommendedFlavor=SCHOOL_RECRUIT&refId=%2FkWkKxTFjB3FfdAProdkbQ%3D%3D&trackingId=%2FPLKxHVvQFxnGJi8r4nPgA%3D%3D&trk=flagship3_search_srp_jobs https://www.linkedin.com/jobs/view/3582138574/?alternateChannel=search&refId=3JbP0Tt1XIXpTWTT0lD8gw%3D%3D&trackingId=J7Cx84BH16Eyl%2BQRVSskfg%3D%3D&trk=d_flagship3_search_srp_jobs https://www.linkedin.com/jobs/view/3477165235/?eBP=CwEAAAGIP8g0KXzFEofkEzElfx54RNh0nOlnShDjndi4QqQOkqwwN8hbMGVioOW7NHTak_v6IDEWwPDc4oEoPd2r4twihcCYJnGi3g9bByLR2A8ecEjIGinHlUd4WSI-5-1rmOm61lFsuU63LwxjWCXO-IsZCyv-daOqM6DMYLxlcE8kkqoJqgy877PmFH5ojerEelowOFmdWxXKbfngAiLZNaahF4rfXeVuvmg0wszA1hS89nxMiNoepeoqRJkm4-tomwXpTD8tkUSOyeKNhKH0gbOJ1sITrsiOw2-J9r3ZJejufgIV63i44tTn9rSfku6dX3-bqwkDXy_sziqE63jd7Nq7irvezhVyUUJFrSQasT-mUohTDfOFT8Z4VSeCQycvYWep3Bk&recommendedFlavor=COMPANY_RECRUIT&refId=3JbP0Tt1XIXpTWTT0lD8gw%3D%3D&trackingId=TqZ7Qpvo3OX1rKQsbE4%2Fnw%3D%3D&trk=flagship3_search_srp_jobs https://www.linkedin.com/jobs/view/3576885404/?alternateChannel=search&refId=%2FkWkKxTFjB3FfdAProdkbQ%3D%3D&trackingId=pYdhv1%2FLXu%2BXatill6KqwA%3D%3D&trk=d_flagship3_search_srp_jobs 
https://www.linkedin.com/jobs/view/3533888707/?eBP=CwEAAAGIQHmkElZsjKXsgv7Uio5Yz3seGaXbtH0VJedWjQX2hiNSv84sJRpLrBUCVZJ0DJO3Aj7p_Ie8gk7dgRjTFhfTZP6W7wAkKatNrqi33SOxMlQxcp5f0XMLFo_J12KreXV8rlQMV6o_6pGkvHYezURpb5zWmewq9PqR68DWQ57rSUcnvtv9TB-xBNO1DRYmKaF10dCeIhojY4BhwB2b6X5KpO1dypD7E-iRJlyPG80XhhpZe_w_XGdbuclMgTclU1f0Fy0vS5vzzFnV31PErrat5Q4k6CqNGSA6m_w7FbkOikrf60Dd4_FWJaW2TtVzeIwUX6NuygD4lxmBn74iwACJLvyhhNBJ633_T6cYk_LA3uWEImM4T99JWYFxHXBAd7N3_jpjew&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=UxEkMI4vaGKfDjn9ytiG0A%3D%3D&trk=flagship3_search_srp_jobs https://www.linkedin.com/jobs/view/3606141065/?alternateChannel=search&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=H7TRoaQDIoi3GV9cOP1Ngw%3D%3D&trk=d_flagship3_postapply_demographics https://www.linkedin.com/jobs/view/3583462407/?alternateChannel=search&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=1jewIfuP8C7DJJUywH8Khw%3D%3D&trk=d_flagship3_search_srp_jobs https://www.linkedin.com/jobs/view/3535282529/?alternateChannel=search&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=HbWwvjEbZbKIPFSxxQ1mDw%3D%3D&trk=d_flagship3_search_srp_jobs https://www.linkedin.com/jobs/view/3580011415/?alternateChannel=search&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=iDN25XLEt7TXdPZ%2Fh7J4VA%3D%3D&trk=d_flagship3_search_srp_jobs https://www.linkedin.com/jobs/view/3575142950/?alternateChannel=search&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=mX30J8rjqeH00qDJfKLu6A%3D%3D&trk=d_flagship3_search_srp_jobs https://www.linkedin.com/jobs/view/3603254581/?alternateChannel=search&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=55VlMw5TbhgR7BOJc9wz4Q%3D%3D&trk=d_flagship3_search_srp_jobs https://www.linkedin.com/jobs/view/3607337417/?alternateChannel=search&refId=2lqK2EIwoP%2FvzvGr0JOMFg%3D%3D&trackingId=MskjAWWvcsLFt5yiDfYdfQ%3D%3D&trk=d_flagship3_search_srp_jobs 
https://www.linkedin.com/jobs/view/3595564767/?alternateChannel=search&refId=YlVwMFTzTZ%2BK4mzVOGQVMw%3D%3D&trackingId=8%2B2pgOHhQrW%2B8KImLsU%2FSA%3D%3D&trk=d_flagship3_search_srp_jobs https://www.linkedin.com/jobs/view/3601784881/?alternateChannel=search&refId=YlVwMFTzTZ%2BK4mzVOGQVMw%3D%3D&trackingId=iWmX3DowrpGGxKspMX8Czg%3D%3D&trk=d_flagship3_search_srp_jobs https://www.linkedin.com/jobs/view/3602626868/?alternateChannel=search&refId=YlVwMFTzTZ%2BK4mzVOGQVMw%3D%3D&trackingId=PaIUJiUbsYox8Qyh8sNowg%3D%3D&trk=d_flagship3_search_srp_jobs https://www.linkedin.com/jobs/view/3568734219/?eBP=CwEAAAGIQHmtM7hiePfMP6z0GXdQ9JFGDpC8Arz8Rehh-uUKB0hOyxYFJxtvNWf7UiU-fqt21o6NnB_186LSAWNNHMQxrlMwLCubP-5FeseBtYvZ1BWK3d9b56uururxV8OpZcgBKz0InlxPlVxGEb-xe5B3Ak8z1sYZUhwUpQ-z6z92beMRnxTkFwKh_BQHWhmu6QRmmCIQDZLw0GW52mihT3tQqAXYJWACOVhFl3EQRFCceQ2B3GQR4SlWurvxDe9SEo2ZAUo8CeWXmzyWY5n_z0J8Nmh0Yti-B0GyDt2tMJtt6IEM5zAwHLGfgxgdnNFRye-Lwge7zVzjzAhMBKDtwFKjwvrDf5L8Db15KS-lBFNsfZvfDlDmDjRgAhSrPszGsX6uqw5spg&recommendedFlavor=SCHOOL_RECRUIT&refId=YlVwMFTzTZ%2BK4mzVOGQVMw%3D%3D&trackingId=Yd%2FKGb8lRSDuZ9VLo%2B5jow%3D%3D&trk=flagship3_search_srp_jobs https://www.linkedin.com/jobs/view/3586405131/?eBP=JOB_SEARCH_ORGANIC&refId=YlVwMFTzTZ%2BK4mzVOGQVMw%3D%3D&trackingId=cAIjHogg8mgMWkONgwQP8g%3D%3D&trk=flagship3_search_srp_jobs
With the content-word data compiled for each of the twenty job listings, it is time to visualize said content words for each listing.
First off is the Data Science role at Citi; we print the top twenty most frequent words in the post.
# Top twenty words for Citi - Data Scientist
print(Content_words[0][0:20]);
[('business', 23), ('data', 6), ('complex', 5), ('work', 5), ('communication', 4), ('required', 4), ('areas', 4), ('process', 4), ('strategic', 3), ('within', 3), ('application', 3), ('area', 3), ('provide', 3), ('skills', 3), ('order', 3), ('technology', 3), ('clients', 3), ('system', 3), ('systems', 3), ('solutions', 3)]
Surprisingly, 'business' was the most common word, with 'data' the second most common. A key observation is that each job listing contains different most-frequent content words, but all posts contain words related to the fields of Data Science and Machine Learning.
The following bar graph illustrates the magnitude of each word frequency in the post; we create such a graph for each of the job listings.
unique_word=[];
word_count=[];
for i in range(len(Content_words[0][0:20])):
    unique_word.append(Content_words[0][i][0]);
    word_count.append(Content_words[0][i][1]);
plt.bar(unique_word,word_count);
plt.title('Citi - Data Scientist - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45, ha='right');
plt.show();
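Since the same plotting steps recur for every listing below, they could be wrapped in a small helper; this is a sketch, and the name `plot_top_words` is introduced here rather than taken from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

def plot_top_words(content_words, title, top_n=20):
    """Bar-plot the top_n (word, count) pairs for one job listing."""
    pairs = content_words[:top_n]
    unique_word = [word for word, count in pairs]
    word_count = [count for word, count in pairs]
    plt.bar(unique_word, word_count)
    plt.title(title + ' - Job post')
    plt.xlabel("Top Twenty Content Words")
    plt.ylabel("Frequency")
    plt.xticks(rotation=45, ha='right')
    plt.show()
    return unique_word, word_count
```

Each later section would then reduce to a single call, e.g. `plot_top_words(Content_words[1], 'Charles Schwab - Data Scientist')`.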
As a small experiment, let us create a small dictionary of technical skills that might be included in a Data Science job post. We chose Python, as it is the predominant object-oriented programming language in the field; the only object-oriented language that rivals it is C++, a lower-level language with tighter memory management and far more efficient computation. Next we chose SQL (Structured Query Language), as it is the main language used to create, maintain and query a relational database; NoSQL serves the same purpose for non-relational structured data, such as XML. Excel was chosen next, as it is still a predominant tool in industry, though mostly in Data Analysis roles. Lastly 'learn', 'machine' and 'algorithms' were chosen, as machine learning is an important part of Data Science; remember that Cwords is composed entirely of tokenized text, so compound terms have been separated.
Cwords=[];
for i in range(len(Content_words[1][:])):
    Cwords.append(Content_words[1][i][0]);
Contained=[];
Technicals=['python','sql','nosql','excel','learn','machine','algorithms'];
contents=[];
for i in Cwords:
    if i.lower() in Technicals:
        contents.append(True);
        Contained.append(i);
    else:
        contents.append(False);
print("Technical words contained inside post: " + str(sum(contents)));
print(" ")
print("Technical words: ");
print(Contained);
Technical words contained inside post: 4
Technical words: 
['algorithms', 'Python', 'machine', 'SQL']
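The same check generalizes to any listing with a short helper; this is a sketch, and `find_technicals` is a name introduced here, illustrated with a stand-in word list rather than the scraped data.

```python
def find_technicals(content_words, technicals):
    """Return the technical terms that appear among a listing's content words."""
    words = {word.lower() for word, count in content_words}
    wanted = {term.lower() for term in technicals}
    return sorted(words & wanted)

# Illustration with a stand-in (word, count) list:
sample = [('data', 14), ('Python', 2), ('algorithms', 4), ('team', 3)]
find_technicals(sample, ['python', 'sql', 'algorithms'])  # ['algorithms', 'python']
```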
Only four technical words were found: 'algorithms', 'Python', 'machine' and 'SQL'; note that the loop above indexed Content_words[1], i.e. the Charles Schwab listing rather than the Citi one. For the convenience of the reader, we also print the whole content-word list for the Citi - Data Scientist role; it is a lengthy print to the console, so we only do this once, but we thought it useful for the reader to see a full list of content words.
print(Content_words[0]);
[('business', 23), ('data', 6), ('complex', 5), ('work', 5), ('communication', 4), ('required', 4), ('areas', 4), ('process', 4), ('strategic', 3), ('within', 3), ('application', 3), ('area', 3), ('provide', 3), ('skills', 3), ('order', 3), ('technology', 3), ('clients', 3), ('system', 3), ('systems', 3), ('solutions', 3), ('opportunities', 3), ('ensure', 3), ('risk', 3), ('EEO', 3), ('Data', 2), ('Science', 2), ('job', 2), ('multiple', 2), ('diplomacy', 2), ('others', 2), ('particular', 2), ('impact', 2), ('related', 2), ('impacts', 2), ('performance', 2), ('analysis', 2), ('identifies', 2), ('implications', 2), ('initiatives', 2), ('assess', 2), ('time', 2), ('project', 2), ('identify', 2), ('functional', 2), ('comprehensive', 2), ('understanding', 2), ('in-depth', 2), ('industry', 2), ('requirements', 2), ('implementation', 2), ('provides', 2), ('user', 2), ('scope', 2), ('applicable', 2), ('current', 2), ('risks', 2), ('sound', 2), ('workflow', 2), ('modeling', 2), ('use', 2), ('rules', 2), ('integration', 2), ('Business', 2), ('including', 2), ('status', 2), ('strategies', 2), ('duties', 2), ('consideration', 2), ('Policy', 2), ('experience', 2), ('tools', 2), ('review', 2), ('-Job', 2), ('Family', 2), ('Primary', 2), ('Location', 2), ('$', 2), ('Citi', 2), ('opportunity', 2), ('applicants', 2), ('disability', 2), ('apply', 2), ('career', 2), ('Law', 2), ('Show', 2), ('Job', 1), ('Id', 1), ('22598145The', 1), ('Lead', 1), ('Analyst', 1), ('professional', 1), ('stays', 1), ('abreast', 1), ('developments', 1), ('field', 1), ('contributes', 1), ('directional', 1), ('strategy', 1), ('considering', 1), ('Recognized', 1), ('technical', 1), ('authority', 1), ('Requires', 1), ('basic', 1), ('commercial', 1), ('awareness', 1), ('typically', 1), ('people', 1), ('level', 1), ('subject', 1), ('matter', 1), ('expertise', 1), ('Developed', 1), ('guide', 1), ('influence', 1), ('convince', 1), ('colleagues', 1), ('occasional', 1), ('external', 1), ('customers', 1), 
('Significant', 1), ('deliverables', 1), ('Provides', 1), ('advice', 1), ('counsel', 1), ('operations', 1), ('Work', 1), ('entire', 1), ('eventually', 1), ('affects', 1), ('overall', 1), ('effectiveness', 1), ('sub-function/job', 1), ('family.Responsibilities', 1), ('Conducts', 1), ('insights', 1), ('make', 1), ('recommendations', 1), ('develops', 1), ('displays', 1), ('clearly', 1), ('communicate', 1), ('analysis.Mines', 1), ('analyzes', 1), ('various', 1), ('banking', 1), ('platforms', 1), ('drive', 1), ('optimization', 1), ('improve', 1), ('quality.Deliver', 1), ('analytics', 1), ('address', 1), ('problems', 1), ('ability', 1), ('determine', 1), ('effort', 1), ('establish', 1), ('plan.Consults', 1), ('specifications', 1), ('Applies', 1), ('collectively', 1), ('integrate', 1), ('contribute', 1), ('towards', 1), ('achieving', 1), ('goals.Consults', 1), ('users', 1), ('solve', 1), ('issues/problems', 1), ('evaluation', 1), ('processes', 1), ('standards', 1), ('recommends', 1), ('solutions.Leads', 1), ('change', 1), ('operational', 1), ('support', 1), ('users.Formulates', 1), ('defines', 1), ('goals', 1), ('projects', 1), ('research', 1), ('fact-finding', 1), ('combined', 1), ('standards.Impacts', 1), ('directly', 1), ('ensuring', 1), ('quality', 1), ('provided', 1), ('self', 1), ('team', 1), ('closely', 1), ('teams.Considers', 1), ('environment', 1), ('communicates', 1), ('impacts.Drives', 1), ('leaders', 1), ('exhibits', 1), ('exchange', 1), ('information.Conduct', 1), ('develop', 1), ('cases', 1), ('test', 1), ('plans', 1), ('assist', 1), ('acceptance', 1), ('testing.Collaborate', 1), ('design', 1), ('long', 1), ('term', 1), ('scalability', 1), ('reliability', 1), ('reporting.Develop', 1), ('knowledge', 1), ('proficiency', 1), ('supported', 1), ('engage', 1), ('partners', 1), ('evaluating', 1), ('refinement.Gather', 1), ('across', 1), ('SectorsPartner', 1), ('cross', 1), ('teams', 1), ('analyze', 1), ('deconstruct', 1), ('map', 1), ('state', 1), ('improvement', 
1), ('creation', 1), ('target', 1), ('operation', 1), ('models.Assist', 1), ('negotiating', 1), ('resources', 1), ('owned', 1), ('completed', 1), ('scheduleDevelop', 1), ('maintain', 1), ('documentation', 1), ('ongoing', 1), ('basis', 1), ('train', 1), ('new', 1), ('existing', 1), ('usersDirect', 1), ('issue', 1), ('disposition', 1), ('stakeholders', 1), ('Senior', 1), ('ManagementDirect', 1), ('identification', 1), ('delivery', 1), ('mitigation', 1), ('developed', 1), ('executed', 1), ('necessaryEnsure', 1), ('flow', 1), ('case', 1), ('cost', 1), ('benefit', 1), ('analyses', 1), ('line', 1), ('objectivesDeliver', 1), ('coherent', 1), ('concise', 1), ('communications', 1), ('detailing', 1), ('progress', 1), ('results', 1), ('underwayDevelop', 1), ('reduce', 1), ('costs', 1), ('manage', 1), ('enhance', 1), ('servicesDeploy', 1), ('influencing', 1), ('matrix', 1), ('management', 1), ('meet', 1), ('requirementsPerforms', 1), ('functions', 1), ('assigned.Appropriately', 1), ('decisions', 1), ('made', 1), ('demonstrating', 1), ('firm', 1), ("'s", 1), ('reputation', 1), ('safeguarding', 1), ('Citigroup', 1), ('assets', 1), ('driving', 1), ('compliance', 1), ('laws', 1), ('regulations', 1), ('adhering', 1), ('applying', 1), ('ethical', 1), ('judgment', 1), ('regarding', 1), ('personal', 1), ('behavior', 1), ('conduct', 1), ('practices', 1), ('escalating', 1), ('managing', 1), ('reporting', 1), ('control', 1), ('issues', 1), ('transparency.Qualifications', 1), ('MBA', 1), ('Advanced', 1), ('Degree', 1), ('Information', 1), ('Systems', 1), ('Analysis', 1), ('Computer', 1), ('Science6-10', 1), ('years', 1), ('using', 1), ('statistical', 1), ('large', 1), ('setsProcess', 1), ('Improvement', 1), ('Project', 1), ('Management', 1), ('experienceEducation', 1), ('Bachelor', 1), ('s/University', 1), ('degree', 1), ('equivalent', 1), ('potentially', 1), ('Masters', 1), ('degreeThis', 1), ('description', 1), ('high-level', 1), ('types', 1), ('performed', 1), ('job-related', 1), 
('may', 1), ('assigned', 1), ('required.', 1), ('Group', 1), ('Technology', 1), ('Time', 1), ('Type', 1), ('Full', 1), ('Irving', 1), ('Texas', 1), ('United', 1), ('States', 1), ('Salary', 1), ('Range', 1), ('121,560.00', 1), ('182,340.00', 1), ('equal', 1), ('affirmative', 1), ('action', 1), ('employer.Qualified', 1), ('receive', 1), ('without', 1), ('regard', 1), ('race', 1), ('color', 1), ('religion', 1), ('sex', 1), ('sexual', 1), ('orientation', 1), ('gender', 1), ('identity', 1), ('national', 1), ('origin', 1), ('protected', 1), ('veteran.Citigroup', 1), ('Inc.', 1), ('subsidiaries', 1), ('”', 1), ('invite', 1), ('qualified', 1), ('interested', 1), ('person', 1), ('need', 1), ('reasonable', 1), ('accommodation', 1), ('search', 1), ('and/or', 1), ('Accessibility', 1), ('Citi.View', 1), ('poster', 1), ('View', 1), ('Supplement.View', 1), ('Statement.View', 1), ('Pay', 1), ('Transparency', 1), ('Posting', 1), ('less', 1)]
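Rather than scanning the full printout for a particular word, the (word, count) list can be turned into a dictionary for direct lookups; the small list below is a stand-in mirroring the first few Citi entries, not the full Content_words[0].

```python
pairs = [('business', 23), ('data', 6), ('complex', 5)]  # stand-in for Content_words[0]
counts = dict(pairs)
counts.get('business')   # 23
counts.get('python', 0)  # 0 when the word is absent from the listing
```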
Next up is Charles Schwab - Data Scientist.
# Top twenty words for Charles Schwab - Data Scientist
print(Content_words[1][0:20]);
[('data', 14), ('business', 7), ('experience', 5), ('understand', 4), ('algorithms', 4), ('NLP', 4), ('Schwab', 3), ('across', 3), ('apply', 3), ('partners', 3), ('including', 3), ('Dialogflow', 3), ('models', 3), ('years', 3), ('building', 3), ('Knowledge', 3), ('make', 2), ('innovative', 2), ('creative', 2), ('using', 2)]
We plot the top twenty most frequent words yet again.
unique_word=[];
word_count=[];
for i in range(len(Content_words[1][0:20])):
    unique_word.append(Content_words[1][i][0]);
    word_count.append(Content_words[1][i][1]);
plt.bar(unique_word,word_count);
plt.title('Charles Schwab - Data Scientist - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
Next up is Stryker - Data Scientist - Sales Operations (Remote).
# Top twenty words for Stryker - Data Scientist - Sales Operations (Remote)
print(Content_words[2][0:20]);
[('data', 10), ('information', 5), ('Stryker', 4), ('based', 4), ('science', 4), ('operations', 4), ('experience', 4), ('one', 3), ('healthcare', 3), ('programs', 3), ('Data', 3), ('improve', 3), ('requirements', 3), ('work', 3), ('opportunities', 3), ('business', 3), ('Azure', 3), ('Power', 3), ('BI', 3), ("'s", 3)]
As the known procedure goes.
unique_word=[];
word_count=[];
for i in range(len(Content_words[2][0:20])):
    unique_word.append(Content_words[2][i][0]);
    word_count.append(Content_words[2][i][1]);
plt.bar(unique_word,word_count);
plt.title('Stryker - Data Scientist - Sales Operations (Remote) - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
Following up: AEGIS Hedging - Python/R Quantitative Developer.
# Top twenty words for AEGIS Hedging - Python/R Quantitative Developer
print(Content_words[3][0:20]);
[('data', 13), ('financial', 9), ('models', 6), ('AEGIS', 5), ('modeling', 5), ('analysis', 5), ('solutions', 4), ('tools', 4), ('Python/R', 4), ('Experience', 4), ('Hedging', 3), ('employees', 3), ('experience', 3), ('team', 3), ('pipelines', 3), ('maintain', 3), ('software', 3), ('related', 3), ('including', 3), ('time', 3)]
Following the known procedure.
unique_word=[];
word_count=[];
for i in range(len(Content_words[3][0:20])):
    unique_word.append(Content_words[3][i][0]);
    word_count.append(Content_words[3][i][1]);
plt.bar(unique_word,word_count);
plt.title('AEGIS Hedging - Python/R Quantitative Developer - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
Subsequently, Aditi Consulting - Machine Learning Engineer.
# Top twenty words for Aditi Consulting - Machine Learning Engineer
print(Content_words[4][0:20]);
[('data', 7), ('skills', 6), ('team', 4), ('pipelines', 4), ('work', 4), ('large', 3), ('SQL', 3), ('Excellent', 3), ('problem-solving', 3), ('critical', 3), ('thinking', 3), ('Ability', 3), ('well', 3), ('environment', 3), ('effectively', 3), ('communicate', 3), ('technical', 3), ('concepts', 3), ('non-technical', 3), ('stakeholders', 3)]
Proceeding on with the known procedure.
unique_word=[];
word_count=[];
for i in range(len(Content_words[4][0:20])):
    unique_word.append(Content_words[4][i][0]);
    word_count.append(Content_words[4][i][1]);
plt.bar(unique_word,word_count);
plt.title('Aditi Consulting - Machine Learning Engineer - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
Next, Gartner - Data Scientist.
# Top twenty words for Gartner - Data Scientist
print(Content_words[5][0:20]);
[('data', 10), ('science', 7), ('e.g.', 6), ('product', 5), ('business', 5), ('Gartner', 5), ('products', 4), ('may', 4), ('learning', 4), ('opportunities', 4), ('development', 4), ('place', 4), ('status', 4), ('join', 3), ('organization', 3), ('work', 3), ('new', 3), ('design', 3), ('client', 3), ('great', 3)]
As ever, the known procedure.
unique_word=[];
word_count=[];
for i in range(len(Content_words[5][0:20])):
    unique_word.append(Content_words[5][i][0]);
    word_count.append(Content_words[5][i][1]);
plt.bar(unique_word,word_count);
plt.title('Gartner - Data Scientist - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
Once more into the breach: Concurrency Inc - Data Scientist.
# Top twenty words for Concurrency Inc - Data Scientist
print(Content_words[6][0:20]);
[('data', 12), ('Data', 6), ('machine', 5), ('learning', 5), ('technical', 4), ('Learning', 4), ('development', 4), ('solutions', 3), ('customer', 3), ('needs', 3), ("'ll", 3), ('Machine', 3), ('science', 3), ('models', 3), ('work', 3), ('change', 2), ('inspired', 2), ('technology', 2), ('team', 2), ('status', 2)]
And yet again, a known procedure.
unique_word=[];
word_count=[];
for i in range(len(Content_words[6][0:20])):
    unique_word.append(Content_words[6][i][0]);
    word_count.append(Content_words[6][i][1]);
plt.bar(unique_word,word_count);
plt.title('Concurrency Inc - Data Scientist - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
Twice into the breach, Quantlab Group - Quantitative Developer.
# Top twenty words for Quantlab Group - Quantitative Developer
print(Content_words[7][0:20]);
[('Quantlab', 9), ('work', 5), ('trading', 4), ('written', 3), ('including', 3), ('paid', 3), ('resumes', 3), ('search', 3), ('firms', 3), ('team', 2), ('creating', 2), ('tools', 2), ('daily', 2), ('candidate', 2), ('Houston', 2), ('systems', 2), ('technical', 2), ('science', 2), ('math', 2), ('development', 2)]
And yet again, with a known procedure of data visualization.
unique_word=[];
word_count=[];
for i in range(len(Content_words[7][0:20])):
    unique_word.append(Content_words[7][i][0]);
    word_count.append(Content_words[7][i][1]);
plt.bar(unique_word,word_count);
plt.title('Quantlab Group - Quantitative Developer - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
Thrice into the breach, Tata Consultancy Services - Data Scientist.
# Top twenty words for Tata Consultancy Services - Data Scientist
print(Content_words[8][0:20]);
[('analysis', 4), ('data', 4), ('Job', 3), ('Neo4j', 3), ('fraud', 3), ('graph', 3), ('TCS', 2), ('Tata', 2), ('ETL', 2), ('procedures', 2), ('SQL', 2), ('SAS', 2), ('R', 2), ('science', 2), ('using', 2), ('results', 2), ('Show', 2), ('Consultancy', 1), ('Services', 1), ('Indian', 1)]
Onwards with the known procedure.
unique_word=[];
word_count=[];
for i in range(len(Content_words[8][0:20])):
    unique_word.append(Content_words[8][i][0]);
    word_count.append(Content_words[8][i][1]);
plt.bar(unique_word,word_count);
plt.title('Tata Consultancy Services - Data Scientist - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
Okay, I am out of ideas on what to say now, so proceeding on to Google - AI Consultant, Google Cloud.
# Top twenty words for Google - AI Consultant, Google Cloud
print(Content_words[9][0:20]);
[('USA', 12), ('technical', 9), ('Google', 8), ('customers', 6), ('CA', 5), ('client', 5), ('manage', 5), ('solutions', 5), ('customer', 5), ('role', 4), ('also', 4), ('location', 4), ('Cloud', 4), ('business', 4), ('technology', 4), ('cloud', 4), ('benefits', 4), ('best', 4), ('salary', 4), ('range', 4)]
The known procedure was implemented.
unique_word=[];
word_count=[];
for i in range(len(Content_words[9][0:20])):
    unique_word.append(Content_words[9][i][0]);
    word_count.append(Content_words[9][i][1]);
plt.bar(unique_word,word_count);
plt.title('Google - AI Consultant, Google Cloud - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
Next Balyasny Asset Management L.P. - Investment Data Analyst - Equities.
# Top twenty words for Balyasny Asset Management L.P. - Investment Data Analyst - Equities
print(Content_words[10][0:20]);
[('data', 15), ('Data', 5), ('working', 5), ('experience', 5), ('team', 3), ('content', 3), ('across', 2), ('BAM', 2), ('serve', 2), ('quantitative', 2), ('assist', 2), ('new', 2), ('datasets', 2), ('requirements', 2), ('ownership', 2), ('onboarding', 2), ('closely', 2), ('related', 2), ('years', 2), ('and/or', 2)]
A known procedure was implemented for the millionth time.
unique_word=[];
word_count=[];
for i in range(len(Content_words[10][0:20])):
    unique_word.append(Content_words[10][i][0]);
    word_count.append(Content_words[10][i][1]);
plt.bar(unique_word,word_count);
plt.title('Balyasny Asset Management L.P. - Investment Data Analyst - Equities - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
Another AI job: Biamp - AI Opportunities Analyst.
# Top twenty words for Biamp - AI Opportunities Analyst
print(Content_words[11][0:20]);
[('AI', 12), ('Biamp', 10), ('business', 7), ('role', 5), ('work', 4), ('solutions', 4), ('people', 4), ('audiovisual', 3), ('stakeholders', 3), ('tools', 3), ('related', 3), ('team', 3), ('great', 3), ('believe', 3), ('Opportunities', 2), ('Analyst', 2), ('ways', 2), ('leverage', 2), ('technology', 2), ('improve', 2)]
And yet again, another implementation of the known procedure.
unique_word=[];
word_count=[];
for i in range(len(Content_words[11][0:20])):
    unique_word.append(Content_words[11][i][0]);
    word_count.append(Content_words[11][i][1]);
plt.bar(unique_word,word_count);
plt.title('Biamp - AI Opportunities Analyst - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
Another financial job that pertains to Data Science: Amherst - Financial Data Scientist (note: there were two previous ones, Citi and AEGIS Hedging).
# Top twenty words for Amherst - Financial Data Scientist
print(Content_words[12][0:20]);
[('business', 4), ('modeling', 3), ('team', 3), ('teams', 3), ('analytics', 2), ('research', 2), ('project', 2), ('strong', 2), ('learn', 2), ('new', 2), ('quantitative', 2), ('skills', 2), ('experience', 2), ('real', 2), ('estate', 2), ('Show', 2), ('Responsibilities', 1), ('Support', 1), ('production', 1), ('supportMaintain', 1)]
Despite a unique job listing, the known procedure is neither unique nor rare.
unique_word=[];
word_count=[];
for i in range(len(Content_words[12][0:20])):
    unique_word.append(Content_words[12][i][0]);
    word_count.append(Content_words[12][i][1]);
plt.bar(unique_word,word_count);
plt.title('Amherst - Financial Data Scientist - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
A fourth job in finance related to Data Science: Texas Capital Bank - Data Analyst. Note that this post was remarkably wordy, as it contained every single buzzword in the book, and it also revealed the bank's need for a whole data science team, not just a single data analyst. Why?
# Top twenty words for Texas Capital Bank - Data Analyst
print(Content_words[13][0:20]);
[('data', 13), ('experience', 10), ('skills', 9), ('management', 8), ('analysis', 5), ('information', 4), ('requirements', 4), ('team', 4), ('Data', 3), ('business', 3), ('using', 3), ('analyze', 3), ('statistical', 3), ('including', 3), ('knowledge', 3), ('work', 3), ('time', 3), ('systems', 2), ('include', 2), ('insight', 2)]
As much confusion as this job listing stirred in me, my unease was resolved by the fact that the known procedure yet again had to be implemented in the following code block. All was made right...
unique_word=[];
word_count=[];
for i in range(len(Content_words[13][0:20])):
    unique_word.append(Content_words[13][i][0]);
    word_count.append(Content_words[13][i][1]);
plt.bar(unique_word,word_count);
plt.title('Texas Capital Bank - Data Analyst - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
Next is Abbott - Data Scientist; Abbott is a global medical devices and healthcare company.
# Top twenty words for Abbott - Data Scientist
print(Content_words[14][0:20]);
[('data', 8), ('Abbott', 6), ('company', 6), ('people', 5), ('work', 5), ('analysis', 5), ('experience', 5), ('medical', 4), ('career', 4), ('healthcare', 3), ('health', 3), ('retirement', 3), ('business', 3), ('etc', 3), ('eg', 3), ('programs', 3), ('global', 2), ('leader', 2), ('live', 2), ('fully', 2)]
Another unique posting, but the same well-known procedure.
unique_word=[];
word_count=[];
for i in range(len(Content_words[14][0:20])):
    unique_word.append(Content_words[14][i][0]);
    word_count.append(Content_words[14][i][1]);
plt.bar(unique_word,word_count);
plt.title('Abbott - Data Scientist - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
It looks like an academic research job, but it actually pays well: Medpace - Statistical Analyst - Experienced.
# Top twenty words for Medpace - Statistical Analyst - Experienced
print(Content_words[15][0:20]);
[('Medpace', 5), ('clinical', 4), ('work', 4), ('statistical', 3), ('analysis', 3), ('study', 3), ('development', 3), ('local', 3), ('expertise', 3), ('across', 3), ('30', 3), ('programs', 2), ('methods', 2), ('review', 2), ('data', 2), ('key', 2), ('Knowledge', 2), ('pharmaceutical', 2), ('CRO', 2), ('medical', 2)]
A known procedure was executed.
unique_word=[];
word_count=[];
for i in range(len(Content_words[15][0:20])):
    unique_word.append(Content_words[15][i][0]);
    word_count.append(Content_words[15][i][1]);
plt.bar(unique_word,word_count);
plt.title('Medpace - Statistical Analyst - Experienced - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
I don't know anything about this company other than that it is an IT firm: Strategic Staffing Solutions - Data Scientist (Remote). And a huge plus: it is remote.
# Top twenty words for Strategic Staffing Solutions - Data Scientist (Remote)
print(Content_words[16][0:20]);
[('Data', 6), ('S3', 5), ('Expert', 5), ('business', 4), ('Insurance', 4), ('Scientist', 3), ('Science', 3), ('SQL', 3), ('MS', 3), ('–', 3), ('Intermediate', 3), ('!', 2), ('Corp', 2), ('Remote', 2), ('#', 2), ('Develops', 2), ('complex', 2), ('models', 2), ('and/or', 2), ('customers', 2)]
The known procedure was successfully executed. Pay incremented by a single dollar.
unique_word=[];
word_count=[];
for i in range(len(Content_words[16][0:20])):
    unique_word.append(Content_words[16][i][0]);
    word_count.append(Content_words[16][i][1]);
plt.bar(unique_word,word_count);
plt.title('Strategic Staffing Solutions - Data Scientist (Remote) - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
Another financial service firm needing a Data scientist: SoFi - Staff Data Scientist - Machine Learning.
# Top twenty words for SoFi - Staff Data Scientist - Machine Learning
print(Content_words[17][0:20]);
[('learning', 10), ('machine', 8), ('Risk', 6), ('models', 6), ('work', 5), ('support', 5), ('business', 5), ('SoFi', 4), ('Engineering', 4), ('etc', 4), ('complex', 4), ('skills', 4), ('financial', 3), ('Data', 3), ('various', 3), ('credit', 3), ('closely', 3), ('Product', 3), ('teams', 3), ('solve', 3)]
The known procedure was successfully executed by the Python interpreter; no errors appeared in the console. Credit score incremented by 1.
unique_word=[];
word_count=[];
for i in range(len(Content_words[17][0:20])):
    unique_word.append(Content_words[17][i][0]);
    word_count.append(Content_words[17][i][1]);
plt.bar(unique_word,word_count);
plt.title('SoFi - Staff Data Scientist - Machine Learning - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
Another machine learning developer/research role: incedo - Machine Learning Engineer.
# Top twenty words for incedo - Machine Learning Engineer
print(Content_words[18][0:20]);
[('work', 3), ('experience', 3), ('knowledge', 3), ('thinking', 3), ('across', 2), ('Services', 2), ('developing', 2), ('client', 2), ('skills', 2), ('data', 2), ('systems', 2), ('business', 2), ('distributed', 2), ('libraries', 2), ('machine', 2), ('learning', 2), ('Python', 2), ('creative', 2), ('problem', 2), ('solving', 2)]
The known procedure, implemented again.
unique_word=[];
word_count=[];
for i in range(len(Content_words[18][0:20])):
    unique_word.append(Content_words[18][i][0]);
    word_count.append(Content_words[18][i][1]);
plt.bar(unique_word,word_count);
plt.title('incedo - Machine Learning Engineer - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
Not sure why a custom software development firm needs a data scientist, but here is the job listing: AE Studio - Data Scientist.
# Top twenty words for AE Studio - Data Scientist
print(Content_words[19][0:20]);
[('equity', 11), ('AE', 9), ('!', 9), ('projects', 8), ('$', 8), ('?', 7), ('work', 6), ('agency', 5), ('...', 5), ('Skunkworks', 5), ('human', 4), ('may', 4), ('data', 4), ('one', 4), ('client', 4), ('value', 4), ('receive', 4), ('free', 4), ('“', 3), ('”', 3)]
The known procedure was successfully implemented for the last time, before being cleared from the PC's RAM (random access memory).
unique_word=[];
word_count=[];
for i in range(len(Content_words[19][0:20])):
    unique_word.append(Content_words[19][i][0]);
    word_count.append(Content_words[19][i][1]);
plt.bar(unique_word,word_count);
plt.title('AE Studio - Data Scientist - Job post');
plt.xlabel("Top Twenty Content Words");
plt.ylabel("Frequency");
plt.xticks(rotation=45,ha='right');
plt.show();
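A natural follow-up across all twenty listings would be to merge the per-listing counts into one overall ranking with collections.Counter; this is a sketch in which the two small lists below stand in for entries of Content_words, not the scraped data.

```python
from collections import Counter

listings = [
    [('data', 14), ('business', 7)],            # stand-in for one listing's counts
    [('data', 10), ('business', 3), ('sql', 3)],  # stand-in for another listing
]
total = Counter()
for pairs in listings:
    total.update(dict(pairs))  # add this listing's counts into the running total
print(total.most_common(3))  # [('data', 24), ('business', 10), ('sql', 3)]
```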
We might revisit this Jupyter notebook in the near future to add further sentiment analysis, but for now it has come to an end. We hope this Jupyter notebook was useful in some way.
Citations:
- [1]: Richardson, Leonard. Beautiful Soup Python Package. Crummy.com, 2023. URL: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
- [2]: Bird, Steven; Loper, Edward; Klein, Ewan. Natural Language Processing with Python. NLTK Project, 2009. URL: https://www.nltk.org/