2017-10-14T15:30:23+00:00

Some articles about building a web-based business

October 14, 2017/4dimensioncc

https://levels.io/how-i-build-my-minimum-viable-products/ : introduces a lot of useful tools.

https://levels.io/go-fucking-do-it/

https://levels.io/how-i-built-a-remote-jobs-board/

https://levels.io/product-hunt-hacker-news-number-one/

https://levels.io/run-through-ideas-quickly/

How to change the default path for Mac screenshots

October 14, 2017/4dimensioncc

In Terminal, type:

defaults write com.apple.screencapture location <desired dir>

Then type:

killall SystemUIServer

From now on, screenshots will be saved to the new location by default.
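
For anyone who prefers to script this, here is a minimal Python sketch that builds the same two commands. The function names and the example folder are made up for illustration, and the commands are only run when the script is on macOS.

```python
import os
import subprocess
import sys


def screenshot_commands(directory):
    """Build the two Terminal commands from the steps above for a folder."""
    # Expand "~" and turn the folder into an absolute path.
    target = os.path.abspath(os.path.expanduser(directory))
    return [
        ["defaults", "write", "com.apple.screencapture", "location", target],
        ["killall", "SystemUIServer"],
    ]


def set_screenshot_dir(directory):
    """Run the commands; this only makes sense on macOS."""
    if sys.platform != "darwin":
        raise RuntimeError("this sketch only applies to macOS")
    for command in screenshot_commands(directory):
        subprocess.run(command, check=True)


# Hypothetical folder, used here only to show the generated commands.
for command in screenshot_commands("~/Pictures/Screenshots"):
    print(" ".join(command))
```

Calling set_screenshot_dir with a folder of your choice would apply the change. It is probably safest if that folder already exists.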

Data scientist – series 1: Intro to Python

September 30, 2017/4dimensioncc

Udacity course: https://classroom.udacity.com/courses/ud1110/lessons/

Notes:

String formatting: One particularly useful string method is format. It constructs strings by inserting values into template strings. Consider this example, which generates log messages for a hypothetical web server.
```
log_message = "IP address {} accessed {} at {}".format(user_ip, url, now)
```
If the variables user_ip, url, and now are defined, they will be substituted for the {} placeholders in order.
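
To try the example out, the three variables need values first. Here is a small runnable sketch: the IP address, URL, and timestamp are made-up sample values, and the named-placeholder variant at the end is an extra that the course note above does not cover.

```python
from datetime import datetime

# Made-up sample values, only for illustration.
user_ip = "203.0.113.7"
url = "/index.html"
now = datetime(2017, 9, 30, 12, 0, 0)

# Each {} is filled, in order, by the arguments passed to format().
log_message = "IP address {} accessed {} at {}".format(user_ip, url, now)
print(log_message)

# Placeholders can also be named, so the order of the arguments no longer matters.
named_message = "IP address {ip} accessed {page}".format(page=url, ip=user_ip)
print(named_message)
```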

Web Crawler with Beautiful Soup

September 30, 2017/4dimensioncc

Today I played with Beautiful Soup to retrieve website data. The code is below:

import urllib2
from bs4 import BeautifulSoup as soup

quote_page = 'https://weworkremotely.com/'
page = urllib2.urlopen(quote_page)
read_soup = soup(page, "html.parser")

name_box_job = read_soup.find('span', attrs={'class': 'title'})
name_box_company = read_soup.find('span', attrs={'class':'company'})

job = name_box_job.text.strip()
company = name_box_company.text.strip()
print ("{} : {}".format(company, job))

The output is:

Citron Pharmaceutical : Clerical Customer Support

Basically, I am accessing weworkremotely.com to retrieve the first span with class “title” and the first span with class “company”.

But since there are many jobs posted on the website, I would like to retrieve all the posts and companies with the same class attributes, “title” and “company”.

So instead of read_soup.find, I should use read_soup.find_all, plus a for loop to collect all the items in a list.

import urllib2
from bs4 import BeautifulSoup as soup

quote_page = 'https://weworkremotely.com/'
page = urllib2.urlopen(quote_page)
read_soup = soup(page, "html.parser")
jobs = []
companys = []
name_box_job = read_soup.find_all('span', attrs={'class': 'title'})
name_box_company = read_soup.find_all('span', attrs={'class':'company'})

# Collect the text of every job title
for tag in name_box_job:
    jobs.append(tag.get_text())

# Collect the text of every company name
for tag in name_box_company:
    companys.append(tag.get_text())

print jobs, companys

However, the output format of this code is very ugly, so I need to work on improving it.
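
One possible improvement, as a rough sketch: pair each company with its job title and print one posting per line. It assumes the job titles and company names sit in two separate lists in the same order, which may not hold if a posting is missing one of the two spans. The function name format_listings and the second sample pair are made up for illustration. The sample lists stand in for the scraped data so the snippet runs without a network connection.

```python
def format_listings(companies, jobs):
    """Return one 'company : job' line per posting."""
    lines = []
    # zip pairs the items by position and stops at the shorter list.
    for company, job in zip(companies, jobs):
        lines.append("{} : {}".format(company.strip(), job.strip()))
    return lines


# The first pair is the output shown earlier; the second is a made-up
# placeholder with stray whitespace, as scraped text often has.
companies = ["Citron Pharmaceutical", "  Example Co  "]
jobs = ["Clerical Customer Support", "\nExample Job Title\n"]

for line in format_listings(companies, jobs):
    print(line)
```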

Reference websites:

http://web.stanford.edu/~zlotnick/TextAsData/Web_Scraping_with_Beautiful_Soup.html

http://altitudelabs.com/blog/web-scraping-with-python-and-beautiful-soup/

http://pwp.stevecassidy.net/dataweb/crawling.html

https://beautiful-soup-4.readthedocs.io/en/latest/

Word Cloud with Python

September 29, 2017/4dimensioncc

Today I played with the Python word cloud library. It is quite easy to use, and the output is interesting.

The word cloud library is here: https://github.com/amueller/word_cloud

And the author's blog: http://peekaboo-vision.blogspot.hk/2012/11/a-wordcloud-in-python.html

The code:

import numpy as np 
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
#%matplotlib inline
from PIL import Image

from subprocess import check_output
from wordcloud import WordCloud, STOPWORDS

#mpl.rcParams['figure.figsize']=(8.0,6.0) 
mpl.rcParams['font.size']=12                
mpl.rcParams['savefig.dpi']=100             
mpl.rcParams['figure.subplot.bottom']=.1

stopwords = set(STOPWORDS)
data = pd.read_csv('ted_main.csv')

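# Join all descriptions into one string; str(data['description']) would
# only pass pandas' abbreviated preview of the column.
wordcloud = WordCloud(
    background_color='black',
    stopwords=stopwords,
    max_words=200,
    max_font_size=40,
    random_state=42,
).generate(' '.join(data['description'].astype(str)))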

print(wordcloud)
fig = plt.figure(1)
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
fig.savefig("wordcloud.png", dpi=1200)

The code is very lightweight. I used the description column of a TED talks dataset (ted_main.csv) as the input, and below is the output word cloud:

The above is the basic version. I also tried a few more features, such as adding extra words to skip:

# Extra words to skip, merged into the stopword set passed to WordCloud
more_stopwords = {'talk', 'TED', 'Information', 'event'}
stopwords = STOPWORDS.union(more_stopwords)

and adding a mask so that the word cloud takes a different shape:

image = Image.open('mask2.png')
mask = np.array(image)

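# Join all descriptions into one string; str(data['description']) would
# only pass pandas' abbreviated preview of the column.
wordcloud = WordCloud(
    background_color='black',
    stopwords=stopwords,
    max_words=200,
    max_font_size=40,
    random_state=42,
    mask=mask,
).generate(' '.join(data['description'].astype(str)))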

Output as below:
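
A side note on the mask itself: it is just a NumPy array. As far as I understand the wordcloud documentation, pure white pixels (value 255) are treated as masked out, and words are drawn on the remaining pixels. Here is a minimal sketch that builds a circular mask directly with NumPy, for when no mask image is at hand. The function name circular_mask and the default sizes are made up for illustration.

```python
import numpy as np


def circular_mask(size=300, radius=130):
    """Return a square uint8 array: 0 inside the circle, 255 (white) outside."""
    # Open grids of row and column indices, broadcast against each other.
    y, x = np.ogrid[:size, :size]
    center = size // 2
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    # White (255) marks the area where no words should be drawn.
    return (255 * outside).astype(np.uint8)


mask = circular_mask()
print(mask.shape, mask.dtype)
```

The resulting array could be passed as mask=mask to WordCloud in place of the array loaded from mask2.png.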

This is a very interesting topic and I will keep expanding on the current result, so I am storing some resources here to try out later:

http://luisvalesilva.com/datasimple/word_clouds.html

https://happygostacie.wordpress.com/2016/04/22/word-clouds-in-python-what-a-pil/ (this one is interesting: the colors of the word cloud can be picked from the input mask)

http://minimaxir.com/2016/05/wordclouds/ (interesting post)

https://pypi.python.org/pypi/facebook_wordcloud/1.01b (word cloud library for Facebook chat history)

Python: The _imagingft C module is not installed when running wordcloud code

September 29, 2017/4dimensioncc

When I tried to run the wordcloud Python library today, I received the error: “The _imagingft C module is not installed”. The reason was that FreeType was not installed on my Mac. I tried a lot of methods, and finally the one below worked:

I already had Homebrew installed.

First,