I was broken.

Mom and Dad had a fight.

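Looking back on these past few days at home, the heavy atmosphere was dizzying. Silence and chill replaced the laughter of the past. In the lonely night a storm was quietly gathering strength, and the autumn wind stirred up so much sorrow. In the gloom, one shouted curse shattered the last of the calm, and what poured out without restraint was a rage and despair that nothing could fill. Once a devoted couple, now bitter enemies. Like a black-and-white film, the ragged, uneven frames play on in fits and starts, creaking and creaking, until at last a swan song, and a touch of desolation.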

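There is no tragedy in this world more chest-beating, more helpless than this. Dad is the palm, Mom is the back of the hand; palm and back alike, blood is thicker than water. With parents, you can neither hate them nor scold them. No amount of strength helps, and no amount of effort will come to anything. My heart is broken, my scalp is numb, my limbs are weak; I'm already wounded inside.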

God, I can no longer see the road ahead.

“Do not go gentle into that good night. Rage, rage against the dying of the light.” – Dylan Thomas

A Knockoff delicious Recommendation System

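Things have been a bit dull lately, so I followed Programming Collective Intelligence and wrote a small delicious recommendation model (code), treating it as a hello, world! for relearning python :p Whenever I'm bored I can run it and see what's worth reading, HoHo~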

A sample run:

>>> from delicious import *
>>> table=initializeUserDict('programming')
>>> table['myself']={}
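>>> fillItems(table)  # given network conditions in China, go make a coffee first...
>>> from recommendations import *
# recommend n links for the user username
>>> getRecommendations(table,"username")[0:n]
# in each result, the first value is the recommendation score and the second is the link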
[(0.61988282180389409, 'http://html5boilerplate.com/mobile/')]
...

deliciousapi

deliciousapi is an unofficial API for pulling data from delicious. Its interface is very clean, and the author's blog has demos, so I won't go into it here.

initializeUserDict and fillItems

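initializeUserDict(tag, count) first calls deliciousapi.DeliciousAPI().get_urls(tag=tag) to get the first count links under a tag, then calls .get_url(url) on each link to fetch its related information, e.g. bookmarks and tags. Finally it collects the users behind those bookmarks into the dictionary user_dict and returns it. The code: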

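import deliciousapi

def initializeUserDict(tag, count=1):
   user_dict={}
   dapi = deliciousapi.DeliciousAPI()

   # take the first count links under the tag
   for url in dapi.get_urls(tag=tag)[0:count]:
      # item[0] is the user who saved this bookmark
      for item in dapi.get_url(url).bookmarks:
         user_dict[item[0]]={}
   return user_dict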

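fillItems(user_dict) walks the dictionary user_dict and calls .get_user(user) to fetch each user's bookmarks. If a user has bookmarked a url, user_dict[user][url] is set to 1.0; every other url seen during the crawl is set to 0.0 for that user. The code: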

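import time

def fillItems(user_dict):
   all_items={}
   dapi = deliciousapi.DeliciousAPI()

   for user in user_dict:
      posts=None
      # try each user up to 3 times, pausing between attempts
      for i in range(3):
         try:
            posts=dapi.get_user(user)
            break
         except Exception:
            print("Fail user "+user+", retrying")
            time.sleep(4)
      # skip users whose bookmarks could not be fetched
      if posts is None: continue
      for item in posts.bookmarks:
         url=item[0]
         user_dict[user][url]=1.0
         all_items[url]=1

   # fill in 0.0 for every url a user has not bookmarked
   for ratings in user_dict.values():
      for item in all_items:
         if item not in ratings:
            ratings[item]=0.0
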
After the two steps initializeUserDict() and fillItems(), the resulting user_dict looks like this:

{'username': {'http://url': 0.0, 'http://lru': 1.0},
 'nameuser': {'http://url': 0.0, 'http://lru': 1.0},
  ...
}
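
To see that shape without touching the network, here is a minimal sketch that applies the same zero-filling step to hand-written data. The bookmarks_by_user dict and the build_table helper are made up for illustration; neither is part of deliciousapi or of the code above.

```python
def build_table(bookmarks_by_user):
    """Turn user -> list of urls into user -> {url: 1.0 or 0.0}."""
    all_items = set()
    table = {}
    # mark every url a user has bookmarked with 1.0
    for user, urls in bookmarks_by_user.items():
        table[user] = {}
        for url in urls:
            table[user][url] = 1.0
            all_items.add(url)
    # every url the user has not bookmarked gets 0.0
    for ratings in table.values():
        for url in all_items:
            ratings.setdefault(url, 0.0)
    return table

# hypothetical sample data standing in for what delicious would return
bookmarks_by_user = {
    'username': ['http://lru'],
    'nameuser': ['http://url', 'http://lru'],
}
table = build_table(bookmarks_by_user)
```

Every user ends up with an entry for every url, which is why the table grows so quickly once real data comes in.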

getRecommendations

Now for the core recommendation algorithm, which is actually quite simple:

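from math import sqrt

# returns a distance-based similarity score for person1 and person2
def sim_distance(prefs, person1, person2):
   # collect the items both people have
   si={}
   for item in prefs[person1]:
      if item in prefs[person2]:
         si[item]=1

   # nothing in common: similarity is 0
   if len(si)==0: return 0

   # sum the squared differences over the shared items
   sum_of_squares=sum([pow(prefs[person1][item]-prefs[person2][item], 2)
                     for item in si])

   return 1/(1+sqrt(sum_of_squares))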

# gets recommendations for a person by using a weighted average
# of every other user's rankings
def getRecommendations(prefs, person, similarity=sim_distance):
   totals={}
   simSums={}

   for other in prefs:
      if other==person: continue
      sim=similarity(prefs,person,other)

      if sim<=0: continue

      for item in prefs[other]:
         # only score items this person hasn't seen yet
         if item not in prefs[person] or prefs[person][item]==0:
            #similarity * score
            totals.setdefault(item,0)
            totals[item]+=prefs[other][item]*sim
            # sum of similarities
            simSums.setdefault(item,0)
            simSums[item]+=sim

   # create the normalized list
   rankings=[(total/simSums[item],item) for item,total in totals.items()]

   # return the sorted list
   rankings.sort()
   rankings.reverse()
   return rankings

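The person parameter is the user to recommend for, and the similarity parameter is the function used to measure how correlated two users are. It defaults to sim_distance, which is really just a score based on Euclidean distance.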
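
A small worked example may make the weighting easier to follow. The sketch below repeats the two functions in condensed form so it runs on its own, and feeds them a made-up three-user table; the users 'me', 'u1', 'u2' and the urls are invented for illustration.

```python
from math import sqrt

def sim_distance(prefs, person1, person2):
    # similarity from Euclidean distance over the items both users have
    shared = [item for item in prefs[person1] if item in prefs[person2]]
    if not shared:
        return 0
    sum_of_squares = sum((prefs[person1][item] - prefs[person2][item]) ** 2
                         for item in shared)
    return 1 / (1 + sqrt(sum_of_squares))

def getRecommendations(prefs, person, similarity=sim_distance):
    totals, simSums = {}, {}
    for other in prefs:
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        if sim <= 0:
            continue
        for item in prefs[other]:
            # only score items this person hasn't bookmarked
            if item not in prefs[person] or prefs[person][item] == 0:
                totals[item] = totals.get(item, 0) + prefs[other][item] * sim
                simSums[item] = simSums.get(item, 0) + sim
    # weighted average per item, highest score first
    return sorted(((total / simSums[item], item)
                   for item, total in totals.items()), reverse=True)

# made-up toy data: 1.0 = bookmarked, 0.0 = not bookmarked
prefs = {
    'me': {'http://a': 1.0, 'http://b': 0.0, 'http://c': 0.0},
    'u1': {'http://a': 1.0, 'http://b': 1.0, 'http://c': 0.0},
    'u2': {'http://a': 0.0, 'http://b': 1.0, 'http://c': 1.0},
}
rankings = getRecommendations(prefs, 'me')
```

Here 'u1' differs from 'me' on one url, so their similarity is 1/(1+1) = 0.5, while 'u2' differs on all three, giving 1/(1+sqrt(3)), roughly 0.366. Both neighbours bookmarked http://b, so it scores 1.0; only 'u2' bookmarked http://c, so it scores about 0.366/0.866, roughly 0.42.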

Conclusion and next steps

The model is very simple; when I have time I can refine the algorithm. (Imagine how a C programmer would agonize over a sparse matrix this huge :( )
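
One possible direction for that cleanup, sketched under the assumption that ratings stay binary (1.0 for bookmarked, 0.0 otherwise): keep only each user's set of bookmarked urls and skip the zero-filling entirely. With 0/1 ratings, the sum of squared differences between two users is just the number of urls that exactly one of them bookmarked, i.e. the size of the symmetric difference. The names sim_distance_sparse and bookmarks are made up for this sketch.

```python
from math import sqrt

def sim_distance_sparse(bookmarks, person1, person2):
    # urls bookmarked by exactly one of the two users; with 0/1 ratings
    # each such url contributes 1 to the sum of squared differences
    differing = len(bookmarks[person1] ^ bookmarks[person2])
    return 1 / (1 + sqrt(differing))

# hypothetical data: each user maps to the set of urls they bookmarked
bookmarks = {
    'me': {'http://a'},
    'u1': {'http://a', 'http://b'},
    'u2': {'http://b', 'http://c'},
}
sim_u1 = sim_distance_sparse(bookmarks, 'me', 'u1')
sim_u2 = sim_distance_sparse(bookmarks, 'me', 'u2')
```

Storage then grows with the number of actual bookmarks rather than users times urls, at the cost of only supporting yes/no ratings.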

A Small Plan

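I've been in bad shape lately. For some reason I'm always restless and can't put my mind to reading. The days fly by, and time is gone in the blink of an eye. I know perfectly well it's anxiety, yet I'm still distracted and can't settle down.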

I plan to spend the time left before graduation on campus. To make the most of it, here's a small plan:

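  • Study the Linux kernel, mainly the network protocol stack.
  • Finish reading the lighttpd source, then read some other code.
  • Write an httpd.
  • Get properly started with python.
  • boost? design patterns?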

To be a bit more SMART about it, I'll blog more and work harder.

Finally, relax more. These are the last few days of freedom I have left, sigh~.