Developing a Search Tool for Acfun

When the monkey took away Acfun's built-in search function, I decided to build this tool. After two hours of thinking and pacing around my room, I felt I knew what to do and got to work.

I spent three days on the current half-finished Acfun Search Tool and wrote it in Python, a language I had never used before. I was free to pick any scripting language, since I know nothing about any technology newer than Java anyway. In the end I chose Python because:

  1. It worked immediately after installation, with no additional configuration.
  2. A program file runs on a double-click, like VBScript, instead of typing “python yourname.py<enter>” at a command line. That matters to me because it is so simple, haha.
  3. The real reason: Google App Engine supports it. I originally meant to host the tool there and benefit from the powerful Google, but I soon found that GAE only allows data reads and writes of up to 1M, so I moved the tool to other free hosting.

At the very start, I planned to fetch all the data from acfun.cn, keep it up to date, and build an RSS feed from it so that others could reuse the data in their own projects, but I decided to finish the search function first. I spent two days grabbing data from Acfun, modifying my crawler several times to cope with irregular but frequently occurring data, and changing the storage format from a CSV file to an SQLite database (I had picked CSV with GAE in mind). When Acfun got busy at prime time, I had to pause the crawler.
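The move from CSV to SQLite, together with the later need to keep the data up to date, could look roughly like the sketch below. It is not the original crawler: the table name `videos`, its columns, the helper names `open_store` and `save_entries`, and the sample rows are all assumptions made for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the SQLite database that holds crawled entries.

    The `videos` table and its columns are hypothetical.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS videos ("
        " id INTEGER PRIMARY KEY,"  # assumed: a numeric id per entry
        " title TEXT NOT NULL,"
        " url TEXT NOT NULL)"
    )
    return conn


def save_entries(conn, entries):
    """Insert new rows and overwrite rows whose id was stored before."""
    conn.executemany(
        "INSERT OR REPLACE INTO videos (id, title, url) VALUES (?, ?, ?)",
        entries,
    )
    conn.commit()


# Example run with made-up rows: the second call updates entry 1 in place.
store = open_store()
save_entries(store, [(1, "first title", "/v/1"), (2, "second title", "/v/2")])
save_entries(store, [(1, "first title (edited)", "/v/1")])
```

Because `id` is the primary key, re-crawling a page replaces the old row instead of appending a duplicate, which is awkward to do with a flat CSV file.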

Once the crawler had accomplished its mission, I reworked it into an updater that keeps the data fresh. I had looked at Python web frameworks such as Django, but I decided to use plain Python CGI scripts, because they are much easier for someone like me who knows nothing about frameworks, MVC or MVT. Because I had to test and debug my scripts directly on the remote host, the workload grew geometrically. It works now, although with hardly any user interface. As long as it is usable, that is enough.
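For readers who have not seen a Python CGI script, here is a minimal sketch of how a search page of this kind might be put together. It is an illustration only, not the script behind the tool: the `videos` table, the `q` query parameter, the `acfun.db` file name and the function names are assumptions.

```python
import html
import os
import sqlite3
import sys
from urllib.parse import parse_qs

DB_PATH = "acfun.db"  # hypothetical database file produced by the crawler


def build_page(query_string, conn):
    """Turn a raw query string such as 'q=keyword' into a full CGI response."""
    keyword = parse_qs(query_string).get("q", [""])[0].strip()
    rows = []
    if keyword:
        # Parameter binding keeps the keyword out of the SQL text itself.
        rows = conn.execute(
            "SELECT title, url FROM videos WHERE title LIKE ? ORDER BY id",
            ("%" + keyword + "%",),
        ).fetchall()
    items = "".join(
        '<li><a href="{}">{}</a></li>'.format(
            html.escape(url, quote=True), html.escape(title)
        )
        for title, url in rows
    )
    body = "<html><body><ul>{}</ul></body></html>".format(items)
    # A CGI response is the headers, one blank line, then the body.
    return "Content-Type: text/html; charset=utf-8\r\n\r\n" + body


def main():
    conn = sqlite3.connect(DB_PATH)
    page = build_page(os.environ.get("QUERY_STRING", ""), conn)
    sys.stdout.buffer.write(page.encode("utf-8"))
    conn.close()


# Web servers set GATEWAY_INTERFACE for CGI programs, so main() only runs
# when the file is actually invoked as a CGI script.
if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:
    main()
```

Keeping the page construction in `build_page` means most of the logic can be tried locally with an in-memory database before uploading, which cuts down on debugging directly on the remote host.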

I have announced the tool on the Acfun Tieba, hoping that it helps people and eases some of the load on the Acfun server. Now I have to go back to my revision. Visit http://illustrate.heliohost.org/ac.htm to use the tool.
