Archive for the ‘Uncategorized’ category

Notes on using Nutch 1.0

November 23, 2009

Create a dmoz folder under the bin folder.
Create a text file named urls in it to hold the seed URLs:
http://www.britishhorrorfilms.co.uk/rillington.shtml
http://www.shoestring.org/mmi_revs/10-rillington-place.html
http://www.tvguide.com/movies/database/ShowMovie.asp?MI=22983
http://us.imdb.com/title/tt0066730/
http://www.geocities.com/aaronbcaldwell/1984.html
http://orwell.ru/a_life/movies/m84_01.htm
http://www.britmovie.co.uk/genres/fiction/filmography/014.html
http://adrianmco.batcave.net/1984.htm
http://us.imdb.com/title/tt0114746/
http://www.geocities.com/darkdaze18/
http://apolloguide.com/mov_revtemp.asp?Title=13th+Warrior,+The
http://www.boxofficemojo.com/13thwarrior.html
http://movie-reviews.colossus.net/movies/t/13th_warrior.html
http://ter.air0day.com/13thwarrior.shtml
http://www.metacritic.com/video/titles/13thwarrior
http://us.imdb.com/title/tt0120657/
http://www.all-reviews.com/videos/thirteenth-warrior.htm
http://www.haro-online.com/movies/13th_warrior.html
http://www.rottentomatoes.com/movie-1091574/
http://upcomingmovies.com/13thwarrior.html
http://www.brunching.com/selfmade/selfmade-thirteenthwarrior.html
http://www.filmtracks.com/titles/13th_warrior.html
http://www.100girls.net/
http://imdb.com/title/tt0214388/
http://us.imdb.com/title/tt0146394/
http://us.imdb.com/title/tt0085121/

tingan@tingan-laptop:~/download/spider/nutch/bin$ cd dmoz/
tingan@tingan-laptop:~/download/spider/nutch/bin/dmoz$ ls
urls
tingan@tingan-laptop:~/download/spider/nutch/bin/dmoz$ gedit urls
tingan@tingan-laptop:~/download/spider/nutch/bin/dmoz$ cd ..
tingan@tingan-laptop:~/download/spider/nutch/bin$ ./nutch inject crawl-20091123183328/crawldb/ dmoz/
Injector: starting
Injector: crawlDb: crawl-20091123183328/crawldb
Injector: urlDir: dmoz
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done

Setting up a Java environment on Ubuntu Linux 8.04

November 23, 2009

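1. First, install the JDK

  Java 6: on the command line, run apt-get install sun-java6-jre sun-java6-jdk (installing the JRE is optional)

  Java 5: on the command line, run apt-get install sun-java5-jre sun-java5-jdk

  Several JDKs can be installed side by side, and the default JDK can be changed at any time

  On the command line, run sudo update-alternatives --config javac to change the current default JDK

  sun-java5-doc and sun-java6-doc are installers for the JDK documentation, but they do not contain the documentation itself.

  Before installing the JDK documentation, you must download it from Sun's website. The download can be in any language, including Chinese. For example, to install the JDK 5 documentation, save the downloaded file as /tmp/jdk-1_5_0-doc.zip before installing; to install the JDK 6 documentation, save it as /tmp/jdk-6-doc.zip. You can also skip this step, because the installer prompts you for it; watch the messages in the terminal window.

  Set the environment variable: gedit /etc/environment

  In the editor, add JAVA_HOME=/usr/lib/jvm/java-6-sun

  There is no need to configure the classpath, and a wrong classpath causes many problems. The JVM knows its own built-in classpath. For a custom classpath, pass the -classpath option when compiling or running. Other environment variables can wait until a program you write actually needs them

  For the setting to take effect, run . /etc/environment on the command line

  If necessary, also adjust the priority order of the system's virtual machines: sudo gedit /etc/jvm

  Once the JDK is installed, run java -version in a terminal to check that the installation succeeded (or simply run java)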

  2. Install Tomcat

  Download the standalone (no-installer) archive from the official Tomcat website, unpack it, and run startup.sh in the bin directory

  Open http://localhost:8080 in a browser to check that Tomcat has started

  To stop Tomcat, run shutdown.sh in the bin directory

  To start Tomcat at boot, you can use the following method:

  Edit /etc/rc.local (gedit /etc/rc.local) and add

  JAVA_HOME=/usr/lib/jvm/java-6-sun

  CLASSPATH=.:/usr/lib/jvm/java-6-sun/lib

  JRE_HOME=/usr/lib/jvm/java-6-sun/jre

  export JRE_HOME

  export CLASSPATH

  export JAVA_HOME

  /home/allenwei/Tomcat/apache-tomcat-6.0.14/bin/startup.sh # the directory where you put Tomcat

  Edit /etc/profile (gedit /etc/profile) and add the following lines:

  JAVA_HOME=/usr/lib/jvm/java-6-sun

  CATALINA_HOME=/home/test/Tomcat/apache-tomcat-6.0.14

  export JAVA_HOME CATALINA_HOME

  After a reboot, Tomcat starts automatically at boot

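  3. Install MySQL

  First run sudo apt-get install mysql-server mysql-client

  When the installation finishes, run sudo /etc/init.d/mysql start to start MySQL

  At this point the root password needs to be changed

  To see the default password, run sudo gedit /etc/mysql/debian.cnf; the user name and password are in the [client] section

  Run mysql -uroot -p to log in; when prompted for a password, enter the one from debian.cnf

  To change the password, run GRANT ALL PRIVILEGES ON *.* TO root@localhost IDENTIFIED BY "your new password";

  MySQL is now installed

  You can also install MySQL's graphical administration tools: sudo apt-get install mysql-admin mysql-query-browser

  4. Installing Eclipse and NetBeans is simple: download the .deb packages from their official websites and install them.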

PHP URL query

November 21, 2009

parse_url

<?php
$url = 'http://username:password@hostname/path?arg=value#anchor';

print_r(parse_url($url));

echo parse_url($url, PHP_URL_PATH);
?>

The above example will output:

Array
(
    [scheme] => http
    [host] => hostname
    [user] => username
    [pass] => password
    [path] => /path
    [query] => arg=value
    [fragment] => anchor
)
http_build_query

<?php
$data = array('foo'=>'bar',
              'baz'=>'boom',
              'cow'=>'milk',
              'php'=>'hypertext processor');

echo http_build_query($data); // foo=bar&baz=boom&cow=milk&php=hypertext+processor
echo http_build_query($data, '', '&amp;'); // foo=bar&amp;baz=boom&amp;cow=milk&amp;php=hypertext+processor

?>

http_build_url

<?php
echo http_build_url("http://user@www.example.com/pub/index.php?a=b#files",
    array(
        "scheme" => "ftp",
        "host" => "ftp.example.com",
        "path" => "files/current/",
        "query" => "a=c"
    ),
    HTTP_URL_STRIP_AUTH | HTTP_URL_JOIN_PATH | HTTP_URL_JOIN_QUERY | HTTP_URL_STRIP_FRAGMENT
);
?>

The above example will output:

ftp://ftp.example.com/pub/files/current/?a=b&a=c
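
http_build_url() is provided by the pecl_http extension (version 1.x) rather than by PHP itself, so it may not be available. As a rough sketch using only core functions, a query parameter can be changed with parse_url(), parse_str() and http_build_query(). The helper name set_query_param is made up for illustration, and it deliberately ignores the user and password parts of the URL.

```php
<?php
// Hypothetical helper (not part of PHP): add or replace one query parameter.
// Assumption: user/password components are not needed and are dropped.
function set_query_param(string $url, string $key, string $value): string
{
    $parts = parse_url($url);

    // Turn "arg=value" into ['arg' => 'value'] so it can be modified.
    $query = [];
    if (isset($parts['query'])) {
        parse_str($parts['query'], $query);
    }
    $query[$key] = $value;

    // Reassemble the URL from its components.
    $result = '';
    if (isset($parts['scheme'])) { $result .= $parts['scheme'] . '://'; }
    if (isset($parts['host']))   { $result .= $parts['host']; }
    if (isset($parts['port']))   { $result .= ':' . $parts['port']; }
    $result .= $parts['path'] ?? '';
    $result .= '?' . http_build_query($query, '', '&');
    if (isset($parts['fragment'])) { $result .= '#' . $parts['fragment']; }

    return $result;
}

echo set_query_param('http://hostname/path?arg=value#anchor', 'page', '2'), "\n";
// prints http://hostname/path?arg=value&page=2#anchor
```

Passing '&' explicitly to http_build_query() keeps the output independent of the arg_separator.output setting.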

Using Doxygen

November 21, 2009

On Ubuntu:

sudo apt-get install doxygen doxygen-gui
Start the GUI program:
doxywizard

Doxygen formatting conventions

Last modified: October 27, 2009 – 19:12

Doxygen is a documentation generation system. The documentation is extracted directly from the sources, which makes it much easier to keep the documentation consistent with the source code.

There is an excellent Doxygen manual at the Doxygen site. The following notes pertain to the Drupal implementation of Doxygen.

General documentation syntax

To document a block of code, the syntax we use is:

/**
 * Documentation here.
 */

Doxygen will parse any comments located in such a block. Our style is to use as few Doxygen-specific commands as possible, so as to keep the source legible. Any mentions of functions or file names within the documentation will automatically link to the referenced code, so typically no markup need be introduced to produce links.

Text formatting (and API parser conformance)

Doxygen directives

/**
 * Summary here; one sentence on one line (even if it exceeds 80 chars).
 *
 * A more detailed description goes here.
 *
 * A blank line forms a paragraph. There should be no trailing white-space
 * anywhere.
 *
 * @param $first
 *   "@param" is a Doxygen directive to describe a function parameter. Like some
 *   other directives, it takes a term/summary on the same line, and a
 *   description (this text) indented by 2 spaces on the next line. All
 *   descriptive text should wrap at 80 chars.
 *   Newlines are NOT supported within directives; if a newline would be before
 *   this text, it would be appended to the general description above.
 * @param $second
 *   There should be no newline between multiple directives of the same type.
 *
 * @return
 *   "@return" is a different Doxygen directive to describe the return value of
 *   a function, if there is any.
 */

Lists

 * @param $variables
 *   An associative array containing:
 *   - tags: An array of labels for the controls in the pager:
 *     - first: A string to use for the first pager element.
 *     - last: A string to use for the last pager element.
 *   - element: An optional integer to distinguish between multiple pagers on
 *     one page.
 *   Any further description - still belonging to the same param.
 * This no longer belongs to the param. There should be a newline before this
 * paragraph, but it was left out for clarity.

Lists can appear anywhere in Doxygen comments, but the documentation parser requires a strict syntax for them to appear correctly in the parsed HTML output:

  • The list bullet/hyphen is aligned with (uses the same indentation level as) the paragraph before it, no newline before or after the list.
  • No newlines between list items.
  • Each list item starts with the key, followed by a colon, followed by a space, followed by the key description. The key description starts capitalized.
  • If a list item exceeds 80 chars, it needs to wrap, and the following lines need to be aligned with the key (indented by 2 more spaces).
  • If there should appear text after the list that still belongs to the block before the list, then it uses the same alignment/indentation as the initial text.
  • Again: within a Doxygen directive, blank lines are NOT supported.
  • Lists can appear within lists, and the same rules apply recursively.

Links

/**
 * @see foo_bar()
 * @see ajax.inc
 * @see MyModuleClass
 * @see http://drupal.org/node/1354
 */

The @see directive may be used to link to (existing) functions, files, or URLs. Each @see directive should always be placed on its own line.

/**
 * See also @link group_name Link text @endlink
 */

The @link directive may be used to output an HTML link, but also to link to Doxygen groups that are defined elsewhere via @defgroup.

Documenting files

It is good practice to provide a comment describing what a file does at the start of it. For example:

<?php
// $Id: theme.inc,v 1.202 2004/07/08 16:08:21 dries Exp $

/**
 * @file
 * The theme system, which controls the output of Drupal.
 *
 * The theme system allows for nearly all output of the Drupal system to be
 * customized by user themes.
 */

The line immediately following the @file directive is a summary that will be shown in the list of all files in the generated documentation. If the line begins with a verb, that verb should be in present tense, e.g., “Handles file uploads.” Further description may follow after a blank PHPDoc line.

In general, @file directives should not contain large descriptions, because those are better placed into @defgroup directives, so developers can look up high-level documentation by reading the “group” topic. See http://api.drupal.org/api/groups for a list of API topics/groups.

To add CVS ID tags to your file, add a // $Id$ line to it. CVS will automatically expand it to the format shown above. After that you don’t have to maintain it, as CVS updates this information automatically.

For .install files, the following template is used:

/**
 * @file
 * Install, update and uninstall functions for the XXX module.
 */

Documenting functions

All functions that may be called by other files should be documented; private functions optionally may be documented as well. A function documentation block should immediately precede the declaration of the function itself, like so:

/**
 * Verifies the syntax of the given e-mail address.
 *
 * Empty e-mail addresses are allowed. See RFC 2822 for details.
 *
 * @param $mail
 *   A string containing an email address.
 *
 * @return
 *   TRUE if the address is in a valid format.
 */
function valid_email_address($mail) {

The first line of the block should contain a brief description of what the function does, limited to 80 characters, and beginning with a verb in the form “Does such and such” (third person, as in “This function does such and such”, rather than second person imperative “Do such and such”). A longer description with usage notes should follow after a blank line, if more explanation is needed. Each parameter should be listed with a @param directive, with a description indented on the following line. After all the parameters, a @return directive should be used to document the return value if there is one. There is a blank line between the @param and @return directives.

Functions that are easily described in one line may omit these directives, as follows:

/**
 * Converts an associative array to an anonymous object.
 */
function array2object($array) {

The parameters and return value must be described within this one-line description in this case.
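
As an illustration of the full form described above, here is a small made-up function documented in this style. The function example_truncate() and its behavior are invented for this sketch and are not part of Drupal.

```php
<?php
/**
 * Truncates a string to a maximum number of bytes.
 *
 * The string is returned unchanged when it is already short enough.
 *
 * @param $text
 *   The string to truncate.
 * @param $max_length
 *   The maximum length of the returned string, in bytes.
 *
 * @return
 *   The truncated string.
 */
function example_truncate($text, $max_length) {
  // Hypothetical example function; only the comment layout matters here.
  return strlen($text) <= $max_length ? $text : substr($text, 0, $max_length);
}

echo example_truncate('abcdef', 3), "\n";
```

The summary starts with a third-person verb, each @param has its description indented on the following line, and a blank comment line separates the @param and @return directives.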

Documenting hook implementations

Many modules consist largely of hook implementations. If the implementation is rather standard and does not require more explanation than the hook reference provides, a shorthand documentation form may be used:

/**
 * Implements hook_help().
 */
function blog_help($section) {
  // ...
}

This generates a link to the hook reference, reminds the developer that this is a hook implementation, and avoids having to document parameters and return values that are the same for every implementation of the hook.

Documenting forms

/**
 * Form builder for the user login form.
 *
 * @param $msg
 *   The message to display.
 *
 * @see user_login_form_validate()
 * @see user_login_form_submit()
 * @ingroup forms
 */
function user_login_form(&$form_state, $msg = '')

In order to provide a quick reference for themers, we tag all form builder functions so that Doxygen can group them together. The form builder function is defined as any function meant to be used as an argument for drupal_get_form(). To do this, add a grouping instruction to the documentation of the function. Additionally, while submit, validate and other handlers for the form are not meant to be in this group, you should provide a @see to provide an easy reference to handlers that are attached to the form.

/**
 * Form validation handler for user_login_form().
 *
 * @see user_login_form()
 * @see user_login_form_submit()
 */
function user_login_form_validate($form, &$form_state) {
  ...
}

/**
 * Form submission handler for user_login_form().
 *
 * @see user_login_form()
 * @see user_login_form_validate()
 */
function user_login_form_submit($form, &$form_state) {
  ...
}

Documenting themeable functions

In order to provide a quick reference for theme developers, we tag all themeable functions so that Doxygen can group them on one page. To do this, add a grouping instruction to the documentation of all such functions:

/**
 * Formats a query pager.
 *
 * ...
 *
 * @ingroup themeable
 */
function theme_pager($tags = array(), $limit = 10, $element = 0, $attributes = array()) {
  ...
}

Documenting theme templates

If a template and a preprocess function are used instead of a theming function, an empty definition of the unused theme function should be placed in the contributed documentation (contributions/docs/developer/theme.php).

The template itself should be documented with a @file directive and contain a list of the variables that the template_preprocess_HOOK has prepared for it. If any of these variables contain data that is unsafe to output for XSS reasons, they should be documented; otherwise it can be assumed that variables available have already been appropriately filtered. Anything not listed should not be assumed to be safe to output. It should also contain a @see directive to link back to the preprocessor and the theme_X function.

<?php
// $Id$

/**
 * @file
 * Default theme implementation to display a list of forums.
 *
 * Available variables:
 * - $forums: An array of forums to display.
 *
 * Each $forum in $forums contains:
 * - $forum->is_container: Is TRUE if the forum can contain other forums. Is
 *   FALSE if the forum can contain only topics.
 * - $forum->depth: How deep the forum is in the current hierarchy.
 * - $forum->name: The name of the forum.
 * - $forum->link: The URL to link to this forum.
 * - $forum->description: The description of this forum.
 * - $forum->new_topics: True if the forum contains unread posts.
 * - $forum->new_url: A URL to the forum's unread posts.
 * - $forum->new_text: Text for the above URL which tells how many new posts.
 * - $forum->old_topics: A count of posts that have already been read.
 * - $forum->num_posts: The total number of posts in the forum.
 * - $forum->last_reply: Text representing the last time a forum was posted
 *   or commented in.
 *
 * @see template_preprocess_forum_list()
 */

The template_preprocess_HOOK function should also contain appropriate @see directives.

Documenting contributed modules and themes

  • Don’t use @mainpage. There can be only one @mainpage in the contributions repository, which is reserved for an index page of all contributed modules and themes.
  • Use Doxygen Modules (@defgroup, @ingroup, @addtogroup, see “Limitations and hints” below) sparingly. There are currently over 2,200 module directories in contrib, many of them consisting of more than one module. If each of these modules used just one @defgroup, there would be more than 2,200 entries in the global Module list. If each used more than one …
  • If you do use Doxygen Modules, make sure you give them a unique namespace, which would be your module’s name. E.g. @defgroup views ... for the views.module, @defgroup views_ui ... for the views_ui.module. Don’t use group names which are defined in Drupal core (hooks, themeable, file, batch, database, forms, form_api, format, image, validation, search, etc.).

A recommended way of using Doxygen grouping in contributed modules and themes is the following:

/**
 * @defgroup example Example module functionality
 * @{
 * Longer description of your module's API.
 */

/**
 * Load an example.
 * ...
 */
function example_load() ...

/**
 * Save an example.
 * ...
 */
function example_save() ...

/**
 * @} End of "defgroup example".
 */

This defines the primary Doxygen group. The syntax is @defgroup [internal_name] [Summary]. The internal name has to be prefixed with the module name (unless it is located in Drupal core’s include files).

Other functions in other files (or even entirely different modules) can then declare @ingroup example to put themselves into the same group.

Limitations and hints

Drupal’s Doxygen processing module, api.module, currently only supports a small subset of all Doxygen commands and makes some assumptions about the formatting of the source. Code to be processed by api.module is advised to stick to these conventions.

Api.module currently supports only one of Doxygen’s three grouping mechanisms: Modules (@defgroup, @ingroup, @addtogroup, @{, @}). When using those, please note the following:

  • Modules work at a global level, creating a new page for each group. They should be used only to group functions that provide some kind of API, which possibly spans multiple files. Or the other way round: they should not be used to group functions in a file when these functions are only used in that very file. That’s what Member Groups are for (which unfortunately aren’t supported by api.module yet).
  • A @defgroup can be defined only once: trying to define a second @defgroup with a name already in use will result in an error. Use @defgroup name in the “most important” section/file of that group and add to it from other places with @addtogroup / @ingroup.
  • The name in @defgroup name Explanation of that group must be a single-word identifier, like a PHP variable or function name. Or, as a regular expression: [a-zA-Z_][a-zA-Z0-9_]*. Dots, hyphens, etc. are not allowed.
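
The naming rule in the last point can be checked mechanically. The following is only a sketch: is_valid_group_name() is a made-up helper, not something api.module provides, and it simply anchors the regular expression quoted above.

```php
<?php
// Hypothetical helper: tests a @defgroup name against [a-zA-Z_][a-zA-Z0-9_]*.
function is_valid_group_name(string $name): bool
{
    // ^...$ with the D modifier makes the pattern match the whole name only.
    return preg_match('/^[a-zA-Z_][a-zA-Z0-9_]*$/D', $name) === 1;
}

var_dump(is_valid_group_name('views_ui'));   // bool(true)
var_dump(is_valid_group_name('views-ui'));   // bool(false): hyphen not allowed
var_dump(is_valid_group_name('2views'));     // bool(false): starts with a digit
```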

To see how a real Doxygen processes and displays the current Drupal code documentation (both core and contrib), have a look at ax’ Drupal site. In particular, look at the “doxygen error logs” and help improve Drupal’s code documentation.

  

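Common Doxygen directives

Doxygen comment style

1 Class declaration
/**
*   class declaration
*/

2 Variable declaration
/**< val's brief. val's details1. */

3 Function declaration
  /**
  * a normal member taking two arguments and returning an integer value.
  * @param a an integer argument.
  * @param s a constant character pointer.
  * @see Test()
  * @see ~Test()
  * @see testMeToo()
  * @see publicVar()
  * @return The test results
  */

 
@file
Describes the file.
@author
Author information.
@brief
A brief description of a class or function.
e.g.
@brief This function prints the error message string
@param
Used mainly in function documentation; followed by the parameter name and then a description of that parameter.
@return
Describes what the function returns.
e.g.
@return This function returns the execution result: TRUE on success, FALSE otherwise
@retval
Describes the possible return values.
e.g.
@retval NULL Empty string.
@retval !NULL Non-empty string.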
@note
A note.
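@attention
Points that need attention.
@warning
Warning information.
@enum
References an enumeration; Doxygen generates a link at that enumeration.
e.g.
@enum CTest::MyEnum
@var
References a variable; Doxygen generates a link at that variable.
e.g.
@var CTest::m_FileKey
@class
References a class.
Format: @class <name> [<header-file>] [<header-name>]
e.g.
@class CTest "inc/class.h"
@exception
Describes an exception that may be thrown.
e.g.
@exception This function may throw an out-of-range exception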

Choosing the Java version on Ubuntu

November 18, 2009


Sun Microsystems developed Java, which is many things depending on who you ask. It is a language, and an execution environment and probably many more things. On this page Java refers to the software that executes programs compiled to Java byte codes (akin to machine language).

Running Java under Ubuntu

In order to run Java programs and Java applets, you must have a Java environment installed. The GCJ flavor of Java is installed as default, and is usually fine for most purposes. If it is not installed, JavaInstallation describes how to install some opensource flavors of Java. You may, however, have a need to run the Sun flavor of Java if something does not work correctly.

To get Sun Java under Ubuntu 7.04 or later running on Intel or PowerPC platform, you should enable the Universe repository in Add/Remove programs, and install either the openjdk-6-jre package or the sun-java6-bin package. (Note: PowerPC version is slow).

To get Sun Java under Ubuntu 6.06 or 6.10 running on Intel x86 platform, you should enable the Universe repository in Add/Remove programs, and install the sun-java5-bin package.

Note: The same commands will work under Xubuntu/Kubuntu (using Add/Remove or the Adept Package Installer).

Choosing the default Java to use

Just installing new Java flavours does not change the default Java pointed to by /usr/bin/java. You must explicitly set this:

  • Open a Terminal window
  • Run sudo update-java-alternatives -l to see the current configuration and possibilities.

  • Run sudo update-java-alternatives -s XXX to set the XXX Java version as default. For Sun Java 6 this would be sudo update-java-alternatives -s java-6-sun

  • Run java -version to ensure that the correct version is being called.

You can also use the following command to make the change interactively:

  • Open a Terminal window
  • Run sudo update-alternatives --config java

  • Follow the onscreen prompt

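A PHP function for converting UTF-8 to GB2312 without iconv

November 16, 2009
Posted by: IT柏拉图

Converting encodings with the iconv() function is fairly simple, but many shared hosts do not provide that extension. After a long search online I found only a method for converting GB2312 to UTF-8, and it cannot convert in the other direction.

The function is as follows: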

/*******************************
// GB to UTF-8 conversion
*******************************/
function gb2utf8($gbstr) {
 global $CODETABLE;
 if(trim($gbstr)=="") return $gbstr;
 if(empty($CODETABLE)){
  // Load the mapping table once; each line maps a GB2312 code to a Unicode code.
  $filename = dirname(__FILE__)."/gb2312-utf8.table";
  $fp = fopen($filename,"r");
  while ($l = fgets($fp,15))
  { $CODETABLE[hexdec(substr($l, 0, 6))] = substr($l, 7, 6); }
  fclose($fp);
 }
 $ret = "";
 $utf8 = "";
 // Test the length rather than the string itself, so a trailing "0" is not dropped.
 while (strlen($gbstr) > 0) {
  if (ord(substr($gbstr, 0, 1)) > 127) {
   // Double-byte GB2312 character
   $thisW = substr($gbstr, 0, 2);
   $gbstr = substr($gbstr, 2, strlen($gbstr));
   $utf8 = "";
   @$utf8 = u2utf8(hexdec($CODETABLE[hexdec(bin2hex($thisW)) - 0x8080]));
   if($utf8!=""){
    // u2utf8() returns three decimal digits per byte
    for ($i = 0;$i < strlen($utf8);$i += 3)
     $ret .= chr((int) substr($utf8, $i, 3));
   }
  }
  else
  {
   // Single-byte (ASCII) character, copied unchanged
   $ret .= substr($gbstr, 0, 1);
   $gbstr = substr($gbstr, 1, strlen($gbstr));
  }
 }
 return $ret;
}
// Unicode code point to UTF-8.
// Returns the byte values as decimal digits, which gb2utf8() splits into groups of three.
function u2utf8($c) {
 $str = "";
 if ($c < 0x80) {
  $str .= $c;
 } else if ($c < 0x800) {
  $str .= (0xC0 | $c >> 6);
  $str .= (0x80 | $c & 0x3F);
 } else if ($c < 0x10000) {
  $str .= (0xE0 | $c >> 12);
  $str .= (0x80 | $c >> 6 & 0x3F);
  $str .= (0x80 | $c & 0x3F);
 } else if ($c < 0x200000) {
  $str .= (0xF0 | $c >> 18);
  $str .= (0x80 | $c >> 12 & 0x3F);
  $str .= (0x80 | $c >> 6 & 0x3F);
  $str .= (0x80 | $c & 0x3F);
 }
 return $str;
}
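
The u2utf8() helper above hands back decimal byte values as a digit string, which the caller then cuts into groups of three. A more direct variant, sketched here under the made-up name codepoint_to_utf8(), returns the raw UTF-8 bytes instead. It is an illustration of the same bit layout, not the author's code.

```php
<?php
// Hypothetical alternative to u2utf8(): encode one Unicode code point as UTF-8 bytes.
function codepoint_to_utf8(int $c): string
{
    if ($c < 0x80) {                 // 1 byte:  0xxxxxxx
        return chr($c);
    }
    if ($c < 0x800) {                // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($c >> 6))
             . chr(0x80 | ($c & 0x3F));
    }
    if ($c < 0x10000) {              // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($c >> 12))
             . chr(0x80 | (($c >> 6) & 0x3F))
             . chr(0x80 | ($c & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($c >> 18))
         . chr(0x80 | (($c >> 12) & 0x3F))
         . chr(0x80 | (($c >> 6) & 0x3F))
         . chr(0x80 | ($c & 0x3F));
}

echo bin2hex(codepoint_to_utf8(0x4E2D)), "\n";   // e4b8ad
```

The explicit parentheses make the shift-then-mask order visible instead of relying on operator precedence.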

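Because GB2312 characters are all double-byte, converting to UTF-8 is relatively simple, but the reverse is much more trouble. I gave it a try:

Like this: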

function utf82gb($utfstr)
{
 global $UC2GBTABLE;
 $okstr = "";
 if(trim($utfstr)=="") return $utfstr;
 if(empty($UC2GBTABLE)){
  // Load the reverse mapping: Unicode code => GB2312 code
  $filename = dirname(__FILE__)."/gb2312-utf8.table";
  $fp = fopen($filename,"r");
  while($l = fgets($fp,15))
  { $UC2GBTABLE[hexdec(substr($l, 7, 6))] = hexdec(substr($l, 0, 6));}
  fclose($fp);
 }
 $ulen = strlen($utfstr);
 for($i=0;$i<$ulen;$i++)
 {
  if(ord($utfstr[$i])<0x81) $okstr .= $utfstr[$i];
  else
  {
   if($ulen>$i+2)
   {
    // Treat the next three bytes as one UTF-8 character.
    // Note: $i is not advanced past the two continuation bytes here,
    // so they are looked up again on the following iterations.
    $utfc = substr($utfstr,$i,3);
    $c = "";
    @$c = dechex($UC2GBTABLE[utf82u_3($utfc)]+0x8080);
    if($c!=""){
       $okstr .= chr(hexdec($c[0].$c[1])).chr(hexdec($c[2].$c[3]));
    }
   }
   else
   { $okstr .= $utfstr[$i]; }
  }
  }
  $okstr = trim($okstr);
  return $okstr;
}

// Convert a three-byte UTF-8 character to its Unicode code point
function utf82u_3($c)
{
      $n = (ord($c[0]) & 0x1f) << 12;
      $n += (ord($c[1]) & 0x3f) << 6;
      $n += ord($c[2]) & 0x3f;
      return $n;
}
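
For comparison, the decoding step can be written so that the index always moves past every byte it has consumed. This is a sketch under the made-up name utf8_to_codepoints(); it handles only 1- to 3-byte sequences, which is enough for characters that GB2312 can represent, and it is not the author's function.

```php
<?php
// Hypothetical helper: decode a UTF-8 string into a list of Unicode code points.
// Assumption: 4-byte sequences are out of scope and are skipped byte by byte.
function utf8_to_codepoints(string $s): array
{
    $out = [];
    $len = strlen($s);
    for ($i = 0; $i < $len; ) {
        $b = ord($s[$i]);
        if ($b < 0x80) {                                     // ASCII, one byte
            $out[] = $b;
            $i += 1;
        } elseif (($b & 0xE0) === 0xC0 && $i + 1 < $len) {   // two-byte sequence
            $out[] = (($b & 0x1F) << 6) | (ord($s[$i + 1]) & 0x3F);
            $i += 2;
        } elseif (($b & 0xF0) === 0xE0 && $i + 2 < $len) {   // three-byte sequence
            $out[] = (($b & 0x0F) << 12)
                   | ((ord($s[$i + 1]) & 0x3F) << 6)
                   | (ord($s[$i + 2]) & 0x3F);
            $i += 3;
        } else {                                             // unsupported or truncated
            $i += 1;
        }
    }
    return $out;
}

print_r(utf8_to_codepoints("a\xE4\xB8\xAD"));   // code points 97 and 20013 (0x4E2D)
```

Each code point returned this way could then be looked up in a table such as $UC2GBTABLE.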

With this approach most characters do convert successfully, but something is always slightly off, so I changed the program to this:

function utf82gb($utfstr)
{
 global $UC2GBTABLE;
 $okstr = "";
 if(trim($utfstr)=="") return $utfstr;
 if(empty($UC2GBTABLE)){
  // Load the reverse mapping: Unicode code => GB2312 code
  $filename = dirname(__FILE__)."/gb2312-utf8.table";
  $fp = fopen($filename,"r");
  while($l = fgets($fp,15))
  { $UC2GBTABLE[hexdec(substr($l, 7, 6))] = hexdec(substr($l, 0, 6));}
  fclose($fp);
 }
 // Work on the URL-encoded form, where every non-ASCII byte appears as %XX
 $utfstr = urlencode($utfstr);
 $ulen = strlen($utfstr);
 for($i=0;$i<$ulen;$i++)
 {
  if($utfstr[$i]=="%")
  {
   if($ulen>$i+2){
    $hexnext = hexdec(substr($utfstr,$i+1,2));
    if($hexnext<127){
     // Percent-encoded ASCII character
     $okstr .= chr($hexnext);
     $i = $i+2;
    }
    else{
     if($ulen>=$i+9){
      // Three percent-encoded bytes, e.g. "E4%B8%AD"
      $hexnext = substr($utfstr,$i+1,8);
      $c = "";
      @$c = dechex($UC2GBTABLE[url_utf2u($hexnext)]+0x8080);
      if($c!=""){
        $okstr .= chr(hexdec($c[0].$c[1])).chr(hexdec($c[2].$c[3]));
      }
      $i = $i+8;
     }
    }
   }
   else
   { $okstr .= $utfstr[$i]; }
  }
  else if($utfstr[$i]=="+")
   $okstr .= " ";
  else
   $okstr .= $utfstr[$i];
 }
 $okstr = trim($okstr);
 return $okstr;
}
// Convert a URL-encoded three-byte UTF-8 character to its Unicode code point
function url_utf2u($c)
{
 $utfc = "";
 // explode() replaces split(), which was removed in PHP 7
 $cs = explode("%",$c);
 for($i=0;$i<count($cs);$i++){
  $utfc .= chr(hexdec($cs[$i]));
 }
 $n = (ord($utfc[0]) & 0x1f) << 12;
  $n += (ord($utfc[1]) & 0x3f) << 6;
  $n += ord($utfc[2]) & 0x3f;
 return $n;
}

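Testing it, I found it works perfectly, and it is even faster than the previous method; I really cannot work out why.

Anyone who needs the gb2312-utf8.table file can add me on QQ at 2500875 (IT柏拉图) or contact 1877000 (泡泡).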

URL Encoding

November 7, 2009

(or: ‘What are those “%20” codes in URLs?’)
= Index DOT Html by Brian Wilson =


RFC 1738: Uniform Resource Locators (URL) specification

The specification for URLs (RFC 1738, Dec. ‘94) poses a problem, in that it limits the use of allowed characters in URLs to only a limited subset of the US-ASCII character set:
“…Only alphanumerics [0-9a-zA-Z], the special characters “$-_.+!*’(),” [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL.”

HTML, on the other hand, allows the entire range of the ISO-8859-1 (ISO-Latin) character set to be used in documents – and HTML4 expands the allowable range to include all of the Unicode character set as well. In the case of non-ISO-8859-1 characters (characters above FF hex/255 decimal in the Unicode set), they just can not be used in URLs, because there is no safe way to specify character set information in the URL content yet [RFC2396.]

URLs should be encoded everywhere in an HTML document that a URL is referenced to import an object (A, APPLET, AREA, BASE, BGSOUND, BODY, EMBED, FORM, FRAME, IFRAME, ILAYER, IMG, ISINDEX, INPUT, LAYER, LINK, OBJECT, SCRIPT, SOUND, TABLE, TD, TH, and TR elements.)

What characters need to be encoded and why?


ASCII Control characters
     Why: These characters are not printable.
Characters: Includes the ISO-8859-1 (ISO-Latin) character ranges 00-1F hex (0-31 decimal) and 7F (127 decimal.)
Non-ASCII characters
     Why: These are by definition not legal in URLs since they are not in the ASCII set.
Characters: Includes the entire “top half” of the ISO-Latin set 80-FF hex (128-255 decimal.)
“Reserved characters”
     Why: URLs use some characters for special use in defining their syntax. When these characters are not used in their special role inside a URL, they need to be encoded.
Characters:
Character                      Code Points (Hex)   Code Points (Dec)
Dollar (“$”)                   24                  36
Ampersand (“&”)                26                  38
Plus (“+”)                     2B                  43
Comma (“,”)                    2C                  44
Forward slash/Virgule (“/”)    2F                  47
Colon (“:”)                    3A                  58
Semi-colon (“;”)               3B                  59
Equals (“=”)                   3D                  61
Question mark (“?”)            3F                  63
‘At’ symbol (“@”)              40                  64
“Unsafe characters”
     Why: Some characters present the possibility of being misunderstood within URLs for various reasons. These characters should also always be encoded.
Characters:
Character                      Code Points (Hex)   Code Points (Dec)   Why encode?
Space                          20                  32                  Significant sequences of spaces may be lost in some uses (especially multiple spaces)
Quotation marks                22                  34                  These characters are often used to delimit URLs in plain text.
‘Less Than’ symbol (“<”)       3C                  60                  These characters are often used to delimit URLs in plain text.
‘Greater Than’ symbol (“>”)    3E                  62                  These characters are often used to delimit URLs in plain text.
‘Pound’ character (“#”)        23                  35                  This is used in URLs to indicate where a fragment identifier (bookmarks/anchors in HTML) begins.
Percent character (“%”)        25                  37                  This is used to URL encode/escape other characters, so it should itself also be encoded.
Misc. characters:                                                      Some systems can possibly modify these characters.
   Left Curly Brace (“{”)      7B                  123
   Right Curly Brace (“}”)     7D                  125
   Vertical Bar/Pipe (“|”)     7C                  124
   Backslash (“\”)             5C                  92
   Caret (“^”)                 5E                  94
   Tilde (“~”)                 7E                  126
   Left Square Bracket (“[”)   5B                  91
   Right Square Bracket (“]”)  5D                  93
   Grave Accent (“`”)          60                  96

How are characters URL encoded?


URL encoding of a character consists of a “%” symbol, followed by the two-digit hexadecimal representation (case-insensitive) of the ISO-Latin code point for the character.
Example
  • Space = decimal code point 32 in the ISO-Latin set.
  • 32 decimal = 20 in hexadecimal
  • The URL encoded representation will be “%20”
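
The three steps in this example can be reproduced in a few lines. The sketch below is an illustration only: percent_encode_byte() is a made-up helper that follows the description above for a single byte, while rawurlencode() is PHP's built-in function for whole strings.

```php
<?php
// Hypothetical helper: "%" followed by the two-digit hex value of one byte.
function percent_encode_byte(string $ch): string
{
    $hex = dechex(ord($ch));                          // 32 decimal -> "20"
    return '%' . strtoupper(str_pad($hex, 2, '0', STR_PAD_LEFT));
}

echo percent_encode_byte(' '), "\n";   // %20
echo percent_encode_byte('#'), "\n";   // %23
echo rawurlencode('a b#c'), "\n";      // a%20b%23c
```

Note that urlencode(), unlike rawurlencode(), writes a space as “+”, which is the form used in query strings.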


Browser Peculiarities


  • Internet Explorer is notoriously relaxed in its requirements for encoding spaces in URLs. This tends to contribute to author sloppiness in authoring URLs. Keep in mind that Netscape and Opera are much more strict on this point, and spaces MUST be encoded if the URL is to be considered to be correct.

PHP Curl Upload image

November 6, 2009

index.htm
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
<form action="imagem.php" method="post" enctype="multipart/form-data">
<input type="file" id="foto" name="foto" />
<input type="submit" value="Enviar" />
</form>
</body>
</html>

imagem.php
<?php
// Accept only common image extensions; stop if the name does not match.
if (!preg_match('/\.(gif|bmp|png|jpg|jpeg)$/i', $_FILES['foto']['name'], $ext)) {
    die('Unsupported file type.');
}
$imagem_nome = md5(uniqid(time())) . '.' . $ext[1];
$destino = '/tmp/' . $imagem_nome;

// move_uploaded_file() checks that the file really came from an HTTP upload.
if (!move_uploaded_file($_FILES['foto']['tmp_name'], $destino)) {
    die('Could not move the uploaded file.');
}

$postData = array();
// CURLFile (PHP 5.5+) replaces the old "@/tmp/..." prefix, which PHP 7 no longer supports.
$postData['fileupload'] = new CURLFile($destino);
$postData['submit']     = 'Submit';
$postData['key']        = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
$postData['rembar']     = 'yes';
$postData['xml']        = 'yes';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.imageshack.us/index.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 240);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);

$response = curl_exec($ch);
curl_close($ch);

echo $response;
?>

Upload files using curl + php

// upload.php
<form action="upload.php" method="post" enctype="multipart/form-data">
<input name="cert_file" value="" type="file"/>

<input class="button" name="submit" value="Save" type="submit"/>
</form>

<?php

// Run only when a file was actually submitted without errors.
if (isset($_FILES['cert_file']) && $_FILES['cert_file']['error'] === UPLOAD_ERR_OK) {

    // Initialise cURL session
    $curl = curl_init();

    // Uploaded files arrive in $_FILES, not $_POST
    $filename = $_FILES['cert_file']['tmp_name'];
    // basename() strips any path sent along with the client file name
    $destinationFilename = basename($_FILES['cert_file']['name']);

    // CURLFile (PHP 5.5+) replaces the old "@".$filename form
    $data = array('upload' => new CURLFile($filename));
    $url = "http://x.com/{$destinationFilename}";

    curl_setopt($curl, CURLOPT_URL, $url);
    //curl_setopt($curl,CURLOPT_PUT,true);
    curl_setopt($curl, CURLOPT_POST, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $data);

    // CURLOPT_INFILE and CURLOPT_INFILESIZE apply to PUT uploads only,
    // so they are not set for this POST request.

    // Place a nice friendly user-agent
    //curl_setopt($curl,CURLOPT_USERAGENT,"Mozilla/4.0");

    // return the output instead of displaying it
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    // execute, keeping the response and any error message
    $result = curl_exec($curl);
    $error = curl_error($curl);
    curl_close($curl);
}

?>

PHP $_FILES explained

November 6, 2009
File upload form

<form enctype="multipart/form-data" action="URL" method="post">
<input type="hidden" name="MAX_FILE_SIZE" value="1000">
<input name="myFile" type="file">
<input type="submit" value="Upload file">
</form>

 

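The $_FILES array contains the following:

$_FILES['myFile']['name']   The original name of the file on the client machine.
$_FILES['myFile']['type']   The MIME type of the file, if the browser provides this information, for example "image/gif".
$_FILES['myFile']['size']   The size of the uploaded file, in bytes.
$_FILES['myFile']['tmp_name']   The temporary filename under which the uploaded file is stored on the server, by default in the system's temporary directory. The directory can be set with upload_tmp_dir in php.ini; setting it with putenv() has no effect.
$_FILES['myFile']['error']   The error code associated with this file upload. ['error'] was added in PHP 4.2.0. Its values are described below (they became constants in PHP 4.3.0):
  UPLOAD_ERR_OK
    Value: 0; no error occurred and the file was uploaded successfully.
  UPLOAD_ERR_INI_SIZE
    Value: 1; the uploaded file exceeds the upload_max_filesize limit in php.ini.
  UPLOAD_ERR_FORM_SIZE
    Value: 2; the uploaded file exceeds the MAX_FILE_SIZE value specified in the HTML form.
  UPLOAD_ERR_PARTIAL
    Value: 3; the file was only partially uploaded.
  UPLOAD_ERR_NO_FILE
    Value: 4; no file was uploaded.
    Value: 5; the uploaded file has a size of 0. (The PHP manual documents no named constant for this value.)

Once the upload finishes, the file sits in the temporary directory by default. The script must move it somewhere else before it ends; otherwise PHP deletes it. In other words, whether or not the upload succeeded, the file in the temporary directory is gone once the script finishes. So before that happens, move it to its final location with PHP's move_uploaded_file() function (copy() also works, but it does not check that the file really is an HTTP upload). Only then is the upload complete.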

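Ubuntu 9.10: middle mouse button on the T61 keyboard does not work

November 6, 2009

1. $ sudo vi /etc/hal/fdi/policy/mouse-wheel.fdi
Create the file if it does not exist.
vi can be replaced with any editor, such as emacs or gedit.

2. Copy in the following content: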
<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
<device>
<match key="info.product" string="TPPS/2 IBM TrackPoint">
<merge key="input.x11_options.EmulateWheel" type="string">true</merge>
<merge key="input.x11_options.EmulateWheelButton" type="string">2</merge>
<merge key="input.x11_options.XAxisMapping" type="string">6 7</merge>
<merge key="input.x11_options.YAxisMapping" type="string">4 5</merge>
<merge key="input.x11_options.ZAxisMapping" type="string">4 5</merge>
<merge key="input.x11_options.Emulate3Buttons" type="string">true</merge>
</match>
</device>
</deviceinfo>

3. Restart the hal and gdm services:

$ sudo /etc/init.d/hal restart
$ sudo /etc/init.d/gdm restart

or, using the service command:

$ sudo service hal restart
$ sudo service gdm restart