
Why do ranging over a string and indexing it produce different results?

Source: Stack Overflow

Date: 2024-03-18 18:36:30

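In Go, ranging over a string and indexing into it give values of different types. A `for ... range` loop over a string yields `int32` (`rune`) values, while indexing with `s[i]` yields a `uint8` (`byte`) value. This is because `range` decodes the string's UTF-8 bytes into Unicode code points, whereas `s[i]` returns the single byte at byte offset `i`. For ASCII strings every character is one byte, so both approaches produce the same numeric values (though still of different types). For non-ASCII strings a character can span several bytes, so `s[i]` returns only one byte of it rather than the whole character. When processing non-ASCII text, iterate with `range` (or convert to `[]rune`) instead of indexing byte by byte.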

Question

package main

import (
	"fmt"
	"reflect"
)

func main() {
	s := "hello" // Same results with s := "世界"
	for _, x := range s {
		kx := reflect.ValueOf(x).Kind()
		fmt.Printf("Type of x is %v\n", kx)
		break
	}
	y := s[0]
	ky := reflect.ValueOf(y).Kind()
	fmt.Printf("Type of y is %v\n", ky)
}

// Output:
// Type of x is int32
// Type of y is uint8
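The same check can be done without calling `reflect` directly, using fmt's `%T` verb, which prints a value's type. This is a sketch; `typeNames` is a hypothetical helper that assumes a non-empty string. Because `rune` and `byte` are aliases, `%T` reports them as `int32` and `uint8`.

```go
package main

import "fmt"

// typeNames reports the types of the first ranged value and of s[0] via
// fmt's %T verb. Hypothetical helper; it assumes s is non-empty.
func typeNames(s string) (rangeType, indexType string) {
	for _, x := range s {
		rangeType = fmt.Sprintf("%T", x)
		break
	}
	indexType = fmt.Sprintf("%T", s[0])
	return rangeType, indexType
}

func main() {
	for _, s := range []string{"hello", "世界"} {
		x, y := typeNames(s)
		fmt.Printf("%s: x is %s, y is %s\n", s, x, y)
	}
}
```

Both strings report `int32` for the ranged value and `uint8` for the indexed one, matching the asker's observation that the string's content does not change the types.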

I was surprised to find that I get a different type when I index the string instead of getting the character through range.

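Edit: I just realized that y is always a byte, even when s contains non-ASCII (Unicode) text. That also means indexing into a string to get characters is unsafe unless the string is ASCII.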

Answer

From the Go specification, "For statements with range clause":

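For a string value, the "range" clause iterates over the Unicode code points in the string starting at byte index 0. On successive iterations, the index value will be the index of the first byte of successive UTF-8-encoded code points in the string, and the second value, of type rune, will be the value of the corresponding code point. If the iteration encounters an invalid UTF-8 sequence, the second value will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.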

Now let's look at the type definitions in Go's builtin package:

// byte is an alias for uint8 and is equivalent to uint8 in all ways. It is
// used, by convention, to distinguish byte values from 8-bit unsigned
// integer values.
type byte = uint8

// rune is an alias for int32 and is equivalent to int32 in all ways. It is
// used, by convention, to distinguish character values from integer values.
type rune = int32

That explains why reflect reports int32 for a rune and uint8 for a byte: each alias denotes exactly the same type as its underlying name.

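Here is some code to make this clearer. I added a few lines and changed the string to a better example; I hope the comments are self-explanatory. I also recommend reading https://blog.golang.org/strings.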

package main

import (
	"fmt"
	"reflect"
)

func main() {
	// Changed the string for better understanding:
	// each character here takes more than one byte.
	s := "日本語"

	// Range over the string; x is a rune.
	for _, x := range s {
		kx := reflect.ValueOf(x).Kind()
		fmt.Printf(
			"Type of x is %v (%c)\n",
			kx,
			x, // as expected (a rune)
		)
		break
	}

	// Indexing (the first byte of the string).
	y := s[0]
	ky := reflect.ValueOf(y).Kind()
	fmt.Printf(
		"Type of y is %v (%c)\n",
		ky,
		y,
		/*
			Uh-oh, not what we expected. We get only the first byte
			of the string, not the full multi-byte character;
			we wanted '日' (a 3-byte character).
		*/
	)

	// Indexing (the first rune of the string).
	z := []rune(s)[0]
	kz := reflect.ValueOf(z).Kind()
	fmt.Printf(
		"Type of z is %v (%c)\n",
		kz,
		z, // as expected (a rune)
	)
}

Sample output:

Type of x is int32 (日)
Type of y is uint8 (æ)
Type of z is int32 (日)

Note: if your terminal does not show the same output, its character-encoding settings may be the cause; changing them may help.

