聊天GPT视力API和flutter

我想在我的潜水应用程序（https://www.divespotapp.com/）中添加一个新功能，即能够分析图像并找出其中的鱼类或海洋生物，这将非常酷。但是我不知道如何在不创建自己的神经网络的情况下实现这一点，而且我没有资源、资金或知识来做到这一点，但是Chat GPT有一个非常棒的新Vision API，可以理解任何图像。我将其集成到了一个Flutter应用程序中，如下所示。在flutlab上运行此代码：https://flutlab.io/editor/ae0ae36e-490c-4b19-bb8a-6db6f079842f 或者在我的GitHub上查看：https://github.com/benSmith1981/DiveSpotFlutter

请阅读他们的Vision API，链接为https://platform.openai.com/docs/guides/vision，它不仅了解图片，还能提供关于照片内容的详尽信息。我在一个简单的Flutter应用中使用了它来扫描潜水照片，并返回了内容。

为了使用它，您需要一个付费的聊天GPT账户，以便可以访问GPT 4。然后，您需要前往以下网址 https://platform.openai.com/api-keys 创建您自己的API密钥，并将其插入到下面所示的Flutter代码中的“YOUR_API_KEY”处。让我们来分解一下用于检测鱼的代码：

  Future<void> detectFish() async {
    if (_image == null) return; // Do nothing if no image is selected

    setState(() {
      responseText = "Detecting..."; // Update UI to show detecting state
    });

    // Directly read bytes from the file and encode them to Base64
    String imageBase64 = base64Encode(await _image!.readAsBytes());

    try {
      var response = await http.post(
        Uri.parse('https://api.openai.com/v1/chat/completions'),
        headers: {
          'Content-Type': 'application/json',
          'Authorization': 'Bearer YOUR_API_KEY', // Replace with your actual API Key
        },
        body: json.encode({
          'model': 'gpt-4-vision-preview', // Specify the model, replace with the actual model you want to use
          'messages': [
            {'role': 'system', 'content': 'You are a helpful assistant, capable of identifying fish and sea creatures in images.'},
            {'role': 'user', 
            "content": [
                {
                  "type": "text",
                  // "text": "What fish can you detect in thailand and sharks?"

                  "text": "What fish or sea creature (could be a shark, turtle whale any animal you find in the sea) can you detect in this image? Respond in json only. Assuming JSON content starts with '{' so we can parse it. Should have 'fish' key that shows and array dictionary of containing 'name', 'species', 'description', 'location', 'endangered', for each type of fish"
                },
                {
                  "type": "image_url",
                  "image_url": {
                    // "url": "https://cms.bbcearth.com/sites/default/files/2020-12/2fdhe0000001000.jpg",
                    "url": 'data:image/jpeg;base64,'+imageBase64
                  }
                }
              ]
            }
          ],
          'max_tokens': 1000 // Increase this value as needed

        }),
      );

      if (response.statusCode == 200) {
        setState(() {
          var data = json.decode(response.body);
          var contentString = data['choices']?.first['message']['content'] ?? '';

          // Find the start and end of the JSON content within the 'content' string
          int jsonStartIndex = contentString.indexOf('{');
          int jsonEndIndex = contentString.lastIndexOf('}');

          if (jsonStartIndex != -1 && jsonEndIndex != -1) {
            var jsonString = contentString.substring(jsonStartIndex, jsonEndIndex + 1);
            var contentData = json.decode(jsonString);

            // Process the extracted JSON data
            responseText = contentData.toString(); // Or process as needed
            List<dynamic> fishDetails = contentData['fish'] ?? [];
            fishList = fishDetails.map((f) => "Name: ${f['name']}, Species: ${f['species']}, Description: ${f['description']}").toList();
          } else {
            responseText = 'No valid JSON content found';
          }
        });

      }  else {
        setState(() {
          responseText = "Error: ${response.statusCode}";
        });
      }
    } catch (e) {
      setState(() {
        responseText = "Error: ${e.toString()}";
      });
    }
  }

我们正在测试用户是否首先添加了照片，如果是的话，我们需要将其转换为base64格式，以便可以发送给视觉API。要获取照片，需要要求用户从他们的相册中添加一张图片。

  Future<void> pickImage() async {
    final pickedFile = await _picker.pickImage(source: ImageSource.gallery);
    if (pickedFile != null) {
      setState(() {
        _image = File(pickedFile.path); // Set the _image File
      });
    }
  }

下一部分是一个HTTP请求，它与他们文档网站上的Python示例类似，我们需要向https://api.openai.com/v1/chat/completions发送POST请求，然后在头部设置一些内容，如API密钥、我们想要使用的GPT模型，以及内容、文本和图像。

请注意我的文本提示：“在这张图片中，你能找到什么鱼类或海洋生物（可能是鲨鱼、海龟、鲸等你在海洋中看到的任何动物）？只能以JSON格式进行回应。假设JSON内容以‘{’开头，以便我们可以解析它。每种鱼都应包含“fish”键，显示一个包含“name”、“species”、“description”、“location”和“endangered”的数组字典。” 我已经要求ChatGPT回复一个特定的JSON，以便我们可以解析它（这可能有点取巧，但这样做很酷，因为我们可以对数据进行特定操作，而且非常详细）。

最后一部分是关于解码响应的。我们需要获取返回的内容data['choices']?.first['message']['content']，然后我们基本上需要获取JSON字符串起始处'{'和结束处'}'之间的子字符串，一旦我们完成这一步，我们实际上可以解析JSON字符串。

最后一部分是将解析后的响应添加到我们的小部件中，我们基本上是将返回的数据显示在列表视图中。


  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: Text('Fish Detector1'),
      ),
      body: Column(
        children: [
          ElevatedButton(
            onPressed: pickImage, // Button to pick the image
            child: Text('Pick Image'),
          ),
          if (_imageBytes != null) Image.memory(_imageBytes!),
          if (_image != null) Image.file(_image!), // Display the selected image
          ElevatedButton(
            onPressed: detectFish, // Button to detect fish
            child: Text('Detect Fish'),
          ),
          Expanded(
            child: ConstrainedBox(
              constraints: BoxConstraints(
                minHeight: 500, // Adjust as needed
              ),
              child: SingleChildScrollView(
                child: Text(responseText),
              ),
            ),
          ),


          Expanded(
            child: ListView.builder(
              itemCount: fishList.length,
              itemBuilder: (context, index) {
                return ListTile(
                  title: Text(fishList[index]),
                );
              },
            ),
          ),
        ],
      ),
    );
  }

而且就是这样了。当然，你可以更改文本提示来检测任何事物。但是，我希望将其添加到我的潜水应用程序中，以帮助潜水员鉴定鱼类，所以我将它设定为特定于海洋。

API的成本

请记住，Chat GPT 4 的 Vision API 是存在成本的。您可以在此处计算这些成本：https://openai.com/pricing#language-models。然而，鉴于图像尺寸是我使用的图像（如下所示）为 1170 × 1071 像素，即约为 1 MP，且细节水平高，以下是如何计算成本：

1. 由于图像最短边小于2048，因此无需调整大小。 2. 然后将图像缩小为最短边为768px的长度。 3. 计算覆盖图像区域所需的512px瓷砖数量。对于一张1170 × 1071的图片，在缩放后（假设为768x768以简化计算），宽度和高度各需要2个瓷砖。 4. 每个瓷砖的成本为170代币，所以4个瓷砖的成本为（170乘以4等于680）代币。 5. 加上高细节图像的基本成本85代币。